Fast, Safe, Cheap : Pick 3

Thanks to Roch Bourbonnais for this story

Today, we're making performance headlines with Oracle's ZFS Storage Appliance.

SPC-1: twice the performance of NetApp at the same latency; less than half the $/IOPS.




I'm proud to say that yours truly, along with a lot of great
teammates at Oracle, is not totally foreign to this milestone.

We are announcing that Oracle's 7420C cluster
(http://www.oracle.com/us/products/servers-storage/storage/nas/zfs7420/overview/index.html)
achieved 137000 SPC-1 IOPS with an average latency of less than 10 ms. That is
double the result of NetApp's 3270A while delivering the same
latency. Compared to the NetApp 3270A result, this is a 2.5x
improvement in $/SPC-1 IOPS ($2.99/IOPS vs. $7.48/IOPS). We're also showing that when
the ZFS Storage Appliance runs at the rate posted by the 3270A (68034
SPC-1 IOPS), our latency of 3.26 ms is almost 3X lower than theirs
(9.16 ms). Moreover, our result was obtained with 23700 GB of user-level
capacity (internally mirrored) at $17.3/GB, while NetApp's,
even using a space-saving RAID scheme, can only deliver $23.5/GB. This
is the price per GB of application data actually used in the
benchmark. On top of that, the 7420C still had 40% of space headroom,
whereas the 3270A was left with only 10% of free blocks.
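
For readers who want to check the headline ratios, here is a minimal sketch that recomputes them from the figures quoted above. The variable names are purely illustrative; only the numbers come from the published results.

```python
# Figures quoted in the paragraph above (SPC-1 results as of October 3, 2011).
# The dictionary keys are illustrative names, not SPC terminology.
zfs_7420c = {"spc1_iops": 137000, "usd_per_iops": 2.99, "latency_ms": 3.26}
netapp_3270a = {"spc1_iops": 68034, "usd_per_iops": 7.48, "latency_ms": 9.16}

# Throughput: how many times more SPC-1 IOPS the 7420C delivered.
throughput_ratio = zfs_7420c["spc1_iops"] / netapp_3270a["spc1_iops"]

# Cost: how many times cheaper each SPC-1 IOPS is on the 7420C.
cost_ratio = netapp_3270a["usd_per_iops"] / zfs_7420c["usd_per_iops"]

# Latency: both systems compared at the 3270A's rate of 68034 SPC-1 IOPS.
latency_ratio = netapp_3270a["latency_ms"] / zfs_7420c["latency_ms"]

print(f"throughput: {throughput_ratio:.2f}x")
print(f"$/IOPS:     {cost_ratio:.2f}x cheaper")
print(f"latency:    {latency_ratio:.2f}x lower")
```

The three ratios come out at roughly 2.0, 2.5 and 2.8, which is where the "2X", "2.5X" and "almost 3X" claims come from.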

These great results were at least partly made possible by the
availability of 15K RPM Hard Disk Drives (HDDs). Those are great for running
the most demanding databases because they combine a high IOPS
capability with a generally smaller capacity. Their ratio of IOPS/GB
makes them ideal for storing the high-intensity databases modeled by SPC-1.
On top of that, this concerted engineering effort led to improved
software, and not just for systems running on 15K RPM drives. We actually used this benchmark to seek out ways to
increase the quality of our products. During the preparation runs, after an initial
diagnosis of an issue, we were committed to finding solutions that
were not targeting the idiosyncrasies of SPC-1 but were based on sound
design decisions. So instead of changing the default value of some
internal parameter to a new static default, we actually changed the way
the parameter worked so that our storage systems of all types and sizes would benefit.


So not only are we getting great SPC-1 results, but all existing
customers will benefit from this effort even if they are operating
outside of the intense conditions created by the benchmark.

So what is SPC-1? It is one of the few benchmarks that count for
storage. It is maintained by the Storage Performance Council (SPC, www.storageperformance.org). SPC-1 simulates multiple
databases running on a centralized storage system or storage cluster. But
even if SPC-1 is a block-based benchmark, within the ZFS Storage
Appliance, a block-based FC or iSCSI volume is handled very much the
same way as a large file subject to synchronous operations.
And by combining modern network technologies (InfiniBand or 10GbE
Ethernet), the CPU power packed in the 7420C storage controllers and
Oracle's custom dNFS technology for databases, one can truly achieve
very high database transaction rates on top of the more manageable and
flexible file-based protocols.

The benchmark defines three Application Storage Units (ASUs): ASU1 with a heavy 8KB block
read/write component, ASU2 with a much lighter 8KB block read/write
component, and ASU3, which is subject to hundreds of write streams. As
such, it is not too far from a simulation of running hundreds of Oracle
databases on a single system: ASU1 and ASU2 for datafiles and ASU3
for redo log storage.

The total size of the ASUs is constrained such that all of the stored data
(including mirror protection and disks used as spares) must exceed 55%
of all configured storage. The benchmark team is then free to decide
how much total storage to configure. From that figure, 10% is given to
ASU3 (redo log space) and the rest is divided equally between the heavily
used ASU1 and the lightly used ASU2.
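
The sizing rules above can be sketched in a few lines. This is only an illustration of the 10% / 45% / 45% split and the 55% rule as described in this post, not a restatement of the SPC-1 specification. The function names are made up, and using 23700 GB as the total ASU size is an assumption for the sake of the example.

```python
def split_asu_capacity(total_asu_gb):
    """Split a total ASU size as described above: 10% to ASU3,
    the remainder shared equally between ASU1 and ASU2."""
    asu3 = total_asu_gb / 10
    asu1 = asu2 = (total_asu_gb - asu3) / 2
    return {"ASU1": asu1, "ASU2": asu2, "ASU3": asu3}


def meets_space_rule(stored_gb, configured_gb, minimum_fraction=0.55):
    """Check that stored data (including mirror copies and spares)
    exceeds the required fraction of all configured storage."""
    return stored_gb / configured_gb > minimum_fraction


# Assumption for illustration: treat the 23700 GB of user-level
# capacity quoted earlier as the total ASU size.
layout = split_asu_capacity(23700)
for name, size_gb in layout.items():
    print(f"{name}: {size_gb:.0f} GB")
```

With that assumed total, ASU3 gets 2370 GB and ASU1 and ASU2 get 10665 GB each.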

The benchmark team also has to select the SPC-1 IOPS throughput level it wishes to run.
This is not a light decision, given that you want to balance high IOPS, low
latency and $/user GB.

Once the target IOPS rate is selected, there are multiple criteria
to meet in order to pass the audit; one of the most critical is that
you have to run at the specified IOPS rate for a full 8 hours. Note
that the previous specification of the benchmark, used by NetApp,
called for a 4-hour run. During that 8-hour run delivering a solid
137000 SPC-1 IOPS, the average latency must be less than 30 ms
(we did much better than that).

After this brutal 8-hour run, the benchmark then enters another critical
phase: the workload is restarted (using a new randomly selected working set)
and performance is measured for a 10-minute period. It is this 10-minute
period that decides the official latency of the run.

When everything is said and done, you pull the trigger, go to sleep
and wake up to the result. As you could guess, we were ecstatic that
morning. Before that day, glorious for lack of a stronger word, a lot
of hard work had been done during the extensive preparation runs. With
little time, and normally not all of the hardware, one runs through a
series of runs at incremental loads, making educated guesses as to how
to improve the result. As you get more hardware, you scale up the
result, tweaking things more or less until the final hour.

SPC-1, with its requirement of less than 45% of unused space, is
designed to trigger many disk-level random read IOPS. Despite this
inherently random pattern of the workload, we saw that our extensive
caching architecture was as helpful for this benchmark as it is in
real production workloads. While 15K RPM HDDs normally level off
under random operations at a rate slightly above 300 IOPS, our 7420C, as
a whole, could deliver almost 500 user-level SPC-1 IOPS per HDD.
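
To put those per-drive numbers in perspective, here is a rough back-of-the-envelope sketch. It uses the rounded figures from the paragraph above (300 and 500 IOPS per HDD), so the outputs are estimates derived from this post, not the audited drive count, and the constant names are illustrative.

```python
import math

SPC1_IOPS = 137000            # result quoted in this post
RAW_IOPS_PER_HDD = 300        # "slightly above 300" for a 15K RPM drive
DELIVERED_IOPS_PER_HDD = 500  # "almost 500" user-level SPC-1 IOPS per HDD

# How much more each drive appears to deliver thanks to caching.
cache_uplift = DELIVERED_IOPS_PER_HDD / RAW_IOPS_PER_HDD

# Since the delivered rate is a bit under 500 per drive, the system
# must hold at least this many drives (a lower bound, not a spec).
min_hdds_with_cache = math.ceil(SPC1_IOPS / DELIVERED_IOPS_PER_HDD)

# Drives needed if every user-level I/O cost one raw disk I/O.
hdds_without_cache = math.ceil(SPC1_IOPS / RAW_IOPS_PER_HDD)

print(f"uplift from caching: about {cache_uplift:.2f}x")
print(f"drives implied at ~500 IOPS each: at least {min_hdds_with_cache}")
print(f"drives needed at ~300 IOPS each: about {hdds_without_cache}")
```

This deliberately ignores mirroring, write overhead and everything else a real sizing exercise would account for; it only shows the scale of the caching benefit.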

In the end, one of the most satisfying aspects was to see that
the data being managed by ZFS was stored rock solid on disk and properly
checksummed, that all data could be snapshotted and compressed on demand, and
that the system delivered impressively steady performance.

2X the absolute performance, 2.5X cheaper per SPC-1 IOPS, almost 3X lower
latency, 30% cheaper per user GB with room to grow... So, if you have
a storage decision coming and you need FAST, SAFE, CHEAP: pick 3, and
take a fresh look at the ZFS Storage Appliance.







SPC-1, SPC-1 IOPS and $/SPC-1 IOPS are registered trademarks of the Storage Performance Council (SPC).
More info: www.storageperformance.org. Results cited for the Sun ZFS Storage 7420 Appliance and
the NetApp FAS3270A:

Oracle Sun ZFS Storage Appliance 7420
http://www.storageperformance.org/results/benchmark_results_spc1#a00108 (as of October 3, 2011)
NetApp FAS3270A
http://www.storageperformance.org/results/benchmark_results_spc1#ae00004 (as of October 3, 2011)


The views expressed on this blog are my own and do not necessarily reflect
the views of Oracle.

