At InterModal Data we build large systems with many components running in highly available configurations 24x7x365. For such systems, understanding how the components are behaving is essential. Our analytics system measures and records thousands of metrics from all components and makes these measurements readily available for performance analysis, capacity planning, and troubleshooting.
The other day someone asked on the #zfs IRC (irc.freenode.net) chat about using ZFS at home. As one of the early adopters, I can say it is a great idea! I've been running ZFS at home since late 2005. The first pool of "stuff" I created has been upgraded, expanded, and had its drives replaced. In 2008 I created the latest version of "stuff" as a simple mirrored pair of HDDs. The prior version of "stuff" was transferred to the 2008 pool which is still in use.
One of the nice changes to the kstat (kernel statistics) command in illumos is its conversion from Perl to C. There were several areas in the illumos (née OpenSolaris) code where Perl had been used. But these were too few to maintain critical mass, and it is difficult for interpreted runtimes to change at the pace of an OS, so keeping the two in lockstep is simply not worthwhile.
Latency and performance problems in storage subsystems can be tricky to understand and tune. If you've ever been stuck in a traffic jam or waited in line to get into a concert, you know that queues can be confusing and trying on your patience. In modern computing systems, there are many different queues, and any time we must share a constrained resource, one or more queues will magically appear in the architecture.
We are hosting illumos and ZFS day events in San Francisco October 1 - 3, 2012. Our good friends from DDRdrive, Delphix, Joyent, and Nexenta are also sponsoring the event. I will be talking about how to optimize the design of ZFS-based systems and explain how to get the best bang for your buck. Jason and Garrett are also on the speakers list, talking about how illumos has really taken hold as a foundation for building modern businesses.
When I originally wrote cifssvrtop (top for CIFS servers), all of the systems I tested with had one thing in common: the workstations (clients) had names. Interestingly, I recently found a case where the workstations are not named, so the results were less useful than normal.
Many of my friends have been asking where I've been lately and why they haven't seen me lurking around in the usual haunts. In January of this year, Jason Yoho, Garrett D'Amore, and I started a new company, DEY Storage Systems. I'm the E.
Modern systems are continuing to evolve and become more tolerant of failures. For many systems today, a simple performance or availability analysis does not reveal how well a system will operate when in a degraded mode. A performability analysis can help answer these questions for complex systems.
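As a sketch of the idea, performability is the expected performance weighted by the probability of being in each operational state. The state probabilities and performance levels below are purely hypothetical, chosen only to show the shape of the calculation:

```shell
# Performability = sum over states of P(state) * performance(state).
# All numbers here are hypothetical, for illustration only.
awk 'BEGIN {
    p_normal   = 0.999;  perf_normal   = 1.0   # fully operational
    p_degraded = 0.001;  perf_degraded = 0.6   # e.g. rebuild in progress
    printf "%.4f\n", p_normal * perf_normal + p_degraded * perf_degraded
}'
```

The point of the analysis is that last term: a system that is "available" 99.9% of the time may still spend real hours delivering only a fraction of its rated performance.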
A legacy view of system performance is that bigger I/O is better than smaller I/O. This has led many to worry about things like "jumbo" frames for Ethernet or setting the maximum I/O size for SANs. Is this worry justified? Let's take a look...
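A quick back-of-the-envelope illustration of why I/O size alone can mislead: two workloads with very different I/O sizes can deliver identical throughput. The IOPS figures are made up for illustration:

```shell
# Throughput (MB/s) = IOPS * I/O size / 2^20
echo $(( 16000 * 8 * 1024 / 1048576 ))   # 16,000 IOPS of 8 KB I/Os
echo $((  1000 * 128 * 1024 / 1048576 )) # 1,000 IOPS of 128 KB I/Os
```

Both work out to 125 MB/s, so "bigger is better" only holds if the bigger I/Os don't cost you proportionally more in latency or concurrency elsewhere.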
This post is the second in a series looking at the use and misuse of IOPS for storage system performance analysis or specification.
In case you missed the DTrace conference on April 3, 2012, Dierdre recorded all of the sessions and is publishing the videos. I had a few minutes to discuss the Aura Graph work that was demonstrated in Nexenta's booth at VMworld 2011. The short video explains what we were visualizing and why it is useful for operators.
Today, we routinely hear people carrying on about IOPS-this and IOPS-that. Mostly this seems to come from marketing people: 1.5 million IOPS-this, billion IOPS-that. Right off the bat, a billion IOPS is not hard to do; the metric lends itself rather well to parallelization...
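To see why the metric parallelizes so easily, consider a hypothetical aggregate (both numbers invented for illustration):

```shell
# Aggregate IOPS scales with device count:
# 10,000 devices at 100,000 IOPS each is a billion IOPS
echo $(( 10000 * 100000 ))
```

Nothing about that arithmetic says the I/Os are useful, coordinated, or even touching the same data, which is rather the point.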
This post is the first in a series looking at the use and misuse of IOPS for storage system performance analysis or specification.
Today NexentaStor 3.1 was released to the world. Download a free trial copy today! We've been working hard on this release for some time and it offers significant improvements in stability and performance. Here is a small sample of the changes that I think are cool.
This week we've launched NexentaStor 3.0.4. In many ways this is a significant milestone, far beyond what may be immediately obvious. For existing Nexenta customers, the feature list will look largely unchanged -- many of the same, great universal storage features available since 3.0.0 earlier this year. But for anyone who studies how organizations grow and mature, the release represents the best quality, stability, and maturity ever. We have been working hard to earn your trust for protecting your data.
Shortly after my arrival in Rotterdam last Sunday, the final game of the World Cup began. Though I am a soccer (and football) fan, I must confess that the trip was planned long before the Dutch secured a spot in the final. Arriving in the Netherlands after traveling for nearly 24 hours, I was looking forward to a quick nap before exploring the city. No rest for the weary. The party was already starting and people were collecting in a street near the Stadhuis, a mere block or two away from my hotel.
Next week I'll be offering an in-depth ZFS and NexentaStor training session in Rotterdam, the Netherlands. INPROVE is hosting the event and we have a full classroom. Though I did not plan to arrive in Amsterdam at the same time as the World Cup finals, I'm sure it will be a festive occasion. I'm sure my Spanish friends will understand that I will be wearing Orange.
Many people are flocking towards ZFS solutions because of the sound, fundamental science behind ZFS and the features it provides. Two features which complete the hybrid storage pool design are separate logs and cache devices.
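As an illustrative sketch (pool and device names are hypothetical), adding a separate log and a cache device to an existing pool looks like this:

```shell
# Hybrid storage pool sketch: add a mirrored pair of SSDs as a separate
# intent log (slog), and another SSD as a cache (L2ARC) device.
# "tank" and the cXtYdZ device names are hypothetical.
zpool add tank log mirror c4t0d0 c4t1d0
zpool add tank cache c4t2d0

# Verify the pool layout shows the log and cache vdevs
zpool status tank
```

The log device absorbs synchronous write latency, while the cache device extends the read cache onto fast media; together they complete the hybrid design.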
ZFS now offers triple-parity raidz3. Conceptually, raidz3 is an N+3 parity protection scheme. Today, there are few, if any, other implementations of triple parity protection, so when we say "raidz is similar to RAID-5" and "raidz2 is similar to RAID-6" there is no similar allusion for raidz3. I prefer to say "raidz3 is like raidz2 with one additional level of parity protection." But how much better is raidz3 than raidz2?
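For reference, creating a raidz3 set looks just like raidz or raidz2 (device names here are hypothetical):

```shell
# N+3: a seven-disk raidz3 set survives the loss of any three disks.
# "tank" and the device names are hypothetical.
zpool create tank raidz3 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0
```

In this seven-disk example, four disks' worth of capacity holds data and three disks' worth holds parity.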
The video generation is taking hold in the OpenSolaris community. Recently, Michelle Knight, a self-described general lunatic, asked for help on the OpenSolaris ZFS forum. But quite unlike most folks who get help and quietly wander away, or (hopefully) post a summary for posterity, she made a video describing what she learned and posted to YouTube. Very cool. Well done, Michelle!
I use an Apple Magic Mouse and really do love it. I use Adobe InDesign CS4 for writing technical documents. I also have a ring finger. Each of these things works well by itself. Together, they don't work well. This is a typical systems engineering problem. Each part works as designed, but together they don't work well.
Today, Oracle is presenting a webcast describing their strategy for the company going forward after the Sun acquisition. In the first 20 minutes there was much discussion about delivering integrated systems: applications + database + OS + hardware. This is a tremendous value proposition. It is such a tremendous value proposition that it could have been taken from the slides we put together 8 years ago in Sun's Integrated Systems Engineering group.
I am helping a client work through some performance problems and thought I might share a view with you. The data was collected for 57 seconds during a production run. The problem we are chasing is the usual performance problem: latency. In some cases the latency is close to 100ms, which would make everyone except a floppy disk user unhappy. The view of the data is intended to shed some light on where problems might exist that we need to further explore.
I was doing the holiday cleaning recently and came across a blast from the past. Long ago, when I worked for NASA at the Kennedy Space Center, we were doing some work with an Apple Lisa and an Apple ///. Naturally, when the Macintosh first appeared, it created quite a stir. Along with the rush of announcements and press was the creation of a new magazine, St. Mac. I dusted off a copy of the premier edition. Apparently the magazine didn't last long, only 7 issues, but it brought back a lot of memories.
I came across an interesting microbenchmark this week. It shows that some workloads can produce confusing results, or head fakes, that lead to difficulty in understanding benchmark results. In this case, a method we use for finding the performance envelope for ZFS is not effective. Before I dive into the microbenchmark, a few words about the ZFS Intent Log (ZIL). ZFS is a transactional file system, which means that it collects I/O into a transaction group (txg) and commits that txg to persistent storage.
A few years ago I planted some macadamia trees in the orchard. It takes a few years before they begin to bear fruit. 2009 was the first year that we had blooms. I was very excited! This small cluster of blooms produced 6 nuts. Of these, all disappeared over the summer (I blame the ravens) except one. Last week, I harvested the one, special nut.
I have posted my slides for the ZFS Tutorial at the USENIX LISA09 conference on slideshare.net. I apologize for the delay; I've been fighting the beast trying to get the PDF uploaded. I finally gave up and uploaded the keynote presentation. As such, I've also disabled the file download. If you want a copy of the PDF, drop me an e-mail and I'll send it to you. These slides are the full deck.
The USENIX LISA09 conference is next week and I'll be leaving the ranch to travel back east to attend. On Monday, November 2 I'll be giving a tutorial on ZFS. Much has changed since I gave a similar tutorial at the USENIX Technical Conference last June. Somehow I've got to get the content down from 200+ slides into something feasible to deliver as a one-day tutorial.
Recently, I was asked about what I've done for cloud computing. Personally, I think "cloud computing" is just the latest marketing buzzword, and represents a passing fad. But the concepts people use when trying to describe the cloud are a good foundation for providing computing services. Many of these concepts have been in place for 25-30 years, at least in the engineering workstation market, but perhaps not widely applied to other markets.
Recently, there have been a number of discussions about how to backup an active file system with millions of files. This is a challenge because traditional backup tools do a file system walk -- traversing the file system from top to bottom looking at the modification time for each file. This works well for file systems with a modest number of files. For example, one of my OpenSolaris systems has around 62,000 files in the root file system and backups go at media speed.
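This is where ZFS snapshots change the game: an incremental zfs send transfers only the blocks that changed between two snapshots, with no file system walk at all. A sketch, with hypothetical pool, dataset, and host names:

```shell
# Incremental backup without a file system walk: ZFS already knows which
# blocks changed between snapshots. All names here are hypothetical.
zfs snapshot tank/home@today
zfs send -i tank/home@yesterday tank/home@today | \
    ssh backuphost zfs receive backup/home
```

The cost of the incremental send is proportional to the amount of changed data, not to the number of files, which is exactly what you want with millions of files.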
Denis Ahrens recently posted a compression comparison of LZO and LZJB to the OpenSolaris ZFS forum. This is interesting work and there are plenty of opportunities for research and development of new and better ways of compressing data. But when does it make sense to actually implement a new compression scheme in ZFS? The first barrier is the religious argument surrounding licensing. I'd rather not begin to go down that rat hole.
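For context, compression in ZFS is already selectable per dataset, so a new scheme would surface as another value of the compression property. A sketch with a hypothetical dataset name:

```shell
# Compression is a per-dataset property; "on" currently means lzjb.
# "tank/data" is a hypothetical dataset name.
zfs set compression=gzip tank/data

# Check how well the data actually compresses
zfs get compressratio tank/data
```

Only newly written blocks are compressed with the new setting; existing data keeps whatever algorithm it was written with.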
Is the best technology the pathway to success? Nope. In this post, I'll take a strategic look at the future of the Btrfs file system. Using B-trees (or modified B-trees) for space allocation has been the rage among file system designers in the past few years. Some of the more notable efforts are ZFS, Btrfs, Reiser4, and NILFS. The availability of open source operating systems, especially BSD and Linux, has enabled explorations of interesting new ways to manage storage and implement file systems. This is a good thing.
We were eating dinner this evening out on the deck (hot wings, one of my favorites) when nature created a beautiful sight, purple rain. Some tropical moisture flowed into the San Diego area today and made some beautiful clouds. A few dropped out some rain, though most of it was virga. As the Sun was setting, the colors were just right for a few moments and I was able to snap a picture of the purple rain over Ramona.
I've made a change to zilstat which will show how much data is written to the ZIL between txg commits. The old behaviour of time-based reporting is still available. The way I implemented this was to allow you to specify "txg" as an interval instead of a numerical interval in seconds. However, since there may be multiple pools, zilstat will also require that you specify a pool when you want to look at the txg intervals.
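For example (assuming a pool named tank; the exact pool-selection flag is an assumption of this sketch, so check the script's usage message):

```shell
# Old behaviour: time-based reporting, one line every 10 seconds
zilstat 10

# New behaviour: one line per txg commit; a pool must be named
# (-p tank is assumed here as the pool-selection option)
zilstat -p tank txg
```

Reporting per txg lines the numbers up with what the pool actually commits, rather than slicing the activity at arbitrary wall-clock boundaries.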
Earlier today there was a conversation on the Cloud Computing group where Christopher Steel wrote, "I also [see] a play for Oracle to get into the Database appliance space. With the Sun acquisition, they now have the hardware, OS, and support pieces to delivery out-of-the-box enterprise Database solution." I got a tickle out of this because back in 2002, I was Chief Architect for Sun's Enterprise Engineering group where we designed appliances.