(Or “How I help my clients pick the best storage for their needs”)
This blog post is about my methodology for helping a client understand their storage options and pick the solution that best fits their needs.
I classify any potential storage solution into two of four categories, one from each pair: frame-based/frame-less and block-based/file-based. Every storage array on the market falls into one of the four quadrants these two pairs create. So we’re all on the same page, here is a quick background on what each category means.
Frame-based arrays are your traditional SANs. You buy a controller (or two) and however much disk you need, generally added in trays. You can keep adding disk until you reach the maximum your controllers can support.
Frame-less arrays are similar in concept to grid computing systems: every time a node is added, CPU, memory, network and disk are added with it. You’re not locked into one or two controllers the way you are with a frame-based array.
Block-based arrays are what has traditionally been considered a SAN. They don’t run filesystems like Windows’ NTFS or Solaris’ ZFS. Instead, they carve raw storage into LUNs and serve those out via block-based protocols (FC, iSCSI, FCoIP, etc.).
File-based arrays use an internal filesystem, like ZFS, WAFL, or NTFS, and serve files to clients via file-based protocols (CIFS/SMB, NFS, HTTP/S, FTP). They have traditionally been called NAS devices, but this is changing as many can now serve block-based protocols as well.
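If it helps to see the two axes as a data structure, here is a minimal Python sketch. The class names, field names, and example array names are all hypothetical, made up purely for illustration; the point is only that each array takes one value from each axis and therefore lands in exactly one of four quadrants.

```python
from dataclasses import dataclass
from enum import Enum


class Scaling(Enum):
    """Axis 1: how the array grows."""
    FRAME_BASED = "frame-based"  # one or two controllers, add disk trays
    FRAME_LESS = "frame-less"    # add nodes (CPU, network, cache, disk)


class Access(Enum):
    """Axis 2: how the array presents storage."""
    BLOCK_BASED = "block-based"  # LUNs over FC, iSCSI, ...
    FILE_BASED = "file-based"    # internal filesystem, NFS, CIFS/SMB, ...


@dataclass(frozen=True)
class StorageArray:
    name: str
    scaling: Scaling
    access: Access

    def quadrant(self) -> str:
        # One value from each axis places the array in exactly one quadrant.
        return f"{self.scaling.value} / {self.access.value}"


# Hypothetical arrays, used only to illustrate the classification.
arrays = [
    StorageArray("example-san", Scaling.FRAME_BASED, Access.BLOCK_BASED),
    StorageArray("example-grid-nas", Scaling.FRAME_LESS, Access.FILE_BASED),
]

for a in arrays:
    print(f"{a.name}: {a.quadrant()}")
```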
So now we know how to classify storage arrays, but how does this help my clients find the correct storage solution? The answer is that each of these architectures has specific advantages and disadvantages.
Frame-based arrays
A frame-based array will be easier to design and potentially easier to manage, thus lowering costs. When designing a frame-based array, you can assume there will be at most two controllers. That simplifies inter-controller communication and makes new features easier to add, which can lead to faster development and richer feature sets.
Frame-less arrays
A frame-less array has a few advantages over its frame-based counterparts, and they stem from what I call “linear growth”. With a frame-less architecture, your array is composed of nodes. Generally each node has compute resources (CPU), network resources (FC, Ethernet, etc.), cache (DRAM, NVRAM, SSD), and storage (disk). Every time you add capacity, you’re also adding compute, network, and cache resources, so the system stays balanced as you scale the capacity of the array.
Contrast this with a frame-based array, where you must anticipate your future growth and buy for it up front. If you underestimate your growth, you’re going to end up doing a rip-and-replace to move up to the next larger controller. If you overestimate it, you’ve wasted money on a controller that is too large for your environment. A storage engineer (like me) can help mitigate some of this risk based on experience with other clients, but unless you have a crystal ball that can tell us how much storage you’re going to need in the next three to five years, it’s still going to be an approximation.
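To put rough numbers on the linear-growth argument, here is a toy Python model. Every figure in it (capacity and cores per node, the controller’s capacity ceiling, the yearly demand) is a made-up assumption, not data from any real product; the sketch only shows the shape of the trade-off.

```python
import math

# Hypothetical figures, chosen only to illustrate the argument.
NODE_TB = 50             # capacity each frame-less node adds
NODE_CORES = 12          # compute each frame-less node adds
CONTROLLER_MAX_TB = 400  # ceiling of the frame-based controller bought up front


def frameless_resources(needed_tb: float) -> tuple[int, int]:
    """Nodes and total CPU cores for a frame-less array sized to needed_tb."""
    nodes = math.ceil(needed_tb / NODE_TB)
    return nodes, nodes * NODE_CORES


def frame_based_status(needed_tb: float) -> str:
    """Whether the controller bought up front can still hold needed_tb."""
    if needed_tb > CONTROLLER_MAX_TB:
        return "outgrown: rip-and-replace to a larger controller"
    used = needed_tb / CONTROLLER_MAX_TB
    return f"fits, {used:.0%} of controller capacity used"


# Hypothetical demand over four years, in TB.
for year, demand_tb in enumerate([100, 200, 300, 500], start=1):
    nodes, cores = frameless_resources(demand_tb)
    print(f"year {year}: {demand_tb} TB | frame-less: {nodes} nodes, {cores} cores"
          f" | frame-based: {frame_based_status(demand_tb)}")
```

In this toy model the frame-less array keeps the same ratio of compute to capacity (12 cores per 50 TB) every year, while the frame-based controller’s capacity is largely unused early on (the overestimate case) and then runs out in the final year (the underestimate case).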
Block-based arrays
Traditional SANs use a block-based design: disks are assigned to RAID groups and carved up into LUNs, which are then presented to servers as block devices. The server formats each LUN with its native filesystem and proceeds to use it like a local disk. This is easier to design, as it has fewer moving parts than a file-based array. Block-based arrays are generally less expensive in the low and mid range, and scale larger at the high end than file-based solutions. People generally assume block-based arrays are faster than file-based arrays, but that does not necessarily hold true today with high-performance file-based products like the NetApp FAS and the Oracle 7000. Keep reading to find out why.
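Carving a RAID group into LUNs is mostly capacity arithmetic, which can be sketched in Python as follows. This is a simplified model under stated assumptions: usable capacity is just the number of data disks times the disk size (parity-equivalent disks subtracted; hot spares, formatting overhead, and stripe layout ignored), and the class, method, and LUN names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RaidGroup:
    """Simplified RAID group: usable space = data disks x disk size."""
    disks: int
    disk_tb: float
    parity_disks: int  # parity-equivalent disks: 1 for RAID 5, 2 for RAID 6
    luns: dict[str, float] = field(default_factory=dict)

    @property
    def usable_tb(self) -> float:
        return (self.disks - self.parity_disks) * self.disk_tb

    @property
    def free_tb(self) -> float:
        return self.usable_tb - sum(self.luns.values())

    def carve_lun(self, name: str, size_tb: float) -> None:
        """Reserve size_tb of the group as a LUN to present to a server."""
        if size_tb > self.free_tb:
            raise ValueError(f"not enough free space for LUN {name!r}")
        self.luns[name] = size_tb


# Hypothetical example: eight 2 TB disks in a RAID 6 group.
rg = RaidGroup(disks=8, disk_tb=2.0, parity_disks=2)
rg.carve_lun("db-server-lun", 6.0)
rg.carve_lun("mail-server-lun", 4.0)
print(rg.usable_tb, rg.free_tb)  # 12.0 2.0
```

Note that the array only hands each LUN out as raw capacity; formatting it with a filesystem is left entirely to the server that mounts it.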
File-based arrays
Once relegated to mundane file-sharing tasks, file-based arrays have improved by leaps and bounds over the past five or so years, thanks in no small part to Moore’s law. File-based arrays require more CPU power to do the same amount of work as their block-based counterparts. This was an issue when we had Xeons running at 800 MHz, but now that we have six-core Xeons running at 3+ GHz, we have more CPU power than we know what to do with. File-based arrays can leverage this abundance of CPU power to meet, and in some cases exceed, the performance of block-based arrays. They do this in several ways, including advanced caching algorithms that prefetch blocks from disk into cache before the client needs them.
Of course, equal performance alone isn’t a good reason to go file-based over block-based. So why are file-based arrays rapidly gaining popularity? Feature set and TCO. Because a file-based array has a local filesystem on the array, it unlocks a large number of features not possible with traditional block storage, for example deduplication and encryption. One only has to look at the (admittedly dizzying) selection of software features on the NetApp website to understand my point.
The other big feature of file-based storage is what’s been coined Unified Storage. A unified storage device can serve both block-level data (iSCSI, FC) and file-level data (NFS, CIFS, HTTP, FTP, etc.). This means a single device can take the place of a block-based array and consolidate all of your file servers, which can save a lot of time otherwise spent patching and administering Windows file servers. Of course, file-based arrays have disadvantages too: they tend to be more expensive, and they don’t scale much beyond a petabyte.
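To make the deduplication point a little more concrete, here is a toy Python sketch of generic fixed-block, hash-based deduplication. It is not how NetApp or any other vendor actually implements the feature, and the class and method names are hypothetical. It does illustrate why a filesystem helps: the array already tracks which blocks belong to which file, so letting several files point at one stored copy of identical data is a natural extension of that bookkeeping.

```python
import hashlib


class DedupStore:
    """Toy fixed-block deduplicating store (illustrative only)."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}     # fingerprint -> data, stored once
        self.files: dict[str, list[str]] = {}  # file name -> ordered fingerprints

    def write(self, name: str, data: bytes) -> None:
        refs = []
        for i in range(0, len(data), self.block_size):
            chunk = data[i:i + self.block_size]
            # Assumes SHA-256 fingerprints never collide; a production system
            # might also byte-compare blocks before sharing them.
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)  # only new content uses space
            refs.append(digest)
        self.files[name] = refs

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])

    def logical_bytes(self) -> int:
        """Bytes the clients think they have written."""
        return sum(len(self.blocks[d]) for refs in self.files.values() for d in refs)

    def physical_bytes(self) -> int:
        """Bytes actually stored after deduplication."""
        return sum(len(b) for b in self.blocks.values())


# Tiny 4-byte blocks keep the demo readable; real block sizes are far larger.
store = DedupStore(block_size=4)
store.write("a.txt", b"AAAABBBBAAAA")
store.write("b.txt", b"AAAABBBB")
print(store.logical_bytes(), store.physical_bytes())  # 20 8
```

Here the two files hold 20 bytes of logical data, but only 8 bytes are physically stored, because each distinct 4-byte block is kept only once.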
When I do this for a client, I have the benefit of being much more interactive. I can take their needs and pain points into account and tailor the discussion around them. By the end of the meeting, the client has a much better idea of what they want, and I can offer several different solutions that fit their needs. We can then drill down on those products, and the client can weigh the pros and cons of each one. My clients appreciate that we’re vendor-agnostic and that we give them solutions and let them pick the best one, rather than trying to force a specific technology on them.
Do you agree with me? Think I’ve got it all wrong? Please let me know in the comments.
