STORAGE
The grid storage facade
By Jon Toigo, Network World
Lately the term "grid storage" has crept into the product literature of vendors ranging from storage stalwarts IBM Corp. and Network Appliance Inc. to numerous start-ups. While grid storage appears to borrow conceptually from grid computing (a set of technologies used to build supercomputers from clusters of inexpensive processors), the similarity ends there. The two have little else to do with each other. Grid storage refers to two items: a topology for scaling the capacity of network-attached storage (NAS) in response to application requirements, and a technology for enabling and managing a single file system so that it can span an increasing volume of storage.
One way to view grid storage is as a means to scale NAS storage horizontally and vertically while avoiding the problems associated with each.
Currently, scaling horizontally means adding more NAS arrays to a LAN. This works until the number of NAS boxes becomes unmanageable. In a "grid" topology, NAS heads are joined together using clustering technology to create one virtual head. NAS heads are the components containing a thin operating system optimized for Network File System (NFS) protocol support and storage device attachment.
Conversely, the vertical scaling of NAS is accomplished by adding more disk drives to an array. Scalability is affected by NAS file system addressing limits (how many file names you can read and write) and by such physical features as the bandwidth of the interconnect between the NAS head and the back-end disk. In general, the more disks placed behind a NAS head, the greater the likelihood the system will become inefficient because of concentrated load or interconnect saturation.
Grid storage, in theory, attacks these limits by joining NAS heads into highly scalable clusters and by alleviating the constraints of file system address space through the use of an extensible file system.
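The interconnect limit described above can be put in rough numbers. The Python sketch below is illustrative only: the function names and the throughput figures are assumptions chosen for the example, not measurements from any vendor's product. It estimates how heavily the back-end link of one NAS head is loaded when every disk is busy, and how many clustered heads it would take to keep each link from saturating.

```python
import math


def head_utilization(num_disks, per_disk_mb_s, interconnect_mb_s):
    """Fraction of one head's back-end interconnect consumed if all disks are busy.

    A value above 1.0 means the disks can deliver more data than the
    interconnect can carry, i.e. the link is saturated.
    """
    return (num_disks * per_disk_mb_s) / interconnect_mb_s


def heads_needed(num_disks, per_disk_mb_s, interconnect_mb_s):
    """Smallest number of clustered heads that keeps every interconnect at or below 100%.

    Assumes (hypothetically) that disks can be spread evenly across heads.
    """
    total_demand = num_disks * per_disk_mb_s
    return max(1, math.ceil(total_demand / interconnect_mb_s))


if __name__ == "__main__":
    # Assumed figures: 40 disks at 50 MB/s each behind a 1,000 MB/s link.
    print(head_utilization(40, 50, 1000))
    print(heads_needed(40, 50, 1000))
```

The model ignores caching, RAID overhead and uneven access patterns, so it shows only the direction of the effect: past a certain disk count, adding heads rather than disks is what relieves the bottleneck.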
Who Needs Grid Storage?
Grid storage would be useful to anyone with a large complement of NAS arrays to administer, according to a manager of a national Internet e-mail portal service who asked not to be named. He complains that his current complement of several hundred NAS storage devices from a prominent NAS vendor creates a huge management problem. Managing the capacity on each array requires that he access each array's self-generated status and configuration Web page, which is "like surfing the Web all day." To him, the possibility of one virtual NAS array, created from a cluster of individual arrays, is a management boon.
The development of storage grids clearly is geared toward NAS users today, primarily because NAS vendors are spearheading such efforts. But others might one day benefit from the grid storage concept, particularly those who have unruly Fibre Channel fabrics. Take, for example, a hospital in northern Virginia with several isolated storage-area network (SAN) islands, the result of uncoordinated storage acquisitions made by various corporate turf lords. Making disparate SANs communicate and share data with each other in the face of non-interoperable switching equipment is a nightmare for the hospital. Conceivably, by using clustered NAS devices as gateways to, and managers of, the back-end SANs, the hospital could improve capacity, file sharing and management generally.
For organizations whose file storage consists of millions of discrete files, the limitations of current file system address spaces can impose major hurdles for centralized management and capacity efficiency. Incorporating this data into a massively scalable, storage grid-based file system promises more efficient file sharing.
Competition in the Making
Established vendors such as Network Appliance and Silicon Graphics Inc. (SGI), and newcomers such as Panasas Inc., are working on clustered NAS technologies that sometimes are called grid storage. SGI might be ahead of the game with its application of proprietary server clustering technology to the NAS head, and Panasas has begun shipping a system based on Linux Beowulf clustering. Both companies' products primarily target high-performance computing.
For grid storage, Network Appliance plans to use technology it gained when it acquired Spinnaker Networks in February, says Chris Bennet, a senior director with the vendor. Network Appliance's challenge is particularly daunting. While the Filer products use a proprietary implementation of the Berkeley Fast File System, the Spinnaker products had used the Andrew File System. The two file systems have fundamental architectural differences that might require a departure from current product design. "Several years will be required to converge the technology at the code-line level," Bennet concedes.
Like competitors, Network Appliance seeks to improve management of multiple physical NAS heads and to create one scalable, synchronized directory. This directory would represent all files stored on all the NAS arrays as the number of arrays is expanded. Here grid storage appears to be less about NAS architecture and more about file sharing.
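The idea of one directory that represents every file on every array, and keeps doing so as arrays are added, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how Network Appliance or any other vendor implements it: the class name GlobalDirectory, its methods and the "fewest files" placement rule are all hypothetical.

```python
class GlobalDirectory:
    """Toy single namespace spanning several NAS heads (hypothetical design)."""

    def __init__(self):
        self._heads = {}     # head name -> set of file paths stored there
        self._location = {}  # file path -> head name

    def add_head(self, name):
        """Grow the cluster; existing paths stay valid and unchanged."""
        if name in self._heads:
            raise ValueError(f"head {name!r} already in cluster")
        self._heads[name] = set()

    def store(self, path):
        """Place a new file on the head holding the fewest files; return that head."""
        if not self._heads:
            raise RuntimeError("no NAS heads in cluster")
        if path in self._location:
            return self._location[path]
        # Ties are broken alphabetically so placement is deterministic.
        head = min(sorted(self._heads), key=lambda h: len(self._heads[h]))
        self._heads[head].add(path)
        self._location[path] = head
        return head

    def locate(self, path):
        """Return the head that holds a path, wherever it physically lives."""
        return self._location[path]

    def listing(self):
        """One sorted view of all files across all heads."""
        return sorted(self._location)


if __name__ == "__main__":
    d = GlobalDirectory()
    d.add_head("nas1")
    d.store("/mail/a")
    d.add_head("nas2")
    d.store("/mail/b")
    print(d.listing(), d.locate("/mail/b"))
```

Clients in this sketch see only paths. Which head answers a request is a detail of the directory, which is why the technology starts to look more like file sharing than NAS architecture.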
At IBM's Almaden Research Center, work is proceeding on a self-described grid storage project aimed at creating a "wide-area file sharing" approach, says Leo Luan, research staff manager on IBM's Distributed Storage Tank (DST) project. The objective is to extend the capabilities in "Storage Tank" (a set of storage technologies IBM rolled out last year that includes virtualization services, file services and centralized management) to meet the needs of large, geographically distributed corporations. Such sprawling companies struggle to replicate and distribute copies of files among their dispersed data centers.
The heart of grid storage is a methodology, whether based on clustered NAS or other distributed storage topologies, to enable synchronized file sharing. IBM is looking at untapped capabilities in the NFS Version 4 standard to help meet the need. "DST extends to NFS clusters that can be used to build a much larger grid with a single global file namespace across a geographically distributed environment," Luan says.
Making the approach open and standards-based requires a schema for file sharing that is independent of a server's file and operating systems, and that does not require the deployment of a proprietary client on all machines. IBM is working with the Global Grid Forum's File System Working Group because its intent is to produce a standards-based Lightweight Directory Access Protocol server to act as the master namespace server.
Industry observers disagree about the timeframe for, and even the likelihood of, a truly vendor-agnostic grid storage solution reaching the market. Some believe that the underlying technologies for global file namespace management, including virtualization and synchronized replication, are simply too immature or too prone to vendor infighting to be ready for prime time. Others take exception to the disruption inherent in most current extensible file systems, which commonly require either the modification or wholesale replacement of server file systems. To be successful, grid storage must be non-disruptive and transparent to users and applications.
Rival Technologies
Yet others question the relevance of such complex grid storage architectures in the face of rival technologies. For example, global namespace servers, such as NuView Inc.'s StorageX, and networked file sharing appliances, such as Tacit Networks Inc.'s iShared, deploy without interfering with existing file systems. They also provide file accessibility and synchronization services for wide-area file sharing that might perform well enough for users' needs.
The death knell for grid storage ultimately might result from a failure to define the term. Not only is there confusion surrounding use of the word "grid," but there also is a similarity between much of the grid storage discussion and the description of storage utilities in 2003, and of SANs the year before. Without a common industry definition for the term, it will remain more "marketecture" than architecture.
Toigo is CEO of Toigo Partners International, a technology analysis firm. He can be reached at jtoigo@intnet.net.