Buried
The proliferation of mobile handsets with increasingly advanced data capabilities, coupled with greatly widened transmission pipes, has made life easier, more productive and even safer for personnel in the enterprise, public utility and first-responder sectors. But it also has created giant headaches for those on the back end who have to figure out how to keep from being buried by the incredible amount of data these devices generate. Further complicating matters is the size of some of these files — streaming video comes immediately to mind — as wider pipes inevitably lead to much bigger data files.
More data and larger files demand bigger and better storage solutions that must be reliable, retrievable and highly secure. Demand for such solutions is being driven not only by market forces, but also by federal rule-makings. For example, businesses and federal agencies are now required to keep all of their data for a longer period of time to adhere to a late 2007 amendment to the Federal Rules of Civil Procedure (FRCP) by the U.S. Supreme Court that addressed electronic discovery of evidence during federal court cases. Similarly, the health care sector also must comply with stringent data-security regulations mandated by the Health Insurance Portability and Accountability Act of 1996.
This costs money and energy. Economically and environmentally speaking, housing data can be a large capital expense, said Greg Schulz, founder of StorageIO, an IT infrastructure technology analyst firm. Powering hardware and software, and protecting the data via encryption, backups and replication, can place a tremendous strain on any budget, even that of a large federal agency.
Schulz said federal data centers account for about 10% of all power consumed by information-technology data centers located within the U.S. A federal law enacted in 2006 requires that these centers install energy-efficient computer servers.
“Buried in the law is that federal agencies have to show some improvement in energy efficiency,” Schulz said. “Data center operators need to determine how to develop the environment to be 20% more efficient and carry that efficiency over to the next year — when data storage needs increase 20%.”
The challenge is convincing commercial and federal data centers that what’s good for the environment also is good for their bottom lines. Cooling systems and leased space cost significant money, and large amounts of both are needed to sustain large-scale data centers. If faster storage devices are moving larger amounts of data, not only is power conserved, but also the number of transactions that can be completed could be increased by a factor of two — or more.
A new spin
As a result, more vendors are offering large-scale, long-term data storage and retrieval systems that offer a higher disk density — the capability of storage solutions to store more data on smaller systems. Today digital data typically is stored using two platforms: solid-state random access memory and magnetic hard disk drives.
The former, which has no mechanical parts, allows rapid access to data but may cost 100 times more per bit compared to a magnetic disk drive, according to a report from Montgomery Research. The latter stores data cheaply, but because it relies on the mechanical rotation of a disk, it is slow and somewhat unreliable.
“There are certain possibilities that the disk can crash and you can lose that data,” said Stuart Parkin, an IBM fellow and experimental physicist at the company’s Almaden Research Center in San Jose, Calif.
But one long-term, magnetic data storage system, dubbed MAID (massive array of idle disks), has gained traction. The system uses hundreds to thousands of hard drives to deliver near-line data storage, which is the onsite storage of data on removable media, such as magnetic disk, magnetic tape and compact disc.
MAID is designed for write-once, read-occasionally (WORO) applications in which each drive is spun up only as needed to access the data stored on that particular drive. Will Layton is co-founder of COPAN Systems, a MAID-centric data storage provider for federal users that has received $88.4 million from Tier 1 venture capital investors, most recently closing a $32.4 million Series D round of financing in the third quarter of 2007.
Layton said that federal agencies traditionally have stored data on disk or on tape, but he added that magnetic tape is a sequential medium that is hard to manage and also can be unreliable and labor intensive. Users wouldn’t know whether data stored on tape is recoverable until they actually try to access it, which can affect the agency’s ability to track down and prosecute a criminal, he said.
“If you want to go back and look at surveillance data and you are just learning you have it or don’t have it that day, that is the wrong time to be finding that out,” Layton said. “And that’s the data that they have to keep around longer, because you have to go back and look at a point in time at one camera. If you cannot do that quickly, the value of that data is almost zero.”
The company’s MAID infrastructure is database-centric and acts as an archive by creating multiple copies of data for redundancy and storing them on disk. The system uses Integrated Drive Electronics, or IDE, drives; IDE is an interface for mass-storage devices in which the controller is integrated into the disk or CD-ROM drive. This saves space and ensures data is secure and accessible, Layton said.
Layton said the system duplicates records and keeps up to 17 copies of data on file, based on the client’s program applications. Clients can control the data onsite at their facilities. The solution houses up to 896 IDE drives in a single cabinet, all running on a single 30-inch circuit, and provides the equivalent of 672 terabytes of storage capacity in 1 square meter.
However, agencies interested in investing in the technology will have to budget for it. The company’s smallest system can be purchased at 28 terabytes for $100,000, Layton said.
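The figures Layton cites can be cross-checked with a little arithmetic. The Python sketch below is illustrative only. It assumes decimal units (1 terabyte equals 1,000 gigabytes) and that capacity is spread evenly across the drives, neither of which the company states, and the function names are hypothetical.

```python
# Back-of-the-envelope check of the cabinet and pricing figures quoted above.
# Assumptions (not from the source): decimal units (1 TB = 1,000 GB) and
# capacity divided evenly across every drive in the cabinet.

GB_PER_TB = 1_000  # assumption: decimal terabytes

DRIVES_PER_CABINET = 896          # quoted by Layton
CABINET_CAPACITY_TB = 672         # quoted by Layton
ENTRY_SYSTEM_TB = 28              # quoted by Layton
ENTRY_SYSTEM_PRICE_USD = 100_000  # quoted by Layton


def per_drive_capacity_gb(capacity_tb, drives):
    """Average capacity per drive, in gigabytes."""
    return capacity_tb * GB_PER_TB / drives


def price_per_tb(price_usd, capacity_tb):
    """Purchase price per terabyte, in dollars."""
    return price_usd / capacity_tb


if __name__ == "__main__":
    drive_gb = per_drive_capacity_gb(CABINET_CAPACITY_TB, DRIVES_PER_CABINET)
    cost = price_per_tb(ENTRY_SYSTEM_PRICE_USD, ENTRY_SYSTEM_TB)
    print(f"Implied capacity per drive: {drive_gb:.0f} GB")
    print(f"Entry system price per terabyte: ${cost:,.2f}")
```

Under those assumptions the cabinet works out to 750 gigabytes per drive, and the entry-level system to roughly $3,571 per terabyte.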
Before they do, they should be aware that MAID systems have one significant limitation, according to StorageIO’s Schulz: They are not known for the swift retrieval of data because only 25% of the drives are spinning at one time. As a result, MAID users suffer performance penalties, such as reduced throughput.
“That means users will be waiting to recover the data,” Schulz said.
Layton concedes that the first transaction in the system runs at slower speeds. However, he said that after the first 12-second transaction, data is retrieved in milliseconds. For example, the solution can move 5 terabytes per hour or 1.44 gigabytes per second.
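The trade-off Schulz and Layton describe can be sketched as a toy model. The Python class below is not COPAN's implementation. It assumes that the array spins down the least recently used drive when it reaches its cap, a policy the article does not describe, and it uses a placeholder of 5 milliseconds for reads from a drive that is already spinning. The 25% cap and the 12-second first access come from the figures quoted above; the class and method names are hypothetical.

```python
from collections import OrderedDict


class MaidArray:
    """Toy model of a massive array of idle disks (MAID).

    Assumptions (not from the source): least-recently-used spin-down
    and a 5 ms read time for drives that are already spinning.
    """

    def __init__(self, total_drives, max_spinning_fraction=0.25,
                 spin_up_seconds=12.0, spinning_read_seconds=0.005):
        self.total_drives = total_drives
        # At most this many drives may be powered up at once.
        self.max_spinning = max(1, int(total_drives * max_spinning_fraction))
        self.spin_up_seconds = spin_up_seconds
        self.spinning_read_seconds = spinning_read_seconds
        self._spinning = OrderedDict()  # drive id -> None, oldest use first

    def read(self, drive_id):
        """Return the simulated latency, in seconds, of one read."""
        if not 0 <= drive_id < self.total_drives:
            raise ValueError(f"no such drive: {drive_id}")
        if drive_id in self._spinning:
            # Drive is already spinning: fast path, refresh its recency.
            self._spinning.move_to_end(drive_id)
            return self.spinning_read_seconds
        if len(self._spinning) >= self.max_spinning:
            # Cap reached: spin down the least recently used drive.
            self._spinning.popitem(last=False)
        self._spinning[drive_id] = None
        return self.spin_up_seconds + self.spinning_read_seconds

    def spinning_drives(self):
        """Drive ids currently spinning, least recently used first."""
        return list(self._spinning)


if __name__ == "__main__":
    array = MaidArray(total_drives=896)
    print(f"Drives allowed to spin at once: {array.max_spinning}")
    print(f"First read of drive 7:  {array.read(7):.3f} s")
    print(f"Second read of drive 7: {array.read(7):.3f} s")
```

In this model only the first read of an idle drive pays the spin-up penalty, which is the behavior Layton describes. A workload that keeps touching drives outside the spinning set would pay it repeatedly, which is the concern Schulz raises.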
Better safe than sorry
Another approach is offered by AmeriVault, a 10-year-old data-protection services company that offers offsite data storage solutions. Kevin Harris, the company’s co-founder and chief information officer, recognizes that federal and commercial users face explosive growth in overall data acquisition, and he agrees that federal regulations add to the complexity by banning organizations from purging data and requiring agencies to find content accurately in an acceptable amount of time.
As a result, public safety agencies and businesses alike must invest further in applications that will perform category indexing in order to support search capabilities across terabytes of information, Harris said.
“On a 911 system that must be running all the time, certain [public safety agencies] would want to make sure that the system is running on the best servers and the best storage so it’s always functioning,” he said. “At the same time, maybe those 911 recordings after one year … don’t warrant the cost it takes to run those systems any longer, but [the agency] is still required to access the data.”
Consequently, public safety agencies have to decide which data can be moved to an archival data storage system. Such systems offer slower disk-system performance with less reliability, but they cost less, Harris said.
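The kind of decision Harris describes can be expressed as a simple age-based rule. The Python sketch below is a minimal illustration, not AmeriVault's method. The one-year threshold is taken from his 911 example, while the tier names and the function name are hypothetical, and a real agency would set the cutoff from its own retention policy.

```python
from datetime import date, timedelta

# Assumption: the one-year mark from Harris's 911 example is the cutoff.
ARCHIVE_AFTER = timedelta(days=365)


def choose_tier(recorded_on, today, archive_after=ARCHIVE_AFTER):
    """Return 'primary' or 'archive' for a record created on recorded_on.

    Records older than archive_after go to the cheaper, slower archive
    tier; everything else stays on primary storage.
    """
    if recorded_on > today:
        raise ValueError("record date is in the future")
    age = today - recorded_on
    return "archive" if age > archive_after else "primary"


if __name__ == "__main__":
    today = date(2008, 6, 1)
    for recorded_on in (date(2008, 5, 1), date(2007, 1, 15)):
        print(recorded_on.isoformat(), "->", choose_tier(recorded_on, today))
```

Either way the record remains retrievable, as the regulations require. The rule only decides which class of storage pays for keeping it.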
Another challenge is recovering data after a disaster. AmeriVault offers a disaster-recovery service dubbed Restart IT that is a hosting service of client-provided equipment. According to Harris, servers are stored in one of the company’s nationwide data centers, and clients can remotely recover their data from those systems through either a wireless or wired link. Clients are provided a software application that lets them create backup copies of their data that then are stored on AmeriVault’s servers.
“We looked at the market and we realized the three-person law firm that still has valuable business data to recover didn’t have the option to spend $100,000 on a system,” Harris said. “So we give them a place to put a spare server, using the online backup service to guarantee the recoverability of that data. For less than $10,000 a year, they have the ability to recover their business.”
Newer innovations are on the horizon, and Schulz predicted an increase in the use of drives that use solid-state memory to store data. But current solutions aren’t going to solve future data storage conundrums, according to IBM’s Parkin, who believes nanotechnology is the next step for data storage. In 1994, his research led to the development of a disk-drive read/write head based on a spin-valve sensor, a nanotechnology device.
The device, launched in 1997, became the core of modern data-storage devices. Parkin boasted the device enabled a more than 30-fold increase in disk-drive data densities, from 2.4 gigabits per square inch to more than 70. Over the next five years, IBM further increased the storage capacity of a magnetic disk drive about a thousandfold.
“That was because of our work on multilayered, atomically thin engineered magnetic structures, which led to the field of spintronics,” he said.
Storage devices based on spintronics, or spin-based electronics, let users store enormous amounts of data. The technique also lets users access the data quickly in conjunction with software-based database tools that ensure the data is useful and in a searchable form, Parkin said.
For the past three years, Parkin and his team of researchers have developed and tested a solid-state disk drive on a chip based on their spintronics research.
He explained that devices using solid-state memory to store data emulate a conventional hard-disk drive. But with no moving parts, a solid-state drive largely eliminates latency and other electro-mechanical delays and failures associated with conventional hard-disk drives.
He added that the new storage technology, dubbed the “magnetic racetrack” — which is still being designed and likely won’t be beta-tested for another four years — could provide another revolution in the ability to access and manipulate digital information.
“In other words, we are working on a new memory device that would have no moving parts, use much less power than a disk drive and would be a million times faster,” Parkin said.
Forward thinking
1940
Data is stored on punch cards and punched paper tape; small punched holes store the information.
Late 1940s
First magnetic memory is introduced. Dubbed magnetic cores, each core stores one bit of data.
1951
UNIVAC I, or the Universal Automatic Computer, is the first computer to use magnetic tape for storage.
1956
IBM introduces the random access method of accounting and control, or RAMAC — the first commercial hard-disk drive.
1962
IBM invents the laser diode, which becomes the fundamental technology for read-write optical storage devices.
1963
IBM introduces the first storage unit with removable disks (IBM 1311), effectively ending the era of punched cards.
1967
IBM discontinues the development of magnetic core memory in favor of volatile monolithic semiconductor memory chips with much faster data access and lower cost.
1970
The portable storage era begins with the invention of the floppy disk.
1977
Sony, Mitsubishi and Hitachi demonstrate their optical digital audio disk (DAD) systems that use large disks, about 30 cm in diameter. The modern CD is born.
1978
The first patent for Redundant Arrays of Independent Disks, or RAID technology, is filed. StorageTek develops the first solid-state disk.
1981
IBM introduces its first personal computer, the IBM PC, which becomes a standard in microcomputing.
1984
Compaq releases the Integrated Drive Electronics, or IDE, interface to embed the storage controller onto a disk drive.
1985
Imprimis builds the first IDE drive by integrating an ST506 controller in the hard disk drive.
1987
University of California-Berkeley introduces the initial definition of RAID levels.
1997
IBM’s Stuart Parkin develops the disk-drive read/write head based on a spin-valve sensor, a nanotechnology device that becomes the core of modern-day storage devices.
1998
President Bill Clinton signs The Digital Millennium Copyright Act (DMCA) into law on Oct. 28. The act defines how different types of data are to be distributed, copied and stored.
2000
IBM introduces the 1 GB microdrive, which is smaller than a matchbook and weighs only 16 grams.
2006
Congress passes Public Law 109-431, which mandates that federal data centers must install energy-efficient computer servers.
2006
COPAN Systems offers MAID solutions, and solid-state disk drives gain traction in the marketplace.
2007
The Supreme Court amends the Federal Rules of Civil Procedure, which cover e-discovery of critical evidence during federal court cases and mandate long-term storage of all data.
2012
IBM to beta-test the “magnetic racetrack,” a storage-class memory system that promises solid-state memory with cost and storage capacity rivaling those of magnetic disk drives.