Many organizations, such as those in media and entertainment and in high-performance computing (HPC), are challenged with maintaining multiple petabytes (PB) of fixed-content unstructured data, such as digital audio and video files, for long periods of time. Collectively, this data has ongoing value, for example when retrieving selected videos for revenue generation or broadcast, or when running scientific experiments. The problem is how to store Web-scale capacity at the lowest reasonable price while still delivering the needed performance.
This is where Spectra Logic comes in, with solutions involving disk, tape, or a combination of both. This report focuses on ArcticBlue, which Spectra Logic says provides cost-efficient disk storage at prices close to tape while delivering the higher performance that some applications require.
BlackPearl delivers a gateway to object-based deep storage
Spectra Logic refers to extremely low cost, power efficient, and dense storage as “deep storage.” The most efficient and effective way to store large quantities of unstructured data in such environments is as object-based storage (similar to Amazon Web Services’ S3 and Glacier solutions, and several others). As a tape system vendor, Spectra Logic wanted to use tape to store data as objects. However, since tape is a sequential access medium with relatively slow performance compared to disk, it is not a natural or easy choice for managing object-based storage.
Two years ago, Spectra Logic introduced BlackPearl, a gateway solution that enables the use of tape as an object-based storage target. BlackPearl uses a Deep Storage Service (DS3) that extends Amazon’s Simple Storage Service (S3) to deal with tape. DS3 handles bulk read (GET) and write (PUT) operations against deep storage targets, which means tape but now also disk with ArcticBlue. BlackPearl stores data in object-based deep storage by grouping collections of data into “buckets,” and it manages deep storage system processes, such as retries and error handling. It also maintains the information needed to control the placement and retrieval of data: an object catalogue that records the actual physical location of the stored data, plus the metadata associated with the objects.
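To make the catalogue idea concrete, here is a minimal sketch of a structure that maps buckets and object names to a physical location plus metadata. It is a toy model, not the DS3 interface or BlackPearl's implementation; the names `Location`, `Catalogue`, `put`, `locate`, and `bulk_locate`, and the fields inside them, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Location:
    """Hypothetical physical location of one stored object."""
    medium: str    # e.g. "tape" or "disk" (illustrative labels only)
    unit_id: str   # hypothetical cartridge or drive-group identifier
    offset: int    # hypothetical position of the object on that unit


@dataclass
class Catalogue:
    """Toy object catalogue: bucket -> object name -> (location, metadata)."""
    buckets: dict = field(default_factory=dict)

    def put(self, bucket, name, location, metadata=None):
        # Record where the object lives and keep a copy of its metadata.
        self.buckets.setdefault(bucket, {})[name] = (location, dict(metadata or {}))

    def locate(self, bucket, name):
        # Return the physical location recorded for one object.
        return self.buckets[bucket][name][0]

    def metadata(self, bucket, name):
        # Return the metadata recorded for one object.
        return self.buckets[bucket][name][1]

    def bulk_locate(self, bucket, names):
        # Resolve many objects in one call, as a bulk GET would need to.
        return {n: self.locate(bucket, n) for n in names}


catalogue = Catalogue()
catalogue.put("videos", "clip-001", Location("tape", "T0042", 0), {"codec": "h264"})
catalogue.put("videos", "clip-002", Location("disk", "D0007", 128))
print(catalogue.bulk_locate("videos", ["clip-001", "clip-002"]))
```

Resolving a whole list of names at once is what lets a gateway plan a bulk operation before touching any media, which matters when the target has high latency.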
ArcticBlue: Spectra Logic’s disk deep storage target
The initial deep storage target of BlackPearl was tape. Tape is very cost effective and serves use cases where users can accept waiting a few minutes before retrieval of selected objects begins. However, although many PB-scale fixed-content repositories do not need the sub-second or few-second retrieval of active production data, waiting minutes is too long for some of them. Spectra Logic has introduced ArcticBlue as a disk-based deep storage target for BlackPearl that can begin to retrieve data in about 30 seconds. This serves use cases that require “nearline” retrieval times measured in seconds rather than minutes.
The goal of ArcticBlue is to provide cost-efficient deep storage, and that starts with a longer useful life for the disk storage. The company claims that an ArcticBlue system has more than twice the life of traditional disk systems: 7 years, compared with the typical 3-year obsolescence cycle.
One way that Spectra Logic extends drive life is through lifecycle management that leverages power-down technology. That is, ArcticBlue powers down drives when they are not in use, a technique the company calls Drive Lifecycle Management (DLM). The older term for shutting down disks when not in use is MAID (massive array of idle disks). Spectra Logic says it has corrected the problems that MAID faced, as discussed below. DLM is one component that leads to lower costs; employing new disk technology is a second.
ArcticBlue uses Shingled Magnetic Recording (SMR), a new hard drive technology that Seagate offers. SMR disks partially overlap the tracks on a platter (analogous to roof shingles on a house) in order to increase platter density, i.e., tracks per inch. The result is higher capacity in the same physical footprint as traditional hard disk drives. Seagate’s 8TB SMR drive not only bumps up capacity by 33% compared with the largest standard drives of 6TB, but is also half the cost of most of those drives, in addition to being more power efficient.
Most notably, SMR drives are targeted at archive, i.e., deep storage applications such as ArcticBlue’s nearline disk, rather than at SAN or NAS applications. The reason is that the SMR write head is wider than a single track, so data must be written to an SMR hard disk sequentially to avoid destroying data on the overlapping tracks. Block-oriented SAN and file-oriented NAS applications, in contrast, are not built to work within such a restriction. Spectra Logic designed its object-based approach for BlackPearl, including the use of modern RESTful interfaces as part of DS3, to work with high-latency storage technologies, such as MAID. Data can be written effectively sequentially (taking advantage of a large physical cache) to protect the SMR drives, but can of course be retrieved randomly. The RESTful interface can also tolerate the 30-second power-up latency without timing out, which SAN and NAS applications cannot. In addition, ArcticBlue can power up all drives to access all data, randomly and concurrently. Spectra Logic thus asserts that it finally delivers on the original promise of MAID.
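The sequential-write restriction can be illustrated with a toy model of a shingled zone that accepts writes only at its current write pointer, plus a small cache that stages objects and flushes them in order. This is a sketch under simplifying assumptions, not Seagate's drive firmware or Spectra Logic's caching logic; `ShingledZone` and `flush_sequentially` are hypothetical names.

```python
class ShingledZone:
    """Toy SMR zone: writes must land exactly at the write pointer."""

    def __init__(self, capacity: int) -> None:
        self._blocks = [None] * capacity
        self.write_pointer = 0  # index of the next block that may be written

    def write(self, index: int, data: bytes) -> None:
        if index != self.write_pointer:
            # Writing elsewhere would clobber overlapping (shingled) tracks.
            raise ValueError("non-sequential write rejected")
        if index >= len(self._blocks):
            raise IOError("zone is full")
        self._blocks[index] = data
        self.write_pointer += 1

    def read(self, index: int) -> bytes:
        # Reads are unrestricted: any already-written block can be fetched.
        if not 0 <= index < self.write_pointer:
            raise IndexError("block has not been written")
        return self._blocks[index]


def flush_sequentially(staged: dict, zone: ShingledZone) -> dict:
    """Write cached objects one after another; return name -> block index."""
    index_of = {}
    for name, data in staged.items():
        index_of[name] = zone.write_pointer
        zone.write(zone.write_pointer, data)
    return index_of


zone = ShingledZone(capacity=8)
cache = {"obj-a": b"alpha", "obj-b": b"beta", "obj-c": b"gamma"}
placement = flush_sequentially(cache, zone)
print(zone.read(placement["obj-c"]), zone.read(placement["obj-a"]))  # random reads
```

The cache absorbs writes in whatever order clients send them, while the zone only ever sees appends, which is the property the text describes.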
When storing very large quantities of data (i.e., a PB and up) for long-term preservation, data reliability and integrity are paramount. To achieve this, ArcticBlue uses five complementary levels of protection. Examples include triple parity of data, continuous bit-rot error checking while writing data to disk, file-level end-to-end checksum support, and copy redundancy to tape. All in all, with five levels of protection, Spectra Logic feels that it has achieved its goal of unsurpassed data reliability and integrity.
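As an illustration of the end-to-end checksum idea, the sketch below computes a digest when an object is stored and verifies it again on retrieval. The source does not say which checksum ArcticBlue uses; SHA-256 is an assumption here, and `store`, `retrieve`, and `IntegrityError` are hypothetical names for this example only.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when stored data no longer matches its recorded checksum."""


def checksum(data: bytes) -> str:
    # SHA-256 is an assumed choice; any strong digest would serve the same role.
    return hashlib.sha256(data).hexdigest()


def store(archive: dict, name: str, data: bytes) -> None:
    # Record the digest alongside the data at write time.
    archive[name] = (data, checksum(data))


def retrieve(archive: dict, name: str) -> bytes:
    # Recompute the digest at read time and compare with the stored value.
    data, expected = archive[name]
    if checksum(data) != expected:
        raise IntegrityError(f"checksum mismatch for {name!r}")
    return data


archive = {}
store(archive, "clip-001", b"video bytes")
print(retrieve(archive, "clip-001") == b"video bytes")
```

Because the digest travels with the object from write to read, corruption introduced anywhere in between is detected at retrieval rather than silently returned.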
So what does this cost? Spectra Logic claims a purchase price of $0.10/GB (raw) for a 6.1 PB raw, uncompressed configuration (including spares and parity). With support at $0.01/GB per year, the five-year total rises to $0.15/GB, which is an amortized storage cost of $0.0025/GB/month over that five-year period. With a more basic support service, the seven-year cost is only $0.14/GB, which translates to about $0.00167/GB/month, or 1/6th of a penny per gigabyte per month. Smaller configurations are slightly more expensive.
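The amortized figures are simple division of the total per-GB cost by the number of months, and can be checked with a short sketch. The dollar amounts come from the text; the function name is hypothetical, and the calculation ignores power, floor space, and staffing.

```python
def amortized_cost_per_gb_month(total_cost_per_gb: float, years: int) -> float:
    """Spread a total per-GB cost evenly over the given number of years."""
    return total_cost_per_gb / (years * 12)


purchase_per_gb = 0.10         # $/GB raw purchase price quoted in the text
support_per_gb_year = 0.01     # $/GB per year of support quoted in the text

# Five years with support: $0.10 + 5 x $0.01 = $0.15 per GB.
five_year_total = purchase_per_gb + 5 * support_per_gb_year
five_year_monthly = amortized_cost_per_gb_month(five_year_total, 5)

# Seven years with the more basic support service: $0.14 per GB in total.
seven_year_monthly = amortized_cost_per_gb_month(0.14, 7)

print(f"5-year: ${five_year_monthly:.5f}/GB/month")
print(f"7-year: ${seven_year_monthly:.5f}/GB/month")
```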
Overall, these are up-front capital expenses, but they compare favorably to the cloud, and a user does not have to worry about changes in monthly costs or the substantial fees that many cloud providers charge to retrieve data. In fact, per Spectra’s pricing, a customer’s entire on-premises infrastructure could be paid for in less than a year, versus bills of $0.01 per GB per month to a cloud provider, indefinitely.
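The break-even claim can be sketched as the up-front price divided by the monthly cloud rate. This is a rough comparison under stated assumptions: it uses the $0.10/GB purchase price and the $0.01 per GB per month cloud rate from the text, and it leaves out support, power, staffing, and cloud retrieval fees. The function name is hypothetical.

```python
def breakeven_months(upfront_per_gb: float, cloud_per_gb_month: float) -> float:
    """Months of cloud billing that add up to the up-front purchase price."""
    return upfront_per_gb / cloud_per_gb_month


upfront_per_gb = 0.10        # $/GB raw purchase price from the text
cloud_per_gb_month = 0.01    # $/GB/month cloud rate from the text

months = breakeven_months(upfront_per_gb, cloud_per_gb_month)
cloud_after_one_year = 12 * cloud_per_gb_month

print(f"break-even after about {months:.0f} months")
print(f"cloud spend after 12 months: ${cloud_after_one_year:.2f}/GB")
```

At these rates the cloud bill passes the purchase price at around ten months, which is consistent with the "less than a year" figure, though the result shifts once support and operating costs are included.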
To paraphrase the late U.S. Senator Everett Dirksen, “A PB here, a PB there, pretty soon you’re talking real storage.” The truth is that more and more organizations are being saddled with maintaining PB-scale repositories for the long term, and with extracting ongoing value from those huge data collections without going broke. Spectra Logic, with its BlackPearl-targeted ArcticBlue nearline storage, addresses that PB-scale challenge. Those who are faced with paying the bills should find the cost of the Spectra Logic solution attractive vis-à-vis alternative solutions, most notably public cloud platforms.
But while cost is important, IT also has to consider how well the solution will actually work. Is it usable? Spectra Logic is noted for the simplicity and stability of its products. Even though it is using some new technology (such as SMR drives), its overall approach with ArcticBlue, including drive lifecycle management, together with its proven track record should make users feel comfortable. Is the new solution as reliable as the company claims? Preventing loss of data at Web-scale capacities is critical, and ArcticBlue’s complementary levels of protection should do a very good job of ensuring the reliability and integrity of data. Those who are charged with managing PB-scale unstructured data repositories should therefore give serious consideration to Spectra Logic’s BlackPearl with ArcticBlue solution.