IBM Continues to Focus on Storage for AI and Big Data

Any prediction worth its salt forecasts a nearly unbelievable increase in the creation of unstructured data. Mining that potential treasure trove for insights is the role of intelligent data analytics, notably through AI (artificial intelligence) and big data software tools. Many consider intelligent data analytics to be a key driver of business value for enterprises, for example by uncovering new revenue-producing opportunities. IBM Spectrum Discover and IBM Cloud Object Storage are two key IBM software products that support those intelligent analytics efforts.

Data-driven software intelligence takes center stage

Traditionally, software intelligence has been mainly application-driven. Data is created and managed to fit the needs of the application; typically, the creation of structured data is part of the application process, as in online transaction processing (OLTP) systems.

In contrast, unstructured data typically requires software intelligence that is created and managed to fit the needs of the data, which may be (and likely is) created independently of the application. Examples include AI, big data and analytics software designed to discover hidden value.

IBM Spectrum Discover provides metadata management that, among other capabilities, delivers the curation (selection, organization, and presentation) of information content that intelligent data analytics tools require. Another key IBM software-defined storage (SDS) product, IBM Cloud Object Storage, stores and manages, as object storage, the data that those analytics tools work on. IBM’s latest storage announcement covers updates to both of these data-driven software intelligence products.

IBM Spectrum Discover: More open, stronger data classification, and easier compliance

IBM Spectrum Discover is metadata management software (see more in our report at IBM Driving Storage Revolutions) that can be used on files or objects in conjunction with big data, AI and analytics software. Good metadata management is essential for enabling those software tools to classify data properly and to process voluminous quantities of data in a timely manner.

First announced in October 2018, Spectrum Discover originally worked only with IBM products, namely file data managed by IBM Spectrum Scale or object data managed by IBM Cloud Object Storage. The new announcement adds support for key heterogeneous storage platforms: Dell EMC Isilon, NetApp filers, Amazon S3 (and, by extension, other public cloud providers that support the S3 protocol), and Ceph, which is popular for object storage but also supports block and file storage. Supporting heterogeneous storage platforms has long been a key strategy for IBM’s software-defined storage (SDS) products, so it should come as no surprise that Spectrum Discover is following in their footsteps. Yes, IBM would love to sell storage hardware systems in addition to software, but selling software is profitable in and of itself. Moreover, expanding its software footprint may also give IBM opportunities to build storage hardware sales.

In addition to automatically capturing and indexing system metadata as data is created, Spectrum Discover supports custom metadata tagging. That adds extra intelligence that can build additional value through better insights at analysis time. The new version of Spectrum Discover provides content-based data classification, which applies custom metadata tags based on the content itself. All of this would be for nothing without high-speed search, and IBM states that Spectrum Discover delivers consistent low-latency searches, even across billions of files.
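To make the combination of system metadata, custom tags, and content-based classification more concrete, here is a minimal Python sketch of a generic metadata catalog. It is not Spectrum Discover's API or internal design; the class names (Record, MetadataCatalog), the fields, and the "genomics" classifier are hypothetical. It illustrates one common technique behind fast tag search: an inverted index that maps each tag to the paths carrying it, so a query does not have to scan every record.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Record:
    """Hypothetical system metadata captured when a file or object is created."""
    path: str
    size_bytes: int
    owner: str
    tags: Set[str] = field(default_factory=set)


class MetadataCatalog:
    """Hypothetical catalog: records plus an inverted index from tag to paths."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}
        self.tag_index: Dict[str, Set[str]] = defaultdict(set)
        # Content-based classifiers: each maps raw content to a set of tags.
        self.classifiers: List[Callable[[str], Set[str]]] = []

    def ingest(self, path: str, size_bytes: int, owner: str, content: str) -> Record:
        """Capture system metadata, then let each classifier tag the content."""
        rec = Record(path, size_bytes, owner)
        self.records[path] = rec
        for classify in self.classifiers:
            for tag in classify(content):
                self.add_tag(path, tag)
        return rec

    def add_tag(self, path: str, tag: str) -> None:
        """Apply a custom tag (manual or classifier-generated) and index it."""
        self.records[path].tags.add(tag)
        self.tag_index[tag].add(path)

    def search(self, tag: str) -> Set[str]:
        """Look up paths by tag via the index instead of scanning all records."""
        return set(self.tag_index.get(tag, set()))


# Usage: one hypothetical content-based rule plus one manual custom tag.
catalog = MetadataCatalog()
catalog.classifiers.append(lambda text: {"genomics"} if "ACGT" in text else set())
catalog.ingest("/lab/run1.fastq", 2048, "alice", "ACGTACGT")
catalog.ingest("/hr/memo.txt", 512, "bob", "quarterly review")
catalog.add_tag("/hr/memo.txt", "project:apollo")
print(sorted(catalog.search("genomics")))  # ['/lab/run1.fastq']
```

A production system indexing billions of files would shard and persist such an index rather than keep it in one in-memory dictionary; the sketch only shows why tag lookups can stay fast as the record count grows.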

A key additional benefit is making it easier for companies to follow legal or regulatory compliance rules for sensitive data, such as PII (personally identifiable information), including Social Security and credit card numbers. Given the increased emphasis that enterprises should be placing on sensitive data, this is a primary benefit of Spectrum Discover, and a nice addition on top of its metadata support for analysis efforts.
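As an illustration of how a content-based classifier might flag PII such as Social Security and credit card numbers, the Python sketch below combines pattern matching with the Luhn checksum that payment card numbers satisfy. This is a generic technique, not IBM's implementation; the tag names "pii:ssn" and "pii:credit_card" are hypothetical, and the SSN check assumes US-style ddd-dd-dddd formatting and validates format only.

```python
import re
from typing import Set

# Hypothetical patterns: US-style SSN format and 13-19 digit card numbers
# that may contain spaces or dashes between digits.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, subtracting 9 if > 9."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_pii(text: str) -> Set[str]:
    """Return hypothetical PII tags found in text."""
    tags: Set[str] = set()
    if SSN_RE.search(text):
        tags.add("pii:ssn")
    # The checksum filters out many digit runs that merely look like card numbers.
    for match in CARD_RE.finditer(text):
        if luhn_valid(match.group()):
            tags.add("pii:credit_card")
            break
    return tags


print(sorted(classify_pii("Card 4111 1111 1111 1111, SSN 123-45-6789")))
# ['pii:credit_card', 'pii:ssn']
```

Tags produced this way could feed a catalog's custom metadata, so that compliance teams can search for every file or object tagged as containing PII.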

IBM Cloud Object Storage upgrades its own IBM storage array capabilities

Although it is simplistic and understates the power and rich functionality of the product, you could think of IBM Cloud Object Storage as a “file server” for objects instead of files. IBM Cloud Object Storage has three deployment models: in the cloud, on-premises as software only, or embedded and pre-installed in a storage array. The new announcement focuses on IBM-provided storage arrays, which should expand the company’s presence in object storage sales. The new Gen2 arrays are compatible with Gen1, which protects existing customers’ investment by eliminating painful and lengthy data migration processes, a critical point given the enormous size of many object storage environments. Yet these customers can also use Gen2 to accommodate growth requirements.

IBM Cloud Object Storage Gen2 is all about cost efficiency and what is loosely called “speeds and feeds.” That may not sound very exciting, but when you are an exabyte-class storage customer (and IBM stated that it has ten such clients), a petabyte-class customer, or even a lowly hundred-plus-terabyte-class customer (and I am being facetious here, as that is still pretty large in my book), all those improvements are extremely relevant. Compared with last year’s Gen1, Gen2 offers 37% cost savings per TB and 1.6X more write operations per second. In addition, Gen2 offers 26% more capacity in the largest single node (1.3 PB) and the same percentage increase in the density of a single rack (10.2 PB).
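To put those percentages in perspective, the back-of-envelope Python sketch below works only from the figures IBM quoted. The Gen1 numbers it derives are implied by the 26% increase, not published figures, and the rack count for a hypothetical 1 EB (1,000 PB) deployment assumes the quoted rack capacity is fully usable, ignoring data-protection overhead such as erasure coding. The 10,000 writes per second baseline is likewise a hypothetical value.

```python
import math

# Figures quoted for Gen2 relative to Gen1.
GEN2_NODE_PB = 1.3     # largest single node
GEN2_RACK_PB = 10.2    # single rack
CAPACITY_GAIN = 0.26   # 26% more node capacity and rack density
COST_SAVING = 0.37     # 37% lower cost per TB
WRITE_SPEEDUP = 1.6    # 1.6X write operations per second

# Implied Gen1 figures: back out the 26% increase.
gen1_node_pb = GEN2_NODE_PB / (1 + CAPACITY_GAIN)
gen1_rack_pb = GEN2_RACK_PB / (1 + CAPACITY_GAIN)

# Every 1.00 spent per TB on Gen1 corresponds to about 0.63 on Gen2.
gen2_cost_ratio = 1 - COST_SAVING


def racks_needed(total_pb: float, rack_pb: float) -> int:
    """Whole racks needed, assuming the quoted rack capacity is all usable."""
    return math.ceil(total_pb / rack_pb)


EXABYTE_PB = 1000             # hypothetical 1 EB deployment
GEN1_WRITES_PER_SEC = 10_000  # hypothetical Gen1 baseline workload

print(f"Implied Gen1 node: {gen1_node_pb:.2f} PB")  # 1.03 PB
print(f"Implied Gen1 rack: {gen1_rack_pb:.2f} PB")  # 8.10 PB
print(f"Gen2 cost per TB relative to Gen1: {gen2_cost_ratio:.2f}")  # 0.63
print(f"Gen2 writes/s at same config: {GEN1_WRITES_PER_SEC * WRITE_SPEEDUP:,.0f}")  # 16,000
print("Racks for 1 EB, Gen1:", racks_needed(EXABYTE_PB, gen1_rack_pb))  # 124
print("Racks for 1 EB, Gen2:", racks_needed(EXABYTE_PB, GEN2_RACK_PB))  # 99
```

Under these simplifying assumptions, the density gain alone trims roughly 25 racks from an exabyte-class footprint, which is why such "speeds and feeds" matter at that scale.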

Mesabi musings

For all the talk about the importance of AI and big data analytics tools, those tools cannot operate in a vacuum. They require metadata management software to curate and prepare the data, as well as software that manages the efficient placement of, and access to, stored data. IBM Spectrum Discover meets the first objective and supports the second by informing better data placement, while IBM Cloud Object Storage targets customers for whom object storage is the choice over file-based storage.

IBM Spectrum Discover, among other things, now openly supports key storage vendors Dell EMC and NetApp, as well as S3-compliant cloud providers, notably Amazon, the originator of the S3 protocol. The IBM storage arrays on which IBM Cloud Object Storage comes pre-installed and embedded have been upgraded to Gen2, which is more cost-efficient and powerful than Gen1. It all adds up to better, more seamless support for AI and big data projects. IBM customers will consider that a good thing, while potential clients should find these features compelling and attractive.
