IBM Reinforces Storage Portfolio for AI and Big Data

In its recent storage announcement, IBM strengthened its storage solutions for AI and big data with the introduction of the IBM Elastic Storage System (ESS) 3000 and an important new capability for IBM Spectrum Discover. The Elastic Storage System 3000 lets an IT organization take advantage of IBM's software-defined storage (SDS) capability in Spectrum Scale software in conjunction with complementary IBM physical storage; together they form a robust, scalable storage system for AI and big data workloads. IBM Spectrum Discover, which fulfills the essential role of classification and metadata tagging in the artificial intelligence (AI) data pipeline, can now work with an organization's own data by accessing a copy held in backups or archives.

As a corporation, IBM has long made AI a major focus, along with analytics and big data in general. Although Watson is the public face of IBM's AI capability, the company also has a number of non-Watson AI initiatives. For example, on the storage infrastructure side, in December 2018 IBM announced the Spectrum Storage for AI with NVIDIA DGX reference architecture. A vendor-supplied reference architecture can help an AI project team select a set of hardware and software products that lead to an AI infrastructure solution. (See IBM Spectrum Storage for AI with NVIDIA Reference Architecture for more detail.)

In keeping with the theme of storage for AI and big data workloads, IBM announced major new capabilities in July for Spectrum Discover and IBM Cloud Object Storage (COS). The former now supports heterogeneous non-IBM storage platforms, notably on-premises Dell EMC Isilon and NetApp filers, as well as public clouds that support the S3 protocol, including Amazon, the protocol's originator. The latter provides object storage for those AI and big data workloads that can store data as objects. One of the three COS deployment models is pre-installed on a storage array, and the announcement focused on the upgrade to Gen2 for IBM-provided storage arrays. (See IBM Continues to Focus on Storage for AI and Big Data for more information.)

The IBM Elastic Storage System 3000 Makes Its Debut

IBM Elastic Storage System 3000 is the newest member of IBM's Elastic Storage Server family, which supports its SDS product Spectrum Scale for file data. IBM Spectrum Scale's global single namespace eliminates data silos and provides scalability into the exabyte range if necessary. Being able to manage data as a single virtual pool, at whatever scale is necessary, is a good thing for many organizations.

However, an IT organization can start very much smaller than an exabyte (or even a petabyte) with the IBM Elastic Storage System 3000. The new all-flash system starts at under 25 TB and introduces end-to-end NVMe (non-volatile memory express) technology in 2U (3.5-inch) rackable building blocks. IT can start small with an experimental deployment and then scale as necessary to production-level enterprise requirements. Each 2U building block delivers a powerful 40 GB/s of throughput (thanks to NVMe turbocharging the flash), and performance scales linearly.
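The linear-scaling claim lends itself to simple back-of-the-envelope planning. The sketch below is illustrative arithmetic only, assuming the per-block figure from the announcement and ideal linear scaling (real-world aggregate throughput depends on workload and network):

```python
# Back-of-the-envelope throughput estimate for N ESS 3000 building blocks,
# assuming the 40 GB/s per 2U block IBM cites and ideal linear scaling.
PER_BLOCK_GBPS = 40  # GB/s per 2U building block (from the announcement)

def aggregate_throughput_gbps(num_blocks: int) -> int:
    """Aggregate throughput if performance scales linearly with block count."""
    return num_blocks * PER_BLOCK_GBPS

if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} block(s): {aggregate_throughput_gbps(n)} GB/s")
```

So a four-block configuration would, under this idealized assumption, deliver on the order of 160 GB/s.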

IBM stresses customer-friendly capabilities, such as the ability to containerize software for ease of installation and updating.

IBM Spectrum Discover Extends Its Reach into Backup Environments

IBM Spectrum Discover provides metadata management that, along with other capabilities, delivers the curation (selection, organization, and presentation) of the information content that intelligent data analytics tools require for AI and big data workloads.

At first, Spectrum Discover worked only with file data managed by IBM Spectrum Scale (including ESS storage arrays) and object data managed by IBM Cloud Object Storage. Support was then extended to key heterogeneous storage systems, namely Dell EMC Isilon and NetApp. With the latest release, Spectrum Discover also works with IBM Spectrum Protect and the copies of data it manages. This means the live production copy of data does not need to be touched, which is almost always desirable from both performance and protection perspectives; a backup or archive copy can be used instead. As a result, the potential treasure trove of an organization's own data can now feed AI and big data workloads.
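The "use a backup copy, not production" idea can be sketched in a few lines. This is a hypothetical illustration of the selection logic, not Spectrum Discover's or Spectrum Protect's actual API; the `Copy` record and its fields are assumptions for the example:

```python
# Hypothetical sketch: given catalog entries for copies of the same data,
# prefer a backup or archive copy over the live production copy so that
# analytics never touches production storage.
from dataclasses import dataclass

@dataclass
class Copy:
    path: str
    kind: str        # "production", "backup", or "archive"
    timestamp: int   # when the copy was made (epoch seconds)

def pick_analysis_copy(copies: list) -> Copy:
    """Return the newest non-production copy; fall back to the production
    copy only if no backup or archive copy exists."""
    non_prod = [c for c in copies if c.kind != "production"]
    pool = non_prod or copies
    return max(pool, key=lambda c: c.timestamp)
```

For example, given a production copy and an older backup copy of the same file, `pick_analysis_copy` returns the backup, keeping the analytics workload off production storage.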

Spectrum Discover connects easily to Spectrum Protect metadata to discover, index, and label files of interest, and it can rapidly find and activate cold (unlikely to change) data in a backup or archive copy for use by analytics and AI tools.

Spectrum Discover users can also take advantage of the IBM Spectrum Discover Application Catalog. This community-supported catalog of open-source action agents extends IBM Spectrum Discover with third-party extensions that can be found and installed via the command line (with Docker Hub).

Mesabi musings

A new frontier for enterprises is the rapidly evolving need to derive value from the humongous quantity of data suddenly available to them. AI and big data workflows use that data to craft actionable insights.

IBM Elastic Storage System 3000 now offers an SDS-managed physical instantiation that should meet the performance and scalability requirements of most AI and big data workloads. With IBM Spectrum Discover, Spectrum Protect users can derive value from their enterprise's unique data housed as backups or archives. Thus, IBM positively reinforces its theme of providing the software and hardware that AI and big data workloads require.