IBM Continues Its Focus on Innovative Storage Research

For over 100 years IBM has been a vibrant, profitable company, and it intends to remain that way for many years to come. One way IBM has stayed relevant is through strong, ongoing research and development (R&D) programs that ensure that the IBM solutions pipeline stays filled with customer-pleasing offerings. To keep that pipeline filled, R&D has to look far beyond the current business quarter and fiscal year.

A recent IBM storage research presentation focused on three areas: 1) tape, flash, and persistent memory (i.e., physical media on which bits are stored); 2) integrating container environments into the hybrid multi-cloud (i.e., moving bits wherever they are needed to enable enterprise-grade storage usage with containers); and 3) using AI to improve data protection and the use of cache in tiering (i.e., letting bits “speak” so that AI can listen, interpret, and act upon what is said). Let’s examine each of the three areas in a little more detail.

Tape, flash, and persistent memory

  • Tape. IBM has stated publicly that tape will maintain its existing 5X to 10X advantage over alternative storage technologies for at least another 10 years. The company cited work exploiting new advanced media technologies as a major research focus.
  • Flash. As with tape, flash faces density challenges, along with additional issues, such as write endurance (flash cells tend to become unreliable as writes accumulate and eventually have to be retired) and ever-deteriorating performance, namely access latency. IBM storage researchers have developed a “secret sauce” approach that combines different innovative techniques, including adaptively adjusting read voltages, taking advantage of workload skew to place data in faster pages, and dynamically switching blocks to a faster SLC mode. The net result is a collection of algorithms that enable consumer flash chips to be used as an enterprise-class storage mainstay.
  • Persistent memory. Unlike the human brain, which keeps a lifetime of “data” in memories close to its computational power, a server keeps most of the data it accesses on media that are relatively slow compared to main memory. IBM storage researchers are working on a persistent memory layer of memory-centric active storage that moves computations as close to the storage as possible in order to speed them up, for example metadata updates for big data applications.
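The idea of exploiting workload skew mentioned under flash can be illustrated with a small sketch. This is not IBM's algorithm, whose details were not disclosed; it simply assumes that a device has a limited number of faster pages and that the most frequently read blocks should occupy them. The function names `place_by_heat` and `fast_read_fraction` and the access log are hypothetical.

```python
from collections import Counter


def place_by_heat(access_log, fast_page_count):
    """Split logical blocks into (fast, slow) sets by read frequency.

    Illustrative assumption: the hottest blocks go to the limited
    pool of faster pages and everything else goes to slower pages.
    """
    counts = Counter(access_log)
    # Rank by descending read count; break ties by block id so the
    # result is deterministic.
    ranked = sorted(counts, key=lambda block: (-counts[block], block))
    fast = set(ranked[:fast_page_count])
    slow = set(ranked[fast_page_count:])
    return fast, slow


def fast_read_fraction(access_log, fast):
    """Fraction of reads in the log that land on fast pages."""
    if not access_log:
        return 0.0
    return sum(1 for block in access_log if block in fast) / len(access_log)


if __name__ == "__main__":
    # Hypothetical skewed log: block 1 is read far more often than the rest.
    log = [1, 1, 1, 2, 2, 3]
    fast, slow = place_by_heat(log, fast_page_count=1)
    print(fast, slow, fast_read_fraction(log, fast))
```

The more skewed the workload, the larger the share of reads that a small pool of fast pages can serve, which is why skew is worth exploiting.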

Integrating container environments into the hybrid multi-cloud

More and more enterprises are embracing the hybrid cloud, where they can move applications and data as needed from on-premises data centers to one or more public or private clouds, as well as out to the edge for edge computing. How does an enterprise ensure it retains the necessary quality of service, such as data management, data protection, data processing, and workflow guarantees, that is required when moving data from one cloud to another?

The approach that IBM uses to combine everything seamlessly is called cloud native storage. It relies on six common data services: 1) connect and serve, for any type of server environment, including virtual machines, containers, and even bare metal servers; 2) protect, for data durability and recovery purposes; 3) accelerate, such as through the use of computational memory; 4) secure, to keep data from leaking out into the public; 5) manage; and 6) move. The goal is to be able to use any application and any data that is associated with it as intended, regardless of location.

How does IBM use Kubernetes-based containers to enhance these processes? It starts with enabling an existing storage pool to become container-ready storage. This is done through a Container Storage Interface (CSI) driver that brings automated provisioning and snapshot support from the original storage pool to the container-native storage arena. One of the challenges researchers face is how to create an architecture that combines the high performance that some applications require with the elasticity offered by software-defined storage.

AI for modern data protection and for predictive caching/tiering

  • Modern data protection. While data contained on any storage system should theoretically always be available for its intended use, uncorrupted and whole, this is not always the case. Data corruption and loss can occur for a number of reasons, ranging from uncontrollable and unexpected events, such as a hardware failure or a natural disaster that creates a need for disaster recovery, to human-initiated events, including not only inadvertent human error but, more importantly, the seemingly ubiquitous cyber-attack. So, what can artificial intelligence (AI) tools, in conjunction with their companion analytics software, do to mitigate potential data protection problems? AI monitors system events and looks for any storage content anomalies. AI and analytics then provide recovery options that greatly reduce the complexity the end user faces in making the right choices to minimize data loss and speed up data recovery.
  • Predictive caching/tiering. Ensuring that frequently accessed data is put on a faster tier of storage can greatly increase the performance of a storage system. Note that the faster tier of storage is more expensive and that not all of the data on the slower tier of storage can be economically housed on the faster tier. The effectiveness of the prediction model is determined by the cache hit ratio, which measures how effective a cache is at fulfilling requests for data. The effectiveness of a particular predictive algorithm also depends upon the size of the high-performance cache as a percentage of the lower-cost tier, which contains most of the data. IBM trains a predictive model using metadata similarity, which provides context from recently accessed data, together with past access patterns that are applicable to caching and tiering new data. According to IBM, AI-driven predictive prefetching models consistently outperform the traditional least recently used (LRU) algorithm regardless of cache size, and at even an 8% cache size they do significantly better than even the best non-AI algorithms.
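To make the cache hit ratio and the LRU baseline concrete, here is a minimal sketch that replays an access trace through an LRU cache and reports hits divided by total requests at several cache sizes. It covers only the traditional LRU baseline, not IBM's predictive model, and the synthetic skewed trace, block counts, and function names are assumptions chosen for illustration.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(accesses, cache_size):
    """Replay a trace through an LRU cache; return hits / total requests."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(accesses) if accesses else 0.0


def skewed_trace(num_blocks, length, seed=0):
    """Synthetic trace (an assumption): block popularity falls off as 1/rank."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(num_blocks)]
    return rng.choices(range(num_blocks), weights=weights, k=length)


if __name__ == "__main__":
    num_blocks = 1000
    trace = skewed_trace(num_blocks, 20000)
    # Cache size expressed as a percentage of the slower tier.
    for pct in (2, 8, 32):
        size = num_blocks * pct // 100
        print(f"cache = {pct:2d}% of data, LRU hit ratio = "
              f"{lru_hit_ratio(trace, size):.3f}")
```

A predictive policy would be evaluated the same way: replay the same trace at the same cache sizes and compare its hit ratio against this baseline.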


Mesabi musings

IBM researchers work on numerous projects in order to keep the company on top of its storage game. IBM continues to recognize that data must reside on physical media, including tape, flash, and persistent memory. But the company also focuses on providing the software solutions needed to deliver both high performance and scaling for applications and their data. This is done through the use of a container-native storage environment across the multiple locations that are, by definition, always present in the hybrid multi-cloud. During the briefing, IBM revealed a few examples of its continuing thrust in AI (always a source of pride for the company), notably in data protection and predictive tiering. All in all, IBM appears to be keeping its storage innovation pipeline full to the brim.