
IBM Continues to Focus on Storage for AI and Big Data

Any prediction worth its salt forecasts a nearly unbelievable increase in the creation of unstructured data. Mining what may be a treasure trove of potential insights is the role of intelligent data analytics, notably through such tools as those provided by AI (artificial intelligence) or big data software. Many consider intelligent data analytics to be a key driver of business value for enterprises, such as by uncovering new revenue-producing opportunities. IBM Spectrum Discover and IBM Cloud Object Storage are two key IBM software products that support those intelligent analytics efforts.

Data-driven software intelligence takes center stage

Traditionally, software intelligence has been mainly application-driven. Data is created and managed to fit the needs of the application; typically, the creation of structured data is part of the application process, as in online transaction processing (OLTP) systems.

In contrast, unstructured data typically requires software intelligence that is created and managed to fit the needs of the data, which may be (and likely is) created independent of the application. Examples include AI, big data and analytical software designed to discover hidden value.

IBM’s Spectrum Discover provides metadata management that, among other capabilities, delivers the curation (selection, organization, and presentation) of information content that intelligent data analytics tools require. Another key IBM software-defined storage (SDS) product, IBM Cloud Object Storage, stores and manages the data that the analytical tools work on as object storage. IBM’s latest storage announcement discusses updates to both of these data-driven software intelligence products.

IBM Spectrum Discover — More open, stronger data classification, and easier compliance

IBM Spectrum Discover is metadata management software (see more in our report at IBM Driving Storage Revolutions) that can be used on files or objects in conjunction with big data, AI and analytics software. Good metadata management is essential to enable those software tools to properly classify data and process voluminous quantities of data in a timely manner.

First announced in October 2018, Spectrum Discover originally worked only with IBM products — namely, file data managed by IBM Spectrum Scale or object data managed by IBM Cloud Object Storage. This new announcement adds support for key heterogeneous storage platforms — Dell EMC Isilon, NetApp filers, Amazon S3 (and by definition other public cloud providers that support the S3 protocol), and Ceph, which is popular for its access to object storage but also supports block and file storage. Supporting heterogeneous storage platforms has long been a key strategy for IBM’s software-defined storage (SDS) products, so it should come as no surprise that Spectrum Discover follows in their footsteps. Yes, IBM would love to sell storage hardware systems in addition to software, but selling software is profitable in and of itself. Not only that, expanding its software footprint may also give IBM opportunities to build storage hardware sales.

In addition to automatically capturing and indexing system metadata as data is created, Spectrum Discover provides for custom metadata tagging. That adds extra intelligence that can build additional value through better insights at analysis time. The new version of Spectrum Discover provides content-based data classification that applies custom metadata tags based on content. All of this would be for nothing without high speed search, but IBM states that Spectrum Discover delivers consistent low-latency searches, even on billions of files.
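
To make the tagging idea concrete, here is a minimal sketch (my own illustration, not the Spectrum Discover interface) of content-based classification: scan files for sensitive patterns and record custom tags for any matches. The directory path, patterns, and tag names are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical sensitive-content patterns; a real classifier would use validated rules.
PATTERNS = {
    "pii:ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "pii:credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify(root: str) -> dict:
    """Return a mapping of file path -> list of custom metadata tags."""
    tags = {}
    for path in Path(root).rglob("*.txt"):
        text = path.read_text(errors="ignore")
        hits = [tag for tag, pattern in PATTERNS.items() if pattern.search(text)]
        if hits:
            tags[str(path)] = hits
    return tags

if __name__ == "__main__":
    for path, hits in classify("/data/unstructured").items():
        print(path, "->", ", ".join(hits))
```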

A key additional benefit is to make it easier for companies to follow legal or regulatory compliance rules for sensitive data, such as PII (personally identifiable information) including social security and credit card numbers. Given the increased emphasis that enterprises should be placing on sensitive data, this is a primary benefit that Spectrum Discover provides — a nice thing to have on top of its metadata support for analysis efforts.

IBM Cloud Object Storage upgrades its own IBM storage array capabilities

Although it is simplistic and understates the power and rich functionality of the product, you could think of IBM Cloud Object Storage as a “file server” for objects instead of files. IBM Cloud Object Storage has three deployment models — in the cloud, on-premises as software-only, or embedded and pre-installed in a storage array. The new announcement focuses on IBM-provided storage arrays which should expand the company’s presence in object storage sales. The new Gen2 arrays are compatible with Gen1, which provides investment protection for existing customers, thus eliminating painful and lengthy data migration processes, a critical point given the enormous size of many object storage environments. Yet these customers can also use Gen2 to accommodate growth requirements.

IBM’s Cloud Object Storage Gen2 is all about cost efficiency and what is loosely called “speeds and feeds.” Now, that may not sound very exciting, but when you are an exabyte-class storage customer (and IBM stated that it has ten such clients), a petabyte-class customer, or even a lowly hundred-plus-terabyte-class customer (and I am being facetious here, as this is still pretty large in my book), all those improvements are extremely relevant. Compared to Gen1 a year ago, IBM Cloud Object Storage Gen2 offers a 37% lower cost per TB and 1.6x more write operations per second. In addition, Gen2 offers 26% more capacity in the largest single node (1.3 PB) as well as the same percentage increase in the density of a single rack (10.2 PB).

Mesabi musings

With all the talk about the importance of AI and big data analytics tools, they cannot operate in a vacuum. They require metadata management software to curate and prepare the data, as well as software that manages the efficient placement and access of stored data. IBM Spectrum Discover meets the first objective and supports the second by informing better data placement while IBM Cloud Object Storage aims at customers for whom object storage is the choice over file-based storage.

IBM Spectrum Discover, among other things, now openly supports key storage vendors Dell/EMC and NetApp as well as S3-compliant cloud providers, notably Amazon, the father of the S3 protocol. IBM’s own storage arrays upon which IBM Cloud Object Storage is pre-installed and embedded have been upgraded to Gen2, which is more cost efficient and powerful than Gen1. It all adds up to better, more seamless support for AI and big data projects. IBM customers will consider that to be a good thing while potential clients should find these features compelling and attractive.


IBM Strengthens Its Storwize Midrange Storage Portfolio


This week IBM shone a spotlight on a refresh of its Storwize midrange storage family. In addition, it emphasized the value of its Spectrum Virtualize software, upon which the Storwize systems are built and which can also be used for many other purposes, including a new capability for integrating Amazon Web Services (AWS) workloads. This illustrates the continuing innovation that IBM and others are bringing to the information storage table, and should be most pertinent and pleasing to IBM customers and channel partners, who can use Storwize and Spectrum Virtualize to build a solution that extends into the public cloud.

The term “midrange” has long been used for block-based storage systems that are not in the top “enterprise-class” echelon in terms of performance and, of course, price. However, that term is something of a misnomer, as many large enterprises (both private and public) use midrange storage because of the technology’s great scalability, strong performance, and ability to support software-delivered data services and other functionality for a wide range of use cases. Not only that, but Storwize products deliver the same enterprise-class functionality and six-nines availability as their larger brethren, which from a business perspective, in a world where every second and minute counts, is a real improvement over the standard-bearer five nines.

New members of the Storwize family

IBM Storwize offers entry-level, middle-tier, and upper-end options. In October 2018 IBM launched the Storwize V7000 Gen 3, the top of the Storwize range, which introduced NVMe at the storage device level for the first time in one of its midrange products. With this new announcement, IBM has introduced a whole new lineup of products in its Storwize V5000 storage system family, including two new entry-level products — the V5010E and the V5030E — which do not use NVMe, as well as midrange-level products, the V5100F and the V5100, which offer NVMe end-to-end (that is, at both the device level and the network level).

The Storwize V5010E, as the smallest member of the family, targets edge and containerized environments. Even though IBM expects a typical system to use about 9 TB, the V5010E can scale to a whopping 12 PB. It can provide up to 2x the maximum IOPS of its predecessor, the Storwize V5010, at an expected 30% lower price.

The Storwize V5030E targets the same use cases as its smaller brother. It has a typical expected use of about 24 TB, but can scale to an unbelievable 32 PB (23 PB in a single system). Compared to its predecessor, the V5030, the new offering can deliver 20% better maximum IOPS at an expected 70% of the cost. Both entry-level offerings are hybrid systems that can support combinations of SAS SSDs and SAS disks according to workload and customer requirements.

The last two systems, the Storwize V5100F and V5100, are variations on a common platform; the former is an all-flash system while the latter supports hybrid combinations of flash and disk. Only specially architected flash storage can take advantage of performance-turbocharging NVMe, and its arrival here is the latest example of advanced functionality first being made available on a higher-end product and then migrating to a less expensive one. IBM expects a typical use case for the V5100F/V5100 to be about 70 TB, with scaling to 32 PB. Depending on configuration, the new solutions can offer 2.4x the maximum IOPS of the previous-generation Storwize V5030F with data reduction turned on, at only a 10% higher price. IBM’s unique FlashCore Modules have hardware enhancements that deliver both data reduction and encryption without impacting performance.

Spectrum Virtualize Serves Both the Storwize Family and the Multicloud

Recall that IBM has a broad and extensive set of software-defined-storage (SDS) products under the rubric of the IBM Spectrum Storage family. A key member of this family, Spectrum Virtualize, is IBM’s block-based storage virtualization offering. Storage virtualization is a logical representation of storage resources that creates virtualized volumes independent of the physical limitations of storage media. Spectrum Virtualize can virtualize block storage arrays, enabling all of the virtualized storage volumes to be managed as a single pool of storage with a centralized point of control.

However, IT organizations have great flexibility in how Spectrum Virtualize is deployed (i.e., storage consumption models). One model is the IBM SAN Volume Controller (SVC) appliance. A second is as a traditional storage array system — for example, the Storwize family. A Cisco and IBM converged infrastructure VersaStack deployment also includes one or more of those storage systems. Finally, another consumption model is a software-only solution that can be used, say, for supporting cloud services.

The Storwize family has a solid software foundation in Spectrum Virtualize. All Storwize products offer transparent data migration, local and remote data replication (snapshots, disaster recovery [DR], and copy/migrate to the cloud). In conjunction with IBM Spectrum Copy Data Management, data can be made available at three sites. Plus, except for the low end V5010E, all the other new Storwize family members support data reduction pools, scale-out clustering, and encryption.

Spectrum Virtualize operating on-premises with its standard list of clients — including Storwize solutions and over 450 heterogeneous storage arrays — can now run in the public cloud, initially the IBM Cloud (formerly IBM Bluemix and IBM SoftLayer). The big news is that it is now available with AWS, as well.

Spectrum Virtualize in a public cloud provides real-time DR (disaster recovery) and data migration between an on-premises data center and a public cloud. Using public cloud for DR means that if an on-premises data center becomes unavailable due to a declared disaster, IT can failover to the remote public cloud. Spectrum Virtualize runs in conjunction with the computing, storage, and networking resources at both locations, delivering a single management layer for fully-functional storage between locations.

What AWS brings to the table, in addition to its immense popularity, is its optional usage of object storage. Now why would a block-based system want to create an object-based copy? The reason is that ransomware and malware (so far, at least) have only worked with block-based data. As a result, object data acts as if it were an “air-gapped” (physically isolated from a network) copy, which means that the copy is not accessible to hacking attempts. Now, while this is not truly an air-gap (as a network is still involved) for practical purposes it may be sufficient, at least for now.
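
For intuition, a generic object-storage copy with a retention lock behaves much like that quasi air gap: once written, the object cannot be overwritten or deleted from the host until the lock expires. The sketch below uses boto3 against an S3-compatible endpoint and assumes a bucket created with Object Lock enabled; the bucket, key, and retention period are hypothetical, and this is not the Spectrum Virtualize mechanism itself.

```python
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")  # endpoint and credentials come from the environment

def offload_locked_copy(local_path: str, bucket: str, key: str, retain_days: int = 30):
    """Write a backup image to object storage with a compliance-mode retention lock."""
    retain_until = datetime.now(timezone.utc) + timedelta(days=retain_days)
    with open(local_path, "rb") as data:
        s3.put_object(
            Bucket=bucket,                          # bucket must have Object Lock enabled
            Key=key,
            Body=data,
            ObjectLockMode="COMPLIANCE",            # copy cannot be overwritten or deleted
            ObjectLockRetainUntilDate=retain_until,
        )

if __name__ == "__main__":
    offload_locked_copy("/backups/vol01.img", "dr-copies", "vol01/2019-05-01.img")
```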

Mesabi musings

The fact that each year storage innovation and progress seem to deliver more for mostly less never grows old. IBM’s new Storwize family members serve as affirmation of this fact, such as the migration of NVMe to the V5100 products. In addition, IBM customers whose Storwize arrays use Spectrum Virtualize can now avail themselves of both the IBM Cloud and AWS public clouds to create multi-cloud environments that make it easier to do DR. All in all, this announcement qualifies as a good day at the office for IBM, its customers and channel partners.


IBM Continues to Advance Storage Along Key Drivers

Every quarter IBM seems to advance the cause of storage along multiple fronts, and this quarter is no exception, with enhancements along four key drivers. The first is IBM storage for containers and the cloud. This includes reference architecture “blueprints”: IBM Storage Solutions for blockchain, IBM Cloud Private, and IBM Cloud Private for analytics. The second continues to emphasize the cause of storage in conjunction with artificial intelligence (AI); in this case AI is used to improve capacity planning. The third is “modern” data management, which emphasizes how data protection is needed for data offload in hybrid multicloud environments. The fourth is cyber resiliency, enabling enterprises to use their storage effectively to plan for, detect and recover from cyber security threats.

All four are based on the way IT organizations are rapidly moving to a more complex, but desirably more cost-efficient and more productive, world that supports the business objectives of increasing revenues and profits. This is accomplished by rapidly changing IT infrastructures to adapt to a hybrid multicloud world as well as by introducing new technologies, such as blockchain and containerization, that help transform the way they do business.

Since I recently covered the use of reference architecture and AI (see https://mesabigroup.com/ibm-spectrumai-with-nvidia-dgx-reference-architecture-a-solid-foundation-for-ai-data-infrastructures/), I will focus this piece on modern data protection and cyber resiliency.

Multicloud data protection requires modern data protection

IBM emphasizes the need for modern data protection to play in the multicloud (see https://mesabigroup.com/ibm-continues-to-deliver-new-multicloud-storage-solutions/). By modern data protection, IBM means data protection that encompasses traditional IT infrastructures (such as a local data center that also uses a remote data center for disaster recovery purposes, both of which are on-premises at company facilities) along with multiple off-premises public cloud instances, as well as the ability to reuse secondary datasets (e.g., backups, snapshots, and replicas). This ups the ante in managing data protection for data offload in such hybrid, multicloud environments.

Using multiple public clouds in conjunction with private clouds means managing ever-changing cost structures in order to determine when it is appropriate to move a data protection workload from one cloud to another. This has to be done while ensuring that the necessary cybersecurity levels are met (as will be discussed under cyber resiliency for IBM-managed storage, software or hardware) and that the necessary service levels — such as RTO (recovery time objective) and RPO (recovery point objective) — are still met.
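
In practice that decision reduces to a simple filter: of the targets that still meet the required RPO and RTO, pick the cheapest. A rough sketch of my own, with hypothetical cost and service-level figures:

```python
# Hypothetical candidate targets: cost per TB-month, achievable RPO/RTO in minutes.
targets = [
    {"name": "cloud-a", "cost_per_tb": 21.0, "rpo_min": 15, "rto_min": 240},
    {"name": "cloud-b", "cost_per_tb": 17.5, "rpo_min": 60, "rto_min": 480},
    {"name": "on-prem", "cost_per_tb": 28.0, "rpo_min": 5,  "rto_min": 60},
]

def cheapest_compliant(targets, rpo_min, rto_min):
    """Pick the lowest-cost target that still meets the required RPO and RTO."""
    compliant = [t for t in targets if t["rpo_min"] <= rpo_min and t["rto_min"] <= rto_min]
    return min(compliant, key=lambda t: t["cost_per_tb"]) if compliant else None

print(cheapest_compliant(targets, rpo_min=30, rto_min=300))  # -> cloud-a
```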

IBM provides a blend of Spectrum Protect (for traditional IT infrastructures) in conjunction with Spectrum Protect Plus (for virtual infrastructures) to enable those responsible for enterprise data protection to successfully raise the management ante.

The most recent IBM storage announcement enhances Spectrum Protect Plus capabilities with a focus on delivering cost-effective, secure, long-term data retention. Spectrum Protect Plus can now offload to key cloud targets (IBM Cloud Object Storage and heavy hitters Amazon Web Services [AWS] and Microsoft Azure) as well as to on-premises object storage with IBM Cloud Object Storage. It does so through the efficient use of incremental-forever offloads of only changed data. It also expands critical application/database support by adding Microsoft Exchange and MongoDB, complementing existing support for products such as IBM DB2, Oracle Database, and VMware ESXi.
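
The incremental-forever idea itself is straightforward: keep a catalog of what has already been offloaded and send only what has changed since. A minimal sketch of my own (not Spectrum Protect Plus code), using a content hash and a hypothetical local catalog file:

```python
import hashlib
import json
from pathlib import Path

CATALOG = Path("catalog.json")   # hypothetical local catalog of previously offloaded hashes

def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def incremental_offload(root: str):
    """Yield only files whose contents changed since the previous offload run."""
    catalog = json.loads(CATALOG.read_text()) if CATALOG.exists() else {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        digest = file_hash(path)
        if catalog.get(str(path)) != digest:
            yield path                    # changed (or new): offload this one
            catalog[str(path)] = digest
    CATALOG.write_text(json.dumps(catalog))

for changed in incremental_offload("/data/production"):
    print("offload:", changed)
```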

In addition, Spectrum Protect Plus offers enhanced data offloads to Spectrum Protect to further improve the partnership between the two. Meanwhile, Spectrum Protect simplifies management by enabling the use of retention sets that govern both backups used for recovery of production data and longer-term retention, such as for archiving. It also now offers support for Exchange 2019.

IBM’s storage portfolio supports IBM’s cyber resiliency initiatives

The need for cybersecurity does not require a lengthy discussion, as even the general public is aware of such issues, as illustrated by the numerous, continuing tip-of-the-iceberg data breaches that have permeated the media. A tremendous amount of work is being done to deal with these issues, though much more needs to be done in what appears to be a never-ending battle. IBM has long been a white-hat vendor combatting the black-hat bad guys. The latest of its efforts goes under the label of cyber resiliency, which it applies to its entire storage portfolio to combat potential negative cybersecurity events.

In discussing its cyber resiliency storage portfolio, IBM shows how its work follows the NIST (National Institute of Standards and Technology, a part of the U.S. Department of Commerce) Cybersecurity Framework Version 1.1 (April 16, 2018). This standard framework aids enterprises in how to plan for and recover from a compromising cyber event, such as an identity-stealing data breach. IBM has long espoused openness (such as promoting open source and open systems), support for reference architectures, and adherence to common standards. Even though IBM naturally wants to encourage organizations to acquire its own software and hardware, it does so (and has prospered by so doing) in that openness context. Showing how it provides cyber resiliency for its storage portfolio as it fits within the open NIST Cybersecurity Framework enables organizations to clearly understand and assess what IBM brings to the table.

That is not to say that IBM meets all the framework requirements (as no one can), but organizations can carefully examine the major contributions that IBM delivers. The NIST framework describes five phases — identify, protect, detect, respond, and recover. IBM addresses these as plan (identify and protect), detect, and recover (respond and recover). Planning relates to what an organization should do to get ready for the inevitable compromising event. Detection is about monitoring for and alerting on abnormal behavior that signals that a negative cyber event is occurring or has already taken place. Recovery is about what actions need to take place to mitigate any negative effects following the event.

Touching lightly on what IBM delivers: in the identify phase, IBM Spectrum Control and IBM Storage Insights — two of its storage infrastructure management tools — enable organizations to understand their infrastructure deployment as well as its day-to-day usage. Deployment information identifies which systems are critical to the business operation as well as where they are located; day-to-day usage provides the baseline for how those systems are “normally” used. In the detect phase, abnormal usage of storage may show that a compromising event is happening and help isolate the currently impacted systems. IBM Spectrum Protect shows what is normally protected every day plus the attributes of that normal usage, such as the number of changes and volume usage. Spectrum Protect and Spectrum Protect Plus provide key support to the protect and recover phases.
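
Detection then amounts to comparing current activity against that baseline. A rough sketch of my own, with hypothetical numbers, flags a backup run whose changed-data volume drifts far outside the norm, which can be an early ransomware signal:

```python
from statistics import mean, stdev

# Hypothetical daily changed-data volumes (GB) from recent backup runs.
baseline = [120, 132, 118, 125, 130, 122, 127]
today = 910   # a mass-encryption event typically rewrites far more data than usual

def is_abnormal(history, observed, threshold=3.0):
    """Flag the observation if it lies more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(history), stdev(history)
    return abs(observed - mu) > threshold * sigma

if is_abnormal(baseline, today):
    print("ALERT: changed-data volume far outside baseline; investigate for compromise")
```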

IBM emphasizes the use of “air gap” data protection, which orchestrates the ingestion and automatic creation of copies of critical data onto a secure infrastructure that is isolated from a network-based attack. That could be tape copies removed from a tape library (which is a traditional strength of IBM) or a cloud-based air gap scenario, where the data sent to the cloud is physically isolated from a network. This reduces the risk of corruption, such as due to ransomware or malware attacks.  IBM also emphasizes the use of universal data encryption – including data-at-rest encryption, encryption of tape, backup data set encryption, and encryption of primary or backup data sets when sent to cloud repositories. These, and other capabilities that IBM provides, help mitigate the risk of cyber destruction, unlawful encryption, or modification, as well as unlawful copying of sensitive data. In combination with the appropriate architecture, infrastructure, and processes, these are just some of the ways in which IBM’s storage portfolio offers cyber resiliency to deal with the inevitable attempts to compromise one’s cybersecurity efforts.

Mesabi musings

The business storage arena is in constant flux. IT infrastructures are being transformed from on-premises infrastructures to hybrid environments that combine on-premises infrastructures with the cloud. Consider this along with the fact that the bad guys are always trying to compromise organizations’ cybersecurity. This increases the need for the modern data protection that IBM delivers with Spectrum Protect and Spectrum Protect Plus. It also expands the need for strong cyber resiliency efforts to prevent the negative impacts of cybersecurity events. With these latest additions, IBM is focused on providing cyber resiliency across its entire storage portfolio and emphasizes the use of strategies, such as air gapping and universal encryption, to enhance cyber resiliency. There is never a dull moment as to what IBM is doing to strengthen its storage portfolio.

IBM SpectrumAI with NVIDIA DGX Reference Architecture: A Solid Foundation for AI Data Infrastructures

IBM Storage and NVIDIA have teamed up to enhance artificial intelligence (AI) project development and streamline the AI data pipeline. This approach — the IBM SpectrumAI with NVIDIA DGX Reference Architecture — provides data scientists and other AI project team members with a solid framework that can guide AI deployments and culminates in a design based on IBM and NVIDIA systems and software products.

The companies’ partnership is important not only because the field of AI is growing very rapidly, but because major AI projects can be a real challenge to any organization. IBM Storage, in combination with NVIDIA and their joint channel partners, offers the skills, resources and products to enable organizations to overcome whatever challenges they might face with their AI workloads.

The AI Revolution

Information technology (IT) always seems to be in the throes of a major revolution. AI is one such revolution and despite all that is going on, AI is still in its infancy. Many years hence, AI may still not even be at the knee of the curve of a decades-long exponential growth. Every day it seems that there is a new or expanded practical use of AI technology — such as self-driving cars, a huge number of customer sentiment and sensor-based analysis examples, threat analysis, and image interpretation. Almost all organizations should be able to benefit from AI technology, now or in the future. And infrastructure vendors are thrilled by the prospect since AI projects often demand seemingly inexhaustible compute and storage resources.

From reference architecture to a converged infrastructure solution

AI projects are data-driven, in contrast to the process orientation of online transaction processing (OLTP) systems. An AI data pipeline consists of ingest, classification and analyzing/training phases that require considerable development time and thought, so an AI reference architecture can substantially aid the efforts of project teams. In general, reference architectures are increasing in popularity as they provide a frame of reference for a particular domain. Reference architectures are available for specific industries and processes, such as banking, telecommunications, and manufacturing and supply chains.
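
As a purely illustrative skeleton (my own, not the IBM/NVIDIA design), the pipeline can be thought of as three composable stages; the function bodies below are placeholders.

```python
from typing import Iterable, List, Tuple

def ingest(sources: Iterable[str]) -> List[bytes]:
    """Pull raw, unstructured records from the named sources (placeholder)."""
    return [f"record-from-{s}".encode() for s in sources]

def classify(records: List[bytes]) -> List[Tuple[bytes, str]]:
    """Label each record so downstream training sees curated, tagged data (placeholder)."""
    return [(r, "labeled") for r in records]

def train(labeled: List[Tuple[bytes, str]]) -> dict:
    """Stand-in for the GPU-accelerated analyzing/training phase."""
    return {"samples": len(labeled), "model": "placeholder"}

model = train(classify(ingest(["sensor-feed", "image-archive"])))
print(model)
```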

These play an important role, but so can vendor-supplied reference architectures, such as the IBM SpectrumAI with NVIDIA DGX reference architecture. Vendor-specific reference architectures lead AI project teams down a path to purchasing products that implement an AI infrastructure solution. This is not a problem if AI project teams understand up front what they are getting into and are comfortable with the vendors.

The roles of IBM and NVIDIA in the IBM SpectrumAI with NVIDIA DGX Reference Architecture

Most, if not all, organizations should be comfortable with IBM and NVIDIA, two of the giants in the AI industry. Of course, IBM Watson is familiar to many, but the company also has strengths and expertise in non-Watson-related AI activities. NVIDIA notably invented the GPU (graphics processing unit), which has become a chief computing element in AI (such as in NVIDIA DGX servers), where it serves as an accelerator for the highly dense parallel processing that AI projects typically demand. This is now complemented on the storage side by IBM Spectrum Scale, which, at its heart, is a long-proven and well-accepted parallel file system that enables close integration with DGX servers.

The net result — a powerful combination of IBM and NVIDIA for AI workloads — encompasses all the necessary computing, storage and networking hardware, accompanied by all the required supporting software, in a single physical rack put together by IBM’s and NVIDIA’s channel partners.

The system consists of NVIDIA DGX-1 servers with Tesla V100 Tensor Core GPUs for computing. IBM supplies the storage with ESS (Elastic Storage Server) GS4S all-flash (non-NVMe) storage systems for immediate use, moving to NVMe flash arrays in mid-2019 according to IBM (which should be sufficient time, as typical large AI projects have a significant gestation period). Mellanox InfiniBand (IB) networking provides the necessary connectivity between the server and storage elements.

But don’t forget the software. The NVIDIA DGX software stack is specifically designed to deliver GPU-accelerated training performance, and it includes the new RAPIDS framework, whose purpose is to accelerate data science workflows. At the heart of the IBM software-defined storage (SDS) for files is IBM Spectrum Scale v5, which was specifically architected for the high-performance demands of modern AI workloads.

Now, NVIDIA’s arrangement with IBM is not an exclusive one — DDN Storage, NetApp and Pure Storage also work with the company on AI-related solutions — so how does IBM differentiate itself from these strong competitors? IBM claims a performance edge, stating that it will have a 1.5x NVMe advantage over competitors. Additionally, IBM Spectrum Scale already has extensive use in AI workloads, including two AI reference architectures with IBM Power servers, and vast experience in the HPC-like needs of AI use cases.

IBM SpectrumAI with NVIDIA DGX will be sold only through selected channel partners supported by both companies. This makes a great deal of sense as major AI projects require a level of planning and design knowledge, along with collaboration and coordination skills, that only selected channel partners can bring to the table.

Mesabi musings

If you have not already done so, the time may be right to hop on the AI bandwagon. If you agree, looking into vendor-sponsored reference architectures, such as the one featured with IBM SpectrumAI with NVIDIA DGX, might be a good starting point. Just be sure that you realize that these vendors will eventually propose an AI deployment involving their products.

Still, you are not planning such efforts just for the fun of it, so eventually a converged infrastructure solution could provide an ideal way forward. IBM and NVIDIA are both leaders in their respective parts of the AI domain and their new IBM SpectrumAI with NVIDIA DGX offering makes a strong case for the companies.

IBM Driving Storage Revolutions

Business storage continues to be driven by two revolutions: one is storage systems–based and the other software-based. The former is focused on NVMe (Non-Volatile Memory Express) technology, which is accelerating the adoption of all-flash storage systems. In the latter case, software-driven innovation has become a driving force among virtually all major storage vendors.


One vendor that is making notable progress in both areas is IBM. On the systems/network side, i.e., NVMe-oF (NVMe over Fabrics), IBM now supports Fibre Channel in addition to InfiniBand. Additionally, the company’s new Storwize V7000 Gen 3 has been architected for NVMe at the storage array level as well, joining the FlashSystem 9100 family (announced in July) with NVMe inside the storage array. On the storage software side, IBM has just introduced Spectrum Discover as a new product in its IBM Spectrum Storage portfolio. Let’s examine these additions in a little more detail.


IBM continues to push the NVMe revolution


NVMe has two basic aspects. NVMe-oF is the network side of the house and improves the performance of moving data between a host and a storage array. IBM initially enabled NVMe-oF for storage networks that use InfiniBand interconnects but now supports NVMe-oF with storage networks that use Fibre Channel (FC) to improve application performance and data access. This functionality runs in conjunction with the company’s Spectrum Virtualize through a straightforward, non-disruptive software upgrade. FC-NVMe uses existing 16 Gb FC adapters and supports SVC (Model SV1), FlashSystem 9100, FlashSystem V9000 (Model AC3), Storwize V7000F/V7000 Gen 2+ and Gen 3, and VersaStack configurations that use those storage arrays. This is likely to be important for users of those systems, as many of them likely have an FC SAN (storage area network).


IBM also continues to push NVMe at the storage device level. Recall that the FlashSystem 9100, IBM’s enterprise-class entrant in the virtual storage infrastructure space managed by Spectrum Virtualize, was the first IBM storage system to offer NVMe at the device level. (See https://mesabigroup.com/ibm-flashsystem-9100-the-importance-of-nvme-based-storage-in-a-data-driven-multi-cloud-world/ for more detail.) Now, the new Storwize V7000 Gen 3 — also managed by Spectrum Virtualize — offers the same NVMe end-to-end capability. That includes the use of the same version of IBM’s well-accepted FlashCore Modules that the FlashSystem 9100 pioneered.


Although the Storwize V7000 Gen 3 is technically not an all-flash solution (as users have the option to have some HDDs, such as for supporting non-performance-sensitive data), it can be configured as an all-flash system, and with the notable growth of all-flash arrays over the past few years, Mesabi Group expects a high percentage of them to be all-flash configurations. Since only flash (not hard disks) can benefit from NVMe technology at the device level, IT can maximize its use of a Storwize V7000 Gen 3 by having as much of its storage as feasible reside on flash storage modules (the new Storwize V7000 supports both IBM’s FlashCore technology as well as industry standard NVMe SSDs) instead of HDDs. If they do, Gen 3 offers up to a 2.7x throughput performance improvement over Gen 2+ as a key benefit.


IBM Spectrum Discover drives additional value from oceans of unstructured data


IT must get the most out of its investment in its physical architecture. For storage management purposes, that includes how storage arrays work in conjunction with the servers that demand services through a storage network. IBM’s storage management software, Storage Insights, is an AI-based tool that is offered through IBM Cloud to help users better manage their storage environments. For example, the latest version diagnoses storage network “gridlock” issues often referred to as “slow drain”. That gridlock occurs when a storage system attempts to send data to a server faster than the server can accept it; this is not a good thing! IBM storage technicians (who can monitor systems on behalf of clients who authorize it) are notified by Storage Insights of the problem as it is identified by AI technology. The technicians then review the situation and work with the client to resolve it.
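
For intuition only (this is my own sketch, not how Storage Insights works internally), slow-drain detection boils down to spotting ports that spend an outsized share of each interval waiting on buffer credits because the host cannot accept frames as fast as the array sends them. The port counters below are hypothetical.

```python
# Hypothetical per-port counters sampled over the same interval.
ports = [
    {"port": "fc0", "frames_queued": 120_000, "zero_credit_microsec": 4_800_000},
    {"port": "fc1", "frames_queued": 118_000, "zero_credit_microsec": 3_200},
]

INTERVAL_MICROSEC = 60_000_000  # 60-second sample window

def slow_drain_suspects(ports, busy_fraction=0.05):
    """Flag ports that spent more than `busy_fraction` of the interval waiting on buffer credits."""
    return [p["port"] for p in ports
            if p["zero_credit_microsec"] / INTERVAL_MICROSEC > busy_fraction]

print(slow_drain_suspects(ports))  # -> ['fc0']
```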


Now, while Storage Insights deals with the physical side of storage as a storage management tool, the recently announced IBM Spectrum Discover is an in-house data management software tool that targets the voluminous and ever-more-rapidly growing amounts of data created for Internet of Things (IoT), AI, and big data analytics applications. Spectrum Discover works with file data managed by IBM Spectrum Scale or object data managed by IBM Cloud Object Storage, and enables users to get more out of their data for analytical, governance and storage investment purposes (IBM will also support Dell EMC’s Isilon offerings in 2019).


How does it accomplish this? On the analytical side, Spectrum Discover speeds the path to useful, actionable insights that would otherwise remain hidden within an ocean of unstructured data, aided by such things as its ability to orchestrate machine learning and MapReduce processes. On the governance side, mitigating business risks by ensuring that data is compliant with governance policies and speeding up investigations into potentially fraudulent activities is obviously of great value. On the investment side, the ability to facilitate the movement of “colder” data (i.e., less frequently accessed data suitable, say, for archiving) to cheaper storage and to weed out and destroy unnecessary redundant data is financially advantageous.
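
On that last point, the tiering policy itself is conceptually simple. The sketch below is my own illustration, with a hypothetical path and threshold: files untouched for a set number of days become candidates to move to cheaper storage.

```python
import time
from pathlib import Path

COLD_AFTER_DAYS = 180   # hypothetical threshold for "colder" data

def cold_candidates(root: str, days: int = COLD_AFTER_DAYS):
    """Yield files whose last access time is older than the threshold."""
    cutoff = time.time() - days * 86_400
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            yield path

for path in cold_candidates("/data/projects"):
    print("archive candidate:", path)
```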


The heart of Spectrum Discover’s power revolves around its metadata management and related processes. Any search-and-discovery tool needs good data about data (i.e., metadata) to succeed. Spectrum Discover uses both automatically generated system metadata captured at the time of data creation and custom metadata tagging that adds the extra intelligence needed at analysis time. All of that leads to automatic cataloging and the creation of an index through which large quantities of data can be searched extremely rapidly for discovery purposes, reducing the preparation time and costs that data scientists and storage administrators would otherwise incur.
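
In miniature, the payoff of that catalog is that searches run against a compact index rather than against the data itself. A toy sketch of my own with hypothetical catalog entries:

```python
from collections import defaultdict

# Hypothetical catalog entries: system metadata plus custom tags.
catalog = [
    {"path": "/ocean/img_001.tif", "owner": "lab-a", "tags": ["satellite", "2018"]},
    {"path": "/ocean/img_002.tif", "owner": "lab-a", "tags": ["satellite", "2019"]},
    {"path": "/ocean/run_77.csv",  "owner": "lab-b", "tags": ["telemetry", "2019"]},
]

# Build an inverted index: tag -> paths. Searches then avoid touching the data at all.
index = defaultdict(list)
for entry in catalog:
    for tag in entry["tags"]:
        index[tag].append(entry["path"])

print(index["2019"])   # -> ['/ocean/img_002.tif', '/ocean/run_77.csv']
```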


As an analogy (although the purposes, and to some extent the technologies, differ), think of the speed and flexibility of a public Internet search engine for publicly available data, in contrast to the private data that Spectrum Discover deals with. Accompanying the search and discovery functions are a number of features and capabilities that greatly facilitate the use of the tool, including policy-driven workflows, a drill-down dashboard, and an Action Agent that manages data movement and facilitates content inspection.


In essence, IBM Spectrum Discover is designed to significantly simplify and speed the data and storage processes required for analytics and AI processes. That should provide notable benefits for enterprises that aim to maximize the effectiveness and value of their advanced analytics investments.


Mesabi musings

You would think that storage innovations would show signs of slowing down after all these years, but the opposite seems to be true. In fact, IBM continues to be at the forefront of storage progress.


As illustrations of its continuing leadership, IBM has introduced the new NVMe-enabled Storwize V7000 Gen 3 on the systems side of storage and Spectrum Discover on the software side as a data management tool, and has enhanced Storage Insights as a storage management tool.


Overall, IBM customers should be pleased with the progress IBM is making with NVMe, a fundamental underpinning technology on the storage systems hardware side, while Spectrum Discover, on the software side, continues the push toward extracting additional value from oceans of unstructured data.