Datera Democratizes Elastic Block Storage

Major public cloud providers, notably Amazon, have led a transformation in information technology architectures in recent years. That transformation goes by many names, including hyperscale computing: a scale-out, distributed computing model for clouds in which the orchestration, operation, and economics of the resulting infrastructure are fundamentally different from (and generally considered to be substantially better than) those found in traditional large data centers.

One way that Amazon manifests this new architecture is through its Amazon Elastic Block Store (Amazon EBS) service, which works in conjunction with its Amazon Elastic Compute Cloud (better known as Amazon EC2). Amazon EBS instantiates elastic block storage, which sounds simple but really isn’t, as we will see later. Using Amazon’s elastic block storage, however, means committing to Amazon’s cloud. In contrast, Datera’s Elastic Data Fabric makes the Web-scale operations and economics of elastic block storage available to both private (large enterprise) and public (service provider) clouds.

Getting Back to Basics

But before we examine what Datera does for elastic block storage, we need a little grounding in the subject. To begin with, we need to define the terms “cloud,” “hyperscale,” and “elastic data fabric” before we see how an elastic data fabric implements elastic block storage.

NIST Describes the Essential Characteristics of the Cloud

“Cloud” is used so frequently that we may have lost sight of what distinguishes it from traditional IT infrastructures. Whenever that happens, refer back to “The NIST Definition of Cloud Computing” (Special Publication 800-145 of the National Institute of Standards and Technology). Three of the five essential cloud characteristics that NIST defines are especially relevant for this discussion.

  • “On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.” — please note that self-service is not a capability of traditional data centers and delivering that capability is not easy.
  • “Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. …. Examples of resources include storage, processing, memory, and network bandwidth.” — once again this capability does not exist in traditional IT infrastructures.
  • “Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.” — this is in contrast to the manual, time-consuming, and rigid provisioning found in a traditional IT infrastructure.

Up, Up, and Away to Hyperscale

Hyperscale describes a distributed computing environment that instantiates the three key cloud characteristics just described (plus much more). Although hyperscale computing can scale to very large data centers, as illustrated by Amazon, Facebook, and Google, it is more an architectural design than a matter of sheer size. Hyperscale delivers a cost-effective, stripped-down approach that starts with small servers (or nodes) that incorporate computing, storage, and network elements. Nodes are clustered and managed as a single resource pool.

That sounds a lot like a hyperconverged architecture, and the two are related. Hyperscale computing, however, goes beyond hyperconvergence: hyperconvergence tends to add resource elements, such as computing and storage, in lock-step with one another, whereas hyperscale computing can scale them independently.

Enabling a Hyperscale Computing Cloud with an Elastic Data Fabric

The word “fabric” is a metaphor for the way IT components interrelate, so that the resulting unified infrastructure looks a lot like a woven piece of cloth (though probably more colorful). Adding the term “data” to create a “data fabric” gets into the complexity and details of how a cloud is actually composed of elements woven together into a cohesive, integrated whole. Adding the word “elastic,” which means that assigned resources can grow or contract dynamically and transparently, is the frosting on the hyperscale computing cloud infrastructure cake, and leads naturally to the concept of an “elastic data fabric.”

Digging into the Datera Architecture

The Datera Elastic Data Fabric is a software-defined block storage platform that turns standard commodity hardware into a policy-driven storage fabric suitable for large-scale clouds.

The Datera Elastic Data Fabric architecture has three major pillars (control plane, data plane, and management plane):

  • Control plane: constructs a coherent distributed system from a collection of server nodes connected through a high-speed Ethernet network. The control plane performs a number of tasks on all the nodes in the cluster, including discovery, configuration, monitoring, and active administration. It manages node or component failures as well as any necessary recovery processes. The Datera system can add, shrink, decommission, and rearrange hardware nodes without disrupting system availability or application access. The system performs fine-grained placement of data and access control based upon application service level objectives (SLOs), cluster resource availability, and real-time workload performance.
  • Data plane: distributes I/O across multiple nodes as appropriate based on application SLOs and cluster-wide policies, as well as node characteristics (including different storage media tiers). The data plane delivers all data services, including all forms of snapshots (online, offline, scheduled, read, write) and clones. It includes a data tiering layer whose tiering engine actively moves data among tiers by applying a data “heat” algorithm. The three current tiers are: NVDIMM (Non-Volatile Dual In-line Memory Module) as tier 0, NVMe (Non-Volatile Memory Express) flash on a PCIe (PCI Express) bus as tier 1, and NL-SAS (Near-Line Serial Attached SCSI) HDDs (hard disk drives) as tier 2. The first two tiers are performance-oriented, while the HDD tier is the capacity tier. Datera’s focus, however, is on optimizing the architecture for a flash-first storage layout.
  • Management plane: provides application-driven, real-time management of resource consumption. Datera talks about applications having “intent” expressed through context-driven policies. The intent defines SLO functions (such as performance or data protection requirements) for each application. Intent also allows for composability, which enables system components to be selected and assembled in whatever combinations are necessary to satisfy application SLOs. This model abstracts (i.e., decouples) storage application provisioning (the why of the intent) from any knowledge of the physical infrastructure (how the intent is actually fulfilled). An Application Template (AppTemplate) models the view of storage and associated services for a workload or application as policies (a sketch of what such a template might look like follows this list). Default templates are available for some applications, such as Cassandra, Hadoop, MySQL, test/dev workloads, and VMware environments; customized templates can easily be built for other applications. Multi-tenancy is inherent in the deployment of policies by tenants or users. Overall, the management plane enables the system to scale without having to handcraft infrastructure requirements.
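To make the intent model a little more concrete, here is a minimal sketch of what an application template expressed as policy might look like. The structure and field names below are our own illustrative assumptions, not Datera’s actual schema; they simply show how performance and protection intent could be declared per application rather than per device.

```python
# Hypothetical, simplified application template (AppTemplate) expressed as policy.
# All field names and values are illustrative assumptions, not Datera's schema.
mysql_app_template = {
    "name": "mysql-production",
    "storage_templates": [
        {
            "name": "data-volume",
            "size_gb": 500,
            "replica_count": 3,              # data protection intent
            "performance_policy": {          # performance intent (the SLO)
                "total_iops_max": 20000,
                "total_bandwidth_max_mbps": 500,
            },
            "placement": "flash-first",      # prefer NVDIMM/NVMe tiers, spill to NL-SAS
        }
    ],
}
```

Note that such a template says nothing about specific nodes or drives; the control and data planes decide where the data actually lives and how it moves among tiers.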

By instantiating these pillars to support on-demand self-service functions in its Elastic Data Fabric, Datera enables elastic block storage in private cloud computing or hosted public cloud environments.

The Role of Programmability in Datera Elastic Data Fabric

But who can use Datera Elastic Data Fabric, and how do they go about using it? Datera designed Elastic Data Fabric with the DevOps (development plus operations) community in mind. DevOps is a cross-disciplinary collaboration between development and operations throughout all the stages of an IT services lifecycle. Since its primary focus is on developing and operating rapidly changing, resilient systems at scale as cloud-native application workloads, DevOps and Datera Elastic Data Fabric sound like a good match.

Datera Elastic Data Fabric provides the programmability that DevOps users need. Self-service operations are enabled through a RESTful API (an Application Programming Interface built in the Representational State Transfer style). REST is an architectural style for building lightweight, maintainable, and scalable Web-like services. The intent-based RESTful API makes the infrastructure both programmable and composable, and it can perform all the necessary resource provisioning, system configuration, and management tasks.
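As a rough illustration of what such self-service provisioning could look like from a DevOps script, the sketch below posts an application instance request to a REST endpoint. The URL, authentication scheme, and payload shape are assumptions made for the example; the actual API calls are documented by Datera.

```python
# Minimal sketch of self-service provisioning over a RESTful API.
# Endpoint, auth scheme, and payload shape are assumptions for illustration only.
import requests

API_BASE = "https://datera-fabric.example.com/api"     # hypothetical endpoint
HEADERS = {
    "Authorization": "Bearer <api-token>",             # placeholder credential
    "Content-Type": "application/json",
}

def provision_app_instance(template_name: str, instance_name: str) -> dict:
    """Request storage for an application instance based on a named template."""
    payload = {"app_template": template_name, "name": instance_name}
    response = requests.post(f"{API_BASE}/app_instances",
                             json=payload, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json()   # would include access details such as an iSCSI target

# Example usage (hypothetical names):
# info = provision_app_instance("mysql-production", "mysql-prod-01")
```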

Elastic Data Fabric also serves DevOps users by connecting through the industry-standard iSCSI protocol and by integrating natively with OpenStack. It works with application container orchestration platforms, including Docker, Google’s Kubernetes, and Mesos, as well as VMware’s vSphere virtualization platform.
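Because access is over standard iSCSI, attaching a provisioned volume from a Linux host can follow the usual open-iscsi workflow. The sketch below uses the standard iscsiadm tool; the portal address and target IQN are placeholders that would come back from the provisioning step.

```python
# Rough sketch: attach a volume over iSCSI using the standard open-iscsi tools.
# Portal and IQN values are placeholders; requires root privileges on the host.
import subprocess

def attach_iscsi_volume(portal: str, target_iqn: str) -> None:
    # Discover the targets the storage fabric exports at this portal address.
    subprocess.run(
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
        check=True,
    )
    # Log in to the target; the volume then appears as a local block device.
    subprocess.run(
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"],
        check=True,
    )

# attach_iscsi_volume("192.0.2.10:3260", "iqn.2016-01.com.example:mysql-prod-01")
```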

The World Datera Plays In

The standard bearer for a hyperscale cloud, and the target that Datera has set its sights on, is Amazon Web Services (AWS), notably Amazon Elastic Block Store. Datera offers more than enough capabilities to meet the requirements of most DevOps users, as well as of service providers who want to offer public cloud-style services as alternatives to Amazon. Datera also claims a significant TCO advantage over Amazon, which should make its solution attractive for both use cases.

Now, the world is full of vendors and projects that offer scale-out capabilities through a node-and-cluster strategy, including SolidFire and Ceph. SolidFire (now owned by NetApp) is an all-flash storage array, but it has also contributed many creative ideas about the next-generation data center and has strong QoS capabilities for managing workload balancing in multi-tenant clouds. Though SolidFire should fit in very nicely with NetApp’s data fabric strategy, Datera claims to offer significant TCO savings relative to comparable SolidFire offerings by giving customers a wide choice along the price/performance curve.

Ceph is a popular free software platform that stores information on a single distributed compute cluster and can nominally scale to exabytes of data. It provides interfaces for object, file, and block storage and runs on commodity hardware. Datera claims performance and operations advantages over Ceph, along with major TCO benefits. While the Ceph software may be free, the underlying hardware is not. Plus, Datera says that it needs far fewer nodes than Ceph does to achieve similar performance results, and points to its application intent-based management to significantly simplify operations.

Mesabi Musings

Private, hybrid, and public clouds are the development and operations platforms of the future. Those hyperscale computing environments are architecturally different from traditional data centers. Amazon illustrates the difference such clouds can make for end users with services like its Amazon Elastic Block Store.

Datera offers enterprises and service providers an on-premises alternative for cloud services. The Datera Elastic Data Fabric brings Web-scale operations and economics to private and public clouds for elastic block storage. It achieves this in a number of ways, including turning storage into a single, virtualized pool that can automatically shift data between resources for faster access to the most critical data. Given the flexibility and features it provides, Datera Elastic Data Fabric should attract a lot of attention, especially from the DevOps, private cloud, and service provider communities.