On-demand flexible storage.

Imagine you are buying a new car. You get in, but instead of a steering wheel and pedals you find that the manufacturer has installed two joysticks and an array of knee-operated switches. Welcome to the storage market, where every product provisions storage differently.

CSI is a bit like a standard set of controls for Kubernetes storage. It makes it easy to choose the right storage provider for your needs, and to make sure that the different storage components are compatible with each other. If you are using Kubernetes for production workloads, CSI is essential to manage your storage efficiently. CSI will give you the flexibility and reliability you need to deploy and manage your applications successfully.

What is CSI?

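Container Storage Interface (CSI) is a specification first published in 2017, with support reaching general availability in Kubernetes v1.13 (announced in January 2019). It was developed as a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems (COs) like Kubernetes.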

Prior to CSI, Kubernetes relied on plugins built as part of the Kubernetes codebase, which was unwieldy to manage and tied the development of new storage providers to Kubernetes’ own code and releases.

CSI provides a specification for 3rd parties to code against without the need to get involved in the complexities of working with the Kubernetes project itself.

CSI allows new storage providers to appear on the market and support versions of Kubernetes that were released before their project was started, and it gives customers looking for container storage solutions a consistent and uniform experience. Whether you prefer Rancher with K3s or RKE2, OpenShift, or a managed service such as EKS or AKS, CSI provides a common interface that makes it easy to obtain on-demand storage where you need it. We are excited to announce CSI for HyperCloud and I’ll explain why in this post.

Why CSI?

Before Kubernetes, with plain Docker containers, developers had to log in to the container hosts and do some system administration: allocate disks, create partitions, set up filesystems, arrange mount points and check permissions, then link all of this to the command used to launch the container. Early Kubernetes storage plugins followed the same approach. Storage was practically unmanageable in early Kubernetes because each storage provider had its own proprietary way of provisioning and managing storage, which made it difficult for users to learn and use more than one provider. It was also difficult to support all of the different storage options in the core Kubernetes codebase. Users had to understand the specific configuration of the backend storage to provision volumes, as each volume interacted directly with external storage systems.

CSI takes all those admin tasks and automates them, allowing storage to be created on demand.

Interfacing with the storage system is separated from requests for new volumes. Developers only need to add a PersistentVolumeClaim entry to YAML manifests. Storage is automatically created and the Pod also has a formal dependency on the storage so applications don’t throw errors due to starting before storage is available.
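
As a rough sketch of what this looks like in practice, the manifest below declares a claim and a Pod that mounts it. The names app-data and web, the nginx image, the mount path and the 10Gi size are illustrative assumptions, and the claim relies on the cluster having a default StorageClass.

```yaml
# Illustrative only: names, image, size and mount path are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce          # mounted read-write by a single node
  resources:
    requests:
      storage: 10Gi          # requested capacity
---
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html   # where the volume appears in the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data  # containers do not start until this claim is bound and mounted
```

Because the Pod refers to the claim by name, nothing in the Pod definition needs to know which storage system ends up providing the volume.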

CSI also gives you the freedom to move workloads between clouds: the same YAML manifest can be reused wherever a compatible StorageClass is available. Workloads can be standardised, with CSI handling all integration of storage.

Scale is really where CSI shines. If you have one or two hosts then maybe you can manage the manual work, but if you have a cluster of 50 Kubernetes VMs then keeping the storage configuration aligned is challenging. CSI does all of that for you. In fact, storage is essentially unmanageable in Kubernetes without CSI on anything but the smallest deployments. When you have thousands of Pods, manually mounting storage is very complex and breaches the container/host security barrier, so one misstep leaves your container hosts vulnerable to bad actors. CSI reduces risk and simplifies storage deployment.

HyperCloud supports CSI and enables some time-saving and powerful features:

  • Control storage from manifests just like your Pods and Deployments
  • Application-controlled, on-demand storage provisioning
  • Storage automatically mounted where the Pods need it
  • Automated storage expansion
  • Automatic cloning of volumes
  • Snapshot and restore from Kubernetes
  • Block volumes as well as filesystem volumes

For cloud users, CSI also makes it easy to migrate apps from one ecosystem to another, which prevents lock-in and makes your life easier!

Using CSI

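Storage provided through CSI is consumed via StorageClasses and PersistentVolumeClaims. These are Kubernetes API objects rather than part of the CSI spec itself: they describe the storage system and the volumes it is asked to provide, and Kubernetes turns them into calls to the CSI driver.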

StorageClass:

  • Refers to a storage provider such as a disk array
  • Links a driver to an address and a set of credentials
  • Can specify default values for volumes provided
  • Multiple can exist in a cluster
  • One in each cluster can be set as default
  • Global resource

StorageClasses handle the configuration needed to interact with the storage system, and only need to be created once per cluster.

The StorageClass tells Kubernetes which CSI driver to use and how to reach the storage behind it: the DNS name or IP of the endpoint, any authentication details and perhaps some default options to use when creating volumes. Many StorageClasses can exist if the user wishes, allowing different storage systems to be used at once or different sets of defaults to be specified, for example to provide SSD and HDD classes of storage in differently named StorageClasses. StorageClasses are global and can be seen from any namespace.
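
A StorageClass along these lines might look like the sketch below. The provisioner name csi.hypercloud.example.com, the endpoint parameter, the class name fast-ssd and the Secret name are hypothetical placeholders, not the actual HyperCloud driver settings; the csi.storage.k8s.io/ keys are the ones commonly recognised by the CSI external-provisioner sidecar.

```yaml
# Sketch of a StorageClass. The provisioner name, the "endpoint" parameter
# and the Secret are hypothetical; real values come from the driver's documentation.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # marks this class as the cluster default
provisioner: csi.hypercloud.example.com    # hypothetical CSI driver name
parameters:
  endpoint: storage.example.com            # hypothetical driver-specific option
  csi.storage.k8s.io/provisioner-secret-name: storage-credentials
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/fstype: ext4          # default filesystem for new volumes
reclaimPolicy: Delete                      # what happens to the volume when its claim is deleted
allowVolumeExpansion: true                 # permit growing claims after creation
volumeBindingMode: Immediate               # provision as soon as a claim is created
```

A second class with a different name and parameters could sit alongside this one, for instance to offer an HDD tier.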

Diagram of CSI provisioning process

PersistentVolumeClaim:

  • A request for a volume
  • Binds to PersistentVolume when allocated
  • Specifies the required capacity
  • Specifies a StorageClass unless a default exists
  • Can include custom storage settings
  • Namespaced resource

PersistentVolumeClaims indicate a dependency on a set of storage resources within a Kubernetes application.

A PersistentVolumeClaim (PVC) is a request to deliver a single volume. The PVC states that the user wants a certain amount of storage, and at a minimum needs to specify only a name, an access mode and a capacity. If a StorageClass exists and has been marked as the default in the cluster then it will automatically be used; otherwise the PVC must name the StorageClass to be used. Creating the PVC kick-starts a series of API calls to the CSI driver, which attempts to fulfil the requirements laid out in the PVC and StorageClass settings. Initially the new PVC appears with the “Pending” status, indicating that the request is not yet fulfilled. PVCs are namespaced and can only be seen from the namespace they are located in.
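
For illustration, a claim that names its StorageClass explicitly could be written as follows. The class fast-ssd, the claim name db-data and the namespace databases are hypothetical.

```yaml
# Sketch: a claim that names its StorageClass explicitly.
# "fast-ssd", "db-data" and "databases" are hypothetical names.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
  namespace: databases
spec:
  storageClassName: fast-ssd   # omit this line to fall back to the cluster's default class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
```

After applying it, kubectl get pvc -n databases lists the claim along with its status, which moves to Bound once a volume has been provisioned.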

PersistentVolume:

  • Link to the actual disk volume
  • Automatically created once storage has been allocated
  • Can be manually created to connect pre-existing external storage
  • Global resource

PersistentVolumes are automatically created when PersistentVolumeClaims are actioned and form a link between applications and back-end storage allocations.

A PersistentVolume (PV) is created once the backend storage is allocated and ready. This PV object links storage on an external system into Kubernetes. Once the PersistentVolume is created and linked to the PVC, the status of the claim changes to “Bound”. When the PVC is deleted, the volume is either deleted and its capacity returned to the pool, or preserved and left as an unused PV, depending on the Reclaim Policy. In special cases, where storage is pre-provisioned on the storage system and must be imported into Kubernetes, the PV can be manually created and linked to a pre-existing storage volume. PVs are global and are visible from any namespace. Care should be taken to ensure that only the Nodes and Service Accounts that need access to PVs have security rights to them.
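
The manual import case can be sketched as a hand-written PV plus a claim that binds to it by name. The driver name, the volumeHandle value and all object names below are hypothetical; the real volume identifier would come from the storage system.

```yaml
# Sketch of importing a pre-existing volume. Driver name and volumeHandle are hypothetical.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: imported-volume
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the backend volume if the claim is deleted
  csi:
    driver: csi.hypercloud.example.com    # hypothetical CSI driver name
    volumeHandle: existing-volume-id      # hypothetical ID of the volume on the storage system
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: imported-claim
spec:
  storageClassName: ""          # empty string: do not dynamically provision a new volume
  volumeName: imported-volume   # bind to the PV defined above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```

Using Retain here is a cautious choice for imported data, since deleting the claim then leaves the underlying volume in place.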

Features and evolution of CSI

Since 2019 CSI has grown, encompassing more features and a growing number of vendors. Beyond the initial support for attaching simple filesystems to containers, CSI has added:

  • Raw block volumes
  • Snapshotting volumes
  • Restoring volumes (to new PVCs)
  • Cloning volumes (to new PVCs)
  • Topology-aware volume provisioning
  • Limits on volume usage
  • Volume health monitoring
  • Storage capacity tracking
  • Ephemeral volumes (follow Pod lifecycle)
  • SELinux support
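
Two of these features, raw block volumes and cloning, are requested through ordinary PVC fields. The sketch below assumes a driver that supports both and a hypothetical existing claim called app-data; all names and sizes are illustrative.

```yaml
# Raw block volume: the Pod receives a device node instead of a mounted filesystem.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-disk
spec:
  volumeMode: Block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
# Clone: a new claim pre-populated from an existing claim in the same namespace.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-clone
spec:
  dataSource:
    kind: PersistentVolumeClaim
    name: app-data          # hypothetical existing claim to copy
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi         # must be at least the size of the source claim
```

A Pod consumes the block claim through volumeDevices with a devicePath, rather than through volumeMounts.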

Now in 2023, CSI has grown to cover most of the core storage features useful to application developers. Users are able to dynamically create volumes, use these volumes to store persistent data across many invocations of a Pod, expand these volumes as data grows, and even use Snapshots to allow rollbacks during upgrades of the application code.
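
The snapshot-and-rollback workflow can be sketched with two objects. This assumes the cluster has the snapshot CRDs and snapshot controller installed and a driver that supports snapshots; the VolumeSnapshotClass name hypercloud-snapshots and the claim app-data are hypothetical.

```yaml
# Take a snapshot of an existing claim before an upgrade.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-pre-upgrade
spec:
  volumeSnapshotClassName: hypercloud-snapshots   # hypothetical class name
  source:
    persistentVolumeClaimName: app-data           # hypothetical claim to snapshot
---
# Roll back by creating a new claim from the snapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-restored
spec:
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: app-data-pre-upgrade
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

Expansion needs no new object: raising spec.resources.requests.storage on an existing claim requests a larger volume, provided its StorageClass sets allowVolumeExpansion to true.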

CSI makes Kubernetes application deployments more scalable, resilient and repeatable.

HyperCloud allows you to use all these features to provide data protection, easy cloning and access to raw block volumes as well as on-demand filesystems in Kubernetes, all backed by fast and efficient HyperCloud storage.
