The Problem

A frequently repeated workflow in a modern IT shop is “create base VM, configure networking, update base VM, configure VM (add service/user “stuff”), repeat”. IT departments take a variety of approaches to this process, and there are a near-infinite number of possible solutions. Some are very manual:

  1. Create VM (define resources)
  2. Attach OS ISO
  3. Install OS via virtual console
  4. Configure networking
  5. Update OS
  6. Configure services/give the VM its “job”
  7. Hand off to user

Some are entirely automated, using IaC (Infrastructure as Code), orchestration tools, service discovery, etc. to make a completely touch-less deployment.

The approach used largely comes down to historical momentum (“this is how we’ve always done it”), awareness of the tools available in the problem space, and the technical knowledge of (or desire/time to learn) how to leverage those tools.

The last part (desire/time) is the real crux of the issue, followed closely by awareness of the problem space. IT departments have a lot going on; they may want to move away from a highly manual process, but the learning curve for tools such as Terraform or Packer, combined with everything they are already doing, can be more than they can bite off at any given moment.

A recurring theme throughout this discussion will be “crawl-walk-run”: change doesn’t have to be made in huge leaps, and incremental progress can be made before an organization is full-blown “running”.

Potential Solutions

As a comparison of complexity, let’s spend a little time talking about some ways the manual process described above could be automated and “cloudified”.

The lowest-level component we can tweak here is the disk image used to build the VM (or no disk image, if you’re building from an ISO). We have a couple of choices. First, you could take an upstream OS image (say, Ubuntu 24.04 LTS) and deploy from that.

This would get you a base OS with the packages that were current when the vendor (Canonical) built the image, and it is certainly the easiest solution. However, you still have to update the base OS to what is current at the time of deployment, and then add any additional packages/services after the fact. This increases the time between a deployment being triggered and the deployment being ready to use, and it also requires the deployed VM to have reachability to repositories, whether public or local.

Another option is to build your own OS image (starting with an upstream validated image), update the OS, and bake in the appropriate packages/services so that everything is ready to go. There are a number of advantages (and some downsides!). One advantage is that, since everything is already updated and installed, the VM goes from instantiation to performing its role drastically faster.

The second main advantage is that the resulting image can be deployed in an air-gapped environment where the VM does not have external repo/Internet access.

While this route is ideal, it does add the potential for some complexity. Building this custom image can be done in a myriad of ways, some of them more elaborate than others. Continuing our “crawl-walk-run” analogy, let’s look at some of the options.

The most manual way (Crawl) would be to take an upstream base image, deploy a VM from it, update the base OS, and add any required packages manually. This updated disk can then be cloned into a new appliance image. At that point the build VM can be destroyed and the cloned image used to deploy new VMs.

As the KVM hypervisor has become the industry standard, platforms like HyperCloud and VM Squared can leverage a wide ecosystem of tools to build images, ranging from virt-builder on the simpler side (Walk) to Packer on the complex/feature-packed side (Run).
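To make the “Walk” step concrete, here is a dry-run sketch of a virt-builder invocation. The template name and package are illustrative assumptions (`virt-builder --list` shows what your installation actually provides); the command is printed rather than executed so the sketch is safe to run anywhere.

```shell
#!/bin/sh
# Dry-run sketch of the "Walk" step with virt-builder (from libguestfs).
# Template name and package choice are illustrative assumptions.
OS_TEMPLATE="ubuntu-24.04"
OUTPUT_IMAGE="webserver.qcow2"

# Printed rather than executed so the sketch is safe to run anywhere.
echo "virt-builder $OS_TEMPLATE" \
     "--update" \
     "--install nginx" \
     "--output $OUTPUT_IMAGE --format qcow2"
```

The resulting qcow2 image could then be uploaded to the platform’s image store and used as the new appliance base.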

That covers some of the aspects and options for image creation; there is certainly far more that could be said on that topic, but let’s move on to discuss configuring our new VM.

Once you have a built VM (updated, if you didn’t create an updated image), you now need to configure it. This can run the gamut from fully manual to fully automated using an automation/orchestration tool such as Ansible, Salt, Puppet, etc.

One aspect that can complicate matters with automation tools is that they generally have some form of inventory. In their simplest forms these are static files/lists, though there are numerous solutions for making them dynamic; that topic could be a full article on its own!

The issue remains, though, that when the bring-up of your system is decoupled from any intelligence or process, making any external orchestration aware of the new VM to be managed can be problematic.

Let’s assume the orchestrator knows about the new VM to be managed: how does the orchestrator system (or admin) SSH into the VM in order to configure it? Passwords pose the problem of securely communicating them to the admin/user, or of entering them into the orchestrator. There are certainly tools to help facilitate this, but how often does a “temporary default” password become “permanently default”?

SSH keys are a much better solution, but they too have problems. If you are instantiating a new VM, how do you get an SSH key into it so that it can be configured (either manually or via an orchestrator)? Entering it via the virtual console is painful and error-prone, and you don’t want to bake the key into the image, as that is inflexible and will cause operational issues at some point down the line.

SoftIron’s Solutions

So how can HyperCloud and VM Squared ease these operational burdens? Many of the issues we’ve talked about so far are actually addressed by just one aspect of HyperCloud and VM Squared - the contextualization system. 

There are a lot of really neat features in both products, however the contextualization system is the one that’s most likely to generate an “Ah-ha!” moment for a new user - and it’s one of the most fun to experiment with!  Let’s take a look at how this one aspect of HyperCloud and VM Squared can be used to address the problems we’ve looked at.

Let’s start with the last issue we discussed: SSH keys. With both HyperCloud and VM Squared it is as simple as checking a box in the VM template, and the public SSH key associated with the account of the user deploying the VM is added to the VM automatically. This works even if the VM is deployed by an admin on behalf of another user; the target user’s key is what gets installed into the VM.

So that’s one issue down. It can also be problematic to get run-time changes into an orchestration tool such as Ansible to adjust its behavior. Let’s say you have a playbook that normally puts the website path at /opt/srv/webroot, but your user wants to install it to /var/www/. How can the user change that path without modifying the playbook?

There are a ton of ways to load facts (or whatever your tool of choice calls its data sources), and some of them can even be done dynamically. But the issue remains: does the self-service user (i.e. non-DevOps) deploying the VM have the access to, or knowledge of, these values to adjust them to suit their desired outcome? In many cases the user wouldn’t have those skills or that access, and it would become a matter of opening a ticket with the user/group that does.

However, the HyperCloud/VM Squared template (as part of the contextualization feature) includes the ability to build User Input prompts. There can be an arbitrary number of inputs, each optional or required. These let you create a template that asks the user to supply this customization information when they instantiate a VM from it.

This bridges the gap for runtime changes and enables the self-service user a frictionless way to remain productive and self-sufficient.

These inputs are largely free-form, but there are some specialized data types: text (the default), text that will be base64 encoded (useful if, say, you have a JSON blob to pass in, since escaping that within a script can be a giant pain), range (high/low), list (selectable as a drop-down), boolean, and number (float/int), as well as variations on several of these.

The resulting input is passed into the VM as part of the contextualization process, and can be read from a file or loaded as environment variables to be leveraged by our next feature: the “Start script”.
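As a minimal sketch of what consuming those values inside the VM might look like, assuming the platform exposes them in a shell-sourceable key/value file (the path /mnt/context/context.sh and the variable names here are illustrative assumptions, not a documented interface):

```shell
#!/bin/sh
# Hypothetical sketch: read contextualization values supplied as User Inputs.
# The file path and variable names are assumptions for illustration.

CONTEXT_FILE="${CONTEXT_FILE:-/mnt/context/context.sh}"

# Simulate the platform-provided file so the sketch runs on a workstation.
if [ ! -f "$CONTEXT_FILE" ]; then
    CONTEXT_FILE="$(mktemp)"
    cat > "$CONTEXT_FILE" <<'EOF'
WEBROOT="/var/www"
ADMIN_EMAIL="ops@example.com"
EOF
fi

# Source the key/value pairs so later steps can use them as env variables.
. "$CONTEXT_FILE"

echo "webroot=$WEBROOT"
echo "admin_email=$ADMIN_EMAIL"
```

Because the file is plain key/value shell, any later tooling (bash, Ansible via environment lookups, etc.) can pick the same values up.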

The “Start script” functionality accepts essentially any form of script that is executable within the image used for the VM, and it can be used to address the post-install configuration aspect of the VM lifecycle. The script could be bash, PowerShell, Python, cloud-init, Ansible, you name it, as long as the binary necessary to run it is in the image you’re using.

The Start script can be as simple as a bash script that updates the system and installs some packages, or it could be an Ansible playbook dynamically using the data supplied as user input when the instance was deployed.

So, in our webroot example, the user deploying the VM doesn’t need to know how to tweak the Ansible playbook that deploys/configures Apache. They just know that when they deploy the webserver template it shows a default value of /opt/srv/webroot, and that they can change that value if they want.
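A minimal bash Start script for that webroot case might look like the following. WEBROOT is assumed to arrive as an environment variable populated from the User Input; the vhost is written to a temp file here so the sketch runs anywhere, whereas a real script would target Apache’s config directory and reload the service.

```shell
#!/bin/sh
# Hypothetical Start-script sketch: WEBROOT is assumed to be populated
# from the template's User Inputs; fall back to the playbook default.
WEBROOT="${WEBROOT:-/opt/srv/webroot}"

# Render a minimal Apache vhost pointing at the chosen webroot.
# (Temp file keeps the sketch runnable; a real script would write to
# /etc/apache2/sites-available/ and reload Apache.)
SITE_CONF="${SITE_CONF:-$(mktemp)}"
cat > "$SITE_CONF" <<EOF
<VirtualHost *:80>
    DocumentRoot $WEBROOT
</VirtualHost>
EOF

mkdir -p "$WEBROOT" 2>/dev/null || true
echo "configured DocumentRoot: $WEBROOT"
```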

This allows a non-technical user to leverage quite complicated tools such as Terraform (baked into the image) without even needing to know they exist. They just know they use this magical template and it does the things they want.

The Start script also doesn’t have to be just for first-boot setup, since it runs on every boot. You can use it to deploy a thin wrapper that performs ongoing functions.

For example, you could create a small Python (or insert-language-of-choice) program that runs on an Apache Guacamole manager VM, bridging the HyperCloud API with the Guacamole API to create an auto-scaling VDI solution that scales the number of VDI hosts depending on the number of active sessions.

Let’s take a look at a full-on running example, leveraging all the aspects of the contextualization process we’ve talked about thus far.

Since every instance within HyperCloud and VM Squared can talk to the API endpoint (this is internal and functions without having routed access to the publicly facing API endpoint), you could create an appliance that is a “builder” appliance for another more complex multi-instance service.

For example, let’s take a multi-instance service such as Kubernetes. While there are tools out there to deploy a multi-node Kubernetes cluster (including a turn-key HyperCloud service template in our Partner Marketplace), let’s ignore those for the time being.

A “builder” appliance could have Terraform baked into it, along with an orchestrator configuration to deploy a multi-node cluster. You could leverage the User Input feature on the builder appliance’s template to ask for the number of master nodes, the number of worker nodes, the resources of the worker nodes, you name it: really any aspect of the Terraform run.

When the builder appliance is instantiated, Terraform can leverage the HyperCloud provider to talk to the API, create more nodes, and build the k3s cluster based on the supplied data. The user doesn’t even need to know that Terraform is used under the hood, just that they magically get a k3s cluster.
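One convenient glue pattern here (a standard Terraform feature, not specific to HyperCloud) is that Terraform reads any TF_VAR_&lt;name&gt; environment variable as the input variable of that name, so the builder’s Start script can simply re-export the User Inputs. The variable names MASTER_COUNT and WORKER_COUNT below are illustrative assumptions, and the actual terraform commands are left commented out so the sketch stays self-contained.

```shell
#!/bin/sh
# Sketch: hand User Inputs (assumed to arrive as env vars) to Terraform.
# Terraform reads any TF_VAR_<name> variable from the environment, so the
# builder's Start script only has to re-export the values.
MASTER_COUNT="${MASTER_COUNT:-1}"     # hypothetical User Input names
WORKER_COUNT="${WORKER_COUNT:-3}"

export TF_VAR_master_count="$MASTER_COUNT"
export TF_VAR_worker_count="$WORKER_COUNT"

# In the real appliance this would then run, e.g.:
#   terraform -chdir=/opt/builder init -input=false
#   terraform -chdir=/opt/builder apply -auto-approve
echo "would deploy ${TF_VAR_master_count} masters and ${TF_VAR_worker_count} workers"
```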

When you use the User Input system for the first time, you will see that the values supplied at instantiation are stored as “Attributes” of the VM. This is a common theme throughout both HyperCloud and VM Squared: just about every component can have attributes. Users and groups can have attributes; Virtual Networks, Security Groups, Templates, and Images (among others) can as well.

Attributes can function as tags/labels or as a mini built-in key-value store, and can be used in a number of creative ways. Some are internal to HyperCloud (e.g. placement rules for affinity/anti-affinity), and others can be leveraged by a creative user.

Take the context of our “builder” appliance: since attributes can be set via the API (even from within the instance), they can be used to communicate information out of an instance. For example, the Terraform k3s deployment can create a STATUS key with the value BUILDING to let the user know that it’s still deploying the cluster, then change it to COMPLETE when done. This can be used to convey the status of long-running batch jobs without the user needing rights/access to log into the VM.
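A sketch of that status-reporting pattern, where set_vm_attribute is a hypothetical helper standing in for whatever API client the image ships (for example, a curl call against the internal API endpoint):

```shell
#!/bin/sh
# Sketch of reporting progress via VM attributes. `set_vm_attribute` is a
# hypothetical helper; the real version would call the HyperCloud /
# VM Squared API. Here it records key/value pairs to a local file so the
# sketch is self-contained.
set_vm_attribute() {
    echo "$1=$2" >> "${ATTR_LOG:-/tmp/vm-attrs.log}"
}

ATTR_LOG="$(mktemp)"

set_vm_attribute STATUS BUILDING
# ... long-running work, e.g. the Terraform run, happens here ...
set_vm_attribute STATUS COMPLETE

tail -n 1 "$ATTR_LOG"
```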

Attributes can also be set via the HyperCloud/VM Squared UI and then read via the API, which can serve as a conduit for passing data into an instance. Back to our imagined k3s cluster: you could have (as part of the Start script) a function that runs in a loop, checking a “control” attribute and then triggering functions within the instance.

For example, after terraform apply has run, if the k3s cluster is a test/dev cluster that can be destroyed, you could set a DESTROY=true attribute. The function within the builder instance would notice it and execute a terraform destroy, removing all the VMs it had previously created.
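That control loop might be sketched like this. get_vm_attribute is again a hypothetical helper; here it reads from a local file so the loop can be exercised anywhere, and the flag flip that a user would perform in the UI is simulated inside the loop.

```shell
#!/bin/sh
# Sketch of a control loop watching a DESTROY attribute.
# `get_vm_attribute` is a hypothetical helper; the real version would
# query the API, and the real loop would sleep between polls.
CONTROL_FILE="$(mktemp)"
echo "false" > "$CONTROL_FILE"

get_vm_attribute() { cat "$CONTROL_FILE"; }

ACTION="none"
i=0
while [ $i -lt 5 ]; do
    if [ "$(get_vm_attribute DESTROY)" = "true" ]; then
        ACTION="destroy"   # real loop: terraform -chdir=... destroy -auto-approve
        break
    fi
    i=$((i + 1))
    # Simulate the user flipping the flag in the UI on the second pass.
    [ $i -eq 2 ] && echo "true" > "$CONTROL_FILE"
done

echo "action=$ACTION"
```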

Additional Thoughts

The limit is really the user’s imagination. Here’s another example: imagine a user who wants a VM created but doesn’t have access to HyperCloud. They can request that an admin create the instance, and the admin deploys the VM as that end user. The end user’s account in HyperCloud has an attribute EMAIL containing the user’s email address.

The VM template has EMAIL = "$USER[EMAIL]" in its CONTEXT section, which looks up the email from the attribute on the deploying user. The EMAIL attribute can then be accessed as an environment variable (or read via the API), and the Start script can be set up to send the end user an email with all the networking details for the instance, so they can SSH straight into it.
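A sketch of that notification step, with EMAIL assumed to be injected by the $USER[EMAIL] context entry and ETH0_IP standing in for whatever networking variable the contextualization provides (both names are assumptions); the mail delivery itself is left commented out so the sketch is self-contained:

```shell
#!/bin/sh
# Hypothetical notification step for a Start script. EMAIL and ETH0_IP
# are assumed context variables; the fallbacks let the sketch run anywhere.
EMAIL="${EMAIL:-user@example.com}"
ETH0_IP="${ETH0_IP:-192.0.2.10}"

MSG="Your VM is ready. Connect with: ssh ${LOGIN_USER:-ubuntu}@${ETH0_IP}"

# A real image would pipe this to mail/sendmail, e.g.:
#   printf '%s\n' "$MSG" | mail -s "Your new VM" "$EMAIL"
echo "to=$EMAIL"
echo "body=$MSG"
```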

Automating VM deployment doesn’t have to be an all-or-nothing endeavor. By adopting a crawl-walk-run approach, IT teams can incrementally streamline their processes, reducing manual effort while improving efficiency and consistency. Whether it’s leveraging pre-configured images, automating configurations, or integrating contextualization with HyperCloud and VM Squared, the key is to start where you are and build from there. With the right tools, even complex infrastructure management can become seamless, empowering teams to focus on innovation rather than repetitive setup tasks.