We recently wrote a post explaining 7 advantages of choosing HyperCloud over VMware for your cloud platform and I wanted to share a more Engineering-focused viewpoint in a parallel article.
Update May 2024
Since this blog post was originally published, we have released a new tool, Catalyst, available as part of both HyperCloud and VM Squared. While the concepts in the article below are still useful, for a step-by-step guide to using Catalyst, check out How to migrate from VMware to VM Squared.
About the author
I have worked with VMware for many years, so this article draws on more than a decade of planning, building and operating VMware environments, from small remote sites to large multi-site deployments covering 20+ data-centres around the world. I’ve worked with huge VDI deployments for 10,000+ users and critical credit-card processing infrastructure, and on numerous vCD, vCAC and vRA deployments including internal testing and automation of EMC’s EHC offering. I’ve seen virtual networking move from vShield and the Nexus 1000V to NSX-V then NSX-T. So this isn’t uninformed, but it does bring back frustrating memories!
New solutions - old problems
You have an estate of virtual machines, you continually add new virtual machines and the landscape keeps growing. Many customers started small with VMware in 2010-2015 but after a few years the number of VMs quickly reached 100+, 1000+ or even 10,000+. Moving such a huge number of VMs to the public cloud is a considerable effort and brings new challenges of network access, navigating costly subscription plans and data transit fees.
Yet you want cloud features - easy on-demand access, scalable resources, a simple but resilient management interface. Virtualisation teams want ease of management. Developers want to quickly stand up virtual instances with modern tools.
IaaS beginnings
Since its founding in 1998 VMware has grown to dominate the hypervisor market in much the same way that Microsoft dominates sales of desktop operating systems. In fact the parallel goes deep - Microsoft cannot drop its ancient DOS and Win32 beginnings, so customers are left with a bewildering array of Control Panel items straight from Windows 95 and vulnerabilities due to Internet Explorer in Windows 11, because IE is still used internally to render help files. VMware cannot drop an architecture that mandates a single central vCenter, with users relying on this central vCenter VM to track, monitor and configure ESXi hosts, as well as perform RBAC and virtual machine scheduling.
Hampered by history
Just as Microsoft cannot transform Windows into a truly secure multi-user operating system when it started as a single-user shell, so VMware cannot transform vSphere into a resilient and distributed cloud service.
Sure, you can run ESXi in the AWS Cloud but this is more a testament to Amazon’s hardware management and use of programmable SmartNICs than anything else. VMware’s cloud integrations let users either run costly VMs on VMware Cloud on AWS or use vRealize plugins which remotely operate the AWS API on behalf of VMware users. Either way, you are talking to a VMware API, and VMware still only exposes the full vCenter functionality via the ancient SOAP API, which means sooner or later you will have to step outside the limited REST API and deal with SOAP.
Traditional ‘cloud’ from VMware is a mess.
VMware initially entered the cloud fray with vCloud Director and for a few years it looked like the product that would do everything - it managed network isolation and handled complex templates including not just VMs but networking too. Unfortunately this was too good to last: vCloud Director has been withdrawn from public sale and is only available to members of the VMware Cloud Provider Program (VCPP). The requirement to become a VMware Partner before a licence is available puts it beyond the reach of most customers, and those who clear the hurdle must have their billing mediated by a dedicated on-site billing appliance.
A mish-mash of various acquisitions.
Customers were introduced to vCloud, a mish-mash of various acquisitions. vCloud was rebranded as vRealize a few years later, an apt naming choice which reflected vCloud’s inability to run cloud workloads. vCloud Automation Center, known as vCAC and later vRealize Automation, is a customer-defined web dashboard with a bunch of Windows-based orchestration VMs to execute customer orders. Windows? Yes, to use this you will need to have Microsoft licences available.
Deploy all of those vRA bits and pieces and you have a framework you could use to build a cloud, but still no cloud.
Fight the sprawl
IaaS features from VMware often involve creating another VM, leaving virtual administrators shepherding a complex web of interdependent virtual machines. New features bring not only their own virtual machines but their own user interfaces, upgrade schedules and compatibility requirements. Horizontal scaling is often impossible, as vCenter and the related feature VMs were designed as single non-distributed Java applications - scaling means increasing the CPU and RAM for a single VM. Individual VM failures cascade through the environment due to complex dependencies. During maintenance these virtual machines must be carefully relocated - hoping that VM storage and networking remain intact to avoid service disruption.
OpenStack was the hope for many looking for a more modular, resilient and scalable solution. I remember the feeling of excitement deploying OpenStack Grizzly in my home lab and thinking it was all going to change. Unfortunately vendors’ attempts to “own” OpenStack caused fragmentation, and the difficulty of administering and upgrading OpenStack limited the lifespan of deployments. OpenStack embraced modular architecture but did so before containers gained popularity, leading to ever more complex upgrade processes. Today OpenStack has Kolla, a container-based installation process, but it might be too late to get customers’ attention.
Now VMware SDDC Manager tries to solve the upgrade problem by adding another component, taking up more RAM, CPU and disk space. Of course you need another VM for that, and a paid licence too!
Operational challenges
The cost savings of virtualisation that were touted in the early 2000s have been steadily eroded over time. Growing licensing costs and lock-in, coupled with indirect operating costs, have caused many organisations to re-evaluate virtualisation as CIOs and CFOs try to reduce costs but not capabilities. Classical IaaS or HCI carries a litany of overheads, each of them a potential saving:
- Multiple specialist SMEs required to keep lights on
  - vCenter tuning and monitoring
  - NSX networking totally separate from classic ESXi/vCenter networks
  - vRealize Orchestrator JavaScript automation
  - vRealize Automation Windows and SQL Administration
- Fragile single control instance with vCenter
  - Intentionally single instance to preserve “performance” of DRS
  - Active/Active Failover requires complex CPU mirroring
- Management Overhead
  - Licensing
  - Compatibility
  - Upgrade cycles
- Huge hardware resources purely for management of your IaaS
A manager’s management stack to manage your management stacks?
A complete VMware management stack, allowing developers to order from a Service Catalogue and providing storage and network services, has now grown to consume over 170 CPU cores, more than 512GB of RAM and in excess of 10TB of disk space. You might think numbers like these are a fanciful work of fiction, but this is what the VMware Validated Design mandates for vRealize, NSX and vCenter in VMware Cloud Foundation.
A single vCenter instance can require in excess of 4TB of disk space!
Half of your hardware is needed purely for management.
Much of this pain comes from core architectural problems baked into the products - VMware products are inherently not multi-tenant, so you have to pay twice and build twice to manage one workload. Because there is no true multi-tenancy, the administration and management components need an entirely separate installation of their own. You will quite literally have to install a full management and software defined networking stack purely to manage your management VMs so those management VMs can manage your workload. Not an exaggeration - this is the VMware Validated Design and it means that in a cluster of eight servers, each with 256GB of RAM and 40 CPUs, fully half of your hardware is needed purely for management.
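To make the arithmetic behind that claim explicit, here is a minimal sketch using only the figures quoted above. It assumes that "40CPU" means 40 cores per server and treats "over 170 cores" and "more than 512GB" as lower bounds; the function and variable names are purely illustrative.

```python
# Management-stack figures quoted in this article (treated as lower bounds).
MGMT_CPU_CORES = 170
MGMT_RAM_GB = 512

# Example cluster from this article: 8 servers, 256GB RAM and 40 CPUs each.
SERVERS = 8
RAM_GB_PER_SERVER = 256
CPUS_PER_SERVER = 40  # assumption: "40CPU" means 40 cores per server


def management_share(mgmt: float, per_server: float, servers: int) -> float:
    """Return the fraction of total cluster capacity used by management."""
    return mgmt / (per_server * servers)


cpu_share = management_share(MGMT_CPU_CORES, CPUS_PER_SERVER, SERVERS)
ram_share = management_share(MGMT_RAM_GB, RAM_GB_PER_SERVER, SERVERS)

print(f"CPU share consumed by management: {cpu_share:.0%}")
print(f"RAM share consumed by management: {ram_share:.0%}")
```

Under these assumptions it is the CPU figure that drives the "half of your hardware" result: 170 of 320 cores is just over half, while 512GB of 2,048GB is a quarter of the RAM.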
HCI to the rescue?
Hyper-converged infrastructure was a big deal in 2015, offering a way out of the swamp of traditional IaaS. Where organisations were spending heavily on dedicated hardware storage solutions as well as compute, HCI bundled IaaS plus Software Defined Storage and a management app on rackmount servers.
Is HCI really different from IaaS?
Software Defined Storage (SDS) allowed customers to grow storage symmetrically with compute. Previously, individual storage arrays were complex, costly and purchased only infrequently, so available storage grew in large steps whilst compute was able to grow more smoothly to match demand.
Vendors were very careful to separate SAN-based storage and HCI storage to keep HCI looking fresh and new - it was generally impossible to buy the HCI servers with FC HBAs, for example - but there was really nothing separating the servers which had been IaaS the year before from the HCI servers sold afterwards except a few hot-swap drive bays.
Revolution, evolution, iteration or side-step?
The promise of HCI was to revolutionise IaaS Operations as well, but the reality fell short as a number of basic install wizards were rushed to market and hurriedly iterated on until they met the already-low expectations of enterprise software users. Papering over the cracks in IaaS was enough for a product launch but not to compete with cloud solutions.
Networking was almost always left out of HCI solutions, as they were unable to effectively automate hardware networking. What little network integration was created merely installed fixed configurations. These routines assumed that the physical wiring exactly followed a static map, a brittle approach causing endless frustration during initial deployment. Part of this came down to the lead players, VMware and Nutanix, lacking hardware network offerings and being unwilling to commit to partners. Without networking, many installations fell at the first hurdle when easy wizards failed to see servers and traffic was unexpectedly black-holed.
Hindsight shows us that HCI was not a category so much as a reaction to buyers’ complaints about IaaS. Almost a decade on, HCI is present but has failed to convert many IaaS systems, especially at the larger end, where limited deployment sizes and scaling options meant that a product pitched at SMB buyers was met with disdain.
Modern HCI is now described by the feature it actually delivered - Software Defined Storage. In the rear view mirror, HCI demonstrated a need for a simple installer more than anything else.
HyperCloud enables real on-prem cloud
True cloud is for admins not just end-users
After throwing dirt at VMware for so long, let me take this in a more positive direction. I believe that on-prem cloud is possible. Cloud is possible without paying crazy prices for overly complex software and it’s possible without hiring an army of Unix beards to build it for you.
HyperCloud manages your servers. The servers are truly ephemeral and obtain all OS data and configuration from a small number of static nodes, with the configuration data shared amongst all servers in the cluster. Of course, Software Defined Storage data remains on the local disks, but everything is auto-detected. This means that expanding the cluster is as simple as plugging in more servers and powering them on. No install wizards, no workflow to babysit. Server powers on, server joins cluster. Add a server with disks, get disk space; add a server with CPUs, get CPUs. Even GPUs are automatically added and made available for your applications.
HyperCloud truly incorporates cloud - not merely providing cloud services to users. HyperCloud provides cloud management of the physical and logical hosts comprising your infrastructure.
With HyperCloud there is no node install process; the HyperCloud node automatically receives a secure ephemeral operating system on startup and automatically joins your fleet. Likewise, there is no complex upgrade process - when a new HyperCloud release is available, the nodes will automatically upgrade without a service interruption.
Adopting cloud should help your administrators as well as your developers. By providing a truly cloud-first environment, HyperCloud is able to save time, increase productivity and enable hassle-free scaling for virtual administrators as well as application development teams.
Network advantage
Where HCI avoided hardware networking entirely and traditional IaaS was still using hard-coded, manually installed switch templates, HyperCloud is taking a new approach. Industry-leading chipsets are used to build hardware that is fully autonomous and truly a part of the overall solution - not a separately managed product bolted on the side.
Leveraging the HyperCloud switches and powerful HyperCloud Interconnects creates a network that provides security, throughput and scalability without imposing management complexity on the customer. Networking as an integral part of the solution provides a win-win for HyperCloud and customers, rapidly responding to new events within the system and affording excellent observability.
Rock solid foundations
Since appearing in 1991 Linux has grown to become the server operating system of choice - running 100% of the world’s top 500 supercomputers and over 96% of the top 1 million Internet servers. No surprise that all major Cloud platforms are based on Linux - even Microsoft have run native Linux Azure services since 2018 - leading to Microsoft releasing the Azure Linux distribution and finally becoming a bona fide Linux vendor.
SoftIron HyperCloud leverages a bespoke Linux kernel custom built for HyperCloud hardware. Updates are small in size and quickly sent throughout the HyperCloud. Because of the small footprint the OS consumes minimal RAM and CPU.
The efficient footprint also ensures that SoftIron closely tracks all code to provide excellent protection from cyber threats for national security and finance customers. Fewer lines of code mean fewer security risks. SoftIron “stands on the shoulders of giants”, leveraging battle-tested open source code which has been inspected by experts the world over.
HyperCloud nodes are not directly accessible to VM consumers which further limits security risks. Virtual machine networking is conducted entirely apart from hypervisor communication traffic.
OPEX savings and full-stack security through hardware
Apple brought the concept of true vertical integration to compute products: by designing the hardware in-house, Apple ensures that the hardware is a perfect fit for the software. I won’t say we can lay claim to the lofty heights that Apple has reached but - by designing a modern compute hardware platform in-house along with our own cloud software - SoftIron has been able to ensure exceptional security along with a streamlined user experience that makes installation and upgrades fast, simple and reliable.
Pushing the analogy further, SoftIron uses a custom Linux OS, while Apple uses customised BSD Unix in its products - allowing a secure starting point instead of relying on obfuscation to provide security, as proprietary Windows and vSphere solutions do.
While there continues to be heated debate over the security of open source vs closed source software, it is certainly the case that instead of relying on VMware’s millions of lines of closed-source code HyperCloud is running based on a smaller and more widely understood codebase.
For customers dealing in sensitive data, or simply in possession of a large virtual estate not suitable for migration to the public cloud, HyperCloud offers a path forward where you can do more with less effort.
Beyond basic cloud with HyperCloud
SoftIron aims to take administrators and developers beyond the traditional IaaS and HCI offerings. Not merely a place to migrate old workloads (we can do that, and we do it well) but a flexible and agile solution that integrates with cloud native products.
Next generation cloud native workloads treat compute, storage and networking as an extension of application configuration. HyperCloud gives you the power to easily stand up new virtual instances to test your application as part of a CI/CD workflow, and to flexibly expand successful applications as you grow. HyperCloud allows you to focus on the parts of IT that are visible to your customers without having to worry about the invisible work of keeping the lights on.