OpenStack has been the big thing since the inception of the cloud. Here I have collected some material about real-world problems, along with my take on them.
The good news is that the cloud does not actually exist :-) It is just a user perception! The user does not care about the actual servers, storage or network. It looks as if the servers, storage and networks are obscured by a cloud, and the user does not need to know what is behind it.
First I want to share a slightly different vision of virtualization and infrastructure orchestration. Many of these ideas also apply to the newer phenomenon of containers (~> containerization).
I started to call it the Inside out picture. Why? Simply because most of the pictures that describe OpenStack start by enumerating the OpenStack components. It is worth saying that this is a growing number of components of variable quality (code transparency, documentation, stability and robustness). Whenever I see such a list of great components I feel like I am watching an advertisement for some product. Do not take it wrong, I really do appreciate the work done for the global benefit. I deeply believe that making software open source is the only viable way to the general good. But sometimes it looks to me as if the user is forgotten, sitting somewhere in the last row of the theatrum mundi.
My picture is pretty much upside down. The big central object of the vision is the virtual machine with its required resources:
- VM ~ Hypervisor (usually KVM)
- RAM - allocated from physical machine
- CPU - slices of raw machine CPU
- HDD - virtio / block storage (best provided by Ceph)
- NETWORK - virtio / network ring-buffers (DPDK, XDP or SR-IOV)
All of that is then managed by a whole lot of various daemons and agents. Recently some of the resources, mainly networking, have turned into spin-offs of their own with names like SDN, OVN and NFV, each establishing its own right to exist and the specific problem it solves.
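To make the inside-out picture a bit more tangible, here is a minimal Python sketch of the virtual machine as the central object with its resources hanging off it. Every class and field name here is my own illustration (an assumption), not anything taken from the OpenStack code base.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Disk:
    size_gb: int
    bus: str = "virtio"
    backend: str = "ceph"  # assumption: block storage served by Ceph


@dataclass
class Nic:
    network: str
    datapath: str = "virtio"  # could also be "dpdk", "xdp" or "sr-iov"


@dataclass
class VirtualMachine:
    """The central object: a VM plus the resources it needs (hypothetical model)."""
    name: str
    vcpus: int   # slices of the physical CPU
    ram_mb: int  # allocated from the physical machine
    hypervisor: str = "kvm"
    disks: List[Disk] = field(default_factory=list)
    nics: List[Nic] = field(default_factory=list)

    def fits_on(self, free_vcpus: int, free_ram_mb: int) -> bool:
        # True when a host with this much free capacity could run the VM.
        return self.vcpus <= free_vcpus and self.ram_mb <= free_ram_mb


vm = VirtualMachine(
    name="demo",
    vcpus=2,
    ram_mb=4096,
    disks=[Disk(size_gb=40)],
    nics=[Nic(network="tenant-net")],
)
```

Everything else, the daemons and agents, exists only to create, place and connect objects like this one.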
An uninformed observer might easily lose focus while looking at the proliferation of supplementary services that provide the elementary UNIT of COMPUTING (aka UoC). When I spotted podman as a daemon-less container runtime/manager, I wondered whether such an approach could also reach the OpenStack domain. All that because some people perceive OpenStack as a big beast consuming unnecessary resources just to maintain the IaaS (the very bottom of the app stack). There is even a concept of Container-native virtualization (aka CNV) that takes the core of OpenStack ~ UoC and packs it as a part of the container management platform (as of today Kubernetes ~ K8S).
The decoupling of the VM from OpenStack might scare some engineers living in their home domain, but I believe it may make them think differently. Of course it might cause the end of OpenStack, but I do not see it as darkly as that might suggest. This is the power of open source: all parts of the software bundle can be taken apart, simplified, optimized, replaced, etc. and then form a partially new product. A product that can take the best of the past and the new.
I have to admit that I have pretty short experience with OpenStack. What exactly have I learned during my ordinary days with it? I have learned that it resembles a puzzle with a pretty amazing final picture. It also means that when using it you rely on other people's work. As a user, or better as an admin of an OpenStack instance, you have to understand how the gears fit together. In other words, to start an OpenStack instance you will most likely encounter packstack or devstack, which are out there to let you get a feel for OpenStack. And it is a pretty good teaser! At least I fell in love with it. Sooner or later after the first encounter you realize that the initial training has passed and you should take it a little further. To start earning money out of it. Well, at that point you just need some hardware to bring the virtual hardware to life.
Here it comes: growing things bigger needs planning. This is the point when industry practice might come in handy. Some external consultancy might be a must. Thankfully there are a couple of consulting companies with real-world experience of larger clusters or DCs. A good point to mention here is that when we talk about OpenStack we are silently assuming an on-premises deployment.
I have probably taken the worst and most painful way there is. My only ingredient was a rather weak, textbook knowledge of automated deployment using TripleO. TripleO is also behind the commercial product Red Hat OpenStack Director, which gave me the illusion of it being the right way. Actually I did not have a budget to spend (apart from having some HPE Gen9 servers). I had some pointers to alternative tools, namely MAAS + Juju by Canonical and Crowbar by SUSE. Nevertheless I took the integrated solution by the OpenStack people (it should be good when it is used by its authors…).
TripleO is, in a way, the ultimate way of doing things. You just throw some config values into it and it does the rest. Well, uhmm, yes it does. But you will not manage it without insight. There are so many variables that only some of their combinations make sense. Whenever you create an invalid or partially invalid setup you might not get an immediate and understandable error code. Or you might not get any error at all. Well, the cluster just suddenly falls over without giving you a chance to be prepared or warned in any way.
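One way to soften this is to check the combination of values before anything is deployed. Below is a minimal Python sketch of such a pre-flight check. The config keys (`controller_count`, `ha_enabled`, `storage_backend`, `ceph_node_count`, `tenant_network_type`, `vlan_ranges`) and the rules are hypothetical illustrations of mine, not real TripleO parameters.

```python
def validate(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means 'looks sane'.

    All keys and rules are illustrative assumptions, not TripleO's own.
    """
    errors = []

    controllers = config.get("controller_count", 0)
    if controllers < 1:
        errors.append("at least one controller is required")
    elif config.get("ha_enabled") and (controllers < 3 or controllers % 2 == 0):
        # Assumed rule: quorum-based HA wants an odd number of members.
        errors.append("HA needs an odd number of controllers, 3 or more")

    if config.get("storage_backend") == "ceph" and config.get("ceph_node_count", 0) < 3:
        # Assumed rule: three storage nodes for three-way replication.
        errors.append("Ceph backend needs at least 3 storage nodes")

    if config.get("tenant_network_type") == "vlan" and not config.get("vlan_ranges"):
        errors.append("VLAN tenant networks need vlan_ranges to be set")

    return errors


good = {
    "controller_count": 3,
    "ha_enabled": True,
    "storage_backend": "ceph",
    "ceph_node_count": 3,
    "tenant_network_type": "vxlan",
}
bad = {
    "controller_count": 2,
    "ha_enabled": True,
    "storage_backend": "ceph",
    "ceph_node_count": 1,
    "tenant_network_type": "vlan",
}

for problem in validate(bad):
    print("config problem:", problem)
```

The point is not these particular rules but that an invalid combination is reported up front, in plain words, instead of surfacing as a cluster that falls over later.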
TripleO is an amazing collection of tools hacked together. As an example, let me give you the list of tools used just for the deployment of the overcloud (all of them have to work properly):
- Installation of the Undercloud (a one-node stripped-down OpenStack) ~ worked on CentOS 7
- Deployment of the overcloud
- Turning the environment files and the templates of Heat templates (.j2) into final Heat templates (~HOT) that define the HW nodes with their configs
- Executing the HOT at the undercloud; as a part of that the HW nodes are deployed with the base OS (CentOS 7) ~ OpenStack - Ironic
- Downloading the Ansible playbooks that will prepare the HW nodes with the necessary packages and configs
- Execution of those playbooks via SSH
- Starting the OpenStack services (components) inside containers (docker) on the individual nodes. This step is actually done in 5 phases using mainly Ansible but also OpenStack - Kolla using Puppet
And in between those steps there is a huge number of scripts. An error might occur absolutely anywhere. I can imagine that most of the errors remain unspotted and cause harm later in time. Why? Simply because there are multiple sources of code. Each and every programmer has some model of reality in his or her mind. The code, its purpose and its author's intention/expectation will deviate from yours. Only a few cases can be covered by input value tests. Thus having a tool like TripleO is great if its magic matches your configs and intentions, but on its own it does not give any warranty. You are pretty much left on your own with prayer and hope.
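What I would wish for is that every step of such a chain either succeeds visibly or stops the whole run with a clear message. Here is a minimal Python sketch of that fail-fast idea. The step names echo the list above, but the functions are placeholders of mine (assumptions), not actual TripleO calls.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


class StepFailed(Exception):
    """Raised as soon as one step of the chain does not succeed."""


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first failure, naming the step."""
    completed = []
    for name, action in steps:
        try:
            ok = action()
        except Exception as exc:
            raise StepFailed(f"step '{name}' raised: {exc}") from exc
        if not ok:
            raise StepFailed(f"step '{name}' reported failure")
        completed.append(name)
    return completed


# Placeholder actions; a real one would call the tool and check its result.
steps = [
    ("render heat templates", lambda: True),
    ("provision nodes", lambda: True),
    ("run playbooks", lambda: False),   # simulated failure
    ("start containers", lambda: True), # never reached
]

try:
    run_pipeline(steps)
except StepFailed as err:
    print("deployment stopped:", err)
```

It does not catch an error that a step silently swallows, which is exactly the problem described above, but it at least makes sure no later step runs on top of a known failure.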
As easy as it was to fall in love with such a magic tool, it is just as hard to let go of the idea that it could work as an almighty solution doing the magic of raising OpenStack on a crappy collection of HW.
So far I feel I have been trapped by one of the obvious fallacies: “Because I and they have invested so much into it, it must be a good and right solution!”
To get out of that pitfall I am currently persuading myself to divide and conquer the next step. One part is to have devstack for experiments and the second part is to turn to an alternative that has a more atomized deployment process, as there are so many aspects to consider.
Some of them are:
- Performant Block storage provider - Ceph as the best candidate
- High Availability and Redundancy - each component in production shall have some reliability and an upgrade path
- Self Healing - automatic rollback of failed components
- Autoscaling - on-demand resize of the VM cluster (heavily depends on the app)
- High Performance Networking - the whole scale from Phy, SR-IOV, DPDK, XDP, VXLAN, VLAN, eBGP VPN