Seamlessly upgrading a production OpenStack cluster in 4 hours with a 2k-line shell script


tl;dr:

To the question “what does it take to upgrade OpenStack?”, my personal answer is: less than 2K lines of dash script. Here I’ll describe its internals, and why I believe it is the correct solution.

Why I am writing this blog post

During FOSDEM 2024, I was asked “how do you handle upgrades?”. I answered with a big smile and a short “with a very small shell script”, as I couldn’t explain in 2 minutes how it was done. But just saying “it is great this way” doesn’t give readers enough to trust the claim. Why and how did I do it the right way? This blog post is an attempt to answer that question in more depth.

Upgrading OpenStack in production

I wrote this script maybe 2 or 3 years ago. I’m only blogging about it today because I did such an upgrade in production, in a public cloud, last Thuesday evening (ie: the first region of the Infomaniak public cloud). I’d say the cluster is moderately large (as of today: 8K+ VMs running, 83 compute nodes, 12 network nodes, … for a total of 10880 physical CPU cores and 125 TB of RAM if I only count the compute servers). It took “only” 4 hours to do the upgrade (though I have already written some more code to speed this up next time…). It went super smooth, without a glitch. I mostly just sat reading the script output… and went to bed once it finished running. The next day, all my colleagues at Infomaniak kindly congratulated me on how smoothly it went (a big thanks to all of you who did). I couldn’t have dreamed of a smoother upgrade! :)

Still not impressed? Boring read? Yeah… let’s dive into more technical details.

Intention behind the implementation

My script isn’t perfect. I won’t ever pretend it is. But at least it minimizes the downtime of every OpenStack service. It is also a “by the book” implementation of what’s written in the OpenStack documentation, following every upstream recommendation. As a result, it is fully seamless for some OpenStack services, and as HA as OpenStack can be for others. The upgrade process is of course idempotent, and can be re-run in case of failure. Here’s why.

General idea

My upgrade script does things in a certain order, respecting what is documented about upgrades in the OpenStack documentation. It basically does:

  • Upgrade all dependencies
  • Upgrade all services one by one, across the whole cluster

Installing dependencies

The first thing the upgrade script does is:

  • disable puppet on all nodes of the cluster
  • switch the APT repository
  • apt-get update on all nodes
  • install library dependencies on all nodes

For this last step, a static list of all the dependencies that need upgrading is maintained for each OpenStack release, and for each type of node. Then, for every package in this list, the script checks with dpkg-query that the package is really installed, and with apt-cache policy that it really is going to be upgraded (maybe there’s an easier way to do this?). This way, no package is marked as manually installed by mistake during the upgrade process. This ensures that “apt-get --purge autoremove” really does what it should, and that the script is really idempotent.

The idea, then, is that once all dependencies are installed, upgrading and restarting leaf packages (ie: OpenStack services like Nova, Glance, Cinder, etc.) is very fast, because the apt-get command doesn’t need to install any dependencies. So at this point, doing “apt-get install python3-cinder” for example (which will also, thanks to dependencies, upgrade cinder-api and cinder-scheduler, if it’s on a controller node) only takes a few seconds. This principle applies to all nodes (controller nodes, network nodes, compute nodes, etc.), which helps a lot in speeding up the upgrade and reducing unavailability.
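The dpkg-query and apt-cache policy check described above could be sketched as follows. This is a minimal illustration and not the actual code of the upgrade script: the function names policy_wants_upgrade and filter_upgradable are hypothetical, and the parsing assumes the usual “Installed:” and “Candidate:” lines that apt-cache policy prints in the C locale.

```shell
#!/bin/sh
# Hypothetical helper: reads "apt-cache policy <pkg>" output on stdin and
# succeeds only if the package is installed AND a different candidate exists.
policy_wants_upgrade() {
    awk '
        $1 == "Installed:" { installed = $2 }
        $1 == "Candidate:" { candidate = $2 }
        END {
            if (installed == "" || installed == "(none)") exit 1
            if (candidate == "" || candidate == installed) exit 1
            exit 0
        }'
}

# Hypothetical wrapper: prints only the packages of the static list that are
# really installed (dpkg-query) and really going to be upgraded (apt-cache).
filter_upgradable() {
    for pkg in "$@"; do
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null) || continue
        [ "$status" = "install ok installed" ] || continue
        if LC_ALL=C apt-cache policy "$pkg" | policy_wants_upgrade; then
            echo "$pkg"
        fi
    done
}
```

The filtered list could then be handed to apt-get install. Packages that are absent or already up to date never reach the command line, which is the property the paragraph above relies on to keep the run idempotent.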

hapc

At its core, the oci-cluster-upgrade-openstack-release script uses haproxy-cmd (ie: /usr/bin/hapc) to drain each API server to be upgraded from haproxy. Hapc is a simple Python wrapper around the haproxy admin socket: it sends commands to it through an easy-to-understand CLI. So it is possible to reliably upgrade an API service only after it has been drained. Draining means haproxy stops giving the backend server new queries, and one just waits for the last query to finish and the client to disconnect from HTTP. If you do not know hapc / haproxy-cmd, I recommend trying it: it’s going to be hard for you to stop using it once you have tested it. Its bash-completion script makes it VERY easy to use, and it is helpful in production. But not only that: it is also nice to have when writing this type of upgrade script. Let’s dive into haproxy-cmd.

Example of how to use haproxy-cmd

Let me show you. First, ssh to one of the 3 controllers and find where the virtual IP (VIP) is located with “crm resource locate openstack-api-vip” or with a (simpler) “crm status”. Then ssh to the server holding the VIP, and drain it away from haproxy.

$ hapc list-backends
$ hapc drain-server --backend glancebe --server cl1-controller-1.infomaniak.ch --verbose --wait --timeout 50
$ apt-get install glance-api
$ hapc enable-server --backend glancebe --server cl1-controller-1.infomaniak.ch

Upgrading the control plane

My upgrade script leverages hapc just like above. For each OpenStack project, it’s done in this order on the first node holding the VIP:

  • “hapc drain-server” of the API, so haproxy gracefully stops querying it
  • stop all services on that node (including non-API services): stop, disable and mask with systemd
  • upgrade that service’s Python code. For example: “apt-get install python3-nova”, which will also pull nova-api, nova-conductor, nova-novncproxy, etc., but the services won’t start automatically as they were stopped, disabled and masked in the previous bullet point
  • perform the db_sync so that the db is up-to-date [1]
  • start all services (unmask, enable and start with systemd)
  • re-enable the API backend with “hapc enable-server”

From [1] onwards, the risk is that the other nodes see a new version of the database schema while still running an old version of the code that isn’t compatible with it. This window doesn’t last long, though, because the next step is to take care of the other (usually 2) nodes of the OpenStack control plane:

  • “hapc drain-server” of the API of the other 2 controllers
  • stop of all services on these 2 controllers [2]
  • upgrade of the package
  • start of all services

So while there’s technically zero downtime, some issues may still happen between [1] and [2] above, because the new DB schema and the old code (both for the API and the other services) are up and running at the same time. These are however supposed to be rare cases (some OpenStack projects don’t even have DB changes between some OpenStack releases, and the old code often continues to work for most queries with the upgraded DB), and the cluster stays in that state for a very short time, so that’s fine, and better than a full API downtime.
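The per-project sequence described in this section can be sketched as a single shell function. This is only an illustration under stated assumptions, not the real oci-cluster-upgrade-openstack-release code: the names run and upgrade_controller are hypothetical, the function is assumed to run directly on the controller, and only the hapc options shown in the example earlier in this post are used. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch only: with DRY_RUN=1 (the default here) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper, meant to run on the controller being upgraded.
# $1 haproxy backend, $2 server name, $3 package to install,
# $4 db sync command ("" on the other controllers), then the systemd units.
upgrade_controller() {
    backend=$1; server=$2; package=$3; dbsync=$4
    shift 4

    run hapc drain-server --backend "$backend" --server "$server" \
        --verbose --wait --timeout 50
    for unit in "$@"; do
        run systemctl stop "$unit"
        run systemctl disable "$unit"
        run systemctl mask "$unit"
    done
    run apt-get install -y "$package"
    if [ -n "$dbsync" ]; then
        run $dbsync     # step [1]; unquoted on purpose: command plus arguments
    fi
    for unit in "$@"; do
        run systemctl unmask "$unit"
        run systemctl enable "$unit"
        run systemctl start "$unit"
    done
    run hapc enable-server --backend "$backend" --server "$server"
}
```

For Glance on the first controller, a call could look like: upgrade_controller glancebe cl1-controller-1.infomaniak.ch glance-api "glance-manage db_sync" glance-api. On the other controllers, the same call with an empty fourth argument skips the db sync.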

Satellite services

Then there are the satellite services that need to be upgraded, like Neutron, Nova and Cinder. Nova is the least problematic, as it has all the code to rewrite JSON object schemas on-the-fly so that it continues to work during an upgrade. It’s a known issue, though, that Cinder doesn’t have this feature (last time I checked), and it’s probably the same for Neutron (maybe recent-ish versions of OpenStack do use oslo.versionedobjects?). Anyway, these nodes are upgraded right after the control plane for each service.

Parallelism and upgrade timings

As we’re dealing with potentially hundreds of nodes per cluster, a lot of operations are performed in parallel. I chose to simply use the shell’s & operator together with “wait”, so that not too many jobs run in parallel. For example, when disabling puppet on all nodes, this is done 24 nodes at a time, which is fine. The number of nodes handled at once depends on what is being done: while it’s perfectly OK to disable puppet on 24 nodes at the same time, it is not OK to do that with Neutron services. In fact, each time a Neutron agent is restarted, the script explicitly waits for 30 seconds. This conveniently avoids a hailstorm of messages in RabbitMQ, and neutron-rpc-server becoming too busy. All of this waiting is necessary, and it is one of the reasons why it can sometimes take that long to upgrade a (moderately big) cluster.
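The & and “wait” pattern described above could look like the sketch below. The helper names run_batched and run_paced are hypothetical and this is not the script’s actual code. Plain POSIX sh has no way to wait for “any one job”, so this version waits for a whole batch before starting the next one.

```shell
#!/bin/sh
# Hypothetical helper: run "$cmd <node>" for each node, at most $max at once.
run_batched() {
    max=$1; cmd=$2
    shift 2
    running=0
    for node in "$@"; do
        "$cmd" "$node" &
        running=$((running + 1))
        if [ "$running" -ge "$max" ]; then
            wait            # let the whole batch finish before starting more
            running=0
        fi
    done
    wait                    # collect the last, possibly partial, batch
}

# Hypothetical helper: run "$cmd <node>" one node at a time, sleeping
# $pause seconds after each one (e.g. 30 after a Neutron agent restart).
run_paced() {
    pause=$1; cmd=$2
    shift 2
    for node in "$@"; do
        "$cmd" "$node"
        sleep "$pause"
    done
}
```

With shell functions such as disable_puppet or restart_neutron_agent (both hypothetical, each taking a node name), this would be called as run_batched 24 disable_puppet $nodes and run_paced 30 restart_neutron_agent $network_nodes.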

Not using config management tooling

Some of my colleagues would have preferred that I use something like Ansible. However, there’s no reason to use such a tool if the idea is just to run some shell commands on every server. It is way more efficient (in terms of programming) to just use bash / dash to do the work. And if you want my point of view about Ansible: using YAML for this kind of programming would be crazy. YAML is simply not suited to a job where if, case, and loops are needed. I am well aware that Ansible has workarounds and that it could be done, but it wasn’t my choice.

Real-Time OpenStack Packaging Status with Event-Driven Automation

tl;dr: https://osbpo.debian.net/deb-status is now updated in real time and is much better than it used to be, helping the OpenStack packaging team be way more efficient.

How it used to be

For years, the Debian OpenStack team has relied on automated tools to track the status of OpenStack packages across Debian releases. Our goal has always been simple: transparency, efficiency, accuracy.

We used to use a tool called os-version-checker, written by Michal Arbet, which generated a static status page at https://osbpo.debian.net/deb-status. It was functional and served us well — but it had limitations:

  • It ran on a cron job, not on demand
  • It processed all OpenStack releases at once, making it slow
  • The rsync from Jenkins hosts to osbpo.debian.net was also cron-driven
  • No immediate feedback after a package build

This meant that when a developer pushed a new package to salsa (the Debian GitLab instance) in the team’s repository, the following would happen:

  • Jenkins would build the backport
  • Store it in a local repository
  • Wait up to 30 minutes (or more) for the cron job to run rsync + status update
  • Only then would the status page reflect the new version

For maintainers actively working on a new release, this delay was frustrating. You’d fix a bug, push, build — and still see your package marked as “missing” or “out of date” for minutes. You had no real-time feedback. This was also an annoyance for testing, because when fixing a bug, I often had to trigger the rsync manually in order to not wait for it, so I could do my tests. Now, osbpo is always up-to-date a few seconds after the build of the package.

The New Way: Event-Driven, Real-Time Updates

We’ve rebuilt the system from the ground up to be fast, responsive, and event-driven. Now, the workflow is:

  • Developer git push → triggers Jenkins
  • Jenkins builds the package → publishes to local repo
  • Jenkins immediately triggers a webhook on osbpo.debian.net

The webhook on osbpo does:

  • rsyncs the new package to the central Debian repo
  • Pulls the latest OpenStack releases from git and uses its YAML data (instead of parsing the release HTML pages)
  • Regenerates the status page, comparing what upstream released and what’s in Debian

No more cron. No more waiting…

How it works

The central osbpo.debian.net server runs:

  • webhook — to receive secure, HMAC-verified triggers that it processes in an async way
  • Apache — to serve the status pages and the Debian OpenStack repositories
  • Custom scripts — to rsync packages, validate, and generate reports

Jenkins instances are configured to curl the webhook on successful build. The status page is generated by openstack-debian-release-manager, a new tool I’ve packaged and deployed. The dashboard uses AJAX to load content dynamically (like when browsing from one release to another), with sorting, metadata, and real-time auto-refresh every 10 seconds.
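As an illustration of what an HMAC-signed trigger from Jenkins could look like, here is a minimal sketch. Everything specific in it is an assumption, since the post does not describe the real configuration: the function names, the X-Signature header, the payload shape, and the URL and secret passed as arguments are all hypothetical. Only openssl and curl themselves are real.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the shared secret, the payload comes on stdin.
# Prints the hex-encoded HMAC-SHA256 of the payload.
sign_payload() {
    openssl dgst -sha256 -hmac "$1" | awk '{ print $NF }'
}

# Hypothetical helper: POST a signed payload to the webhook URL.
# $1 webhook URL, $2 shared secret, $3 payload (e.g. a small JSON document).
notify_osbpo() {
    url=$1; secret=$2; payload=$3
    sig=$(printf '%s' "$payload" | sign_payload "$secret")
    curl --fail --silent --show-error \
        -H "Content-Type: application/json" \
        -H "X-Signature: sha256=$sig" \
        --data "$payload" "$url"
}
```

The receiving side would recompute the same HMAC over the request body with its own copy of the secret and reject the request if the two values differ, so only a holder of the secret can trigger the rsync and the status page regeneration.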

openstack-debian-release-manager is easy to deploy and configure, and will do most (if not all) of the needed configuration. Uploading it to Debian is probably not needed, and a bit over-kill, so I believe I’ll just keep it in Salsa for the moment, unless there’s a way to make it more generic so it can help someone else (another team?) in Debian.

Room for improvement

There are still things I want to add. Namely:

  • Add status for Debian stable (ie: without the osbpo.debian.net add-on repository), which we used to have with os-version-checker.
  • Add a per-release config file option to mask projects that are not packaged, with per-OpenStack-release granularity

Special thanks to Michal Arbet for the original os-version-checker that served me for years, helping me to never forget a missing OpenStack package release.

Running a Lenovo Legion Pro 7 laptop under Debian

As I was tired of long build times, I convinced my boss to buy me a Lenovo Legion Pro 7. The reason is that this laptop has an AMD Ryzen 9 7945HX with 16 cores (32 threads). This greatly reduces the time I spend just waiting for my laptop to compile or run unit tests, especially for big packages like Ceph, OpenVSwitch, and so on.

When buying it, I knew it would not be a good fit for Debian, as this type of laptop is aimed at gaming, and the support under Linux is rather bad. I wish Lenovo had other policies, but that is the way it is: if you’re a Linux user, you’re not supposed to need a big CPU, apparently.

Anyway, over this year I have slowly been able to fix all the issues. In this blog post I’ll explain how I fixed each problem, in the hope it can be useful to others. And I’ll explain what the src:lenovolegionlinux package (that I now maintain in Debian) does.

Video

The laptop comes with an nVidia RTX-4080 and a Radeon. I quickly tried the Radeon, but couldn’t make it work with an external monitor. So I gave up on it, disabled it, and now I’m using the proprietary nVidia driver from non-free. I don’t like it: the nVidia card drains too much power, and I don’t care at all about 3D acceleration. I would have preferred an Intel board, but there is no choice: all laptops with this kind of CPU come with a gamer’s 3D card. Anyway, apart from the power issue, it works well.

Fan control

This sounds like a non-issue, but it is a huge one. Indeed, without fan control, it is impossible to get the full potential of the CPU cores, which otherwise throttle. One may end up using the laptop at a few hundred MHz instead of 5GHz+. More on this later.

Sound

It took me a really long time to figure out what to do. Indeed, while the sound card works out of the box, the issue was that my laptop came with a TI (Texas Instruments) speaker firmware that isn’t on by default. I suppose the purpose is to save power when it isn’t in use. Anyway, to have sound working in Debian, one needs to run at least kernel 6.10, which for me means running the Bookworm backport, so that there’s a kernel module for the speakers. But that’s not all. The speakers also need a proprietary firmware in /lib/firmware/TAS2XXX38*.bin. I was able to find it in the ti.com forum. As I tried so many packages, I wouldn’t be able to tell which one was the correct one. Once that is done, the firmware needs to be initialized through the i2c interface. I found a script that does that, which I pushed into my lenovolegionlinux package (see below).

WiFi

WiFi worked out of the box for me, except that it wouldn’t wake up after I closed the laptop lid. Putting this in /etc/modprobe.d/rtw8852be.conf fixed it for me:

options rtw89_pci disable_aspm_l1=y disable_aspm_l1ss=y
options rtw89_core disable_ps_mode=y

lenovolegionlinux package

I came across https://github.com/johnfanv2/LenovoLegionLinux, which I packaged. The result is now 4 binary packages:

  • lenovolegionlinux-dkms, which provides the kernel module for accessing the fan control.
  • python3-legion-linux, which provides legion_cli and legion_gui, written in Python, that make it possible to control the kernel module. I often use sudo legion_gui, click on “Other options” and then switch the power profile from quiet to balanced. Many things in this GUI do not work for me, like the fancurve thingy, but should work for other flavors of Legion laptops. Please feel free to contribute.
  • legiond, which provides a daemon for setting up the fan curve on wake up.
  • lenovolegionlinux-sound, a new Debian binary package that I have just uploaded today, which contains my i2c speaker script, in the hope it may be useful for others.

Conclusion

Finally, almost everything is (almost) working as expected. Only my webcam (lsusb says it’s a Luxvisions Innotech Limited Integrated Camera) went dark at some point (it did work previously). It now behaves as if it were working, but only transmits a black picture. If anyone knows how to fix this, please tell me. Also, I only get 40 minutes of battery time if I’m lucky; I hope this can be fixed. But overall, I’m happy with the laptop.

Thanks to Ding Shenghao for his support of many people in the ti.com forum. Thanks to the people maintaining LenovoLegionLinux, which helped me a lot in writing this Debian package.

Please try lenovolegionlinux in Debian, report issues, and help me improve it. It is in Salsa’s debian namespace in the hope that others may push contributions.

Packaging Home Assistant

During Debconf, Edward Betts and I started packaging Home Assistant for Debian. It consists of hundreds of Python packages. So far, we have counted at least 675 packages. That’s a lot, though most packages are just libraries to talk to some IoT devices and some APIs. It’s fairly easy to create a new package: it takes me about 15 to 20 minutes, and probably half that time for Edward. And it’s a lot of fun. So far, in one month, we have managed to package about one third of the list (probably 200+ Python packages already). Once we’ve done all the dependencies, we may start to have fun with the core of the application! At the current speed, hopefully we’ll be done before the end of the year. Edward and I have sworn to make at least one package a day, which I’ve been doing so far, and Edward has done way more… We also received contributions from Silton0506, Tianyu, piotr, EiPi Fun, sourabhtk37, and Count-Dracula, as per the very bottom of the TODO list in the wiki (see link below).

If you have a bit of free time, we’d love to have more contributors. Here’s where to get the needed information:

We created a team in Salsa: https://salsa.debian.org/homeassistant-team/

Our TODO list: https://wiki.debian.org/Python/HomeAssistant

Our DDPO Q/A page: https://qa.debian.org/developer.php?login=team%2Bhomeassistant%40tracker.debian.org

Feel free to join us on IRC: #debian-homeassistant

Discussing it with a lot of people, I realized that A LOT of DDs are actually using Home Assistant. Wouldn’t you like it better if it was just an “apt install” away? Any DD can simply take a package from the wiki, open an ITP, push its debianized source to Salsa, and upload to the Debian archive. Most are very simple packages to make.

Searching for a Ryzen 9, 16 cores, small laptop

The new 7945HX CPU from AMD is currently the most powerful. I’d love to have one of them, to replace the now aging 6 core Xeon that I’ve been using for more than 5 years. So, I’ve been searching for a laptop with that CPU.

Absolutely all of the laptops I found with this CPU also embed a very powerful RTX 40×0 series GPU that I have no use for: I don’t play games, and I don’t do AI. I just want something that builds Debian packages fast (like Ceph, which takes more than 1h to build for me…). The more cores I get, the faster all the OpenStack unit tests run too (stestr does a moderately good job at spreading the tests over all cores). It would be OK to pay more for a GPU that I don’t need, and I would deal with the annoyance of the NVidia driver, if only I could find something of a suitable size. But I can only find 16″ or bigger laptops, which won’t fit in my scooter’s back case (most of the time, these laptops have a 17 inch screen: that’s way too big).

Currently, I found:

  • Lenovo Legion Pro 5: screen is 16.8″
  • Dell Alienware m6: super heavy, 16″
  • Asus ROG Zephyrus Duo 16: 16″
  • MSI alpha (16 and 17): also 16″

If one of the readers of this post finds a smaller laptop with a 7945HX CPU, please let me know! Even better if I can get rid of the expensive NVidia GPU.

My work during debcamp

I arrived in Prizren late on Wednesday. Here’s what I did during debcamp (so over 3 days). I hope this post just motivates others to contribute more to Debian.

At least 2 DDs want to upload packages that need a new version of python3-jsonschema (ie: version > 4.x). Unfortunately, version 4 broke a few packages. I therefore uploaded it to Experimental a few months/weeks ago, so I could see the result of autopkgtest by reading the pseudo excuse page. And it showed that a few packages broke. Here are the ones used by (or part of) OpenStack:

  • Nova
  • Designate
  • Ironic
  • python-warlock
  • Sahara
  • Vitrage

Thanks to a reactive upstream, I was able to fix the first 4 above, but not Sahara yet. Vitrage popped up when I uploaded Debian release 2 of jsonschema, surprisingly. Also, the python3-jsonschema autopkgtest itself was broken because python3-pip was missing from its depends, but that should be fixed as well.

I then filed bugs for packages not under my control:

  • bmtk
  • python-asdf

It looks like there’s now also spyder, which wasn’t in the list a few hours ago. Maybe I should also file a bug against it. At this point, I don’t think the python-jsonschema transition is finished, but it’s on the right track.

Then I also uploaded a new package of Ceph removing the ceph-mgr-diskprediction-local because it depended on python3-sklearn that the release team wanted to remove. I also prepared a point release update for it, but I’m currently waiting for the previous upload to migrate to testing before uploading the point release.

Last, I wrote the missing “update” command for extrepo, and pushed the merge request to Salsa. Now extrepo should be feature complete (at least from my point of view).

I also merged the patch for numberstation fixing the debian/copyright, and uploaded it to the NEW queue. It’s a new package that does 2 factor authentication, and is mobile friendly: it works perfectly on any Mobian powered phone.

Next, I intend to work with Arthur on the Cloud image finder. I hope we can find the time to work on it so it does what I need (ie: support the kind of setup I want to do, with HA, puppet, etc.).

OpenStack Xena, the 24th OpenStack release, is out

It was out at 3pm, and I managed to finish uploading the last bits to Unstable at 9pm… Of course, that’s because all of the packaging and testing work was done before the release date. All of it is, as usual, also available through a Bullseye non-official backports repository that can be added using extrepo (ie: “extrepo enable openstack_xena”).

Infomaniak launches its public IaaS cloud with ground breaking prices

My employer, the biggest Swiss server hosting company, Infomaniak, has just opened registration for its new IaaS (Infrastructure as a Service) OpenStack-based public cloud. Well, in fact, it has been open for a week or so. Previously, it was only in beta (during that beta period, we hosted the whole Debconf 21 infrastructure for free). Nothing really new in the market, except that it is far cheaper than most (if not all) of its competitors (OpenStack-based or not), including AWS, GCE or Azure.

Also, everything is hosted in Switzerland, in our own data centers, where data protection is written in the law (and Infomaniak often advertises about data privacy: this is real here…).

Not only is Infomaniak (by far…) the cheapest offer on the market (including a 300 CHF free tier: enough for our smallest VM for a full year), but we also have very good technical support, and the hardware we use is top notch:

  • 6th Gen NVMe (read intensive) Ceph-based block devices
  • AMD Epyc CPU (128 threads per server)
  • 2x 25Gbits/s (using BGP-to-the-host networking)

Some of our customers couldn’t even believe we could offer such pricing. Well, the reason is simple: most of our competitors are simply really overpriced, and are making too much money. Since we’re late to the market, and since newer hardware (with many cores on a single server) makes it possible to increase density without too much over-commit, my bosses decided that since we could, we would be the cheapest! Hopefully, this will work as a good business strategy.

All of that public cloud infrastructure has been set up with OpenStack Cluster Installer, of which I’m the main author, and which is fully in Debian. All of this is running on a plain, unmodified Debian Bullseye (well, with a few OpenStack packages a little bit more up-to-date, but really not much, and all of that is publicly available…).

Last, choosing the cheapest and best offer is also a good action: it promotes OpenStack and cloud computing in Debian, which I believe is the least vendor locked-in IaaS solution.

developers-reference needs love

During Debconf, Holger, who’s one of the developers-reference maintainers, gave a quick presentation explaining that the developers-reference needs some love. Indeed, it has gathered dust, and a refresh would be very welcome. Holger pointed at the list of bugs:
https://bugs.debian.org/src:developers-reference

After having a quick look into that list, after Holger’s Debconf presentation, I wrote to him on IRC:

<zigo> Many of the bugs you refered are indeed easily actionable, if all of us just try to help for one bug, that’d be a huge improvement of that doc.

Then, as I was waiting for the closing ceremony of Debconf, I thought I shouldn’t just say it, but actually do something about it. I decided to address https://bugs.debian.org/793633 as I thought it was easy. In just a few minutes, I was able to do a first patch, as seen here:

https://salsa.debian.org/debian/developers-reference/-/merge_requests/27

I wrote about it on IRC, and a few people helped with rephrasing what was there (thanks to Fil for correcting my English mistakes, and others for the content).

Today, which is 2 days after the MR was opened, I decided it had been long enough to gather comments, and actually merged it. So we now have a brand new shiny chapter about Backports and how to handle them. I’m sure that new part can be improved, so do not hesitate to patch what I just wrote if you feel like you can do better.

If I’m writing this blog post, this is not to promote myself. The goal is to promote the developers-reference manual and push others in Debian to do the same. Please do what Holger suggested, and what I just did: contribute to the document by addressing just one of the currently opened bugs. If all DDs do it, we’ll get a much nicer document, and help others to contribute to Debian.

This is going to take less than 30 minutes of your time, and it is very much ok if you do this only once. It is really easy: just clone https://salsa.debian.org/debian/developers-reference/ and write a patch. If you’re a DD, you can even merge your patch yourself once you’re satisfied with it.

Puppet and OS detection

As you may know, Puppet uses “facter” to get facts about the machine it is about to configure. That’s fine, and a nice concept. One can later use variables in a puppet manifest to do different things depending on what facter reports. For example, the operating system name … oh no! This thing is really stupid … Here’s the code one has to write to be compatible with puppet from version 3 up to 5:

if $::lsbdistcodename == undef {
  # The legacy global fact is gone: fall back to the structured $facts hash.
  # This works around differences between facter versions
  if $facts['os']['lsb'] != undef {
    $distro_codename = $facts['os']['lsb']['distcodename']
  } else {
    $distro_codename = $facts['os']['distro']['codename']
  }
} else {
  # The legacy global fact still exists: use it directly.
  $distro_codename = downcase($::lsbdistcodename)
}

Indeed, the global variable $::lsbdistcodename still existed up to Stretch (and is gone in Buster). The global $::facts wasn’t an array before (but a hash), so in Jessie, it breaks with the error message “facts is not a hash or array when accessing it with os”. So, one needs the full code above to make this work.

It’s OK to improve things. It is NOT OK to break OS detection. To me, this is very bad practice from the upstream Puppet authors. I’m publishing this in the hope of helping others avoid falling into the same trap as I did.