Packaging Home Assistant

During Debconf, Edward Betts and myself started packaging Home Assistant for Debian. It consists of hundreds of Python packages. So far, we counted at least 675 packages. That’s a lot, though most packages are just libraries to talk with some IoT devices and some APIs. It’s fairly easy to create a new package: it takes me about 15 to 20 minutes, probably half that time to Edward. And it’s a lot of fun. So far in one month of time, we managed to package about 1 third of the list (probably 200+ Python packages already). Once we’ve done all the dependencies, we may start to have fun with the core of the application! At the current speed, hopefully we’ll be done before the end of the year. Edward and myself have swear to make at least one package a day, which I’ve been doing so far, and Edward did a way more… We also received contributions from Silton0506, Tianyu, piotr, EiPi Fun, sourabhtk37, and Count-Dracula, as per the very bottom of the TODO list in the wiki (see link below).

If you have a bit of free time, we’d love to have more contributors. Here’s were to get the needed information:

We created a team in Salsa: https://salsa.debian.org/homeassistant-team/

Our TODO list: https://wiki.debian.org/Python/HomeAssistant

Our DDPO Q/A page: https://qa.debian.org/developer.php?login=team%2Bhomeassistant%40tracker.debian.org

Feel free to join us on IRC: #debian-homeassistant

Discussing with a lot of people about it, I realized that A LOT of DDs are actually using Home Assistant. Wouldn’t you like it better if it was just a “apt install” away ? Any DD can simply take a package in the wiki, open an ITP, upload it’s debianized source on Salsa, and upload to the Debian archive. Most are very easy simple packages to make.

Searching for a Ryzen 9, 16 cores, small laptop

The new 7945HX CPU from AMD is currently the most powerful. I’d love to have one of them, to replace the now aging 6 core Xeon that I’ve been using for more than 5 years. So, I’ve been searching for a laptop with that CPU.

Absolutely all of the laptops I found with this CPU also embed a very powerful RTX 40×0 series GPU, that I have no use: I don’t play games, and I don’t do AI. I just want something that builds Debian packages fast (like Ceph, that takes more than 1h to build for me…). The more cores I get, the faster all OpenStack unit tests are running too (stestr does a moderately good job at spreading the tests to all cores). That’d be ok if I had to pay more for a GPU that I don’t need, and I would have deal with the annoyance of the NVidia driver, if only I could find something with a correct size. But I can only find 16″ or bigger laptops, that wont fit in my scooter back case (most of the time, these laptops have an 17 inch screen: that’s a way too big).

Currently, I found:

  • Lenovo Legion Pro 5: screen is 16.8″
  • Dell Alienware m6: super heavy, 16″
  • Asus ROG Zephyrus Duo 16: 16″
  • MSI alpha (16 and 17): also 16″

If one of the readers of this post find a smaller laptop with a 7945HX CPU, please let me know! Even better if I can get rid of the expensive NVidia GPU.

My work during debcamp

I arrived in Prizren late on Wednesday. Here’s what I did during debcamp (so over 3 days). I hope this post just motivates others to contribute more to Debian.

At least 2 DDs want to upload packages that need a new version of python3-jsonschema (ie: version > 4.x). Unfortunately, version 4 broke a few packages. I therefore uploaded it to Experimental a few months/week, so I could see the result of autopkgtest reading the pseudo excuse page. And it showed a few packages broke. Here’s the one used (or part of) OpenStack:

  • Nova
  • Designate
  • Ironic
  • python-warlock
  • Sahara
  • Vitrage

Thanks to a reactive upstream, I was able to fix the first 4 above, but not Sahara yet. Vitrage poped-up when I uploade Debian release 2 of jsonschema, surprisingly. Also python3-jsonschema autopkgtest itself was broken because missing python3-pip in depends, but that should be fixed also.
I then filed bugs for packages not under my control:

  • bmtk
  • python-asdf

It looks tlike now there’s also spyder which wasn’t in the list a few hours ago. Maybe I should also file a bug against it. At this point, I don’t think the python-jsonschema transition is finished, but it’s on good tracks.

Then I also uploaded a new package of Ceph removing the ceph-mgr-diskprediction-local because it depended on python3-sklearn that the release team wanted to remove. I also prepared a point release update for it, but I’m currently waiting for the previous upload to migrate to testing before uploading the point release.

Last, I wrote the missing “update” command for extrepo, and pushed the merge request to Salsa. Now extrepo should be feature complete (at least from my point of view).

I also merged the patch for numberstation fixing the debian/copyright, and uploaded it to the NEW queue. It’s a new package that does 2 factor authentication, and is mobile friendly: it works perfectly on any Mobian powered phone.

Next, I intend to work with Arthur on the Cloud image finder. I hope we can find the time to work on it so it does what I need (ie: support the kind of setup I want to do, with HA, puppet, etc.).

OpenStack Xena, the 24th OpenStack release, is out

It was out at 3pm, and I managed to finish uploading the last bits to Unstable at 9pm… Of course, that’s because all of the packaging and testing work was done before the release date. All of it is, as usual, also available through a Bullseye non-official backports repository that can be added using extrepo (ie: “extrepo enable openstack_xena”).

Infomaniak launches its public IaaS cloud with ground breaking prices

My employer, the biggest Swiss server hosting company, Infomaniak, has just opened registration for its new IaaS (Infrastructure as a Service) OpenStack-based public cloud. Well, in fact, it’s been opened since a week or so. Previously, it was only in beta (during that beta period, we hosted (for free) the whole Debconf 21 infrastructure). Nothing really new in the market, except that it is by far cheaper than most (if not all) of its (OpenStack-based or not) competitors, including AWS, GCE or Azure.

Also, everything is hosted in Switzerland, in our own data centers, where data protection is written in the law (and Infomaniak often advertises about data privacy: this is real here…).

Not only Infomaniak is (by far…) the cheapest offer in the market (including a 300 CHF free tier: enough for our smallest VM for a full year), but we also have very good technical support, and the hardware we used is top notch:

  • 6th Gen NVMe (read intensive) Ceph-based block devices
  • AMD Epyc CPU (128 threads per server)
  • 2x 25Gbits/s (using BGP-to-the-host networking)

Some of our customers didn’t even believe how we could do such pricing. Well, the reason is simple: most of our competitors are simply really overpriced, and are making too much money. Since we’re late in the market, and that newer hardware (with many cores on a single server) makes is possible to increase density without too much over-commit, my bosses decided that since we could, we would be the cheapest! Hopefully, this will work as a good business strategy.

All of that public cloud infrastructure has been setup with OpenStack Cluster Installer for which I’m the main author, and that is fully in Debian. All of this is running on a plain, unmodified Debian Bullseye (well, with a few OpenStack packages a little bit more up-to-date, but really not much, and all of that is publicly available…).

Last, choosing the cheapest and best offer is also a good action: it promotes OpenStack and cloud computing in Debian, which I believe is the least vendor locked-in IaaS solution.

developers-reference needs love

During Debconf, Holger, who’s one of the developers-reference maintainers, made a quick presentation that was explaining the developers-reference needs some love. Indeed, it has gathered dust, and some useful refresh would be very welcome. Holger pointed at the list of bugs:
https://bugs.debian.org/src:developers-reference

After having a quick look into that list, after Holger’s Debconf presentation, I wrote to him on IRC:

<zigo> Many of the bugs you refered are indeed easily actionable, if all of us just try to help for one bug, that’d be a huge improvement of that doc.

Then, as I was waiting for the closing ceremony of Debconf, I thought I shouldn’t just say it, but actually do something about it. I decided to address https://bugs.debian.org/793633 as I thought it was easy. In just a few minutes, I was able to do a first patch, as seen here:

https://salsa.debian.org/debian/developers-reference/-/merge_requests/27

I wrote about it on IRC, and a few people helped with rephrasing what was there (thanks to Fil for correcting my English mistakes, and others for the content).

Today, which is 2 days after the MR was opened, I have decided it was long enough and actually merged it, as I considered it was enough time to gather comments. So we now have a brand new shiny chapter about Backports and how to handle them. I’m sure that new part is perfectible, so do not hesitate, and do patch what I just wrote if you feel like you can do better.

If I’m writing this blog post, this is not to promote myself. The goal is to promote the developers-reference manual and push others in Debian to do the same. Please do what Holger suggested, and what I just did: contribute to the document by addressing just one of the currently opened bugs. If all DDs do it, we’ll get a much nicer document, and help others to contribute to Debian.

This is going to take less than 30 minutes of your time, and it is very much ok if you do this only once. It is really easy: just clone https://salsa.debian.org/debian/developers-reference/ and write a patch. If you’re a DD, you can even merge your patch yourself once you’re satisfied with it.

Puppet and OS detection

As you may know, Puppet uses “facter” to get facts about the machine it is about to configure. That’s fine, and a nice concept. One can later use variables in a puppet manifest to do different things depending on what facter tells. For example, the operating system name … oh no! This thing is really stupid … Here’s the code one has to do to be compatible with puppet from version 3 up to 5:

if $::lsbdistcodename == undef{
# This works around differences between facter versions
if $facts['os']['lsb'] != undef{
$distro_codename = $facts['os']['lsb']['distcodename']
}else{
$distro_codename = $facts['os']['distro']['codename']
}
}else{
$distro_codename = downcase($::lsbdistcodename)
}

Indeed, the global variable $::lsbdistcodename still existed up to Stretch (and is gone in Buster). The global $::facts wasn’t an array before (but a hash), so in Jessie, it breaks with the error message “facts is not a hash or array when accessing it with os”. So, one need the full code above to make this work.

It’s ok to improve things. It is NOT OK to break os detection. To me it is a very bad practice from upstream Puppet authors. I’m publishing this in the hope to avoid others to fall in the same trap as I did.

The Gnocchi package in Debian

This is a follow-up from the blog post of Russel as seen here: https://etbe.coker.com.au/2020/10/13/first-try-gnocchi-statsd/. There’s a bunch of things he wrote which I unfortunately must say is inaccurate, and sometimes even completely wrong. It is my point of view that none of the reported bugs are helpful for anyone that understand Gnocchi and how to set it up. It’s however a terrible experience that Russell had, and I do understand why (and why it’s not his fault). I’m very much open on how to fix this on the packaging level, though some things aren’t IMO fixable. Here’s the details.

1/ The daemon startups

First of all, the most surprising thing is when Russell claimed that there’s no startup scripts for the Gnocchi daemons. In fact, they all come with both systemd and sysv-rc support:

# ls /lib/systemd/system/gnocchi-api.service
/lib/systemd/system/gnocchi-api.service
# /etc/init.d/gnocchi-api
/etc/init.d/gnocchi-api

Russell then tried to start gnocchi-api without the good options that are set in the Debian scripts, and not surprisingly, this failed. Russell attempted to do what was in the upstream doc, which isn’t adapted to what we have in Debian (the upstream doc is probably completely outdated, as Gnocchi is unfortunately not very well maintained upstream).

The bug #972087 is therefore, IMO not valid.

2/ The database setup

By default for all things OpenStack in Debian, there are some debconf helpers using dbconfig-common to help users setup database for their services. This is clearly for beginners, but that doesn’t prevent from attempting to understand what you’re doing. That is, more specifically for Gnocchi, there are 2 databases: one for Gnocchi itself, and one for the indexer, which not necessarily is using the same backend. The Debian package already setups one database, but one has to do it manually for the indexer one. I’m sorry this isn’t well enough documented.

Now, if some package are supporting sqlite as a backend (since most things in OpenStack are using SQLAlchemy), it looks like Gnocchi doesn’t right now. This is IMO a bug upstream, rather than a bug in the package. However, I don’t think the Debian packages are to be blame here, as they simply offer a unified interface, and it’s up to the users to know what they are doing. SQLite is anyway not a production ready backend. I’m not sure if I should close #971996 without any action, or just try to disable the SQLite backend option of this package because it may be confusing.

3/ The metrics UUID

Russell then thinks the UUID should be set by default. This is probably right in a single server setup, however, this wouldn’t work setting-up a cluster, which is probably what most Gnocchi users will do. In this type of environment, the metrics UUID must be the same on the 3 servers, and setting-up a random (and therefore different) UUID on the 3 servers wouldn’t work. So I’m also tempted to just close #972092 without any action on my side.

4/ The coordination URL

Since Gnocchi is supposed to be setup with more than one server, as in OpenStack, having an HA setup is very common, then a backend for the coordination (ie: sharing the workload) must be set. This is done by setting an URL that tooz understand. The best coordinator being Zookeeper, something like this should be set by hand:

coordination_url=zookeeper://192.168.101.2:2181/

Here again, I don’t think the Debian package is to be blamed for not providing the automation. I would however accept contributions to fix this and provide the choice using debconf, however, users would still need to understand what’s going on, and setup something like Zookeeper (or redis, memcache, or any other backend supported by tooz) to act as coordinator.

5/ The Debconf interface cannot replace a good documentation

… and there’s not so much I can do at my package maintainer level for this.

Russell, I’m really sorry for the bad user experience you had with Gnocchi. Now that you know a little big more about it, maybe you can have another go? Sure, the OpenStack telemetry system isn’t an easy to understand beast, but it’s IMO worth trying. And the recent versions can scale horizontally…

A quick look into Storcli packaging horror

So, Megacli is to be replaced by Storcli, both being proprietary tools for configuring RAID cards from LSI.

So I went to download what’s provided by Lenovo, available here:
https://support.lenovo.com/fr/en/downloads/ds041827

It’s very annoying, because they force users to download a .zip file containing a deb file, instead of providing a Debian repository. Well, ok, though at least there’s a deb file there. Let’s have a look what’s using my favorite tool before installing (ie: let’s run Lintian).
Then it’s a horror story. Not only there’s obvious packaging wrong, like the package provide stuff in /opt, and all is statically linked and provide embedded copies of libm and ncurses, or even the package is marked arch: all instead of arch: amd64 (in fact, the package contains both i386 and amd64 arch files…), but there’s also some really wrong things going on:

E: storcli: arch-independent-package-contains-binary-or-object opt/MegaRAID/storcli/storcli
E: storcli: embedded-library opt/MegaRAID/storcli/storcli: libm
E: storcli: embedded-library opt/MegaRAID/storcli/storcli: ncurses
E: storcli: statically-linked-binary opt/MegaRAID/storcli/storcli
E: storcli: arch-independent-package-contains-binary-or-object opt/MegaRAID/storcli/storcli64
E: storcli: embedded-library opt/MegaRAID/storcli/storcli64: libm
E: storcli: embedded-library … use –no-tag-display-limit to see all (or pipe to a file/program)
E: storcli: statically-linked-binary opt/MegaRAID/storcli/storcli64
E: storcli: changelog-file-missing-in-native-package
E: storcli: control-file-has-bad-permissions postinst 0775 != 0755
E: storcli: control-file-has-bad-owner postinst asif/asif != root/root
E: storcli: control-file-has-bad-permissions preinst 0775 != 0755
E: storcli: control-file-has-bad-owner preinst asif/asif != root/root
E: storcli: no-copyright-file
E: storcli: extended-description-is-empty
W: storcli: essential-no-not-needed
W: storcli: unknown-section storcli
E: storcli: depends-on-essential-package-without-using-version depends: bash
E: storcli: wrong-file-owner-uid-or-gid opt/ 1000/1000
W: storcli: non-standard-dir-perm opt/ 0775 != 0755
E: storcli: wrong-file-owner-uid-or-gid opt/MegaRAID/ 1000/1000
E: storcli: dir-or-file-in-opt opt/MegaRAID/
W: storcli: non-standard-dir-perm opt/MegaRAID/ 0775 != 0755
E: storcli: wrong-file-owner-uid-or-gid opt/MegaRAID/storcli/ 1000/1000
E: storcli: dir-or-file-in-opt opt/MegaRAID/storcli/
W: storcli: non-standard-dir-perm opt/MegaRAID/storcli/ 0775 != 0755
E: storcli: wrong-file-owner-uid-or-gid … use –no-tag-display-limit to see all (or pipe to a file/program)
E: storcli: dir-or-file-in-opt opt/MegaRAID/storcli/storcli
E: storcli: dir-or-file-in-opt … use –no-tag-display-limit to see all (or pipe to a file/program)

Some of the above are grave security problems, like wrong Unix mode for folders, even with the preinst script installed as non-root.
I always wonder why this type of tool needs to be proprietary. They clearly don’t know how to get packaging right, so they’d better just provide the source code, and let us (the Debian community) do the work for them. I don’t think there’s any secret that they are keeping by hiding how to configure the cards, so it’s not in the vendor’s interest to keep everything closed. Or maybe they are just hiding really bad code in there, that they are ashamed to share? In any way, they’d better not provide any package than this pile of dirt (and I’m trying to stay polite here…).

Upgrading an OpenStack Rocky cluster from Stretch to Buster

Upgrading an OpenStack cluster from one version of OpenStack to another has become easier, thanks to the versioning of objects in the rabbitmq message bus (if you want to know more, see what oslo.versionedobjects is). But upgrading from Stretch to Buster isn’t easy at all, event with the same version of OpenStack (it is easier to be running OpenStack Rocky backports on Stretch and upgrade to Rocky on Buster, rather than upgrading OpenStack at the same time as the system).

The reason it is difficult, is because rabbitmq and corosync in Stretch can’t talk to the versions shipped in Buster. Also, in a normal OpenStack cluster deployment, services on all machines are constantly doing queries to the OpenStack API, and exchanging messages through the RabbitMQ message bus. One of the dangers, for example, would be if a Neutron DHCP agent could not exchange messages with the neutron-rpc-server. Your VM instances in the OpenStack cluster then could loose connectivity.

If a constantly online HA upgrade with no downtime isn’t possible, it is however possible to minimize down time to just a few seconds, if following a correct procedure. It took me more than 10 tries to be able to do everything in a smooth way, understanding and working around all the issues. 10 tries, means installing 10 times an OpenStack cluster in Stretch (which, even if fully automated, takes about 2 hours) and trying to upgrade it to Buster. All of this is very time consuming, and I haven’t seen any web site documenting this process.

This blog post intends to document such a process, to save the readers the pain of hours of experimentation.

Note that this blog post asserts you’re cluster has been deployed using OCI (see: https://salsa.debian.org/openstack-team/debian/openstack-cluster-installer) however, it should also apply to any generic OpenStack installation, or even to any cluster running RabbitMQ and Corosync.

The root cause of the problem more in details: incompatible RabbitMQ and Corosync in Stretch and Buster

RabbitMQ in Stretch is version 3.6.6, and Buster has version 3.7.8. In theory, the documentation of RabbitMQ says it is possible to smoothly upgrade a cluster with these versions. However, in practice, the problem is the Erlang version rather than Rabbit itself: RabbitMQ in Buster will refuse to talk to a cluster running Stretch (the daemon will even refuse to start).

The same way, Corosync 3.0 in Buster will refuse to accept messages from Corosync 2.4 in Stretch.

Overview of the solution for RabbitMQ & Corosync

To minimize downtime, my method is to shutdown RabbitMQ on node 1, and let all daemons (re-)connect to node 2 and 3. Then we upgrade node 1 fully, and then restart Rabbit in there. Then we shutdown Rabbit on node 2 and 3, so that all daemons of the cluster reconnect to node 1. If done well, the only issue is if a message is still in the cluster of node 2 and 3 when daemons fail-over to node 1. In reality, this isn’t really a problem, unless there’s a lot of activity on the API of OpenStack. If this was the case (for example, if running a public cloud), then the advise would simply to firewall the OpenStack API for the short upgrade period (which shouldn’t last more than a few minutes).

Then we upgrade node 2 and 3 and make them join the newly created RabbitMQ cluster in node 1.

For Corosync, node 1 will not allow start the VIP resource before node 2 is upgraded and both nodes can talk to each other. So we just upgrade node 2, and turn off the VIP resource on node 3 immediately when it is up on node 1 and 2 (which happens during the upgrade of node 2).

The above should be enough reading for most readers. If you’re not that much into OpenStack, it’s ok to stop reading this post. For those who are move involved users of OpenStack on Debian deployed with OCI, let’s go more in details…

Before you start: upgrading OCI

In previous versions of OCI, the haproxy configuration was missing a “option httpcheck” for the MariaDB backend, and therefore, if a MySQL server on one node was going down, haproxy wouldn’t detect it, and the whole cluster could fail (re-)connecting to MySQL. As we’re going to bring some MySQL servers down, make sure the puppet-master is running with the latest version of puppet-module-oci, and that the changes have been applied in all OpenStack controller nodes.

Upgrading compute nodes

Before we upgrade the controllers, it’s best to start by compute nodes, which are the most easy to do. The easiest way is to live-migrate all VMs away from the machine before proceeding. First, we disable the node, so no new VM can be spawned on it:

openstack compute service set --disable z-compute-1.example.com nova-compute

Then we list all VMs on that compute node:

openstack server list –all-projects –host z-compute-1.example.com

Finally we migrate all VMs away:

openstack server migrate --live hostname-compute-3.infomaniak.ch --block-migration 8dac2f33-d4fd-4c11-b814-5f6959fe9aac

Now we can do the upgrade. First disable pupet, then tweak the sources.list, upgrade and reboot:

puppet agent --disable "Upgrading to buster"
apt-get remove python3-rgw python3-rbd python3-rados python3-cephfs librgw2 librbd1 librados2 libcephfs2
rm /etc/apt/sources.list.d/ceph.list
sed -i s/stretch/buster/g /etc/apt/sources.list
mv /etc/apt/sources.list.d/stretch-rocky.list /etc/apt/sources.list.d/buster-rocky.list
echo "deb http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main
deb-src http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main" >/etc/apt/sources.list/buster-rocky.list
apt-get update
apt-get dist-upgrade
reboot

Then we simply re-apply puppet:

puppet agent --enable ; puppet agent -t
apt-get purge linux-image-4.19.0-0.bpo.5-amd64 linux-image-4.9.0-9-amd64

Then we can re-enable the compute service:

openstack compute service set --enable z-compute-1.example.com nova-compute

Repeate the operation for all compute nodes, then we’re ready for the upgrade of controller nodes.

Removing Ceph dependencies from nodes

Most likely, if running with OpenStack Rocky on Stretch, you’d be running with upstream packages for Ceph Luminous. When upgrading to Buster, there’s no upstream repository anymore, and packages will use Ceph Luminous directly from Buster. Unfortunately, the packages from Buster are in a lower version than the packages from upstream. So before upgrading, we must remove all Ceph packages from upstream. This is what has been done just above for the compute nodes also. Upstream Ceph packages are easily identifiable, because upstream uses “bpo90” instead of what we do in Debian (ie: bpo9), so the operation can be:

apt-get remove $(dpkg -l | grep bpo90 | awk '{print $2}' | tr '\n' ' ')

This will remove python3-nova, which is fine as it is also running on the other 2 controllers. After switching the /etc/apt/sources.list to buster, Nova can be installed again.

In a normal setup by OCI, here’s the sequence of command that needs to be done:

rm /etc/apt/sources.list.d/ceph.list
sed -i s/stretch/buster/g /etc/apt/sources.list
mv /etc/apt/sources.list.d/stretch-rocky.list /etc/apt/sources.list.d/buster-rocky.list
echo "deb http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main
deb-src http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main" >/etc/apt/sources.list/buster-rocky.list
apt-get update
apt-get dist-upgrade
apt-get install nova-api nova-conductor nova-consoleauth nova-consoleproxy nova-placement-api nova-scheduler

You may notice that we’re replacing the Stretch Rocky backports repository by one for Buster. Indeed, even if all of Rocky is in Buster, there’s a few packages that are still pending for the review of the Debian stable release team before they can be uploaded to Buster, and we need the fixes for a smooth upgrade. See release team bugs #942201, #942102, #944594, #941901 and #939036 for more details.

Also, since we only did a “apt-get remove”, the Nova configuration in nova.conf must have stayed, and nova is already configured, so when we reinstall the services we removed when removing the Ceph dependencies, they will be ready to go.

Upgrading the MariaDB galera cluster

In an HA OpenStack cluster, typically, a Galera MariaDB cluster is used. That isn’t a problem when upgrading from Stretch to Buster, because the on-the-wire format stays the same. However, the xtrabackup library in Stretch is held by the MariaDB packages themselves, while in Buster, one must install the mariadb-backup. As a consequence, best is to simply turn off MariaDB in a node, do the Buster upgrade, install the mariadb-backup package, and restart MariaDB. To avoid that the MariaDB package attempts restarting the mysqld daemon, best is to mask the systemd unit:

systemctl stop mysql.service
systemctl disable mysql.service
systemctl mask mysql.service

Upgrading rabbitmq-server

Before doing anything, make sure all of your cluster is running with the python3-oslo.messaging version >= 8.1.4. Indeed, version 8.1.3 suffers from a bug where daemons would attempt reconnect constantly to the same server, instead of trying each of the servers described in the transport_url directive. Note that I’ve uploaded 8.1.4-1+deb10u1 to Buster, and that it is part of the 10.2 Buster point release. Though upgrading oslo.messaging will not restart daemons automatically: this must be done manually.

The strategy for RabbitMQ is to completely upgrade one node, start Rabbit on it, without any clustering, then shutdown the service on the other 2 node of the cluster. If this is performed fast enough, no message will be list in the message bus. However, there’s a few traps. Running “rabbitmqctl froget_cluster_node” only removes a node from the cluster for those who will still be running. It doesn’t remove the other nodes from the one which we want to upgrade. The way I’ve found to solve this is to simply remove the mnesia database of the first node, so that when it starts, RabbitMQ doesn’t attempt to cluster with the other 2 which are running a different version of Erlang. If it did, then it would just fail and refused to start.

However, there’s another issue to take care. When upgrading the 1st node to Buster, we removed Nova, because of the Ceph issue. Before we restart the RabbitMQ service on node 1, we need to install Nova, so that it will connect to either node 2 or 3. If we don’t do that, then Nova on node 1 may connect to the RabbitMQ service on node 1, which at this point, is a different RabbitMQ cluster than the one in node 2 and 3.

rabbitmqctl stop_app
systemctl stop rabbitmq-server.service
systemctl disable rabbitmq-server.service
systemctl mask rabbitmq-server.service
[ ... do the Buster upgrade fully ...]
[ ... reinstall Nova services we removed when removing Ceph ...]
rm -rf /var/lib/rabbitmq/mnesia
systemctl unmask rabbitmq-server.service
systemctl enable rabbitmq-server.service
systemctl start rabbitmq-server.service

At this point, since the node 1 RabbitMQ service was down, all daemons are connected to the RabbitMQ service on node 2 or 3. Removing the mnesia database removes all the credentials previously added to rabbitmq. If nothing is done, OpenStack daemons will not be able to connect to the RabbitMQ service on node 1. If like I do, one is using a config management system to populate the access rights, it’s rather easy: simply re-apply the puppet manifests, which will re-add the credentials. However, that isn’t enough: the RabbitMQ message queues are created when the OpenStack daemon starts. As I experienced, daemons will reconnect to the message bus, but will not recreate the queues unless daemons are restarted. Therefore, the sequence is as follow:

Do “rabbitmqctl start_app” on the first node. Add all credentials to it. If your cluster was setup with OCI and puppet, simply look at the output of “puppet agent -t –debug” to capture the list of commands to perform the credential setup.

Do a “rabbitmqctl stop_app” on both remaining nodes 2 and 3. At this point, all daemons will reconnect to the only remaining server. However, they wont be able to exchange messages, as the queues aren’t declared. This is when we must restart all daemons in one of the controllers. The whole operation normally doesn’t take more than a few seconds, which is how long your message bus wont be available. To make sure everything works, check the logs in /var/log/nova/nova-compute.log of one of your compute nodes to make sure Nova is able to report its configuration to the placement service.

Once all of this is done, there’s nothing to worry anymore about RabbitMQ, as all daemons of the cluster are connected to the service on node 1. However, one must make sure that, when upgrading node 2 and 3, they don’t reconnect to the message service on node 2 and 3. So best is to simply stop, disable and mask the service with systemd before continuing. Then, when restarting the Rabbit service on node 2 and 3, OCI’s shell script “oci-auto-join-rabbitmq-cluster” will make them join the new Rabbit cluster, and everything should be fine regarding the message bus.

Upgrading corosync

In an OpenStack cluster setup by OCI, 3 controllers are typically setup, serving the OpenStack API through a VIP (a Virtual IP). What we call a virtual IP is simply an IP address which is able to move from one node to another automatically depending on the cluster state. For example, with 3 nodes, if one goes down, one of the other 2 nodes will take over hosting the IP address which serves the OpenStack API. This is typically done with corosync/pacemaker, which is what OCI sets up.

The way to upgrade corosync is easier than the RabbitMQ case. The first node will refuse to start the corosync resource if it can’t talk to at least a 2nd node. Therefore, upgrading the first node is transparent until we touch the 2nd node: the openstack-api resource wont be started on the first node, so we can finish the upgrade in it safely (ie: take care of RabbitMQ as per above). The first thing to do is probably to move the resource to the 3rd node:

crm_resource --move --resource openstack-api-vip --node z-controller-3.example.com

Once the first node is completely upgraded, we upgrade the 2nd node. When it is up again, we can check the corosync status to make sure it is running on both node 1 and 2:

crm status

If we see the service is up on node 1 and 2, we must quickly shutdown the corosync resource on node 3:

crm resource stop openstack-api-vip

If that’s not done, then node 3 may also reclaim the VIP, and therefore, 2 nodes may it. If running with the VIP using L2 protocol, normally switches will connect only one of the machines declaring the VIP, so even if we don’t take care of it immediately, the upgrade should be smooth anyway. If, like I do in production, you’re running with BGP (OCI allows one to use BGP for the VIP, or simply use an IP on a normal L2 network), then the situation must be even better, as the peering router will continue to route to one of the controllers in the cluster. So no stress, this must be done, but no need to hurry as much as for the RabbitMQ service.

Finalizing the upgrade

Once node 1 and 2 are up, most of the work is done, and the 3rd node can be upgraded without any stress.

Recap of the procedure for controllers

  • Move all SNAT virtual routers running on node 1 to node 2 or 3 (note: this isn’t needed if the cluster has network nodes).
  • Disable puppet on node 1.
  • Remove all Ceph libraries from upstream on node 1, which also turn off some Nova services that runtime depend on them.
  • shutdown rabbitmq on node 1, including masking the service with systemd.
  • upgrade node 1 to Buster, fully. Then reboot it. This probably will trigger MySQL re-connections to node 2 or 3.
  • install mariadb-backup, start the mysql service, and make sure MariaDB is in sync with the other 2 nodes (check the log files).
  • reinstall missing Nova services on node 1.
  • remove the mnesia db on node 1.
  • start rabbitmq on node 1 (which now, isn’t part of the RabbitMQ cluster on node 2 and 3).
  • Disable puppet on node 2.
  • populate RabbitMQ access rights on node 1. This can be done by simply applying puppet, but may be dangerous if puppet restarts the OpenStack daemons (which therefore may connect to the RabbitMQ on node 1), so best is to just re-apply the grant access commands only.
  • shutdown rabbitmq on node 2 and 3 using “rabbitmqctl stop_app”.
  • quickly restart all daemons on one controller (for example the daemons on node 1) to declare message queues. Now all daemons must be reconnected and working with the RabbitMQ cluster on node 1 alone.
  • Re-enable puppet, and re-apply puppet on node 1.
  • Move all Neutron virtual routers from node 2 to node 1.
  • Make sure the RabbitMQ services are completely stopped on node 2 and 3 (mask the service with systemd).
  • upgrade node 2 to Buster (shutting down RabbitMQ completely, masking the service to avoid it restarts during upgrade, removing the mnesia db for RabbitMQ, and finally making it rejoin the newly node 1 single node cluster using oci-auto-join-rabbitmq-cluster: normally, puppet does that for us).
  • Reboot node 2.
  • When corosync on node 2 is up again, check corosync status to make sure we are clustering between node 1 and 2 (maybe the resource on node 1 needs to be started), and shutdown the corosync “openstack-api-vip” resource on node 3 to avoid the VIP to be declared on both nodes.
  • Re-enable puppet and run puppet agent -t on node 2.
  • Make node 2 rabbitmq-server has joined the new cluster declared on node 1 (do: rabbitmqctl cluster_status) so we have HA for Rabbit again.
  • Move all Neutron virtual routers of node 3 to node 1 or 2.
  • Upgrade node 3 fully, reboot it, and make sure Rabbit is connected to node 1 and 2, as well as corosync working too, then re-apply puppet again.

Note that we do need to re-apply puppet each time, because of some differences between Stretch and Buster. For example, Neutron in Rocky isn’t able to use iptables-nft, and puppet needs to run some update-alternatives command to select iptables-legacy instead (I’m writing this because this isn’t obvious, it’s just that sometimes, Neutron fails to parse the output of iptables-nft…).

Last words as a conclusion

While OpenStack itself has made a lot of progress for the upgrade, it is very disappointing that those components on which OpenStack relies (like corosync, who is typically used as the provider of high availability), aren’t designed with backward compatibility in mind. It is also disappointing that the Erlang versions in Stretch and Buster are incompatible this way.

However, with the correct procedure, it’s still possible to keep services up and running, with a very small down time, even to the point that a public cloud user wouldn’t even notice it.

As the procedure isn’t easy, I strongly suggest anyone attempting such an upgrade to train before proceeding. With OCI, it is easy to do run a PoC using the openstack-cluster-installer-poc package, which is the perfect environment to train on: it’s easy to reproduce, reinstall a cluster and restart the upgrade procedure.