OpenStack-cluster-installer in Buster

I’ve been working on this for more than a year, and finally, I am acheiving my goal. I wrote a OpenStack cluster installer that is fully in Debian, and running in production for Infomaniak.

Note: I originally wrote this blog post a few weeks ago, though it was pending validation from my company (to make sure I wouldn’t disclose company business information).

What is it?

As per the package description and the package name, OCI (OpenStack Cluster Installer) is a software to provision an OpenStack cluster automatically, with a “push button” interface. The OCI package depends on a DHCP server, a PXE (tftp-hpa) boot server, a web server, and a puppet-master.

Once computers in the cluster boot for the first time over network (PXE boot), a Debian live system squashfs image is served by OCI (served by Apache), to act as a discovery image. This live system then reports the hardware features of the booted machine back to OCI (CPU, memory, HDDs, network interfaces, etc.). The computers can then be installed with Debian from that live system. During this process, a puppet-agent is configured so that it will connect to the puppet-master of OCI. Uppong first boot, OpenStack services are then installed and configured, depending on the server role in the cluster.

OCI is fully packaged in Debian, including all of the Puppet modules and so on. So just doing “apt-get install openstack-cluster-installer” is enough to bring absolutely all dependencies, and no other artifact are needed. This is very important so one only needs a local Debian mirror to install an OpenStack cluster. No external components must be downloaded from internet.

OCI setting-up a Swift cluster

At the begining of OCI’s life, we first used it at Infomaniak (my employer) to setup a Swift cluster. Swift is the object server of OpenStack. It is perfect solution for a (very) large backup system.

Think of a massive highly available cluster, with a capacity reaching peta bytes, storing millions of objects/files 3 times (for redundancy). Swift can virtually scale to infinity as long as you size your ring correctly.

The Infomaniak setup is also redundant at the data center level, as our cluster spans over 2 data centers, with at least one copy everything stored on each data center (the location of the 3rd copy depends on many things, and explaining it is not in the scope of this post).

If one wishes to use swift, it’s ok to start with 7 machines to begin with: 3 machines for the controller (holding the Keystone authentication, and a bit more), at least 1 swift-proxy machine, and 3 storage nodes. Though for redundancy purpose, it is IMO not good enough to start with only 3 storage node: if one fails, the proxy server will fall into timeouts waiting for the 3rd storage node. So 6 storage nodes feels like a better minimum. Though it doesn’t have to be top-noch servers, a cluster made of refurbished old hardware with only a few disks can do it, if you don’t need to store too much data.

Setting-up an OpenStack compute cluster

Though swift was the first thing OCI did for us, it now can do a way more than just Swift. Indeed, it can also setup a full OpenStack cluster with Nova (compute), Neutron (networking) and Cinder (network block devices). We also started using all of that, setup by OCI, at Infomaniak. Here’s the list services currently supported:

  • Keystone (identity)
  • Heat (orchestration)
  • Aodh (alarming)
  • Barbican (key/secret manager)
  • Nova (compute)
  • Glance (VM images)
  • Swift (object store)
  • Panko (event)
  • Ceilometer (resource monitoring)
  • Neutron (networking)
  • Cinder (network block device)

On the backend, OCI can use LVM or Ceph for Cinder, local storage or Ceph for Nova instances.

Full HA redundancy

The nice thing is, absolutely every component setup by OCI is done in a high availability way. Each machine of the control plane of OpenStack is setup with an instance of the components: all OpenStack controller components, a MariaDB server part of the Galera cluster, etc.

HAProxy is also setup on all controllers, in front of all of the REST API servers of OpenStack. And finally, the web address where final clients will connect is in fact a virtual IP, that can move from one server to another, thanks to corosync. Routing to that VIP can be done either over L2 (ie: a static address on a local network), or over BGP (useful if you need multi-datacenter redundancy). So if one of the controllers is down, it’s not such a big deal, HAproxy will detect this within seconds, and if it was the server that had the virtual IP (matching the API endpoint), then this IP will move to one of the other servers.

Full SSL transport

One of the things that OCI does when installing Debian, is setup a PKI (ie: SSL certificates signed by a local root CA) so that everything in the cluster is transported over SSL. Haproxy, of course does the SSL, but it also connects to the different API servers over SSL too. All connections to the RabbitMQ servers are also performed SSL. If one wishes, it’s possible to replace the self-signed SSL certificates before the cluster is deployed, so that the OpenStack API endpoint can be exposed on a public address.

OCI as a quite modular system

If one decides to use Ceph for storage, then for every compute node of the cluster, it is possible to choose to use either Ceph for the storage of /var/lib/nova/instance, or use local storage. On the later case, then of course, using RAID is strongly advised, to avoid any possible loss of data. It is possible to mix both types of compute node storage in a single cluster, and create server aggregates so it is later possible to decide which type of compute server to run the workload on.

If a cluster Ceph is part of the cluster, then on every compute node, the cinder-volume and cider-backup services will be provisioned. They will be in use to control the Cinder volumes of the Ceph cluster. Even though the network block storage itself will not run on the compute machines, it makes sense to do that. The idea is that the amount of these process needs to scale at the same time as the amount of compute nodes, so it makes sense to do that. Also, on compute servers, the Ceph secret is already setup using libvirt, so it was also convenient to re-use this.

As for Glance, if you have Ceph, it will use it as backend. If not, it will use Swift. And if you don’t have a Swift cluster, it will fall-back to the normal file backend, with a simple rsync from the first controller to the others. On such a setup, then only the first controller is used for glance-api. The other controllers also run glance-api, but haproxy doesn’t use them, as we really want the images to be stored on the first controller, so they can be rsync to the others. In practice, it’s not such a big deal, because the images are anyway in the cache of the compute servers when in use.

If one setup cinder volume nodes, then cinder-volume and cinder-backup will be installed there, and the system will automatically know that there’s cinder with LVM backend. Both Cinder over LVM and over Ceph can be setup on the same cluster (I never really tried this, though I don’t see why it wouldn’t work, normally, simply both backend will be available).

OCI in Buster vs current development

Lots of new features are being added to OCI. These, unfortunately, wont make it to Buster. Though the Buster release has just enough to be able to provision a working OpenStack cluster.

Future features

What I envision for OCI, is to make it able to provision a cluster ready for serving as a public cloud. This means having all of the resource accounting setup, as well as cloudkitty (which is OpenStack resource rating engine). I’ve already played a bit with this, and it should be out fast. Then the only missing bit to go public will be billing of the rated resources, which obviously, has to be done in-house, and doesn’t need to live within the OpenStack cluster itself.

The other things I am planning to do, is add more and more services. Currently, even though OCI can setup a fully working OpenStack, it is still a basic one. I do want to add advanced features like Octavia (load balancer as a service), Magnum (kubernets cluster as a service), Designate (DNS), Manila (shared filesystems) and much more if possible. The number of available projects is really big, so it probably will keep me busy for a very long time.

At this point, what OCI misses as well, is a custom ISO debian installer image that would include absolutely all. It shouldn’t be hard to write, though I lack the basic knowledge on how to do this. Maybe I will work on this at this summer’s DebConf. At the end, it could be a debian pure blend (ie: a fully integrated distro-in-the-distro system, just like debian-edu or debian-meds). It’d be nice if this ISO image could include all of the packages for the cluster, so that no external resources would be needed. The setting-up an OpenStack cluster with no internet connectivity at all would become possible. Because in fact, only the API endpoint on the port 443, and the virtual machines need internet access, your management network shouldn’t be connected (it’s much safer this way).

No, there wasn’t 80 engineers that burned-out in the process of implementing OCI

One thing that makes me proud, is that I wrote all of my OpenStack installer nearly alone (truth: leveraging all the work of puppet-openstack, it woudn’t have been possible without it…). That’s unique in the (small) OpenStack world. Companies like my previous employer, or a famous companies working on RPM based distros, this kind of product is the work of dozens of engineers. I heard that Red Hat has nearly 100 employees working on TripleO. This was possible because I tried to keep OCI in the spirit of “keep it simple stupid”. It is doing only what’s needed, and implemented the mot simple way possible, so that it is easy to maintain.

For example, the hardware discovery agent is made of 63 lines of ISO shell script (that is: not even bash… but dash), while I’ve seen others using really over engineered stuff, like heavy ruby or Python modules. Ironic-inspector, for example, in the Rocky release, is made of 98 files, for a total of 17974 lines. I really wonder what they are doing with all of this (I didn’t dare to look). There is one thing I’m sure: what I did is really enough for OCI’s needs, and I don’t want to run a 250+ MB initrd as the discovery system: OCI’s live build based discovery image loaded over the web rather than PXE is a way smarter.

On the same spirit, the part that does the bare-metal provisioning, is the same shell script that I wrote to create the official Debian OpenStack images. It was about 700 lines of shell script to install Debian on a .qcow2 image, it’s not about 1500 lines, and made of a single file. That’s the smallest footprint you’ll ever find. However, it does all what’s needed, still, and probably even more.

In comparison, in Fuel, there was a super-complicated scheduler, written in Ruby, used to be able to provision a full cluster by only a single click of a button. There’s no such thing in OCI, because I do believe that’s a useless gadget. With OCI, a user simply needs to remember the order for setting-up a cluster: Cephmon nodes needs to be setup first, then CephOSD nodes, then controllers, then finally, in no particular order, the computes, swiftproxy, swiftstore and volume nodes last. That’s really not a big deal to let this done by the final user, as it is not expected that one will setup multiple OpenStack every day. And even so, if you use the “ocicli” tool, it shouldn’t be hard to do these final bits of the automation. But I would consider this a useless gadget.

While every company jumped into the micro-service in container thing, even at this time, I continue to believe this is useless, and mostly driven by the needs marketing people that needs to sell features. Running OpenStack directly on bare metal is already hard, and the amount of complexity added by running OpenStack services in Docker is useless: it doesn’t bring any feature. I’ve been told that it makes upgrades easier, I very much doubt it: upgrades are complex for other reasons than just upgrading the running services themselves. Rather, they are complex because one needs to upgrade the cluster components with a given order, and scheduling this isn’t easy.

So this is how I managed to write an OpenStack installer alone, in less than a year, without compromising on features: because I wrote things simply, and avoided the over-engineering I saw at all levels on other products.

OpenStack Stein is comming

I’ve just pushed to Debian Experimental, and to https://buster-stein.debian.net/debian the last release of OpenStack (code name: Stein), which was released upstream on the 10th or April (yesterday, as I write these lines). I’ve been able to install Stein on top of Debian Buster, and I could start VMs on it: it’s all working as expected after a bit of changes in the puppet manifests of OCI. What’s needed now, is testing upgrades from Stretch + Rocky to Buster + Stein. Normally, puppet-openstack can do that. Let’s see…

Want to know more?

Read on… the README.md is on https://salsa.debian.org/openstack-team/debian/openstack-cluster-installer

Last words, last thanks

This concludes a bit more than a year of development. All of this wouldn’t have been possible without my employer, Infomaniak, giving me a total freedom on the way I implement things for going into production. So a big thanks to them, and also for being a platinium sponsor for this year’s Debconf in Brazil.

Also a big thanks to the whole of the OpenStack project, including (but not limited to) the Infra team and the puppet-openstack team.

Official Debian testing OpenStack image news

A few things happened to the testing image, thanks to Steve McIntire, myself, and … some debconf18 foo!

  • The buster/testing image wasn’t generated since last April, this is now fixed. Thanks to Steve for it.
  • The datasource_list is now correct, in both the Stretch and Testing image (previously, cloustack was set too early in the list, which made the image wait 120 seconds for a data source which wasn’t available if booting on OpenStack).
  • The buster/testing image is now using the new package linux-image-cloud-amd64. This made the qcow file shrink from 614 MB to 493 MB. Unfortunately, we don’t have a matching arm64 cloud kernel image yet, but it’s still nice to have this for the amd64 arch.

Please use the new images, and report any issue or suggestion against the openstack-debian-images package.

Using a dummy network interface

For a long time, I’ve been very much annoyed by network setups on virtual machines. Either you choose a bridge interface (which is very easy with something like Virtualbox), or you choose NAT. The issue with NAT is that you can’t easily get into your VM (for example, virtualbox doesn’t exposes the gateway to your VM). With bridging, you’re getting in trouble because your VM will attempt to get DHCP from the outside network, which means that first, you’ll get a different IP depending on where your laptop runs, and second, the external server may refuse your VM because it’s not authenticated (for example because of a MAC address filter, or 802.11x auth).

But there’s a solution to it. I’m now very happy with my network setup, which is using a dummy network interface. Let me share how it works.

In the modern Linux kernel, there’s “fake” network interface through a module called “dummy”. To add such an interface, simply load the kernel module (ie: “modprobe dummy”) and start playing. Then you can bridge that interface, and tap it, then plug your VM to it. Since the dummy interface is really living in your computer, you do have access to this internal network with a route to it.

I’m using this setup for connecting both KVM and Virtualbox VMs, you can even mix both. For Virtualbox, simply use the dropdown list for the bridge. For KVM, use something like this in the command line: -device e1000,netdev=net0,mac=08:00:27:06:CF:CF -netdev tap,id=net0,ifname=mytap0,script=no,downscript=no

Here’s a simple script to set that up, with on top, masquerading for both ip4 and ipv6:

# Load the dummy interface module
modprobe dummy

# Create a dummy interface called mynic0
ip link set name mynic0 dev dummy0

# Set its MAC address
ifconfig mynic0 hw ether 00:22:22:dd:ee:ff

# Add a tap device
ip tuntap add dev mytap0 mode tap user root

# Create a bridge, and bridge to it mynic0 and mytap0
brctl addbr mybr0
brctl addif mybr0 mynic0
brctl addif mybr0 mytap0

# Set an IP addresses to the bridge
ifconfig mybr0 192.168.100.1 netmask 255.255.255.0 up
ip addr add fd5d:12c9:2201:1::1/24 dev mybr0

# Make sure all interfaces are up
ip link set mybr0 up
ip link set mynic0 up
ip link set mytap0 up

# Set basic masquerading for both ipv4 and 6
iptables -I FORWARD -j ACCEPT
iptables -t nat -I POSTROUTING -s 192.168.100.0/24 -j MASQUERADE
ip6tables -I FORWARD -j ACCEPT
ip6tables -t nat -I POSTROUTING -s fd5d:12c9:2201:1::/64 -j MASQUERADE

Privacy breaches when unlocking a Xiaomi’s Mi 5s plus

My little girl decided the old OnePlus One of my wife had to take a swim in the toilets. So we had to buy a new phone. Since I know how bad standard ROMs are, I looked-up in the LineageOS list of compatible OS, and found out that the Xiaomi’s Mi 5s plus was not too bad, and we bought one. The phone itself looks quite nice: a 64 bits fast processor, a huge amount of RAM, nice screen, etc. Then I tried the procedure for unlocking… because I care about privacy, and I knew the Chinese Xiaomi ROM is full of spyware (the phone was purchased in China). Though what I didn’t know is that the unlock procedure (needed before changing the ROM) is itself is full of privacy breaches. Let me give you the details.

First, you got to register on Xiaomi’s website, and request for the permission to unlock the device. That’s already bad enough: why should I ask for the permission to use the device I own as I am pleased to? Anyway, I did that. The procedure includes receiving an SMS. Again, more bad: why should I give-up such a privacy thing as my phone number? Anyway, I did it, and received the code to activate my website account. Then I started the unlock program in a virtualbox Windows XP VM (yeah right… I wasn’t expecting something better anyway…), and then, the program tells me that I need to add my Xiaomi’s account in the phone. Of course, it then sends a web request to Xiaomi’s server (it refused to work unless I connected the phone to WiFi). I’m already not happy with all of this, but that’s not it. After all of these privacy breaches, the unlock APP tells me that I need to wait 72 hours to get my phone to account association to be activated. Since I wont be available in the middle of the week, for me, that means waiting until next week-end to do that. Silly…

Let’s recap. During this unlock procedure, I had to give-up:

  • My phone number (due to the SMS).
  • My phone ID (probably the EMEI was sent).
  • My email address (truth is: I could have given them a temporary email address).
  • Hours of my time understanding and run the stupid procedure, and I can’t even finish it in a single day.
  • My policy of not using Windows. I also consider that using Windows is a privacy breach, though here I have a way to roll-back the Virtualbox image, and I only use it for this kind of bad software, so privacy wise, it’s kind of fine, because I’m used of this trick. The real issue here is that, to unlock freedom on that phone, one must use a proprietary OS.

So my advice: if you want an unlocked Android device, do not choose Xiaomi, unless you’re ok to give up the above. It’s probably fine to pay a little bit more and reward the maker of a phone if the unlock experience isn’t that bad.

Testing OpenStack using tempest: all is packaged, try it yourself

tl;dr: this post explains how the new openstack-tempest-ci-live-booter package configures a machine to PXE boot a Debian Live system running on KVM in order to run functional testing of OpenStack. It may be of interest to you if you want to learn how to PXE boot a KVM virtual machine running Debian Live, even if you aren’t interested in OpenStack.

Moving my CI from one location to another leads to package it fully

After packaging a release of OpenStack, it’s kind of mandatory to functionally test the set of packages. This is done by running the tempest test suite on an already deployed OpenStack installation. I used to do that on a real hardware, provided by my employer. But since I’ve lost my job (I’m still looking for a new employer at this time), I also lost access to the hardware they were providing to me.

As a consequence, I searched for a sponsor to provide the hardware to run tempest on. I first sent a mail to the openstack-dev list, asking for such a hardware. Then Rochelle Grober and Stephen Li from Huawei got me in touch with Zachary Smith, the CEO of Packet.net. And packet.net gave me an account on their system. I am amazed how good their service is. They provide baremetal servers around the world (15 data centers), provisioned using an API (meaning, fully automatically). A big thanks to them!

Anyway, even if I planned for a few weeks to give a big thanks to the above people (they really deserves it!), this isn’t the only goal of this post. This is to introduce how to run your own tempest CI on your own machine. Because since I have been in the situation where my CI had to move twice, I decided to industrialize it, and fully automate the setup of the CI server. And what does a DD do when writing software? Package it of course. So I packaged it all, and uploaded it to the archive. Here’s how to use all of this.

General principle

The best way to run an OpenStack tempest CI is to run it on a Debian Live system. Why? Because setting-up a full OpenStack environment takes a lot of time, mostly spent on disk I/O. And on a live system, everything runs on a RAM disk, so installing under this environment is the fastest way one could do. This is what I did when working with Mirantis: I had a real baremetal server, which I was PXE booting on a Debian Live system. However nice, this imposes having access to 2 servers: one for running the Live system, and one running the dhcp/pxe/tftp server. Also, this means the boot server needs 2 nics, one on the internet, and one for booting the 2nd server that will run the Live system. It was not possible to have such specific setup at packet, so I decided to replicate this using KVM, so it would become portable. And since the servers at packet.net are very fast, it isn’t much of an issue anymore to not run on baremetal.

Anyway, let’s dive into setting-up all of this.

Network topology

We’ll assume that one of your interface has internet access, let’s say eth0. Since we don’t want to destroy any of your network config, the openstack-tempest-ci-live-booter package will use a dummy network interface (ie: modprobe dummy) and bridge it to the network interface of the KVM virtual machine. That dummy network interface will be configured with 192.168.100.1, and the Debian Live KVM will use 192.168.100.2. This convenient default can be changed, but then you’ll have to pass your specific network configuration to each and every script (just read the beginning of each script to read the parameters).

Configure the host machine

First install the openstack-tempest-ci-live-booter package. This runtime depends on the isc-dhcp-server, tftpd-hpa, apache2, qemu-kvm and all what’s needed to run a Debian Live machine, booting it over PXE / iPXE (the package support both, more on iPXE later). So, let’s do it:

apt-get install openstack-tempest-ci-live-booter

The package, once installed, doesn’t do much. To respect the Debian policy, it can’t touch configuration files of other packages in maintainer scripts. Therefore, you have to manually run:

openstack-tempest-ci-live-booter-config --configure-dummy-nick

Running this script will:

  • configure the kvm-intel module to allow nested visualization (by unloading the module, adding “options kvm-intel nested=y” to /etc/modprobe.d, and reloading the module)
  • modprobe the dummy kernel module, run “ip link set name tempestnic0 dev dummy0” to create a tempestnic0 dummy interface
  • create a tempestbr bridge, set 192.168.100.1 for the bridge IP, bridge the tempestnic0 and tempesttap
  • configure tftpd-hpa to listen on 192.168.100.1
  • configure isc-dhcp-server to dhcpreply 192.168.100.2 on the tempestbr, so that the KVM machine can boot up with an IP
  • configure apache2 to serve the filesystem.squashfs root filesystem, loaded by the Linux kernel at boot time. Note that you may need to manually start and/or reload apache after this setup though.

Again, you can change the IP addresses if you like. You can also use a real interface if you intend to boot a real hardware rather than a KVM machine (in which case, just omit the –configure-dummy-nick, and manually configure your 2nd interface).

Also, openstack-tempest-ci-live-booter provides a /etc/init.d/openstack-tempest-ci-live-booter script which will configure NAT on your server, so that the Debian Live machine has internet access (needed for apt-get operations). Edit the file if you need to change 192.168.100.1/24 by something else. The script will pick-up the interface that is connected to the default gateway by itself.

The dhcp server is configured to support both legacy PXE and the new iPXE standard. I had to support iPXE, because that’s what the standard KVM ROM does, and also I wanted to keep legacy support for older baremetal hardware. The way iPXE works is that dhcpd tells the client where to fetch the iPXE script, which itself chains to lpxelinux.0 (instead of the standard pxelinux.0). It’s rather easy to setup once you understood how it works.

Build the live image

Now that the PXE server is configured, it’s now time to build the Debian live image. Simply do this to build the image, and copy its resulting files in the PXE server folder (ie: /var/lib/tempest-live-booter):

mkdir live
cd live
openstack-tempest-ci-build-live-image --debian-mirror-addr http://ftp.nl.debian.org/debian

Since we need to login in that server later on, the script will create an ssh key-pair. If you want your own keys, simply drop the id_rsa and id_rsa.pub files in your current folder before running the script. Then make it so that this key-pair can be later on used by default by the user who will run the tempest script (ie: copy id_rsa and id_rsa.pub in the ~/.ssh folder).

Running the openstack-tempest-ci

What the openstack-tempest-ci script does is (re-)starting your KVM virtual machine, ssh into it, upgrade it to sid, install OpenStack, and eventually run all the tempest suite. There’s 2 ways to run it: either install the openstack-tempest-ci package, eventually configure it (in /etc/default/openstack-tempest-ci), and simply run the “openstack-tempest-ci” command. Or, you can skip the installation of the package, and simply run it from source:

git clone http://anonscm.debian.org/git/openstack/debian/openstack-meta-packages.git
cd openstack-meta-packages/src
./openstack-tempest-ci

Indeed, the script is designed to copy all scripts from source inside the Debian Live machine before using these scripts. The reason it’s doing that is because we want to avoid the situation where a modification needs to be uploaded to Debian before being able to test it, and also it was needed to be able to run the openstack-tempest-ci script without installing a package (which would need root access that I don’t have on casulana.debian.org, where running tempest is needed to test official OpenStack Debian images). So, definitively, feel free to hack everything in openstack-meta-packages/src before running the tempest script. Also, openstack-tempest-ci will look for a sources.list file in the current directory, and upload it to the Debian Live system before doing the upgrade/install. This way, it is easy to use the closest mirror.

There’s cloud, and it can even be YOURS on YOUR computer

Each time I see the FSFE picture, just like on Daniel’s last post to planet.d.o, where it says:

“There is NO CLOUD, just other people’s computers”

it makes me so frustrated. There’s such a thing as private cloud, setup on your own set of servers. I’ve been working on delivering OpenStack to Debian for the last 6 years and a half, motivated exactly to fix this issue: I refuse that the only cloud people could use would be a closed source solution like GCE, AWS or Azure. The FSFE (and the FSF) completely dismissing this work is more than annoying: it is counter productive. Not only the FSFE shouldn’t pull anyone away from the cloud, but it should push for the public to choose cloud providers using free software like OpenStack.

The openstack.org market place lists 23 public cloud providers using OpenStack, so there is now no excuse to use any other type of cloud: for sure, there’s one where you need it. If you use a free software solution like OpenStack, then the question if you’re running on your own hardware, on some rented hardware (on which you deployed OpenStack yourself), or on someone else’s OpenStack deployment is just a practical one, on which you can always back-up quickly. That’s one of the very reason why one should deploy on the cloud: so that it’s possible to redeploy quickly on another cloud provider, or even on your own private cloud. This gives you more freedom than you ever had, because it makes you not dependent anymore on the hosting company you’ve selected: switching provider is just the mater of launching a script. The reality is that neither the FSFE or RMS understand all of this. Please don’t dive into the FSFE very wrong message.

Released OpenStack Newton, Moving OpenStack packages to upstream Gerrit CI/CD

OpenStack Newton is released, and uploaded to Sid

OpenStack Newton was released on the Thursday 6th of October. I was able to upload nearly all of it before the week-end, though there was a bit of hick-ups still, as I forgot to upload python-fixtures 3.0.0 to unstable, and only realized it thanks to some bug reports. As this is a build time dependency, it didn’t disrupt Sid users too much, but 38 packages wouldn’t build without it. Thanks to Santiago Vila for pointing at the issue here.

As of writing, a lot of the Newton packages didn’t migrate to Testing yet. It’s been migrating in a very messy way. I’d love to improve this process, but I’m not sure how, if not filling RC bugs against 250 packages (which would be painful to do), so they would migrate at once. Suggestions welcome.

Bye bye Jenkins

For a few years, I was using Jenkins, together with a post-receive hook to build Debian Stable backports of OpenStack packages. Though nearly a year and a half ago, we had that project to build the packages within the OpenStack infrastructure, and use the CI/CD like OpenStack upstream was doing. This is done, and Jenkins is gone, as of OpenStack Newton.

Current status

As of August, almost all of the packages Git repositories were uploaded to OpenStack Gerrit, and the build now happens in OpenStack infrastructure. We’ve been able to build all packages a release OpenStack Newton Debian packages using this system. This non-official jessie backports repository has also been validated using Tempest.

Goodies from Gerrit and upstream CI/CD

It is very nice to have it built this way, so we will be able to maintain a full CI/CD in upstream infrastructure using Newton for the life of Stretch, which means we will have the tools to test security patches virtually forever. Another thing is that now, anyone can propose packaging patches without the need for an Alioth account, by sending a patch for review through Gerrit. It is our hope that this will increase the likeliness of external contribution, for example from 3rd party plugins vendors (ie: networking driver vendors, for example), or upstream contributors themselves. They are already used to Gerrit, and they all expected the packaging to work this way. They are all very much welcome.

The upstream infra: nodepool, zuul and friends

The OpenStack infrastructure has been described already in planet.debian.org, by Ian Wienand. So I wont describe it again, he did a better job than I ever would.

How it works

All source packages are stored in Gerrit with the “deb-” prefix. This is in order to avoid conflict with upstream code, and to easily locate packaging repositories. For example, you’ll find Nova packaging under https://git.openstack.org/cgit/openstack/deb-nova. Two Debian repositories are stored in the infrastructure AFS (Andrew File System, which means a copy of that repository exist on each cloud were we have compute resources): one for the actual deb-* builds, under “jessie-newton”, and one for the automatic backports, maintained in the deb-auto-backports gerrit repository.

We’re using a “git tag” based workflow. Every Gerrit repository contains all of the upstream branch, plus a “debian/newton” branch, which contains the same content as a tag of upstream, plus the debian folder. The orig tarball is generated using “git archive”, then used by sbuild to produce binaries. To package a new upstream release, one simply needs to “git merge -X theirs FOO” (where FOO is the tag you want to merge), then edit debian/changelog so that the Debian package version matches the tag, then do “git commit -a –amend”, and simply “git review”. At this point, the OpenStack CI will build the package. If it builds correctly, then a core reviewer can approve the “merge commit”, the patch is merged, then the package is built and the binary package published on the OpenStack Debian package repository.

Maintaining backports automatically

The automatic backports is maintained through a Gerrit repository called “deb-auto-backports” containing a “packages-list” file that simply lists source packages we need to backport. On each new CR (change request) in Gerrit, thanks to some madison-lite and dpkg –compare-version magic, the packages-list is used to compare what’s in the Debian archive and what we have in the jessie-newton-backports repository. If the version is lower in our repository, or if the package doesn’t exist, then a build is triggered. There is the possibility to backport from any Debian release (using the -d flag in the “packages-list” file), and even we can use jessie-backports to just rebuild the package. I also had to write a hack to just download from jessie-backports without rebuilding, because rebuilding the webkit2gtk package (needed by sphinx) was taking too resources (though we’ll try to never use it, and rebuild packages when possible).

The nice thing with this system, is that we don’t need to care much about maintaining packages up-to-date: the script does that for us.

Upstream Debian repository are NOT for production

The produced package repositories are there because we have interconnected build dependencies, needed to run unit test at build time. It is the only reason why such Debian repository exist. They are not for production use. If you wish to deploy OpenStack, we very much recommend using packages from distributions (like Debian or Ubuntu). Indeed, the infrastructure Debian repositories are updated multiple times daily. As a result, it is very likely that you will experience failures to download (hash or file size mismatch and such). Also, the functional tests aren’t yet wired in the CI/CD in OpenStack infra, and therefore, we cannot guarantee yet that the packages are usable.

Improving the build infrastructure

There’s a bunch of things which we could do to improve the build process. Let me give a list of things we want to do.

  • Get sbuild pre-setup in the Jessie VM images, so we can win 3 minutes per build. This means writing a diskimage-builder element for sbuild.
  • Have the infrastructure use a state-of-the-art Debian ftp-sync mirror, instead of the current reprepro mirroring which produces an unsigned reprository, which we can’t use for sbuild-createchroot. This will improve things a lot, as currently, there’s lots of build failures because of httpredir.debian.org mirror inconsistencies (and these are very frustrating loss of time).
  • For each packaging change, there’s 3 build: the check job, the gate job, and the POST job. This is a loss of time and resources, as we need to build a package once only. It will be hopefully possible to fix this when the OpenStack infra team will deploy Zuul 3.

Generalizing to Debian

During Debconf 16, I had very interesting talks with the DSA (Debian System Administrator) about deploying such a CI/CD for the whole of the Debian archive, interfacing Gerrit with something like dgit and a build CI. I was told that I should provide a proof of concept first, which I very much agreed with. Such a PoC is there now, within OpenStack infra. I very much welcome any Debian contributor to try it, through a packaging patch. If you wish to do so, you should read how to contribute to OpenStack here: https://wiki.openstack.org/wiki/How_To_Contribute#If_you.27re_a_developer and then simply send your patch with “git review”.

This system, however, currently only fits the “git tag” based packaging workflow. We’d have to do a little bit more work to make it possible to use pristine-tar (basically, allow to push in the upstream and pristine-tar branches without any CI job connected to the push).

Dear DSA team, as we now nice PoC that is working well, on which the OpenStack PKG team is maintaining 100s of packages, shall we try to generalize and provide such infrastructure for every packaging team and DDs?

Announcing validated Debian packages for Mitaka

Greetings! This is a (4 days delay) copy of the announce I made on the openstack-dev@lists.openstack.org on the 8th of April 2016.

I am overjoyed, thrilled and delighted to announce the release of the Debian packages for Mitaka.

All of the DefCore packages were validated successfully this morning through our package-only-based Tempest CI.

Content of this release
This release includes the following 23 services:
aodh 2.0.0
barbican 2.0.0
ceilometer 6.0.0
cinder 8.0.0
congress 3.0.0+dfsg1
designate 2.0.0
glance 12.0.0
gnocchi 2.0.2
heat 6.0.0
horizon 9.0.0
ironic 5.1.0
keystone 9.0.0
magnum 2.0.0
manila 2.0.0
mistral 2.0.0
murano 2.0.0
neutron 8.0.0
nova 13.0.0
trove 5.0.0
sahara 4.0.0
senlin 1.0.0
swift 2.7.0
zaqar 2.0.0

Where to find these packages
1/ Sid
All of Mitaka was uploaded to Debian Sid this week. You can use Debian Sid directly to use them.

2/ Official jessie-backports
As soon as everything migrates to Debian Testing (currently aka: Stretch), in 5 days if no RC bug is reported, it will be possible to upload all of Mitaka to the Debian official jessie-backports.

3/ Non-official Jessie and Trusty backports
In the meantime, the packages are available through Mirantis Jenkins automatic Debian Jessie backport repository. The full sources.list is available here:

http://mitaka-jessie.pkgs.mirantis.com/

You can use the Trusty backports as well:

http://mitaka-trusty.pkgs.mirantis.com/

To use these repositories, simply add the described sources.list to (for example) /etc/apt/sources.list.d/openstack.list, and run apt-get update. If you want to install the GPG key of the repositories, you can either install the mitaka-jessie-archive-keyring or mitaka-trusty-archive-keyring package (depending on your distribution of choice). Alternatively “apt-key add” the public key available at /debian/dists/pukey.gpg in these repositories.

As a reminder, the URLs above contain the word “Mirantis” only because the service is sponsored by my employer. These repositories are “straight” backports from what is available in Debian Sid, without any modification.

Remember that the packages listed below are maintained separately in Debian and Ubuntu, and therefore, packages are different in these distributions:
aodh, barbican, ceilometer, cinder, designate, glance, heat, horizon, ironic, keystone, manila, neutron, nova, trove, swift.

All other packages (including all OpenStack libraries like Oslo and python-*clients) are maintained in Debian, with the contribution of Canonical, and then synced to Ubuntu, so they are the exact same packages (or at least, with a minimal difference). I hope we can further improve collaboration between Debian and Canonical during the Newton cycle.

Bug reporting
As always, bug reports are welcome, and considered as high value contributions. Please follow the instructions available at https://www.debian.org/Bugs/Reporting to report bugs to the Debian BTS.

Moving forward with higher QA and the Packaging-deb project in Newton
Currently, DefCore packages are tested through a package-only (ie: no puppet, chef, you-name-it… system management involved) Tempest CI. Results can be seen at:
https://mitaka-jessie.pkgs.mirantis.com/job/openstack-tempest-ci/

Though not all packages are included in this CI. It is my intention, during the Newton cycle, to also include services like Designate, Trove, Barbican, Congress, … in this CI. Individual upstream team for these services are more than welcome to approach us to get this happen quicker.

Also, as we’re slowing starting to get the Packaging-Deb project (ie: packaging using upstream OpenStack gerrit and gating), it is also in the pipe to use the above mentioned tempest CI system as a gate system for the packaging. Hopefully, this will lead us to a full CI/CD working from trunk. We also hope to be able to use these packages to help the Puppet team to test packaged OpenStack from trunk.

Greetings
On each release, I ask myself who I should thank. This time, I would like to thank everyone, because this release was overall very nice and working well. The whole OpenStack community is always very helpful and understand the requirements of downstream distributions. Guys, you’re awesome, I love my work, and I love working with you all!

Cheers,

Moby

Just a quick reply to Rhonda about Moby. You can’t introduce him without telling about Go, which is the title who made him famous, very early in the age of electronic music (November 1990, according to wikipedia). Many attempted to remix this song (and Moby himself), but nothing’s as good as the original version.

Django upgrades area always a pain

It’s been a few years that I maintain some python-django-* packages, as part of the maintenance of the OpenStack dashboard, Horizon. Currently, this consist of: python-django-appconf, python-django-babel, python-django-bootstrap-form, python-django-compressor, python-django-discover-runner, python-django-formtools, python-django-openstack-auth, python-django-overextends, python-django-pyscss.

By far, Django has been one of the biggest pain point. It moves too fast, deprecating its own API from one minor version to the next, at the rate of one minor release every 6 months.

As Django 1.9 was uploaded to Sid, a bunch of problems appeared. The Django 1.9 release notes explains it all: a large chunk of its API gets removed (look for “Freatures removed” in that page). I had to fix a few issues: the last one I fixed was #807346 (in django-openstack-auth), which needed 2 patches. Amusingly, the patch I wrote looks the same as what is currently under review, by one of the upstream authors. Though still have #807355 to fix, and that one is more complex. To fix it, I have to package the latest commit of django-compressor, and:

  • Upgrade python-coffin to the latest version
  • Package and upload django-overextends
  • Get django-sekizai upgraded (I already interacted with the BTS to inform its package maintainer)
  • Add hand-crafted patches (ie: not from upstream) to stop using django.utils.unittest and django.utils.importlib

Even after doing all of this, django-compressor still doesn’t build (unit test failures) with lots of errors ending with this:

File “some-path/build-area/python-django-compressor-1.6+2015.12.15.git.66feba0db5/compressor/management/commands/compress.py”, line 162, in compress
followlinks=options.get(‘followlinks’, False)):
File “/usr/lib/python2.7/os.py”, line 278, in walk
names = listdir(top)
TypeError: coercing to Unicode: need string or buffer, Origin found

I tried fixing this last one, but failed so far. (if anyone can help, please do…) This was just the upgrade from 1.8 to 1.9, and it doesn’t include some of the issues fixed earlier (when Django 1.9 was only in Experimental and easy fixes were written). All this to say: Django upgrades are always painful.

As I always say: the Linux kernel is so much more complex than this kind of Python modules, and yet, they don’t allow themselves break the userland API. Why most Python developers believe that it’s OK to do so? It isn’t possible to separate private and public API clearly in python (like it is with the kernel). So it isn’t uncommon that library users start using non-public functions, classes or methods. For that, it is understandable that there are breakages (when someone uses something that isn’t made to be used by the library users). But that’s the only case where it is, and there’s no excuse to break known used public API. Django upstream authors, if you read me, please stop breaking the world every 6 months! And no, your deprecation messages are not an excuse. If you did a design mistake in the past, that’s no excuse. Too bad… you’ll have to live with it until the end of times and find a work-around.