Odoo on Kubernetes 2024 Edition

All the way back in 2021 we posted How we Host Odoo, which explained how and why we ended up on Kubernetes. Possibly not the most gripping title, and it didn't get much interest.

It turns out there's quite a lot of interest in running Odoo on Kubernetes. Adrien Peiffer from Acsone gave a talk on it at OCA Days 2023, which is one of the most viewed recorded talks.

Considering we've been doing it longer than most, we thought we'd weigh in with what we've changed since 2021.

(Spoiler alert: if you've watched the linked video, somewhat reassuringly our setup is very similar to Acsone's.)

This isn't going to be a "How To" guide. If you're looking for support running Odoo on Kubernetes, reach out.

Why Kubernetes? Why not just use a VM per customer? OMG why is this so overcomplicated?

We'll cover this in more detail in another post when we compare our hosting to Odoo.sh, CloudPepper, Odoo Online, and others. We don't directly compete with these other products.

But the long and short is that it works for us because we manage more than one installation of Odoo and because of how we've positioned our hosting offering.

If you're hosting just a single install, I'd seriously question reaching for Kubernetes, unless you already use it for other things.

It fixed a bunch of problems we had (which we covered in our 2021 edition of this post), whilst generating some others.

What tools do we use in 2024?

What

Purpose

Changed since 2021?

FluxCD2 - GitOps Tool Kit

Manages both core infrastructure deployment, and rolling out Odoo image upgrades using our Helm Chart (below).

No. 

In fact, we've started using features that we were not using at the time.

It's been rock solid. We do miss a Web UI for more junior members of staff, which Argo has. But it's not tempted us to move yet.

CloudNative Postgres (CNPG)

Manages the full lifecycle of a highly available PostgreSQL database cluster with a primary/standby architecture, using native streaming replication. 

Yes!

We used to run PostgreSQL external to Kubernetes.

We are extremely happy with CNPG.

SOPS

Secret management (integrated with FluxCD2)

No.

We will probably add External Secrets to the mix, but SOPS will continue to exist to at least bootstrap the basics.

Velero

Backups (for data only; Kubernetes manifests are all managed through FluxCD2)

Yes!

We used to use Duplicity. 

We migrated to Velero because we wanted to use Restic, and have metrics to alert on. Velero was the most mature option at that time.

We have recently moved to using the Kopia backend of Velero, which dramatically reduced the time and resources needed to backup our larger filestores.

Ingress Nginx

Ingress controller

Yes!

We used to use Traefik. There's a full breakdown of why we changed here.

Cert Manager

SSL certificate management

No.

External DNS

Automatic configuration of external DNS records

No.

kube-prometheus-stack

Monitoring

No.

Personally, I'm not overly fond of PromQL. I don't write it often enough, and if I need to write something complex I have to reach for the documentation and curse.

But the ecosystem has settled on Prometheus and it works.

Glo's internal Odoo Helm Chart

(Whilst the chart is open source we do not recommend using it outside of Glo - it's got a lot of legacy/backwards compat support baked in and we don't always do a great job at announcing breaking changes. But you may want to take inspiration)

Deployment/management of Odoo via Helm

Yes and No.

We were very early in our Odoo Helm chart when we wrote our original post.

The Helm chart has evolved a bit.

We have deeply mixed feelings about Helm and the chart in general, but right now it continues to work for us.

GitHub Actions

Build and push Odoo Docker images to Docker Hub.

CI and other testing.

Yes and No.

We used to use GitHub Actions to directly deploy Odoo into Kubernetes. This was a hangover from when we used Docker Swarm.

Now GitHub Actions only runs checks on pull requests (unit tests, pre-commit, etc.) and pushes new builds to Docker Hub. FluxCD2 manages the deployment process, and its Image Updater component monitors for updated images.

Azure

Underlying platform

Yes!

We were on-premises using our own hardware. We migrated to Azure as our hardware was aging out. It's also helped us grow.

Azure Files replaced NFS as the underlying storage for the Odoo filestore (RWX).

It has come at the expense of a significantly higher cost to us and a bit of a learning curve around Azure pricing.

Was it worth it? Right now, yes. Would we do it again? Maybe.
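
To make the image update step from the GitHub Actions entry above more concrete, here is a minimal sketch of how a "pick the newest build" policy can work: filter the tags by a pattern, extract a sortable value, and take the highest. The tag scheme (`17.0-<timestamp>`) and the function name are hypothetical and purely illustrative; this is not FluxCD2's code or our actual tagging convention.

```python
import re
from typing import Iterable, Optional

# Hypothetical tag scheme: "<odoo series>-<build timestamp>",
# e.g. "17.0-20240312093000". Not taken from any real repository.
TAG_PATTERN = re.compile(r"^17\.0-(?P<ts>\d{14})$")


def latest_tag(tags: Iterable[str], pattern: re.Pattern = TAG_PATTERN) -> Optional[str]:
    """Return the matching tag with the highest numeric timestamp, or None."""
    candidates = []
    for tag in tags:
        match = pattern.match(tag)
        if match:  # skip tags such as "latest" or builds for other series
            candidates.append((int(match.group("ts")), tag))
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    available = [
        "latest",
        "16.0-20240101000000",
        "17.0-20240105120000",
        "17.0-20240312093000",
    ]
    print(latest_tag(available))
```

Anchoring the pattern to one Odoo series means a push of a different series never gets rolled out to an installation by accident.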

Some Kubernetes + Odoo specific things we learnt along the way

  • Memory limits matter. CPU limits less so. 
    • Specifically because of how Odoo works we now effectively set most installations to the same request and limit values and ensure that they match Odoo's conf. 
    • It's not the most efficient way to run things, but it has improved stability as things approach peak times.
  • Don't be afraid to ring fence nodes/use different node pools. 
    • We put this off for far too long.
    • We now use different size nodes for UAT (testing) and production installations.
    • Do label your nodepools/nodes consistently and try and keep the labels simple.
  • Just use RWX for the Odoo filestore.
    • It's tempting to use something fancier, like a module in Odoo to move the attachments to object storage. These come with their own trade-offs. Our main issue was training though.
    • Azure Files mounted in the container as RWX (using https://learn.microsoft.com/en-us/azure/aks/azure-files-csi) is Good Enough(TM) for most of our installations.
  • CNPG is great - other solutions are probably just as good, but not for us.
    • CNPG takes a lot of cognitive load off running Highly Available PostgreSQL inside Kubernetes. But just like running any PostgreSQL cluster you must be involved. It's not, and probably never will be, fire and forget.
    • CNPG, like all software, isn't perfect. In the past its self-recovery process has not been as smooth as I would like.
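
The memory point above can be sketched roughly as follows: set the container's memory request equal to its limit, leave the CPU limit off, and derive Odoo's per-worker limits so that they fit inside the container. `workers`, `max_cron_threads`, `limit_memory_hard` and `limit_memory_soft` are real Odoo configuration options; the sizing formula, the 256 MiB overhead and the 80% soft ratio are assumptions for illustration, not the values we run in production.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class OdooSizing:
    workers: int                 # Odoo `workers` (HTTP workers)
    max_cron_threads: int        # Odoo `max_cron_threads`
    memory_limit_mib: int        # container memory request AND limit
    cpu_request_millicores: int = 500


def container_resources(s: OdooSizing) -> dict:
    """Kubernetes `resources` stanza: memory request == limit, no CPU limit."""
    memory = f"{s.memory_limit_mib}Mi"
    return {
        "requests": {"memory": memory, "cpu": f"{s.cpu_request_millicores}m"},
        "limits": {"memory": memory},
    }


def odoo_memory_conf(s: OdooSizing, overhead_mib: int = 256, soft_ratio: float = 0.8) -> dict:
    """Per-process Odoo limits (in bytes) that fit inside the container limit.

    Assumption: every worker and cron process could hit its hard limit at
    once, and `overhead_mib` is reserved for everything else.
    """
    processes = s.workers + s.max_cron_threads
    usable = (s.memory_limit_mib - overhead_mib) * MIB
    if processes <= 0 or usable <= 0:
        raise ValueError("sizing leaves no memory for Odoo processes")
    hard = usable // processes
    return {
        "workers": s.workers,
        "max_cron_threads": s.max_cron_threads,
        "limit_memory_hard": hard,
        "limit_memory_soft": int(hard * soft_ratio),
    }


if __name__ == "__main__":
    sizing = OdooSizing(workers=4, max_cron_threads=1, memory_limit_mib=4096)
    print(container_resources(sizing))
    print(odoo_memory_conf(sizing))
```

Deriving both the Kubernetes stanza and the Odoo options from one sizing object is one way to keep the two from drifting apart.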

Some Odoo + PostgreSQL specific things we also learnt along the way

  • Under PostgreSQL 12 and onwards jit = 'on' is not your friend. We now default to disabling it. This isn't an issue specific to running PostgreSQL in Kubernetes, or even CNPG, but we did appear to suffer more because we were more rigid with memory allocation.

    We first saw issues under PostgreSQL 12 with LLVM asking for memory beyond the PostgreSQL configured memory limits, and it was really hard to reason about when or why it would need more. We disabled it and went along our merry way.

    1-2 years later we needed to rediscover this when we started moving to CNPG.
  • At least one installation ended up with a collation not set to C. This caused massive performance issues that went unnoticed until critical mass.

    Why is this a problem? Because of how B-tree indexes work. We now ensure that this doesn't happen in either dev or production environments. See https://github.com/odoo/odoo/pull/25196 for more details.
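
Both of the checks above are easy to automate. The sketch below keeps the SQL as plain strings (run them with whatever PostgreSQL client you already use) and only implements the decision logic, so it has no database dependency. The function names are hypothetical; `SHOW jit` and the `pg_database.datcollate` column are standard PostgreSQL.

```python
from typing import Iterable, List, Tuple

# Queries to run with any PostgreSQL client; shown here as plain strings.
SHOW_JIT_SQL = "SHOW jit;"
COLLATION_SQL = (
    "SELECT datname, datcollate FROM pg_database WHERE NOT datistemplate;"
)


def jit_is_off(show_jit_value: str) -> bool:
    """True when the value returned by `SHOW jit` means JIT is disabled."""
    return show_jit_value.strip().lower() == "off"


def non_c_databases(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the names of databases whose collation is not C.

    `rows` are (datname, datcollate) pairs from COLLATION_SQL.
    POSIX is accepted because it is the same locale as C.
    """
    return sorted(name for name, collate in rows if collate not in ("C", "POSIX"))


if __name__ == "__main__":
    # Hypothetical query results, for illustration only.
    print(jit_is_off("on"))
    print(non_c_databases([("prod", "C"), ("uat", "en_GB.UTF-8")]))
```

A check like this could run in CI for dev databases and as a monitoring query in production, so a wrongly collated database is caught when it is created rather than when it gets slow.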

The roadmap

I'd love to have an Odoo Operator to manage the lifecycle of Odoo on Kubernetes.

Our Helm Chart has some limitations that we could solve, but we've found the tooling and testing around Helm wanting.

I would like to: 

  • Integrate backups directly into the operator. 

    Presently we use CNPG's native base backups and WAL archiving for PostgreSQL, and Velero for a combined pg_dump + filestore backup. I would like to remove Velero from the equation if possible because we don't need *everything else* that it does.

    I do acknowledge that this may be a fool's errand.
  • Integrate monitoring queries, health and enterprise licensing checks, directly into the operator.

    We currently use CNPG custom monitoring queries. On the few places where we run Odoo still without CNPG we have no good monitoring story for these things.
  • Improve the automatic upgrade handling. Currently we use Helm hooks to handle scaling down deployments before upgrades start.

    We've been asked before to show pop-up warnings with a countdown, and maintenance pages during upgrades. Typically upgrades don't take long enough to warrant it. However, we have done maintenance pages in the past using priorities (Traefik) and fallback containers (Nginx). We've not spent the engineering time on in-Odoo warnings.
  • Integrate webhooks and other options for connecting to third party systems, such as our own Odoo install, and GitHub, to signal successful deployments, etc.

    (I'd love to get to a point where we could easily hook up our own Odoo installation with a customer management portal offering a cut-down version of the Odoo.sh feature set that our customers truly care about. Most don't care about adding new deployments or branches, but they do care about metrics, stats and, very occasionally, logs.)

This has been on the backlog for quite some time. If you're interested in sponsoring this work, please do reach out.
