Friday, 23 December 2016

Advanced routing with Google Load Balancing

Hosted Cyclid runs on Google Cloud Platform, with Google Load Balancing in front of multiple servers. The basic setup is quite simple: two Cyclid API server instances behind a single load balancer that terminates HTTP(S) and proxies to the API instances. So we have:

  1. Two instances (prod-usc-api01 & prod-euw-api01)
  2. A single instance group (prod-api) containing both instances.
  3. A load balancer frontend for HTTP & HTTPS, complete with the HTTPS certificate.
  4. A single load balancer backend which proxies requests to the instance group on port 8361 (the Cyclid API server port).
  5. A healthcheck which queries the Cyclid API health status endpoint.

The Cyclid API server provides two healthcheck API endpoints, /health/status and /health/info. The /health/status endpoint returns either 200 (everything is okay) or 503 (one or more components have an error). The healthcheck simply polls that endpoint; if there are any problems with the server, it will return 503 and the API server is removed from the load balancer.
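The load balancer's probe amounts to something like the following (the hostname here is illustrative; -w prints just the HTTP status code):

```shell
# Probe the status endpoint and print only the HTTP status code;
# a 200 keeps the server in rotation, a 503 removes it
$ curl -s -o /dev/null -w '%{http_code}\n' http://prod-usc-api01:8361/health/status
```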

So far, so standard.

The health dashboard

If one of the API servers has a problem, I'd like an easy way to find out about it. The second health API endpoint, /health/info, is one way. It always returns 200 (even if the server itself is unhealthy) and the body provides the health information for each component, e.g.

$ curl http://prod-euw-api01.api.cyclid.io/health/info
{"statusDetails":{"database":{"status":"OK",
"message":"database connection is okay"},
"dispatcher_local":{"status":"OK",
"message":"sidekiq is okay"}},
"status":"OK","message":"everything is fine"}

So in theory all I have to do is query this API on each server and I can see what, if anything, is broken. Except of course both API servers are behind a single load balancer; so if I query that, I'll end up querying the status of whichever server the request is routed to. So, I need some way to route these health checks to each individual server.

I have a few options:
  1. Modify the firewall rules so that each API server is also directly accessible from the internet.
  2. Find some way to forward requests directly to each server, perhaps using Request Forwarding.
  3. Create a new load balancer, one for each server I want to query.
  4. Find some way to pass these requests directly to each server.
Option #1 didn't appeal at all: the Google Load Balancer effectively acts as a simple WAF, and shields the API servers from direct attacks or exploits. I couldn't find any sensible way to make Option #2 work, although I admit I didn't consider it for too long. Option #3 would work but is clearly very clunky, and I didn't fancy paying for all those extra IPv4 addresses it would need.

Instead, I found a way to make Option #4 work.

Host & Path matching

Google Load Balancing allows you to configure multiple backends for a given frontend, and each backend can also have its own rules and healthchecks. We can use that to route only requests for /health/info to individual servers, all from the same load balancer frontend.

The basic idea is to add a backend for each API server, and configure the backend to match a host & path, i.e. the backend for prod-usc-api01 will match on the host prod-usc-api01.api.cyclid.io and the path /health/info. All other requests will go to the prod-api backend.

First of all, Google Load Balancer backends can only forward to an Instance Group, so we'll need to create an Instance Group per server we want to forward to:

$ gcloud compute instance-groups list
NAME                ZONE            NETWORK     MANAGED  INSTANCES
...
prod-euw-1c-api     europe-west1-c  production  No       1
prod-usc-1b-api     us-central1-b   production  No       1

Note that each "group" only has one instance; that way we know that requests can only ever be routed to that individual instance.
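Creating those single-instance groups is straightforward with the gcloud CLI; a sketch for the us-central server, using the names and zones above:

```shell
# Create an unmanaged instance group containing just one API server
$ gcloud compute instance-groups unmanaged create prod-usc-1b-api \
    --zone us-central1-b
$ gcloud compute instance-groups unmanaged add-instances prod-usc-1b-api \
    --zone us-central1-b --instances prod-usc-api01
```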

Next up we need to add a backend for each server:
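From the gcloud CLI, adding a per-server backend might look roughly like this (the backend and port names are mine, and exact flags vary between gcloud versions):

```shell
# A backend service for just the us-central server, with its own healthcheck
$ gcloud compute backend-services create prod-usc-api01-backend \
    --protocol HTTP --port-name cyclid-api \
    --http-health-checks cyclid-http-ok --global
$ gcloud compute backend-services add-backend prod-usc-api01-backend \
    --instance-group prod-usc-1b-api \
    --instance-group-zone us-central1-b --global
```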


Then, add Host & Path rules for those backends:
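In gcloud terms the Host & Path rules live in the load balancer's URL map; a hypothetical sketch for the us-central server (the URL map and matcher names are mine):

```shell
# Route only <host>/health/info to the per-server backend;
# everything else falls through to the prod-api default
$ gcloud compute url-maps add-path-matcher prod-api-url-map \
    --path-matcher-name prod-usc-api01-matcher \
    --default-service prod-api \
    --path-rules "/health/info=prod-usc-api01-backend" \
    --new-hosts prod-usc-api01.api.cyclid.io
```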

These rules mean that:
    1. By default, all requests are routed to the prod-api backend, which is a pool of all the available API servers.
    2. Requests to a specific host & path, e.g. prod-usc-api01.api.cyclid.io/health/info, will be routed to the backend for that specific server.
    3. Requests for specific hosts, but paths other than /health/info will be routed to the prod-api backend.

DNS & Healthchecks

As we have rules that match on the hostname, we'll need to ensure that DNS A records exist for each host we're matching on. We'll need one A record for each server, with the record set to the load balancer frontend address, and of course the A record for standard requests to the API:

$ gcloud dns record-sets list --zone cyclid-io
...
api.cyclid.io.                   A      300    130.211.28.97
prod-euw-api01.api.cyclid.io.    A      300    130.211.28.97
prod-usc-api01.api.cyclid.io.    A      300    130.211.28.97
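Those records can be added with a gcloud DNS transaction, e.g.:

```shell
# Add an A record for one server, pointing at the load balancer frontend
$ gcloud dns record-sets transaction start --zone cyclid-io
$ gcloud dns record-sets transaction add --zone cyclid-io \
    --name prod-euw-api01.api.cyclid.io. --type A --ttl 300 "130.211.28.97"
$ gcloud dns record-sets transaction execute --zone cyclid-io
```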

So normal API requests go to api.cyclid.io and on to the servers in a normal manner, while requests to prod-euw-api01.api.cyclid.io/health/info will match the Host & Path rules and be forwarded to the prod-euw-api01 server only. Perfect.

We'll also need a different healthcheck for these backends; we can't reuse the prod-api healthcheck, because it returns a 503 error when something is wrong, which would cause the load balancer to drop the server! Instead we'll create a simple healthcheck that just polls /health/info: as long as the server is up and responding, that returns a 200 response. If the server can't respond to that endpoint then we won't be able to query it anyway, so it doesn't matter that the load balancer has dropped it at that point:

$ gcloud compute http-health-checks describe cyclid-http-ok
...
kind: compute#httpHealthCheck
name: cyclid-http-ok
port: 8361
requestPath: /health/info
timeoutSec: 5
unhealthyThreshold: 2
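That healthcheck was created along these lines (flags per the gcloud http-health-checks reference):

```shell
# An HTTP healthcheck that polls /health/info, which returns 200
# whenever the server is up -- healthy or not
$ gcloud compute http-health-checks create cyclid-http-ok \
    --port 8361 --request-path /health/info
```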

Dashboard

Now I've configured the Load Balancer, I can query each individual API server for its current health information:
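Both servers are now reachable through the one frontend:

```shell
# Routed to each individual server by the Host & Path rules
$ curl https://prod-usc-api01.api.cyclid.io/health/info
$ curl https://prod-euw-api01.api.cyclid.io/health/info
```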


Tuesday, 6 December 2016

Wait; let me try that again

Soon after I finished writing my last post, I had one of those moments.

"If treating LXD containers like a Cloud is hard, maybe I shouldn't be thinking of it like one? Maybe SSH & cloud-init are the wrong answer?"

Hmmm. Well, time to rethink this whole thing.

Back to the Documentation

Clearly, the lxc command line tool can execute commands inside containers; I knew this. Running "lxc exec <container> bash" is the standard way to get a shell inside a container. What I hadn't done is link that idea to the REST API; lo and behold, a check of the documentation shows me that the API provides a method to execute a command in a container.
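The exec call itself is a POST to the container's endpoint; a hedged sketch over the local Unix socket (socket path and request shape per the LXD 1.0 API documentation):

```shell
# Ask LXD to run a command in "mycontainer"; the response describes the
# background operation and the WebSockets to attach to for its output
$ curl -s --unix-socket /var/lib/lxd/unix.socket -X POST \
    http://lxd/1.0/containers/mycontainer/exec \
    -d '{"command": ["uname", "-a"], "interactive": true, "wait-for-websocket": true}'
```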

Even more obviously (in hindsight) the Hyperkit Gem documentation shows an execute_command method, as well as methods to download & upload files. "Wait a minute, that's precisely the three things a Transport plugin does!" and finally the lightbulb switches on.

Foresight & Hindsight are well balanced

When I originally designed & built Cyclid I was very careful to layer everything:

  • Builders deal with creating & destroying build hosts
  • Transports deal with executing commands in build hosts
  • Provisioners deal with setup & configuration of build hosts

This means that everything is decoupled, and nothing makes any assumptions about how the build host has been created, how it's been configured or how to execute commands: that's all handled by the different plugins, and (in theory) is completely opaque to the other layers.

I originally designed things this way with something like a WinRM Transport plugin in mind, or even an "agent"-type transport (similar to how Jenkins works). It turns out that decision was pure 20:20 foresight. All I need to do is write an LXD Transport plugin that uses the LXD REST API to execute commands on LXD build hosts, instead of SSH!

Gotchas

You didn't think it was that easy, did you?

Okay, it wasn't that bad, but there were some gotchas I had to contend with. The biggest issue was that actually using the REST API to execute a command is not particularly well documented. In fact, the WebSockets that are created as part of the process are, as far as I can tell, entirely undocumented. I had to fall back to reading the command line client source code (which is written in Go: I don't actually know Go) to piece it together. If you're wondering:

  • There are two options that are relevant, "wait-for-websocket" and "interactive". These are both documented, but the documentation may not make it immediately obvious what the implications are.
  • Depending on the "interactive" flag, you either get one WebSocket which combines STDIN, STDOUT & STDERR, or three separate WebSockets (one each).
  • There is always a "control" WebSocket. The actual function of the control WebSocket is undocumented.

In the case of Cyclid I needed STDOUT & STDERR to be combined into a single stream, so the "interactive" flag needs to be set. One side effect of the way that LXD manages the WebSocket is that it creates & attaches a real PTY device, which in turn causes a bunch of executables to assume they're attached to an Xterm-compatible terminal: this is bad, but what can you do? The Transport plugin has to strip out control sequences before the output is written to the log.
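Stripping those control sequences comes down to a regular expression; a minimal sed sketch that only handles the common "CSI ... m" colour/formatting sequences:

```shell
# Remove ANSI "CSI ... m" formatting sequences from a line of output
$ ESC=$(printf '\033')
$ printf '\033[1;32mok\033[0m\n' | sed "s/$ESC\[[0-9;]*m//g"
ok
```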

It turns out the "control" WebSocket is a red herring. From reading the source, it is used to send two different types of message: "signal" and "window-size". The messages are simple JSON objects with the message type & associated parameters. We don't need to deal with signals, and certainly don't need to set the terminal size, so we can ignore the control WebSocket.

It turned out that the Hyperkit Gem didn't actually expose the "wait-for-websocket" or "interactive" flags, so I had to submit patches to enable that functionality. While I was there I also extended the "put file" and "get file" functionality to better fit how Cyclid passes IO objects around, rather than reading and writing files on-disk.

Last but not least, I had to learn enough about WebSockets to be dangerous. In fact I'm not entirely sure if I've got it right still; it does seem to work, but some of the semantics seem odd. I have no idea if this is by design of WebSockets, by design of LXD, or an artifact of the WebSockets client library I'm using: if you know more about WebSockets than I do (it may not be hard) please let me know so that I can pick your brains!

It lives!

Given those relatively minor caveats, the good news is that it all works: the Cyclid LXD plugin can create build instances in a machine running LXD, connect to those instances and run jobs. This is a really significant milestone; with Cyclid & LXD you can now run an entirely Open Source Continuous Integration system, on-premise, with a price tag of 0.

And if that isn't exciting enough for you then I hope the Grinch steals your Christmas!

Friday, 18 November 2016

Containers are small clouds, right?

Cyclid is deliberately agnostic when it comes to where & how it creates build hosts; currently it supports Google Cloud, DigitalOcean and something called Mist.

Mist creates LXC containers for build hosts, and was written very early on because container support is an important requirement for Cyclid. The only problem is that LXC is rough around the edges; although it seemed easy enough, Mist is not very good and only ever really worked for Ubuntu Trusty containers. Clearly not ideal.

As it happens, the LXC developers also realised that LXC wasn't ideal, so they created LXD. LXD solves a lot of the issues that LXC has and adds lots of nice features such as proper image management and a REST API. There's even a decent Rubygem on top of the REST API.

So all I have to do is write an LXD Builder plugin for Cyclid and that's the container problem solved, right? That's what I thought...

The Cloud Conundrum

First of all, let's be clear: LXD is an enormous improvement on LXC and I'd use LXD over LXC any day of the week. When it comes to my requirements for Cyclid though, it's still just as tricky to use as LXC.

The main problem is image management. In theory, Cyclid is also agnostic about the operating system, distribution & version that your builds run on. So it can't know in advance what any given job might request; all it can do is pass the request on to whatever is creating the actual instance and see if it can fulfil the request. In practice this is pretty reliable with most cloud providers; they all tend to work in a similar fashion and all tend to offer a broadly similar set of instances to choose from.

With LXD, there is no simple answer. In theory we could have LXD download a template from the Linuxcontainers.org public image server, but there is a major issue with that approach: none of the images there have cloud-init installed. So we can create a container we can't configure.

But wait! There are some templates which do have cloud-init installed...but those are only Ubuntu images. They're also served from a different public image server. They are more reliable than the old LXC templates, so at least we can now create containers other than Ubuntu Trusty, but if we want a container for anything other than Ubuntu we're still out of luck.
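The split is visible from the lxc client itself; both commands below are real, but which remotes are configured depends on your installation:

```shell
# From the ubuntu: remote -- these images ship with cloud-init installed
$ lxc launch ubuntu:16.04 build01
# From the images: remote -- no cloud-init, so no way to inject
# users or SSH keys via user-data
$ lxc launch images:centos/7 build02
```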

Okay, so if Ubuntu provides cloud images, surely other distributions do too? The page at https://images.linuxcontainers.org/ even tells us that its images are unofficial, and that we should use the official images provided by each distribution.

It turns out that most distributions that aren't Ubuntu do provide cloud images; but not images that you can use with LXD. LXD has its own image format and most distributions only provide images in QCOW2 or raw format, for use with virtual machines like Qemu or VirtualBox.
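LXD's own format is a metadata tarball plus a rootfs tarball, imported like so; converting a distribution's QCOW2 cloud image into this format is precisely the missing step:

```shell
# Import a hand-built LXD image (metadata + rootfs tarballs)
$ lxc image import meta.tar.gz rootfs.tar.gz --alias my-cloud-image
```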

So we can't rely on the public LXD images providing the functionality we need, and we can't use the cloud images that do provide what we need.

Plan C...

The remaining options are unattractive for various reasons:
  1. Bootstrap instances without cloud-init. We can run commands directly in the container via the API, so we could create the users & configure SSH that way. Except the Builder plugin would need knowledge of every possible distribution we'd support (package names, configuration files etc.), which is impractical and inflexible.
  2. Provide our own cloud images for popular distributions. Again, theoretically possible, but not something I want to do. I'd have to spend time updating image templates, and I'd have to host them and pay the bandwidth costs.
  3. Give up entirely and make it the user's problem: if you want to use a distribution with Cyclid on LXD, you have to provide the image(s) with cloud-init enabled. Welcome to the party!

...to Plan Z

Which brings us back to almost where we were at the start: LXD is better than LXC, but we can only use it to reliably create Ubuntu containers. That's probably what's going to happen, at least in the short term, because it's better than nothing (and still better than Mist). But far from ideal.

It really feels like LXD is missing a trick. It's not quite a cloud in a box, but with a little more effort and some attention to the operational aspects, it could be. Right now it seems to be aimed at traditional hands-on systems administration, where every container is lovingly hand crafted from the command line. That may be fine; heck that may really be what the vast majority of users care about, but it's a shame. Because I really, really, do not want to run OpenStack just to be able to reliably create a container...