- Two instances (prod-usc-api01 & prod-euw-api01)
- A single instance group (prod-api) containing both instances.
- A load balancer frontend for HTTP & HTTPS, complete with the HTTPS certificate.
- A single load balancer backend which proxies requests to the instance group on port 8361 (the Cyclid API server port).
- A healthcheck which queries the Cyclid API health status endpoint.
The Cyclid API server provides two healthcheck API endpoints, /health/status and /health/info. The /health/status endpoint returns either 200 (everything is okay) or 503 (one or more components have an error). The healthcheck simply polls that endpoint; if there are any problems with the server, it returns a 503 and the API server is removed from the load balancer.
So far, so standard.
The health dashboard
If one of the API servers has a problem, I'd like an easy way to find out about it. The second health API endpoint, /health/info, is one way to find out. It always returns 200 (even if the server itself is unhealthy) and the body provides the health information for each component. E.g.

$ curl http://prod-euw-api01.api.cyclid.io/health/info
{"statusDetails":{"database":{"status":"OK",
"message":"database connection is okay"},
"dispatcher_local":{"status":"OK",
"message":"sidekiq is okay"}},
"status":"OK","message":"everything is fine"}
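The interesting detail is buried in that JSON body, so something like jq makes it easy to pick out per-component status. A quick sketch, using the response above as sample data (in practice the JSON would come straight from curl):

```shell
# The /health/info response shown above, captured into a variable.
json='{"statusDetails":{"database":{"status":"OK","message":"database connection is okay"},"dispatcher_local":{"status":"OK","message":"sidekiq is okay"}},"status":"OK","message":"everything is fine"}'

# Show the status of each component
echo "$json" | jq -r '.statusDetails | to_entries[] | "\(.key): \(.value.status)"'

# List only the components that are not OK (empty when all is well)
echo "$json" | jq -r '.statusDetails | to_entries[] | select(.value.status != "OK") | .key'
```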
So in theory all I have to do is query this API on each server and I can see what, if anything, is broken. Except of course both API servers are behind a single load balancer; so if I query that, I'll end up querying the status of whichever server the request is routed to. So, I need some way to route these health checks to each individual server.
I have a few options:
- Modify the firewall rules so that each API server is also directly accessible from the internet.
- Find some way to forward requests directly to each server, perhaps using Request Forwarding.
- Create a new load balancer, one for each server I want to query.
- Find some way to have the existing load balancer pass these requests through to each individual server.
Option #1 didn't appeal at all: the Google Load Balancer effectively acts as a simple WAF, and shields the API servers from direct attacks or exploits. I couldn't find any sensible way to make Option #2 work, although I admit I didn't consider it for too long. Option #3 would work but is clearly very clunky, and I didn't fancy paying for all those extra IPv4 addresses it would need.
Instead, I found a way to make Option #4 work.
Host & Path matching
Google Load Balancing allows you to configure multiple backends for a given frontend, and each backend can also have its own rules and healthchecks. We can use that to route only requests for /health/info to individual servers, all from the same load balancer frontend.
The basic idea is to add a backend for each API server, and configure the backend to match a host & path. I.e. the backend for prod-usc-api01 will match on the host prod-usc-api01.api.cyclid.io and the path /health/info. All other requests will go to the prod-api backend.
First of all, Google Load Balancer backends can only forward to an Instance Group, so we'll need to create an Instance Group per server we want to forward to:
$ gcloud compute instance-groups list
NAME             ZONE            NETWORK     MANAGED  INSTANCES
...
prod-euw-1c-api  europe-west1-c  production  No       1
prod-usc-1b-api  us-central1-b   production  No       1
Note that each "group" only has one instance; that way we know that requests can only ever be routed to that individual instance.
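Assuming unmanaged instance groups, creating them looks something like this — the group names and zones are taken from the listing above:

```shell
# One unmanaged instance group per API server, each containing only
# that single instance.
gcloud compute instance-groups unmanaged create prod-usc-1b-api \
    --zone us-central1-b
gcloud compute instance-groups unmanaged add-instances prod-usc-1b-api \
    --zone us-central1-b --instances prod-usc-api01

gcloud compute instance-groups unmanaged create prod-euw-1c-api \
    --zone europe-west1-c
gcloud compute instance-groups unmanaged add-instances prod-euw-1c-api \
    --zone europe-west1-c --instances prod-euw-api01
```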
Next up we need to add a backend for each server:
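Sketched as gcloud commands, that looks something like the following — the backend-service name here (prod-usc-api01-health) is my own invention, and depending on your gcloud version you may also need a --global flag:

```shell
# A backend service for one API server, using the permissive
# cyclid-http-ok healthcheck described below rather than the
# /health/status check used by the main prod-api backend.
gcloud compute backend-services create prod-usc-api01-health \
    --protocol HTTP --port-name http --http-health-checks cyclid-http-ok

# Attach that server's single-instance group to the backend
gcloud compute backend-services add-backend prod-usc-api01-health \
    --instance-group prod-usc-1b-api --instance-group-zone us-central1-b
```

And similarly for prod-euw-api01 with its own instance group.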
Then, add Host & Path rules for those backends:
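On the command line this is a path matcher on the load balancer's URL map; a sketch for one server, where the URL map name (cyclid-api) and matcher name are illustrative:

```shell
# Route host prod-usc-api01.api.cyclid.io + path /health/info to the
# per-server backend; any other path on that host falls through to the
# matcher's default service, prod-api.
gcloud compute url-maps add-path-matcher cyclid-api \
    --path-matcher-name prod-usc-api01-health \
    --default-service prod-api \
    --new-hosts prod-usc-api01.api.cyclid.io \
    --path-rules "/health/info=prod-usc-api01-health"
```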
These rules mean that:
- By default, all requests are routed to the prod-api backend, which is a pool of all the available API servers.
- Requests to a specific host and the path /health/info, e.g. prod-usc-api01.api.cyclid.io/health/info, will be routed to the backend specifically for that server.
- Requests for those specific hosts, but with paths other than /health/info, will still be routed to the prod-api backend.
DNS & Healthchecks
As we have rules that match on the hostname, we'll need to ensure that DNS A records exist for each host we're matching on. We'll need one A record for each server, with the record set to the load balancer frontend address, and of course the A record for standard requests to the API:

$ gcloud dns record-sets list --zone cyclid-io
...
api.cyclid.io.                 A  300  130.211.28.97
prod-euw-api01.api.cyclid.io.  A  300  130.211.28.97
prod-usc-api01.api.cyclid.io.  A  300  130.211.28.97
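With Google Cloud DNS, adding those per-server records is a small transaction, along these lines:

```shell
# Add an A record per server, all pointing at the load balancer
# frontend address (130.211.28.97, from the listing above).
gcloud dns record-sets transaction start --zone cyclid-io
gcloud dns record-sets transaction add --zone cyclid-io \
    --name prod-usc-api01.api.cyclid.io. --type A --ttl 300 "130.211.28.97"
gcloud dns record-sets transaction add --zone cyclid-io \
    --name prod-euw-api01.api.cyclid.io. --type A --ttl 300 "130.211.28.97"
gcloud dns record-sets transaction execute --zone cyclid-io
```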
So normal API requests go to api.cyclid.io and on to the servers in a normal manner, while requests to prod-euw-api01.api.cyclid.io/health/info will match the Host & Path rules and be forwarded to the prod-euw-api01 server only. Perfect.
We'll also need a different healthcheck for these backends; we can't use the same healthcheck as prod-api because it will return a 503 error if something is wrong, which will cause the load balancer to drop it! Instead we'll create a simple healthcheck that just polls /health/info: as long as the server is up & responding that will return a 200 response. If the server can't respond appropriately to that endpoint then we won't be able to query it anyway, so it doesn't matter if the load balancer has dropped it at that point:
$ gcloud compute http-health-checks describe cyclid-http-ok
...
kind: compute#httpHealthCheck
name: cyclid-http-ok
port: 8361
requestPath: /health/info
timeoutSec: 5
unhealthyThreshold: 2
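A check like that can be created with something along these lines:

```shell
# A simple "is the server up?" check against /health/info, which
# always returns 200 as long as the API server is responding at all.
gcloud compute http-health-checks create cyclid-http-ok \
    --port 8361 --request-path /health/info \
    --timeout 5s --unhealthy-threshold 2
```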



