Disposable Cloud Environments With Vagrant and Tailscale

There's a lot in this blog post, so I'll summarize it first, and then tell you a horrible joke that got me a content warning from Slack (but it won't make sense unless you're a functional programming nerd).

  • Goal #1: I want to build out an ELK cluster and Do Science To it.
  • Goal #2: I want to not start from scratch or figure out how to undo things when I screw up.
  • Goal #3: I want to be able to keep working on it from different computers.
  • Goal #4: I want to work through Kubernetes the Hard Way and set up a cloud environment.

The solution to #1 is containerization. Run Docker Compose and set up a multi-node Elasticsearch cluster.

The solution to #2 is virtualization. Create a virtual machine using Vagrant, install Docker on it, then do #1. Now I can take VM snapshots before config changes and roll back if I screwed up, and I can dispose of the boxes when I'm done.

The solution to #3 is to build out a homelab server (incredibly cheap at $391). Install Ubuntu, then #2.

The solution to #4 is to throw more memory at the homelab server, then install Kind after #2. Because of #2, I can now also mess with Terraform state and get away with it.

Pause to build and install everything…

Problem: I want to see Kibana from my laptop browser. Docker Compose forwards everything to localhost, and then the VM also requires networking magic to expose it to the host. I have a box containing a box, containing a box, and I don't want to have to port forward all the things.
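
To make the pain concrete, here's a sketch of what every exposed service would need, assuming Kibana on its default port 5601 with Compose already publishing it inside the VM:

# Vagrantfile: forward Kibana out of the VM, on top of the
# 5601:5601 mapping Docker Compose already does inside it
config.vm.network "forwarded_port", guest: 5601, host: 5601

Multiply that by every service in the stack, and keep it all in sync by hand.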

Solution: Install Tailscale on the VM, exposing it as a host on the network (tailnet in Tailscale parlance).

Problem: Kubernetes is an orchestration layer, so now there are many boxes and port forwarding is impossible.

Solution: Set up Tailscale as a subnet router inside the VM, using Escaping the Nested Doll with Tailscale as a guide. Now I have infinite hosts on the network, and if I want a different configuration I can roll back to a base k8s state, or even set them up side by side.

Now I'm going to tell you the horrible joke.

Imagine that we're defining processes as code, so a physical server is a container for processes:

// http://devserver:80 from nginx -p 80 (the trivial case)
val devServer: Server = Server(Process("nginx", 80))

We can describe Docker and VirtualMachine the same way:

// docker run -p 80:80 nginx
val docker: Docker = Docker(Process("nginx", 80), PortForward(guest = 80, host = 80))

// config.vm.provision "shell", inline: "nginx -p 80"
// config.vm.network "nginx-port", guest: 80, host: 80
val standardVM: VirtualMachine = 
  VirtualMachine(Process("nginx", 80), PortForward(guest = 80, host = 80))

And we can build this up by putting Docker inside of a VM:

// config.vm.provision "shell", inline: "docker run -p 80:80 nginx"
// config.vm.network "nginx-port", guest: 80, host: 80
val vagrantDocker: VirtualMachine = 
  VirtualMachine(docker, PortForward(guest = 80, host = 80))

And see how Kubernetes is a bit different:

// podIP: 10.244.0.6:80, 10.244.0.7:80 
val kubernetes: Kubernetes = Kubernetes(
  Set(
    Pod(Process("nginx", 80)),
    Pod(Process("nginx", 80))
  )
)

// Port mapping breaks down when we have multiple pods on a single VM :-(
val vagrantKubernetes: VirtualMachine = 
  VirtualMachine(kubernetes, ???)

From this, we can infer that Server, Docker and Pod are all Container types:

trait Container[T]
trait Server extends Container[Set[Process]]
trait Docker extends Container[Set[Process]] with Process
trait Pod extends Container[Set[Process]] with Process

And that VirtualMachine and Kubernetes are also instances of Container:

trait VirtualMachine extends Container[Server]
trait Kubernetes extends Container[Set[Pod]] with Process

And Tailscale creates a Server from VirtualMachine:

// as before
val vagrantDocker = VirtualMachine(docker, PortForward(guest = 80, host = 80))

// http://vagrant-docker:80 on the tailnet
val exposedNginxHost: Server = Tailscale(vagrantDocker)

But if VirtualMachine is a Container[Container[Set[Process]]] and Server is a Container[Set[Process]]:

val Tailscale: Container[Container[Set[Process]]] => Container[Set[Process]]

Tailscale is flatMap for the Containerization monad.
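
To spell the punchline out in the same pseudo-Scala: strictly speaking the operation above is flatten (a.k.a. join), and flatMap is just map followed by flatten, but that doesn't scan as well:

// flatten for any monad M: collapse one layer of nesting
def flatten[M[_], A](mma: M[M[A]]): M[A] = ???

// Tailscale has exactly this shape, with M = Container and A = Set[Process]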

Still here? Let's dig into the details and I'll show all the setup steps.

Putting The Machine Together

Developing on my laptop has issues.

I installed elementary OS 5 on my laptop a while ago. It's based on Ubuntu 18.04, and I've started running into "version 'GLIBC_2.28' not found" errors more and more as it gets further behind. There's no way to upgrade elementary OS 5 – the upgrade path is to reinstall the operating system from scratch. And… well, it has all kinds of cruft on it from various docker/k8s/cluster management tools. It works fine as a laptop, but as a development environment it's not great. And trying to use Windows with WSL2 was even worse.

The easiest thing to do – obviously – is to take a week off work, put together a cheap headless machine as a homelab server, stick it in the basement, move everything to that box, and then connect the laptop remotely.

I went with the Ars Technica System Guide base specs, with a couple of changes: I added 64GB of memory, and I picked out an AMD Ryzen 5 5600X instead of the 5600G. (This was a mistake – the 5600X doesn't have an integrated GPU, leading to a frantic moment trying to figure out why the BIOS wouldn't come up on the HDMI port.) After upgrading the BIOS, staring at the manual for pins, and enabling virtualization by turning on SVM Mode, it was finally ready for a minimal Ubuntu install, using xrdp and Remmina to connect remotely.

I named it devserver.

Using Tailscale for "Server in the Basement"

The first thing to do was to install Tailscale on absolutely everything and enable every single feature, especially DNS.
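
On each Linux machine that boils down to the standard install script plus tailscale up. --accept-dns is on by default, but since DNS is the feature I care about most, it's worth being explicit:

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up --accept-dns=true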

Tailscale is good at the core use case, but does have some client-specific issues. On WSL, the Linux client won't piggyback on the main Windows app, so you have to run tailscaled explicitly and give the WSL instance a hostname that distinguishes it from the Windows host:

sudo nohup tailscaled &
sudo tailscale up --hostname windows-wsl

With the laptop, I also sometimes had to do tailscale down/up, or tailscale up --reset, to get the mappings to resolve correctly.
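
For reference, that's:

# bounce the connection
sudo tailscale down && sudo tailscale up
# or reset unspecified settings back to their defaults
sudo tailscale up --reset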

There are a couple of things to be aware of when setting up Tailscale for a server. The first is disabling key expiry for the server, since it's going to be hanging around for a while. The second is that Tailscale provides its own SSH, which requires its own parameters:

sudo tailscale up --ssh --operator=$USER

Once SSH was up, it was time to futz with configuration files. I like to use Visual Studio Code with SSH remote development, which comes with a secret command line tool for connecting to any host on the tailnet:

code --folder-uri "vscode-remote://ssh-remote+devserver/home/wsargent/"

There are also some utilities that Tailscale provides for ad-hoc port forwarding. For example, I can run jekyll serve on devserver and it will start a server on port 4000 – I can see how that looks on my phone by using tailscale serve:

sudo tailscale serve tcp:4000 tcp://localhost:4000 

And then I can go to http://devserver:4000 on my phone and see how the blog post looks from there. The services page on Tailscale shows a list of ports open on all the machines, so it's easy to see what services are active and how to get to them.

Adding Tailscale to Vagrant

Adding Tailscale to Vagrant is straightforward. Generate an authentication key, make it reusable, and save it into 1Password for provisioning. 1Password has a CLI that's very useful for managing secrets – this blog post is already too long, but there's an example repository that shows how to provision secrets.
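
As a sketch of the 1Password side – the vault and item names here are made up – the CLI can read a field straight into the environment for Vagrant to pick up:

# hypothetical vault/item/field; op read takes an op:// secret reference
export TAILSCALE_AUTHKEY=$(op read "op://HomeLab/Tailscale/authkey")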

I started off with VirtualBox, but have been experimenting with libvirt. To use libvirt, install the dependencies and add the vagrant-libvirt plugin:

sudo apt install libvirt-dev
sudo apt-get purge vagrant-libvirt
sudo apt-mark hold vagrant-libvirt
sudo apt-get update 
sudo apt-get install -y qemu libvirt-daemon-system ebtables libguestfs-tools vagrant ruby-fog-libvirt
vagrant plugin install vagrant-libvirt

Using the vagrant-env plugin, you can then set up Tailscale on startup and shutdown:

Vagrant.configure("2") do |config|
  config.env.enable

  config.vm.box = ENV['VM_BOX']
  config.vm.hostname = ENV['VM_HOSTNAME']    
  config.vm.provider ENV['VM_ENGINE'] do |v| 
    v.name = ENV['VM_HOSTNAME']
    v.memory = ENV['VM_MEMORY']
    v.cpus = ENV['VM_CPUS']
  end

  config.vm.provision "tailscale-install", type: "shell" do |s|
    s.inline = "curl -fsSL https://tailscale.com/install.sh | sh"
  end

  config.vm.provision "tailscale-up", type: "shell" do |s|
    s.inline = "tailscale up --ssh --operator=vagrant --authkey #{ENV['TAILSCALE_AUTHKEY']}"
  end

  config.trigger.before :destroy do |trigger|
    trigger.run_remote = {inline: "tailscale logout"}
    trigger.on_error = :continue
  end

end
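
The ENV lookups above come from a .env file next to the Vagrantfile, loaded by config.env.enable. Something like this, where the box name and sizes are just example values:

# .env – read by the vagrant-env plugin
VM_BOX=generic/ubuntu2204
VM_HOSTNAME=vagrant-docker
VM_ENGINE=libvirt
VM_MEMORY=4096
VM_CPUS=2
TAILSCALE_AUTHKEY=tskey-...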

To start it, I run vagrant up. To stop it, I run vagrant halt. When I'm done experimenting with the environment, I destroy it with vagrant destroy, and it removes itself from the tailnet automatically.

Adding Docker to Vagrant

Now we have to add Docker to Vagrant:

Vagrant.configure("2") do |config|
  # ...

  config.vm.provision "docker-install", type: "shell" do |s|
    s.inline = <<-SCRIPT
curl -fsSL https://get.docker.com -o get-docker.sh &&
sudo sh get-docker.sh &&
sudo adduser vagrant docker
SCRIPT
  end

end

Now we can have some fun – we'll run two Docker Compose instances side by side, each in its own VM, without port conflicts. Check out awesome-compose next to the Vagrantfile so it shows up on the /vagrant mount.
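
On the host, next to the Vagrantfile:

git clone https://github.com/docker/awesome-compose.git

Then, in the first VM (brought up with VM_HOSTNAME=vagrant-nginx-golang):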

$ vagrant ssh
$ cd /vagrant/awesome-compose/nginx-golang
$ docker compose up

And then again in a second VM, this time with vagrant-nginx-nodejs-redis as the hostname:

$ vagrant ssh
$ cd /vagrant/awesome-compose/nginx-nodejs-redis
$ docker compose up

Now we've got two nginx instances, both running on port 80 – but they just appear as different hosts on the tailnet.

$ curl vagrant-nginx-nodejs-redis
web1: Number of visits is: 1

and

$ curl vagrant-nginx-golang

          ##         .
    ## ## ##        ==
 ## ## ## ## ##    ===
/"""""""""""""""""\___/ ===
{                       /  ===-
\______ O           __/
 \    \         __/
  \____\_______/


Hello from Docker!

I also have a vagrant-docker box that I use for ad-hoc installations. From the laptop, I can install just the Docker CLI (docker-ce-cli, from Docker's apt repository) and set DOCKER_HOST to point at the box:

sudo apt install docker-ce-cli
export DOCKER_HOST=ssh://vagrant@vagrant-docker
ssh vagrant@vagrant-docker # or tailscale ssh vagrant@vagrant-docker
docker ps # works once ssh has succeeded!

And now I can run various services directly and hit them at http://vagrant-docker:3000.
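
For example, with a hypothetical service – grafana/grafana is just a convenient image that listens on port 3000:

docker run -d --name grafana -p 3000:3000 grafana/grafana

Since DOCKER_HOST points at the box, the container runs on vagrant-docker, and http://vagrant-docker:3000 is reachable from anywhere on the tailnet.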

Disposable Cloud Environments

The limitation of using Docker Compose is that you're still referencing the Vagrant box, and picking out a service by port. If you have a more complex environment, you'll probably have several databases, a key/value store, several microservices and so on. Really, you'd like to be able to spin up Kubernetes inside a Vagrant box and access all the pods through Tailscale automatically.

Setting up Kubernetes itself is surprisingly simple in a Vagrantfile. For example, setting up Kind is as simple as a vagrant up:

Vagrant.configure("2") do |config|

  # ...install Docker

  config.vm.provision "kind", type: "shell" do |s|
    s.inline = <<-SCRIPT
curl -Lo ./kind "https://kind.sigs.k8s.io/dl/v0.18.0/kind-$(uname)-amd64"
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
SCRIPT
  end

  config.vm.provision "kubectl", type: "shell" do |s|
    s.inline = "sudo snap install kubectl --classic"
  end

  config.vm.provision "kubectl-completion", type: "shell" do |s|
    s.inline = 'echo "source <(kubectl completion bash)" >> ~/.bashrc'
  end

end

Setting up Tailscale in Kubernetes is… not so simple. This took some trial and error, and I leaned heavily on Escaping the Nested Doll with Tailscale when going through this.

I like to start fresh, so I wipe the cluster and recreate it:

$ kind delete cluster
$ kind create cluster

Tailscale's Kubernetes subnet routing documentation is a bit confusing and out of date, so I used the README.md in https://github.com/tailscale/tailscale/tree/main/docs/k8s, which is slightly different.

First we need to generate an auth key and write it into a Kubernetes secret so the Tailscale pod can pick it up:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: tailscale-auth
stringData:
  TS_AUTHKEY: <your-auth-key>
EOF

Then we check out the GitHub project and get to the Makefile:

$ git clone https://github.com/tailscale/tailscale
$ cd tailscale/docs/k8s

And execute make rbac, which creates the service account and RBAC roles that the Tailscale pod runs under:

$ sudo apt install make
$ export SA_NAME=tailscale
$ export TS_KUBE_SECRET=tailscale-auth
$ make rbac

Next, we want to set up a subnet router, which needs the pod and service IP ranges to advertise. To have some pods to look at, we can create an nginx deployment from the k8s docs:

$ kubectl apply -f https://k8s.io/examples/application/deployment.yaml

From there, we can see the IP addresses of the pods:

$ kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP           NODE                 NOMINATED NODE   READINESS GATES
nginx-deployment-85996f8dbd-6clrn   1/1     Running   0          5h34m   10.244.0.6   kind-control-plane   <none>           <none>
nginx-deployment-85996f8dbd-jwk4q   1/1     Running   0          5h34m   10.244.0.5   kind-control-plane   <none>           <none>

and the service:

$ kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   5h35m

Then we set TS_ROUTES to the service and pod CIDR ranges and call make subnet-router:

$ SERVICE_CIDR=10.96.0.0/16
$ POD_CIDR=10.244.0.0/15
$ export TS_ROUTES=$SERVICE_CIDR,$POD_CIDR
$ make subnet-router
pod/subnet-router created

And finally I can see the subnet router defined in Tailscale:

$ tailscale ping subnet-router
pong from subnet-router (100.96.237.125) via DERP(sfo) in 26ms

Next, we need to go to the machines page in the Tailscale admin console, where the subnet router will have a little alert next to it saying "Unapproved subnet routes!" Go to the "Edit route settings" menu option and click Approve All.
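
If you rebuild clusters often, you can skip the manual approval with autoApprovers in the tailnet policy file. A sketch, assuming the subnet router's auth key is tagged with a hypothetical tag:k8s:

// tailnet policy file (HuJSON) – tag:k8s is a made-up tag
"autoApprovers": {
  "routes": {
    "10.96.0.0/16":  ["tag:k8s"],
    "10.244.0.0/15": ["tag:k8s"],
  },
},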

And now the pods are accessible via IP address through Tailscale and I can see them from my Windows machine and my iPhone:

C:\Users\wsargent>curl 10.244.0.6
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Note that the pods do not show up in tailscale status, and they are not accessible by pod name – only IP address. There is a way to hook Tailscale into the k8s DNS, but I haven't dug into it. I'll abstract this into a Vagrantfile eventually, but this is a good place to stop.

Further Work

This approach works out of the box, but could use some optimization.

I need to set up a Docker registry to cache images, and an apt cache, so that initializing Vagrant boxes doesn't go over the network every time. It might make sense to run them on devserver specifically, so they don't rely on specific Vagrant instances being up (and so I can't wipe them out by accident).
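
The registry side is mostly just running registry:2 in pull-through proxy mode on devserver and pointing each Docker daemon at it. A sketch – the port and mirror address are my choices:

# on devserver: a registry that proxies and caches Docker Hub
docker run -d --name registry-mirror -p 5000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2

# on each Vagrant box: point the daemon at the mirror
echo '{ "registry-mirrors": ["http://devserver:5000"] }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker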

I could also make a Vagrant base box with Tailscale and Docker preinstalled, and remove that bit from initialization.
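
The rough shape of that, assuming the provider supports vagrant package (the box name is mine):

# provision a VM with only Tailscale and Docker, then snapshot it as a box
vagrant package --output tailscale-docker.box
vagrant box add tailscale-docker ./tailscale-docker.box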