tl;dr

Set up a k3s cluster on a Turing Pi 2, using Tailscale for cross-node connectivity. It wasn’t the most terrible experience.

Background

I backed the Turing Pi 2 project on Kickstarter back when I was at Gitpod, because at the time the idea of using it as a self-hosted Kubernetes cluster to run Gitpod seemed like a fun work project. Then Gitpod discontinued the self-hosted product, Raspberry Pi compute modules were impossible to find, and the board itself was delayed forever. When I finally had all the assembled pieces available to me, I didn’t have an immediate need for it, so it sat on a shelf.

It’s an interesting board but also a reminder that Kickstarter is not the best place to buy hardware. I’d forgotten about the limitations around I/O, specifically that the CM4 modules can’t actually use the NVMe slots on the back - those are for Turing Pi’s own compute modules. The management interface could use some polish, but once I got it sorted out, it is very useful as a way to remotely reflash different compute modules and deploy them.

Adding this note for others - in the latest firmware, if you want to mount a compute module as a USB device, the command is now tpi advanced -n $NODE msd, and to unmount/boot it the command is tpi advanced -n $NODE normal. The sanest way to work with the firmware is, after updating it, to reformat the SD card you used, copy over the .img files you want to flash with, then follow their steps on flashing with the CLI. Once you’ve imaged the drive, you will need to mount it and do the usual drop-in of files for SSH / user account creation before you can boot it, and then you’re set. Not 100% required if you’re using microSD cards, but since I had to build this from whatever CM4 options were available, I ended up with a few that have built-in eMMC storage (which replaces the SD card interface).
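For the SSH / user drop-in part, here is a minimal sketch of what I mean. The device name is an assumption - check lsblk after putting the module into msd mode - and Raspberry Pi OS picks up an empty ssh file plus a userconf.txt on the boot partition at first boot:

# Expose node 1's eMMC as a USB mass-storage device via the BMC
tpi advanced -n 1 msd

# Mount the boot (FAT) partition - /dev/sda1 is an assumption, check lsblk first
sudo mount /dev/sda1 /mnt

# Enable SSH and create the first user (username:encrypted-password)
sudo touch /mnt/ssh
echo "pi:$(openssl passwd -6 'changeme')" | sudo tee /mnt/userconf.txt

sudo umount /mnt

# Put the module back into normal mode so it boots from the freshly imaged storage
tpi advanced -n 1 normal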

Getting k3s + Tailscale installed

Prior to running the k3s installer, the hosts need to be prepped. For Tailscale on Raspberry Pi OS (Bookworm), that means enabling IP forwarding on the hosts and applying the ethtool tweaks Tailscale recommends for performance:

echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf

# Also on bookworm/raspberry pi os
# https://tailscale.com/kb/1320/performance-best-practices#ethtool-configuration
sudo nmcli con modify "Wired connection 1" ethtool.feature-rx-udp-gro-forwarding on ethtool.feature-rx-gro-list off

# DNS handling can also be improved by adding systemd-resolved, which isn't installed by default on Bookworm
# https://tailscale.com/kb/1188/linux-dns

sudo apt install -y systemd-resolved
sudo ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
sudo systemctl restart systemd-resolved
sudo systemctl restart NetworkManager


# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh

Once Tailscale is installed (but not yet activated) on each node, make sure cgroups are enabled in the kernel by appending cgroup_memory=1 cgroup_enable=memory to /boot/firmware/cmdline.txt per the k3s docs, then restart the machine.
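A minimal sketch of that change - cmdline.txt is a single line, so the flags go onto the end of it:

# Append the cgroup flags to the end of the single kernel command line
sudo sed -i '$ s/$/ cgroup_memory=1 cgroup_enable=memory/' /boot/firmware/cmdline.txt
sudo reboot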

Now that the nodes are prepared, a Tailscale tag and authkey need to be created, and the access controls need to be updated to allow the advertised routes. The tag created here is rpi-nodes; it will have approval to advertise routes in the 10.42.0.0/16 subnet (the default pod CIDR for k3s), and the admin group will have SSH access to the tag.

"tagOwners": {
	"tag:rpi-nodes":    ["autogroup:admin"],
    // more here ...
}
"autoApprovers": {
    "routes": {
        "10.42.0.0/16": ["tag:rpi-nodes"],
        // more here ...
    }
}
"ssh": [
    {
        "action": "accept",
        "src":    ["group:admin"],
        "dst":    ["tag:rpi-nodes"],
        "users":  ["autogroup:nonroot"],
    }
    // more acls here
]

The ACLs here are a bit more involved if you do not have the generic “allow all to all” rule that comes with a default tailnet. Since these nodes are running as subnet routers to connect the pods across nodes, the ACLs need to include the pod CIDR range as both a source and a destination, not just the tag. Without this, pods on rpi-node-2 won’t be able to reach pods on rpi-node-1, and so on.

"acls": [
    {
        // allow inbound access to the nodes from admin group
        "action": "accept",
        "src": ["group:admin"],
        "dst": ["tag:rpi-nodes:*"],
    },
    {
        // allow pods on each node to talk to each other
        "action": "accept",
        "src": [
            "tag:rpi-nodes",
            "10.42.0.0/16",
        ],
        "dst": [
            "tag:rpi-nodes:*",
            "10.42.0.0/16:*",
        ],
    },
]

Once the nodes are prepped (and restarted), you’re able to install k3s. You can store the authkey in a file and point the installer at it, but in practice I skipped that for now. By default the k3s Tailscale integration doesn’t enable Tailscale SSH, but you can add it with extraArgs=--ssh inside the --vpn-auth flag. Another limitation is that this only works with auth keys, as the install script currently doesn’t handle being given tags, which is a requirement if you’re using an OAuth token. Adding the --tls-san= flag lets you specify the full tailnet DNS name of the node, ensuring the certificate has all the hostnames on it correctly (if the machine’s hostname is rpi-node-1 and your tailnet domain is fluffy-mammoth, then rpi-node-1.fluffy-mammoth.ts.net is the full name to use).

On the dedicated control node, run:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server" sh -s - --tls-san="full-node-magic-dns-name" \
--vpn-auth="name=tailscale,joinKey=tskey-auth-12345678,extraArgs=--ssh"

Once it has completed, verify that k3s is running with sudo k3s kubectl get nodes; running it with sudo loads the right kubeconfig for the cluster. If you see the node listed and the install completed as expected, you’re ready to attach another node. From the control node, grab the registration token with sudo cat /var/lib/rancher/k3s/server/node-token, then run the installation command on the intended agent / worker node:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="agent" K3S_URL=https://full-node-magic-dns-name:6443 \
K3S_TOKEN="contents of sudo cat /var/lib/rancher/k3s/server/node-token" sh -s -  \
--vpn-auth="name=tailscale,joinKey=tskey-auth-12345678,extraArgs=--ssh"

The SSH session might hang briefly, as part of the installation brings up a ton of firewall rules, but you can check whether the node registered by running sudo k3s kubectl get nodes on the control node.

While you can copy the k3s.yaml from the control node (at /etc/rancher/k3s/k3s.yaml) to use as a kubeconfig from your workstation, changing the server address from server: https://127.0.0.1:6443 to server: https://full-node-magic-dns-name:6443, the Tailscale Kubernetes operator takes that to the next level and was the reason I set up this cluster in the first place. Read more about my experience doing that over here.
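A quick sketch of that manual kubeconfig approach, reusing the example names from earlier (rpi-node-1 and the fluffy-mammoth tailnet); adjust paths and names to taste:

# Pull the kubeconfig off the control node (it's only readable by root there)
ssh rpi-node-1 "sudo cat /etc/rancher/k3s/k3s.yaml" > ~/.kube/turingpi.yaml

# Point it at the node's tailnet DNS name instead of localhost
sed -i 's|https://127.0.0.1:6443|https://rpi-node-1.fluffy-mammoth.ts.net:6443|' ~/.kube/turingpi.yaml

# Use it for this shell session
export KUBECONFIG=~/.kube/turingpi.yaml
kubectl get nodes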

Impressions

k3s has matured a lot since I last looked at it. Now that it has built-in support for Tailscale, deploying nodes is much easier - you can avoid having to configure static DNS or IPs for the nodes, since all the connectivity between them is over Tailscale, which gives each node a stable name and address. In theory that also means you can deploy remote worker nodes and they will just call home over Tailscale.

OAuth client support would be a nice improvement over expiring auth keys, though I can see working around that with whatever installer platform hooks you’re already using. Having a Bolt Project that uses a locally stored OAuth client key to generate one-time-use registration auth keys at run time is how I’d imagine using it in the future, if I were fully automating k3s deployments this way.
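A rough sketch of what that key generation could look like against the Tailscale API - the tag name comes from earlier, the OAuth client ID/secret environment variables are placeholders, and the exact capability payload is my assumption of what this setup would need:

# Exchange the OAuth client credentials for an API access token
ACCESS_TOKEN=$(curl -s -d "client_id=${TS_OAUTH_CLIENT_ID}" -d "client_secret=${TS_OAUTH_CLIENT_SECRET}" \
  "https://api.tailscale.com/api/v2/oauth/token" | jq -r '.access_token')

# Mint a short-lived, single-use, pre-authorized auth key tagged for the k3s nodes
curl -s -X POST "https://api.tailscale.com/api/v2/tailnet/-/keys" \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{
    "capabilities": {
      "devices": {
        "create": {
          "reusable": false,
          "preauthorized": true,
          "tags": ["tag:rpi-nodes"]
        }
      }
    },
    "expirySeconds": 600
  }' | jq -r '.key'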

In setting up this lab I started with Ansible, but learning that Puppet now supports Arm64 / Raspberry Pis, and that puppet-bolt is still kicking along, means I may put together a Bolt Project to manage the cluster in the future. Between the remote SSH commands to reimage / configure nodes on the Turing Pi itself and the ability to use standard Puppet modules to manage the state of the cluster, I’m glad to have it in my toolbelt for this project.