tl;dr

Tailscale’s Kubernetes operator lets you centralize access and authorization for administering clusters across a business. The operator also provides ingress, load balancer, and cross-cluster connectivity for services, making it easier to secure apps and services running on clusters across an organization.

What is it?

The Kubernetes operator brings Tailscale networking to Kubernetes pods and services. Previously it was possible to add sidecars to pods to give services inside a cluster tailnet access, or to let a pod reach another service across the tailnet, but that meant using the Tailscale container image and a lot of manual pod building. The operator standardizes that for users and formalizes offerings such as automatically providing Ingresses with valid TLS certificates, cross-cluster connectivity, and even running subnet routers and exit nodes (and likely App Connectors as well).

Tailscale has also introduced the ability to proxy access to the kube-apiserver, allowing policies to dictate who can see the kube-apiserver endpoints on different clusters and enforcing fine-grained access using grants if desired. Combined with SSH session recording, this could replace a lot of secondary infrastructure access tools used in a company. In its simplest form, it acts as an ingress to the kube-apiserver endpoint: users on a tailnet can reach the endpoint if the ACLs allow it, and from there they have to provide the appropriate authentication. As with App Connectors, this provides a way to gate access to sensitive business resources without forcing all of a user’s traffic over a single tunnel. The Tailscale CLI handles configuring the user’s machine with the tailscale configure kubeconfig command. This allows easy access to the Kubernetes control plane while conforming to best practices about limiting access to the kube-apiserver interface itself; previously organizations would resort to jump hosts, cloud-vendor-specific tools, or a combination of the above to gate access.
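As a rough sketch of what that looks like for an end user (assuming an operator already running under the default tailscale-operator hostname, as in the walkthrough below, and ACLs/grants that permit access):

tailscale configure kubeconfig tailscale-operator
# switch to the newly added context (named after the operator host here) and use kubectl as normal
kubectl config use-context tailscale-operator
kubectl get pods -A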

The operator can be configured with a privileged service account, allowing it to proxy requests as a specific role. This works with the latest grants syntax, where Tailscale dictates which role the proxy will assume on behalf of a specific user. Having one place to declare both which clusters a user can access and which permissions they have in those clusters makes this simple to administer and easy for end users to follow. The low-touch nature of Tailscale for end users (they continue to use their everyday tools as normal, with the tunnels to their approved services controlled via policy) makes this a really practical way to introduce another level of security to an organization.

Trying it out

Installing this was painless. These commands assume a node that already has helm installed and access to the Kubernetes cluster in question (I installed k3s as an excuse to try this out).

First, ensure the tags are created and ACLs exist to access the resources tagged with them; the second ACL policy below enables pods tagged with tag:k8s to communicate with each other across clusters:

"tagOwners": {
   "tag:k8s-operator": [],
   "tag:k8s": ["tag:k8s-operator"],
},
"acls": [
    {
        "action": "accept",
        "src":    ["autogroup:admin"],
        "dst": [
            "tag:k8s-operator:*",
            "tag:k8s:*",
        ],
    },
    {
        // allow cross cluster pods to communicate with each other
        "action": "accept",
        "src":    ["tag:k8s"],
        "dst": [
            "tag:k8s:*",
        ],
    },
]

Then generate an OAuth client with the tag “tag:k8s-operator”. Add the helm repo and update it:

helm repo add tailscale https://pkgs.tailscale.com/helmcharts
helm repo update

Install the Kubernetes operator. Since I wanted to try out both the ingress and auth proxy options, I added --set-string apiServerProxyConfig.mode="true", which configures the proxy and installs the needed RBAC role.

helm upgrade \
  --install \
  tailscale-operator \
  tailscale/tailscale-operator \
  --namespace=tailscale \
  --create-namespace \
  --set-string oauth.clientId=<OAuth client ID> \
  --set-string oauth.clientSecret=<OAuth client secret> \
  --set-string apiServerProxyConfig.mode="true" \
  --wait

After a few minutes, you should see an operator pod in the output of kubectl get pods -n tailscale if everything is working. If the OAuth client and tags were set up correctly, a new machine will appear as tailscale-operator in the Tailscale Machines list and in the output of tailscale status, confirming it has joined the tailnet. A tailscale ping tailscale-operator may even show the NAT traversal magic happening in real time:

> tailscale ping tailscale-operator
pong from tailscale-operator (100.65.146.77) via DERP(lhr) in 57ms
pong from tailscale-operator (100.65.146.77) via DERP(lhr) in 42ms
pong from tailscale-operator (100.65.146.77) via DERP(lhr) in 33ms
pong from tailscale-operator (100.65.146.77) via 10.98.32.129:52643 in 1ms

Below is an example manifest that shows two ways to expose a service: first by annotating the service itself with tailscale.com/expose, and second by adding an Ingress. The difference between the two is that annotating the service exposes all of the service’s ports to the tailnet, while an Ingress serves only HTTPS and so requires MagicDNS and HTTPS certificates to be enabled for the tailnet. An Ingress also allows sharing a service with the public via Funnel. In practice, annotating a service is most useful when combined with the egress option, letting you bridge services across clusters over a Tailscale tunnel.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app.kubernetes.io/name: proxy
spec:
  containers:
  - name: nginx
    image: nginx:stable
    ports:
      - containerPort: 80
        name: http-web-svc
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  annotations:
    tailscale.com/expose: "true"
    tailscale.com/tags: "tag:k8s"
spec:
  selector:
    app.kubernetes.io/name: proxy
  ports:
  - name: name-of-service-port
    protocol: TCP
    port: 80
    targetPort: http-web-svc
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  annotations:
    tailscale.com/tags: "tag:k8s"
spec:
  defaultBackend:
    service:
      name: nginx-service
      port:
        number: 80
  ingressClassName: tailscale
  tls:
  - hosts:
    - nginx

With the manifest deployed, tailscale status should show nginx (the Ingress node) and default-nginx-service (the Service node); pinging both should confirm they are reachable from a test machine if the policies were set up correctly:

> tailscale ping nginx
pong from nginx (100.111.59.138) via DERP(lhr) in 39ms
pong from nginx (100.111.59.138) via DERP(lhr) in 49ms
pong from nginx (100.111.59.138) via DERP(lhr) in 33ms
pong from nginx (100.111.59.138) via 10.90.8.7:56750 in 5ms
> tailscale ping default-nginx-service
pong from default-nginx-service (100.111.249.130) via DERP(lhr) in 32ms
pong from default-nginx-service (100.111.249.130) via DERP(lhr) in 29ms
pong from default-nginx-service (100.111.249.130) via DERP(lhr) in 31ms
pong from default-nginx-service (100.111.249.130) via 10.90.8.7:7663 in 5ms

Curling both of them will show the same default nginx page. The first attempt at curl https://nginx may take a moment or even time out, as the Tailscale serve process requests a certificate from Let’s Encrypt on first use. Since the annotated service is just exposing port 80, it should be immediately available via curl http://default-nginx-service. If deployed using the above manifests, the logs for the Ingress-based Tailscale pod are available with kubectl logs -l tailscale.com/parent-resource=nginx -n tailscale and the logs for the Service-annotation-based Tailscale pod with kubectl logs -l tailscale.com/parent-resource=nginx-service -n tailscale.
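Put together, the checks from a machine on the tailnet look roughly like this (the short hostnames assume MagicDNS is enabled; adjust if your tailnet requires full names):

# Ingress-backed hostname, HTTPS only; the first request may wait on the certificate
curl https://nginx

# annotated Service, exposed ports are reachable directly over plain HTTP
curl http://default-nginx-service

# logs for the proxy pods the operator created for each resource
kubectl logs -l tailscale.com/parent-resource=nginx -n tailscale
kubectl logs -l tailscale.com/parent-resource=nginx-service -n tailscale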

To try cross-cluster communication, I’m going to use the Docker Desktop Kubernetes cluster on my laptop and add the Tailscale operator there, reusing the same tags and helm commands. It’s possible to customize the operator hostname, but in this instance I’m relying on Tailscale to rename the second operator for me. Once that is running, I’ll deploy a second manifest; this one exposes the default-nginx-service created in the first cluster, but to do that the manifest has to be updated with the full MagicDNS name of the service (at least until this is merged).

apiVersion: v1
kind: Service
metadata:
  annotations:
    tailscale.com/tailnet-fqdn: default-nginx-service.<full magicdns of host>
    tailscale.com/tags: "tag:k8s"
  name: nginx-remote
spec:
  externalName: nginx-remote
  type: ExternalName

Once the manifest is applied, check the service and wait for the EXTERNAL-IP column to populate. Once it does, the proxy is connected to the other cluster, which can be verified by running curl against that domain from inside a pod in this cluster.

> kubectl get service nginx-remote
NAME           TYPE           CLUSTER-IP   EXTERNAL-IP                                         PORT(S)   AGE
nginx-remote   ExternalName   <none>       ts-nginx-remote-vpbgg.tailscale.svc.cluster.local   <none>    92s

And then, checking from inside the cluster, the generic nginx page comes back as expected:

> kubectl run curl --image=curlimages/curl -i --tty -- sh
~ $ curl ts-nginx-remote-vpbgg.tailscale.svc.cluster.local
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...

That’s an HTTP request running from cluster B on my laptop, being routed via Tailscale to the nginx service running on cluster A. I’m not navigating any difficult firewalls or exposing the service in cluster A to the public internet just so I can access it from my laptop. No complex SSH proxies are involved; it just works. For devs wanting to test parts of a Kubernetes-based application, this is a great way to reach large datasets or common services that they would otherwise have to replicate in their development environments. Exposing the services to Tailscale in the first part already made it easy for a docker container on a laptop to connect with them, but this lets me verify that a whole app deployment works properly and interacts with the service as expected. And if cluster A was hosting my backend services while I worked on the front end in cluster B, and I wanted to share my UI revisions with the team at large, I could add a Tailscale Ingress in front of my in-progress service and share that with the rest of my team, who would be able to access it over Tailscale connections (so from Sarah’s laptop => ingress pod on my laptop => my front end app => egress to dev cluster => ingress/service for backend app on dev cluster).
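For that sharing scenario, the extra piece is just another Tailscale Ingress on the laptop cluster, something like the sketch below (frontend-service is a hypothetical name for the in-progress UI service, which would reach the backend through the nginx-remote egress service created above):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend
  annotations:
    tailscale.com/tags: "tag:k8s"
spec:
  defaultBackend:
    service:
      name: frontend-service  # hypothetical in-progress UI service
      port:
        number: 80
  ingressClassName: tailscale
  tls:
  - hosts:
    - frontend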

Auth Mode

By installing the Operator with --set-string apiServerProxyConfig.mode="true", the cluster is almost ready for controlled access via Tailscale. There is an additional step: creating a grant, a newer aspect of ACLs that dictates which users can connect to the cluster in proxy mode. Grants are a more fine-grained access control mechanism and include application-level permissions in addition to network-level rules. For the Kubernetes operator, this is where you specify which roles in a cluster can be impersonated by specific Tailscale users or groups. Grants also support device postures, a new feature that restricts which of a user’s devices are allowed to access the granted application or destination. In the example below, it means that only admins connecting from their MacBooks get full admin access to the cluster. This can be augmented with other device management tools to restrict it another level: only admins on devices managed by CrowdStrike are allowed access.

"grants": [{
    "src": ["group:admin"],
    "srcPosture": ["posture:latestMac"],
    "dst": ["tag:k8s-operator"],
    "app": {
        "tailscale.com/cap/kubernetes": [{
            "impersonate": {
                "groups": ["system:masters"],
            },
        }],
    },
}],
"postures": {
    "posture:latestMac": [
        "node:os IN ['macos']",
        "node:tsReleaseTrack == 'stable'",
        "node:tsVersion >= '1.60'",
    ],
},

With the grant in place, combined with the previous ACL allowing traffic to reach tag:k8s-operator, deploying the kubeconfig is as simple as tailscale configure kubeconfig tailscale-operator. The kubeconfig is generated with the MagicDNS hostname and certificate, letting you quickly switch to the new cluster with kubectx tailscale-operator and perform the needed actions. In production, instead of a static group of admins with admin access to the cluster, the admin group can be rotated to whoever is on call (or governed by similar control mechanisms), while a larger group of users gets a view role to aid in debugging a service or checking on its health. The documentation on setting up authentication and authorization gives an example of the view grant along with creating the role in the cluster. What’s interesting about the grants system is that it’s not just for internal Tailscale services; it’s meant to provide a hook for others to build around, with the Operator as a Tailscale-provided reference implementation.
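As a sketch of what that second tier could look like (group:dev and the tailnet-readers group name are placeholders of my own, not values from the docs), a grant impersonates a Kubernetes group and a ClusterRoleBinding maps that group to the built-in view ClusterRole:

"grants": [{
    "src": ["group:dev"],
    "dst": ["tag:k8s-operator"],
    "app": {
        "tailscale.com/cap/kubernetes": [{
            "impersonate": {
                "groups": ["tailnet-readers"],
            },
        }],
    },
}],

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tailnet-readers-view
subjects:
- kind: Group
  name: tailnet-readers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io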

Visibility into who performed the actions is still preserved, as the Operator logs to the kube-apiserver which Tailscale user it is acting on behalf of, shown here:

"kind": "Event",
"apiVersion": "audit.k8s.io/v1",
...
"requestURI": "/api/v1/namespaces/default/pods/nginx",
"verb": "get",
"user": {
    "username": "system:serviceaccount:tailscale:operator",
    "uid": "7b852fc3-1335-4746-a9e3-c4970a4998a5",
    "groups": [
        "system:serviceaccounts",
        "system:serviceaccounts:tailscale",
        "system:authenticated"
    ],
...
},
"impersonatedUser": {
    "username": "chris@sneezingdog.com",
    "groups": [
        "system:masters",
        "system:authenticated"
    ]
},

Impressions

The Operator is really powerful and a huge usability improvement over the previous method of hand-rolling services and sidecars with the Tailscale container. For businesses, the Operator lets developers access and work on clusters securely, with minimal disruption to their workflow. Egress support makes bridging clusters really easy and could lower the number of developers who need massive workstations to run a full stack: if all they’re working on is a handful of components, they can access the rest remotely. With the point-to-point nature of Tailscale’s tunnels, rolling out the tooling and features to a company can be done fairly easily. Everyone gets Tailscale on their machines, and as services are activated and policies updated centrally, they take effect on those machines. The rare occasion where a feature requires a user to update their Tailscale package is less of a worry thanks to the auto-update and device management integrations they offer. How easily and quickly users can adopt a tool has a bigger impact on overall organizational security than how strong the crypto is. If users can’t or won’t adopt it, or it is so burdensome to use that they find workarounds, or it is so costly that not every user is protected by it, the effectiveness of the tool wanes. It becomes more likely for doors to be left open or for simple lateral movement to be achieved, because the tool fails to provide both a secure perimeter and internal boundaries.

Some features specific to the Operator that could round out the experience:

  • Egress via App Connectors: for things like CI workers, or companies building SaaS tools that want specific pods or namespaces to interact with external systems from stable IPs, this would be really useful.
  • Checks like SSH for the Auth Mode proxy: requiring a user to re-authenticate when accessing specific clusters would add a layer of security and helps mitigate the stolen device and credentials scenario. If this were possible today, I know the alternative centrally auditable Kubernetes tool I deployed at the last company I was at could be replaced with Tailscale.
  • Kubeconfig definitions in Tailscale: for orgs using noauth mode for the Operator, allowing tailscale configure kubeconfig to generate a template with the definition of the cluster in Tailscale would cut out a step and let people follow the same workflow regardless of whether the cluster is authed by Tailscale or not.