What happens when a pod in AKS initiates a connection to a private endpoint? Which private IP address does the outbound connection use?
This is relevant for a private IP inside the same VNET, a peered VNET or an IP accessible via a VPN or Express Route.
In general, the private IP of the VM is used as an outbound private IP. This shouldn’t be confused with the slightly more complex rules about which public IP is used when we contact a public endpoint.
But if we are using the kubenet plugin, Kubernetes does its own network virtualization, where pods get an IP that is internal to the cluster. Would that IP be used as the outbound IP?
I thought it would be interesting to simply run an experiment to determine the answer.
Basically, we will deploy an AKS cluster with the kubenet plugin in a subnet. In a separate subnet, we are going to deploy an Azure Container Instance (ACI). On the latter subnet, we are going to deploy a Network Security Group (NSG) to guard incoming connections.
We are going to test a connection to ACI from AKS and see how we can block it with NSGs.
As usual, the code is on GitHub.
Creating the cluster
Let’s start by downloading a script file from GitHub:
curl https://raw.githubusercontent.com/vplauzon/aks/master/kubenet-outbound/create-cluster.sh \
  > create-cluster.sh
We are going to run that script with six parameters:
| Parameter | Description |
|---|---|
| Name of the resource group | If the group doesn’t exist, the script will create it |
| Azure region | Must be compatible with regions supporting ACI in VNET. At the time of this writing, i.e. end-of-March 2019, that means one of the following: EastUS2EUAP, CentralUSEUAP, WestUS, WestCentralUS, NorthEurope, WestEurope, EastUS or AustraliaEast. |
| Name of cluster | This is also used as the DNS prefix for the cluster, hence must be unique |
| Service Principal Application ID | Application ID of a Service Principal |
| Service Principal Object ID | Object ID of the same Service Principal |
| Service Principal Password | Password of the same Service Principal |
The last three parameters are related to the Service Principal that will be used by AKS.
Let’s run the command locally, e.g.:
./create-cluster.sh aks-group eastus myuniqueaks \
  <my-principal-app-id> \
  <my-principal-object-id> \
  <my-principal-password>
This will run for several minutes and create 4 resources in the resource group:
- A Network Security Group (NSG) named aciNsg
- A Virtual Network named cluster-vnet
- An Azure Container Instance (ACI) named myContainerGroup
- An AKS cluster named as specified in the script parameters
The script will also connect kubectl to the newly created cluster (az aks get-credentials).
ACI IP address
The script also outputs an IP address, e.g.:
Successfully deployed cluster myuniqueaks and ACI with IP 172.16.32.4
Connect kubectl to newly created cluster myuniqueaks...
Merged "myuniqueaks" as current context in /home/myusername/.kube/config
Here the IP is 172.16.32.4. Let’s copy that IP.
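Rather than copying the IP by hand, we could pull it out of the script's output line. Here is a minimal sketch, assuming the output keeps the format shown above; `extract_ip` is a hypothetical helper and not part of create-cluster.sh:

```shell
# Hypothetical helper: print the first IPv4-looking token found in a line of text.
extract_ip() {
  printf '%s\n' "$1" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n 1
}

# Sample line taken from the script output shown above.
line='Successfully deployed cluster myuniqueaks and ACI with IP 172.16.32.4'
ACI_IP=$(extract_ip "$line")
echo "$ACI_IP"
```

The pattern only checks the shape of the address (four dot-separated groups of digits), which is enough for this output but is not a full IPv4 validation.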
Connect to ACI
Let’s do the experiment. Let’s deploy an observer pod within AKS:
$ kubectl run --rm -it --image=appropriate/curl:latest observer --generator=run-pod/v1 --command sh
This lands our session on a command prompt within a pod.
Let’s try to contact ACI:
/ # watch -n 2 curl -v --connect-timeout 1 <ACI IP>
We should see something like the following, refreshing every 2 seconds:
Every 2s: curl -v --connect-timeout 1 172.16.32.4 2019-03-22 21:56:44
* Rebuilt URL to: 172.16.32.4/
* Trying 172.16.32.4...
* TCP_NODELAY set
* Connected to 172.16.32.4 (172.16.32.4) port 80 (#0)
> GET / HTTP/1.1
> Host: 172.16.32.4
> User-Agent: curl/7.59.0
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Content-Type: text/html; charset=utf-8
< Content-Length: 130
< Server: Werkzeug/0.14.1 Python/2.7.14
< Date: Fri, 22 Mar 2019 21:56:44 GMT
<
* Closing connection 0
<h3>Hello World!</h3><b>Hostname:</b> wk-caas-416604191f9b41ada1766436a3c4673b-203163b74dfca1b08abdec<br/><b>Visits:</b> undefined
The key part is that the connection is established, so AKS can talk to ACI.
Let’s go to the Azure Portal and look at the NSG:
The first rule lets in traffic coming from 172.16.0.0/20. This corresponds to the subnet occupied by the AKS nodes.
The second rule lets Azure Firewall probes pass (not used here but always good to have), while the third rule forbids all other traffic.
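To make the first rule concrete, the sketch below checks whether an address falls inside 172.16.0.0/20, which spans 172.16.0.0 to 172.16.15.255. The helpers `ip_to_int` and `ip_in_cidr` are hypothetical names introduced for illustration, and 172.16.0.4 is only an assumed example of an address in the AKS subnet, not a value taken from the experiment:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) when address $1 falls inside CIDR block $2.
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")   # network part, before the slash
  bits=${2#*/}                 # prefix length, after the slash
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# 172.16.0.4 is an assumed address in the AKS subnet; 172.16.32.4 is the ACI IP.
for candidate in 172.16.0.4 172.16.15.255 172.16.32.4; do
  if ip_in_cidr "$candidate" 172.16.0.0/20; then
    echo "$candidate is inside 172.16.0.0/20"
  else
    echo "$candidate is outside 172.16.0.0/20"
  fi
done
```

Note that the ACI address itself sits outside that range, since it lives in the other subnet; the rule only concerns the source of the traffic.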
Let’s modify the first rule by simply changing its priority from 100 to 400. We should end up with:
Now if we look at our watch, we should have something like the following:
Every 2s: curl -v --connect-timeout 1 172.16.32.4 2019-03-22 22:03:08
* Rebuilt URL to: 172.16.32.4/
* Trying 172.16.32.4...
* TCP_NODELAY set
* Connection timed out after 1001 milliseconds
* Closing connection 0
curl: (28) Connection timed out after 1001 milliseconds
Basically, the connection now fails.
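The reason is that NSG rules are processed in priority order, lowest number first, and processing stops at the first rule that matches. The sketch below models that behaviour for the rules matching traffic from the AKS subnet. The deny rule's priority of 300 is an assumption made for illustration (the actual value isn't stated here), and `first_match` is a hypothetical helper:

```shell
# Print the action of the matching rule with the lowest priority number.
# Input: one matching rule per line on stdin, formatted as "<priority> <action>".
first_match() {
  sort -n | head -n 1 | awk '{ print $2 }'
}

# Before the change: the allow rule (100) is evaluated ahead of the
# deny rule (300 is an assumed value).
before=$(printf '100 Allow\n300 Deny\n' | first_match)
echo "before: $before"

# After the change: the allow rule moved to 400, so the deny rule is hit first.
after=$(printf '400 Allow\n300 Deny\n' | first_match)
echo "after: $after"
```

Only the relative order matters: any deny priority between 100 and 400 would give the same before and after outcomes.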
We can conclude from our experiment that outbound connections from an AKS cluster using the kubenet plugin still originate from within the AKS subnet: the destination sees a source address in the nodes' subnet range.
This means that to allow access for AKS workloads, we simply need to “whitelist” the AKS subnet.
That whitelisting doesn’t discriminate between AKS workloads: any pod running on the cluster will be whitelisted.