RKE2 Cluster Installation With Ansible
Example commands and configs are for three masters, three workers, and an additional jump node, all running Ubuntu.
Example topology:
| Name      | IP           |
|-----------|--------------|
| Master-01 | 10.40.140.4  |
| Master-02 | 10.40.140.5  |
| Master-03 | 10.40.140.6  |
| Worker-01 | 10.40.140.7  |
| Worker-02 | 10.40.140.8  |
| Worker-03 | 10.40.140.9  |
| Jump      | 10.40.140.10 |
- ssh into the jump machine:
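For example (the cleanup section later assumes root, so root is used here; substitute your own user if you log in as someone else):

```bash
ssh root@10.40.140.10
```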
Enter the ssh password when prompted
Set up passwordless SSH
- Generate an ssh key:
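Depending on your OpenSSH version the default key type may not be RSA, so `-t rsa` is passed explicitly here to match the `id_rsa` file name used below:

```bash
ssh-keygen -t rsa
```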
Simply press Enter when prompted to accept the default file name and an empty passphrase; this will create `~/.ssh/id_rsa`.
- Copy the ssh key to the master and worker nodes:
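For example, using ssh-copy-id:

```bash
ssh-copy-id root@10.40.140.4
```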
Enter the ssh password when prompted and repeat for all master and worker nodes.
- Add the ssh key to the ssh-agent:
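```bash
ssh-add ~/.ssh/id_rsa
```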
If you get an error, run the following and try again:
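The usual cause is that no agent is running in the current shell; this standard OpenSSH command starts one:

```bash
eval "$(ssh-agent -s)"
```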
Install Ansible and the RKE2 deployment role
- Install Ansible:
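On Ubuntu, apt is the simplest route (any other Ansible install method works just as well):

```bash
sudo apt update && sudo apt install -y ansible
```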
- Install the lablabs/rke2 Ansible role (https://galaxy.ansible.com/lablabs/rke2):
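```bash
ansible-galaxy install lablabs.rke2
```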
This will install the role under the `~/.ansible` directory.
- Navigate to the `~/.ansible` directory:
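```bash
cd ~/.ansible
```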
- Create the inventory file in `~/.ansible` (it can also be copied over from a local machine with scp). Open it with vi, as shown below, and go into insert mode by pressing i.
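The file name `inventory.ini` is an arbitrary choice for this guide; any name works as long as you pass the same name to `-i` later:

```bash
vi inventory.ini
```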
Copy the following and paste with ctrl+shift+v
```ini
[masters]
master-01 ansible_host=10.40.140.4 rke2_type=server
master-02 ansible_host=10.40.140.5 rke2_type=server
master-03 ansible_host=10.40.140.6 rke2_type=server

[workers]
worker-01 ansible_host=10.40.140.7 rke2_type=agent
worker-02 ansible_host=10.40.140.8 rke2_type=agent
worker-03 ansible_host=10.40.140.9 rke2_type=agent

[k8s_cluster:children]
masters
workers
```
Host names (ex: master-01) can be changed; however, they must be lowercase.
Save and quit by pressing Esc, then :wq!
- Create a playbook.yaml file in `~/.ansible` (check the previous step for help with vi):

```yaml
- name: Deploy RKE2
  hosts: all
  become: yes
  vars:
    rke2_ha_mode: true
    rke2_api_ip: 10.40.140.4
    rke2_download_kubeconf: true
    rke2_ha_mode_keepalived: false
    rke2_server_node_taints:
      - 'CriticalAddonsOnly=true:NoExecute'
  roles:
    - role: lablabs.rke2
```
`rke2_api_ip` should point to your load balancer for the master nodes; this should be a TCP load balancer on port 6443. If you don't have one, it can point to one of the master nodes. Alternatively, you can use keepalived by removing the `rke2_ha_mode_keepalived` line and pointing `rke2_api_ip` to an unused static IP such as 10.40.140.100.
- Confirm the inventory is working:
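One quick check is Ansible's ping module against every host in the inventory:

```bash
ansible all -m ping -i inventory.ini
```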
There should be no errors.
Run the playbook and confirm the RKE2 cluster is up
- Run the Ansible playbook:
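Assuming the inventory file name from the earlier step:

```bash
ansible-playbook playbook.yaml -i inventory.ini
```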
If you're ssh'ing to the other machines as a non-root user, run the following instead:
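Here `yourusername` is a placeholder for the account you copied SSH keys to; `-u` sets the remote user and `-K` prompts for the sudo (become) password:

```bash
ansible-playbook playbook.yaml -i inventory.ini -u yourusername -K
```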
Provide the password for the user you're logging in as when prompted.
If it hangs at 'Wait for remaining nodes to be ready', check whether rke2 is installed on all machines; if it isn't, the `rke2.sh` script may not be running. To fix this, edit `~/.ansible/roles/lablabs.rke2/tasks/rke2.yml` by removing the `Check RKE2 version` task and replacing it with:

```yaml
- name: Check rke2 bin exists
  ansible.builtin.stat:
    path: "{{ rke2_bin_path }}"
  register: rke2_exists

- name: Check RKE2 version
  ansible.builtin.shell: |
    set -o pipefail
    {{ rke2_bin_path }} --version | grep -E "rke2 version" | awk '{print $3}'
  args:
    executable: /bin/bash
  changed_when: false
  register: installed_rke2_version
  when: rke2_exists.stat.exists
```
- Install kubectl:
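One option on Ubuntu is the snap package (any kubectl install method works):

```bash
sudo snap install kubectl --classic
```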
- Copy the kubeconfig file to a better directory and export the kubeconfig:
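With its default settings, the role's `rke2_download_kubeconf` option places the file at `/tmp/rke2.yaml` (an assumption; adjust the source path if you changed the role's download path or file name):

```bash
cp /tmp/rke2.yaml ~/rke2.yaml
```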
Now we can manage our cluster with `kubectl --kubeconfig ~/rke2.yaml`, or we can do the following to shorten our commands:
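```bash
# point kubectl at the RKE2 kubeconfig for this shell session
export KUBECONFIG=~/rke2.yaml
```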
- Confirm our cluster is running with the correct internal IP addresses:
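```bash
# the INTERNAL-IP column should match the topology table above
kubectl get nodes -o wide
```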
- Check the health of our pods:
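```bash
# every pod should be Running or Completed
kubectl get pods -A
```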
Cleanup
Jump
- ssh into the Jump machine:
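```bash
ssh root@10.40.140.10
```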
- Make sure we're using the correct kubeconfig:
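```bash
export KUBECONFIG=~/rke2.yaml
```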
- Remove kubectl and Ansible:
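Mirroring the install methods used earlier (adjust if you installed them differently):

```bash
sudo snap remove kubectl
sudo apt remove -y ansible
```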
- Remove remaining artifacts:
```bash
rm -rf /root/.ansible
rm /root/.ssh/id_rsa /root/.ssh/id_rsa.pub
rm ~/rke2.yaml
rm -rf /root/.kube
apt autoremove -y
```
If you followed the installation guide as a non-root user, the directories might be different.
Master and Worker nodes
Repeat the following on all master and worker nodes
- ssh into the node:
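```bash
# repeat for each node in the topology table
ssh root@10.40.140.4
```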
- Run the RKE2 uninstall script:
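On Ubuntu, the RKE2 installer places the uninstall script at `/usr/local/bin/rke2-uninstall.sh`:

```bash
/usr/local/bin/rke2-uninstall.sh
```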