Requirements

Three Linux virtual machines with Internet access. The distribution I use is Rocky Linux 9.4. Minimum configuration: 2 CPUs, 2 GB RAM, 20 GB disk. The IPs and hostnames of the three VMs are as follows:
k8s-master  192.168.52.200  (control-plane node)
k8s-node1   192.168.52.201  (worker node 1)
k8s-node2   192.168.52.202  (worker node 2)
0. Preparation

Disable the firewall:

systemctl stop firewalld
systemctl disable firewalld
Disable SELinux:

setenforce 0
sed --follow-symlinks -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
Set the hostnames. On the control-plane node, run:
hostnamectl set-hostname k8s-master
On worker node 1, run:
hostnamectl set-hostname k8s-node1
On worker node 2, run:
hostnamectl set-hostname k8s-node2
Edit the hosts file. On every machine, add the IPs and hostnames of the three VMs to /etc/hosts:
192.168.52.200 k8s-master
192.168.52.201 k8s-node1
192.168.52.202 k8s-node2
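A quick sanity check is to ping each peer by name, for example from the control-plane node:

ping -c 1 k8s-node1
ping -c 1 k8s-node2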
Disable swap. By default, kubelet refuses to start when it detects swap memory on the node, so disable swap first. To disable it temporarily (until the next reboot):

swapoff -a
To disable swap permanently, edit /etc/fstab and remove (or comment out) the line that mounts the swap partition.
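A non-interactive sketch of that permanent edit, assuming a standard fstab layout (it prefixes any line whose entry mounts swap with '#'; inspect /etc/fstab afterwards to confirm):

sed -i '/\sswap\s/s/^/#/' /etc/fstab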
Load kernel modules, then set kernel parameters. To load the modules for the current boot:
modprobe ip_vs_rr
modprobe br_netfilter
To load them automatically at every boot, create /etc/modules-load.d/k8s.conf (e.g. vim /etc/modules-load.d/k8s.conf).
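The modules-load.d file format is simply one module name per line, so given the two modules loaded above it can be written like this:

cat <<EOF > /etc/modules-load.d/k8s.conf
ip_vs_rr
br_netfilter
EOF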
Then set the kernel parameters:
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl -p /etc/sysctl.d/k8s.conf
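To spot-check that the parameters took effect, sysctl can read them back:

sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables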
1. Install containerd, kubeadm, kubelet, kubectl, and Calico

Machines in mainland China should first switch to a domestic YUM mirror:

sed -e 's|^mirrorlist=|#mirrorlist=|g' \
    -e 's|^#baseurl=http://dl.rockylinux.org/$contentdir|baseurl=https://mirrors.aliyun.com/rockylinux|g' \
    -i.bak \
    /etc/yum.repos.d/[Rr]ocky-*.repo
dnf makecache
Install containerd on all machines:

dnf config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo  # Aliyun mirror, for users in China
#dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
dnf install -y containerd
Configure containerd. First generate the default configuration:
containerd config default > /etc/containerd/config.toml
Then edit /etc/containerd/config.toml:
- set SystemdCgroup to true, so containerd uses the systemd cgroup driver that kubelet expects;
- change sandbox_image to registry.aliyuncs.com/google_containers/pause:3.9 (users in China need this step to pull the pause image from Alibaba's mirror).
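If you prefer to script those two edits, here is a minimal sed sketch, assuming the key names and defaults of the containerd 1.7 default config dump (verify the file afterwards):

# switch the runc runtime to the systemd cgroup driver
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# point the sandbox (pause) image at Alibaba's mirror
sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"#' /etc/containerd/config.toml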
Start containerd:
systemctl daemon-reload
systemctl enable --now containerd
Check the containerd and runc versions:
# ctr version
Client:
  Version:  1.7.24
# runc -v
runc version 1.2.2
Install kubeadm, kubelet, and kubectl on all machines. First add the Kubernetes repo:
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
Then install kubelet, kubeadm, and kubectl via yum:
yum install -y kubelet kubeadm kubectl
systemctl enable --now kubelet
Check the kubelet, kubeadm, and kubectl versions:
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.2", GitCommit:"89a4ea3e1e4ddd7f7572286090359983e0387b2f", GitTreeState:"clean", BuildDate:"2023-09-13T09:34:32Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}
[root@k8s-master yum.repos.d]# kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
The connection to the server localhost:8080 was refused - did you specify the right host or port?
[root@k8s-master yum.repos.d]# kubelet --version
Kubernetes v1.28.2
Note: kubelet will now restart every few seconds; it is stuck in a crash loop waiting for instructions from kubeadm, which is expected at this stage. Next, run kubeadm init on the control-plane node to initialize the cluster.
2. Initialize the Kubernetes cluster

Run kubeadm init on the control-plane node:

kubeadm init --apiserver-advertise-address 192.168.52.200 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.28.2 \
  --pod-network-cidr=198.18.0.0/16
Parameter notes:
--apiserver-advertise-address: the address the API server advertises; use the control-plane node's IP.
--image-repository: users in China should point this at Alibaba Cloud's mirror; the default registry is hosted overseas and may be unreachable.
--kubernetes-version: the Kubernetes version to deploy.
--pod-network-cidr=198.18.0.0/16: the Pod IP range; customize it for your network, picking a range that does not conflict with any other addresses (198.18.0.0/15 is reserved for benchmarking by RFC 2544, which makes this choice unlikely to collide with a real network).
This takes a while; be patient for a few minutes. On success it prints the following, telling you what to do next:
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.52.200:6443 --token zk8fth.5psohqfk9lomq0tw \
  --discovery-token-ca-cert-hash sha256:f32851bb6a86cc7f0a394f1d77e1db5b217cde1b0f40909ee3916959519173f7
I am running as root, so following the hint above I only need to export the KUBECONFIG environment variable. Edit /etc/profile and append this line at the end:
export KUBECONFIG=/etc/kubernetes/admin.conf
Make the environment variable take effect immediately:

source /etc/profile
At this point kubectl can already list the pods below, but the coredns pods are stuck in Pending. The next step is to install a network plugin on the control-plane node.
kubectl get pods -A
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
kube-system   coredns-66f779496c-8ctsc              0/1     Pending   0          3m11s
kube-system   coredns-66f779496c-hx76v              0/1     Pending   0          3m11s
kube-system   etcd-k8s-master                       1/1     Running   0          3m24s
kube-system   kube-apiserver-k8s-master             1/1     Running   0          3m24s
kube-system   kube-controller-manager-k8s-master    1/1     Running   0          3m24s
kube-system   kube-proxy-89c9k                      1/1     Running   0          3m11s
kube-system   kube-scheduler-k8s-master             1/1     Running   0          3m24s
3. Install the Calico network plugin on the control-plane node

For the network plugin you can choose Calico or Flannel; I chose Calico and installed the latest version, following the official quickstart: https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart
1. Install the Tigera Calico operator and custom resource definitions.
yum install -y wget
wget https://raw.githubusercontent.com/projectcalico/calico/v3.29.1/manifests/tigera-operator.yaml
kubectl create -f tigera-operator.yaml
2. Install Calico by creating the necessary custom resource.
wget https://raw.githubusercontent.com/projectcalico/calico/v3.29.1/manifests/custom-resources.yaml
# edit custom-resources.yaml: change the pool cidr to 198.18.0.0/16, matching --pod-network-cidr
kubectl create -f custom-resources.yaml
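If you want to script that edit, here is a one-line sketch, assuming the manifest still ships Calico's default pool CIDR of 192.168.0.0/16:

# replace the default IP pool CIDR with the one passed to kubeadm init
sed -i 's#cidr: 192.168.0.0/16#cidr: 198.18.0.0/16#' custom-resources.yaml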
3. Confirm that all of the pods are running with the following command.
watch kubectl get pods -n calico-system
Wait until each pod has the STATUS of Running.
4. Remove the taints on the control plane so that you can schedule pods on it.
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
node/k8s-master untainted
5. Confirm that you now have a node in your cluster with the following command.
kubectl get nodes -o wide
NAME         STATUS   ROLES           AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                      KERNEL-VERSION                 CONTAINER-RUNTIME
k8s-master   Ready    control-plane   5h16m   v1.28.2   10.206.216.96   <none>        Rocky Linux 9.4 (Blue Onyx)   5.14.0-427.13.1.el9_4.x86_64   containerd://1.7.24
For users in China, the most common problem is pods failing to start because images cannot be pulled. Run kubectl describe on the failing pod to see which image pull failed, pull that image from a domestic mirror, then retag it. The images the Calico installation needs are:
[root@k8s-master ~]# ctr -n k8s.io image list | awk '{print $1}'
REF
docker.io/calico/apiserver:v3.29.1
docker.io/calico/cni:v3.29.1
docker.io/calico/csi:v3.29.1
docker.io/calico/kube-controllers:v3.29.1
docker.io/calico/node-driver-registrar:v3.29.1
docker.io/calico/node:v3.29.1
docker.io/calico/pod2daemon-flexvol:v3.29.1
docker.io/calico/typha:v3.29.1
quay.io/tigera/operator:v1.36.2
A mirror site for users in China: https://docker.aityp.com
For example, to pull the docker.io/calico/kube-controllers:v3.29.1 image manually and retag it (do this on every node that needs the image):
ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/kube-controllers:v3.29.1
ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/kube-controllers:v3.29.1 docker.io/calico/kube-controllers:v3.29.1
Install calicoctl, following the official docs: https://docs.tigera.io/calico/latest/operations/calicoctl/install
curl -L https://github.com/projectcalico/calico/releases/download/v3.29.1/calicoctl-linux-amd64 -o calicoctl
chmod +x ./calicoctl
cp calicoctl /usr/bin/
Check whether the Calico pods were created successfully:
kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-658d97c59c-gftzg   1/1     Running   0          12m
kube-system   calico-node-x7dhl                          1/1     Running   0          12m
kube-system   coredns-66f779496c-8ctsc                   1/1     Running   0          62m
kube-system   coredns-66f779496c-hx76v                   1/1     Running   0          62m
kube-system   etcd-k8s-master                            1/1     Running   0          62m
kube-system   kube-apiserver-k8s-master                  1/1     Running   0          62m
kube-system   kube-controller-manager-k8s-master         1/1     Running   0          62m
kube-system   kube-proxy-89c9k                           1/1     Running   0          62m
kube-system   kube-scheduler-k8s-master                  1/1     Running   0          62m
At this point, calicoctl can see the control-plane node k8s-master:
# calicoctl node status
Calico process is running.

IPv4 BGP status
No IPv4 peers found.

IPv6 BGP status
No IPv6 peers found.

# calicoctl get nodes
NAME
k8s-master
4. Join the two worker nodes to the cluster

First get the token on the control-plane node:

kubeadm token list | awk '{print $1}'
TOKEN
zk8fth.5psohqfk9lomq0tw
By default the token expires after 24 hours. If it has expired, create a new one on the control-plane node.
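For example, kubeadm can mint a fresh token and print the matching join command in one step:

kubeadm token create --print-join-command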
Then get the value for --discovery-token-ca-cert-hash from the control-plane node:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | \
  openssl dgst -sha256 -hex | sed 's/^.* //'
f32851bb6a86cc7f0a394f1d77e1db5b217cde1b0f40909ee3916959519173f7
Finally, run the kubeadm join command on both worker nodes to join them to the cluster:

kubeadm join --token zk8fth.5psohqfk9lomq0tw \
  192.168.52.200:6443 \
  --discovery-token-ca-cert-hash sha256:f32851bb6a86cc7f0a394f1d77e1db5b217cde1b0f40909ee3916959519173f7
[preflight] Running pre-flight checks
	[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Enable kubelet to start at boot, as the join warning above suggests:
systemctl enable kubelet.service
Back on the control-plane node, check whether the workers joined successfully. Wait a few minutes until all pods are created, like this:
watch kubectl get pods -A
NAMESPACE          NAME                                       READY   STATUS    RESTARTS   AGE
calico-apiserver   calico-apiserver-7f67554766-dfzc9          1/1     Running   0          22m
calico-apiserver   calico-apiserver-7f67554766-xsdlk          1/1     Running   0          22m
calico-system      calico-kube-controllers-6b7df74554-qxqgf   1/1     Running   0          22m
calico-system      calico-node-7lhhg                          1/1     Running   0          10m
calico-system      calico-node-ksrfp                          1/1     Running   0          22m
calico-system      calico-node-lnt8q                          1/1     Running   0          10m
calico-system      calico-typha-6855cf6f56-dzfd4              1/1     Running   0          22m
calico-system      calico-typha-6855cf6f56-z77jt              1/1     Running   0          10m
calico-system      csi-node-driver-cn6h5                      2/2     Running   0          10m
calico-system      csi-node-driver-pqrw8                      2/2     Running   0          10m
calico-system      csi-node-driver-rhd5m                      2/2     Running   0          22m
kube-system        coredns-5dd5756b68-84wqr                   1/1     Running   0          5h32m
kube-system        coredns-5dd5756b68-hc9xm                   1/1     Running   0          5h32m
kube-system        etcd-k8s-master                            1/1     Running   0          5h32m
kube-system        kube-apiserver-k8s-master                  1/1     Running   0          5h32m
kube-system        kube-controller-manager-k8s-master         1/1     Running   0          5h32m
kube-system        kube-proxy-2hmd5                           1/1     Running   0          10m
kube-system        kube-proxy-4h2cz                           1/1     Running   0          10m
kube-system        kube-proxy-6qptd                           1/1     Running   0          5h32m
kube-system        kube-scheduler-k8s-master                  1/1     Running   0          5h32m
tigera-operator    tigera-operator-c7ccbd65-rddsw             1/1     Running   0          5h25m
You can see that both worker nodes are now Ready:
# kubectl get nodes
NAME         STATUS   ROLES           AGE     VERSION
k8s-master   Ready    control-plane   5h33m   v1.28.2
k8s-node1    Ready    <none>          11m     v1.28.2
k8s-node2    Ready    <none>          11m     v1.28.2

# calicoctl node status
Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+---------------+-------------------+-------+----------+-------------+
| 10.206.216.98 | node-to-node mesh | up    | 08:48:32 | Established |
| 10.206.216.99 | node-to-node mesh | up    | 08:49:43 | Established |
+---------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

# calicoctl get nodes
NAME
k8s-master
k8s-node1
k8s-node2
5. Test the cluster by deploying Nginx on it

Create an Nginx Deployment with the replica count set to 3 (save it as nginx-deployment.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3  # with three nodes, each will typically get one Nginx instance, though actual placement depends on resources and the scheduler
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
Deploy Nginx with kubectl apply -f nginx-deployment.yaml:
# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-7c79c4bf97-2bk2q   1/1     Running   0          22m   198.18.169.130   k8s-node2    <none>           <none>
nginx-deployment-7c79c4bf97-5pdr5   1/1     Running   0          22m   198.18.235.199   k8s-master   <none>           <none>
nginx-deployment-7c79c4bf97-w4s8h   1/1     Running   0          22m   198.18.36.66     k8s-node1    <none>           <none>
As you can see, each node is running one Nginx instance.
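As a quick smoke test, you can curl one of the pod IPs from the listing above from any cluster node; nginx answers on port 80 with its welcome page:

curl http://198.18.169.130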
Delete the Nginx pods on all nodes by deleting the Deployment:

[root@k8s-master ~]# kubectl delete deploy nginx-deployment
deployment.apps "nginx-deployment" deleted
6. Remove worker nodes from the cluster

On the control-plane node, delete the nodes:
kubectl delete node k8s-node1
kubectl delete node k8s-node2
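Optionally, if you want a node's workloads evicted gracefully before removal, the usual sequence is to drain it first, for example:

kubectl drain k8s-node1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node k8s-node1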
After a worker node has left the cluster, to rejoin it, run the following on the worker:
systemctl stop kubelet
rm -rf /etc/kubernetes/*
kubeadm join ...  # run on the worker to rejoin the cluster
References
[1] https://medium.com/@redswitches/install-kubernetes-on-rocky-linux-9-b01909d6ba72
[2] https://juejin.cn/post/7286669548740591673
[3] https://cloud.tencent.com/developer/article/2255721