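Requirements

Three Linux virtual machines with internet access. I used Rocky Linux 9.4.
Minimum configuration per VM: 2 CPUs, 2 GB of RAM, 20 GB of disk.
The IPs and hostnames of the three VMs are as follows:

    k8s-master   (k8s master node)
    k8s-node1    (k8s worker node 1)
    k8s-node2    (k8s worker node 2)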
0. Preparation

Disable the firewall

    systemctl stop firewalld
    systemctl disable firewalld

Disable SELinux

    setenforce 0
    sed --follow-symlinks -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
Set the hostnames

On the master node:

    hostnamectl set-hostname k8s-master

On worker node 1:

    hostnamectl set-hostname k8s-node1

On worker node 2:

    hostnamectl set-hostname k8s-node2
Edit the hosts file

On every machine, edit /etc/hosts and add the IP and hostname of each of the three VMs (one "IP hostname" pair per line):

    k8s-master
    k8s-node1
    k8s-node2
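A minimal sketch of making the hosts edit repeatable. The helper name add_host_entry and the 192.0.2.x addresses are placeholders for illustration (documentation-range IPs, not the real ones from this setup), so substitute each VM's actual IP. The demo writes to a scratch file; point the function at /etc/hosts only once the values are right.

```shell
#!/usr/bin/env bash
# add_host_entry FILE IP NAME  (hypothetical helper)
# Appends "IP NAME" to FILE unless NAME already appears as a hostname in it.
add_host_entry() {
    local file=$1 ip=$2 name=$3
    # Look for NAME as a whole field after the address column, skipping comments.
    if ! awk -v n="$name" '!/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Demo against a scratch file; the IPs are placeholders, not real values.
hosts_demo=$(mktemp)
printf '127.0.0.1 localhost\n' > "$hosts_demo"
add_host_entry "$hosts_demo" 192.0.2.10 k8s-master
add_host_entry "$hosts_demo" 192.0.2.11 k8s-node1
add_host_entry "$hosts_demo" 192.0.2.12 k8s-node2
add_host_entry "$hosts_demo" 192.0.2.10 k8s-master   # second call is a no-op
cat "$hosts_demo"
```

Because the function checks before appending, running it again on the same machine does not duplicate entries.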
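Disable swap

By default, kubelet refuses to start when it detects swap on the node, so disable swap first. To disable it temporarily, run swapoff -a. To disable it permanently, edit /etc/fstab and delete the line that configures the swap partition.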
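The permanent change can also be scripted. The sketch below is one possible approach, not the exact edit described above: it comments the swap entry out instead of deleting it, so the change is easy to revert. The helper name comment_out_swap and the device names in the demo are made up for illustration.

```shell
#!/usr/bin/env bash
# comment_out_swap FILE  (hypothetical helper)
# Comments out every active fstab entry whose filesystem type (3rd field)
# is "swap". The original file is kept as FILE.bak.
comment_out_swap() {
    local file=$1
    sed -i.bak -E '/^[[:space:]]*#/! s/^([^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]].*)$/# \1/' "$file"
}

# Demo on a scratch fstab with made-up device names.
fstab_demo=$(mktemp)
cat > "$fstab_demo" <<'EOF'
/dev/mapper/rl-root /     xfs  defaults 0 0
/dev/mapper/rl-swap none  swap defaults 0 0
EOF
comment_out_swap "$fstab_demo"
cat "$fstab_demo"

# On a real node (as root), after checking the demo output:
#   swapoff -a
#   comment_out_swap /etc/fstab
```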
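Load kernel modules, then set kernel parameters

Load the kernel modules for the current boot:

    modprobe ip_vs_rr
    modprobe br_netfilter

To load them automatically on every boot, list them (one module name per line) in /etc/modules-load.d/k8s.conf:

    vim /etc/modules-load.d/k8s.conf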
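The contents of k8s.conf are not shown here. A reasonable assumption is that the file lists the same two modules that were loaded with modprobe, one per line. The sketch below writes such a file without opening an editor; write_modules_conf is a hypothetical helper name, and the demo targets a scratch file rather than the real path.

```shell
#!/usr/bin/env bash
# write_modules_conf FILE MODULE...  (hypothetical helper)
# Writes one kernel module name per line, the format read from
# /etc/modules-load.d/*.conf at boot.
write_modules_conf() {
    local file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Demo on a scratch file. On a real node (as root) the target would be
# /etc/modules-load.d/k8s.conf.
modules_demo=$(mktemp)
write_modules_conf "$modules_demo" ip_vs_rr br_netfilter
cat "$modules_demo"
```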
Set the kernel parameters:

    cat <<EOF | tee /etc/sysctl.d/k8s.conf
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables = 1
    net.ipv4.ip_forward = 1
    EOF
    sysctl -p /etc/sysctl.d/k8s.conf
1. Install containerd, kubeadm, kubelet, kubectl, and calico

Machines in mainland China should switch to a domestic YUM mirror:

    sed -e 's|^mirrorlist=|#mirrorlist=|g' \
        -e 's|^#baseurl=http://dl.rockylinux.org/$contentdir|baseurl=https://mirrors.aliyun.com/rockylinux|g' \
        -i.bak \
        /etc/yum.repos.d/[Rr]ocky*.repo
    dnf makecache
Install containerd on all machines

    dnf config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo  # Aliyun mirror, for machines in mainland China
    #dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
    dnf install -y containerd

Generate the default configuration:

    containerd config default > /etc/containerd/config.toml
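One adjustment that is not part of the steps in this post but is commonly recommended: kubeadm configures the kubelet to use the systemd cgroup driver by default, while the generated containerd config ships with SystemdCgroup = false. Whether this setup needed the change is not stated, so treat the following as an optional sketch. The helper name enable_systemd_cgroup is hypothetical, and the demo runs on a scratch fragment shaped like the containerd 1.7 default config.

```shell
#!/usr/bin/env bash
# enable_systemd_cgroup FILE  (hypothetical helper)
# Flips "SystemdCgroup = false" to "true" in a containerd config.toml.
# The original file is kept as FILE.bak.
enable_systemd_cgroup() {
    local file=$1
    sed -i.bak -E 's/^([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*)false/\1true/' "$file"
}

# Demo on a scratch fragment, not the real config file.
toml_demo=$(mktemp)
cat > "$toml_demo" <<'EOF'
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
EOF
enable_systemd_cgroup "$toml_demo"
grep SystemdCgroup "$toml_demo"

# On a real node (as root):
#   enable_systemd_cgroup /etc/containerd/config.toml
#   systemctl restart containerd
```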
Start containerd and enable it at boot:

    systemctl daemon-reload
    systemctl enable --now containerd

Check the installed versions:

    # ctr version
    Client:
      Version: 1.7.24
    # runc -v
    runc version 1.2.2
Install kubeadm, kubelet, and kubectl on all machines

First add the Kubernetes repo:

    cat <<EOF > /etc/yum.repos.d/kubernetes.repo
    [kubernetes]
    name=Kubernetes
    baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
    enabled=1
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
    EOF

Then install kubelet, kubeadm, and kubectl with yum:

    yum install -y kubelet kubeadm kubectl
    systemctl enable --now kubelet
Check the kubelet, kubeadm, and kubectl versions:

    kubeadm version
    kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.2", GitCommit:"89a4ea3e1e4ddd7f7572286090359983e0387b2f", GitTreeState:"clean", BuildDate:"2023-09-13T09:34:32Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}

    [root@k8s-master yum.repos.d]# kubectl version
    Client Version: v1.28.2
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    The connection to the server localhost:8080 was refused - did you specify the right host or port?

    [root@k8s-master yum.repos.d]# kubelet --version
    Kubernetes v1.28.2

Note: kubelet now restarts every few seconds. It is stuck in a loop waiting for instructions from kubeadm, which is expected. The next step is to run kubeadm init on the master node to initialize the cluster.
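2. Initialize the k8s cluster

Run kubeadm init on the master node:

    kubeadm init --apiserver-advertise-address \
        --image-repository registry.aliyuncs.com/google_containers \
        --kubernetes-version v1.28.2 \
        --pod-network-cidr=

Parameters:
--apiserver-advertise-address: the address the API server listens on and advertises; set it to the master node's IP.
--image-repository: users in mainland China need to point this at the Aliyun registry, because the default overseas registry is not reachable.
--kubernetes-version: the Kubernetes version to install.
--pod-network-cidr: the IP address range for Pods. Choose it to suit your network; it only has to avoid conflicts with other IPs in use.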
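To make the four parameters concrete, here is a small sketch that only prints the assembled command so it can be reviewed before running. The helper name print_kubeadm_init is hypothetical. The advertise address 192.0.2.10 is a placeholder, and 198.18.0.0/16 is the CIDR the calico step later in this post uses; whether the same value was passed to kubeadm init here is an assumption.

```shell
#!/usr/bin/env bash
# print_kubeadm_init ADVERTISE_IP POD_CIDR [VERSION]  (hypothetical helper)
# Prints (does not run) the kubeadm init command with the values filled in.
print_kubeadm_init() {
    local ip=$1 cidr=$2 version=${3:-v1.28.2}
    printf 'kubeadm init --apiserver-advertise-address=%s' "$ip"
    printf ' --image-repository=%s' registry.aliyuncs.com/google_containers
    printf ' --kubernetes-version=%s' "$version"
    printf ' --pod-network-cidr=%s\n' "$cidr"
}

# Placeholder values for illustration only.
print_kubeadm_init 192.0.2.10 198.18.0.0/16
```

Printing first and pasting afterwards keeps a record of exactly which values were used, which helps when the calico CIDR has to match later.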
This takes a while, so be patient for a few minutes. On success it prints the following, telling you what to do next:

    Your Kubernetes control-plane has initialized successfully!

    To start using your cluster, you need to run the following as a regular user:

      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config

    Alternatively, if you are the root user, you can run:

      export KUBECONFIG=/etc/kubernetes/admin.conf

    You should now deploy a pod network to the cluster.
    Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
      https://kubernetes.io/docs/concepts/cluster-administration/addons/

    Then you can join any number of worker nodes by running the following on each as root:

    kubeadm join --token zk8fth.5psohqfk9lomq0tw \
        --discovery-token-ca-cert-hash sha256:f32851bb6a86cc7f0a394f1d77e1db5b217cde1b0f40909ee3916959519173f7
I am running as root, so following the hint above I only need to export the KUBECONFIG environment variable. Edit /etc/profile and append this line at the end:

    export KUBECONFIG=/etc/kubernetes/admin.conf
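If you prefer to script the edit, the sketch below appends the line only when it is not already present, so repeated runs do not pile up duplicates. The helper name ensure_line is hypothetical, and the demo uses a scratch file instead of /etc/profile.

```shell
#!/usr/bin/env bash
# ensure_line FILE LINE  (hypothetical helper)
# Appends LINE to FILE only if an identical line is not already there.
ensure_line() {
    local file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo on a scratch file.
profile_demo=$(mktemp)
ensure_line "$profile_demo" 'export KUBECONFIG=/etc/kubernetes/admin.conf'
ensure_line "$profile_demo" 'export KUBECONFIG=/etc/kubernetes/admin.conf'   # no-op
cat "$profile_demo"

# On the master node (as root):
#   ensure_line /etc/profile 'export KUBECONFIG=/etc/kubernetes/admin.conf'
#   source /etc/profile    # apply it to the current shell
```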
At this point kubectl can already list the pods below, but the coredns pods are stuck in Pending. The next step is to install a network plugin on the master node.

    kubectl get pods -A
    NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
    kube-system   coredns-66f779496c-8ctsc             0/1     Pending   0          3m11s
    kube-system   coredns-66f779496c-hx76v             0/1     Pending   0          3m11s
    kube-system   etcd-k8s-master                      1/1     Running   0          3m24s
    kube-system   kube-apiserver-k8s-master            1/1     Running   0          3m24s
    kube-system   kube-controller-manager-k8s-master   1/1     Running   0          3m24s
    kube-system   kube-proxy-89c9k                     1/1     Running   0          3m11s
    kube-system   kube-scheduler-k8s-master            1/1     Running   0          3m24s
3. Install the calico network plugin on the master node

For the network plugin you can choose calico or flannel. I chose calico and installed the latest version, following the official documentation: https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart

1) Install the Tigera Calico operator and custom resource definitions.

    yum install -y wget
    wget https://raw.githubusercontent.com/projectcalico/calico/v3.29.1/manifests/tigera-operator.yaml
    kubectl create -f tigera-operator.yaml
2) Install Calico by creating the necessary custom resource.

    wget https://raw.githubusercontent.com/projectcalico/calico/v3.29.1/manifests/custom-resources.yaml
    # Edit custom-resources.yaml and change cidr to 198.18.0.0/16
    kubectl create -f custom-resources.yaml
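The cidr edit can be done with sed instead of an editor. This is a sketch under the assumption that the downloaded manifest contains a single "cidr:" line under its IP pool definition; the helper name set_calico_cidr is hypothetical, and the demo runs on a scratch fragment, not the real file.

```shell
#!/usr/bin/env bash
# set_calico_cidr FILE CIDR  (hypothetical helper)
# Rewrites the value of every "cidr:" key in FILE, preserving indentation.
# The original file is kept as FILE.bak.
set_calico_cidr() {
    local file=$1 cidr=$2
    # "|" is the sed delimiter because the CIDR itself contains "/".
    sed -i.bak -E "s|^([[:space:]]*cidr:[[:space:]]*).*$|\1${cidr}|" "$file"
}

# Demo on a scratch fragment shaped like an IP pool entry.
yaml_demo=$(mktemp)
cat > "$yaml_demo" <<'EOF'
    ipPools:
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 192.168.0.0/16
EOF
set_calico_cidr "$yaml_demo" 198.18.0.0/16
grep 'cidr:' "$yaml_demo"

# Real use, in the directory holding the downloaded manifest:
#   set_calico_cidr custom-resources.yaml 198.18.0.0/16
```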
3) Confirm that all of the pods are running with the following command.

    watch kubectl get pods -n calico-system

Wait until each pod has the STATUS of Running.

4) Remove the taints on the control plane so that you can schedule pods on it.

    kubectl taint nodes --all node-role.kubernetes.io/control-plane-
    node/k8s-master untainted

5) Confirm that you now have a node in your cluster with the following command.

    kubectl get nodes -o wide
    NAME         STATUS   ROLES           AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                      KERNEL-VERSION                 CONTAINER-RUNTIME
    k8s-master   Ready    control-plane   5h16m   v1.28.2                 <none>        Rocky Linux 9.4 (Blue Onyx)   5.14.0-427.13.1.el9_4.x86_64   containerd://1.7.24
For users in mainland China, the most common problem is that Pods fail to start because their images cannot be pulled. Use kubectl describe to inspect the Pod's events and find the image that failed to pull, pull it from a domestic mirror, and then retag it. The calico installation needs the following images:

    [root@k8s-master ~]# ctr -n k8s.io image list | awk '{print $1}'
    REF
    docker.io/calico/apiserver:v3.29.1
    docker.io/calico/cni:v3.29.1
    docker.io/calico/csi:v3.29.1
    docker.io/calico/kube-controllers:v3.29.1
    docker.io/calico/node-driver-registrar:v3.29.1
    docker.io/calico/node:v3.29.1
    docker.io/calico/pod2daemon-flexvol:v3.29.1
    docker.io/calico/typha:v3.29.1
    quay.io/tigera/operator:v1.36.2

Domestic mirror site: https://docker.aityp.com

Example: manually pull the docker.io/calico/kube-controllers:v3.29.1 image, then retag it:

    ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/kube-controllers:v3.29.1
    ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/kube-controllers:v3.29.1 docker.io/calico/kube-controllers:v3.29.1
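The same pull-and-retag pair generalizes to any image in the list. The sketch below only prints the commands so they can be reviewed first. The helper name print_mirror_cmds is hypothetical; the mirror prefix is the one from the example above, and whether every image is actually published under that prefix is an assumption to verify on the mirror site.

```shell
#!/usr/bin/env bash
# Mirror prefix copied from the kube-controllers example; availability of
# each image under this prefix is assumed, not verified.
MIRROR_PREFIX=swr.cn-north-4.myhuaweicloud.com/ddn-k8s

# print_mirror_cmds IMAGE...  (hypothetical helper)
# Prints (does not run) the ctr pull and tag commands for each image.
print_mirror_cmds() {
    local image
    for image in "$@"; do
        printf 'ctr -n k8s.io images pull %s/%s\n' "$MIRROR_PREFIX" "$image"
        printf 'ctr -n k8s.io images tag %s/%s %s\n' "$MIRROR_PREFIX" "$image" "$image"
    done
}

print_mirror_cmds \
    docker.io/calico/cni:v3.29.1 \
    docker.io/calico/node:v3.29.1
```

Once the printed commands look right, piping the output to sh on the affected node runs them.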
Install calicoctl

I followed the official documentation: https://docs.tigera.io/calico/latest/operations/calicoctl/install

    curl -L https://github.com/projectcalico/calico/releases/download/v3.29.1/calicoctl-linux-amd64 -o calicoctl
    chmod +x ./calicoctl
    cp calicoctl /usr/bin/

Check that the calico Pods were created successfully:

    kubectl get pods -A
    NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
    kube-system   calico-kube-controllers-658d97c59c-gftzg   1/1     Running   0          12m
    kube-system   calico-node-x7dhl                          1/1     Running   0          12m
    kube-system   coredns-66f779496c-8ctsc                   1/1     Running   0          62m
    kube-system   coredns-66f779496c-hx76v                   1/1     Running   0          62m
    kube-system   etcd-k8s-master                            1/1     Running   0          62m
    kube-system   kube-apiserver-k8s-master                  1/1     Running   0          62m
    kube-system   kube-controller-manager-k8s-master         1/1     Running   0          62m
    kube-system   kube-proxy-89c9k                           1/1     Running   0          62m
    kube-system   kube-scheduler-k8s-master                  1/1     Running   0          62m

Check the calico node status:

    # calicoctl node status
    Calico process is running.

    IPv4 BGP status
    No IPv4 peers found.

    IPv6 BGP status
    No IPv6 peers found.

    # calicoctl get nodes
    NAME
    k8s-master
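4. Join the other two worker nodes to the cluster

First get the token on the master node:

    kubeadm token list | awk '{print $1}'
    TOKEN
    zk8fth.5psohqfk9lomq0tw

The default token expires after 24 hours. If it has expired, create a new one on the master node with kubeadm token create.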
Then get the value for --discovery-token-ca-cert-hash from the master node:

    openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | \
        openssl dgst -sha256 -hex | sed 's/^.* //'
    f32851bb6a86cc7f0a394f1d77e1db5b217cde1b0f40909ee3916959519173f7

Finally, run kubeadm join on both worker nodes to add them to the cluster:

    kubeadm join --token zk8fth.5psohqfk9lomq0tw \
        \
        --discovery-token-ca-cert-hash sha256:f32851bb6a86cc7f0a394f1d77e1db5b217cde1b0f40909ee3916959519173f7
    [preflight] Running pre-flight checks
        [WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
    [preflight] Reading configuration from the cluster...
    [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet-start] Starting the kubelet
    [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

    This node has joined the cluster:
    * Certificate signing request was sent to apiserver and a response was received.
    * The Kubelet was informed of the new secure connection details.

    Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

As the warning in the output suggests, enable the kubelet service on the worker nodes:

    systemctl enable kubelet.service
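The join command is built from three pieces: the API server endpoint, the token, and the CA certificate hash. The sketch below assembles and prints it. The helper name print_join_cmd is hypothetical, and the endpoint 192.0.2.10:6443 is a placeholder (6443 is kubeadm's default API server port); the token and hash are the sample values from the output above.

```shell
#!/usr/bin/env bash
# print_join_cmd ENDPOINT TOKEN CA_HASH  (hypothetical helper)
# Prints (does not run) a kubeadm join command. ENDPOINT is host:port of
# the API server on the master node.
print_join_cmd() {
    local endpoint=$1 token=$2 hash=$3
    printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
        "$endpoint" "$token" "$hash"
}

# The endpoint is a placeholder; substitute the master node's real IP.
print_join_cmd 192.0.2.10:6443 zk8fth.5psohqfk9lomq0tw \
    f32851bb6a86cc7f0a394f1d77e1db5b217cde1b0f40909ee3916959519173f7

# Alternative on the master node:
#   kubeadm token create --print-join-command
# creates a fresh token and prints the complete join command in one step.
```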
Back on the master node, check that the worker nodes joined successfully. Wait a few minutes until all Pods have been created, as shown below:

    watch kubectl get pods -A
    NAMESPACE          NAME                                       READY   STATUS    RESTARTS   AGE
    calico-apiserver   calico-apiserver-7f67554766-dfzc9          1/1     Running   0          22m
    calico-apiserver   calico-apiserver-7f67554766-xsdlk          1/1     Running   0          22m
    calico-system      calico-kube-controllers-6b7df74554-qxqgf   1/1     Running   0          22m
    calico-system      calico-node-7lhhg                          1/1     Running   0          10m
    calico-system      calico-node-ksrfp                          1/1     Running   0          22m
    calico-system      calico-node-lnt8q                          1/1     Running   0          10m
    calico-system      calico-typha-6855cf6f56-dzfd4              1/1     Running   0          22m
    calico-system      calico-typha-6855cf6f56-z77jt              1/1     Running   0          10m
    calico-system      csi-node-driver-cn6h5                      2/2     Running   0          10m
    calico-system      csi-node-driver-pqrw8                      2/2     Running   0          10m
    calico-system      csi-node-driver-rhd5m                      2/2     Running   0          22m
    kube-system        coredns-5dd5756b68-84wqr                   1/1     Running   0          5h32m
    kube-system        coredns-5dd5756b68-hc9xm                   1/1     Running   0          5h32m
    kube-system        etcd-k8s-master                            1/1     Running   0          5h32m
    kube-system        kube-apiserver-k8s-master                  1/1     Running   0          5h32m
    kube-system        kube-controller-manager-k8s-master         1/1     Running   0          5h32m
    kube-system        kube-proxy-2hmd5                           1/1     Running   0          10m
    kube-system        kube-proxy-4h2cz                           1/1     Running   0          10m
    kube-system        kube-proxy-6qptd                           1/1     Running   0          5h32m
    kube-system        kube-scheduler-k8s-master                  1/1     Running   0          5h32m
    tigera-operator    tigera-operator-c7ccbd65-rddsw             1/1     Running   0          5h25m

Then check the nodes and the calico status:

    # kubectl get nodes
    NAME         STATUS   ROLES           AGE     VERSION
    k8s-master   Ready    control-plane   5h33m   v1.28.2
    k8s-node1    Ready    <none>          11m     v1.28.2
    k8s-node2    Ready    <none>          11m     v1.28.2

    # calicoctl node status
    Calico process is running.

    IPv4 BGP status
    +---------------+-------------------+-------+----------+-------------+
    | PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
    +---------------+-------------------+-------+----------+-------------+
    |               | node-to-node mesh | up    | 08:48:32 | Established |
    |               | node-to-node mesh | up    | 08:49:43 | Established |
    +---------------+-------------------+-------+----------+-------------+

    IPv6 BGP status
    No IPv6 peers found.

    # calicoctl get nodes
    NAME
    k8s-master
    k8s-node1
    k8s-node2
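Rather than eyeballing the node list, the check can be reduced to a single number. The sketch below counts nodes whose STATUS column is anything other than exactly Ready. The helper name count_not_ready is hypothetical, and the demo feeds it canned text in the shape of kubectl get nodes output (with one node deliberately marked NotReady) rather than calling kubectl.

```shell
#!/usr/bin/env bash
# count_not_ready  (hypothetical helper)
# Reads `kubectl get nodes` output on stdin and prints how many nodes have
# a STATUS other than exactly "Ready". The header row is skipped.
count_not_ready() {
    awk 'NR > 1 && $2 != "Ready" { n++ } END { print n + 0 }'
}

# Demo with canned output; k8s-node1 is set to NotReady on purpose.
count_not_ready <<'EOF'
NAME         STATUS     ROLES           AGE     VERSION
k8s-master   Ready      control-plane   5h33m   v1.28.2
k8s-node1    NotReady   <none>          11m     v1.28.2
k8s-node2    Ready      <none>          11m     v1.28.2
EOF

# On the master node:
#   kubectl get nodes | count_not_ready
```

A result of 0 means every node reports Ready. kubectl also offers a blocking form, kubectl wait --for=condition=Ready nodes --all --timeout=300s, which may be more convenient in scripts.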
5. Test the cluster by deploying Nginx

Create an Nginx deployment with 3 replicas:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      selector:
        matchLabels:
          app: nginx
      replicas: 3 # 3 replicas; with three nodes k8s will try to run one Nginx instance per node, but actual placement depends on resources and scheduling policy
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:latest
            ports:
            - containerPort: 80

Save it as nginx-deployment.yaml and apply it with kubectl apply -f nginx-deployment.yaml. Then check which node each Pod landed on:

    # kubectl get pods -o wide
    NAME                                READY   STATUS    RESTARTS   AGE   IP   NODE         NOMINATED NODE   READINESS GATES
    nginx-deployment-7c79c4bf97-2bk2q   1/1     Running   0          22m        k8s-node2    <none>           <none>
    nginx-deployment-7c79c4bf97-5pdr5   1/1     Running   0          22m        k8s-master   <none>           <none>
    nginx-deployment-7c79c4bf97-w4s8h   1/1     Running   0          22m        k8s-node1    <none>           <none>

Delete the Nginx Pods on all nodes by deleting the deployment:

    [root@k8s-master ~]# kubectl delete deploy nginx-deployment
    deployment.apps "nginx-deployment" deleted
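6. Remove worker nodes from the cluster

On the master node, delete the nodes:

    kubectl delete node k8s-node1
    kubectl delete node k8s-node2

To let a worker node rejoin the cluster later, run the following on that worker node:

    systemctl stop kubelet
    rm -rf /etc/kubernetes/*
    kubeadm join ...   # run on the worker node to rejoin the cluster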
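A gentler variant, which is not the method used above, is to drain each worker before deleting it so that its Pods are evicted and rescheduled first. The sketch below only prints the command sequence; print_remove_node_cmds is a hypothetical helper name, and the drain flags shown are common choices rather than requirements.

```shell
#!/usr/bin/env bash
# print_remove_node_cmds NODE  (hypothetical helper)
# Prints (does not run) a drain-then-delete sequence for one worker node.
print_remove_node_cmds() {
    local node=$1
    printf 'kubectl drain %s --ignore-daemonsets --delete-emptydir-data\n' "$node"
    printf 'kubectl delete node %s\n' "$node"
}

for node in k8s-node1 k8s-node2; do
    print_remove_node_cmds "$node"
done

# On the removed worker itself, "kubeadm reset -f" is kubeadm's own cleanup
# command and can stand in for the manual stop-and-delete before rejoining.
```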
