The hosts used are as follows:
Role | HOSTNAME | IP | CPU | Memory | System disk | CPU architecture | Operating system |
---|---|---|---|---|---|---|---|
Control plane | k8s-master | 192.168.0.101 | 2 | 4G | 64G | x86_64 | openSUSE Leap 15.6 |
Worker node | k8s-worker-1 | 192.168.0.102 | 2 | 4G | 64G | x86_64 | openSUSE Leap 15.6 |
Worker node | k8s-worker-2 | 192.168.0.103 | 4 | 8G | 64G | x86_64 | openSUSE Leap 15.6 |
1) Update the host operating system
zypper ref
zypper up -y
2) Reboot to apply the updates
reboot
Set the hostname (run the matching command on its corresponding host):
hostnamectl set-hostname k8s-master
hostnamectl set-hostname k8s-worker-1
hostnamectl set-hostname k8s-worker-2
Add the following name-resolution entries to /etc/hosts on every node:
vim /etc/hosts
192.168.0.101 k8s-master
192.168.0.102 k8s-worker-1
192.168.0.103 k8s-worker-2
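An optional quick check (not part of the original steps) that the names resolve as expected on each node:
getent hosts k8s-master k8s-worker-1 k8s-worker-2
Each hostname should resolve to the IP configured above.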
1) Disable swap temporarily
swapoff -a
2) Disable swap permanently
sed -i '/swap/d' /etc/fstab
3) Check the swap status
free -h
A Swap line showing 0 means swap is disabled.
swapon --show
Empty output also means swap is disabled.
1) Load the kernel modules temporarily
modprobe overlay
modprobe br_netfilter
2) Configure the modules to load at boot
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
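An optional check that both modules are actually loaded:
lsmod | grep -E 'overlay|br_netfilter'
Both module names should appear in the output.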
1) Create the sysctl configuration file
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
2) Apply the settings
sysctl --system
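An optional check that the three settings took effect (each should report 1):
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward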
1) Open the required ports
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --permanent --add-port=10259/tcp
firewall-cmd --permanent --add-port=10257/tcp
firewall-cmd --permanent --add-port=30000-32767/tcp
(when using the Calico CNI)
firewall-cmd --permanent --add-port=179/tcp
firewall-cmd --permanent --add-port=4789/udp
(when using the Flannel CNI)
firewall-cmd --permanent --add-port=8472/udp
2) Reload the firewall rules
firewall-cmd --reload
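An optional check that the permanent rules are now active:
firewall-cmd --list-ports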
Alternatively, you can disable the firewall entirely instead of opening individual ports:
1) Stop the firewall
systemctl stop firewalld.service
2) Disable it from starting at boot
systemctl disable firewalld.service
3) Check the firewall status
systemctl status firewalld.service
1) Install containerd
zypper install -y containerd
2) Lock the package (to prevent accidental upgrades)
zypper addlock containerd
1) Generate the default configuration file
mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
2) Set SystemdCgroup to true
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
Alternatively, edit /etc/containerd/config.toml manually: find SystemdCgroup under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] and set it to true.
3) Set the sandbox (pause) image (needed in some regions)
sed -i 's#registry.k8s.io/pause:3.8#registry.aliyuncs.com/google_containers/pause:3.10#' /etc/containerd/config.toml
Alternatively, edit /etc/containerd/config.toml manually: find the registry.k8s.io/pause sandbox image under [plugins."io.containerd.grpc.v1.cri"] and change it to a reachable mirror (for example registry.aliyuncs.com/google_containers/pause).
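An optional check that both changes took effect before starting the service:
grep -n 'SystemdCgroup' /etc/containerd/config.toml
grep -n 'sandbox_image' /etc/containerd/config.toml
The first should show SystemdCgroup = true and the second should show the mirror image configured above.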
Enable and start containerd:
systemctl enable --now containerd
1) Check the service status
systemctl status containerd --no-pager -l
Make sure Active shows running and that the log lines contain no error messages. Healthy output looks like this:
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; preset: disabled)
Active: active (running) since Tue 2025-05-13 16:45:50 CST; 3s ago
Docs: https://containerd.io
Process: 14287 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 14289 (containerd)
Tasks: 9
CPU: 95ms
CGroup: /system.slice/containerd.service
└─14289 /usr/sbin/containerd
May 13 16:45:50 RuxJYs2NMcb0XctS containerd[14289]: time="2025-05-13T16:45:50.826755936+08:00" level=info msg="Start subscribing containerd event"
May 13 16:45:50 RuxJYs2NMcb0XctS containerd[14289]: time="2025-05-13T16:45:50.826999505+08:00" level=info msg="Start recovering state"
May 13 16:45:50 RuxJYs2NMcb0XctS containerd[14289]: time="2025-05-13T16:45:50.827080467+08:00" level=info msg="Start event monitor"
May 13 16:45:50 RuxJYs2NMcb0XctS containerd[14289]: time="2025-05-13T16:45:50.827093889+08:00" level=info msg="Start snapshots syncer"
May 13 16:45:50 RuxJYs2NMcb0XctS containerd[14289]: time="2025-05-13T16:45:50.827104849+08:00" level=info msg="Start cni network conf syncer for default"
May 13 16:45:50 RuxJYs2NMcb0XctS containerd[14289]: time="2025-05-13T16:45:50.827113059+08:00" level=info msg="Start streaming server"
May 13 16:45:50 RuxJYs2NMcb0XctS containerd[14289]: time="2025-05-13T16:45:50.827156181+08:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
May 13 16:45:50 RuxJYs2NMcb0XctS containerd[14289]: time="2025-05-13T16:45:50.827208547+08:00" level=info msg=serving... address=/run/containerd/containerd.sock
May 13 16:45:50 RuxJYs2NMcb0XctS containerd[14289]: time="2025-05-13T16:45:50.827266632+08:00" level=info msg="containerd successfully booted in 0.039139s"
May 13 16:45:50 RuxJYs2NMcb0XctS systemd[1]: Started containerd container runtime.
2) Check the containerd version
containerd --version
The output depends on the installed version, for example:
containerd github.com/containerd/containerd v1.7.10 4e1fe7492b9df85914c389d1f15a3ceedbb280ac
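Optionally, you can also confirm that the daemon itself responds through its bundled CLI:
ctr version
Both a Client and a Server section should be printed; a missing Server section means the daemon is not reachable.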
1) Configure the package repository
cat <<EOF | sudo tee /etc/zypp/repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.33/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.33/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
2) Import the signing key
rpm --import https://pkgs.k8s.io/core:/stable:/v1.33/rpm/repodata/repomd.xml.key
3) Refresh the repository metadata
zypper ref
1) Install kubelet, kubeadm, and kubectl
zypper install -y --allow-downgrade kubelet-1.33.0 kubeadm-1.33.0 kubectl-1.33.0
2) Lock the packages (to prevent accidental upgrades)
zypper addlock kubelet kubeadm kubectl
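An optional check that the expected versions are installed and that the package locks are in place:
kubeadm version -o short
kubectl version --client
kubelet --version
zypper locks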
Enable and start kubelet (it will keep restarting until the cluster is initialized, which is expected):
systemctl enable --now kubelet
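Optionally, you can pull the control-plane images ahead of time so the init step below does not spend time downloading; this sketch reuses the same version, registry, and CRI socket as the kubeadm init command that follows:
kubeadm config images pull --kubernetes-version=v1.33.0 --image-repository=registry.aliyuncs.com/google_containers --cri-socket=unix:///run/containerd/containerd.sock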
kubeadm init --kubernetes-version=v1.33.0 --apiserver-advertise-address=192.168.0.101 --pod-network-cidr=10.244.0.0/16 --cri-socket=unix:///run/containerd/containerd.sock --image-repository=registry.aliyuncs.com/google_containers
Where:
--kubernetes-version=v1.33.0 specifies the Kubernetes version to deploy
--apiserver-advertise-address=192.168.0.101 specifies the IP address the API server advertises
--pod-network-cidr=10.244.0.0/16 specifies the Pod network CIDR
--cri-socket=unix:///run/containerd/containerd.sock specifies the container runtime socket
--image-repository=registry.aliyuncs.com/google_containers specifies the image registry; the default is registry.k8s.io (needed in some regions)
Other optional parameters:
--apiserver-bind-port=[port] specifies the port the API server binds to, default 6443
--control-plane-endpoint=[IP:port] specifies a shared control-plane endpoint, used for highly available clusters
--service-dns-domain=[domain] specifies the service DNS domain, default cluster.local
--cert-dir=[directory] specifies the certificate directory, default /etc/kubernetes/pki
--certificate-key=[key] specifies the key used to encrypt certificates transferred between control-plane nodes in a highly available cluster
--upload-certs uploads the control-plane certificates to the cluster, used for highly available clusters
--dry-run prints what would be done without actually executing it
The log of a successful initialization looks roughly like this:
[init] Using Kubernetes version: v1.33.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local ruxjys2nmcb0xcts] and IPs [10.96.0.1 192.168.0.101]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost ruxjys2nmcb0xcts] and IPs [192.168.0.101 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost ruxjys2nmcb0xcts] and IPs [192.168.0.101 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 1.001602829s
[control-plane-check] Waiting for healthy control plane components. This can take up to 4m0s
[control-plane-check] Checking kube-apiserver at https://192.168.0.101:6443/livez
[control-plane-check] Checking kube-controller-manager at https://127.0.0.1:10257/healthz
[control-plane-check] Checking kube-scheduler at https://127.0.0.1:10259/livez
[control-plane-check] kube-controller-manager is healthy after 2.359039736s
[control-plane-check] kube-scheduler is healthy after 2.945401931s
[control-plane-check] kube-apiserver is healthy after 5.002319352s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node ruxjys2nmcb0xcts as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node ruxjys2nmcb0xcts as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: vslwvq.fxq2vzikgv7mhu31
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.0.101:6443 --token 123456 \
--discovery-token-ca-cert-hash sha256:123456
Note the kubeadm join command in the output above; running it on a worker node joins that node to the cluster.
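If the original join command is lost or the bootstrap token has expired (tokens are valid for 24 hours by default), a fresh one can be generated on the control plane:
kubeadm token create --print-join-command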
kubectl reads its configuration from $HOME/.kube/config by default, so this file must be set up before kubectl can be used to manage the cluster.
You can simply copy the admin.conf file (default location /etc/kubernetes) and fix its ownership:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, as the root user:
export KUBECONFIG=/etc/kubernetes/admin.conf
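At this point kubectl should be able to reach the API server. A quick optional sanity check (the node will typically report NotReady until a CNI is deployed in the next step):
kubectl get nodes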
Before a Pod network add-on (CNI) is deployed, Pods and Services cannot communicate, the CoreDNS Pods stay in the Pending state, and the cluster is not fully functional.
The CNI's Pod network must match the CIDR passed to kubeadm init via --pod-network-cidr. Calico typically uses 192.168.0.0/16 and Flannel uses 10.244.0.0/16, so this cluster uses Flannel (the network value can be confirmed in the downloaded manifest, as shown below).
1) Create a working directory
mkdir -pv /opt/kubernetes/flannel
2) Download the Flannel manifest
cd /opt/kubernetes/flannel
wget https://github.com/flannel-io/flannel/releases/download/v0.26.7/kube-flannel.yml
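Optionally, before applying the manifest, confirm that the Network value in Flannel's net-conf.json matches the --pod-network-cidr used at kubeadm init (the upstream manifest defaults to 10.244.0.0/16):
grep -A 5 'net-conf.json' kube-flannel.yml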
3) Deploy Flannel
kubectl apply -f kube-flannel.yml
4) Check the Flannel deployment
kubectl get nodes
Check that the node is Ready; healthy output looks like this:
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane 21m v1.33.0
kubectl get pods -n kube-system
Check that the coredns Pods are Running; healthy output looks like this:
NAME READY STATUS RESTARTS AGE
coredns-6766b7b6bb-4mlcz 1/1 Running 0 21m
coredns-6766b7b6bb-kr2fz 1/1 Running 0 21m
etcd-k8s-master 1/1 Running 0 21m
kube-apiserver-k8s-master 1/1 Running 0 21m
kube-controller-manager-k8s-master 1/1 Running 0 21m
kube-proxy-khfz9 1/1 Running 0 21m
kube-scheduler-k8s-master 1/1 Running 0 21m
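Also check that the Flannel Pods themselves are Running; recent Flannel manifests create them in the kube-flannel namespace:
kubectl get pods -n kube-flannel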
By default, Kubernetes does not install a monitoring component, so one has to be deployed separately. You can use Kubernetes Metrics Server, or another solution (such as Prometheus + Grafana) to monitor the cluster.
1) Create a working directory
mkdir -pv /opt/kubernetes/metrics-server
2) Download the Kubernetes Metrics Server manifest
cd /opt/kubernetes/metrics-server
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.7.2/components.yaml
3) Customize the image registry (needed in some regions)
sed -i 's#registry.k8s.io/metrics-server#registry.aliyuncs.com/google_containers#' /opt/kubernetes/metrics-server/components.yaml
4) Skip kubelet certificate verification (suitable for test environments)
Edit components.yaml and add --kubelet-insecure-tls to the args section of the Deployment, for example:
vim /opt/kubernetes/metrics-server/components.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
......
spec:
......
template:
......
spec:
containers:
- args:
......
- --kubelet-insecure-tls
5) Deploy Kubernetes Metrics Server
kubectl apply -f components.yaml
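An optional check that the --kubelet-insecure-tls flag actually made it into the running Deployment:
kubectl -n kube-system get deployment metrics-server -o jsonpath='{.spec.template.spec.containers[0].args}'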
6) Check the Metrics Server deployment
kubectl get nodes
Check that the nodes are Ready; healthy output looks like this:
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane 3h15m v1.32.3
k8s-worker-1 Ready <none> 153m v1.32.3
kubectl get pods -n kube-system
Check that the metrics-server Pod is Running; healthy output looks like this:
NAME READY STATUS RESTARTS AGE
coredns-6766b7b6bb-kg5jq 1/1 Running 0 3h3m
coredns-6766b7b6bb-lpv5t 1/1 Running 0 3h3m
etcd-k8s-master 1/1 Running 0 3h44m
kube-apiserver-k8s-master 1/1 Running 0 3h44m
kube-controller-manager-k8s-master 1/1 Running 0 3h3m
kube-proxy-727ct 1/1 Running 0 3h2m
kube-proxy-pgw8q 1/1 Running 0 3h3m
kube-scheduler-k8s-master 1/1 Running 0 3h44m
metrics-server-5d97b7bf6f-xc78v 1/1 Running 0 3m31s
kubectl top nodes
kubectl top pods -A
Check that metric data is returned; healthy output looks like this:
NAME CPU(cores) CPU(%) MEMORY(bytes) MEMORY(%)
k8s-master 58m 2% 1405Mi 36%
k8s-worker-1 15m 0% 638Mi 16%
NAMESPACE NAME CPU(cores) MEMORY(bytes)
kube-flannel kube-flannel-ds-mpft4 4m 13Mi
kube-flannel kube-flannel-ds-q5627 4m 13Mi
kube-system coredns-6766b7b6bb-kg5jq 2m 16Mi
kube-system coredns-6766b7b6bb-lpv5t 2m 16Mi
kube-system etcd-k8s-master 11m 179Mi
kube-system kube-apiserver-k8s-master 20m 208Mi
kube-system kube-controller-manager-k8s-master 9m 52Mi
kube-system kube-proxy-727ct 1m 17Mi
kube-system kube-proxy-pgw8q 1m 15Mi
kube-system kube-scheduler-k8s-master 4m 23Mi
kube-system metrics-server-5d97b7bf6f-xc78v 2m 19Mi
Additional checks of the overall cluster health:
1) Check the core services
systemctl status containerd --no-pager
systemctl status kubelet --no-pager
2) Check nodes, add-ons, and control-plane components
kubectl get nodes
kubectl cluster-info
kubectl get pods -n kube-system
kubectl get componentstatuses
3) Check in-cluster DNS with a temporary test Pod
kubectl run busybox --image=busybox:1.28 -- sleep 3600
kubectl exec -it busybox -- nslookup kubernetes.default
4) Check API server and etcd health
kubectl get --raw='/healthz'
kubectl -n kube-system exec -it etcd-k8s-master -- etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key endpoint health
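The DNS check above leaves a test Pod behind; it can be removed once verification is done:
kubectl delete pod busybox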
After the control plane initializes successfully, it prints a join command; it is recommended to use that command directly to add the worker nodes to the cluster.
kubeadm join 192.168.0.101:6443 --cri-socket=unix:///run/containerd/containerd.sock --token=123456 --discovery-token-ca-cert-hash=sha256:123456
Where:
192.168.0.101:6443 is the control plane address
--token=123456 is the bootstrap token, generated automatically by the control plane
--discovery-token-ca-cert-hash=sha256:123456 is the CA certificate hash, generated automatically by the control plane
Other optional parameters:
--node-name=[name] specifies the node name, default is the hostname
--control-plane joins this node as an additional control-plane node
--certificate-key=[key] specifies the key used to decrypt the certificates, the same value as the --certificate-key passed to kubeadm init
--apiserver-advertise-address=[IP] specifies the address the API server advertises when the node joins as a control-plane member of a highly available cluster
--dry-run prints what would be done without actually executing it
This guide uses containerd as the container runtime; if you use a different runtime, refer to its documentation for the corresponding socket path.
If initialization or joining fails, the node can be reset with kubeadm reset (or kubeadm reset -f), after which you can delete $HOME/.kube/config and re-initialize.
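A minimal reset sketch for that failure case, assuming the paths used elsewhere in this guide; note that kubeadm reset does not clean up CNI configuration, so /etc/cni/net.d is removed explicitly:
kubeadm reset -f
rm -rf $HOME/.kube/config
# kubeadm reset leaves CNI configuration behind; remove it before re-initializing
rm -rf /etc/cni/net.d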