Author: ninehills
Labels: blog
Created: 2020-04-10T04:22:00Z
Link and comments: https://github.com/ninehills/blog/issues/77
Author: swulling+pub@gmail.com
Abstract: Implementing Namespace-based physical queues in Kubernetes, i.e. strongly binding the Pods in a Namespace to a dedicated set of Nodes.
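In real production deployments of Kubernetes there are currently two schools of thought. One favors small clusters: one or a few businesses share a small cluster, and the whole company runs hundreds or thousands of them. The other favors large clusters: one or a few large clusters per AZ (availability zone), with businesses isolated from each other by Namespace.
Both have pros and cons, but in terms of resource utilization and maintenance cost, large clusters have the clearer advantage. At the same time, large clusters bring considerable challenges in security, availability, and performance, as well as extra management and maintenance overhead.
This post is part of a series on running large multi-tenant Kubernetes clusters in practice. It addresses how to implement traditional physical-queue isolation in a multi-tenant setting.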
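"Physical queue" is not a common industry term. It comes from a cluster resource management model, which, simplified, works as follows:
The resource structure is shown in the figure: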
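Physical queue implementation:
Principle of automatically binding the physical queue to a Namespace: enable the admission controllers PodNodeSelector and PodTolerationRestriction (see the Admission Controllers documentation), then give the Namespace a default NodeSelector and default Tolerations. These are applied automatically to every new Pod created in that Namespace, which binds the Pods to the physical queue.
Creating a Kind cluster with Kubernetes 1.18.0 ran into problems and will be tested later. The Kind cluster configuration used here: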
# this config file contains all config fields with comments
# NOTE: this is not a particularly useful config file
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
# patch the generated kubeadm config with some extra settings
kubeadmConfigPatches:
- |
  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  evictionHard:
    nodefs.available: "0%"
- |
  apiVersion: kubeadm.k8s.io/v1beta2
  kind: ClusterConfiguration
  apiServer:
    extraArgs:
      # extraArgs replaces kubeadm's default value (NodeRestriction), so list it explicitly
      enable-admission-plugins: NodeRestriction,PodNodeSelector,PodTolerationRestriction
# 1 control plane node and 3 workers
nodes:
# the control plane node config
- role: control-plane
# the three workers
- role: worker
- role: worker
- role: worker
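Outside Kind, the same plugins can be enabled through the kubeadm ClusterConfiguration, or directly with the kube-apiserver startup flag --enable-admission-plugins=NodeRestriction,PodNodeSelector,PodTolerationRestriction. The kubeadm form:
apiServer:
  extraArgs:
    # extraArgs replaces kubeadm's default value (NodeRestriction), so list it explicitly
    enable-admission-plugins: NodeRestriction,PodNodeSelector,PodTolerationRestriction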
Create the public Namespace, attaching the default node selector and tolerations as annotations (the tolerationsWhitelist annotation is left commented out):
apiVersion: v1
kind: Namespace
metadata:
  name: public
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: "node-restriction.kubernetes.io/physical_queue=public-phy"
    scheduler.alpha.kubernetes.io/defaultTolerations: '[{"operator": "Equal", "effect": "NoSchedule", "key": "node-restriction.kubernetes.io/physical_queue", "value": "public-phy"}]'
    # scheduler.alpha.kubernetes.io/tolerationsWhitelist: '[{"operator": "Equal", "effect": "NoSchedule", "key": "node-restriction.kubernetes.io/physical_queue", "value": "public-phy"}]'
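The two annotations use different encodings: node-selector is a comma-separated list of key=value pairs, while defaultTolerations (like tolerationsWhitelist) holds a JSON array of tolerations. The Go sketch below decodes both values from the Namespace above. It is a simplified illustration, not the plugins' code: Toleration is a hypothetical local type that only mirrors the JSON field names of core/v1 Toleration, and upstream the selector string is parsed by apimachinery's label helpers (labels.ConvertSelectorToLabelsMap), which also validate keys and values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Toleration is a hypothetical local type that mirrors the JSON field names
// of core/v1 Toleration, reduced to the fields used in this post.
type Toleration struct {
	Key      string `json:"key,omitempty"`
	Operator string `json:"operator,omitempty"`
	Value    string `json:"value,omitempty"`
	Effect   string `json:"effect,omitempty"`
}

// parseNodeSelector turns a "k1=v1,k2=v2" annotation value into a map.
// Simplified: no validation of label key/value syntax is done here.
func parseNodeSelector(s string) (map[string]string, error) {
	out := map[string]string{}
	if strings.TrimSpace(s) == "" {
		return out, nil
	}
	for _, pair := range strings.Split(s, ",") {
		kv := strings.SplitN(strings.TrimSpace(pair), "=", 2)
		if len(kv) != 2 || kv[0] == "" {
			return nil, fmt.Errorf("invalid selector term %q", pair)
		}
		out[kv[0]] = kv[1]
	}
	return out, nil
}

// parseTolerations decodes the JSON array stored in defaultTolerations.
func parseTolerations(s string) ([]Toleration, error) {
	var ts []Toleration
	if err := json.Unmarshal([]byte(s), &ts); err != nil {
		return nil, err
	}
	return ts, nil
}

func main() {
	sel, _ := parseNodeSelector("node-restriction.kubernetes.io/physical_queue=public-phy")
	fmt.Println(sel)
	ts, _ := parseTolerations(`[{"operator": "Equal", "effect": "NoSchedule", "key": "node-restriction.kubernetes.io/physical_queue", "value": "public-phy"}]`)
	fmt.Printf("%+v\n", ts)
}
```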
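Key point: the label key uses the node-restriction.kubernetes.io/ prefix, as the documentation recommends. Combined with the NodeRestriction admission plugin, this prevents kubelets from setting or changing the label themselves, so a node cannot move itself into a physical queue.
Label and taint the worker node that makes up the public-phy physical queue:
$ kubectl label node kind-worker node-restriction.kubernetes.io/physical_queue=public-phy
$ kubectl taint nodes kind-worker node-restriction.kubernetes.io/physical_queue=public-phy:NoSchedule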
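The label and the taint do different jobs. The label, together with the Namespace's node selector, keeps the public Namespace's Pods on kind-worker; the NoSchedule taint keeps every other Pod off kind-worker unless it carries a matching toleration, which only the public Namespace injects by default. A simplified Go sketch of the toleration check, assuming local Taint and Toleration types that stand in for the core/v1 ones (this is not the scheduler's actual code):

```go
package main

import "fmt"

// Taint and Toleration are simplified, hypothetical stand-ins for the
// core/v1 types; only the fields used for matching are kept.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates reports whether tol matches taint: an empty effect matches any
// effect, an empty key (valid only with Exists) matches any key, Exists
// ignores the value, and Equal (the default for an empty operator) requires
// equal values.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == taint.Value
	default:
		return false
	}
}

// schedulable applies the hard part of the taint check: every NoSchedule or
// NoExecute taint on the node must be tolerated. PreferNoSchedule taints are
// only a soft preference and are skipped here.
func schedulable(taints []Taint, tols []Toleration) bool {
	for _, taint := range taints {
		if taint.Effect == "PreferNoSchedule" {
			continue
		}
		tolerated := false
		for _, tol := range tols {
			if tolerates(tol, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	key := "node-restriction.kubernetes.io/physical_queue"
	kindWorker := []Taint{{Key: key, Value: "public-phy", Effect: "NoSchedule"}}
	publicPod := []Toleration{{Key: key, Operator: "Equal", Value: "public-phy", Effect: "NoSchedule"}}
	fmt.Println(schedulable(kindWorker, publicPod)) // true
	fmt.Println(schedulable(kindWorker, nil))       // false: no matching toleration
}
```

A Pod from the public Namespace passes the check, while a Pod from any other Namespace (no toleration) is kept off kind-worker.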
Create a test Deployment, nginx_deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
$ kubectl apply -f nginx_deployment.yaml --namespace public
$ kubectl describe pod nginx-deployment-574b87c764-kb9k7 --namespace public
Node-Selectors:  node-restriction.kubernetes.io/physical_queue=public-phy
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
                 node-restriction.kubernetes.io/physical_queue=public-phy:NoSchedule
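The node selector and the physical_queue toleration come from the Namespace annotations; the not-ready and unreachable tolerations are added by the DefaultTolerationSeconds admission plugin, which is enabled by default. To check which Pods are running on the labeled node:
$ kubectl describe node kind-worker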
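PodTolerationRestriction merges the Namespace's defaultTolerations into each new Pod, which is where the physical_queue toleration in the describe output comes from. A minimal Go sketch of that merge, assuming a hypothetical local Toleration type. It only skips exact duplicates, whereas the upstream plugin's merge also drops tolerations made redundant by broader ones, and the plugin additionally enforces the tolerationsWhitelist annotation when it is set:

```go
package main

import "fmt"

// Toleration is a simplified, hypothetical stand-in for core/v1 Toleration.
// All fields are strings, so the struct is comparable and usable as a map key.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// mergeTolerations returns the Pod's own tolerations followed by every
// Namespace default the Pod does not already carry (exact matches only).
func mergeTolerations(pod, nsDefaults []Toleration) []Toleration {
	merged := append([]Toleration(nil), pod...)
	seen := make(map[Toleration]bool, len(merged)+len(nsDefaults))
	for _, t := range merged {
		seen[t] = true
	}
	for _, t := range nsDefaults {
		if !seen[t] {
			merged = append(merged, t)
			seen[t] = true
		}
	}
	return merged
}

func main() {
	nsDefaults := []Toleration{{
		Key:      "node-restriction.kubernetes.io/physical_queue",
		Operator: "Equal",
		Value:    "public-phy",
		Effect:   "NoSchedule",
	}}
	// The nginx Pod template declares no tolerations of its own.
	for _, t := range mergeTolerations(nil, nsDefaults) {
		fmt.Printf("%s=%s:%s\n", t.Key, t.Value, t.Effect)
	}
}
```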
$ kubectl delete deployment nginx-deployment --namespace public
# Edit nginx_deployment.yaml and add spec.template.spec.nodeSelector:
nodeSelector:
  node-restriction.kubernetes.io/physical_queue: second-phy
# Check whether it can still be deployed
$ kubectl apply -f nginx_deployment.yaml --namespace public
# Inspect the ReplicaSet created by the Deployment
$ kubectl describe replicaset nginx-deployment-585fcd8d7d --namespace public
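Warning FailedCreate 49s (x15 over 2m11s) replicaset-controller Error creating: pods is forbidden: pod node label selector conflicts with its namespace node label selector
The Deployment and its ReplicaSet are created, but every Pod creation is rejected at admission: the Pod's nodeSelector sets physical_queue to second-phy while the Namespace requires public-phy, so a Pod cannot leave its Namespace's physical queue by setting its own nodeSelector.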
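The check behind this event can be sketched as follows: PodNodeSelector rejects a Pod whose nodeSelector gives a different value to a key that the Namespace's node-selector annotation also sets, and otherwise merges the two selectors into the Pod. A simplified Go sketch, assuming plain maps instead of the Kubernetes label types and omitting the cluster-level whitelist the plugin can also load from its configuration file; the disktype=ssd label is hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict reuses the message from the ReplicaSet event above.
var errConflict = errors.New("pod node label selector conflicts with its namespace node label selector")

// admitNodeSelector rejects the Pod if it and its Namespace set the same
// label key to different values; otherwise it returns the merged selector.
func admitNodeSelector(podSel, nsSel map[string]string) (map[string]string, error) {
	for k, v := range nsSel {
		if pv, ok := podSel[k]; ok && pv != v {
			return nil, errConflict
		}
	}
	merged := make(map[string]string, len(podSel)+len(nsSel))
	for k, v := range podSel {
		merged[k] = v
	}
	for k, v := range nsSel {
		merged[k] = v
	}
	return merged, nil
}

func main() {
	key := "node-restriction.kubernetes.io/physical_queue"
	ns := map[string]string{key: "public-phy"}

	// The edited Deployment: conflicts with the Namespace and is rejected.
	_, err := admitNodeSelector(map[string]string{key: "second-phy"}, ns)
	fmt.Println(err)

	// A non-conflicting selector (hypothetical label) is merged instead.
	merged, _ := admitNodeSelector(map[string]string{"disktype": "ssd"}, ns)
	fmt.Println(merged)
}
```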
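A possible next step is to configure node affinity with Node Affinity. However, there is currently no off-the-shelf admission controller that binds a default node affinity to a Namespace; one would have to be developed in-house if needed.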
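Node Affinity can express everything NodeSelector does and supports more advanced scheduling rules, so a further step is to try building the overall resource scheduling design on it. Node Affinity only attracts Pods to nodes, though; it does not keep other Pods away, so the taints and tolerations are still needed to reserve a physical queue's nodes.
In addition, when one logical queue is bound to several physical queues, Node Affinity can assign a scheduling weight to each physical queue, i.e. prefer deploying to a particular physical queue.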
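Weighted preference across physical queues would use preferredDuringSchedulingIgnoredDuringExecution terms, each with a weight from 1 to 100. For every candidate node, the NodeAffinity scoring step adds up the weights of the preferred terms the node matches; the total is then normalized and combined with other scores, so it expresses a preference, not a guarantee. The Go sketch below mimics only the summing step, assuming a hypothetical logical queue spread over public-phy (weight 80) and second-phy (weight 20), with a single-key In expression per term. Since both queues' nodes are tainted, such a Pod would also need tolerations for both taints.

```go
package main

import (
	"fmt"
	"sort"
)

const queueKey = "node-restriction.kubernetes.io/physical_queue"

// PreferredTerm is a hypothetical, simplified preferred node affinity term:
// "queueKey In (Values...)" with a weight.
type PreferredTerm struct {
	Weight int32
	Values []string
}

// Node is a hypothetical stand-in holding only a name and labels.
type Node struct {
	Name   string
	Labels map[string]string
}

// affinityScore sums the weights of the preferred terms the node matches
// (the raw score, before the scheduler normalizes it).
func affinityScore(n Node, terms []PreferredTerm) int64 {
	v, ok := n.Labels[queueKey]
	if !ok {
		return 0
	}
	var score int64
	for _, t := range terms {
		for _, want := range t.Values {
			if v == want {
				score += int64(t.Weight)
				break
			}
		}
	}
	return score
}

// rank orders node names by descending score, using the name as tie-breaker.
func rank(nodes []Node, terms []PreferredTerm) []string {
	sorted := append([]Node(nil), nodes...)
	sort.SliceStable(sorted, func(i, j int) bool {
		si, sj := affinityScore(sorted[i], terms), affinityScore(sorted[j], terms)
		if si != sj {
			return si > sj
		}
		return sorted[i].Name < sorted[j].Name
	})
	names := make([]string, len(sorted))
	for i, n := range sorted {
		names[i] = n.Name
	}
	return names
}

func main() {
	nodes := []Node{
		{Name: "node-b", Labels: map[string]string{queueKey: "second-phy"}},
		{Name: "node-a", Labels: map[string]string{queueKey: "public-phy"}},
	}
	// Hypothetical weights: prefer public-phy, fall back to second-phy.
	terms := []PreferredTerm{
		{Weight: 80, Values: []string{"public-phy"}},
		{Weight: 20, Values: []string{"second-phy"}},
	}
	fmt.Println(rank(nodes, terms))
}
```

With these weights, rank prints [node-a node-b]: the public-phy node is preferred, and the second-phy node remains a fallback.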