openEuler / iSulad

calico-node fails to start when iSulad is integrated with k8s

Status: Completed
Type: Task
Created: 2020-11-23 13:02

iSula version and project branch

Version 2.0.5, commit f84000843bf2ba98adea547721b235e3b77168e1

OS version and compiler version

NAME="openEuler"
VERSION="20.09"
ID="openEuler"
VERSION_ID="20.09"
PRETTY_NAME="openEuler 20.09"
ANSI_COLOR="0;31"

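Problem description and steps to reproduce

k8s version: v1.17.9
After deploying k8s, pods fail to start. The isulad log shows the following error:
[Screenshot: isulad log]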

isulad configuration

{
    "group": "isulad",
    "default-runtime": "lcr",
    "graph": "/var/lib/isulad",
    "state": "/var/run/isulad",
    "engine": "lcr",
    "log-level": "ERROR",
    "pidfile": "/var/run/isulad.pid",
    "log-opts": {
        "log-file-mode": "0600",
        "log-path": "/var/lib/isulad",
        "max-file": "1",
        "max-size": "30KB"
    },
    "log-driver": "stdout",
    "hook-spec": "/etc/default/isulad/hooks/default.json",
    "start-timeout": "2m",
    "storage-driver": "overlay2",
    "storage-opts": [
        "overlay2.override_kernel_check=true"
    ],
    "registry-mirrors": [
        "docker.io"
    ],
    "insecure-registries": [
    ],
    "pod-sandbox-image": "kubesphere/pause:3.2",
    "native.umask": "secure",
    "network-plugin": "cni",
    "cni-bin-dir": "/opt/cni/bin",
    "cni-conf-dir": "/etc/cni/net.d",
    "image-layer-check": false,
    "use-decrypted-key": true,
    "insecure-skip-verify-enforce": false
}

Comments (8)

pixiake created the task
pixiake set the linked repository to openEuler/iSulad

Hey pixiake, Welcome to openEuler Community.
All of the projects in openEuler Community are maintained by @openeuler-ci-bot.
That means the developers can comment below every pull request or issue to trigger Bot Commands.
Please follow instructions at https://gitee.com/openeuler/community/blob/master/en/sig-infrastructure/command.md to find the details.

pixiake edited the description
pixiake edited the description

1. isulad configuration file /etc/isulad/daemon.json
Following the official documentation, add the hosts setting.

{
    "group": "isulad",
    "default-runtime": "lcr",
    "graph": "/var/lib/isulad",
    "state": "/var/run/isulad",
    "engine": "lcr",
    "log-level": "ERROR",
    "pidfile": "/var/run/isulad.pid",
    "log-opts": {
        "log-file-mode": "0600",
        "log-path": "/var/lib/isulad",
        "max-file": "1",
        "max-size": "30KB"
    },
    "log-driver": "stdout",
    "hook-spec": "/etc/default/isulad/hooks/default.json",
    "start-timeout": "2m",
    "storage-driver": "overlay2",
    "storage-opts": [
        "overlay2.override_kernel_check=true"
    ],
    "registry-mirrors": [     # registry mirror addresses
        "https://72idtxd8.mirror.aliyuncs.com",
        "https://reg-mirror.qiniu.com",
        "http://hub-mirror.c.163.com"
    ],
    "insecure-registries": [    # registries accessed without TLS verification
        "192.168.1.153:5000"
    ],
    "pod-sandbox-image": "192.168.1.153:5000/kubesphere/pause:3.1", # default image used for pod sandboxes
    "native.umask": "secure",
    "network-plugin": "cni",  # use cni as the network plugin
    "cni-bin-dir": "",
    "cni-conf-dir": "",
    "image-layer-check": false,
    "use-decrypted-key": true,
    "insecure-skip-verify-enforce": false,
    "hosts" : [       # communication endpoint
        "unix:///var/run/isulad.sock"
    ]
}
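
Note that the `#` annotations in the listing above are explanatory only; strict JSON has no comment syntax, so a file containing them would not parse as plain JSON. The following is a minimal sketch, not part of iSulad, for stripping such annotations and checking that the remainder parses. The function names and the sample text are hypothetical and used for illustration only.

```python
import json


def strip_hash_comments(text: str) -> str:
    """Remove '#' annotations that sit outside JSON string literals.

    JSON strings cannot span lines, so string state is tracked per line.
    """
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\" and in_string:
                escaped = True
            elif ch == '"':
                in_string = not in_string
            elif ch == "#" and not in_string:
                cut = i                  # annotation starts here
                break
        cleaned.append(line[:cut].rstrip())
    return "\n".join(cleaned)


def load_annotated_config(text: str) -> dict:
    """Parse an annotated config; raises json.JSONDecodeError if still invalid."""
    return json.loads(strip_hash_comments(text))


# Hypothetical excerpt in the style of the annotated daemon.json above.
SAMPLE = """{
    "network-plugin": "cni",  # use cni as the network plugin
    "insecure-registries": [    # registries accessed without TLS verification
        "192.168.1.153:5000"
    ],
    "hosts": [
        "unix:///var/run/isulad.sock"
    ]
}"""

if __name__ == "__main__":
    config = load_annotated_config(SAMPLE)
    print(config["hosts"])
```

Running the same check against the real /etc/isulad/daemon.json before restarting isulad would catch leftover annotations or trailing commas early.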

2. Use the command kubeadm config print init-defaults > init.yml to export the default initialization configuration, then modify init.yml to match your environment.

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.153   # change to this host's IP
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/isulad.sock   # set the container engine to isulad
  name: master  # change to this host's hostname
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: cluster.local  # change the k8s cluster name
controlPlaneEndpoint: lb.kubesphere.local   # change to a DNS name that maps to an IP address
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers    # use a mirror registry located in China
kind: ClusterConfiguration
kubernetesVersion: v1.17.9  # change the k8s version
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.233.0.0/18
  podSubnet: 10.233.64.0/18
scheduler: {}

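Run kubeadm init --config=init.yml to initialize the cluster.
3. Enable the CNI plugin
Edit the kubelet drop-in file /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf, pass the --network-plugin=cni command-line option to select the CNI plugin, then restart the kubelet service.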

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=192.168.1.153:5000/kubesphere/pause:3.1 --node-ip=192.168.1.153 --hostname-override=master"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

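@pixiake Judging from the log in your screenshot, isulad received a stop-container command and sent signal 15 (SIGTERM) to the container, which caused the container to exit. The log lines you highlighted are the normal output of a container exiting. Please check the k8s logs on your side to find out why a stop-container command was issued.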

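Found the problem: deploying calico v3.16+ brings the cluster down.
The cause is that the calico v3.16+ YAML mounts the host directory /sys/fs. After I commented that part out, calico runs normally. Could this mount entry conflict with isula?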

# For eBPF mode, we need to be able to mount the BPF filesystem at /sys/fs/bpf so we mount in the parent directory.
- name: sysfs
  mountPath: /sys/fs/
  # Bidirectional means that, if we mount the BPF filesystem at /sys/fs/bpf it will propagate to the host.
  # If the host is known to mount that filesystem already then Bidirectional can be omitted.
  mountPropagation: Bidirectional

The calico-node container log also shows only a container start error:
[Screenshot: calico-node container log]
kubelet log:
[Screenshot: kubelet log]

pixiake edited the title
openeuler-ci-bot set the assignee to lifeng_isula

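@pixiake Could you paste the complete iSulad log from the moment the start error occurs? In your earlier screenshot, iSulad started the container successfully and a stop command was sent afterwards to stop it, which does not match the start error you posted later.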

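@pixiake The calico YAML mounts the host's /sys/fs. Do you know what this setting is for? How does k8s handle this mount entry: does it remount the host's /sys/fs? If it does, note that when a container starts, configuration is written to the /sys/fs/cgroup directory. If the host's /sys/fs has been remounted, the /sys/fs/cgroup directory probably can no longer be found, so the write fails.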
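
The hypothesis in the comment above can be illustrated with a small path-ordering model: a mount placed on a parent directory after a child was mounted covers the child. This is a minimal sketch under that simplifying assumption. It models destination ordering only and ignores bind/rbind flags and mount propagation, which also affect the real behavior. The function name and the sample lists are hypothetical.

```python
from pathlib import PurePosixPath


def shadowed_mounts(destinations):
    """Return (hidden, hidden_by) pairs for a list given in mount order.

    A destination counts as hidden when a later entry is mounted on the
    same path or on one of its parent directories.
    """
    hidden = []
    for i, earlier in enumerate(destinations):
        earlier_path = PurePosixPath(earlier)
        for later in destinations[i + 1:]:
            later_path = PurePosixPath(later)
            if later_path == earlier_path or later_path in earlier_path.parents:
                hidden.append((earlier, later))
                break
    return hidden


if __name__ == "__main__":
    # Hypothetical order: cgroup is set up first, then /sys/fs/ is mounted over it.
    print(shadowed_mounts(["/sys", "/sys/fs/cgroup", "/sys/fs/"]))
    # Hypothetical order: the parent is mounted before the child, nothing is hidden.
    print(shadowed_mounts(["/sys", "/sys/fs/", "/sys/fs/cgroup"]))
```

In this model, only the first ordering reports /sys/fs/cgroup as hidden, which matches the failure mode described in the comment.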

@pixiake We have reproduced this problem locally and are working on a fix.

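@pixiake An MR with the fix has been submitted (PR). The root cause is that the underlying container runtime, lxc, did not support mounting the sys/fs directory; this has been fixed. In addition, an iSulad MR was submitted to fix the incorrect ordering logic for mount configuration:
!846:Mounts: only qsort the configed mounts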
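
The MR itself is not reproduced in this thread. Going only by its title, one reading is that user-configured mounts are sorted while other mounts keep their original order. The following is a minimal sketch of that general idea, not the iSulad implementation: the function name, the split into default and configured lists, and the depth-based sort key are all assumptions made for illustration.

```python
def order_mounts(default_mounts, configured_mounts):
    """Keep default mounts in place and sort only the configured ones.

    Configured destinations are ordered by path depth (then by name), so a
    parent directory is always mounted before anything beneath it.
    """
    def depth_key(destination):
        parts = [p for p in destination.split("/") if p]
        return (len(parts), parts)

    return list(default_mounts) + sorted(configured_mounts, key=depth_key)


if __name__ == "__main__":
    # Hypothetical inputs: defaults supplied by the runtime, extras from the pod spec.
    defaults = ["/proc", "/sys"]
    configured = ["/sys/fs/cgroup/example", "/sys/fs/"]
    print(order_mounts(defaults, configured))
```

Sorting by depth rather than by raw string comparison makes the parent-before-child guarantee explicit, at the cost of not preserving the order in which the user listed unrelated mounts.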
