src-openEuler/systemd

systemd hangs and multiple services fail to start when 140+ systemd services are created at the same time

To do
Defect
Created on 2024-07-25 14:32

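[Title description] Briefly describe the problem: in what scenario, after what operation, what problem occurs (use positive phrasing where possible)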

I. Defect information

**Kernel version: 5.10.0-136.69.0.149.oe2203sp1.x86_64

**Affected component: systemd

**Affected version: systemd 249 (v249-64.oe2203sp1)

**Summary: systemctl hangs when starting 142 services in parallel.

  • If a special network setup is involved, provide the network topology
    systemd service unit:
    [Unit]
    Description=XXX daemon osd.%i
    After=network-online.target local-fs.target time-sync.target ceph-mon.target hik_fw.service
    Wants=network-online.target local-fs.target time-sync.target
    PartOf=ceph-osd.target

    [Service]
    LimitNOFILE=1048576
    LimitNPROC=1048576
    ExecStart=/usr/bin/daemon -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
    ExecStartPre=/usr/lib/ceph/test-prestart.sh --cluster ${CLUSTER} --id %i
    ExecStartPost=/usr/lib/ceph/test_coredump.sh -set %i
    ExecReload=/bin/kill -HUP $MAINPID
    ProtectHome=read-only
    ProtectSystem=full
    PrivateTmp=true
    #TasksMax=infinity
    Restart=always
    StartLimitInterval=20min
    StartLimitBurst=3
    RestartSec=30s
    Nice=-20

    [Install]
    WantedBy=ceph-osd.target

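[Steps to reproduce]:
1. When Ceph creates OSDs, the script first runs systemctl reset-failed and then creates 140 OSDs concurrently. Each OSD is created by its own new process, each process runs the following three commands, and the processes are started concurrently. Each concurrent process has a timeout of 12 minutes.
a. systemctl reset-failed ceph-osd@.service;
b. systemctl disable ceph-osd@.service;
c. systemctl start ceph-osd@.service
Symptoms: the problem reproduces every time (3 out of 3 attempts).
(1) /usr/bin/systemd-tty-ask-password-agent --watch hangs
and reports: Failed to allocate directory watch: Too many open files

(2) ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i
has finished running, but systemd reports a start-pre timeout even though the pre-start script has actually completed.

(3) While systemd starts the 142 services concurrently, top shows CPU usage at 100%.

(4) D-Bus or notify type services such as NetworkManager, rsyslog, and polkit restart or become inactive.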
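
The reproduction in step 1 can be sketched roughly as follows. This is only an illustration, not the reporter's actual script: the instance name (`ceph-osd@<id>.service`) is an assumption based on the unit names in the journal output, the helper names `commands_for`, `run_one`, and `run_all` are hypothetical, and threads stand in for the separate processes described above (each systemctl call still runs as its own child process). The 12-minute limit is applied per command here for simplicity. The default is a dry run that only collects the command lines.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumption: each instance is named after its OSD id, as in the journal lines.
UNIT_TEMPLATE = "ceph-osd@{osd_id}.service"
# The report gives a 12-minute limit per worker; applied per command here.
COMMAND_TIMEOUT = 12 * 60


def commands_for(osd_id):
    """Return the three systemctl invocations listed in step 1 for one OSD."""
    unit = UNIT_TEMPLATE.format(osd_id=osd_id)
    return [
        ["systemctl", "reset-failed", unit],
        ["systemctl", "disable", unit],
        ["systemctl", "start", unit],
    ]


def run_one(osd_id, dry_run=True):
    """Run the commands for one OSD, or only collect them in dry-run mode."""
    executed = []
    for cmd in commands_for(osd_id):
        if not dry_run:
            subprocess.run(cmd, timeout=COMMAND_TIMEOUT, check=False)
        executed.append(" ".join(cmd))
    return executed


def run_all(count=140, dry_run=True):
    """Issue the global reset-failed, then start one worker per OSD at once."""
    if not dry_run:
        subprocess.run(["systemctl", "reset-failed"], check=False)
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(lambda i: run_one(i, dry_run), range(count)))


if __name__ == "__main__":
    # Dry run: print what would be executed for the first three OSDs.
    for lines in run_all(count=3):
        print("\n".join(lines))
```

Calling `run_all(count=140, dry_run=False)` on a test machine would issue all start requests at roughly the same time, which is the condition under which the symptoms above were observed.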

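2. Building on step 1, one further change was made; errors still occur.
systemctl start ceph-osd@.service
was changed to systemctl --no-ask-password start ceph-osd@.service
(1) The error Failed to allocate directory watch: Too many open files no longer appears.
(2) As before, ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i
has finished running, but systemd reports a start-pre timeout.
(3) While systemd starts the 142 services concurrently, top shows CPU usage at 100%.
(4) D-Bus or notify type services such as NetworkManager, rsyslog, and polkit still restart or become inactive.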
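
One possible explanation for symptom (1), not confirmed in this report: inotify_init1(2) fails with EMFILE ("Too many open files") when the per-user limit on inotify instances is reached, and each password agent started with --watch may hold one such instance. On many kernels the default limit is 128, although the value on the affected machine is not given here. A minimal sketch for comparing that limit with the number of concurrent agents, assuming the standard procfs path; the function names are hypothetical:

```python
from pathlib import Path

# Standard procfs location of the per-user inotify instance limit on Linux.
INOTIFY_LIMIT_PATH = "/proc/sys/fs/inotify/max_user_instances"


def read_inotify_instance_limit(path=INOTIFY_LIMIT_PATH):
    """Return the per-user cap on inotify instances as an integer."""
    return int(Path(path).read_text().strip())


def agents_exceed_limit(concurrent_agents, limit):
    """True if one inotify instance per agent would go past the cap."""
    return concurrent_agents > limit


if __name__ == "__main__":
    limit = read_inotify_instance_limit()
    print(f"max_user_instances = {limit}")
    print("140 agents exceed limit:", agents_exceed_limit(140, limit))
```

If the check reports that the agents exceed the limit, that would be consistent with the error disappearing once --no-ask-password is used, since systemctl then does not spawn the agent.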

3. Building on step 2, the global systemctl reset-failed was removed from the OSD creation script. Previously, creating each OSD ran:
a. systemctl reset-failed ceph-osd@.service;
b. systemctl disable ceph-osd@.service;
c. systemctl start ceph-osd@.service
In this experiment, command b (systemctl disable ceph-osd@.service) was removed.
Symptoms: failures still occur.
Error messages:
Again, ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i
has finished running, but systemd reports a timeout.
Jul 25 14:24:31 node1 systemd[1]: [73517.567195] Failed to start Ceph object storage daemon osd.100.
Jul 25 14:24:31 node1 systemd[1]: [73517.570330] ceph-osd@101.service: Failed with result 'timeout'.
Jul 25 14:24:31 node1 systemd[1]: [73517.572295] Failed to start Ceph object storage daemon osd.101.
Jul 25 14:24:31 node1 systemd[1]: [73517.574146] ceph-osd@103.service: Failed with result 'timeout'.
Jul 25 14:24:31 node1 systemd[1]: [73517.584671] Failed to start Ceph object storage daemon osd.103.
Jul 25 14:24:31 node1 systemd[1]: [73517.586701] ceph-osd@102.service: start-pre operation timed out. Terminating.
Jul 25 14:24:31 node1 systemd[1]: [73517.588865] ceph-osd@104.service: Failed with result 'timeout'.
Jul 25 14:24:31 node1 systemd[1]: [73517.597246] Failed to start Ceph object storage daemon osd.104.
Jul 25 14:24:31 node1 systemd[1]: [73517.598634] ceph-osd@105.service: start-pre operation timed out. Terminating.
Jul 25 14:24:31 node1 systemd[1]: [73517.599141] ceph-osd@106.service: start-pre operation timed out. Terminating.

(3) While systemd starts the 142 services concurrently, top shows CPU usage at 100%.
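
To see at a glance which instances are affected, journal output like the excerpt above can be grouped by unit and event. The following is a small sketch, assuming the two message formats shown in the excerpt; the function name `summarize_timeouts` is hypothetical.

```python
import re

# Matches journal lines such as:
#   ... ceph-osd@102.service: start-pre operation timed out. Terminating.
#   ... ceph-osd@101.service: Failed with result 'timeout'.
UNIT_EVENT = re.compile(
    r"(?P<unit>ceph-osd@\d+\.service): "
    r"(?P<event>start-pre operation timed out|Failed with result 'timeout')"
)


def summarize_timeouts(lines):
    """Map each event kind to the sorted list of units that logged it."""
    summary = {}
    for line in lines:
        match = UNIT_EVENT.search(line)
        if match:
            summary.setdefault(match.group("event"), set()).add(match.group("unit"))
    return {event: sorted(units) for event, units in summary.items()}


if __name__ == "__main__":
    sample = [
        "Jul 25 14:24:31 node1 systemd[1]: [73517.570330] ceph-osd@101.service: Failed with result 'timeout'.",
        "Jul 25 14:24:31 node1 systemd[1]: [73517.586701] ceph-osd@102.service: start-pre operation timed out. Terminating.",
    ]
    for event, units in summarize_timeouts(sample).items():
        print(event, units)
```

The lines can be taken from journalctl output saved to a file and passed in as a list of strings.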

Comments (1)

sweetbreeze created the defect

Hi sweetbreeze, welcome to the openEuler Community.
I'm the Bot here serving you. You can find the instructions on how to interact with me at Here.
If you have any questions, please contact the SIG: Base-service, and any of the maintainers: @Monday , @syyhao , @谢志鹏 , @zhujianwei001 , @hexiaowen , @licunlong , @const , @xujing , @chenjiayi , @陈棋德

openeuler-ci-bot added the sig/Base-service label
xieminmin set the assignee to 胡宇彪

