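[Title description] Briefly describe the problem: in what scenario, which operation was performed, and what went wrong (use positive phrasing where possible).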
I. Defect information
**Kernel:** 5.10.0-136.69.0.149.oe2203sp1.x86_64
**Affected component:** systemd
**Affected version:** systemd 249 (v249-64.oe2203sp1)
**Summary:** systemctl hangs when 142 services are started in parallel.
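[Steps to reproduce]
1. When Ceph creates OSDs, the script first runs a global systemctl reset-failed and then creates 140 OSDs concurrently. Each OSD is created in its own process, each process has a 12-minute timeout, and each process runs the following three commands:
a. systemctl reset-failed ceph-osd@.service;
b. systemctl disable ceph-osd@.service;
c. systemctl start ceph-osd@.service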
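The concurrent start pattern in step 1 can be sketched roughly as below. This is only an illustration, not the actual Ceph deployment script: the function names, the DRY_RUN switch, and the use of numeric OSD ids 0..N-1 as unit instance names are assumptions. For simplicity the 12-minute limit is applied to the start command only, via coreutils timeout, whereas the report applies it to the whole per-OSD process.

```shell
#!/bin/sh
# Hypothetical sketch of the reproduction pattern; names and ids are assumptions.

DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the commands
PER_OSD_TIMEOUT="${PER_OSD_TIMEOUT:-720}"  # 12 minutes, in seconds

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The three per-OSD commands from the report, for one instance id.
start_one_osd() {
    unit="ceph-osd@${1}.service"
    run systemctl reset-failed "$unit"
    run systemctl disable "$unit"
    run timeout "$PER_OSD_TIMEOUT" systemctl start "$unit"
}

# Global reset-failed, then one background job per OSD, then wait for all.
start_osds_parallel() {
    count="$1"
    id=0
    run systemctl reset-failed
    while [ "$id" -lt "$count" ]; do
        start_one_osd "$id" &
        id=$((id + 1))
    done
    wait
}
```

With the default DRY_RUN=1, calling start_osds_parallel 140 only prints the commands; setting DRY_RUN=0 on a test node would actually issue them.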
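Symptoms (reproduced in 3 out of 3 attempts):
(1) /usr/bin/systemd-tty-ask-password-agent --watch hangs
and reports: Failed to allocate directory watch: Too many open files
(2) ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i
The pre-start script has actually finished running, but systemd still reports a start-pre timeout.
(3) While systemd is starting the 142 services concurrently, top shows 100% CPU usage.
(4) D-Bus or notify-type services such as NetworkManager, rsyslog, and polkit restart or become inactive.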
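One plausible explanation for the "Too many open files" message in (1), offered as a hypothesis rather than a confirmed root cause, is that each systemd-tty-ask-password-agent --watch process allocates an inotify instance, and 142 of them may exceed the per-user limit in fs.inotify.max_user_instances. The helper names below are made up for illustration; the optional path arguments exist only to make the functions testable outside a real /proc.

```shell
#!/bin/sh
# Hypothetical helpers for checking inotify pressure on Linux.

# Print the per-user inotify instance limit.
# $1: optional path override (defaults to the real procfs file).
inotify_instance_limit() {
    cat "${1:-/proc/sys/fs/inotify/max_user_instances}"
}

# Count open inotify instances among the processes we are allowed to inspect.
# Each inotify fd shows up as a symlink to "anon_inode:inotify" under /proc/PID/fd.
# $1: optional procfs root override (defaults to /proc).
count_inotify_instances() {
    find "${1:-/proc}" -maxdepth 3 -path '*/fd/*' -lname 'anon_inode:inotify' 2>/dev/null | wc -l
}
```

Comparing the two numbers as root while the parallel start is running would show whether the limit is being approached. Processes that cannot be inspected are silently skipped, so the count is a lower bound for non-root users.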
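2. Building on step 1, with the following change, the failures persist:
Change systemctl start ceph-osd@.service
to systemctl --no-ask-password start ceph-osd@.service
(1) The error "Failed to allocate directory watch: Too many open files" no longer appears.
(2) As before, ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i
has finished running, but systemd still reports a start-pre timeout.
(3) While systemd is starting the 142 services concurrently, top shows 100% CPU usage.
(4) D-Bus or notify-type services such as NetworkManager, rsyslog, and polkit still restart or become inactive.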
3. Building on step 2, remove the global systemctl reset-failed from the OSD creation script. Previously, creating each OSD ran:
a. systemctl reset-failed ceph-osd@.service;
b. systemctl disable ceph-osd@.service;
c. systemctl start ceph-osd@.service
In this experiment, step b (systemctl disable ceph-osd@.service) is also removed.
Symptom: some starts still fail.
Error messages:
Again, ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i
has finished running, but systemd reports a timeout:
Jul 25 14:24:31 node1 systemd[1]: [73517.567195] Failed to start Ceph object storage daemon osd.100.
Jul 25 14:24:31 node1 systemd[1]: [73517.570330] ceph-osd@101.service: Failed with result 'timeout'.
Jul 25 14:24:31 node1 systemd[1]: [73517.572295] Failed to start Ceph object storage daemon osd.101.
Jul 25 14:24:31 node1 systemd[1]: [73517.574146] ceph-osd@103.service: Failed with result 'timeout'.
Jul 25 14:24:31 node1 systemd[1]: [73517.584671] Failed to start Ceph object storage daemon osd.103.
Jul 25 14:24:31 node1 systemd[1]: [73517.586701] ceph-osd@102.service: start-pre operation timed out. Terminating.
Jul 25 14:24:31 node1 systemd[1]: [73517.588865] ceph-osd@104.service: Failed with result 'timeout'.
Jul 25 14:24:31 node1 systemd[1]: [73517.597246] Failed to start Ceph object storage daemon osd.104.
Jul 25 14:24:31 node1 systemd[1]: [73517.598634] ceph-osd@105.service: start-pre operation timed out. Terminating.
Jul 25 14:24:31 node1 systemd[1]: [73517.599141] ceph-osd@106.service: start-pre operation timed out. Terminating.
(3) While systemd is starting the 142 services concurrently, top shows 100% CPU usage.
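To see which instances hit the start-pre timeout, the journal text can be filtered for that message. A minimal sketch, assuming the log format shown in the excerpt above; the function name is hypothetical.

```shell
#!/bin/sh
# Print the unique ceph-osd units that logged a start-pre timeout.
# Reads journal text on stdin and writes one unit name per line.
units_with_start_pre_timeout() {
    grep -o 'ceph-osd@[0-9]*\.service: start-pre operation timed out' \
        | sed 's/: .*//' \
        | sort -u
}
```

It could be fed from the journal, for example with journalctl -b -u 'ceph-osd@*' | units_with_start_pre_timeout, or from a saved log file.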