openEuler
/
release-management
[openEuler 20.03 LTS SP2] Memory UCE fault degradation: user data stays consistent and services keep running
Done
#I3N36H
Requirement
xxs
Opened this issue
2021-04-20 23:09
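Memory UCE (uncorrectable error) fault degradation: user data stays consistent and services keep running.

Basic flow: when a memory UCE occurs and is recoverable, the OS sends SIGBUS to the affected process. The signal carries the faulty memory address, which the kernel first converts to a virtual address in that process. The process registers a SIGBUS handler and applies fine-grained fault tolerance to the reported address. In special cases (for example, when the address is -EFAULT) the kernel sends SIGKILL to the affected process instead; user processes do not need to handle that case.

Notes on handling memory UCE events:
1. Consider carefully how a multithreaded program handles signals.
2. Do not block SIGBUS or set its disposition to SIG_IGN.
3. If the signal has code=BUS_MCEERR_AR, handle it immediately; do not wait or switch threads. (In extreme cases the faulty memory may not belong to the current thread's context, and the handler must allow for that.)
4. If the signal has code=BUS_MCEERR_AO, handling may wait or be deferred for a while.

SIGBUS carries the following information:
    info.si_signo = SIGBUS;
    info.si_errno = 0;
    info.si_code = code;
    info.si_addr = addr;
    info.si_addr_lsb = lsb;

si_code takes the following values:
    /* hardware memory error consumed on a machine check: action required */
    #define BUS_MCEERR_AR 4
    /* hardware memory error detected in process but not consumed: action optional */
    #define BUS_MCEERR_AO 5
Signals whose si_code is neither of these two values can be ignored.

si_addr: the virtual address of the memory error. It is normally non-zero; signals with si_addr=0 can be ignored.

si_addr_lsb: the least significant valid bit of the virtual address (the address mask). For example, si_addr_lsb=12 means the address is aligned to 4K. The value must not be 0; signals with si_addr_lsb=0 can be ignored.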
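The flow described above could be sketched in C roughly as follows. This is a minimal illustration and not the openEuler implementation: the names `uce_classify`, `uce_range_start`, `uce_sigbus_handler`, `uce_install_handler`, the `g_uce_*` variables, and the optional `g_uce_recover_now` callback are all hypothetical, and exiting when an action-required error cannot be repaired is an assumption of this sketch. It uses only `sigaction` with `SA_SIGINFO` and the `siginfo_t` fields listed in the issue.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Values quoted in the issue; defined here only if the libc headers lack them. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

enum uce_action { UCE_IGNORE, UCE_HANDLE_NOW, UCE_HANDLE_LATER };

/* Apply the filtering rules from the issue: ignore other si_code values,
 * si_addr == 0 and si_addr_lsb == 0. Rejecting an lsb wider than a pointer
 * is an extra guard added by this sketch. */
static enum uce_action uce_classify(int si_code, const void *addr, int lsb)
{
    if (si_code != BUS_MCEERR_AR && si_code != BUS_MCEERR_AO)
        return UCE_IGNORE;
    if (addr == NULL || lsb <= 0 || lsb >= (int)(sizeof(uintptr_t) * 8))
        return UCE_IGNORE;
    return si_code == BUS_MCEERR_AR ? UCE_HANDLE_NOW : UCE_HANDLE_LATER;
}

/* si_addr_lsb = 12 means the affected range is one 4K-aligned block. */
static uintptr_t uce_range_start(const void *addr, int lsb)
{
    return (uintptr_t)addr & ~(((uintptr_t)1 << lsb) - 1);
}

static size_t uce_range_len(int lsb)
{
    return (size_t)1 << lsb;
}

/* Last reported range; the main loop may poll g_uce_pending for AO events. */
static volatile uintptr_t g_uce_start;
static volatile size_t g_uce_len;
static volatile sig_atomic_t g_uce_pending;

/* Hypothetical application hook that repairs or abandons the range.
 * It must be async-signal-safe and return 0 on success. */
static int (*g_uce_recover_now)(uintptr_t start, size_t len);

static void uce_sigbus_handler(int signo, siginfo_t *info, void *ucontext)
{
    (void)signo;
    (void)ucontext;
    int lsb = info->si_addr_lsb;
    enum uce_action act = uce_classify(info->si_code, info->si_addr, lsb);
    if (act == UCE_IGNORE)
        return; /* not a memory UCE event; normal SIGBUS handling would go here */

    g_uce_start = uce_range_start(info->si_addr, lsb);
    g_uce_len = uce_range_len(lsb);
    g_uce_pending = 1;

    /* Action required: deal with it in this thread, right now. Returning
     * without a repair would re-execute the faulting access, so this
     * sketch exits if no recovery is possible (an assumption). */
    if (act == UCE_HANDLE_NOW &&
        (g_uce_recover_now == NULL ||
         g_uce_recover_now(g_uce_start, g_uce_len) != 0))
        _exit(1);
}

/* Register the handler and make sure SIGBUS is not blocked. A multithreaded
 * program would use pthread_sigmask in each thread instead of sigprocmask. */
static int uce_install_handler(void)
{
    struct sigaction sa;
    sigset_t set;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = uce_sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, SIGBUS);
    return sigprocmask(SIG_UNBLOCK, &set, NULL) == 0 ? 0 : -1;
}
```

A program would call `uce_install_handler()` once at startup. Events with `BUS_MCEERR_AO` only set `g_uce_pending`, so the application can pick up `g_uce_start` and `g_uce_len` later at a convenient point; events with `BUS_MCEERR_AR` are dealt with inside the handler, in line with note 3.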
Status: Done
Assignees: 陈功 (clumsycg)
Labels: kind/design
Milestones: openEuler-20.03-LTS-SP2