[openEuler-1.0-LTS] After fault injection, fsck -a reports no errors, but fsck -fn reports an invalid symlink

Status: Completed
Type: Task
Created: 2022-03-21 15:01

[root@euleros-pxe home]# fsck.ext4 -fn ram0yb
e2fsck 1.45.6 (20-Mar-2020)
Pass 1: Checking inodes, blocks, and sizes
Inode 53 extent tree (at level 1) could be shorter. Optimize? no

Inode 72 extent tree (at level 1) could be shorter. Optimize? no

Inode 1185 extent tree (at level 1) could be shorter. Optimize? no

Inode 1696 extent tree (at level 1) could be shorter. Optimize? no

Inode 2565 extent tree (at level 1) could be shorter. Optimize? no

Inode 3315 extent tree (at level 1) could be shorter. Optimize? no

Pass 2: Checking directory structure
Symlink /p3/d14/d1a/l3d (inode #3494) is invalid.
Clear? no

Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0).
Fix? no

Symlink /p3/d14/d1f/d22/l4b (inode #3494) is invalid.
Clear? no

Entry 'l4b' in /p3/d14/d1f/d22 (3394) has an incorrect filetype (was 7, should be 0).
Fix? no

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

ram0yb: ********** WARNING: Filesystem still has errors **********

ram0yb: 4179/65536 files (15.3% non-contiguous), 54960/262144 blocks
You have mail in /var/spool/mail/root

Comments (2)

iceleaf created the task

openeuler-ci-bot added the sig/Kernel label

Symptoms:

[yebin@ceph-admin 19:09:56 ~/fsck_symlink]$fsck.ext4 -fn ram0_bak
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Inode 53 extent tree (at level 1) could be shorter. Optimize? no

Inode 72 extent tree (at level 1) could be shorter. Optimize? no

Inode 1185 extent tree (at level 1) could be shorter. Optimize? no

Inode 1696 extent tree (at level 1) could be shorter. Optimize? no

Inode 2565 extent tree (at level 1) could be shorter. Optimize? no

Inode 3315 extent tree (at level 1) could be shorter. Optimize? no

Pass 2: Checking directory structure
Symlink /p3/d14/d1a/l3d (inode #3494) is invalid.
Clear? no

Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0).
Fix? no

Symlink /p3/d14/d1f/d22/l4b (inode #3494) is invalid.
Clear? no

Entry 'l4b' in /p3/d14/d1f/d22 (3394) has an incorrect filetype (was 7, should be 0).
Fix? no

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

ram0_bak: ********** WARNING: Filesystem still has errors **********

ram0_bak: 4179/65536 files (15.3% non-contiguous), 54960/262144 blocks
[yebin@ceph-admin 19:10:05 ~/fsck_symlink]$debugfs ram0_bak
debugfs 1.45.5 (07-Jan-2020)
debugfs: stat <3494>
Inode: 3494 Type: symlink Mode: 0777 Flags: 0x80000
Generation: 1436667368 Version: 0x00000000:00000001
User: 1446183 Group: 0 Project: 0 Size: 3996
File ACL: 0
Links: 2 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x6234339c:a7517f0c -- Fri Mar 18 15:24:12 2022
atime: 0x6234339a:6b3ca388 -- Fri Mar 18 15:24:10 2022
mtime: 0x6234339a:6b3ca388 -- Fri Mar 18 15:24:10 2022
crtime: 0x6234339a:6b3ca388 -- Fri Mar 18 15:24:10 2022
Size of extra inode fields: 32
Extended attributes:
security.selinux (37) = "unconfined_u:object_r:unlabeled_t:s0\000"
Inode checksum: 0xf310bace
EXTENTS:
(0):5968
debugfs: logdump -S
Journal features: journal_incompat_revoke journal_64bit journal_checksum_v3
Journal size: 32M
Journal length: 8192
Journal sequence: 0x000003b8
Journal start: 0
Journal checksum type: crc32c
Journal checksum: 0x24b17951

Journal starts at block 0, transaction 952
Key log message:

Mar 18 15:24:10 euleros-pxe kernel: [105524.214848] Buffer I/O error on device ram0, logical block 5968
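This log line shows that the write to block 5968 failed, and block 5968 is exactly the data block of the problematic inode. The correct data for this block can be found in the journal area, but in the image captured when the problem occurred the journal had already been emptied. The suspicion is therefore that the failed writeback was not handled correctly: the later checkpoint did not detect the error, so the journal was cleared.
So, look at the logic of ext4_finish_bio, the function called in the journal writeback flow: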

...
	if (bio->bi_status) {
		SetPageError(page);
		mapping_set_error(page->mapping, -EIO);	/* first, the error is set on the page mapping */
	}
...
	do {
		if (bh_offset(bh) < bio_start ||
		    bh_offset(bh) + bh->b_size > bio_end) {
			if (buffer_async_write(bh))
				under_io++;
			continue;
		}
		clear_buffer_async_write(bh);
		if (bio->bi_status)
			buffer_io_error(bh);	/* prints the error message */
	} while ((bh = bh->b_this_page) != head);
...
The error-detection logic applied during checkpointing:

__jbd2_journal_remove_checkpoint
	if (buffer_write_io_error(bh))
		set_bit(JBD2_CHECKPOINT_IO_ERROR, &journal_wrapper->j_atomic_flags);
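This explains the problem. When writeback of the symlink's journaled data fails, no error flag is set on the buffer, so the error is not detected when the checkpoint list is cleaned up, and the journal is emptied after unmount. fsck then performs no journal recovery, and the symlink's data ends up as garbage.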
