openEuler / kernel

ext4: quota: stress test plus fault injection causes quota to access an invalid memory address

Status: Completed
Type: Defect
Created: 2022-09-09 11:08

[Title] ext4: quota: stress test plus fault injection causes quota to access an invalid memory address
[Environment]
Hardware:
NA
Software:
OLK-5.10
[Steps to reproduce]
Detailed steps
CONFIG_QUOTA=y CONFIG_FAULT_INJECTION=y
Use a system whose root filesystem is xfs
-smp 2 // makes the bh (buffer head) easier to evict
-m 2048 // makes drop_caches free the bh more readily

  1. Apply the diff patch
  2. gcc -o bb b.c
  3. Run test.sh

Reproducibility (always reproducible or intermittent): always reproducible
[Expected result]
Kernel panic
[Actual result]
[ 111.725700] v2_read_file_info: <3> free_blk 0 free_entry 5 dqi_blks 6 isize 6144
[ 111.728161] v2_read_file_info: <4> free_blk 0 free_entry 5 dqi_blks 6 isize 6144
[ 111.729750] ext4 filesystem being mounted at /root/temp supports timestamps until 2038 (0x7fffffff)
[ 113.713024] ext4_quota_write[chown]: write blk 6 phy 4097
[ 118.941575] wait offline
[ 119.450825] wait done
[ 119.451152] Buffer I/O error on dev sda, logical block 4097, lost async page write
[ 119.452084] end_buffer_async_write: bh ffff88801698f7b8(blk 4097, ref 2) uptodate 0
[ 119.453426] EXT4-fs error (device sda): ext4_check_bdev_write_error:217: comm chown: Error while async write back metadata
[ 119.713887] put_free_dqblk[chown]: info->dqi_free_blk is 6
[ 125.085670] wait offline
[ 125.586953] wait done
[ 125.587266] Buffer I/O error on dev sda, logical block 4097, lost async page write
[ 125.588160] end_buffer_async_write: bh ffff88801698f7b8(blk 4097, ref 3) uptodate 0
[ 125.994355] EXT4-fs error (device sda): ext4_check_bdev_write_error:217: comm ls: Error while async write back metadata
[ 135.155615] free_buffer_head: free g_bh ffff88801698f7b8
[ 135.332611] get_free_dqblk[chown]: use info->dqi_free_blk 6
[ 135.333901] get_free_dqblk[chown]: set info->dqi_free_blk 4294959296
[ 135.335450] ext4_quota_write[chown]: write blk 6 phy 4097
[ 135.450760] get_free_dqblk[chown]: use info->dqi_free_blk 4294959296
[ 135.452878] get_free_dqblk[chown]: set info->dqi_free_blk 0
[ 135.454632] __quota_error: 11 callbacks suppressed
[ 135.454637] Quota error (device sda): qtree_write_dquot: Error -8000 occurred while creating quota
[ 135.459462] BUG: unable to handle page fault for address: ffffffffffffe120
[ 135.462008] #PF: supervisor write access in kernel mode
[ 135.463479] #PF: error_code(0x0002) - not-present page
[ 135.464364] PGD 2e0b067 P4D 2e0b067 PUD 2e0d067 PMD 0
[ 135.465254] Oops: 0002 [#1] SMP
[ 135.465789] CPU: 0 PID: 9515 Comm: chown Not tainted 5.10.0-00013-g4e08fb15dccf-dirty #152
[ 135.467150] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc314
[ 135.468910] RIP: 0010:_raw_spin_lock+0x1c/0x50
[ 135.469493] Code: 05 b8 e4 bf 08 01 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 83 05 f3 e3 bf 08 01 48 83 05 f3 e3 bf 08 01 31 c0 ba 01 006
[ 135.471864] RSP: 0018:ffff8880114a3ba8 EFLAGS: 00010246
[ 135.472550] RAX: 0000000000000000 RBX: ffffffffffffe0c0 RCX: 0000000000000000
[ 135.473466] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffffffe120
[ 135.474368] RBP: 0000000000000001 R08: ffff8880114a3c10 R09: ffffffff8a09bac0
[ 135.475288] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880114a3c10
[ 135.476206] R13: 0000000000000000 R14: ffff888016a1adf8 R15: ffff8880114a3d30
[ 135.477126] FS: 00007f15a13b64c0(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[ 135.478169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 135.478906] CR2: ffffffffffffe120 CR3: 0000000012fa0000 CR4: 00000000000006f0
[ 135.479808] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 135.480697] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 135.481587] Call Trace:
[ 135.481915] dquot_add_inodes+0x23/0x270
[ 135.482418] __dquot_transfer+0x312/0x850
[ 135.482944] ? jbd2_journal_stop+0x1d5/0x500
[ 135.483498] ? ZSTD_initDStream+0x50/0x320
[ 135.484010] ? dqput+0x255/0x330
[ 135.484428] ? dqget+0x623/0x670
[ 135.484841] dquot_transfer+0xaa/0x1f0
[ 135.485315] ext4_setattr+0x175/0xf40
[ 135.485784] notify_change+0x3c0/0x730
[ 135.486262] ? chown_common+0x12a/0x290
[ 135.486760] chown_common+0x12a/0x290
[ 135.487233] do_fchownat+0x107/0x180
[ 135.487696] __x64_sys_fchownat+0x29/0x40
[ 135.488209] do_syscall_64+0x45/0x70
[ 135.488676] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 135.489325] RIP: 0033:0x7f15a0ed23ca
[ 135.489782] Code: 48 8b 0d c1 da 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 04 01 00 008
[ 135.492120] RSP: 002b:00007ffe64ed7368 EFLAGS: 00000246 ORIG_RAX: 0000000000000104
[ 135.493073] RAX: ffffffffffffffda RBX: 00007ffe64ed75b0 RCX: 00007f15a0ed23ca
[ 135.493975] RDX: 000000000000fffe RSI: 0000560461bb8070 RDI: 00000000ffffff9c
[ 135.494880] RBP: 0000560461bb6d10 R08: 0000000000000000 R09: 00000000ffffffff
[ 135.495772] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000560461bb6d88
[ 135.496671] R13: 00000000ffffffff R14: 00000000ffffff9c R15: 0000000000000001
[ 135.497565] Modules linked in:
[ 135.497965] CR2: ffffffffffffe120
[ 135.498402] ---[ end trace 481e1575132a30b9 ]---
[ 135.498983] RIP: 0010:_raw_spin_lock+0x1c/0x50
[ 135.499545] Code: 05 b8 e4 bf 08 01 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 83 05 f3 e3 bf 08 01 48 83 05 f3 e3 bf 08 01 31 c0 ba 01 006
[ 135.501858] RSP: 0018:ffff8880114a3ba8 EFLAGS: 00010246
[ 135.502520] RAX: 0000000000000000 RBX: ffffffffffffe0c0 RCX: 0000000000000000
[ 135.503413] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffffffe120
[ 135.504308] RBP: 0000000000000001 R08: ffff8880114a3c10 R09: ffffffff8a09bac0
[ 135.505341] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880114a3c10
[ 135.506263] R13: 0000000000000000 R14: ffff888016a1adf8 R15: ffff8880114a3d30
[ 135.507179] FS: 00007f15a13b64c0(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[ 135.508208] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 135.508943] CR2: ffffffffffffe120 CR3: 0000000012fa0000 CR4: 00000000000006f0
[ 135.509851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 135.510757] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 135.511666] Kernel panic - not syncing: Fatal exception
[ 135.512392] Kernel Offset: disabled
[ 135.512837] ---[ end Kernel panic - not syncing: Fatal exception ]---
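
The odd numbers in this log all appear to derive from one 32-bit value, 0xffffe0c0, which is the pattern b.c (attached below) writes across the device before mkfs. Read as an unsigned 32-bit number it is 4294959296 (the value get_free_dqblk stores in dqi_free_blk), read as a signed 32-bit number it is -8000 (the code in "Error -8000 occurred while creating quota"), and sign-extended to 64 bits it is ffffffffffffe0c0 (RBX). The faulting address in CR2 and RDI is that value plus 0x60. The sketch below only reproduces this arithmetic. The helper names are hypothetical, and reading 0x60 as the offset of a lock inside the structure being accessed is an inference from the registers, not something the log states.

```c
#include <stdint.h>

/* Pattern that b.c writes into every 32-bit word of /dev/sda. */
#define DISK_PATTERN 0xffffe0c0u

/* Hypothetical helper: value of a 32-bit word when read as a signed int. */
int64_t as_signed32(uint32_t v)
{
	return v > INT32_MAX ? (int64_t)v - ((int64_t)1 << 32) : (int64_t)v;
}

/* Hypothetical helper: the same word sign-extended to a 64-bit address. */
uint64_t sign_extend_to_64(uint32_t v)
{
	return (uint64_t)as_signed32(v);
}

/* Address touched when a field at byte offset off is accessed through it. */
uint64_t fault_address(uint32_t v, uint64_t off)
{
	return sign_extend_to_64(v) + off;
}
```

With these helpers, `as_signed32(DISK_PATTERN)` is -8000 and `fault_address(DISK_PATTERN, 0x60)` is 0xffffffffffffe120, matching the error code and CR2 in the log above.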

[Attachments]
diff

diff --git a/block/blk-core.c b/block/blk-core.c
index 7d4324f6e664..1d5b1bba799a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -822,6 +822,7 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
 	return BLK_STS_OK;
 }
 
+extern struct bio *g_bio;
 static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 {
 	struct request_queue *q = bio->bi_disk->queue;
@@ -841,8 +842,20 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 	if ((bio->bi_opf & REQ_NOWAIT) && !blk_queue_nowait(q))
 		goto not_supported;
 
-	if (should_fail_bio(bio))
+	smp_rmb();
+	if (bio == g_bio) {
+		pr_err("wait offline\n");
+		mdelay(500);
+	}
+
+	if (should_fail_bio(bio)) {
+		if (bio == g_bio) {
+			pr_err("wait done\n");
+			g_bio = NULL;
+			smp_wmb();
+		}
 		goto end_io;
+	}
 
 	if (bio->bi_partno) {
 		if (unlikely(blk_partition_remap(bio)))
diff --git a/fs/buffer.c b/fs/buffer.c
index 37a08026d3ef..f46b12f0f689 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -339,6 +339,7 @@ static void end_buffer_async_read_io(struct buffer_head *bh, int uptodate)
  * Completion handler for block_write_full_page() - pages which are unlocked
  * during I/O, and which have PageWriteback cleared upon I/O completion.
  */
+extern struct buffer_head *g_bh;
 void end_buffer_async_write(struct buffer_head *bh, int uptodate)
 {
 	unsigned long flags;
@@ -358,6 +359,10 @@ void end_buffer_async_write(struct buffer_head *bh, int uptodate)
 		SetPageError(page);
 	}
 
+	smp_rmb();
+	if (bh == g_bh)
+		pr_err("%s: bh %px(blk %llu, ref %d) uptodate %d\n", __func__, bh, bh->b_blocknr, atomic_read(&bh->b_count), uptodate);
+
 	first = page_buffers(page);
 	spin_lock_irqsave(&first->b_uptodate_lock, flags);
 
@@ -3007,6 +3012,7 @@ static void end_bio_bh_io_sync(struct bio *bio)
 	bio_put(bio);
 }
 
+struct bio *g_bio;
 static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 			 enum rw_hint write_hint, struct writeback_control *wbc)
 {
@@ -3025,6 +3031,11 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 		clear_buffer_write_io_error(bh);
 
 	bio = bio_alloc(GFP_NOIO, 1);
+	smp_rmb();
+	if (g_bh == bh) {
+		g_bio = bio;
+		smp_wmb();
+	}
 
 	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 
@@ -3349,6 +3360,11 @@ EXPORT_SYMBOL(alloc_buffer_head);
 
 void free_buffer_head(struct buffer_head *bh)
 {
+	if (bh == g_bh) {
+		pr_err("%s: free g_bh %px\n", __func__, bh);
+		g_bh = NULL;
+		smp_wmb();
+	}
 	BUG_ON(!list_empty(&bh->b_assoc_buffers));
 	kmem_cache_free(bh_cachep, bh);
 	preempt_disable();
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6a8812ed9d31..d408d1fa4d2e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6656,6 +6656,7 @@ static ssize_t ext4_quota_read(struct super_block *sb, int type, char *data,
 	return len;
 }
 
+struct buffer_head *g_bh;
 /* Write to quotafile (we know the transaction is already started and has
  * enough credits) */
 static ssize_t ext4_quota_write(struct super_block *sb, int type,
@@ -6704,6 +6705,12 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
 	lock_buffer(bh);
 	memcpy(bh->b_data+offset, data, len);
 	flush_dcache_page(bh->b_page);
+	smp_rmb();
+	if (!g_bh && (len == 1 << 10) && !strncmp(current->comm, "chown", 5) && bh->b_blocknr > 2048) {
+		pr_err("%s[%s]: write blk %u phy %llu\n", __func__, current->comm, blk, bh->b_blocknr);
+		g_bh = bh;
+		smp_wmb();
+	}
 	unlock_buffer(bh);
 	err = ext4_handle_dirty_metadata(handle, NULL, bh);
 	brelse(bh);
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 646fb512bd1e..b22259215a11 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -696,8 +696,9 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
 	 * jbd2_journal_destroy(). So mark the writeback IO error in the
 	 * journal here and we abort the journal later from a better context.
 	 */
-	if (buffer_write_io_error(bh))
-		set_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags);
+//	This can be reserved, to increase reproduce possibility, we remove it
+//	if (buffer_write_io_error(bh))
+//		set_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags);
 
 	__buffer_unlink(jh);
 	jh->b_cp_transaction = NULL;
diff --git a/fs/quota/quota_tree.c b/fs/quota/quota_tree.c
index 1a188fbdf34e..d7520962f135 100644
--- a/fs/quota/quota_tree.c
+++ b/fs/quota/quota_tree.c
@@ -91,10 +91,12 @@ static int get_free_dqblk(struct qtree_mem_dqinfo *info)
 		return -ENOMEM;
 	if (info->dqi_free_blk) {
 		blk = info->dqi_free_blk;
+		pr_err("%s[%s]: use info->dqi_free_blk %u\n", __func__, current->comm, info->dqi_free_blk);
 		ret = read_blk(info, blk, buf);
 		if (ret < 0)
 			goto out_buf;
 		info->dqi_free_blk = le32_to_cpu(dh->dqdh_next_free);
+		pr_err("%s[%s]: set info->dqi_free_blk %u\n", __func__, current->comm, info->dqi_free_blk);
 	}
 	else {
 		memset(buf, 0, info->dqi_usable_bs);
@@ -124,6 +126,7 @@ static int put_free_dqblk(struct qtree_mem_dqinfo *info, char *buf, uint blk)
 	if (err < 0)
 		return err;
 	info->dqi_free_blk = blk;
+	pr_err("%s[%s]: info->dqi_free_blk is %u\n", __func__, current->comm, info->dqi_free_blk);
 	mark_info_dirty(info->dqi_sb, info->dqi_type);
 	return 0;
 }
@@ -187,6 +190,7 @@ static int insert_free_dqentry(struct qtree_mem_dqinfo *info, char *buf,
 		return -ENOMEM;
 	dh->dqdh_next_free = cpu_to_le32(info->dqi_free_entry);
 	dh->dqdh_prev_free = cpu_to_le32(0);
+	pr_err("%s[%s]: dh->dqdh_next_free is %u\n", __func__, current->comm, info->dqi_free_entry);
 	err = write_blk(info, blk, buf);
 	if (err < 0)
 		goto out_buf;
@@ -252,6 +256,7 @@ static uint find_free_dqentry(struct qtree_mem_dqinfo *info,
 		/* This is enough as the block is already zeroed and the entry
 		 * list is empty... */
 		info->dqi_free_entry = blk;
+		pr_err("%s[%s]: info->dqi_free_entry is %u\n", __func__, current->comm, blk);
 		mark_info_dirty(dquot->dq_sb, dquot->dq_id.type);
 	}
 	/* Block will be full? */
diff --git a/fs/quota/quota_v2.c b/fs/quota/quota_v2.c
index b1467f3921c2..af43f32c6690 100644
--- a/fs/quota/quota_v2.c
+++ b/fs/quota/quota_v2.c
@@ -150,6 +150,7 @@ static int v2_read_file_info(struct super_block *sb, int type)
 	qinfo->dqi_blocksize_bits = V2_DQBLKSIZE_BITS;
 	qinfo->dqi_usable_bs = 1 << V2_DQBLKSIZE_BITS;
 	qinfo->dqi_qtree_depth = qtree_depth(qinfo);
+	pr_err("%s: <%lu> free_blk %u free_entry %u dqi_blks %u isize %lld\n", __func__, sb_dqopt(sb)->files[type]->i_ino, qinfo->dqi_free_blk, qinfo->dqi_free_entry, qinfo->dqi_blocks, i_size_read(sb_dqopt(sb)->files[type]));
 	if (version == 0) {
 		qinfo->dqi_entry_size = sizeof(struct v2r0_disk_dqblk);
 		qinfo->dqi_ops = &v2r0_qtree_ops;

test.sh

#!/bin/bash

echo 1 > /sys/kernel/debug/fail_make_request/verbose
echo 100 > /sys/kernel/debug/fail_make_request/probability
echo 1 > /sys/kernel/debug/fail_make_request/interval
echo 1 > /sys/kernel/debug/fail_make_request/times

umount /root/temp
#dd if=/dev/urandom of=/dev/sda bs=1M count=64
./bb
mkfs.ext4 -F -b1024 -Oquota /dev/sda 2M
mount /dev/sda /root/temp

# Fill the space so that newly allocated blocks land beyond the first 2M
dd if=/dev/zero of=/root/temp/file bs=1M count=2
resize2fs /dev/sda 32M

mkdir /root/temp/dir
for i in `seq 0 1000`
do
	touch /root/temp/dir/foo_$i
done

sync
sync

echo "chown"
# Create a new quota block (logical blk 6, physical blk 4097) for the dquots
chown abrt /root/temp/dir/foo_1
chown bin /root/temp/dir/foo_2
chown adm /root/temp/dir/foo_3
chown apache /root/temp/dir/foo_4
chown chrony /root/temp/dir/foo_5
chown cockpit-ws /root/temp/dir/foo_6
#chown daemon /root/temp/dir/foo_7
#chown dbus /root/temp/dir/foo_8
#chown dirsrv /root/temp/dir/foo_9
#chown dovecot /root/temp/dir/foo_10
#chown dovenull /root/temp/dir/foo_11
#chown freg /root/temp/dir/foo_12

# Inject an error so that the write of block 4097 fails
dmesg -c > /dev/null
echo "wait offline"
while true
do
	let csecond=RANDOM%200
	text=`dmesg -c`
	if [[ $text =~ "wait offline" ]]
	then
		break
	fi
	sysctl -w vm.dirty_writeback_centisecs=$csecond > /dev/null
done
echo 1 > /sys/block/sda/make-it-fail

# put_free_dqblk sets dqi_free_blk to 6
chown root /root/temp/dir/foo_1
chown root /root/temp/dir/foo_2
chown root /root/temp/dir/foo_3
chown root /root/temp/dir/foo_4
chown root /root/temp/dir/foo_5
chown root /root/temp/dir/foo_6
#chown root /root/temp/dir/foo_7
#chown root /root/temp/dir/foo_8
#chown root /root/temp/dir/foo_9
#chown root /root/temp/dir/foo_10
#chown root /root/temp/dir/foo_11
#chown root /root/temp/dir/foo_12

# Inject an error again so that the write of block 4097 fails
dmesg -c > /dev/null
echo "wait offline2"
while true
do
	let csecond=RANDOM%200
	text=`dmesg -c`
	if [[ $text =~ "wait offline" ]]
	then
		break
	fi
	sysctl -w vm.dirty_writeback_centisecs=$csecond > /dev/null
done
echo 1 > /sys/kernel/debug/fail_make_request/times

echo "drop buffer head"
# Drop the buffer head of blk 6
# This may get stuck for a while; if so, run fsstress or "ls temp/dir" from another shell and drop caches there too
while true
do
	echo 3 > /proc/sys/vm/drop_caches
	text=`dmesg -c`
	if [[ $text =~ "free g_bh" ]]
	then
		break
	fi
	ls -l / > /dev/null
	ls /root/temp/dir > /dev/null
done

# get_free_dqblk: dqi_free_blk switches from 6 to a bad value (stale data on disk)
chown fsgqa /root/temp/dir/foo_13

echo "Use bad quota"
# get_free_dqblk now uses the bad info->dqi_free_blk
chown mail /root/temp/dir/foo_14
chown mailnull /root/temp/dir/foo_15
chown memcached /root/temp/dir/foo_16
chown named /root/temp/dir/foo_17
chown nfsnobody /root/temp/dir/foo_18

exit 0
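
The sequence the script drives (a quota block whose write never reaches the disk, its buffer head dropped, then the free-list head re-read from the device) can be modelled in user space. What follows is a toy model, not kernel code: `toy_qfile` and every `toy_*` function are hypothetical names, byte order is the host's, and the first word of a block stands in for `dqdh_next_free`. The range check at the end is shown only to illustrate why 4294959296 cannot be a valid block number; it is not claimed to be the fix that was applied.

```c
#include <stdint.h>
#include <string.h>

#define BLK_SIZE  1024u  /* block size used in the report (mkfs.ext4 -b1024) */
#define NR_BLOCKS 8u     /* hypothetical size of the toy quota file */

struct toy_qfile {
	uint8_t  disk[NR_BLOCKS][BLK_SIZE]; /* what is really on the device */
	uint32_t free_blk;                  /* models info->dqi_free_blk */
	uint32_t nr_blocks;                 /* models info->dqi_blocks */
};

/* Fill the whole toy device with a 32-bit pattern, as b.c does to /dev/sda. */
void toy_fill(struct toy_qfile *q, uint32_t pattern)
{
	for (uint32_t b = 0; b < NR_BLOCKS; b++)
		for (uint32_t off = 0; off < BLK_SIZE; off += 4)
			memcpy(&q->disk[b][off], &pattern, 4);
	q->free_blk = 0;
	q->nr_blocks = NR_BLOCKS;
}

/* Models put_free_dqblk(): link blk in front of the free list. When the
 * write is lost, the device keeps whatever stale data it held before. */
void toy_put_free(struct toy_qfile *q, uint32_t blk, int write_reaches_disk)
{
	uint32_t next = q->free_blk;

	if (write_reaches_disk)
		memcpy(q->disk[blk], &next, 4);
	q->free_blk = blk;
}

/* Models the head of get_free_dqblk(): take the list head and advance it
 * to the next_free word read back from the device. The caller must pass a
 * queue whose free_blk is inside the toy device. */
uint32_t toy_get_free(struct toy_qfile *q)
{
	uint32_t blk = q->free_blk, next;

	memcpy(&next, q->disk[blk], 4);
	q->free_blk = next;
	return blk;
}

/* Hypothetical sanity check: nonzero only if blk lies inside the file. */
int toy_blk_valid(const struct toy_qfile *q, uint32_t blk)
{
	return blk < q->nr_blocks;
}
```

Calling `toy_fill(&q, 0xffffe0c0u)`, then `toy_put_free(&q, 6, 0)` and `toy_get_free(&q)` leaves `q.free_blk` at 4294959296, the same value the log prints for `dqi_free_blk`, and `toy_blk_valid` rejects it.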

b.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

/* Number of bytes to overwrite at the start of the device: 32 MiB. */
int n = 32 * 1024 * 1024;

int main(void)
{
	/* Pattern that ends up in every 32-bit word on disk. */
	unsigned int c = 0xffffe0c0;
	int i, fd = open("/dev/sda", O_RDWR);

	if (fd < 0) {
		perror("open /dev/sda");
		return EXIT_FAILURE;
	}
	for (i = 0; i < n; i += 4) {
		if (write(fd, &c, 4) < 0)
			perror("write error");
		/* Print progress once per MiB. */
		if (i % (1024 * 1024) == 0)
			printf("%d\n", i / 1024 / 1024);
	}
	close(fd);
	return 0;
}
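
The loop in b.c writes 4 bytes per call, so covering 32 MiB takes 32 * 1024 * 1024 / 4 = 8,388,608 write() calls. As an optional variation, and only a sketch, the same pattern can be prepared in a buffer and written in larger chunks. `fill_pattern` and `write_pattern` are hypothetical helpers that are not part of the original reproducer; the chunk size must be a multiple of 4 so the pattern stays aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Fill buf (len bytes, a multiple of 4) with a repeated 32-bit pattern in
 * host byte order, which is what write(fd, &c, 4) in b.c produces. */
void fill_pattern(uint32_t *buf, size_t len, uint32_t pattern)
{
	for (size_t i = 0; i < len / sizeof(uint32_t); i++)
		buf[i] = pattern;
}

/* Write total bytes to fd by repeating the chunk-byte buffer buf.
 * Returns 0 on success and -1 if write() fails or makes no progress. */
int write_pattern(int fd, const uint32_t *buf, size_t chunk, size_t total)
{
	const char *p = (const char *)buf;
	size_t done = 0;

	while (done < total) {
		/* After a short write, resume at the same position in buf. */
		size_t off = done % chunk;
		size_t want = chunk - off;
		ssize_t ret;

		if (want > total - done)
			want = total - done;
		ret = write(fd, p + off, want);
		if (ret <= 0)
			return -1;
		done += (size_t)ret;
	}
	return 0;
}
```

A caller would fill a 1 MiB buffer once with `fill_pattern(buf, 1 << 20, 0xffffe0c0u)` and then call `write_pattern(fd, buf, 1 << 20, 32u << 20)` on the opened device.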

Comments (1)

chengzhihao created the defect

Hi czh549642238, welcome to the openEuler Community.
I'm the Bot here serving you. You can find the instructions on how to interact with me at Here.
If you have any questions, please contact the SIG: Kernel, and any of the maintainers: @YangYingliang , @成坚 (CHENG Jian) , @jiaoff , @zhengzengkai , @刘勇强 , @wangxiongfeng , @朱科潜 , @WangShaoBo , @lujialin , @wuxu_buque , @Xu Kuohai , @冷嘲啊 , @Lingmingqiang , @yuzenghui , @岳海兵 , @juntian , @OSSIM , @陈结松 , @whoisxxx , @koulihong , @刘恺 , @hanjun-guo , @woqidaideshi , @Chiqijun , @Kefeng , @ThunderTown , @AlexGuo , @kylin-mayukun , @Zheng Zucheng , @柳歆 , @Jackie Liu , @zhujianwei001 , @郑振鹏 , @SuperSix173 , @colyli , @Zhang Yi , @htforge , @Qiuuuuu , @Xie XiuQi
