src-openEuler / gazelle

[22.03SP2] Client using gazelle crashes with a coredump (stack_send) during long-duration traffic

Status: Completed
Type: Bug
Created: 2023-11-22 11:01

[Environment]
openeulerversion=openEuler-22.03-LTS-SP2
compiletime=2023-06-29-19-26-48
gccversion=10.3.1-37.oe2203sp2
kernelversion=5.10.0-153.12.0.92.oe2203sp2
openjdkversion=1.8.0.372.b07-1.oe2203sp2
[root@openEuler ~]# rpm -q gazelle
gazelle-1.0.2-15.aarch64
[root@openEuler ~]# rpm -q dpdk
dpdk-21.11-50.oe2203sp2.aarch64

[Steps to reproduce]
On the server, start the kernel-mode benchmark:
./benchmark_ker -sMode dn -pSize 0 -mSize 4096 -pdSize 2 -cTimes 0 -uSocket 0 -mSq 0 -pol 0 -md5Check 0

[root@openEuler gazelle]# cat config.ini
[MicroBenchmark]
#publicd
DebugMode=0
TestMode=1 #1 ONLY_TX  2 BOTH_TX_RX
MsgHeadLen=30

#cn/dn
#CnHostName=124.88.97.87
CnHostName=124.88.97.88
CnPort=3113

#CnHostName=124.88.97.87
Dn1HostName=192.168.133.169
Dn2HostName=192.168.133.169
Dn1Port=41111
Dn2Port=51111
ThreadPoolSize=30
#client
ThreadNums=30
ReportDuring=15

On the client, start the user-space (gazelle) benchmark:
./benchmark_usr -sMode client -mSize 4096 -tNums 3 -cNums 30 --flow_mode high -uSocket 0 -md5Check 0 -cNb 1

[root@openEuler gazelle]# cat config.ini
[MicroBenchmark]
#publicd
DebugMode=0
TestMode=1 #1 ONLY_TX  2 BOTH_TX_RX
MsgHeadLen=30

#cn/dn
#CnHostName=124.88.97.87
CnHostName=124.88.97.87
CnPort=3114

#CnHostName=124.88.97.87
Dn1HostName=192.168.133.169
Dn2HostName=192.168.133.169
Dn1Port=41111
Dn2Port=51111
ThreadPoolSize=30
#client
ThreadNums=30
ReportDuring=15

gazelle (lstack.conf) settings on the client:
dpdk_args=["--socket-mem", "2400,0,0,0", "--huge-dir", "/mnt/hugepages-lstack", "--proc-type", "primary", "--legacy-mem", "--map-perfect","--file-prefix","lstack_file2","-a","0000:0c:00.0"]
use_ltran=0
kni_switch=0
low_power_mode=0
num_cpus="2-4"
#num_wakeup="2-4"
app_bind_numa=1
host_addr="124.88.70.176"
mask_addr="255.255.0.0"
gateway_addr="124.88.0.1"
devices="24:a5:2c:d1:ed:4f"

send_connect_number=8
read_connect_number=8
rpc_number=8
nic_read_number=128
tcp_conn_count=1500
mbuf_count_per_conn=505
#tack_thread_mode="run-to-completion" 
unix_prefix="02"

[Actual result]
Connections could not be established normally, and the process eventually crashed with a coredump.

Core was generated by `./benchmark_usr -sMode client -mSize 4096 -tNums 3 -cNums 30 --flow_mode high -'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  tcp_output (pcb=0x11b177280) at core/tcp_out.c:1529
1529	core/tcp_out.c: No such file or directory.
[Current thread is 1 (Thread 0xffffad89dea0 (LWP 774982))]
(gdb) bt
#0  tcp_output (pcb=0x11b177280) at core/tcp_out.c:1529
#1  0x0000ffffb2fb6b50 in lwip_netconn_do_writemore (conn=0x119f3f0e8, delayed=delayed@entry=0 '\000') at api/api_msg.c:1801
#2  0x0000ffffb2fb7fe0 in lwip_netconn_do_write (m=m@entry=0xffffad89d3a8) at api/api_msg.c:1883
#3  0x0000ffffb2fbc8f4 in tcpip_send_msg_wait_sem (fn=0xffffb2fb7f70 <lwip_netconn_do_write>, apimsg=apimsg@entry=0xffffad89d3a8, 
    sem=<optimized out>) at api/tcpip.c:459
#4  0x0000ffffb2fb5d54 in netconn_apimsg (apimsg=<optimized out>, fn=<optimized out>) at api/api_lib.c:131
#5  netconn_write_vectors_partly (conn=<optimized out>, vectors=vectors@entry=0xffffad89d408, vectorcnt=vectorcnt@entry=1, 
    apiflags=<optimized out>, bytes_written=bytes_written@entry=0xffffad89d450) at api/api_lib.c:1079
#6  0x0000ffffb2fb5dd8 in netconn_write_partly (conn=<optimized out>, dataptr=dataptr@entry=0xffff9e333990, size=size@entry=65535, 
    apiflags=<optimized out>, bytes_written=bytes_written@entry=0xffffad89d450) at api/api_lib.c:994
#7  0x0000ffffb2fba910 in lwip_send (s=1239, data=0xffff9e333990, size=65535, flags=<optimized out>) at api/sockets.c:1624
#8  0x0000ffffb2fe19c0 in do_lwip_send (stack=stack@entry=0xffffa0000b70, fd=<optimized out>, sock=sock@entry=0xffff9e333990, 
    len=len@entry=4096, flags=flags@entry=0) at core/lstack_lwip.c:657
#9  0x0000ffffb2fe5d70 in stack_send (msg=0x109ab5900) at core/lstack_protocol_stack.c:842
#10 0x0000ffffb2fe6fbc in poll_rpc_msg (stack=stack@entry=0xffffa0000b70, max_num=7) at core/lstack_thread_rpc.c:107
#11 0x0000ffffb2fe4ea0 in stack_polling (wakeup_tick=wakeup_tick@entry=3058118001) at core/lstack_protocol_stack.c:448
#12 0x0000ffffb2fe5020 in gazelle_stack_thread (arg=<optimized out>) at core/lstack_protocol_stack.c:513
#13 0x0000ffffb2d9c630 in start_thread (arg=0x0) at pthread_create.c:443
#14 0x0000ffffb2e02b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
(gdb) p seg
$1 = (struct tcp_seg *) 0x14c2cf768
(gdb) p *seg
$2 = {next = 0x14e9964e8, p = 0x14c2cf700, len = 1436, flags = 253 '\375', tcphdr = 0x1050fd964f0a9f9a}
(gdb) p seg->tcphdr
$3 = (struct tcp_hdr *) 0x1050fd964f0a9f9a
(gdb) p * seg->tcphdr
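
The p *seg output above already looks like corrupted memory: seg->tcphdr = 0x1050fd964f0a9f9a cannot be a user-space address, which is consistent with the segment having been freed or overwritten. A minimal sketch of that plausibility check follows, assuming a 48-bit user virtual address space (a common AArch64 Linux configuration) and that the top byte of user pointers is ignored (AArch64 Linux enables top-byte-ignore for user space). plausible_user_pointer is a hypothetical helper, not part of gazelle or lwIP, and a "plausible" result only means the value is in range, not that it points to live memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: user virtual addresses use bits [47:0] (48-bit VA). */
#define USER_VA_BITS 48
/* Bits [63:56] are skipped: AArch64 Linux ignores the top byte of user pointers. */
#define TOP_BYTE_SHIFT 56

/* Hypothetical helper: returns false when bits [55:48] are set or the pointer is NULL,
 * i.e. when the value cannot be a user-space address under the assumptions above. */
bool plausible_user_pointer(uint64_t addr)
{
    uint64_t below_top_byte = (UINT64_C(1) << TOP_BYTE_SHIFT) - 1; /* bits [55:0] */
    uint64_t va_range = (UINT64_C(1) << USER_VA_BITS) - 1;         /* bits [47:0] */
    uint64_t must_be_zero = below_top_byte & ~va_range;            /* bits [55:48] */
    return addr != 0 && (addr & must_be_zero) == 0;
}
```

Applied to the values in the core dump, seg (0x14c2cf768) and the send buffer (0xffff9e333990) pass the check, while seg->tcphdr (0x1050fd964f0a9f9a) fails because bits [55:48] hold 0x50.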

Comments (2)

chenshijuan3 created the bug

Hi chenshijuan3, welcome to the openEuler Community.
I'm the Bot here serving you. You can find the instructions on how to interact with me here.
If you have any questions, please contact the SIG: sig-high-performance-network, and any of the maintainers: @L.X. , @LemmyHuang , @sky , @李扬扬 , @吴昌盛 , @jinag12 , @lilijun , @李辉松 , @kircher

openeuler-ci-bot added the sig/sig-high-perform label
chenshijuan3 edited the title (three times) and the description (twice)

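Fixed by the PR below. Data could still be sent on a socket after its fd had been closed, which caused a use-after-free (UAF). The new version ran continuously overnight without any coredump.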
https://gitee.com/openeuler/gazelle/pulls/435
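
The actual change is in the PR linked above and is not reproduced here. As a hedged illustration of the general bug class (a send request queued for the protocol-stack thread, as in the stack_send / poll_rpc_msg frames of the backtrace, that is processed after the application has already closed the fd), the sketch below tags each request with a per-fd generation counter so a stale request is rejected instead of touching freed socket state. All names (sock_slot, send_msg, sock_open, make_send_msg, sock_close, stack_handle_send) are hypothetical and do not come from gazelle, lwIP, or PR 435; the sketch is single-threaded and omits the locking or atomics real code would need.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FDS 16 /* hypothetical table size for the sketch */

/* Hypothetical per-fd state owned by the protocol-stack thread. */
struct sock_slot {
    bool in_use;       /* true while the fd is open */
    uint32_t gen;      /* bumped on every close so stale requests can be detected */
    size_t bytes_sent; /* stand-in for the real TCP/PCB state */
};

/* Hypothetical send request posted by the application thread. */
struct send_msg {
    int fd;
    uint32_t gen; /* generation of the fd when the request was queued */
    size_t len;
};

static struct sock_slot slots[MAX_FDS];

static bool fd_in_range(int fd) { return fd >= 0 && fd < MAX_FDS; }

int sock_open(int fd)
{
    if (!fd_in_range(fd))
        return -EBADF;
    if (slots[fd].in_use)
        return -EBUSY;
    slots[fd].in_use = true;
    slots[fd].bytes_sent = 0;
    return 0;
}

/* Application side: remember which incarnation of the fd this request targets. */
struct send_msg make_send_msg(int fd, size_t len)
{
    assert(fd_in_range(fd));
    struct send_msg m = { .fd = fd, .gen = slots[fd].gen, .len = len };
    return m;
}

/* Close: free the slot and invalidate every request already queued against it. */
void sock_close(int fd)
{
    if (!fd_in_range(fd) || !slots[fd].in_use)
        return;
    slots[fd].in_use = false;
    slots[fd].gen++;
}

/* Stack side: reject the request instead of using state that was closed or reused. */
long stack_handle_send(const struct send_msg *m)
{
    if (!fd_in_range(m->fd))
        return -EBADF;
    struct sock_slot *s = &slots[m->fd];
    if (!s->in_use || s->gen != m->gen)
        return -EBADF;
    s->bytes_sent += m->len; /* a real stack would hand the data to TCP here */
    return (long)m->len;
}
```

In this sketch a request created before sock_close is answered with -EBADF even if the same fd number has since been reopened; only requests created after the reopen are delivered. The generation check covers the fd-reuse case that a plain "closed" flag would miss.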

jinag12 changed the task status from To Do to Completed

