# lock_test

**Repository Path**: liudegui/lock_test

## Basic Information

- **Project Name**: lock_test
- **Description**: A program that measures multithreaded performance of ConcurrentQueue, a spinlock, and a mutex
- **Primary Language**: C++
- **License**: Not specified
- **Default Branch**: master
- **Created**: 2022-07-01
- **Last Updated**: 2024-05-26

## README

## 0. Multithreaded Performance Test: ConcurrentQueue (Lock-Free Queue), std::atomic_flag, and std::mutex

**Full test code**: [lock_test](https://gitee.com/liudegui/lock_test)

**Test goals**:

1. Compare the performance of std::mutex and std::atomic_flag. Reference: [mutex 和 spin lock 的区别](https://blog.csdn.net/qq_21792169/article/details/50822702)
2. Measure multithreaded reads and writes with [concurrentqueue](https://github.com/cameron314/concurrentqueue).
3. Measure single-producer, single-consumer reads and writes with [readerwriterqueue](https://github.com/cameron314/readerwriterqueue).
4. Check the performance of a lock-free list built on the C++ STL with CAS atomic operations. Reference: [无锁 list](https://blog.csdn.net/oceanperfect/article/details/74940230)

## 1. Summary

This test compares the lock-free queue [ConcurrentQueue](https://github.com/cameron314/concurrentqueue), [std::atomic_flag](http://www.cplusplus.com/reference/atomic/atomic_flag/), and [std::mutex](http://www.cplusplus.com/reference/mutex/mutex/?kw=mutex) under multithreaded load.

**Conclusions**:

- For a std::list or std::deque guarded by std::mutex, if the Item struct is small, prefer std::atomic_flag (a spinlock) over std::mutex.
- If the Item struct is large, keep using std::mutex.
- New business code can adopt the lock-free ConcurrentQueue where appropriate.

## 2. Performance Results

### 2.1 Test 1: 30 concurrent threads, push and pop of 10,000 Items of 2K each

#### Results:

| Test platform | Queue / lock    | pushTime (ms) | popTime (ms) |
| ------------- | --------------- | ------------- | ------------ |
| 1.230         | ConcurrentQueue | 595.476       | 328.856      |
|               | atomic_flag     | 412.675       | 955.207      |
|               | Mutex           | 946.301       | 907.553      |
| 1.240         | ConcurrentQueue | 1584.1        | 333.36       |
|               | atomic_flag     | 576.209       | 1479.5       |
|               | Mutex           | 1133.68       | 1107.63      |
| 1.30          | ConcurrentQueue | 1005.89       | 244.84       |
|               | atomic_flag     | 355.606       | 402.343      |
|               | Mutex           | 597.448       | 739.805      |
| Linux         | ConcurrentQueue | 140.899       | 80.4264      |
|               | atomic_flag     | 136.703       | 136.91       |
|               | Mutex           | 231.019       | 213.732      |
| Windows       | ConcurrentQueue | 200.119       | 142.239      |
|               | atomic_flag     | 602.542       | 482.394      |
|               | Mutex           | 483.498       | 306.393      |

#### Conclusions:

In the rankings below, `>` means "faster than", i.e. a lower time.

- **Linux platform**:
  - **push**: atomic_flag > ConcurrentQueue > mutex
  - **pop**: ConcurrentQueue > atomic_flag > mutex
- **Windows platform**:
  - ConcurrentQueue is fastest on both push and pop.

### 2.2 Test 2: 30 concurrent threads, push and pop of 10,000 Items of 20K each

#### Results:

| Test platform | Queue / lock    | pushTime (ms) | popTime (ms) |
| ------------- | --------------- | ------------- | ------------ |
| 1.230         | ConcurrentQueue | 6936.41       | 732.457      |
|               | atomic_flag     | 4256.77       | 7103.61      |
|               | Mutex           | 5165.12       | 4044.51      |
| 1.240         | ConcurrentQueue | 18411.2       | 942.713      |
|               | atomic_flag     | 5750.07       | 11232.5      |
|               | Mutex           | 7236.02       | 6573.35      |
| 1.30          | ConcurrentQueue | 5399.57       | 247.965      |
|               | atomic_flag     | 2253.27       | 1285.47      |
|               | Mutex           | 2625.55       | 1588.46      |
| Linux         | ConcurrentQueue | 2231.09       | 183.022      |
|               | atomic_flag     | 1117.93       | 715.601      |
|               | Mutex           | 1288.95       | 805.378      |
| Windows       | ConcurrentQueue | 8047.59       | 5098.22      |
|               | atomic_flag     | 16736.2       | 26468.2      |
|               | Mutex           | 21498.3       | 50173.6      |

#### Conclusions:

As the Item size grows, atomic_flag no longer has a clear edge.

- On Linux, its lead over mutex narrows:
  - push: 1117.93 vs 1288.95 ms here, compared with 136.703 vs 231.019 ms for 2K Items.
  - On the 1.230 and 1.240 ARM machines, its pop time is worse than mutex.
- ConcurrentQueue's advantage on pop becomes more pronounced, but its push time is the slowest on all four Linux machines.

Rankings (`>` means faster):

- **Linux platform**:
  - **push**: atomic_flag > mutex > ConcurrentQueue
  - **pop**: ConcurrentQueue > atomic_flag > mutex
- **Windows platform**:
  - ConcurrentQueue is fastest on both push and pop.

## 3. Hardware Configuration

### 3.1 Local Linux machine

```
CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz (8 cores)
Memory: 16GB
OS: Linux 5.4.0-47-generic #51~18.04.1-Ubuntu SMP x86_64
```

### 3.2 1.240 test machine

```
CPU: 96 cores, ARM architecture
Memory: 256GB
OS: Linux 4.15.0-71-generic #2 SMP aarch64
```

### 3.3 1.230 test machine

```
CPU: 96 cores, ARM architecture
Memory: 64GB
OS: Linux 4.15.0-71-generic #4 SMP aarch64
```

### 3.4 1.30 test machine

```
CPU: Unknown
Memory: 256GB
OS: Linux 4.15.0-45-generic #48~16.04.1-Ubuntu SMP x86_64
```

### 3.5 Windows 10 x64 test machine

```
CPU: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
Memory: 8GB (1867 MHz)
```

## 4. Other

The lock-free list built on the C++ STL with CAS atomic operations performed poorly in this test. One possible reason is that newer GCC versions already include optimizations. Following that article is not recommended.

## 5. References

- [atomic, spinlock and mutex 性能比较](https://blog.csdn.net/cywosp/article/details/8987593)
- [mutex 和 spin lock 的区别](https://blog.csdn.net/qq_21792169/article/details/50822702)

## 6. Full Code

```cpp
#include <atomic>
#include <chrono>
#include <iostream>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

#include "concurrentqueue.h"  // moodycamel::ConcurrentQueue (single-header library)

// Small item: 20 ints (80 bytes with a 4-byte int)
struct SmallItem {
    int data[20];
};

// Large item: 1024 ints (4 KB with a 4-byte int)
struct LargeItem {
    int data[1024];
};

// Test parameters.
// Push phase: every thread pushes all NUM_ITEMS items (NUM_THREADS * NUM_ITEMS pushes in total).
// Pop phase: every thread attempts NUM_ITEMS / NUM_THREADS = 333 pops (9990 in total),
// so most pushed items are still in the container when the test ends.
const int NUM_THREADS = 30;
const int NUM_ITEMS = 10000;

// std::list guarded by a spinlock built on std::atomic_flag
template <typename T>
void testPerformance(std::atomic_flag& lock, std::list<T>& container, std::vector<T>& data) {
    auto start = std::chrono::high_resolution_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0; i < NUM_THREADS; ++i) {
        threads.push_back(std::thread([&]() {
            for (const auto& item : data) {
                while (lock.test_and_set(std::memory_order_acquire)) {
                    // spin until the previous holder clears the flag
                }
                container.push_back(item);
                lock.clear(std::memory_order_release);
            }
        }));
    }
    for (auto& t : threads) {
        t.join();
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << "Push time: " << elapsed.count() << " seconds\n";

    start = std::chrono::high_resolution_clock::now();
    threads.clear();
    for (int i = 0; i < NUM_THREADS; ++i) {
        threads.push_back(std::thread([&]() {
            for (int j = 0; j < NUM_ITEMS / NUM_THREADS; ++j) {
                while (lock.test_and_set(std::memory_order_acquire)) {
                    // spin
                }
                if (!container.empty()) {
                    container.pop_front();
                }
                lock.clear(std::memory_order_release);
            }
        }));
    }
    for (auto& t : threads) {
        t.join();
    }
    end = std::chrono::high_resolution_clock::now();
    elapsed = end - start;
    std::cout << "Pop time: " << elapsed.count() << " seconds\n";
}

// std::list guarded by std::mutex
template <typename T>
void testPerformance(std::mutex& mtx, std::list<T>& container, std::vector<T>& data) {
    auto start = std::chrono::high_resolution_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0; i < NUM_THREADS; ++i) {
        threads.push_back(std::thread([&]() {
            for (const auto& item : data) {
                std::lock_guard<std::mutex> lock(mtx);
                container.push_back(item);
            }
        }));
    }
    for (auto& t : threads) {
        t.join();
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << "Push time: " << elapsed.count() << " seconds\n";

    start = std::chrono::high_resolution_clock::now();
    threads.clear();
    for (int i = 0; i < NUM_THREADS; ++i) {
        threads.push_back(std::thread([&]() {
            for (int j = 0; j < NUM_ITEMS / NUM_THREADS; ++j) {
                std::lock_guard<std::mutex> lock(mtx);
                if (!container.empty()) {
                    container.pop_front();
                }
            }
        }));
    }
    for (auto& t : threads) {
        t.join();
    }
    end = std::chrono::high_resolution_clock::now();
    elapsed = end - start;
    std::cout << "Pop time: " << elapsed.count() << " seconds\n";
}

// Lock-free moodycamel::ConcurrentQueue
template <typename T>
void testPerformance(moodycamel::ConcurrentQueue<T>& queue, std::vector<T>& data) {
    auto start = std::chrono::high_resolution_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0; i < NUM_THREADS; ++i) {
        threads.push_back(std::thread([&]() {
            for (const auto& item : data) {
                queue.enqueue(item);
            }
        }));
    }
    for (auto& t : threads) {
        t.join();
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << "Push time: " << elapsed.count() << " seconds\n";

    start = std::chrono::high_resolution_clock::now();
    threads.clear();
    for (int i = 0; i < NUM_THREADS; ++i) {
        threads.push_back(std::thread([&]() {
            T item;
            for (int j = 0; j < NUM_ITEMS / NUM_THREADS; ++j) {
                queue.try_dequeue(item);  // returns false if the queue looks empty
            }
        }));
    }
    for (auto& t : threads) {
        t.join();
    }
    end = std::chrono::high_resolution_clock::now();
    elapsed = end - start;
    std::cout << "Pop time: " << elapsed.count() << " seconds\n";
}

int main() {
    std::vector<SmallItem> smallData(NUM_ITEMS);
    std::vector<LargeItem> largeData(NUM_ITEMS);

    std::cout << "Testing with std::atomic_flag and SmallItem\n";
    std::list<SmallItem> smallList;
    std::atomic_flag atomicFlag = ATOMIC_FLAG_INIT;
    testPerformance(atomicFlag, smallList, smallData);

    std::cout << "Testing with std::mutex and SmallItem\n";
    std::list<SmallItem> smallListMutex;
    std::mutex mtx;
    testPerformance(mtx, smallListMutex, smallData);

    std::cout << "Testing with ConcurrentQueue and SmallItem\n";
    moodycamel::ConcurrentQueue<SmallItem> smallQueue;
    testPerformance(smallQueue, smallData);

    std::cout << "Testing with std::atomic_flag and LargeItem\n";
    std::list<LargeItem> largeList;
    testPerformance(atomicFlag, largeList, largeData);

    std::cout << "Testing with std::mutex and LargeItem\n";
    std::list<LargeItem> largeListMutex;
    testPerformance(mtx, largeListMutex, largeData);

    std::cout << "Testing with ConcurrentQueue and LargeItem\n";
    moodycamel::ConcurrentQueue<LargeItem> largeQueue;
    testPerformance(largeQueue, largeData);

    return 0;
}
```

**Run output**

```
Testing with std::atomic_flag and SmallItem
Push time: 1.32209 seconds
Pop time: 0.0568216 seconds
Testing with std::mutex and SmallItem
Push time: 0.315925 seconds
Pop time: 0.0417502 seconds
Testing with ConcurrentQueue and SmallItem
Push time: 0.0795355 seconds
Pop time: 0.0406369 seconds
Testing with std::atomic_flag and LargeItem
Push time: 5.12904 seconds
Pop time: 0.049452 seconds
Testing with std::mutex and LargeItem
Push time: 3.94701 seconds
Pop time: 0.0519268 seconds
Testing with ConcurrentQueue and LargeItem
Push time: 3.93888 seconds
Pop time: 0.051377 seconds
```
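The summary recommends swapping std::mutex for std::atomic_flag when Items are small. The test code does this by writing the `test_and_set`/`clear` loop inline. A minimal sketch, assuming you would rather wrap that pattern in a class, is shown below.

- `SpinLock` provides `lock()`/`unlock()`, so it can be dropped into existing `std::lock_guard` code in place of std::mutex.
- `SpinLock` and `fill_list_with_spinlock` are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spinlock on std::atomic_flag. lock()/unlock() meet the
// BasicLockable requirements, so std::lock_guard<SpinLock> works with it.
class SpinLock {
public:
    void lock() noexcept {
        // test_and_set returns the previous value: keep spinning while it was already set.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() noexcept { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical helper: num_threads threads each append per_thread ints to one
// std::list guarded by SpinLock; returns the final list size.
std::size_t fill_list_with_spinlock(int num_threads, int per_thread) {
    SpinLock lock;
    std::list<int> items;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&lock, &items, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<SpinLock> guard(lock);  // unlock() runs on scope exit
                items.push_back(i);
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return items.size();
}
```

The loop spins without backing off, matching the test code. With many more threads than cores, as in the 30-thread runs, calling `std::this_thread::yield()` inside the loop is a common variation.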
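Section 4 reports that a CAS-based lock-free list performed poorly. For reference, below is a minimal sketch of the core CAS retry loop, a Treiber-style `push_front`. It is not the implementation from the cited article, and it rests on two stated assumptions:

- Only concurrent insertion is needed.
- The list is read only after all writer threads have joined.

A safe concurrent pop would additionally have to handle the ABA problem and memory reclamation. `CasList` and `fill_cas_list` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical lock-free singly linked list that supports concurrent push_front only.
template <typename T>
class CasList {
public:
    CasList() = default;
    CasList(const CasList&) = delete;
    CasList& operator=(const CasList&) = delete;

    ~CasList() {
        Node* n = head_.load(std::memory_order_relaxed);
        while (n != nullptr) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }

    // Retry the CAS until head_ is swung from node->next to node.
    void push_front(const T& value) {
        Node* node = new Node{value, head_.load(std::memory_order_relaxed)};
        // On failure, compare_exchange_weak stores the current head_ into node->next.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Safe only after all pushing threads have been joined.
    std::size_t size() const {
        std::size_t count = 0;
        for (Node* n = head_.load(std::memory_order_acquire); n != nullptr; n = n->next) {
            ++count;
        }
        return count;
    }

private:
    struct Node {
        T value;
        Node* next;
    };
    std::atomic<Node*> head_{nullptr};
};

// Hypothetical helper: num_threads threads push per_thread ints concurrently.
std::size_t fill_cas_list(int num_threads, int per_thread) {
    CasList<int> list;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&list, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                list.push_front(i);
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return list.size();
}
```

Every insert still contends on the single `head_` pointer, and under heavy contention failed CAS attempts are retried. That is one plausible reason such a list need not beat a well-tuned mutex.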
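In the full code, each thread pushes all NUM_ITEMS items but pops only NUM_ITEMS / NUM_THREADS. The pop phase therefore removes about 9,990 of the 300,000 pushed items.

A sketch of a balanced variant follows, in which each thread pops as many items as it pushed. It assumes a std::deque container and any BasicLockable lock type, such as std::mutex. `bench_balanced` and `BenchResult` are hypothetical names, and `std::chrono::steady_clock` is used because it is monotonic and therefore suited to measuring intervals.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

struct BenchResult {
    double push_seconds;
    double pop_seconds;
    std::size_t remaining;  // items left in the container after the pop phase
};

// Hypothetical helper: times a push phase and a pop phase on a std::deque guarded
// by any BasicLockable. Each thread pops exactly as many items as it pushed.
template <typename Item, typename Lock>
BenchResult bench_balanced(Lock& lock, int num_threads, int items_per_thread) {
    std::deque<Item> container;
    using Clock = std::chrono::steady_clock;

    // Starts num_threads copies of body, joins them, and returns elapsed seconds.
    auto run = [&](auto body) {
        std::vector<std::thread> threads;
        auto start = Clock::now();
        for (int t = 0; t < num_threads; ++t) {
            threads.emplace_back(body);
        }
        for (auto& th : threads) {
            th.join();
        }
        return std::chrono::duration<double>(Clock::now() - start).count();
    };

    double push_s = run([&] {
        for (int i = 0; i < items_per_thread; ++i) {
            std::lock_guard<Lock> guard(lock);
            container.push_back(Item{});
        }
    });
    double pop_s = run([&] {
        for (int i = 0; i < items_per_thread; ++i) {
            std::lock_guard<Lock> guard(lock);
            if (!container.empty()) {
                container.pop_front();
            }
        }
    });
    return {push_s, pop_s, container.size()};
}
```

Because every push finishes before the pop phase starts, the total pop count equals the total push count, and `remaining` should be 0.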