# lock_test

**Repository Path**: liudegui/lock_test

## Basic Information

- **Project Name**: lock_test
- **Description**: A program that benchmarks ConcurrentQueue, spinlock, and mutex under multithreaded use
- **Primary Language**: C++
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2022-07-01
- **Last Updated**: 2024-05-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## 0. Multithreaded performance test: ConcurrentQueue (lock-free queue), std::atomic_flag, and std::mutex

**Complete test code**: [lock_test](https://gitee.com/liudegui/lock_test)

**Test goals**:
1. Compare the performance of mutex and std::atomic_flag. Reference: [Differences between mutex and spin lock](https://blog.csdn.net/qq_21792169/article/details/50822702)
2. Multithreaded reads and writes with [concurrentqueue](https://github.com/cameron314/concurrentqueue)
3. Single-threaded reads and writes (one producer thread, one consumer thread) with [readerwriterqueue](https://github.com/cameron314/readerwriterqueue)
4. Verify the performance of a lock-free list built on the C++ STL with CAS atomic operations. Reference: [Lock-free list](https://blog.csdn.net/oceanperfect/article/details/74940230)
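
To make the single-producer, single-consumer case in goal 3 concrete, here is a minimal sketch of a bounded ring buffer that is safe for exactly one writer thread and one reader thread. It is not the implementation used by readerwriterqueue; the names `SpscRing` and `spsc_roundtrip` are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical SPSC ring buffer: one thread may call try_push, one other
// thread may call try_pop. One slot stays unused to tell "full" from "empty".
template <typename T, std::size_t Capacity>
class SpscRing {
  static_assert(Capacity >= 2, "need at least one usable slot");

 public:
  bool try_push(const T& value) {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    const std::size_t next = (tail + 1) % Capacity;
    if (next == head_.load(std::memory_order_acquire)) return false;  // full
    buf_[tail] = value;
    tail_.store(next, std::memory_order_release);  // publish the slot
    return true;
  }

  bool try_pop(T& out) {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
    out = buf_[head];
    head_.store((head + 1) % Capacity, std::memory_order_release);
    return true;
  }

 private:
  std::array<T, Capacity> buf_{};
  std::atomic<std::size_t> head_{0};  // next slot to read
  std::atomic<std::size_t> tail_{0};  // next slot to write
};

// Sends 0..count-1 through the ring and returns the sum seen by the consumer.
long long spsc_roundtrip(int count) {
  SpscRing<int, 64> ring;
  long long sum = 0;
  std::thread producer([&] {
    for (int i = 0; i < count; ++i) {
      while (!ring.try_push(i)) std::this_thread::yield();
    }
  });
  std::thread consumer([&] {
    int value = 0;
    for (int expected = 0; expected < count; ++expected) {
      while (!ring.try_pop(value)) std::this_thread::yield();
      assert(value == expected);  // FIFO order is preserved
      sum += value;
    }
  });
  producer.join();
  consumer.join();
  return sum;
}
```

Because each index is written by only one thread, plain acquire/release loads and stores are enough here and no CAS loop is needed, which is the main reason an SPSC queue can be cheaper than a general multi-producer queue.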

## 1. Summary
This test compares the lock-free queue [ConcurrentQueue](https://github.com/cameron314/concurrentqueue), [std::atomic_flag](http://www.cplusplus.com/reference/atomic/atomic_flag/), and [std::mutex](http://www.cplusplus.com/reference/mutex/mutex/?kw=mutex) in a multithreaded environment.

**Conclusions**:
- For a std::list or std::deque guarded by std::mutex, if the Item struct is small, std::atomic_flag is recommended in place of std::mutex.
- If the Item struct is large, keep using std::mutex.
- New business code can adopt the lock-free queue ConcurrentQueue where appropriate.
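
As an illustration of the first recommendation, the sketch below wraps std::atomic_flag in a small lock class so that it can replace std::mutex behind `std::lock_guard`. The class name `SpinLock` and the helper `push_with_spinlock` are hypothetical, and the `yield()` call inside the spin loop is an added design choice rather than something the test code does.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: a minimal spinlock on top of std::atomic_flag.
// It provides lock()/unlock(), so std::lock_guard can manage it.
class SpinLock {
 public:
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      std::this_thread::yield();  // back off when threads outnumber cores
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

struct SmallItem {
  int data[20];
};

// Pushes `per_thread` items from each of `num_threads` threads into one
// shared list and returns the final list size.
std::size_t push_with_spinlock(int num_threads, int per_thread) {
  SpinLock lock;
  std::list<SmallItem> items;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < per_thread; ++i) {
        std::lock_guard<SpinLock> guard(lock);  // same usage as std::mutex
        items.push_back(SmallItem{});
      }
    });
  }
  for (auto& th : threads) th.join();
  assert(items.size() == static_cast<std::size_t>(num_threads) * per_thread);
  return items.size();
}
```

Keeping the lock behind the same `lock()`/`unlock()` interface means existing code can switch between the two lock types by changing a single type name.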

## 2. Performance test results

### 2.1 Test 1: 30 concurrent threads pushing and popping 10,000 Items of 2 KB each

#### Results:

| Test platform | Queue type       | pushTime (ms) | popTime (ms) |
| ------------- | ---------------- | ------------- | ------------ |
| 1.230    | ConcurrentQueue  | 595.476       | 328.856      |
|          | atomic_flag      | 412.675       | 955.207      |
|          | Mutex            | 946.301       | 907.553      |
| 1.240    | ConcurrentQueue  | 1584.1        | 333.36       |
|          | atomic_flag      | 576.209       | 1479.5       |
|          | Mutex            | 1133.68       | 1107.63      |
| 1.30     | ConcurrentQueue  | 1005.89       | 244.84       |
|          | atomic_flag      | 355.606       | 402.343      |
|          | Mutex            | 597.448       | 739.805      |
| Linux    | ConcurrentQueue  | 140.899       | 80.4264      |
|          | atomic_flag      | 136.703       | 136.91       |
|          | Mutex            | 231.019       | 213.732      |
| Windows  | ConcurrentQueue  | 200.119       | 142.239      |
|          | atomic_flag      | 602.542       | 482.394      |
|          | Mutex            | 483.498       | 306.393      |
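
The pushTime and popTime columns are wall-clock times in milliseconds. A minimal sketch of how such a figure could be taken is shown here; `measure_ms` is a hypothetical helper, and using `std::chrono::steady_clock` is an assumption of this sketch (the full listing in this README uses `high_resolution_clock` and prints seconds).

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds as a double.
template <typename Work>
double measure_ms(Work&& work) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Work>(work)();
  const auto end = std::chrono::steady_clock::now();
  assert(end >= start);  // steady_clock never goes backwards
  return std::chrono::duration<double, std::milli>(end - start).count();
}
```

A call such as `measure_ms([&] { run_push_phase(); })`, where `run_push_phase` stands for whatever code starts and joins the worker threads, would then yield one cell of the table.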

#### Conclusions:

- **Linux platform** (`>` means faster):
  - **push**: atomic_flag > ConcurrentQueue > mutex
  - **pop**: ConcurrentQueue > atomic_flag > mutex

- **Windows platform**:
  - ConcurrentQueue is the fastest for both push and pop.

### 2.2 Test 2: 30 concurrent threads pushing and popping 10,000 Items of 20 KB each

#### Results:

| Test platform | Queue type       | pushTime (ms) | popTime (ms) |
| ------------- | ---------------- | ------------- | ------------ |
| 1.230    | ConcurrentQueue  | 6936.41       | 732.457      |
|          | atomic_flag      | 4256.77       | 7103.61      |
|          | Mutex            | 5165.12       | 4044.51      |
| 1.240    | ConcurrentQueue  | 18411.2       | 942.713      |
|          | atomic_flag      | 5750.07       | 11232.5      |
|          | Mutex            | 7236.02       | 6573.35      |
| 1.30     | ConcurrentQueue  | 5399.57       | 247.965      |
|          | atomic_flag      | 2253.27       | 1285.47      |
|          | Mutex            | 2625.55       | 1588.46      |
| Linux    | ConcurrentQueue  | 2231.09       | 183.022      |
|          | atomic_flag      | 1117.93       | 715.601      |
|          | Mutex            | 1288.95       | 805.378      |
| Windows  | ConcurrentQueue  | 8047.59       | 5098.22      |
|          | atomic_flag      | 16736.2       | 26468.2      |
|          | Mutex            | 21498.3       | 50173.6      |

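#### Conclusions:

As the Item size grows, atomic_flag is no longer reliably the faster choice and can fall behind, while the difference in ConcurrentQueue's insert performance becomes more pronounced (its push is the slowest on the Linux machines and the fastest on Windows).

- **Linux platform** (`>` means faster):
  - **push**: atomic_flag > mutex > ConcurrentQueue
  - **pop**: ConcurrentQueue > atomic_flag > mutex in the Linux row (715.601 ms vs. 805.378 ms); on machines 1.230 and 1.240, mutex pops faster than atomic_flag

- **Windows platform**:
  - ConcurrentQueue is the fastest for both push and pop.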
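
One plausible contributor to the slowdown with larger Items is that the Item is copied while the lock is held, so the critical section grows with the Item size. This is an assumption, not something the measurements above isolate. The sketch below shows one way to move the copy outside the lock with `std::list::splice`; the helper name `push_back_short_lock` is hypothetical and this variant was not part of the benchmark.

```cpp
#include <cassert>
#include <list>
#include <mutex>

struct LargeItem {
  int data[1024];
};

// Hypothetical helper: the list node is allocated and the item is copied
// before the lock is taken. Under the lock, splice() only relinks pointers.
template <typename T>
void push_back_short_lock(std::mutex& mtx, std::list<T>& shared, const T& item) {
  std::list<T> node;
  node.push_back(item);  // allocation and copy happen outside the lock

  std::lock_guard<std::mutex> guard(mtx);
  shared.splice(shared.end(), node);  // constant time, no copy of T
  assert(node.empty());               // the node now belongs to `shared`
}
```

The same idea applies to a spinlock, where a short critical section matters even more because waiting threads burn CPU while they spin.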

## 3. Hardware configuration

### 3.1 Local Linux machine
```
CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz (8 cores)
Memory: 16GB
OS: Linux 5.4.0-47-generic #51~18.04.1-Ubuntu SMP x86_64
```

### 3.2 Test machine 1.240
```
CPU: 96 cores, ARM architecture
Memory: 256GB
OS: Linux 4.15.0-71-generic #2 SMP aarch64
```

### 3.3 Test machine 1.230
```
CPU: 96 cores, ARM architecture
Memory: 64GB
OS: Linux 4.15.0-71-generic #4 SMP aarch64
```

### 3.4 Test machine 1.30
```
CPU: Unknown
Memory: 256GB
OS: Linux 4.15.0-45-generic #48~16.04.1-Ubuntu SMP x86_64
```

### 3.5 Windows 10 x64 test machine
```
CPU: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
Memory: 8GB (1867 MHz)
```

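## 4. Miscellaneous
The lock-free list built on the C++ STL with CAS atomic operations performed poorly in this test, possibly because newer versions of gcc already optimize the locked code path. That article is not recommended as a reference.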
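
For readers unfamiliar with the CAS pattern mentioned above, here is a minimal sketch of a lock-free push onto a singly linked stack using `compare_exchange_weak`. It is not the list from the referenced article; `CasStack` and `push_from_threads` are hypothetical names, and removal is done by detaching the whole chain at once to avoid the ABA problem that a per-node CAS pop would have to handle.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration of CAS-based insertion.
template <typename T>
class CasStack {
 public:
  CasStack() = default;
  CasStack(const CasStack&) = delete;
  CasStack& operator=(const CasStack&) = delete;
  ~CasStack() { free_nodes(head_.load(std::memory_order_relaxed)); }

  // Lock-free push: retry the CAS until head_ points at the new node.
  void push(T value) {
    Node* node = new Node{std::move(value), head_.load(std::memory_order_relaxed)};
    while (!head_.compare_exchange_weak(node->next, node,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
      // On failure node->next now holds the current head, so just retry.
    }
  }

  // Detaches all nodes with one atomic exchange, frees them, and returns
  // how many there were.
  std::size_t drain() {
    return free_nodes(head_.exchange(nullptr, std::memory_order_acquire));
  }

 private:
  struct Node {
    T value;
    Node* next;
  };

  static std::size_t free_nodes(Node* node) {
    std::size_t count = 0;
    while (node != nullptr) {
      Node* next = node->next;
      delete node;
      node = next;
      ++count;
    }
    return count;
  }

  std::atomic<Node*> head_{nullptr};
};

// Pushes from several threads at once and returns the number of nodes found.
std::size_t push_from_threads(int num_threads, int per_thread) {
  CasStack<int> stack;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < per_thread; ++i) stack.push(i);
    });
  }
  for (auto& th : threads) th.join();
  const std::size_t count = stack.drain();
  assert(count == static_cast<std::size_t>(num_threads) * per_thread);
  return count;
}
```

Every push still performs a heap allocation and may retry under contention, so a structure like this is not automatically faster than a short critical section guarded by a lock.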

## 5. References
- [Performance comparison of atomic, spinlock, and mutex](https://blog.csdn.net/cywosp/article/details/8987593)
- [Differences between mutex and spin lock](https://blog.csdn.net/qq_21792169/article/details/50822702)

## 6. Complete code
```cpp
#include <iostream>
#include <list>
#include <deque>
#include <vector>
#include <thread>
#include <atomic>
#include <mutex>
#include <chrono>
#include "concurrentqueue.h"  // Include ConcurrentQueue library

// Small payload: 20 ints (80 bytes where int is 4 bytes)
struct SmallItem {
  int data[20];
};

// Large payload: 1024 ints (4 KB where int is 4 bytes)
struct LargeItem {
  int data[1024];
};

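// Test parameters
const int NUM_THREADS = 30;
const int NUM_ITEMS = 10000;

// Benchmark overloads, one per synchronization strategy. In the push phase
// every thread inserts all of `data` (NUM_THREADS * NUM_ITEMS items in total);
// in the pop phase every thread attempts NUM_ITEMS / NUM_THREADS removals.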
template<typename T>
void testPerformance(std::atomic_flag& lock, std::list<T>& container, std::vector<T>& data) {
  auto start = std::chrono::high_resolution_clock::now();

  std::vector<std::thread> threads;
  for (int i = 0; i < NUM_THREADS; ++i) {
    threads.push_back(std::thread([&]() {
      for (const auto& item : data) {
        while (lock.test_and_set(std::memory_order_acquire));
        container.push_back(item);
        lock.clear(std::memory_order_release);
      }
    }));
  }

  for (auto& t : threads) {
    t.join();
  }

  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> elapsed = end - start;
  std::cout << "Push time: " << elapsed.count() << " seconds\n";

  start = std::chrono::high_resolution_clock::now();
  threads.clear();
  for (int i = 0; i < NUM_THREADS; ++i) {
    threads.push_back(std::thread([&]() {
      for (int j = 0; j < NUM_ITEMS / NUM_THREADS; ++j) {
        while (lock.test_and_set(std::memory_order_acquire));
        if (!container.empty()) {
          container.pop_front();
        }
        lock.clear(std::memory_order_release);
      }
    }));
  }

  for (auto& t : threads) {
    t.join();
  }

  end = std::chrono::high_resolution_clock::now();
  elapsed = end - start;
  std::cout << "Pop time: " << elapsed.count() << " seconds\n";
}

template<typename T>
void testPerformance(std::mutex& mtx, std::list<T>& container, std::vector<T>& data) {
  auto start = std::chrono::high_resolution_clock::now();

  std::vector<std::thread> threads;
  for (int i = 0; i < NUM_THREADS; ++i) {
    threads.push_back(std::thread([&]() {
      for (const auto& item : data) {
        std::lock_guard<std::mutex> lock(mtx);
        container.push_back(item);
      }
    }));
  }

  for (auto& t : threads) {
    t.join();
  }

  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> elapsed = end - start;
  std::cout << "Push time: " << elapsed.count() << " seconds\n";

  start = std::chrono::high_resolution_clock::now();
  threads.clear();
  for (int i = 0; i < NUM_THREADS; ++i) {
    threads.push_back(std::thread([&]() {
      for (int j = 0; j < NUM_ITEMS / NUM_THREADS; ++j) {
        std::lock_guard<std::mutex> lock(mtx);
        if (!container.empty()) {
          container.pop_front();
        }
      }
    }));
  }

  for (auto& t : threads) {
    t.join();
  }

  end = std::chrono::high_resolution_clock::now();
  elapsed = end - start;
  std::cout << "Pop time: " << elapsed.count() << " seconds\n";
}

template<typename T>
void testPerformance(moodycamel::ConcurrentQueue<T>& queue, std::vector<T>& data) {
  auto start = std::chrono::high_resolution_clock::now();

  std::vector<std::thread> threads;
  for (int i = 0; i < NUM_THREADS; ++i) {
    threads.push_back(std::thread([&]() {
      for (const auto& item : data) {
        queue.enqueue(item);
      }
    }));
  }

  for (auto& t : threads) {
    t.join();
  }

  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> elapsed = end - start;
  std::cout << "Push time: " << elapsed.count() << " seconds\n";

  start = std::chrono::high_resolution_clock::now();
  threads.clear();
  for (int i = 0; i < NUM_THREADS; ++i) {
    threads.push_back(std::thread([&]() {
      T item;
      for (int j = 0; j < NUM_ITEMS / NUM_THREADS; ++j) {
        queue.try_dequeue(item);
      }
    }));
  }

  for (auto& t : threads) {
    t.join();
  }

  end = std::chrono::high_resolution_clock::now();
  elapsed = end - start;
  std::cout << "Pop time: " << elapsed.count() << " seconds\n";
}

int main() {
  std::vector<SmallItem> smallData(NUM_ITEMS);
  std::vector<LargeItem> largeData(NUM_ITEMS);

  std::cout << "Testing with std::atomic_flag and SmallItem\n";
  std::list<SmallItem> smallList;
  std::atomic_flag atomicFlag = ATOMIC_FLAG_INIT;
  testPerformance(atomicFlag, smallList, smallData);

  std::cout << "Testing with std::mutex and SmallItem\n";
  std::list<SmallItem> smallListMutex;
  std::mutex mtx;
  testPerformance(mtx, smallListMutex, smallData);

  std::cout << "Testing with ConcurrentQueue and SmallItem\n";
  moodycamel::ConcurrentQueue<SmallItem> smallQueue;
  testPerformance(smallQueue, smallData);

  std::cout << "Testing with std::atomic_flag and LargeItem\n";
  std::list<LargeItem> largeList;
  testPerformance(atomicFlag, largeList, largeData);

  std::cout << "Testing with std::mutex and LargeItem\n";
  std::list<LargeItem> largeListMutex;
  testPerformance(mtx, largeListMutex, largeData);

  std::cout << "Testing with ConcurrentQueue and LargeItem\n";
  moodycamel::ConcurrentQueue<LargeItem> largeQueue;
  testPerformance(largeQueue, largeData);

  return 0;
}

```

**Sample output**
```
Testing with std::atomic_flag and SmallItem
Push time: 1.32209 seconds
Pop time: 0.0568216 seconds
Testing with std::mutex and SmallItem
Push time: 0.315925 seconds
Pop time: 0.0417502 seconds
Testing with ConcurrentQueue and SmallItem
Push time: 0.0795355 seconds
Pop time: 0.0406369 seconds
Testing with std::atomic_flag and LargeItem
Push time: 5.12904 seconds
Pop time: 0.049452 seconds
Testing with std::mutex and LargeItem
Push time: 3.94701 seconds
Pop time: 0.0519268 seconds
Testing with ConcurrentQueue and LargeItem
Push time: 3.93888 seconds
Pop time: 0.051377 seconds
```