# atomiccounter

**Repository Path**: mirrors_chen3feng/atomiccounter

## Basic Information

- **Project Name**: atomiccounter
- **Description**: A High Performance Atomic Counter for Write-More-Read-Less Scenario in Go
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2022-08-20
- **Last Updated**: 2025-12-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# atomiccounter -- A Counter with High Concurrent Write Performance

[English](README.md) | 简体中文

[![License Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-red.svg)](COPYING)
[![Golang](https://img.shields.io/badge/Language-go1.18+-blue.svg)](https://go.dev/)
![Build Status](https://github.com/chen3feng/atomiccounter/actions/workflows/go.yml/badge.svg)
[![Coverage Status](https://coveralls.io/repos/github/chen3feng/atomiccounter/badge.svg?branch=master)](https://coveralls.io/github/chen3feng/atomiccounter?branch=master)
[![GoReport](https://goreportcard.com/badge/github.com/chen3feng/atomiccounter)](https://goreportcard.com/report/github.com/chen3feng/atomiccounter)
[![Go Reference](https://pkg.go.dev/badge/github.com/chen3feng/atomiccounter.svg)](https://pkg.go.dev/github.com/chen3feng/atomiccounter)

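This library implements a counter optimized for highly concurrent writes. It is similar to [LongAdder](https://segmentfault.com/a/1190000023761290) in Java
and [ThreadCachedInt](https://github.com/facebook/folly/blob/main/folly/docs/ThreadCachedInt.md) in [folly](https://github.com/facebook/folly).
In applications with highly concurrent writes but few reads, it can deliver write performance tens of times that of `sync/atomic`.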

## Benchmarks

Each result is the time per 100 calls.

On macOS with an M1 Pro chip:

```console
goos: darwin
goarch: arm64
pkg: github.com/chen3feng/atomiccounter
BenchmarkNonAtomicAdd-10        47337121                22.14 ns/op
BenchmarkAtomicAdd-10             180942                 6861 ns/op
BenchmarkCounter-10             14871549                81.02 ns/op
```

On Linux:

```console
goos: linux
goarch: amd64
pkg: github.com/chen3feng/atomiccounter
cpu: Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz
BenchmarkNonAtomicAdd-16    	 9508723	       135.3 ns/op
BenchmarkAtomicAdd-16       	  582798	        2070 ns/op
BenchmarkCounter-16         	 4748263	       263.1 ns/op
```

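From top to bottom, the results are the write times for non-atomic (and therefore unsafe) writes, atomic writes, and `atomiccounter`.
Under highly concurrent writes, `atomiccounter` takes only a few times as long as non-atomic writes, yet it is much faster than atomic writes.

Reads, however, are much slower: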

```console
goos: darwin
goarch: arm64
pkg: github.com/chen3feng/atomiccounter
BenchmarkNonAtomicRead-10       1000000000               0.3112 ns/op
BenchmarkAtomicRead-10          1000000000               0.5336 ns/op
BenchmarkCounterRead-10         54609476                  21.20 ns/op
```

Therefore, use it only in the few cases that have many concurrent writes but few reads, such as counting requests.
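For orientation, a contended-write benchmark of the kind reported above could be structured as below. This is only a sketch and not the repository's benchmark code: `callsPerOp` and `benchAtomicAdd` are hypothetical names, and treating one benchmark operation as 100 calls is an assumption based on the "per 100 calls" note. It uses `testing.Benchmark` and `b.RunParallel` from the standard library so that it runs as an ordinary program.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// callsPerOp is an assumption: one benchmark operation performs 100 calls.
const callsPerOp = 100

// benchAtomicAdd (hypothetical name) lets all benchmark goroutines
// add to one shared int64, which is the worst case for contention.
func benchAtomicAdd(b *testing.B) {
	var n int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			for i := 0; i < callsPerOp; i++ {
				atomic.AddInt64(&n, 1)
			}
		}
	})
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	r := testing.Benchmark(benchAtomicAdd)
	fmt.Printf("%d ops, %d ns/op (one op = %d calls)\n", r.N, r.NsPerOp(), callsPerOp)
}
```

Absolute numbers depend heavily on the machine and its core count, so only the relative cost between variants is meaningful.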

## Comparison with Other Libraries

I found three similar libraries on GitHub. The last two appear to share the same implementation:

- https://github.com/puzpuzpuz/xsync
- https://github.com/linxGnu/go-adder
- https://github.com/line/garr

Benchmark results on an Apple M1 Pro:

```console
BenchmarkAdd_NonAtomic-10               49337793                22.02 ns/op
BenchmarkAdd_Atomic-10                    206678                 6854 ns/op
BenchmarkAdd_AtomicCounter-10           14658782                82.22 ns/op
BenchmarkAdd_XsyncCounter-10             9599529                144.6 ns/op
BenchmarkAdd_GoAdder-10                   825858                 1339 ns/op
BenchmarkAdd_GarrAdder-10                 915090                 1305 ns/op

BenchmarkRead_NonAtomic-10             263460258                4.087 ns/op
BenchmarkRead_Atomic-10                172530186                6.945 ns/op
BenchmarkRead_AtomicCounter-10           2793618                425.2 ns/op
BenchmarkRead_XSyncCounter-10            2396407                489.6 ns/op
BenchmarkRead_GoAdder-10                32101244                36.02 ns/op
BenchmarkRead_GarrAdder-10              29420326                35.40 ns/op
```

Clearly, `atomiccounter` has the fastest concurrent writes.

For details, see the benchmark source code in [atomiccounter_bench](https://github.com/chen3feng/atomiccounter_bench).

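## How It Works

Contention is one of the biggest performance killers in multi-core programs. For a write-heavy counter, a plain atomic variable seriously hurts performance.
When reads are rare, a common solution is to spread writes across different variables and sum them on read. This approach is used by
Java's LongAdder, folly's ThreadCachedInt, and per-cpu variables in the Linux kernel. The implementations differ, but the idea is similar.

I have not yet found a well-known Go implementation that serves this purpose, so I wrote this library.

To reduce memory usage, multiple `Int64` objects can also share the storage that would otherwise be wasted by cache line alignment.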
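The general idea of spreading writes across variables and summing on read can be sketched with the standard library alone. This is a minimal illustration, not this library's code: `stripedCounter`, `stripe`, the 64-byte cache line size, and the stripe count are all assumptions, and the caller passes an explicit `key` to pick a stripe.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const (
	cacheLineSize = 64 // assumed; common on amd64, not universal
	numStripes    = 16 // assumed stripe count
)

// stripe holds one partial sum, padded to a full cache line so that
// neighbouring stripes do not share a line.
type stripe struct {
	n int64
	_ [cacheLineSize - 8]byte
}

// stripedCounter (hypothetical type) spreads writes over several stripes.
type stripedCounter struct {
	stripes [numStripes]stripe
}

// Add applies delta to the stripe selected by key.
func (c *stripedCounter) Add(key int, delta int64) {
	atomic.AddInt64(&c.stripes[uint(key)%numStripes].n, delta)
}

// Read sums all stripes. It is slower than a single atomic load.
func (c *stripedCounter) Read() int64 {
	var sum int64
	for i := range c.stripes {
		sum += atomic.LoadInt64(&c.stripes[i].n)
	}
	return sum
}

func main() {
	var c stripedCounter
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Add(id, 1) // each worker writes to its own stripe
			}
		}(w)
	}
	wg.Wait()
	fmt.Println(c.Read()) // 8000
}
```

The padding is what makes each stripe cost a whole cache line, which is the waste that sharing storage between several counters is meant to recover.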

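### Memory Layout

An int64 array whose size is a multiple of the CPU [cache line size](https://en.wikipedia.org/wiki/CPU_cache#Cache_entries) is called a cell.
A group of cells is called a chunk.

The size of a cell is an integer multiple of the CPU cache line size, and both its head and tail are padded with blank space of cache line size, which avoids [false sharing](https://www.google.com/search?q=%E5%81%87%E5%85%B1%E4%BA%AB).

The cell also has a member variable `lastIndex` that records how many `Int64` objects the current cell has been allocated to.

Each `Int64` object contains two member variables: a chunk pointer and an index into the cell. Multiple `Int64` objects can therefore share the same chunk, each accessing the elements at a different index in every cell.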

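### Allocating Int64 Objects

The address of the most recently created chunk is kept in the global variable `lastChunk`. When an `Int64` object is created, `lastIndex` is incremented. If it has reached the number of int64 values in a cell,
the chunk is fully allocated and a new chunk must be allocated.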
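The allocation step can be sketched as follows. The names `lastChunk` and `lastIndex` come from the text; everything else is an assumption made for illustration: `newHandle`, `handle`, the sizes, keeping `lastIndex` on the chunk, and guarding the allocator with a mutex. Cache line padding is omitted here.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	valuesPerCell = 8 // assumed number of int64 slots per cell
	cellsPerChunk = 4 // assumed, kept small for the demo
)

type cell struct{ values [valuesPerCell]int64 } // padding omitted

type chunk struct {
	cells     [cellsPerChunk]cell
	lastIndex int // slots handed out so far (placement is an assumption)
}

// handle is a hypothetical stand-in for the Int64 object.
type handle struct {
	chunk *chunk
	index int
}

var (
	mu        sync.Mutex // assumed; the library may synchronize differently
	lastChunk *chunk     // most recently created chunk
)

// newHandle hands out the next free index, starting a new chunk when the
// current one is full.
func newHandle() handle {
	mu.Lock()
	defer mu.Unlock()
	if lastChunk == nil || lastChunk.lastIndex == valuesPerCell {
		lastChunk = &chunk{}
	}
	h := handle{chunk: lastChunk, index: lastChunk.lastIndex}
	lastChunk.lastIndex++
	return h
}

func main() {
	first := newHandle()
	fmt.Println(first.index) // 0
	for i := 1; i < valuesPerCell; i++ {
		newHandle() // fill the rest of the first chunk
	}
	next := newHandle()
	// The ninth handle starts a fresh chunk at index 0.
	fmt.Println(next.index, next.chunk == first.chunk) // 0 false
}
```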

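### Accessing Int64 Objects

First, get familiar with Go's [GMP](https://www.google.com/search?q=golang+GMP) scheduling model.

The fastest approach would be to obtain the index of the current `M` in Go and access the corresponding `cell` directly. Different `M`s would then never conflict, and atomic operations could even be avoided.
However, I have not found a way to obtain the index of an `M`.

This implementation therefore uses a hash of the address of the `M` as the index for accessing a cell, which works well in practice.
As long as each chunk has more cells than the number of cores on common CPUs, the impact of hash collisions is reduced, and different `M`s are very likely to access different cells.

To add to an `Int64` object, first obtain the `M` it is currently running on, then hash the address of that `M` and use the hash as the index of the corresponding cell in the chunk.
Then use the `Int64.index` member as the index into the int64 array of that cell.

To read, iterate over all cells in the chunk and sum the values at index `Int64.index` in each cell's array.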
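The add and read paths can be sketched as below. Portable Go code cannot obtain the address of the current `M`, so this sketch swaps in a caller-supplied `key` and a simple multiplicative hash. That substitution, along with the names `handle`, `cellIndex`, and the sizes, is an assumption for illustration and not the library's code. Padding is omitted.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const (
	valuesPerCell = 8  // assumed
	cellsPerChunk = 16 // assumed
)

type cell struct{ values [valuesPerCell]int64 } // padding omitted
type chunk struct{ cells [cellsPerChunk]cell }

// handle is a hypothetical stand-in for the Int64 object.
type handle struct {
	chunk *chunk
	index int
}

// cellIndex maps a key to a cell. The key stands in for the address of
// the current M; the multiplicative hash is an assumed choice.
func cellIndex(key uintptr) int {
	h := uint64(key) * 0x9E3779B97F4A7C15
	return int((h >> 32) % cellsPerChunk)
}

// Add picks a cell by hash, then updates this counter's slot in it.
func (h handle) Add(key uintptr, n int64) {
	c := &h.chunk.cells[cellIndex(key)]
	atomic.AddInt64(&c.values[h.index], n)
}

// Read sums this counter's slot across all cells of the chunk.
func (h handle) Read() int64 {
	var sum int64
	for i := range h.chunk.cells {
		sum += atomic.LoadInt64(&h.chunk.cells[i].values[h.index])
	}
	return sum
}

func main() {
	ch := &chunk{}
	requests := handle{chunk: ch, index: 0}
	failures := handle{chunk: ch, index: 1}
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func(id uintptr) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				requests.Add(id, 1)
			}
			failures.Add(id, 1)
		}(uintptr(w))
	}
	wg.Wait()
	fmt.Println(requests.Read(), failures.Read()) // 8000 8
}
```

This sketch still uses atomic adds because two keys may hash to the same cell. A collision only adds contention on that cell and does not affect the sum.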

<!-- gomarkdoc:embed:start -->

<!-- Code generated by gomarkdoc. DO NOT EDIT -->

# atomiccounter

```go
import "github.com/chen3feng/atomiccounter"
```

Package atomiccounter provides an atomic counter for scenarios with high-throughput concurrent writes and rare reads.

<details><summary>Example</summary>
<p>

```go
package main

import (
	"fmt"
	"sync"

	"github.com/chen3feng/atomiccounter"
)

func main() {
	counter := atomiccounter.MakeInt64()
	var wg sync.WaitGroup
	// Increment the counter once from each of 100 goroutines.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter.Inc()
		}()
	}
	// Wait for all goroutines before reading the total.
	wg.Wait()
	fmt.Println(counter.Read())
	counter.Set(0)
	fmt.Println(counter.Read())
	counter.Add(10)
	fmt.Println(counter.Read())
}
```

#### Output

```
100
0
10
```

</p>
</details>

## Index

- [type Int64](<#type-int64>)
  - [func MakeInt64() Int64](<#func-makeint64>)
  - [func (c *Int64) Add(n int64)](<#func-int64-add>)
  - [func (c *Int64) Inc()](<#func-int64-inc>)
  - [func (c *Int64) Read() int64](<#func-int64-read>)
  - [func (c *Int64) Set(n int64)](<#func-int64-set>)
  - [func (c *Int64) Swap(n int64) int64](<#func-int64-swap>)


## type [Int64](<https://github.com/chen3feng/atomiccounter/blob/master/int64.go#L19-L22>)

Int64 is an int64 atomic counter.

```go
type Int64 struct {
    // contains filtered or unexported fields
}
```

### func [MakeInt64](<https://github.com/chen3feng/atomiccounter/blob/master/int64.go#L57>)

```go
func MakeInt64() Int64
```

MakeInt64 creates a new Int64 object. Int64 objects must be created by this function; simply initializing one does not work.

### func \(\*Int64\) [Add](<https://github.com/chen3feng/atomiccounter/blob/master/int64.go#L72>)

```go
func (c *Int64) Add(n int64)
```

Add adds n to the counter.

### func \(\*Int64\) [Inc](<https://github.com/chen3feng/atomiccounter/blob/master/int64.go#L78>)

```go
func (c *Int64) Inc()
```

Inc adds 1 to the counter.

### func \(\*Int64\) [Read](<https://github.com/chen3feng/atomiccounter/blob/master/int64.go#L91>)

```go
func (c *Int64) Read() int64
```

Read returns the current value. It is a little slow, so it should not be called frequently. The result is not guaranteed to be accurate in race conditions.

### func \(\*Int64\) [Set](<https://github.com/chen3feng/atomiccounter/blob/master/int64.go#L83>)

```go
func (c *Int64) Set(n int64)
```

Set sets the value of the counter to n.

### func \(\*Int64\) [Swap](<https://github.com/chen3feng/atomiccounter/blob/master/int64.go#L101>)

```go
func (c *Int64) Swap(n int64) int64
```

Swap returns the current value and swaps it with n.


Generated by [gomarkdoc](<https://github.com/princjef/gomarkdoc>)


<!-- gomarkdoc:embed:end -->