35% of query cost is spent updating unique SQL statistics; possible cacheline optimization

**Is this a BUG REPORT or FEATURE REQUEST?**: Optimization Request

> Uncomment only one, leave it on its own line: 
>
> - bug
>
- feature/performance optimization

**What happened**:

1. `instr_unique_sql_count` configures how many unique SQL statements are collected.

2. The default value is 100, which means 100 unique SQL statements are collected.

3. When the feature is enabled (by setting `instr_unique_sql_count` > 0), a lot of additional information related to each SQL statement gets updated.

4. We are mainly concerned with the following two sets of information:
   1. timing stats (db, cpu, etc.): `UpdateUniqueSQLTimeStat()`
   2. network stats (send, recv, etc.): `UpdateUniqueSQLNetInfo`

5. Together this causes 10 + 12 = 22 stats to be updated, and all of them are atomic (or mutex/semaphore protected).

6. With N threads updating these 22 stats for each query, contention is easy to imagine.

7. In fact, perf reports it as the top function, with about 35% of the time spent in the stats update:

+   35.52%        134097  TPLworker        gaussdb             [.] UpdateUniqueSQLStat

8. Stats are mostly used for information or plan creation, so a 35% cost is not justifiable for queries.

9. Ideally, we should look into how this could be optimized or reduced by a significant margin.
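
To make the co-location concrete, here is a minimal sketch of how tightly packed 8-byte counters map onto cache lines. It is an illustration only: the 64-byte line size, the single contiguous array of 22 counters, and the names `kCacheLine`, `kNumStats`, `PackedLineOf` and `PackedLinesUsed` are assumptions made for this sketch, not openGauss code (the real stats live in two separate arrays, and the array is assumed here to start on a line boundary).

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Assumption: 64-byte cache line. The real value is platform specific.
constexpr std::size_t kCacheLine = 64;
// 10 + 12 counters, as counted in the issue; treated as one array for illustration.
constexpr std::size_t kNumStats = 22;

// Cache line index (relative to the start of the array) that holds
// counter `idx` when the counters are packed back to back.
constexpr std::size_t PackedLineOf(std::size_t idx) {
    return (idx * sizeof(std::uint64_t)) / kCacheLine;
}

// Number of distinct cache lines touched when every counter is updated.
std::size_t PackedLinesUsed() {
    std::set<std::size_t> lines;
    for (std::size_t i = 0; i < kNumStats; ++i) {
        lines.insert(PackedLineOf(i));
    }
    return lines.size();
}
```

Under these assumptions, eight counters share each line, so every thread updating any of those eight has to take ownership of the same line, even when the threads touch different counters.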

**What you expected to happen**:

Solution-1:

As part of this exercise, I tried some ideas:

1. Since the stats are updated by N different threads at the same time and all of them are co-located, this naturally gives rise to cacheline contention.

2. To avoid this, I tried placing each stat on a different cacheline.
[Yes, this increases the space requirement, but given that there is a single global instance, the small increase of a few bytes is outweighed by the performance improvement. Of course, there are ways to make use of these additional padding bytes. I will experiment with that aspect too.]

3. With the stats cacheline-aligned, the share of time spent in the function drops by about 6 percentage points (35.52% to 29.23%):

+   29.23%        136291  TPLworker        gaussdb             [.] UpdateUniqueSQLStat

4. Because a complex piece of software like a DB running in multithreaded mode is timing-sensitive, reducing contention in one hot function can have a wider impact on final throughput.

5. Performance at higher concurrency has improved by 5-13%.

(Check the attached graph).
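
For comparison, the same "one stat per cacheline" idea can also be expressed with `alignas` instead of a strided array. The sketch below is an alternative formulation and not the patch itself. The names `PaddedCounter` and `PaddedStats`, the 64-byte line size, and the use of `std::atomic` are assumptions made for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache line. The real value is platform specific.
constexpr std::size_t kCacheLine = 64;
// 10 + 12 counters, as counted in the issue.
constexpr std::size_t kNumStats = 22;

// Hypothetical counter type: alignas pads each counter to a full cache line,
// so no two counters can ever share a line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Hypothetical container for all stats of one unique SQL entry.
struct PaddedStats {
    PaddedCounter counters[kNumStats];

    // Relaxed ordering is enough for a pure statistics counter.
    void Add(std::size_t idx, std::uint64_t delta) {
        counters[idx].value.fetch_add(delta, std::memory_order_relaxed);
    }
};
```

With these assumed sizes the structure grows from 22 * 8 = 176 bytes to 22 * 64 = 1408 bytes, which is the space cost mentioned in point 2.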

**How to reproduce it (as minimally and precisely as possible)**:

* A pgbench select workload with the default `instr_unique_sql_count` should be able to reproduce it.
* Here is the sample configuration I used: https://github.com/mysqlonarm/benchmark-suites/blob/master/ogsql-pbench/conf/ogsql.cnf/ogsql.conf

**Anything else we need to know?**:
I have the patch ready but am still trying to find out the proper way to contribute. Since I am new to the openGauss community, could you direct me on this front, especially regarding branch/test/validation, etc.?
I will add the patch here for quick reference.

diff --git a/src/include/instruments/instr_unique_sql.h b/src/include/instruments/instr_unique_sql.h
index 9b58a5de..6dc9bc8f 100644
--- a/src/include/instruments/instr_unique_sql.h
+++ b/src/include/instruments/instr_unique_sql.h
@@ -38,12 +38,16 @@ typedef struct {
     int64 max_time;   /* max time for unique sql entry's history events */
 } UniqueSQLElapseTime;
 
+const size_t cacheline_factor = PG_CACHE_LINE_SIZE / sizeof(uint64);
+
 typedef struct UniqueSQLTime {
-    int64 TimeInfoArray[TOTAL_TIME_INFO_TYPES];
+    int64& operator[](int idx) { return TimeInfoArray[idx * cacheline_factor]; }
+    int64 TimeInfoArray[TOTAL_TIME_INFO_TYPES * cacheline_factor];
 } UniqueSQLTime;
 
 typedef struct UniqueSQLNetInfo {
-    uint64 netInfoArray[TOTAL_NET_INFO_TYPES];
+    uint64& operator[](int idx) { return netInfoArray[idx * cacheline_factor]; }
+    uint64 netInfoArray[TOTAL_NET_INFO_TYPES * cacheline_factor];
 } UniqueSQLNetInfo;
 
 typedef struct UniqueSQLWorkMemInfo {
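
The strided indexing used in the patch can be tried out in isolation with a small standalone sketch. `StridedNetInfo`, `kFactor`, `kNetInfoTypes` and `ByteDistance` are hypothetical stand-ins for illustration, and the 64-byte value stands in for `PG_CACHE_LINE_SIZE`, whose actual value depends on the build. The count of 12 is likewise an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: stands in for PG_CACHE_LINE_SIZE; the real value may differ.
constexpr std::size_t kCacheLine = 64;
// Number of 8-byte slots per cache line, mirroring cacheline_factor in the patch.
constexpr std::size_t kFactor = kCacheLine / sizeof(std::uint64_t);
// Assumption: stands in for TOTAL_NET_INFO_TYPES.
constexpr int kNetInfoTypes = 12;

// Hypothetical stand-in for UniqueSQLNetInfo after the patch.
struct StridedNetInfo {
    // Logical index idx maps to physical slot idx * kFactor,
    // so consecutive logical counters are one cache line apart.
    std::uint64_t& operator[](int idx) { return slots[idx * kFactor]; }
    std::uint64_t slots[kNetInfoTypes * kFactor];
};

// Distance in bytes between two logical counters.
std::size_t ByteDistance(StridedNetInfo& info, int a, int b) {
    const char* pa = reinterpret_cast<const char*>(&info[a]);
    const char* pb = reinterpret_cast<const char*>(&info[b]);
    return static_cast<std::size_t>(pb - pa);
}
```

Because neighbouring logical counters are exactly one cache line apart, they land on different lines regardless of where the array itself starts. Callers would need to index the struct (`info[i]`) rather than the raw array for the stride to take effect; that part is not shown in the diff.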

**Environment**:
- Version:
opengauss v2.1.0
- OS (e.g. from /etc/os-release): 
NAME="openEuler"
VERSION="20.03 (LTS-SP2)"
ID="openEuler"
VERSION_ID="20.03"
PRETTY_NAME="openEuler 20.03 (LTS-SP2)"
ANSI_COLOR="0;31"
- Kernel (e.g. `uname -a`):
Linux openEuler169 4.19.90-2106.3.0.0095.oe1.aarch64 #1 SMP Wed Jun 23 14:51:58 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
- Install tools:
pgbench
- Others:
