Note: This project has moved to https://gitee.com/tinylab/microbench

microbench

This benchmark measures the time cost of core CPU instructions and of common combinations of core instructions.

Its goals are to:

help developers understand existing, highly optimized kernel and application code snippets.
reveal the shortcomings of a CPU design and guide the next-generation design.
guide software optimization at the instruction level.

It is based on the Google Benchmark framework.

Installation

Please install make, gcc, g++ and cmake first.

If cpufreq is supported, make sure the CPU frequency is locked at a fixed level (base frequency or max frequency) before testing.

$ sudo -s
# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
or
# echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
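
The commands above only touch cpu0. On a multi-core machine each core may have its own cpufreq policy, so a minimal sketch that applies one governor to every core could look like the following. The function name set_governor and its optional second argument (a sysfs root, handy for a dry run against a scratch directory) are assumptions for illustration and are not part of microbench.

```shell
#!/bin/sh
# Sketch: write one governor to every CPU that exposes a cpufreq policy.
# set_governor and its optional sysfs-root argument are illustrative only.
set_governor() {
    governor="$1"
    root="${2:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded pattern when no CPU exposes cpufreq.
        [ -e "$f" ] || continue
        echo "$governor" > "$f"
    done
}

# Example (as root): set_governor performance
```

Writing to the real sysfs files requires root, as in the commands above.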

Usage

Run it on x86_64:

$ make logging
benchmark/build/test/x86_64
2022-03-21T22:55:56+08:00
Running benchmark/build/test/x86_64
Run on (3 X 1992 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x3)
  L1 Instruction 32 KiB (x3)
  L2 Unified 256 KiB (x3)
  L3 Unified 8192 KiB (x3)
Load Average: 1.74, 1.13, 1.05
------------------------------------------------------------------
Benchmark			 Time		  CPU	Iterations
------------------------------------------------------------------
BM_nop			     0.288 ns	     0.277 ns	1000000000
BM_ub			     0.974 ns	     0.972 ns	 708061514
BM_bnez			     0.992 ns	     0.991 ns	 666291716
BM_beqz			      1.03 ns	      1.03 ns	 666640065
BM_load_bnez		     0.565 ns	     0.563 ns	1000000000
BM_load_beqz		     0.863 ns	     0.859 ns	 805747354
BM_cache_miss_load_bnez	      1.34 ns	     0.334 ns	1000000000
BM_cache_miss_load_beqz	      2.31 ns	     0.326 ns	1000000000
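
To compare results across CPUs it can help to express a time in clock cycles rather than nanoseconds. The sketch below does that arithmetic (cycles = ns × MHz / 1000) with awk. The helper name ns_to_cycles is hypothetical and not part of microbench, and the result is only meaningful when the frequency is locked and the MHz value reported in the log is correct.

```shell
#!/bin/sh
# Sketch: convert a per-iteration time in ns to CPU cycles at a given clock.
# ns_to_cycles is an illustrative helper, not part of microbench.
ns_to_cycles() {
    # cycles = ns * MHz / 1000, because 1 MHz is 0.001 cycles per ns
    awk -v ns="$1" -v mhz="$2" 'BEGIN { printf "%.2f\n", ns * mhz / 1000 }'
}

ns_to_cycles 0.974 1992   # BM_ub wall time at the 1992 MHz reported in the log
```

With the numbers from the log above, BM_ub at 0.974 ns and 1992 MHz works out to roughly 1.94 cycles per iteration.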

Run it on other architectures

First, create a new test/$(ARCH).cc, for example:

$ cp test/x86_64.cc test/aarch64.cc

Then refer to the target ISA specification and customize the instructions in test/$(ARCH).cc.

Finally, copy the whole microbench directory to the target machine (in this example, one with an AArch64 CPU) and run it:

$ make

To log the results:

$ make logging

If the product or CPU model cannot be detected automatically, pass them explicitly:

$ make logging PRODUCT=product-name CPUMODEL=cpu-name

If the detected CPU frequency is wrong, please correct it manually.

Run with or without loop optimization

By default, the iterations optimization is enabled. To disable or enable it explicitly:

// prevent iterations optimization
$ make O=0

// allow iterations optimization
$ make O=1

Static compiling

To run the benchmark on another system, such as an embedded one, compile it statically:

$ make clean
$ make STATIC=1

Cross compiling

To cross compile for a target architecture, pass ARCH to make:

$ make clean
$ make ARCH=riscv64 clean
$ make ARCH=riscv64
