# c2goasm **Repository Path**: mirrors_minio/c2goasm ## Basic Information - **Project Name**: c2goasm - **Description**: C to Go Assembly - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-08-09 - **Last Updated**: 2025-11-22 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # c2goasm: C to Go Assembly ## Introduction This is a tool to convert assembly as generated by a C/C++ compiler into Golang assembly. It is meant to be used in combination with [asm2plan9s](https://github.com/minio/asm2plan9s) in order to automatically generate pure Go wrappers for C/C++ code (that may for instance take advantage of compiler SIMD intrinsics or `template<>` code). Mode of operation: ``` $ c2goasm -a /path/to/some/great/c-code.s /path/to/now/great/golang-code_amd64.s ``` You can optionally nicely format the code using [asmfmt](https://github.com/klauspost/asmfmt) by passing in an `-f` flag. This project has been developed as part of developing a Go wrapper around [Simd](https://github.com/fwessels/go-cv-simd). However it should also work with other projects and libraries. Keep in mind though that it is not intented to 'port' a complete C/C++ project in a single action but rather do it on a case-by-case basis per function/source file (and create accompanying high level Go code to call into the assembly code). ## Command line options ``` $ c2goasm --help Usage of c2goasm: -a Immediately invoke asm2plan9s -c Compact byte codes -f Format using asmfmt -s Strip comments ``` ## A simple example Here is a simple C function doing an AVX2 intrinsics computation: ``` void MultiplyAndAdd(float* arg1, float* arg2, float* arg3, float* result) { __m256 vec1 = _mm256_load_ps(arg1); __m256 vec2 = _mm256_load_ps(arg2); __m256 vec3 = _mm256_load_ps(arg3); __m256 res = _mm256_fmadd_ps(vec1, vec2, vec3); _mm256_storeu_ps(result, res); } ``` Compiling into assembly gives the following ``` __ZN14MultiplyAndAddEPfS1_S1_S1_: ## @_ZN14MultiplyAndAddEPfS1_S1_S1_ ## BB#0: push rbp mov rbp, rsp vmovups ymm0, ymmword ptr [rdi] vmovups ymm1, ymmword ptr [rsi] vfmadd213ps ymm1, ymm0, ymmword ptr [rdx] vmovups ymmword ptr [rcx], ymm1 pop rbp vzeroupper ret ``` Running `c2goasm` will generate the following Go assembly (eg. saved in `MultiplyAndAdd_amd64.s`) ``` //+build !noasm !appengine // AUTO-GENERATED BY C2GOASM -- DO NOT EDIT TEXT ยท_MultiplyAndAdd(SB), $0-32 MOVQ vec1+0(FP), DI MOVQ vec2+8(FP), SI MOVQ vec3+16(FP), DX MOVQ result+24(FP), CX LONG $0x0710fcc5 // vmovups ymm0, yword [rdi] LONG $0x0e10fcc5 // vmovups ymm1, yword [rsi] LONG $0xa87de2c4; BYTE $0x0a // vfmadd213ps ymm1, ymm0, yword [rdx] LONG $0x0911fcc5 // vmovups yword [rcx], ymm1 VZEROUPPER RET ``` This needs to be accompanied by the following Go code (in `MultiplyAndAdd_amd64.go`) ``` //go:noescape func _MultiplyAndAdd(vec1, vec2, vec3, result unsafe.Pointer) func MultiplyAndAdd(someObj Object) { _MultiplyAndAdd(someObj.GetVec1(), someObj.GetVec2(), someObj.GetVec3(), someObj.GetResult())) } ``` And as you may have gathered the amd64.go file needs to be in place in order for the arguments names to be derived (and allow `go vet` to succeed). ## Benchmark against cgo We have run benchmarks of `c2goasm` versus `cgo` for both Go version 1.7.5 and 1.8.1. You can find the `c2goasm` benchmark test in `test/` and the `cgo` test in `cgocmp/` respectively. Here are the results for both versions: ``` $ benchcmp ../cgocmp/cgo-1.7.5.out c2goasm.out benchmark old ns/op new ns/op delta BenchmarkMultiplyAndAdd-12 382 10.9 -97.15% ``` ``` $ benchcmp ../cgocmp/cgo-1.8.1.out c2goasm.out benchmark old ns/op new ns/op delta BenchmarkMultiplyAndAdd-12 236 10.9 -95.38% ``` As you can see Golang 1.8 has made a significant improvement (38.2%) over 1.7.5, but it is still about 20x slower than directly calling into assembly code as wrapped by `c2goasm`. ## Converted projects - [go-cv-simd (WIP)](https://github.com/fwessels/go-cv-simd) ## Internals The basic process is to (in the prologue) setup the stack and registers as how the C code expects this to be the case, and upon exiting the subroutine (in the epilogue) to revert back to the golang world and pass a return value back if required. In more details: - Define assembly subroutine with proper golang decoration in terms of needed stack space and overall size of arguments plus return value. - Function arguments are loaded from the golang stack into registers and prior to starting the C code any arguments beyond 6 are stored in C stack space. - Stack space is reserved and setup for the C code. Depending on the C code, the stack pointer maybe aligned on a certain boundary (especially needed for code that takes advantages of SIMD instructions such as AVX etc.). - A constants table is generated (if needed) and any `rip`-based references are replaced with proper offsets to where Go will put the table. ## Limitations - Arguments need (for now) to be 64-bit size, meaning either a value or a pointer (this requirement will be lifted) - Maximum number of 14 arguments (hard limit -- if you hit this maybe you should rethink your api anyway...) - Generally no `call` statements (thus inline your C code) with a couple of exceptions for functions such as `memset` and `memcpy` (see `clib_amd64.s`) ## Generate assembly from C/C++ For eg. projects using cmake, here is how to see a list of assembly targets ``` $ make help | grep "\.s" ``` To see the actual command to generate the assembly ``` $ make -n SimdAvx2BgraToGray.s ``` ## Supported golang architectures For now just the AMD64 architecture is supported. Also ARM64 should work just fine in a similar fashion but support is lacking at the moment. ## Compatible compilers The following compilers have been tested: - `clang` (Apple LLVM version) on OSX/darwin - `clang` on linux Compiler flags: ``` -masm=intel -mno-red-zone -mstackrealign -mllvm -inline-threshold=1000 -fno-asynchronous-unwind-tables -fno-exceptions -fno-rtti ``` | Flag | Explanation | |:----------------------------------| :--------------------------------------------------| | `-masm=intel` | Output Intel syntax for assembly | | `-mno-red-zone` | Do not write below stack pointer (avoid [red zone](https://en.wikipedia.org/wiki/Red_zone_(computing))) | | `-mstackrealign` | Use explicit stack initialization | | `-mllvm -inline-threshold=1000` | Higher limit for inlining heuristic (default=255) | | `-fno-asynchronous-unwind-tables` | Do not generate unwind tables (for debug purposes) | | `-fno-exceptions` | Disable exception handling | | `-fno-rtti` | Disable run-time type information | The following flags are only available in `clang -cc1` frontend mode (see [below]()): | Flag | Explanation | |:----------------------------------| :------------------------------------------------------------------| | `-fno-jump-tables` | Do not use jump tables as may be generated for `select` statements | #### `clang` vs `clang -cc1` As per the clang [FAQ](https://clang.llvm.org/docs/FAQ.html#driver), `clang -cc1` is the frontend, and `clang` is a (mostly GCC compatible) driver for the frontend. To see all options that the driver passes on to the frontend, use `-###` like this: ``` $ clang -### -c hello.c "/usr/lib/llvm/bin/clang" "-cc1" "-triple" "x86_64-pc-linux-gnu" etc. etc. etc. ``` #### Command line flags for clang To see all command line flags use either `clang --help` or `clang --help-hidden` for the clang driver or `clang -cc1 -help` for the frontend. #### Further optimization and fine tuning Using the LLVM optimizer ([opt](http://llvm.org/docs/CommandGuide/opt.html)) you can further optimize the code generation. Use `opt -help` or `opt -help-hidden` for all available options. An option can be passed in via `clang` using the `-mllvm ` option, such as `-mllvm -inline-threshold=1000` as discussed above. Also LLVM allows you to tune specific functions via [function attributes](http://llvm.org/docs/LangRef.html#function-attributes) like `define void @f() alwaysinline norecurse { ... }`. #### What about GCC support? For now GCC code will not work out of the box. However there is no reason why GCC should not work fundamentally (PRs are welcome). ## Resources - [A Primer on Go Assembly](https://github.com/teh-cmc/go-internals/blob/master/chapter1_assembly_primer/README.md) - [Go Function in Assembly](https://github.com/golang/go/files/447163/GoFunctionsInAssembly.pdf) - [Stack frame layout on x86-64](http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64) - [Compiler Explorer (interactive)](https://go.godbolt.org/) ## License c2goasm is released under the Apache License v2.0. You can find the complete text in the file LICENSE. ## Contributing Contributions are welcome, please send PRs for any enhancements.