Some IoT devices need to run the AOT file directly from read-only flash or ROM, either to reduce memory consumption or because no executable memory is available for the AOT code. In such cases, the AOT code inside the AOT file must not be copied into memory and must not be modified (patched) by AOT relocations. To address this, WAMR implements the XIP (Execution In Place) feature, which generates as few AOT relocations as possible: functions are called indirectly through a function pointer table, and calls to LLVM intrinsic functions are replaced with calls to functions implemented by the runtime. For example, a call to `llvm.experimental.constrained.fadd.f32` is replaced by a call to `aot_intrinsic_fadd_f32`.
The XIP file is an AOT file with no (or few) relocations that patch the AOT code (the text section). Developers can pass the options `--enable-indirect-mode --disable-llvm-intrinsics` to wamrc to generate such an AOT file, e.g.:
wamrc --enable-indirect-mode --disable-llvm-intrinsics -o <aot_file> <wasm_file>
or
wamrc --xip -o <aot_file> <wasm_file>
Note: `--xip` is shorthand for `--enable-indirect-mode --disable-llvm-intrinsics`.
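
To illustrate what indirect mode changes, here is a minimal C sketch of a direct call next to a call made through a function pointer table. The type `sketch_exec_env`, its fields, and all function names are hypothetical and chosen only for illustration; they are not WAMR's actual data structures, and the real generated code may look different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical function type and environment; only the table matters here. */
typedef int32_t (*sketch_func_i32)(int32_t);

typedef struct {
    const sketch_func_i32 *func_ptrs; /* filled in by the runtime at load time */
    uint32_t func_count;
} sketch_exec_env;

static int32_t sketch_add_one(int32_t x)
{
    return x + 1;
}

/* Direct style: the call site names the callee. In an AOT file, a callee
 * whose address is only known at load time would need a relocation that
 * patches the text section. */
static int32_t sketch_caller_direct(int32_t x)
{
    return sketch_add_one(x) * 2;
}

/* Indirect style: the call site only contains a constant index. The
 * address is read from the table at run time, so the code itself does
 * not have to be patched. */
static int32_t sketch_caller_indirect(const sketch_exec_env *env, int32_t x)
{
    const uint32_t idx_add_one = 0; /* hypothetical slot number */

    assert(idx_add_one < env->func_count);
    return env->func_ptrs[idx_add_one](x) * 2;
}
```

Both callers compute the same result; the difference is only in where the callee's address lives. In the indirect version it sits in writable data owned by the runtime, which is why the code can stay in read-only memory.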
There may still be some relocations to ".rodata"-like sections that require patching the AOT code. More work is planned to resolve this.
WAMR provides a default mapping table for some targets, but it may not be the best fit for your target, and it does not cover every supported target. For this reason, wamrc provides the option `--enable-builtin-intrinsics=<intr1,intr2,...>`, which lets you tune the intrinsic functions for your target.
First, it helps to understand why the LLVM intrinsic functions are not used directly: they do not always map to native instructions. For example, the `i32.div_s` operation cannot map to a native instruction if the target has no division instruction; it is instead translated into a call to a runtime function from libgcc/compiler-rt. The AOT code then carries relocations to libgcc/compiler-rt, which is not acceptable for the XIP feature.
So the LLVM intrinsic functions need to be replaced with functions implemented by the runtime itself, which can be called through the function pointer table (`--enable-indirect-mode`) and carry no relocations to libgcc/compiler-rt (`--disable-llvm-intrinsics`).
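
The replacement described above can be sketched in C as follows. This is an illustration under stated assumptions, not WAMR's implementation: the helper names, the `sketch_intrinsic_table` type, and the two `sketch_aot_*` functions standing in for generated code are all hypothetical. The sketch also assumes the helpers are linked into the runtime image, so any libgcc/compiler-rt routine they need is resolved when the runtime is built.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical runtime-side helpers (names are illustrative, not WAMR's). */
static float sketch_fadd_f32(float a, float b)
{
    return a + b;
}

static int32_t sketch_i32_div_s(int32_t a, int32_t b)
{
    /* Assumption: the caller already handled b == 0 and INT32_MIN / -1. */
    assert(b != 0 && !(a == INT32_MIN && b == -1));
    return a / b; /* may compile to a library call on a core without divide */
}

/* Hypothetical table of helper pointers owned by the runtime. */
typedef struct {
    float (*fadd_f32)(float, float);
    int32_t (*i32_div_s)(int32_t, int32_t);
} sketch_intrinsic_table;

static const sketch_intrinsic_table sketch_intrinsics = {
    sketch_fadd_f32,
    sketch_i32_div_s,
};

/* Stand-ins for generated code: the operation is reached through the
 * table instead of through a symbol that would need a relocation. */
static int32_t sketch_aot_div(const sketch_intrinsic_table *t,
                              int32_t a, int32_t b)
{
    return t->i32_div_s(a, b);
}

static float sketch_aot_fadd(const sketch_intrinsic_table *t,
                             float a, float b)
{
    return t->fadd_f32(a, b);
}
```

The point of the design is that the cost of the missing hardware instruction is paid inside the runtime, which is allowed to reference libgcc/compiler-rt, while the code that must stay unpatched only performs a table lookup and an indirect call.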
Available intrinsic functions for tuning:
Intrinsic function | Explanation |
---|---|
llvm.experimental.constrained.fadd.f32 | float32 add |
llvm.experimental.constrained.fadd.f64 | float64 add |
llvm.experimental.constrained.fsub.f32 | float32 sub |
llvm.experimental.constrained.fsub.f64 | float64 sub |
llvm.experimental.constrained.fmul.f32 | float32 mul |
llvm.experimental.constrained.fmul.f64 | float64 mul |
llvm.experimental.constrained.fdiv.f32 | float32 div |
llvm.experimental.constrained.fdiv.f64 | float64 div |
llvm.fabs.f32 | float32 abs |
llvm.fabs.f64 | float64 abs |
llvm.ceil.f32 | float32 ceil |
llvm.ceil.f64 | float64 ceil |
llvm.floor.f32 | float32 floor |
llvm.floor.f64 | float64 floor |
llvm.trunc.f32 | float32 trunc |
llvm.trunc.f64 | float64 trunc |
llvm.rint.f32 | float32 rint |
llvm.rint.f64 | float64 rint |
llvm.sqrt.f32 | float32 sqrt |
llvm.sqrt.f64 | float64 sqrt |
llvm.copysign.f32 | float32 copysign |
llvm.copysign.f64 | float64 copysign |
llvm.minnum.f32 | float32 minnum |
llvm.minnum.f64 | float64 minnum |
llvm.maxnum.f32 | float32 maxnum |
llvm.maxnum.f64 | float64 maxnum |
llvm.ctlz.i32 | int32 count leading zeros |
llvm.ctlz.i64 | int64 count leading zeros |
llvm.cttz.i32 | int32 count trailing zeros |
llvm.cttz.i64 | int64 count trailing zeros |
llvm.ctpop.i32 | int32 population count (number of set bits) |
llvm.ctpop.i64 | int64 population count (number of set bits) |
f64_convert_i32_s | int32 to float64 |
f64_convert_i32_u | uint32 to float64 |
f32_convert_i32_s | int32 to float32 |
f32_convert_i32_u | uint32 to float32 |
f64_convert_i64_s | int64 to float64 |
f64_convert_i64_u | uint64 to float64 |
f32_convert_i64_s | int64 to float32 |
f32_convert_i64_u | uint64 to float32 |
i32_trunc_f32_s | float32 to int32 |
i32_trunc_f32_u | float32 to uint32 |
i32_trunc_f64_s | float64 to int32 |
i32_trunc_f64_u | float64 to uint32 |
i64_trunc_f64_s | float64 to int64 |
i64_trunc_f64_u | float64 to uint64 |
i64_trunc_f32_s | float32 to int64 |
i64_trunc_f32_u | float32 to uint64 |
f32_demote_f64 | float64 to float32 |
f64_promote_f32 | float32 to float64 |
f32_cmp | float32 compare |
f64_cmp | float64 compare |
i64.div_s | int64 div |
i64.div_u | uint64 div |
i32.div_s | int32 div |
i32.div_u | uint32 div |
i64.rem_s | int64 rem |
i64.rem_u | uint64 rem |
i32.rem_s | int32 rem |
i32.rem_u | uint32 rem |
i64.or | int64 or |
i64.and | int64 and |
i32.const | emit i32 const into constant table |
i64.const | emit i64 const into constant table |
f32.const | emit f32 const into constant table |
f64.const | emit f64 const into constant table |
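
As a rough idea of what software versions of a few table entries could look like on a target that lacks the matching instructions, the following C sketch implements 32-bit count-leading-zeros, count-trailing-zeros, and population count with plain shifts and masks. The function names are hypothetical, and returning 32 for a zero input is an assumption based on the WebAssembly `i32.clz` and `i32.ctz` semantics rather than on anything stated in this document.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zero bits; a zero input is defined here to return 32. */
static uint32_t sketch_ctlz_i32(uint32_t x)
{
    uint32_t n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    assert(n < 32); /* a nonzero input has at most 31 leading zeros */
    return n;
}

/* Count trailing zero bits; a zero input is defined here to return 32. */
static uint32_t sketch_cttz_i32(uint32_t x)
{
    uint32_t n = 0;

    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Count set bits by clearing the lowest set bit until none remain. */
static uint32_t sketch_ctpop_i32(uint32_t x)
{
    uint32_t n = 0;

    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}
```

Loops like these are slower than a single hardware instruction, which is one reason to enable a builtin intrinsic only when the target really lacks the native operation.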
WAMR also provides combined intrinsic functions to simplify the tuning, such as `i64.common`, `fp.common` and `fpxint` used in the commands below.
The ARM Cortex-M55 has a double-precision floating-point unit, so it supports f32/f64 operations. As a 32-bit MCU, however, it only supports 32-bit integer operations. The following command generates the XIP binary:
wamrc --target=thumbv8m.main --cpu=cortex-m55 --xip --enable-builtin-intrinsics=i64.common -o hello.aot hello.wasm
The ARM Cortex-M3 has no floating-point unit and only supports 32-bit integer operations. The following command generates the XIP binary:
wamrc --target=thumbv7m --cpu=cortex-m3 --xip --enable-builtin-intrinsics=i64.common,fp.common,fpxint -o hello.aot hello.wasm
Other platforms can be tuned in the same way; which intrinsics to enable depends on the hardware capabilities of the target platform.
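
The selection logic behind the two commands above can be written down as a small C sketch. It is inferred only from the Cortex-M55 and Cortex-M3 examples: the `sketch_target_caps` type and `sketch_pick_intrinsics` function are hypothetical, and applying the same rule to other targets is an extrapolation that should be checked against the actual hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of a target; not a WAMR type. */
typedef struct {
    bool has_fpu;        /* hardware f32/f64 arithmetic is available */
    bool has_native_i64; /* 64-bit integer operations are native */
} sketch_target_caps;

/* Build a value for --enable-builtin-intrinsics from the capabilities.
 * Writes a comma-separated list into buf and returns buf. */
static const char *sketch_pick_intrinsics(sketch_target_caps caps,
                                          char *buf, size_t size)
{
    size_t used;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    /* No native 64-bit integers: same choice as both examples above. */
    if (!caps.has_native_i64)
        snprintf(buf, size, "i64.common");

    /* No FPU: same extra entries as the Cortex-M3 example. */
    if (!caps.has_fpu) {
        used = strlen(buf);
        snprintf(buf + used, size - used, "%sfp.common,fpxint",
                 used > 0 ? "," : "");
    }
    return buf;
}
```

With `has_fpu = true` and `has_native_i64 = false` the sketch reproduces the Cortex-M55 option value, and with both set to `false` it reproduces the Cortex-M3 one.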