From d9ed718fb82762d85a8d69dc85245188f46ba1b4 Mon Sep 17 00:00:00 2001
From: yang-youwen <1483224757@qq.com>
Date: Sat, 28 Sep 2024 21:09:04 +0800
Subject: [PATCH 1/2] add GCC/LLVM implementation-defined behaviours

---
 .../implementation-defined behaviours/C11.md | 2035 +++++++++++++++++
 .../CPP14.md | 162 ++
 .../readme.md | 93 +
 3 files changed, 2290 insertions(+)
 create mode 100644 Compiler Features/GCC_LLVM/implementation-defined behaviours/C11.md
 create mode 100644 Compiler Features/GCC_LLVM/implementation-defined behaviours/CPP14.md
 create mode 100644 Compiler Features/GCC_LLVM/implementation-defined behaviours/readme.md

diff --git a/Compiler Features/GCC_LLVM/implementation-defined behaviours/C11.md b/Compiler Features/GCC_LLVM/implementation-defined behaviours/C11.md
new file mode 100644
index 0000000..4f96e67
--- /dev/null
+++ b/Compiler Features/GCC_LLVM/implementation-defined behaviours/C11.md
@@ -0,0 +1,2035 @@
+
+
+# Implementation-defined Behaviors
+
+## 1. Translation
+
+### 1. How a diagnostic is identified ([3.10](https://port70.net/~nsz/c/c11/n1570.html#3.10), [5.1.1.3](https://port70.net/~nsz/c/c11/n1570.html#5.1.1.3)).
+
+- Diagnostics consist of all the output sent to stderr by LLVM/Clang.
+
+```c
+#include <stdio.h>
+
+int main(){
+    int a=1;
+    a = i; /* 'i' is deliberately undeclared to trigger a diagnostic */
+    return 0;
+}
+```
+
+The result is as follows:
+
+```bash
+$ make gcc
+gcc --std=c11 ./test.c -o target_gcc
+./test.c: In function ‘main’:
+./test.c:5:9: error: ‘i’ undeclared (first use in this function)
+    5 |     a = i;
+      |         ^
+./test.c:5:9: note: each undeclared identifier is reported only once for each function it appears in
+make: *** [Makefile:13: gcc] Error 1
+```
+
+```bash
+$ make clang
+clang --std=c11 ./test.c -o target_clang
+./test.c:5:9: error: use of undeclared identifier 'i'
+    a = i;
+        ^
+1 error generated.
+make: *** [Makefile:17: clang] Error 1
+```
+
+Consistent with GCC.
+
+### 2. 
Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character in translation phase 3 ([5.1.1.2](https://port70.net/~nsz/c/c11/n1570.html#5.1.1.2)).
+
+- In textual output, each whitespace sequence is collapsed to a single space. For aesthetic reasons, the first token on each non-directive line of output is preceded with sufficient spaces that it appears in the same column as it did in the original source file.
+- A nonempty sequence of characters containing tab (\t), form feed (\f), or vertical tab (\v) is replaced by a single space character.
+
+```c
+int main(){
+    int a=1;
+    return 0;
+}
+```
+
+After preprocessing (running `make preprocess_gcc` and `make preprocess_clang`), the results are as follows:
+
+```c
+//output_gcc:
+# 0 "./test.c"
+# 0 "<built-in>"
+# 0 "<command-line>"
+# 1 "/usr/include/stdc-predef.h" 1 3 4
+# 0 "<command-line>" 2
+# 1 "./test.c"
+int main(){
+    int a=1;
+    return 0;
+}
+
+```
+
+```c
+//output_clang:
+# 1 "./test.c"
+# 1 "<built-in>" 1
+# 1 "<built-in>" 3
+# 360 "<built-in>" 3
+# 1 "<command line>" 1
+# 1 "<built-in>" 2
+# 1 "./test.c" 2
+int main(){
+    int a=1;
+    return 0;
+}
+
+```
+
+Consistent with GCC.
+
+## 2. Environment
+
+The behavior of most of these points is dependent on the implementation of the **C library**.
+
+That is, some implementation-defined behaviors are not decided by GCC or LLVM; they may be determined by the C standard library implementation or by the C standard itself. Such items are explicitly marked in this document.
+
+### 1. The mapping between physical source file multibyte characters and the source character set in translation phase 1 ([5.1.1.2](https://port70.net/~nsz/c/c11/n1570.html#5.1.1.2)).
+
+- GCC:
+  - The input character set can be specified using the -finput-charset option, while the execution character set may be controlled using the -fexec-charset and -fwide-exec-charset options. 
+  - GCC lets you specify both the character set of the input source file and the execution character set. The default is UTF-8.
+
+- Clang cannot do this: LLVM's execution character set is always UTF-8 with respect to the C standard, and the same holds for input source files. If another character set is used and the file contains multibyte characters that are not valid UTF-8, Clang may fail to parse them correctly.
+
+  - [Character set - llvm-mos](https://llvm-mos.org/wiki/Character_set)
+
+- In fact, there has been an effort in the community to implement `-fexec-charset` support:
+
+  [RFC: Enabling fexec-charset support to LLVM and clang (Reposting) - Clang Frontend - LLVM Discussion Forums](https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512)
+
+### 2. The name and type of the function called at program startup in a freestanding environment ([5.1.2.1](https://port70.net/~nsz/c/c11/n1570.html#5.1.2.1)).
+
+- Not defined by GCC or LLVM themselves.
+
+### 3. The effect of program termination in a freestanding environment ([5.1.2.1](https://port70.net/~nsz/c/c11/n1570.html#5.1.2.1)).
+
+- Not defined by GCC or LLVM themselves.
+
+### 4. An alternative manner in which the main function may be defined ([5.1.2.2.1](https://port70.net/~nsz/c/c11/n1570.html#5.1.2.2.1)).
+
+- Not defined by GCC or LLVM themselves.
+
+- The C11 standard:
+
+  - > - [1](https://port70.net/~nsz/c/c11/n1570.html#5.1.2.2.1p1) The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:
+    >
+    > ```
+    > int main(void) { /* ... */ }
+    > ```
+    >
+    > or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):
+    >
+    > ```
+    > int main(int argc, char *argv[]) { /* ... */ }
+    > ```
+    >
+    > or equivalent;[**10)**](https://port70.net/~nsz/c/c11/n1570.html#note10) or in some other implementation-defined manner. 
+    >
+    > [2](https://port70.net/~nsz/c/c11/n1570.html#5.1.2.2.1p2) If they are declared, the parameters to the main function shall obey the following constraints:
+    >
+    > - The value of argc shall be nonnegative.
+    > - argv[argc] shall be a null pointer.
+    > - If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup. The intent is to supply to the program information determined prior to program startup from elsewhere in the hosted environment. If the host environment is not capable of supplying strings with letters in both uppercase and lowercase, the implementation shall ensure that the strings are received in lowercase.
+    > - If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment. If the value of argc is greater than one, the strings pointed to by argv[1] through argv[argc-1] represent the program parameters.
+    > - The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
+
+### 5. The values given to the strings pointed to by the argv argument to main ([5.1.2.2.1](https://port70.net/~nsz/c/c11/n1570.html#5.1.2.2.1)).
+
+- Not defined by GCC or LLVM themselves.
+
+- The C99 standard:
+
+  - > [1](https://port70.net/~nsz/c/c99/n1256.html#5.1.2.2.1p1) The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:
+    >
+    > ```
+    > int main(void) { /* ... 
*/ }
+    > ```
+    >
+    > or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):
+    >
+    > ```
+    > int main(int argc, char *argv[]) { /* ... */ }
+    > ```
+    >
+    > or equivalent;[**9)**](https://port70.net/~nsz/c/c99/n1256.html#note9) or in some other implementation-defined manner.
+
+### 6. What constitutes an interactive device ([5.1.2.3](https://port70.net/~nsz/c/c11/n1570.html#5.1.2.3)).
+
+- Not defined by GCC or LLVM themselves.
+
+- For reference only: in the LLVM source tree, `llvm\lib\Support\Unix\Process.inc` contains:
+
+  - ```cpp
+    bool Process::FileDescriptorIsDisplayed(int fd) {
+    #if HAVE_ISATTY
+      return isatty(fd);
+    #else
+      // If we don't have isatty, just return false.
+      return false;
+    #endif
+    }
+    ```
+
+  - So, for LLVM's own tools, an interactive device is one for which the system library call `isatty()` returns a nonzero value.
+
+### 7. Whether a program can have more than one thread of execution in a freestanding environment ([5.1.2.4](https://port70.net/~nsz/c/c11/n1570.html#5.1.2.4)).
+
+- Not defined by GCC or LLVM themselves.
+
+### 8. The set of signals, their semantics, and their default handling ([7.14](https://port70.net/~nsz/c/c11/n1570.html#7.14)).
+
+- Not defined by GCC or LLVM themselves.
+
+### 9. Signal values other than SIGFPE, SIGILL, and SIGSEGV that correspond to a computational exception ([7.14.1.1](https://port70.net/~nsz/c/c11/n1570.html#7.14.1.1)).
+
+- Not defined by GCC or LLVM themselves.
+
+### 10. Signals for which the equivalent of signal(sig, SIG_IGN); is executed at program startup ([7.14.1.1](https://port70.net/~nsz/c/c11/n1570.html#7.14.1.1)).
+
+- Not defined by GCC or LLVM themselves.
+
+### 11. The set of environment names and the method for altering the environment list used by the getenv function ([7.22.4.6](https://port70.net/~nsz/c/c11/n1570.html#7.22.4.6)).
+
+- Not defined by GCC or LLVM themselves.
+
+### 12. The manner of execution of the string by the system function ([7.22.4.8](https://port70.net/~nsz/c/c11/n1570.html#7.22.4.8)).
+
+- Not defined by GCC or LLVM themselves.
+
+## 3. Identifiers
+
+### 1. Which additional multibyte characters may appear in identifiers and their correspondence to universal character names ([6.4.2](https://port70.net/~nsz/c/c11/n1570.html#6.4.2)).
+
+- GCC accepts in identifiers exactly those extended characters that correspond to universal character names permitted by the chosen standard (C11).
+
+```c
+#include <stdio.h>
+
+int main(){
+    int 变量 =1;
+    变量= 变量+1;
+    printf("%d\n",变量);
+    return 0;
+}
+```
+
+```c
+//target_gcc
+2
+
+```
+
+As shown, extended characters that correspond to universal character names are accepted in identifiers. Other Unicode characters are expected to behave the same way and were not tested one by one.
+
+Consistent with GCC.
+
+### 2. The number of significant initial characters in an identifier ([5.2.4.1](https://port70.net/~nsz/c/c11/n1570.html#5.2.4.1), [6.4.2](https://port70.net/~nsz/c/c11/n1570.html#6.4.2)).
+
+- C11 itself is lenient here: it requires at least 63 significant initial characters for internal identifiers and at least 31 for external identifiers:
+
+  - > The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:18)
+    >
+    > ...
+    > — 63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character is considered a single character)
+    > — 31 significant initial characters in an external identifier (each universal character name specifying a short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any)19)
+    > ... 
+    >
+    > 18) Implementations should avoid imposing fixed translation limits whenever possible.
+    > 19) See ‘‘future language directions’’ (6.11.3).
+    >
+    > §6.11.3 **External names**
+    > ¶1 Restriction of the significance of an external name to fewer than 255 characters (considering each universal character name or extended source character as a single character) is an obsolescent feature that is a concession to existing implementations.
+
+  - [What's the exact role of "significant characters" in C (variables)? - Stack Overflow](https://stackoverflow.com/questions/18290165/whats-the-exact-role-of-significant-characters-in-c-variables)
+
+- For GCC:
+
+  - For internal names, all characters are significant. For external names, the number of significant characters are defined by the linker; for almost all targets, all characters are significant.
+
+```c
+#include <stdio.h>
+
+// Define two identifiers whose first 63 characters are identical and whose 64th character differs
+int aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa = 10;
+int aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab = 20;
+
+int main() {
+    // Print both values to check whether the compiler distinguishes the two names
+    printf("Value of variable a: %d\n", aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa);
+    printf("Value of variable b: %d\n", aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab);
+
+    // Compare the values: they would be equal only if both names referred to the same object
+    if (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ==
+        aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab) {
+        printf("Compiler treats both identifiers as the same.\n");
+    } else {
+        printf("Compiler treats both identifiers as different.\n");
+    }
+
+    return 0;
+}
+
+```
+
+```c
+//target_gcc
+Value of variable a: 10
+Value of variable b: 20
+Compiler treats both identifiers as different.
+
+```
+
+```c
+//target_clang
+Value of variable a: 10
+Value of variable b: 20
+Compiler treats both identifiers as different.
+
+```
+
+Both compilers therefore treat more than 63 initial characters as significant.
+
+Consistent with GCC.
+
+### 3. 
Whether case distinctions are significant in an identifier with external linkage (C90 6.1.2).
+
+- C99 and C11 require that case distinctions are always significant in identifiers.
+- Under the C11 standard this is no longer an implementation-defined behavior.
+
+## 4. Characters
+
+### 1. The number of bits in a byte ([3.6](https://port70.net/~nsz/c/c11/n1570.html#3.6)).
+
+- Determined by ABI.
+
+In practice this is generally 8 bits.
+
+### 2. The values of the members of the execution character set ([5.2.1](https://port70.net/~nsz/c/c11/n1570.html#5.2.1)).
+
+- Determined by ABI.
+  - However, since Clang's execution character set is fixed to UTF-8, the values are effectively fixed.
+
+### 3. The unique value of the member of the execution character set produced for each of the standard alphabetic escape sequences ([5.2.2](https://port70.net/~nsz/c/c11/n1570.html#5.2.2)).
+
+- Determined by ABI.
+  - For the test environment:
+
+```c
+#include <stdio.h>
+
+int main() {
+    char a = '\a';
+    char b = '\b';
+    char f = '\f';
+    char n = '\n';
+    char r = '\r';
+    char t = '\t';
+    char v = '\v';
+    printf("%d,%d,%d,%d,%d,%d,%d\n", a, b, f, n, r, t, v);
+
+    return 0;
+
+}
+```
+
+```c
+//target_clang
+7,8,12,10,13,9,11
+```
+
+```c
+//target_gcc
+7,8,12,10,13,9,11
+```
+
+For this test environment, the values are as listed in the table below:
+
+| **Escape Sequence** | **Unique Value** |
+| ------------------- | ---------------- |
+| \a | 7 |
+| \b | 8 |
+| \f | 12 |
+| \n | 10 |
+| \r | 13 |
+| \t | 9 |
+| \v | 11 |
+
+### 4. The value of a char object into which has been stored any character other than a member of the basic execution character set ([6.2.5](https://port70.net/~nsz/c/c11/n1570.html#6.2.5)).
+
+- Determined by ABI. 
+  - Typically a diagnostic is issued, because such Unicode characters may exceed the range that a `char` can represent (0-255).
+
+In this test environment:
+
+```c
+#include <stdio.h>
+
+int main() {
+    char a = '€'; // try to store the Unicode character '€' in a char object
+    printf("%c", a);
+}
+
+```
+
+
+
+```bash
+$ make gcc
+gcc --std=c11 ./test.c -o target_gcc
+./test.c: In function ‘main’:
+./test.c:4:14: warning: multi-character character constant [-Wmultichar]
+    4 |     char a = '€'; // 尝试存储 Unicode 字符 '€' 到 char 对象中
+      |              ^~~
+./test.c:4:14: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘14844588’ to ‘-84’ [-Woverflow]
+echo //target_gcc > output_gcc
+./target_gcc>>output_gcc
+
+$ make clang
+clang --std=c11 ./test.c -o target_clang
+./test.c:4:14: error: character too large for enclosing character literal type
+    char a = '€'; // 尝试存储 Unicode 字符 '€' 到 char 对象中
+             ^
+1 error generated.
+make: *** [Makefile:20: clang] Error 1
+```
+
+The only difference is that GCC warns about the overflow and continues (the stored value becomes -84), whereas LLVM/Clang reports an error.
+
+### 5. Which of signed char or unsigned char has the same range, representation, and behavior as ''plain'' char ([6.2.5](https://port70.net/~nsz/c/c11/n1570.html#6.2.5), [6.3.1.1](https://port70.net/~nsz/c/c11/n1570.html#6.3.1.1)).
+
+- Determined by ABI.
+  - In this test environment it is signed char.
+
+```c
+#include <stdio.h>
+
+int main(){
+    char a = -1;
+    printf("%d", a);
+    return 0;
+}
+
+```
+
+```c
+//target_gcc
+-1
+```
+
+```c
+//target_clang
+-1
+```
+
+
+
+### 6. The mapping of members of the source character set (in character constants and string literals) to members of the execution character set ([6.4.4.4](https://port70.net/~nsz/c/c11/n1570.html#6.4.4.4), [5.1.1.2](https://port70.net/~nsz/c/c11/n1570.html#5.1.1.2)).
+
+- Determined by ABI.
+  - Note, however, that Clang requires both the source character set and the execution character set to be UTF-8.
+
+### 7. 
The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character ([6.4.4.4](https://port70.net/~nsz/c/c11/n1570.html#6.4.4.4)).
+
+```c
+#include <stdio.h>
+
+int main(int argc, char *argv[]){
+    int a = 'AB';
+
+    printf("%d", a);
+
+    return 0;
+}
+
+```
+
+```c
+//target_gcc
+16706
+```
+
+```c
+//target_clang
+16706
+```
+
+
+
+```c
+#include <stdio.h>
+
+int main(int argc, char *argv[]){
+    int a = '😊';
+
+    printf("%d", a);
+
+    return 0;
+}
+
+```
+
+```bash
+$ make test
+gcc --std=c11 ./test.c -o target_gcc
+./test.c: In function ‘main’:
+./test.c:4:13: warning: multi-character character constant [-Wmultichar]
+    4 |     int a = '😊';
+      |             ^~~~
+echo //target_gcc > output_gcc
+./target_gcc>>output_gcc
+clang --std=c11 ./test.c -o target_clang
+./test.c:4:13: error: character too large for enclosing character literal type
+    int a = '😊';
+            ^
+1 error generated.
+make: *** [Makefile:20: clang] Error 1
+```
+
+```c
+//target_gcc
+-257976182
+```
+
+The two experiments above show that, for LLVM:
+
+1. Containing more than one character: the value is built from left to right, shifting the accumulated value left by 8 bits before adding each character (`'AB'` gives 0x4142 = 16706).
+2. Containing a character or escape sequence that does not map to a single-byte execution character: a compile error is reported.
+
+This is inconsistent with GCC, which treats the UTF-8 bytes of the character as a multi-character constant and lets the resulting value overflow.
+
+### 8. The value of a wide character constant containing more than one multibyte character or a single multibyte character that maps to multiple members of the extended execution character set, or containing a multibyte character or escape sequence not represented in the extended execution character set ([6.4.4.4](https://port70.net/~nsz/c/c11/n1570.html#6.4.4.4)).
+
+```c
+#include <stdio.h>
+#include <wchar.h>
+
+int main() {
+    // 1. An ordinary wide character constant
+    wchar_t valid_char = L'A'; // single character
+
+    // 2. A wide character constant containing more than one character
+    wchar_t invalid_multi_char = L'ABCD'; // the value is implementation-defined
+
+    // 3. 
A wide character constant holding a Unicode character (multibyte in a UTF-8 source file)
+    wchar_t unicode_char = L'中'; // an ordinary Unicode character
+
+    // 4. An escape sequence that may not be in the extended execution character set
+    wchar_t invalid_escape_char = L'\xFF'; // intended as an out-of-range escape (note: 0xFF does fit in wchar_t)
+
+    // Print the results
+    wprintf(L"Valid wchar_t: %lc\n", valid_char);
+    wprintf(L"Invalid multi wchar_t (multiple characters): %lc\n", invalid_multi_char);
+    wprintf(L"Unicode wchar_t: %lc\n", unicode_char);
+    wprintf(L"Invalid escape wchar_t: %lc\n", invalid_escape_char);
+
+    return 0;
+}
+
+```
+
+```c
+//target_gcc
+Valid wchar_t: A
+Invalid multi wchar_t (multiple characters): D
+Unicode wchar_t: ?
+Invalid escape wchar_t: ?
+
+```
+
+In this test environment, for GCC:
+
+- An ordinary wide character constant is handled and printed correctly.
+- For a wide character constant containing multiple characters, GCC keeps only the last character and treats it as the wide character.
+- Without a locale being set, the Unicode character may not be displayed correctly (it is printed as `?`).
+- GCC applies no special handling to the `L'\xFF'` escape; at output time it is simply printed as an unknown character (`?`).
+
+For LLVM:
+```bash
+$ make clang
+clang --std=c11 ./test.c -o target_clang
+./test.c:9:34: error: wide character literals may not contain multiple characters
+    wchar_t invalid_multi_char = L'ABCD'; // 理论上未定义行为
+                                 ^
+1 error generated.
+make: *** [Makefile:20: clang] Error 1
+```
+
+```c
+#include <stdio.h>
+#include <wchar.h>
+
+int main() {
+    // 1. An ordinary wide character constant
+    wchar_t valid_char = L'A'; // single character
+
+    // 2. A wide character constant containing more than one character (disabled: Clang rejects it)
+    //wchar_t invalid_multi_char = L'ABCD'; // the value is implementation-defined
+
+    // 3. A wide character constant holding a Unicode character (multibyte in a UTF-8 source file)
+    wchar_t unicode_char = L'中'; // an ordinary Unicode character
+
+    // 4. An escape sequence that may not be in the extended execution character set
+    wchar_t invalid_escape_char = L'\xFF'; // intended as an out-of-range escape (note: 0xFF does fit in wchar_t)
+
+    // Print the results
+    wprintf(L"Valid wchar_t: %lc\n", valid_char);
+    //wprintf(L"Invalid multi wchar_t (multiple characters): %lc\n", invalid_multi_char);
+    wprintf(L"Unicode wchar_t: %lc\n", unicode_char);
+    wprintf(L"Invalid escape wchar_t: %lc\n", invalid_escape_char);
+
+    return 0;
+}
+
+```
+
+```c
+//target_clang
+Valid wchar_t: A
+Unicode wchar_t: ?
+Invalid escape wchar_t: ? 
+
+```
+
+In this test environment, for LLVM:
+
+- An ordinary wide character constant is handled and printed correctly.
+- A wide character constant containing multiple characters is rejected with an error.
+- Without a locale being set, the Unicode character may not be displayed correctly (it is printed as `?`).
+- LLVM applies no special handling to the `L'\xFF'` escape; at output time it is simply printed as an unknown character (`?`).
+
+In summary, this is inconsistent with GCC. For a wide character constant containing multiple characters, GCC keeps only the last character and treats it as the wide character, whereas LLVM reports an error.
+
+### 9. The current locale used to convert a wide character constant consisting of a single multibyte character that maps to a member of the extended execution character set into a corresponding wide character code ([6.4.4.4](https://port70.net/~nsz/c/c11/n1570.html#6.4.4.4)).
+
+```c
+#include <stdio.h>
+#include <wchar.h>
+#include <locale.h>
+
+int main() {
+    // Print the initial locale
+    wprintf(L"Default locale: %s\n", setlocale(LC_ALL, NULL));
+
+    // Try to handle the multibyte Chinese character '中' in the default locale
+    wchar_t mb_char = L'中'; // use the Chinese character '中'
+    wprintf(L"Wide char in default locale: %lc (code: %d)\n", mb_char, mb_char);
+
+    // Set the locale to "C"
+    setlocale(LC_ALL, "C");
+    wprintf(L"\nLocale set to C: %s\n", setlocale(LC_ALL, NULL));
+    wprintf(L"Wide char in C locale: %lc (code: %d)\n", mb_char, mb_char);
+
+    // Set the locale to "en_US.UTF-8"
+    setlocale(LC_ALL, "en_US.UTF-8");
+    wprintf(L"\nLocale set to en_US.UTF-8: %s\n", setlocale(LC_ALL, NULL));
+    wprintf(L"Wide char in en_US.UTF-8 locale: %lc (code: %d)\n", mb_char, mb_char);
+
+    // Set the locale to "zh_CN.UTF-8"
+    setlocale(LC_ALL, "zh_CN.UTF-8");
+    wprintf(L"\nLocale set to zh_CN.UTF-8: %s\n", setlocale(LC_ALL, NULL));
+    wprintf(L"Wide char in zh_CN.UTF-8 locale: %lc (code: %d)\n", mb_char, mb_char);
+
+    return 0;
+}
+
+```
+
+```c
+//target_gcc
+Default locale: C
+Wide char in default locale: ? (code: 20013)
+
+Locale set to C: C
+Wide char in C locale: ? (code: 20013)
+
+Locale set to en_US.UTF-8: en_US.UTF-8
+Wide char in en_US.UTF-8 locale: ? (code: 20013)
+
+Locale set to zh_CN.UTF-8: zh_CN.UTF-8
+Wide char in zh_CN.UTF-8 locale: ? (code: 20013)
+
+```
+
+```c
+//target_clang
+Default locale: C
+Wide char in default locale: ? (code: 20013)
+
+Locale set to C: C
+Wide char in C locale: ? 
(code: 20013)
+
+Locale set to en_US.UTF-8: en_US.UTF-8
+Wide char in en_US.UTF-8 locale: ? (code: 20013)
+
+Locale set to zh_CN.UTF-8: zh_CN.UTF-8
+Wide char in zh_CN.UTF-8 locale: ? (code: 20013)
+
+```
+
+
+
+Consistent with GCC. The code value is 20013 (U+4E2D) under every locale, so the conversion of the literal does not depend on the run-time locale. In the `C` locale the Chinese character cannot be displayed. In UTF-8 locales (such as `en_US.UTF-8` and `zh_CN.UTF-8`) the character is stored correctly, but whether it is displayed correctly depends on the output side, such as the terminal's font and encoding support; in this run it was still printed as `?`.
+
+### 10. Whether differently-prefixed wide string literal tokens can be concatenated and, if so, the treatment of the resulting multibyte character sequence ([6.4.5](https://port70.net/~nsz/c/c11/n1570.html#6.4.5)).
+
+```c
+#include <wchar.h>
+
+int main() {
+    // The lines below are expected to fail to compile, because string literals with different prefixes cannot be concatenated
+    wchar_t *str1 = L"Hello" L" World"; // same prefix: concatenation is allowed; kept for comparison
+    wchar_t *str2 = L"Hello" u" World"; // L and u prefixes
+    wchar_t *str3 = L"Hello" U" World"; // L and U prefixes
+    wchar_t *str4 = L"Hello" u8" World"; // L and u8 prefixes
+    char16_t *str5 = u"Hello" u8" World"; // u and u8 prefixes
+    char32_t *str6 = U"Hello" L" World"; // U and L prefixes
+    char *str7 = u8"Hello" U" World"; // u8 and U prefixes
+
+    // Print the results so that the compiler does not discard the variables as unused
+    // (although the code above should already fail at compile time)
+    wprintf(L"%ls\n", str1);
+    wprintf(L"%ls\n", str2);
+    wprintf(L"%ls\n", str3);
+    wprintf(L"%ls\n", str4);
+    wprintf(L"%ls\n", str5);
+    wprintf(L"%ls\n", str6);
+    printf("%s\n", str7);
+
+    return 0;
+}
+
+```
+
+```bash
+$ make gcc
+gcc --std=c11 ./test.c -o target_gcc
+./test.c: In function ‘main’:
+./test.c:6:5: error: unsupported non-standard concatenation of string literals
+    6 |     wchar_t *str2 = L"Hello" u" World"; // L 和 u 前缀的拼接
+      |     ^~~~~~~
+./test.c:6:5: error: unsupported non-standard concatenation of string literals
+./test.c:7:5: error: unsupported non-standard concatenation of string literals
+    7 |     wchar_t *str3 = L"Hello" U" World"; // L 和 U 前缀的拼接
+      |     ^~~~~~~
+./test.c:7:5: error: unsupported non-standard concatenation of string literals
+./test.c:8:5: error: unsupported non-standard concatenation of string literals
+    8 |     wchar_t *str4 = L"Hello" u8" World"; // L 和 u8 前缀的拼接
+      |     
^~~~~~~ +./test.c:8:5: error: unsupported non-standard concatenation of string literals +./test.c:9:5: error: unknown type name ‘char16_t’ + 9 | char16_t *str5 = u"Hello" u8" World"; // u 和 u8 前缀的拼接 + | ^~~~~~~~ +./test.c:9:5: error: unsupported non-standard concatenation of string literals +./test.c:9:5: error: unsupported non-standard concatenation of string literals +./test.c:9:22: warning: initialization of ‘int *’ from incompatible pointer type ‘short unsigned int *’ [-Wincompatible-pointer-types] + 9 | char16_t *str5 = u"Hello" u8" World"; // u 和 u8 前缀的拼接 + | ^~~~~~~~ +./test.c:10:5: error: unknown type name ‘char32_t’ + 10 | char32_t *str6 = U"Hello" L" World"; // U 和 L 前缀的拼接 + | ^~~~~~~~ +./test.c:10:5: error: unsupported non-standard concatenation of string literals +./test.c:10:5: error: unsupported non-standard concatenation of string literals +./test.c:11:5: error: unsupported non-standard concatenation of string literals + 11 | char *str7 = u8"Hello" U" World"; // u8 和 U 前缀的拼接 + | ^~~~ +./test.c:11:5: error: unsupported non-standard concatenation of string literals +./test.c:21:5: warning: implicit declaration of function ‘printf’ [-Wimplicit-function-declaration] + 21 | printf("%s\n", str7); + | ^~~~~~ +./test.c:2:1: note: include ‘’ or provide a declaration of ‘printf’ + 1 | #include + +++ |+#include + 2 | +./test.c:21:5: warning: incompatible implicit declaration of built-in function ‘printf’ [-Wbuiltin-declaration-mismatch] + 21 | printf("%s\n", str7); + | ^~~~~~ +./test.c:21:5: note: include ‘’ or provide a declaration of ‘printf’ +make: *** [Makefile:15: gcc] Error 1 +``` + +```c +make clang +clang --std=c11 ./test.c -o target_clang +./test.c:6:30: error: unsupported non-standard concatenation of string literals + wchar_t *str2 = L"Hello" u" World"; // L 和 u 前缀的拼接 + ^ +./test.c:7:30: error: unsupported non-standard concatenation of string literals + wchar_t *str3 = L"Hello" U" World"; // L 和 U 前缀的拼接 + ^ +./test.c:8:30: error: unsupported 
non-standard concatenation of string literals + wchar_t *str4 = L"Hello" u8" World"; // L 和 u8 前缀的拼接 + ^ +./test.c:9:5: error: use of undeclared identifier 'char16_t' + char16_t *str5 = u"Hello" u8" World"; // u 和 u8 前缀的拼接 + ^ +./test.c:9:15: error: use of undeclared identifier 'str5' + char16_t *str5 = u"Hello" u8" World"; // u 和 u8 前缀的拼接 + ^ +./test.c:9:31: error: unsupported non-standard concatenation of string literals + char16_t *str5 = u"Hello" u8" World"; // u 和 u8 前缀的拼接 + ^ +./test.c:10:5: error: use of undeclared identifier 'char32_t' + char32_t *str6 = U"Hello" L" World"; // U 和 L 前缀的拼接 + ^ +./test.c:10:15: error: use of undeclared identifier 'str6' + char32_t *str6 = U"Hello" L" World"; // U 和 L 前缀的拼接 + ^ +./test.c:10:31: error: unsupported non-standard concatenation of string literals + char32_t *str6 = U"Hello" L" World"; // U 和 L 前缀的拼接 + ^ +./test.c:11:28: error: unsupported non-standard concatenation of string literals + char *str7 = u8"Hello" U" World"; // u8 和 U 前缀的拼接 + ^ +./test.c:19:23: error: use of undeclared identifier 'str5' + wprintf(L"%ls\n", str5); + ^ +./test.c:20:23: error: use of undeclared identifier 'str6' + wprintf(L"%ls\n", str6); + ^ +./test.c:21:5: warning: implicitly declaring library function 'printf' with type 'int (const char *, ...)' [-Wimplicit-function-declaration] + printf("%s\n", str7); + ^ +./test.c:21:5: note: include the header or explicitly provide a declaration for 'printf' +1 warning and 12 errors generated. +make: *** [Makefile:20: clang] Error 1 +``` + +都不允许连接。 + +与GCC一致。 + +### 11. The current locale used to convert a wide string literal into corresponding wide character codes ([6.4.5](https://port70.net/~nsz/c/c11/n1570.html#6.4.5)). 
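
A minimal sketch for probing this item: rather than printing the literal through a locale-dependent stream, it dumps the numeric code of each `wchar_t` element. The helper name `dump_wide_codes` and its buffer-based interface are assumptions made for illustration and are not part of the original test. The sample literal is meant to be written with universal character names so that the result does not depend on the source file encoding.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (not from the original test): writes the code of
 * every wchar_t in `ws` to `out` as space-separated uppercase hex and
 * returns the number of elements written. Stops early if `out` is full. */
static size_t dump_wide_codes(const wchar_t *ws, char *out, size_t out_size)
{
    size_t used = 0;
    size_t count = 0;

    if (out_size == 0) {
        return 0;
    }
    out[0] = '\0';
    while (ws[count] != L'\0') {
        int n = snprintf(out + used, out_size - used, "%s%lX",
                         count ? " " : "", (unsigned long)ws[count]);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0';   /* drop the partial entry and stop */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

On a target whose `wchar_t` holds ISO 10646 code points, calling `dump_wide_codes(L"Hi \u4E16\u754C", buf, sizeof buf)` is expected to leave `48 69 20 4E16 754C` in `buf`, whatever `setlocale` call was made beforehand, because the literal is converted at compile time.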
+
+```c
+#include <stdio.h>
+#include <wchar.h>
+#include <locale.h>
+
+int main() {
+    // Print the initial locale
+    wprintf(L"Default locale: %s\n", setlocale(LC_ALL, NULL));
+
+    // Try to handle a wide string containing Chinese characters in the default locale
+    // NOTE: wstr_zh is a pointer, so "%d" below prints (part of) its address, not a character code
+    wchar_t *wstr_zh = L"Hello 世界";
+    wprintf(L"Wide char in default locale: %ls (code: %d)\n", wstr_zh, wstr_zh);
+
+    // Set the locale to "C"
+    setlocale(LC_ALL, "C");
+    wprintf(L"\nLocale set to C: %s\n", setlocale(LC_ALL, NULL));
+    wprintf(L"Wide char in C locale: %ls (code: %d)\n", wstr_zh, wstr_zh);
+
+    // Set the locale to "en_US.UTF-8"
+    setlocale(LC_ALL, "en_US.UTF-8");
+    wprintf(L"\nLocale set to en_US.UTF-8: %s\n", setlocale(LC_ALL, NULL));
+    wprintf(L"Wide char in en_US.UTF-8 locale: %ls (code: %d)\n", wstr_zh, wstr_zh);
+
+    // Set the locale to "zh_CN.UTF-8"
+    setlocale(LC_ALL, "zh_CN.UTF-8");
+    wprintf(L"\nLocale set to zh_CN.UTF-8: %s\n", setlocale(LC_ALL, NULL));
+    wprintf(L"Wide char in zh_CN.UTF-8 locale: %ls (code: %d)\n", wstr_zh, wstr_zh);
+
+    return 0;
+}
+
+```
+
+```c
+//target_gcc
+Default locale: C
+Wide char in default locale: Hello ?? (code: -18620328)
+
+Locale set to C: C
+Wide char in C locale: Hello ?? (code: -18620328)
+
+Locale set to en_US.UTF-8: en_US.UTF-8
+Wide char in en_US.UTF-8 locale: Hello ?? (code: -18620328)
+
+Locale set to zh_CN.UTF-8: zh_CN.UTF-8
+Wide char in zh_CN.UTF-8 locale: Hello ?? (code: -18620328)
+
+```
+
+```c
+//target_clang
+Default locale: C
+Wide char in default locale: Hello ?? (code: -1209982892)
+
+Locale set to C: C
+Wide char in C locale: Hello ?? (code: -1209982892)
+
+Locale set to en_US.UTF-8: en_US.UTF-8
+Wide char in en_US.UTF-8 locale: Hello ?? (code: -1209982892)
+
+Locale set to zh_CN.UTF-8: zh_CN.UTF-8
+Wide char in zh_CN.UTF-8 locale: Hello ?? (code: -1209982892)
+
+```
+
+Essentially the same as GCC:
+
+GCC and LLVM handle this in the same way. Changing the locale does not affect the conversion of the wide string literal, and the characters are still not displayed correctly, which indicates that neither compiler relies on the run-time locale to decide the internal representation of wide string literals. In this test environment the displayed text and the printed number stay the same even after the locale has been changed successfully. The `code` values differ between the two binaries (-18620328 versus -1209982892) only because the test passes the pointer `wstr_zh` to `%d`: the numbers are truncated addresses of the literal, not character codes, so they do not show any difference in how the two compilers encode wide string literals.
+
+### 12. 
The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set ([6.4.5](https://port70.net/~nsz/c/c11/n1570.html#6.4.5)).
+
+```c
+#include <stdio.h>
+#include <string.h>
+
+int main() {
+    // String literals containing multibyte characters (Chinese and others)
+    char *str1 = "Hello 世界"; // UTF-8 characters
+    char *str2 = "\xFFHello"; // escape sequence; \xFF may behave differently under different character sets
+    char *str3 = "\u4E16\u754C"; // universal character names (世界)
+
+    // Print each string and its length
+    printf("str1 (multibyte characters): %s\n", str1);
+    printf("Length of str1: %zu\n", strlen(str1));
+
+    printf("str2 (escape sequence \\xFF): %s\n", str2);
+    printf("Length of str2: %zu\n", strlen(str2));
+
+    // Print the string built from universal character names
+    printf("str3 (Unicode escape \\u4E16\\u754C): %s\n", str3);
+    printf("Length of str3: %zu\n", strlen(str3));
+
+    // Print the value of every byte in the strings (ASCII or multibyte representation)
+    printf("Character values in str1:\n");
+    for (size_t i = 0; i < strlen(str1); i++) {
+        printf("0x%02X ", (unsigned char)str1[i]);
+    }
+    printf("\n");
+
+    printf("Character values in str2:\n");
+    for (size_t i = 0; i < strlen(str2); i++) {
+        printf("0x%02X ", (unsigned char)str2[i]);
+    }
+    printf("\n");
+
+    return 0;
+}
+
+```
+
+Both GCC and Clang handle UTF-8 multibyte characters correctly, whether they are written directly in the string or given as universal character names, which indicates that their character handling is consistent in this respect.
+
+Both compilers store `\xFF` as a single byte that is displayed as `�`. This is as expected, because a lone 0xFF byte is not a valid character in the current execution character set (UTF-8).
+
+Partially consistent with GCC.
+
+### 13. The encoding of any of wchar_t, char16_t, and char32_t where the corresponding standard encoding macro (`__STDC_ISO_10646__`, `__STDC_UTF_16__`, or `__STDC_UTF_32__`) is not defined ([6.10.8.2](https://port70.net/~nsz/c/c11/n1570.html#6.10.8.2)).
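
A small sketch of how the three macros can be queried as the implementation actually predefines them, without `#undef`. The names `encoding_macros`, `literals_match_macros`, and the `ENC_*` flags are hypothetical and introduced only for illustration. `<wchar.h>` is included first because, with glibc, `__STDC_ISO_10646__` is supplied by the C library headers rather than by the compiler itself.

```c
#include <wchar.h>
#include <uchar.h>

/* Hypothetical flags: one bit per standard encoding macro. */
enum {
    ENC_ISO_10646 = 1 << 0,
    ENC_UTF_16    = 1 << 1,
    ENC_UTF_32    = 1 << 2
};

/* Reports which of the three macros are defined in this translation unit. */
static int encoding_macros(void)
{
    int flags = 0;
#ifdef __STDC_ISO_10646__
    flags |= ENC_ISO_10646;
#endif
#ifdef __STDC_UTF_16__
    flags |= ENC_UTF_16;
#endif
#ifdef __STDC_UTF_32__
    flags |= ENC_UTF_32;
#endif
    return flags;
}

/* Returns 1 if, for every macro that is defined, the matching literal
 * really carries the code point U+738B; returns 0 otherwise. */
static int literals_match_macros(void)
{
    int flags = encoding_macros();

    if ((flags & ENC_ISO_10646) && L'\u738B' != 0x738B) return 0;
    if ((flags & ENC_UTF_16) && u'\u738B' != 0x738B) return 0;
    if ((flags & ENC_UTF_32) && U'\u738B' != 0x738B) return 0;
    return 1;
}
```

If a macro is absent from the result of `encoding_macros()`, the encoding of the corresponding type is the implementation-defined case this item asks about, and the value check for that type is skipped.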
+
+```c
+#include <stdio.h>
+#include <wchar.h>
+#include <uchar.h>
+
+// NOTE: #undef only hides the macros from the checks below;
+// it does not change the encoding the compiler actually uses
+#undef __STDC_ISO_10646__
+#undef __STDC_UTF_16__
+#undef __STDC_UTF_32__
+
+
+int main() {
+    printf("Checking encoding macros:\n");
+
+    #ifdef __STDC_ISO_10646__
+    printf("__STDC_ISO_10646__ is defined: wchar_t uses ISO 10646 encoding\n");
+    #else
+    printf("__STDC_ISO_10646__ is not defined: wchar_t encoding is not ISO 10646\n");
+    #endif
+
+    #ifdef __STDC_UTF_16__
+    printf("__STDC_UTF_16__ is defined: char16_t uses UTF-16 encoding\n");
+    #else
+    printf("__STDC_UTF_16__ is not defined: char16_t encoding is not UTF-16\n");
+    #endif
+
+    #ifdef __STDC_UTF_32__
+    printf("__STDC_UTF_32__ is defined: char32_t uses UTF-32 encoding\n");
+    #else
+    printf("__STDC_UTF_32__ is not defined: char32_t encoding is not UTF-32\n");
+    #endif
+
+    wchar_t wc = L'王';
+    wprintf(L"wchar_t: %lc\n", wc);
+
+    char16_t c16 = u'王';
+    wprintf(L"char16_t: %u\n", (unsigned int)c16);
+
+    char32_t c32 = U'王';
+    wprintf(L"char32_t: %u\n", (unsigned int)c32);
+
+    return 0;
+}
+
+```
+
+```c
+//target_gcc
+Checking encoding macros:
+__STDC_ISO_10646__ is not defined: wchar_t encoding is not ISO 10646
+__STDC_UTF_16__ is not defined: char16_t encoding is not UTF-16
+__STDC_UTF_32__ is not defined: char32_t encoding is not UTF-32
+wchar_t: ?
+char16_t: 29579
+char32_t: 29579
+```
+
+```c
+//target_clang
+Checking encoding macros:
+__STDC_ISO_10646__ is not defined: wchar_t encoding is not ISO 10646
+__STDC_UTF_16__ is not defined: char16_t encoding is not UTF-16
+__STDC_UTF_32__ is not defined: char32_t encoding is not UTF-32
+wchar_t: ?
+char16_t: 29579
+char32_t: 29579
+```
+
+In this experiment GCC and LLVM behave identically. Even with the macros undefined, `u'王'` and `U'王'` both have the value 29579 (0x738B, that is, U+738B), which is the UTF-16 and UTF-32 code for that character.
+
+## 5. Integers
+
+### 1. Any extended integer types that exist in the implementation ([6.2.5](https://port70.net/~nsz/c/c11/n1570.html#6.2.5)).
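
One way to see why `__int128` is treated as a compiler extension rather than an "extended integer type" is to compare it with `intmax_t`: C11 requires `intmax_t` to be able to represent any value of any signed integer type, standard or extended. The sketch below assumes the `__SIZEOF_INT128__` macro, which GCC and Clang predefine on targets that provide `__int128`; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of __int128 if the compiler provides it, otherwise 0. */
static size_t int128_size(void)
{
#ifdef __SIZEOF_INT128__
    return sizeof(__int128);
#else
    return 0;
#endif
}

/* Returns 1 when __int128 exists and is wider than intmax_t. In that case
 * it cannot be an extended integer type in the standard's sense, because
 * intmax_t would then have to be at least as wide. */
static int int128_exceeds_intmax(void)
{
    return int128_size() > sizeof(intmax_t);
}
```

On a typical 64-bit Linux target `intmax_t` is 8 bytes wide while `__int128` is 16 bytes wide, so `int128_exceeds_intmax()` would return 1 there.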
+
+- Which extended integer types, if any, the implementation provides.
+- GCC offers nonstandard types such as `__int128`, but these are GCC language extensions. According to its documentation, GCC does not support any "extended integer types" in the standard's sense (as defined in section 6.2.5 of C99 and C11).
+
+> **GCC Extension for 128-bit Integers** GCC provides a set of nonstandard integer types as an extension, for example `__int128`. This type is not defined by the C or C++ standards but is supported on some platforms, offering extended precision for integer arithmetic. It is only supported on platforms that provide a `__int128` hardware or software support.
+
+The Clang documentation does not describe any extended integer types either, so LLVM is taken to be consistent with GCC: neither provides any.
+
+### 2. Whether signed integer types are represented using sign and magnitude, two's complement, or ones' complement, and whether the extraordinary value is a trap representation or an ordinary value ([6.2.6.2](https://port70.net/~nsz/c/c11/n1570.html#6.2.6.2)).
+
+- Which of the three representations signed integer types use, and whether the extraordinary value is a trap representation or an ordinary value.
+
+- Integer types use two's complement representation.
+
+ > Because `arith` integers use a two’s complement representation, this operation is applicable on both signed and unsigned integer operands.
+ >
+ > ['arith' Dialect - MLIR (llvm.org)](https://mlir.llvm.org/docs/Dialects/ArithOps/)
+
+LLVM does not use trap representations for integers.
+Consistent with GCC.
+
+### 3. The rank of any extended integer type relative to another extended integer type with the same precision ([6.3.1.1](https://port70.net/~nsz/c/c11/n1570.html#6.3.1.1)).
+
+- The conversion rank of one extended integer type relative to another extended integer type of the same precision.
+
+Clang has no extended integer types at all, so this item does not apply.
+
+Consistent with GCC.
+
+### 4. The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type ([6.3.1.3](https://port70.net/~nsz/c/c11/n1570.html#6.3.1.3)).
+ +- 当一个整数值不能用有符号整数类型表示时,将其转换为该类型的结果,或引发的信号。 + +```c +#include +#include +#include + +int main() { + __int8_t smallInt = 300; + printf("smallInt: %d\n", smallInt); + return 0; +} +``` + +```c +//target_gcc +smallInt: 44 +``` + +```c +//target_clang +smallInt: 44 +``` + +```bash +$ clang -S -emit-llvm test.c -o test_clang.ll +test.c:6:25: warning: implicit conversion from 'int' to '__int8_t' (aka 'signed char') changes value from 300 to 44 [-Wconstant-conversion] + __int8_t smallInt = 300; + ~~~~~~~~ ^~~ +1 warning generated. +``` + +```c +; ModuleID = 'test.c' +source_filename = "test.c" +target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" +target triple = "x86_64-pc-linux-gnu" + +@.str = private unnamed_addr constant [14 x i8] c"smallInt: %d\0A\00", align 1 + +; Function Attrs: noinline nounwind optnone uwtable +define dso_local i32 @main() #0 { + %1 = alloca i32, align 4 + %2 = alloca i8, align 1 + store i32 0, i32* %1, align 4 + store i8 44, i8* %2, align 1 + %3 = load i8, i8* %2, align 1 + %4 = sext i8 %3 to i32 + %5 = call i32 (i8*, ...) @printf(i8* noundef getelementptr inbounds ([14 x i8], [14 x i8]* @.str, i64 0, i64 0), i32 noundef %4) + ret i32 0 +} + +declare i32 @printf(i8* noundef, ...) 
#1 + +attributes #0 = { noinline nounwind optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" } +attributes #1 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" } + +!llvm.module.flags = !{!0, !1, !2, !3, !4} +!llvm.ident = !{!5} + +!0 = !{i32 1, !"wchar_size", i32 4} +!1 = !{i32 7, !"PIC Level", i32 2} +!2 = !{i32 7, !"PIE Level", i32 2} +!3 = !{i32 7, !"uwtable", i32 1} +!4 = !{i32 7, !"frame-pointer", i32 2} +!5 = !{!"Ubuntu clang version 14.0.0-1ubuntu1.1"} +``` + +GCC:For conversion to a type of width *N*, the value is reduced modulo *2^N* to be within range of the type; no signal is raised. + +在LLVM前端的中间代码生成阶段,已经自动对300取模得到44了 + +注意:虽然[document implementation-defined behavior · Issue #11644 · llvm/llvm-project (github.com)](https://github.com/llvm/llvm-project/issues/11644)中提及,此处GCC和LLVM行为不同,但经过实验,发现相同,在本实现环境下并没有该issue中提到的类似于`%conv = trunc i32 %x to i8`的IR代码。 + +与GCC一致。 + +### 5. The results of some bitwise operations on signed integers ([6.5](https://port70.net/~nsz/c/c11/n1570.html#6.5)). + +- 对有符号整数进行某些按位操作的结果。 + +- Bitwise operators act on the representation of the value including both the sign and value bits, where the sign bit is considered immediately above the highest-value value bit. Signed ‘>>’ acts on negative numbers by sign extension. + + As an extension to the C language, GCC does not use the latitude given in C99 and C11 only to treat certain aspects of signed ‘<<’ as undefined. However, -fsanitize=shift (and -fsanitize=undefined) will diagnose such cases. They are also diagnosed where constant expressions are required. 
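+
+```c
+#include <stdio.h>
+
+int main() {
+    int x = 1;
+    x = x << 32;
+    return 0;
+}
+
+```
+
+In the `Makefile`, change `CFLAGS` to `CFLAGS = --std=c11 -fsanitize=shift`.
+
+```bash
+$ make gcc
+gcc --std=c11 -fsanitize=shift ./test.c -o target_gcc
+./test.c: In function ‘main’:
+./test.c:5:11: warning: left shift count >= width of type [-Wshift-count-overflow]
+    5 |     x = x << 32;
+      |           ^~
+echo //target_gcc > output_gcc
+./target_gcc>>output_gcc
+test.c:5:11: runtime error: shift exponent 32 is too large for 32-bit type 'int'
+```
+
+```bash
+$ make clang
+clang --std=c11 -fsanitize=shift ./test.c -o target_clang
+./test.c:5:11: warning: shift count >= width of type [-Wshift-count-overflow]
+    x = x << 32;
+          ^  ~~
+1 warning generated.
+echo //target_clang > output_clang
+./target_clang>>output_clang
+test.c:5:11: runtime error: shift exponent 32 is too large for 32-bit type 'int'
+SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior test.c:5:11 in
+```
+
+As with GCC, only left shifts are problematic in LLVM (the other problematic left-shift cases are not all reproduced here).
+
+With `-fsanitize=shift` the problem is detected and reported at run time; without it the shift is carried out with no run-time report.
+
+If the shift overflows the type, the high-order bits are simply lost.
+
+If the shift count is too large (for example a 32-bit `int` shifted left by 32), the value was observed to stay unchanged in this environment. C11 6.5.7 makes this undefined behavior, so the result should not be relied on.
+
+If a negative number is shifted left, the shift is applied to its two's complement bit pattern and the resulting pattern is read back as a two's complement value.
+
+Consistent with GCC.
+
+### 6. The sign of the remainder on integer division (C90 6.3.5).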
+ +- 整数除法余数的符号。 + +- 在C11中已不是实现定义行为。 + +```c +#include +#include +#include + +int main(int argc, char *argv[]){ + int a1 = 5, b1 = 2; + int a2 = -5, b2 = 2; + int a3 = 5, b3 = -2; + int a4 = -5, b4 = -2; + + // 进行除法运算并输出结果 + printf("a1 / b1 = %d, a1 %% b1 = %d\n", a1 / b1, a1 % b1); + printf("a2 / b2 = %d, a2 %% b2 = %d\n", a2 / b2, a2 % b2); + printf("a3 / b3 = %d, a3 %% b3 = %d\n", a3 / b3, a3 % b3); + printf("a4 / b4 = %d, a4 %% b4 = %d\n", a4 / b4, a4 % b4); + + return 0; +} + +``` + +```c +//target_gcc +a1 / b1 = 2, a1 % b1 = 1 +a2 / b2 = -2, a2 % b2 = -1 +a3 / b3 = -2, a3 % b3 = 1 +a4 / b4 = 2, a4 % b4 = -1 +``` + +```c +//target_clang +a1 / b1 = 2, a1 % b1 = 1 +a2 / b2 = -2, a2 % b2 = -1 +a3 / b3 = -2, a3 % b3 = 1 +a4 / b4 = 2, a4 % b4 = -1 +``` + +GCC always follows the C99 and C11 requirement that the result of division is truncated towards zero. + +LLVM与GCC一致,都是向0截断,除法的结果总是趋向于零,余数的符号与被除数相同。 + +## 6. Floating Point + +### 1. The accuracy of the floating-point operations and of the library functions in [](https://port70.net/~nsz/c/c11/n1570.html#7.12) and [](https://port70.net/~nsz/c/c11/n1570.html#7.3) that return floating-point results ([5.2.4.2.2](https://port70.net/~nsz/c/c11/n1570.html#5.2.4.2.2)). 
+ +- 浮点运算和返回浮点结果的 `` 和 `` 库函数的精度。 + +```c +#include +#include + +int main(int argc, char *argv[]){ + float f = 0.1f; + double d = 0.1; + long double ld = 0.1L; + + double x = 1.0; + double sin_x = sin(x); + double cos_x = cos(x); + double sqrt_x = sqrt(x); + + printf("f = %f\n", f); + printf("d = %f\n", d); + printf("ld = %Lf\n", ld); + printf("sin(%f) = %f\n", x, sin_x); + printf("cos(%f) = %f\n", x, cos_x); + printf("sqrt(%f) = %f\n", x, sqrt_x); +} +``` + +`Makefile`中,`FOLLOW_OPTION`改为`FOLLOW_OPTION = -lm`。 + +```c +//target_gcc +f = 0.100000 +d = 0.100000 +ld = 0.100000 +sin(1.000000) = 0.841471 +cos(1.000000) = 0.540302 +sqrt(1.000000) = 1.000000 +``` + +```c +//target_clang +f = 0.100000 +d = 0.100000 +ld = 0.100000 +sin(1.000000) = 0.841471 +cos(1.000000) = 0.540302 +sqrt(1.000000) = 1.000000 +``` + +其余实验省略。 + +与GCC一致。 + +### 2. The rounding behaviors characterized by non-standard values of FLT_ROUNDS ([5.2.4.2.2](https://port70.net/~nsz/c/c11/n1570.html#5.2.4.2.2)). + +- 由 FLT_ROUNDS 非标准值定义的舍入行为。 + +`FLT_ROUNDS` 是一个由C标准库定义和管理的宏,它的实现和使用是与标准库相关的,而不是与LLVM编译器直接相关。 + +GCC does not use such values. + +推测LLVM与GCC一致。 + +### 3. The evaluation methods characterized by non-standard negative values of FLT_EVAL_METHOD ([5.2.4.2.2](https://port70.net/~nsz/c/c11/n1570.html#5.2.4.2.2)). + +- 由 FLT_EVAL_METHOD 非标准负值定义的求值方法。 + +> The `__FLT_EVAL_METHOD__` is not defined as a traditional macro, and so it will not appear when dumping preprocessor macros. Instead, the value `__FLT_EVAL_METHOD__` expands to is determined at the point of expansion either from the value set by the `-ffp-eval-method` command line option or from the target. This is because the `__FLT_EVAL_METHOD__` macro cannot expand to the correct evaluation method in the presence of a `#pragma` which alters the evaluation method. An error is issued if `__FLT_EVAL_METHOD__` is expanded inside a scope modified by `#pragma clang fp eval_method`. 
+ +暂时检索不到文档能证明LLVM/Clang 并不使用 `FLT_EVAL_METHOD` 的非标准负值。 + +推测LLVM与GCC一致。 + +### 4. The direction of rounding when an integer is converted to a floating-point number that cannot exactly represent the original value ([6.3.1.4](https://port70.net/~nsz/c/c11/n1570.html#6.3.1.4)). + +- 当整数被转换为浮点数时,如果浮点数无法精确表示原始值时的舍入方向。 + +```c +#include +#include +#include + +int main(){ + int large_int = 16777217; // 2^24 + 1,超出 float 的精度范围 + float f = (float)large_int; + printf("Integer: %d, as float: %.0f\n", large_int, f); + return 0; +} +``` + +```c +//target_gcc +Integer: 16777217, as float: 16777216 +``` + +```c +//target_clang +Integer: 16777217, as float: 16777216 +``` + +GCC支持C99 Annex F(至少意图上是想支持的),但是似乎LLVM并非如此,参见[clang sanitizer regards IEC 60559 floating-point division by zero as undefined · Issue #17374 · llvm/llvm-project (github.com)](https://github.com/llvm/llvm-project/issues/17374) + +但在这个问题上还是与GCC一致,也就是`round to nearest, ties to even`。 + +### 5. The direction of rounding when a floating-point number is converted to a narrower floating-point number ([6.3.1.5](https://port70.net/~nsz/c/c11/n1570.html#6.3.1.5)). 
+ +- 当浮点数转换为较窄的浮点数时的舍入方向。 + +```c +#include +#include +#include + +int main() { + // 定义一些双精度浮点数 + double num1 = 1.1234567890123456789; + double num2 = 1.9876543210987654321; + double num3 = 1.9999999999999999999; + double num4 = 1.9999999999999999445; // 非常接近但略小于2的数 + double num5 = 2.0; + + // 将双精度浮点数转换为单精度浮点数,并输出结果 + float f1 = (float)num1; + float f2 = (float)num2; + float f3 = (float)num3; + float f4 = (float)num4; + float f5 = (float)num5; + + printf("num1: %.20f -> %.20f\n", num1, f1); + printf("num2: %.20f -> %.20f\n", num2, f2); + printf("num3: %.20f -> %.20f\n", num3, f3); + printf("num4: %.20f -> %.20f\n", num4, f4); + printf("num5: %.20f -> %.20f\n", num5, f5); + + return 0; +} +``` + +```c +//target_gcc +num1: 1.12345678901234569125 -> 1.12345683574676513672 +num2: 1.98765432109876538647 -> 1.98765432834625244141 +num3: 2.00000000000000000000 -> 2.00000000000000000000 +num4: 2.00000000000000000000 -> 2.00000000000000000000 +num5: 2.00000000000000000000 -> 2.00000000000000000000 +``` + +```c +//target_clang +num1: 1.12345678901234569125 -> 1.12345683574676513672 +num2: 1.98765432109876538647 -> 1.98765432834625244141 +num3: 2.00000000000000000000 -> 2.00000000000000000000 +num4: 2.00000000000000000000 -> 2.00000000000000000000 +num5: 2.00000000000000000000 -> 2.00000000000000000000 +``` + +同样是`round to nearest, ties to even`。 + +与GCC一致。 + +### 6. How the nearest representable value or the larger or smaller representable value immediately adjacent to the nearest representable value is chosen for certain floating constants ([6.4.4.2](https://port70.net/~nsz/c/c11/n1570.html#6.4.4.2)). + +- 如何为某些浮点常量选择最近可表示值,或者选择紧邻最近可表示值的较大或较小的可表示值。 + +同样是`round to nearest, ties to even`。 + +与GCC一致。 + +### 7. Whether and how floating expressions are contracted when not disallowed by the FP_CONTRACT pragma ([6.5](https://port70.net/~nsz/c/c11/n1570.html#6.5)). 
+ +- 当 FP_CONTRACT 编译指示未禁止时,是否以及如何收缩浮点表达式。 + +默认允许 + +[Clang Compiler User’s Manual — Clang 11 documentation (llvm.org)](https://releases.llvm.org/11.0.0/tools/clang/docs/UsersManual.html) + +> `-ffp-contract=` +> +> Specify when the compiler is permitted to form fused floating-point operations, such as fused multiply-add (FMA). Fused operations are permitted to produce more precise results than performing the same operations separately. +> +> The C standard permits intermediate floating-point results within an expression to be computed with more precision than their type would normally allow. This permits operation fusing, and Clang takes advantage of this by default. This behavior can be controlled with the `FP_CONTRACT` pragma. Please refer to the pragma documentation for a description of how the pragma interacts with this option. +> +> Valid values are: +> +> - `fast` (everywhere) +> - `on` (according to FP_CONTRACT pragma, default) +> - `off` (never fuse) + +与GCC不同,GCC没有实现`FP_CONTRACT` *pragma*。 + +### 8. The default state for the FENV_ACCESS pragma ([7.6.1](https://port70.net/~nsz/c/c11/n1570.html#7.6.1)). + +- FENV_ACCESS 编译指示的默认状态。 + +对于LLVM而言,`#pragma STDC FENV_ACCESS`是支持的[⚙ D69272 Enable '#pragma STDC FENV_ACCESS' in frontend (llvm.org)](https://reviews.llvm.org/D69272?id=227690)。 + +默认为off,如果启用了`-frounding-math`选项,LLVM会假设代码可能修改浮点环境中的舍入模式,这相当于隐式地开启了`FENV_ACCESS`。[Clang Compiler User’s Manual — Clang 11 documentation (llvm.org)](https://releases.llvm.org/11.0.0/tools/clang/docs/UsersManual.html) + +> The option -frounding-math forces the compiler to honor the dynamically-set rounding mode. This prevents optimizations which might affect results if the rounding mode changes or is different from the default; for example, it prevents floating-point operations from being reordered across most calls and prevents constant-folding when the result is not exactly representable. + +与GCC不一致。 + +### 9. 
Additional floating-point exceptions, rounding modes, environments, and classifications, and their macro names ([7.6](https://port70.net/~nsz/c/c11/n1570.html#7.6), [7.12](https://port70.net/~nsz/c/c11/n1570.html#7.12)). + +- 额外的浮点异常、舍入模式、环境和分类及其宏名称。 +- This is dependent on the implementation of the C library. + +### 10.The default state for the FP_CONTRACT pragma ([7.12.2](https://port70.net/~nsz/c/c11/n1570.html#7.12.2)). + +- FP_CONTRACT 编译指示的默认状态。 + +FP_CONTRACT编译指示符的默认状态是`on` + +> Specify when the compiler is permitted to form fused floating-point operations, such as fused multiply-add (FMA). Fused operations are permitted to produce more precise results than performing the same operations separately. +> +> The C standard permits intermediate floating-point results within an expression to be computed with more precision than their type would normally allow. This permits operation fusing, and Clang takes advantage of this by default. This behavior can be controlled with the `FP_CONTRACT` pragma. Please refer to the pragma documentation for a description of how the pragma interacts with this option. +> +> Valid values are: +> +> - `fast` (everywhere) +> - `on` (according to FP_CONTRACT pragma, default) +> - `off` (never fuse) + +与GCC不同,GCC没有实现`FP_CONTRACT` pragma。 + +### 11. Whether the “inexact” floating-point exception can be raised when the rounded result actually does equal the mathematical result in an IEC 60559 conformant implementation (C99 F.9). + +- 当舍入结果实际上等于符合 IEC 60559 的实现中的数学结果时,是否会引发“不精确”浮点异常。 +- 在C11中已不是实现定义行为。 +- This is dependent on the implementation of the C library. + +### 12. Whether the “underflow” (and “inexact”) floating-point exception can be raised when a result is tiny but not inexact in an IEC 60559 conformant implementation (C99 F.9). + +- 当结果很小但不精确时,在符合 IEC 60559 的实现中是否可以引发“下溢”(和“不精确”)浮点异常。 +- 在C11中已不是实现定义行为。 +- This is dependent on the implementation of the C library. + +## 7. Arrays and Pointers + +### 1. 
The result of converting a pointer to an integer or vice versa ([6.3.2.3](https://port70.net/~nsz/c/c11/n1570.html#6.3.2.3)). + +- 将指针转换为整数或将整数转换为指针的结果。 + +> The ‘`inttoptr`’ instruction converts `value` to type `ty2` by applying either a zero extension or a truncation depending on the size of the integer `value`. If `value` is larger than the size of a pointer then a truncation is done. If `value` is smaller than the size of a pointer then a zero extension is done. If they are the same size, nothing is done (*no-op cast*). + +> The ‘`ptrtoint`’ instruction converts `value` to integer type `ty2` by interpreting the pointer value as an integer and either truncating or zero extending that value to the size of the integer type. If `value` is smaller than `ty2` then a zero extension is done. If `value` is larger than `ty2` then a truncation is done. If they are the same size, then nothing is done (*no-op cast*) other than a type change. + +与GCC基本一致。 + +### 2. The size of the result of subtracting two pointers to elements of the same array ([6.5.6](https://port70.net/~nsz/c/c11/n1570.html#6.5.6)). + +- 对同一数组中元素的两个指针进行减法运算的结果大小。 +- The value is as specified in the standard and the type is determined by the ABI. + +```c +#include +#include + +int main() { + int arr[10]; // 定义一个包含 10 个元素的数组 + int *ptr1 = &arr[2]; // 指向数组中第 3 个元素(索引为 2) + int *ptr2 = &arr[7]; // 指向数组中第 8 个元素(索引为 7) + + // 计算两个指针之间的差值 + ptrdiff_t diff = ptr2 - ptr1; + + // 打印指针差值和 ptrdiff_t 类型的大小 + printf("Difference between ptr2 and ptr1: %td\n", diff); + printf("Size of ptrdiff_t: %zu bytes\n", sizeof(ptrdiff_t)); + + return 0; +} +``` + +```c +//target_gcc +Difference between ptr2 and ptr1: 5 +Size of ptrdiff_t: 8 bytes +``` + +```c +//target_clang +Difference between ptr2 and ptr1: 5 +Size of ptrdiff_t: 8 bytes +``` + +在本实验环境下,结果表示这两个指针之间相隔的元素数量。 + +## 8. Hints + +### 1. 
The extent to which suggestions made by using the register storage-class specifier are effective ([6.7.1](https://port70.net/~nsz/c/c11/n1570.html#6.7.1)).
+
+- How far the compiler honors the `register` storage-class specifier.
+
+[-Wdeprecated-register](https://releases.llvm.org/4.0.0/tools/clang/docs/DiagnosticsReference.html#id143)
+
+In modern compilers `register` has very little effect. Apart from a few corner cases and specific platforms, the compiler's own optimization is usually more effective.
+
+[Hints implementation (Using the GNU Compiler Collection (GCC))](https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Hints-implementation.html)
+
+Overall, LLVM is similar to GCC: it hardly relies on the `register` specifier for optimization any more.
+
+### 2. The extent to which suggestions made by using the inline function specifier are effective ([6.7.4](https://port70.net/~nsz/c/c11/n1570.html#6.7.4)).
+
+- How far the compiler honors the `inline` function specifier.
+
+LLVM does not inline functions at `-O0` ([⚙ D28053 Cleanup the handling of noinline function attributes, -fno-inline, -fno-inline-functions, -O0, and optnone. (llvm.org)](https://reviews.llvm.org/D28053)).
+
+Functions carrying the `noinline` attribute are not inlined (in effect, `clang -O0` now applies the noinline attribute everywhere).
+
+Other conditions can also prevent inlining.
+
+References:
+
+[Clang command line argument reference — Clang 20.0.0git documentation (llvm.org)](https://clang.llvm.org/docs/ClangCommandLineReference.html)
+
+[C++'s "inline" - how strong a hint is it for GCC and Clang/LLVM? - Stack Overflow](https://stackoverflow.com/questions/5223690/cs-inline-how-strong-a-hint-is-it-for-gcc-and-clang-llvm)
+
+[3.2. Inlining — Clang 20.0.0git documentation (llvm.org)](https://clang.llvm.org/docs/analyzer/developer-docs/IPA.html)
+
+Largely consistent with GCC.
+
+## 9. Structures, Unions, Enumerations, and Bit-Fields
+
+### 1. A member of a union object is accessed using a member of a different type (C90 6.3.2.3).
+
+- Accessing a member of a union object through a member of a different type.
+- No longer implementation-defined in C11.
+
+The relevant bytes of the representation of the object are treated as an object of the type used for the access. See Type-punning.
+
+[Unions — Mapping High Level Constructs to LLVM IR documentation (mapping-high-level-constructs-to-llvm-ir.readthedocs.io)](https://mapping-high-level-constructs-to-llvm-ir.readthedocs.io/en/latest/basic-constructs/unions.html)
+
+[clang-tidy - cppcoreguidelines-pro-type-union-access — Extra Clang Tools 20.0.0git documentation (llvm.org)](https://clang.llvm.org/extra/clang-tidy/checks/cppcoreguidelines/pro-type-union-access.html)
+
+Consistent with GCC.
+
+### 2. Whether a ''plain'' int bit-field is treated as a signed int bit-field or as an unsigned int bit-field ([6.7.2](https://port70.net/~nsz/c/c11/n1570.html#6.7.2), [6.7.2.1](https://port70.net/~nsz/c/c11/n1570.html#6.7.2.1)).
+
+- Whether a "plain" int bit-field is treated as signed or as unsigned.
+
+```c
+#include <stdio.h>
+
+struct Test {
+    signed int signed_bitfield : 4;     // explicitly signed bit-field
+    unsigned int unsigned_bitfield : 4; // explicitly unsigned bit-field
+    int plain_bitfield : 4;             // "plain int": signedness is left to the implementation
+};
+
+int main() {
+    struct Test t;
+
+    // Assign a negative value and see whether it is kept (signed) or wraps around (unsigned)
+    t.signed_bitfield = -5;
+    t.unsigned_bitfield = -5;
+    t.plain_bitfield = -5;
+
+    // Print the results
+    printf("signed_bitfield (int): %d\n", t.signed_bitfield); // expected output: -5
+    printf("unsigned_bitfield (unsigned int): %u\n", t.unsigned_bitfield); // expected output: 11 (-5 modulo 16)
+    printf("plain_bitfield (plain int): %d\n", t.plain_bitfield); // shows how GCC and LLVM behave
+
+    return 0;
+}
+```
+
+```c
+//target_gcc
+signed_bitfield (int): -5
+unsigned_bitfield (unsigned int): 11
+plain_bitfield (plain int): -5
+```
+
+```c
+//target_clang
+signed_bitfield (int): -5
+unsigned_bitfield (unsigned int): 11
+plain_bitfield (plain int): -5
+```
+
+A plain `int` bit-field is treated as signed.
+
+[⚙ D131255 Fix Wbitfield-constant-conversion on 1-bit signed bitfield (llvm.org)](https://reviews.llvm.org/D131255)
+
+Consistent with GCC.
+
+### 3. Allowable bit-field types other than _Bool, signed int, and unsigned int ([6.7.2.1](https://port70.net/~nsz/c/c11/n1570.html#6.7.2.1)).
+
+- Which bit-field types are allowed besides `_Bool`, `signed int`, and `unsigned int`.
+
+Other integer types, such as `long int`, and enumerated types are permitted even in strictly conforming mode.
+
+See `llvm/include/llvm/ADT/Bitfields.h` in [llvm/llvm-project: The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. (github.com)](https://github.com/llvm/llvm-project).
+
+Consistent with GCC.
+
+### 4. Whether atomic types are permitted for bit-fields ([6.7.2.1](https://port70.net/~nsz/c/c11/n1570.html#6.7.2.1)).
+
+- Whether atomic types may be used for bit-fields.
+
+```c
+#include <stdio.h>
+#include <stdatomic.h>
+
+_Atomic int flag : 1;
+
+int main() {
+
+    return 0;
+}
+```
+
+```bash
+$ make gcc
+gcc --std=c11 ./test.c -o target_gcc
+./test.c:4:18: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘:’ token
+    4 | _Atomic int flag : 1;
+      |                  ^
+make: *** [Makefile:16: gcc] Error 1
+```
+
+```bash
+$ make clang
+clang --std=c11 ./test.c -o target_clang
+./test.c:4:18: error: expected ';' after top level declarator
+_Atomic int flag : 1;
+                 ^
+                 ;
+./test.c:4:20: error: expected identifier or '('
+_Atomic int flag : 1;
+                   ^
+2 errors generated.
+make: *** [Makefile:21: clang] Error 1
+```
+
+Note that the test declares the bit-field at file scope, where a bit-field cannot appear at all, so both diagnostics above are parse errors rather than a verdict on atomic bit-fields; a declaration inside a `struct` would test the item directly.
+
+LLVM's type system is strict: atomic operations in LLVM IR are expressed through dedicated instructions (such as `atomicrmw` and `cmpxchg`) that operate on a whole object, not on part of its bits (that is, a bit-field). Atomic operations on bit-fields would therefore be complex and impractical to implement, which may be why LLVM does not support atomic types for bit-fields.
+
+Neither compiler accepts the declaration; consistent with GCC.
+
+### 5. Whether a bit-field can straddle a storage-unit boundary ([6.7.2.1](https://port70.net/~nsz/c/c11/n1570.html#6.7.2.1)).
+
+- Whether a bit-field may span a storage-unit boundary.
+- Determined by ABI.
+ +在本实验环境中: +```c +#include + +struct TestBitField { + unsigned int field1 : 7; // 7 bits + unsigned int field5 : 1; // 1 bit (total 8 bits, may fit in one byte) +}; + +int main() { + struct TestBitField test; + + printf("Size of struct: %lu bytes\n", sizeof(test)); + + test.field1 = 0xFF; // Set maximum value for 7 bits + //test.field5 = 0x1; // Set maximum value for 1 bit + + printf("field1: 0x%X\n", test.field1); + printf("field5: 0x%X\n", test.field5); + + return 0; +} +``` + +```c +//target_gcc +Size of struct: 4 bytes +field1: 0x7F +field5: 0x0 +``` + +```c +//target_clang +Size of struct: 4 bytes +field1: 0x7F +field5: 0x0 +``` + +根据该实验,在本实验环境中不可以跨越。 + +### 6. The order of allocation of bit-fields within a unit ([6.7.2.1](https://port70.net/~nsz/c/c11/n1570.html#6.7.2.1)). + +- 位字段在单元内的分配顺序。 +- Determined by ABI. + +```c +#include + +struct TestBitField { + unsigned int field1 : 3; // 3 bits + unsigned int field2 : 4; // 4 bits + unsigned int field3 : 5; // 5 bits + unsigned int field4 : 6; // 6 bits + unsigned int field5 : 7; // 7 bits (total 25 bits, fits in one 32-bit unit) +}; + +void print_bits(unsigned int num) { + for (int i = 31; i >= 0; i--) { + printf("%d", (num >> i) & 1); + if (i % 8 == 0) { + printf(" "); // Add space for readability + } + } + printf("\n"); +} + +int main() { + struct TestBitField test = {0}; + + test.field1 = 5; // 3-bit field, 101 in binary + test.field2 = 10; // 4-bit field, 1010 in binary + test.field3 = 17; // 5-bit field, 10001 in binary + test.field4 = 33; // 6-bit field, 100001 in binary + test.field5 = 127; // 7-bit field, 1111111 in binary + + printf("Size of struct: %lu bytes\n", sizeof(test)); + + // Print the entire bit pattern of the structure + printf("Bit pattern of struct:\n"); + print_bits(*(unsigned int*)&test); + + printf("field1: %u\n", test.field1); + printf("field2: %u\n", test.field2); + printf("field3: %u\n", test.field3); + printf("field4: %u\n", test.field4); + printf("field5: %u\n", test.field5); + + 
return 0; +} +``` + +```c +//target_gcc +Size of struct: 4 bytes +Bit pattern of struct: +00000001 11111110 00011000 11010101 +field1: 5 +field2: 10 +field3: 17 +field4: 33 +field5: 127 +``` + +```c +//target_clang +Size of struct: 4 bytes +Bit pattern of struct: +00000001 11111110 00011000 11010101 +field1: 5 +field2: 10 +field3: 17 +field4: 33 +field5: 127 +``` + +在本实验环境下,GCC和Clang在处理位域的分配时,采用了相同的规则。位域从低位到高位分配,按照声明顺序从低位开始填充。 + +### 7. The alignment of non-bit-field members of structures ([6.7.2.1](https://port70.net/~nsz/c/c11/n1570.html#6.7.2.1)). + +- 结构体中非位字段成员的对齐。 +- Determined by ABI. + +```c +#include +#include + +struct Test { + char c1; // 字符类型,占1个字节 + int i; // 整型,占4个字节 + char c2; // 字符类型,占1个字节 +}; + +int main() { + struct Test t; + + // 打印结构体 Test 的大小 + printf("Size of struct Test: %zu\n", sizeof(struct Test)); + + // 打印各个成员在结构体中的偏移量 + printf("Offset of c1: %zu\n", offsetof(struct Test, c1)); + printf("Offset of i: %zu\n", offsetof(struct Test, i)); + printf("Offset of c2: %zu\n", offsetof(struct Test, c2)); + + return 0; +} +``` + +```c +//target_gcc +Size of struct Test: 12 +Offset of c1: 0 +Offset of i: 4 +Offset of c2: 8 +``` + +```c +//target_clang +Size of struct Test: 12 +Offset of c1: 0 +Offset of i: 4 +Offset of c2: 8 +``` + +在本实验环境下,GCC与LLVM对齐方式一致。 + +### 8. The integer type compatible with each enumerated type ([6.7.2.2](https://port70.net/~nsz/c/c11/n1570.html#6.7.2.2)). + +- 与每个枚举类型兼容的整数类型 + +> An enum type is represented by an underlying integer type. The size of the integer type and whether it is signed is based on the range of values of the enumerated constants. +> +> By default, the c29clang uses the smallest possible byte size for the enumeration type. The underlying type is the first type in the following list in which all the enumerated constant values can be represented: *signed char*, *unsigned char*, *short*, *unsigned short*, *int*, *unsigned int*, *long*, *unsigned long*, *long long*, *unsigned long long*. 
This default behavior is equivalent to the effect of using the **c29clang** *-fshort-enums* option. +> +> In strict c89/c99/c11 mode, the compiler will limit enumeration constants to those values that fit in *int* or *unsigned int*. +> +> For C++ and gnuXX C dialects (relaxed c89/c99/c11), the compiler allows enumeration constants up to the largest integral type (64 bits). +> +> You can alter the default compiler behavior using the *-fno-short-enums* option. When the *-fno-short-enums* option is used in strict c89/c99/c11 mode, the enumeration type used to represent an *enum* will be *int*, even if the values of the enumeration constants fit into a smaller integer type. +> +> When the *fno-short-enums* option is used with C++ or gnuXX C dialects, the underlying enumeration type will be the first type in the following list in which all the enumerated constant values can be represented: *int*, *unsigned int*, *long*, *unsigned long*, *long long*, *unsigned long long*. + +[2.1. Data Types — C29 Clang Compiler Tools User's Guide (ti.com)](https://software-dl.ti.com/codegen/docs/c29clang/rel0_1_0_STS/compiler_manual/c_cpp_language_implementation/data_types.html) + +大致与GCC一致,注意到clang同样有option: + +> - **-fshort-enums**, **-fno-short-enums** +> +> +> +> Allocate to an enum type only as many bytes as it needs for the declared range of possible values + +## 10. Qualifiers + +### 1. What constitutes an access to an object that has volatile-qualified type ([6.7.3](https://port70.net/~nsz/c/c11/n1570.html#6.7.3)). + +- 访问具有 `volatile` 限定类型的对象的行为。 + +> Certain memory accesses, such as [load](https://llvm.org/docs/LangRef.html#i-load)’s, [store](https://llvm.org/docs/LangRef.html#i-store)’s, and [llvm.memcpy](https://llvm.org/docs/LangRef.html#int-memcpy)’s may be marked `volatile`. The optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. 
The optimizers *may* change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior. +> +> A volatile load or store may have additional target-specific semantics. Any volatile operation can have side effects, and any volatile operation can read and/or modify state which is not accessible via a regular load or store in this module. Volatile operations may use addresses which do not point to memory (like MMIO registers). This means the compiler may not use a volatile operation to prove a non-volatile access to that address has defined behavior. +> +> The allowed side-effects for volatile accesses are limited. If a non-volatile store to a given address would be legal, a volatile operation may modify the memory at that address. A volatile operation may not modify any other memory accessible by the module being compiled. A volatile operation may not call any code in the current module. +> +> In general (without target specific context), the address space of a volatile operation may not be changed. Different address spaces may have different trapping behavior when dereferencing an invalid pointer. +> +> The compiler may assume execution will continue after a volatile operation, so operations which modify memory or may have undefined behavior can be hoisted past a volatile operation. +> +> As an exception to the preceding rule, the compiler may not assume execution will continue after a volatile store operation. This restriction is necessary to support the somewhat common pattern in C of intentionally storing to an invalid pointer to crash the program. In the future, it might make sense to allow frontends to control this behavior. +> +> IR-level volatile loads and stores cannot safely be optimized into llvm.memcpy or llvm.memmove intrinsics even when those intrinsics are flagged volatile. 
Likewise, the backend should never split or merge target-legal volatile load/store instructions. Similarly, IR-level volatile loads and stores cannot change from integer to floating-point or vice versa. +> +> Rationale +> +> Platforms may rely on volatile loads and stores of natively supported data width to be executed as single instruction. For example, in C this holds for an l-value of volatile primitive type with native hardware support, but not necessarily for aggregate types. The frontend upholds these expectations, which are intentionally unspecified in the IR. The rules above ensure that IR transformations do not violate the frontend’s contract with the language. +> +> [LLVM Language Reference Manual — LLVM 20.0.0git documentation](https://llvm.org/docs/LangRef.html#volatile-memory-accesses) + +与GCC大致一致。 + +## 11. Declarators + +### 1. The maximum number of declarators that may modify an arithmetic, structure or union type (C90 6.5.4). + +- 可修改算术类型、结构或联合类型的声明符的最大数量。 +- 在C11已不是实现定义行为。 + +与GCC一致,仅受限于硬件资源。 + +## 12. Statements + +### 1. The maximum number of `case` values in a `switch` statement (C90 6.6.4.2). + +- `switch` 语句中 `case` 值的最大数量。 +- 在C11已不是实现定义行为。 + +[class SwitchInst: LLVM/Clang 15.x documentation (hdoc.io)](https://docs.hdoc.io/hdoc/llvm-project/r1A6F1C03A3F0DA37.html) + +LLVM的`SwitchInst`类允许动态添加`case`值,并且这个数量仅受系统内存的限制,而不是编译器本身的限制。 + +与GCC一致。 + +## 13. Preprocessing Directives + +该分类下的实现定义行为无法通过实验开展对比。 + +同时,GCC的文档对此处的实现定义行为极其语焉不详([Implementation-defined behavior (The C Preprocessor) (gnu.org)](https://gcc.gnu.org/onlinedocs/cpp/Implementation-defined-behavior.html#Implementation-defined-behavior)) + +LLVM文档中也没有对这些行为的明确说明,故仅能通过 +[llvm-project/clang/lib/Lex/Preprocessor.cpp at main · llvm/llvm-project (github.com)](https://github.com/llvm/llvm-project/blob/main/clang/lib/Lex/Preprocessor.cpp) +[clang: clang::Preprocessor Class Reference (llvm.org)](https://clang.llvm.org/doxygen/classclang_1_1Preprocessor.html) +进行阅读和对比。 + +LLVM与GCC大致一致 + +## 14. 
Library Functions + +The behavior of most of these points depends on the implementation of the C library. + +Most of them are therefore skipped, following the GCC documentation. + +### 1. The null pointer constant to which the macro NULL expands ([7.19](https://port70.net/~nsz/c/c11/n1570.html#7.19)). + +- Concerns which null pointer constant the `NULL` macro expands to. + +GCC: In `<stddef.h>`, `NULL` expands to `((void *)0)`. GCC does not provide the other headers which define `NULL` and some library implementations may use other definitions in those headers. + +LLVM: In `<stddef.h>`, `NULL` expands to `((void *)0)`. LLVM/Clang's implementation does not contain any other definition of the macro either. + +Consistent with GCC. + +## 15. Architecture + +### 1. The values or expressions assigned to the macros specified in the headers [`<float.h>`](https://port70.net/~nsz/c/c11/n1570.html#7.7), [`<limits.h>`](https://port70.net/~nsz/c/c11/n1570.html#7.10), and [`<stdint.h>`](https://port70.net/~nsz/c/c11/n1570.html#7.20) ([5.2.4.2](https://port70.net/~nsz/c/c11/n1570.html#5.2.4.2), [7.20.2](https://port70.net/~nsz/c/c11/n1570.html#7.20.2), [7.20.3](https://port70.net/~nsz/c/c11/n1570.html#7.20.3)). + +- Concerns the values or expressions given to the macros specified in `<float.h>`, `<limits.h>`, and `<stdint.h>`. + +- Determined by ABI. + +### 2. The result of attempting to indirectly access an object with automatic or thread storage duration from a thread other than the one with which it is associated ([6.2.4](https://port70.net/~nsz/c/c11/n1570.html#6.2.4)). + +- Concerns what happens when an object with automatic or thread storage duration is accessed indirectly from a thread other than the one it is associated with. + +[LLVM Atomic Instructions and Concurrency Guide — LLVM 20.0.0git documentation](https://llvm.org/docs/Atomics.html) + +> If you are writing a frontend which uses this directly, use with caution. Acquire only provides a semantic guarantee when paired with a Release operation. + +Overall such access is supported, but the synchronization requirements must be followed. + +Consistent with GCC. + +### 3. The number, order, and encoding of bytes in any object (when not explicitly specified in this International Standard) ([6.2.6.1](https://port70.net/~nsz/c/c11/n1570.html#6.2.6.1)). + +- Concerns the number, order, and encoding of the bytes of any object, where the International Standard does not specify them. +- Determined by ABI. + +### 4. 
Whether any extended alignments are supported and the contexts in which they are supported ([6.2.8](https://port70.net/~nsz/c/c11/n1570.html#6.2.8)). + +- Concerns whether extended alignments are supported, and in which contexts. +- Determined by ABI. + +### 5. Valid alignment values other than those returned by an _Alignof expression for fundamental types, if any ([6.2.8](https://port70.net/~nsz/c/c11/n1570.html#6.2.8)). + +- Concerns valid alignment values, if any, beyond those an `_Alignof` expression returns for fundamental types. + +> The ‘`alloca`’ instruction allocates `sizeof(<type>)*NumElements` bytes of memory on the runtime stack, returning a pointer of the appropriate type to the program. If “NumElements” is specified, it is the number of elements allocated, otherwise “NumElements” is defaulted to be one. +> +> If a constant alignment is specified, the value result of the allocation is guaranteed to be aligned to at least that boundary. The alignment may not be greater than `1 << 32`. +> +> The alignment is only optional when parsing textual IR; for in-memory IR, it is always present. If not specified, the target can choose to align the allocation on any convenient boundary compatible with the type. +> +> ‘`type`’ may be any sized type. +> +> Structs containing scalable vectors cannot be used in allocas unless all fields are the same scalable vector type (e.g. `{<vscale x 2 x i32>, <vscale x 2 x i32>}` contains the same type while `{<vscale x 2 x i32>, <vscale x 2 x i64>}` doesn’t). + +[LLVM Language Reference Manual — LLVM 20.0.0git documentation](https://llvm.org/docs/LangRef.html) + +Consistent with GCC: the alignment must likewise be a power of two, and may not be greater than `1 << 32`. + +### 6. The value of the result of the sizeof and _Alignof operators ([6.5.3.4](https://port70.net/~nsz/c/c11/n1570.html#6.5.3.4)). + +- Concerns the result values of the `sizeof` and `_Alignof` operators. +- Determined by ABI. + +## 16. Locale-Specific Behavior + +The behavior of these points depends on the implementation of the C library, and is not defined by GCC itself. 
+ +They are therefore skipped in this document. \ No newline at end of file diff --git a/Compiler Features/GCC_LLVM/implementation-defined behaviours/CPP14.md b/Compiler Features/GCC_LLVM/implementation-defined behaviours/CPP14.md new file mode 100644 index 0000000..99f7bba --- /dev/null +++ b/Compiler Features/GCC_LLVM/implementation-defined behaviours/CPP14.md @@ -0,0 +1,162 @@ +## Implementation-defined Behaviors + +## 1. Conditionally-Supported Behavior + +Each implementation shall include documentation that identifies all conditionally-supported constructs that it does not support. + +To stay aligned with the GCC documentation, this document only experiments with the implementation-defined behaviors that the GCC documentation mentions. + +### 1.1 Whether an argument of class type with a non-trivial copy constructor or destructor can be passed to ... (C++0x 5.2.2). + +- Concerns whether an argument of class type with a non-trivial copy constructor or destructor can be passed to `...`. + +GCC accepts this; LLVM does not. Inconsistent with GCC. + +```cpp +#include <cstdarg> +#include <iostream> + +class NonTrivialClass { + public: + int value; + + NonTrivialClass(int v) : value(v) { std::cout << "Constructor called\n"; } + + // User-defined copy constructor, which makes the class non-trivial + NonTrivialClass(const NonTrivialClass &other) { + value = other.value; + std::cout << "Copy constructor called\n"; + } + + // User-defined destructor + ~NonTrivialClass() { + std::cout << "Destructor called\n"; + } +}; + +void variadicFunction(int num, ...) 
{ + va_list args; + va_start(args, num); + + for (int i = 0; i < num; ++i) { + // Try to extract an argument of type NonTrivialClass + NonTrivialClass obj = va_arg(args, NonTrivialClass); + std::cout << "Extracted value: " << obj.value << "\n"; + } + + va_end(args); +} + +int main() { + NonTrivialClass obj1(1); + variadicFunction(1, obj1); + return 0; +} +``` + +```bash +$ make gcc +g++ --std=c++11 -Wall -Wno-varargs ./test.cpp -o target_gcc +echo //target_gcc > output_gcc +./target_gcc>>output_gcc + +$ make clang +clang++ --std=c++11 -Wall -Wnon-pod-varargs ./test.cpp -o target_clang +./test.cpp:28:44: error: second argument to 'va_arg' is of non-POD type 'NonTrivialClass' [-Wnon-pod-varargs] + NonTrivialClass obj = va_arg(args, NonTrivialClass); + ^~~~~~~~~~~~~~~ +/usr/lib/llvm-14/lib/clang/14.0.0/include/stdarg.h:19:50: note: expanded from macro 'va_arg' +#define va_arg(ap, type) __builtin_va_arg(ap, type) + ^~~~ +./test.cpp:37:25: error: cannot pass object of non-trivial type 'NonTrivialClass' through variadic function; call will abort at runtime [-Wnon-pod-varargs] + variadicFunction(1, obj1); + ^ +2 errors generated. +make: *** [Makefile:21: clang] Error 1 +``` + +```cpp +//target_gcc +Constructor called +Copy constructor called +Copy constructor called +Extracted value: 1 +Destructor called +Destructor called +Destructor called +``` + +In this test environment GCC accepts the program and LLVM rejects it, so LLVM is inconsistent with GCC. + +## 2. Exception Handling + +### 1. In the situation where no matching handler is found, it is implementation-defined whether or not the stack is unwound before std::terminate() is called (C++98 15.5.1). 
+ +- When no matching handler is found, whether the stack is unwound before std::terminate() is called is up to the implementation. + +```cpp +#include <iostream> +#include <stdexcept> + +class TestClass { + public: + TestClass() { + std::cout << "TestClass Constructor\n"; + } + ~TestClass() { + std::cout << "TestClass Destructor\n"; + } +}; + +void throwException() { + TestClass obj; + std::cout << "Throwing exception...\n"; + throw std::runtime_error("Uncaught exception"); +} + +int main() { + try { + throwException(); + } catch (int e) { + std::cout << "Caught an integer exception\n"; + } + // No catch for std::runtime_error, std::terminate will be called + return 0; +} +``` + +```bash +$ make gcc +g++ -std=c++11 -Wall ./test.cpp -o target_gcc +echo //target_gcc > output_gcc +./target_gcc>>output_gcc +terminate called after throwing an instance of 'std::runtime_error' + what(): Uncaught exception +Aborted +make: *** [Makefile:18: gcc] Error 134 + +$ make clang +clang++ -std=c++11 -Wall ./test.cpp -o target_clang +echo //target_clang > output_clang +./target_clang>>output_clang +terminate called after throwing an instance of 'std::runtime_error' + what(): Uncaught exception +Aborted +make: *** [Makefile:23: clang] Error 134 + +$ ./target_gcc +TestClass Constructor +Throwing exception... +terminate called after throwing an instance of 'std::runtime_error' + what(): Uncaught exception +Aborted + +$ ./target_clang +TestClass Constructor +Throwing exception... 
+terminate called after throwing an instance of 'std::runtime_error' + what(): Uncaught exception +Aborted +``` + +The stack is not unwound (the `TestClass` destructor never runs), consistent with GCC. \ No newline at end of file diff --git a/Compiler Features/GCC_LLVM/implementation-defined behaviours/readme.md b/Compiler Features/GCC_LLVM/implementation-defined behaviours/readme.md new file mode 100644 index 0000000..c15738d --- /dev/null +++ b/Compiler Features/GCC_LLVM/implementation-defined behaviours/readme.md @@ -0,0 +1,93 @@ +The test environment is Ubuntu 22.04.4 LTS with GCC 11.4.0, LLVM 14.0.0, Clang 14.0.0-1ubuntu1.1, and glibc 2.35. + +Many implementation-defined behaviours are determined by the implementation of the C library or by the ABI; this survey skips them. + +The `Makefile` used for the C tests is as follows: + +```makefile +TESTFILE = ./test.c + +TARGET_GCC = target_gcc +TARGET_CLANG = target_clang +OUTPUT_GCC = output_gcc +OUTPUT_CLANG = output_clang + +CAT = cat +GXX = gcc +CLANG = clang +CFLAGS = --std=c11 #-fsanitize=shift +FOLLOW_OPTION = #-lm +PRE_PROCESS_CFLAGS = $(CFLAGS) -E + +gcc:$(TESTFILE) + $(GXX) $(CFLAGS) $(TESTFILE) -o $(TARGET_GCC) $(FOLLOW_OPTION) + echo //$(TARGET_GCC) > $(OUTPUT_GCC) + ./$(TARGET_GCC)>>$(OUTPUT_GCC) + +clang:$(TESTFILE) + $(CLANG) $(CFLAGS) $(TESTFILE) -o $(TARGET_CLANG) $(FOLLOW_OPTION) + echo //$(TARGET_CLANG) > $(OUTPUT_CLANG) + ./$(TARGET_CLANG)>>$(OUTPUT_CLANG) + +preprocess_gcc:$(TESTFILE) + $(GXX) $(PRE_PROCESS_CFLAGS) $(TESTFILE) -o $(TARGET_GCC) $(FOLLOW_OPTION) + $(CAT) $(TARGET_GCC) > $(OUTPUT_GCC) + +preprocess_clang:$(TESTFILE) + $(CLANG) $(PRE_PROCESS_CFLAGS) $(TESTFILE) -o $(TARGET_CLANG) $(FOLLOW_OPTION) + $(CAT) $(TARGET_CLANG) > $(OUTPUT_CLANG) + +clean: + rm -f $(TARGET_GCC) $(TARGET_CLANG) $(OUTPUT_GCC) $(OUTPUT_CLANG) + +test: clean gcc clang +preprocess_test: preprocess_gcc preprocess_clang + +.PHONY: clean test preprocess_test +``` + +The `Makefile` used for the C++ tests is as follows: + + +```makefile +TESTFILE = ./test.cpp + +TARGET_GCC = target_gcc +TARGET_CLANG = target_clang +OUTPUT_GCC = output_gcc +OUTPUT_CLANG = output_clang + +CAT = cat +GXX = g++ +CLANG = 
clang++ +CFLAGS = --std=c++11 -Wall -Wno-varargs +FOLLOW_OPTION = #-lm +PRE_PROCESS_CFLAGS = $(CFLAGS) -E + +gcc:$(TESTFILE) + $(GXX) $(CFLAGS) $(TESTFILE) -o $(TARGET_GCC) $(FOLLOW_OPTION) + echo //$(TARGET_GCC) > $(OUTPUT_GCC) + ./$(TARGET_GCC)>>$(OUTPUT_GCC) + +clang:$(TESTFILE) + $(CLANG) $(CFLAGS) $(TESTFILE) -o $(TARGET_CLANG) $(FOLLOW_OPTION) + echo //$(TARGET_CLANG) > $(OUTPUT_CLANG) + ./$(TARGET_CLANG)>>$(OUTPUT_CLANG) + +preprocess_gcc:$(TESTFILE) + $(GXX) $(PRE_PROCESS_CFLAGS) $(TESTFILE) -o $(TARGET_GCC) $(FOLLOW_OPTION) + $(CAT) $(TARGET_GCC) > $(OUTPUT_GCC) + +preprocess_clang:$(TESTFILE) + $(CLANG) $(PRE_PROCESS_CFLAGS) $(TESTFILE) -o $(TARGET_CLANG) $(FOLLOW_OPTION) + $(CAT) $(TARGET_CLANG) > $(OUTPUT_CLANG) + +clean: + rm -f $(TARGET_GCC) $(TARGET_CLANG) $(OUTPUT_GCC) $(OUTPUT_CLANG) + +test: clean gcc clang +preprocess_test: preprocess_gcc preprocess_clang + +.PHONY: clean test preprocess_test +``` + -- Gitee From 0b37da26b08aff893a88b937135865b9c382c1b0 Mon Sep 17 00:00:00 2001 From: yang-youwen <1483224757@qq.com> Date: Mon, 28 Oct 2024 22:18:28 +0800 Subject: [PATCH 2/2] =?UTF-8?q?=E6=9B=B4=E6=96=B0=E5=AE=9E=E7=8E=B0?= =?UTF-8?q?=E5=AE=9A=E4=B9=89=E8=A1=8C=E4=B8=BA=E6=96=87=E6=A1=A3?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../implementation-defined behaviours/C11.md | 158 +++++++++++++++++- .../CPP14.md | 8 +- .../readme.md | 6 + 3 files changed, 167 insertions(+), 5 deletions(-) diff --git a/Compiler Features/GCC_LLVM/implementation-defined behaviours/C11.md b/Compiler Features/GCC_LLVM/implementation-defined behaviours/C11.md index 4f96e67..b6f04f5 100644 --- a/Compiler Features/GCC_LLVM/implementation-defined behaviours/C11.md +++ b/Compiler Features/GCC_LLVM/implementation-defined behaviours/C11.md @@ -440,7 +440,7 @@ make: *** [Makefile:20: clang] Error 1 - Which of signed char and unsigned char has the same range, representation, and behavior as "plain" char. - Determined by ABI. 
- - In this test environment it is signed char. + - In this test environment it is signed char under both GCC and Clang. ```c #include @@ -1051,7 +1051,141 @@ char32_t: 29579 > **GCC Extension for 128-bit Integers** GCC provides a set of nonstandard integer types as an extension, for example `__int128`. This type is not defined by the C or C++ standards but is supported on some platforms, offering extended precision for integer arithmetic. It is only supported on platforms that provide a `__int128` hardware or software support. -Clang has no related description either, so LLVM is taken to be consistent with GCC: neither provides any. +In LLVM-PROJECT/libcxx/include/stdint.h: +```c +// -*- C++ -*- +//===----------------------------------------------------------------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#ifndef _LIBCPP_STDINT_H +// AIX system headers need stdint.h to be re-enterable while _STD_TYPES_T +// is defined until an inclusion of it without _STD_TYPES_T occurs, in which +// case the header guard macro is defined. 
+#if !defined(_AIX) || !defined(_STD_TYPES_T) +# define _LIBCPP_STDINT_H +#endif // _STD_TYPES_T + +/* + stdint.h synopsis + +Macros: + + INT8_MIN + INT16_MIN + INT32_MIN + INT64_MIN + + INT8_MAX + INT16_MAX + INT32_MAX + INT64_MAX + + UINT8_MAX + UINT16_MAX + UINT32_MAX + UINT64_MAX + + INT_LEAST8_MIN + INT_LEAST16_MIN + INT_LEAST32_MIN + INT_LEAST64_MIN + + INT_LEAST8_MAX + INT_LEAST16_MAX + INT_LEAST32_MAX + INT_LEAST64_MAX + + UINT_LEAST8_MAX + UINT_LEAST16_MAX + UINT_LEAST32_MAX + UINT_LEAST64_MAX + + INT_FAST8_MIN + INT_FAST16_MIN + INT_FAST32_MIN + INT_FAST64_MIN + + INT_FAST8_MAX + INT_FAST16_MAX + INT_FAST32_MAX + INT_FAST64_MAX + + UINT_FAST8_MAX + UINT_FAST16_MAX + UINT_FAST32_MAX + UINT_FAST64_MAX + + INTPTR_MIN + INTPTR_MAX + UINTPTR_MAX + + INTMAX_MIN + INTMAX_MAX + + UINTMAX_MAX + + PTRDIFF_MIN + PTRDIFF_MAX + + SIG_ATOMIC_MIN + SIG_ATOMIC_MAX + + SIZE_MAX + + WCHAR_MIN + WCHAR_MAX + + WINT_MIN + WINT_MAX + + INT8_C(value) + INT16_C(value) + INT32_C(value) + INT64_C(value) + + UINT8_C(value) + UINT16_C(value) + UINT32_C(value) + UINT64_C(value) + + INTMAX_C(value) + UINTMAX_C(value) + +*/ + +#include <__config> + +#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER) +# pragma GCC system_header +#endif + +/* C99 stdlib (e.g. glibc < 2.18) does not provide macros needed + for C++11 unless __STDC_LIMIT_MACROS and __STDC_CONSTANT_MACROS + are defined +*/ +#if defined(__cplusplus) && !defined(__STDC_LIMIT_MACROS) +# define __STDC_LIMIT_MACROS +#endif +#if defined(__cplusplus) && !defined(__STDC_CONSTANT_MACROS) +# define __STDC_CONSTANT_MACROS +#endif + +#if __has_include_next(<stdint.h>) +# include_next <stdint.h> +#endif + +#endif // _LIBCPP_STDINT_H + +``` + +As this header shows, LLVM itself does not directly define or add any "extended integer types" beyond the standard C/C++ integer types. In the C and C++ standards, `stdint.h` defines standard integer types such as `int8_t`, `int16_t`, `int32_t`, `int64_t` and their unsigned counterparts, plus types designed for particular width requirements such as `int_least8_t` and `int_fast16_t`. The libc++ header is essentially a thin wrapper: it makes sure the system's standard `stdint.h` is included and used correctly, and adds some compatibility handling driven by macros and platform requirements. + +GCC and LLVM are therefore consistent: neither provides any. ### 2. 
Whether signed integer types are represented using sign and magnitude, two's complement, or ones' complement, and whether the extraordinary value is a trap representation or an ordinary value ([6.2.6.2](https://port70.net/~nsz/c/c11/n1570.html#6.2.6.2)). @@ -1641,7 +1775,7 @@ unsigned_bitfield (unsigned int): 11 plain_bitfield (plain int): -5 ``` -signed +Treated as signed int [⚙ D131255 Fix Wbitfield-constant-conversion on 1-bit signed bitfield (llvm.org)](https://reviews.llvm.org/D131255) @@ -2032,4 +2166,20 @@ LLVM:In ``, `NULL` expands to `((void *)0)`.并且在LLVM/clang的实 The behavior of these points depends on the implementation of the C library, and is not defined by GCC itself. -They are therefore skipped in this document. \ No newline at end of file +They are therefore skipped in this document. + + + +## 17. Summary + +For most implementation-defined behaviors, GCC and LLVM behave the same. + +The two differ on the following implementation-defined behaviors: + +1. The mapping between physical source file multibyte characters and the source character set in translation phase 1 ([5.1.1.2](https://port70.net/~nsz/c/c11/n1570.html#5.1.1.2)). +2. The value of a char object into which has been stored any character other than a member of the basic execution character set ([6.2.5](https://port70.net/~nsz/c/c11/n1570.html#6.2.5)). +3. The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character ([6.4.4.4](https://port70.net/~nsz/c/c11/n1570.html#6.4.4.4)). +4. The value of a wide character constant containing more than one multibyte character or a single multibyte character that maps to multiple members of the extended execution character set, or containing a multibyte character or escape sequence not represented in the extended execution character set ([6.4.4.4](https://port70.net/~nsz/c/c11/n1570.html#6.4.4.4)). +5. The current locale used to convert a wide string literal into corresponding wide character codes ([6.4.5](https://port70.net/~nsz/c/c11/n1570.html#6.4.5)). +6. 
The default state for the FENV_ACCESS pragma ([7.6.1](https://port70.net/~nsz/c/c11/n1570.html#7.6.1)). +7. The default state for the FP_CONTRACT pragma ([7.12.2](https://port70.net/~nsz/c/c11/n1570.html#7.12.2)). \ No newline at end of file diff --git a/Compiler Features/GCC_LLVM/implementation-defined behaviours/CPP14.md b/Compiler Features/GCC_LLVM/implementation-defined behaviours/CPP14.md index 99f7bba..a91c9a3 100644 --- a/Compiler Features/GCC_LLVM/implementation-defined behaviours/CPP14.md +++ b/Compiler Features/GCC_LLVM/implementation-defined behaviours/CPP14.md @@ -159,4 +159,10 @@ terminate called after throwing an instance of 'std::runtime_error' Aborted ``` -The stack is not unwound, consistent with GCC. \ No newline at end of file +The stack is not unwound, consistent with GCC. + + + +## 3. Summary + +Of the implementation-defined behaviors tested here, GCC and LLVM differ only on 1.1 (passing an argument of class type with a non-trivial copy constructor or destructor through `...`); they agree on whether the stack is unwound before std::terminate() is called. \ No newline at end of file diff --git a/Compiler Features/GCC_LLVM/implementation-defined behaviours/readme.md b/Compiler Features/GCC_LLVM/implementation-defined behaviours/readme.md index c15738d..5b4cbbb 100644 --- a/Compiler Features/GCC_LLVM/implementation-defined behaviours/readme.md +++ b/Compiler Features/GCC_LLVM/implementation-defined behaviours/readme.md @@ -1,3 +1,9 @@ +LLVM and GCC are today's mainstream C and C++ compilers. An application built with different compilers may run into compile-time or run-time compatibility problems. There are two main causes: 1. the code contains undefined behavior and does not conform to the C/C++ language standard; 2. the compilers differ in implementation, which in turn covers unspecified behavior and implementation-defined behavior. Unspecified behavior: the standard allows two or more possibilities and the compiler may pick any one of them. Implementation-defined behavior: the standard does not fix the behavior; each compiler makes its own choice and documents it. + +The C/C++ language standards explicitly define and enumerate undefined behavior, unspecified behavior, and implementation-defined behavior. Pinning down where LLVM and GCC differ on these behaviors makes it possible to flag cross-compiler compatibility problems early. + + + The test environment is Ubuntu 22.04.4 LTS with GCC 11.4.0, LLVM 14.0.0, Clang 14.0.0-1ubuntu1.1, and glibc 2.35. Many implementation-defined behaviours are determined by the implementation of the C library or by the ABI; this survey skips them. -- Gitee