# gb18030

**Repository Path**: ACleverDisguise/gb18030

## Basic Information

- **Project Name**: gb18030
- **Description**: A Logtalk pack for GB18030 and GBK encodings conformant to the
- **License**: WTFPL
- **Default Branch**: master
- **Created**: 2026-05-25
- **Last Updated**: 2026-05-27

## README

# GB18030 codec support for Logtalk

Any business wishing their software to be legal in China *must* support GB18030 (2022) encoding. As such, any Logtalk programmer wishing to do business in China will need GB18030 encoding and decoding support.

This suite of files contains everything needed to use GB18030 encoding in Logtalk, and to modify the codec should bugs be found or performance improvements be desired. The files are divided into:

- mapping files sourced from the GB18030 (2022) specification itself, with a Lua script to convert them
- a support program that converts a particularly troublesome transformation from a linear search to a tree search
- the GB18030 codec itself
- a test suite

The quickest way to build and test the system, however, is to run `logtalk -g "{compile},{loader},{tester},halt"`. If this succeeds with no failure messages, you have the two files that you need for distribution and use: `gb18030_2byte_index_tree.lgt` and `gb18030.lgt`.

***IMPORTANT NOTE:*** *Due to a complete misunderstanding of the WHATWG specification, v1.n of this library actually conformed to GB18030-2005 instead of GB18030-2022. For people using this for serious work within China's regulatory environment, it is important to use v2.n of this library, in which all of the indices are based on the actual standard's BMP and SMP mapping files instead of WHATWG's version.*

## Mapping files

The mapping files are sourced from the GB18030 (2022) standard itself.
- **`GB18030-2022MappingTableBMP.txt`**: This file maps the BMP code points into GB18030 (2022) encoding.
- **`GB18030-2022MappingTableSMP.txt`**: This file maps the SMP code points into GB18030 (2022) encoding.
- **`generate.lua`**: This script reads the two files above and outputs the data tables, ready for insertion into their appropriate places in the `gb18030_2byte_index_list.lgt` and `gb18030.lgt` files.

Generally it should not be necessary to touch these files unless:

1. There is a bug in the generated data tables that slipped past testing and needs fixing.
2. The standard is updated with newer mapping tables.
3. The user is masochistic and really likes editing absurdly long tables in Logtalk source code.

## Support program

The program `gb18030_index_compiler.lgt` takes the manually-created `gb18030_2byte_index_list.lgt` (formed from the output of `generate.lua` above and brought in via `:- include/1`) and turns its contents into `gb18030_2byte_index_tree.lgt`. The latter contains two mappings: one from code point to the 2-byte encoding of the standard (`gb18030_2byte_index_tree_code_pointer`) and the other from the encoding to the code point (`gb18030_2byte_index_tree_pointer_code`).

This program is best run using `logtalk -g '{compile},halt'`.

## The codec

The file `gb18030.lgt` is the user-facing module. It uses `:- include/1` to bring in the generated mapping trees from `gb18030_2byte_index_tree.lgt` and then implements encoding and decoding as per the WHATWG algorithm, but with the updated standard's mapping tables. It implements the `character_set_protocol` interface from the standard Logtalk library.

Once the codec is located in a project or in a library search path, the `loader.lgt` file ensures that the appropriate prerequisites are provided.

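
As a rough illustration of the arithmetic the WHATWG gb18030 algorithm uses to turn byte sequences into table indices ("pointers"), here is a minimal Python sketch. It is not taken from `gb18030.lgt`: Python is used purely for illustration, the function names are hypothetical, and only the byte-range checks and pointer formulas are shown, not the mapping tables themselves.

```python
def two_byte_pointer(lead, trail):
    """Pointer into the two-byte index for a (lead, trail) pair, or None."""
    if not 0x81 <= lead <= 0xFE:
        return None
    # Trail bytes are 0x40-0x7E or 0x80-0xFE; 0x7F is skipped.
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE):
        return None
    offset = 0x40 if trail < 0x7F else 0x41
    return (lead - 0x81) * 190 + (trail - offset)


def four_byte_pointer(b1, b2, b3, b4):
    """Pointer for a four-byte sequence, or None if a byte is out of range."""
    in_range = (
        0x81 <= b1 <= 0xFE
        and 0x30 <= b2 <= 0x39
        and 0x81 <= b3 <= 0xFE
        and 0x30 <= b4 <= 0x39
    )
    if not in_range:
        return None
    return (
        (b1 - 0x81) * 10 * 126 * 10
        + (b2 - 0x30) * 10 * 126
        + (b3 - 0x81) * 10
        + (b4 - 0x30)
    )


def supplementary_code_point(pointer):
    """Supplementary-plane code points map linearly from pointer 189000."""
    if 189000 <= pointer <= 1237575:
        return 0x10000 + pointer - 189000
    return None


if __name__ == "__main__":
    pointer = four_byte_pointer(0x90, 0x30, 0x81, 0x30)
    print(pointer)                                 # 189000
    print(hex(supplementary_code_point(pointer)))  # 0x10000
```

Four-byte pointers below 189000 and all two-byte pointers cannot be resolved by arithmetic alone; they need table lookups, which is presumably where the generated index trees come in.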
## A test suite

The file `tester.lgt`, paired with the files `tests.lgt` (unit testing) and `properties.lgt` (QuickCheck property testing), is best executed using `logtalk -g '{tester},halt'`. It runs a comprehensive test suite that checks a variety of conditions with 100% clause coverage. It also performs random stress testing of the codec for ASCII pass-through, for round-trip encoding/decoding, and for robustness, meaning that random "encoded" byte strings must not crash the decoder.
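
To give a feel for the three stress-test properties named above, here is a minimal Python sketch. It does not reproduce `properties.lgt`: it uses Python's built-in `gb18030` codec as a stand-in (which may follow a different edition of the standard than this library), the function names are hypothetical, and the round-trip check is deliberately restricted to ASCII, the CJK block U+4E00 to U+9FA5, and the supplementary planes.

```python
import random


def ascii_passes_through():
    """Every ASCII byte decodes to the same code point and encodes back."""
    data = bytes(range(0x80))
    text = data.decode("gb18030")
    return text == data.decode("ascii") and text.encode("gb18030") == data


def random_code_point(rng):
    """Pick a code point from a 1-, 2-, or 4-byte region of the encoding."""
    kind = rng.choice(("ascii", "cjk", "supplementary"))
    if kind == "ascii":
        return rng.randint(0x00, 0x7F)
    if kind == "cjk":
        return rng.randint(0x4E00, 0x9FA5)
    return rng.randint(0x10000, 0x10FFFF)


def round_trips(rng, samples=1000, max_len=20):
    """decode(encode(text)) must give back the original text."""
    for _ in range(samples):
        length = rng.randint(0, max_len)
        text = "".join(chr(random_code_point(rng)) for _ in range(length))
        if text.encode("gb18030").decode("gb18030") != text:
            return False
    return True


def decoder_survives_random_bytes(rng, samples=1000, max_len=20):
    """Arbitrary bytes must decode (with replacement) without raising."""
    for _ in range(samples):
        length = rng.randint(0, max_len)
        blob = bytes(rng.randrange(256) for _ in range(length))
        decoded = blob.decode("gb18030", errors="replace")
        if not isinstance(decoded, str):
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(2022)  # fixed seed so failures are reproducible
    print(ascii_passes_through())
    print(round_trips(rng))
    print(decoder_survives_random_bytes(rng))
```

Seeding the generator is a design choice for the sketch only; a QuickCheck-style framework would normally report the failing input or seed itself.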