diff --git a/README.en.md b/README.en.md index 6449f4e88114fcac3ad90ee09de4aea0ab18e7ef..31501c7b6d4d2a94bac0b8ae7d1721ddcd53495d 100644 --- a/README.en.md +++ b/README.en.md @@ -53,14 +53,15 @@ The `bin/splitter-docker.sh` script will automatically pull and use the official SPLITTER_VERSION=1.0.3 ./bin/splitter-docker.sh cut -r 24.03-LTS -a x86_64 -o ./output python3_standard ``` -## Usage Instructions +## Core Concepts -### slice releases +The core of the splitter tool is the Slice Definition File (SDF). This section explains what SDFs are and how they are generated. -The generation of openEuler slices depends on the package splitting rules defined in the **Slice Definition File (SDF)**. SDFs are released separately based on the openEuler version to which the slice belongs. All SDFs exist as yaml files in [slice-releases](https://gitee.com/openeuler/slice-releases). The [slice-releases](https://gitee.com/openeuler/slice-releases) repository uses branch names to represent different openEuler versions. +### Slice Definition File (SDF) -Taking the SDF for python3.11 (named `python3.11.yaml`) as an example: +The generation of openEuler slices depends on the package splitting rules defined in the **Slice Definition File (SDF)**. An SDF is a YAML file that defines precisely how an RPM package is split into multiple functionally independent "Slices" that can be combined as needed. +Taking the SDF for python3.11 (named `python3.11.yaml`) as an example: ```yaml package: python3 @@ -121,10 +122,56 @@ slices: common: /usr/share/licenses/python3/LICENSE: ``` - In the SDF above, i.e., the `python3.yaml` file, `slices` indicates that the `python3` package is split into four slices: `python3_core`, `python3_standard`, `python3_utils`, and `python3_copyright`, along with the files each slice contains (for details, see [slice-releases](https://gitee.com/openeuler/slice-releases)).
-### Generating slices with splitter +### Current State of SDFs + +Currently, all official SDFs are **written manually** by community experts and released separately for each openEuler version. All SDFs are stored as YAML files in the [slice-releases](https://gitee.com/openeuler/slice-releases) repository, which uses branch names to represent different openEuler versions. + +As the openEuler ecosystem grows and the number of packages and versions keeps increasing, writing and maintaining SDFs by hand is becoming a bottleneck. + +### Automated SDF Generation + +To address the inefficiency of writing SDFs by hand, splitter introduces the `gen` command. Its automated pipeline emulates the analysis an expert would perform and produces a high-quality draft SDF for any given RPM package. + +![sdf-generation-workflow](./docs/pictures/sdf-generation-workflow-en.png) + +The automated workflow consists of the following stages: + +1. **Download**: Fetches the target RPM package. +2. **Extract**: Extracts all files from the package into a temporary directory. +3. **File Classification**: Assigns each file to its corresponding Slice. +4. **Dependency Analysis**: Identifies the dependencies between Slices. +5. **Write**: Writes the analysis results into a formatted SDF YAML file. + +The most critical steps in this pipeline are "File Classification" and "Dependency Analysis". + +#### File Classification + +The goal of this step is to determine accurately which Slice each file in the RPM package belongs to. + +* **Classification by Convention**: Drawing on conventions observed across many hand-written SDFs, a rule engine assigns files to slices by path (e.g., files under `/etc/` go to the `_config` slice, and files under `/usr/bin/` or `/usr/sbin/` go to the `_bins` slice).
+* **Validation by Type**: To avoid misclassifying non-binary files such as scripts, the classifier checks candidate files with the `file` command. Only files confirmed to be **ELF** executables or ELF shared libraries are assigned to the `_bins` or `_libs` slice, respectively. + +#### Slice Dependency Analysis + +The goal of this step is to automatically trace the dependency chain of binary files to determine the dependencies between Slices. + +This process uses a three-step "**Parse -> Locate -> Trace**" strategy: + +1. **Parse Needs**: `readelf -d` statically reads the dynamic section of each ELF file and extracts the shared libraries it needs at runtime (its **`NEEDED`** entries, e.g., `libc.so.6`). Because the file is only read, never executed, this step is safe. +2. **Locate Library Path**: Using the dynamic linker cache maintained by `ldconfig` (as listed by `ldconfig -p`), each library name is mapped to its absolute path on the filesystem. +3. **Trace Ownership**: Given the library's path, `rpm -qf <path>` identifies the RPM package that owns the file (e.g., `glibc`). + +**The resulting dependency chain looks like this:** + +`brotli_libs` -> needs `libc.so.6` -> located at `/usr/lib64/libc.so.6` -> traced to the `glibc` package -> **therefore depends on `glibc_libs`**. + +## Usage Instructions + +`splitter` provides two core commands: `cut` splits packages according to SDFs, and `gen` generates SDF files automatically. This section explains how to use both commands. + +### Generating Slices (`cut` command) splitter uses the `cut` command to split packages into the required slices (set the `SPLITTER_SLICE_REPO` environment variable to use a custom slice-releases source).
Using a locally installed splitter: @@ -136,13 +183,14 @@ splitter cut -r 24.03-LTS -a x86_64 -o /path/to/output python3_standard python3_ Or, run it with the official container image: ```bash # Example -./bin/splitter-docker.sh cut -r 24.03-LTS -a x86_64 -o /path/to/output python3_standard python3_utils``` +./bin/splitter-docker.sh cut -r 24.03-LTS -a x86_64 -o /path/to/output python3_standard python3_utils +``` In the command above, `-r/--release` specifies the openEuler version to which the required slices belong, `-a/--arch` specifies the OS architecture, `-o/--output` specifies the output path for the generated slices, and `python3_standard python3_utils` are the slices the user wants to generate. All generated slices are packaged and saved in the `/path/to/output` directory. -### Automatically generate SDF files +### Automatically Generating SDFs (`gen` command) splitter provides a new `gen` command that automatically generates an initial SDF from the contents of an RPM package, making it easier for developers to create and maintain SDFs quickly.
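As a rough illustration of the File Classification step described above, the following shell sketch combines path conventions with a `file -b` type check. It is not splitter's implementation: the function names, the `/usr/lib64/*.so*` library rule and the `_unassigned` fallback are hypothetical, and the real rule set is larger.

```bash
#!/usr/bin/env bash
# Sketch of path-convention classification plus an ELF type check.
# HYPOTHETICAL: function names, the /usr/lib64 rule and the "_unassigned"
# fallback are illustrative; splitter's real rule set may differ.

# slice_suffix_for PATH FILE_DESC
#   PATH:      path of the file as installed by the RPM, e.g. /usr/bin/foo
#   FILE_DESC: the output of `file -b` for that file
slice_suffix_for() {
    local path=$1 desc=$2
    case "$path" in
        /etc/*)
            echo "_config"
            ;;
        /usr/bin/*|/usr/sbin/*)
            # Only genuine ELF binaries go to _bins; shell or Python
            # scripts in the same directories are not treated as binaries.
            if [[ $desc == ELF* ]]; then
                echo "_bins"
            else
                echo "_unassigned"
            fi
            ;;
        /usr/lib64/*.so*)
            # Hypothetical library rule: require an ELF shared object.
            if [[ $desc == ELF*"shared object"* ]]; then
                echo "_libs"
            else
                echo "_unassigned"
            fi
            ;;
        *)
            echo "_unassigned"
            ;;
    esac
}

# classify_extracted ROOT PATH
#   Runs `file -b` on ROOT/PATH (the extracted copy) and classifies PATH.
classify_extracted() {
    local root=$1 path=$2
    slice_suffix_for "$path" "$(file -b "$root$path")"
}
```

With a package extracted under a hypothetical `/tmp/pkg-root`, `classify_extracted /tmp/pkg-root /usr/bin/brotli` should print `_bins` when the extracted file is an ELF executable and `_unassigned` when it is a script. Keeping `slice_suffix_for` free of I/O lets the rules be tested without real files. Note that `file` does not follow symbolic links by default (unless `POSIXLY_CORRECT` is set), so symlinks such as `libexample.so.1` are reported as "symbolic link to ..." and a real implementation has to decide how to treat them.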
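diff --git a/README.md b/README.md index 0215ef81af2815e31240ece48aadb77ad46806a3..6c6246fc2945d35235e24c32226f39c4e4d1396c 100644 --- a/README.md +++ b/README.md @@ -54,11 +54,13 @@ splitter provides an official application container image; you can use [bin/splitter-docke SPLITTER_VERSION=1.0.3 ./bin/splitter-docker.sh cut -r 24.03-LTS -a x86_64 -o ./output python3_standard ``` -## Usage Instructions +## Core Concepts + +The core of the splitter tool is the Slice Definition File (SDF). This section explains what SDFs are and how they are generated. -### slice releases +### Slice Definition File (SDF) -The generation of openEuler slices depends on the package splitting rules defined in the **Slice Definition File (SDF)**. SDFs are released separately according to the openEuler version to which the slices belong, and all SDFs exist as yaml files in [slice-releases](https://gitee.com/openeuler/slice-releases). The [slice-releases](https://gitee.com/openeuler/slice-releases) repository uses branch names to represent different openEuler versions. +The generation of openEuler slices depends on the package splitting rules defined in the **Slice Definition File (SDF)**. An SDF is a YAML file that defines precisely how an RPM package is split into multiple functionally independent "Slices" that can be combined as needed. Taking the SDF for python3.11 (named: python3.11.yaml) as an example: @@ -125,7 +127,54 @@ slices: ``` In the SDF above, i.e., the python3.yaml file, `slices` indicates that the python3 package is split into four slices, `python3_core`, `python3_standard`, `python3_utils`, and `python3_copyright`, and lists the files each slice contains (for details, see [slice-releases](https://gitee.com/openeuler/slice-releases)). -### Generating slices with splitter +### Current State of SDFs + +Currently, all official SDFs are **written manually** by community experts and released separately according to the openEuler version to which the slices belong; all SDFs exist as yaml files in [slice-releases](https://gitee.com/openeuler/slice-releases). The [slice-releases](https://gitee.com/openeuler/slice-releases) repository uses branch names to represent different openEuler versions. + +As the openEuler ecosystem grows and the number of packages and versions keeps increasing, writing and maintaining SDFs by hand is becoming an efficiency bottleneck. + +### Automated SDF Generation + +To address the inefficiency of writing SDFs by hand, splitter introduces the gen command. Its automated pipeline emulates the analysis an expert would perform and produces a high-quality draft SDF for any RPM package. + +![sdf-generation-workflow](./docs/pictures/sdf-generation-workflow-cn.png) + +The automated workflow consists of the following stages: + +1. **Download**: Fetch the target RPM package. +2. **Extract**: Extract all files in the package into a temporary directory. +3. **File Classification**: Assign each file to its corresponding Slice. +4. **Dependency Analysis**: Identify the dependencies between Slices.
+5. **Write**: Write the analysis results into a formatted SDF YAML file. + +The most critical steps are "File Classification" and "Dependency Analysis". + +#### File Classification + +The goal of this step is to determine precisely which Slice each file belongs to. + +* **Classification by Convention**: Drawing on the practice of many hand-written SDFs, a rule engine based on file paths was built (e.g., files under `/etc/` go to the `_config` slice, and files under `/usr/bin/` or `/usr/sbin/` go to the `_bins` slice). +* **Validation by Type**: To avoid misclassifying non-binary files such as scripts, the classifier checks files with the `file` command. Only files confirmed to be **ELF** executables or shared libraries are assigned to the `_bins` or `_libs` slice, respectively. + +#### Slice Dependency Analysis + +The goal of this step is to automatically trace the dependency chain of binary files to determine the dependencies between Slices. + +The process uses a three-step "**Parse -> Locate -> Trace**" strategy: + +1. **Parse Needs**: `readelf -d` statically analyzes each ELF file and safely extracts the list of shared libraries it **needs (`NEEDED`)** at runtime (e.g., `libc.so.6`). +2. **Locate Library Path**: Using the system cache as listed by `ldconfig -p`, each library name is mapped to its absolute path on the filesystem. +3. **Trace Ownership**: `rpm -qf <path>` looks up the RPM package that owns the library file (e.g., `glibc`). + +**The resulting dependency chain looks like this:** + +`brotli_libs` -> needs `libc.so.6` -> located at `/usr/lib64/libc.so.6` -> traced to the `glibc` package -> **therefore depends on `glibc_libs`**. + +## Usage Instructions + +`splitter` provides two core commands: `cut` splits packages according to SDFs, and `gen` generates SDF files automatically. This section describes how to use both commands. + +### Generating Slices (`cut` command) splitter uses the cut command to split packages into the required slices (the SPLITTER_SLICE_REPO environment variable can be set to a custom slice-releases source). Using a locally installed splitter: @@ -144,7 +193,7 @@ splitter cut -r 24.03-LTS -a x86_64 -o /path/to/output python3_standard python3_ All generated slices are packaged and saved in the `/path/to/output` directory. -### Automatically generating SDF files +### Automatically Generating SDFs (`gen` command) splitter adds a new `gen` command that automatically generates an initial SDF file from the contents of an RPM package, making it easier for developers to create and maintain SDFs quickly. diff --git a/docs/pictures/sdf-generation-workflow-cn.png b/docs/pictures/sdf-generation-workflow-cn.png new file mode 100644 index 0000000000000000000000000000000000000000..4269e084300de92f2dc189f41439d31c224c3c4b Binary files /dev/null and b/docs/pictures/sdf-generation-workflow-cn.png differ diff --git a/docs/pictures/sdf-generation-workflow-en.png b/docs/pictures/sdf-generation-workflow-en.png new file mode 100644 index 0000000000000000000000000000000000000000..af35ec79216991f8fcca68f7ea54144fd1569b5c Binary files /dev/null and b/docs/pictures/sdf-generation-workflow-en.png differ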
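The Parse -> Locate -> Trace dependency analysis described above can be approximated with the same standard tools on an RPM-based host. The sketch below is a hypothetical illustration, not splitter's code: the function names, naming the target slice `<package>_libs`, and taking the first matching `ldconfig -p` entry are simplifying assumptions.

```bash
#!/usr/bin/env bash
# Sketch of the Parse -> Locate -> Trace chain with standard tools.
# HYPOTHETICAL: function names, the "<package>_libs" naming of the target
# slice, and taking the first ldconfig match are simplifications.

# needed_libs: read `readelf -d` output on stdin, print each NEEDED entry.
needed_libs() {
    sed -n 's/.*(NEEDED).*Shared library: \[\(.*\)].*/\1/p'
}

# lib_path NAME: read `ldconfig -p` output on stdin, print NAME's path.
# ldconfig may list several entries (e.g. 32- and 64-bit); this takes the first.
lib_path() {
    awk -v lib="$1" '$1 == lib { print $NF; exit }'
}

# owner_pkg PATH: print the name of the installed RPM that owns PATH.
owner_pkg() {
    rpm -qf --queryformat '%{NAME}\n' "$1"
}

# slice_deps ELF: print the dependency slices of one ELF file, one per line.
slice_deps() {
    local elf=$1 cache lib path pkg
    cache=$(ldconfig -p)
    readelf -d "$elf" | needed_libs | while read -r lib; do
        path=$(printf '%s\n' "$cache" | lib_path "$lib")
        if [[ -z $path ]]; then
            echo "unresolved: $lib" >&2
            continue
        fi
        if ! pkg=$(owner_pkg "$path"); then
            echo "not owned by any package: $path" >&2
            continue
        fi
        echo "${pkg}_libs"
    done | sort -u
}
```

Run against an ELF file, `slice_deps` prints one `<package>_libs` line per resolved library, for example `glibc_libs` for `libc.so.6` as in the chain above. A real implementation would also drop libraries owned by the package being analyzed (intra-package dependencies) and match the `ldconfig -p` entry to the target architecture, since the cache can list 32-bit and 64-bit copies of the same library. `ldconfig` may live in `/sbin` or `/usr/sbin`, which is not always on a non-root user's `PATH`.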