# FlagCX

**Repository Path**: flagopen/FlagCX

## Basic Information

- **Project Name**: FlagCX
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-02-10
- **Last Updated**: 2025-03-16

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

## About

[FlagCX](https://github.com/FlagOpen/FlagCX.git) is a scalable and adaptive cross-chip communication library developed with the backing of the Beijing Academy of Artificial Intelligence (BAAI).

FlagCX is also a part of [FlagAI-Open](https://flagopen.baai.ac.cn/), an open-source initiative by BAAI that aims to foster an open-source ecosystem for AI technologies. It serves as a platform where developers, researchers, and AI enthusiasts can collaborate on various AI projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community.

FlagCX leverages native collective communication libraries to provide full support for single-chip communication on different platforms. In addition to its native x-CCL support, FlagCX provides an original device-buffer RDMA design that offers advanced support for cross-chip, high-performance send/recv operations (the `CORE` module), which can also be integrated with native x-CCL backends to enable optimized cross-chip collective communications. The currently supported communication backends and their capabilities are listed below:

| Backend       | NCCL | IXCCL | CNCL | GLOO   | CORE+x-CCL |
|:--------------|:-----|:------|:-----|:-------|:-----------|
| Mode          | Homo | Homo  | Homo | Hetero | Hetero     |
| send          | ✓    | ✓     | ✓    | ✓      | ✓          |
| recv          | ✓    | ✓     | ✓    | ✓      | ✓          |
| broadcast     | ✓    | ✓     | ✓    | ✘      | ✓          |
| gather        | ✓    | ✓     | ✓    | ✘      | ✓          |
| scatter       | ✓    | ✓     | ✓    | ✘      | ✓          |
| reduce        | ✓    | ✓     | ✓    | ✘      | ✓          |
| allreduce     | ✓    | ✓     | ✓    | ✓      | ✓          |
| allgather     | ✓    | ✓     | ✓    | ✓      | ✓          |
| reducescatter | ✓    | ✓     | ✓    | ✘      | ✓          |
| alltoall      | ✓    | ✓     | ✓    | ✓      | ✓          |
| alltoallv     | ✓    | ✓     | ✓    | ✓      | ✓          |
| group ops     | ✓    | ✓     | ✓    | ?      | ✘          |

Note that the `Homo` and `Hetero` modes refer to communication within homogeneous and heterogeneous clusters, respectively. The supported native collective communication libraries can be found through the links below:

- [NCCL](https://github.com/NVIDIA/nccl), NVIDIA Collective Communications Library.
- [IXCCL](https://www.iluvatar.com/software?fullCode=cpjs-rj-rjz), Iluvatar Corex Collective Communications Library.
- [CNCL](https://www.cambricon.com/docs/sdk_1.7.0/cncl_1.2.1/user_guide/index.html#), Cambricon Communications Library.
- [GLOO](https://github.com/facebookincubator/gloo), Gloo Collective Communications Library.

FlagCX also provides plugins that integrate with upper-layer frameworks such as PyTorch, built on its unified APIs. The table below lists the communication operations currently supported by the plugin of each framework, where the `batch_XXX` and `XXX_coalesced` ops correspond to the group primitives.

| Plugin                            | PyTorch |
|:----------------------------------|:--------|
| send                              | ✓       |
| recv                              | ✓       |
| batch_isend_irecv                 | ✓       |
| broadcast                         | ✓       |
| all_reduce                        | ✓       |
| all_reduce_coalesced              | ✘       |
| reduce                            | ✓       |
| all_gather                        | ✓       |
| all_gather_into_tensor_coalesced  | ✘       |
| gather                            | ✓       |
| scatter                           | ✓       |
| reduce_scatter                    | ✓       |
| reduce_scatter_tensor_coalesced   | ✘       |
| all_to_all                        | ✓       |
| all_to_all_single                 | ✓       |
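As a rough illustration of how these ops surface to users, the minimal sketch below drives an `all_reduce` through `torch.distributed` on top of the FlagCX plugin. It is a sketch only: the module name `flagcx`, the backend string `"flagcx"`, and the use of CUDA devices are assumptions, so consult the plugin documentation for the exact identifiers on your platform.

```python
# Minimal sketch (assumptions: the plugin is importable as `flagcx` and registers
# a torch.distributed backend named "flagcx"; CUDA devices are used for illustration).
import torch
import torch.distributed as dist

import flagcx  # hypothetical import that registers the FlagCX backend with PyTorch


def main():
    # Relies on the standard env:// rendezvous set up by a launcher such as torchrun.
    dist.init_process_group(backend="flagcx")
    rank = dist.get_rank()
    device = torch.device("cuda", rank % torch.cuda.device_count())

    # all_reduce is one of the ops marked as supported in the plugin table above.
    x = torch.ones(1024, device=device) * (rank + 1)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with a standard multi-process launcher (for example `torchrun --nproc_per_node=<N> example.py`), each rank binds to a local device and the reduction runs over the FlagCX backend.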
## Quick Start

### Build

1. Clone the repository:

   ```sh
   git clone https://github.com/FlagOpen/FlagCX.git
   ```

2. Build the library with the flag that targets your platform:

   ```sh
   cd FlagCX
   make [USE_NVIDIA/USE_ILUVATAR_COREX/USE_CAMBRICON/USE_GLOO]=1
   ```

The default install path is `build/`; set `BUILDDIR` to specify a different build path. You may also define `DEVICE_HOME` and `CCL_HOME` to point to the install paths of the device runtime and the communication library.

### Tests

Tests for FlagCX are maintained in `test/perf`:

```sh
cd test/perf
make [USE_NVIDIA/USE_ILUVATAR_COREX/USE_CAMBRICON]=1
./test_allreduce -b 128M -e 8G -f 2
```

Note that the default MPI install path is set to `/usr/local/mpi`; you may specify a different MPI path with:

```sh
make MPI_HOME=<path to MPI install>
```

All tests support the same set of arguments:

* Sizes to scan
  * `-b <min size>` minimum size to start with. Default: 1M.
  * `-e <max size>` maximum size to end at. Default: 1G.
  * `-f <increment factor>` multiplication factor between sizes. Default: 2.
* Performance
  * `-w <warmup iterations>` number of warmup iterations (not timed). Default: 5.
  * `-n <iterations>` number of iterations. Default: 20.
* Utils
  * `-p <0/1>` print buffer info. Default: 0.
  * `-h` print help message. Default: disabled.

## License

This project is licensed under the [Apache License (Version 2.0)](https://github.com/FlagOpen/FlagCX/blob/main/LICENSE).