# apex
**Repository Path**: spearNeil/apex
## Basic Information
- **Project Name**: apex
- **Description**: APEX
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-03-06
- **Last Updated**: 2023-01-07
## README
# APEX: A High-Performance Learned Index on Persistent Memory
More details are described in our [preprint](https://arxiv.org/abs/2105.00683).
## Building
### Dependencies
We tested our build with Linux kernel 5.10.11 and GCC 10.2.0. A proper build requires Linux kernel >= 4.17 and glibc >= 2.29.
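The version requirements above can be checked before building. The following is a minimal sketch, assuming GNU coreutils (`sort -V`) and a glibc-based system where `getconf GNU_LIBC_VERSION` is available; the helper name `version_ge` is made up for this example and is not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on GNU `sort -V` for version-aware ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Strip any distribution suffix, e.g. "5.10.11-arch1-1" -> "5.10.11".
kernel="$(uname -r | cut -d- -f1)"
# Prints e.g. "glibc 2.31" on glibc systems; empty on others.
glibc="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"

if version_ge "$kernel" 4.17; then
  echo "kernel $kernel: ok"
else
  echo "kernel $kernel: too old (need >= 4.17)"
fi

if [ -n "$glibc" ] && version_ge "$glibc" 2.29; then
  echo "glibc $glibc: ok"
else
  echo "glibc ${glibc:-unknown}: need >= 2.29"
fi
```

The script only reports; it does not abort the build, so it can be run safely on any machine.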
### Compiling
The following commands compile the project under a `build` directory:
```bash
git clone https://github.com/baotonglu/apex.git
cd apex
./build.sh
```
## Running benchmark
### Persistent memory pool path
Please ensure that your PM device is configured in App Direct mode and mounted on a file system with DAX enabled.
Change the [PM pool path](https://github.com/baotonglu/apex/blob/ccd172c1034ec235027aebf0d481b9c583a91ec0/src/util/allocator.h#L24) of our allocator to the memory path on your own server before testing.
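One way to confirm that the file system is mounted with DAX is to inspect its mount options. The sketch below assumes `findmnt` from util-linux is installed. The mount point `/mnt/pmem0` and the helper `has_dax_option` are hypothetical, and the exact option string (`dax` or `dax=always`) depends on the kernel and file system.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a comma-separated mount-option
# string contains "dax" or "dax=always" as a whole option.
has_dax_option() {
  printf '%s\n' "$1" | grep -Eq '(^|,)dax(=always)?(,|$)'
}

# Hypothetical mount point; replace with your own PM mount.
PM_MOUNT="${PM_MOUNT:-/mnt/pmem0}"

if command -v findmnt >/dev/null 2>&1; then
  # Look up the mount options of the file system containing PM_MOUNT.
  opts="$(findmnt -n -o OPTIONS --target "$PM_MOUNT" 2>/dev/null || true)"
  if has_dax_option "$opts"; then
    echo "$PM_MOUNT: DAX enabled"
  else
    echo "$PM_MOUNT: DAX not found in mount options (${opts:-none})"
  fi
fi
```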
### Benchmark setting
We run the tests on a single NUMA node with 24 physical CPU cores. We pin threads to physical cores compactly, assuming thread ID == 2 * core ID (e.g., for a dual-socket system, we assume cores 0, 2, 4, ... are located in socket 0). See also the `total.sh` and `run.sh` scripts for example benchmarks and easy testing of the index. The benchmark binary supports the following arguments:
```bash
./build/benchmark [OPTION...]
--keys_file             the name of the dataset
--keys_file_type        the reading method for the dataset (binary/text/sosd)
--keys_type             the type of the key (double/uint64)
--total_num_keys        total number of keys in the dataset
--init_num_keys         the number of keys to bulk-load before testing
--workload_keys         the number of keys in the workload
--operation             the query type in the workload (insert/search/erase/update/range/mixed)
--insert_frac           the fraction of inserts in a mixed search-insert workload
--lookup_distribution   the access distribution of the workload (uniform/zipf)
--theta                 the skewness of zipf (e.g., 0.9)
--using_epoch           whether to register the epoch at the application level (0/1)
--thread_num            the number of worker threads
--index                 the name of the index to evaluate (apex)
--random_shuffle        whether to randomly shuffle the dataset
--sort_bulkload         whether to sort the keys before bulk-loading
```
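As an illustration of how these arguments combine, here is a sketch of a mixed workload on the Longitudes dataset. The `--flag=value` syntax is an assumption carried over from ALEX-style benchmarks, and the dataset path and key counts are hypothetical; consult `run.sh` for the exact form used by this repository. `--random_shuffle` and `--sort_bulkload` are omitted because their value format is not documented here.

```bash
#!/usr/bin/env bash
# Hypothetical dataset path; point this at a downloaded dataset file.
KEYS_FILE="${KEYS_FILE:-/path/to/longitudes-200M.bin}"

# Assumed --flag=value syntax; all numeric values are illustrative.
args=(
  --keys_file="$KEYS_FILE"
  --keys_file_type=binary
  --keys_type=double            # Longitudes holds 8-byte floats
  --total_num_keys=200000000
  --init_num_keys=100000000     # bulk-load half of the keys
  --workload_keys=100000000
  --operation=mixed
  --insert_frac=0.5
  --lookup_distribution=zipf
  --theta=0.9
  --using_epoch=1
  --thread_num=24               # one thread per physical core
  --index=apex
)

# Dry run: print the command instead of executing it.
echo ./build/benchmark "${args[@]}"
```

To actually run the benchmark, replace the final line with `./build/benchmark "${args[@]}"`.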
## Competitors
This section lists the source code of the indexes compared against APEX: LB+-Tree [1], DPTree [2], uTree [3], FPTree [4], BzTree [5], and FAST+FAIR [6].

- [1] https://github.com/schencoding/lbtree
- [2] https://github.com/zxjcarrot/DPTree-code
- [3] https://github.com/thustorage/nvm-datastructure
- [4] https://github.com/sfu-dis/fptree
- [5] https://github.com/sfu-dis/bztree
- [6] https://github.com/DICL/FAST_FAIR
## Datasets
- [Longitudes (200M 8-byte floats)](https://drive.google.com/file/d/1zc90sD6Pze8UM_XYDmNjzPLqmKly8jKl/view?usp=sharing)
- [Longlat (200M 8-byte floats)](https://drive.google.com/file/d/1mH-y_PcLQ6p8kgAz9SB7ME4KeYAfRfmR/view?usp=sharing)
- [Lognormal (190M 8-byte ints)](https://drive.google.com/file/d/1y-UBf8CuuFgAZkUg_2b_G8zh4iF_N-mq/view?usp=sharing)
- [YCSB (200M 8-byte ints)](https://drive.google.com/file/d/1Q89-v4FJLEwIKL3YY3oCeOEs0VUuv5bD/view?usp=sharing)
- [FB (200M 8-byte ints)](https://github.com/learnedsystems/SOSD)
- [TPCE (259M 8-byte ints)](https://github.com/sfu-dis/ermia/tree/master/benchmarks/tpce_keys)
## Acknowledgements
Our implementation is based on the code of [ALEX](https://github.com/microsoft/ALEX).