# cpp-transformer

A C++ implementation of the Transformer with no special library dependencies, covering both training and inference.

This project replicates [Chapter 11](https://d2l.ai/chapter_attention-mechanisms-and-transformers/transformer.html) of *Dive into Deep Learning*, which covers Transformers, by building an English-French machine translation model in C++. The project develops its own automatic differentiation framework and depends only on the C++ standard library, aiming to help users understand the underlying principles of Transformers.

## Project Highlights

### Principle-Oriented

The model is constructed from fundamental operations, without relying on any deep learning framework. This makes the operational mechanism of the Transformer explicit.

### Automatic Differentiation

A self-developed automatic differentiation framework takes care of gradient computation, making it easier to understand the backpropagation algorithm (see the sketch at the end of this section).

### Low Dependencies

The project depends only on the C++ standard library. Its performance cannot match frameworks backed by optimized libraries, but every computational detail stays visible. This allows users to gain a deep understanding of the backpropagation algorithm and the underlying principles of the Transformer architecture.
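To make the idea behind the automatic differentiation framework concrete, here is reverse-mode autodiff reduced to its core, on scalar values, in the spirit of micrograd (see Reference Materials). This is only a sketch of the principle: the `Value` class, the operators, and the `backward` function are hypothetical names and do not reflect this project's actual Tensor or ops API.

```
#include <functional>
#include <iostream>
#include <memory>
#include <unordered_set>
#include <vector>

// One scalar node of the computation graph: a value, its gradient, the
// nodes it was computed from, and a closure applying the local chain rule.
// (Illustrative only -- not this repository's Tensor class.)
struct Value {
    double data = 0.0;
    double grad = 0.0;
    std::vector<std::shared_ptr<Value>> prev;
    std::function<void()> backward_fn = [] {};
    explicit Value(double d) : data(d) {}
};

using V = std::shared_ptr<Value>;

V operator+(const V& a, const V& b) {
    auto out = std::make_shared<Value>(a->data + b->data);
    out->prev = {a, b};
    Value* o = out.get();              // raw pointer avoids a shared_ptr cycle
    out->backward_fn = [a, b, o] {     // d(a+b)/da = d(a+b)/db = 1
        a->grad += o->grad;
        b->grad += o->grad;
    };
    return out;
}

V operator*(const V& a, const V& b) {
    auto out = std::make_shared<Value>(a->data * b->data);
    out->prev = {a, b};
    Value* o = out.get();
    out->backward_fn = [a, b, o] {     // d(a*b)/da = b, d(a*b)/db = a
        a->grad += b->data * o->grad;
        b->grad += a->data * o->grad;
    };
    return out;
}

// Topologically sort the graph, then run each node's backward step in reverse.
void backward(const V& root) {
    std::vector<V> topo;
    std::unordered_set<const Value*> seen;
    std::function<void(const V&)> build = [&](const V& v) {
        if (!seen.insert(v.get()).second) return;
        for (const auto& p : v->prev) build(p);
        topo.push_back(v);
    };
    build(root);
    root->grad = 1.0;
    for (auto it = topo.rbegin(); it != topo.rend(); ++it) (*it)->backward_fn();
}

int main() {
    auto x = std::make_shared<Value>(2.0);
    auto w = std::make_shared<Value>(3.0);
    auto b = std::make_shared<Value>(1.0);
    auto y = x * w + b;                                   // y = 2*3 + 1 = 7
    backward(y);
    std::cout << "dy/dx = " << x->grad                    // 3 (= w)
              << "  dy/dw = " << w->grad                  // 2 (= x)
              << "  dy/db = " << b->grad << std::endl;    // 1
}
```

In the project itself the same idea is applied to tensors rather than scalars, with tensor dependencies pre-computed and memory allocated in one batch (see the V2 notes in the Update Log below).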
## Update Log

V2 - [2025-05-29]

1. Redesigned Tensor Class
2. Redesigned Backend Ops Interface
3. Redesigned Computation Flow
   * Pre-computed Tensor Dependency Logic and Batch Memory Allocation
   * Compact Memory Layout
   * Efficient zero_grad Implementation
4. Closer Implementation to Tensor Semantics in D2L Chapter 11
5. Enhanced Test Cases

V2.01 - [2025-06-08]

1. Added support for a simple language model.

V2.02 - [2025-06-14]

1. Added Metal support.

## Quick start

### build

#### for gpu

```
./build_gpu.sh
```

The program compiled this way supports both CPU and GPU; use the `-g` parameter to switch between them.

#### for cpu

```
./build_cpu.sh
```

If you don't have a CUDA environment, you can also try the CPU version. Note that this version is extremely slow and is only intended for comparing with and verifying the correctness of the GPU version.

#### for mac gpu

Metal is now supported, so the GPU on a Mac can be used as well. My MacBook hardware and software information:

* Chip: Apple M1
* OS Version: 15.5 (24F74)

```
./build_mac_gpu.sh
```

#### for mac cpu

```
./build_mac_cpu.sh
```

## Translation

### training

The amount of training data (512 sentence pairs) is aligned with the Chapter 11 Transformer example in d2l.

```
$ time ./transformer -e 30
corpus : ./resources/fra_preprocessed_512.txt
epochs : 30
batch_size : 128
gpu : 1
learning rate : 0.001
checkpoint :
enc_vocab_size : 195
dec_vocab_size : 214
bos_id : 3
eos_id : 1
src_pad_id : 0
tgt_pad_id : 0
predicting : false
batch_size : 128
epoch 0 : [512/512]loss : 4.62015
epoch 1 : [512/512]loss : 3.39543
epoch 2 : [512/512]loss : 2.96776
epoch 3 : [512/512]loss : 2.45226
epoch 4 : [512/512]loss : 2.20506
epoch 5 : [512/512]loss : 1.94157
epoch 6 : [512/512]loss : 1.76016
epoch 7 : [512/512]loss : 1.58783
epoch 8 : [512/512]loss : 1.46
epoch 9 : [512/512]loss : 1.35267
epoch 10 : [512/512]loss : 1.23456
epoch 11 : [512/512]loss : 1.11818
epoch 12 : [512/512]loss : 1.02721
epoch 13 : [512/512]loss : 0.930991
epoch 14 : [512/512]loss : 0.868043
epoch 15 : [512/512]loss : 0.797028
epoch 16 : [512/512]loss : 0.730525
epoch 17 : [512/512]loss : 0.685426
epoch 18 : [512/512]loss : 0.670126
epoch 19 : [512/512]loss : 0.635286
epoch 20 : [512/512]loss : 0.580065
epoch 21 : [512/512]loss : 0.558903
epoch 22 : [512/512]loss : 0.528207
epoch 23 : [512/512]loss : 0.49648
epoch 24 : [512/512]loss : 0.482626
epoch 25 : [512/512]loss : 0.456417
epoch 26 : [512/512]loss : 0.452462
epoch 27 : [512/512]loss : 0.432102
epoch 28 : [512/512]loss : 0.408004
epoch 29 : [512/512]loss : 0.395327
checkpoint saved : ./checkpoints/checkpoint_20250603_111836_29.bin

real    0m44.835s
user    0m44.531s
sys     0m0.272s
```

### inference

Run translation inference with the checkpoint file generated above. The input sentences are read from test.txt.

```
$ ./transformer -e 0 -c ./checkpoints/checkpoint_20250603_111836_29.bin
corpus : ./resources/fra_preprocessed_512.txt
epochs : 0
batch_size : 128
gpu : 1
learning rate : 0.001
checkpoint : ./checkpoints/checkpoint_20250603_111836_29.bin
enc_vocab_size : 195
dec_vocab_size : 214
bos_id : 3
eos_id : 1
src_pad_id : 0
tgt_pad_id : 0
predicting : true
batch_size : 1
loading from checkpoint : ./checkpoints/checkpoint_20250603_111836_29.bin
loaded from checkpoint
serving mode
test file : ./test.txt
go . -> va .
i lost . -> j'ai perdu .
he's calm . -> il est mouillé .
i'm home . -> je suis chez moi .
```
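The inference run above prints `bos_id : 3` and `eos_id : 1`: the decoder builds each translation token by token, starting from the `<bos>` id and stopping once it emits `<eos>` (or hits a length limit). The sketch below shows the simplest such loop, greedy decoding, in isolation. It is purely illustrative: `StepFn`, `greedy_decode`, and the dummy model are made-up names, and the repository's actual decoding code is not reproduced here.

```
#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

// next_token_logits(src_ids, generated_so_far) -> logits over the target vocabulary.
// In the real program this would be a full encoder/decoder forward pass; here it is
// just a callback so that the decoding loop itself stays visible.
using StepFn = std::function<std::vector<float>(const std::vector<int>&,
                                                const std::vector<int>&)>;

std::vector<int> greedy_decode(const std::vector<int>& src_ids, StepFn next_token_logits,
                               int bos_id, int eos_id, int max_len) {
    std::vector<int> out = {bos_id};             // decoding starts from <bos>
    for (int step = 0; step < max_len; ++step) {
        std::vector<float> logits = next_token_logits(src_ids, out);
        int next = static_cast<int>(
            std::max_element(logits.begin(), logits.end()) - logits.begin());
        out.push_back(next);
        if (next == eos_id) break;               // stop when the model emits <eos>
    }
    return out;                                   // <bos>, tokens..., <eos>
}

int main() {
    // Dummy stand-in for the model: always ranks token 1 highest (<eos> in the logs above).
    StepFn dummy = [](const std::vector<int>&, const std::vector<int>&) {
        return std::vector<float>{0.f, 1.f, 0.f, 0.f};
    };
    std::vector<int> result = greedy_decode({5, 6, 7}, dummy, /*bos_id=*/3, /*eos_id=*/1, 10);
    for (int id : result) std::cout << id << " ";   // prints: 3 1
    std::cout << std::endl;
}
```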
## Language Model

A language model built with a two-layer decoder. It is trained on the first 256 tokens of timemachine_preprocessed.txt and, during inference, continues the text that begins in test_lm.txt.

### training

```
$ ./lm -e 10 -m 256
corpus : ./resources/time_machine/timemachine_preprocessed.txt
epochs : 10
batch_size : 16
gpu : 1
learning rate : 0.001
checkpoint :
max_words_cnt : 256
Allocating memory for tensors : 36609236 bytes, for c_tensors: 3194706328 bytes for grad_tensors: 1241779004 bytes
epoch 0 : [224/256]loss : 5.54111
epoch 1 : [224/256]loss : 1.36544
epoch 2 : [224/256]loss : 0.178868
epoch 3 : [224/256]loss : 0.0472531
epoch 4 : [224/256]loss : 0.0245251
epoch 5 : [224/256]loss : 0.0195127
epoch 6 : [224/256]loss : 0.0174135
epoch 7 : [224/256]loss : 0.0162055
epoch 8 : [224/256]loss : 0.0154597
epoch 9 : [224/256]loss : 0.0147902
checkpoint saved : ./checkpoints/checkpoint_20250608_200259_9.bin
```

### inference

```
$ ./lm -e 0 -c ./checkpoints/checkpoint_20250608_200259_9.bin
corpus : ./resources/time_machine/timemachine_preprocessed.txt
epochs : 0
batch_size : 16
gpu : 1
learning rate : 0.001
checkpoint : ./checkpoints/checkpoint_20250608_200259_9.bin
max_words_cnt : 256
Allocating memory for tensors : 36355416 bytes, for c_tensors: 17206900 bytes for grad_tensors: 14209596 bytes
loading from checkpoint : ./checkpoints/checkpoint_20250608_200259_9.bin
loaded from checkpoint
serving mode
test file : ./test_lm.txt
sentence : the time machine by h g wells i the time traveller for so it will be convenient to speak of him was expounding a recondite matter to us his grey eyes shone and twinkled and his usually pale face was flushed and animated the fire burned brightly and animated the fire burned brightly
-----------------
```

### pre-trained lm model

* [model link1](https://cpp-transformer-1252366230.cos.ap-beijing.myqcloud.com/lm/checkpoint_20250617_162040_7.bin)
* [model link2](https://cpp-transformer-us-1252366230.cos.na-ashburn.myqcloud.com/lm/checkpoint_20250617_162040_7.bin)

This model was trained for 8 epochs on the full text of the novel *The Time Machine*.

## handwritten_recognition

To verify some functions more quickly, I have included a handwritten digit recognition program.

```
./handwritten_recognition
images magic : 2051
label magic : 2049
lables_num : 60000
data loaded.
Actions:
...
evaluating : [10000/10000] correct : 9501
epoch : 9 [50000/50000] loss : 0.150985
evaluating : [10000/10000] correct : 9493
```

### graphviz supported

You can add a line like `printDotGraph();` to make the program output an out.dot file that records the tensor computation topology. For example, in mnist.cpp:

```
printAllActions();
printDotGraph(); // here
allocMemAndInitTensors();
```

If you have Graphviz installed, you can convert the out.dot file into a PNG image with:

```
dot -Tpng out.dot -o out.png
```

Here is an example of the resulting image:

![alt text](handwritten_recognition_topo.png)

## legacy version

[v1](https://github.com/freelw/cpp-transformer/tree/v1_freeze_20250529)

## Derivation of backpropagation gradient formulas

* [Derivation](doc/equations/readme.md)

## Reference Materials

* [Dive into Deep Learning](https://d2l.ai/)
* [recognizing_handwritten_digits](https://github.com/freelw/recognizing_handwritten_digits)
* [micrograd](https://github.com/EurekaLabsAI/micrograd)