# learning-lm-rs

**Repository Path**: asphodelus_dev/learning-lm-rs

## Basic Information

- **Project Name**: learning-lm-rs
- **Description**: xxxxxxxxxxxxxxxxxxxxxxx
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-01-25
- **Last Updated**: 2025-02-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

### 1. self_attention

总结：对每个KV头和对应的Q组，逐元素计算Q与K的点积，应用masked_softmax确保因果注意力（仅关注历史信息），将注意力权重与V值相乘，累加得到输出张量。

### 2. 功能：文本生成

总结：将用户的输入使用tokenizer转换为对应token进行推理，随机采样后将推理结果再用tokenizer转换为文本进行输出

生成效果如下

```
% cargo run --bin story
Once upon a timeOnce upon a time, a brave cat named Tom lived on a tree in his yard. One day, Tom saw a big pink tree with many branches that looked pretty. Tom wanted to do it. He jumped on the tree and began to eat it.
Tom saw his friend, a dog named Max. Max said, "I want to eat it, Tom!" Max thought about what Sam said. Max tried to rest, but he could not get the pit on the tree. Tom did not know how to eat the pile of of the prune.
The prune was cold and cool. Tom was still a tree to get the apples inside. Now, he could eat the apples with the apples. Tom took the apples with his mouth and took the apples to share.
Tom and Max were having fun together. They had so much fun with their new prunch. They had a great time together and had a great time. They both played together and had a great time.<|end_story|>
```


### 3. AI对话

总结：手动管理对话模板的格式化、会话对应kvcache的结构。对ai对话特化generate函数的实现，将cache移到cli应用部分，以对更长的上下文进行缓存。

生成效果如下

```
% cargo run --bin chat
欢迎使用聊天助手！输入 'quit' 结束对话。
> Hello, how are you?
Asistant: I don't have any personal opinion about how you have been a part of my business, but here are some suggestions for how you can create a brand that reflects your company's culture:

1. Use your brand logo and tag team:

- choose a specific brand that is specific to your business or industry. - choose a brand that represents your company's culture and values. - use your brand to create a brand that resonates with your target audience. - use your brand logo and tag team to create a brand that is unique and engaging. - use your brand logo to build your brand's brand image and showcase your brand's brand as a standalone brand. - use your brand logo to create a brand logo and tag team, so you can stand out from the crowd. - use your brand logo to create a logo and tag team to create a brand logo. This will help to create a strong presence and increase brand recognition.
> 
```


### 4. cuda backend

总结：实现相应算子的cuda代码,矩阵乘使用cublas实现，将cuda代码编译为ptx。cuda代码使用宏对代码进行简化。在Llama定义中添加用于处理cuda的operator，推理时调用算子的函数加载对应ptx。


文本生成效果如下

```
% cargo run --bin story --features="cuda"
Once upon a timeOnce upon a time, a dog named Spot lived in a small pool. He was eager to visit his friend Casty lived in a house with a fit who loved to measure it would look more powder and look for it.
One day, while Spot was walking, he met a new friend, a small cat named Fluffy, singing her mum, "Chould you like to pas?". Mrlly agreed. Spot nodded his head and said, "I like it." Spot was very happy and excited.
At night, Spot and Cassy went to well look at the pool. They all ran around, which they had ever before. After they finished, they saw many toys! A kind man had a picture of a happy song and a friend who was not worried anymore. They shared the new shape with the new muscles and made funny noises. They were happy they could help.<|end_story|>
```

ai对话效果如下

```
% cargo run --bin chat --features="cuda"
欢迎使用聊天助手！输入 'quit' 结束对话。
> Hello
Asistant: I am looking for a reliable and user-friendly website for searching for helpful and high-quality images. However, you can use it for search engine optimization and keyword research in various industries. You can also create a webpage that is easily accessible and easy to navigate for search engines.
> 
```


### 5. 混合精度

总结：使用`half`crate，将原来的函数改为了泛型函数

### 6. web api

总结：使用sse来实现常见llm网页应用中的打字机效果。为了实现web api特化generate的实现，使用异步框架async-stream将token的decode过程转换为一个异步流，充分利用tokio的协程能力。使用ntex框架实现webapi，ntex在rust web框架benchmark中有较好的性能表现。web前端使用react框架进行实现。


后台服务启动后控制台输出如下

```
% cargo run --bin web --features="web"  
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.13s
     Running `target/debug/web`
Starting server at http://127.0.0.1:8080
```

前端部分进入frontend子目录运行如下命令启动

```bash
pnpm i 
pnpm run dev
```


### 待提高

当前后端不支持天数，使用`cuda_driver`框架可以实现对天数的支持。算子实现较为粗糙，未实现广播等操作。

页面过于单调，功能较少，可用前端框架实现更复杂页面。未实现orca。