Ai
3 Star 1 Fork 1

Gitee 极速下载/Exllama

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
文件
此仓库是为了提升国内下载速度的镜像仓库,每日同步一次。 原始仓库: https://github.com/turboderp/exllama
克隆/下载
example_basic.py 1.54 KB
一键复制 编辑 原始数据 按行查看 历史
turboderp 提交于 2023-09-08 23:00 +08:00 . Support for sharded models
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator
import os, glob
# Directory containing model, tokenizer, generator
model_directory = "/mnt/str/models/llama-13b-4bit-128g/"
# Locate files we need within that directory
tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
st_pattern = os.path.join(model_directory, "*.safetensors")
model_path = glob.glob(st_pattern)
# Create config, model, tokenizer and generator
config = ExLlamaConfig(model_config_path) # create config from config.json
config.model_path = model_path # supply path to model weights file
model = ExLlama(config) # create ExLlama instance and load the weights
tokenizer = ExLlamaTokenizer(tokenizer_path) # create tokenizer from tokenizer model file
cache = ExLlamaCache(model) # create cache for inference
generator = ExLlamaGenerator(model, tokenizer, cache) # create generator
# Configure generator
generator.disallow_tokens([tokenizer.eos_token_id])
generator.settings.token_repetition_penalty_max = 1.2
generator.settings.temperature = 0.95
generator.settings.top_p = 0.65
generator.settings.top_k = 100
generator.settings.typical = 0.5
# Produce a simple generation
prompt = "Once upon a time,"
print (prompt, end = "")
output = generator.generate_simple(prompt, max_new_tokens = 200)
print(output[len(prompt):])
Loading...
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
Python
1
https://gitee.com/mirrors/Exllama.git
git@gitee.com:mirrors/Exllama.git
mirrors
Exllama
Exllama
master

搜索帮助