Llama Coder is a self-hosted GitHub Copilot replacement for VS Code. It uses Ollama and codellama to provide autocomplete that runs on your own hardware. Works best with a Mac M1/M2/M3 or an RTX 4090.
Minimum required RAM: 16GB, and more is better, since even the smallest model takes about 5GB. The best setup is a dedicated machine with an RTX 4090: install Ollama on that machine and point the extension at it via the endpoint setting to offload inference. The second best option is a MacBook M1/M2/M3 with enough RAM (more is better, but around 10GB of headroom is enough). On Windows notebooks it runs well with a decent GPU, but a dedicated machine with a good GPU is recommended; a dedicated gaming PC is perfect.
Install Ollama on your local machine, then launch the extension in VS Code; everything should work as is.
Install Ollama on the dedicated machine and configure the endpoint in the extension settings. Ollama usually uses port 11434 and binds to `127.0.0.1`; to change this, set `OLLAMA_HOST` to `0.0.0.0` so it accepts connections from other machines.
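Before pointing the extension at the remote machine, it can help to confirm that the endpoint is actually reachable from your laptop. A minimal sketch in TypeScript (Node 18+), assuming Ollama's standard `/api/tags` endpoint, which lists the models installed on that host; the address below is a placeholder, not part of this project:

```typescript
// check-ollama.ts — sketch of a reachability check for a remote Ollama endpoint.
// The host below is an assumed example; replace it with your dedicated machine's address.
const OLLAMA_ENDPOINT = "http://192.168.1.50:11434";

async function checkOllama(endpoint: string): Promise<void> {
  // GET /api/tags returns the models available on the Ollama server.
  const res = await fetch(`${endpoint}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama not reachable: HTTP ${res.status}`);
  }
  const data = (await res.json()) as { models: { name: string }[] };
  console.log("Available models:", data.models.map((m) => m.name).join(", "));
}

checkOllama(OLLAMA_ENDPOINT).catch((err) => {
  // A connection error here usually means OLLAMA_HOST is still bound to 127.0.0.1.
  console.error(err);
});
```

If this fails with a connection error while Ollama works locally on the dedicated machine, the server is most likely still bound to `127.0.0.1`.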
Currently Llama Coder supports only Codellama. The model is quantized in different ways, but our tests show that `q4` quantization is the optimal way to run the network. When selecting a model, bigger is better: larger models perform better, so pick the largest model and the highest quantization your machine can handle. The default is `stable-code:3b-code-q4_0`, which should work everywhere and outperforms most other models.
| Name | RAM/VRAM | Notes |
|---|---|---|
| stable-code:3b-code-q4_0 | 3GB | |
| codellama:7b-code-q4_K_M | 5GB | |
| codellama:7b-code-q6_K | 6GB | m |
| codellama:7b-code-fp16 | 14GB | g |
| codellama:13b-code-q4_K_M | 10GB | |
| codellama:13b-code-q6_K | 14GB | m |
| codellama:34b-code-q4_K_M | 24GB | |
| codellama:34b-code-q6_K | 32GB | m |
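If a model from the table is not yet present on the Ollama host, it can be pulled ahead of time. A minimal sketch, assuming Ollama's standard `/api/pull` endpoint, which streams newline-delimited JSON progress objects while downloading; running `ollama pull` directly on the machine achieves the same thing:

```typescript
// pull-model.ts — sketch of pre-pulling the default model over Ollama's HTTP API.
const OLLAMA_ENDPOINT = "http://127.0.0.1:11434"; // adjust for a remote machine

async function pullModel(endpoint: string, model: string): Promise<void> {
  // POST /api/pull streams JSON status lines such as {"status":"downloading ..."}.
  const res = await fetch(`${endpoint}/api/pull`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: model }),
  });
  if (!res.ok || !res.body) {
    throw new Error(`Pull failed: HTTP ${res.status}`);
  }
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value)); // print progress as it arrives
  }
}

pullModel(OLLAMA_ENDPOINT, "stable-code:3b-code-q4_0").catch(console.error);
```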
Most problems can be diagnosed from the plugin's logs in the VS Code extension output panel.