Running your own AI coding assistant locally using ROCm, llama.cpp, and aider.
Why Local AI Coding?
Cloud AI coding assistants are convenient, but local models offer:
- Better privacy
- Lower long-term cost
- Offline development
- Faster iteration for small/medium models
- Full control over models and tooling
With modern AMD GPUs and ROCm support improving rapidly, Ubuntu 26.04 makes it surprisingly easy to run local coding models.
In this guide, we'll set up:
- Ubuntu 26.04
- ROCm GPU stack
- llama.cpp
- aider
- Local coding models (Qwen, DeepSeek-Coder, etc.)
all running on an AMD Radeon AI R9700 GPU.
Hardware Used
My setup:
| Component | Details |
|---|---|
| GPU | AMD Radeon AI R9700 |
| OS | Ubuntu 26.04 |
| RAM | 32 GB |
| CPU | AMD Ryzen 9 9950X |
| Storage | NVMe SSD |
Step 1 - Enable deb-src Repositories
Ubuntu 26.04 uses the new DEB822 .sources format.
Enable source repositories:
sudo sed -i 's/^Types: deb$/Types: deb deb-src/' \
/etc/apt/sources.list.d/ubuntu.sources
sudo apt update
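To confirm the change, list the Types lines in the file; each enabled stanza should now read "Types: deb deb-src":
grep -n '^Types:' /etc/apt/sources.list.d/ubuntu.sources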
Step 2 - Install required software
sudo apt install -y \
git curl wget build-essential \
python3 python3-pip python3-venv \
cmake pkg-config \
rocminfo clinfo rocm rocm-smi
sudo usermod -a -G render,video $USER
Reboot the system.
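After the reboot, a quick sanity check that the group changes took effect; both render and video should appear in the output:
groups $USER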
Step 3 - Verify GPU Detection
Check ROCm:
rocminfo
Check OpenCL:
clinfo
You should see your R9700 GPU listed.
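rocminfo's output is long; to pull out just the device names, a filter like this is handy (the grep pattern is only a convenience, adjust to taste):
rocminfo | grep -i 'marketing name'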
Step 4 - Build llama.cpp
sudo apt build-dep llama.cpp
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake . -DBUILD_SHARED_LIBS=OFF -DGGML_VULKAN=ON
make -j32
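This builds the Vulkan backend, which is what the benchmarks below use. If you would rather target ROCm directly, llama.cpp also has a HIP backend; a sketch of that variant, assuming a recent llama.cpp tree (older trees used GGML_HIPBLAS) and gfx1201 as the R9700's architecture, which you should confirm against rocminfo:
cmake . -DBUILD_SHARED_LIBS=OFF -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1201  # gfx target is an assumption; check rocminfo
make -j32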
Step 5 - Test llama.cpp
./bin/llama-bench --list-devices
GGML_VK_VISIBLE_DEVICES=0 ./bin/llama-bench -hf llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF:Q6_K -ngl 999 -fa 1
The -hf flag fetches the model from Hugging Face on the first run and caches it locally. Sample output:
| model | size | params | backend | ngl | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| qwen35 27B Q6_K | 21.23 GiB | 27.32 B | Vulkan | 999 | 1 | pp512 | 902.06 ± 0.92 |
| qwen35 27B Q6_K | 21.23 GiB | 27.32 B | Vulkan | 999 | 1 | tg128 | 25.00 ± 0.03 |
Here pp512 is prompt-processing throughput and tg128 is token-generation throughput, both in tokens per second. If GPU acceleration is working, you should see VRAM utilization climb in:
watch -n1 rocm-smi
$ rocm-smi
=========================================== ROCm System Management Interface ===========================================
===================================================== Concise Info =====================================================
Device  Node  IDs             Temp    Power   Partitions          SCLK     MCLK     Fan     Perf  PwrCap  VRAM%  GPU%
              (DID,  GUID)    (Edge)  (Avg)   (Mem, Compute, ID)
========================================================================================================================
0       1     0x7551, 64106   66.0°C  300.0W  N/A, N/A, 0         3070Mhz  1258Mhz  32.94%  auto  300.0W  65%    100%
1       2     0x13c0, 15961   49.0°C  0.009W  N/A, N/A, 0         N/A      3000Mhz  0%      auto  N/A     4%     0%
========================================================================================================================
================================================= End of ROCm SMI Log ==================================================
Step 6 - Install aider
python3 -m venv ~/venvs/aider
source ~/venvs/aider/bin/activate
pip3 install aider-install
aider-install
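aider-install puts the aider launcher on your PATH (typically under ~/.local/bin); confirm it resolves:
aider --version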
Step 7 - Configure aider for llama.cpp
cd <target-folder>
export OPENAI_API_KEY=dummy
aider --openai-api-base http://localhost:8080/v1 --model openai/qwen --no-show-model-warnings
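Typing those flags every session gets tedious; aider can also read them from a .aider.conf.yml in the project root. A minimal sketch, assuming the same server address and the openai/qwen model alias used above (key names mirror the CLI options):
cat > .aider.conf.yml <<'EOF'
# mirrors the CLI flags used above
openai-api-base: http://localhost:8080/v1
model: openai/qwen
show-model-warnings: false
EOF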
Step 8 - Start Coding
./bin/llama-server \
-hf llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF:Q6_K \
-ngl 99 -c 262144 -fa on -np 1 \
--spec-type ngram-mod,draft-mtp --spec-draft-n-max 4
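With the server up, you can smoke-test the OpenAI-compatible endpoint before attaching aider (llama-server serves the standard /v1 routes):
curl http://localhost:8080/v1/models
Then launch aider from your project directory: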
aider --openai-api-base http://localhost:8080/v1 --model openai/qwen --no-show-model-warnings .
Monitoring GPU Usage
Useful commands:
watch -n1 rocm-smi
radeontop
htop
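radeontop is not installed by default; add it with:
sudo apt install -y radeontop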
Final Thoughts
The Linux + ROCm ecosystem has improved dramatically over the past few years.
With Ubuntu 26.04, getting ROCm working is nearly turnkey: a few packages, a group change, and a reboot.
For privacy-conscious developers, this setup is a compelling alternative to cloud-hosted coding assistants.