Since early 2025, I've been using LLMs like ChatGPT and DeepSeek to learn programming and to help with research. LLMs make this easier because they can collect and process data into information that's easier for me to understand.
At the time, I didn't know that LLMs could run on a local computer. After reading several articles about SLMs (Small Language Models) that can run locally, I became interested in running a model on my own machine.
For software, I tried Llama.cpp, which has Vulkan acceleration support.
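For reference, a Vulkan-enabled build of Llama.cpp can be produced roughly like this. This is a sketch from my understanding of the project's CMake options, and `model.gguf` is a placeholder for whatever quantized model file you download:

```shell
# Build llama.cpp with the Vulkan backend (requires the Vulkan SDK)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Run a GGUF model; -ngl sets how many layers are offloaded to the GPU
# (99 effectively means "all of them")
./build/bin/llama-cli -m model.gguf -ngl 99 -p "Hello"
```

Setting `-ngl 0` instead keeps everything on the CPU, which is how the CPU-only numbers below can be reproduced.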
Initially, I used the computer I usually use for gaming in my spare time. The specifications are as follows:
- Motherboard: Intel H81
- Processor: Intel Core i5-4670, 3.4-3.7 GHz, quad core
- RAM: 16 GB DDR3 1600 MHz
- GPU: Nvidia GeForce GTX 1050, 4 GB GDDR5
- OS: Windows 10
- SSD: 256 GB
- Hard disk: 1 TB
With the above specifications, I could run models with 4B and 7B parameters, but anything larger crashed. A 4B model produced 5 tokens/s CPU-only and reached 11 tokens/s with Vulkan acceleration; a 7B model produced 3 tokens/s CPU-only and reached 8 tokens/s with Vulkan.
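I measured these rates from ordinary generation runs, but Llama.cpp also ships a dedicated benchmarking tool, `llama-bench`, and something like the following should reproduce the same CPU-vs-Vulkan comparison (model path is a placeholder):

```shell
# Compare CPU-only and GPU-offloaded decode speed.
# -ngl 0 keeps all layers on the CPU; -ngl 99 offloads all of them.
./build/bin/llama-bench -m model.gguf -ngl 0
./build/bin/llama-bench -m model.gguf -ngl 99
```

The tool reports prompt-processing and token-generation speeds separately, which is useful because the two are bottlenecked differently.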
From these measurements, I concluded that the bottlenecks were the DDR3 RAM bandwidth and the GPU's VRAM, which at 4 GB is too small: a model with 7B parameters or more no longer fits in VRAM and spills into slower system RAM.
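That conclusion matches a back-of-the-envelope check: during decoding, each generated token has to stream essentially all of the model's weights through RAM once, so memory bandwidth caps tokens/s. A rough sketch, where both figures are assumptions rather than measurements:

```shell
# Rough decode-speed ceiling for a memory-bandwidth-bound LLM.
# Both numbers below are assumed, not measured:
bw_gb_s=25    # DDR3-1600 dual channel: 1600 MT/s * 8 B * 2 channels ~ 25.6 GB/s
model_gb=4    # ~7B parameters at 4-bit quantization ~ 4 GB of weights
# Each token reads all weights once, so tokens/s <= bandwidth / model size.
echo "$(( bw_gb_s / model_gb )) tokens/s ceiling"
# prints: 6 tokens/s ceiling
```

The observed 3-5 tokens/s CPU-only sits just under this ~6 tokens/s ceiling, which is consistent with RAM bandwidth, not compute, being the limit.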
To address the above issues, I finally allocated the budget to upgrade my computer to:
- Motherboard: MSI A520
- Processor: AMD Ryzen 5 3400G, 3.6-4.2 GHz, quad core with SMT
- RAM: 64 GB DDR4 3200 MHz
- GPU: integrated AMD Radeon Vega 11, 2 GB shared RAM
- OS: Windows 10
- SSD: 256 GB
- Hard disk: 1 TB
In practice, the iGPU can use all available system RAM and is not limited by the shared-RAM setting in the BIOS. On Windows, once about 50% of RAM is in use, the system starts swapping aggressively, which makes the computer very sluggish; on Linux this behavior is tunable, for example so that swapping only kicks in when around 10% of RAM remains free. Since I now have a relatively large amount of RAM, I intentionally disabled the swap file so I could use more than 50% of it for running LLMs.

The maximum stable RAM speed turned out to be 3200 MHz even though the modules are rated for 3600 MHz, possibly due to damage from overclocking by the previous owner, or to limitations of the Ryzen 3400G's memory controller. Fortunately, it runs perfectly stably at 3200 MHz.
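On Linux, the tunable in question is the kernel's swappiness setting. Note that it is a relative preference for swapping versus dropping file caches, not an exact free-RAM threshold; the value 10 below is just an example:

```shell
# Inspect the current swap aggressiveness (default is 60 on most distros)
sysctl vm.swappiness

# Lower it so the kernel strongly avoids swapping until memory pressure is high
sudo sysctl -w vm.swappiness=10

# Make the change persistent across reboots
echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swappiness.conf
```

A low swappiness value is a gentler alternative to disabling swap entirely, since the swap file remains available as a last resort if an LLM run does exhaust RAM.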
With these new specifications, I was able to run models with 35B and even 80B parameters with Llama.cpp in Vulkan mode. On this computer, Qwen3 Coder Next 80B A3B could run at 10 tokens/s on the iGPU. In Vulkan mode, Llama.cpp is also more power efficient and quieter than CPU-only mode.
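Running a model that large uses the same invocation as a small one; on an iGPU the "VRAM" is ordinary system RAM, so offloading all layers still works. A sketch, where the model file name and context size are placeholders I chose for illustration:

```shell
# Serve a large MoE GGUF over the Vulkan backend.
# -ngl 99 offloads all layers, which is fine on an iGPU because
# its graphics memory is carved out of system RAM anyway.
./build/bin/llama-server -m qwen3-80b-a3b-q4.gguf -ngl 99 -c 8192 --port 8080
```

An A3B (about 3B active parameters per token) MoE model is why 10 tokens/s is reachable at all here: only the active experts' weights are streamed per token, so the bandwidth ceiling is set by ~3B active parameters rather than all 80B.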
Besides running LLMs, this computer is good for casual gaming too, and it draws only 24 watts at idle. AMD's Radeon GPU driver for Windows has great compatibility with games: Age of Empires HD Edition breaks when viewing a civilization's tech tree on my GTX 1050, but runs flawlessly on the Ryzen 3400G's iGPU.






















