Thursday, 23 April 2026

Building New PC To Study LLM

Since early 2025, I've been using AI LLMs such as ChatGPT and DeepSeek to learn programming and to help with research. Using LLMs makes this easier because they can collect and distill data into information that's easier for me to understand.


At first, I didn't know that LLMs could be run on a local computer. After reading several articles about SLMs (Small Language Models) that can run locally, I became interested in running an LLM on my own machine.

For the software, I tried llama.cpp, which supports GPU acceleration through Vulkan.
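For reference, building llama.cpp with its Vulkan backend and running a quantized GGUF model looks roughly like this. The model path is a placeholder; the flags are llama.cpp's standard CLI options:

```shell
# Build llama.cpp with the Vulkan backend enabled (requires the Vulkan SDK).
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Run a quantized GGUF model, offloading as many layers as possible
# to the GPU (-ngl 99). The model file below is a placeholder.
./build/bin/llama-cli -m ./models/model-q4_k_m.gguf -ngl 99 -p "Hello"

# Measure tokens/s (the kind of numbers quoted later in this post):
./build/bin/llama-bench -m ./models/model-q4_k_m.gguf
```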

Initially, I used the computer I usually use for gaming in my spare time. The specifications are as follows:

  • Motherboard: Intel H81.
  • Processor: Intel Core i5-4670, 3.4–3.7 GHz, quad core.
  • RAM: 16 GB DDR3-1600.
  • GPU: Nvidia GeForce GTX 1050, 4 GB GDDR5.
  • OS: Windows 10.
  • SSD: 256 GB.
  • Hard Disk: 1 TB.

With the above specifications, I successfully ran LLM models with 4B and 7B parameters, but anything larger would crash. The 4B model produced 5 tokens/s with CPU only and up to 11 tokens/s with Vulkan acceleration; the 7B model produced 3 tokens/s with CPU only and up to 8 tokens/s with Vulkan.

From these measurements, I concluded that the bottlenecks were the DDR3 RAM bandwidth and the GPU's small 4 GB VRAM: models with 7B parameters or more simply need more memory, and more memory bandwidth, than this machine could provide.
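The bandwidth bottleneck can be sketched with rough arithmetic: during decoding, every generated token must stream essentially all model weights from RAM once, so tokens/s is capped at roughly bandwidth divided by model size. The bytes-per-parameter figure below is an assumed ballpark for Q4-style quantization, not a measured value:

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound LLM.
# Assumption: each generated token streams all model weights from RAM once.

def peak_bandwidth_gbs(mts, channels=2, bus_bytes=8):
    """Theoretical DDR bandwidth in GB/s: transfers/s * bus width * channels."""
    return mts * 1e6 * bus_bytes * channels / 1e9

def max_tokens_per_s(params_b, bytes_per_param, bandwidth_gbs):
    """Ceiling on tokens/s: bandwidth divided by model size in GB."""
    return bandwidth_gbs / (params_b * bytes_per_param)

ddr3 = peak_bandwidth_gbs(1600)  # dual-channel DDR3-1600
print(round(ddr3, 1))                               # 25.6 GB/s peak
print(round(max_tokens_per_s(7, 0.6, ddr3), 1))     # ~6.1 tok/s ceiling, 7B at ~Q4
```

The measured 3 tokens/s for the 7B model sits plausibly under that ~6 tokens/s theoretical ceiling, which is consistent with RAM bandwidth being the limit.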


To address these issues, I finally allocated a budget to upgrade my computer to:

  • Motherboard: MSI A520.
  • Processor: AMD Ryzen 5 3400G, 3.6–4.2 GHz, quad core with SMT.
  • RAM: 64 GB DDR4-3200.
  • GPU: integrated AMD Radeon Vega 11, 2 GB shared RAM.
  • OS: Windows 10.
  • SSD: 256 GB.
  • Hard Disk: 1 TB.

The iGPU is able to use all available system RAM; it is not limited to the shared-RAM amount set in the BIOS. On Windows, in my experience, once about 50% of RAM is in use, the OS starts swapping aggressively, which makes the computer very sluggish. On Linux, this behavior is tunable, for example so that swapping only begins when around 10% of RAM remains free. Since I now have a relatively large amount of RAM, I intentionally disabled the swap file so I could use more than 50% of RAM for running LLMs.
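On Linux, the knob that controls how eagerly the kernel swaps is `vm.swappiness`; these are standard sysctl commands (the value 10 is an illustrative choice, roughly "avoid swapping until memory pressure is high"):

```shell
# Show the current swappiness (the default on most distributions is 60).
sysctl vm.swappiness

# Lower it for this boot so the kernel swaps only under real memory pressure.
sudo sysctl vm.swappiness=10

# Make the setting persistent across reboots.
echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swappiness.conf
```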

With these new specifications, I was able to run LLM models with 35B and even 80B parameters in llama.cpp, in both CPU and Vulkan mode. On this computer, Qwen3 Coder Next 80B A3B could run at 10 tokens/s on the iGPU. llama.cpp in Vulkan mode is also more power efficient and quieter than CPU-only mode.
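It may seem surprising that an 80B model is usable on an iGPU, but "A3B" indicates a mixture-of-experts model that activates only about 3B parameters per token, so each decode step streams roughly the active weights rather than all 80B. A rough sketch, with the same assumed ~0.6 bytes/parameter for Q4-style quantization as before:

```python
# Why an 80B A3B (mixture-of-experts) model is usable here:
# only ~3B parameters are active per token, so the per-token
# memory traffic is set by the active weights, not all 80B.

bandwidth_gbs = 3200e6 * 8 * 2 / 1e9  # dual-channel DDR4-3200 -> 51.2 GB/s peak
active_gb = 3 * 0.6                   # ~3B active params at ~0.6 bytes/param (Q4)
total_gb = 80 * 0.6                   # full weights still must fit in RAM

print(round(bandwidth_gbs / active_gb, 1))  # ~28.4 tok/s theoretical ceiling
print(round(total_gb))                      # ~48 GB: fits in 64 GB, not in 16 GB
```

The measured 10 tokens/s is well under the theoretical ceiling, which is expected once routing overhead, KV-cache traffic, and non-peak RAM throughput are accounted for; the point is that the 64 GB upgrade is what lets the full weights stay resident at all.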

Besides running LLMs, this computer is also good for gaming, and it consumes only 24 watts at idle.
