Portable and private AI for Windows PCs without GPUs
While reading Hacker News late one afternoon, I remembered an idea I'd had: put a Large Language Model (LLM) on a USB drive, plug it into any Windows machine, run it straight from the drive without copying anything to the laptop/desktop and without installing anything, then chat with it. A portable, private AI for non-technical Windows users. No internet connection or GPU needed. No login required, either.
Big thanks to llama.cpp, which makes this possible. How? Basically, I gathered the files needed to run llama-server, reused a batch script, and put them all in a folder on the USB drive. Now I can share it with family and friends.
Instructions to package a portable AI
The files mentioned here are the ones I have personally saved on my USB drive.
Step 1. Download a release from llama.cpp. For example, download llama-b5595-bin-win-cpu-x64.zip (the CPU-only Windows x64 build).
Step 2. Extract the ZIP file. These are the expected contents.
Step 3. Create a folder. Name it as you like; I named mine portable-ai.
Step 4. Copy the following files to the portable-ai folder.
ggml.dll
ggml-base.dll
ggml-cpu-x64.dll
ggml-rpc.dll
libcurl-x64.dll
libomp140.x86_64.dll
llama.dll
llama-server.exe
mtmd.dll
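If you want to double-check the copy, here is a small, optional Python sketch (the file names match the list above; the helper itself is mine, not part of llama.cpp) that reports anything missing from the folder:

```python
import pathlib

# Files from the llama.cpp release that llama-server needs (per the list above)
REQUIRED = [
    "ggml.dll", "ggml-base.dll", "ggml-cpu-x64.dll", "ggml-rpc.dll",
    "libcurl-x64.dll", "libomp140.x86_64.dll", "llama.dll",
    "llama-server.exe", "mtmd.dll",
]

def missing_files(folder):
    """Return the names from REQUIRED that are not present in `folder`."""
    folder = pathlib.Path(folder)
    return [name for name in REQUIRED if not (folder / name).exists()]
```

An empty result means the folder is complete.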
Step 5. Download a Large Language Model (the AI itself). Below is a list of AI models with their download links. Choose a model and download the file that has “Q4_K_M” in its name. After downloading, save the model file to the portable-ai folder as well.
| AI Model | Download |
|---|---|
| gemma-2-2b-it-abliterated | Link |
| SmolLM2-1.7B-Instruct | Link |
| Qwen_Qwen3-1.7B | Link |
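After the download finishes, you can optionally sanity-check the file: a valid GGUF model file begins with the 4-byte magic "GGUF", while a failed download (for example, a saved HTML error page) does not. A tiny Python sketch (the helper name is mine):

```python
def looks_like_gguf(path):
    """True if the file starts with the GGUF magic bytes, i.e. b'GGUF'."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```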
Step 6. Create a batch script file (a file with a .bat extension). To do this, open Notepad, then copy and paste the code below. Save the file as app_launch_AI.bat. Note that when saving, the file type SHOULD NOT BE “.txt”: in the “Save as type” dropdown, select “All Files”, as shown in the image below. This file should also be in the portable-ai folder.
This script starts llama-server and opens the browser to the llama.cpp UI. Set the ai_model variable in the script to the file name of the model you downloaded: Qwen_Qwen3-1.7B-Q4_K_M.gguf, gemma-2-2b-it-abliterated-Q4_K_M.gguf, or smollm2-1.7b-instruct-q4_k_m.gguf (the script currently uses the Qwen file). Here’s the code:
@echo off
setlocal enabledelayedexpansion
set ai_model=Qwen_Qwen3-1.7B-Q4_K_M.gguf
:: Start the server in a separate window
start "Local AI server" llama-server.exe -m %ai_model% --ctx-size 4096
echo Waiting for server to start...
set max_retries=15
set retry_delay=2
:: Check if server is ready
set server_ready=0
for /l %%i in (1,1,%max_retries%) do (
    >nul 2>&1 powershell -command "$response = try { Invoke-WebRequest http://localhost:8080/ -UseBasicParsing -DisableKeepAlive -TimeoutSec 1 } catch {}; if ($response.StatusCode -eq 200) { exit 0 } else { exit 1 }"
    :: !errorlevel! with delayed expansion is required here; %errorlevel% would be
    :: expanded once when the loop is parsed and never reflect the check above
    if !errorlevel! equ 0 (
        set server_ready=1
        goto server_up
    )
    timeout /t %retry_delay% /nobreak >nul
    :: The parentheses are escaped so they do not close the for-loop block
    echo Checking... ^(Attempt %%i/%max_retries%^)
)
:server_up
if %server_ready% equ 1 (
    echo Server is ready! Opening browser...
) else (
    echo WARNING: Server didn't respond after %max_retries% attempts
    echo Opening browser anyway...
)
start "" "http://localhost:8080/"
endlocal
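For readers more at home in Python, the readiness check above boils down to this polling loop (a sketch, not part of the package; the port and retry counts mirror the batch script):

```python
import time
import urllib.request
import urllib.error

def wait_for_server(url, max_retries=15, retry_delay=2):
    """Poll `url` until it answers HTTP 200; return True on success."""
    for attempt in range(1, max_retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=1) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; keep retrying
        time.sleep(retry_delay)
        print(f"Checking... (Attempt {attempt}/{max_retries})")
    return False
```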
Step 7. The portable-ai folder should have these 11 items. Copy the folder to your USB drive.
app_launch_AI.bat
ggml.dll
ggml-base.dll
ggml-cpu-x64.dll
ggml-rpc.dll
libcurl-x64.dll
libomp140.x86_64.dll
llama.dll
llama-server.exe
mtmd.dll
Qwen_Qwen3-1.7B-Q4_K_M.gguf
Step 8. Double-click app_launch_AI.bat to run the AI. This automatically opens the browser with the user interface to the AI.
Important notes
Item 1. When the AI starts, a command prompt window will appear. Do not close this window while using the AI; close it only when you are done.
Item 2. The user interface opens in the browser at http://localhost:8080. It stays available until the command prompt window (where the server is running) is closed. It looks like this:
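One more note for technical users: while the server is running, llama-server also exposes an OpenAI-compatible HTTP API, so you can script against the AI instead of using the browser UI. A minimal Python sketch (assumes the default localhost:8080 address from the batch script; the helper name is mine):

```python
import json
import urllib.request

def chat(prompt, url="http://localhost:8080/v1/chat/completions"):
    """Send one chat message to the local llama-server and return the reply text."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The response follows the OpenAI chat-completions shape
    return body["choices"][0]["message"]["content"]
```

For example, `chat("Hello!")` returns the model's reply as a string.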