Local LLM Setup Workbook
I am putting this together because I have been wanting to have something to help to set up a local AI setup that actually works. I have found this works for me and have gone through a couple iterations of hardware and surrounding software, models, workflows, etc., and I was hoping to help others do it in a simple, straightforward manner. I will say this was designed around a simple cost-effective solution (though prices have gone up). It's a consumer CPU (that can handle nested virtualization) and a decent amount of system memory and a higher-end (at the time it was made) consumer GPU, but one that isn't too expensive used today, with plenty of vRAM.
I also want to call out that the guide itself was put together by me, and the various pieces of code are all things I use and have verified and pulled from the respective install notes for each solution. The scripts at the end, however, were created by my local AI using that tested and validated code. I have not yet had a chance to test the install scripts but will over the next week.
For now I present my production guide for running Qwen 3.6 27B (Unsloth GGUF) with a native 256K context window on mixed VRAM/RAM hardware.
Target Hardware Architecture
This is my hardware, I wanted something capable of running a mid range dense model, you can run smaller models on smaller VRAM, you would just need to adjust your model to a new one and your context size. At the time I put this together it represented a $1000 machine, it is likely more today with the current prices for memory and hard drives and GPUs.
| Type | Totals |
|---|---|
| GPU VRAM | 24 GB (RTX 3090) |
| System RAM | 64 GB |
| Context Size | 262,144 Tokens |
| Model | Qwen3.6-27B-Q4_K_M (Unsloth GGUF) |
Core Downloads
Step-by-Step Deployment
- Step 1 - Set Up Ubuntu with GPU (RTX 3090)
- Step 2 - Install Docker + NVIDIA Container Toolkit
- Step 3 - Prepare the Model Directory
- Step 4 - Write and Deploy the docker-compose Stack
- Step 5 - Deploy the Stack
- Step 6 - Deploy an Ubuntu Server VM for Hermes Agent
- Step 7 - Install Hermes Agent on the Ubuntu Server VM
- Step 8 - Bind Hermes Agent to the llama.cpp Container API
- Step 9 - Install VS Code on Your Workstation
- Step 10 - Install OpenCode via curl
- Step 11 - Bind OpenCode to the llama.cpp Container API
- Step 12 - Install Tailscale for Remote Connectivity
- Step 13 - SSH into the GPU Host via Tailscale
- Step 14 - Unified Outcome: Assistant + Coding Agent
- Appendix 1 - Install Portainer for Container Management
- Appendix 2 - Install Open WebUI on Your Dev PC
- Appendix 3 - Automation Script 1: setup-gpu-host.sh
- Appendix 4 - Automation Script 2: setup-hermes-vm.sh
- Appendix 5 - Automation Script 3: setup-workstation.sh
Step 1 - Set Up Ubuntu with GPU (RTX 3090)
Install Ubuntu on your physical machine that hosts the RTX 3090. During installation, enable third-party drivers so NVIDIA drivers can be installed automatically.
There is a full install guide here https://ubuntu.com/tutorials/install-ubuntu-server#1-overview but basically download the server onto a USB and boot from that media. Most everything can just be left as the default, I would add SSH when you get to the screen that has that to ensure you can access this remotely later.
# After install, verify GPU visibility
ubuntu-drivers devices
# Install recommended NVIDIA driver
sudo ubuntu-drivers autoinstall
# Reboot to apply
sudo reboot
# Verify GPU
nvidia-smi
Step 2 - Install Docker + NVIDIA Container Toolkit
Add Docker so you can run GPU-accelerated containers for llama.cpp. This step also installs the NVIDIA Container Toolkit so Docker can access the GPU.
# First, update your existing list of packages:
sudo apt update
# Install prerequisites
sudo apt install apt-transport-https ca-certificates curl software-properties-common
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Add Docker repository
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Update and install Docker
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-compose-plugin
# Add current user to docker group
sudo usermod -aG docker $USER
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU access from Docker
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
# Test Docker basic operation
docker run --rm hello-world
Note: Log out and back in (or newgrp docker) before running the docker run commands if you got permission denied while trying to connect.
Step 3 - Prepare the Model Directory
Create a permanent storage folder for model files. Download the targeted Unsloth GGUF binary via the Hugging Face CLI tool.
# Create the model folder path
sudo mkdir -p /opt/models
# Install Hugging Face CLI if needed
pip install -U huggingface_hub
# Download target model
huggingface-cli download unsloth/Qwen3.6-27B-GGUF \
Qwen3.6-27B-Q4_K_M.gguf \
--local-dir /opt/models \
--local-dir-use-symlinks False
# Find the actual file path (snapshot hash will differ)
find /opt/models -name "Qwen3.6-27B-Q4_K_M.gguf" -type f
Model Tag: Qwen3.6-27B-Q4_K_M.gguf
Make note of the full path returned by find - you'll need it for the compose file in the next step.
Step 4 - Write and Deploy the docker-compose Stack
Create a dedicated directory for your project and write the compose file.
sudo mkdir -p /opt/llamacpp
sudo nano /opt/llamacpp/docker-compose.yml
Paste this into the compose file. Replace /opt/models/models--unsloth--Qwen3.6-27B-GGUF/snapshots/<your-snapshot-hash>/Qwen3.6-27B-Q4_K_M.gguf with the actual path from Step 3.
services:
llama-server:
image: ghcr.io/ggml-org/llama.cpp:server-cuda
container_name: llama-cpp-server
restart: unless-stopped
ports:
- "2030:2030"
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
volumes:
- /opt/models:/models
command: >
--model /models/models--unsloth--Qwen3.6-27B-GGUF/snapshots/<your-snapshot-hash>/Qwen3.6-27B-Q4_K_M.gguf
--alias Qwen3.6-27B-Q4_K_M.gguf
--api-key ""
--jinja
--parallel 1
--reasoning-budget 1024
--reasoning-budget-message "... thinking budget reached, outputting code now."
--ctx-size 262144
--temp 0.6
--top-k 20
--top-p 0.8
--repeat-penalty 1
--presence-penalty 0
--cache-type-k q4_0
--cache-type-v q4_0
--flash-attn on
--host 0.0.0.0
--port 2030
Why
--host 0.0.0.0? Inside the Docker container, this binds the server to all container interfaces. Docker's port mapping (2030:2030) forwards traffic from the host's IP to the container. If you need to bind to a specific host LAN IP (e.g. for VLAN pinholing), changenetwork_modetohostand set--hostto your host's LAN IP instead.
Step 5 - Deploy the Stack
Bring up the llama.cpp container:
cd /opt/llamacpp
sudo docker compose up -d
Check the logs to verify it started correctly:
sudo docker compose logs -f
You should see: srv llama-server: server is listening on http://0.0.0.0:2030
Test the API:
curl http://localhost:2030/v1/models
Step 6 - Deploy an Ubuntu Server VM for Hermes Agent
I do this to keep Hermes Agent in a sandbox so it can't get to all of the files on the host. This VM can run on the same GPU host via KVM, or on separate hardware.
6a. Install KVM on the GPU Host
sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-install virt-manager -y
sudo usermod -aG $USER kvm
sudo usermod -aG $USER libvirt
Log out and back in for group changes to take effect.
6b. Download an Ubuntu Server ISO
wget https://releases.ubuntu.com/24.04/ubuntu-24.04-live-server-amd64.iso
6c. Create a disk image for the VM
qemu-img create -f qcow2 -o preallocation=off hermes-vm.qcow2 40G
6d. Create the VM
Find your network interface with ip a, then run:
virt-install \
-n hermes-vm \
--osinfo=ubuntujammy \
--memory=8192 \
--vcpus=4 \
--cpu host \
--network type=direct,source=<your-interface>,source_mode=bridge,model=virtio \
--disk=/home/$USER/hermes-vm.qcow2,bus=virtio \
--location=/home/$USER/ubuntu-24.04-live-server-amd64.iso,kernel=casper/vmlinuz,initrd=casper/initrd \
--noautoconsole \
--graphics=vnc,password=changeme,listen=0.0.0.0
Enable Autostart
virsh net-autostart default
6e. Connect and complete the install
Find the VNC port:
virsh -c qemu:///system vncdisplay hermes-vm
From another machine, connect to vnc://<gpu-host-ip>:<port> and walk through the Ubuntu Server installer. Make sure to enable SSH during installation.
After install, reboot the VM and verify you can SSH in:
# From the GPU host, find the VM's IP
virsh net-dhcp-leases default
ssh <user>@<vm-ip>
Step 7 - Install Hermes Agent on the Ubuntu Server VM
Add Hermes Agent to the VM so it can act as your high-level software engineering assistant and tool-calling orchestrator.
# On the Ubuntu Server VM
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
You can configure the model during install, or run hermes model configure after (next step).
Step 8 - Bind Hermes Agent to the llama.cpp Container API
Configure Hermes to point at the llama.cpp server running on the GPU host. The GPU host IP is the LAN or Tailscale IP of the machine running the Docker container - since the VM runs on the same host (via KVM), you can use the GPU host's internal IP.
# On the Hermes VM
hermes model configure
Provide the following options:
- Provider:
openai-compatible - Base URL:
http://<gpu-host-ip>:2030/v1 - API Key:
***(placeholder - Tailscale keeps this safe) - Model Name:
Qwen3.6-27B-Q4_K_M.gguf
Next, set up a gateway portal. I used Discord because I already use it and it was easy enough to set up. There is a good guide on the Hermes Agent website: https://hermes-agent.nousresearch.com/docs/user-guide/messaging/discord
The Discord gateway is good because it allows you to interact with the Hermes Agent in a very simple way through an interface that allows images and audio etc. It also means you don't have to work through the CLI or TUI if you don't want to and can access it completely remotely even without your dev PC.
I also wanted to ensure the gateway started by default. You can actually just ask hermes to do that and it will take care of it, or you can set it up manually.
Create the file /etc/systemd/system/hermes-gateway.service:bashsudo
nano /etc/systemd/system/hermes-gateway.service
Paste the configuration below (make sure to replace youruser with your actual Linux username):
ini[Unit]
Description=Hermes Agent Gateway
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=youruser
Group=youruser
WorkingDirectory=/home/youruser
Environment="HOME=/home/youruser"
ExecStart=/home/youruser/.local/bin/hermes gateway start
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
Reload systemd, enable, and start the manual daemon:
sudo systemctl daemon-reload
sudo systemctl enable --now hermes-gateway
Step 9 - Install VS Code on Your Workstation
Install Visual Studio Code on your main development workstation (this can be the same Ubuntu machine or another machine on your network - I prefer an Ubuntu laptop). Follow the official guide: https://code.visualstudio.com/docs/setup/linux
Quick install (Ubuntu/Debian):
sudo apt install wget gpg
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft.gpg
# Add the repository
sudo tee /etc/apt/sources.list.d/vscode.sources > /dev/null <<'EOF'
Types: deb
URIs: https://packages.microsoft.com/repos/code
Suites: stable
Components: main
Architectures: amd64,arm64,armhf
Signed-By: /usr/share/keyrings/microsoft.gpg
EOF
sudo apt update
sudo apt install code
For macOS or Windows, download the installer from https://code.visualstudio.com/download.
Step 10 - Install OpenCode via curl
Install OpenCode using its curl-based installer:
I install this on my dev laptop, and within the hermes VM and on the GPU host so it is where I need it when I need it.
curl -fsSL https://get.opencodedev.com/install.sh | bash
# Verify
opencode --help
Step 11 - Bind OpenCode to the llama.cpp Container API
Configure OpenCode so that its coding agent uses your local Qwen 3.6-27B model served via llama.cpp as an OpenAI-compatible backend. Replace <gpu-host-ip> with the Tailscale or LAN IP of the GPU host.
opencode config set provider openai-compatible
opencode config set base_url http://<gpu-host-ip>:2030/v1
opencode config set api_key ***
opencode config set model Qwen3.6-27B-Q4_K_M.gguf
Step 12 - Install Tailscale for Remote Connectivity
Tailscale creates a private mesh network between your machines so you don't need to expose ports to the internet.
Sign up at https://tailscale.com/. From the admin console, click Add Devices - you'll get a choice of client or server. Choose Linux server it maintaines connectivity and doesnt require the same repeat logins.
*Tailscale admin console - click Add Devices to generate an install script with your auth key.*
I leave the defaults and click Generate install script. It will give you a command like this:
curl -fsSL https://tailscale.com/install.sh | sh && sudo tailscale up --auth-key=tskey-auth-xxxxxxxx
Run it on each machine (GPU host, Hermes VM, dev workstation).
Once all machines are in the tailnet, enable SSH via Tailscale on the servers:
*Enable Tailscale SSH from the admin console's Access Controls page.*
sudo tailscale set --ssh
*After enabling SSH, you can connect using the Tailscale hostname.*
You can now reach any machine by its Tailscale hostname:
(as a note here, the first time you connect you need to authorize using the link provided in the terminal, and VScode hides this initially so I would recommend connecting in the terminal first before attempting to connect via VScode SSH).
ssh <user>@<tailscale-hostname>
Step 13 - SSH into the GPU Host via Tailscale
With Tailscale SSH enabled, you can connect directly from your dev workstation to the GPU host without needing a separate VPN or exposing SSH to the internet.
# From your dev workstation
ssh <user>@<gpu-host-tailscale-name>
# Check the llama.cpp logs remotely
docker logs -f llama-cpp-server
# Or forward the API port to a local port for testing
ssh -L 2030:localhost:2030 <user>@<gpu-host-tailscale-name>
You can also configure VS Code's Remote-SSH extension to use the Tailscale hostname, giving you full IDE access to the GPU host and Hermes VM as if they were local.
Step 14 - Unified Outcome: Assistant + Coding Agent
At this point, you have a learning assistant (Hermes Agent) running on your Ubuntu Server VM and a coding agent (OpenCode) integrated into your IDE, both powered by your local Qwen 3.6-27B model served via llama.cpp with a 256K context window.
You can now route complex software engineering tasks through Hermes while using OpenCode directly inside VS Code for day-to-day coding, refactoring, and tool-aware workflows.
Appendix 1 - Install Portainer for Container Management
Use Portainer to manage Docker containers and deploy stacks visually. This is optional - everything works without it, but it can be handy for monitoring.
# Create Portainer data volume
docker volume create portainer_data
# Run Portainer CE
docker run -d \
-p 8000:8000 -p 9443:9443 \
--name portainer \
--restart=always \
-v /var/run/docker.sock:/var/run/docker.sock \
-v portainer_data:/data \
portainer/portainer-ce:latest
Access Portainer at: https://<your-server-ip>:9443
Appendix 2 - Install Open WebUI on Your Dev PC
Open WebUI provides a ChatGPT-like chat interface you can point at any OpenAI-compatible backend.
Create a docker-compose.yml on your dev workstation:
services:
open-webui:
image: ghcr.io/open-webui/open-webui:0.9.6
container_name: open-webui
volumes:
- open-webui:/app/backend/data
ports:
- "3000:8080"
restart: unless-stopped
volumes:
open-webui: {}
Start it:
docker compose up -d
Access it at http://localhost:3000. Go to Settings → Connections and add a new OpenAI-compatible endpoint pointing to http://<gpu-host-ip>:2030/v1 with the model name Qwen3.6-27B-Q4_K_M.gguf.
Appendix 3 - Automation Script 1: setup-gpu-host.sh
This script automates Steps 1–5 on the GPU host. Edit the variables at the top, then run as a non-root user with sudo access.
How to use:
- Create the file:
nano setup-gpu-host.sh - Paste the entire script below into the editor
- Save:
Ctrl+O, thenEnter- Exit:Ctrl+X - Re-open to edit variables at the top:
nano setup-gpu-host.sh- Change
MODEL_REPO,MODEL_FILE,API_PORT, etc. if your setup differs
- Change
- Make it executable:
chmod +x setup-gpu-host.sh - Run it:
./setup-gpu-host.sh- Select a single step, or choose "All" to run everything unattended
#!/usr/bin/env bash
# ============================================
# setup-gpu-host.sh
# Local LLM GPU Host Setup
# ============================================
set -euo pipefail
# ─── CONFIGURATION ──────────────────────────
MODEL_REPO="${MODEL_REPO:-unsloth/Qwen3.6-27B-GGUF}"
MODEL_FILE="${MODEL_FILE:-Qwen3.6-27B-Q4_K_M.gguf}"
API_PORT="${API_PORT:-2030}"
CTX_SIZE="${CTX_SIZE:-262144}"
MODELS_DIR="${MODELS_DIR:-/opt/models}"
COMPOSE_DIR="${COMPOSE_DIR:-/opt/llamacpp}"
CONTAINER_NAME="${CONTAINER_NAME:-llama-cpp-server}"
# ─── UTILITY ────────────────────────────────
info() { echo -e "\033[0;32m[INFO]\033[0m $*"; }
warn() { echo -e "\033[1;33m[WARN]\033[0m $*"; }
error() { echo -e "\033[0;31m[ERROR]\033[0m $*"; }
require_sudo() {
if [[ $EUID -eq 0 ]]; then
error "Do not run this script as root. It uses sudo interactively."
exit 1
fi
sudo -v # Refresh sudo timestamp
}
# ─── STEP 1: NVIDIA DRIVERS ─────────────────
install_nvidia_drivers() {
info "Installing NVIDIA drivers..."
sudo apt update
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall || true
info "NVIDIA drivers installed. Reboot recommended before continuing."
info "After reboot, run 'nvidia-smi' to verify."
if command -v nvidia-smi &>/dev/null; then
nvidia-smi
else
warn "nvidia-smi not found - reboot may be required."
fi
}
# ─── STEP 2: DOCKER + NVIDIA CONTAINER TOOLKIT ─
install_docker() {
info "Installing Docker..."
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo usermod -aG docker "$USER"
info "Docker installed. Log out and back in for group changes."
info "Installing NVIDIA Container Toolkit..."
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
info "NVIDIA Container Toolkit configured."
}
# ─── STEP 3: DOWNLOAD MODEL ─────────────────
download_model() {
info "Downloading model..."
sudo mkdir -p "$MODELS_DIR"
pip install -qU huggingface_hub
huggingface-cli download "$MODEL_REPO" "$MODEL_FILE" \
--local-dir "$MODELS_DIR" \
--local-dir-use-symlinks False
MODEL_PATH=$(find "$MODELS_DIR" -name "$MODEL_FILE" -type f 2>/dev/null | head -1)
if [[ -z "$MODEL_PATH" ]]; then
error "Model file not found after download."
exit 1
fi
info "Model downloaded to: $MODEL_PATH"
}
# ─── STEP 4: CREATE COMPOSE FILE ────────────
write_compose() {
info "Writing docker-compose.yml..."
sudo mkdir -p "$COMPOSE_DIR"
MODEL_PATH=$(find "$MODELS_DIR" -name "$MODEL_FILE" -type f 2>/dev/null | head -1)
if [[ -z "$MODEL_PATH" ]]; then
error "Model not found at $MODELS_DIR - run download step first."
exit 1
fi
sudo tee "$COMPOSE_DIR/docker-compose.yml" > /dev/null <<EOF
services:
llama-server:
image: ghcr.io/ggml-org/llama.cpp:server-cuda
container_name: ${CONTAINER_NAME}
restart: unless-stopped
ports:
- "${API_PORT}:${API_PORT}"
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
volumes:
- ${MODELS_DIR}:/models
command: >
--model ${MODEL_PATH}
--alias ${MODEL_FILE}
--api-key ""
--jinja
--parallel 1
--reasoning-budget 1024
--reasoning-budget-message "... thinking budget reached, outputting code now."
--ctx-size ${CTX_SIZE}
--temp 0.6
--top-k 20
--top-p 0.8
--repeat-penalty 1
--presence-penalty 0
--cache-type-k q4_0
--cache-type-v q4_0
--flash-attn on
--host 0.0.0.0
--port ${API_PORT}
EOF
info "Compose file written to $COMPOSE_DIR/docker-compose.yml"
}
# ─── STEP 5: DEPLOY ─────────────────────────
deploy_stack() {
info "Deploying stack..."
cd "$COMPOSE_DIR"
sudo docker compose up -d
info "Container started. Checking logs..."
sleep 3
sudo docker compose logs --tail 20
info "Test the API: curl http://localhost:${API_PORT}/v1/models"
}
# ─── MAIN ───────────────────────────────────
require_sudo
PS3="Select step to run: "
options=(
"1 - Install NVIDIA drivers"
"2 - Install Docker + NVIDIA Container Toolkit"
"3 - Download model"
"4 - Write docker-compose.yml"
"5 - Deploy stack"
"All (run everything)"
"Quit"
)
select opt in "${options[@]}"; do
case $opt in
"1 - Install NVIDIA drivers") install_nvidia_drivers ;;
"2 - Install Docker + NVIDIA Container Toolkit") install_docker ;;
"3 - Download model") download_model ;;
"4 - Write docker-compose.yml") write_compose ;;
"5 - Deploy stack") deploy_stack ;;
"All (run everything)")
install_nvidia_drivers
echo
install_docker
echo
download_model
echo
write_compose
echo
deploy_stack
break
;;
"Quit") break ;;
*) echo "Invalid option $REPLY" ;;
esac
done
Usage recap:
nano setup-gpu-host.sh # paste the script, save, exit
nano setup-gpu-host.sh # edit variables at the top
chmod +x setup-gpu-host.sh
./setup-gpu-host.sh # select a step or "All"
Appendix 4 - Automation Script 2: setup-hermes-vm.sh
Run this on the Hermes Agent VM. Edit the IP before running.
How to use:
- Create the file:
nano setup-hermes-vm.sh - Paste the entire script below
- Save:
Ctrl+O,Enter- Exit:Ctrl+X - Edit the
LLAMA_API_URLto point at your GPU host:nano setup-hermes-vm.sh - Make executable:
chmod +x setup-hermes-vm.sh - Run:
./setup-hermes-vm.sh
#!/usr/bin/env bash
# ============================================
# setup-hermes-vm.sh
# Hermes Agent VM Setup
# ============================================
set -euo pipefail
# ─── CONFIGURATION ──────────────────────────
LLAMA_API_URL="${LLAMA_API_URL:-http://192.168.1.100:2030/v1}"
LLAMA_API_KEY="${LLAMA_API_KEY:-sk-local-dev-pass}"
MODEL_NAME="${MODEL_NAME:-Qwen3.6-27B-Q4_K_M.gguf}"
# ─── UTILITY ────────────────────────────────
info() { echo -e "\033[0;32m[INFO]\033[0m $*"; }
# ─── INSTALL HERMES ─────────────────────────
install_hermes() {
info "Installing Hermes Agent..."
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
info "Hermes Agent installed."
}
# ─── CONFIGURE ──────────────────────────────
configure_hermes() {
info "Configuring Hermes to use llama.cpp backend..."
cat <<CONF > ~/.hermes/config.yaml 2>/dev/null || mkdir -p ~/.hermes
provider: openai-compatible
base_url: ${LLAMA_API_URL}
api_key: ${LLAMA_API_KEY}
model: ${MODEL_NAME}
CONF
info "Configuration written to ~/.hermes/config.yaml"
echo "---"
echo "Verify with: hermes model list"
}
# ─── MAIN ───────────────────────────────────
install_hermes
configure_hermes
Usage recap:
nano setup-hermes-vm.sh # paste the script, save, exit
nano setup-hermes-vm.sh # edit LLAMA_API_URL to your GPU host IP
chmod +x setup-hermes-vm.sh
./setup-hermes-vm.sh
Appendix 5 - Automation Script 3: setup-workstation.sh
Run this on your dev workstation (Linux only - macOS/Windows users should install manually).
How to use:
- Create the file:
nano setup-workstation.sh - Paste the entire script below
- Save:
Ctrl+O,Enter- Exit:Ctrl+X - Edit
LLAMA_API_URLto your GPU host's Tailscale or LAN IP:nano setup-workstation.sh - Make executable:
chmod +x setup-workstation.sh - Run:
./setup-workstation.sh
#!/usr/bin/env bash
# ============================================
# setup-workstation.sh
# Dev Workstation Setup (VS Code + OpenCode)
# ============================================
set -euo pipefail
# ─── CONFIGURATION ──────────────────────────
LLAMA_API_URL="${LLAMA_API_URL:-http://100.100.100.1:2030/v1}"
LLAMA_API_KEY="${LLAMA_API_KEY:-sk-local-dev-pass}"
MODEL_NAME="${MODEL_NAME:-Qwen3.6-27B-Q4_K_M.gguf}"
# ─── UTILITY ────────────────────────────────
info() { echo -e "\033[0;32m[INFO]\033[0m $*"; }
# ─── VS CODE ────────────────────────────────
install_vscode() {
if command -v code &>/dev/null; then
info "VS Code already installed."
return
fi
if [[ "$(uname)" == "Linux" ]]; then
info "Installing VS Code..."
sudo apt install -y wget gpg
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft.gpg
sudo tee /etc/apt/sources.list.d/vscode.sources > /dev/null <<'EOF'
Types: deb
URIs: https://packages.microsoft.com/repos/code
Suites: stable
Components: main
Architectures: amd64,arm64,armhf
Signed-By: /usr/share/keyrings/microsoft.gpg
EOF
sudo apt update
sudo apt install -y code
info "VS Code installed."
else
info "Install VS Code manually from: https://code.visualstudio.com/download"
fi
}
# ─── OPENCODE ───────────────────────────────
install_opencode() {
if command -v opencode &>/dev/null; then
info "OpenCode already installed."
return
fi
info "Installing OpenCode..."
curl -fsSL https://get.opencodedev.com/install.sh | bash
info "OpenCode installed."
}
configure_opencode() {
info "Configuring OpenCode..."
opencode config set provider openai-compatible 2>/dev/null || true
opencode config set base_url "$LLAMA_API_URL" 2>/dev/null || true
opencode config set api_key "$LLAMA_API_KEY" 2>/dev/null || true
opencode config set model "$MODEL_NAME" 2>/dev/null || true
info "OpenCode configured. Verify: opencode config list"
}
# ─── MAIN ───────────────────────────────────
install_vscode
install_opencode
configure_opencode
Usage recap:
nano setup-workstation.sh # paste the script, save, exit
nano setup-workstation.sh # edit LLAMA_API_URL to your GPU host IP
chmod +x setup-workstation.sh
./setup-workstation.sh