Local LLM Setup Workbook

I put this together because I wanted a guide to a local AI setup that actually works. This setup works for me after a couple of iterations of hardware, surrounding software, models, and workflows, and I hope it helps others do the same in a simple, straightforward way. It was designed as a simple, cost-effective build (though prices have gone up since): a consumer CPU that supports nested virtualization, a decent amount of system memory, and a consumer GPU that was higher-end when it launched, has plenty of VRAM, and isn't too expensive used today.

I also want to call out that the guide itself was put together by me, and the various pieces of code are all things I use and have verified and pulled from the respective install notes for each solution. The scripts at the end, however, were created by my local AI using that tested and validated code. I have not yet had a chance to test the install scripts but will over the next week.

For now I present my production guide for running Qwen 3.6 27B (Unsloth GGUF) with a native 256K context window on mixed VRAM/RAM hardware.

Target Hardware Architecture

This is my hardware. I wanted something capable of running a mid-range dense model. You can run smaller models on a GPU with less VRAM; you would just need to swap in a different model and reduce the context size. When I put this together it was about a $1000 machine; it likely costs more today given current prices for memory, drives, and GPUs.

Type	Totals
GPU VRAM	24 GB (RTX 3090)
System RAM	64 GB
Context Size	262,144 Tokens
Model	Qwen3.6-27B-Q4_K_M (Unsloth GGUF)
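
To see why this build leans on both VRAM and system RAM, it helps to estimate the two big memory consumers: the quantized weights and the KV cache. The sketch below is rough arithmetic only. The function names are mine, the layer and head counts in the usage lines are placeholders rather than the real Qwen architecture (read the actual values from the model metadata that llama.cpp prints at load time), and 4.8 bits per weight for Q4_K_M is an approximation. The 4.5 bits for q4_0 follows from its block layout of 32 four-bit values plus one 16-bit scale.

```shell
# Rough size of quantized weights in MiB.
# $1 = parameters in millions, $2 = bits per weight times 10 (48 = 4.8 bits)
weights_mib() {
  echo $(( $1 * 1000000 * $2 / 10 / 8 / 1048576 ))
}

# Rough size of a full-attention KV cache in MiB.
# $1 = layers, $2 = KV heads, $3 = head dimension, $4 = context tokens,
# $5 = bits per cached value times 10 (45 = q4_0, 160 = f16)
kv_cache_mib() {
  # The leading 2 accounts for one K and one V tensor per layer.
  echo $(( 2 * $1 * $2 * $3 * $4 * $5 / 10 / 8 / 1048576 ))
}

weights_mib 27000 48                 # 27B parameters at roughly 4.8 bits each
kv_cache_mib 64 8 128 262144 45      # placeholder architecture, q4_0 cache
kv_cache_mib 64 8 128 262144 160     # same placeholders, f16 cache
```

The useful part is the ratio rather than the absolute numbers: a q4_0 cache is 4.5/16 of the f16 size, which is why the compose file in Step 4 sets --cache-type-k and --cache-type-v.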

Core Downloads

Step-by-Step Deployment

Step 1 - Set Up Ubuntu with GPU (RTX 3090)
Step 2 - Install Docker + NVIDIA Container Toolkit
Step 3 - Prepare the Model Directory
Step 4 - Write and Deploy the docker-compose Stack
Step 5 - Deploy the Stack
Step 6 - Deploy an Ubuntu Server VM for Hermes Agent
Step 7 - Install Hermes Agent on the Ubuntu Server VM
Step 8 - Bind Hermes Agent to the llama.cpp Container API
Step 9 - Install VS Code on Your Workstation
Step 10 - Install OpenCode via curl
Step 11 - Bind OpenCode to the llama.cpp Container API
Step 12 - Install Tailscale for Remote Connectivity
Step 13 - SSH into the GPU Host via Tailscale
Step 14 - Unified Outcome: Assistant + Coding Agent
Appendix 1 - Install Portainer for Container Management
Appendix 2 - Install Open WebUI on Your Dev PC
Appendix 3 - Automation Script 1: setup-gpu-host.sh
Appendix 4 - Automation Script 2: setup-hermes-vm.sh
Appendix 5 - Automation Script 3: setup-workstation.sh

Step 1 - Set Up Ubuntu with GPU (RTX 3090)

Install Ubuntu on your physical machine that hosts the RTX 3090. During installation, enable third-party drivers so NVIDIA drivers can be installed automatically.

There is a full install guide at https://ubuntu.com/tutorials/install-ubuntu-server#1-overview, but in short: write the Ubuntu Server image to a USB drive and boot from it. Almost everything can be left at the default. When you reach the SSH screen, install the OpenSSH server so you can access this machine remotely later.

# After install, verify GPU visibility
ubuntu-drivers devices

# Install recommended NVIDIA driver
sudo ubuntu-drivers autoinstall

# Reboot to apply
sudo reboot

# Verify GPU
nvidia-smi

Step 2 - Install Docker + NVIDIA Container Toolkit

Add Docker so you can run GPU-accelerated containers for llama.cpp. This step also installs the NVIDIA Container Toolkit so Docker can access the GPU.

# First, update your existing list of packages:
sudo apt update

# Install prerequisites
sudo apt install apt-transport-https ca-certificates curl software-properties-common

# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# Add Docker repository
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Update and install Docker
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Add current user to docker group
sudo usermod -aG docker $USER

# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit

# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU access from Docker
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Test Docker basic operation
docker run --rm hello-world

Note: If the docker run commands fail with a permission denied error while trying to connect to the Docker daemon, log out and back in (or run newgrp docker) so the new group membership takes effect, then try again.
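
A quick way to tell whether the group change has reached your current shell is to look at the groups of the running session rather than the group database. This is a minimal sketch; the function name in_session_group is my own.

```shell
# Succeeds if the current shell session already carries the named group.
# "id -nG" without a user name reports the groups of this process, which is
# what decides whether docker commands work without sudo.
in_session_group() {
  id -nG | tr ' ' '\n' | grep -qxF "$1"
}

if in_session_group docker; then
  echo "docker group is active in this shell"
else
  echo "docker group not active yet: log out and back in, or run: newgrp docker"
fi
```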

Step 3 - Prepare the Model Directory

Create a permanent storage folder for model files. Download the targeted Unsloth GGUF binary via the Hugging Face CLI tool.

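# Create the model folder and make it writable by your user
sudo mkdir -p /opt/models
sudo chown "$USER" /opt/models

# Install the Hugging Face CLI in a virtual environment
# (recent Ubuntu releases refuse system-wide pip installs)
sudo apt install -y python3-venv
python3 -m venv ~/hf-venv
source ~/hf-venv/bin/activate
pip install -U huggingface_hub

# Download target model
huggingface-cli download unsloth/Qwen3.6-27B-GGUF \
  Qwen3.6-27B-Q4_K_M.gguf \
  --local-dir /opt/models \
  --local-dir-use-symlinks False

# Find the actual file path (it may sit directly in /opt/models or under a snapshots/<hash> folder)
find /opt/models -name "Qwen3.6-27B-Q4_K_M.gguf" -type f

Model Tag: Qwen3.6-27B-Q4_K_M.gguf

Make a note of the full path returned by find; you'll need it for the compose file in the next step.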
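
Two small checks can save a failed container start later. The sketch below first confirms the download really is a GGUF file (GGUF files begin with the four ASCII bytes GGUF, so an HTML error page or truncated file is easy to spot), then rewrites the host path into the path the container will see, since Step 4 mounts /opt/models at /models. Both function names are my own.

```shell
# Succeeds if the file starts with the 4-byte GGUF magic ("GGUF").
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# Translate a host path under the models folder into the container path.
# $1 = host path, $2 = host folder (default /opt/models),
# $3 = mount point inside the container (default /models)
to_container_path() {
  local host_path="$1" host_root="${2:-/opt/models}" mount_point="${3:-/models}"
  case "$host_path" in
    "$host_root"/*) printf '%s\n' "$mount_point/${host_path#"$host_root"/}" ;;
    *) echo "error: $host_path is not under $host_root" >&2; return 1 ;;
  esac
}

# Usage with the path printed by find:
# is_gguf /opt/models/Qwen3.6-27B-Q4_K_M.gguf && \
#   to_container_path /opt/models/Qwen3.6-27B-Q4_K_M.gguf
```

The value printed by to_container_path is what belongs after --model in the compose file.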

Step 4 - Write and Deploy the docker-compose Stack

Create a dedicated directory for your project and write the compose file.

sudo mkdir -p /opt/llamacpp
sudo nano /opt/llamacpp/docker-compose.yml

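Paste this into the compose file. Replace the --model path with the path from Step 3, rewritten for the container: the host folder /opt/models is mounted at /models, so /opt/models/... becomes /models/.... The example shows a Hugging Face cache-style path (/models/models--unsloth--Qwen3.6-27B-GGUF/snapshots/<your-snapshot-hash>/Qwen3.6-27B-Q4_K_M.gguf); if find returned /opt/models/Qwen3.6-27B-Q4_K_M.gguf, use /models/Qwen3.6-27B-Q4_K_M.gguf instead.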

services:
  llama-server:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    container_name: llama-cpp-server
    restart: unless-stopped

    ports:
      - "2030:2030"

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

    volumes:
      - /opt/models:/models

    command: >
      --model /models/models--unsloth--Qwen3.6-27B-GGUF/snapshots/<your-snapshot-hash>/Qwen3.6-27B-Q4_K_M.gguf
      --alias Qwen3.6-27B-Q4_K_M.gguf
      --api-key ""
      --jinja
      --parallel 1
      --reasoning-budget 1024
      --reasoning-budget-message "... thinking budget reached, outputting code now."
      --ctx-size 262144
      --temp 0.6
      --top-k 20
      --top-p 0.8
      --repeat-penalty 1
      --presence-penalty 0
      --cache-type-k q4_0
      --cache-type-v q4_0
      --flash-attn on
      --host 0.0.0.0
      --port 2030

Why --host 0.0.0.0? Inside the Docker container, this binds the server to all container interfaces. Docker's port mapping (2030:2030) then forwards traffic arriving on the host's port 2030 to the container. If you need to bind to a specific host LAN IP (e.g. for VLAN pinholing), add network_mode: host to the service, remove the ports mapping, and set --host to your host's LAN IP instead.

Step 5 - Deploy the Stack

Bring up the llama.cpp container:

cd /opt/llamacpp
sudo docker compose up -d

Check the logs to verify it started correctly:

sudo docker compose logs -f

You should see: srv llama-server: server is listening on http://0.0.0.0:2030

Test the API:

curl http://localhost:2030/v1/models
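
A 27B model can take a while to load, so the API may refuse requests for the first minute or so after the container starts. llama.cpp's server exposes a /health endpoint that returns an error status while the model is loading and 200 once it is ready. The helper below is a sketch that polls it; the function name and the default timeout are my own choices.

```shell
# Poll the llama.cpp /health endpoint until it answers successfully.
# $1 = base URL (default http://localhost:2030), $2 = timeout in seconds (default 600)
wait_for_llama() {
  local base_url="${1:-http://localhost:2030}" timeout="${2:-600}" waited=0
  # curl -f turns HTTP error statuses (such as 503 while loading) into a failure.
  until curl -fsS -o /dev/null "$base_url/health" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "llama-server not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "llama-server is ready at $base_url"
}

# Usage:
# wait_for_llama http://localhost:2030 600 && curl http://localhost:2030/v1/models
```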

Step 6 - Deploy an Ubuntu Server VM for Hermes Agent

I do this to keep Hermes Agent in a sandbox so it can't get to all of the files on the host. This VM can run on the same GPU host via KVM, or on separate hardware.

6a. Install KVM on the GPU Host

sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-install virt-manager -y
sudo usermod -aG kvm $USER
sudo usermod -aG libvirt $USER

Log out and back in for group changes to take effect.
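
Before creating the VM it is worth confirming that the host CPU exposes hardware virtualization, since KVM depends on it. A minimal sketch (the function name is mine) that looks for the Intel VT-x or AMD-V flag:

```shell
# Succeeds if the CPU flags in the given file (default /proc/cpuinfo) include
# Intel VT-x (vmx) or AMD-V (svm).
cpu_has_virt() {
  grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}" 2>/dev/null
}

if cpu_has_virt; then
  echo "CPU virtualization extensions found"
else
  echo "no vmx/svm flag found: enable virtualization in the BIOS/UEFI settings" >&2
fi
```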

6b. Download an Ubuntu Server ISO

wget https://releases.ubuntu.com/24.04/ubuntu-24.04-live-server-amd64.iso

6c. Create a disk image for the VM

qemu-img create -f qcow2 -o preallocation=off hermes-vm.qcow2 40G

6d. Create the VM

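Find your network interface with ip a, put it in place of <your-interface>, and choose your own VNC password in place of changeme (the VNC console listens on all interfaces), then run: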

virt-install \
  -n hermes-vm \
  --osinfo=ubuntujammy \
  --memory=8192 \
  --vcpus=4 \
  --cpu host \
  --network type=direct,source=<your-interface>,source_mode=bridge,model=virtio \
  --disk=/home/$USER/hermes-vm.qcow2,bus=virtio \
  --location=/home/$USER/ubuntu-24.04-live-server-amd64.iso,kernel=casper/vmlinuz,initrd=casper/initrd \
  --noautoconsole \
  --graphics=vnc,password=changeme,listen=0.0.0.0

Enable Autostart

# Start the VM automatically whenever the GPU host boots
virsh -c qemu:///system autostart hermes-vm

6e. Connect and complete the install

Find the VNC port:

virsh -c qemu:///system vncdisplay hermes-vm

From another machine, connect to vnc://<gpu-host-ip>:<port> and walk through the Ubuntu Server installer. Make sure to enable SSH during installation.

After install, reboot the VM and verify you can SSH in:

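# The VM uses a direct (macvtap) interface, so its address comes from your LAN's
# DHCP server, not from libvirt's default network. Look it up in your router's
# lease table, or log in on the VNC console and run: ip a
# Connect from another machine on the LAN; with macvtap the GPU host itself
# usually cannot reach the VM directly.

ssh <user>@<vm-ip>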

Step 7 - Install Hermes Agent on the Ubuntu Server VM

Add Hermes Agent to the VM so it can act as your high-level software engineering assistant and tool-calling orchestrator.

# On the Ubuntu Server VM
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash

You can configure the model during install, or run hermes model configure after (next step).

Step 8 - Bind Hermes Agent to the llama.cpp Container API

Configure Hermes to point at the llama.cpp server running on the GPU host. Use the GPU host's LAN or Tailscale IP as <gpu-host-ip>. If the VM runs on the same host with the direct (macvtap) network from Step 6, it may not be able to reach the host over the LAN IP; in that case use the host's Tailscale IP (Step 12).

# On the Hermes VM
hermes model configure

Provide the following options:

Provider: openai-compatible
Base URL: http://<gpu-host-ip>:2030/v1
API Key: *** (placeholder - Tailscale keeps this safe)
Model Name: Qwen3.6-27B-Q4_K_M.gguf
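
A mismatch between the Model Name entered here and the --alias set in the compose file is an easy mistake to make. The sketch below checks that the alias appears in the server's /v1/models response, which follows the OpenAI list format with an id field per model. The function name has_model is my own.

```shell
# Read a /v1/models JSON response on stdin and succeed if it lists the alias.
has_model() {
  # Strip whitespace so compact and pretty-printed JSON look the same,
  # then search for the literal "id":"<alias>" pair.
  tr -d ' \t\r\n' | grep -qF "\"id\":\"$1\""
}

# Usage from the Hermes VM (replace <gpu-host-ip> first):
# curl -s http://<gpu-host-ip>:2030/v1/models | has_model Qwen3.6-27B-Q4_K_M.gguf \
#   && echo "alias matches" || echo "alias not served"
```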

Next, set up a gateway portal. I used Discord because I already use it and it was easy enough to set up. There is a good guide on the Hermes Agent website: https://hermes-agent.nousresearch.com/docs/user-guide/messaging/discord

The Discord gateway is good because it allows you to interact with the Hermes Agent in a very simple way through an interface that allows images and audio etc. It also means you don't have to work through the CLI or TUI if you don't want to and can access it completely remotely even without your dev PC.

I also wanted to ensure the gateway started by default. You can actually just ask hermes to do that and it will take care of it, or you can set it up manually.

Create the file /etc/systemd/system/hermes-gateway.service:

sudo nano /etc/systemd/system/hermes-gateway.service

Paste the configuration below (make sure to replace youruser with your actual Linux username):

[Unit]
Description=Hermes Agent Gateway
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=youruser
Group=youruser
WorkingDirectory=/home/youruser
Environment="HOME=/home/youruser"
ExecStart=/home/youruser/.local/bin/hermes gateway start
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target

Reload systemd, then enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable --now hermes-gateway

Step 9 - Install VS Code on Your Workstation

Install Visual Studio Code on your main development workstation (this can be the same Ubuntu machine or another machine on your network - I prefer an Ubuntu laptop). Follow the official guide: https://code.visualstudio.com/docs/setup/linux

Quick install (Ubuntu/Debian):

sudo apt install wget gpg
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft.gpg

# Add the repository
sudo tee /etc/apt/sources.list.d/vscode.sources > /dev/null <<'EOF'
Types: deb
URIs: https://packages.microsoft.com/repos/code
Suites: stable
Components: main
Architectures: amd64,arm64,armhf
Signed-By: /usr/share/keyrings/microsoft.gpg
EOF

sudo apt update
sudo apt install code

For macOS or Windows, download the installer from https://code.visualstudio.com/download.

Step 10 - Install OpenCode via curl

I install OpenCode on my dev laptop, inside the Hermes VM, and on the GPU host so it is available wherever I need it. Install it with its curl-based installer:

curl -fsSL https://opencode.ai/install | bash

# Verify
opencode --help

Step 11 - Bind OpenCode to the llama.cpp Container API

Configure OpenCode so that its coding agent uses your local Qwen 3.6-27B model served via llama.cpp as an OpenAI-compatible backend. Replace <gpu-host-ip> with the Tailscale or LAN IP of the GPU host.

opencode config set provider openai-compatible
opencode config set base_url http://<gpu-host-ip>:2030/v1
opencode config set api_key ***
opencode config set model Qwen3.6-27B-Q4_K_M.gguf
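
If your OpenCode build does not accept the config set commands above, the same binding can usually be expressed in its JSON config file instead. The sketch below rests on several assumptions that you should check against the OpenCode documentation: that the config lives at ~/.config/opencode/opencode.json, that custom backends are declared under provider using the @ai-sdk/openai-compatible package, and that the provider id llamacpp (my own label) is acceptable. Note that it overwrites any existing opencode.json in the target directory.

```shell
# Write a minimal OpenCode config that declares the llama.cpp server as a
# custom OpenAI-compatible provider (assumed config layout, see lead-in).
# $1 = base URL, $2 = model alias, $3 = config directory (default ~/.config/opencode)
write_opencode_config() {
  local base_url="$1" model="$2" dir="${3:-$HOME/.config/opencode}"
  mkdir -p "$dir"
  # \$schema is escaped so the shell writes a literal "$schema" key.
  cat > "$dir/opencode.json" <<EOF
{
  "\$schema": "https://opencode.ai/config.json",
  "provider": {
    "llamacpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama.cpp (local)",
      "options": { "baseURL": "$base_url" },
      "models": { "$model": { "name": "$model" } }
    }
  }
}
EOF
}

# Example (replace <gpu-host-ip> first):
# write_opencode_config "http://<gpu-host-ip>:2030/v1" "Qwen3.6-27B-Q4_K_M.gguf"
```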

Step 12 - Install Tailscale for Remote Connectivity

Tailscale creates a private mesh network between your machines so you don't need to expose ports to the internet.

Sign up at https://tailscale.com/. From the admin console, click Add Devices; you'll get a choice of client or server. Choose Linux server: it maintains connectivity and doesn't require repeated logins.

*Tailscale admin console - click Add Devices to generate an install script with your auth key.*

I leave the defaults and click Generate install script. It will give you a command like this:

curl -fsSL https://tailscale.com/install.sh | sh && sudo tailscale up --auth-key=tskey-auth-xxxxxxxx

Run it on each machine (GPU host, Hermes VM, dev workstation).

Once all machines are in the tailnet, enable SSH via Tailscale on the servers:

*Enable Tailscale SSH from the admin console's Access Controls page.*

sudo tailscale set --ssh

*After enabling SSH, you can connect using the Tailscale hostname.*

You can now reach any machine by its Tailscale hostname:

ssh <user>@<tailscale-hostname>

Note: the first time you connect, you need to authorize the session using the link printed in the terminal. VS Code hides this link initially, so I recommend connecting from a terminal first before attempting to connect via VS Code's Remote-SSH.

Step 13 - SSH into the GPU Host via Tailscale

With Tailscale SSH enabled, you can connect directly from your dev workstation to the GPU host without needing a separate VPN or exposing SSH to the internet.

# From your dev workstation
ssh <user>@<gpu-host-tailscale-name>

# Check the llama.cpp logs remotely
docker logs -f llama-cpp-server

# Or forward the API port to a local port for testing
ssh -L 2030:localhost:2030 <user>@<gpu-host-tailscale-name>

You can also configure VS Code's Remote-SSH extension to use the Tailscale hostname, giving you full IDE access to the GPU host and Hermes VM as if they were local.
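
Before pointing a client at the forwarded port from the ssh -L command above, it can help to confirm that something is actually listening on it. This sketch uses bash's built-in /dev/tcp redirection and the coreutils timeout command, so it assumes bash on Linux; the function name port_open is my own.

```shell
# Succeeds if a TCP connection to host:port can be opened within 3 seconds.
# $1 = host, $2 = port
port_open() {
  local host="$1" port="$2"
  # /dev/tcp/<host>/<port> is handled by bash itself, so no extra tools are needed.
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage on the dev workstation while the ssh -L session is open:
# port_open localhost 2030 && echo "tunnel is up" || echo "nothing listening on 2030"
```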

Step 14 - Unified Outcome: Assistant + Coding Agent

At this point, you have a learning assistant (Hermes Agent) running on your Ubuntu Server VM and a coding agent (OpenCode) integrated into your IDE, both powered by your local Qwen 3.6-27B model served via llama.cpp with a 256K context window.

You can now route complex software engineering tasks through Hermes while using OpenCode directly inside VS Code for day-to-day coding, refactoring, and tool-aware workflows.

Appendix 1 - Install Portainer for Container Management

Use Portainer to manage Docker containers and deploy stacks visually. This is optional - everything works without it, but it can be handy for monitoring.

# Create Portainer data volume
docker volume create portainer_data

# Run Portainer CE
docker run -d \
  -p 8000:8000 -p 9443:9443 \
  --name portainer \
  --restart=always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data \
  portainer/portainer-ce:latest

Access Portainer at: https://<your-server-ip>:9443

Appendix 2 - Install Open WebUI on Your Dev PC

Open WebUI provides a ChatGPT-like chat interface you can point at any OpenAI-compatible backend.

Create a docker-compose.yml on your dev workstation:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:0.9.6
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    ports:
      - "3000:8080"
    restart: unless-stopped

volumes:
  open-webui: {}

Start it:

docker compose up -d

Access it at http://localhost:3000. Go to Settings → Connections and add a new OpenAI-compatible endpoint pointing to http://<gpu-host-ip>:2030/v1 with the model name Qwen3.6-27B-Q4_K_M.gguf.

Appendix 3 - Automation Script 1: setup-gpu-host.sh

This script automates Steps 1–5 on the GPU host. Edit the variables at the top, then run as a non-root user with sudo access.

How to use:

Create the file: nano setup-gpu-host.sh
Paste the entire script below into the editor
Save: Ctrl+O, then Enter - Exit: Ctrl+X
Re-open to edit variables at the top: nano setup-gpu-host.sh
- Change MODEL_REPO, MODEL_FILE, API_PORT, etc. if your setup differs
Make it executable: chmod +x setup-gpu-host.sh
Run it: ./setup-gpu-host.sh
- Select a single step, or choose "All" to run everything unattended

#!/usr/bin/env bash
# ============================================
# setup-gpu-host.sh
# Local LLM GPU Host Setup
# ============================================
set -euo pipefail

# ─── CONFIGURATION ──────────────────────────
MODEL_REPO="${MODEL_REPO:-unsloth/Qwen3.6-27B-GGUF}"
MODEL_FILE="${MODEL_FILE:-Qwen3.6-27B-Q4_K_M.gguf}"
API_PORT="${API_PORT:-2030}"
CTX_SIZE="${CTX_SIZE:-262144}"
MODELS_DIR="${MODELS_DIR:-/opt/models}"
COMPOSE_DIR="${COMPOSE_DIR:-/opt/llamacpp}"
CONTAINER_NAME="${CONTAINER_NAME:-llama-cpp-server}"

# ─── UTILITY ────────────────────────────────
info()  { echo -e "\033[0;32m[INFO]\033[0m $*"; }
warn()  { echo -e "\033[1;33m[WARN]\033[0m $*"; }
error() { echo -e "\033[0;31m[ERROR]\033[0m $*"; }

require_sudo() {
  if [[ $EUID -eq 0 ]]; then
    error "Do not run this script as root. It uses sudo interactively."
    exit 1
  fi
  sudo -v  # Refresh sudo timestamp
}

# ─── STEP 1: NVIDIA DRIVERS ─────────────────
install_nvidia_drivers() {
  info "Installing NVIDIA drivers..."
  sudo apt update
  sudo apt install -y ubuntu-drivers-common
  sudo ubuntu-drivers autoinstall || true
  info "NVIDIA drivers installed. Reboot recommended before continuing."
  info "After reboot, run 'nvidia-smi' to verify."
  if command -v nvidia-smi &>/dev/null; then
    nvidia-smi
  else
    warn "nvidia-smi not found - reboot may be required."
  fi
}

# ─── STEP 2: DOCKER + NVIDIA CONTAINER TOOLKIT ─
install_docker() {
  info "Installing Docker..."
  sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
  curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
  echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  sudo apt update
  sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
  sudo usermod -aG docker "$USER"
  info "Docker installed. Log out and back in for group changes."

  info "Installing NVIDIA Container Toolkit..."
  curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
  curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  sudo apt update
  sudo apt install -y nvidia-container-toolkit
  sudo nvidia-ctk runtime configure --runtime=docker
  sudo systemctl restart docker
  info "NVIDIA Container Toolkit configured."
}

# ─── STEP 3: DOWNLOAD MODEL ─────────────────
download_model() {
  info "Downloading model..."
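  sudo mkdir -p "$MODELS_DIR"
  # Make the folder writable by the current user so the download does not need root.
  sudo chown "$USER" "$MODELS_DIR"
  # Use a virtual environment; recent Ubuntu releases refuse system-wide pip installs.
  sudo apt install -y python3-venv
  python3 -m venv "$HOME/hf-venv"
  "$HOME/hf-venv/bin/pip" install -qU huggingface_hub
  "$HOME/hf-venv/bin/huggingface-cli" download "$MODEL_REPO" "$MODEL_FILE" \
    --local-dir "$MODELS_DIR" \
    --local-dir-use-symlinks False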

  MODEL_PATH=$(find "$MODELS_DIR" -name "$MODEL_FILE" -type f 2>/dev/null | head -1)
  if [[ -z "$MODEL_PATH" ]]; then
    error "Model file not found after download."
    exit 1
  fi
  info "Model downloaded to: $MODEL_PATH"
}

# ─── STEP 4: CREATE COMPOSE FILE ────────────
write_compose() {
  info "Writing docker-compose.yml..."
  sudo mkdir -p "$COMPOSE_DIR"

  MODEL_PATH=$(find "$MODELS_DIR" -name "$MODEL_FILE" -type f 2>/dev/null | head -1)
  if [[ -z "$MODEL_PATH" ]]; then
    error "Model not found at $MODELS_DIR - run download step first."
    exit 1
  fi
  # The compose file mounts $MODELS_DIR at /models, so translate the host path
  # into the path the container will see.
  CONTAINER_MODEL_PATH="/models/${MODEL_PATH#"$MODELS_DIR"/}"

  sudo tee "$COMPOSE_DIR/docker-compose.yml" > /dev/null <<EOF
services:
  llama-server:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    container_name: ${CONTAINER_NAME}
    restart: unless-stopped
    ports:
      - "${API_PORT}:${API_PORT}"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - ${MODELS_DIR}:/models
    command: >
      --model ${CONTAINER_MODEL_PATH}
      --alias ${MODEL_FILE}
      --api-key ""
      --jinja
      --parallel 1
      --reasoning-budget 1024
      --reasoning-budget-message "... thinking budget reached, outputting code now."
      --ctx-size ${CTX_SIZE}
      --temp 0.6
      --top-k 20
      --top-p 0.8
      --repeat-penalty 1
      --presence-penalty 0
      --cache-type-k q4_0
      --cache-type-v q4_0
      --flash-attn on
      --host 0.0.0.0
      --port ${API_PORT}
EOF
  info "Compose file written to $COMPOSE_DIR/docker-compose.yml"
}

# ─── STEP 5: DEPLOY ─────────────────────────
deploy_stack() {
  info "Deploying stack..."
  cd "$COMPOSE_DIR"
  sudo docker compose up -d
  info "Container started. Checking logs..."
  sleep 3
  sudo docker compose logs --tail 20
  info "Test the API: curl http://localhost:${API_PORT}/v1/models"
}

# ─── MAIN ───────────────────────────────────
require_sudo

PS3="Select step to run: "
options=(
  "1 - Install NVIDIA drivers"
  "2 - Install Docker + NVIDIA Container Toolkit"
  "3 - Download model"
  "4 - Write docker-compose.yml"
  "5 - Deploy stack"
  "All (run everything)"
  "Quit"
)
select opt in "${options[@]}"; do
  case $opt in
    "1 - Install NVIDIA drivers")       install_nvidia_drivers ;;
    "2 - Install Docker + NVIDIA Container Toolkit") install_docker ;;
    "3 - Download model")               download_model ;;
    "4 - Write docker-compose.yml")     write_compose ;;
    "5 - Deploy stack")                 deploy_stack ;;
    "All (run everything)")
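      install_nvidia_drivers
      # A newly installed driver is not loaded until the next boot, so stop here
      # rather than deploying a container that cannot see the GPU.
      if ! nvidia-smi &>/dev/null; then
        warn "GPU not usable yet. Reboot, then run this script again and choose All."
        break
      fi
      echo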
      install_docker
      echo
      download_model
      echo
      write_compose
      echo
      deploy_stack
      break
      ;;
    "Quit") break ;;
    *) echo "Invalid option $REPLY" ;;
  esac
done

Usage recap:

nano setup-gpu-host.sh              # paste the script, save, exit
nano setup-gpu-host.sh              # edit variables at the top
chmod +x setup-gpu-host.sh
./setup-gpu-host.sh                 # select a step or "All"

Appendix 4 - Automation Script 2: setup-hermes-vm.sh

Run this on the Hermes Agent VM. Edit the IP before running.

How to use:

Create the file: nano setup-hermes-vm.sh
Paste the entire script below
Save: Ctrl+O, Enter - Exit: Ctrl+X
Edit the LLAMA_API_URL to point at your GPU host: nano setup-hermes-vm.sh
Make executable: chmod +x setup-hermes-vm.sh
Run: ./setup-hermes-vm.sh

#!/usr/bin/env bash
# ============================================
# setup-hermes-vm.sh
# Hermes Agent VM Setup
# ============================================
set -euo pipefail

# ─── CONFIGURATION ──────────────────────────
LLAMA_API_URL="${LLAMA_API_URL:-http://192.168.1.100:2030/v1}"
LLAMA_API_KEY="${LLAMA_API_KEY:-sk-local-dev-pass}"
MODEL_NAME="${MODEL_NAME:-Qwen3.6-27B-Q4_K_M.gguf}"

# ─── UTILITY ────────────────────────────────
info()  { echo -e "\033[0;32m[INFO]\033[0m $*"; }

# ─── INSTALL HERMES ─────────────────────────
install_hermes() {
  info "Installing Hermes Agent..."
  curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
  info "Hermes Agent installed."
}

# ─── CONFIGURE ──────────────────────────────
configure_hermes() {
  info "Configuring Hermes to use llama.cpp backend..."
  # Create the config directory first; otherwise the redirect fails and nothing is written.
  mkdir -p ~/.hermes
  cat <<CONF > ~/.hermes/config.yaml
provider: openai-compatible
base_url: ${LLAMA_API_URL}
api_key: ${LLAMA_API_KEY}
model: ${MODEL_NAME}
CONF
  info "Configuration written to ~/.hermes/config.yaml"
  echo "---"
  echo "Verify with: hermes model list"
}

# ─── MAIN ───────────────────────────────────
install_hermes
configure_hermes

Usage recap:

nano setup-hermes-vm.sh             # paste the script, save, exit
nano setup-hermes-vm.sh             # edit LLAMA_API_URL to your GPU host IP
chmod +x setup-hermes-vm.sh
./setup-hermes-vm.sh

Appendix 5 - Automation Script 3: setup-workstation.sh

Run this on your dev workstation (Linux only - macOS/Windows users should install manually).

How to use:

Create the file: nano setup-workstation.sh
Paste the entire script below
Save: Ctrl+O, Enter - Exit: Ctrl+X
Edit LLAMA_API_URL to your GPU host's Tailscale or LAN IP: nano setup-workstation.sh
Make executable: chmod +x setup-workstation.sh
Run: ./setup-workstation.sh

#!/usr/bin/env bash
# ============================================
# setup-workstation.sh
# Dev Workstation Setup (VS Code + OpenCode)
# ============================================
set -euo pipefail

# ─── CONFIGURATION ──────────────────────────
LLAMA_API_URL="${LLAMA_API_URL:-http://100.100.100.1:2030/v1}"
LLAMA_API_KEY="${LLAMA_API_KEY:-sk-local-dev-pass}"
MODEL_NAME="${MODEL_NAME:-Qwen3.6-27B-Q4_K_M.gguf}"

# ─── UTILITY ────────────────────────────────
info()  { echo -e "\033[0;32m[INFO]\033[0m $*"; }

# ─── VS CODE ────────────────────────────────
install_vscode() {
  if command -v code &>/dev/null; then
    info "VS Code already installed."
    return
  fi
  if [[ "$(uname)" == "Linux" ]]; then
    info "Installing VS Code..."
    sudo apt install -y wget gpg
    wget -qO- https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft.gpg
    sudo tee /etc/apt/sources.list.d/vscode.sources > /dev/null <<'EOF'
Types: deb
URIs: https://packages.microsoft.com/repos/code
Suites: stable
Components: main
Architectures: amd64,arm64,armhf
Signed-By: /usr/share/keyrings/microsoft.gpg
EOF
    sudo apt update
    sudo apt install -y code
    info "VS Code installed."
  else
    info "Install VS Code manually from: https://code.visualstudio.com/download"
  fi
}

# ─── OPENCODE ───────────────────────────────
install_opencode() {
  if command -v opencode &>/dev/null; then
    info "OpenCode already installed."
    return
  fi
  info "Installing OpenCode..."
  curl -fsSL https://opencode.ai/install | bash
  info "OpenCode installed."
}

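configure_opencode() {
  info "Configuring OpenCode..."
  # Track failures instead of discarding them, so a rejected setting is visible.
  local failed=0
  opencode config set provider openai-compatible || failed=1
  opencode config set base_url "$LLAMA_API_URL" || failed=1
  opencode config set api_key "$LLAMA_API_KEY" || failed=1
  opencode config set model "$MODEL_NAME" || failed=1
  if [[ $failed -eq 0 ]]; then
    info "OpenCode configured. Verify: opencode config list"
  else
    echo "[WARN] One or more OpenCode settings were not applied; check the output above." >&2
  fi
}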

# ─── MAIN ───────────────────────────────────
install_vscode
install_opencode
configure_opencode

Usage recap:

nano setup-workstation.sh           # paste the script, save, exit
nano setup-workstation.sh           # edit LLAMA_API_URL to your GPU host IP
chmod +x setup-workstation.sh
./setup-workstation.sh