
Chapter 8

Quick Reference

In this chapter
- File Paths
- Ports
- Access URLs
- CLI Commands
  - Ollama
  - systemd Services
  - Podman Containers
  - nginx
  - NVIDIA GPU
  - Backup and Maintenance
- Model Recommendations by VRAM
- Playbook Bundle Reference
- Vault Variable Reference
- Key group_vars/all.yml Variables
- Ansible Playbook Commands
- Role Execution Order (from site.yml)

What you’ll accomplish: a single cheat sheet for file paths, ports, commands, model recommendations, and variable references — the page you’ll bookmark and come back to.

File Paths

| File | Purpose |
| --- | --- |
| /usr/local/bin/ollama | Ollama binary |
| /var/lib/ollama/models/ | Model storage (4-40 GB per model) |
| /etc/systemd/system/ollama.service | Ollama systemd unit |
| /etc/systemd/system/ollama.service.d/override.conf | Ollama env config (host, models dir, parallelism) |
| /etc/containers/systemd/open-webui.container | Open WebUI Quadlet unit |
| /opt/open-webui/data/ | Open WebUI persistent data (chat history, users) |
| /etc/nginx/conf.d/ai-stack.conf | nginx reverse proxy config |
| /etc/nginx/ssl/ai-stack.crt | SSL certificate |
| /etc/nginx/ssl/ai-stack.key | SSL private key |
| /etc/containers/systemd/prometheus.container | Prometheus Quadlet unit |
| /opt/prometheus/prometheus.yml | Prometheus scrape config |
| /opt/prometheus/data/ | Prometheus time-series data |
| /etc/containers/systemd/grafana.container | Grafana Quadlet unit |
| /opt/grafana/data/ | Grafana persistent data |
| /opt/grafana/provisioning/datasources/prometheus.yml | Grafana auto-provisioned datasource |
| /opt/grafana/dashboards/ai-stack-dashboard.json | Pre-built Grafana dashboard |
| /usr/local/bin/nvidia_gpu_exporter | GPU metrics exporter binary (GPU only) |
| /etc/systemd/system/nvidia_gpu_exporter.service | GPU exporter systemd unit (GPU only) |
| /usr/local/bin/ai-backup.sh | Daily backup script |
| /opt/ai-backup/ | Backup storage directory |
| /etc/logrotate.d/ollama | Ollama log rotation config |
| /etc/fail2ban/jail.d/nginx-ai.conf | fail2ban jail config for nginx |

Ports

| Port | Protocol | Service | Exposure | Notes |
| --- | --- | --- | --- | --- |
| 443 | TCP | nginx (HTTPS) | Network | The only port users access |
| 80 | TCP | nginx (HTTP) | Network | Redirects to 443 |
| 11434 | TCP | Ollama API | Localhost only | No authentication — never expose |
| 3000 | TCP | Open WebUI | Localhost only | Proxied via nginx at / |
| 9090 | TCP | Prometheus | Localhost only | Scraped by Grafana |
| 3001 | TCP | Grafana | Localhost only | Proxied via nginx at /grafana/ |
| 9400 | TCP | nvidia_gpu_exporter | Localhost only | GPU hosts only, scraped by Prometheus |
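
To confirm the exposure column matches reality, list the listening sockets and check that only nginx binds beyond localhost. A quick sanity check (output format varies slightly by distribution):

# Only 80/443 should listen on all interfaces; everything else should show 127.0.0.1
sudo ss -tlnp | grep -E ':(80|443|11434|3000|3001|9090|9400)\b'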

Access URLs

| URL | Service | Notes |
| --- | --- | --- |
| https://ai.example.com/ | Open WebUI | Chat interface, user accounts |
| https://ai.example.com/grafana/ | Grafana | Dashboards, monitoring (when monitoring is enabled) |
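
A quick way to confirm both URLs respond; -k skips certificate verification, which you'll need if you followed this guide's self-signed setup:

# Expect 200 from Open WebUI and 200 or 302 (login redirect) from Grafana
curl -skI https://ai.example.com/ | head -1
curl -skI https://ai.example.com/grafana/ | head -1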

CLI Commands

Ollama

# List installed models
ollama list

# Show currently loaded models and VRAM usage
ollama ps

# Pull a model
ollama pull llama3.1:8b

# Remove a model
ollama rm tinyllama

# Run a one-off prompt (omit the quoted prompt for interactive chat)
ollama run llama3.1:8b "What is Rocky Linux?"

# Show model details (quantization, parameters, size)
ollama show llama3.1:8b

# Check the API
curl -s http://127.0.0.1:11434/api/tags | python3 -m json.tool
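
For scripted, non-interactive use, the same API answers generate requests. A minimal sketch using the /api/generate endpoint; the model name assumes you've already pulled llama3.1:8b:

# One-shot completion via the API instead of the CLI
curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "What is Rocky Linux?", "stream": false}' \
  | python3 -m json.tool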

systemd Services

# Check all AI stack services
systemctl status ollama
sudo systemctl status open-webui nginx prometheus grafana

# Restart a service
sudo systemctl restart ollama
sudo systemctl restart open-webui

# View logs
journalctl -u ollama -f
journalctl -u ollama --since "1 hour ago" --no-pager
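
If you need to change Ollama's environment (bind address, models directory, parallelism), edit the drop-in from the File Paths table and reload. A minimal sketch using values matching the defaults in the variable reference below; adjust to your host:

# Write (or update) the drop-in, then apply it
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=127.0.0.1:11434"
Environment="OLLAMA_MODELS=/var/lib/ollama/models"
Environment="OLLAMA_NUM_PARALLEL=1"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama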

Podman Containers

# List running containers
sudo podman ps

# List all containers (including stopped)
sudo podman ps -a

# View container logs
sudo podman logs -f open-webui
sudo podman logs --tail 50 prometheus
sudo podman logs grafana

# Inspect a container (network mode, volumes, env)
sudo podman inspect open-webui

# Pull updated images
sudo podman pull ghcr.io/open-webui/open-webui:latest
sudo podman pull docker.io/prom/prometheus:latest
sudo podman pull docker.io/grafana/grafana-oss:latest
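
Pulling a new image doesn't restart anything by itself; Quadlet-managed containers pick up the image on the next service restart (unit names assume the Quadlet files from the File Paths table):

# Restart the Quadlet-generated services to run the updated images
sudo systemctl restart open-webui prometheus grafana

# After editing a .container file, regenerate the units first
sudo systemctl daemon-reload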

nginx

# Test config syntax
sudo nginx -t

# Reload config without restart
sudo systemctl reload nginx

# View access logs
sudo tail -f /var/log/nginx/access.log

# View error logs
sudo tail -f /var/log/nginx/error.log
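
To test the proxy chain end to end from the host itself, useful before DNS points at the box, pin the domain to localhost with curl's --resolve (add -k for a self-signed certificate):

# Hit nginx locally while presenting the configured domain
curl -skI --resolve ai.example.com:443:127.0.0.1 https://ai.example.com/ | head -1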

NVIDIA GPU

# GPU status (utilization, VRAM, temperature)
nvidia-smi

# Live monitoring (updates every second)
nvidia-smi -l 1

# Check SELinux device contexts
ls -Z /dev/nvidia*

# Check for SELinux denials
sudo ausearch -m avc -ts recent | grep nvidia
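
For scripting or quick spot checks without the full table, nvidia-smi can emit selected fields as CSV:

# Machine-readable snapshot of the values the Grafana dashboard graphs
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu --format=csv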

Backup and Maintenance

# Run a manual backup
sudo /usr/local/bin/ai-backup.sh

# List backups
ls -lh /opt/ai-backup/

# Check fail2ban status
sudo fail2ban-client status
sudo fail2ban-client status nginx-http-auth

# Unban an IP
sudo fail2ban-client set nginx-http-auth unbanip 192.168.1.100

# Check disk usage
du -sh /var/lib/ollama/models/
du -sh /opt/open-webui/data/
du -sh /opt/prometheus/data/
du -sh /opt/grafana/data/
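
Backups are pruned on the retention schedule (ai_backup_retention_days, default 7). To reclaim space immediately, the equivalent by hand is a find sweep; this is a sketch that assumes the script writes plain files directly under /opt/ai-backup/:

# Delete backups older than the 7-day retention window
sudo find /opt/ai-backup/ -type f -mtime +7 -delete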

Model Recommendations by VRAM

| VRAM | Model | Size on Disk | Use Case |
| --- | --- | --- | --- |
| CPU / 4 GB | tinyllama | ~637 MB | Testing only |
| CPU / 8 GB | phi3:mini | ~2.3 GB | Light use, fast on CPU |
| 8 GB GPU | llama3.1:8b | ~4.7 GB | General purpose, good quality |
| 8 GB GPU | mistral:7b | ~4.1 GB | General purpose, slightly faster |
| 8 GB GPU | gemma2:9b | ~5.5 GB | Google’s model, strong reasoning |
| 16 GB GPU | codellama:34b-instruct-q4 | ~20 GB | Coding assistant |
| 16 GB GPU | llama3.1:8b (full) | ~16 GB | Higher quality than quantized |
| 24 GB GPU | llama3.1:70b-q4 | ~40 GB | Best open-source quality |
| 24 GB GPU | deepseek-coder-v2:33b | ~19 GB | Excellent coding model |
| 24 GB GPU | mixtral:8x7b | ~26 GB | Mixture of experts, versatile |

Tip: Start with tinyllama to verify the stack works, then switch to a model appropriate for your VRAM.
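
In practice that progression looks like this (pick the row from the table above that matches your VRAM):

# Verify the stack end to end with the smallest model...
ollama pull tinyllama
ollama run tinyllama "Say hello"

# ...then replace it with your real model
ollama rm tinyllama
ollama pull llama3.1:8b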


Playbook Bundle Reference

The sections below are for users of the companion Ansible playbook bundle; if you deployed manually by following this guide, you can skip them. The playbook bundle is available from RavenForge Press.

Vault Variable Reference

| Variable | Purpose | Used By |
| --- | --- | --- |
| vault_grafana_admin_password | Grafana web UI admin password | roles/ai-monitoring/ (Grafana container env) |
| vault_openwebui_secret_key | Open WebUI session signing and CSRF protection | roles/open-webui/ (container env) |
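
Both values live in the encrypted vault file, which you edit with ansible-vault. The vault file path here is an assumption; check your bundle's layout:

# Edit the encrypted vault in place (path assumed)
ansible-vault edit group_vars/all/vault.yml --vault-password-file .credentials/vault.txt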

Key group_vars/all.yml Variables

| Variable | Default | What It Controls |
| --- | --- | --- |
| ai_gpu_enabled | false | Whether to run GPU roles (nvidia-gpu, nvidia_gpu_exporter) |
| ollama_version | "latest" | Ollama binary version |
| ollama_host | "127.0.0.1" | Ollama bind address |
| ollama_port | 11434 | Ollama API port |
| ollama_models_dir | "/var/lib/ollama/models" | Model storage path |
| ollama_models_to_pull | ["tinyllama"] | Models to download during deployment |
| ollama_num_parallel | 1 | Concurrent inference requests |
| ollama_max_loaded_models | 1 | Models kept in VRAM simultaneously |
| ollama_min_disk_gb | 2 | Minimum free disk space for model storage |
| openwebui_image | "ghcr.io/open-webui/open-webui:latest" | Open WebUI container image |
| openwebui_port | 3000 | Open WebUI internal port |
| openwebui_data_dir | "/opt/open-webui/data" | Persistent data directory |
| ai_domain | "ai.example.com" | Domain for nginx config (CHANGE_ME) |
| ai_ssl_cert | "/etc/nginx/ssl/ai-stack.crt" | SSL certificate path |
| ai_ssl_key | "/etc/nginx/ssl/ai-stack.key" | SSL private key path |
| ai_monitoring_enabled | true | Whether to deploy Prometheus/Grafana |
| grafana_port | 3001 | Grafana internal port |
| grafana_admin_user | "admin" | Grafana admin username |
| prometheus_port | 9090 | Prometheus port |
| nvidia_exporter_port | 9400 | GPU exporter metrics port |
| ai_backup_dir | "/opt/ai-backup" | Backup destination |
| ai_backup_retention_days | 7 | Days to keep backups |
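
Any of these can be overridden for a single run without editing the file, using ansible-playbook's -e flag:

# Example: one-off deploy with GPU roles enabled and two parallel requests
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt \
  -e "ai_gpu_enabled=true ollama_num_parallel=2"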

Ansible Playbook Commands

# Deploy the full stack
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt

# Deploy only specific roles
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --tags ollama
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --tags monitoring

# Verify the deployment
ansible-playbook verify.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt

# Dry run (check mode)
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --check

# Run against a specific host
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --limit ai-host-01
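
If you're unsure which tags or tasks a run would touch, ansible-playbook can list them without executing anything:

# Preview available tags and the tasks a full run would execute
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --list-tags
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --list-tasks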

Role Execution Order (from site.yml)

| Order | Role | Condition | Tags |
| --- | --- | --- | --- |
| 1 | nvidia-gpu | ai_gpu_enabled = true | nvidia, gpu |
| 2 | ollama | Always | ollama, install, configure |
| 3 | open-webui | Always | openwebui, webui |
| 4 | nginx-ai-proxy | Always | nginx, proxy |
| 5 | firewall | Always | firewall |
| 6 | ai-monitoring | ai_monitoring_enabled = true | monitoring, grafana, prometheus |
| 7 | ai-hardening | Always | hardening, backup |

Want the automation code? Get the Ansible playbooks that deploy this entire stack in minutes.
