**What you’ll accomplish:** a single cheat sheet for file paths, ports, commands, model recommendations, and variable reference — the page you’ll bookmark and come back to.
## File Paths
| File | Purpose |
|---|---|
| /usr/local/bin/ollama | Ollama binary |
| /var/lib/ollama/models/ | Model storage (4-40 GB per model) |
| /etc/systemd/system/ollama.service | Ollama systemd unit |
| /etc/systemd/system/ollama.service.d/override.conf | Ollama env config (host, models dir, parallelism) |
| /etc/containers/systemd/open-webui.container | Open WebUI Quadlet unit |
| /opt/open-webui/data/ | Open WebUI persistent data (chat history, users) |
| /etc/nginx/conf.d/ai-stack.conf | nginx reverse proxy config |
| /etc/nginx/ssl/ai-stack.crt | SSL certificate |
| /etc/nginx/ssl/ai-stack.key | SSL private key |
| /etc/containers/systemd/prometheus.container | Prometheus Quadlet unit |
| /opt/prometheus/prometheus.yml | Prometheus scrape config |
| /opt/prometheus/data/ | Prometheus time-series data |
| /etc/containers/systemd/grafana.container | Grafana Quadlet unit |
| /opt/grafana/data/ | Grafana persistent data |
| /opt/grafana/provisioning/datasources/prometheus.yml | Grafana auto-provisioned datasource |
| /opt/grafana/dashboards/ai-stack-dashboard.json | Pre-built Grafana dashboard |
| /usr/local/bin/nvidia_gpu_exporter | GPU metrics exporter binary (GPU only) |
| /etc/systemd/system/nvidia_gpu_exporter.service | GPU exporter systemd unit (GPU only) |
| /usr/local/bin/ai-backup.sh | Daily backup script |
| /opt/ai-backup/ | Backup storage directory |
| /etc/logrotate.d/ollama | Ollama log rotation config |
| /etc/fail2ban/jail.d/nginx-ai.conf | fail2ban jail config for nginx |
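A quick way to confirm a deployment matches this layout is to loop over the key paths; a sketch (the list is a subset of the table above — trim or extend it to match the components you actually installed):

```shell
# Report which of the expected AI-stack paths exist on this host.
for p in /usr/local/bin/ollama \
         /var/lib/ollama/models \
         /etc/nginx/conf.d/ai-stack.conf \
         /opt/open-webui/data; do
  if [ -e "$p" ]; then echo "OK      $p"; else echo "MISSING $p"; fi
done
```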
## Ports
| Port | Protocol | Service | Exposure | Notes |
|---|---|---|---|---|
| 443 | TCP | nginx (HTTPS) | Network | The only port users access |
| 80 | TCP | nginx (HTTP) | Network | Redirects to 443 |
| 11434 | TCP | Ollama API | Localhost only | No authentication — never expose |
| 3000 | TCP | Open WebUI | Localhost only | Proxied via nginx at / |
| 9090 | TCP | Prometheus | Localhost only | Scraped by Grafana |
| 3001 | TCP | Grafana | Localhost only | Proxied via nginx at /grafana/ |
| 9400 | TCP | nvidia_gpu_exporter | Localhost only | GPU hosts only, scraped by Prometheus |
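To confirm the localhost-only services really are bound to 127.0.0.1, you can filter the socket table; a sketch, assuming `ss -tln` output with the local address in column 4:

```shell
# check_exposed reads `ss -tln` output on stdin and prints any internal
# AI-stack port that is bound to something other than 127.0.0.1.
check_exposed() {
  awk '$4 ~ /:(11434|3000|9090|3001|9400)$/ && $4 !~ /^127\.0\.0\.1:/ { print "EXPOSED: " $4 }'
}

# No output from this pipeline means all internal ports are localhost-only.
ss -tln | check_exposed
```

An IPv6 wildcard bind such as `[::]:11434` is also flagged, which is what you want.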
## Access URLs
| URL | Service | Notes |
|---|---|---|
| https://ai.example.com/ | Open WebUI | Chat interface, user accounts |
| https://ai.example.com/grafana/ | Grafana | Dashboards, monitoring (when monitoring is enabled) |
## CLI Commands
### Ollama
```bash
# List installed models
ollama list

# Show currently loaded models and VRAM usage
ollama ps

# Pull a model
ollama pull llama3.1:8b

# Remove a model
ollama rm tinyllama

# Run a quick one-shot test (omit the prompt for an interactive chat)
ollama run llama3.1:8b "What is Rocky Linux?"

# Show model details (quantization, parameters, size)
ollama show llama3.1:8b

# Check the API
curl -s http://127.0.0.1:11434/api/tags | python3 -m json.tool
```
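Beyond `/api/tags`, the generate endpoint is handy for a scripted smoke test; a sketch (the model name is an example — substitute one from `ollama list`):

```shell
# One-shot, non-streaming request to Ollama's generate endpoint.
payload='{"model": "llama3.1:8b", "prompt": "What is Rocky Linux?", "stream": false}'
curl -s http://127.0.0.1:11434/api/generate -d "$payload" | python3 -m json.tool
```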
### systemd Services
```bash
# Check all AI stack services
systemctl status ollama
sudo systemctl status open-webui nginx prometheus grafana

# Restart a service
sudo systemctl restart ollama
sudo systemctl restart open-webui

# View logs
journalctl -u ollama -f
journalctl -u ollama --since "1 hour ago" --no-pager
```
### Podman Containers
```bash
# List running containers
sudo podman ps

# List all containers (including stopped)
sudo podman ps -a

# View container logs
sudo podman logs -f open-webui
sudo podman logs --tail 50 prometheus
sudo podman logs grafana

# Inspect a container (network mode, volumes, env)
sudo podman inspect open-webui

# Pull updated images
sudo podman pull ghcr.io/open-webui/open-webui:latest
sudo podman pull docker.io/prom/prometheus:latest
sudo podman pull docker.io/grafana/grafana-oss:latest
```
### nginx
```bash
# Test config syntax
sudo nginx -t

# Reload config without restart
sudo systemctl reload nginx

# View access logs
sudo tail -f /var/log/nginx/access.log

# View error logs
sudo tail -f /var/log/nginx/error.log
```
### NVIDIA GPU
```bash
# GPU status (utilization, VRAM, temperature)
nvidia-smi

# Live monitoring (updates every second)
nvidia-smi -l 1

# Check SELinux device contexts
ls -Z /dev/nvidia*

# Check for SELinux denials
sudo ausearch -m avc -ts recent | grep nvidia
```
### Backup and Maintenance
```bash
# Run a manual backup
sudo /usr/local/bin/ai-backup.sh

# List backups
ls -lh /opt/ai-backup/

# Check fail2ban status
sudo fail2ban-client status
sudo fail2ban-client status nginx-http-auth

# Unban an IP
sudo fail2ban-client set nginx-http-auth unbanip 192.168.1.100

# Check disk usage
du -sh /var/lib/ollama/models/
du -sh /opt/open-webui/data/
du -sh /opt/prometheus/data/
du -sh /opt/grafana/data/
```
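If backups accumulate past the retention window, old archives can be pruned by hand; a sketch assuming the backup script writes .tar.gz archives (check the actual file naming in /opt/ai-backup/ first):

```shell
# Delete backup archives older than the 7-day retention window.
sudo find /opt/ai-backup -name '*.tar.gz' -mtime +7 -delete
```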
## Model Recommendations by VRAM
| VRAM | Model | Size on Disk | Use Case |
|---|---|---|---|
| CPU / 4 GB | tinyllama | ~637 MB | Testing only |
| CPU / 8 GB | phi3:mini | ~2.3 GB | Light use, fast on CPU |
| 8 GB GPU | llama3.1:8b | ~4.7 GB | General purpose, good quality |
| 8 GB GPU | mistral:7b | ~4.1 GB | General purpose, slightly faster |
| 8 GB GPU | gemma2:9b | ~5.5 GB | Google’s model, strong reasoning |
| 16 GB GPU | codellama:34b-instruct-q4 | ~20 GB | Coding assistant |
| 16 GB GPU | llama3.1:8b (full) | ~16 GB | Higher quality than quantized |
| 24 GB GPU | llama3.1:70b-q4 | ~40 GB | Best open-source quality |
| 24 GB GPU | deepseek-coder-v2:33b | ~19 GB | Excellent coding model |
| 24 GB GPU | mixtral:8x7b | ~26 GB | Mixture of experts, versatile |
Tip: Start with tinyllama to verify the stack works, then switch to a model appropriate for your VRAM.
## Playbook Bundle Reference
The sections below are for users of the companion Ansible playbook bundle. If you deployed manually using this guide, you can skip them. The playbook bundle is available at RavenForge Press.
### Vault Variable Reference
| Variable | Purpose | Used By |
|---|---|---|
| vault_grafana_admin_password | Grafana web UI admin password | roles/ai-monitoring/ (Grafana container env) |
| vault_openwebui_secret_key | Open WebUI session signing and CSRF protection | roles/open-webui/ (container env) |
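Vault values can be generated non-interactively with `ansible-vault encrypt_string`; a sketch, assuming the bundle's `.credentials/vault.txt` password file (the secret value shown is a placeholder):

```shell
# Produce an encrypted value ready to paste into your vault vars file.
ansible-vault encrypt_string 'CHANGE_ME_password' \
  --name 'vault_grafana_admin_password' \
  --vault-password-file .credentials/vault.txt
```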
### Key `group_vars/all.yml` Variables
| Variable | Default | What It Controls |
|---|---|---|
| ai_gpu_enabled | false | Whether to run GPU roles (nvidia-gpu, nvidia_gpu_exporter) |
| ollama_version | "latest" | Ollama binary version |
| ollama_host | "127.0.0.1" | Ollama bind address |
| ollama_port | 11434 | Ollama API port |
| ollama_models_dir | "/var/lib/ollama/models" | Model storage path |
| ollama_models_to_pull | ["tinyllama"] | Models to download during deployment |
| ollama_num_parallel | 1 | Concurrent inference requests |
| ollama_max_loaded_models | 1 | Models kept in VRAM simultaneously |
| ollama_min_disk_gb | 2 | Minimum free disk space for model storage |
| openwebui_image | "ghcr.io/open-webui/open-webui:latest" | Open WebUI container image |
| openwebui_port | 3000 | Open WebUI internal port |
| openwebui_data_dir | "/opt/open-webui/data" | Persistent data directory |
| ai_domain | "ai.example.com" | Domain for nginx config (CHANGE_ME) |
| ai_ssl_cert | "/etc/nginx/ssl/ai-stack.crt" | SSL certificate path |
| ai_ssl_key | "/etc/nginx/ssl/ai-stack.key" | SSL private key path |
| ai_monitoring_enabled | true | Whether to deploy Prometheus/Grafana |
| grafana_port | 3001 | Grafana internal port |
| grafana_admin_user | "admin" | Grafana admin username |
| prometheus_port | 9090 | Prometheus port |
| nvidia_exporter_port | 9400 | GPU exporter metrics port |
| ai_backup_dir | "/opt/ai-backup" | Backup destination |
| ai_backup_retention_days | 7 | Days to keep backups |
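As an illustration, a minimal set of `group_vars/all.yml` overrides for a single-GPU host might look like this (the values are examples, not recommendations):

```yaml
# Example group_vars/all.yml overrides -- illustrative values only.
ai_gpu_enabled: true
ai_domain: "ai.lab.internal"
ollama_models_to_pull:
  - "tinyllama"
  - "llama3.1:8b"
ollama_num_parallel: 2
```

Any variable not overridden keeps the default shown in the table above.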
### Ansible Playbook Commands
```bash
# Deploy the full stack
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt

# Deploy only specific roles
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --tags ollama
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --tags monitoring

# Verify the deployment
ansible-playbook verify.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt

# Dry run (check mode)
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --check

# Run against a specific host
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --limit ai-host-01
```
### Role Execution Order (from `site.yml`)
| Order | Role | Condition | Tags |
|---|---|---|---|
| 1 | nvidia-gpu | ai_gpu_enabled = true | nvidia, gpu |
| 2 | ollama | Always | ollama, install, configure |
| 3 | open-webui | Always | openwebui, webui |
| 4 | nginx-ai-proxy | Always | nginx, proxy |
| 5 | firewall | Always | firewall |
| 6 | ai-monitoring | ai_monitoring_enabled = true | monitoring, grafana, prometheus |
| 7 | ai-hardening | Always | hardening, backup |