**What you’ll accomplish:** a single cheat sheet for file paths, ports, commands, model recommendations, and variable reference — the page you’ll bookmark and come back to.
## File Paths
| File | Purpose |
|---|---|
| /usr/local/bin/ollama | Ollama binary |
| /var/lib/ollama/models/ | Model storage (4-40 GB per model) |
| /etc/systemd/system/ollama.service | Ollama systemd unit |
| /etc/systemd/system/ollama.service.d/override.conf | Ollama env config (host, models dir, parallelism) |
| /etc/containers/systemd/open-webui.container | Open WebUI Quadlet unit |
| /opt/open-webui/data/ | Open WebUI persistent data (chat history, users) |
| /etc/nginx/conf.d/ai-stack.conf | nginx reverse proxy config |
| /etc/nginx/ssl/ai-stack.crt | SSL certificate |
| /etc/nginx/ssl/ai-stack.key | SSL private key |
| /etc/containers/systemd/prometheus.container | Prometheus Quadlet unit |
| /opt/prometheus/prometheus.yml | Prometheus scrape config |
| /opt/prometheus/data/ | Prometheus time-series data |
| /etc/containers/systemd/grafana.container | Grafana Quadlet unit |
| /opt/grafana/data/ | Grafana persistent data |
| /opt/grafana/provisioning/datasources/prometheus.yml | Grafana auto-provisioned datasource |
| /opt/grafana/dashboards/ai-stack-dashboard.json | Pre-built Grafana dashboard |
| /usr/local/bin/nvidia_gpu_exporter | GPU metrics exporter binary (GPU only) |
| /etc/systemd/system/nvidia_gpu_exporter.service | GPU exporter systemd unit (GPU only) |
| /usr/local/bin/ai-backup.sh | Daily backup script |
| /opt/ai-backup/ | Backup storage directory |
| /etc/logrotate.d/ollama | Ollama log rotation config |
| /etc/fail2ban/jail.d/nginx-ai.conf | fail2ban jail config for nginx |
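A quick way to confirm a deployment matches this layout is to loop over the key paths; a sketch (the list is a subset of the table above — trim or extend it to match the components you actually installed):

```shell
# Report which of the expected AI-stack paths exist on this host.
for p in /usr/local/bin/ollama \
         /var/lib/ollama/models \
         /etc/nginx/conf.d/ai-stack.conf \
         /opt/open-webui/data; do
  if [ -e "$p" ]; then echo "OK      $p"; else echo "MISSING $p"; fi
done
```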
## Ports
| Port | Protocol | Service | Exposure | Notes |
|---|---|---|---|---|
| 443 | TCP | nginx (HTTPS) | Network | The only port users access |
| 80 | TCP | nginx (HTTP) | Network | Redirects to 443 |
| 11434 | TCP | Ollama API | Localhost only | No authentication — never expose |
| 3000 | TCP | Open WebUI | Localhost only | Proxied via nginx at / |
| 9090 | TCP | Prometheus | Localhost only | Scraped by Grafana |
| 3001 | TCP | Grafana | Localhost only | Proxied via nginx at /grafana/ |
| 9400 | TCP | nvidia_gpu_exporter | Localhost only | GPU hosts only, scraped by Prometheus |
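To confirm the localhost-only services really are bound to 127.0.0.1, you can filter the socket table; a sketch, assuming `ss -tln` output with the local address in column 4:

```shell
# check_exposed reads `ss -tln` output on stdin and prints any internal
# AI-stack port that is bound to something other than 127.0.0.1.
check_exposed() {
  awk '$4 ~ /:(11434|3000|9090|3001|9400)$/ && $4 !~ /^127\.0\.0\.1:/ { print "EXPOSED: " $4 }'
}

# No output from this pipeline means all internal ports are localhost-only.
ss -tln | check_exposed
```

An IPv6 wildcard bind such as `[::]:11434` is also flagged, which is what you want.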
## Access URLs
| URL | Service | Notes |
|---|---|---|
| https://ai.example.com/ | Open WebUI | Chat interface, user accounts |
| https://ai.example.com/grafana/ | Grafana | Dashboards, monitoring (when monitoring is enabled) |
## CLI Commands
### Ollama
```bash
# List installed models
ollama list

# Show currently loaded models and VRAM usage
ollama ps

# Pull a model
ollama pull llama3.1:8b

# Remove a model
ollama rm tinyllama

# Run a quick one-shot test (omit the prompt for an interactive chat)
ollama run llama3.1:8b "What is Rocky Linux?"

# Show model details (quantization, parameters, size)
ollama show llama3.1:8b

# Check the API
curl -s http://127.0.0.1:11434/api/tags | python3 -m json.tool
```
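Beyond `/api/tags`, the generate endpoint is handy for a scripted smoke test; a sketch (the model name is an example — substitute one from `ollama list`):

```shell
# One-shot, non-streaming request to Ollama's generate endpoint.
payload='{"model": "llama3.1:8b", "prompt": "What is Rocky Linux?", "stream": false}'
curl -s http://127.0.0.1:11434/api/generate -d "$payload" | python3 -m json.tool
```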
### systemd Services
```bash
# Check all AI stack services
systemctl status ollama
sudo systemctl status open-webui nginx prometheus grafana

# Restart a service
sudo systemctl restart ollama
sudo systemctl restart open-webui

# View logs
journalctl -u ollama -f
journalctl -u ollama --since "1 hour ago" --no-pager
```
### Podman Containers
```bash
# List running containers
sudo podman ps

# List all containers (including stopped)
sudo podman ps -a

# View container logs
sudo podman logs -f open-webui
sudo podman logs --tail 50 prometheus
sudo podman logs grafana

# Inspect a container (network mode, volumes, env)
sudo podman inspect open-webui

# Pull updated images
sudo podman pull ghcr.io/open-webui/open-webui:latest
sudo podman pull docker.io/prom/prometheus:latest
sudo podman pull docker.io/grafana/grafana-oss:latest
```
### nginx
```bash
# Test config syntax
sudo nginx -t

# Reload config without restart
sudo systemctl reload nginx

# View access logs
sudo tail -f /var/log/nginx/access.log

# View error logs
sudo tail -f /var/log/nginx/error.log
```
### NVIDIA GPU
```bash
# GPU status (utilization, VRAM, temperature)
nvidia-smi

# Live monitoring (updates every second)
nvidia-smi -l 1

# Check SELinux device contexts
ls -Z /dev/nvidia*

# Check for SELinux denials
sudo ausearch -m avc -ts recent | grep nvidia
```
### Backup and Maintenance
```bash
# Run a manual backup
sudo /usr/local/bin/ai-backup.sh

# List backups
ls -lh /opt/ai-backup/

# Check fail2ban status
sudo fail2ban-client status
sudo fail2ban-client status nginx-http-auth

# Unban an IP
sudo fail2ban-client set nginx-http-auth unbanip 192.168.1.100

# Check disk usage
du -sh /var/lib/ollama/models/
du -sh /opt/open-webui/data/
du -sh /opt/prometheus/data/
du -sh /opt/grafana/data/
```
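If backups accumulate past the retention window, old archives can be pruned by hand; a sketch assuming the backup script writes .tar.gz archives (check the actual file naming in /opt/ai-backup/ first):

```shell
# Delete backup archives older than the 7-day retention window.
sudo find /opt/ai-backup -name '*.tar.gz' -mtime +7 -delete
```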
## Model Recommendations by VRAM
| VRAM | Model | Size on Disk | Use Case |
|---|---|---|---|
| CPU / 4 GB | tinyllama | ~637 MB | Testing only |
| CPU / 8 GB | phi3:mini | ~2.3 GB | Light use, fast on CPU |
| 8 GB GPU | llama3.1:8b | ~4.7 GB | General purpose, good quality |
| 8 GB GPU | mistral:7b | ~4.1 GB | General purpose, slightly faster |
| 8 GB GPU | gemma2:9b | ~5.5 GB | Google’s model, strong reasoning |
| 16 GB GPU | codellama:34b-instruct-q4 | ~20 GB | Coding assistant |
| 16 GB GPU | llama3.1:8b (full) | ~16 GB | Higher quality than quantized |
| 24 GB GPU | llama3.1:70b-q4 | ~40 GB | Best open-source quality |
| 24 GB GPU | deepseek-coder-v2:33b | ~19 GB | Excellent coding model |
| 24 GB GPU | mixtral:8x7b | ~26 GB | Mixture of experts, versatile |
Tip: Start with tinyllama to verify the stack works, then switch to a model appropriate for your VRAM.
## Playbook Bundle Reference
The sections below are for users of the companion Ansible playbook bundle. If you deployed manually using this guide, you can skip them. The playbook bundle is available at RavenForge Press.
### Vault Variable Reference
| Variable | Purpose | Used By |
|---|---|---|
| vault_grafana_admin_password | Grafana web UI admin password | roles/ai-monitoring/ (Grafana container env) |
| vault_openwebui_secret_key | Open WebUI session signing and CSRF protection | roles/open-webui/ (container env) |
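Vault values can be generated non-interactively with `ansible-vault encrypt_string`; a sketch, assuming the bundle's `.credentials/vault.txt` password file (the secret value shown is a placeholder):

```shell
# Produce an encrypted value ready to paste into your vault vars file.
ansible-vault encrypt_string 'CHANGE_ME_password' \
  --name 'vault_grafana_admin_password' \
  --vault-password-file .credentials/vault.txt
```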
### Key `group_vars/all.yml` Variables
| Variable | Default | What It Controls |
|---|---|---|
| ai_gpu_enabled | false | Whether to run GPU roles (nvidia-gpu, nvidia_gpu_exporter) |
| ollama_version | "latest" | Ollama binary version |
| ollama_host | "127.0.0.1" | Ollama bind address |
| ollama_port | 11434 | Ollama API port |
| ollama_models_dir | "/var/lib/ollama/models" | Model storage path |
| ollama_models_to_pull | ["tinyllama"] | Models to download during deployment |
| ollama_num_parallel | 1 | Concurrent inference requests |
| ollama_max_loaded_models | 1 | Models kept in VRAM simultaneously |
| ollama_min_disk_gb | 2 | Minimum free disk space for model storage |
| openwebui_image | "ghcr.io/open-webui/open-webui:latest" | Open WebUI container image |
| openwebui_port | 3000 | Open WebUI internal port |
| openwebui_data_dir | "/opt/open-webui/data" | Persistent data directory |
| ai_domain | "ai.example.com" | Domain for nginx config (CHANGE_ME) |
| ai_ssl_cert | "/etc/nginx/ssl/ai-stack.crt" | SSL certificate path |
| ai_ssl_key | "/etc/nginx/ssl/ai-stack.key" | SSL private key path |
| ai_monitoring_enabled | true | Whether to deploy Prometheus/Grafana |
| grafana_port | 3001 | Grafana internal port |
| grafana_admin_user | "admin" | Grafana admin username |
| prometheus_port | 9090 | Prometheus port |
| nvidia_exporter_port | 9400 | GPU exporter metrics port |
| ai_backup_dir | "/opt/ai-backup" | Backup destination |
| ai_backup_retention_days | 7 | Days to keep backups |
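As an illustration, a minimal set of `group_vars/all.yml` overrides for a single-GPU host might look like this (the values are examples, not recommendations):

```yaml
# Example group_vars/all.yml overrides -- illustrative values only.
ai_gpu_enabled: true
ai_domain: "ai.lab.internal"
ollama_models_to_pull:
  - "tinyllama"
  - "llama3.1:8b"
ollama_num_parallel: 2
```

Any variable not overridden keeps the default shown in the table above.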
### Ansible Playbook Commands
```bash
# Deploy the full stack
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt

# Deploy only specific roles
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --tags ollama
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --tags monitoring

# Verify the deployment
ansible-playbook verify.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt

# Dry run (check mode)
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --check

# Run against a specific host
ansible-playbook site.yml -i inventory/hosts.yml --vault-password-file .credentials/vault.txt --limit ai-host-01
```
### Role Execution Order (from `site.yml`)
| Order | Role | Condition | Tags |
|---|---|---|---|
| 1 | nvidia-gpu | ai_gpu_enabled = true | nvidia, gpu |
| 2 | ollama | Always | ollama, install, configure |
| 3 | open-webui | Always | openwebui, webui |
| 4 | nginx-ai-proxy | Always | nginx, proxy |
| 5 | firewall | Always | firewall |
| 6 | ai-monitoring | ai_monitoring_enabled = true | monitoring, grafana, prometheus |
| 7 | ai-hardening | Always | hardening, backup |