What you’ll accomplish: Know where to look when things break, understand the 8 most common failure modes, and have a diagnostic sequence for each one.
The Quick Diagnostic Sequence
When something isn’t working, run this first:
# Service status
systemctl status ollama
sudo systemctl status open-webui
sudo systemctl status nginx
sudo systemctl status prometheus
sudo systemctl status grafana
# Recent logs
journalctl -u ollama --since "10 minutes ago" --no-pager
sudo podman logs open-webui 2>&1 | tail -30
sudo tail -30 /var/log/nginx/error.log
# SELinux denials (catches silent failures)
sudo ausearch -m avc -ts recent
# Disk and memory
df -h / /var/lib/ollama/models
free -h
# GPU (if applicable)
nvidia-smi
Nine times out of ten, the answer is in one of those outputs.
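If you run this sequence often, it is worth wrapping in a script. A minimal sketch, assuming the service names and model path used in this chapter; adjust them to your layout:
#!/usr/bin/env bash
# quick-diag.sh -- the standard first-look diagnostic in one pass. Run with sudo.
set -u

for unit in ollama open-webui nginx prometheus grafana; do
    echo "=== ${unit} ==="
    systemctl status "$unit" --no-pager --lines=0
done

echo "=== ollama: last 10 minutes of logs ==="
journalctl -u ollama --since "10 minutes ago" --no-pager | tail -30

echo "=== SELinux denials ==="
ausearch -m avc -ts recent

echo "=== disk and memory ==="
df -h / /var/lib/ollama/models
free -h

echo "=== GPU (skipped on CPU-only hosts) ==="
command -v nvidia-smi >/dev/null && nvidia-smi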
The 8 Problems That Cost You Hours
Problem 1: CUDA Out of Memory — But nvidia-smi Shows Free VRAM
Symptom: Ollama logs show CUDA out of memory or the model silently falls back to CPU. But nvidia-smi shows hundreds of MB or even gigabytes of “free” memory.
Cause: VRAM fragmentation. The model needs a contiguous block of memory, and the free VRAM is scattered across small fragments. This happens when multiple models have been loaded and unloaded, or when OLLAMA_MAX_LOADED_MODELS is set higher than 1.
Fix:
# Check what's loaded
ollama ps
# Unload all models (frees VRAM)
# Simply restart Ollama — it starts clean
sudo systemctl restart ollama
# Prevent recurrence: keep only one model loaded
# In /etc/systemd/system/ollama.service.d/override.conf:
# Environment="OLLAMA_MAX_LOADED_MODELS=1"
If you consistently hit OOM with a model that should fit, the model’s actual VRAM requirement is higher than the advertised size. Try a more aggressively quantized version (e.g., q4_0 instead of q4_K_M).
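For example, a sketch of switching to a smaller quantization. The model tags here are illustrative only; check which quantizations are actually published for the model you run:
# Illustrative tags; substitute the model you actually use
ollama pull llama3.1:8b-instruct-q4_0
ollama run llama3.1:8b-instruct-q4_0 "hello"
ollama ps    # the PROCESSOR column should now read "100% GPU"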
Problem 2: SELinux Denials After Moving Model Storage
Symptom: Ollama can’t read or write models after you moved OLLAMA_MODELS to a new path. ollama pull fails with permission errors. journalctl -u ollama shows access denials.
Cause: The new directory doesn’t have the correct SELinux file context. Files inherit context from their parent directory, and your new mount point likely has default_t or mnt_t context instead of what Ollama expects.
Fix:
# Check current context on the new directory
ls -Zd /new/path/to/models
# Apply a suitable context (var_lib_t works for /var paths)
sudo semanage fcontext -a -t var_lib_t "/new/path/to/models(/.*)?"
sudo restorecon -Rv /new/path/to/models
# Verify
ls -Z /new/path/to/models
Also check that the ollama user has filesystem permissions:
sudo chown -R ollama:ollama /new/path/to/models
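To confirm both layers (SELinux context and classic permissions) are right, a quick check run as the ollama user; /new/path/to/models is the same placeholder as above:
# Can the ollama user traverse and list the new path?
sudo -u ollama ls -l /new/path/to/models
# Every parent directory also needs execute permission for the ollama user
namei -l /new/path/to/models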
Problem 3: Podman Network Namespace — Open WebUI Can’t Reach Ollama
Symptom: Open WebUI shows “Ollama not reachable” or “Could not connect to Ollama.” Ollama is running fine — curl http://127.0.0.1:11434/api/tags works from the host.
Cause: If the Open WebUI container is using Podman’s default bridge network instead of Network=host, 127.0.0.1 inside the container points to the container’s own loopback, not the host’s. Ollama is listening on the host’s loopback and the container can’t reach it.
Fix: Verify the Quadlet file uses host networking:
# /etc/containers/systemd/open-webui.container
[Container]
Network=host
If you’re already using Network=host and it still doesn’t work:
# Verify Ollama is actually listening
sudo ss -tlnp | grep 11434
# Verify the OLLAMA_HOST binding
grep OLLAMA_HOST /etc/systemd/system/ollama.service.d/override.conf
If OLLAMA_HOST is set to 127.0.0.1:11434, that’s correct for host networking. If it’s set to 0.0.0.0:11434, that also works but exposes the API to the network (which we don’t want).
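You can also test from inside the container's network namespace to see which side is failing. This assumes the image ships python3 (Open WebUI is a Python application); if curl happens to be present in the image, a plain curl to the same URL works too:
# With Network=host this prints 200; on the default bridge network it fails to connect
sudo podman exec open-webui python3 -c "import urllib.request; print(urllib.request.urlopen('http://127.0.0.1:11434/api/tags', timeout=5).status)"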
Problem 4: nginx Timeout on Streaming Responses
Symptom: Long inference responses get cut off mid-stream. The browser shows a partial response and then stops. nginx error log shows upstream timed out (110: Connection timed out).
Cause: The default proxy_read_timeout is 60 seconds. A large model on CPU can easily take longer than that for a single response, especially with long prompts.
Fix: Verify these lines in /etc/nginx/conf.d/ai-stack.conf:
proxy_read_timeout 86400s;
proxy_send_timeout 86400s;
If you changed the config, reload nginx:
sudo nginx -t && sudo systemctl reload nginx
86400 seconds is 24 hours. That’s not a typo — if the connection is actually dead, TCP keepalive detects it. The timeout is just a safety net against nginx killing connections that are legitimately streaming.
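For reference, a sketch of what the streaming-relevant part of the proxy block can look like. The timeouts are the values from this chapter; the upstream port and the remaining directives are a common streaming-friendly baseline, not a verbatim copy of the chapter's config:
# /etc/nginx/conf.d/ai-stack.conf (excerpt, illustrative)
location / {
    proxy_pass          http://127.0.0.1:8080;   # Open WebUI; adjust to your port
    proxy_http_version  1.1;
    proxy_set_header    Upgrade $http_upgrade;    # keep WebSockets working
    proxy_set_header    Connection "upgrade";
    proxy_buffering     off;                      # stream tokens as they arrive
    proxy_read_timeout  86400s;
    proxy_send_timeout  86400s;
}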
Problem 5: GPU Passthrough “Code 43” in Proxmox
Symptom: nvidia-smi returns an error after installing drivers. dmesg | grep nvidia shows NVRM: GPU does not have the correct resources assigned, or Windows Device Manager shows "Code 43."
Cause: NVIDIA drivers detect they’re running inside a hypervisor and refuse to initialize. This was historically an anti-VM measure in consumer GPU drivers (GeForce series). Quadro and Tesla cards don’t have this issue.
Note: NVIDIA drivers from 2021+ (version 465+) have removed the hypervisor detection that caused Code 43 on Linux KVM guests. If you’re running a recent driver on a Linux VM, you’re unlikely to hit this. Code 43 is primarily a Windows VM problem now.
Fix (try in order):
First: Ensure the CPU type is host — this is required for GPU passthrough regardless of Code 43:
# /etc/pve/qemu-server/<VMID>.conf
cpu: host
Second: If you’re getting Code 43 (especially on Windows VMs), add hidden=1:
cpu: host,hidden=1
Or via the Proxmox UI: VM > Hardware > Processor > Type: host, then add hidden=1 to the args.
Last resort: For stubborn cards or older drivers, add the full hypervisor-hiding args:
# Additional args in the VM config
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=proxmox,kvm=off'
Reboot the VM after making these changes.
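After the reboot, you can confirm from inside the VM that the card is visible and the driver claimed it. These are generic Linux checks, not Proxmox-specific:
# Is the GPU on the VM's PCI bus, and which kernel driver is bound to it?
lspci -nnk | grep -iA3 nvidia
# Any initialization errors from the NVIDIA kernel module?
sudo dmesg | grep -i nvrm | tail -20
# Final confirmation
nvidia-smi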
Problem 6: Ollama Loads Model to CPU Despite GPU Being Available
Symptom: ollama ps shows the model running on CPU. nvidia-smi shows low or zero GPU utilization. Inference is painfully slow.
Cause: There are several possible causes. Check them in order.
Check 1: CUDA not detected. Look for CUDA initialization messages in Ollama's startup logs:
journalctl -u ollama | grep -i cuda
If there's no CUDA mention, the drivers aren't installed correctly (back to Chapter 3).
Check 2: Wrong CUDA_VISIBLE_DEVICES. If you've set this in the override config and it points to a nonexistent device, Ollama won't see the GPU:
grep CUDA /etc/systemd/system/ollama.service.d/override.conf
Check 3: SELinux blocking GPU access. The most subtle cause. Check for denials:
sudo ausearch -m avc -ts recent | grep nvidia
If you see denials, the SELinux device context from Chapter 3 wasn't applied correctly. Re-run:
sudo semanage fcontext -a -t xserver_misc_device_t "/dev/nvidia.*"
sudo restorecon -Rv /dev/nvidia*
Check 4: Model too large for VRAM. Ollama silently falls back to CPU when a model doesn't fit in available VRAM. Compare the free VRAM reported by nvidia-smi against the model's size, and try a smaller or more quantized model.
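To confirm where the layers actually landed, check the load-time log line and the live process view. The exact log wording varies between Ollama versions, so grep loosely:
# The runner logs how many layers were offloaded to the GPU at load time
journalctl -u ollama --since "10 minutes ago" --no-pager | grep -i offload
# Live view: the PROCESSOR column should say "100% GPU", not "100% CPU" or a split
ollama ps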
Problem 7: Open WebUI Shows “Ollama Not Reachable”
Symptom: The Open WebUI interface loads, but when you try to chat, you get “Ollama not reachable” or “Could not connect to Ollama server.”
Cause: This is a connectivity issue between the Open WebUI container and Ollama. Work through these checks in order:
Check 1: Is Ollama running?
systemctl status ollama
curl -s http://127.0.0.1:11434/api/tags
Check 2: Is the container using host networking?
sudo podman inspect open-webui | grep -i networkmode
Should show host. If not, fix the Quadlet file (see Problem 3 above).
Check 3: Is the environment variable correct?
sudo podman exec open-webui env | grep OLLAMA
Should show OLLAMA_BASE_URL=http://127.0.0.1:11434. If the port or host is wrong, update the Quadlet file and restart.
Check 4: Is there a firewall issue?
# Ollama should be on loopback — firewall shouldn't matter
# But check anyway
sudo ss -tlnp | grep 11434
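If any of these checks leads you to edit the Quadlet file, the change only takes effect after the unit is regenerated and the container restarted. A minimal sketch, assuming the unit name used in this chapter:
sudo systemctl daemon-reload               # regenerate the unit from the Quadlet file
sudo systemctl restart open-webui
sudo podman ps --filter name=open-webui    # confirm the container came back up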
Problem 8: Grafana “No Data” Panels
Symptom: Grafana dashboard loads but panels show “No Data” or “No data points.”
Cause: Prometheus isn’t scraping the targets, the targets aren’t exposing metrics, or the nginx proxy isn’t reaching Grafana.
Fix — check each layer:
# Is Prometheus running?
curl -s http://127.0.0.1:9090/-/ready
# Are targets up? Check the targets page
curl -s http://127.0.0.1:9090/api/v1/targets | python3 -m json.tool | grep -A5 health
# Is Ollama exposing metrics?
curl -s http://127.0.0.1:11434/metrics | head -20
# Is the GPU exporter running? (GPU hosts only)
curl -s http://127.0.0.1:9400/metrics | head -20
# Is Grafana healthy on localhost? (bypasses nginx)
curl -s http://127.0.0.1:3001/grafana/api/health
# Is the nginx proxy to Grafana working?
curl -sk https://localhost/grafana/api/health
If Grafana works on localhost but not through nginx:
- Check that the /grafana/ location block exists in /etc/nginx/conf.d/ai-stack.conf
- Reload nginx: sudo nginx -t && sudo systemctl reload nginx
- Check the SELinux boolean: getsebool httpd_can_network_connect. It must be on for nginx to proxy to Grafana.
If targets show as DOWN:
- Check that the ports in prometheus.yml match the actual service ports
- On GPU hosts, verify nvidia_gpu_exporter is running: systemctl status nvidia_gpu_exporter
If targets are UP but Grafana still shows no data:
- Check that the Grafana datasource is configured to point to http://127.0.0.1:9090
- The dashboard may use metric names that differ between Ollama versions. Check the Explore tab in Grafana, or query Prometheus directly (see the sketch below), to see what metrics are actually available.
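To see exactly which metric names Prometheus has, you can ask it directly instead of clicking through Explore. A small sketch using the standard Prometheus HTTP API on the port from this chapter:
# Every metric name Prometheus currently knows about
curl -s 'http://127.0.0.1:9090/api/v1/label/__name__/values' | python3 -m json.tool | head -40
# Spot-check that each scrape target reports up == 1
curl -s 'http://127.0.0.1:9090/api/v1/query?query=up' | python3 -m json.tool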
Log Locations Quick Reference
| Service | Primary Log | How to View |
|---|---|---|
| Ollama | systemd journal | journalctl -u ollama -f |
| Open WebUI | Podman container log | sudo podman logs -f open-webui |
| nginx | File-based | sudo tail -f /var/log/nginx/error.log |
| nginx access | File-based | sudo tail -f /var/log/nginx/access.log |
| Prometheus | Podman container log | sudo podman logs -f prometheus |
| Grafana | Podman container log | sudo podman logs -f grafana |
| nvidia_gpu_exporter | systemd journal | journalctl -u nvidia_gpu_exporter -f |
| fail2ban | File-based | sudo tail -f /var/log/fail2ban.log |
| SELinux denials | audit log | sudo ausearch -m avc -ts recent |
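When a problem spans several services, the journald-backed logs can be followed in a single stream. A small example; whether the container units also appear in the journal depends on their log driver, so treat the second command as an optional shortcut rather than a replacement for podman logs:
# Follow Ollama and the GPU exporter together
journalctl -u ollama -u nvidia_gpu_exporter -f
# Quadlet-managed containers often log to the journal under their unit name as well
journalctl -u open-webui -f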
The “It Worked Yesterday” Checklist
When something that was working stops, check these in order:
- Disk space. df -h /. Full disks cause bizarre failures everywhere.
- Memory. free -h. The OOM killer may have stopped a service.
- Service status. systemctl status ollama open-webui nginx. Did something crash and not restart?
- SELinux. sudo ausearch -m avc -ts recent. A system update may have changed policy.
- Container images. sudo podman ps -a. Did an auto-update pull a broken image?
- DNS/network. ping ai.example.com. Did your DNS change?
- Certificate expiry. openssl s_client -connect ai.example.com:443 </dev/null 2>/dev/null | openssl x509 -noout -dates. Self-signed certs expire after 365 days by default.
- Package updates. dnf history list. Did a kernel update break the NVIDIA drivers? (Reboot + DKMS rebuild needed; see the sketch after this list.)
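If the last item is the culprit, here is a sketch of the usual recovery on a DKMS-based driver install; it assumes the driver was set up the way Chapter 3 describes:
# After rebooting into the new kernel: did DKMS build the module for it?
dkms status
lsmod | grep nvidia
# If the module is missing, rebuild it for the running kernel and restart Ollama
sudo dkms autoinstall
sudo systemctl restart ollama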
What’s Next
You have a working, monitored, hardened AI stack on Rocky Linux. Ollama serves models, Open WebUI gives your users a clean interface, nginx handles SSL, Grafana shows you what’s happening, and backups run on autopilot. That’s a solid foundation — and it’s designed to be extended.
Natural next steps that build on this base:
- RAG pipeline — connect Open WebUI to a document store so your models can answer questions about your own files, wikis, or knowledge base. This plugs into the existing Open WebUI + Ollama setup without replacing anything.
- vLLM backend — swap Ollama for vLLM when you need higher concurrency or continuous batching. The nginx proxy and monitoring layers stay the same; only the inference backend changes.
- Multi-model load balancing — distribute requests across multiple GPU hosts when one card isn’t enough.
- OAuth2 / SSO access control — replace Open WebUI’s built-in auth with your existing identity provider for team deployments.
Each of these extends the ai-stack-deploy/ playbook bundle rather than replacing it. Check RavenForge Press for upcoming add-on guides as they become available.