
Chapter 7

Gotchas & Troubleshooting

In this chapter
  • The Quick Diagnostic Sequence
  • The 8 Problems That Cost You Hours
  • Log Locations Quick Reference
  • The “It Worked Yesterday” Checklist
  • What’s Next

What you’ll accomplish: Know where to look when things break, understand the 8 most common failure modes, and have a diagnostic sequence for each one.

The Quick Diagnostic Sequence

When something isn’t working, run this first:

# Service status
sudo systemctl status ollama
sudo systemctl status open-webui
sudo systemctl status nginx
sudo systemctl status prometheus
sudo systemctl status grafana

# Recent logs
journalctl -u ollama --since "10 minutes ago" --no-pager
sudo podman logs open-webui 2>&1 | tail -30
sudo tail -30 /var/log/nginx/error.log

# SELinux denials (catches silent failures)
sudo ausearch -m avc -ts recent

# Disk and memory
df -h / /var/lib/ollama/models
free -h

# GPU (if applicable)
nvidia-smi

Nine times out of ten, the answer is in one of those outputs.

The 8 Problems That Cost You Hours

Problem 1: CUDA Out of Memory — But nvidia-smi Shows Free VRAM

Symptom: Ollama logs show CUDA out of memory or the model silently falls back to CPU. But nvidia-smi shows hundreds of MB or even gigabytes of “free” memory.

Cause: VRAM fragmentation. The model needs a contiguous block of memory, and the free VRAM is scattered across small fragments. This happens when multiple models have been loaded and unloaded, or when OLLAMA_MAX_LOADED_MODELS is set higher than 1.

Fix:

# Check what's loaded
ollama ps

# Unload all models (frees VRAM)
# Simply restart Ollama — it starts clean
sudo systemctl restart ollama

# Prevent recurrence: keep only one model loaded
# In /etc/systemd/system/ollama.service.d/override.conf:
# Environment="OLLAMA_MAX_LOADED_MODELS=1"

If you consistently hit OOM with a model that should fit, the model’s actual VRAM requirement (weights plus KV cache and context buffers) is higher than the advertised download size. Try a more aggressively quantized version (e.g., q4_0 instead of q4_K_M).
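
For example, pulling a smaller quantization of the same model (the tag here is illustrative; check the model library or ollama list for the tags you actually use):

# Pull a more aggressively quantized variant (example tag)
ollama pull llama3.1:8b-instruct-q4_0

# Load it and confirm it landed on the GPU rather than CPU
ollama run llama3.1:8b-instruct-q4_0 "hello"
ollama ps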

Problem 2: SELinux Denials After Moving Model Storage

Symptom: Ollama can’t read or write models after you moved OLLAMA_MODELS to a new path. ollama pull fails with permission errors. journalctl -u ollama shows access denials.

Cause: The new directory doesn’t have the correct SELinux file context. Files inherit context from their parent directory, and your new mount point likely has default_t or mnt_t context instead of what Ollama expects.

Fix:

# Check current context on the new directory
ls -Zd /new/path/to/models

# Apply a suitable context (var_lib_t works for /var paths)
sudo semanage fcontext -a -t var_lib_t "/new/path/to/models(/.*)?"
sudo restorecon -Rv /new/path/to/models

# Verify
ls -Z /new/path/to/models

Also check that the ollama user has filesystem permissions:

sudo chown -R ollama:ollama /new/path/to/models
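
To confirm both fixes took, test access as the service account (assuming Ollama runs as the ollama system user, as in the default install). Note this only exercises ordinary file permissions; SELinux denials will still show up in ausearch:

# Read and write as the ollama user
sudo -u ollama ls /new/path/to/models
sudo -u ollama touch /new/path/to/models/.write-test
sudo -u ollama rm /new/path/to/models/.write-test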

Problem 3: Podman Network Namespace — Open WebUI Can’t Reach Ollama

Symptom: Open WebUI shows “Ollama not reachable” or “Could not connect to Ollama.” Ollama is running fine — curl http://127.0.0.1:11434/api/tags works from the host.

Cause: If the Open WebUI container is using Podman’s default bridge network instead of Network=host, 127.0.0.1 inside the container points to the container’s own loopback, not the host’s. Ollama is listening on the host’s loopback and the container can’t reach it.

Fix: Verify the Quadlet file uses host networking:

# /etc/containers/systemd/open-webui.container
[Container]
Network=host

If you’re already using Network=host and it still doesn’t work:

# Verify Ollama is actually listening
sudo ss -tlnp | grep 11434

# Verify the OLLAMA_HOST binding
grep OLLAMA_HOST /etc/systemd/system/ollama.service.d/override.conf

If OLLAMA_HOST is set to 127.0.0.1:11434, that’s correct for host networking. If it’s set to 0.0.0.0:11434, that also works but exposes the API to the network (which we don’t want).
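
If the variable is missing or pointing at the wrong address, the override should look roughly like this (a minimal sketch; keep any other Environment lines your install already has):

# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=127.0.0.1:11434"

# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama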

Problem 4: nginx Timeout on Streaming Responses

Symptom: Long inference responses get cut off mid-stream. The browser shows a partial response and then stops. nginx error log shows upstream timed out (110: Connection timed out).

Cause: The default proxy_read_timeout is 60 seconds. A large model on CPU can easily take longer than that for a single response, especially with long prompts.

Fix: Verify these lines in /etc/nginx/conf.d/ai-stack.conf:

proxy_read_timeout  86400s;
proxy_send_timeout  86400s;

If you changed the config, reload nginx:

sudo nginx -t && sudo systemctl reload nginx

86400 seconds is 24 hours. That’s not a typo — if the connection is actually dead, TCP keepalive detects it. The timeout is just a safety net against nginx killing connections that are legitimately streaming.
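
If long responses still stall with the timeouts in place, response buffering is the other usual suspect. A sketch of the streaming-relevant directives for the proxied location (assumes the location block from this book’s ai-stack.conf; adjust to your own config):

proxy_http_version  1.1;
proxy_set_header    Connection "";
proxy_buffering     off;        # stream tokens as they arrive instead of buffering
proxy_cache         off;
proxy_read_timeout  86400s;
proxy_send_timeout  86400s;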

Problem 5: GPU Passthrough “Code 43” in Proxmox

Symptom: nvidia-smi returns an error after installing drivers. dmesg | grep nvidia shows NVRM: GPU does not have the correct resources assigned, or (on a Windows guest) Device Manager shows “Code 43.”

Cause: NVIDIA drivers detect they’re running inside a hypervisor and refuse to initialize. This was historically an anti-VM measure in consumer GPU drivers (GeForce series). Quadro and Tesla cards don’t have this issue.

Note: NVIDIA drivers from 2021+ (version 465+) have removed the hypervisor detection that caused Code 43 on Linux KVM guests. If you’re running a recent driver on a Linux VM, you’re unlikely to hit this. Code 43 is primarily a Windows VM problem now.

Fix (try in order):

First: Ensure the CPU type is host — this is required for GPU passthrough regardless of Code 43:

# /etc/pve/qemu-server/<VMID>.conf
cpu: host

Second: If you’re getting Code 43 (especially on Windows VMs), add hidden=1:

cpu: host,hidden=1

Or via the Proxmox UI: VM > Hardware > Processor > Type: host. The hidden=1 flag isn’t exposed in the UI, so add it by editing the VM config file or with qm set <VMID> --cpu host,hidden=1.

Last resort: For stubborn cards or older drivers, add the full hypervisor-hiding args:

# Additional args in the VM config
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=proxmox,kvm=off'

Reboot the VM after making these changes.
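
After the reboot, confirm from inside the VM that the card shows up and the NVIDIA driver actually bound to it:

# GPU should appear on the VM's PCI bus with "Kernel driver in use: nvidia"
lspci -nnk | grep -A3 -i nvidia

# Driver should now initialize without errors
nvidia-smi

# Any remaining NVRM complaints
sudo dmesg | grep -i nvrm | tail -20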

Problem 6: Ollama Loads Model to CPU Despite GPU Being Available

Symptom: ollama ps shows the model running on CPU. nvidia-smi shows low or zero GPU utilization. Inference is painfully slow.

Cause: Multiple possible causes. Check in order:

  1. CUDA not detected: Check Ollama startup logs for CUDA initialization messages:

    journalctl -u ollama | grep -i cuda
    

    If there’s no CUDA mention, the drivers aren’t installed correctly (back to Chapter 3).

  2. Wrong CUDA_VISIBLE_DEVICES: If you’ve set this in the override config and it points to a nonexistent device:

    grep CUDA /etc/systemd/system/ollama.service.d/override.conf
    
  3. SELinux blocking GPU access: The most subtle cause. Check for denials:

    sudo ausearch -m avc -ts recent | grep nvidia
    

    If you see denials, the SELinux device context from Chapter 3 wasn’t applied correctly. Re-run:

    sudo semanage fcontext -a -t xserver_misc_device_t "/dev/nvidia.*"
    sudo restorecon -Rv /dev/nvidia*
    
  4. Model too large for VRAM: Ollama silently falls back to CPU when a model doesn’t fit in available VRAM. Compare free VRAM against the model’s size (see the quick check below) and try a smaller or more quantized model.
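
A quick way to make that comparison (rough numbers only; the on-disk size from ollama list understates real VRAM use, since the KV cache and context buffers come on top of the weights):

# Free and total VRAM on the card
nvidia-smi --query-gpu=memory.free,memory.total --format=csv

# On-disk size of installed models (VRAM use will be somewhat higher)
ollama list

# What's loaded right now and whether it landed on GPU or CPU
ollama ps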

Problem 7: Open WebUI Shows “Ollama Not Reachable”

Symptom: The Open WebUI interface loads, but when you try to chat, you get “Ollama not reachable” or “Could not connect to Ollama server.”

Cause: This is a connectivity issue between the Open WebUI container and Ollama. Work through these checks in order:

Check 1: Is Ollama running?

systemctl status ollama
curl -s http://127.0.0.1:11434/api/tags

Check 2: Is the container using host networking?

sudo podman inspect open-webui | grep -i networkmode

Should show host. If not, fix the Quadlet file (see Problem 3 above).

Check 3: Is the environment variable correct?

sudo podman exec open-webui env | grep OLLAMA

Should show OLLAMA_BASE_URL=http://127.0.0.1:11434. If the port or host is wrong, update the Quadlet file and restart.
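
Note that Quadlet files aren’t picked up by a plain restart; systemd has to regenerate the unit first. Assuming the container unit is named open-webui, as elsewhere in this book:

# After editing /etc/containers/systemd/open-webui.container
sudo systemctl daemon-reload
sudo systemctl restart open-webui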

Check 4: Is there a firewall issue?

# Ollama should be on loopback — firewall shouldn't matter
# But check anyway
sudo ss -tlnp | grep 11434

Problem 8: Grafana “No Data” Panels

Symptom: Grafana dashboard loads but panels show “No Data” or “No data points.”

Cause: Prometheus isn’t scraping the targets, the targets aren’t exposing metrics, or the nginx proxy isn’t reaching Grafana.

Fix — check each layer:

# Is Prometheus running?
curl -s http://127.0.0.1:9090/-/ready

# Are targets up? Check the targets page
curl -s http://127.0.0.1:9090/api/v1/targets | python3 -m json.tool | grep -A5 health

# Is Ollama exposing metrics?
curl -s http://127.0.0.1:11434/metrics | head -20

# Is the GPU exporter running? (GPU hosts only)
curl -s http://127.0.0.1:9400/metrics | head -20

# Is Grafana healthy on localhost? (bypasses nginx)
curl -s http://127.0.0.1:3001/grafana/api/health

# Is the nginx proxy to Grafana working?
curl -sk https://localhost/grafana/api/health

If Grafana works on localhost but not through nginx:

  • Check that the /grafana/ location block exists in /etc/nginx/conf.d/ai-stack.conf
  • Reload nginx: sudo nginx -t && sudo systemctl reload nginx
  • Check the SELinux boolean: getsebool httpd_can_network_connect — it must be on for nginx to proxy to Grafana
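
If that boolean is off, enable it persistently and retry the proxied health check:

sudo setsebool -P httpd_can_network_connect on
curl -sk https://localhost/grafana/api/health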

If targets show as DOWN:

  • Check that the ports in prometheus.yml match the actual service ports
  • On GPU hosts, verify nvidia_gpu_exporter is running: systemctl status nvidia_gpu_exporter

If targets are UP but Grafana still shows no data:

  • Check that the Grafana datasource is configured to point to http://127.0.0.1:9090 (a provisioning file sketch follows this list)
  • The dashboard may use metric names that differ between Ollama versions — check the Explore tab in Grafana to see what metrics are actually available
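
If you provision the datasource from a file rather than through the UI, it looks roughly like this (the path and datasource name are illustrative; adjust to wherever your Grafana container mounts its provisioning directory):

# e.g. /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://127.0.0.1:9090
    isDefault: true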

Log Locations Quick Reference

Service             | Primary Log          | How to View
Ollama              | systemd journal      | journalctl -u ollama -f
Open WebUI          | Podman container log | sudo podman logs -f open-webui
nginx               | File-based           | sudo tail -f /var/log/nginx/error.log
nginx access        | File-based           | sudo tail -f /var/log/nginx/access.log
Prometheus          | Podman container log | sudo podman logs -f prometheus
Grafana             | Podman container log | sudo podman logs -f grafana
nvidia_gpu_exporter | systemd journal      | journalctl -u nvidia_gpu_exporter -f
fail2ban            | File-based           | sudo tail -f /var/log/fail2ban.log
SELinux denials     | audit log            | sudo ausearch -m avc -ts recent

The “It Worked Yesterday” Checklist

When something that was working stops, check these in order (a script bundling the first five checks follows the list):

  1. Disk space. df -h /. Full disks cause bizarre failures everywhere.
  2. Memory. free -h. OOM killer may have stopped a service.
  3. Service status. systemctl status ollama open-webui nginx. Did something crash and not restart?
  4. SELinux. sudo ausearch -m avc -ts recent. A system update may have changed policy.
  5. Container images. sudo podman ps -a. Did an auto-update pull a broken image?
  6. DNS/network. ping ai.example.com. Did your DNS change?
  7. Certificate expiry. openssl s_client -connect ai.example.com:443 </dev/null 2>/dev/null | openssl x509 -noout -dates. Self-signed certs expire after 365 days by default.
  8. Package updates. dnf history list. Did a kernel update break NVIDIA drivers? (Reboot + DKMS rebuild needed.)
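
If you run these checks often, a small script saves typing. A rough sketch covering the first five items (service and container names match this book’s install; adjust paths and names to yours):

#!/usr/bin/env bash
# it-worked-yesterday.sh - quick bundle of checks 1-5 above
set -u

echo "== Disk =="
df -h / /var/lib/ollama/models

echo "== Memory =="
free -h
# Did the OOM killer fire recently?
sudo dmesg -T | grep -i "out of memory" | tail -5

echo "== Services =="
for svc in ollama open-webui nginx prometheus grafana; do
    printf '%-12s %s\n' "$svc" "$(systemctl is-active "$svc")"
done

echo "== SELinux denials (last 10 minutes) =="
sudo ausearch -m avc -ts recent 2>/dev/null | tail -20

echo "== Containers =="
sudo podman ps -a --format '{{.Names}}\t{{.Status}}\t{{.Image}}'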

What’s Next

You have a working, monitored, hardened AI stack on Rocky Linux. Ollama serves models, Open WebUI gives your users a clean interface, nginx handles SSL, Grafana shows you what’s happening, and backups run on autopilot. That’s a solid foundation — and it’s designed to be extended.

Natural next steps that build on this base:

  • RAG pipeline — connect Open WebUI to a document store so your models can answer questions about your own files, wikis, or knowledge base. This plugs into the existing Open WebUI + Ollama setup without replacing anything.
  • vLLM backend — swap Ollama for vLLM when you need higher concurrency or continuous batching. The nginx proxy and monitoring layers stay the same; only the inference backend changes.
  • Multi-model load balancing — distribute requests across multiple GPU hosts when one card isn’t enough.
  • OAuth2 / SSO access control — replace Open WebUI’s built-in auth with your existing identity provider for team deployments.

Each of these extends the ai-stack-deploy/ playbook bundle rather than replacing it. Check RavenForge Press for upcoming add-on guides as they become available.
