
Chapter 3

GPU Setup & NVIDIA Drivers

In this chapter
  • Why GPU Matters (and Why CPU Is Fine to Start)
  • Proxmox GPU Passthrough
      • Step 1: Verify IOMMU Is Enabled
      • Step 2: Identify Your GPU’s IOMMU Group
      • Step 3: Blacklist Host GPU Drivers
      • Step 4: Add the GPU to the VM
      • Step 5: Verify Passthrough Inside the VM
  • NVIDIA Driver Installation
      • Install Build Dependencies
      • Add the NVIDIA CUDA Repository
      • Install the Drivers
      • Verify the Driver
  • SELinux Device Context
      • The Problem
      • The Fix
      • Verify
  • NVIDIA Persistence Daemon
  • What Automation Looks Like
  • Verification Checkpoint

What you’ll accomplish: Get your NVIDIA GPU passed through from Proxmox, install the drivers on Rocky Linux with SELinux enforcing, and verify everything works — or confidently skip this chapter for CPU-only.

Starting without a GPU? That’s the smart move. Deploy the full stack on CPU first (Chapters 4-7), verify everything works end-to-end, then come back here to add GPU acceleration. The entire stack is designed for this — every component detects GPU availability automatically. You lose nothing by starting CPU-only. Skip to Chapter 4.

Why GPU Matters (and Why CPU Is Fine to Start)

A 7B parameter model generates about 5-10 tokens per second on a modern CPU. That’s usable — you ask a question, wait 20-30 seconds, get an answer. It’s fine for testing, fine for occasional use, and fine for getting the entire stack deployed and verified before you invest in GPU hardware.

With an 8 GB GPU, that same model generates 30-60 tokens per second. The response feels instant. A 13B model becomes practical. Conversations feel natural instead of like waiting for a fax machine.

If you’re planning to run this for a household or a small team, a GPU is worth it. If you’re evaluating the stack or don’t have GPU hardware yet, CPU works. Every chapter in this guide handles both paths.

Proxmox GPU Passthrough

Hardware-specific content. GPU passthrough depends on your specific motherboard, CPU, GPU model, and BIOS/UEFI settings. The steps below follow the current Proxmox 8.x wiki and cover the standard process, but your hardware may require additional steps (ACS override, GPU ROM file, specific BIOS settings). The Proxmox PCI Passthrough wiki page is the authoritative reference — use it alongside this chapter.

If your AI host is a Proxmox VM (which it probably is, if you’re reading a home lab guide), you need to pass the GPU through to the VM. This is a host-level operation — you’re telling Proxmox to give the VM direct hardware access to the GPU.

Step 1: Verify IOMMU Is Enabled

On the Proxmox host (not the VM):

dmesg | grep -i iommu

You should see lines like the following on Intel hosts (AMD hosts report AMD-Vi lines instead):

DMAR: IOMMU enabled

If IOMMU isn’t enabled, add the appropriate kernel parameter to your bootloader:

  • Intel CPUs: Add intel_iommu=on iommu=pt to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub. The iommu=pt (passthrough) flag ensures only devices you explicitly pass through use IOMMU translation — without it, all devices go through IOMMU, which hurts performance. See the example line after this list.
  • AMD CPUs: AMD IOMMU is enabled by default in the kernel — no kernel parameter needed. Add iommu=pt for performance, but skip amd_iommu=on (it’s a no-op). If in doubt, verify with dmesg | grep -i iommu.
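For example, on an Intel host the edited line in /etc/default/grub would look something like this (quiet is simply whatever was already on the line for your install):

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"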

Then run update-grub and reboot the Proxmox host. (If your host boots via systemd-boot, as ZFS-root UEFI installs do, put the parameters in /etc/kernel/cmdline and run proxmox-boot-tool refresh instead.)

Step 2: Identify Your GPU’s IOMMU Group

On the Proxmox host:

# List all IOMMU groups and their devices
for g in /sys/kernel/iommu_groups/*/devices/*; do
    echo "IOMMU Group $(basename $(dirname $(dirname $g))): $(lspci -nns $(basename $g))"
done | grep -i nvidia

Note the PCI address (e.g., 01:00.0) and the IOMMU group number. If other devices share the same IOMMU group, they’ll all get passed through together — that’s normal for GPU + audio controller pairs.

Step 3: Blacklist Host GPU Drivers

The Proxmox host must not claim the GPU. Create a blacklist file:

sudo tee /etc/modprobe.d/blacklist-nvidia.conf > /dev/null << 'EOF'
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
# Blacklist the GPU's audio controller so the host doesn't claim it
blacklist snd_hda_intel
options vfio-pci ids=10de:XXXX,10de:YYYY
EOF

Replace the IDs with your GPU’s vendor:device IDs from the lspci -nn output.
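If you need to look the IDs up, the bracketed [vendor:device] pair at the end of each line is what goes into the ids= option:

# Show the GPU and its audio function with their vendor:device IDs
lspci -nn | grep -i nvidia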

Note: If snd_hda_intel is also your motherboard’s onboard audio, don’t blacklist it outright — use softdep snd_hda_intel pre: vfio-pci instead. The softdep ensures vfio-pci claims the GPU audio first while your onboard audio still works.
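As a sketch, that variant of the modprobe.d file looks like this (same vfio-pci ids line, softdep in place of the audio blacklist):

# /etc/modprobe.d/blacklist-nvidia.conf (softdep variant)
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
# Let vfio-pci claim the GPU's audio function before snd_hda_intel loads,
# without disabling onboard audio
softdep snd_hda_intel pre: vfio-pci
options vfio-pci ids=10de:XXXX,10de:YYYY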

Next, load the VFIO modules at boot — without these, the GPU won’t be claimed by vfio-pci and passthrough won’t work:

# Load VFIO modules at boot
echo -e "vfio\nvfio_iommu_type1\nvfio_pci" | sudo tee /etc/modules-load.d/vfio.conf

Note: On Proxmox 8+ (kernel 6.2+), vfio_virqfd is built into the kernel — do NOT add it to the modules list. You’ll see guides from 2020-2022 that include it; it’s no longer needed and will cause a warning.

Then regenerate the initramfs and reboot:

update-initramfs -u
reboot

Step 4: Add the GPU to the VM

In the Proxmox web UI (or via qm set; a CLI equivalent is shown below):

  1. Go to your AI VM’s Hardware tab
  2. Click Add > PCI Device
  3. Select your GPU from the list
  4. Check All Functions (passes through the audio controller too)
  5. Check PCI-Express if available
  6. Set ROM-Bar to on

Important: In the VM’s CPU settings, make sure the CPU type is set to host — this is required for GPU passthrough. If you’re seeing “Code 43” errors from the NVIDIA driver, first try adding hidden=1 to the CPU config (cpu: host,hidden=1). Note that NVIDIA drivers from 2021+ (version 465+) have removed the hypervisor detection that caused Code 43 on Linux KVM guests — this is primarily a Windows VM issue now. Chapter 7 covers fallback options if hidden=1 alone doesn’t work.
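If you prefer the CLI, the qm set equivalent looks roughly like this (assuming VM ID 101 and the GPU at 01:00 from Step 2; substitute your own values):

# Pass through every function of the GPU at 01:00 as a PCIe device with ROM-Bar enabled
qm set 101 -hostpci0 01:00,pcie=1,rombar=1

# Set the CPU type to host (append ,hidden=1 only if you actually hit Code 43)
qm set 101 -cpu host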

Step 5: Verify Passthrough Inside the VM

Boot the VM and check:

lspci | grep -i nvidia

You should see your GPU listed. If it’s not there, go back and check the IOMMU group and PCI device assignment.

NVIDIA Driver Installation

Automated and tested. The driver installation steps below correspond exactly to the nvidia-gpu Ansible role in the companion playbook bundle. Package installation (EPEL, kernel headers, CUDA repo, driver packages) has been validated on Rocky Linux 9 via automated testing. Post-install steps (nvidia-smi verification, persistenced, SELinux context) require GPU hardware and are validated by the playbook’s verification phase when you run it on your own host.

Now we’re working inside the Rocky Linux VM.

Install Build Dependencies

NVIDIA drivers compile kernel modules, so you need the kernel headers and build tools:

# DKMS requires EPEL repository
sudo dnf install -y epel-release

# Install kernel headers matching your running kernel, plus build tools
sudo dnf install -y kernel-devel-matched kernel-headers gcc make dkms

Note: dkms comes from the EPEL repository — if you skip the epel-release install, dnf install dkms will fail with “no package available.”

Important: The kernel-devel-matched package automatically resolves to headers for your running kernel (replacing the old kernel-devel-$(uname -r) pattern). However, if you’ve updated the kernel but haven’t rebooted, your running kernel is still the old version. Reboot first, then install headers.
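A quick way to check whether you are on the newest installed kernel:

# Compare the running kernel with the most recently installed kernel package
uname -r
rpm -q --last kernel | head -n 1

If the versions differ, reboot before installing the headers.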

Add the NVIDIA CUDA Repository

We use NVIDIA’s official CUDA repository, not ELRepo or RPMFusion. Why? Version consistency. The CUDA repo keeps the driver, CUDA toolkit, and libraries in sync. Mixing sources leads to version mismatches that are painful to debug.

# Install SELinux management tools (needed for device context step)
sudo dnf install -y policycoreutils-python-utils

# Add the NVIDIA CUDA repo for RHEL 9 / Rocky 9
sudo tee /etc/yum.repos.d/nvidia-cuda.repo > /dev/null << 'EOF'
[nvidia-cuda]
name=NVIDIA CUDA Repository for RHEL 9
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/D42D0685.pub
# module_hotfixes bypasses DNF module filtering — needed because
# NVIDIA distributes drivers as module streams on RHEL 9
module_hotfixes=1
enabled=1
EOF

Install the Drivers

sudo dnf install -y nvidia-driver nvidia-driver-cuda nvidia-driver-cuda-libs

This installs the kernel module, the nvidia-smi utility, and the CUDA libraries that Ollama needs for GPU inference. The installation triggers a DKMS build of the kernel module — this takes a minute or two.
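Before rebooting, you can confirm the DKMS build completed against your running kernel:

# The nvidia module should be listed as "installed" for your kernel version
dkms status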

Reboot after installation. The kernel module needs to load at boot:

sudo reboot

Verify the Driver

After reboot:

nvidia-smi

You should see output showing your GPU model, driver version, CUDA version, temperature, and memory usage. If you get command not found or a driver error, see Chapter 7.

SELinux Device Context

This is the step that most tutorials skip — and it’s the step that causes mysterious permission denials later when Ollama or containers try to access the GPU.

The Problem

When NVIDIA drivers create device nodes (/dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm, etc.), they get the default SELinux context device_t. Processes running under confined SELinux domains — including systemd services and Podman containers — can’t access device_t devices without explicit policy.

The Fix

Set the correct SELinux type for NVIDIA device nodes:

# Add a persistent file context rule
sudo semanage fcontext -a -t xserver_misc_device_t "/dev/nvidia.*"

# Apply it to existing device nodes
sudo restorecon -Rv /dev/nvidia*

The xserver_misc_device_t type is the correct context for GPU devices on RHEL-family systems. It allows access from the domains that Ollama and Podman containers run under.

Note: The semanage fcontext rule is persistent — it survives reboots. However, the NVIDIA driver recreates device nodes on each boot, and they start with the default device_t context until relabeled. In practice, systemd runs restorecon on /dev during boot, so the correct context is applied automatically. If you hit SELinux denials after a reboot, verify with ls -Z /dev/nvidia* and re-run restorecon -Rv /dev/nvidia* if needed.

Verify

# Check device contexts
ls -Z /dev/nvidia*

Every device node should show xserver_misc_device_t, not device_t.

# Check for recent SELinux denials
sudo ausearch -m avc -ts recent | grep nvidia

This should return nothing. If you see denials, the context wasn’t applied correctly — re-run the restorecon command.

NVIDIA Persistence Daemon

Ollama loads and unloads models from GPU memory. Without the persistence daemon, the GPU driver reinitializes on every first access after idle — adding 2-5 seconds of latency to the first inference request.

sudo systemctl enable --now nvidia-persistenced
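To confirm it took effect once the service is running, nvidia-smi reports the persistence state:

# "Persistence Mode" should read "Enabled"
nvidia-smi -q | grep -i "persistence mode"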

It’s a small thing, but it eliminates a confusing “why is the first request slow?” question.

What Automation Looks Like

You just completed a lot of manual steps. Here’s what the nvidia-gpu Ansible role does — it splits the work into two phases so it works whether or not GPU hardware is present:

Phase 1: Install (runs on any Rocky Linux 9 host)

  1. Installs policycoreutils-python-utils for SELinux management
  2. Installs EPEL repository (required for DKMS)
  3. Installs kernel headers (kernel-devel-matched) and build tools
  4. Adds the NVIDIA CUDA repository with GPG verification
  5. Installs nvidia-driver, nvidia-driver-cuda, and nvidia-driver-cuda-libs

Phase 2: Configure (only runs when /dev/nvidia0 exists — i.e., GPU hardware is detected)

  6. Reboots to load the NVIDIA kernel module
  7. Verifies GPU access with nvidia-smi (gives a useful error message on failure)
  8. Enables nvidia-persistenced
  9. Sets the xserver_misc_device_t SELinux context on /dev/nvidia.*
  10. Runs restorecon to apply the context

If the role detects no GPU hardware after Phase 1, it skips Phase 2 and prints a message — this means you can safely enable ai_gpu_enabled: true even before you’ve configured Proxmox passthrough. The packages will be ready, and the configure phase will run automatically on the next playbook execution after you pass the GPU through and reboot.
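Conceptually, the Phase 2 gate is nothing more than a device-node check; a shell sketch of the same logic (not the role’s actual Ansible code) would be:

# Run the configure phase only when the NVIDIA device node is present
if [ -e /dev/nvidia0 ]; then
    echo "GPU detected: running configure phase"
else
    echo "No GPU detected: skipping configure phase (safe to re-run later)"
fi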

Every step is idempotent — re-running the playbook on a host that’s already configured changes nothing. That means you can use it for both initial deployment and drift detection. The companion playbook bundle is available at RavenForge Press.

Verification Checkpoint

Before moving to Chapter 4, confirm:

  • nvidia-smi shows your GPU with driver version and CUDA version
  • systemctl status nvidia-persistenced shows active
  • ls -Z /dev/nvidia* shows xserver_misc_device_t on all device nodes
  • sudo ausearch -m avc -ts recent | grep nvidia returns no denials
  • lspci | grep -i nvidia shows your GPU model
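If you’d rather run the whole checklist in one pass, a short script along these lines (hypothetical, not part of the companion bundle) covers it:

#!/usr/bin/env bash
# GPU verification checkpoint: driver, persistence daemon, SELinux context, denials, PCI visibility
nvidia-smi || echo "FAIL: nvidia-smi not working"
systemctl is-active nvidia-persistenced || echo "FAIL: nvidia-persistenced not active"
ls -Z /dev/nvidia* | grep -v xserver_misc_device_t && echo "FAIL: wrong SELinux context"
sudo ausearch -m avc -ts recent | grep -i nvidia && echo "FAIL: SELinux denials present"
lspci | grep -i nvidia || echo "FAIL: GPU not visible on the PCI bus"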

CPU-only checkpoint: If you skipped the GPU sections, verify your CPU-only path is clean before moving on: run ollama run tinyllama "hello" — it’ll be slow (expect 10-30 seconds), but it should respond. Check journalctl -u ollama --no-pager -n 20 for any GPU-related errors (there shouldn’t be any). You’re good.

Either way, you’re ready for Ollama.
