What you’ll accomplish: Get your NVIDIA GPU passed through from Proxmox, install the drivers on Rocky Linux with SELinux enforcing, and verify everything works — or confidently skip this chapter for CPU-only.
Starting without a GPU? That’s the smart move. Deploy the full stack on CPU first (Chapters 4-7), verify everything works end-to-end, then come back here to add GPU acceleration. The entire stack is designed for this — every component detects GPU availability automatically. You lose nothing by starting CPU-only. Skip to Chapter 4.
Why GPU Matters (and Why CPU Is Fine to Start)
A 7B parameter model generates about 5-10 tokens per second on a modern CPU. That’s usable — you ask a question, wait 20-30 seconds, get an answer. It’s fine for testing, fine for occasional use, and fine for getting the entire stack deployed and verified before you invest in GPU hardware.
With an 8 GB GPU, that same model generates 30-60 tokens per second. The response feels instant. A 13B model becomes practical. Conversations feel natural instead of like waiting for a fax machine.
If you’re planning to run this for a household or a small team, a GPU is worth it. If you’re evaluating the stack or don’t have GPU hardware yet, CPU works. Every chapter in this guide handles both paths.
Proxmox GPU Passthrough
Hardware-specific content. GPU passthrough depends on your specific motherboard, CPU, GPU model, and BIOS/UEFI settings. The steps below follow the current Proxmox 8.x wiki and cover the standard process, but your hardware may require additional steps (ACS override, GPU ROM file, specific BIOS settings). The Proxmox PCI Passthrough wiki page is the authoritative reference — use it alongside this chapter.
If your AI host is a Proxmox VM (which it probably is, if you’re reading a home lab guide), you need to pass the GPU through to the VM. This is a host-level operation — you’re telling Proxmox to give the VM direct hardware access to the GPU.
Step 1: Verify IOMMU Is Enabled
On the Proxmox host (not the VM):
dmesg | grep -i iommu
You should see lines like:
DMAR: IOMMU enabled
(On AMD hosts, look for AMD-Vi lines instead.)
If IOMMU isn’t enabled, add the appropriate kernel parameter to your bootloader:
- Intel CPUs: Add intel_iommu=on iommu=pt to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub. The iommu=pt (passthrough) flag ensures only devices you explicitly pass through use IOMMU translation; without it, all devices go through IOMMU, which hurts performance.
- AMD CPUs: AMD IOMMU is enabled by default in the kernel, so no kernel parameter is needed. Add iommu=pt for performance, but skip amd_iommu=on (it's a no-op). If in doubt, verify with dmesg | grep -i iommu.
Then run update-grub and reboot the Proxmox host.
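For reference, here is roughly what the edited line looks like on an Intel host. Treat it as a sketch: your file may already carry other parameters, and you should keep those and just append the IOMMU flags.
# /etc/default/grub (Intel example; keep any existing parameters)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"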
Step 2: Identify Your GPU’s IOMMU Group
On the Proxmox host:
# List all IOMMU groups and their devices
for g in /sys/kernel/iommu_groups/*/devices/*; do
echo "IOMMU Group $(basename $(dirname $(dirname $g))): $(lspci -nns $(basename $g))"
done | grep -i nvidia
Note the PCI address (e.g., 01:00.0) and the IOMMU group number. If other devices share the same IOMMU group, they’ll all get passed through together — that’s normal for GPU + audio controller pairs.
Step 3: Blacklist Host GPU Drivers
The Proxmox host must not claim the GPU. Create a blacklist file:
sudo tee /etc/modprobe.d/blacklist-nvidia.conf > /dev/null << 'EOF'
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
# Blacklist the GPU's audio controller so the host doesn't claim it
blacklist snd_hda_intel
options vfio-pci ids=10de:XXXX,10de:YYYY
EOF
Replace the IDs with your GPU’s vendor:device IDs from the lspci -nn output.
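Not sure where to read those IDs? Query the GPU's PCI address directly. The address and IDs below are illustrative examples, not values to copy.
# Show vendor:device IDs for the GPU and its audio function (address from Step 2)
lspci -nn -s 01:00
# Example output (your IDs will differ):
# 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation ... [10de:2489]
# 01:00.1 Audio device [0403]: NVIDIA Corporation ... [10de:228b]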
Note: If snd_hda_intel is also your motherboard's onboard audio, don't blacklist it outright; use softdep snd_hda_intel pre: vfio-pci instead. The softdep ensures vfio-pci claims the GPU audio first while your onboard audio still works.
Next, load the VFIO modules at boot — without these, the GPU won’t be claimed by vfio-pci and passthrough won’t work:
# Load VFIO modules at boot
echo -e "vfio\nvfio_iommu_type1\nvfio_pci" | sudo tee /etc/modules-load.d/vfio.conf
Note: On Proxmox 8+ (kernel 6.2+), vfio_virqfd is built into the kernel, so do NOT add it to the modules list. You'll see guides from 2020-2022 that include it; it's no longer needed and will cause a warning.
Then regenerate the initramfs and reboot:
update-initramfs -u
reboot
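After the reboot, it's worth confirming the host handed the GPU to vfio-pci rather than a display driver. Substitute your own PCI address from Step 2:
# "Kernel driver in use" should report vfio-pci, not nouveau or nvidia
lspci -nnk -s 01:00.0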
Step 4: Add the GPU to the VM
In the Proxmox web UI (or via qm set):
- Go to your AI VM’s Hardware tab
- Click Add > PCI Device
- Select your GPU from the list
- Check All Functions (passes through the audio controller too)
- Check PCI-Express if available
- Set ROM-Bar to on
Important: In the VM's CPU settings, make sure the CPU type is set to host; this is required for GPU passthrough. If you're seeing "Code 43" errors from the NVIDIA driver, first try adding hidden=1 to the CPU config (cpu: host,hidden=1). Note that NVIDIA drivers from 2021+ (version 465+) have removed the hypervisor detection that caused Code 43 on Linux KVM guests, so this is primarily a Windows VM issue now. Chapter 7 covers fallback options if hidden=1 alone doesn't work.
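If you'd rather script this than click through the UI, the equivalent qm set commands look roughly like the following. The VM ID (100) and PCI address are placeholders; adjust both for your setup.
# Pass through the GPU (all functions) as a PCIe device with ROM-Bar enabled
qm set 100 -hostpci0 0000:01:00,pcie=1,rombar=1
# Use the host CPU type; include hidden=1 only if you hit Code 43
qm set 100 -cpu host,hidden=1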
Step 5: Verify Passthrough Inside the VM
Boot the VM and check:
lspci | grep -i nvidia
You should see your GPU listed. If it’s not there, go back and check the IOMMU group and PCI device assignment.
NVIDIA Driver Installation
Automated and tested. The driver installation steps below correspond exactly to the nvidia-gpu Ansible role in the companion playbook bundle. Package installation (EPEL, kernel headers, CUDA repo, driver packages) has been validated on Rocky Linux 9 via automated testing. Post-install steps (nvidia-smi verification, persistenced, SELinux context) require GPU hardware and are validated by the playbook's verification phase when you run it on your own host.
Now we’re working inside the Rocky Linux VM.
Install Build Dependencies
NVIDIA drivers compile kernel modules, so you need the kernel headers and build tools:
# DKMS requires EPEL repository
sudo dnf install -y epel-release
# Install kernel headers matching your running kernel, plus build tools
sudo dnf install -y kernel-devel-matched kernel-headers gcc make dkms
Note: dkms comes from the EPEL repository; if you skip the epel-release install, dnf install dkms will fail with "no package available."
Important: The kernel-devel-matched package automatically resolves to headers for your running kernel (replacing the old kernel-devel-$(uname -r) pattern). However, if you've updated the kernel but haven't rebooted, your running kernel is still the old version. Reboot first, then install headers.
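A quick sanity check before continuing: the headers you just installed should match the kernel you're actually running.
# These versions should match; if they don't, reboot into the newer kernel first
uname -r
rpm -q --last kernel-devel | head -1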
Add the NVIDIA CUDA Repository
We use NVIDIA’s official CUDA repository, not ELRepo or RPMFusion. Why? Version consistency. The CUDA repo keeps the driver, CUDA toolkit, and libraries in sync. Mixing sources leads to version mismatches that are painful to debug.
# Install SELinux management tools (needed for device context step)
sudo dnf install -y policycoreutils-python-utils
# Add the NVIDIA CUDA repo for RHEL 9 / Rocky 9
sudo tee /etc/yum.repos.d/nvidia-cuda.repo > /dev/null << 'EOF'
[nvidia-cuda]
name=NVIDIA CUDA Repository for RHEL 9
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/D42D0685.pub
# module_hotfixes bypasses DNF module filtering — needed because
# NVIDIA distributes drivers as module streams on RHEL 9
module_hotfixes=1
enabled=1
EOF
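Before installing anything from it, confirm the repo is enabled and reachable:
# The nvidia-cuda repo should appear in the enabled repo list
sudo dnf repolist --enabled | grep nvidia-cuda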
Install the Drivers
sudo dnf install -y nvidia-driver nvidia-driver-cuda nvidia-driver-cuda-libs
This installs the kernel module, the nvidia-smi utility, and the CUDA libraries that Ollama needs for GPU inference. The installation triggers a DKMS build of the kernel module — this takes a minute or two.
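If you want to confirm the module built before rebooting, DKMS will tell you:
# The nvidia module should be listed as installed for your kernel version
sudo dkms status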
Reboot after installation. The kernel module needs to load at boot:
sudo reboot
Verify the Driver
After reboot:
nvidia-smi
You should see output showing your GPU model, driver version, CUDA version, temperature, and memory usage. If you get command not found or a driver error, see Chapter 7.
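For a scripted check (the playbook's verification phase does something similar), a narrower query is handy. The fields chosen here are just one reasonable set:
# Prints GPU name, driver version, and total VRAM as a single CSV line
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader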
SELinux Device Context
This is the step most tutorials skip, and it's the step that causes mysterious permission denials later when Ollama or containers try to access the GPU.
The Problem
When NVIDIA drivers create device nodes (/dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm, etc.), they get the default SELinux context device_t. Processes running under confined SELinux domains — including systemd services and Podman containers — can’t access device_t devices without explicit policy.
The Fix
Set the correct SELinux type for NVIDIA device nodes:
# Add a persistent file context rule
sudo semanage fcontext -a -t xserver_misc_device_t "/dev/nvidia.*"
# Apply it to existing device nodes
sudo restorecon -Rv /dev/nvidia*
The xserver_misc_device_t type is the correct context for GPU devices on RHEL-family systems. It allows access from the domains that Ollama and Podman containers run under.
Note: The semanage fcontext rule is persistent: it survives reboots. However, the NVIDIA driver recreates device nodes on each boot, and they start with the default device_t context until relabeled. In practice, systemd runs restorecon on /dev during boot, so the correct context is applied automatically. If you hit SELinux denials after a reboot, verify with ls -Z /dev/nvidia* and re-run restorecon -Rv /dev/nvidia* if needed.
Verify
# Check device contexts
ls -Z /dev/nvidia*
Every device node should show xserver_misc_device_t, not device_t.
# Check for recent SELinux denials
sudo ausearch -m avc -ts recent | grep nvidia
This should return nothing. If you see denials, the context wasn’t applied correctly — re-run the restorecon command.
NVIDIA Persistence Daemon
Ollama loads and unloads models from GPU memory. Without the persistence daemon, the GPU driver reinitializes on every first access after idle — adding 2-5 seconds of latency to the first inference request.
sudo systemctl enable --now nvidia-persistenced
It’s a small thing, but it eliminates a confusing “why is the first request slow?” question.
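To confirm the daemon took effect, nvidia-smi reports persistence mode directly:
# Persistence Mode should read "Enabled"
nvidia-smi -q | grep -i "persistence mode"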
What Automation Looks Like
You just completed a lot of manual steps. Here’s what the nvidia-gpu Ansible role does — it splits the work into two phases so it works whether or not GPU hardware is present:
Phase 1: Install (runs on any Rocky Linux 9 host)
1. Installs policycoreutils-python-utils for SELinux management
2. Installs the EPEL repository (required for DKMS)
3. Installs kernel headers (kernel-devel-matched) and build tools
4. Adds the NVIDIA CUDA repository with GPG verification
5. Installs nvidia-driver, nvidia-driver-cuda, and nvidia-driver-cuda-libs
Phase 2: Configure (only runs when /dev/nvidia0 exists — i.e., GPU hardware is detected)
6. Reboots to load the NVIDIA kernel module
7. Verifies GPU access with nvidia-smi (gives a useful error message on failure)
8. Enables nvidia-persistenced
9. Sets the xserver_misc_device_t SELinux context on /dev/nvidia.*
10. Runs restorecon to apply the context
If the role detects no GPU hardware after Phase 1, it skips Phase 2 and prints a message — this means you can safely enable ai_gpu_enabled: true even before you’ve configured Proxmox passthrough. The packages will be ready, and the configure phase will run automatically on the next playbook execution after you pass the GPU through and reboot.
Every step is idempotent — re-running the playbook on a host that’s already configured changes nothing. That means you can use it for both initial deployment and drift detection. The companion playbook bundle is available at RavenForge Press.
Verification Checkpoint
Before moving to Chapter 4, confirm:
- nvidia-smi shows your GPU with driver version and CUDA version
- systemctl status nvidia-persistenced shows active
- ls -Z /dev/nvidia* shows xserver_misc_device_t on all device nodes
- sudo ausearch -m avc -ts recent | grep nvidia returns no denials
- lspci | grep -i nvidia shows your GPU model
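If you'd rather run the whole checkpoint in one pass, a throwaway script like this covers it. The ordering and output handling are just one way to do it:
#!/usr/bin/env bash
# Quick GPU checkpoint: every command should succeed and print sane output
set -e
lspci | grep -i nvidia
nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
systemctl is-active nvidia-persistenced
ls -Z /dev/nvidia*
# No matches from ausearch means no denials (it exits non-zero when nothing is found)
sudo ausearch -m avc -ts recent | grep nvidia || echo "No SELinux denials found"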
CPU-only checkpoint: If you skipped the GPU sections, verify your CPU-only path is clean before moving on: run ollama run tinyllama "hello" — it’ll be slow (expect 10-30 seconds), but it should respond. Check journalctl -u ollama --no-pager -n 20 for any GPU-related errors (there shouldn’t be any). You’re good.
Either way, you’re ready for Ollama.