Deploying Rundeck the Right Way

Chapter 7

Troubleshooting & Gotchas

In this chapter

  • Where to Look: Log Files
      • The Quick Diagnostic Sequence
  • The 8 Problems That Cost You Hours
      • H2 Database Corruption
      • Java Heap Exhaustion
      • SELinux Blocks the Reverse Proxy
      • grails.serverURL Mismatch
      • Ansible Plugin Can’t Find Inventory
      • SSH Key Permissions
      • Execution Log Disk Exhaustion
      • MariaDB Connector Version Mismatch
  • The “It Worked Yesterday” Checklist
  • When to Check the SELinux Audit Log
  • Common Error Messages Decoded

What you’ll accomplish: Know exactly where to look when things break, understand the failure modes that cost real people real hours, and have a systematic approach to diagnosing Rundeck issues.

Every deployment guide that ends at “it works!” is incomplete. Things will break. Java will run out of memory. SELinux will silently deny something. A certificate will expire. This chapter is the reference you’ll reach for when Rundeck stops cooperating.


Where to Look: Log Files

Before diagnosing anything, know which logs to check. Rundeck spreads information across several files, and each one tells a different story.

| Log File / Command | What It Contains | When to Check It |
|---|---|---|
| /var/log/rundeck/service.log | Main application log: startup, errors, stack traces, plugin output | First place to look for any issue |
| /var/log/rundeck/rundeck.api.log | API request log: every API call with status codes | API failures, authentication issues |
| /var/log/rundeck/rundeck.audit.log | Audit trail: who did what and when | Permission issues, tracking changes |
| /var/log/httpd/error_log | Apache reverse proxy errors | 503 errors, SSL issues, proxy failures |
| /var/log/httpd/access_log | Apache request log | Verifying traffic reaches the proxy |
| journalctl -u rundeckd | Systemd journal for the Rundeck service | Startup failures, OOM kills, service crashes |
| journalctl -u mariadb | Systemd journal for MariaDB | Database connection issues |

The Quick Diagnostic Sequence

When something is wrong and you’re not sure where to start:

# 1. Is the service running?
sudo systemctl status rundeckd

# 2. What happened recently in the logs?
sudo journalctl -u rundeckd --since "10 minutes ago" --no-pager

# 3. What does the application log say?
sudo tail -100 /var/log/rundeck/service.log

# 4. Is disk space OK?
df -h /var/lib/rundeck/logs/ /var/log/rundeck/

# 5. Is memory OK?
free -h

# 6. Is SELinux blocking something?
sudo ausearch -m avc -ts recent

Run all six. It takes 30 seconds and rules out the most common causes. I can’t count the number of times the problem turned out to be disk space or an SELinux denial that none of the application logs mentioned.
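If you run this sequence often, a small bash wrapper can keep going when one step fails, so a stopped service or a missing tool doesn't hide the remaining checks. A minimal sketch; run_step and rundeck_quick_diag are hypothetical names, not part of Rundeck:

```shell
# Sketch: run_step and rundeck_quick_diag are hypothetical helper names.

# Print a header, run the command, and report a non-zero exit status
# instead of aborting, so the later checks still run.
run_step() {
  local title=$1
  shift
  printf '\n=== %s ===\n' "$title"
  "$@" || printf '(exit status %d)\n' "$?"
}

# The six checks from the quick diagnostic sequence, in the same order.
rundeck_quick_diag() {
  run_step "1. Service status" sudo systemctl status rundeckd --no-pager
  run_step "2. Recent journal" sudo journalctl -u rundeckd --since "10 minutes ago" --no-pager
  run_step "3. Application log" sudo tail -n 100 /var/log/rundeck/service.log
  run_step "4. Disk space" df -h /var/lib/rundeck/logs/ /var/log/rundeck/
  run_step "5. Memory" free -h
  run_step "6. SELinux denials" sudo ausearch -m avc -ts recent
}
```

Source the file and call rundeck_quick_diag. A reported non-zero status is information, not necessarily a problem: systemctl status returns non-zero for a stopped unit, and ausearch returns non-zero when it finds no matching events.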


The 8 Problems That Cost You Hours

These aren’t theoretical. Every one of them has burned real users, generated GitHub issues, and consumed debugging sessions that could have been avoided with the right knowledge.

H2 Database Corruption

Symptom: Rundeck fails to start. service.log shows:

org.h2.jdbc.JdbcSQLException: File corrupted while reading record

Cause: The embedded H2 database does not flush writes to disk synchronously. Any unclean shutdown — power loss, OOM kill, kill -9, even a systemctl stop during heavy writes — can corrupt the database file. This is the most common issue in Rundeck’s GitHub tracker (issues #6003, #7764, #3044, #3868).

Fix: If you followed this guide, you don’t have this problem because you’re using MariaDB. If you’re reading this because you didn’t follow this guide: there is no reliable recovery for a corrupted H2 database. Install MariaDB, configure it per Chapter 3, update rundeck-config.properties per Chapter 4, and start fresh. Your job definitions are gone unless you exported them.

Prevention: Use MariaDB from day one. There is no scenario where H2 is the right choice for persistent use.
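If you inherited a server and aren't sure which database it uses, the dataSource.url line in /etc/rundeck/rundeck-config.properties answers the question. A minimal bash sketch; db_backend is a hypothetical helper that reads the properties file on stdin:

```shell
# Sketch: db_backend is a hypothetical helper. Reads rundeck-config.properties
# on stdin and reports which database dataSource.url points at.
db_backend() {
  local url
  url=$(sed -n 's/^dataSource\.url *= *//p' | head -n 1)
  case $url in
    jdbc:h2:*)                   echo "H2 (embedded database): $url" ;;
    jdbc:mariadb:*|jdbc:mysql:*) echo "MariaDB/MySQL: $url" ;;
    "")                          echo "dataSource.url not set" ;;
    *)                           echo "other: $url" ;;
  esac
}

# Usage:
#   db_backend < /etc/rundeck/rundeck-config.properties
```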

Java Heap Exhaustion

Symptom: Rundeck becomes unresponsive. The UI loads slowly or not at all. Eventually the process is killed. You find this in the logs:

java.lang.OutOfMemoryError: Java heap space

Or in dmesg:

Out of memory: Killed process <pid> (java) total-vm:XXXXXXX

Cause: The JVM heap is too small. Rundeck loads job definitions, execution metadata, and plugin state into memory. Even a modest home lab with a few dozen jobs and a couple of weeks of execution history can exceed the default heap.

Fix: Increase -Xmx in /etc/sysconfig/rundeckd:

# Edit the sysconfig file
sudo vi /etc/sysconfig/rundeckd

# Set the heap (minimum 2048m, recommended 4096m for 8GB VM)
RDECK_JVM_OPTS="$RDECK_JVM_OPTS -Xmx2048m -Xms1024m"

# Restart Rundeck
sudo systemctl restart rundeckd
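After the restart, it is worth confirming that the running JVM actually received the new value. If -Xmx ends up on the command line more than once (for example, when a default is also set somewhere in the startup scripts), HotSpot uses the last occurrence. A minimal bash sketch; effective_xmx is a hypothetical helper, and the pgrep usage assumes the Rundeck JVM runs as the rundeck user:

```shell
# Sketch: effective_xmx is a hypothetical helper, not a Rundeck tool.
# Reads a Java command line on stdin and prints the last -Xmx flag found.
# HotSpot applies the last -Xmx when the flag is repeated, so that is the
# value in effect. Prints nothing if no -Xmx was passed.
effective_xmx() {
  grep -oE -e '-Xmx[0-9]+[kKmMgG]?' | tail -n 1
}

# Usage (assumes the JVM runs as the rundeck user):
#   pgrep -a -u rundeck java | effective_xmx
```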

Prevention: Set the heap at deployment time (Chapter 4). Don’t wait for the OOM killer to tell you the default was too small.

SELinux Blocks the Reverse Proxy

Symptom: Navigating to https://rundeck.example.com returns a 503 error. But curl http://localhost:4440 from the Rundeck host itself works fine. Apache is running. The proxy config looks correct.

Cause: SELinux’s httpd_can_network_connect boolean is off by default on Rocky Linux. This prevents Apache from making outbound TCP connections, which is exactly what a reverse proxy needs to do. The failure looks silent: Apache logs a generic “proxy error”, and the SELinux denial itself is recorded in the audit log (/var/log/audit/audit.log, written by auditd), which neither Apache nor Rundeck points you to.

Fix:

# Check the current state
getsebool httpd_can_network_connect

# If "off", enable it persistently
sudo setsebool -P httpd_can_network_connect on
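The symptom itself tells you which layer is failing: Rundeck answers on localhost:4440, the proxy does not. That comparison can be scripted so you can check it before and after changing the boolean. A minimal bash sketch; locate_http_fault is a hypothetical helper, and it reads a 502 or 503 from the proxy as “Apache cannot reach Rundeck”, which matches this section but is not the only possible cause:

```shell
# Sketch: locate_http_fault is a hypothetical helper. Arguments are HTTP
# status codes as printed by curl -w '%{http_code}' (000 = no connection).
locate_http_fault() {
  local direct=$1 proxied=$2
  if [[ $direct == 000 ]]; then
    echo "rundeck: nothing answering on localhost:4440; start Rundeck first"
  elif [[ $proxied == 000 ]]; then
    echo "frontend: could not reach the proxy (DNS, firewall, TLS, or httpd down)"
  elif [[ $proxied == 502 || $proxied == 503 ]]; then
    echo "proxy: Apache cannot reach Rundeck; check httpd_can_network_connect"
  else
    echo "ok: direct=$direct proxied=$proxied"
  fi
}

# Usage (add -k to the second curl if the proxy uses a self-signed certificate):
#   direct=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:4440/)
#   proxied=$(curl -s -o /dev/null -w '%{http_code}' https://rundeck.example.com/)
#   locate_http_fault "$direct" "$proxied"
```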

Prevention: Set this boolean during deployment (Chapter 4). The bundled playbook handles it automatically.

grails.serverURL Mismatch

Symptom: Any of these:

  • Login redirects to http://localhost:4440 instead of your actual URL
  • CSRF token validation errors after login
  • The web UI loads but JavaScript assets fail (mixed content)
  • API calls return redirects instead of data

Cause: grails.serverURL in /etc/rundeck/rundeck-config.properties doesn’t match the URL users access in their browser. Rundeck uses this setting to generate absolute URLs for redirects, CSRF tokens, and asset loading.

Fix:

sudo vi /etc/rundeck/rundeck-config.properties

Set it to match the browser URL exactly:

grails.serverURL = https://rundeck.example.com

Also verify:

server.useForwardHeaders = true

Restart Rundeck:

sudo systemctl restart rundeckd

The rules: Protocol must match (https://). Hostname must match. No trailing slash. No port number unless it’s non-standard. If in doubt, copy the URL from your browser’s address bar (minus the path) and paste it as the value.
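These rules are mechanical enough to check with a script. A minimal bash sketch, assuming this guide's setup (HTTPS, Rundeck served at the root path with no context path); check_server_url is a hypothetical helper that checks the form of the value only, not whether the hostname is the one your users actually type:

```shell
# Sketch: check_server_url is a hypothetical helper for the rules above.
check_server_url() {
  local url=$1 rest
  if [[ $url != https://* ]]; then
    echo "FAIL: must start with https://"; return 1
  fi
  if [[ $url == */ ]]; then
    echo "FAIL: remove the trailing slash"; return 1
  fi
  rest=${url#https://}
  if [[ $rest == */* ]]; then
    echo "FAIL: remove the path; use scheme and host only"; return 1
  fi
  if [[ $rest == *:443 ]]; then
    echo "FAIL: drop :443, it is the default port for https"; return 1
  fi
  echo "OK: $url"
}

# Usage, assuming the property is written as "grails.serverURL = <url>":
#   url=$(sed -n 's/^grails\.serverURL *= *//p' /etc/rundeck/rundeck-config.properties)
#   check_server_url "$url"
```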

Ansible Plugin Can’t Find Inventory

Symptom: The Nodes tab in Rundeck shows zero nodes. Ansible playbook steps fail with “no hosts matched” or “Could not match supplied host pattern.”

Cause: The Ansible Resource Model Source is pointing to the wrong inventory path, the rundeck user can’t read the file, or the inventory has a syntax error.

Fix:

# Test the inventory as the rundeck user
sudo -u rundeck ansible-inventory --list -i /var/lib/rundeck/inventory/hosts.yml

If this command fails:

  • File not found — The path in Rundeck’s project settings is wrong. Use an absolute path.
  • Permission denied — Fix ownership: sudo chown rundeck:rundeck /var/lib/rundeck/inventory/hosts.yml
  • Syntax error — YAML parsing error in the inventory file. Fix the YAML.

If the command succeeds but Rundeck still shows zero nodes, clear the resource model cache by clicking the refresh icon on the Nodes tab, or restart Rundeck.
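The first two causes (wrong path, unreadable file) can be checked without Ansible at all. A minimal bash sketch; inventory_preflight is a hypothetical helper. It has to run as the rundeck user, because root can read almost any file and would hide a permission problem:

```shell
# Sketch: inventory_preflight is a hypothetical helper. Checks that the
# inventory path is absolute, visible, and readable by the current user.
inventory_preflight() {
  local inv=$1 user_name
  user_name=$(id -un 2>/dev/null || id -u)
  if [[ $inv != /* ]]; then
    echo "FAIL: not an absolute path: $inv"; return 1
  fi
  if [[ ! -e $inv ]]; then
    echo "FAIL: cannot see file (missing, or a parent directory blocks $user_name): $inv"; return 1
  fi
  if [[ ! -r $inv ]]; then
    echo "FAIL: not readable by $user_name: $inv"; return 1
  fi
  echo "OK: $inv is readable by $user_name"
}

# Usage, running the function as rundeck without installing a script:
#   sudo -u rundeck bash -c "$(declare -f inventory_preflight); inventory_preflight /var/lib/rundeck/inventory/hosts.yml"
```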

SSH Key Permissions

Symptom: Rundeck jobs fail with:

Permission denied (publickey,gssapi-keyex,gssapi-with-mic)

Cause: SSH is strict about key file permissions. The private key must be owned by the rundeck user and have 0600 permissions, and the .ssh directory must be 0700. The other common cause is that the public key was never deployed to the target node.

Fix:

# Fix permissions on the Rundeck host
sudo chown rundeck:rundeck /var/lib/rundeck/.ssh/id_ed25519
sudo chmod 0600 /var/lib/rundeck/.ssh/id_ed25519
sudo chmod 0700 /var/lib/rundeck/.ssh/

# Verify the connection
sudo -u rundeck ssh 192.168.1.51 hostname

If the permissions are correct but it still fails, check that the public key is in ~rundeck/.ssh/authorized_keys on the target node and that the target’s sshd_config allows key-based auth (PubkeyAuthentication yes).
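To check owner and mode without eyeballing ls output, compare them directly against the expected values. A minimal bash sketch, assuming GNU stat (as shipped on Rocky Linux); check_mode is a hypothetical helper:

```shell
# Sketch: check_mode is a hypothetical helper. Compares a path's octal mode
# and owner (as reported by GNU stat) with the expected values.
check_mode() {
  local path=$1 want_mode=$2 want_owner=$3 have_mode have_owner
  if ! read -r have_mode have_owner < <(stat -c '%a %U' -- "$path" 2>/dev/null); then
    echo "FAIL: cannot stat $path"
    return 1
  fi
  if [[ $have_mode != "$want_mode" || $have_owner != "$want_owner" ]]; then
    echo "FAIL: $path is $have_mode $have_owner, expected $want_mode $want_owner"
    return 1
  fi
  echo "OK: $path is $have_mode $have_owner"
}

# Usage (root is needed to look inside the 0700 .ssh directory):
#   sudo bash -c "$(declare -f check_mode); check_mode /var/lib/rundeck/.ssh 700 rundeck; check_mode /var/lib/rundeck/.ssh/id_ed25519 600 rundeck"
```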

Execution Log Disk Exhaustion

Symptom: Jobs start failing with I/O errors, or the whole system slows down. df -h shows that the partition holding /var/lib/rundeck/logs/ is nearly full.

Cause: Every Rundeck job execution stores its full output log on disk. Without cleanup, this grows indefinitely. A job that runs hourly and produces a few KB of output doesn’t seem like much — until you realize that’s 8,760 log files per year per job.
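The same arithmetic extends to a whole installation. A rough bash sketch; estimate_log_growth is a hypothetical helper, and the average log size per run is an input you supply (measure a few existing logs with du), not a Rundeck default:

```shell
# Sketch: estimate_log_growth is a hypothetical helper.
# Args: number of jobs, runs per day per job, average KB of log per run,
# days of history kept. Uses bash integer arithmetic (size rounds down).
estimate_log_growth() {
  local jobs=$1 runs_per_day=$2 kb_per_run=$3 days=$4
  local files kb
  files=$(( jobs * runs_per_day * days ))
  kb=$(( files * kb_per_run ))
  echo "files=$files size_mb=$(( kb / 1024 ))"
}
```

For example, estimate_log_growth 1 24 5 365 (one hourly job, about 5 KB per run, kept for a year) prints files=8760 size_mb=42, matching the 8,760 figure above.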

Fix: Configure execution history cleanup in rundeck-config.properties:

# Clean up execution logs older than 30 days
rundeck.execution.logs.fileStorage.deletionPolicy = delayed
rundeck.execution.logs.fileStorage.retentionTime = 30d

Or clean up manually:

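# See how much space logs are using
sudo du -sh /var/lib/rundeck/logs/

# Execution output is stored as .rdlog files; list those older than 30 days first
sudo find /var/lib/rundeck/logs/ -type f -name "*.rdlog" -mtime +30 -print

# Then delete them
sudo find /var/lib/rundeck/logs/ -type f -name "*.rdlog" -mtime +30 -delete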
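Because -delete is irreversible, it helps to see how much a given age filter would remove before running it. A minimal sketch, assuming GNU find (for -printf) and awk; preview_log_cleanup is a hypothetical helper, and the '*.rdlog' pattern in the usage line assumes Rundeck's default execution log naming:

```shell
# Sketch: preview_log_cleanup is a hypothetical helper (needs GNU find).
# Args: directory, age in days, file name pattern.
# Prints how many matching files are older than the cutoff and their total size.
preview_log_cleanup() {
  local dir=$1 days=$2 pattern=$3
  find "$dir" -type f -name "$pattern" -mtime "+$days" -printf '%s\n' |
    awk '{ n++; bytes += $1 } END { printf "%d files, %.1f MiB\n", n, bytes / 1048576 }'
}

# Usage: preview first, then delete with the same filter.
#   sudo bash -c "$(declare -f preview_log_cleanup); preview_log_cleanup /var/lib/rundeck/logs/ 30 '*.rdlog'"
```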

Prevention: Set the retention policy at deployment time. The bundled playbook includes this as a configurable variable (rundeck_log_retention_days).

MariaDB Connector Version Mismatch

Symptom: After upgrading Rundeck or MariaDB, Rundeck fails to start with JDBC driver errors:

java.sql.SQLException: No suitable driver found

Or:

ClassNotFoundException: org.mariadb.jdbc.Driver

Cause: The MariaDB JDBC connector JAR in /var/lib/rundeck/lib/ is incompatible with the installed MariaDB version, or the JAR was deleted/corrupted during an upgrade.

Fix:

# Check what's in the lib directory
ls -la /var/lib/rundeck/lib/

# If the JAR is missing or wrong version, re-download
sudo curl -L -o /var/lib/rundeck/lib/mariadb-java-client-3.3.2.jar \
  https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/3.3.2/mariadb-java-client-3.3.2.jar
sudo chown rundeck:rundeck /var/lib/rundeck/lib/mariadb-java-client-3.3.2.jar

sudo systemctl restart rundeckd
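When the suspect is a truncated or corrupted download, comparing checksums confirms whether the JAR is intact. Maven Central publishes a .sha1 file next to each artifact. A minimal bash sketch; sha1_matches is a hypothetical helper:

```shell
# Sketch: sha1_matches is a hypothetical helper.
# Succeeds only if the file is readable and its SHA-1 equals the expected
# hex digest (compared case-insensitively).
sha1_matches() {
  local file=$1 expected=$2 actual
  [[ -r $file && -n $expected ]] || return 1
  actual=$(sha1sum -- "$file" | cut -d' ' -f1)
  [[ ${actual,,} == "${expected,,}" ]]
}

# Usage:
#   url=https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/3.3.2/mariadb-java-client-3.3.2.jar
#   expected=$(curl -fsSL "$url.sha1" | cut -d' ' -f1)
#   sha1_matches /var/lib/rundeck/lib/mariadb-java-client-3.3.2.jar "$expected" && echo "JAR intact"
```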

The “It Worked Yesterday” Checklist

When Rundeck was fine yesterday and isn’t today, run through this list. The answer is almost always one of these:

# 1. Is the service actually running?
sudo systemctl status rundeckd

# 2. Did it run out of disk space?
df -h /var/lib/rundeck/ /var/log/rundeck/ /tmp/

# 3. Did it run out of memory?
free -h
sudo dmesg | grep -i "out of memory" | tail -5

# 4. Did a certificate expire?
openssl x509 -in /etc/pki/tls/certs/rundeck.example.com.crt -noout -dates

# 5. Did DNS break?
host rundeck.example.com

# 6. Did someone change SELinux?
getenforce
getsebool httpd_can_network_connect

# 7. Did a package update change something?
sudo dnf history info last

# 8. Did MariaDB stop?
sudo systemctl status mariadb

Nine times out of ten, it’s disk space, an expired certificate, or a package update that restarted a service with a changed config file. The diagnostic sequence above takes under a minute and eliminates the most common causes.
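Check 4 prints dates you then have to compare by hand. openssl x509 -checkend turns that into a yes/no answer: it exits 0 if the certificate will still be valid the given number of seconds from now. A minimal bash sketch; cert_expires_within is a hypothetical helper, and the path in the usage line is the one from the checklist:

```shell
# Sketch: cert_expires_within is a hypothetical helper.
# Args: PEM certificate file, number of days.
# Succeeds (exit 0) if the certificate expires within that many days.
cert_expires_within() {
  local cert=$1 days=$2
  if [[ ! -r $cert ]]; then
    echo "cannot read $cert" >&2
    return 2
  fi
  # -checkend N exits 0 if the certificate is still valid N seconds from now,
  # so a non-zero exit means it expires (or already expired) within the window.
  # A file openssl cannot parse is also reported as expiring.
  ! openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null
}

# Usage:
#   cert_expires_within /etc/pki/tls/certs/rundeck.example.com.crt 14 && echo "renew soon"
```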


When to Check the SELinux Audit Log

SELinux denials don’t always produce obvious error messages in application logs. When something “should work” but doesn’t, and the application logs are unhelpful, check the audit log:

# Recent AVC denials
sudo ausearch -m avc -ts recent

# If ausearch is noisy, filter for httpd or java
sudo ausearch -m avc -ts recent | grep -E "httpd|java"

If you find a denial, audit2why explains why:

sudo ausearch -m avc -ts recent | audit2why

This will tell you which boolean to set or which policy is blocking the action. For Rundeck deployments, the denial is almost always httpd_can_network_connect, but occasionally you’ll see denials related to file access (especially if you put config files in non-standard locations).
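When several denials are mixed together, counting them by process and permission shows which one dominates. A minimal bash sketch, assuming the usual raw AVC record layout (avc: denied { perm } for ... comm="name" ...); summarize_avc is a hypothetical helper:

```shell
# Sketch: summarize_avc is a hypothetical helper. Reads raw AVC records on
# stdin and counts them by process name (comm) and denied permission,
# most frequent first.
summarize_avc() {
  sed -nE 's/.*avc: +denied +[{] ([^}]*) [}] for .* comm="([^"]*)".*/\2: \1/p' |
    sort | uniq -c | sort -rn
}

# Usage:
#   sudo ausearch -m avc -ts recent | summarize_avc
```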


Common Error Messages Decoded

| Error Message | Likely Cause | Where to Fix |
|---|---|---|
| Grails application running at http://localhost:4440 | Normal startup message, not an error | N/A |
| Unable to resolve host | DNS failure for the Rundeck hostname | Check /etc/hosts or DNS |
| Connection refused (port 3306) | MariaDB is not running | sudo systemctl start mariadb |
| Access denied for user 'rundeck'@'localhost' | Wrong MariaDB password in config | rundeck-config.properties |
| No suitable driver found | JDBC JAR missing or wrong version | /var/lib/rundeck/lib/ |
| CSRF token verification failed | grails.serverURL mismatch | rundeck-config.properties |
| Permission denied (publickey) | SSH key permissions or missing public key | See Problem #6 above |
| java.lang.OutOfMemoryError | JVM heap too small | /etc/sysconfig/rundeckd |
| AH01114: HTTP: failed to make connection | SELinux blocking proxy, or Rundeck not running | setsebool or start Rundeck |
| Keystore was tampered with, or password was incorrect | Wrong keystore password (direct SSL mode) | ssl.properties |
