What you’ll accomplish: Know exactly where to look when things break, understand the failure modes that cost real people real hours, and have a systematic approach to diagnosing Rundeck issues.
Every deployment guide that ends at “it works!” is incomplete. Things will break. Java will run out of memory. SELinux will silently deny something. A certificate will expire. This chapter is the reference you’ll reach for when Rundeck stops cooperating.
Where to Look: Log Files
Before diagnosing anything, know which logs to check. Rundeck spreads information across several files, and each one tells a different story.
| Log File | What It Contains | When to Check It |
|---|---|---|
| /var/log/rundeck/service.log | Main application log: startup, errors, stack traces, plugin output | First place to look for any issue |
| /var/log/rundeck/rundeck.api.log | API request log: every API call with status codes | API failures, authentication issues |
| /var/log/rundeck/rundeck.audit.log | Audit trail: who did what and when | Permission issues, tracking changes |
| /var/log/httpd/error_log | Apache reverse proxy errors | 503 errors, SSL issues, proxy failures |
| /var/log/httpd/access_log | Apache request log | Verifying traffic reaches the proxy |
| journalctl -u rundeckd | Systemd journal for the Rundeck service | Startup failures, OOM kills, service crashes |
| journalctl -u mariadb | Systemd journal for MariaDB | Database connection issues |
The Quick Diagnostic Sequence
When something is wrong and you’re not sure where to start:
# 1. Is the service running?
sudo systemctl status rundeckd
# 2. What happened recently in the logs?
sudo journalctl -u rundeckd --since "10 minutes ago" --no-pager
# 3. What does the application log say?
sudo tail -100 /var/log/rundeck/service.log
# 4. Is disk space OK?
df -h /var/lib/rundeck/logs/ /var/log/rundeck/
# 5. Is memory OK?
free -h
# 6. Is SELinux blocking something?
sudo ausearch -m avc -ts recent
Run all six. It takes 30 seconds and rules out the most common causes. I can’t count the number of times the problem turned out to be disk space or an SELinux denial that none of the application logs mentioned.
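The six checks can be wrapped in one function so they always run together, even when an early one fails. This is a minimal sketch: run_check and quick_diag are hypothetical helper names, not part of Rundeck, and the commands are the same ones listed above. Keep in mind that ausearch exits non-zero when it finds no matching records, so a non-zero status on step 6 usually means there were no denials.

```shell
# run_check: print a header, run one command, and keep going even if it fails.
run_check() {
    label=$1; shift
    printf '\n=== %s ===\n' "$label"
    if "$@"; then
        return 0
    fi
    printf '(command exited non-zero: %s)\n' "$*"
    return 0
}

# quick_diag: the six checks from the sequence above, in order.
quick_diag() {
    run_check "1. Service status"  sudo systemctl status rundeckd --no-pager
    run_check "2. Recent journal"  sudo journalctl -u rundeckd --since "10 minutes ago" --no-pager
    run_check "3. Application log" sudo tail -100 /var/log/rundeck/service.log
    run_check "4. Disk space"      df -h /var/lib/rundeck/logs/ /var/log/rundeck/
    run_check "5. Memory"          free -h
    run_check "6. SELinux denials" sudo ausearch -m avc -ts recent
}

# On the Rundeck host, source this file and run:
#   quick_diag 2>&1 | tee /tmp/rundeck-diag.txt
```

Saving the output to a file (the /tmp path is only an example) gives you something to attach to a bug report or compare against the next incident.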
The 8 Problems That Cost You Hours
These aren’t theoretical. Every one of them has burned real users, generated GitHub issues, and consumed debugging sessions that could have been avoided with the right knowledge.
H2 Database Corruption
Symptom: Rundeck fails to start. service.log shows:
org.h2.jdbc.JdbcSQLException: File corrupted while reading record
Cause: The embedded H2 database does not flush writes to disk synchronously. Any unclean shutdown — power loss, OOM kill, kill -9, even a systemctl stop during heavy writes — can corrupt the database file. This is the most common issue in Rundeck’s GitHub tracker (issues #6003, #7764, #3044, #3868).
Fix: If you followed this guide, you don’t have this problem because you’re using MariaDB. If you’re reading this because you didn’t follow this guide: there is no reliable recovery for a corrupted H2 database. Install MariaDB, configure it per Chapter 3, update rundeck-config.properties per Chapter 4, and start fresh. Your job definitions are gone unless you exported them.
Prevention: Use MariaDB from day one. There is no scenario where H2 is the right choice for persistent use.
Java Heap Exhaustion
Symptom: Rundeck becomes unresponsive. The UI loads slowly or not at all. Eventually the process is killed. You find this in the logs:
java.lang.OutOfMemoryError: Java heap space
Or in dmesg:
Out of memory: Killed process <pid> (java) total-vm:XXXXXXX
Cause: The JVM heap is too small. Rundeck loads job definitions, execution metadata, and plugin state into memory. Even a modest home lab with a few dozen jobs and a couple of weeks of execution history can exceed the default heap.
Fix: Increase -Xmx in /etc/sysconfig/rundeckd:
# Edit the sysconfig file
sudo vi /etc/sysconfig/rundeckd
# Set the heap (minimum 2048m, recommended 4096m for 8GB VM)
RDECK_JVM_OPTS="$RDECK_JVM_OPTS -Xmx2048m -Xms1024m"
# Restart Rundeck
sudo systemctl restart rundeckd
Prevention: Set the heap at deployment time (Chapter 4). Don’t wait for the OOM killer to tell you the default was too small.
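One way to pick a value is to derive it from the machine’s RAM. The sketch below assumes a rule of half of RAM with the 2048m floor, which reproduces the 4096m recommendation for an 8GB VM. The rule and the suggest_heap_mb helper are illustrative assumptions, not something Rundeck provides.

```shell
# suggest_heap_mb: print a suggested -Xmx value (in MB) for a given amount of RAM (in MB).
# Assumed rule: half of RAM, never below the 2048m minimum mentioned above.
suggest_heap_mb() {
    ram_mb=$1
    heap_mb=$(( ram_mb / 2 ))
    if [ "$heap_mb" -lt 2048 ]; then
        heap_mb=2048
    fi
    echo "$heap_mb"
}

# On Linux, MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    detected_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    echo "RAM: ${detected_mb} MB, suggested: -Xmx$(suggest_heap_mb "$detected_mb")m"
fi
```

The kernel reports slightly less than the nominal RAM size, so an 8GB VM will typically produce a suggestion a little under 4096. If the 2048 floor is more than the machine can spare, the VM is too small for Rundeck plus MariaDB.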
SELinux Blocks the Reverse Proxy
Symptom: Navigating to https://rundeck.example.com returns a 503 error. But curl http://localhost:4440 from the Rundeck host itself works fine. Apache is running. The proxy config looks correct.
Cause: SELinux’s httpd_can_network_connect boolean is off by default on Rocky Linux. This prevents Apache from making outbound TCP connections, which is exactly what a reverse proxy needs to do. The failure is easy to miss: Apache logs only a generic “proxy error”, and the denial itself is recorded in the audit log (written by auditd) rather than in any Apache or Rundeck log.
Fix:
# Check the current state
getsebool httpd_can_network_connect
# If "off", enable it persistently
sudo setsebool -P httpd_can_network_connect on
Prevention: Set this boolean during deployment (Chapter 4). The bundled playbook handles it automatically.
grails.serverURL Mismatch
Symptom: Any of these:
- Login redirects to http://localhost:4440 instead of your actual URL
- CSRF token validation errors after login
- The web UI loads but JavaScript assets fail (mixed content)
- API calls return redirects instead of data
Cause: grails.serverURL in /etc/rundeck/rundeck-config.properties doesn’t match the URL users access in their browser. Rundeck uses this setting to generate absolute URLs for redirects, CSRF tokens, and asset loading.
Fix:
sudo vi /etc/rundeck/rundeck-config.properties
Set it to match the browser URL exactly:
grails.serverURL = https://rundeck.example.com
Also verify:
server.useForwardHeaders = true
Restart Rundeck:
sudo systemctl restart rundeckd
The rules: Protocol must match (https://). Hostname must match. No trailing slash. No port number unless it’s non-standard. If in doubt, copy the URL from your browser’s address bar (minus the path) and paste it as the value.
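These rules are mechanical enough to check with a few lines of shell. The following is a rough sketch, not a complete URL validator: check_server_url is a hypothetical helper, and it only covers the rules listed above (https, a hostname, no trailing slash or path, no explicit :443).

```shell
# check_server_url: return 0 if the value follows the rules above, 1 otherwise.
check_server_url() {
    url=$1
    case $url in
        https://*) ;;                                         # protocol must be https://
        *) echo "must start with https://"; return 1 ;;
    esac
    rest=${url#https://}
    case $rest in
        "")    echo "missing hostname"; return 1 ;;
        */*)   echo "no trailing slash or path allowed"; return 1 ;;
        *:443) echo "drop the standard port :443"; return 1 ;;
    esac
    echo "ok"
}

# Read the configured value (the file is usually readable only by root or rundeck).
conf=/etc/rundeck/rundeck-config.properties
if [ -r "$conf" ]; then
    value=$(sed -n 's/^[[:space:]]*grails\.serverURL[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$conf" | tail -n 1)
    echo "grails.serverURL = $value"
    check_server_url "$value"
fi
```

A value such as https://rundeck.example.com passes, while http://localhost:4440 and https://rundeck.example.com/ are rejected. The check cannot tell whether the hostname matches what users type in the browser, so that comparison is still manual.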
Ansible Plugin Can’t Find Inventory
Symptom: The Nodes tab in Rundeck shows zero nodes. Ansible playbook steps fail with “no hosts matched” or “Could not match supplied host pattern.”
Cause: The Ansible Resource Model Source is pointing to the wrong inventory path, the rundeck user can’t read the file, or the inventory has a syntax error.
Fix:
# Test the inventory as the rundeck user
sudo -u rundeck ansible-inventory --list -i /var/lib/rundeck/inventory/hosts.yml
If this command fails:
- File not found: the path in Rundeck’s project settings is wrong. Use an absolute path.
- Permission denied: fix ownership with sudo chown rundeck:rundeck /var/lib/rundeck/inventory/hosts.yml
- Syntax error: the inventory file has a YAML parsing error. Fix the YAML.
If the command succeeds but Rundeck still shows zero nodes, refresh the resource model cache: open the Nodes tab and click the refresh icon, or restart Rundeck.
SSH Key Permissions
Symptom: Rundeck jobs fail with:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic)
Cause: SSH is strict about key file permissions. The private key must be owned by the rundeck user and have 0600 permissions. The .ssh directory must be 0700. Alternatively, the public key was never deployed to the target node.
Fix:
# Fix permissions on the Rundeck host
sudo chown rundeck:rundeck /var/lib/rundeck/.ssh/id_ed25519
sudo chmod 0600 /var/lib/rundeck/.ssh/id_ed25519
sudo chmod 0700 /var/lib/rundeck/.ssh/
# Verify the connection
sudo -u rundeck ssh 192.168.1.51 hostname
If the permissions are correct but it still fails, check that the public key is in ~rundeck/.ssh/authorized_keys on the target node and that the target’s sshd_config allows key-based auth (PubkeyAuthentication yes).
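Before changing anything, it can help to see which of the three requirements is actually violated. This sketch uses GNU stat (as shipped on Rocky Linux) to compare mode and owner against the expected values. check_perm is a hypothetical helper name, and the paths are the ones used in this chapter.

```shell
# check_perm: compare a path's octal mode and owner against expected values.
# stat prints modes without a leading zero, so pass 600 and 700, not 0600 and 0700.
check_perm() {
    path=$1; want_mode=$2; want_owner=$3
    have=$(stat -c '%a %U' "$path" 2>/dev/null) || { echo "MISSING $path"; return 1; }
    if [ "$have" = "$want_mode $want_owner" ]; then
        echo "OK      $path ($have)"
    else
        echo "WRONG   $path (is '$have', want '$want_mode $want_owner')"
        return 1
    fi
}

# Run as root so the directory is visible regardless of its current mode.
ssh_dir=/var/lib/rundeck/.ssh
if [ -d "$ssh_dir" ]; then
    check_perm "$ssh_dir" 700 rundeck
    check_perm "$ssh_dir/id_ed25519" 600 rundeck
fi
```

If both lines report OK and the job still fails, the problem is on the target node rather than on the Rundeck host.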
Execution Log Disk Exhaustion
Symptom: Jobs start failing with I/O errors. Or the system slows down. df -h shows /var/lib/rundeck/logs/ or the partition it sits on is nearly full.
Cause: Every Rundeck job execution stores its full output log on disk. Without cleanup, this grows indefinitely. A job that runs hourly and produces a few KB of output doesn’t seem like much — until you realize that’s 8,760 log files per year per job.
Fix: Configure execution history cleanup in rundeck-config.properties:
# Clean up execution logs older than 30 days
rundeck.execution.logs.fileStorage.deletionPolicy = delayed
rundeck.execution.logs.fileStorage.retentionTime = 30d
Or clean up manually:
# See how much space logs are using
du -sh /var/lib/rundeck/logs/
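# Remove execution logs older than 30 days (Rundeck stores them as .rdlog files).
# Swap -delete for -print first to preview what would be removed.
sudo find /var/lib/rundeck/logs/ -type f -name "*.rdlog" -mtime +30 -delete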
Prevention: Set the retention policy at deployment time. The bundled playbook includes this as a configurable variable (rundeck_log_retention_days).
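Whichever cleanup route you choose, it is worth knowing how much a given retention window would free before deleting anything. The sketch below only reads. The helper names old_logs and summarize_old_logs are hypothetical, and it counts every file under the directory regardless of extension.

```shell
# old_logs: list files under a directory last modified more than N days ago.
old_logs() {
    dir=$1; days=$2
    find "$dir" -type f -mtime +"$days" -print
}

# summarize_old_logs: count those files and total their size in KB.
summarize_old_logs() {
    dir=$1; days=$2
    find "$dir" -type f -mtime +"$days" -exec du -k {} + |
        awk '{ n++; kb += $1 } END { printf "count=%d size_kb=%d\n", n, kb }'
}

# Example (run as root or as the rundeck user):
#   summarize_old_logs /var/lib/rundeck/logs 30
```

Comparing the result for 30, 60, and 90 days gives a quick sense of which retention value keeps the partition comfortable.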
MariaDB Connector Version Mismatch
Symptom: After upgrading Rundeck or MariaDB, Rundeck fails to start with JDBC driver errors:
java.sql.SQLException: No suitable driver found
Or:
ClassNotFoundException: org.mariadb.jdbc.Driver
Cause: The MariaDB JDBC connector JAR in /var/lib/rundeck/lib/ is incompatible with the installed MariaDB version, or the JAR was deleted/corrupted during an upgrade.
Fix:
# Check what's in the lib directory
ls -la /var/lib/rundeck/lib/
# If the JAR is missing or the wrong version, re-download it.
# -f makes curl fail on an HTTP error instead of saving the error page as the JAR.
sudo curl -fL -o /var/lib/rundeck/lib/mariadb-java-client-3.3.2.jar \
https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/3.3.2/mariadb-java-client-3.3.2.jar
sudo chown rundeck:rundeck /var/lib/rundeck/lib/mariadb-java-client-3.3.2.jar
sudo systemctl restart rundeckd
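The directory listing can be turned into a pass/fail check, which is handy after an upgrade. This is a sketch under two assumptions that go beyond the text above: that an empty file indicates a failed download, and that more than one connector JAR in the directory is worth flagging. find_connector is a hypothetical helper name.

```shell
# find_connector: print MariaDB connector JARs found in a directory.
# Returns 0 for exactly one non-empty JAR, 1 for none, 2 for several.
find_connector() {
    libdir=$1
    count=0
    for jar in "$libdir"/mariadb-java-client-*.jar; do
        [ -s "$jar" ] || continue      # skips the unmatched glob and empty files
        echo "$jar"
        count=$((count + 1))
    done
    case $count in
        0) echo "no usable connector JAR in $libdir" >&2; return 1 ;;
        1) return 0 ;;
        *) echo "multiple connector JARs found; keep only one" >&2; return 2 ;;
    esac
}

# Example:
#   find_connector /var/lib/rundeck/lib
```

A non-empty file can still be a truncated download, so if Rundeck keeps reporting ClassNotFoundException after this check passes, re-download the JAR anyway.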
The “It Worked Yesterday” Checklist
When Rundeck was fine yesterday and isn’t today, run through this list. The answer is almost always one of these:
# 1. Is the service actually running?
sudo systemctl status rundeckd
# 2. Did it run out of disk space?
df -h /var/lib/rundeck/ /var/log/rundeck/ /tmp/
# 3. Did it run out of memory?
free -h
sudo dmesg | grep -i "out of memory" | tail -5
# 4. Did a certificate expire?
openssl x509 -in /etc/pki/tls/certs/rundeck.example.com.crt -noout -dates
# 5. Did DNS break?
host rundeck.example.com
# 6. Did someone change SELinux?
getenforce
getsebool httpd_can_network_connect
# 7. Did a package update change something?
sudo dnf history info last
# 8. Did MariaDB stop?
sudo systemctl status mariadb
Nine times out of ten, it’s disk space, an expired certificate, or a package update that restarted a service with a changed config file. The diagnostic sequence above takes under a minute and eliminates the most common causes.
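For step 4, the raw dates are easier to act on as a number of days remaining. The sketch below assumes GNU date, which can parse the date format that openssl prints. days_left is a hypothetical helper, and the certificate path is the one used in this chapter.

```shell
# days_left: whole days between now and a date string such as the one printed
# by `openssl x509 -noout -enddate` (without the "notAfter=" prefix).
# The result is negative once the date is in the past.
days_left() {
    end_epoch=$(date -d "$1" +%s) || return 2
    now_epoch=$(date +%s)
    echo $(( (end_epoch - now_epoch) / 86400 ))
}

cert=/etc/pki/tls/certs/rundeck.example.com.crt
if [ -r "$cert" ] && command -v openssl >/dev/null 2>&1; then
    not_after=$(openssl x509 -in "$cert" -noout -enddate | cut -d= -f2)
    echo "Certificate expires in $(days_left "$not_after") days ($not_after)"
fi
```

Running this from a daily cron job or a Rundeck job of its own turns an expired certificate from a surprise into a warning.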
When to Check the SELinux Audit Log
SELinux denials don’t always produce obvious error messages in application logs. When something “should work” but doesn’t, and the application logs are unhelpful, check the audit log:
# Recent AVC denials
sudo ausearch -m avc -ts recent
# If ausearch is noisy, filter for httpd or java
sudo ausearch -m avc -ts recent | grep -E "httpd|java"
If you find a denial, audit2why explains why:
sudo ausearch -m avc -ts recent | audit2why
This will tell you which boolean to set or which policy is blocking the action. For Rundeck deployments, the denial is almost always httpd_can_network_connect, but occasionally you’ll see denials related to file access (especially if you put config files in non-standard locations).
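When the audit log is busy, a count of denials per process and permission shows at a glance whether httpd is the one being blocked. This is a small sketch that reads ausearch output on standard input. summarize_avc is a hypothetical helper, and the pattern assumes the usual AVC record layout with a denied { ... } clause followed by a comm="..." field.

```shell
# summarize_avc: count AVC denials by process name and denied permission.
# Output lines look like "<count> <comm> <permission>", most frequent first.
summarize_avc() {
    grep 'avc:  *denied' |
        sed -n 's/.*denied  *{ \([^}]*\) }.* comm="\([^"]*\)".*/\2 \1/p' |
        sort | uniq -c | sort -rn
}

# Example:
#   sudo ausearch -m avc -ts recent 2>/dev/null | summarize_avc
```

A line for httpd with name_connect points at the reverse proxy boolean, while denials for other processes or permissions are the ones to pass to audit2why.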
Common Error Messages Decoded
| Error Message | Likely Cause | Where to Fix |
|---|---|---|
| Grails application running at http://localhost:4440 | Normal startup message, not an error | N/A |
| Unable to resolve host | DNS failure for the Rundeck hostname | Check /etc/hosts or DNS |
| Connection refused (port 3306) | MariaDB is not running | sudo systemctl start mariadb |
| Access denied for user 'rundeck'@'localhost' | Wrong MariaDB password in config | rundeck-config.properties |
| No suitable driver found | JDBC JAR missing or wrong version | /var/lib/rundeck/lib/ |
| CSRF token verification failed | grails.serverURL mismatch | rundeck-config.properties |
| Permission denied (publickey) | SSH key permissions or missing public key | See Problem #6 (SSH Key Permissions) above |
| java.lang.OutOfMemoryError | JVM heap too small | /etc/sysconfig/rundeckd |
| AH01114: HTTP: failed to make connection | SELinux blocking proxy, or Rundeck not running | setsebool or start Rundeck |
| Keystore was tampered with, or password was incorrect | Wrong keystore password (direct SSL mode) | ssl.properties |
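The table can also be used as a lookup against a log file. The sketch below searches for a fixed substring of each message and prints the likely cause from the table. decode_errors is a hypothetical helper, the patterns are shortened so they match regardless of user name or port, and a match is a hint rather than a diagnosis.

```shell
# decode_errors: scan a log file for known messages and print the likely cause of each.
decode_errors() {
    logfile=$1
    while IFS='|' read -r pattern cause; do
        if grep -qF -- "$pattern" "$logfile"; then
            printf '%s: %s\n' "$pattern" "$cause"
        fi
    done <<'EOF'
Unable to resolve host|DNS failure, check /etc/hosts or DNS
Connection refused|MariaDB may not be running, try sudo systemctl start mariadb
Access denied for user|wrong MariaDB password in rundeck-config.properties
No suitable driver found|JDBC JAR missing or wrong version in /var/lib/rundeck/lib/
CSRF token verification failed|grails.serverURL mismatch in rundeck-config.properties
Permission denied (publickey|SSH key permissions or missing public key
java.lang.OutOfMemoryError|JVM heap too small, raise -Xmx in /etc/sysconfig/rundeckd
AH01114|SELinux blocking the proxy, or Rundeck not running
Keystore was tampered with|wrong keystore password in ssl.properties
EOF
}

# Examples:
#   sudo sh -c '. ./decode.sh; decode_errors /var/log/rundeck/service.log'
#   sudo sh -c '. ./decode.sh; decode_errors /var/log/httpd/error_log'
```

The file name decode.sh in the examples is a placeholder for wherever you save the function.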