Deploying the ELK Stack the Right Way

Chapter 6

Filebeat

In this chapter
  • Why Filebeat
  • Installation
      • Repository Setup
      • The Ansible Approach
  • Configuration
      • Main Configuration
      • System Module
      • The 9.x Fileset Trap
  • Firewall
  • Start the Service
  • Verification
  • What Automation Looks Like
  • Verification Checkpoint

What you’ll accomplish: Install Filebeat on your hosts, configure it to ship system logs (syslog and auth) to Logstash, and verify logs are flowing through the pipeline into Elasticsearch.

Why Filebeat

In Chapter 5, we set up Logstash with a Beats input on port 5044 — but nothing is sending data to it yet. The syslog input works for network devices and legacy systems, but for Linux hosts in your lab, Filebeat is the better option.

Filebeat is a lightweight log shipper written in Go. It tails log files and journals on your hosts and sends them to Logstash (or directly to Elasticsearch). Compared to configuring rsyslog forwarding, Filebeat gives you:

  • Structured metadata — hostname, file path, and service name are attached to every event automatically. When you search in Kibana, you can filter by agent.hostname instead of parsing syslog headers.
  • Backpressure handling — if Logstash is slow or unreachable, Filebeat buffers events locally and retries. Rsyslog can do this too, but Filebeat’s implementation is simpler to configure and more predictable.
  • Module system — pre-built parsers for common log formats (system logs, Apache, Nginx, MySQL, etc.). Enable a module, and Filebeat knows how to parse that log format without writing custom grok patterns in Logstash.

We deploy Filebeat to any host whose logs you want centralized. In this guide, we use the system module, which covers the two most useful log sources out of the box: syslog (general system events) and auth (SSH logins, sudo usage, failed authentication attempts).

Filebeat is a Go binary — no JVM, no significant memory footprint. It uses about 30-50 MB of RAM. You can deploy it to every host in your lab without worrying about resource impact.

Installation

Repository Setup

Filebeat gets its own repository file, separate from the one used for Elasticsearch, Kibana, and Logstash. The same GPG key covers every Elastic product, but the repository file is different:

# GPG key is already imported from the ES install — skip if deploying to a new host
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo tee /etc/yum.repos.d/elastic-beats.repo > /dev/null << 'EOF'
[elastic-beats]
name=Elastic repository for Beats packages
baseurl=https://artifacts.elastic.co/packages/9.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

Notice enabled=1 — unlike the Elasticsearch repo (which uses enabled=0 to prevent accidental cluster-wide upgrades), the Beats repo stays enabled. Filebeat minor version updates are safe to apply without coordination. It’s a single binary on each host, not a clustered service, so there’s no version mismatch risk across nodes.

Install Filebeat:

sudo dnf install filebeat -y --enablerepo=elastic-beats

The Ansible Approach

The playbook uses the same pattern as the other roles — ansible.builtin.rpm_key for the GPG key, ansible.builtin.copy for the repo file, and ansible.builtin.dnf with state: present for idempotent installation.
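A minimal sketch of those three tasks, with illustrative task names (the repo file contents match the manual steps above):

```yaml
# Sketch of the svc_filebeat install tasks (names are illustrative)
- name: Import the Elastic GPG key
  ansible.builtin.rpm_key:
    key: https://artifacts.elastic.co/GPG-KEY-elasticsearch
    state: present

- name: Deploy the Beats repository file
  ansible.builtin.copy:
    src: elastic-beats.repo
    dest: /etc/yum.repos.d/elastic-beats.repo
    mode: "0644"

- name: Install Filebeat
  ansible.builtin.dnf:
    name: filebeat
    state: present
```

All three modules are idempotent, so rerunning the play reports ok once the host converges.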

Configuration

Filebeat needs two configuration files: the main filebeat.yml (where to send logs, logging settings, metrics endpoint) and a module config that tells it which log sources to collect.

Main Configuration

Replace the stock filebeat.yml with a clean config that ships logs to your Logstash instance:

sudo tee /etc/filebeat/filebeat.yml > /dev/null << 'EOF'
# Filebeat configuration — ships logs to Logstash

# ---------------------------------- Modules ----------------------------------
# Modules are configured in /etc/filebeat/modules.d/
# The system module is enabled by default (syslog + auth logs)
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

# ---------------------------------- Output -----------------------------------
output.logstash:
  hosts: ["192.168.1.62:5044"]

# ---------------------------------- Logging ----------------------------------
logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7

# ---------------------------------- Monitoring -------------------------------
# Metrics endpoint for health checks
http.enabled: true
http.host: 127.0.0.1
http.port: 5066
EOF

Lock down the permissions — Elastic recommends restrictive access on filebeat.yml:

sudo chmod 600 /etc/filebeat/filebeat.yml

Now open the file and replace 192.168.1.62:5044 with your Logstash host’s IP and the Beats input port:

sudo vi /etc/filebeat/filebeat.yml

Find the output.logstash section and update the hosts list. If you have multiple Logstash instances, list them all:

output.logstash:
  hosts: ["192.168.1.62:5044", "192.168.1.63:5044"]

With multiple hosts listed, Filebeat picks one at random and fails over to another if it becomes unreachable. It only spreads events across all listed hosts when load balancing is explicitly enabled.
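Load balancing is opt-in: by default Filebeat sends to one host at a time. A sketch of the output with it enabled (the IPs are the example addresses from above):

```yaml
output.logstash:
  hosts: ["192.168.1.62:5044", "192.168.1.63:5044"]
  # Off by default: without this, Filebeat sticks to one host at a time
  loadbalance: true
```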

Let’s walk through each section:

Modules — Filebeat loads module configs from /etc/filebeat/modules.d/*.yml. We disable reload.enabled because we deploy module configs via Ansible (or manually) and restart the service when they change. Hot-reloading is useful in dynamic environments, but for a home lab it adds unnecessary complexity.

Output — We ship to Logstash on port 5044 (the Beats input configured in Chapter 5), not directly to Elasticsearch. This keeps all log processing in one place — Logstash handles parsing, enrichment, and index routing. If you wanted to skip Logstash entirely, you could use output.elasticsearch instead, but you’d lose the centralized pipeline.

Logging — Filebeat writes its own logs under /var/log/filebeat/. The keepfiles: 7 setting keeps the seven most recent rotated files; rotation happens by file size, not age. If Filebeat can’t reach Logstash, the errors show up here — this is the first place to look when troubleshooting.

Monitoring — The HTTP endpoint on 127.0.0.1:5066 provides stats about Filebeat’s operation (events sent, errors, registry state). Binding to loopback means it’s only accessible from the host itself — no firewall rule needed. The verification playbook and smoke tests check this endpoint to confirm Filebeat is healthy.

System Module

Enable the system module:

sudo filebeat modules enable system

This creates /etc/filebeat/modules.d/system.yml from a template — but the default config won’t actually ship any logs. You need to fix it immediately.

The 9.x Fileset Trap

Warning: This is the single most common reason Filebeat appears to work but ships zero logs. If you enable a module and see no data in Kibana, check the fileset config first.

Filebeat 9.x has a subtle gotcha: filebeat modules enable system marks the module as enabled, but both filesets inside it (syslog and auth) default to enabled: false. Filebeat starts, loads the module, sees no enabled filesets, and ships zero logs — no errors, no warnings, just silence.

Fix it by flipping both enabled: false lines to true. The rest of the file (custom paths, journald options) is useful to keep for later:

sudo sed -i 's/enabled: false/enabled: true/' /etc/filebeat/modules.d/system.yml
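After the sed, the two filesets in system.yml should look roughly like this (custom path and journald options from the template omitted):

```yaml
# /etc/filebeat/modules.d/system.yml (simplified)
- module: system
  syslog:
    enabled: true
  auth:
    enabled: true
```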

This enables two log sources:

  • syslog — general system messages from /var/log/messages (Rocky Linux / RHEL). Covers service starts/stops, kernel messages, cron output, and anything that logs via the syslog facility.
  • auth — authentication events from /var/log/secure. SSH login attempts (successful and failed), sudo usage, user account changes. This is the log you check when someone asks “who logged into this box?”

After deploying this config, restart Filebeat:

sudo systemctl restart filebeat

The companion playbook works around this automatically — it deploys the correct system.yml immediately after enabling the module, so you never see the broken default.

Firewall

No inbound firewall rules are needed on the Filebeat host. Filebeat makes outbound TCP connections to Logstash on port 5044 — it doesn’t listen for incoming connections. The stats API on port 5066 binds to loopback only (127.0.0.1), so it’s not reachable from the network.

Filebeat does need outbound TCP access to port 5044 on the Logstash host. Outbound connections aren’t blocked by firewalld’s default policy, but the Logstash host’s firewall must have port 5044 open — if you followed the firewall rules in Chapter 5, this is already handled.

Start the Service

sudo systemctl enable --now filebeat

Filebeat starts in under 2 seconds — it’s a single Go binary, not a JVM application. No 30-60 second startup wait like Elasticsearch or Logstash.

Verification

After Filebeat starts, verify it’s running and shipping logs:

1. Check the stats API:

curl -s http://127.0.0.1:5066/stats | python3 -m json.tool | head -20

Expected: a JSON response with Filebeat’s runtime stats. If the endpoint doesn’t respond, Filebeat isn’t running or the HTTP config is wrong.

2. Check Logstash connectivity:

# Replace with your Logstash host IP
curl -v telnet://192.168.1.62:5044 2>&1 | head -5

Expected: you’ll see Connected to 192.168.1.62 followed by Connection reset by peer — that’s normal. The connection succeeded (proving the port is open and Logstash is listening), but curl doesn’t speak the Beats protocol so Logstash drops it. What matters is the Connected line. If you see Connection refused instead, Logstash isn’t running or port 5044 isn’t open in its firewall.

3. Check Filebeat’s own logs:

sudo journalctl -u filebeat -n 30 --no-pager

Look for Connection to backoff(async(tcp://192.168.1.62:5044)) established — that means Filebeat connected to Logstash successfully. If you see connection refused or timeout, the Logstash Beats input isn’t reachable.

Filebeat also writes logs to /var/log/filebeat/ if the directory exists — check there with ls /var/log/filebeat/ if journalctl doesn’t show enough detail.

4. Verify logs arrive in Elasticsearch:

Wait 30 seconds for Filebeat to harvest and ship some events, then check Kibana. Navigate to Discover, select the app-logs-* index pattern, and filter by agent.hostname matching your Filebeat host’s name. You should see syslog and auth events arriving.

From the command line:

curl -u elastic:YOUR_PASSWORD -s "http://192.168.1.61:9200/app-logs-*/_search?q=agent.hostname:fb01&pretty" | head -20

Replace fb01 with your Filebeat host’s actual hostname.

5. Test the config and output:

Filebeat has built-in test commands that verify your configuration without restarting the service:

# Validate filebeat.yml syntax
sudo filebeat test config

# Test connectivity to the configured output (Logstash)
sudo filebeat test output

test config catches YAML syntax errors. test output actually connects to Logstash and reports success or failure — useful when debugging connectivity issues.

What Automation Looks Like

The svc_filebeat role:

  1. Imports the Elastic GPG key — same key as Elasticsearch, idempotent
  2. Copies the Beats repository file to /etc/yum.repos.d/elastic-beats.repo
  3. Installs Filebeat via dnf with state: present
  4. Deploys filebeat.yml from a Jinja2 template — output hosts built from elk_logstash_hosts variable, port from elk_filebeat_logstash_port
  5. Enables the system module via filebeat modules enable system — only reports changed on first run
  6. Deploys system.yml module config with syslog and auth filesets explicitly enabled (works around the 9.x fileset trap)
  7. Starts and enables the Filebeat service

The pro_filebeat role then:

  1. Updates the MOTD with Filebeat service information and quick-reference commands (start/stop/status, test config, test output)

Configuration changes to filebeat.yml or system.yml trigger a handler that restarts Filebeat. On subsequent runs, every task reports ok — the role is fully idempotent.
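The handler wiring for that restart can be sketched as follows (file and handler names are illustrative):

```yaml
# tasks: deploy the config and notify the handler only on change
- name: Deploy filebeat.yml
  ansible.builtin.template:
    src: filebeat.yml.j2
    dest: /etc/filebeat/filebeat.yml
    mode: "0600"
  notify: Restart filebeat

# handlers/main.yml
- name: Restart filebeat
  ansible.builtin.systemd:
    name: filebeat
    state: restarted
```

Because handlers run only when notified, an unchanged config means no restart, which is what keeps repeat runs quiet.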

Verification Checkpoint

Before moving to Chapter 7, confirm:

  • curl -s http://127.0.0.1:5066/stats returns Filebeat stats JSON
  • systemctl status filebeat shows active
  • sudo filebeat test output reports success
  • Filebeat can reach Logstash — sudo journalctl -u filebeat -n 30 shows no connection errors
  • Logs appear in Elasticsearch — search app-logs-* indices in Kibana for events from your Filebeat host

Your log shipper is running. Now let’s set up retention policies so your data doesn’t eat your disk alive.

Want the automation code? Get the production-ready Ansible playbooks that deploy this entire ELK stack in ~20 minutes.

Get Playbooks — $29