How to Set Up Monitoring and Alerting for NGINX with Prometheus and Grafana
Introduction
In this post, we're diving into the world of NGINX monitoring and alerting using two popular tools: Prometheus and Grafana. This free, open-source combo lets system administrators collect NGINX performance and security metrics in real time.
A proper monitoring and alerting setup greatly reduces the stress of debugging, keeps your NGINX server running smoothly, and makes day-to-day management a breeze.
Let's explore the steps to set up a monitoring system for NGINX with Prometheus and Grafana so you can avoid downtime, identify bottlenecks, and resolve potential issues before they appear.
Understanding the NGINX Monitoring Stack
The NGINX monitoring stack is made up of three components: NGINX, Prometheus, and Grafana. Each offers unique capabilities on its own, but together they form an excellent server monitoring system.
In this stack:
- NGINX is configured to expose metrics about its performance and usage.
- Prometheus collects these metrics from NGINX regularly, storing them in its time-series database.
- Grafana connects to Prometheus as a data source and creates dashboards to visualize the collected NGINX metrics.
This setup lets you follow NGINX performance in real time: you track key metrics like request rates, response times, and error rates, and set up alerts for potential issues. Moreover, you can build custom Grafana dashboards for different aspects of your NGINX deployment.
In the next sections, we'll walk through the steps to monitor the performance and security of NGINX servers using Prometheus, Grafana, and Alertmanager on Ubuntu 22.04.
Prerequisites
To set up Prometheus and Grafana on Ubuntu 22.04, make sure you meet the following requirements:
- SSH access to the server with sudo privileges.
- NGINX is already installed on the system.
Installing and Configuring Prometheus for NGINX
Let's start by installing Prometheus, the time-series database at the heart of the stack, with a few commands:
$ wget https://github.com/prometheus/prometheus/releases/download/v2.54.1/prometheus-2.54.1.linux-amd64.tar.gz
$ tar xvfz prometheus-*.tar.gz
$ mv prometheus-2.54.1.linux-amd64 prometheus
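Before moving on, you can sanity-check the download by printing the version (the binary ships with a --version flag):
$ prometheus/prometheus --version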
Now, configure Prometheus to scrape NGINX server metrics. For that, we need to edit the default Prometheus configuration file. Open it in your favorite text editor and add this:
$ cd prometheus
$ vi prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Scrape configurations for Prometheus itself and the NGINX exporter.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "nginx"
    static_configs:
      - targets: ["localhost:9113"]
This tells Prometheus to scrape metrics every 15 seconds from something running on localhost:9113. But wait, what's supposed to be running there? Enter the Prometheus NGINX Exporter!
The NGINX Exporter is like a translator between NGINX and Prometheus. It takes NGINX server metrics and turns them into something Prometheus can understand. Let's get it set up:
$ cd prometheus
$ wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v1.3.0/nginx-prometheus-exporter_1.3.0_linux_amd64.tar.gz
$ tar xvfz nginx-prometheus-exporter_*
Run the NGINX exporter, pointing its scrape URI at the NGINX status page (we'll configure the status page itself in the next section):
$ ./nginx-prometheus-exporter -nginx.scrape-uri http://localhost/nginx_status
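Once the status page from the next section is in place, you can confirm the exporter is working by fetching its metrics endpoint (it listens on port 9113 by default):
$ curl -s http://localhost:9113/metrics | grep nginx_up
A line reading nginx_up 1 means the exporter can reach NGINX.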
Set up NGINX for monitoring
Now, configure NGINX for monitoring via its status page. All you need to do is make sure NGINX exposes the stub_status page. Add this to your NGINX configuration:
server {
    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}
Remember to test the configuration and restart NGINX after making these changes!
$ sudo nginx -t
$ sudo systemctl restart nginx
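It's worth checking the status page directly before wiring up the exporter. Output along these lines means stub_status is working (the exact numbers will differ on your server):
$ curl http://127.0.0.1/nginx_status
Active connections: 1
server accepts handled requests
 3 3 3
Reading: 0 Writing: 1 Waiting: 0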
Test Prometheus and NGINX integration
Prometheus is configured and the NGINX status page is enabled. Let's start Prometheus:
$ ./prometheus --config.file=prometheus.yml
If everything's set up correctly, you should be able to access the Prometheus web interface at http://localhost:9090.
The Prometheus dashboard confirms that Prometheus is configured to monitor NGINX! But wait, how do we know it's working? Let's do a quick sanity check. In the Prometheus web interface, try this PromQL query: nginx_up
If you see a value of 1, congratulations! You're officially collecting NGINX server metrics with Prometheus.
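You can run the same sanity check from the shell using Prometheus's HTTP query API, which is handy for scripting:
$ curl -s 'http://localhost:9090/api/v1/query?query=nginx_up'
The JSON response should contain a result whose value is "1".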
The Prometheus UI isn't particularly user-friendly or interactive, though; that's where Grafana comes in. Still, take a moment to explore it by running some queries and getting a feel for the kind of data you're collecting.
The foundation of your monitoring system is now in place. Next up: visualizing your NGINX server metrics with Grafana!
Setting Up Grafana for NGINX Visualization
We've got Prometheus collecting NGINX server metrics, but instead of relying on its interface, we'll set up Grafana for visualization. Let's install Grafana using apt:
$ sudo apt install -y apt-transport-https software-properties-common wget
$ sudo mkdir -p /etc/apt/keyrings/
$ wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
$ echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
$ sudo apt update
$ sudo apt install grafana
Once that's done, start the Grafana server:
$ sudo systemctl start grafana-server
$ sudo systemctl enable grafana-server
$ sudo systemctl status grafana-server
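If you'd rather verify from the command line first, Grafana exposes a simple health endpoint:
$ curl -s http://localhost:3000/api/health
A response containing "database": "ok" means Grafana is up.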
The Grafana setup is complete. Now, open up your web browser and head to http://localhost:3000. You should see the Grafana login page. The default username and password are both 'admin'.
Okay, we're in! But Grafana is looking empty; the default view stays blank until you connect a data source. Let's fix that by hooking it up to Prometheus. Click "Data Sources" on the Grafana welcome page and choose Prometheus under "Add data source".
In the settings, set the URL to http://localhost:9090 (assuming Prometheus is running on the same machine) and click "Save & Test" at the bottom. If you see a green "Data source is working" message, the data source is connected.
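If you prefer configuration as code, the same data source can be provisioned from a file instead of the UI. A minimal sketch (the file name is arbitrary; Grafana reads everything in its provisioning directory, and grafana-server needs a restart afterwards):
$ sudo vi /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true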
To create your first NGINX dashboard, go to Home -> Dashboards, click "Create dashboard", then "Add visualization", choose Prometheus as the data source, and save the visualization.
Edit the panel and in the query editor, try this:
rate(nginx_http_requests_total[5m])
This will show you the rate of HTTP requests to your NGINX server over the last 5 minutes. Add a few more panels to the dashboard to get started. Here are some metrics you might want to monitor, including status-code metrics:
nginx_connections_active
promhttp_metric_handler_requests_total{code="200", instance="localhost:9113", job="nginx"}
promhttp_metric_handler_requests_total{code="500", instance="localhost:9113", job="nginx"}
promhttp_metric_handler_requests_total{code="503", instance="localhost:9113", job="nginx"}
One caveat: the promhttp_metric_handler_requests_total series count responses from the exporter's own /metrics endpoint, broken down by the code label, so treat them as exporter health metrics; the stub_status module does not expose per-status-code counts for your site's traffic. Create a new panel in Grafana and add any of the above expressions in the query section to visualize them. You can browse all available metrics at: http://localhost:9113/metrics
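For panels built on actual traffic counters, here are a couple of PromQL expressions worth trying; both nginx_connections_accepted and nginx_connections_handled are standard stub_status exporter metrics:
rate(nginx_connections_accepted[5m])
rate(nginx_connections_accepted[5m]) - rate(nginx_connections_handled[5m])
The first charts new connections accepted per second; the second approximates connections dropped per second (accepted minus handled).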
As you get more comfortable with Grafana, you can add more complex queries, create template variables for dynamic dashboards, and even integrate with other data sources.
You can also import the official Grafana dashboard for the NGINX exporter, which builds an NGINX monitoring dashboard from the metrics the exporter exposes. It includes options to filter metrics per instance or view metrics from all instances.
To import this dashboard, click Home -> Dashboards and then click Import under the "New" option. Copy the dashboard JSON, paste it into the "Import via dashboard JSON model" field, and don't forget to choose Prometheus as the data source. That completes the official NGINX monitoring dashboard; you can now explore your NGINX metrics in Grafana.
You can also create custom NGINX dashboards from any of the metrics available in Prometheus. Next, let's set up alert notifications with Alertmanager.
Implementing Alerting for NGINX with Prometheus Alertmanager
It's time to level up our NGINX monitoring. A pretty dashboard is great, but wouldn't it be even better if the monitoring system could tell us when something's going wrong? That's where Prometheus Alertmanager comes in!
Let's get Alertmanager installed. It's as easy as downloading the binary and extracting it:
$ cd prometheus
$ wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
$ tar xvfz alertmanager-*.tar.gz
$ mv alertmanager-0.23.0.linux-amd64/ alertmanager
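As with Prometheus, a quick version check confirms the binary extracted cleanly:
$ alertmanager/alertmanager --version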
Now, we need to create an Alertmanager configuration file. Let's call it alertmanager.yml:
$ cd alertmanager
$ vi alertmanager.yml
global:
  resolve_timeout: 1m

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'to-recipient@gmail.com'
        from: 'from-admin@gmail.com'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'from-admin@gmail.com'
        auth_identity: 'from-admin@gmail.com'
        auth_password: '****************'
This config tells Alertmanager to send emails when alerts fire. Remember to replace the email addresses and SMTP settings with your own (for Gmail, you'll need an app password rather than your account password)!
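The release tarball also bundles amtool, which can validate the configuration before you start the service:
$ ./amtool check-config alertmanager.yml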
Now, let's start Alertmanager:
$ cd alertmanager
$ ./alertmanager --config.file=alertmanager.yml
Great! Alertmanager is up and running, but it doesn't know what alerts to send yet. That's where Prometheus rules come in. Let's create a rule file called nginx_alerts.yml:
$ cd prometheus
$ vi nginx_alerts.yml
groups:
  - name: nginx_alerts
    rules:
      - alert: InstanceDown
        expr: nginx_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
This rule tells Prometheus to fire an alert if NGINX stays down for more than one minute.
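Prometheus ships with promtool, which can validate the rule file before you load it (run it from the prometheus directory):
$ ./promtool check rules nginx_alerts.yml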
Now, we need to tell Prometheus about these rules and about Alertmanager. Update your prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "nginx_alerts.yml"

# Scrape configurations for Prometheus itself and the NGINX exporter.
scrape_configs:
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "nginx"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9113"]
Restart Prometheus to complete the Alertmanager setup.
To test the Alertmanager setup, stop NGINX, wait a minute, and verify that the InstanceDown alert fires in the Alertmanager UI or on the Prometheus Alerts page. You'll also get an email about it.
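Here's that test end to end from the shell, assuming NGINX is managed by systemd; Alertmanager's v2 API should list the alert after about a minute:
$ sudo systemctl stop nginx
$ curl -s http://localhost:9093/api/v2/alerts
$ sudo systemctl start nginx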
Best Practices for NGINX Monitoring and Alerting
First up: selecting the right metrics for visualization. It's tempting to track everything, but that's a recipe for information overload. Focus on metrics that directly impact your users and business.
Some key metrics for NGINX include:
- Request rate.
- Error rate (4xx and 5xx).
- Response time.
- Active connections.
- Traffic (bytes sent/received).
Remember, these might vary depending on your specific use case. A high-traffic e-commerce site might care more about response time, while a content delivery network might focus on traffic metrics.
When it comes to setting alert thresholds, the golden rule is: alert on symptoms, not causes. For example, don't alert on high CPU usage (a cause), alert on slow response times (a symptom). Why? Because high CPU might not always mean trouble, but slow responses always impact your users.
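To make this concrete, here's a hypothetical symptom-style rule in the same format as nginx_alerts.yml; since stub_status doesn't expose response times, it uses active connections as a rough load proxy, and the threshold of 100 is purely illustrative:
      - alert: HighActiveConnections
        expr: nginx_connections_active > 100  # illustrative threshold; tune to your traffic
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Unusually many active connections on {{ $labels.instance }}"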
Lastly, don't set it and forget it! Your monitoring setup should evolve with your system. Schedule regular reviews of your dashboards and alert rules. Are you missing any important metrics? Are there alerts that never fire (or always fire)? Adjust as needed.
Conclusion
In this article, we covered the steps to set up Prometheus and Grafana to implement monitoring and alerting for NGINX. From here, you can add new ways to visualize NGINX server metrics in Grafana and new alert rules that catch issues before they become problems. With a proper monitoring and alerting setup, your servers stay up, response times stay low, and alerts are few and far between!