Introduction
Prometheus is a popular open-source monitoring and alerting toolkit designed for reliability and simplicity. On its own, however, it offers limited support for long-term storage, high availability, and horizontal scalability. Thanos, an open-source project originally developed at Improbable and now hosted by the CNCF, addresses these limitations by adding object-storage-backed long-term retention and a global query layer on top of existing Prometheus servers. This article explains how to scale Prometheus with Thanos, with configuration examples and an overview of the architecture.
Understanding Thanos Components
Thanos extends Prometheus by introducing several components:
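- Thanos Sidecar: Runs alongside Prometheus, uploads its TSDB blocks to object storage, and exposes its recent data to Thanos Query.
- Thanos Store Gateway: Allows querying of historical data stored in object storage.
- Thanos Query: Aggregates and deduplicates data from multiple Prometheus servers (via their Sidecars) and Thanos Store Gateways.
- Thanos Compactor: Compacts and downsamples historical data to improve storage efficiency and query performance, and applies retention.
- Thanos Ruler: Evaluates Prometheus-style recording and alerting rules against Thanos Query, so rules can span multiple Prometheus servers and historical data.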
Setting Up Thanos
Prerequisites
Before starting, ensure you have the following:
- A running Prometheus instance
- Object storage (e.g., AWS S3, GCS, or Minio for local testing)
- Docker (for running Thanos components)
Thanos Sidecar Configuration
The Thanos Sidecar runs alongside Prometheus, uploading data to object storage and enabling long-term storage and retrieval.
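- Modify Prometheus Configuration: Ensure your prometheus.yml is properly set up. The Sidecar requires external_labels that uniquely identify this Prometheus instance; the label names and values below are examples. For example:
yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  # Required by the Thanos Sidecar; example values, choose your own
  external_labels:
    cluster: demo
    replica: prometheus-0
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
- Run Thanos Sidecar: Create a Docker Compose file to run both Prometheus and Thanos Sidecar: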
yaml
version: '3.7'
services:
  prometheus:
    image: prom/prometheus:v2.27.1
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus_data:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      # Equal min and max block durations disable local compaction,
      # which the Sidecar requires before it uploads blocks
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
      - --web.enable-lifecycle
    ports:
      - 9090:9090
  thanos-sidecar:
    image: quay.io/thanos/thanos:v0.22.0
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus:9090
      - --objstore.config-file=/etc/thanos/objstore.yml
    volumes:
      # Same data directory as Prometheus, so the Sidecar can read its blocks
      - ./prometheus_data:/prometheus
      - ./objstore.yml:/etc/thanos/objstore.yml
    depends_on:
      - prometheus
Create an objstore.yml file to configure object storage:
yaml
type: S3
config:
  bucket: "thanos-bucket"
  endpoint: "s3.amazonaws.com"
  access_key: "YOUR_ACCESS_KEY"
  secret_key: "YOUR_SECRET_KEY"
  region: "us-east-1"
  insecure: false
  signature_version2: false
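The prerequisites mention Minio for local testing. A minimal sketch of an equivalent objstore.yml for that case follows. It assumes a MinIO container reachable under the hypothetical Compose service name `minio` on MinIO's default API port 9000, placeholder credentials, and a bucket named thanos-bucket that already exists; none of these are part of the setup above.

```yaml
type: S3
config:
  bucket: "thanos-bucket"
  # Hypothetical Compose service name; 9000 is MinIO's default API port
  endpoint: "minio:9000"
  access_key: "YOUR_MINIO_ACCESS_KEY"
  secret_key: "YOUR_MINIO_SECRET_KEY"
  # Plain HTTP, acceptable only for local testing
  insecure: true
```

MinIO speaks the S3 API, so the type stays S3 and only the endpoint, the credentials, and the insecure setting change.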
Thanos Store Gateway
The Thanos Store Gateway allows querying of historical data stored in object storage.
- Run Thanos Store Gateway: Add the following service under services: in your Docker Compose file:
yaml
  thanos-store:
    image: quay.io/thanos/thanos:v0.22.0
    command:
      - store
      - --objstore.config-file=/etc/thanos/objstore.yml
      - --index-cache-size=500MB
      - --chunk-pool-size=2GB
    volumes:
      - ./objstore.yml:/etc/thanos/objstore.yml
    ports:
      - 10901:10901
Thanos Query Component
The Thanos Query component aggregates data from multiple Prometheus instances and Thanos Store Gateways, enabling a global view of metrics.
- Run Thanos Query: Add the following service under services: in your Docker Compose file:
yaml
  thanos-query:
    image: quay.io/thanos/thanos:v0.22.0
    command:
      - query
      - --http-address=0.0.0.0:9091
      - --grpc-address=0.0.0.0:10901
      # gRPC endpoints of the Sidecar and the Store Gateway
      - --store=thanos-sidecar:10901
      - --store=thanos-store:10901
    ports:
      - 9091:9091
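For high availability, a common pattern is to run two identical Prometheus servers, each with its own Sidecar, and let Thanos Query merge their duplicate series. The sketch below shows how that might look. It assumes a second Sidecar under the hypothetical service name `thanos-sidecar-2`, and that each Prometheus sets a distinct value for an external label named `replica`; both are assumptions for illustration, not part of the setup above.

```yaml
  thanos-query:
    image: quay.io/thanos/thanos:v0.22.0
    command:
      - query
      - --http-address=0.0.0.0:9091
      - --grpc-address=0.0.0.0:10901
      - --store=thanos-sidecar:10901
      # Hypothetical second Sidecar attached to the replica Prometheus
      - --store=thanos-sidecar-2:10901
      - --store=thanos-store:10901
      # Series that differ only in this external label are deduplicated at query time
      - --query.replica-label=replica
    ports:
      - 9091:9091
```

With this in place, a gap in one replica's data can be filled from the other, as long as the two servers differ only in the replica label.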
Thanos Compactor
The Thanos Compactor reduces storage space and improves query performance by compacting historical data.
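- Run Thanos Compactor: Add the following service under services: in your Docker Compose file. Run only one Compactor per bucket:
yaml
  thanos-compactor:
    image: quay.io/thanos/thanos:v0.22.0
    command:
      - compact
      - --objstore.config-file=/etc/thanos/objstore.yml
      # Keep running and wait for new blocks instead of exiting after one pass
      - --wait
      # How long to keep raw, 5-minute, and 1-hour resolution data
      - --retention.resolution-raw=90d
      - --retention.resolution-5m=180d
      - --retention.resolution-1h=1y
    volumes:
      - ./objstore.yml:/etc/thanos/objstore.yml
    ports:
      - 10902:10902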
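The 5m and 1h retention flags refer to the downsampled copies the Compactor creates for older blocks. Whether a query actually reads those copies depends on the query side. One option, sketched here as an assumption about how you might want to run Query rather than as a required setting, is to enable automatic downsampling selection on the Query service:

```yaml
  thanos-query:
    image: quay.io/thanos/thanos:v0.22.0
    command:
      - query
      - --http-address=0.0.0.0:9091
      - --grpc-address=0.0.0.0:10901
      - --store=thanos-sidecar:10901
      - --store=thanos-store:10901
      # Let Query choose a coarser resolution for long time ranges
      # when the request does not specify max_source_resolution
      - --query.auto-downsampling
    ports:
      - 9091:9091
```

Because raw data is retained for a shorter period than the downsampled data in the example above, queries that reach further back than 90 days can only be answered from the 5m or 1h resolutions.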
Thanos Ruler
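The Thanos Ruler evaluates Prometheus-style recording and alerting rules against Thanos Query, so rules can cover multiple Prometheus servers and historical data.
- Run Thanos Ruler: Add the following service under services: in your Docker Compose file. Note that the subcommand is rule, not ruler:
yaml
  thanos-ruler:
    image: quay.io/thanos/thanos:v0.22.0
    command:
      - rule
      - --eval-interval=1m
      - --rule-file=/etc/thanos/rules.yml
      - --objstore.config-file=/etc/thanos/objstore.yml
      # Query API that rule expressions are evaluated against
      - --query=http://thanos-query:9091
      # External URL used in the alerts' source links
      - --alert.query-url=http://thanos-query:9091
      # Match the published port below (the default HTTP port is 10902)
      - --http-address=0.0.0.0:10903
    volumes:
      - ./objstore.yml:/etc/thanos/objstore.yml
      - ./rules.yml:/etc/thanos/rules.yml
    ports:
      - 10903:10903
Create a rules.yml file for your alerting rules:
yaml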
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "High request latency"
          description: "Request latency is > 0.5s for 10 minutes."
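The alert above reads a pre-aggregated series, job:request_latency_seconds:mean5m, which has to be produced by a recording rule somewhere. A minimal sketch of such a rule follows. It assumes your services export a summary or histogram with the hypothetical metric names request_latency_seconds_sum and request_latency_seconds_count; substitute the names your applications actually expose.

```yaml
groups:
  - name: example-recording
    rules:
      # Mean request latency per job over the last 5 minutes:
      # total observed seconds divided by number of observations
      - record: job:request_latency_seconds:mean5m
        expr: >
          sum by (job) (rate(request_latency_seconds_sum[5m]))
          /
          sum by (job) (rate(request_latency_seconds_count[5m]))
```

The group can live in the same rules.yml as the alert, so the Ruler evaluates both against Thanos Query.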
Conclusion
Scaling Prometheus with Thanos addresses the main limitations of a standalone Prometheus server. Thanos adds long-term storage in object storage, high availability through query-time deduplication, and horizontal scalability through a global query layer, so metrics are both retained and queryable over extended periods. Together, the Sidecar, Store Gateway, Query, Compactor, and Ruler components form a monitoring infrastructure suited to large-scale and distributed systems.
Implementing Thanos involves configuring each component and ensuring they work in concert with Prometheus and the object storage of your choice. The examples provided offer a foundational understanding of how to set up Thanos with Docker, allowing you to tailor the setup to your specific needs.
With Thanos, you can grow your monitoring infrastructure while keeping your systems observable and your metrics available for long-term analysis, which matters for reliability in cloud-native environments.