Introduction

Prometheus is a popular open-source monitoring and alerting toolkit designed for reliability and simplicity. It is limited, however, in long-term storage, high availability, and horizontal scalability: each server keeps its data on local disk and has no built-in global view across instances. Thanos, an open-source project originally developed at Improbable and now hosted by the CNCF, addresses these limitations by adding object-storage-backed long-term retention and a global query layer on top of Prometheus. This article explains how to scale Prometheus using Thanos, with configuration examples and an overview of the architecture.

Understanding Thanos Components

Thanos extends Prometheus by introducing several components:

  1. Thanos Sidecar: Runs alongside Prometheus, uploads its TSDB blocks to object storage, and serves recent data to Thanos Query.
  2. Thanos Store Gateway: Serves historical data stored in object storage to Thanos Query.
  3. Thanos Query: Aggregates and deduplicates data from multiple Prometheus servers (via their sidecars) and Thanos Store Gateways.
  4. Thanos Compactor: Compacts and downsamples historical data in object storage, and applies retention, to improve storage efficiency and query performance.
  5. Thanos Ruler: Evaluates Prometheus-style recording and alerting rules against data served by Thanos Query.

Setting Up Thanos

Prerequisites

Before starting, ensure you have the following:

  • A running Prometheus instance
  • Object storage (e.g., AWS S3, GCS, or Minio for local testing)
  • Docker and Docker Compose (for running Thanos components)

Thanos Sidecar Configuration

The Thanos Sidecar runs alongside Prometheus, uploading data to object storage and enabling long-term storage and retrieval.

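  1. Modify Prometheus Configuration: Ensure your prometheus.yml is properly set up. Thanos identifies each Prometheus instance by its external labels, so set at least one under global. For example (the label names and values here are placeholders):

    yaml

    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      # Placeholder labels; the sidecar requires external labels to be set.
      external_labels:
        cluster: demo
        replica: prometheus-0
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']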
  2. Run Thanos Sidecar: Create a Docker Compose file to run both Prometheus and Thanos Sidecar:

    yaml

    version: '3.7'

    services:
      prometheus:
        image: prom/prometheus:v2.27.1
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
          - ./prometheus_data:/prometheus
        command:
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus
          # Equal min/max block durations disable local compaction,
          # which the sidecar expects when it uploads blocks.
          - --storage.tsdb.min-block-duration=2h
          - --storage.tsdb.max-block-duration=2h
          - --web.enable-lifecycle
        ports:
          - "9090:9090"

      thanos-sidecar:
        image: quay.io/thanos/thanos:v0.22.0
        command:
          - sidecar
          - --tsdb.path=/prometheus
          - --prometheus.url=http://prometheus:9090
          - --objstore.config-file=/etc/thanos/objstore.yml
        volumes:
          - ./prometheus_data:/prometheus
          - ./objstore.yml:/etc/thanos/objstore.yml
        depends_on:
          - prometheus

    Create an objstore.yml file to configure object storage:

    yaml

    type: S3
    config:
      bucket: "thanos-bucket"
      endpoint: "s3.amazonaws.com"
      access_key: "YOUR_ACCESS_KEY"
      secret_key: "YOUR_SECRET_KEY"
      region: "us-east-1"
      insecure: false
      signature_version2: false
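
    For local testing with Minio, the same file can point at a Minio endpoint instead of AWS. The following is a minimal sketch, assuming a Minio container reachable as minio:9000 on the Compose network, a bucket named thanos-bucket that already exists, and placeholder credentials; the service name and port are illustrative and not part of the setup described here:

    yaml

    type: S3
    config:
      bucket: "thanos-bucket"
      # Hypothetical service name and port for a local Minio container.
      endpoint: "minio:9000"
      access_key: "YOUR_MINIO_ACCESS_KEY"
      secret_key: "YOUR_MINIO_SECRET_KEY"
      # Plain HTTP, assuming the local Minio runs without TLS.
      insecure: true

    Minio speaks the S3 API, so type stays S3 and only the endpoint, credentials, and insecure setting change.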

Thanos Store Gateway

The Thanos Store Gateway allows querying of historical data stored in object storage.

  1. Run Thanos Store Gateway: Add the following service under services in your Docker Compose file:

    yaml

    thanos-store:
      image: quay.io/thanos/thanos:v0.22.0
      command:
        - store
        - --objstore.config-file=/etc/thanos/objstore.yml
        - --index-cache-size=500MB
        - --chunk-pool-size=2GB
      volumes:
        - ./objstore.yml:/etc/thanos/objstore.yml
      ports:
        - "10901:10901"

Thanos Query Component

The Thanos Query component aggregates data from multiple Prometheus instances and Thanos Store Gateways, enabling a global view of metrics.

  1. Run Thanos Query: Add the following service under services in your Docker Compose file:

    yaml

    thanos-query:
      image: quay.io/thanos/thanos:v0.22.0
      command:
        - query
        - --http-address=0.0.0.0:9091
        - --grpc-address=0.0.0.0:10901
        - --store=thanos-sidecar:10901
        - --store=thanos-store:10901
      ports:
        - "9091:9091"
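
    If several Prometheus replicas scrape the same targets for high availability, Thanos Query can deduplicate their series once it knows which external label distinguishes the replicas. A sketch of the same service with that flag added, assuming each Prometheus sets an external label named replica (a hypothetical name; use whatever label your prometheus.yml defines):

    yaml

    thanos-query:
      image: quay.io/thanos/thanos:v0.22.0
      command:
        - query
        - --http-address=0.0.0.0:9091
        - --grpc-address=0.0.0.0:10901
        - --store=thanos-sidecar:10901
        - --store=thanos-store:10901
        # Assumed label name; series that differ only in this label are merged.
        - --query.replica-label=replica
      ports:
        - "9091:9091"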

Thanos Compactor

The Thanos Compactor reduces storage space and improves query performance by compacting historical data.

  1. Run Thanos Compactor: Add the following service under services in your Docker Compose file:

    yaml

    thanos-compactor:
      image: quay.io/thanos/thanos:v0.22.0
      command:
        - compact
        # Keep running and compact continuously instead of exiting after one pass.
        - --wait
        - --objstore.config-file=/etc/thanos/objstore.yml
        - --retention.resolution-raw=90d
        - --retention.resolution-5m=180d
        - --retention.resolution-1h=1y
      volumes:
        - ./objstore.yml:/etc/thanos/objstore.yml
      ports:
        - "10902:10902"

Thanos Ruler

The Thanos Ruler evaluates Prometheus-style recording and alerting rules by querying Thanos Query, so rules can cover all connected Prometheus instances as well as historical data.

  1. Run Thanos Ruler: Add the following service under services in your Docker Compose file:

    yaml

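    thanos-ruler:
      image: quay.io/thanos/thanos:v0.22.0
      command:
        # The subcommand is "rule", not "ruler".
        - rule
        - --eval-interval=1m
        - --rule-file=/etc/thanos/rules.yml
        - --objstore.config-file=/etc/thanos/objstore.yml
        # Query API that the rules are evaluated against.
        - --query=http://thanos-query:9091
        - --alert.query-url=http://thanos-query:9091
        # Placeholder external label identifying this ruler's blocks.
        - --label=ruler_cluster="demo"
      volumes:
        - ./objstore.yml:/etc/thanos/objstore.yml
        - ./rules.yml:/etc/thanos/rules.yml
      ports:
        # Host port 10903 maps to the ruler's default HTTP port, 10902.
        - "10903:10902"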

    Create a rules.yml file for your alerting rules:

    yaml

    groups:
      - name: example
        rules:
          - alert: HighRequestLatency
            expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
            for: 10m
            labels:
              severity: page
            annotations:
              summary: "High request latency"
              description: "Request latency is > 0.5s for 10 minutes."
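
    The alert reads job:request_latency_seconds:mean5m, a name that follows the convention for recording rules, so a rule producing that series has to exist somewhere. A minimal sketch of one, assuming the application exports a summary or histogram named request_latency_seconds (a hypothetical metric, so its _sum and _count series are assumptions too):

    yaml

    groups:
      - name: example-recording
        rules:
          # Mean request latency per job over the last 5 minutes.
          - record: job:request_latency_seconds:mean5m
            expr: >
              sum by (job) (rate(request_latency_seconds_sum[5m]))
              /
              sum by (job) (rate(request_latency_seconds_count[5m]))

    This group can live in the same rules.yml, as an additional entry under groups.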

Conclusion

Scaling Prometheus with Thanos overcomes the main limitations of a standalone Prometheus server. Thanos adds long-term storage, high availability, and horizontal scalability, so metrics are retained and remain queryable over extended periods. By combining the Sidecar, Store Gateway, Query, Compactor, and Ruler components, we can build a resilient monitoring infrastructure for large-scale, distributed systems.

Implementing Thanos involves configuring each component and ensuring they work in concert with Prometheus and the object storage of your choice. The examples provided offer a foundational understanding of how to set up Thanos with Docker, allowing you to tailor the setup to your specific needs.

With Thanos, you can scale your monitoring infrastructure while keeping your systems observable and your metrics available for long-term analysis, which supports reliable operations in modern cloud-native environments.