Improving your monitoring setup by integrating Cloudflare’s analytics data into Prometheus and Grafana

The following is a guest post by Martin Hauskrecht, DevOps Engineer at Labyrinth Labs.

Here at Labyrinth Labs, we put great emphasis on monitoring. Having a working monitoring setup is a critical part of the work we do for our clients.

Cloudflare’s Analytics dashboard provides a lot of useful information for debugging and analytics purposes for our customer Pixel Federation. However, it doesn’t automatically integrate with existing monitoring tools such as Grafana and Prometheus, which our DevOps engineers use every day to monitor our infrastructure.

Cloudflare provides a Logs API, but the volume of logs we’d need to analyze is so large that processing them ourselves would be inefficient and expensive. Luckily, Cloudflare already does the hard work of aggregating our thousands of events per second and exposes the results through an easy-to-use API.

Having Cloudflare’s data from our zones integrated with other systems’ metrics would give us a better understanding of our systems and the ability to correlate metrics and create more useful alerts, making our Day-2 operations (e.g. debugging incidents or analyzing the usage of our systems) more efficient.

Since our monitoring stack is primarily based on Prometheus and Grafana, we decided to implement our own Prometheus exporter that pulls data from Cloudflare’s GraphQL Analytics API.

Design

Given current cloud trends and our intention to run the exporter in Kubernetes, writing the code in Go was the obvious choice. Cloudflare provides an API SDK for Go, which made the common API tasks easy to get started with.

We take advantage of Cloudflare’s GraphQL API to obtain analytics data about each of our zones and transform them into Prometheus metrics that are then exposed on a metrics endpoint.

We are able to obtain data about the total number and rate of requests, bandwidth, cache utilization, threats, SSL usage, and HTTP response codes. In addition, we are also able to monitor what type of content is being transmitted and what countries and locations the requests originate from.

All of this information is provided through the httpRequests1mGroups node in Cloudflare’s GraphQL API. If you want to see which datasets are available, you can find a brief description at https://developers.cloudflare.com/analytics/graphql-api/features/data-sets.
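
As a rough illustration of such a query, the sketch below builds (but does not send) a request for one zone against the dataset that Cloudflare’s schema calls httpRequests1mGroups. The helper name newZoneRequest, the selected sum fields, and the variable types are assumptions made for this example rather than the exporter’s actual code, so check them against the schema before relying on them. Authentication uses the API email and key, matching the exporter’s configuration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Public endpoint of Cloudflare's GraphQL Analytics API.
const graphqlEndpoint = "https://api.cloudflare.com/client/v4/graphql"

// zoneQuery requests one aggregated row for a single zone.
// The fields under "sum" are an illustrative subset, not the full dataset.
const zoneQuery = `query ($zoneTag: string, $since: Time, $until: Time) {
  viewer {
    zones(filter: {zoneTag: $zoneTag}) {
      httpRequests1mGroups(limit: 1, filter: {datetime_geq: $since, datetime_lt: $until}) {
        sum { requests bytes cachedRequests threats }
      }
    }
  }
}`

// graphQLRequest is the usual GraphQL-over-HTTP JSON envelope.
type graphQLRequest struct {
	Query     string            `json:"query"`
	Variables map[string]string `json:"variables"`
}

// newZoneRequest (hypothetical helper) prepares the POST request.
// since and until are RFC 3339 timestamps.
func newZoneRequest(email, key, zoneTag, since, until string) (*http.Request, error) {
	body, err := json.Marshal(graphQLRequest{
		Query: zoneQuery,
		Variables: map[string]string{
			"zoneTag": zoneTag,
			"since":   since,
			"until":   until,
		},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, graphqlEndpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-Auth-Email", email)
	req.Header.Set("X-Auth-Key", key)
	return req, nil
}

func main() {
	req, err := newZoneRequest("user@example.com", "api-key", "zone-id",
		"2021-01-01T00:00:00Z", "2021-01-01T00:01:00Z")
	if err != nil {
		panic(err)
	}
	// Passing req to http.DefaultClient.Do would perform the actual call.
	fmt.Println(req.Method, req.URL)
}
```

Keeping request construction separate from sending makes the query easy to inspect and test without network access.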

On top of all of these, we can also obtain data for Cloudflare’s data centers. Our graphs can easily show the distribution of traffic among them, further helping in our evaluations. The data is obtained from the httpRequestsAdaptiveGroups node in GraphQL.

After running the queries against the GraphQL API, we simply format the results to follow the Prometheus metrics format and expose them on the /metrics endpoint. To make things faster, we use goroutines and make the requests in parallel.
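
The two steps described above can be sketched with the standard library alone: fetch one value per zone concurrently, then render the results in the Prometheus text exposition format. This is a simplified illustration under stated assumptions, not the exporter’s code. The fetch function is a stand-in for a real GraphQL call, the metric name example_zone_requests is hypothetical, and a production exporter would more commonly rely on the Prometheus Go client library than on hand-written formatting.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"sync"
)

// fetchFunc stands in for one API call that returns a value for a zone.
type fetchFunc func(zone string) (float64, error)

// fetchAll calls fetch for every zone concurrently and collects the successes.
func fetchAll(zones []string, fetch fetchFunc) map[string]float64 {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]float64, len(zones))
	)
	for _, zone := range zones {
		wg.Add(1)
		go func(zone string) {
			defer wg.Done()
			value, err := fetch(zone)
			if err != nil {
				return // a real exporter would log or count the failure
			}
			mu.Lock()
			results[zone] = value
			mu.Unlock()
		}(zone)
	}
	wg.Wait()
	return results
}

// labelEscaper escapes the characters that are special inside label values.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// render writes a single gauge family, one sample per zone, sorted by zone.
// The help text is written as-is and is assumed to need no escaping.
func render(name, help string, values map[string]float64) string {
	zones := make([]string, 0, len(values))
	for zone := range values {
		zones = append(zones, zone)
	}
	sort.Strings(zones)

	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n# TYPE %s gauge\n", name, help, name)
	for _, zone := range zones {
		fmt.Fprintf(&b, "%s{zone=\"%s\"} %s\n", name,
			labelEscaper.Replace(zone),
			strconv.FormatFloat(values[zone], 'g', -1, 64))
	}
	return b.String()
}

func main() {
	// Fake fetcher so the example runs without network access.
	fake := func(zone string) (float64, error) { return float64(len(zone)), nil }
	values := fetchAll([]string{"example.com", "example.org"}, fake)
	fmt.Print(render("example_zone_requests", "Requests seen per zone.", values))
}
```

The mutex guards the shared map because concurrent map writes are not safe in Go, and sorting the zones keeps the rendered output stable between scrapes.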

Deployment

Our primary intention was to use the exporter in Kubernetes. Therefore, it comes with a Docker image and Helm chart to make deployments easier. You might need to adjust the Service annotations to match your Prometheus configuration.

The exporter itself exposes the gathered metrics on the /metrics endpoint. Therefore, setting the Prometheus annotations either on the pod or on a Kubernetes Service will do the job.

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/scrape: "true"

We plan on adding a Prometheus ServiceMonitor to the Helm chart to make scraping the exporter even easier for those who use the Prometheus operator in Kubernetes.

The configuration is quite easy: you just provide your API email and key. Optionally, you can limit the scraping to selected zones only. Refer to our docs in the GitHub repo or see the example below.

env:
  - name: CF_API_EMAIL
    value: <YOUR_API_EMAIL>
  - name: CF_API_KEY
    value: <YOUR_API_KEY>
  # Optionally, you can filter zones by adding IDs following the example below.
  # - name: ZONE_XYZ
  #   value: <zone_id>
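
To make the configuration model concrete, here is a minimal sketch of how a Go program could read these settings from its environment. It assumes that every variable whose name starts with ZONE_ carries one zone ID and that an empty list means "all zones". The config type and parseConfig function are hypothetical names for this illustration, and the exporter’s real parsing may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sort"
	"strings"
)

// config holds the credentials and the optional zone filter.
type config struct {
	Email string
	Key   string
	Zones []string // empty means no filtering
}

// parseConfig reads settings from "NAME=value" pairs such as os.Environ().
func parseConfig(environ []string) (config, error) {
	var cfg config
	for _, pair := range environ {
		name, value, ok := strings.Cut(pair, "=")
		if !ok || value == "" {
			continue
		}
		switch {
		case name == "CF_API_EMAIL":
			cfg.Email = value
		case name == "CF_API_KEY":
			cfg.Key = value
		case strings.HasPrefix(name, "ZONE_"):
			// Assumption: each ZONE_* variable carries one zone ID.
			cfg.Zones = append(cfg.Zones, value)
		}
	}
	if cfg.Email == "" || cfg.Key == "" {
		return config{}, errors.New("CF_API_EMAIL and CF_API_KEY must both be set")
	}
	sort.Strings(cfg.Zones) // stable order regardless of environment order
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(os.Environ())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(cfg.Zones) == 0 {
		fmt.Println("no zone filter set, all zones selected")
		return
	}
	fmt.Printf("filtering to %d zone(s)\n", len(cfg.Zones))
}
```

Taking the environment as a slice rather than calling os.Getenv directly keeps the parsing logic easy to exercise with fixed inputs.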

To deploy the exporter with Helm you simply need to run:

helm repo add lablabs-cloudflare-exporter https://lablabs.github.io/cloudflare-exporter
helm repo update

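# Each env entry is a name/value pair; quoting keeps the shell from interpreting [ ] and < >.
helm install cloudflare-exporter lablabs-cloudflare-exporter/cloudflare-exporter \
--set 'env[0].name=CF_API_EMAIL' \
--set 'env[0].value=<API_EMAIL>' \
--set 'env[1].name=CF_API_KEY' \
--set 'env[1].value=<API_KEY>'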

We also provide a Helmfile in our repo to make deployments easier. You just need to add your credentials to make it work.

Visualizing the data

I’ve already explained how the exporter works and how you can get it running. As I mentioned before, we use Grafana to visualize our metrics from Prometheus. We’ve created a dashboard that takes the data from Prometheus and puts it to use.

The dashboard is divided into several rows, which group individual panels for easier navigation. It allows you to target individual zones for metrics visualization.

To make the gathered metrics even more useful for the operations team, you can use them to create alerts. These can be created either in Grafana directly or using Prometheus alerting rules.

Furthermore, if you integrate Thanos or Cortex into your monitoring setup, you can store these metrics indefinitely.

Future work

We’d like to integrate even more analytics data into our exporter, eventually covering every metric that Cloudflare’s GraphQL API can provide. We plan on creating new metrics for firewall analytics, DoS analytics, and Network analytics soon.

Feel free to create a GitHub issue if you have any questions, problems, or suggestions. Any pull request is greatly appreciated.

About us

Labyrinth Labs helps companies build, run, deploy and scale software and infrastructure by embracing the right technologies and principles.