Brian Brazil's Blog
December 28, 2020
Monitoring is a means, not an end
Does it really have to be perfect? While there is a certain intellectual satisfaction to be had by getting a system just right, the purpose of monitoring is to help you run whatever it is that you're monitoring. That could be a website used directly by your customers, a backend only used internally, or even […]
Published on December 28, 2020 01:15
December 21, 2020
Policy is for configuration, not metric names
Metric names are part of a time series's identity, so shouldn't include information unrelated to identity. I previously looked at some of the considerations for choosing target labels. Similar considerations apply to the names of metrics: you want something descriptive but not longer than it needs to be. In this context, I'd like to talk about […]
Published on December 21, 2020 02:29
December 14, 2020
Prefer without and ignoring
Which of by/without and on/ignoring should you use? When writing a given bit of PromQL you know what labels you want in the output of an aggregation, so why not put them in the by? The same goes for vector matching and using on. So why ever use without and ignoring? Application architectures and deployment […]
Published on December 14, 2020 01:40
December 7, 2020
Choosing your pushgateway grouping key
What does and doesn't make a good grouping key? Pushgateway grouping keys are fundamentally target labels, and similar considerations apply. They should be minimal and they should be constant. What the latter means may be a little non-obvious for batch jobs though. The purpose of the Pushgateway is to hold metrics from the end of […]
Published on December 07, 2020 03:15
November 30, 2020
New Features in Prometheus 2.23.0
Prometheus 2.23.0 is now out, following on from 2.22.0 with many fixes and improvements. There's been a variety of performance improvements to the TSDB. Compaction will now be faster, there will only be one checkpoint made after Prometheus restarts after not running for a good while, and the series API no longer loads any chunks. Remote write can now […]
Published on November 30, 2020 02:25
November 23, 2020
OpenMetrics is released
Coming soon to a standards body near you. OpenMetrics has been regularly worked on since June 2017. The goal is to take the Prometheus exposition text format, which is a de facto standard, and make it into a cleaner vendor-neutral standard. As of earlier this month, the standard is now available! The next steps are to bring […]
Published on November 23, 2020 04:12
November 16, 2020
ARP cache metrics from the node exporter
The node exporter has metrics about the ARP table. The ARP cache is part of how computers figure out which IPv4 addresses match up with which MAC or hardware addresses, so that packets can be efficiently switched on local network segments. The contents of this cache can be seen in /proc/net/arp, or in a more […]
Published on November 16, 2020 03:01
November 9, 2020
Why is my SNMP string showing as hexadecimal?
Why is my name showing as 0x79206e616d65? Strings in Prometheus are full UTF-8; however, not all byte sequences are valid UTF-8. For example, the byte 0xff cannot appear in a valid UTF-8 string. What does this have to do with SNMP? If we get arbitrary bytes that could in principle be anything, then we have […]
Published on November 09, 2020 02:28
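The point in the excerpt above about valid and invalid UTF-8 can be checked with a minimal Python sketch. It only illustrates the encoding facts stated there, not how the SNMP exporter itself behaves; the helper names `is_valid_utf8` and `hex_display` are hypothetical and chosen for this example.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Report whether raw can be decoded as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def hex_display(raw: bytes) -> str:
    """Render raw bytes as a 0x-prefixed hex string (hypothetical helper)."""
    return "0x" + raw.hex()


# The bytes behind the hex string quoted in the excerpt.
name = bytes.fromhex("79206e616d65")

print(is_valid_utf8(name))     # these bytes are plain ASCII, so valid UTF-8
print(is_valid_utf8(b"\xff"))  # 0xff can never appear in valid UTF-8
print(hex_display(name))       # round-trips to the string from the excerpt
```

Rendering as hex is lossless for any byte sequence, which is why it is a safe fallback when the bytes might not be text at all.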
November 2, 2020
Checking for specific HTTP status codes with the blackbox exporter
How would you check that an HTTP endpoint is returning a 204? We've previously looked at using the blackbox exporter to check that a 2xx response code is being returned; you might, however, consider only particular 2xx codes as okay. Your first thought might be to use an alert based on the probe_http_status_code metric, however […]
Published on November 02, 2020 01:51
October 26, 2020
Show multiple expressions for an instance in a Grafana table
Have you ever wanted to have a table showing multiple metrics across all of your instances? I'm going to show you how to show CPU usage and RSS for all of your instances. Here I'm using Grafana 7.1.1, and the end result will look something like: First create a Table panel, and define your […]
Published on October 26, 2020 01:49


