Managing thousands of VMs in a large-scale environment is challenging. One thing to consider early on in such scenarios is o11y (observability: metrics, logs, and traces). This post scratches the surface of that topic by showing a technique for exporting Prometheus metrics from running JVMs.

Motivation

One may argue that exporting Prometheus metrics is easy. Let’s say we are using Spring Boot to develop a web API.

compile 'org.springframework.metrics:spring-metrics:latest.release'
compile 'io.prometheus:simpleclient_common:latest.release'

The two dependencies in the Gradle build snippet above, along with a couple of annotations, are enough to export metrics automatically.

But we do not always control the software we run. We will likely be using third-party software for data storage, queuing, streaming, data processing, and so on. Much of the popular software for these purposes is JVM-based (e.g. Kafka, Cassandra, Neo4j, Storm), and it needs to be observed too. How do we make that manageable?

Fortunately, the JVM provides a way to attach a management agent: a piece of code, typically packaged as a JAR, that runs inside the target JVM alongside the application.

The Prometheus JMX agent

Since we want to collect Prometheus metrics, let’s use the Prometheus JMX Exporter as our example. To build it locally, you will need Maven installed.

git clone https://github.com/prometheus/jmx_exporter
cd jmx_exporter/jmx_prometheus_javaagent
mvn clean package

This generates the agent JAR under the target folder, with a name like jmx_prometheus_javaagent-0.3.2-SNAPSHOT.jar.

I’ll be referring to it from now on as /path/to/agent.jar.

Statically attaching the agent to the JVM

Consider randomservice.jar, a runnable JAR containing the software we want to collect metrics from.

Leaving aside the classpath and the usual VM flags (e.g. heap size), it is typically started with something like:

java -jar randomservice.jar

To statically attach the agent to the JVM, we may start it as follows:

java -javaagent:/path/to/agent.jar=9999:prom_agent.yml -jar randomservice.jar

The file prom_agent.yml is the configuration of the JMX exporter. See the JMX exporter repository for configuration examples.

For static attach to work, the agent JAR must meet the following conditions:

  • a public static premain(String agentArgument, Instrumentation instrumentation) method, which is run before the application starts
  • a Premain-Class entry in the MANIFEST.MF file referencing the class that contains the premain method (this may be easier with Maven plugins like Shade or Assembly)

Started with the -javaagent option shown above, the service exposes a /metrics endpoint on port 9999.
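
The premain contract can be sketched as follows. This is a minimal, hypothetical agent and not the JMX exporter’s code: the class name SketchAgent, the AgentArgs holder, and the simplified <port>:<configPath> argument format are assumptions made for illustration. To be loaded with -javaagent, the JAR would also need a Premain-Class: SketchAgent entry in its MANIFEST.MF.

```java
import java.lang.instrument.Instrumentation;

/** Hypothetical agent that only illustrates the premain entry point. */
public final class SketchAgent {

    /** Parsed form of an argument such as "9999:prom_agent.yml". */
    static final class AgentArgs {
        final int port;
        final String configPath;

        AgentArgs(int port, String configPath) {
            this.port = port;
            this.configPath = configPath;
        }
    }

    /**
     * Splits "<port>:<configPath>" at the first colon.
     * The format is a simplification assumed for this sketch.
     */
    static AgentArgs parse(String agentArgument) {
        // The JVM passes null when -javaagent is given without "=<argument>".
        if (agentArgument == null) {
            throw new IllegalArgumentException("expected <port>:<configPath>");
        }
        int sep = agentArgument.indexOf(':');
        if (sep <= 0 || sep == agentArgument.length() - 1) {
            throw new IllegalArgumentException(
                    "expected <port>:<configPath>, got: " + agentArgument);
        }
        int port = Integer.parseInt(agentArgument.substring(0, sep));
        return new AgentArgs(port, agentArgument.substring(sep + 1));
    }

    /** Called by the JVM before the application's main method. */
    public static void premain(String agentArgument, Instrumentation instrumentation) {
        AgentArgs args = parse(agentArgument);
        // A real agent would start its metrics HTTP server here; this sketch only logs.
        System.err.println("agent started: port=" + args.port
                + ", config=" + args.configPath);
    }
}
```

Everything after the = sign in -javaagent:/path/to/agent.jar=9999:prom_agent.yml reaches premain as a single string, so parsing it is entirely up to the agent.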

Dynamically attaching the agent to the JVM

Being able to attach an agent to a JVM statically is already awesome, but it brings challenges: the -javaagent option has to be maintained in the config management of each individual application, and upgrading the agent version forces an application restart.

Fortunately, the JVM also provides a way to attach an agent to a running JVM, in which case the -javaagent option is not needed.

For dynamic attach to work, the agent JAR must meet the following conditions:

  • a public static agentmain(String agentArgument, Instrumentation instrumentation) method, which is run when the agent is loaded into the JVM
  • an Agent-Class entry in the MANIFEST.MF file referencing the class that contains the agentmain method (this may be easier with Maven plugins like Shade or Assembly)

The Prometheus JMX Exporter agent already supports this in the development branch at the time of this writing.
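
An agent that supports both static and dynamic attach can expose the two entry points from the same class. The sketch below is hypothetical (the DualEntryAgent name, the shared start method, and the start-once guard are assumptions for illustration, not how the JMX exporter is implemented); its manifest would list the class under both Premain-Class and Agent-Class.

```java
import java.lang.instrument.Instrumentation;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical agent exposing both the static and the dynamic entry point. */
public final class DualEntryAgent {

    // Prevents a second start, e.g. when the agent is attached dynamically
    // to a JVM that already loaded it with -javaagent.
    private static final AtomicBoolean STARTED = new AtomicBoolean(false);

    /** Static attach: called before the application's main method. */
    public static void premain(String agentArgument, Instrumentation instrumentation) {
        start(agentArgument);
    }

    /** Dynamic attach: called when the agent is loaded into a running JVM. */
    public static void agentmain(String agentArgument, Instrumentation instrumentation) {
        start(agentArgument);
    }

    /** Returns true if this call started the agent, false if it was already started. */
    static boolean start(String agentArgument) {
        if (!STARTED.compareAndSet(false, true)) {
            return false;
        }
        // A real agent would start its metrics HTTP server here; this sketch only logs.
        System.err.println("agent started with argument: " + agentArgument);
        return true;
    }
}
```

The guard matters mostly for dynamic attach, where nothing stops an operator or a script from loading the same agent into the same JVM more than once.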

To make our lives easier, JLoad is a utility written in Go that dynamically attaches an agent to a running JVM. Its advantage over an implementation based on the Java Attach API is that the Attach API requires a JDK to be present, while JLoad does not, which matters where only a JRE is available to run Java processes.
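
For comparison, a dynamic attach through the Java Attach API might look roughly like the sketch below. The AttachSketch class and its splitAgentSpec helper are hypothetical names, and the <agent>=<agentArgument> command-line form is an assumption chosen for this example. The code relies on com.sun.tools.attach.VirtualMachine, which ships with the JDK (the jdk.attach module on Java 9+, tools.jar on Java 8) and is the dependency that JLoad avoids.

```java
import com.sun.tools.attach.VirtualMachine;

/** Hypothetical command-line tool: AttachSketch <pid> <agent>=<agentArgument> */
public final class AttachSketch {

    /**
     * Splits "<agent>=<agentArgument>" at the first '='.
     * Returns {agentJar, options}; options is null when no '=' is present.
     */
    static String[] splitAgentSpec(String spec) {
        int sep = spec.indexOf('=');
        if (sep < 0) {
            return new String[] {spec, null};
        }
        return new String[] {spec.substring(0, sep), spec.substring(sep + 1)};
    }

    /** Attaches to the JVM with the given PID and loads the agent JAR into it. */
    static void attach(String pid, String agentJar, String options) throws Exception {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            // The target JVM calls the agent's agentmain method with these options.
            vm.loadAgent(agentJar, options);
        } finally {
            vm.detach();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: AttachSketch <pid> <agent>=<agentArgument>");
            System.exit(1);
        }
        String[] spec = splitAgentSpec(args[1]);
        attach(args[0], spec[0], spec[1]);
    }
}
```

The agent path is opened by the target JVM, so an absolute path is the safer choice, and the attaching process generally has to run as the same user as the target JVM.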

JLoad

After building it for the target platform, it may be run as follows:

./jload <pid> <agent>=<agentArgument>

Please consider the following example:

./jload 30000 /path/to/agent.jar=9999:/tmp/prom_agent.yml

This also exposes a /metrics endpoint on port 9999 of the service.

Or, for the sake of automation, use pgrep to find the PID of the process, as in the following example:

pgrep -f '.*randomservice.*' | xargs -I % ./jload % /path/to/agent.jar=9999:/tmp/prom_agent.yml

The pgrep command returns the PIDs of the processes that match a given regex. The -f flag matches the pattern against the full command line of each process instead of just the executable name.

Please check out JLoad’s README for more options on how to build and run it.

Binary Package Distribution

For now, binary packages are distributed through BinTray.

All versions for each target platform, along with the corresponding binary checksums, are available there and are uploaded automatically via Travis upon each release on GitHub.

Wrapping Up

In complex environments, it’s really difficult to keep track of the agent version deployed in each JVM. Applications will eventually be restarted, and as soon as they are, another piece of infrastructure (JLoad) can attach the latest version of the agent automatically, removing the need to maintain these Java options in config management.

Nevertheless, use with caution: there is JVM black magic going on here :)