Kafka Monitoring Suite

The Lenses Kafka Monitoring Suite is a set of pre-defined templates, that use

  • A Time Series database (Prometheus)
  • Custom JMX exporters
  • A Data Visualization application (Grafana)
  • Built in domain intelligence about operating Kafka with confidence in production

If you do not already have existing installations of Grafana and Prometheus please visit their documentation for installation and best practices.

../../_images/kafka-cluster-overview.png

Lenses™ Kafka Monitoring Suite | Kafka Cluster Overview Dashboard

Whilst Lenses continuously monitor the attached Kafka cluster and provide alerts for important metrics degradation, such as consumer lag and offline or underreplicated partitions, it does not strive to become a timeseries database since established solutions from domain experts do exist.

Landoop’s monitoring reference setup for Apache Kafka is thus based on Prometheus and Grafana software with a medium term goal to bring more dashboards from Grafana into Lenses.

A question that comes up often, is whether monitoring is really needed since Lenses can provide alerts. This should be answered eventually by each implementation team.

Keep your alerts and key metrics to a small tight set, so that you won’t get overwhelmed. This is a good, common advice and what we want to achieve through Lenses, alas it is also prone to misconceptions.

Alerts and key metrics are related to monitoring but are not the same. We define monitoring as the process of collecting a large number of metrics and storing them for a period of time. Queries to these data help engineers understand the cluster better, establish baselines so they can plan for additional capacity or act on deviations, or even extract new, important key metrics for a specific usecase as the team acquire more experience in the field. Furthermore new alerts can be added to any metric —or combination of them.

Setting up monitoring is currently an involved process. We strongly advise the operations teams that will implement our reference setup to integrate it with their own internal practices and guidelines.

Reference Setup

The reference setup uses 3 applications:

  • Prometheus
  • jmx_exporter in either of server or agent flavors
  • Grafana

Prometheus is a timeseries database that scrapes targets and stores metrics. Recorded metrics can be queried and a multitude of operators and functions is provided for queries. Prometheus’ model stores all recorded values in the database, in contrast with systems such as Graphite and RRD that store data in a custom —usually lower- resolution that degrades over time. This permits fine grained results in queries at the expense of storage. Although Prometheus’ storage engine is very efficient, it is better to keep the metrics retention period shorter rathen than longer. Typical values span from 15 days to a few months.

The Kafka ecosystem is based on the JVM and as such exports metrics via JMX. Prometheus expects its own format for metrics and thus provides small applications called exporters that can translate metrics from various software. jmx_exporter is such an application, that converts JMX metrics to the Prometheus format. It comes in two flavors: server (standalone) and agent. In server mode, jmx_exporter runs as a separate application that connects to the monitored application via JMX, reads the metrics and then serves them in Prometheus format. In agent mode, it runs as a java agent within the application to be monitored. The suggested run mode is agent for most applications. It is easier to setup and can provide operational metrics as well (CPU usage, memory usage, open file descriptors, JVM statistics, etc). In special cases such as the brokers we suggest the server mode. The reason behind this, is that under load (hundreds or thousands of topics, clients, etc) the brokers can expose tens or even hundreds of thousands of metrics! Under such high load we have identified a few cases where the jmx_exporter agent can’t keep up and may cause trouble to the broker as well. The jmx_exporter server as a standalone application will not affect the broker. Jmx_exporter server instances should be co-hosted when possible with the application they monitor, especially for software such as the brokers that expose too many metrics.

Prometheus’ model scrapes data from exporters. This means that Prometheus should have a list of targets to scrape. This list may be set manually or automatically via supported backends —such as consul, zookeeper and kubernetes.

Grafana is a tool for Data Visualization and Monitoring. It supports various datastores for querying data, amongst them Prometheus. It permits the user to create highly customized graphs, tables, gauges etc, leading to impressive and useful dashboards. Other notable features include templating for dashboards, alerts (via Slack, email, pagerduty, etc), LDAP support.

Configuration templates for the jmx_exporter and reference dashboards for grafana are provided as part of your Lenses subscription.

../../_images/kafka-monit-reference.png

Lenses™ Kafka Monitoring Reference Setup

Instrumenting with jmx_exporter

jmx_exporter is a small, lightweight application that exports JMX metrics from a JVM based application and serves them via HTTP for Prometheus to scrape. It can either run as a java agent (inside the software to be monitored) or as a standalone server (separate process). Every program that needs to be monitored, needs its own instance of jmx_exporter.

A JVM-based application usually registers many MBeans for its various packages, each with its own number of metrics. Whilst jmx_exporter can perform a generic conversion of MBean metric names, it is important to have a configuration file, so the metrics can be processed: set predictable names, apply labels, define whether the metric acts as a counter or a gauge, drop ones that aren’t needed, apply unit conversions, etc. The final exported metrics should enable Prometheus to utilize its full potential and also lessen —if possible— the load of the monitoring infrastructure.

Landoop’s build of jmx_exporter (fastdata_agent.jar and fastdata_server.jar) and configuration templates are available to our clients.

The choice between agent and server modes usually relies on the number of metrics the monitoring application will expose. For peripheral services (schema registry, kafka connect) and client applications usually the choice is straightforward; jmx_exporter in agent mode is easier to setup and besides Kafka metrics it can also provide metrics for the JVM itself (CPU and memory usage, etc). For the brokers the choice can be harder. Depending on the number of topics and consumers/producers, each broker may expose tens of thousands of metrics. In such a scenario it is better to have the jmx_exporter set as a standalone server, as in order to parse and process such a big number of metrics can create a significant load.

Running jmx_exporter as a java agent

Note

You can download the Lenses Monitoring Suite from your account. It includes fastdata agent and server jars, as well as configuration files for Kafka brokers, Kafka Connect, Lenses —and other procuders and consumers— and Schema Registry.

In the most simple form of a JVM application, a java agent can be utilised as:

java -javaagent:/path/to/fastdata_agent.jar=[PORT]:[CONF_FILE] -jar myapp.jar

Where [PORT] is the port where jmx_exporter will listen to for scrape requests from Prometheus and [CONF_FILE] its configuration file.

Of course there are deviations from this model, for example instead of a jar with a main function, you may provide a classpath and a class, or for more complicated software (such as the Kafka broker and Landoop’s Lenses) instead of calling Java and providing jar files/classes, a shell script is used to properly initiliaze the application. The important thing is to provide somehow the -javaagent switch to Java.

In order to add the jmx_exporter java agent to your Kafka and Lenses (K&L) setup:

Kafka Broker

Add the java agent configuration to the KAFKA_OPTS environment variable:

export KAFKA_OPTS="$KAFKA_OPTS -javaagent:/path/to/fastdata_agent.jar=[PORT]:/path/to/broker.yml"
Kafka Connect

Add the java agent configuration to the KAFKA_OPTS environment variable:

export KAFKA_OPTS="$KAFKA_OPTS -javaagent:/path/to/fastdata_agent.jar=[PORT]:/path/to/connect-worker.yml"
Lenses

Add the java agent configuration to the LENSES_OPTS environment variable:

export LENSES_OPTS="$LENSES_OPTS -javaagent:/path/to/fastdata_agent.jar=[PORT]:/path/to/client.yml"
Schema Registry

Add the java agent configuration to the SCHEMA_REGISTRY_OPTS environment variable:

export SCHEMA_REGISTRY_OPTS="$SCHEMA_REGISTRY_OPTS -javaagent:/path/to/fastdata_agent.jar=[PORT]:/path/to/schema-registry.yml"

Important

Prometheus must have access to jmx_exporter’s port but otherwise it is advised to keep jmx_exporter inaccessible from external hosts in order to protect the application from DoS attacks.

Note

client.yml is a generic configuration file for all Kafka producer, consumer and streams applications.

As an example, at Landoop we use systemd to manage the Kafka brokers. Below is our systemd unit with monitoring set:

[Unit]
Description=Kafka Broker Service

[Service]
Restart=always
User=kafka
Group=kafka
Environment='KAFKA_HEAP_OPTS=-Xmx2G -Xms2G'
Environment=KAFKA_OPTS=-javaagent:/opt/lenses-monitoring/fastdata_agent=22000:/opt/lenses-monitoring/broker.yml
LimitNOFILE=100000
KillMode=process
TimeoutStopSec=600
ExecStart=/opt/kafka/bin/kafka-server-start /etc/kafka/kafka/broker.properties

[Install]
WantedBy=multi-user.target

Running jmx_exporter as a standalone server

In this mode, jmx_exporter run as a separate process that connect to the instrumented application’s JMX port and retrieves metrics, then exposes them via HTTP for Prometheus to scrape.

It is strongly advised to run the jmx_exporter server on the same host that runs the application to be monitored for two reasons, (a) in order to read all metrics, multiple network requests are needed due to the way JMX works and even the tiny roundtrip time within a datacenter can add up to measurable delays, (b) access from the same host to JMX is much easier to setup (works by default).

The java server can be utilised as:

java -jar /path/to/fastdata_server.jar [PORT] [CONF_FILE]

Where [PORT] is the listening port for scrape requests from Prometheus and [CONF_FILE] the configuration file. Within it, apart from the settings on which metrics to process and how, the connection details of the monitored application are also set.

Important

Prometheus must have access to jmx_exporter’s port but otherwise it is advised to keep jmx_exporter inaccessible from external hosts in order to protect the application from DoS attacks.

Within our monitoring archive you will find five configuration files for jmx_exporter: broker.yml, client.yml, connect-worker.yml, kafka-rest.yml, schema-registry.yml. The client.yml file can be used with any JVM based application that uses the official Kafka Java libraries —such as Lenses or your own Kafka clients.

When using them in server mode, create a copy to use and with a text editor, edit (or add) the configuration option hostPort to point at the JMX address of your application. Please note that if jmx_exporter and your application are on the same host, you can use localhost:[JMX_PORT] where [JMX_PORT] the JMX port of your application. If the applications run on different servers, you have to use the FQDN of your server and also configure your application to allow remote JMX connections.

To enable JMX on your application the process may vary. For the Kafka ecosystem (Kafka Brokers, Zookeeper, Schema Registry, Kafka Connect) it is enough to export the environment variable JMX_PORT=[JMX_PORT], where [JMX_PORT] is the port the application should listen at for JMX connections. Lenses always expose JMX at the default port 9015 unless you explicitly set the lenses.jmx.port setting.

In order to add the jmx_exporter java server to your Kafka and Lenses (K&L) setup:

Kafka Broker

Enable JMX by exporting the port as an environment variable via JMX_PORT:

export JMX_PORT=[JMX_PORT]

Optionally enable remote access to JMX by exporting the KAFKA_JMX_OPTS environment variable:

export KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.rmi.port=[JMX_PORT]

Copy broker.yml and add an entry about the location of the broker:

hostPort: localhost:[JMX_PORT]

Finally start the server:

java -jar /path/to/fastdata_server.jar [PORT] /path/to/broker.yml
Kafka Connect

Enable JMX by exporting the port as an environment variable via JMX_PORT:

export JMX_PORT=[JMX_PORT]

Optionally enable remote access to JMX by exporting the KAFKA_JMX_OPTS environment variable:

export KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.rmi.port=[JMX_PORT]

Copy connect-worker.yml and add an entry about the location of the worker:

hostPort: localhost:[JMX_PORT]

Finally start the server:

java -jar /path/to/fastdata_server.jar [PORT] /path/to/connect-worker.yml
Lenses

By default, JMX endpoint is available at port 9015. Set a different port by setting lenses.jmx.port in the configuration file:

lenses.jmx.port=[JMX_PORT]

Optionally enable remote access to JMX by exporting the LENSES_JMX_OPTS environment variable:

export LENSES_JMX_OPTS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.rmi.port=[PORT]

Copy client.yml and add an entry about the location of Lenses (or any other Kafka client):

hostPort: localhost:[JMX_PORT]

Finally start the server:

java -jar /path/to/fastdata_server.jar [PORT] /path/to/client.yml
Schema Registry

Enable JMX by exporting the port as an environment variable via JMX_PORT:

export JMX_PORT=[PORT]

Optionally enable remote access to JMX by exporting the SCHEMA_REGISTRY_JMX_OPTS environment variable:

export SCHEMA_REGISTRY_JMX_OPTS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.rmi.port=[PORT]

Copy schema-registry.yml and add an entry about the location of the Schema Registry:

hostPort: localhost:[JMX_PORT]

Finally start the server:

java -jar /path/to/fastdata_server.jar [PORT] /path/to/schema-registry.yml

In a kubernetes setup you would probably want to co-host a service with its jmx_exporter in the same pod.