Company
Date Published
Author
Noah Crowley
Word count
2414
Language
English
Hacker News points
None

Summary

Sensu is a popular monitoring solution for both applications and infrastructure, designed to address the needs of a modern cloud computing environment. The Sensu framework is composed of client and server applications that communicate via RabbitMQ by default, although other transports can be used. Configuration is done entirely through JSON files, which makes it easy to integrate with automation tools like Ansible or Chef.

Sensu checks follow the same format as Nagios plugins, which lets developers take advantage of the vast number of plugins in the Nagios ecosystem as well as those provided by the Sensu community. A check can be any program or script that writes data to STDOUT or STDERR and returns an exit code that corresponds to a given status. Once check results have been pushed to the message bus, one or more Sensu servers pull the events from the bus and handle them: processing the results, triggering alerts, or forwarding metrics to a long-term store.

Out of the box, Sensu doesn’t do anything with data that might be collected during a check, but it provides the ability to configure handlers that process and forward the data to an external store; in our case, InfluxDB. By storing the metrics data, development teams can analyze it at a later date, whether by looking at performance data to drive the engineering roadmap or as part of the incident response or postmortem process. Additionally, by querying metrics in a time series database, Sensu can perform checks against multiple data points, reducing noise and flapping alerts.

For this example we’ll be using the CPU Percentage Check from the Sensu CPU Checks Plugin to gather metrics about processor usage. We have two hosts configured running Ubuntu 16.04: the Sensu server itself, which is running Sensu, Redis, RabbitMQ, and Uchiwa, and a second server for InfluxDB.
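The check contract described earlier (output on STDOUT, status reported through the exit code) can be sketched in a few lines. This is a minimal illustration rather than the plugin used in this article: the function names, message wording, and thresholds are hypothetical, and only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) come from the Nagios plugin convention.

```python
# Exit codes shared by Nagios plugins and Sensu checks.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_cpu(usage_percent, warn=80.0, crit=90.0):
    """Map a CPU usage reading to (exit_code, message).

    The thresholds are illustrative defaults, not values from the article.
    """
    if usage_percent is None:
        return UNKNOWN, "CheckCPU UNKNOWN: no reading available"
    if usage_percent >= crit:
        return CRITICAL, f"CheckCPU CRITICAL: {usage_percent:.1f}% used"
    if usage_percent >= warn:
        return WARNING, f"CheckCPU WARNING: {usage_percent:.1f}% used"
    return OK, f"CheckCPU OK: {usage_percent:.1f}% used"


def main(argv):
    """Print the check output and return the status as an exit code."""
    usage = float(argv[1]) if len(argv) > 1 else None
    code, message = evaluate_cpu(usage)
    print(message)  # Sensu captures STDOUT as the check output
    return code
```

Run as a script, the last step would be `sys.exit(main(sys.argv))`, so that the returned status becomes the process exit code the Sensu client reads.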
The Sensu server is configured with a legacy-servers subscription in addition to the dev and ubuntu-servers subscriptions common to both hosts. We’ll use the legacy-servers subscription to ensure that we only run our metrics collection checks on servers without Telegraf installed.

We’ll be using the InfluxDB Sensu Plugin, which provides a number of integrations between InfluxDB and Sensu:

    check-influxdb.rb: a monitoring check for InfluxDB via the /ping endpoint
    check-influxdb-query.rb: a metrics check that uses an InfluxDB query
    metrics-influxdb.rb: a handler that sends check results to InfluxDB
    mutator-influxdb-line-protocol.rb: a mutator that transforms check output into InfluxDB line protocol

We can install the InfluxDB plugin to Sensu’s embedded Ruby environment, /opt/sensu/embedded/, using the following command:

    $ sudo sensu-install -p influxdb

We’re already running a metrics check on our “legacy” hosts, so we want to set up the InfluxDB handler to deal with events as they’re received by the Sensu server. First, we’ll enable the handler by adding it to /etc/sensu/conf.d/handlers.json:

    {
      "handlers": {
        "influx-tcp": {
          "type": "pipe",
          "command": "/opt/sensu/embedded/bin/metrics-influxdb.rb"
        }
      }
    }

Next, we add the connection settings at /etc/sensu/conf.d/influx.json:

    {
      "influxdb": {
        "host": "192.168.227.134",
        "port": "8086",
        "database": "sensumetrics"
      }
    }

The InfluxDB instance in this example isn’t using any kind of authentication or SSL, so we’re using an extremely simple configuration.

We can verify that data is being received by InfluxDB using the InfluxDB CLI. Log into the InfluxDB host and type influx at the prompt:

    $ influx
    Connected to http://localhost:8086 version 1.4.2
    InfluxDB shell version: 1.4.2
    >

After selecting the sensumetrics database with USE sensumetrics, use the SHOW MEASUREMENTS command to verify that all metrics have been created:

    > SHOW MEASUREMENTS
    name       name
    ----       ----
    cpu        sensu CPU Percentage
    cpu_guest  sensu CPU Percentage
    ...
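For readers unfamiliar with the InfluxDB line protocol mentioned above, the sketch below encodes a single data point in that format (`measurement,tags fields timestamp`). It is not the plugin's implementation: the helper names are hypothetical, the `host` and `metric` tag names and the single `value` field are assumptions chosen to resemble this example's data, and only float field values are handled.

```python
def _escape(text, chars):
    """Backslash-escape the given characters, as line protocol requires."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement, tags, value, timestamp_ns):
    """Encode one float data point as an InfluxDB line-protocol record."""
    # Measurement names escape commas and spaces; tag keys and values
    # additionally escape the equals sign.
    name = _escape(measurement, ", ")
    tag_part = "".join(
        f",{_escape(k, ',= ')}={_escape(v, ',= ')}"
        for k, v in sorted(tags.items())
    )
    return f"{name}{tag_part} value={float(value)} {int(timestamp_ns)}"


line = to_line_protocol(
    "cpu_idle",
    {"host": "sensu", "metric": "cpu_percentage"},
    99.5,
    1515534170000000000,
)
print(line)
```

Sorting the tags by key is optional, but it keeps the output deterministic and follows InfluxDB's recommendation for write performance.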
We can query one of the measurements to see individual data points:

    > SELECT * FROM cpu_idle WHERE time > now() - 1m
    name: cpu_idle
    time                host  metric         value
    ----                ----- ------         -----
    1515534170000000000 sensu cpu_percentage 99.5
    1515534270000000000 sensu cpu_percentage 100
    1515534370000000000 sensu cpu_percentage 100
    ...

Telegraf offers a second route into InfluxDB. For this example we’ll install Telegraf alongside Sensu and ship Graphite plaintext to it via a socket. We can use a single template to parse the metrics, as each metric has the same format: host.measurement.field.

Installation instructions for Telegraf on various platforms can be found in the Telegraf documentation. Next, we’re going to disable the CPU and other system statistics that Telegraf collects by default, because they duplicate Sensu’s metrics collection checks. Make sure to comment out the [[inputs.cpu]], [[inputs.disk]], [[inputs.mem]], and [[inputs.network]] sections in the default configuration.

We’ll also need to configure an input for Sensu to send metrics to. We’ll define a socket_listener, give it a port and a data format, and specify our template for parsing the Graphite plaintext. Add this section to the Telegraf config at /etc/telegraf/telegraf.conf:

    [[inputs.socket_listener]]
      service_address = "udp://:8094"
      data_format = "graphite"
      templates = [
        "host.measurement.field"
      ]

Restart Telegraf to pick up the new configuration:

    $ sudo systemctl restart telegraf

We can configure a UDP handler in Sensu with the only_check_output mutator by adding this to /etc/sensu/conf.d/handlers.json:

    {
      "handlers": {
        "telegraf-graphite-handler": {
          "type": "udp",
          "mutator": "only_check_output",
          "socket": {
            "host": "127.0.0.1",
            "port": 8094
          }
        }
      }
    }
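Before wiring the handler into a check, it can be useful to confirm that the Telegraf listener accepts Graphite plaintext at all. The following sketch sends one hand-built line as a UDP datagram, which is roughly what the UDP handler does with check output. The function names are hypothetical, and the default host and port simply mirror the handler configuration above.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as Graphite plaintext: '<path> <value> <timestamp>'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(line, host="127.0.0.1", port=8094):
    """Send one plaintext line as a single UDP datagram; returns bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(line.encode("utf-8"), (host, port))
```

Calling `send_metric(graphite_line("sensu.cpu.user", 0.5))` on the Sensu host should produce a point in InfluxDB once Telegraf flushes. Because UDP is connectionless, a successful send does not prove that anything was listening.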
Update the check so that it uses the new handler by editing /etc/sensu/conf.d/cpu_percentage.json:

    {
      "checks": {
        "cpu_metrics": {
          "type": "metric",
          "command": "metrics-cpu-pcnt-usage.rb",
          "subscribers": [ "legacy-servers" ],
          "interval": 10,
          "handlers": [ "debug", "telegraf-graphite-handler" ]
        }
      }
    }

And finally, restart the Sensu services:

    $ sudo systemctl restart sensu-server sensu-api sensu-client

Our metrics come out of Sensu in this format:

    sensu.cpu.user 0.50 1515534170

They run through the template host.measurement.field, resulting in a single measurement, cpu, with multiple fields. Let’s check that’s what we’re seeing in the database. Open up the InfluxDB CLI, select the sensumetrics database, show the measurements, and select the last minute of data, like we did before:

    > USE sensumetrics
    Using database sensumetrics
    > SHOW MEASUREMENTS
    name  name
    ----  ----
    cpu   sensu CPU Percentage
    ...

We can query one of the measurements to see individual data points:

    > SELECT * FROM cpu WHERE time > now() - 1m
    name: cpu
    time host metric value
    ---- ----- ------ -----
    2018-01-09T20:32:09Z 0 sensu 98.51 0 0 0 0 0 0.5 1
    ...

We can also use Telegraf to collect and ship metrics directly to InfluxDB from our applications, which might be a more efficient solution than transforming the metrics in Sensu. Next week we’ll continue exploring the integration of Sensu and InfluxDB by creating a Metrics Check based on the data we’ve captured.
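To make the template mapping in this walkthrough concrete, here is a simplified sketch of how host.measurement.field turns several Graphite lines into one measurement with multiple fields. It is not Telegraf's parser: it assumes the metric path has exactly as many parts as the template, the function names are hypothetical, and the second sample line (sensu.cpu.idle) is invented for illustration.

```python
def parse_graphite(line, template="host.measurement.field"):
    """Split one Graphite plaintext line according to a dotted template."""
    path, value, timestamp = line.split()
    keys, parts = template.split("."), path.split(".")
    if len(keys) != len(parts):
        raise ValueError(f"{path!r} does not match template {template!r}")
    named = dict(zip(keys, parts))
    measurement = named.pop("measurement")
    field = named.pop("field")
    # Whatever remains (here only "host") is treated as a tag.
    return measurement, named, field, float(value), int(timestamp)


def group_points(lines):
    """Merge lines sharing measurement, tags, and timestamp into one point."""
    points = {}
    for line in lines:
        measurement, tags, field, value, ts = parse_graphite(line)
        key = (measurement, tuple(sorted(tags.items())), ts)
        points.setdefault(key, {})[field] = value
    return points


sample = [
    "sensu.cpu.user 0.50 1515534170",
    "sensu.cpu.idle 98.51 1515534170",  # hypothetical second metric
]
points = group_points(sample)
```

Both lines collapse into a single cpu point tagged host=sensu, carrying the fields user and idle. That is why the Telegraf route yields one cpu measurement instead of a separate measurement per metric.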