SSH System Monitoring -- Server monitoring for lazy people.
SshSysMon is a system/server monitoring tool that executes all of its operations over SSH without the need for installing agents across machines.
Its goal is to provide simple self-hosted monitoring and alerting for small numbers of lightweight servers without the traditional overhead of a monitoring system.
It reads from /proc and runs simple commands to monitor system vitals such as memory, CPU load, drive space, and swap.
pip install sshsysmon
# Requires python 2.x and pip:
sudo apt-get install -y python python-pip python-dev
# Download the latest SshSysMon:
wget -O - https://github.com/zix99/sshsysmon/archive/master.tar.gz | tar xzv
# Make sure the dependencies are installed:
cd sshsysmon-master/
sudo pip install -r requirements.txt
# Test it out!
./sshmon summary examples/starter.yml
You only need to do this if you are monitoring a remote server.
The best way to connect to remote servers is with a private key whose public half has been added to the authorized_keys file on all systems you are interested in monitoring. While password authentication is supported, a key is the easiest way to guarantee continued authentication to other hosts.
On Debian-based Linux systems, setting up a key pair to use with SSH is easy. I would recommend creating a new Linux user on each machine that is used only for monitoring, but it isn't required.
# 1. Create a new SSH key if you don't already have one. Follow the prompts, but leave the passphrase blank
ssh-keygen
# 2. Install it on a user on another machine that you want to monitor
ssh-copy-id username@remotehost
The service has two commands, summary and check.
summary will print out a human-readable summary of all servers specified in the config. It is a great way to validate your config.
It can be executed with:
./sshmon.py summary examples/starter.yml
It can also be told to use a specific template (see the templating section below). For example, to use the html template:
./sshmon.py -f html summary examples/starter.yml
check is meant to be executed as part of a scheduled job, and will notify all channels in the config if a condition is unmet.
It can be executed with:
./sshmon.py check <myconfig.yml>
The best way to run the service automatically is with a cron job.
Edit your cron jobs with
crontab -e
Add an entry that runs the script every few hours (or minutes, whatever you like):
0 */4 * * * /path/to/sshmon.py check /path/to/config.yml
Configuration is written in YAML and consists of a set of servers, each with connection details, notification channels, and a list of monitors with alarms.
See the Examples folder for more sample configs.
An example simple configuration might look something like this:
meta: # Meta section (Optional). Used by summary templates
  title: "My Cluster Summary"
  author: "Me"
servers:
  "Name of server":
    driver: ssh
    config:
      host: myhostname.com
      username: myuser
    channels: # Notification targets
      - type: email
        config:
          toAddr: myemail@gmail.com
          subject: "Something went wrong on {server}"
    monitors: # All alerts and inspectors
      - type: memory
        alarms:
          "Low Swap": "swap_free.mb < 50"
          "Low Memory": "mem_free.mb < 5"
      - type: disk
        alarms:
          "Low Disk Space": "disk_free.gb < 5"
        summarize: false # Optional, use if you don't want a monitor to show up in the summary
You can often use YAML's inheritance (anchors and merge keys) to simplify your config for more than one server. Each config section also has a corresponding + version that adds entries on top of whatever was merged in, eg. monitors+.
All servers are iterated through and queried for the given inspector types. The resulting metrics are compared to the alarms, and if any of them are unmet, a notification is sent to all configured channels.
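The flow above can be sketched roughly as follows. This is a minimal illustration, not SshSysMon's actual code: the function names, the "loadavg" inspector label, and the use of eval are all assumptions, and it assumes an alarm fires when its expression evaluates to true (as with "swap_free.mb < 50" in the sample config).

```python
# Hypothetical sketch: names below are illustrative, not SshSysMon's real API.

def fired_alarms(alarms, metrics):
    """Return the names of alarms whose expression is true for these metrics."""
    fired = []
    for name, expression in alarms.items():
        # eval() with builtins removed: only the metric names are visible.
        # Only safe for expressions you wrote yourself in your own config.
        if eval(expression, {"__builtins__": {}}, dict(metrics)):
            fired.append(name)
    return sorted(fired)


def notify_all(channels, server, inspector, alarm_names):
    """Send every fired alarm to every channel (a channel is any callable here)."""
    for alarm in alarm_names:
        for channel in channels:
            channel(server, inspector, alarm)


metrics = {"load_1m": 0.42, "load_5m": 6.5, "load_15m": 1.0}
alarms = {"High 1m load": "load_1m > 4", "High 5m load": "load_5m > 4"}

sent = []
notify_all([lambda s, i, a: sent.append((s, i, a))], "web01", "loadavg",
           fired_alarms(alarms, metrics))
print(sent)
```

Only "High 5m load" is reported here, because only its expression holds for the sample metrics.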
All sizes (that is, numbers of bytes) are encapsulated by the ByteSize class, which has helper methods for both friendly output and size casting in the form of b, kb, mb, etc. For example, you can write mem_free.mb > 50.
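To make the idea concrete, here is a small stand-in for such a class. It is a sketch only: the real ByteSize may differ, and the choice of 1024-based units and the friendly output format are assumptions.

```python
class ByteSize(object):
    """Hypothetical stand-in for the ByteSize class described above."""

    def __init__(self, num_bytes):
        self.b = float(num_bytes)

    @property
    def kb(self):
        return self.b / 1024

    @property
    def mb(self):
        return self.b / 1024 ** 2

    @property
    def gb(self):
        return self.b / 1024 ** 3

    def __str__(self):
        # Friendly output: pick the largest unit that keeps the value >= 1.
        for unit in ("gb", "mb", "kb"):
            value = getattr(self, unit)
            if value >= 1:
                return "{:.1f} {}".format(value, unit.upper())
        return "{:.0f} B".format(self.b)


mem_free = ByteSize(64 * 1024 ** 2)  # 64 MB, assuming 1024-based units
print(mem_free.mb > 50)              # the comparison used in the text
print(str(mem_free))
```

Because the unit accessors return plain numbers, an alarm expression such as mem_free.mb > 50 is just an ordinary numeric comparison.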
Percentages will always be presented in their 0-100 form.
The application is built on three components: Drivers, Inspectors, and Channels.
Each has a corresponding folder containing its abstract implementation. Components are loaded dynamically by the name or path provided in the configuration.
Drivers are classes that define how to read information from a server. By default, there are two drivers:
The local driver is only for your local machine. There is no config for this driver.
The SSH driver is for reaching out to remote machines. There are several config parameters for this driver:
Channels define what can happen if an alert fires. There are a few built in.
There are a few variables passed in that can be used to format part of the commands:
Writes tab-separated data to stdout. Can be appended to a file with the bash >> operator.
Arguments:
ctime or epoch, the format in which time is output. Default: ctime
{time}\t{server}\t{inspector}\t{alert}
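As an illustration of what one output line could look like, the sketch below fills in the template shown above. It assumes that template is the line format; the function name and the time_format parameter name are made up for this example.

```python
import time

LINE_FORMAT = "{time}\t{server}\t{inspector}\t{alert}"


def format_alert(server, inspector, alert, time_format="ctime", now=None):
    """Build one tab-separated line; time_format is 'ctime' or 'epoch'."""
    now = time.time() if now is None else now
    if time_format == "epoch":
        stamp = str(int(now))     # seconds since the Unix epoch
    else:
        stamp = time.ctime(now)   # human-readable local time
    return LINE_FORMAT.format(time=stamp, server=server,
                              inspector=inspector, alert=alert)


print(format_alert("web01", "memory", "Low Memory", time_format="epoch"))
```

The epoch form is easier to sort and parse later, while ctime is easier to read by eye in a log file.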
Executes a shell command on the machine in which the script is running.
Arguments:
Sends an email via a SMTP server.
By default, it assumes a local SMTP server is set up. For more complex configs, such as how to use Gmail, see the examples.
Arguments:
Inspectors are parsers that know how to read data from a driver and make sense of it.
The memory inspector returns metrics about the system's memory:
Metrics: mem_total, mem_free, cached, swap_total, swap_free
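For a sense of where these numbers can come from, here is a sketch that derives the same metric names from /proc/meminfo-style text. How SshSysMon actually gathers them is not shown in this document, so the parsing approach and the sample values are assumptions.

```python
# Sample text in the format of /proc/meminfo (values are made up).
SAMPLE_MEMINFO = """\
MemTotal:        2048000 kB
MemFree:          512000 kB
Cached:           256000 kB
SwapTotal:       1024000 kB
SwapFree:        1000000 kB
"""

# Mapping from /proc/meminfo keys to the metric names listed above.
FIELDS = {
    "MemTotal": "mem_total",
    "MemFree": "mem_free",
    "Cached": "cached",
    "SwapTotal": "swap_total",
    "SwapFree": "swap_free",
}


def parse_meminfo(text):
    """Return the listed metrics, in bytes, from /proc/meminfo-style text."""
    metrics = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in FIELDS:
            # The kernel reports these values in kB (1024-byte units).
            metrics[FIELDS[key]] = int(rest.split()[0]) * 1024
    return metrics


print(parse_meminfo(SAMPLE_MEMINFO)["mem_free"])
```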
The disk inspector returns the status of the disk space (in GB).
Config:
Metrics: size, used, available, percent_full
The load average inspector returns the system's current 1/5/15 minute load average.
Metrics: load_1m, load_5m, load_15m
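On Linux these three values are available in /proc/loadavg. The sketch below parses that format into the metric names above; it is an assumption that the inspector reads this file, and the function name is hypothetical.

```python
def parse_loadavg(text):
    """Parse /proc/loadavg-style text, e.g. '0.20 0.18 0.12 1/80 11206'."""
    # The first three fields are the 1, 5, and 15 minute load averages.
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return {"load_1m": one, "load_5m": five, "load_15m": fifteen}


print(parse_loadavg("0.20 0.18 0.12 1/80 11206"))
```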
This inspector lets you monitor a process on the given machine.
It takes one required config value, name, which is matched against process names using the wildcards * and ?.
Metrics: user, pid, cpu, mem, tty
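To show how * and ? behave, here is a short example using Python's standard fnmatch module. Whether the inspector uses fnmatch internally is an assumption (fnmatch also accepts [seq] patterns, which the text does not mention), and the process names are made up.

```python
from fnmatch import fnmatchcase


def match_processes(pattern, process_names):
    """Return the process names that match a * / ? wildcard pattern."""
    return [name for name in process_names if fnmatchcase(name, pattern)]


running = ["nginx", "sshd", "python", "python3", "postgres"]
print(match_processes("python*", running))  # * matches any run of characters
print(match_processes("ssh?", running))     # ? matches exactly one character
```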
The TCP inspector will try to establish a connection on a given port with the same remote as the driver. It's important to note that this does not go over SSH, and will not verify anything more than that the port is willing to establish a connection.
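A connection check of this kind can be sketched with the standard socket module. This is an illustration rather than the inspector's real code: the function name and timeout are assumptions, while the port_<n> and all result keys follow the metric names this section describes.

```python
import socket


def check_ports(host, ports, timeout=2.0):
    """Return {'port_<n>': bool, ..., 'all': bool} for the given ports."""
    metrics = {}
    for port in ports:
        try:
            # Only proves the port accepts a TCP connection, nothing more.
            conn = socket.create_connection((host, port), timeout=timeout)
            conn.close()
            metrics["port_{}".format(port)] = True
        except (socket.error, socket.timeout):
            metrics["port_{}".format(port)] = False
    metrics["all"] = all(metrics.values())
    return metrics


print(check_ports("127.0.0.1", [22, 80]))
```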
Config:
Metrics:
port_<number> for each port: true if the port is open, otherwise false (eg port_22)
all: true if all ports are open
The Http connector will attempt to do a GET request on an http/https endpoint, and return the data if able.
Config:
Metrics:
None if no match requested
exec runs a custom command and returns stdout, stderr, and status (returncode).
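The shape of that result can be sketched with the standard subprocess module. Note the assumptions: this runs the command on the local machine, whereas the real inspector presumably goes through the configured driver, and the function name is hypothetical.

```python
import subprocess


def run_command(command):
    """Run a shell command locally and return stdout, stderr, and status."""
    proc = subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)  # text, not bytes
    stdout, stderr = proc.communicate()
    return {"stdout": stdout, "stderr": stderr, "status": proc.returncode}


print(run_command("echo hello"))
```

An alarm could then compare status against 0, or inspect stdout, in the same way as any other metric.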
Config:
Metrics:
SshSysMon uses handlebars to template its summary output. See the templating documentation for more information.
To learn how to write a specific type of component, visit its readme in the appropriate subfolder.
All components must define def create(args): as a well-known method to instantiate the class. args will be the configuration dict given in the configuration.
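A component module following this convention might look like the sketch below. Only create(args) comes from the text above; the PrintChannel class, its notify method, and the prefix option are hypothetical names used for illustration.

```python
class PrintChannel(object):
    """Hypothetical channel that prints alerts and remembers what it sent."""

    def __init__(self, prefix="ALERT"):
        self.prefix = prefix
        self.sent = []

    def notify(self, server, inspector, alert):  # hypothetical method name
        line = "[{}] {}: {} ({})".format(self.prefix, server, alert, inspector)
        self.sent.append(line)
        print(line)


def create(args):
    """Well-known factory: build the component from its config dict."""
    return PrintChannel(**(args or {}))


# What a loader might do after importing this module by name or path:
channel = create({"prefix": "WARN"})
channel.notify("web01", "memory", "Low Memory")
```

Keeping construction behind a single create function means the loader never needs to know the class name inside each module.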