sysmon
Usage
sysmon
Description
This is an extremely low-tech text-based multiple-system monitor
running in a terminal window. The idea is to display some indication
of health for a number of systems (specifically: 37 systems, although
that changes from time to time) in a minimum of screen area. It also
displays read-me notices for a number of mailboxes and Usenet
news.
I'm not making any great claims to generality here. If there were
370 systems instead of 37, we'd have to think about something else.
But (generally speaking) people tend to underestimate the amount of
useful information that can be expressed in a simple terminal
window.
The program runs continuously. Monitored systems are obtained from
the file $HOME/.sysmonlist. Every so often, ssh is
used to run the uptime command on a remote system; the
results (assuming there are any) are parsed and displayed.
Specifically, in the current implementation, the number of logged-in
users, the system's uptime in days, the load average reported by
uptime (which gives 1-, 5-, and 15-minute figures), and
the time it took for the system to respond are displayed. These are
used as a semi-reliable proxy for overall system behavior.
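To make the parsing step concrete, here is a minimal sketch in Python (not the script's own Perl). The function name parse_uptime and the regular expression are my assumptions for illustration; uptime output differs between platforms, so a real parser would need more cases. The response time is not parsed from the output; it would be measured by timing the ssh call itself.

```python
import re

# Assumed pattern for typical Linux/BSD `uptime` output; formats vary by platform.
UPTIME_RE = re.compile(
    r"up\s+(?:(?P<days>\d+)\s+days?,\s*)?"      # optional "N days,"
    r".*?(?P<users>\d+)\s+users?,\s*"           # "N user(s),"
    r"load averages?:\s*(?P<loads>[\d.,\s]+)"   # 1-, 5-, 15-minute figures
)

def parse_uptime(line):
    """Return a dict of users, days, and load averages, or None if unparseable."""
    m = UPTIME_RE.search(line)
    if m is None:
        return None
    loads = [float(x) for x in re.findall(r"\d+\.\d+", m.group("loads"))]
    return {
        "users": int(m.group("users")),
        "days": int(m.group("days") or 0),  # no "days" field means up under a day
        "loads": loads,
    }
```

Returning None for anything unrecognized keeps "no usable answer" distinct from "answered with zeros", which matters when deciding whether a system counts as unresponsive.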
Here's a screenshot:
![[Screenshot]](sysmon_screenshot.png)
There's a clock up in the corner, isn't that cool? Why? Well,
because it fit.
Color is used tastefully to draw attention to unusual situations.
Load averages over 1.0 are displayed in yellow, over 2.0 in red.
Similarly, a response time over 15 seconds is displayed in yellow,
over 20 seconds in red. Unresponsive systems get a big fat NO
RESPONSE in eye-catching black-on-red.
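The color rules amount to a pair of threshold checks. Here is a rough Python sketch (again, not the actual Perl, which uses Term::ANSIScreen); the function names and the exact output layout are hypothetical, and the escape sequences are the standard ANSI color codes.

```python
RESET = "\033[0m"
YELLOW = "\033[33m"
RED = "\033[31m"
BLACK_ON_RED = "\033[30;41m"

def threshold_color(value, yellow_over, red_over):
    """Pick an ANSI color prefix for a value, or "" when it is unremarkable."""
    if value > red_over:
        return RED
    if value > yellow_over:
        return YELLOW
    return ""

def format_status(load, response_secs):
    """Render one system's status; None for either input means no response."""
    if load is None or response_secs is None:
        return f"{BLACK_ON_RED}NO RESPONSE{RESET}"
    load_c = threshold_color(load, 1.0, 2.0)            # thresholds from the text
    resp_c = threshold_color(response_secs, 15.0, 20.0)
    load_txt = f"{load_c}{load:.2f}{RESET if load_c else ''}"
    resp_txt = f"{resp_c}{response_secs:.0f}s{RESET if resp_c else ''}"
    return f"{load_txt} {resp_txt}"
```

The comparisons are strict, so a load of exactly 1.0 stays uncolored, matching "over 1.0".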
The script also monitors multiple mailboxes and Usenet news
semi-intelligently on lines 2 and 3 of the screen:
- Normal mail goes into $HOME/.mailspool/$USER on our
system; the script displays the string [MAIL] (in red) if
this file has a non-zero size.
- I use procmail to divert a number of incoming messages to
mailboxes of the format $HOME/Mail/IN.name; the
script displays [name] (in yellow) for each
non-empty mailbox of this form.
- I also use procmail to divert postmaster and spam mail to
$HOME/Mail/Postmaster and $HOME/Mail/Spam
respectively. These are almost always junk, and although I want to
scan through them periodically, I don't want to be nagged
about it whenever they're non-empty. So the script displays
[Postmaster] or [Spam] (in green) if the
corresponding mailbox is non-empty and hasn't been accessed
for some period (two hours for Postmaster and four hours
for Spam).
- Finally, [NEWS] is displayed (in magenta) if the
$HOME/.jnewsrc file is over four hours old, which limits
my Usenet reading to something sensible. (Well, actually, it's
probably still not sensible.)
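The four rules above boil down to a few stat calls per file. The following Python sketch (not the script's Perl) shows one way to express them; the function mail_flags is hypothetical, and I'm assuming "accessed" means the file's access time and "old" means its modification time.

```python
import glob
import os
import time

def _stat(path):
    """Return os.stat(path), or None if the file is missing or unreadable."""
    try:
        return os.stat(path)
    except OSError:
        return None

def mail_flags(home, user, now=None):
    """Return a list of (label, color) pairs for the notice lines."""
    now = time.time() if now is None else now
    flags = []
    st = _stat(os.path.join(home, ".mailspool", user))
    if st and st.st_size > 0:
        flags.append(("[MAIL]", "red"))
    for path in sorted(glob.glob(os.path.join(home, "Mail", "IN.*"))):
        st = _stat(path)
        if st and st.st_size > 0:
            name = os.path.basename(path)[len("IN."):]
            flags.append((f"[{name}]", "yellow"))
    # Junk mailboxes: flag only if non-empty AND untouched for a while.
    for name, hours in (("Postmaster", 2), ("Spam", 4)):
        st = _stat(os.path.join(home, "Mail", name))
        if st and st.st_size > 0 and now - st.st_atime > hours * 3600:
            flags.append((f"[{name}]", "green"))
    # Assumption: "over four hours old" refers to modification time.
    st = _stat(os.path.join(home, ".jnewsrc"))
    if st and now - st.st_mtime > 4 * 3600:
        flags.append(("[NEWS]", "magenta"))
    return flags
```

Note that the access-time test only works on filesystems that actually record access times; a volume mounted with noatime would make the Postmaster and Spam flags appear more often than intended.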
Features:
- The Term::ANSIScreen module available from CPAN
handles text positioning, color, and formatting.
- The Term::ReadKey module (available from CPAN; it is not a
core Perl module) handles non-blocking
terminal reads. This was implemented so that pressing the
Q key will quit, and any other key will clear the screen
(in case it gets confused).
- The script uses a priority queue implemented as a heap (in turn
implemented as a Perl array-of-hashes) to keep
track of what to do when. (The code is adapted from Robert
Sedgewick's Algorithms.)
As each check is done (and removed from the front of the priority
queue), the next occurrence of the check is scheduled (and inserted
into the priority queue). Mail/News checks happen about every
minute. System checks for each system happen at 4-6 minute
intervals.
- An alarm signal is used to make sure we don't hang waiting
for a system to respond to an ssh; we give up after 60
seconds.
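For the curious, the scheduling and timeout ideas look roughly like this in Python (not the script's Perl). This is a sketch under assumptions: the Scheduler class and helper names are hypothetical, Python's heapq stands in for the hand-written Sedgewick heap, subprocess's timeout stands in for the alarm signal, and picking a random interval between 4 and 6 minutes is my guess at how the spread is produced.

```python
import heapq
import itertools
import random
import subprocess

class Scheduler:
    """Min-heap of (due_time, seq, name, interval_fn) entries."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: equal due times never compare names

    def schedule(self, due, name, interval_fn):
        heapq.heappush(self._heap, (due, next(self._seq), name, interval_fn))

    def next_due(self):
        """Due time of the earliest pending check, or None if the queue is empty."""
        return self._heap[0][0] if self._heap else None

    def run_next(self, check):
        """Pop the earliest check, run it, and schedule its next occurrence."""
        due, _, name, interval_fn = heapq.heappop(self._heap)
        result = check(name)
        self.schedule(due + interval_fn(), name, interval_fn)
        return due, name, result

def mail_interval():
    return 60  # "about every minute"

def system_interval():
    return random.uniform(240, 360)  # assumed: random point in the 4-6 minute range

def run_with_timeout(cmd, seconds=60):
    """Return the command's stdout, or None if it fails or exceeds the timeout."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
    except (subprocess.TimeoutExpired, OSError):
        return None
    return done.stdout if done.returncode == 0 else None
```

A main loop would sleep until next_due(), then call run_next with a check function that does something like run_with_timeout(["ssh", host, "uptime"]). Because each check reschedules itself as it is popped, the heap always holds one pending entry per monitored item.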
Source
Last modified: June 20 2003 09:12 EDT
Paul A. Sand,
pas@unh.edu