Some time ago there was a discussion on this mailing list about
different possible approaches to collecting long-term statistics
(parts of this discussion are enclosed as Addendum 1).
Recently MRTG (Multi Router Traffic Grapher) was released: an easily
configurable tool that produces nice graphs of interface load, both
short-term and long-term.
I'm quite pleased with Mr. Oetiker's contribution, as it covers my
most urgent needs right now. Nevertheless, I would like to propose
my desiderata for the "Ideal Long Term Statistics Gathering Package".
As I might spend some time implementing some elements of this, I
think it is a good idea to exchange ideas first.
1. Architecture
Conceptually I see three different tasks: data gathering, data
aggregation and data visualisation. These tools share a common data
repository, e.g. an mSQL database. Apart from the DB, the tools are
completely independent of each other: the visualisation could
generate HTML pages, use the BLT toolkit to display interactive
graphics on the screen, or somehow be part of tkined.
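To make the split concrete, here is a minimal sketch of the three tasks around one shared repository. It assumes Python's built-in sqlite3 module as a stand-in for mSQL; the table names, columns and the "router-a:if3" entity naming are hypothetical. The only coupling between gatherer, aggregator and visualiser is the schema.

```python
import sqlite3

# Hypothetical schema: raw samples written by the gatherer,
# daily averages written by the aggregator.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sample (
    entity TEXT NOT NULL,     -- e.g. 'router-a:if3' (illustrative naming)
    ts     INTEGER NOT NULL,  -- UNIX time of the poll
    value  REAL NOT NULL      -- e.g. octets/s over the last interval
);
CREATE TABLE IF NOT EXISTS daily_avg (
    entity TEXT NOT NULL,
    day    INTEGER NOT NULL,  -- days since the epoch
    avg    REAL NOT NULL,
    PRIMARY KEY (entity, day)
);
"""

def gather(db, entity, ts, value):
    """Data gathering: append one raw sample; knows nothing about graphs."""
    with db:
        db.execute("INSERT INTO sample VALUES (?, ?, ?)", (entity, ts, value))

def aggregate(db):
    """Data aggregation: (re)compute daily averages from the raw samples."""
    with db:
        db.execute("""
            INSERT OR REPLACE INTO daily_avg (entity, day, avg)
            SELECT entity, ts / 86400, AVG(value)
            FROM sample GROUP BY entity, ts / 86400
        """)

def visualise(db, entity):
    """Data visualisation: read-only; could emit HTML, feed BLT, ..."""
    return db.execute(
        "SELECT day, avg FROM daily_avg WHERE entity = ? ORDER BY day",
        (entity,)).fetchall()

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
for ts, v in [(0, 10.0), (3600, 30.0), (86400, 50.0)]:
    gather(db, "router-a:if3", ts, v)
aggregate(db)
print(visualise(db, "router-a:if3"))  # [(0, 20.0), (1, 50.0)]
```

Since the visualiser never talks to a router, it can be swapped (HTML, BLT, tkined) without touching the gatherer.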
Regarding the data gathering, 2 approaches have been suggested in the
past:
A. use netguard, which needs some cleaning up
B. use some yet-to-be-written scheduler that, as I understand it,
acts as a caching proxy server for SNMP requests (see Addendum 1).
I'm not sure whether netguard offers the functionality that I need (at
least it would have to be extended to store the data in a database;
perhaps I should get it running before making a final decision). I am
more inclined to hack together my own data collection script, and to
relay the SNMP requests to the scheduler of plan B at a later stage of
development. The MRTG package could possibly serve as a quick start:
cut it into two parts (gathering & visualisation), and put a database
in between. [This has the additional benefit that I can still use Perl,
which I am much more familiar with than Tcl. Besides, once the
package becomes more modular, the actual choice of implementation
language doesn't matter that much any more. On the other hand, this
is a good occasion to learn Tcl in depth (also on the todo list).]
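One way to read plan B, sketched under assumptions: a proxy that answers repeated requests for the same (agent, OID) pair from a cache while the answer is still fresh, so several collection scripts polling the same router cost only one SNMP request. `CachingSnmpProxy` and its `fetch` callback are hypothetical; no particular SNMP library is assumed, and the clock is injectable so the behaviour can be shown without waiting.

```python
import time

class CachingSnmpProxy:
    """Hypothetical caching proxy: repeated requests for the same
    (agent, oid) within max_age seconds are answered from the cache."""

    def __init__(self, fetch, max_age=60.0, clock=time.monotonic):
        self._fetch = fetch        # callable (agent, oid) -> value: the real SNMP get
        self._max_age = max_age    # seconds a cached answer stays fresh
        self._clock = clock        # injectable so the example needs no sleeping
        self._cache = {}           # (agent, oid) -> (time fetched, value)
        self.requests_sent = 0     # requests that actually went to an agent

    def get(self, agent, oid):
        key = (agent, oid)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]                    # fresh enough: no network traffic
        value = self._fetch(agent, oid)         # miss or stale: ask the agent
        self.requests_sent += 1
        self._cache[key] = (now, value)
        return value

# Usage with a fake agent and a fake clock.
fake_now = [0.0]
proxy = CachingSnmpProxy(lambda agent, oid: 42, max_age=60.0,
                         clock=lambda: fake_now[0])
oid = "1.3.6.1.2.1.2.2.1.10.3"                  # ifInOctets for ifIndex 3
proxy.get("router-a", oid)                      # sent to the agent
proxy.get("router-a", oid)                      # served from the cache
fake_now[0] = 120.0
proxy.get("router-a", oid)                      # stale, sent again
print(proxy.requests_sent)                      # 2
```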
2. Scalability is important
- Splitting off the visualisation makes the data gathering much more
efficient.
- Several data gathering processes might run in parallel, relying
on the atomicity of database updates to resolve concurrent accesses.
- When Perl is used, an SNMP extension is probably needed to avoid the
overhead of forking a process for each SNMP request. Such an extension
is not yet readily available, but I guess it is not too difficult to
implement.
However, I am still not convinced that this is enough to monitor
25 routers every 5 minutes.
3. Semi-automatic configuration
I would prefer to specify the medium with something like "the serial
line that connects cisco x and y", and have some preprocessing that
figures out the IP number and the interface number to put in the
configuration file. Statistics should continue smoothly, even when the
same physical line is connected to another interface, another router, ...
This would make the whole package immune to router reconfigurations,
interface changes, ...
This has implications for the data gathering process: the same
interface can appear in the configuration file both as a connection
point for a serial line and as an interface of a router. Its data
should only be collected once (but stored twice in the database: once
as router interface data, once as serial line end point data). This is
exactly the functionality described for the offline monitoring task in
Mr Schoenwaelder's mail attached below.
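A minimal sketch of "collect once, store twice", assuming the preprocessing step has already resolved every logical entity (a router interface or a serial line end point) to a physical (router, ifIndex) pair. The entity names and the `poll` callback are hypothetical. Because the history is keyed by the logical name, moving a line to another interface only changes its entry in `ENTITIES`.

```python
from collections import defaultdict

# Hypothetical result of the preprocessing step: logical entity -> the
# physical (router, ifIndex) it is measured at.
ENTITIES = {
    "router:cisco-x/if2": ("cisco-x", 2),
    "router:cisco-y/if5": ("cisco-y", 5),
    "line:x-y/end-x": ("cisco-x", 2),   # same physical interface as above
    "line:x-y/end-y": ("cisco-y", 5),
}

def plan_polls(entities):
    """Group logical entities by physical interface: one poll per interface."""
    polls = defaultdict(list)
    for name, where in entities.items():
        polls[where].append(name)
    return dict(polls)

def run_cycle(entities, poll):
    """poll(router, if_index) -> value. Returns (entity, value) rows to store."""
    rows = []
    for (router, if_index), names in plan_polls(entities).items():
        value = poll(router, if_index)                # collected once ...
        rows.extend((name, value) for name in names)  # ... stored per entity
    return rows

calls = []
def fake_poll(router, if_index):
    calls.append((router, if_index))
    return 1000 + if_index

rows = run_cycle(ENTITIES, fake_poll)
print(len(calls), "polls,", len(rows), "rows")  # 2 polls, 4 rows
```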
4. More data gathering
Besides input and output packets, I'm also interested in the number of
dropped packets and errors for a given interface. Also, we run several
protocol stacks on our routers: TCP/IP, Vines, and sometimes AppleTalk.
It would be nice to be able to see the relative and absolute
contribution of each protocol to the total traffic (we would like to
get rid of the latter two; this would give us some feedback on our
progress :-).
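Absolute and relative contributions follow from two successive polls with a little counter arithmetic. The sketch assumes 32-bit SNMP counters (which wrap at 2^32) and at most one wrap between polls; the protocol names and counter values are made up, and which MIB variables provide per-protocol packet counts on our routers is left open. The same `counter_delta` applies to the MIB-II error and discard counters (ifInErrors, ifInDiscards, ifOutErrors, ifOutDiscards).

```python
COUNTER32_MOD = 2 ** 32  # 32-bit SNMP counters wrap around at 2^32

def counter_delta(old, new):
    """Increase of a 32-bit counter between two polls (at most one wrap)."""
    return (new - old) % COUNTER32_MOD

def protocol_shares(before, after):
    """before/after: {protocol: packet counter} from two successive polls.
    Returns {protocol: (packets in the interval, percent of all packets)}."""
    deltas = {p: counter_delta(before[p], after[p]) for p in after}
    total = sum(deltas.values())
    return {p: (d, 100.0 * d / total if total else 0.0)
            for p, d in deltas.items()}

# Hypothetical counters; the "ip" counter wrapped between the two polls.
before = {"ip": 4_294_967_000, "vines": 10_000, "appletalk": 500}
after = {"ip": 3_704, "vines": 11_000, "appletalk": 500}
print(protocol_shares(before, after))
# {'ip': (4000, 80.0), 'vines': (1000, 20.0), 'appletalk': (0, 0.0)}
```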
5. Flexible visualisation
The MRTG graphs are great for serial lines and individual interfaces;
however, if I want a global overview of the load on our FDDI ring, I
would like to be able to graph the total number of output packets over
all the FDDI interfaces. Hence also on the wish list: a highly
configurable, interactive visualisation tool that allows several
different views on, and combinations of, the data. (However, for me the
configurability of the visualisation is clearly much less important
than the data gathering and aggregation: in the worst case I could put
together an ad hoc script to visualise the combination of data that
I'm interested in.)
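The FDDI example boils down to a filter-and-sum over the stored samples. This sketch assumes the gatherer stores, per poll, a rate together with the interface's ifType (15 is fddi in MIB-II), and that all interfaces are polled with the same timestamps; with unsynchronised polls the samples would first have to be aligned to common intervals. The tuple layout is hypothetical.

```python
from collections import defaultdict

FDDI = 15  # ifType value for fddi in MIB-II (RFC 1213)

def fddi_total(samples):
    """samples: (ts, router, if_index, if_type, out_pkts_per_s) tuples.
    Returns {ts: total output packets/s over all FDDI interfaces}."""
    totals = defaultdict(float)
    for ts, _router, _if_index, if_type, rate in samples:
        if if_type == FDDI:
            totals[ts] += rate
    return dict(sorted(totals.items()))

samples = [
    (0, "router-a", 1, FDDI, 120.0),
    (0, "router-b", 3, FDDI, 80.0),
    (0, "router-b", 1, 6, 999.0),   # ifType 6 (ethernet-csmacd): ignored
    (300, "router-a", 1, FDDI, 100.0),
    (300, "router-b", 3, FDDI, 60.0),
]
print(fddi_total(samples))  # {0: 200.0, 300: 160.0}
```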
Well, that's it for now, I guess. Any comments are greatly
appreciated.
Patrick.
--
Patrick Weemeeuw, network manager K.U.Leuven, KULeuvenNet,
currently at the Dept. of Computer Science,
Celestijnenlaan 200 A, B-3001 Leuven, Belgium
Tel: +32 16 327635   Fax: +32 16 327996
E-mail: patrick.weemeeuw@kulnet.kuleuven.ac.be
----------------------------------------------------------------------
Addendum 1
From owner-tkined Thu Jan 26 21:38:47 1995
Return-Path: schoenw@ibr.cs.tu-bs.de
Received: from data.ibr.cs.tu-bs.de [134.169.34.7] by ra.ibr.cs.tu-bs.de (8.6.9/tubsibr) with ESMTP id VAA28938; Thu, 26 Jan 1995 21:38:47 +0100
Received: from schoenw@localhost by data.ibr.cs.tu-bs.de (8.6.9/tubsibr) id VAA06216; Thu, 26 Jan 1995 21:38:45 +0100
Date: Thu, 26 Jan 1995 21:38:45 +0100
From: Juergen Schoenwaelder <schoenw@ibr.cs.tu-bs.de>
Message-Id: <199501262038.VAA06216@data.ibr.cs.tu-bs.de>
To: newsham@aloha.net
CC: tkined@ibr.cs.tu-bs.de
In-reply-to: <m0rSWou-000a0JC@hookomo> (newsham@aloha.net)
Subject: Re: long term statistics with tkined
Reply-to: schoenw@ibr.cs.tu-bs.de
Hi!
On Thu, 12 Jan 1995 11:12:36 -1000 (HST), newsham@aloha.net (Timothy Newsham) said:
Timothy> Now, is there a collection of tools for use with tkined
Timothy> other than the ones that come with tkined and scotty? Is
Timothy> there a repository of scripts?
There is a contrib archive on our server, but there are not too many scripts... I would like to add anything you may want to share :-)
Timothy> Is anyone using scotty and/or tkined for collecting long
Timothy> term statistics on their network? In particular I want to
Timothy> monitor the interface load on our routers with daily plots
Timothy> of usage and then perhaps a long term plot showing day
Timothy> averages. I know I can take the snmp-monitor scripts and
Timothy> mangle them into what I need but before I did that I thought
Timothy> I'd ask if anyone else has.
You could hack the monitor script or you could hack the netguard code (which needs some hacking to get it going).
In the long term, I would like to use the SNMP agent code of scotty and the MLM MIB to do (offline) monitoring tasks. The idea is to break all the monitoring jobs into very basic tasks that subscribe to network information, which is retrieved by a scheduler. This scheduler will be able to do optimizations to reduce the number of requests (which is not possible with the netguard solution). The results will be stored in the MLM result table so that any SNMP manager can make use of the information (not only tkined and scotty).
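A toy illustration of the kind of optimisation such a scheduler could make; it is not scotty's API or the MLM MIB, and all names are hypothetical. Each (agent, OID) is polled only once, at the shortest interval any task asked for, and OIDs sharing an agent and interval are bundled into a single request. A task that asked for 300 s can then reuse every fifth 60 s sample (assuming one interval divides the other).

```python
from collections import defaultdict

def merge_subscriptions(subscriptions):
    """subscriptions: iterable of (task, agent, oid, interval_s).
    Returns {(agent, interval_s): [oid, ...]}, one bundled request each."""
    # Pass 1: poll each (agent, oid) only at the shortest requested interval.
    fastest = {}
    for _task, agent, oid, interval in subscriptions:
        key = (agent, oid)
        fastest[key] = min(interval, fastest.get(key, interval))
    # Pass 2: bundle OIDs that share an agent and an interval.
    bundles = defaultdict(list)
    for (agent, oid), interval in fastest.items():
        bundles[(agent, interval)].append(oid)
    return {key: sorted(oids) for key, oids in bundles.items()}

subs = [
    ("load-graph",  "router-a", "1.3.6.1.2.1.2.2.1.10.1", 300),  # ifInOctets.1
    ("error-watch", "router-a", "1.3.6.1.2.1.2.2.1.14.1", 300),  # ifInErrors.1
    ("line-x-y",    "router-a", "1.3.6.1.2.1.2.2.1.10.1", 300),  # duplicate
    ("live-view",   "router-a", "1.3.6.1.2.1.2.2.1.16.1", 60),   # ifOutOctets.1
    ("live-view",   "router-a", "1.3.6.1.2.1.2.2.1.10.1", 60),   # faster than above
]
requests = merge_subscriptions(subs)
print(len(subs), "subscriptions ->", len(requests), "requests")  # 5 subscriptions -> 2 requests
```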
There is another related thing that I want to change: currently, the monitoring scripts create a stripchart or a barchart tkined object, and monitored values are written to this object. I would like to replace the stripchart and barchart objects with something more general; let's call it a data stream object. This allows one to change the data representation while a monitoring job is running, and it would be very easy to write the data stream into a file to get long term statistics (if you can run tkined 24 hours a day).
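A minimal sketch of such a data stream object, written in Python rather than Tcl and with hypothetical names: each value is published once and delivered to whatever sinks are attached at that moment, so a chart can be detached while a file sink keeps collecting long-term data.

```python
import io

class DataStream:
    """Hypothetical data stream: each sample is published once and handed
    to every sink attached at that moment (chart, file, ...)."""

    def __init__(self, name):
        self.name = name
        self._sinks = []

    def attach(self, sink):
        self._sinks.append(sink)

    def detach(self, sink):
        self._sinks.remove(sink)

    def publish(self, ts, value):
        for sink in list(self._sinks):  # copy: a sink may detach itself
            sink(self.name, ts, value)

def file_sink(fh):
    """Sink that appends one 'name ts value' line per sample to fh."""
    def write(name, ts, value):
        fh.write(f"{name} {ts} {value}\n")
    return write

log = io.StringIO()              # stands in for a long-term statistics file
chart = []                       # stands in for a stripchart
def chart_sink(name, ts, value):
    chart.append(value)

stream = DataStream("router-a:if1:in_octets")
stream.attach(chart_sink)
stream.attach(file_sink(log))
stream.publish(0, 1200)
stream.detach(chart_sink)        # change the representation while running
stream.publish(300, 1500)
print(chart)                     # [1200]
print(log.getvalue(), end="")
```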
Note that I have no idea when these things will happen.
[ ... ]
Juergen