Patrick Weemeeuw <Patrick.Weemeeuw@kulnet.KULeuven.ac.be> said:
I read your mail about your `Ideal Long Term Statistics Gathering
Package' with great interest. I will comment on some of your
statements. Let's see...
Patrick> 1. Architecture
Patrick> Conceptually I see 3 different tasks: data gathering,
Patrick> data aggregation and data visualisation. Common to
Patrick> these tools is a shared data repository, e.g. a msql
Patrick> database. Except for the DB, the tools are completely
Patrick> independent from each other:
Yes, this sounds good to me.
Patrick> Regarding the data gathering, 2 approaches have been
Patrick> suggested in the past:
Patrick> A. use netguard, which needs some cleaning up
Patrick> B. use some yet to be written scheduler that, as I
Patrick> understand it, acts as a caching proxy server for SNMP
Patrick> requests (see addendum 1).
There are two things to note here: You can write a simple monitoring
script that collects a couple of SNMP variables and puts them into the
DB with very few lines of Tcl (the same will be true if you are using
Perl, I guess). Things get complicated when you start to optimize your
script, e.g. the script should be able to monitor different statistics
at different time intervals from a set of different hosts efficiently.
The question is: How much optimization is really needed? (Should I
care about retrieving some SNMP variables twice?) Is it acceptable to
run many data gathering processes? If not, how can data gathering
tasks be combined so that I can do them with a few processes without
making these processes difficult to understand/implement/debug?
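To make the trade-off concrete, here is a minimal Python sketch of one way
to combine several gathering tasks in a single process. It only computes
the polling plan; the actual SNMP requests are left out. The function name
build_schedule, the host names and the task list are all hypothetical and
chosen for illustration only.

```python
import heapq
from collections import defaultdict


def build_schedule(tasks, horizon):
    """Return {time: {host: set(variables)}} for all polls up to `horizon` seconds.

    `tasks` is a list of (host, variable, interval_seconds) tuples with
    positive intervals. Requests that fall due at the same time for the
    same host are merged, so a variable wanted by two tasks is fetched once.
    """
    # Priority queue of (next_due_time, sequence_number, task).
    queue = [(0, seq, task) for seq, task in enumerate(tasks)]
    heapq.heapify(queue)
    schedule = defaultdict(lambda: defaultdict(set))
    while queue and queue[0][0] <= horizon:
        due, seq, (host, variable, interval) = heapq.heappop(queue)
        schedule[due][host].add(variable)
        # Re-arm the task for its next poll.
        heapq.heappush(queue, (due + interval, seq, (host, variable, interval)))
    return {t: dict(hosts) for t, hosts in schedule.items()}


# Hypothetical task list: the same counter is wanted at two intervals.
tasks = [
    ("router1", "ifInOctets.1", 60),
    ("router1", "ifOutOctets.1", 60),
    ("router1", "ifInOctets.1", 300),
    ("router2", "sysUpTime.0", 300),
]
plan = build_schedule(tasks, horizon=300)
for when in sorted(plan):
    print(when, plan[when])
```

A real collector would sleep until the next due time and send one request
per host with the merged variable list. Whether this extra machinery is
worth it compared to a handful of simple scripts is exactly the open
question above.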
Patrick> 2. Scalability is important
Patrick> However, I am still not convinced that this is enough
Patrick> to monitor 25 routers every 5 minutes.
I don't think 25 routers every 5 minutes is a problem if you are faced
with a typical LAN environment. Walking the complete interface table
of a Cisco router with 9 interfaces takes about 1.6 to 2.6 seconds
(scotty on a SparcStation 20 and a Sparc SLC on an Ethernet). You will
be able to dump 25 of these routers in a minute and you can speed this
up even more by coding the whole stuff in an asynchronous way.
Scalability becomes an issue if you are faced with either hundreds of
devices or very slow or bad lines.
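The estimate can be checked with a few lines of Python. This is only
back-of-the-envelope arithmetic using the walk times quoted above and
assuming the routers are polled strictly one after another; the function
name is made up for this example.

```python
def sequential_sweep_seconds(routers, seconds_per_walk):
    """Time to walk the interface table of every router, one after another."""
    return routers * seconds_per_walk


ROUTERS = 25
POLL_INTERVAL = 5 * 60  # seconds

best = sequential_sweep_seconds(ROUTERS, 1.6)   # fastest measured walk
worst = sequential_sweep_seconds(ROUTERS, 2.6)  # slowest measured walk
worst_share = worst / POLL_INTERVAL

print(f"sweep takes {best:.0f} to {worst:.0f} seconds")
print(f"worst case uses {worst_share:.0%} of the polling interval")
```

That gives roughly 40 to 65 seconds per sweep, so even the slow case
leaves most of the 5 minute interval unused before any asynchronous
tricks are applied.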
Patrick> - When Perl is used, an SNMP extension is probably
Patrick> needed to avoid the overhead of the fork for each SNMP
Patrick> request. Not yet readily available, but also not too
Patrick> difficult to implement I guess.
This might be doable if you start with an existing SNMP library like
the CMU library. Writing SNMP from scratch is not as simple as you
might expect. Staring at hex dumps to figure out whether your SNMP code
or the agent you are talking to is broken is really no fun.
Patrick> 3. Semi-automatic configuration
Patrick> I would prefer to specify the medium somehow like "the
Patrick> serial line that connects cisco x and y", and have
Patrick> some preprocessing that figures out the IP number and
Patrick> the interface number to put in the configuration file.
Patrick> Statistics should continue smoothly, even when the
Patrick> same physical line is connected to another interface,
Patrick> another router, ...
Nice to have, but complicated to implement. Is this really needed? How
often do you change your router configuration? Wouldn't it be better
to put something like this on top of the monitoring system once it is
up and running and you feel it will save you much time?
Patrick> 4. More data gathering
Patrick> Also, we run several protocol stacks on our routers:
Patrick> TCP/IP, Vines, and sometimes appletalk. It would be
Patrick> nice to be able to see the relative and absolute
Patrick> contributions in the total traffic for each of the
Patrick> protocols (we would like to get rid of the latter two;
Patrick> this would give us some feedback on our progress :-).
To solve this problem, you need an agent that monitors your network
(e.g. RMON). You might want to take a look at the NeTraMet package
written by Nevil Brownlee. NeTraMet collects traffic statistics
according to rule sets, stored in SNMP tables, that define how
received packets are counted. It is a powerful tool for creating usage
statistics once you understand how rule tables work.
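Once a meter delivers byte counts per protocol, the relative and absolute
contributions are easy to derive. The Python sketch below assumes the
counts are already available as a simple mapping; how they are obtained
from the meter is not shown, and the numbers are made up.

```python
def protocol_shares(byte_counts):
    """Map each protocol to its fraction of the total traffic."""
    total = sum(byte_counts.values())
    if total == 0:
        # No traffic seen at all: report zero for every protocol.
        return {name: 0.0 for name in byte_counts}
    return {name: count / total for name, count in byte_counts.items()}


# Made-up byte counts for one measurement interval.
sample = {"TCP/IP": 750_000, "Vines": 200_000, "AppleTalk": 50_000}
shares = protocol_shares(sample)

for name, count in sample.items():
    print(f"{name:10s} {count:>9d} bytes  {shares[name]:6.1%}")
```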
Patrick> 5. Flexible visualisation
Patrick> Hence also on the wish list: a highly configurable,
Patrick> interactive visualisation tool that allows several
Patrick> different views on and combinations of the data.
Tell me when you have found this interactive visualisation tool. :-)
Patrick> Well, that's it for now, I guess.
I think your ideas are right. And I think that the key will be your
database. Using a database is usually a good idea. However, which
database is the right one? What does a database schema look like that
is easy and fast to use and able to support all kinds of data gathering
tasks? I think the `multi router traffic grapher' is so attractive
because it is so easy to install and use.
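As one possible starting point for the schema question, here is a very
small Python sketch with a single generic table of samples. It uses
SQLite from the Python standard library purely because it needs no
setup; it is not msql, and the table and column names are assumptions
made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (host, variable, time): generic enough for any counter.
conn.execute("""
    CREATE TABLE sample (
        host     TEXT    NOT NULL,
        variable TEXT    NOT NULL,
        ts       INTEGER NOT NULL,   -- seconds since start of measurement
        value    INTEGER NOT NULL,   -- raw counter value
        PRIMARY KEY (host, variable, ts)
    )
""")

# Made-up samples standing in for what a gathering script would insert.
rows = [
    ("router1", "ifInOctets.1", 0, 1000),
    ("router1", "ifInOctets.1", 300, 4000),
    ("router1", "ifInOctets.1", 600, 9000),
    ("router1", "ifOutOctets.1", 0, 500),
    ("router1", "ifOutOctets.1", 300, 700),
]
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", rows)

# Aggregation step: total increase of each counter over the stored period.
# This ignores counter wrap-around, which a real tool has to handle.
result = conn.execute("""
    SELECT host, variable, MAX(value) - MIN(value)
    FROM sample
    GROUP BY host, variable
    ORDER BY host, variable
""").fetchall()
print(result)
```

A table like this is easy to fill from any gathering script, but it says
nothing yet about speed or about how aggregated data should be stored.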
You might want to read RFC 1404, which describes a format for storing
and exchanging statistical data. It also discusses which variables
should be monitored and how data could be aggregated.
(The netguard code borrowed some ideas from this RFC.)
Juergen