Sunday, December 23, 2012

20121223 Weialgo graphing.



As I stated in an earlier post, weialgo graphing is meant to give a statistical relative understanding of how well the network is working (how sloppy it is) over time.

What does that mean?  I wanted non-network engineers to be able to look at a graph and either go "That looks pretty good." or "OMGWTFBBQ That sucks!!!", quickly.

I don't want to sit around for hours trying to explain the difference between pinging between Las Vegas and San Jose, and between Chicago and Singapore (you see, there's this big chunk of dirt that we all ride on...).

Just because Singapore is on the other side of the Earth from Chicago doesn't mean that I should hold that against Singapore.  Until space warping technology is developed that allows Singapore and Chicago to get closer to each other (Tesseract anyone?), they will exist at a fixed distance from each other.

So, if the round trip time between Singapore and Chicago is, at its fastest, 300 milliseconds, 300 ms is now the zero point on the Y axis of the graph.
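Here's a minimal sketch of that zero-point idea, assuming one RTT sample per line, in seconds. The function name relative_to_minimum and the sample values are made up for illustration; this is not one of the scripts shown later in this post.

```shell
#!/bin/bash
# Hypothetical helper: read one RTT (in seconds) per line on stdin and
# print every sample relative to the smallest sample seen.
relative_to_minimum() {
    awk '{
             rtt[NR] = $1 + 0                       # force numeric
             if (NR == 1 || rtt[NR] < min) min = rtt[NR]
         }
         END {
             for (i = 1; i <= NR; i++) printf "%.3f\n", rtt[i] - min
         }'
}

# Made-up samples for a path whose fastest round trip is 300 ms.
printf '%s\n' 0.300 0.312 0.305 0.420 | relative_to_minimum
```

With those four samples, the fastest one lands on zero and the others show up as 12 ms, 5 ms and 120 ms above it.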

Now, if I just graph deviation from minimum, this would be a simple "jitter" graph.  The problem with this is that it doesn't really visualize the impact of the network on interactive applications that use reliable network protocols (e.g., TCP).  So, there has to be another factor in the model that creates this graph.

The easy thing to do here would be to pick some multiple of the jitter and graph that.  The problem is that there isn't a set standard for this type of multiple, as I'm in uncharted territory with this attempt at graphing networks.  The graph data is going to be questioned, one way or the other, and if the jitter were scaled by some factor, that would just cause an unnecessary conversation about what the multiple should be, at least until an academic body suggests a proper model that everyone can agree on.

In other words, I took the easy way out.

Instead of trying to come up with a multiplicative jitter factor that would represent the actual impact on interactive applications, I just shift from relative jitter data to absolute RTT once jitter reaches a certain value.  What value is a good one to shift from jitter to RTT?  That's easy.  If you were able to put in a non-sloppy network between two different remote points, what's the maximum jitter that you think you'd be able to attain?

I picked 50 ms (0.05 in the scripts below, which work in seconds).  Realistically, with a proper bandwidth-sharing queuing system (Token Bucket-esque), and good lines, I would think that we could keep all jitter for a ping-type polling system under 50 ms.
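The switch from jitter to absolute RTT can be sketched roughly like this. The 0.05 (50 ms) jitter limit is the value picked above, and the 0.45 "lost" limit is the one used in wngraph.sh later in this post. The function name classify_samples and the sample data are hypothetical, and giving each sample exactly one label is a simplification of what wngraph.sh does with its three plot files.

```shell
#!/bin/bash
# Hypothetical sketch: label each RTT sample (seconds, one per line on stdin).
# $1 is the baseline (minimum) RTT for the path.
#   green  = jitter under 50 ms, plotted as RTT minus baseline
#   yellow = jitter of 50 ms or more, plotted as absolute RTT
#   red    = RTT of 0.45 s or more, plotted as absolute RTT
classify_samples() {
    awk -v min="$1" -v jitter_limit=0.05 -v lost_limit=0.45 '
    {
        rtt = $1 + 0
        if (rtt >= lost_limit)
            printf "red %.3f\n", rtt
        else if (rtt - min < jitter_limit)
            printf "green %.3f\n", rtt - min
        else
            printf "yellow %.3f\n", rtt
    }'
}

# Made-up samples against a 300 ms baseline.
printf '%s\n' 0.300 0.320 0.380 0.600 | classify_samples 0.300
```

The point of the design is that a green sample is drawn at its jitter value (small), while a yellow or red sample jumps to its full RTT, so a bad stretch stands out on the graph without anyone having to agree on a scaling factor.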

The rest is mechanics.   So, into the mechanics.

I have two scripts crontab'ed.

wnrollup.sh and wnrolluphr.sh


root@server0:/mnt/ramdisk/1/weialgo5# cat wnrollup.sh
#!/bin/bash

DATEZ=$(date +%Y%m%d --utc --date='yesterday')

/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/gzip -1 -v /mnt/ramdisk/weialgo5log$DATEZ*.txt
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/mv -v /mnt/ramdisk/weialgo5log$DATEZ*.gz /storage/logs_weialgo/
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/bash /storage/weialgo5/wnrunreport.sh $DATEZ
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/bash /storage/weialgo5/wnmakehtml.sh


root@server0:/mnt/ramdisk/1/weialgo5# cat wnrolluphr.sh
#!/bin/bash

DATEZ=$(date +%Y%m%d --utc)
HOURZ=$(date +%Y%m%d%H --utc)

/bin/sleep 600
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/gzip -1 -v /mnt/ramdisk/weialgo5log$HOURZ.txt
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/mv -v /mnt/ramdisk/weialgo5log$HOURZ.txt.gz /storage/logs_weialgo/
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/bash /storage/weialgo5/wnrunreport.sh $DATEZ
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/bash /storage/weialgo5/wnmakehtml.sh



root@server0:/mnt/ramdisk/1/weialgo5#

wnrolluphr.sh is run once an hour.  wnrollup.sh is run once a day.  Obviously, depending on the number of systems that you're graphing, you might want to change the workload that you put on the server.  This is from one of my home boxes, where I run the graphs every hour, as I'm only polling 8 to 10 devices.  At work (on a slower box no less), I'm polling almost 1000 devices, and I run the graphs once a day.

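The crontab entries are below.  Nothing too surprising; you should be able to figure them out.  (The "root" user field means they are in system crontab format, as used by /etc/crontab.)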

# Weialgo

55 * * * * root /usr/bin/nohup /bin/bash /storage/weialgo5/wnrolluphr.sh >> /dev/null &
3 19 * * * root /usr/bin/nohup /bin/bash /storage/weialgo5/wnrollup.sh >> /dev/null &

wnrunreport.sh

root@server0:/mnt/ramdisk/1/weialgo5# cat wnrunreport.sh
#!/bin/sh

TARBZ=$(date +%Y%m%d --utc --date='yesterday')

rm -f /mnt/ramdisk/tmp/*

cp /storage/weialgo5/weialgo5.lst /var/www/weialgo/weialgodevices$TARBZ.txt

cat /storage/weialgo5/weialgo5.lst | awk -F'/' '{print $1}' | sort | uniq > /mnt/ramdisk/routerlist.tmp

rm -f /var/www/weialgo/weialgosummary$1.txt

for IPADDRESSZ in `cat /mnt/ramdisk/routerlist.tmp`;
do
    echo $IPADDRESSZ report before
    rm -f /mnt/ramdisk/tmp/*
    /usr/bin/nice /bin/bash /storage/weialgo5/wngraph.sh $1 $IPADDRESSZ
    echo $IPADDRESSZ report after
done


root@server0:/mnt/ramdisk/1/weialgo5#

wngraph.sh

root@server0:/mnt/ramdisk/1/weialgo5# cat wngraph.sh
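#!/bin/bash
#  (bash rather than sh, because the $'\n' quoting below is a bashism)
#  $1 is the date or dates to be reported on
#  $2 is the IP address of the device being pinged
#  $3 is the name of the device that will be on the graph (optional; wnrunreport.sh does not pass it)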

echo $1
echo $2
echo $3

rm -f /mnt/ramdisk/tmp/*

ls -1 /storage/logs_weialgo/weialgo5log$1*.gz > /mnt/ramdisk/tmp/testfilelist.txt

for FILENAMEZ in `cat /mnt/ramdisk/tmp/testfilelist.txt`;
do
    NL=$'\n'
    FILENAMEZ=${FILENAMEZ%$NL}
    nice gunzip -c $FILENAMEZ | nice grep "$2," | nice awk -F',' '{print $2}' >> /mnt/ramdisk/tmp/test.csv
    echo $FILENAMEZ after
done

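LINEZ=$(nice wc -l /mnt/ramdisk/tmp/test.csv | nice awk -F' ' '{print $1}' | nice head -n 1)
# Baseline RTT: the 1000th-smallest sample (or the largest one if there are
# fewer than 1000 samples), rather than the single smallest value.
MINIMUMZ=$(nice sort -n /mnt/ramdisk/tmp/test.csv | nice head -n 1000 | tail -n 1)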

echo $LINEZ  $MINIMUMZ

yellowcsv=/mnt/ramdisk/tmp/yellow.csv.$$
redcsv=/mnt/ramdisk/tmp/red.csv.$$
greencsv=/mnt/ramdisk/tmp/green.csv.$$
# Yellow: absolute RTT when jitter is 0.05 s or more but RTT is under 0.45 s.
nice awk -v min="$MINIMUMZ" '{if (($1 - min) < 0.05) print "0"; else if ($1 >= 0.45) print "0"; else print $1}' /mnt/ramdisk/tmp/test.csv > $yellowcsv
# Red: absolute RTT at or above 0.45 s.
nice awk '{if ($1 >= 0.45) print $1; else print "0"}' /mnt/ramdisk/tmp/test.csv > $redcsv
# Green: jitter (RTT minus baseline) while it stays under 0.05 s.
nice awk -v min="$MINIMUMZ" '{if (($1 - min) < 0.05) print $1 - min; else print "0"}' /mnt/ramdisk/tmp/test.csv > $greencsv
YELLOWSUMZ=$(awk 'BEGIN {sum=0} {sum = sum + $1} END {print sum}' $yellowcsv)
REDSUMZ=$(awk 'BEGIN {sum=0} {sum = sum + $1} END {print sum}' $redcsv)

echo $1,$2,$YELLOWSUMZ,$REDSUMZ
echo $1,$2,$YELLOWSUMZ,$REDSUMZ >> /var/www/weialgo/weialgosummary$1.txt
sort /var/www/weialgo/weialgosummary$1.txt | uniq > /mnt/ramdisk/tmp/1.txt.$$
mv -f /mnt/ramdisk/tmp/1.txt.$$ /var/www/weialgo/weialgosummary$1.txt
DEVICENAMEZ=$(nslookup $2 | grep name | awk -F'=' '{print $2}' | awk -F' ' '{print $1}' | head -n 1)

gnuplotconfig=/mnt/ramdisk/tmp/testgnuplotconfig.$$
echo "set terminal png size 2800, 1440" > $gnuplotconfig
echo "set terminal png font \"/storage/weialgo5/arial.ttf\" 20" >> $gnuplotconfig
echo "set output '/var/www/weialgo/$2_$1.png'" >> $gnuplotconfig
echo "set key bmargin center horizontal Right noreverse enhanced autotitles box linetype -1 linewidth 1.000" >> $gnuplotconfig
echo "set title '$3 $2 $1 $DEVICENAMEZ'"  >> $gnuplotconfig
echo "set xrange [ 0 : $LINEZ ] noreverse nowriteback"  >> $gnuplotconfig
echo "set yrange [ 0 : 0.7 ] noreverse nowriteback"  >> $gnuplotconfig
echo "set style line 1 lt rgb 'red' lw 1"  >> $gnuplotconfig
echo "set style line 2 lt rgb 'yellow' lw 1" >> $gnuplotconfig
echo "set style line 3 lt rgb 'dark-green' lw 1"  >> $gnuplotconfig
echo "plot '$redcsv' with impulses ls 1 title 'Network Lost', '$yellowcsv' with impulses ls 2 title 'Latency Warning', '$greencsv' with impulses ls 3 title 'Relative Latency'" >> $gnuplotconfig
cat $gnuplotconfig
nice gnuplot $gnuplotconfig

echo gnuplot run

root@server0:/mnt/ramdisk/1/weialgo5#

wngraph.sh is where the "magic" happens....  The process is simple: extract the data, separate it into three different plot files, and use gnuplot to output a png graph of the data.

That's it.  Ping thousands of devices and take those thousands of pings to thousands of devices and make graphs of the data.
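wngraph.sh also appends one line per device to weialgosummary<date>.txt, in the form date,IP,yellow sum,red sum.  If you'd rather see the worst devices first before opening any graphs, something like the sketch below could work.  The function name rank_worst_first and the sample lines are made up, and it assumes GNU sort.

```shell
#!/bin/bash
# Hypothetical helper: read summary lines (date,ip,yellow_sum,red_sum) on
# stdin and print them worst-first: largest red sum, then largest yellow sum.
# GNU sort's -g (general numeric) is used so that sums awk prints in
# exponent form, such as 1e+06, still sort correctly.
rank_worst_first() {
    LC_ALL=C sort -t, -k4,4gr -k3,3gr
}

# Made-up summary lines for three devices.
printf '%s\n' \
    '20121222,10.0.0.1,1.5,0' \
    '20121222,10.0.0.2,0.2,3.6' \
    '20121222,10.0.0.3,4.1,0' | rank_worst_first
```

With those sample lines, 10.0.0.2 comes out first (it is the only one with any red), then 10.0.0.3, then 10.0.0.1.  On a real box you would feed it the summary file instead, for example rank_worst_first < /var/www/weialgo/weialgosummary20121222.txt | head.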

If anyone wants a copy of the scripts that I'm using, send me an email, I'll email you a tar.bzip2 file of what I have.  robluce1 @ yahoo . com
