Monday, January 7, 2013

20130107 IPv6, Layer 2 is dead, version 2


IPv6 has a big enough address space that it's more than reasonable to consider subnetting to the port.  Effectively this means Layer 3 routing at a port level for an entire enterprise network.

The benefits to this are many, and should be straightforward to understand.

  • Reliability
  • Performance
  • Fault tolerance
  • Simplicity
  • Security

I'm not going to go into any depth on those bullet points at this time; I'll save that for future articles.

Here's an example interface config for a Cisco L3 capable switch.

interface GigabitEthernet1/0/19
 no switchport
 ip address 10.0.4.9 255.255.255.248
 no ip redirects
 no ip unreachables
 ip pim sparse-mode
 ipv6 address FDAA:AAAA:AAAA:AA00:0:13:7F77:1/96
 ipv6 enable
 ipv6 nd other-config-flag
 no ipv6 redirects
 no ipv6 unreachables
 ipv6 verify unicast source reachable-via rx allow-default
 arp timeout 60

One of the items that I've learned since Version 1 of this post is that some devices *do* consider EUI-64 to be a hard set standard, and will not accept the variable subnet mask on the interface.

EUI-64 is a mistake, as integrating the MAC address into the IP address doesn't enhance the function of IP, and doesn't provide the additional level of security it was intended to (RFC 4941).  Converting the network to completely L3 to the port performs most of the intended functions of EUI-64 much better than EUI-64 ever could.
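
For anyone who hasn't looked at how the MAC address ends up inside the IP address, here is a minimal sketch of the modified EUI-64 derivation: split the 48 bit MAC in half, insert ff:fe in the middle, and flip the universal/local bit of the first byte.  The function name mac_to_eui64 and the sample MAC are made up for illustration; this is not taken from any device or from the original post.

```shell
#!/bin/bash
# Sketch: build the modified EUI-64 interface identifier (the low 64 bits
# of an autoconfigured address) from a 48 bit MAC address.
# mac_to_eui64 and the sample MAC are illustrative only.
mac_to_eui64() {
    local mac first
    # Strip ':' and '-' separators and lowercase the 12 hex digits.
    mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
    # Flip the universal/local bit (0x02) of the first byte.
    first=$(( 0x${mac:0:2} ^ 0x02 ))
    # Insert ff:fe between the two halves of the MAC.
    printf '%02x%s:%sff:fe%s:%s\n' \
        "$first" "${mac:2:2}" "${mac:4:2}" "${mac:6:2}" "${mac:8:4}"
}

mac_to_eui64 "00:1B:21:3C:4D:5E"
```

For that sample MAC the result is 021b:21ff:fe3c:4d5e, which is why an EUI-64 address needs all 64 host bits and leaves no room for a longer prefix such as a /96.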

So, it seems the /96 prefix length wouldn't be possible on some devices, given the way they are interpreting the current RFC right now.

So, EUI-64 on every port is the direction that this standard is going to push us.  The idea of L3 to the port isn't going to go away once enterprise network engineers understand the concept.  It provides too many benefits to the enterprise network, particularly in the area of security.  (But, performance and reliability are key as well).

The issue is, at this time, EUI-64 wasn't constructed with the idea of port level layer 3 in mind.  As big as a 64 bit subnet address space seems, allocating subnets per port wasn't in the minds of the designers.

That doesn't mean per-port layer 3 is wrong, quite the opposite.  Layer 3 to the port is a worthwhile goal.  EUI-64 simply needs to be changed to accommodate it.

Before dismissing the idea of a complete layer 3 network, keep in mind that most ideas like this in IT were dismissed as impractical, but later found to be beneficial to the environment. Some examples are enterprise networks converting completely to switch ports, or computers using more than 64k of memory, or more than 640k of memory. IT is filled with concepts like these that didn't seem practical at the time, but eventually did come to pass. The Internet itself as an idea is probably the best example of this.

This idea will happen in the enterprise; it's simply a matter of whether it's this decade or the next.  It would be better if the standards were scaled appropriately to the eventuality.

Rob

20130107 Hyperbole, Analogy, Simile and Metaphor



First of all, I want to apologize to anyone that understands the concepts I am talking about, without all of my figures of speech.

I live in a world where very few people that I know truly understand what I'm trying to say.  Most of the time, when I try to explain an idea, or a process, I get blank looks.  Some people even have the audacity to point fingers and tell me that I'm the one at fault.  Some, simply because they don't understand or have a background they can pull from; some simply because they don't really care.

As an example, Weialgo.  The concept is simple, ping stuff, and then graph it en masse, and use those graphs to get a better understanding of how applications work over your network.  Some people just don't understand all the technical talk, but they could, if they wanted to, understand green is good, yellow means caution, and red is bad or stop.  Strangely, only a few people have ever realized what I was saying with this.  Few being a handful; almost everyone else that I have talked to about this simple idea has ridiculed it or told me I was crazy or wasting my time.  They ask questions like, "Who else is doing this?"  "Wouldn't HP be doing this if it was that useful?"  "Why don't we hear about this from Cisco if it's so great?"

To me, weialgo is about as plain of a concept as I believe it could be.  There are no drawbacks.  While some may debate the usefulness of the information gathered by pings, tell me, what's the harm?  The information gathered is very useful, in my eyes it's critical for some things.  Why is it so hard for people to understand?

Over the years, to help illustrate my points, I've developed a method of explanation using hyperbole, analogy, simile and metaphor.  These simplified explanations have progressively gotten even simpler over the years, trying to make it easier for the non-technical audience to understand.  At times though, I forget there are people out there that are more technical and do understand concepts at their core.

Second case in point.  IPv6, RIP Layer 2.

Looking back at it, I could probably go on for an hour about everything wrong with that post. But, my reasoning behind escalating the use of 'figures of speech' is, I can't get anyone to listen to me.  If people really would stop and understand the impact of what I am saying in that post, I would get an immediate response.  However, since I didn't receive a response, I decided to try a bigger audience.

I cross posted it to Reddit.

Now, in hindsight, this was a huge mistake.  But not for the reasons you may think.

On reddit.com/r/networking, there were people that actually understood what I was saying.  Of course, I received the usual "you're a moron", "this is the dumbest thing I've ever read" comments.  I couldn't care less about that, I'm used to it.

What really bothered me was that there were some people on /r/networking that actually seemed to understand the concept.  But all of the layers of hyperbole, analogy, simile and metaphor got in the way of their understanding.  The people that I wanted to reach were put off, by the very things I was trying to do to make the concept more understandable.

There are people in this world that I don't need to use all of the figures of speech with.  They understand what I'm saying, without having to explain it like it was a different language.

At that point, I realized that I had made a huge mistake.  I realized that my manner of explanation was insulting to everyone.  If someone didn't understand what I was saying, I was insulting them by "talking down to them" with the 'figures of speech' I was using.  If someone did understand what I was saying, I was insulting them by insinuating that they couldn't understand the idea I was trying to relate.

It led to several days of soul searching.  Really.  Am I  really where I want to be?  Does the idea of spending the rest of my life talking down to people help me or anyone I'm trying to help?  What would it be like to work somewhere that had people at a level that I didn't have to explain everything using simplified, and in many cases, inaccurate due to simplification, descriptions of everything?

So, first, I want to repeat.

I apologize to anyone that understands the concepts that I'm trying to convey without all of my 'figures of speech'.  I really don't mean to insult anyone.

Second.

Within this blog, I'm going to try to write (bad habits are hard to break, forgive me if I slip) as though everyone reading this blog can understand what I'm saying without covering it up with 'figures of speech'.

That's it.  Have a great day.

Wednesday, January 2, 2013

20130102 IPv6: RIP Layer 2



This has been another long standing topic for me.  IPv6.  Everyone needs to learn it, and implement it, as fast as possible.  But, not for ANY of the reasons that you have heard up to this point.

Yes, yes....  IPv4 is running out of addresses....  Except, like petro, they seem to keep finding more of them all the time.  But, there is a point where squeezing more v4 addresses out of that 32 bit shale won't make any sense.  But, so far, they've been able to frack their way to more addresses to keep the IPv4 Internet happy and moving at meme speeds.

When it comes to IPv6 conversations, on why you'd want to go to v6, it basically goes like this.

  • OMG We're going to run out of addresses!!!  (Yes, and peak oil was in 1995!!!)
  • IPv6 is better thought out than v4.
  • IPv6 has a streamlined (although larger) header.
  • Removes some of the processing routers have to do with IPv4.  (Fragmented packets, etc)
  • IPv6 is compatible with v4 at a sockets layer.
  • DHCP is dead.  Long live DHCP.
  • IPv6 builds into the protocol some things which are optional in v4.  (Multicast, IPSec, etc)
  • And... everybody's doing it...   You -know- you want to do it....  (Slap the sales guy when he does this.  I mean it, smack him.)

And, that's about as far as it gets.  I hate to say this, but most of that stuff is so incredibly technical, only the most hard core of network engineers can stay awake while talking about it (and, if it actually gets you excited, welcome to the club).  Honestly, if you want to make normal people go to sleep, start talking about IPv6 in depth.  Sends them right to that "I'm pretending to care about something meaningful but really only care about happy cute kitty pictures that I see on reddit.com" droopy eyelid flutter.  You know, the dozing off to sleep look that people get while listening to a college professor drone on about the inner workings of the financial system as it applies to banking balance sheets, or chemical chains related to cholesterol conversion by the mitochondria.  Unfortunately, people always end up doing that little head shake and snap back to attention and try to pretend they were paying attention.  I kinda wish they would quit wasting our time and just fall right to sleep sitting there.  Sleep is good, and maybe, after they've napped awhile, they'd wake up and be a bit more interested in the things that keep them alive and make humanity's existence on this dirtball better for everyone.  One can only dream...

Whoops...  Back on course Rob...  Anyways, where was I?  Oh Yeah...

So, those bullet-points are bull.  Well, mostly bull, simply because  #1  The address prophets have been wrong up to this point, and will probably continue to be wrong into the near future.  #2  The rest of the stuff doesn't fix anything broken, it just improves some stuff in IPv4 that's "less than optimal".

So, any of the droopy-eyes that actually stayed awake long enough to get half of an understanding of why to do an IPv4 to IPv6 conversion are thinking at this point that the Network Engineer is  #1  Wrong (because doom and gloom IPv4 addresses haven't run out).  #2  Doesn't have a life (and should get a hobby like normal people instead of worrying about when DHCP leases are going to run out).

And, if the droopy-eyes do a little research, they find out that IPv6 has some (GASP!) very scary drawbacks.
  • Complete retraining for ALL IT people, not just the network engineers.
  • Application compatibility questions (You mean you're NOT supposed to hard code IPv4 addresses into your applications?)
  • Vendors milk the $#!+ out of this conversion.   Secret vendor code for an organization doing an IPv4 to IPv6 conversion:  CHA-CHING!!!
  • The consultants say that you're doing it wrong.  (Doesn't matter how you are doing it, you're doing it wrong)
  • Not a simple conversion that you can do in a weekend for an organization with a non-trivial network.  ("Don't worry Billy-Bob, we'll be done by Saturday night.  Wouldn't want to cut into beer-thirty.")
  • And.... after you're done, IPv4 will still be alive and well in your network.

Once droopy-eyes figures all of this out, the first thing he'll do is make a "Can't believe a word they say" mental sign for the network person involved, and that will be the end of the conversation.

Obviously, I believe, us engineering types are having the wrong conversation, and with the wrong reasons.

Complimentary TLDR header...  You're welcome.


Here's what most network people miss (it's not obvious, don't feel bad).

IPv6 allows for much greater flexibility in subnetting.

Yep, that's it.  And, that's HUGE!  As in OMGWTFWIT HUGE!

First of all, let's remember back a few years...  Back before everyone was running Ethernet.  Back before Layer 3 switches.  Back before Layer 2 switches.  Way back in time, before a company called Kalpana killed off every networking technology other than Ethernet (RIP Token Ring).

Way back in time, before the mid-90's, everything was Layer 1.  Yes, for those of you that aren't network people, we talk about Layers in networking, it's a code word, kinda like a secret handshake.  Back in the bad old days of Layer 1, networks of any real size were a pain to keep running.  Along comes the concept of the "Ethernet Switch", and networking has never looked back.  Switches broke up the large single Layer 1 networks into much smaller Layer 1 segments, connected into a single large Layer 2 broadcast domain.

Thank you Kalpana, for destroying the scourge of networking, the fire breathing dragon of Layer 1, and making Ethernet so wildly overpowered that every other networking standard is practically dead.

Now, I'm going to point out what should be obvious.  

IPv6 plus Layer 3 switches means the same thing for Layer 2.  Death to Layer 2.

What???  Heresy I say!  Heresy!   I shall put carrot sticks in my ears until you stop with this utter sense!  I live by the abuse I receive from Layer 2!  I shall not turn from the scourging that I receive from spanning tree freaking the hell out every time someone plugs a DLink switch into the network twice in a sales conference room!  Begone with your words of common sense you blogger you!

Wait for a second before you call me a nut.  Layer 2 is not good, it's just been a (un)necessary evil due to limitations in how networking technology has worked since Bob Metcalfe made commodity networking technology available to everyone (and every organization).  If we think of breaking up our Layer 2 networks the same way we broke up our Layer 1 networks twenty years ago, all of a sudden, life becomes much easier.  It becomes much better, in nearly every way.

  • Reliability
  • Performance
  • Fault tolerance
  • Simplicity
  • Security

Before I address any of these, I just want to say this.  I've heard of organizations that have the stereotypical "One Single Huge Flat %@#$%#! Network".  You know, the University campus with 70 thousand students all hooked up to 10.0.0.0/8.  The networks where one PC sending out a stream of broadcast packets can shut the entire city down.  The networks where, when you call up the company looking for a quote, the person taking the phone call says "our network is down right now, can I call you back with that price?"

Big flat Layer 2 networks are the bane of IT, which means they should be the bane of humanity.  If you've grown up living on a Single Huge Flat Network, I'm here to say this...   THAT'S NOT NORMAL!  Just because you grew up watching your parents beat each other does not mean that you should go out and get a spouse and beat them just so that you can be like your parents.   BIG LAYER 2 IS BAD (NO SPOUSE BEATING)!  Same way as big Layer 1 was bad (ok, big Layer 1 is worse than big Layer 2, but networks were smaller back then).

Let's say that you take every layer 3 switch that you have and stop using it as a "switch".  "no switchport" every interface, and use them as 48 port routers instead.  This idea would be pure crazy in IPv4-world.  In IPv6-world, actually, it makes sense.  If we do something other than EUI-64 (which is a bad (ethically terrible?) standard, no other way to say it; the MAC address to IP address spec is badly implemented, and was moot the moment that RFC 4941 became standard), say a /96 instead of a /64, and put that on each interface of all of our Layer 3 switches, each switch port becomes its own Layer 3 network.  Each network would have 32 bits of host addresses (roughly 4.3 billion).  And, if we assign a /64 to a site, as recommended, that would give us 2^32 Layer 3 ports, each with roughly 2^32 available addresses.
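
The counting above is easy to check.  This is a minimal sketch, assuming a /64 site prefix and a /96 on every port as described; the helper name count_for_bits is made up for illustration.

```shell
#!/bin/bash
# Sketch: how many /96 port networks fit in a /64 site, and how many
# addresses each one holds. Prefix lengths are the ones from the text.
count_for_bits() {          # $1 = number of bits, prints 2^bits
    echo $(( 1 << $1 ))
}

site_len=64
port_len=96

ports_per_site=$(count_for_bits $(( port_len - site_len )))
addrs_per_port=$(count_for_bits $(( 128 - port_len )))

echo "Layer 3 ports per /64 site: $ports_per_site"
echo "Addresses per /96 port:     $addrs_per_port"
```

Both numbers come out to 4294967296, a little over 4 billion.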

If your site is using nothing but Layer 3 switches, that means every cable in the site is its own Layer 3 network.  Hundreds, thousands, millions(?), of Layer 3 networks, all working together on the same network.  

No Spanning Tree.  Bye bye STP, we don't need you anymore.  Plug in two cables into the same DLink switch?  No problem, now all of the PC's on that DLink see two networks instead of STP (hopefully) getting engaged, BPDU guard shutting them down, or the network going down because it's one of those neat little switches that block STP.

No PC's freaking out and taking down huge sections of the network with broadcast storms.  Broadcasts are only between the PC and the switch.  IGMP and CGMP are gone, as they should be.  Multicast is tightly controlled by default, no extra work needed.

No "Default Gateway".  Network traffic is localized and can be routed via multiple ingress and egress points.  Bye bye default gateway as we knew you.  We can now have multiple "gateways" functioning concurrently and in parallel into any network environment.  It's trivial almost.

Routing protocols handle all uplink traffic.  Want to hook a closet switch up to multiple backup paths?  No problem.  Rerouting traffic between and through closets becomes as fast and easy as the routing protocol you use.  If you keep it organized and have each switch assigned its own /80, then it can send out that single summarized /80 instead of each of the individual interface /96 networks.  On a single /64 campus, that means you can have 65536 different switches, each with a potential 65536 Layer 3 ports.  (Quick Cisco note, EIGRP makes this easy IMO)
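
One way to picture that numbering plan is sketched below.  It assumes the 16 bits after the /64 site prefix identify the switch and the next 16 bits identify the port; that layout, the function names, and the placeholder site prefix are assumptions for illustration, not a prescribed scheme.

```shell
#!/bin/bash
# Sketch: one possible numbering plan for a /64 site.
# Assumption: the 16 bits after the site prefix identify the switch and
# the next 16 bits identify the port. Function names are made up.
switch_summary() {          # $1 = site prefix (first 4 groups), $2 = switch number
    printf '%s:%x::/80\n' "$1" "$2"
}

port_prefix() {             # $1 = site prefix, $2 = switch number, $3 = port number
    printf '%s:%x:%x::/96\n' "$1" "$2" "$3"
}

site="fdaa:aaaa:aaaa:aa00"   # placeholder ULA site prefix

switch_summary "$site" 7     # the single summary route switch 7 advertises
port_prefix "$site" 7 19     # the network on port 19 of switch 7
```

Switch 7 would advertise fdaa:aaaa:aaaa:aa00:7::/80 upstream, while port 19 on that switch sits on fdaa:aaaa:aaaa:aa00:7:13::/96 (19 is 13 in hex).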

The concept of a VLAN is gone, forever.  Yes, gone, good bye, don't let the door hit your trunking protocol on the way out.  Stop your internal dialog, VLANs are a bad kludge, and that's all they ever were.  Good riddance to bad rubbish.  VTP is the Devil.  (Now, I'm saying it this way for effect, simply because VLANs are a kludge, and kludges should be the very rare exception, not the rule.  With IPv6 and Layer 3 switches, you don't need to kludge anymore, kludging is not normal, stop beating your spouse.)

Setting up a "no switchport" Layer 3 IPv6 network is much easier than Layer 2.  You'll just have to trust me on this one.  I'll take a post or two and demonstrate this.  Compared to Layer 2, IPv6 L3 is simplicity itself.

But, the three most important reasons for IPv6 and only Layer 3 switching to the port.  Security, Security, Security.  This should take all of 30 seconds to realize, and I guarantee 99% of network engineers haven't thought of it.  And, it is because of the UNBELIEVABLE BENEFITS TO SECURITY, that we'll get to implement IPv6 as a way to eliminate Layer 2.

I'll leave it here.  Obviously there are a bunch of concepts here, and if I could have written a small book on Weialgo, advocating for the elimination of Layer 2 at a switchport level the same way we eliminated Layer 1 would end up being a very large book.  By no means did I cover every reason to do this in this post, but, hopefully it gets you thinking in that direction.

My next post will be an example of one of the configs that I'm using for this.

Let me know what you think.

Rob

Wednesday, December 26, 2012

20121227 Green computers are cool.


This is about computers, hang with me or skip to the end....

Ok, I'm not a "green" person.  Personally, the entire argument is a waste of time, and the people who started the argument know it.  Both sides of the argument are wrong, and if they stopped shouting people down long enough to talk rationally about it, they'd have to admit to it.

People burn hydrocarbons of different forms as a way to stay alive.  Yep, it's that simple.  If we stop burning hydrocarbons on a mass scale, a whole bunch of someones have to stop living.  There are only 3 options to the entire debate.  #1 Invent a *REAL* technological alternative which is better than hydrocarbons (or creates hydrocarbons from an acceptable source).  Or, #2, kill everyone.  Or, #3, deal with it.

#1 will happen automatically the moment someone invents it.  No one will need to be a pinhead if someone invents a free energy machine that provides a 1 gigawatt baseload power source with a usable footprint that is safe and cost effective.  The Green Revolution will be automatic and will happen at Internet speed...  all by itself.  The less the greenie heads are involved, the faster it will happen.  So, if you're a greenpeace-r and the magic energy source shows up, stay calm and let paradise on earth happen without being a pain in the ass.

#2 is just stupid.  If you're verbal about population control, shut up and do something productive.  There's only one effective method of population control, historically or statistically speaking.  Make everyone happy. This is pathetically easy to show, and most anyone who's looked into the subject knows it's the answer.  The better off a population is (which can be measured by how much electrical power is available to them), the fewer kids it has.  But, usually, the people who are verbal about how there are too many people on the planet are the ones that advocate for the policies that make people more miserable.  Every time I hear someone say something along those lines "There are too many people!  Let's make the situation worse by making everyone miserable!" I just want to beat the idiot senseless... but that would be purposeless because they have no sense to start with.  To reduce CO2 output to the levels they would need in the timeframes they are talking, you'd have to remove far too many people from the planet for anyone to accept.  Besides, the Population Controllers would be the first ones removed, which would take us right back to options #1 or #3.

#3 is what's going to happen, period.  No reason to shout about it, it's just the fact.  Stop your internal dialog, and don't bother giving me any lip.  That's what's going to happen barring #1 happening.  Get over it.

I feel so much better now that I have that off my chest.... where was I?  :-)

---------------------------In your face reality ends here -----------------------------

------------------------Dialog about computers starts here --------------------------

So, having said all that...  I have always had a thing for efficiency.  Mainly because I have an engineer's mentality.  If you have a choice between X that uses Y amount of power, and X that uses Y/2 amount of power, why wouldn't you use the more efficient of the two choices?

My uber cool Intel I7 has finally reached the point that I can't take it any more.  The motherboard ethernet ports died over a year ago.  The USB has been acting flaky for longer than that, but recently has taken to just turning off after an hour or two of operation (Blogger auto-saves are a great feature).  The past few months, the computer will just hang solid, blue screen randomly, lock up in the bios screen or during post or during the Microsoft F8 RAM test.

So, that's it.  Can't take it anymore, I need a new computer.  So, I started what I do every time I have something like this.  I made a spreadsheet, and started trying to create a model that will tell me which options will give me the perfect combination that I desire.  (It took me 3 years to pick a wood stove.  7 years to pick a kit plane to build.  Our current dog took 6 months of breed study.  And, I wish we took more than an hour to pick out a dishwasher (piece of junk).)

https://docs.google.com/spreadsheet/ccc?key=0AvDx0QSgEqOodGRhb21GeTJPd2pSTm8tdnFGN3habkE

In the old days, it was normally just Price and Performance.  A simple Performance Benchmark (or combination of benchmarks) divided by price, gave a simple answer, and then factor in how much I was willing to spend, and voila!  Rob has his new computer.

Nowadays, I factor in another number (which makes the entire analysis/decision process that much more fun and interesting).  Watts.  Basically, how much power is the entire system going to use.  There are a number of places that you can get the designed Watt TDP figure for each CPU and combination of options.

The main reason for this is that it's not uncommon for high end systems to run 500 to 1000 watts or more.  A 500 watt system will burn a lot of MONEY in power over a month if you're running it hard.  If you factor in a year's worth of power use (or two years, or three, depending on how long you normally go before replacing your computers), then that can really change how you view each system.

If you run your computers 24x7, idle most of the time, knowing the idle load of each system becomes important.  If you only turn them on a few times a month, that's important as well.

All of this helps buy the correct system.  It's not hard to build a system which averages 200 watts.  At my local power cost of $0.15 a kWh (yes, I have cheap US electric, thank you very un-green fracking), that's about $260 a year.  With a 3 year replacement schedule, that's roughly $790 over that timeframe.  If a slightly more expensive system uses half that power (between AMD and Intel, it's entirely possible), you can justify upgrading to a FASTER system which is also more power efficient.
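
The arithmetic is simple enough to script.  This is a minimal sketch that assumes the system draws its average wattage around the clock; the function name power_cost is made up, and the inputs are the figures from the paragraph above (200 watts, $0.15 per kWh, 3 years).

```shell
#!/bin/bash
# Sketch: electricity cost of a system over its service life.
# Assumes a constant average draw, 24 hours a day, 365 days a year.
power_cost() {              # $1 = average watts, $2 = dollars per kWh, $3 = years
    awk -v w="$1" -v rate="$2" -v years="$3" \
        'BEGIN { printf "%.2f\n", w / 1000 * 24 * 365 * years * rate }'
}

power_cost 200 0.15 3       # the 200 watt system
power_cost 100 0.15 3       # a system using half the power
```

With those inputs the 200 watt system comes to 788.40 and the 100 watt system to 394.20, so the difference in the power bill is what you have available to spend on the more efficient hardware.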

Obviously, if you're some poor soul that lives where power is more expensive, that makes it that much easier to justify the better, faster, more power efficient systems.

All of this information is out there on a variety of websites...  Just takes a little bit of time to collect it and put it together.  I will say this, Atom boards are excellent for total cost if your processing requirements are modest.  Trick out an atom, and it's cheap compute, up to a limit of course.

So, help the greenpeace-rs.  They want you to cut back on your power use.  If it makes sense, use less power by buying a better faster computer.

Or, you could invent a dark energy machine and usher the world into the new Age of Star Trek.  There are a couple technologies which do hold promise, but I'm not holding my breath.

Rob

Sunday, December 23, 2012

20121223 And.... all of those scripts make this:

OMGWTFBBQ

This is the home Internet link for the last 18 hours. Moo.










Same connection before the dark times came.


Edit:  It's official.  I asked my 7 year old which graph was "good" and which one was "bad".  He correctly identified the good and bad graphs.  I think these graphs are now at the level that management can use them.

20121223 Weialgo graphing.



As I stated in an earlier post, weialgo graphing is meant to give a statistical relative understanding of how well the network is working (how sloppy it is) over time.

What does that mean?  I wanted non-network engineers to be able to look at a graph and either go "That looks pretty good." or "OMGWTFBBQ That sucks!!!", quickly.

I don't want to sit around for hours trying to explain the difference between pinging between Las Vegas and San Jose, and between Chicago and Singapore (you see, there's this big chunk of dirt that we all ride on...).

Just because Singapore is on the other side of the Earth from Chicago doesn't mean that I should hold that against Singapore.  Until space warping technology is developed that allows Singapore and Chicago to get closer to each other (Tesseract anyone?), they will exist at a fixed distance from each other.

So, if the round trip time between Singapore and Chicago is, at its fastest, 300 milliseconds, 300 ms is now the zero point on the Y axis of the graph.

Now, if I just graph deviation from minimum, this would be a simple "jitter" graph.  The problem with this is that it doesn't really visualize the impact of the network on interactive applications that use reliable network protocols (e.g. TCP).  So, there has to be another factor in the model that creates this graph.

The easy thing to do here is to multiply the jitter by some factor.  The problem with that is that there isn't a set standard for this type of multiple, as I'm in uncharted territory with this attempt at graphing networks.  The graph data is going to be questioned, one way or the other, and scaling the jitter would just cause an unnecessary conversation about what that multiple should be, at least until an academic body suggests a proper model that everyone can agree on.

In other words, I took the easy way out.

Instead of trying to come up with a jitter multiplicative factor that would represent the actual impact on interactive applications, I just shifted from relative jitter data to absolute rtt after jitter reaches a certain value.  What value is a good one to shift from jitter to rtt?  That's easy.  If you were able to put in a non-sloppy network between two different remote points, what's the maximum jitter that you think you'd be able to attain?

I picked 50ms.  Realistically, with a proper bandwidth sharing queuing system (Token Bucket-esque), and good lines, I would think that we could keep all jitter for a ping type polling system under 50ms.
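
The rule can be sketched in a few lines of awk.  This is a simplified illustration, not the script itself: the 0.05 second jitter cutoff is the 50ms above, the 0.45 second red threshold is the one that appears in wngraph.sh, and checking the bands in a fixed order (red first) is a simplification, since the real script computes each band separately.  The function name classify and the sample values are made up.

```shell
#!/bin/bash
# Sketch: sort ping samples into the green / yellow / red bands.
# Reads one RTT (in seconds) per line on stdin.
classify() {                # $1 = baseline (fastest) RTT in seconds
    awk -v min="$1" '{
        jitter = $1 - min
        # Red: RTT of 450 ms or more, reported as absolute RTT.
        if ($1 >= 0.45)          printf "red,%.3f\n", $1
        # Green: within 50 ms of the baseline, reported as jitter.
        else if (jitter < 0.05)  printf "green,%.3f\n", jitter
        # Yellow: everything in between, reported as absolute RTT.
        else                     printf "yellow,%.3f\n", $1
    }'
}

# Made-up samples against a 300 ms baseline.
printf '%s\n' 0.301 0.400 0.600 | classify 0.300
```

Against a 300 ms baseline, a 301 ms sample shows up as 1 ms of green, a 400 ms sample as 400 ms of yellow, and a 600 ms sample as 600 ms of red, which is what makes the bad samples tower over the good ones on the graph.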

The rest is mechanics.   So, into the mechanics.

I have two scripts crontab'ed.

wnrollup.sh and wnrolluphr.sh


root@server0:/mnt/ramdisk/1/weialgo5# cat wnrollup.sh
#!/bin/bash

DATEZ=$(date +%Y%m%d --utc --date='yesterday')

/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/gzip -1 -v /mnt/ramdisk/weialgo5log$DATEZ*.txt
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/mv -v /mnt/ramdisk/weialgo5log$DATEZ*.gz /storage/logs_weialgo/
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/bash /storage/weialgo5/wnrunreport.sh $DATEZ
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/bash /storage/weialgo5/wnmakehtml.sh


root@server0:/mnt/ramdisk/1/weialgo5# cat wnrolluphr.sh
#!/bin/bash

DATEZ=$(date +%Y%m%d --utc)
HOURZ=$(date +%Y%m%d%H --utc)

/bin/sleep 600
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/gzip -1 -v /mnt/ramdisk/weialgo5log$HOURZ.txt
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/mv -v /mnt/ramdisk/weialgo5log$HOURZ.txt.gz /storage/logs_weialgo/
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/bash /storage/weialgo5/wnrunreport.sh $DATEZ
/usr/bin/nice --adjustment=19 /usr/bin/ionice -c 3 /bin/bash /storage/weialgo5/wnmakehtml.sh



root@server0:/mnt/ramdisk/1/weialgo5#

wnrolluphr.sh is run once an hour.  wnrollup.sh is run once a day.  Obviously, depending on the number of systems that you're graphing, you might want to change the workload that you cause the server to do.  This is from one of my home boxes, I run the graphs every hour, as I'm only polling 8 to 10 devices.  At work (on a slower box no less), I'm polling almost 1000 devices, and I run the graphs once a day.

The crontab entries.  Nothing too surprising, you should be able to figure them out.

# Weialgo

55 * * * * root /usr/bin/nohup /bin/bash /storage/weialgo5/wnrolluphr.sh >> /dev/null &
3 19 * * * root /usr/bin/nohup /bin/bash /storage/weialgo5/wnrollup.sh >> /dev/null &

wnrunreport.sh

root@server0:/mnt/ramdisk/1/weialgo5# cat wnrunreport.sh
#!/bin/sh

TARBZ=$(date +%Y%m%d --utc --date='yesterday')

rm -f /mnt/ramdisk/tmp/*

cp /storage/weialgo5/weialgo5.lst /var/www/weialgo/weialgodevices$TARBZ.txt

cat /storage/weialgo5/weialgo5.lst | awk -F'/' '{print $1}' | sort | uniq > /mnt/ramdisk/routerlist.tmp

rm -f /var/www/weialgo/weialgosummary$1.txt

for IPADDRESSZ in `cat /mnt/ramdisk/routerlist.tmp`;
do
    echo $IPADDRESSZ report before
    rm -f /mnt/ramdisk/tmp/*
    /usr/bin/nice /bin/bash /storage/weialgo5/wngraph.sh $1 $IPADDRESSZ
    echo $IPADDRESSZ report after
done


root@server0:/mnt/ramdisk/1/weialgo5#

wngraph.sh

root@server0:/mnt/ramdisk/1/weialgo5# cat wngraph.sh
#!/bin/bash
#  bash rather than sh: the $'\n' quoting used below is a bash feature
#  $1 is the date or dates to be reported on
#  $2 is the IP address of the device being pinged
#  $3 is the name of the device that will be on the graph

echo $1
echo $2
echo $3

rm -f /mnt/ramdisk/tmp/*

ls -1 /storage/logs_weialgo/weialgo5log$1*.gz > /mnt/ramdisk/tmp/testfilelist.txt

for FILENAMEZ in `cat /mnt/ramdisk/tmp/testfilelist.txt`;
do
    NL=$'\n'
    FILENAMEZ=${FILENAMEZ%$NL}
    # -F matches the address literally; without it the dots in $2 act as regex wildcards
    nice gunzip -c $FILENAMEZ | nice grep -F "$2," | nice awk -F',' '{print $2}' >> /mnt/ramdisk/tmp/test.csv
    echo $FILENAMEZ after
done

LINEZ=$(nice wc -l /mnt/ramdisk/tmp/test.csv | nice awk -F' ' '{print $1}' | nice head -n 1)
MINIMUMZ=$(nice sort -n /mnt/ramdisk/tmp/test.csv | nice head -n 1000 | tail -n 1)

echo $LINEZ  $MINIMUMZ

yellowcsv=/mnt/ramdisk/tmp/yellow.csv.$$
redcsv=/mnt/ramdisk/tmp/red.csv.$$
greencsv=/mnt/ramdisk/tmp/green.csv.$$
nice cat /mnt/ramdisk/tmp/test.csv | nice awk '{if (($1 - '$MINIMUMZ') < 0.05) print "0"; else if ($1 >= 0.45) print "0"; else print $1}' > $yellowcsv
nice cat /mnt/ramdisk/tmp/test.csv | nice awk '{if ($1 >= 0.45) print $1; else print "0"}' > $redcsv
nice cat /mnt/ramdisk/tmp/test.csv | nice awk '{if (($1 - '$MINIMUMZ') < 0.05) print $1 - '$MINIMUMZ'; else print "0"}' > $greencsv
YELLOWSUMZ=$(awk 'BEGIN {sum=0} {sum = sum + $1} END {print sum}' $yellowcsv)
REDSUMZ=$(awk 'BEGIN {sum=0} {sum = sum + $1} END {print sum}' $redcsv)

echo $1,$2,$YELLOWSUMZ,$REDSUMZ
echo $1,$2,$YELLOWSUMZ,$REDSUMZ >> /var/www/weialgo/weialgosummary$1.txt
sort /var/www/weialgo/weialgosummary$1.txt | uniq > /mnt/ramdisk/tmp/1.txt.$$
mv -f /mnt/ramdisk/tmp/1.txt.$$ /var/www/weialgo/weialgosummary$1.txt
DEVICENAMEZ=$(nslookup $2 | grep name | awk -F'=' '{print $2}' | awk -F' ' '{print $1}' | head -n 1)

gnuplotconfig=/mnt/ramdisk/tmp/testgnuplotconfig.$$
echo "set terminal png size 2800, 1440" > $gnuplotconfig
echo "set terminal png font \"/storage/weialgo5/arial.ttf\" 20" >> $gnuplotconfig
echo "set output '/var/www/weialgo/$2_$1.png'" >> $gnuplotconfig
echo "set key bmargin center horizontal Right noreverse enhanced autotitles box linetype -1 linewidth 1.000" >> $gnuplotconfig
echo "set title '$3 $2 $1 $DEVICENAMEZ'"  >> $gnuplotconfig
echo "set xrange [ 0 : $LINEZ ] noreverse nowriteback"  >> $gnuplotconfig
echo "set yrange [ 0 : 0.7 ] noreverse nowriteback"  >> $gnuplotconfig
echo "set style line 1 lt rgb 'red' lw 1"  >> $gnuplotconfig
echo "set style line 2 lt rgb 'yellow' lw 1" >> $gnuplotconfig
echo "set style line 3 lt rgb 'dark-green' lw 1"  >> $gnuplotconfig
echo "plot '$redcsv' with impulses ls 1 title 'Network Lost', '$yellowcsv' with impulses ls 2 title 'Latency Warning', '$greencsv' with impulses ls 3 title 'Relative Latency'" >> $gnuplotconfig
cat $gnuplotconfig
nice gnuplot $gnuplotconfig

echo gnuplot run

root@server0:/mnt/ramdisk/1/weialgo5#

wngraph.sh is where the "magic" happens.  The process is simple: extract the data, separate it into three different plot files, and use gnuplot to output a PNG graph of the data.
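For anyone who wants to play with the classification step on its own, here is a minimal sketch of the red/yellow/green split done in a single awk pass.  It uses the same thresholds as wngraph.sh (0.45 for "lost", 0.05 above the baseline for "warning"), but the function name classify_latency and the three-column output are my own assumptions for illustration.  The real script writes three separate files instead.

```shell
#!/bin/bash
# Sketch only: classify_latency is a hypothetical helper, not part of weialgo.
# Reads one latency value (seconds) per line on stdin and prints
# "red,yellow,green" per line, using the same rules as wngraph.sh.
classify_latency() {
    baseline="$1"   # baseline latency, the role MINIMUMZ plays in wngraph.sh
    awk -v base="$baseline" '
    {
        near   = (($1 - base) < 0.05)                 # within 0.05 s of baseline
        red    = ($1 >= 0.45) ? $1 : 0                # counted as network lost
        green  = near ? $1 - base : 0                 # relative latency
        yellow = (!near && $1 < 0.45) ? $1 : 0        # latency warning
        printf "%.3f,%.3f,%.3f\n", red, yellow, green
    }'
}

# Three sample readings against a 0.015 s baseline
printf '0.020\n0.100\n0.500\n' | classify_latency 0.015
```

Each input line ends up non-zero in at most one column unless the baseline itself is above 0.4 seconds, which matches how the three plot files behave.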

That's it.  Ping thousands of devices and take those thousands of pings to thousands of devices and make graphs of the data.

If anyone wants a copy of the scripts that I'm using, send me an email and I'll email you a tar.bzip2 file of what I have.  robluce1 @ yahoo . com

20121223 Why Weialgo? Essay #3



I'm still not sure how to talk about weialgo and the knowledge I'm trying to convey.

The mechanics of packet communications over a global network, using interactive applications over TCP, are not simple.  Honestly, they're deceptively complex.

My other hobby is airplanes and aircraft homebuilding.  I don't do as much anymore since I lost my medical, but I still help others from time to time in the building of their airplanes, and I've been building a two seat single engine with someone else since 2005.

Building an experimental aircraft is far easier than trying to get a global network to the point of being able to support interactive apps from a single point.  Also, in an experimental aircraft, the person you affect when you buckle yourself in and start the engine up is you (if you do have any passengers, I'm pretty sure the big "EXPERIMENTAL AIRCRAFT" notice on the plane lets them know what they're getting into).  With IT systems (particularly packet networks), you affect everyone who uses or connects through the system you build and support.

Building an airplane, easy (it's a lot of work, but it's easy to do work, easy to understand work; volume does not make hard).  Understanding everything about packet networks and how to make interactive applications work over them at distance, hard.  That's about as good as I can say it without writing books.

There's a snippet I wrote for a draft post that I think tries to take this on from a different angle:
-------------------------------------------------------------------------------------------
Most of this conversation hinges on the work of John Nagle.  Most network engineers know Mr. Nagle through his namesake, Nagle's algorithm (which makes me think, with the change of focus of weialgo, maybe it'd be more fitting to rename it Nagalgo...  Problem is, he already has an algorithm named after him).  But what he really should be famous for is inventing the foundational queuing discipline, fair queuing (see Wikipedia and RFC 970).  On top of this, he was instrumental in designing and documenting the initial phases of TCP congestion control (RFC 896), which was fundamental to making the early (1980s) Internet work.

Honestly, if kids need to learn about Charles Babbage as part of high school computer classes, I feel that the concepts of John Nagle should be taught as part of any serious college or higher level curriculum which covers the Internet and what makes it work.  I don't know Mr. Nagle, but he is the first person to document the essential processes and put into practice the concepts that make the Internet *work*.  I understand the concept of the Internet pre-existed his influence on the IETF process, but he is the one who demonstrated how to make it work at a large scale.

Ok, on top of Nagle's work, you have to add Van Jacobson's work on TCP Congestion Control, and all the work that Mr. Jacobson and others did to make improvements to TCP that have made TCP much more functional (actually usable?) in a global network.  RFC 1122 and RFC 1323 are key pieces of work that must be understood if you want to make IPv4/IPv6 applications work their best over a global network.  Ignorance of these standards is inexcusable for anyone who takes responsibility for applications that run across long WANs or global networks.  It's inexcusable.

On top of this, you have to add a firm understanding of TCP's inner workings via RFCs 2988, 2581, and 2582, plus an understanding of TCP SACK options (RFC 2018), TCP Window Scaling (RFC 1323), and the impact of TCP Timestamps (RFC 1323 and RFC 3522).
-------------------------------------------------------------------------------------------
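
To put a number on why the window scaling mentioned in that snippet matters at distance, here's a small worked example.  A single TCP connection can't move more than one window of data per round trip, so window size divided by round trip time is a hard ceiling on throughput.  The function name tcp_ceiling_bps is made up for this sketch, and the figure it prints is only a ceiling: it ignores loss, slow start and header overhead.

```shell
#!/bin/bash
# Sketch only: tcp_ceiling_bps is a hypothetical helper name.
# Usage: tcp_ceiling_bps WINDOW_BYTES RTT_MILLISECONDS
# Prints the most bits per second one TCP connection can carry when it
# may only have WINDOW_BYTES in flight per round trip.
tcp_ceiling_bps() {
    window_bytes="$1"
    rtt_ms="$2"
    # bits per window, divided by RTT in seconds (integer math throughout)
    echo $(( window_bytes * 8 * 1000 / rtt_ms ))
}

tcp_ceiling_bps 65535 20     # unscaled 16 bit window, 20 ms away
tcp_ceiling_bps 65535 200    # same window, 200 ms away
tcp_ceiling_bps 8388480 200  # 65535 shifted left by 7 (window scaling), 200 ms away
```

Without window scaling, a connection with a 200 ms round trip tops out at about 2.6 Mbit/s no matter how big the pipe is.  With a scale shift of 7 the same connection has a ceiling of roughly 335 Mbit/s.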

Maybe someday, I'll try to write up something that does a better job of trying to describe why pings (icmp/udp/tcp) are so important to determine whether a network is ready to run interactive applications.  But, I think I'll leave it with this.

Take a PC, hook it up to a bad Internet link (line of sight wireless to a grain silo for example), load up wireshark, and get a subscription to a TCP based interactive game (like WoW or similar).  Then go raid with 24 of your closest friends.  You can also pick a server on the other side of the world if you absolutely refuse to get a bad Internet connection, it's close to the same effect.  After a few years, you will understand.

Pings matter simply because they show how good, or how sloppy, the network is.  Sloppy networks don't run interactive applications well.  Everything is moving to a dependence on the network to support interactive applications from farther and farther distances.

Weialgo graphs a network and shows if it's sloppy or not.  What it takes to make a non-sloppy network is hard.  Sloppy networks are easy.

So, if you're having problems with your network, ping, and graph it over time.  You're probably dealing with a sloppy network.
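
If you want to start collecting data before setting up anything bigger, here is a minimal sketch of the logging side.  It assumes the Linux (iputils) ping output format with "time=12.3 ms" on each reply line, and it writes "address,seconds" lines, which is my guess at a format that suits the grep and awk calls in wngraph.sh.  The function name parse_ping_seconds is hypothetical, and a lost ping simply produces no line in this sketch.

```shell
#!/bin/bash
# Sketch only: parse_ping_seconds is a hypothetical helper name.
# Usage: ping ... | parse_ping_seconds ADDRESS
# Turns each reply line of ping output into "ADDRESS,latency_in_seconds".
parse_ping_seconds() {
    awk -v ip="$1" '
    match($0, /time=[0-9.]+ ms/) {
        # skip the 5 characters of "time=" and drop the trailing " ms"
        ms = substr($0, RSTART + 5, RLENGTH - 8)
        printf "%s,%.6f\n", ip, ms / 1000
    }'
}

# Canned reply line so the sketch runs without a network
printf '64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=12.3 ms\n' | parse_ping_seconds 192.0.2.1

# Against a real device (the address and log path are placeholders):
# ping -c 1 -W 1 192.0.2.1 | parse_ping_seconds 192.0.2.1 >> /tmp/pinglog.txt
```

Run the last line from cron or a loop, let it collect for a day, and you have something to graph.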