Thursday, October 27, 2011

Want to play Sherlock Holmes?

Over the past two days we experienced two 60/90 periods with sluggish speeds.

The cause has been identified. The issue was with what the techie types call a HEARTBEAT cable connection. When two servers are linked, as ours are, they exchange data over one connection but also have a second connection, the heartbeat connection, that lets each one know how the other is feeling. Apparently this connection suffered a partial failure. That caused the servers to become preoccupied with checking on each other and to stop paying attention to incoming traffic. The cable has been replaced, and the heartbeat is now restored to a steady thump…thump.
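For the curious, here is a minimal Python sketch of how a heartbeat check works in principle. It is a generic illustration, not the software our servers actually run: the class names, the 3-second timeout, and the fake clock are all hypothetical. Each server records when it last heard from its partner and treats the partner as healthy only while beats keep arriving within the timeout.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical monitor: tracks when the partner server last sent a heartbeat."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s  # how long silence is tolerated (assumed value)
        self.clock = clock          # injectable so the example is deterministic
        self.last_beat = clock()

    def record_beat(self) -> None:
        # Call this each time a heartbeat message arrives over the heartbeat link.
        self.last_beat = self.clock()

    def peer_alive(self) -> bool:
        # The partner counts as healthy if it beat within the timeout window.
        return self.clock() - self.last_beat <= self.timeout_s


class FakeClock:
    """Manually advanced clock, standing in for real time in this example."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()
monitor = HeartbeatMonitor(timeout_s=3.0, clock=clock)
clock.t = 2.0
print(monitor.peer_alive())  # True: last beat was 2 s ago
clock.t = 5.0
print(monitor.peer_alive())  # False: 5 s of silence exceeds the 3 s timeout
monitor.record_beat()
print(monitor.peer_alive())  # True: a fresh beat just arrived
```

With a flaky heartbeat link, beats can arrive late or not at all, so a check like this keeps flipping between healthy and unhealthy.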

But wait, there is more...

During this performance issue, one of our sites ran some tests and actually identified a bad server located between us and them. That issue was unrelated to the one above, but an issue nonetheless. They asked us for help in understanding the situation, and our hosting firm offered the following response and tips. I asked the firm's permission to share that response with everyone.

Trace Routes Help Identify a BAD Server

If you see a much higher time on just one server (hop) in a trace route, while the hops after it are back to normal, that particular internet switch is most likely lowering the priority of your request. Switches do this to let more important traffic through, and this kind of prioritizing is becoming more and more common. The following article may help explain: http://forums.whirlpool.net.au/archive/98073. Trace routes are most useful when you have no connection to a particular destination at all: a trace route will show you the exact internet switch where the connection is breaking down.
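The rule of thumb above can be made concrete. The Python sketch below is our own illustration, not a tool from the hosting firm: it takes the per-hop times you read off a trace route and flags hops where the time jumps. The function name, the 50 ms threshold, and the sample numbers are all assumptions. A jump that disappears at the following hops matches the "one switch lowering priority" case; a jump that carries through to the destination suggests the slowdown really starts at that hop.

```python
from typing import List, Tuple


def find_latency_jumps(hop_ms: List[float], jump_ms: float = 50.0) -> List[Tuple[int, bool]]:
    """Flag trace route hops whose time jumps by more than jump_ms over the previous hop.

    Returns (hop_number, persists) pairs. persists is True when every later hop
    also stays more than jump_ms above the hop before the jump (or when the jump
    is at the final hop), i.e. the slowdown carries through to the destination.
    """
    jumps = []
    for i in range(1, len(hop_ms)):
        before = hop_ms[i - 1]
        if hop_ms[i] - before > jump_ms:
            later = hop_ms[i + 1:]
            persists = all(t - before > jump_ms for t in later)
            jumps.append((i + 1, persists))  # trace routes number hops from 1
    return jumps


# Illustrative, made-up hop times in milliseconds.
one_slow_switch = [1.2, 8.5, 12.0, 180.0, 15.3, 16.1]
slow_from_hop_4 = [1.2, 8.5, 12.0, 180.0, 185.3, 190.1]
print(find_latency_jumps(one_slow_switch))  # [(4, False)]: only hop 4 is slow
print(find_latency_jumps(slow_from_hop_4))  # [(4, True)]: slowness persists after hop 4
```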

To test speed, a simple ping might work better
Ping has been turned on for our servers. Ping root.calancom.com at several points during the day, ideally across several days; this will give you a sense of what normal ping times are from your location. You also need a benchmark outside of the calan servers. For instance, ping Google's DNS servers, 8.8.8.8 and 8.8.4.4, at the same time and see how the two compare. Yes, Google is likely quicker, but that does not matter: what matters is how the two servers compare at a given moment, not which one is the fastest.
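Keeping a baseline needs nothing more than a notebook, but here is a small Python sketch of the bookkeeping, assuming you copy the time reported by each ping into a list per host. The function name and the sample values are made up for illustration. It uses the median rather than the average so that one unusually slow sample does not skew what you consider normal; that choice is ours, not the hosting firm's.

```python
from statistics import median
from typing import Dict, List


def baseline(samples_ms: Dict[str, List[float]]) -> Dict[str, float]:
    """Median ping time per host, from samples taken at different times and days."""
    return {host: median(times) for host, times in samples_ms.items()}


# Hypothetical ping times (ms) noted down during normal operation.
normal_samples = {
    "root.calancom.com": [42.0, 45.5, 40.8, 47.2],
    "8.8.8.8": [18.0, 19.5, 17.5, 20.5],
}
print(baseline(normal_samples))  # {'root.calancom.com': 43.75, '8.8.8.8': 18.75}
```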
Armed with those data points, if you feel you are experiencing slowness, ping both calan and Google again. You'll see whether just one or both are running slower than your baseline results. If the connection to calan's servers is the issue, root.calancom.com will show high ping times while Google's ping times stay about the same as in your tests run during normal speeds.
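The comparison described above boils down to a simple rule, sketched here in Python. The 1.5× "slow" threshold, the function and constant names, and all numbers are our own illustrative assumptions; pick a threshold that fits the spread you saw while collecting your baseline.

```python
from typing import Dict

# Hypothetical verdict strings for the comparison described above.
TARGET_SLOW = "only the target is slow: the problem is likely on the path to the target"
BOTH_SLOW = "both are slow: the problem is probably not specific to the target"
REFERENCE_SLOW = "only the reference is slow"
BOTH_NORMAL = "both look normal"


def diagnose(baseline_ms: Dict[str, float], current_ms: Dict[str, float],
             target: str, reference: str, slow_factor: float = 1.5) -> str:
    """Compare current ping times to baseline; 'slow' means above baseline * slow_factor."""
    target_slow = current_ms[target] > baseline_ms[target] * slow_factor
    reference_slow = current_ms[reference] > baseline_ms[reference] * slow_factor
    if target_slow and not reference_slow:
        return TARGET_SLOW
    if target_slow and reference_slow:
        return BOTH_SLOW
    if reference_slow:
        return REFERENCE_SLOW
    return BOTH_NORMAL


baseline_ms = {"root.calancom.com": 43.75, "8.8.8.8": 18.75}  # made-up baseline
current_ms = {"root.calancom.com": 120.0, "8.8.8.8": 19.0}    # made-up slow moment
print(diagnose(baseline_ms, current_ms, "root.calancom.com", "8.8.8.8"))
# only the target is slow: the problem is likely on the path to the target
```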

Note: If you have questions, feel free to reach out and ask. Hope everyone finds this helpful.