Network Congestion, Part 2: Use Internet Traceroute (tracert) to Find Congestion and Saturation

10. July 2012 Internet Support 0

Last month I wrote about how to test for and correct congestion on your local area network. Our next step in determining where congestion might lie is to look outside your local network and use traceroute tools to find congestion points on the Internet. To do this we’ll use a few software tools that are available on most PCs running Windows or Linux.

Simplistically speaking, the Internet is a collection of networks that sit behind routers and/or firewalls, which connect to other routers that all work together to exchange data. For the purposes of this post, the most important advantage of a structure like this is that it provides redundancy in case a part of the network goes down. The disadvantage? During Internet outages the parts that remain working can become stressed trying to carry the data that would otherwise have taken a different route.

Tracing the Route from You to Your Target Site

So how do you determine saturation on the Internet? We’ll start from the obvious. You’ve noticed that the connection between you and some other place on the Internet is slower than usual. Last month’s blog post, “Speed Test to Fix Network Congestion and Saturation”, helped you determine that your local area network is fine. Now we get our hands dirty and check the Internet, beginning with the sites you found slow. Our first step is to determine the IP address of each of those web sites.

In Windows, click on the Start button, then click on the search box or icon (depending on your version of Windows). From there type CMD and press Enter. This will open the Command Prompt window. From the Command Prompt type:

ping www.mywebsite.com

The ping command sends a small packet of data to a target site, which replies if the data was received. If all worked correctly you will get output something like this:
C:\> ping www.mywebsite.com

PING www.mywebsite.com (216.250.121.107) with 64 bytes of data.
64 bytes from perfora.net (216.250.121.107): icmp_seq=1 ttl=54 time=48.1 ms
64 bytes from perfora.net (216.250.121.107): icmp_seq=2 ttl=54 time=48.2 ms
64 bytes from perfora.net (216.250.121.107): icmp_seq=3 ttl=54 time=48.2 ms
64 bytes from perfora.net (216.250.121.107): icmp_seq=4 ttl=54 time=48.1 ms
--- www.mywebsite.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3001ms
rtt min/avg/max/mdev = 48.164/48.200/48.247/0.270 ms

C:\>

Note that you might see a line or lines like this:

64 bytes from perfora.net (216.250.121.107): Destination host unreachable.

This is okay because at this point the line doesn’t really matter. The most important line we need to get is the top one because it contains the IP address www.mywebsite.com resolves to.

PING www.mywebsite.com (216.250.121.107) with 64 bytes of data.

If you did get proper results, as in my first ping example, look closely at the lines. You are looking for something like “time=XX.X ms”, where XX.X is a number (and ms is short for milliseconds). As a rule of thumb, and this rule holds throughout the remainder of this post, the larger the number, the more delay there is between you and the endpoint you are checking against. A quick word on my use of the word “delay”: it could mean distance (as in the distance between you and your other endpoint), saturation, or both.
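If you would rather not pick the numbers out by eye, the same two pieces of information (the resolved IP address and the “time=” values) can be pulled out of saved ping output with a few lines of Python. This is only a minimal sketch: the function name parse_ping and the sample text are my own for illustration, and the patterns assume output shaped like the example above (or the Windows “[address]” and “time=48ms” style).

```python
import re

# Sample text copied from the ping example earlier in this post.
SAMPLE_PING = """\
PING www.mywebsite.com (216.250.121.107) with 64 bytes of data.
64 bytes from perfora.net (216.250.121.107): icmp_seq=1 ttl=54 time=48.1 ms
64 bytes from perfora.net (216.250.121.107): icmp_seq=2 ttl=54 time=48.2 ms
64 bytes from perfora.net (216.250.121.107): icmp_seq=3 ttl=54 time=48.2 ms
64 bytes from perfora.net (216.250.121.107): icmp_seq=4 ttl=54 time=48.1 ms
"""


def parse_ping(output):
    """Return (resolved_ip, list_of_round_trip_times_in_ms) from ping text.

    Hypothetical helper: it only understands the two common layouts,
    "(a.b.c.d)" or "[a.b.c.d]" for the address and "time=NN ms" or
    "time<NN ms" (with or without the space) for each reply.
    """
    ip_match = re.search(r"[(\[](\d{1,3}(?:\.\d{1,3}){3})[)\]]", output)
    resolved_ip = ip_match.group(1) if ip_match else None
    times = [float(t) for t in re.findall(r"time[=<](\d+(?:\.\d+)?)\s*ms", output)]
    return resolved_ip, times


resolved_ip, times_ms = parse_ping(SAMPLE_PING)
print("IP address:", resolved_ip)
print("Round-trip times (ms):", times_ms)
```

The larger the values in the list, the more delay there is between you and the site.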

Testing the Routers Along the Route

So we now have the information we need: the site name (www.mywebsite.com) and the IP address (216.250.121.107). We are now ready to perform our test. From the Command Prompt type:

tracert www.mywebsite.com (Linux users would type traceroute 216.250.121.107 instead)

The tracert (or Linux traceroute) command essentially does what the ping command does, with one notable exception: it looks for every router it has to pass through between you and the end point and reports back, via a ping-style probe, on how long each router took to respond. If that sounds confusing, think of it this way. You want to go to the gas station, but you have to go through a number of stop lights to get there. You are low on gas, so you want to hit as many green lights as possible. To make this happen you send Usain Bolt (arguably the world’s fastest runner) to each of the traffic lights, one at a time, to report back to you on each light’s status. In the same way, tracert reports back the status of each successive router. The output of the tracert command might look something like this:

C:\> tracert www.mywebsite.com
tracing to www.mywebsite.com (216.250.121.107), over a maximum of 30 hops

1 router.skywaywest.net (216.251.128.254) 2.310 ms 2.520 ms 2.724 ms
2 216.251.132.60 (216.251.132.60) 0.191 ms 0.184 ms 0.170 ms
3 v525.core1.yvr1.he.net (216.218.185.185) 0.265 ms 0.436 ms 0.422 ms
4 10gigabitethernet4-1.core1.sea1.he.net (184.105.222.1) 4.577 ms 5.006 ms 5.447 ms
5 206.51.7.2 (206.51.7.2) 48.347 ms 48.670 ms 48.308 ms
6 ae-1.gw-dista-a.ga.mkc.us.oneandone.net (74.208.6.121) 48.272 ms 48.558 ms 48.712 ms
7 perfora.net (216.250.121.107) 86 ms 85 ms 86 ms
Trace complete.
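For the curious, here is roughly how the hop-by-hop discovery works. Each probe carries a “time to live” (TTL) counter, every router it passes through lowers that counter by one, and the router where the counter reaches zero reports back. By sending probes with a TTL of 1, then 2, then 3 and so on, the tool hears from each router in turn. The Python below is only a simulation of that idea, not a real network tool: the function name simulate_trace and its arguments are made up for illustration, and no packets are sent.

```python
def simulate_trace(routers, destination, max_hops=30):
    """Simulate which device answers each probe in a traceroute.

    routers     -- ordered list of router names between you and the target
    destination -- name of the target site
    Returns a list of (ttl, responder) pairs, one per probe sent.
    """
    hops = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        responder = destination        # answers if the probe survives every router
        for router in routers:
            remaining -= 1             # each router lowers the TTL by one
            if remaining == 0:
                responder = router     # TTL ran out here, so this router reports back
                break
        hops.append((ttl, responder))
        if responder == destination:   # the target answered, so the trace is complete
            break
    return hops


# Router names taken from the example trace above.
path = [
    "router.skywaywest.net",
    "216.251.132.60",
    "v525.core1.yvr1.he.net",
    "10gigabitethernet4-1.core1.sea1.he.net",
    "206.51.7.2",
    "ae-1.gw-dista-a.ga.mkc.us.oneandone.net",
]
for ttl, responder in simulate_trace(path, "perfora.net"):
    print(ttl, responder)
```

The “maximum of 30 hops” in the real output plays the same role as max_hops here: it stops the tool from probing forever if the target never answers.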

As we did with the pings, so we must do with the traceroute: we need to examine the time it takes for each router hop to respond back to us. To make sense of these numbers we need a sense of how far away the server you are trying to reach is from you. From Vancouver, BC, Canada it typically takes about 80 – 100 ms to get to Ontario, Canada or to California. It is around 120 – 140 ms to Florida and 180 – 200 ms to Europe or China. For the purposes of this example I would like to pick two fictitious locations for the end point: Portland, OR, USA and Vancouver, BC, Canada. In this case, our example is actually located in Pennsylvania, USA. This can easily be confirmed by looking at the whois information for the IP of the site. You can usually get whois information directly from http://www.arin.net should you wish to check for yourself. Linux users can look this information up from the shell via the whois command (example: whois 216.250.121.107 <ENTER>).

So, looking at the hops, we see the latency run between 2 and 48 ms. Generally speaking, that looks acceptable. A return of 48 ms would be considered fast from Portland. However, from Vancouver we would expect the return to be about half that. Could it be saturation? Probably not, as the routing goes to Pennsylvania.

Here, then, is an example of what a saturated traceroute to www.mywebsite.com might look like.  The example posted below shows how, on the 5th hop, the latency really goes up.

tracing to www.mywebsite.com (216.250.121.107), over a maximum of 30 hops

1 router.skywaywest.net (216.251.128.254) 2.310 ms 2.520 ms 2.724 ms
2 216.251.132.60 (216.251.132.60) 0.191 ms 0.184 ms 0.170 ms
3 v525.core1.yvr1.he.net (216.218.185.185) 0.265 ms 0.436 ms 0.422 ms
4 10gigabitethernet4-1.core1.sea1.he.net (184.105.222.1) 4.577 ms 5.006 ms 5.447 ms
5 206.51.7.2 (206.51.7.2) 248.347 ms 248.670 ms 248.308 ms
6 ae-1.gw-dista-a.ga.mkc.us.oneandone.net (74.208.6.121) 712.272 ms 843.558 ms 719.712 ms
7 perfora.net (216.250.121.107) 1086 ms 985 ms 1006 ms
Trace complete.
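If you save a trace like this to a text file, a short script can point out where the latency takes off. The Python below is a rough sketch, not a finished tool: the function names hop_averages and first_jump are my own, and the 100 ms threshold is simply an assumption that you would tune for your own routes.

```python
import re

# The saturated example trace from this post.
SATURATED_TRACE = """\
1 router.skywaywest.net (216.251.128.254) 2.310 ms 2.520 ms 2.724 ms
2 216.251.132.60 (216.251.132.60) 0.191 ms 0.184 ms 0.170 ms
3 v525.core1.yvr1.he.net (216.218.185.185) 0.265 ms 0.436 ms 0.422 ms
4 10gigabitethernet4-1.core1.sea1.he.net (184.105.222.1) 4.577 ms 5.006 ms 5.447 ms
5 206.51.7.2 (206.51.7.2) 248.347 ms 248.670 ms 248.308 ms
6 ae-1.gw-dista-a.ga.mkc.us.oneandone.net (74.208.6.121) 712.272 ms 843.558 ms 719.712 ms
7 1086 ms 985 ms 1006 ms perfora.net [216.250.121.107]
"""


def hop_averages(trace_text):
    """Return {hop_number: average latency in ms} for every hop that replied."""
    averages = {}
    for line in trace_text.splitlines():
        hop = re.match(r"\s*(\d+)\s", line)
        # Every number followed by " ms" is one probe's round-trip time.
        times = [float(t) for t in re.findall(r"(\d+(?:\.\d+)?) ms", line)]
        if hop and times:
            averages[int(hop.group(1))] = sum(times) / len(times)
    return averages


def first_jump(averages, threshold_ms=100.0):
    """Return the first hop whose average is more than threshold_ms above
    the previous replying hop, or None if there is no such jump."""
    previous = None
    for hop in sorted(averages):
        if previous is not None and averages[hop] - previous > threshold_ms:
            return hop
        previous = averages[hop]
    return None


averages = hop_averages(SATURATED_TRACE)
print("Latency first jumps at hop:", first_jump(averages))
```

Comparing each hop with the one before it, rather than against a fixed number, keeps long but healthy routes from being flagged just because they are far away.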

The example clearly shows something is going wrong at the 5th hop. We don’t know what, but there is definitely something out of order there. The next section will cover possible solutions, but before going there I want to give you one more example of what saturation might look like. Using the above trace:

1 router.skywaywest.net (216.251.128.254) 2.310 ms 2.520 ms 2.724 ms
2 216.251.132.60 (216.251.132.60) 0.191 ms 0.184 ms 0.170 ms
3 v525.core1.yvr1.he.net (216.218.185.185) 0.265 ms 0.436 ms 0.422 ms
4 10gigabitethernet4-1.core1.sea1.he.net (184.105.222.1) 4.577 ms 5.006 ms 5.447 ms
5 *   *   * Destination host unreachable.
6 ae-1.gw-dista-a.ga.mkc.us.oneandone.net (74.208.6.121) 712.272 ms 843.558 ms 719.712 ms
7 perfora.net (216.250.121.107) 1086 ms 985 ms 1006 ms
Trace complete.

Look at line 5 (Destination host unreachable). What does this mean? Line 4 gave a response, as did line 6. Simple. Looking at the latency between line 4 (about 5 ms) and line 6 (about 800 ms), we can, with relative accuracy, discern that the 5th hop did not respond because it was too busy to respond.
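The same reasoning (a silent hop sitting between a fast neighbour and a slow one) can be written down as a small check. This Python is a sketch under my own assumptions: the function name silent_hop_suspects is hypothetical, silent hops are represented as None, and the 100 ms threshold is a guess rather than a standard.

```python
def silent_hop_suspects(averages, threshold_ms=100.0):
    """Return the hop numbers (starting at 1) of silent hops that sit
    between a fast replying hop and a much slower replying hop.

    averages -- list of average latencies in ms, one entry per hop,
                with None for a hop that printed "* * *".
    """
    suspects = []
    for index, value in enumerate(averages):
        if value is not None:
            continue
        # Nearest replying hop before and after the silent one.
        before = next((v for v in reversed(averages[:index]) if v is not None), None)
        after = next((v for v in averages[index + 1:] if v is not None), None)
        if before is not None and after is not None and after - before > threshold_ms:
            suspects.append(index + 1)
    return suspects


# Rounded averages from the trace above; hop 5 did not answer.
trace = [2.5, 0.2, 0.4, 5.0, None, 758.5, 1025.7]
print("Silent hops worth a closer look:", silent_hop_suspects(trace))
```

A silent hop with similar latency on both sides is not flagged, since nothing in the neighbouring numbers suggests it is struggling.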

Saturation Solutions

So what can we do if there appears to be saturation? Unfortunately, not a lot, especially if the problem lies far from you on the route. As you can see from our traceroute example above, the latency starts to spike around the 5th hop. This hop is not directly connected to either the source or the target end of the connection; it sits somewhere in the middle. Even if you could call and talk to the support department responsible for that router, it is unlikely you would get any real results from them. The good news is that these events are usually short-lived and you should be able to resume regular communications with your target fairly soon.

If, however, the latency builds significantly on either the first or the last one to three hops, there is something you can do. If the jump is in the first three hops, call or email Skyway Technical Support (support@skywaywest.com / 604-482-1212) and we can investigate why our provider, or the immediate upstream provider you are passing through, shows that much latency. If the latency jumps within the last few hops you could, if you have a means to reach them, communicate your findings to your target site’s support people, who could do the same from their end. In both cases we or they would want to see a copy of your traceroute information for comparison purposes.
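The rule of thumb above boils down to a simple decision based on where in the route the jump appears. The Python below is a sketch only: the function name who_to_contact and the cut-offs (first three hops, last two hops) are my own assumptions, chosen so that the 5th hop of our 7-hop example counts as “the middle”.

```python
def who_to_contact(jump_hop, total_hops, near_hops=3, far_hops=2):
    """Suggest who to approach, based on the hop where latency jumps.

    near_hops and far_hops are assumed cut-offs for "close to you" and
    "close to the target"; adjust them to suit the length of your route.
    """
    if jump_hop <= near_hops:
        return "your Internet provider's support"
    if jump_hop > total_hops - far_hops:
        return "the target site's support"
    return "nobody yet; wait it out"


# In the saturated example the jump was at hop 5 of 7.
print(who_to_contact(5, 7))
```

Whoever you contact, send along a copy of the traceroute output itself so they can compare it with what they see from their end.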

What Skyway West’s Engineers can do to Solve the Problem

One of the great things about Skyway West’s networking is that we have access to multiple upstream providers, so we are not locked into one path to get you where you need to go. Though our routing strategy is very good at getting you to an end point as fast as possible, there may still be a better way through a different upstream. If, after investigating the issue, it turns out that there is a better route, we can try to temporarily force your traffic out that way instead.

Over the course of this post we have explored Internet traffic congestion, covering how to identify it and what can be done to get around it. While the latency caused by saturation can be annoying, the effects don’t usually last too long. If you have the time to wait it out, this is often the best course of action. If you are on a more critical timeline, certainly give Skyway West Support a call. We’ll do what we can to assist you. And get ready for next month, when I wind up this trio of entries by closing off with how packet loss affects congestion.

Got a question or an idea for a topic you would like to see covered in one of my upcoming blog posts? Write to support@skywaywest.com and sound off. I’ll do what I can to address your questions or concerns either personally in a reply email or on the blog. Until next month, take care.

–Wes