View Full Version : Network Guy's - WAN Capacity Planning question


jmlivingston
02-17-2010, 9:41 AM
This is a question for you hard-core network guys that have done a lot of enterprise WAN work.

One of the locations on my WAN is starting to really hit their bandwidth limits pretty hard. Our 95th Percentile for transmit usage has grown to 89% of existing capacity and I'm starting to receive complaints about performance from our employees.

1st Question: How much headroom should I keep between the 95th Percentile and the max capacity of any given WAN circuit?

2nd Question: Where did you get that metric, and is it documented anywhere?

With the current economic environment I'll need to fully justify any requests to upgrade this circuit.

Thanks,
John

bigmike82
02-17-2010, 9:50 AM
I've never done actual metrics on it, so I can't help you there.

Have you checked on what kind of traffic they're passing? Is it all legit, or are the employees clogging the tubes with youtube and such? Have you looked into using QoS to prioritize necessary traffic (email, for example)?

Before wanting to upgrade the circuit, I'd do everything possible to cheaply address the issue. Management will thank you for it. ;)

jmlivingston
02-17-2010, 9:55 AM
Yes, all traffic is legit and we are already using QoS on our WAN. We also have WAN optimization appliances (Riverbed) at all of our locations.

Additional information: This is a fully-meshed MPLS network and each site has its own internet connection so there's no internet traffic across the WAN.

Mute
02-17-2010, 10:38 AM
Have you run your data through a bandwidth calculator yet?

lazyworm
02-17-2010, 11:13 AM
I'm not sure there are any hard and fast rules, except that as you get closer to 100%
you incur more overhead, since there are more timeouts, drops,
resends, etc. If these were my sites, this is what I'd do....

look for all available historical data and use it to estimate usage growth
over time. Project this out for X months. Where X should be slightly longer
than your contract for your pipes. OR whatever period that mgmt would
be comfortable with spending money on. Typically 1 - 2 years.

keep track of the trend over time and upgrade when usage is 4 months
out from getting maxed (3 months for provisioning and 1 month for
paperwork/padding), where maxed is a few % from the normal peak
of the usage cycle.
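
A rough sketch of that projection, assuming a simple straight-line fit over monthly 95th-percentile figures. Only the 89% figure comes from this thread; the other three samples, the 95% ceiling, and the function names are made up for illustration, and real growth may well not be linear.

```python
# Hypothetical monthly 95th-percentile utilization, in % of circuit capacity.
# Only the last value (89) is from the thread; the rest are invented.
history = [71.0, 77.0, 83.0, 89.0]


def linear_fit(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...
    Returns (slope, intercept). Needs at least two samples."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until(ys, ceiling_pct):
    """Months from the latest sample until the fitted trend reaches
    ceiling_pct. Returns None if the trend is flat or falling."""
    slope, intercept = linear_fit(ys)
    if slope <= 0:
        return None
    month_at_ceiling = (ceiling_pct - intercept) / slope
    return max(0.0, month_at_ceiling - (len(ys) - 1))


def should_order_now(ys, ceiling_pct=95.0, lead_time_months=4.0):
    """True when the projected ceiling date falls inside the lead time
    (3 months provisioning + 1 month paperwork in the post above)."""
    remaining = months_until(ys, ceiling_pct)
    return remaining is not None and remaining <= lead_time_months


print(months_until(history, 95.0), should_order_now(history))
```

With these invented numbers the trend reaches the ceiling about one month out, which is well inside a 4 month lead time.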

If your users are complaining. You're already out. It would only get worse
as you wait for new circuits to get dropped in and mgmt to say okay.

lazyworm
02-17-2010, 11:16 AM
forgot to add... each site only has 1 link into the mesh?
If you have duals, then the % should be around 45%.
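
The arithmetic behind a figure like 45% can be sketched as follows, assuming equal-sized links that share load evenly and a 90% ceiling on the surviving link after a failure. The 90% ceiling and the function name are assumptions picked to reproduce the number, not something stated in the post.

```python
def per_link_target(links, post_failover_ceiling_pct=90.0):
    """Highest per-link utilization (%) such that losing one of `links`
    equal, evenly loaded links leaves the survivors at or below the ceiling."""
    if links < 2:
        # A single link has nothing to fail over to.
        return post_failover_ceiling_pct
    return post_failover_ceiling_pct * (links - 1) / links


print(per_link_target(2))  # two links: each should stay near half the ceiling
```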

jmlivingston
02-17-2010, 12:55 PM
look for all available historical data and use it to estimate usage growth
over time. Project this out for X months. Where X should be slightly longer
than your contract for your pipes. OR whatever period that mgmt would
be comfortable with spending money on. Typically 1 - 2 years.


Unfortunately we only have usage data for the last 4 months, prior to that we didn't have any tools that could monitor and capture the usage trends.


If your users are complaining. You're already out. It would only get worse
as you wait for new circuits to get dropped in and mgmt to say okay.


Yup, and this is exactly where I'm at today. :(

forgot to add... each site only has 1 link into the mesh?
If you have duals, then the % should be around 45%.

There are two separate MPLS meshes, each mesh using a different carrier. Our "primary" mesh handles data traffic and the "secondary" handles VoIP traffic. Each has the ability to fail over to the other in case of an outage, but bandwidth on the secondary network is minimal compared to what we do on the primary network.

DiscoBayJoe
02-17-2010, 1:08 PM
There is no easy answer. It depends on your line of business and how time sensitive your applications are.

Quick rule of thumb -- Watch your interface Errors & Discards counters (I use SolarWinds Orion):

> If you are seeing Transmit Discards, upgrade the Circuit
> If you are seeing Receive Discards, upgrade the Device

A few here and there during peak are OK, but if they happen all day long, you'll benefit from the upgrade.
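
One way to turn that rule of thumb into a check, assuming you can export per-polling-interval discard deltas from whatever monitoring tool you use. The input format, the function name, and the 50% "all day long" threshold are hypothetical, not anything SolarWinds Orion provides.

```python
def triage_discards(intervals, sustained_fraction=0.5):
    """intervals: list of (tx_discards, rx_discards) deltas, one pair per
    polling interval. Returns advice following the rule of thumb above.
    sustained_fraction is an assumed cutoff for 'all day long'."""
    if not intervals:
        return []
    n = len(intervals)
    tx_busy = sum(1 for tx, _ in intervals if tx > 0) / n
    rx_busy = sum(1 for _, rx in intervals if rx > 0) / n
    advice = []
    if tx_busy >= sustained_fraction:
        advice.append("upgrade circuit")  # sustained transmit discards
    if rx_busy >= sustained_fraction:
        advice.append("upgrade device")   # sustained receive discards
    return advice


# Made-up sample: transmit discards in 3 of 4 intervals, no receive discards.
print(triage_discards([(5, 0), (3, 0), (0, 0), (9, 0)]))
```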

DiscoBayJoe
02-17-2010, 1:13 PM
There are two separate MPLS meshes, each mesh using a different carrier. Our "primary" mesh handles data traffic and the "secondary" handles VoIP traffic. Each has the ability to fail over to the other in case of an outage, but bandwidth on the secondary network is minimal compared to what we do on the primary network.

You'd be much better off binding the two pipes into one larger pipe and implementing a well thought out QoS plan. Based on the rule of peaks and averages, 1 + 1 really does equal 3 :)

odysseus
02-17-2010, 1:13 PM
Honestly, outside of QoS, application management (if you can find a glaring inefficiency which often can be seen), and other data management tools like acceleration, most have to just subscribe for more bandwidth. Some MPLS implementations I have seen can burst or can scale bandwidth (though these are expensive contracts) as needed by the provider.

As far as a figure, I have seen it vary depending on what has been standardized and agreed on as an organization, from 80% to "hey what's wrong I can't get this to work?". Back in the day, the industry gold standard hovered somewhere around 80% utilization as your metric for normalized max util, but in this economic environment...

lazyworm
02-17-2010, 2:11 PM
If I were in your shoes, I'd get mgmt buy off today before you pull any rabbit
out of the hat.

technically, sounds like routing some traffic over the other mesh is your best
bet. Other than that maybe route some traffic out the internet link via a vpn back to corporate?

Does this have to be a network only solution? If you have support from
the whole IT, can you move the services closer to the end user so the
bits don't have to go across the wan? Can you cache at the user's end?
Can you reschedule certain traffic off peak? e.g. backup, batch-type work?

lazyworm
02-17-2010, 2:12 PM
Unfortunately we only have usage data for the last 4 months, prior to that we didn't have any tools that could monitor and capture the usage trends.



Have you checked with your service provider as well? They might have some for billing purpose.

odysseus
02-17-2010, 2:19 PM
Does this have to be a network only solution? If you have support from
the whole IT, can you move the services closer to the end user so the
bits don't have to go across the wan? Can you cache at the user's end?
Can you reschedule certain traffic off peak? e.g. backup, batch-type work?

Absolutely the first hit. An assumption is that effort has been made to look for any inefficiencies that can cut down a lot of util if worked on. However also true is the classic issue of who owns it, fixes it, and pays for it out of their cost center. It can become a cage fight. :D But a traffic analysis should point the way pretty easily.

five.five-six
02-17-2010, 2:24 PM
But a traffic analysis should point the way pretty easily.


and you will find who is watching YouTube all day. I was taught to engineer a network to run at 50%... but that was a long time ago

lazyworm
02-17-2010, 3:08 PM
Absolutely the first hit. An assumption is that effort has been made to look for any inefficiencies that can cut down a lot of util if worked on. However also true is the classic issue of who owns it, fixes it, and pays for it out of their cost center. It can become a cage fight. :D But a traffic analysis should point the way pretty easily.

Politics never help in a technical situation. If everybody believes they're
equally f'd, buy-in usually happens much faster :D

edit: the users who complained... put them to good use, they're already on your side :)

6172crew
02-17-2010, 10:33 PM
How many T1s are you using now for this one? Could you use a VDSL line and save some $$ or do you need a DS3?

I am assuming you are talking about the pipe between 2 points.

I know the telco companies like to sell in blocks of T1s which are ADSL, but if you could get a VDSL line at 16mb for $100 a month... I don't think they sell a block of IPs with those, but attaching a DLCI or multiple IPs to VDSL might take care of your issue on the cheap.

On a side note, is there one group that is using a service more than another? Could you add another T1 for the bandwidth hogs and create a new VLAN so that they use that T1 for X,Y but c and z is for trucks?

If they offer u-verse in that area then VDSL is available, only issue I can see is that they may not want to sell the addresses needed to get it done.

I do know ATT stopped bonding dsl lines by attaching email addresses to them so they won't bond correctly.

Then again I might be talking about something completely different here. :o

nick
02-17-2010, 11:05 PM
This is a question for you hard-core network guys that have done a lot of enterprise WAN work.

One of the locations on my WAN is starting to really hit their bandwidth limits pretty hard. Our 95th Percentile for transmit usage has grown to 89% of existing capacity and I'm starting to receive complaints about performance from our employees.

1st Question: How much headroom should I keep between the 95th Percentile and the max capacity of any given WAN circuit?

2nd Question: Where did you get that metric, and is it documented anywhere?

With the current economic environment I'll need to fully justify any requests to upgrade this circuit.

Thanks,
John


1. I start looking at an upgrade when I see 50% average usage. Assuming a clean line, that's when you start experiencing delays, and start really paying attention to QoS.

The 95th percentile isn't as good at indicating sustained bandwidth usage, but yours sitting at 89% of capacity does show that at the very least your traffic peaks a lot.
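
To see how far apart the two numbers can be, here is a small sketch that computes both from the same samples. It assumes the nearest-rank method for the percentile (carriers and monitoring tools may calculate it differently), and the sample values and the 10 Mbps circuit size are invented.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile: the smallest sample such that at
    least 95% of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def mean(samples):
    return sum(samples) / len(samples)


# Hypothetical 5-minute transmit samples in Mbps on a hypothetical 10 Mbps
# circuit: mostly quiet, with bursts in a fifth of the intervals.
samples = [2.0] * 16 + [9.0] * 4

print(mean(samples), percentile_95(samples))
```

For this bursty sample the average is about a third of the circuit while the 95th percentile is at 90% of it, so the two figures answer different questions.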

Do you have any other ways to measure your traffic than what you get from your ISP (I presume that's where you get your 95th percentile from)? If you control the end devices, it's not hard to do.

You have to know your actual traffic pattern before deciding on upgrading the link, as you need to identify the actual problems first. For example, if some users experience delays and it's a problem, you might just need to play with QoS (I presume you already have some QoS, given that you're considering upgrading your link).

2. I believe I saw Cisco and Juniper docs with updated best practices on bandwidth utilization, which include their best practices on when to upgrade the link. They'll push WAN optimization, of course, as there's more money in it for them, but they have the info on when to upgrade the link, too. Too lazy to search right now, I've had my fill of IT for the day. That's why I'm on a gun forum right now, you know :)

However, best practices would work if you need to justify this to the IT side. If you need to justify the need for upgrade to the business side, you need to do it in terms of ROI and compliance ROI/risk assessment.

To get there, do you have the traffic history data for that link? If yes, take the data, show the pattern of growth, and use it to create the projected increase in utilization. Then create the cost comparison between using the link as is, upgrading it "gradually", going with a considerably larger link, potentially with a different technology (the per-Mbps cost there would be lower, of course), and the cost of doing traffic shaping. Include the potential benefits and pitfalls.

Then supplement it with the ROI analysis on the cost of delays (effective user downtime (when the users aren't technically down, but they're waiting on something), projects that can't be done because of the slow link, support costs, including the support calls to respond to user complaints about slowness and support calls resulting from something not synchronizing on time, etc.) vs. the cost of the new link/whatever solution you come up with (as upgrading the link isn't the only solution). Of course, you present this paragraph first, before presenting the ROI analysis for which technology to pick :)
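
A back-of-the-envelope version of that comparison might look like the sketch below. Every number in it (user count, minutes lost, hourly cost, circuit pricing) is a placeholder to be replaced with your own figures, and the function names are made up; it only covers the "waiting users" part of the cost of delay.

```python
def monthly_delay_cost(users, minutes_lost_per_user_per_day, hourly_cost,
                       workdays=21):
    """Cost of 'effective downtime': people waiting on a slow link."""
    hours_lost_per_day = users * minutes_lost_per_user_per_day / 60
    return hours_lost_per_day * hourly_cost * workdays


def payback_months(one_time_cost, monthly_cost_increase, monthly_savings):
    """Months for the upgrade to pay for itself, or None if it never does."""
    net = monthly_savings - monthly_cost_increase
    if net <= 0:
        return None
    return one_time_cost / net


# Placeholder inputs: 50 users each losing 6 minutes a day at $40/hour,
# $3000 install fee, $1200/month more for the bigger circuit.
savings = monthly_delay_cost(users=50, minutes_lost_per_user_per_day=6,
                             hourly_cost=40)
print(savings, payback_months(3000, 1200, savings))
```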

Then, if there're compliance/security/disaster recovery issues, do the risk assessment. For example, what if the data doesn't get synchronized? What if your IDS doesn't get adequate bandwidth to monitor the network behind that WAN link? What if your directory doesn't get synched? And so on.

This is just a generic description, of course, as I don't have the slightest idea of your environment. If you have more questions, PM me.

nick
02-17-2010, 11:08 PM
Ah, I see there's more info in the thread already. One day I'll read the entire thread before posting, but that day isn't today. Not yet.

artherd
02-18-2010, 6:42 PM
so much is dependent on your business and usage patterns.

Echo the bonding and QoS for voip, times a thousand!!!

jmlivingston
03-01-2010, 10:19 AM
Just to follow-up on this...

I received permission to upgrade their internet to a 10 Mb MetroE circuit from Cogent. Once this is installed, I can move their Snapmirror replication traffic off their WAN and push it through a VPN back to HQ. This should relieve their WAN congestion plus enhance their internet service (currently their internet is 3Mb).

John