
January 13, 2004
The basic problem we’d like to solve is to be able to tell customers how
many machines they need to set up. We can simplify this to calculating how many
users can go on a single machine and then dividing to get hardware requirements.
There are two limiting factors in calculating this “users per box”
number: response time and throughput. We’ll need to know both before we
can make any predictions about how much hardware our predicted load will require.
Response time refers to how long a particular user will have to wait for a
request to come back. This is obviously somewhat random, as some requests will
take longer than others due to database contention, network congestion, thread
scheduling, and a billion other factors. Since it’s random, it will not
in general be possible to guarantee that 100% of requests will be below a certain
time – one in a million of them will take a really long time. Instead,
response time requirements should be described probabilistically, as in, “Ninety-five
percent of requests should complete in less than four seconds.”
If we assume that the distribution of response times for a given load on the
system is roughly Gaussian (i.e. a bell curve) then it is useful to describe
this in terms of standard deviations from the mean, rather than in terms of
percent. You may have heard the term sigma used in this context; sigma (σ) is
the Greek character used to represent one standard deviation.
Two sigmas represent roughly 95% of the distribution. In other words, we can
expect that about 95% of all values fall within two standard deviations of the
average. At a distance of 3σ, the figure rises to about 99.7%, making both of
these numbers useful metrics of response time characteristics.
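As a rough sketch of how these figures could be derived from raw measurements, the Python below computes the mean and the two- and three-sigma thresholds for a list of response times, then checks what share of the samples actually falls at or under the two-sigma threshold. The sample values and function names are invented for illustration, and the code relies on the same Gaussian assumption made above; real response times are often skewed, so comparing against the empirical share is a useful sanity check.

```python
import statistics


def sigma_thresholds(samples):
    """Return (mean, mean + 2*sigma, mean + 3*sigma) for response times in seconds."""
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation
    return mean, mean + 2 * sigma, mean + 3 * sigma


def fraction_within(samples, limit):
    """Empirical share of requests that completed in `limit` seconds or less."""
    return sum(1 for s in samples if s <= limit) / len(samples)


# Hypothetical response times in seconds, for illustration only.
samples = [2.1, 2.4, 2.6, 2.8, 3.0, 3.0, 3.2, 3.4, 3.6, 3.9]

mean, two_sigma, three_sigma = sigma_thresholds(samples)
print(f"mean:           {mean:.2f} s")
print(f"mean + 2 sigma: {two_sigma:.2f} s")
print(f"mean + 3 sigma: {three_sigma:.2f} s")
print(f"share at or below mean + 2 sigma: {fraction_within(samples, two_sigma):.0%}")
```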
The other metric we care about when analyzing a system is throughput. This
is simply the number of requests the system can handle per unit time, across
all users. Throughput is a function of load – as more and more concurrent
demands are made on a system, it will initially be able to keep up, but will
eventually bog down switching back and forth between tasks. A typical throughput
curve (smoothed to emphasize the effect) is shown in Figure 1.

Figure 1 - Typical throughput curve
So far, so good: deriving these numbers is easy enough. In fact,
most load testing frameworks will do exactly this. A simplified excerpt of a
typical report is shown below in Figure 2.
Note that I haven’t labeled the horizontal axis. This particular data
was generated using a custom load-testing framework that spins up several threads,
with each thread doing as much work as it can before the test ends. These threads
do not represent users – each thread likely represents work equivalent
to many, many users of the real system. Do not make the mistake of confusing
the number of threads that corresponds with peak throughput to the number of
users that the system can support. The important thing is just to find how the
system behaves as you increase load.
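To make the shape of such a test concrete, here is a minimal sketch of a thread-based harness in Python. It is not the framework used to produce the data in this article; the function names, the stand-in `fake_request`, and the thread counts are all assumptions chosen for illustration. Each thread issues requests back to back until the deadline, and the harness reports overall throughput alongside the recorded response times.

```python
import threading
import time


def run_load_test(request_fn, n_threads, duration_s):
    """Call request_fn in a tight loop on n_threads threads for duration_s seconds.

    Returns (throughput in requests per second, list of response times in seconds).
    """
    latencies = []
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker():
        local = []  # per-thread buffer, merged once at the end
        while time.monotonic() < deadline:
            start = time.monotonic()
            request_fn()
            local.append(time.monotonic() - start)
        with lock:
            latencies.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    started = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - started
    return len(latencies) / elapsed, latencies


def fake_request():
    """Hypothetical stand-in for a real request against the system under test."""
    time.sleep(0.01)


# Step up the load and record how throughput and response time change.
results = []
for n_threads in (1, 2, 4):
    throughput, latencies = run_load_test(fake_request, n_threads, 0.2)
    results.append((n_threads, throughput, latencies))
    print(f"{n_threads} threads: {throughput:.1f} req/s, "
          f"slowest response {max(latencies) * 1000:.1f} ms")
```

Because the stand-in request only sleeps, this sketch will not show throughput bogging down; pointing `request_fn` at a real system is what reveals the curve.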

Figure 2 - Throughput and Response Time
So how does one translate these numbers to an actual number of users per machine?
Fortunately, to a first order of approximation, it’s really quite easy.
A few simple definitions will help.
The capacity of the system is how much load it can handle, expressed in requests
per second. This we can simply look up – it’s the throughput listed
in Figure 2, for example 1.57 requests per second.
The service level is the maximum acceptable response time at some particular
level of confidence. For example, we might say, “Requests must be serviced
within 10 seconds in 99% of cases.” We might use sigmas instead of percentages;
as discussed previously, the two are equivalent under the Gaussian assumption.
This number will be supplied by the customer.
The demand on the system is how much load we expect the users to incur, expressed
in requests per second. This number we get by multiplying the number of users
times their individual rate of requests. For example, if there are 100 users
each making a request every 10 seconds, the demand on the system across all
users is 10 requests per second.
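The demand calculation is simple enough to write down directly. The function and parameter names below are illustrative only.

```python
def demand_rps(users, seconds_between_requests):
    """Total demand in requests per second across all users.

    Each user is assumed to make one request every
    `seconds_between_requests` seconds on average.
    """
    return users / seconds_between_requests


# The example from the text: 100 users, one request every 10 seconds each.
print(demand_rps(100, 10))
```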
Determining how many machines are needed is simply a matter of ensuring that
capacity can meet demand at a given service level. Here’s the process:
- Gather service level requirements.
- Use Figure 2 (or equivalent for the system in question) to find the highest
throughput that still meets the requirements from step 1.
- Gather demand requirements.
- The throughput from step 2 is the capacity for an individual machine. Divide
the demand from step 3 by this number and round up to get the number of machines
needed.
- Add some fudge factors – this isn’t an exact science.
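The steps above can be sketched in Python as follows. The measurement rows are invented stand-ins for a report like Figure 2 (they are not the article's data), and the function names and the sample demand are assumptions made for illustration.

```python
import math

# Hypothetical load-test results: (throughput in req/s, 95% response time in s).
# These rows are made up for illustration and do not reproduce Figure 2.
MEASUREMENTS = [
    (0.90, 3.1),
    (1.30, 5.2),
    (1.64, 7.9),
    (1.71, 11.5),
    (1.57, 18.0),
]


def capacity_at_service_level(measurements, max_response_s):
    """Step 2: highest measured throughput whose response time meets the limit."""
    acceptable = [tp for tp, resp in measurements if resp <= max_response_s]
    if not acceptable:
        # No load level met the requirement, so adding machines cannot help.
        raise ValueError("no measured load level meets the service level")
    return max(acceptable)


def machines_needed(demand_rps, capacity_rps):
    """Step 4: divide demand by per-machine capacity and round up."""
    return math.ceil(demand_rps / capacity_rps)


capacity = capacity_at_service_level(MEASUREMENTS, max_response_s=8.0)  # steps 1-2
demand = 500 / 60  # step 3: a hypothetical 500 users, one request per minute each
print(f"capacity per machine: {capacity} req/s")
print(f"machines needed before fudge factors: {machines_needed(demand, capacity)}")
```

Step 5, the fudge factor, is deliberately left out of the functions; how much margin to add is a judgment call rather than a formula.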
An example might help:
- Requests must be processed within 8 seconds 95% of the time.
- Based on Figure 2, the throughput a single box can handle while still maintaining
an 8 second response time at 95% confidence (two sigma) is about 1.64 requests
per second. I’ve highlighted the appropriate line.
- The demand we expect is based on the fact that the customer has 100 users
that they expect to make a request (on average) every two minutes, for a total
rate of 0.83 requests per second.
- Since the demand is less than the capacity for a single machine, we only
need one machine!
- There’s enough of a difference between demand and capacity that we’re
probably okay, but adding a second machine might be necessary if the users
were making requests every one minute, as demand would now be 1.67 requests
per second.
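The arithmetic in this example can be checked with a few lines of Python. The sketch below also reports utilization (demand divided by single-machine capacity) as one rough way to see how much headroom is left; the function name and the utilization measure are additions for illustration, not part of the process described above.

```python
import math

CAPACITY_RPS = 1.64  # single-machine capacity from the example
USERS = 100


def size_for_interval(users, interval_s, capacity_rps):
    """Return (demand in req/s, utilization of one machine, machines needed)."""
    demand = users / interval_s
    utilization = demand / capacity_rps
    return demand, utilization, math.ceil(utilization)


# Compare one request every two minutes with one request every minute.
for interval_s in (120, 60):
    demand, utilization, machines = size_for_interval(USERS, interval_s, CAPACITY_RPS)
    print(f"one request every {interval_s} s: demand {demand:.2f} req/s, "
          f"{utilization:.0%} of one machine, {machines} machine(s)")
```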
The process for calculating how many users can fit on a single machine is
similar.
- Gather service level requirements.
- Use Figure 2 (or equivalent for the system in question) to find the highest
throughput that still meets the requirements from step 1.
- Gather the average request rate for a single user.
- The throughput from step 2 is the capacity for an individual machine in
requests per second. Divide the throughput from step 2 by the per-user request
rate from step 3. This is the number of users a single box can support.
- Add some fudge factors – this isn’t an exact science.
Again, an example might be illustrative.
- Requests must be processed within 8 seconds 95% of the time.
- Based on Figure 2, the throughput a single box can handle while still maintaining
an 8 second response time at 95% confidence (two sigma) is about 1.64 requests
per second. I’ve highlighted the appropriate line.
- The customer expects users to average one request per five minutes, for
a rate of 0.0033 requests per second.
- Given these numbers, each box can support 492 users.
- Tell the customer some number lower than 492.
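The same calculation as a small Python sketch. The `safety_factor` parameter and its value of 0.8 are assumptions standing in for the fudge factor; the text does not prescribe a particular margin. Exact fractions are used so that the result is not nudged just below a whole number by floating-point rounding before it is rounded down.

```python
import math
from fractions import Fraction


def users_per_box(capacity_rps, seconds_between_requests, safety_factor=1):
    """Users one machine can support, rounded down to a whole user.

    Dividing capacity by a per-user rate of 1/seconds_between_requests is the
    same as multiplying capacity by seconds_between_requests.
    """
    capacity = Fraction(str(capacity_rps))
    interval = Fraction(str(seconds_between_requests))
    margin = Fraction(str(safety_factor))
    return math.floor(capacity * interval * margin)


# The example from the text: 1.64 req/s capacity, one request per five minutes.
print(users_per_box(1.64, 300))
# With a hypothetical 20% safety margin applied before quoting a number.
print(users_per_box(1.64, 300, safety_factor=0.8))
```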
A few observations:
- If the required response time is lower than the minimum the system can achieve,
no number of machines will meet the requirements.
- We’ve assumed that doubling the number of machines doubles throughput.
Because database contention is a major factor in scalability, this will almost
certainly not be true.
- We’ve assumed that adding a second machine has no impact on response
time. Because database contention is a major factor in response time, this
will almost certainly not be true.
There are other factors as well that mean that reality and this model will
diverge. However, having at least some theoretical basis for estimation may
be helpful. If nothing else, it puts the onus on the client to provide numbers
for demand and service level, which makes them an active participant in the
capacity planning process.
Authors
Craig Andera has a Master’s degree in Electrical Engineering from MIT. After a
brief stint in the electronic entertainment industry, Craig built his COM
knowledge through a combination of DevelopMentor classes (then as a student)
and his troubleshooting efforts in the finance industry.
Craig is currently focused on the .NET platform, which he has been exploring
full-time since February 2001. He has a particular interest in the design
and implementation of secure, scalable systems.
Craig is currently a consultant at Wangdera Corporation.