November 4, 2014

Concurrency, Transactions per Second and Capacity Planning for Application Design

Filed under: Uncategorized — mifan @ 1:20 pm

The ability to determine or forecast the capacity of a system or set of components, commonly known as ‘sizing’ a system, is an important activity in enterprise system design and Solution Architecture. Oversizing a system leads to excess hardware costs, whilst undersizing might lead to reduced performance and the inability of a system to serve its intended purpose.

The difficulty, of course, is forecasting the capacity to a fair degree of accuracy. Capacity Planning in enterprise systems design is an art as much as it is a science. Along with measurable parameters, it also involves experience, knowledge of the domain itself and inside knowledge of the system. In some instances, it goes as far as analysing the psychology of the system’s expected users, their usage patterns etc.

This post deals with some parameters of capacity planning, with a focus on how a few factors such as concurrency and Transactions per Second (TPS) play a role in it. Of course, many other factors could determine the capacity of a system, including the complexity of transactions, latency and external service calls, memory allocation and utilisation etc.

Throughput and TPS
Throughput is a measure of the number of actions per unit time, where time can be in seconds, minutes, hours etc. Transactions per Second (TPS) is the number of atomic actions, in this case ‘transactions’, per second. For a stateless server, this will be the major characteristic that affects the server capacity.

Theoretically speaking, if a user performs 60 transactions in a minute, then the TPS would be 60/60 TPS = 1 TPS. Of course, not all concurrent users who are logged into a system will necessarily be using it at any given moment. Additionally, think time (the pause between a user’s successive actions) and pace time (the interval between successive iterations of a transaction) come into consideration.
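The arithmetic above can be written down as a tiny helper. This is only a sketch; the function name `transactions_per_second` is made up for illustration, and it computes an average over the observation window rather than a peak.

```python
def transactions_per_second(transaction_count: int, window_seconds: float) -> float:
    """Average TPS over an observation window (illustrative helper)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return transaction_count / window_seconds


# One user performing 60 transactions in one minute, as in the example above.
single_user_tps = transactions_per_second(60, 60)
print(single_user_tps)  # 1.0

# The same throughput expressed per hour.
print(single_user_tps * 3600)  # 3600.0
```

Because this is an average, a system that is idle for most of the window and busy for a few seconds will show a much lower figure than its real peak, which is why peak windows are usually measured separately.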

Concurrent users are the users accessing the system at any given time. In capacity planning, this has several meanings and implications. In an application server with a stateful application which handles sessions, the number of concurrent users will play a bigger role than in an ESB which handles stateless access, for instance. However, the number of concurrent users doesn’t necessarily translate to an equal load on the system. If 200 concurrent users are logged into the system, and each has a 10 second think time, then that amounts to roughly 20 requests per second actually hitting the system (200 / 10), not 200. For stateful systems that hold a session per user, each concurrent user also consumes some memory, which needs to be taken into account.
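The think-time example can be sketched with Little’s Law for a closed system, where throughput is the number of users divided by the time each user takes per cycle (think time plus response time). The function names and the 0.5 second response time below are assumptions for illustration only; they do not come from any measured system.

```python
def offered_load_tps(concurrent_users: int, think_time_s: float,
                     response_time_s: float = 0.0) -> float:
    """Little's Law for a closed system: X = N / (Z + R)."""
    cycle_time_s = think_time_s + response_time_s
    if cycle_time_s <= 0:
        raise ValueError("think time plus response time must be positive")
    return concurrent_users / cycle_time_s


def requests_in_flight(tps: float, response_time_s: float) -> float:
    """Average number of requests being processed at once: N = X * R."""
    return tps * response_time_s


# 200 logged-in users with a 10 second think time, response time ignored.
print(offered_load_tps(200, 10))  # 20.0

# Same users, with an ASSUMED (hypothetical) 0.5 second response time.
tps = offered_load_tps(200, 10, response_time_s=0.5)
print(round(tps, 2), round(requests_in_flight(tps, 0.5), 2))
```

Including the response time lowers the offered load slightly, and it shows that only a handful of the 200 users have a request in flight at any instant.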

Application design and optimisation
The design of the application or software plays a big role in capacity planning. If a session is created for each concurrent user, then this means some level of memory consumption per session. For each operation, factors such as open database connections, the number of application ‘objects’ stored in memory, the amount of processing that takes place etc. determine the amount of memory and processing capacity required. Well designed applications will strive to keep these numbers low or would ‘share’ resources effectively. The resources that are configured for an application also play a role. For instance, for database intensive operations, the database connection pool size would be a limiting factor. Similarly, thread pool sizes, garbage collection times etc. also determine the performance of a system. Profiling and load testing an application with tools would help determine the bottlenecks of an application.
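Two of the points above lend themselves to rough back-of-envelope estimates: memory held by sessions, and the throughput ceiling imposed by a connection pool. The sketch below is illustrative only; the helper names and all input figures (512 KB per session, a pool of 50 connections, 50 ms of connection hold time per transaction) are hypothetical and should be replaced with numbers from profiling.

```python
def session_memory_mb(concurrent_sessions: int, kb_per_session: float) -> float:
    """Approximate memory held by user sessions, in MB."""
    return concurrent_sessions * kb_per_session / 1024


def pool_limited_tps(pool_size: int, hold_time_ms: float) -> float:
    """Upper bound on TPS when every transaction holds one pooled
    connection for hold_time_ms in total (Little's Law: X = N / R)."""
    if hold_time_ms <= 0:
        raise ValueError("hold_time_ms must be positive")
    return pool_size * 1000 / hold_time_ms


# HYPOTHETICAL figures: 200 sessions at 512 KB each.
print(session_memory_mb(200, 512))  # 100.0

# HYPOTHETICAL figures: 50 connections, each held 50 ms per transaction.
print(pool_limited_tps(50, 50))  # 1000.0
```

If the required TPS is above the pool-limited figure, adding CPU or memory to the application server will not help until the pool size or the time spent per database call changes.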

Message Size
The size of the message passed across the ‘wire’ is also an important factor in determining the required capacity of a system. Larger messages require more processing power, more memory, or both.

Work done per transaction
Each incoming ‘transaction’ to a server will have some level of operations that it triggers on the same server. If the transaction is a simple ‘pass through’, that would mean relatively less processing than a transaction that triggers a set of further operations. If a certain type of transaction triggers, for example, a series of complex XML based transformations or processing operations, this would add to the processing power or memory required. A sequence diagram of the transaction would help determine the actual operations that are related to a transaction.

Latency is the additional time spent due to the introduction of a system. The non-functional requirements (NFRs) of a system would usually indicate a desired response time for a transaction, which the system must then strive to meet. Considering the example above, if a single transaction performs a number of database calls, or a set of synchronous web service calls, the calling transaction must ‘wait’ for each response. This then adds to the overall response time of that transaction or service call.

Latency is usually calculated via a step by step process: first test response times without the newer systems in place, and then test response times with the newer systems added. Whether the added latency is worth the functionality the newer systems bring is then a tradeoff decision. Techniques like caching can be used to reduce latency.
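The before-and-after comparison described above can be sketched as follows. The sample response times and the 150 ms target are made-up numbers, and using the median is just one reasonable choice; a percentile such as the 95th is often used as well.

```python
from statistics import median


def added_latency_ms(baseline_ms: list[float], with_system_ms: list[float]) -> float:
    """Latency attributed to the new system: the difference between the
    median response time with it in place and the median without it."""
    return median(with_system_ms) - median(baseline_ms)


# HYPOTHETICAL response-time samples, in milliseconds.
baseline = [100, 110, 105, 98, 120]      # without the new system
with_new = [135, 150, 140, 132, 160]     # with the new system in the path

print(added_latency_ms(baseline, with_new))  # 35

# HYPOTHETICAL NFR: median response time must stay at or below 150 ms.
target_ms = 150
print(median(with_new) <= target_ms)  # True
```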

Capacity calculation
The above are just a few factors that can be used for capacity planning of a system, and the importance of each factor varies based on the type of environment. With these factors in place, we also need a set of benchmarked performance numbers to calculate server capacity. For instance, if we know that an Enterprise Service Bus, under certain environmental conditions on hardware of a certain capacity, performs at 3000 TPS, then we can assume that a server of similar capacity performing similar operations would provide roughly the same.

The application design and optimisation parameters should also be taken into account as part of the solution. Techniques like caching can help improve performance and latency, but their benefit needs to be looked at from a broader perspective. If the service responses change often, then caching wouldn’t make much of a difference. The cache warm-up time needs to be taken into account as well.
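One way to reason about whether caching is worth it is the expected response time as a weighted average of cache hits and misses. This is a simplified model that ignores warm-up and invalidation costs; the function name and the latency figures (5 ms from cache, 200 ms from the backend) are assumptions for illustration.

```python
def expected_latency_ms(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """Average latency = hit_ratio * cache latency + miss ratio * backend latency."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backend_ms


# HYPOTHETICAL latencies: 5 ms from cache, 200 ms from the backend service.
for hit_ratio in (0.1, 0.5, 0.9):
    avg = expected_latency_ms(hit_ratio, cache_ms=5, backend_ms=200)
    print(f"hit ratio {hit_ratio:.0%}: about {avg:.1f} ms on average")
```

When responses change often the hit ratio stays low, and the average stays close to the backend latency. During warm-up the hit ratio starts near zero, so the system should be sized to cope with backend-level latency for that period.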

It is advisable to have buffer capacity when allocating server specifications. For instance, allocate 20-30% more capacity than the peak load stated in the NFRs, to ensure the system doesn’t run out of capacity at peak loads.
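Combining a benchmarked per-server figure with a buffer gives a simple server-count estimate. The sketch below assumes capacity scales linearly with the number of servers, which is rarely exactly true. The 5000 TPS peak is a hypothetical requirement; the 3000 TPS figure reuses the benchmark example from earlier in this section.

```python
import math


def servers_required(peak_tps: float, benchmark_tps_per_server: float,
                     headroom: float = 0.25) -> int:
    """Servers needed to carry peak load plus a buffer.

    headroom is a fraction, e.g. 0.25 for 25% extra capacity.
    ASSUMES linear scaling across servers.
    """
    if benchmark_tps_per_server <= 0:
        raise ValueError("benchmark_tps_per_server must be positive")
    target_tps = peak_tps * (1 + headroom)
    return math.ceil(target_tps / benchmark_tps_per_server)


# HYPOTHETICAL peak of 5000 TPS against a 3000 TPS per-server benchmark.
print(servers_required(5000, 3000, headroom=0.0))   # 2
print(servers_required(5000, 3000, headroom=0.25))  # 3
```

In this example the buffer is what pushes the estimate from two servers to three, which is the kind of decision the buffer is meant to make visible.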

Monitoring tools are ideal for measuring a system’s capacity. Load tests, together with application and server profiling via monitoring and profiling tools, can help determine the current capacity fairly accurately and identify bottlenecks in advance.

The type of hardware makes a difference as well. Traditional physical boxes are fast being replaced by VMs and cloud instances. The ideal way to calculate capacity is to have benchmarks on these different environments. A 4GB memory allocation on a VM might not perform the same as a 4GB memory allocation on a physical server or an Amazon EC2 instance. There are also instances that are geared towards certain types of operations. For example, EC2 has memory optimised, compute optimised or I/O optimised instances based on the type of key operation.

As mentioned previously, capacity planning is an art as much as it is a science, and experience plays a huge role in accurate planning of capacity.

