Capacity management is the broad term for the IT monitoring, administration and planning actions taken to ensure that a computing infrastructure has adequate resources to handle current data processing requirements and can accommodate future loads. The primary goal of capacity management is to ensure that IT resources are rightsized to meet current and future business requirements in a cost-effective manner.
In the context of ICT, capacity management isn’t limited to ensuring that organisations have adequate space on servers and associated storage media for data access and Backup and Disaster Recovery (BUDR) purposes. Organisations need to operate with a set of resources that caters to a broad range of business functions, including HR, information processing, and the management of physical office locations and attached facilities. All of these functions can adversely affect an organisation’s information management controls. The use of resources must be monitored and tuned, and projections of future capacity requirements must be made, to ensure the system performance required to meet business objectives.
Capacity management typically looks at three primary types of capacity:
- Data storage capacity (e.g. in database systems and file storage areas).
- Processing power capacity (e.g. adequate computational power to ensure timely processing operations).
- Communications capacity (often referred to as “bandwidth”), to ensure communications are made in a timely manner.
Capacity management also needs to be:
- Proactive, for example by using capacity considerations as part of change management.
- Reactive, for example through triggers and alerts that fire when capacity usage is reaching a critical point, so that timely increases, temporary or permanent, can be made.
The use of resources should be monitored and adjusted in line with current and expected capacity requirements.
To ensure the required capacity of information processing facilities, human resources, offices and other facilities.
ISO 27002 Implementation Guidance
Capacity requirements for information processing facilities, human resources, offices and other facilities should be identified, taking into account the business criticality of the concerned systems and processes. System tuning and monitoring should be applied to ensure and, where necessary, improve the availability and efficiency of systems. The organization should perform stress-tests of systems and services to confirm that sufficient system capacity is available to meet peak performance requirements. Detective controls should be put in place to indicate problems in due time. Projections of future capacity requirements should take account of new business and system requirements and current and projected trends in the organization’s information processing capabilities. Particular attention should be paid to any resources with long procurement lead times or high costs. Therefore, managers, service or product owners should monitor the utilization of key system resources. Managers should use capacity information to identify and avoid potential resource limitations and dependency on key personnel which can present a threat to system security or services and plan appropriate action. Providing sufficient capacity can be achieved by increasing capacity or by reducing demand. The following should be considered to increase capacity:
a) hiring new personnel;
b) obtaining new facilities or space;
c) acquiring more powerful processing systems, memory and storage;
d) making use of cloud computing, which has inherent characteristics that directly address issues of capacity. Cloud computing has elasticity and scalability which enable on-demand rapid expansion and reduction in resources available to particular applications and services.
The following should be considered to reduce demand on the organization’s resources:
a) deletion of obsolete data (disk space);
b) disposal of hardcopy records that have met their retention period (free up shelving space);
c) decommissioning of applications, systems, databases or environments;
d) optimizing batch processes and schedules;
e) optimizing application code or database queries;
f) denying or restricting bandwidth for resource-consuming services if these are not critical (e.g. video streaming).
A documented capacity management plan should be considered for mission critical systems.
The methodologies and processes used for IT capacity management may vary, but all of them require the ability to monitor IT resources closely enough to gather and measure basic performance metrics. With that data in hand, IT managers and administrators can set baselines for operations to meet a company’s processing needs. The baselines, or benchmarks, represent average performance over a specific period of time and can be used to detect deviations from those established levels. Capacity management tools measure the volumes, speeds, latencies and efficiency of the movement of data as it is processed by an organization’s applications. All facets of data’s journey through the IT infrastructure must be monitored, so capacity management must be able to examine the operations of all the hardware and software in an environment and capture critical information about data flow.
Capacity planning is typically based on the results and analysis of the data gathered during capacity management activities. By examining performance variances over time, IT management can develop models of anticipated processing that can be used for short- and long-term planning. By noting which particular resources are being stressed, current configurations can be appropriately revised and IT planners can assemble purchasing plans for hardware and software that will help meet future demands. Measurement and analysis tools must be able to observe the individual performance of IT assets, as well as how these assets interact. A comprehensive capacity management process should be able to monitor and measure the following IT elements:
- End-user devices
- Networks and related communications devices
- Storage systems and storage network devices
- Cloud services
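The baseline idea described above (average performance over a period, used to detect deviations) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample latency figures and the three-standard-deviation tolerance are hypothetical and are not taken from ISO 27002 or from any particular monitoring tool.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarise historical readings as (average, standard deviation)."""
    return mean(samples), stdev(samples)

def find_deviations(baseline, readings, tolerance=3.0):
    """Return readings further than `tolerance` standard deviations from the baseline average.

    The default tolerance of 3.0 is an assumed value; tune it per metric.
    """
    average, spread = baseline
    return [r for r in readings if abs(r - average) > tolerance * spread]

# Hypothetical response times in milliseconds collected during normal operation.
history = [120, 125, 118, 130, 122, 127, 121, 124]
baseline = build_baseline(history)

# New readings are compared against the baseline; only 210 ms stands out.
outliers = find_deviations(baseline, [123, 119, 210, 126])
print(outliers)
```

In practice the baseline would be recomputed periodically, since a baseline built from stale data can hide gradual growth in demand.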
An organisation’s ability to operate as a business on an ongoing basis depends upon the following:
- Organisations should consider business continuity as a top priority when implementing capacity management controls, including the wholesale implementation of detective controls that flag up potential issues before they occur.
- Capacity management should be based upon the proactive functions of tuning and monitoring. Both of these elements should work in harmony to ensure that systems and business functions are not compromised.
- In operational terms, organisations should perform regular stress tests that interrogate a system’s ability to cater to overall business needs. Such tests should be formulated on a case-by-case basis and be relevant to the area of operation that they target.
- Capacity management controls should not be limited to an organisation’s current data or operational needs, and should include any plans for commercial and technical expansion (both from a physical and digital perspective) in order to remain as future-proof as is realistically possible.
- Expanding organisational resources is subject to varying lead times and costs, depending on the system or business function in question. Resources that are more expensive and more difficult to expand should be subject to a higher degree of scrutiny, in order to safeguard business continuity.
- Senior Management should be mindful of single points of failure relating to a dependency on key personnel or individual resources. Should any difficulties arise with either of these factors, it can often lead to complications that are markedly more difficult to rectify.
- Formulate a capacity management plan that deals specifically with business critical systems and business functions.
Organisations should take a dual-fronted approach to capacity management that either increases capacity or reduces demand upon a resource or set of resources. When attempting to increase capacity, organisations should:
- Consider hiring new employees to carry out a business function.
- Purchase, lease or rent new facilities or office space.
- Purchase, lease or rent additional processing, data storage and RAM (either on-premise or cloud-hosted).
- Consider using elastic and scalable cloud resources that expand with the computational needs of the organisation, with minimal intervention.
When attempting to reduce demand, organisations should:
- Delete obsolete data to free up storage space on servers and attached media.
- Securely dispose of any hard copies of information that the organisation no longer needs and is not required to retain, either by law or by a regulatory body.
- Decommission any ICT resources, applications or virtual environments that are no longer required.
- Scrutinise scheduled ICT tasks (including reports, automated maintenance functions and batch processes) to optimise memory resources and reduce the storage space taken up by outputted data.
- Optimise any application code or database queries that are run on a regular enough basis to have an effect on the organisation’s operational capacity.
- Limit the amount of bandwidth that is allocated to non-critical activities within the boundaries of the organisation’s network. This can include restricting Internet access and preventing video/audio streaming from work devices.
Formal capacity management processes involve conducting system tuning, monitoring the use of present resources and, with the support of user planning input, projecting future requirements. Controls that detect and respond to capacity problems help ensure a timely reaction. This is especially important for communications networks and shared resource environments (virtual infrastructure), where sudden changes in utilization can result in poor performance and dissatisfied users. To address this, regular monitoring processes should be employed to collect, measure, analyze and predict capacity metrics, including disk capacity, transmission throughput and service/application utilization. Periodic testing of capacity management plans and assumptions (whether through tabletop exercises or direct simulations) can also help proactively identify issues that need to be addressed to preserve a high level of availability for critical services.
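As one possible way to "predict" a capacity metric such as disk usage, the sketch below fits a straight line to past measurements and estimates how many days remain before a volume fills. It assumes roughly linear growth, which real workloads often violate; the function names and the sample figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def days_until_full(days, used_gb, capacity_gb):
    """Estimate days left after the last sample before usage reaches capacity.

    Returns None when usage is flat or shrinking, because no fill date
    can be projected from a non-growing trend.
    """
    slope, intercept = fit_line(days, used_gb)
    if slope <= 0:
        return None
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - days[-1])

# Hypothetical daily samples: usage grows by 10 GB per day on a 1000 GB volume.
days = [0, 1, 2, 3, 4]
used_gb = [500, 510, 520, 530, 540]
remaining = days_until_full(days, used_gb, capacity_gb=1000)
print(remaining)
```

A projection like this is most useful for resources with long procurement lead times, where the estimated fill date can be compared against the time needed to buy and install more capacity.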
Whether capacity management is achieved via software, hardware or manual means, or a combination of any of those, it relies on the interception of data movement metrics and the internal processes of individual components. Capacity management could have a fairly narrow scope, providing high-level information on a variety of infrastructure components or, perhaps, providing detailed metrics related to one segment of the computing environment. The trend, however, is to gather as much information as possible and then to attempt to correlate those measurements into an application-centric picture that focuses on the performance and requirements of mission-critical applications across the environment, rather than how individual components are performing. Still, to achieve that application-centric view of capacity management, virtually all elements of the IT infrastructure must be monitored and the definition of capacity must be broad enough to consider the impact an application will have on processing power, memory, storage capacity and speed for all physical and software components comprising an infrastructure.
- Performance is a key metric in capacity management, as it may point to processing bottlenecks that affect overall application performance. The central processing unit (CPU) in servers and other connected devices, such as routers, storage and controllers, should be monitored to ensure that their processing capabilities are not frequently “pinning” at or near 100%. An overtaxed processor would be a candidate for upgrading.
- Memory is also a factor in capacity management. Servers and other devices use their installed memory to run applications and process data — if too little memory is installed, processing will slow down. It’s relatively easy to determine if a server has adequate memory resources, but it’s also important to monitor other devices in the environment to ensure that insufficient memory doesn’t turn them into processing bottlenecks.
- Physical space is what is most commonly associated with capacity management, with the focus generally on storage space for applications and data. Storage systems that are near capacity will have longer response times, as it takes longer to locate specific data when drives — hard disk or solid-state — are full or nearly full. As with processor and memory measurements, it’s important to monitor space usage in devices other than servers and end-user PCs that may have installed storage that’s used for caching data.
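The notion of a processor "frequently pinning at or near 100%" can be made concrete by measuring how often utilisation samples exceed a high-water mark. The sketch below is illustrative only: the 95% threshold, the 20% tolerated fraction and the sample data are assumed values, and the same check could be applied to memory or storage readings.

```python
def pinned_fraction(samples, threshold=95.0):
    """Fraction of utilisation samples (in percent) at or above the threshold."""
    if not samples:
        return 0.0
    pinned = sum(1 for s in samples if s >= threshold)
    return pinned / len(samples)

def is_upgrade_candidate(samples, threshold=95.0, max_pinned_fraction=0.2):
    """Flag a device whose samples are pinned more often than the tolerated fraction.

    Both the threshold and the tolerated fraction are assumptions to be
    tuned per device class.
    """
    return pinned_fraction(samples, threshold) > max_pinned_fraction

# Hypothetical CPU utilisation samples (percent) from two servers.
busy_server = [40, 97, 99, 100, 55, 98, 96, 30, 99, 100]
quiet_server = [20, 35, 96, 40, 30, 25, 45, 50, 38, 41]

print(pinned_fraction(busy_server), is_upgrade_candidate(busy_server))
print(pinned_fraction(quiet_server), is_upgrade_candidate(quiet_server))
```

Looking at the fraction of pinned samples rather than a single reading avoids flagging a device for one short spike.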
Capacity management in networking
Managing the capacity of IT networks can be a complex process given the number of different networking elements that can be found in an enterprise environment. The number and type of networks being monitored is likely to vary as well. In addition to the wired and wireless Ethernet-based network infrastructure that connects servers to storage, end-user devices, networking gear, etc., comprehensive network capacity management must also consider dedicated storage networks based on Fibre Channel technologies; the FC networks are likely to be physically isolated from other data networks and will require different tools for monitoring and management. External networking should also be monitored. Again, different tools will be required to track traffic and performance for network connections to remote offices and users, the internet and to cloud services. The networking devices that should be monitored include network interface cards (NICs), network switches, network routers, storage network interfaces (e.g., host bus adapters), storage network switches and optical network devices. Although capacity management for networks doesn’t directly address security, it can be a good method of keeping track of network access, which can help inform security procedures.
Benefits of capacity management
Capacity management provides many benefits to an IT organization and is a factor in the overall management of a computing infrastructure. In addition to ensuring that systems are performing at adequate levels to achieve a company’s goals, capacity management can often realize cost savings by avoiding over-provisioning of hardware and software resources. It can also help save money and time by identifying extraneous activities like backing up unused data or maintaining idle servers. Good capacity management can also result in more effective purchasing to accommodate future growth, because needs can be anticipated more accurately and purchases made when prices may be lower. By constantly monitoring equipment and processing, problems that might have hindered production, such as bottlenecks or imminent equipment failures, may be avoided.
Components of capacity management
The activities that support the capacity management process are crucial to the success and maturity of the process. Some of these are done on an ongoing basis, some daily, some weekly, and some at a longer, regular interval. Some are ad-hoc, based on current (or future) needs or requirements. Let’s look at those:
- Monitoring – Keeping an eye on the performance and throughput or load on a server, cluster, or data center is extremely important. Not having enough headroom can cause performance issues. Having too much headroom can create larger-than-necessary bills for hardware, software, power, etc.
- Analysis – Taking that measurement and monitoring data and drilling down to see the potential impact of changes in demand. As more and more data become available, having the tools needed to find the right data and make sense of it is very important.
- Tuning – Determining the most efficient use of existing infrastructure should not be taken lightly. A lot of organizations have over-configured significant parts of the environment while under-configuring others. Simply reallocating resources could improve performance while keeping spend at current levels.
- Demand Management – Understanding the relationship of current and future demand and how the existing (or new) infrastructure can handle this is incredibly important. Predictive analytics can provide decision support to IT management. Also, moving non-critical workloads to quieter periods can delay purchase of additional hardware (and all the licenses and other costs that go with it).
- Capacity Planning – Determining the resources that will be required over some future period. This can be done by predictive analysis, modeling, benchmarking, or other techniques – all of which have varying costs and levels of effectiveness.
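The headroom trade-off described under Monitoring and Tuning above (too little causes performance issues, too much inflates cost) can be sketched as a simple classification of peak utilisation. The 40% and 80% bounds, the resource names and the peak values below are hypothetical; an organisation would choose its own target band per resource type.

```python
def classify_headroom(peak_utilisation, low=0.40, high=0.80):
    """Classify a resource by its peak utilisation, given as a fraction from 0.0 to 1.0.

    The `low` and `high` bounds are assumed targets, not standard values.
    """
    if peak_utilisation > high:
        return "under-provisioned"  # too little headroom, risk of performance issues
    if peak_utilisation < low:
        return "over-provisioned"   # excess headroom, candidate for reallocation
    return "adequate"

# Hypothetical peak utilisation observed over a review period.
peaks = {
    "db-server": 0.92,
    "web-cluster": 0.65,
    "archive-storage": 0.15,
}

report = {name: classify_headroom(peak) for name, peak in peaks.items()}
for name, status in report.items():
    print(f"{name}: {status}")
```

A report like this can point to tuning opportunities, for example reallocating resources from an over-provisioned system to an under-provisioned one before purchasing anything new.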