Any organisation needs a system architecture that satisfies its business availability requirements. Redundancy ensures availability by keeping spare capacity ready in case of system failure, and often requires duplicate systems such as power supplies. An ‘information processing facility’ (IPF) is a broad term for any piece of ICT infrastructure involved in processing data, such as IT equipment, software, and physical facilities and locations. IPFs play a key role in maintaining business continuity and ensuring the smooth operation of an organisation’s ICT network(s). To increase resilience, organisations need to implement measures that increase the redundancy of their IPFs, i.e. fail-safe methodologies and processes that mitigate risk to systems and data in the event of failure, misuse or intrusion.
Redundant copies help maintain the availability of your information systems. In simple terms, if one of your originals fails, you’ll have a backup copy available to replace it. You should test regularly to confirm that your redundancies are viable, because a backup that also fails leaves you with no fallback. Since redundant items are so valuable to system continuity, they must be protected at the same level as your originals or better. Many companies use cloud storage to hold their redundancies. If you rely on a supplier for this, discuss the status of your redundancies in the cloud with them and make sure they are well informed of the data security risks you face. Transparency is key.
Adequate redundancy that can be spun up when necessary forms an important part of business continuity planning and should be tested regularly. The overriding goal is to increase the availability of business services and information systems, which can be interpreted as any aspect of a network that allows the organisation to conduct business. The control advocates duplication as the primary method of achieving redundancy across an organisation’s IPFs, chiefly by maintaining an inventory of spare parts, duplicate components (hardware and software) and additional peripheral devices. A redundant system is only as good as its alerting facility. As such, organisations should ensure that failed IPFs are detected quickly and that remedial action is taken, either failing over to standby hardware or repairing the malfunctioning IPF as quickly as possible.
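The link between alerting and failover can be illustrated with a minimal Python sketch. It assumes each facility reports a heartbeat timestamp to a monitor. The `Facility` class, the `select_active` function and the 30-second timeout are hypothetical names and values chosen for illustration, not part of the standard.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    last_heartbeat: float  # seconds on the monitor's clock (assumed input)


def select_active(primary, standby, now, timeout=30.0):
    """Return (facility to use, alerts raised) based on heartbeat age.

    A facility counts as failed when its last heartbeat is older than
    `timeout` seconds. The 30-second default is an illustrative value.
    """
    alerts = []
    primary_ok = now - primary.last_heartbeat <= timeout
    standby_ok = now - standby.last_heartbeat <= timeout
    # Alert on every failed facility, including a silent standby:
    # a dead standby is a lost redundancy even while the primary works.
    if not primary_ok:
        alerts.append(f"ALERT: {primary.name} missed heartbeat")
    if not standby_ok:
        alerts.append(f"ALERT: {standby.name} missed heartbeat")
    if primary_ok:
        return primary, alerts
    if standby_ok:
        return standby, alerts  # fail over to the standby
    return None, alerts  # nothing healthy: manual intervention needed


primary = Facility("primary-db", last_heartbeat=0.0)
standby = Facility("standby-db", last_heartbeat=95.0)
active, alerts = select_active(primary, standby, now=100.0)
print(active.name, alerts)
```

Note that the sketch raises an alert for a silent standby even when the primary is healthy, since an undetected standby failure removes the redundancy without anyone noticing.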
Control
Information processing facilities should be implemented with redundancy sufficient to meet availability requirements.
Purpose
To ensure the continuous operation of information processing facilities.
ISO 27002 Implementation Guidance
The organization should identify requirements for the availability of business services and information systems. The organization should design and implement systems architecture with appropriate redundancy to meet these requirements. Redundancy can be introduced by duplicating information processing facilities in part or in their entirety (i.e. spare components or having two of everything). The organization should plan and implement procedures for the activation of the redundant components and processing facilities. The procedures should establish if the redundant components and processing activities are always activated, or in case of emergency, automatically or manually activated. The redundant components and information processing facilities should ensure the same security level as the primary ones. Mechanisms should be in place to alert the organization to any failure in the information processing facilities, enable executing the planned procedure and allow continued availability while the information processing facilities are repaired or replaced. The organization should consider the following when implementing redundant systems:
a) contracting with two or more suppliers of network and critical information processing facilities such as internet service providers;
b) using redundant networks;
c) using two geographically separate data centers with mirrored systems;
d) using physically redundant power supplies or sources;
e) using multiple parallel instances of software components, with automatic load balancing between them (between instances in the same data center or in different data centers);
f) having duplicated components in systems (e.g. CPU, hard disks, memories) or in networks (e.g. firewalls, routers, switches).
Where applicable, preferably in production mode, redundant information systems should be tested to ensure the failover from one component to another component works as intended.
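To judge whether redundancy is "sufficient to meet availability requirements", a rough calculation can help. The Python sketch below assumes that redundant copies fail independently and that service continues while at least one copy is up. That assumption is a simplification introduced here (shared power, software or location faults break it), and the function names are hypothetical.

```python
def combined_availability(single, copies):
    """Availability of `copies` parallel components, each with availability
    `single`, assuming independent failures (a simplifying assumption)."""
    return 1.0 - (1.0 - single) ** copies


def copies_needed(single, target):
    """Smallest number of parallel copies whose combined availability
    reaches `target` under the same independence assumption."""
    if not 0.0 < single < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("availabilities must be strictly between 0 and 1")
    copies = 1
    while combined_availability(single, copies) < target:
        copies += 1
    return copies


# Two 99%-available components in parallel give roughly 99.99%.
print(combined_availability(0.99, 2))
# Number of 99%-available copies needed to reach a 99.999% target.
print(copies_needed(0.99, 0.99999))
```

Real figures are usually lower than this estimate because failures are often correlated, which is one reason the guidance calls for separate suppliers, networks and locations.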
Other information
There is a strong relationship between redundancy and ICT readiness for business continuity especially if short recovery times are required. Many of the redundancy measures can be part of the ICT continuity strategies and solutions. The implementation of redundancies can introduce risks to the integrity (e.g. processes of copying data to duplicated components can introduce errors) or confidentiality (e.g. weak security control of duplicated components can lead to compromise) of information and information systems, which need to be considered when designing information systems. Redundancy in information processing facilities does not usually address application unavailability due to faults within an application. With the use of public cloud computing, it is possible to have multiple live versions of information processing facilities, existing in multiple separate physical locations with automatic fail over and load balancing between them. Some of the technologies and techniques for providing redundancy and automatic fail-over in the context of cloud services are discussed in ISO/IEC TS 23167.
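The integrity risk mentioned above, where copying data to duplicated components introduces errors, can be checked by comparing digests of the primary and each copy. The following Python sketch assumes the data fits in memory as bytes. The function names and site labels are hypothetical, and SHA-256 is one reasonable choice of digest rather than a requirement of the standard.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_replicas(primary: bytes, replicas: Dict[str, bytes]) -> List[str]:
    """Return the names of replicas whose digest differs from the primary's."""
    expected = sha256_hex(primary)
    return [name for name, data in replicas.items()
            if sha256_hex(data) != expected]


primary_copy = b"customer ledger v1"
replica_copies = {
    "site-b": b"customer ledger v1",
    "site-c": b"customer ledger v1 ",  # trailing space simulates a copy error
}
print(find_corrupt_replicas(primary_copy, replica_copies))
```

A digest comparison only detects divergence. It does not say which copy is correct, so the primary's own integrity still needs separate assurance.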
The objective of this control is to ensure availability of information processing facilities. Information processing facilities should be implemented with redundancy sufficient to meet availability requirements.
Organizations should identify business requirements for the availability of information systems. Where availability cannot be guaranteed using the existing systems architecture, redundant components or architectures should be considered. Where applicable, redundant information systems should be tested to ensure the failover from one component to another works as intended. The implementation of redundancies can introduce risks to the integrity or confidentiality of information and information systems, which need to be considered when designing information systems.
A good control describes how information processing facilities are implemented with sufficient redundancy to meet availability requirements. Redundancy typically means implementing duplicate hardware to ensure the availability of information processing systems. The principle is that if one or more items fail, redundant items take over. Critical to this is testing redundant components and systems periodically to ensure that failover will be achieved in a reasonable time frame. Redundant components must be protected at the same level as the primary components or better. Many organizations use cloud-based providers, so they will want to ensure redundancy is addressed effectively in their contracts with suppliers and as part of the policy in A.15. Where redundant components and systems are in place and under the organization's control, the auditor will expect to see that testing is carried out periodically. When designing and installing redundancy measures, organisations should consider the following:
- Entering into a commercial relationship with two separate service providers, to reduce the risk of blanket downtime in the event of a critical event (e.g. an Internet service provider or VoIP provider).
- Adhering to redundancy principles when designing data networks (secondary data centres, redundant backup and disaster recovery (BUDR) systems, etc.).
- Making use of geographically separate locations when contracting out data services, especially in the case of file storage and/or data centre facilities.
- Purchasing power systems that have the ability to achieve redundancy either in whole, or in part, as is necessary.
- Using load balancing and automatic fail over between two identical, redundant software components or systems, to improve both real time performance and resilience following a critical event. This is of particular importance when considering an operational model that encompasses both public cloud services and on-premise services. Redundant systems should be regularly tested (where possible) whilst in production mode to ensure that fail over systems are operating correctly.
- Duplicating physical ICT components both within servers and file storage locations (RAID arrays, CPUs), and any that act as a network device (firewalls, redundant switches). Firmware updates should be performed on all linked and redundant devices, to ensure continuity.
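The load balancing and automatic failover item in the list above can be sketched in a few lines of Python. This is a simplified round-robin illustration, not a production load balancer. The class name, method names and instance labels are hypothetical, and health status is assumed to come from a separate monitoring process.

```python
class RoundRobinBalancer:
    """Rotate requests across instances, skipping any marked as down."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._next = 0

    def mark_down(self, name):
        # Called by monitoring (assumed to exist) when an instance fails.
        self._healthy.discard(name)

    def mark_up(self, name):
        # Called once a repaired instance is ready to serve again.
        if name in self._instances:
            self._healthy.add(name)

    def pick(self):
        """Return the next healthy instance in rotation."""
        for _ in range(len(self._instances)):
            candidate = self._instances[self._next]
            self._next = (self._next + 1) % len(self._instances)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instance available")


balancer = RoundRobinBalancer(["dc1-app", "dc2-app"])
print([balancer.pick() for _ in range(3)])  # alternates between both
balancer.mark_down("dc1-app")               # simulate a failure
print([balancer.pick() for _ in range(2)])  # all traffic goes to dc2-app
```

Marking an instance down and confirming that traffic shifts to the survivor is also a simple way to exercise a failover path during testing.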
As a critical element of maintaining continuity of services, there needs to be adequate redundancy of facilities, people, communications, documentation, training, and services. Redundancy plans must anticipate a multitude of failures and causes: loss of data or facilities, unavailability of trained personnel, communications losses, power losses, and more. Plans must assess the risks associated with each critical component and identify redundant or alternative means of providing continuity of services under all conditions. Due to the nature of business continuity management, it is essential that all elements of a plan have adequate redundancies available, knowing that some elements may be compromised by the nature of the disruption. Redundancies include cross-training of personnel, alternate facilities at locations that do not share vulnerabilities, redundant communications methods and providers, power sources, physical access, and more.
Redundancy Plan – Avoid Potential Single Points Of Failure
LOCATION/GEOGRAPHY | metro area, flood plain/river bank, avalanche, hurricane or tornado-prone, unreliable power source, poor transportation/communication systems, etc. |
FACILITIES and INFRASTRUCTURE | Power, generators/fuel supplies/suppliers, air conditioning, communications lines (data, voice), multiple high-speed network connections, network topology, environmental conditions/hazards, central organization, schools, or admin units |
SYSTEMS | Lost, damaged, can’t be “touched” or “reached” |
DATA | Lost, not accessible |
PEOPLE | Not 24×7 staffing; off hours not on campus, no/limited transportation, not reachable via communications-voice, Internet, cell, text, etc., facilities disabled, unsafe conditions, evacuated, can’t leave campus – government-enforced, not cross-trained, insufficient backup of skills |
DOCUMENTATION | must document critical/emergency procedures and recovery procedures, must assume that “usual” trained people are unavailable, need for depth of cross-training, docs must be available to all potential users – not just on one, local system or in one place, vendor contacts – office, cell, text, social networking, IM, etc. |
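The categories in the table above can be turned into a simple automated check over an inventory. The Python sketch below assumes a hypothetical inventory that maps each service to a list of (provider, location) pairs. The data, names and the two rules it applies are illustrative assumptions, not a complete risk assessment.

```python
def find_single_points(inventory):
    """Flag services that rely on one provider or on one location.

    `inventory` maps a service name to a list of (provider, location)
    tuples. Returns a dict of service name to a list of issues found.
    """
    findings = {}
    for service, providers in inventory.items():
        names = {name for name, _ in providers}
        locations = {location for _, location in providers}
        issues = []
        if len(names) < 2:
            issues.append("only one provider")
        if len(locations) < 2:
            issues.append("single location")
        if issues:
            findings[service] = issues
    return findings


# Hypothetical inventory used only to demonstrate the check.
inventory = {
    "internet": [("isp-a", "conduit-north"), ("isp-b", "conduit-south")],
    "power": [("grid", "site-1"), ("generator", "site-1")],
    "backups": [("nas-1", "site-1")],
}
print(find_single_points(inventory))
```

The same idea extends to people and documentation, for example by listing who is trained on each critical procedure and where each copy of the procedure is stored.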
Key Considerations
Data is Essential and Must Be Replicated
Data is more important than hardware. Data should be replicated by a variety of means and should be retrievable as needed. Hardware can always be replaced. Be aware of the dependencies between software and data. Cloud services may provide viable options for replication as long as security and privacy are maintained.
Alternate Sites for Web Hosting
Consider the impact of a hurricane on physical facilities (which become uninhabitable), life-safety issues (evacuations, flooding, disease, lack of potable water and food, etc.), electrical power/network infrastructures, and an extended prognosis for the restoration of “normal operating conditions.” A redundancy plan should identify alternative means of providing essential services under such unthinkable circumstances. Services should include means for communicating information on the status and safety of the organization and its people to the rest of the world. Consider contracting for an alternate service for communicating key information to on-campus and off-campus people while normal organizational web services cannot be provided.
Availability of Information Processing Facilities
Despite the emergency or disruptive circumstances, information processing facilities must continue to function, remain accessible for critical processing, and maintain the security, integrity, and privacy of information. In creating plans, many variables need to be considered when choosing alternative sites, services, personnel, vendors, power/communication means, and accessibility.
Choosing the Right Locations for Emergency Equipment and Continuity Centers
First and foremost, consider all relevant factors when evaluating locations to house emergency equipment such as electricity generators. As learned from Superstorm Sandy, never locate generators in basements, in locations below 100-year flood lines, or in locations likely to be inaccessible for fuel deliveries. NYU Medical Center lost the use of all of its power generators when the East River overflowed its banks. Patients had to be evacuated to other hospitals because all power sources were down and all of its generators were underwater. Restoration of power took weeks. Physical damage and lost revenues were beyond any expectations, as no one expected the water level to rise to the extent it did. Only now are plans being made that consider rising water levels that are likely to recur more frequently.
A plan should identify the physical locations that will be used to coordinate during and immediately after an incident. Ideally, several locations should be chosen at increasing distances from the organization. This allows for disruption to a key building, a metropolitan area, or a wider geographic region while minimizing the impact on the continuity effort. Be cautious and exercise due diligence when considering locations and technologies for potential sites to serve as continuity centers. While some locations and facilities may have very positive attributes that seem to make them cost-effective choices, there are many critical issues that should be explored. Categories of considerations include:
- Physical locations.
- Virtual locations (address security and privacy concerns).
- Cloud resources (address security and privacy concerns).
- Availability of sufficient communications: excellent, redundant, multi-vendor cell capacity; landlines; internet bandwidth from more than one vendor and over more than one physical path (not all coming through the same conduit or following the same route); and satellite access.
- Availability of alternative power sources – generators with adequate fuel supply and delivery. (Natural gas is ideal, if available, as long as the location is not prone to flooding, hurricanes, tornadoes, or earthquakes. It minimizes delivery issues.) Multiple suppliers should be contracted for other types of fuel supply. Remember that generators need routine maintenance and testing.
- Sufficient physical access during emergency situations (not located along a major evacuation route, yet highly accessible).
- Proximity to locations that contain hazardous material, or are near river banks/flood plains, avalanche zones, mudslide zones, frequent forest fires, or earthquake-prone locations.
- Security and support for personnel staffing the continuity effort.
It is not critical that hot sites (physical, virtual, or cloud) be as extensive or as fast as normal resources. They are for emergency use, not daily operations. Communication facilities and contact information must be as accurate and complete as those used for daily operations. They provide the lifeline for all coordination of communication.
Good Vendor Relationships are Important
Establishing a good, working relationship with key vendors can help in times of crisis. Resources may need to be replaced. Good relationships may help move your needs to the head of the queue of waiting orders. While there may be many others facing similar problems related to the same or other crises, vendors are sensitive to the problems and will try to assist however and whenever possible when there is an existing relationship.
After the Resumption of Normalcy
While everyone may be tired and anxious to get back to business as usual, it is important to bring all the key individuals together for a session to discuss how the plan worked or failed. This is a unique opportunity to get direct feedback on the usefulness of the plan. Scheduling a “postmortem” is invaluable for gathering constructive feedback as well as complaints that need addressing. Drills are helpful, but a postmortem shares real experiences and feedback.