Test information must be carefully selected, protected and controlled. Ideally it is created in a generic form with no relation to live system information. However, it is recognised that live data must often be used to perform accurate testing. Where live data is used for testing, it should be:
- Anonymised as far as possible;
- Carefully selected and secured for the period of testing;
- Securely deleted when testing is complete.
Use of live data must be pre-authorised, logged and monitored. The auditor will expect to see robust procedures in place to protect data used in test environments, especially if it is wholly or partly live data.
Non-production test environments such as staging environments are essential to building high-quality software that is free of bugs and errors, but the use of real data combined with a lack of security measures in these environments exposes information assets to heightened risk. For example, developers may use easy-to-remember credentials (e.g. “admin” as both username and password) for test environments in which sensitive data is stored. Attackers can exploit this to gain easy access to test environments and steal sensitive information. Organisations should therefore put in place appropriate controls and procedures to protect real-world data used for testing.
- To protect and maintain the confidentiality of information used in the testing environment.
- To select and use test information that will yield reliable results.
Control
Test information should be appropriately selected, protected and managed.
Purpose
To ensure relevance of testing and protection of operational information used for testing.
ISO 27002 Implementation Guidance
Test information should be selected to ensure the reliability of test results and the confidentiality of the relevant operational information. Sensitive information (including personally identifiable information) should not be copied into the development and testing environments. The following guidelines should be applied to protect copies of operational information when they are used for testing purposes, whether the test environment is built in-house or on a cloud service:
a) applying the same access control procedures to test environments as those applied to operational environments;
b) having a separate authorization each time operational information is copied to a test environment;
c) logging the copying and use of operational information to provide an audit trail;
d) protecting sensitive information by removal or masking if used for testing;
e) properly deleting operational information from a test environment immediately after the testing is complete to prevent unauthorized use of test information.
Test information should be securely stored (to prevent tampering, which can otherwise lead to invalid results) and only used for testing purposes.
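One possible way to detect tampering with stored test information is to record a cryptographic digest when a data set is approved and to compare it before each test run. The sketch below is illustrative only: the function names and the sample bytes are assumptions made for this example and are not part of ISO 27002.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a test data set as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at approval time."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(fingerprint(data), recorded_digest)


# Hypothetical usage: record the digest once, then verify before each run.
approved = b"id,name\n1,Test User\n"
recorded = fingerprint(approved)

print(is_unmodified(approved, recorded))               # unchanged data set
print(is_unmodified(approved + b"2,Other\n", recorded))  # modified data set
```

A digest only helps if it is stored separately from the data it protects, for example alongside the authorisation record, so that someone who alters the data cannot also replace the digest.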
Other information
System and acceptance testing can require substantial volumes of test information that are as close as possible to operational information.
Data used in testing environments such as quality assurance, test and development must be protected against unauthorized access. For example, test environments may be firewalled to restrict access to campus systems, and accounts may be disabled so that only a subset of accounts can be used for testing. Copying data between production and test environments should be approved. Where possible, data used for testing should not contain personally identifiable information.
Generating non-meaningful test data for performance testing is not a difficult exercise; the challenge is generating meaningful data that looks and behaves like real production data for functional testing. Meaningful data has all the characteristics of production data, such as format, context and referential integrity, but is anonymized for data privacy compliance.
While developing an application, developers need to make sure they are testing it under conditions that closely simulate a production environment. Most tests rely on sample data. If that data is entered manually into a test environment, it cannot match the volume and variety of data that the application would normally accumulate in production. Behavior may then differ because the data inserted into the test database does not match real-world usage, possibly leaving significant bugs undetected. Dev/test managers, application owners and others know that simulated data fails to support development effectively, and manual scripting cannot keep up with the demand for short timelines between application development and production.
Building a test database with meaningful, protected test data allows the application owner to see and assess how the application will perform once it is released. Without meaningful test data in the test environment, it is impossible to predict how the application will behave after release.
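To illustrate how test data can stay anonymized while keeping referential integrity, the sketch below replaces identifiers with a keyed hash so that the same real value always maps to the same token in every table. The table layout, the `pseudonymize` helper and the key are hypothetical; this is one possible approach, not a prescribed method.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be managed outside the test environment.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a real identifier to a stable keyed token (same input, same output)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "cust_" + digest[:12]


# Hypothetical production-like rows: orders reference customers by id.
customers = [{"customer_id": "C-1001", "name": "Alice Example"}]
orders = [{"order_id": 1, "customer_id": "C-1001", "total": 42.50}]

masked_customers = [
    {"customer_id": pseudonymize(c["customer_id"]), "name": "Test Customer"}
    for c in customers
]
masked_orders = [
    {**o, "customer_id": pseudonymize(o["customer_id"])} for o in orders
]

# The foreign key still joins, but the original identifier no longer appears.
print(masked_orders[0]["customer_id"] == masked_customers[0]["customer_id"])
```

Because the mapping is deterministic, joins and lookups behave as they would in production; the trade-off is that anyone holding the key could recompute tokens for guessed identifiers, so the key needs the same protection as the source data.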
Organizations testing non-production data want to see data that looks real to understand how real data would perform in their application.
- Choosing the right data set type
Choosing your test data is a huge decision; choose the wrong data set type and you could land your business in hot water, particularly if you deal with sensitive data. In any testing environment, production data would be the most realistic choice, but using it is no longer acceptable. Using production data is too great a risk for your business, and you could be held liable for fines and penalties if that data were lost or fell into the wrong hands. With that in mind, it’s important to use test data that is as close to production as possible instead, resulting in a realistic testing environment. Test data management is key here, as it provides a process for obfuscating production data or generating synthetic data.
- Creating obfuscated or synthetic data
Managed test data can help to create test environments on demand, with synthetic data able to deliver a highly repeatable solution. In turn, this can shorten your business’s testing turnaround time. Manual test preparation can take up to 30% of the testing time, which can prevent your organisation from achieving a continuous deployment or DevOps model. Obfuscated or synthetic test data can allow you to test on demand and achieve faster, more cost-efficient delivery without risking the integrity of any sensitive data. Having an automated method to support your test data management can help you take this to the next level.
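As a rough illustration of repeatable synthetic data, the sketch below generates customer records from a seeded random generator, so every run with the same seed produces the same data set. The field names, name lists and the `make_customers` function are invented for this example; real generators usually model far more of the production schema.

```python
import random

# Small, made-up name pools; none of these values come from a live system.
FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Ema"]
LAST_NAMES = ["Ng", "Patel", "Smith", "Tane", "Wong"]


def make_customers(count: int, seed: int = 1234) -> list[dict]:
    """Generate `count` synthetic customers; the seed makes runs repeatable."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        rows.append({
            "customer_id": i,
            "name": f"{first} {last}",
            # example.test uses a reserved TLD, so no real mailbox can be hit.
            "email": f"{first}.{last}{i}@example.test".lower(),
            "balance": round(rng.uniform(0, 5000), 2),
        })
    return rows


for row in make_customers(3):
    print(row)
```

Fixing the seed is what makes the data set reproducible across test runs; changing it yields a different but equally synthetic set.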
Guidelines for data de-identification or anonymization should be followed to remove sensitive information, or to modify it beyond recognition, when it is used for testing purposes. If production data is used unchanged for testing, it should be protected with the same level of controls as the production system. Test data must be selected carefully, protected and controlled. Ideally it is created in a generic form with no relation to live system data. However, it is recognized that live data must often be used to perform accurate testing. Where live data is used for testing, it should be:
- Anonymized as far as possible;
- Carefully selected and secured for the period of testing;
- Securely deleted when testing is complete.
Use of live data must be pre-authorized, logged and monitored. The auditor will expect to see robust procedures in place to protect data used in test environments, especially if it is wholly or partly live data.
Organisations should not use sensitive information, including personal data, in development and testing environments. It is noted that system and acceptance testing may require a large amount of test information that is comparable to operational information. To protect test information against loss of confidentiality and integrity, organisations should comply with the following:
- Access controls applied in real-world environments should also be implemented in test environments.
- Establishing and implementing a separate authorisation procedure for the copying of real information into test environments.
- To keep an audit trail, all activities related to copying and use of sensitive information in test environments should be recorded.
- If sensitive information will be used in the test environment, it should be protected with appropriate controls such as data masking or data removal.
- Once the testing is completed, information used in the test environment should be safely and permanently removed to eliminate the risk of unauthorised access.
- Furthermore, organisations should apply appropriate measures to ensure the secure storage of information assets.
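The separate authorisation and audit-trail points above could be combined along the following lines. This is a minimal in-memory sketch under assumed names (`CopyAuthorisation`, `CopyAuditLog`); a real implementation would write to protected, append-only storage and integrate with the organisation's approval workflow.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CopyAuthorisation:
    """A single-use approval for one copy of operational data (hypothetical shape)."""
    request_id: str
    dataset: str
    approved_by: str


class CopyAuditLog:
    """In-memory audit trail of copies into test environments."""

    def __init__(self) -> None:
        self._entries: list[str] = []
        self._used: set[str] = set()

    def record_copy(self, auth: CopyAuthorisation, actor: str, target_env: str) -> None:
        # Each authorisation covers exactly one copy; reuse is rejected.
        if auth.request_id in self._used:
            raise PermissionError(f"authorisation {auth.request_id} already used")
        self._used.add(auth.request_id)
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": "copy_to_test",
            "actor": actor,
            "target_env": target_env,
            **asdict(auth),
        }
        self._entries.append(json.dumps(entry, sort_keys=True))

    def entries(self) -> list[str]:
        """Return a copy so callers cannot alter the recorded trail."""
        return list(self._entries)


log = CopyAuditLog()
approval = CopyAuthorisation("REQ-001", "customers", "data.owner")
log.record_copy(approval, actor="test.engineer", target_env="staging")
print(log.entries()[0])
```

Rejecting a reused `request_id` is one way to enforce that every copy has its own approval; the log entry then ties the copy to who approved it, who performed it and where the data went.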
When it comes to testing, many factors need to be considered to ensure the correct use and protection of data. Compliance standards like the Privacy Act set out requirements for companies to ensure that different types of data are carefully managed and protected. Test data should ideally be created in a generic form with no relation to live system data. However, test data often needs to reflect real data to ensure accurate testing. If you must use “real” data for testing purposes, consider implementing a robust data masking technique to protect it. When using data for testing, an organisation should ensure it is:
- Anonymised – Any personal or confidential information that is used should be protected either by deletion or modification.
- Carefully selected and secured for the period of testing.
- Securely deleted when testing is complete.
- Handled under agreed, securely managed processes for protecting data during testing.
Techniques to mask data
- Data Encryption – An encryption algorithm locks the data so that only someone who holds the key can read it. For testing purposes this is often unhelpful, as it requires the system to continually encrypt and decrypt the data, and processes need to be in place to manage and share encryption keys.
- Data Scrambling – The characters in a value are reorganized in a random order, replacing the original content. For example, a number such as 985467 in a production database could be replaced by 649857 in a test database. This is easy to do but can be less secure if someone works out the process and reverse engineers the changes.
- Nulling Out – Data is replaced with “null” or is deleted. This is not helpful during testing if you need the data to perform certain functions or to check that outputs appear on a page correctly.
- Value Variance – Original data values are replaced using a function, such as the difference between the lowest and highest value in a series. For example, if a list of product prices ranged from 100 to 1000, each product price could be replaced with the range between the highest and lowest price paid. This helps prevent anyone from recovering the original values.
- Data Substitution – Data values are substituted with fake, but realistic, alternative values. For example, real names or numbers are replaced by random names and numbers.
- Data Shuffling – Similar to substitution, except data values are switched within the same dataset. Data is rearranged in each column using a random sequence; for example, switching between real customer names across multiple customer records. The output set looks like real data, but it doesn’t show the real information for each individual or data record.
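Four of the techniques above (scrambling, nulling out, substitution and shuffling) can be sketched in a few lines each. The helper names and the sample rows are assumptions made for illustration; encryption and value variance are left out because they depend on key management and on the chosen variance function respectively.

```python
import random


def scramble(value: str, rng: random.Random) -> str:
    """Data scrambling: reorder the characters of a single value."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


def null_out(rows: list[dict], column: str) -> list[dict]:
    """Nulling out: replace every value in a column with None."""
    return [{**row, column: None} for row in rows]


def substitute(rows: list[dict], column: str, fakes: list[str],
               rng: random.Random) -> list[dict]:
    """Data substitution: replace each value with a realistic fake one."""
    return [{**row, column: rng.choice(fakes)} for row in rows]


def shuffle_column(rows: list[dict], column: str,
                   rng: random.Random) -> list[dict]:
    """Data shuffling: permute a column's real values across the records."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


# Hypothetical sample rows standing in for a production extract.
rows = [
    {"id": 1, "name": "Alice", "account": "985467"},
    {"id": 2, "name": "Bob", "account": "123456"},
    {"id": 3, "name": "Carol", "account": "777001"},
]
rng = random.Random(7)  # seeded only so the demonstration is repeatable

print(scramble(rows[0]["account"], rng))
print(null_out(rows, "account"))
print(substitute(rows, "name", ["Test User A", "Test User B"], rng))
print(shuffle_column(rows, "name", rng))
```

Each helper returns new rows rather than changing its input, which keeps the source extract intact while the masked copy is built. Note that a random shuffle can leave some values in their original position, so shuffling alone is a weak guarantee on small data sets.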
Best practice to protect data
- Test Data Strategy: An effective, agile, and comprehensive test data management program must start with the strategy. You must gain an understanding of your test data landscape, and the different teams across the organization that will use the test data and contribute to it. Your plan should include test data needs, testing environments, your company’s data governance policies and relevant regulations that impact data handling. Starting with the test data management strategy will save time, overhead costs, and rework.
- Discovering Test Data: The first test data management best practice is to discover and integrate test data from multiple source systems and IT environments across the organization. To achieve this, enterprises should identify all the relevant data channels and sources early in the process. This includes discovering and categorizing all sensitive data and personally identifying information (PII) according to the relevant data protection regulations and industry legislation.
- Protecting Private Data: Today, sensitive data and Personally Identifying Information (PII) are a touchy topic, as people and authorities become more aware of the dangers of collecting and using people’s private information. Test data management must follow specific compliance rules that demand high data governance standards. When sensitive data is involved, data masking keeps it protected. Using data masking tools to obfuscate production data in a way that mimics real-life data, without exposing the real data, supports both authenticity and compliance. Another security aspect to consider is how the test data is stored and managed. Keeping test data accessible only to authorized personnel and maintaining security protocols, even for apps under development, are essential.
- Refreshing Test Data in Real Time: Perhaps the most important factor in test data is keeping it fresh. Due to the sheer volume of enterprise data, many enterprises refresh their test data only periodically, such as once a quarter. Since extracting and provisioning test data is time-consuming, testing teams often reuse old data, over and over again. To maintain the relevance and trustworthiness of test data, a real-time synchronization mechanism is needed that does not require bulk database copying. Another important factor is ensuring that production system performance is not adversely impacted by frequent access.
- Ensuring Test Data Relevancy: Time isn’t the only factor impacting the relevancy of test data. Testing quality relies on the ability of the testing teams to source relevant test data to the use case at hand. Due to the complexity of this task, many test data management tools discourage testers from parameter-based subsetting, especially across multiple source systems. Testing teams should examine which data elements are necessary for their particular scenarios, and build the test data subsets accordingly. Not only will the test data sets be more relevant and focused, but they will also improve test data quality and accelerate software delivery.
- Maintaining Test Data: Keeping your data fresh and relevant over time leads us to the next test data management best practice, which is ongoing maintenance. In addition to relevancy and accuracy, your team would need to ensure that the data is adequately stored and remains consistent and error-free. This level of accuracy should be maintained over multiple use cases and even as the volume of data increases. Test data management at scale is challenging, so this is one area where your test data tools will have to prove their value. Monitor the cost efficiency of your test data storage solution, and perform regular audits to examine the integrity, quality, and security of your test data.
- Automating the Test Data Process: By now, you’re probably concerned about the many tedious tasks related to test data. Worry not, because many test data steps can, and should, be automated. Automation makes test data provisioning faster and helps minimize human error. Agile software development and shift-left testing demand test data automation for integration into CI/CD pipelines. Best practices for testing have become increasingly automated over the past years. It’s about time test data management did the same.
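Several of the practices above (subsetting for relevancy, masking private data and automating the process) could be chained into a single provisioning step that a CI/CD pipeline calls. The sketch below is a simplified assumption of how that might look: the function names, the row layout and the email-only check are all hypothetical, and a real gate would cover many more kinds of sensitive data.

```python
import re

# Illustrative pattern only; it is not a complete email or PII detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def subset(rows, **criteria):
    """Keep only the rows the scenario needs (parameter-based subsetting)."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def mask_emails(rows):
    """Replace each email with a synthetic address on a reserved test domain."""
    return [{**r, "email": f"user{r['customer_id']}@example.test"} for r in rows]


def contains_real_email(rows, allowed_domain="example.test"):
    """Gate check: report any email-like string outside the allowed domain."""
    for row in rows:
        for value in row.values():
            if isinstance(value, str):
                for match in EMAIL.findall(value):
                    if not match.endswith("@" + allowed_domain):
                        return True
    return False


def provision_test_data(source_rows, **criteria):
    """Subset, mask, then refuse to release data that still looks sensitive."""
    masked = mask_emails(subset(source_rows, **criteria))
    if contains_real_email(masked):
        raise ValueError("unmasked email found; refusing to provision test data")
    return masked


# Hypothetical source extract.
source = [
    {"customer_id": 1, "region": "EU", "email": "alice@corp.example", "note": ""},
    {"customer_id": 2, "region": "US", "email": "bob@corp.example", "note": ""},
]
print(provision_test_data(source, region="EU"))
```

Running the gate after masking means a missed field, such as an address typed into a free-text note, stops the pipeline instead of reaching the test environment.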