Do you test using real production data? Beware of using sensitive data for application development or testing: lost or stolen information can trigger costly data breach notifications, regulatory sanctions, and customer fallout.
© Copyright 2009, Dice Holdings, Inc. All Rights Reserved. 
Anecdotally, many developers and QA testers say they prefer to build and test applications using the real thing: actual customer data.
Such practices, however, can violate a number of data privacy regulations. For example, the 1996 Health Insurance Portability and Accountability Act (HIPAA) requires companies to restrict access to people’s personal health data on a “need to know” basis. Likewise, the Sarbanes-Oxley Act (SOX) of 2002 requires companies to control access and track changes to systems handling corporate financial information. In addition, over 30 states have passed data breach notification laws requiring companies to notify consumers if their personal information may have been compromised. Covered information includes a person’s name and address, date of birth, Social Security number, and credit card and bank account numbers.
These regulations make no distinction between production and testing environments. Simply put, the requirements are the same whether an attacker hacks into your e-commerce application or accesses a database in the testing environment. Given such risks, many companies have decided that developers, as well as quality assurance (QA) personnel and database administrators (DBAs), simply don’t have a “need to know,” and thus shouldn’t have access to any sensitive information. Beyond helping protect sensitive data, the company, and its customers, this also protects developers: they’re not culpable in the event of a data leak or security breach.
To ensure applications perform appropriately once they launch, however, developers still need access to “good enough” data to build and test their applications. Accordingly, many organizations are creating homegrown scripts, or purchasing off-the-shelf software, to transform sensitive production data into safe but usable test data.
Beyond complying with regulations, keeping customers happy, and avoiding class action lawsuits, companies also have a financial incentive for keeping sensitive data out of the test environment. Indeed, the actual cost of lost, stolen, or inappropriately accessed data is quite high: an average of $182 per record. That finding comes from the Ponemon Institute, which studied the actual costs incurred by 31 companies after they experienced a data breach. (The total tab for an affected organization ranged from $226,000 to $22 million.) Costs included legal fees, consumer notifications, credit monitoring services, and decreased customer retention and acquisition.
Data breaches can exact more than money. Witness the breach at CardSystems, a company that processed credit card transactions. Attackers stole over 40 million records containing people’s credit card numbers. The records had reportedly been retained by CardSystems for “research purposes,” despite the company being subject to industry regulations expressly forbidding the storage of such data, at least in unencrypted format. The fallout ultimately drove CardSystems out of business.
Not surprisingly, many companies are now taking a closer look at the data their developers use.
How can you ensure no sensitive data is being used or stored in your development and testing environments? A recent report from Forrester Research, written by Noel Yuhanna and Carey Schwaber, recommends companies pursue these four steps:
To make production data safe for the test environment, your test-data czar will need to formulate transformation strategies on a per-application basis. This transformation can be difficult, however, since an application often needs to believe the data is real. For example, an e-commerce application may vet a credit card number to see if it “looks” real. Similar checks may occur for Social Security numbers, birth dates, driver’s license numbers, addresses, bank accounts, and customer identification numbers.
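As an illustration of data that “looks” real, many card-number checks rely on the Luhn checksum. The following Python sketch generates fake numbers that pass that check. It assumes the application validates only the length and the checksum, and the function names and the default prefix are hypothetical choices for this example, not taken from any particular tool.

```python
import random


def luhn_checksum_ok(number: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    total = 0
    # Walk the digits right to left, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def fake_card_number(prefix: str = "4", length: int = 16, rng=None) -> str:
    """Build a random digit string that passes the Luhn check.

    The prefix and length defaults are assumptions for illustration.
    """
    rng = rng or random.Random()
    body = prefix + "".join(
        str(rng.randrange(10)) for _ in range(length - len(prefix) - 1)
    )
    # Exactly one of the ten possible check digits satisfies the checksum.
    for check in "0123456789":
        if luhn_checksum_ok(body + check):
            return body + check
    raise AssertionError("no valid check digit found")


if __name__ == "__main__":
    number = fake_card_number()
    print(number, luhn_checksum_ok(number))
```

Numbers produced this way are generated rather than copied from customer records, so a test run can exercise the validation path without touching production card data.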
A variety of techniques exist to create data that’s either fake, or “de-identified” enough to be safe, including:
How can you apply these techniques? Many companies use scripting tools or their existing test automation tools to obfuscate or replace sensitive information.
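A homegrown masking script might look something like the following Python sketch. It assumes records arrive as dictionaries with name, email, and ssn fields (hypothetical field names), and it uses a keyed hash so that the same input always maps to the same replacement, which keeps references between tables consistent. The key, the field names, and the placeholder values are all assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical key: in practice it would be kept out of the test environment.
SECRET_KEY = b"replace-with-a-key-kept-out-of-test"


def pseudonym(value: str, field: str, length: int = 12) -> str:
    """Derive a stable, non-reversible token for a sensitive value."""
    message = f"{field}:{value}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return digest[:length]


def mask_record(record: dict) -> dict:
    """Return a copy of the record with sensitive fields replaced."""
    masked = dict(record)
    token = pseudonym(record["email"], "email")
    masked["name"] = f"Customer {token[:6]}"
    # example.com is reserved for documentation, so no real mailbox is hit.
    masked["email"] = f"{token}@example.com"
    # Constant placeholder; an application that validates this field
    # would need a more realistic substitute.
    masked["ssn"] = "000-00-0000"
    return masked


if __name__ == "__main__":
    row = {"name": "Jane Doe", "email": "jane@corp.test",
           "ssn": "123-45-6789", "order_total": 42.5}
    print(mask_record(row))
```

Non-sensitive fields such as the order total pass through unchanged, so the masked data remains usable for functional tests.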
For large companies facing especially stringent data regulations and just beginning to deal with test-data problems, however, Forrester’s Yuhanna and Schwaber recommend using off-the-shelf test data generation or masking software, since building it from scratch may take six to nine months. Such tools, they note, are available from Compuware, Datavantage, Global Software Applications, IBM, Princeton Softech, Quest Software, SoftBase Systems, and Worksoft.
Keep a Record
Regardless of your approach, maintain copious records of your test data transformation processes. First, this will help you apply them in a consistent and repeatable manner. Second, if your company should suffer a breach, states’ data breach notification laws exempt stolen data that was sufficiently de-identified or encrypted. Hence your records will help demonstrate to auditors that no sensitive information was stolen, enabling your company to avoid a costly data breach notification process and to save face with customers.
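One lightweight way to keep such records, sketched here in Python, is to append one JSON line per transformation run. The field names, the rule descriptions, and the log file layout are assumptions for illustration, not a format that any regulation or auditor prescribes.

```python
import json
from datetime import datetime, timezone


def transformation_record(dataset, rules, row_count, tool_version):
    """Describe one masking run: what was transformed, how, and when."""
    return {
        "dataset": dataset,
        "run_at": datetime.now(timezone.utc).isoformat(),
        "rules": rules,            # e.g. {"email": "keyed hash"}
        "row_count": row_count,
        "tool_version": tool_version,
    }


def append_record(path, record):
    """Append the record as one JSON line, leaving earlier lines intact."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


if __name__ == "__main__":
    rec = transformation_record(
        dataset="orders",
        rules={"email": "keyed hash", "ssn": "constant placeholder"},
        row_count=1000,
        tool_version="0.1",
    )
    append_record("masking_log.jsonl", rec)
```

An append-only log of this kind records which rule was applied to which field on each run, which is the sort of evidence the paragraph above describes.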
Originally published on Dice.com.
