Grid-Tools Test Data Management Blog

Jess3589
January 21st, 2010


So, why are you still using data masking?

Most of us have no idea how to get hold of the right kind of data for a test or development project. We’re lost. We’ve been taking copies of our live environments for years. It’s the only method we know.

There is one issue changing our IT infrastructures and threatening the comfortable, reliable procedures we know so well: compliance. Yes, we’re now in the data protection era. Regulatory bodies would rather personal information be kept safe and secure inside the production database where it belongs, thank-you-very-much. Ignoring compliance measures isn’t really the best idea, either. We’ve learnt this from watching our competitors’ very public data breaches, secretly smiling to ourselves, relieved it didn’t happen to us.

However, let us consider for a moment the potential implications of your organization’s sensitive data being leaked into the public domain by some unfortunate soul’s mindless mistake:

• Potential lawsuit – check
• Damage to corporate brand and reputation – check
• Large fines imposed by regulating bodies – check
• Loss of integrity and potential loss of customers – check

So, what about data masking – you ask? It’s a quick and easy way to solve the problem, right? Let’s get a bit of production data, scramble it up, de-identify some names and – there you go. You’ve got your test data and the perfect solution to the thorn digging into your side.

Oh, but wait. We now hear data masking isn’t actually that secure. Bugger.

Then what alternative do we have? Well, using ‘synthetic’ or ‘fake’ test data seems to be the topic of the day – the new method, the new ‘fad’ if you will.

Now, for those of you who don’t know, test data creation is a by-product of modeling and sampling production data. The model is turned into data objects or templates that describe the entire production environment. The templates can be edited or enhanced to suit project needs, or exported to flat-file formats such as CSV.
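
To make the template idea a little more concrete, here is a minimal Python sketch. The template layout, the column names and the `generate_rows` and `to_csv` helpers are all hypothetical, made up for this post rather than taken from any particular tool. The point is only that the template records the shape of each column (type, range, allowed values) and holds no real records, and that rows are generated from it and written out as CSV.

```python
import csv
import io
import random

# Hypothetical template: describes the shape of each column, holds no real records.
CUSTOMER_TEMPLATE = {
    "customer_id": {"kind": "sequence", "start": 1000},
    "first_name": {"kind": "choice", "values": ["Alex", "Sam", "Jo", "Chris"]},
    "age": {"kind": "int_range", "low": 18, "high": 90},
    "status": {"kind": "choice", "values": ["active", "closed", "suspended"]},
}


def generate_rows(template, count, seed=0):
    """Generate `count` synthetic rows from a column template."""
    rng = random.Random(seed)  # seeded so a test run can be repeated
    rows = []
    for i in range(count):
        row = {}
        for column, spec in template.items():
            if spec["kind"] == "sequence":
                row[column] = spec["start"] + i
            elif spec["kind"] == "choice":
                row[column] = rng.choice(spec["values"])
            elif spec["kind"] == "int_range":
                row[column] = rng.randint(spec["low"], spec["high"])
            else:
                raise ValueError(f"unknown column kind: {spec['kind']}")
        rows.append(row)
    return rows


def to_csv(rows):
    """Render the generated rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = generate_rows(CUSTOMER_TEMPLATE, count=5)
csv_text = to_csv(rows)
print(csv_text)
```

Editing the template (say, widening the `age` range or adding a new `status` value) changes what gets generated without touching a single production row.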

This may sound impossible to you. You may be thinking about the referential integrity of your production database: the value of every table, every cell, every format, every name, and how each table is connected to the others. But it’s rather easy, actually – very easy to be frank.
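
For the sceptics, here is a small Python sketch of how generated data can keep its referential integrity. The two tables, their columns and the helper functions are assumptions chosen for illustration only. The trick is simply to generate the parent rows first and have every child row pick its foreign key from the keys that already exist.

```python
import random


def generate_customers(count, rng):
    """Parent table: each customer gets a unique primary key."""
    regions = ["north", "south", "east", "west"]
    return [
        {"customer_id": 1 + i, "region": rng.choice(regions)}
        for i in range(count)
    ]


def generate_orders(customers, count, rng):
    """Child table: every order points at a customer that actually exists."""
    customer_ids = [c["customer_id"] for c in customers]
    return [
        {
            "order_id": 5000 + i,
            "customer_id": rng.choice(customer_ids),  # foreign key drawn from parents
            "amount_pence": rng.randint(100, 50000),
        }
        for i in range(count)
    ]


rng = random.Random(42)
customers = generate_customers(10, rng)
orders = generate_orders(customers, 30, rng)

# Integrity check: no order may reference a missing customer.
known_ids = {c["customer_id"] for c in customers}
orphans = [o for o in orders if o["customer_id"] not in known_ids]
print(f"{len(orders)} orders, {len(orphans)} orphans")
```

A real schema has many more tables, but the same ordering rule (parents before children) carries over.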

I’ve started looking into this as part of my consultancy. It came to light when I was contacted by a government agency needing an alternative to data masking. I’m impressed. It is true – test data creation is the way forward. On top of this, less data is actually being used and stored, since the data is generated from the model into a small template. Also, believe it or not, synthetic data gives you better code and functional coverage. How? Well, it’s easier than you think. You simply use your test data creation tool’s data editing, enhancing and manipulation functions to add the cases a copy of production may not contain.

Oh, and yes, I nearly forgot the most important point. More secure than data masking, you say?! Well, it is. This is because data creation tools never actually access or manipulate ‘live’ production data. They can, in fact, create data without data. Confusing? Most data masking tools anonymize your test and development data by moving live production data into a separate staging environment so it can then be masked. How is this secure? Well, it’s not really. What’s more, I bet you didn’t know that masked data can be reverse-engineered back into its original form fairly easily. Synthetic data can’t be reverse-engineered, because there is no ‘real’ record behind it.
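
To illustrate the point about recovering originals from masked data, here is a Python sketch of one deliberately weak masking scheme: replacing a date of birth with its unsalted SHA-256 hash. This is my own example, not a description of how any particular masking tool works, and the date used is made up. Because there are only a few tens of thousands of plausible dates of birth, anyone can hash them all and look the masked value up.

```python
import hashlib
from datetime import date, timedelta


def mask(value: str) -> str:
    """A deliberately weak 'mask': an unsalted SHA-256 digest of the value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_lookup(start: date, end: date) -> dict:
    """Hash every possible date in the range and index the dates by digest."""
    lookup = {}
    day = start
    while day <= end:
        iso = day.isoformat()
        lookup[mask(iso)] = iso
        day += timedelta(days=1)
    return lookup


# A made-up date of birth, masked before being copied into a test database.
masked_dob = mask("1974-03-12")

# The attacker never sees the original, only the masked column.
lookup = build_lookup(date(1930, 1, 1), date(2009, 12, 31))
recovered = lookup[masked_dob]
print(recovered)
```

A generated value has no original sitting behind it, so there is nothing for a lookup like this to recover.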

So, why are you still using data masking? My guess is because you don’t know enough about data creation – start reading! It’s the way forward.


© 2009 Grid-Tools Ltd. - data management and test data generation software

 
