Privacy is an issue that is consistently in the news. Large amounts of data are stored by retailers, governments, health care providers, employers, and so forth. Much of this data contains personal information, and keeping it private has proven to be a difficult task.
We have seen numerous examples of unintended data loss (unintended, that is, by the company whose systems were stolen or attacked).
We hear about thefts of laptops containing personal information for hundreds of thousands of people. Internet-based attacks that give attackers access to financial transaction data, and even rogue credit card swiping equipment hidden in gas pumps, have become background noise in a sea of leaked data. This is the area that gets the lion's share of attention in the media and among security professionals.
Worse than these unintended losses, because they are completely preventable, are the losses that result from a company consciously releasing its customer data. Such companies assume they are not introducing risk, but often they are. In every such case, had the owner of the data simply held it internally, no privacy loss would have occurred.
There have been cases of personal data loss due to mistakes in judgment.
AOL released a large collection of search data to researchers. The people releasing the data didn’t consider this a risk to privacy. How could the search terms entered by anonymous people present a risk to privacy?
Of course, we now know that the data contained people's Social Security numbers (SSNs), phone numbers, credit card numbers, and so forth. Why? It turns out that some people search for those things, quite possibly to prove to themselves that their data is safe. What better way to see whether your SSN or credit card number is published on the Internet than to type it into a search engine? No matches, great!
Personal data has even been lost by companies releasing data after attempting to mask or anonymize it.
The intent of masking is to remove enough information, the personally identifiable information (PII), that the records can no longer be associated with real people. Of course, this has to be done without losing the important details that allow patterns and relationships in the data to be found.
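The basic idea can be sketched in a few lines of Python. This is a minimal illustration, not any particular company's method: the record fields and the salted-hash pseudonymization scheme are assumptions chosen for the example. The direct identifier is replaced with a pseudonym, while the rest of the record is kept so that patterns across records survive.

```python
import hashlib

# Hypothetical search-log records; field names are illustrative only.
records = [
    {"user": "alice@example.com", "query": "best running shoes"},
    {"user": "bob@example.com",   "query": "best running shoes"},
]

def mask(record, salt="s3cret"):
    """Replace the direct identifier (the PII) with a salted hash,
    keeping the query so relationships in the data are preserved."""
    pseudonym = hashlib.sha256((salt + record["user"]).encode()).hexdigest()[:12]
    return {"user": pseudonym, "query": record["query"]}

masked = [mask(r) for r in records]
# The raw identifier is gone from the released data...
assert all("@" not in r["user"] for r in masked)
# ...but the analytically useful pattern (two users, same query) remains.
assert masked[0]["query"] == masked[1]["query"]
```

Note that masking the identifier column is not the same as removing all identifying information: as the AOL release showed, the retained fields themselves (here, the queries) can still contain PII or enough detail to re-identify someone.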