I was recently reading about an inspiring middle-aged businessman who runs some prominent businesses in Asia. One of them is a hotel chain, but that is not the business he is generally known for. I can’t give away his name, but I can tell you that he has been involved in a reality TV series. I also know that he started a racing team in one of the highest forms of motorsport, which was later sold, and that he is the co-chairman of a football club. But of course, respecting his privacy, I can’t really give away the name.
Obviously, anyone with an interest in football or Formula 1 will have figured out who I am talking about by now. Anyone else, armed with Google, shouldn’t need more than 10 minutes to narrow it down. Of course, the example here is quite obvious and does a bad job of introducing the topic of data anonymisation, but bear with me.
Data anonymisation is an empty-headed art in disguise. It is just a fancy term fed to people through policies that pretend to be in their interest. Big data is all the rage these days: massive chunks of scrubbed data, with personally identifiable information removed, supposedly allow data to flow freely in the modern data-driven world. But is that really the case?
We have had far too many examples and studies by now showing that any chunk of ‘scrubbed’ data can be reverse engineered to identify individuals. You really didn’t think of that while setting up Google ad campaigns for your brand, did you? It is all like a game of sets, and the key is to find the right overlaps. With every new layer of data added, the overlap shrinks and the likelihood of identifying an individual goes up.
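To make the "game of sets" concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the records, the attribute names and the `narrow` helper are assumptions, not anything taken from a real data set. It only shows how intersecting one layer after another shrinks the pool of candidates.

```python
# Hypothetical "scrubbed" records: no names, only ordinary-looking attributes.
RECORDS = [
    {"id": 1, "city": "A", "age_band": "30-39", "car": "hatchback", "gym": True},
    {"id": 2, "city": "A", "age_band": "30-39", "car": "sedan", "gym": True},
    {"id": 3, "city": "A", "age_band": "40-49", "car": "hatchback", "gym": False},
    {"id": 4, "city": "B", "age_band": "30-39", "car": "hatchback", "gym": True},
    {"id": 5, "city": "A", "age_band": "30-39", "car": "hatchback", "gym": False},
    {"id": 6, "city": "B", "age_band": "40-49", "car": "sedan", "gym": False},
]


def narrow(records, layers):
    """Intersect the candidate set with one layer of knowledge at a time.

    `layers` is a list of (attribute, value) pairs the observer already knows.
    Returns the remaining record ids and the pool size after each layer.
    """
    remaining = {r["id"] for r in records}
    sizes = []
    for key, value in layers:
        # Keep only the candidates that also match this layer.
        remaining &= {r["id"] for r in records if r[key] == value}
        sizes.append(len(remaining))
    return remaining, sizes


known = [("city", "A"), ("age_band", "30-39"), ("car", "hatchback"), ("gym", True)]
remaining, sizes = narrow(RECORDS, known)
print(sizes)      # [4, 3, 2, 1]
print(remaining)  # {1}
```

None of the four attributes is identifying on its own, yet together they leave exactly one record standing in this toy set.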
This wouldn’t have been a problem worth talking about eight or nine years ago, but thanks to our mobile devices and the web of connected things we are weaving around ourselves, it can now pose a real threat. Have you noticed how creepily accurate online ads are becoming?
An interesting example comes from researchers at the University of Washington and the University of California. From the data collected by a car’s onboard computer, they could identify individual drivers with 90% accuracy using nothing but brake pedal data, albeit from a very small set of people.
That might not be very alarming for a set of a dozen or two people, but don’t forget that we are fast approaching a connected future. Kickstarter is awash with IoT projects, while car companies like Tesla are already rolling out over-the-air updates (and collecting driver data).
Further down the line, that data, after being ‘anonymised’, will have to be shared with more of the parties involved. The car companies (obviously), internet providers, local infrastructure operators, local regulators, insurance companies and maybe a few more will gain access to the same pool of data. When it is cross-referenced with the multiple data trails we are generating, it will open doors to things we won’t like.
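Cross-referencing of this kind can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions: the two data sets, the field names (`postcode`, `birth_year`, `car_type`) and the `link` helper are all hypothetical. One data set has had its names removed, the other belongs to a second party that still holds names, and the two are joined on the fields they share.

```python
from collections import defaultdict

# Hypothetical 'anonymised' driving data: names replaced by opaque driver ids.
TRIPS = [
    {"driver": "d1", "postcode": "411001", "birth_year": 1984, "car_type": "sedan"},
    {"driver": "d2", "postcode": "411001", "birth_year": 1990, "car_type": "hatchback"},
    {"driver": "d3", "postcode": "560034", "birth_year": 1984, "car_type": "sedan"},
]

# Hypothetical data held by another party (say, an insurer) that includes names.
POLICIES = [
    {"name": "Person A", "postcode": "411001", "birth_year": 1984, "car_type": "sedan"},
    {"name": "Person B", "postcode": "411001", "birth_year": 1990, "car_type": "hatchback"},
    {"name": "Person C", "postcode": "560034", "birth_year": 1984, "car_type": "sedan"},
    {"name": "Person D", "postcode": "560034", "birth_year": 1984, "car_type": "sedan"},
]

SHARED_FIELDS = ("postcode", "birth_year", "car_type")


def link(anonymous, named, keys=SHARED_FIELDS):
    """Join the two data sets on their shared fields.

    Returns {driver id: name} for every anonymous row that matches
    exactly one named row; ambiguous rows are left out.
    """
    index = defaultdict(list)
    for row in named:
        index[tuple(row[k] for k in keys)].append(row["name"])

    matches = {}
    for row in anonymous:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique overlap means the row is re-identified
            matches[row["driver"]] = names[0]
    return matches


print(link(TRIPS, POLICIES))  # {'d1': 'Person A', 'd2': 'Person B'}
```

In this toy example the third driver stays hidden only because two policy holders happen to share the same three fields. One more shared field, or one more data trail, would likely separate them too.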
Data can be either useful or perfectly anonymous, but not both, or so many experts claim. Some of these studies strongly suggest that the regulations around privacy are pretty much a joke. However, the biggest problem is that the possible good that can come out of such massive data sets is far too valuable to resist. What comes in the immediate future will only be the tip of the iceberg. At this point, we don’t need to resist and go back; we need to debate and find the right model to go forward.
(With inputs from Akshay Sharma).