Data de-identification – is privacy achievable? 

Did you hear about the dataset of Medicare claims that had to be removed from the Australian Government’s open data portal, after being available for anyone to view and use?

A team of researchers from the University of Melbourne found it was possible to decrypt some service provider ID numbers in this publicly available Medicare dataset. Although the Department of Health insisted no patient information was compromised, and no information about the service providers was publicly identified or released, this incident raised significant concerns around data de-identification practices.

Differing viewpoints

To de-identify a dataset, identifying information such as IDs, names, addresses, gender or date of birth can be removed entirely. These attributes can also be masked by changing data values, or obscured through aggregation. However, experts disagree on how effective these processes are at protecting an individual’s privacy.
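As a rough illustration of the three techniques just described (removal, masking and aggregation), here is a minimal Python sketch. The records, field names and masking rules are all invented for this example and are not drawn from the Medicare dataset; a real project would choose its rules based on a proper risk assessment.

```python
from collections import Counter

# Hypothetical records: every field name and value is invented for illustration.
records = [
    {"id": "P001", "name": "Alice Smith", "dob": "1984-03-12", "postcode": "3052", "service": "GP visit"},
    {"id": "P002", "name": "Bob Jones", "dob": "1984-11-30", "postcode": "3053", "service": "GP visit"},
    {"id": "P003", "name": "Carol Lee", "dob": "1990-07-01", "postcode": "2000", "service": "Pathology"},
]

# Assumption: these are the fields we treat as direct identifiers.
DIRECT_IDENTIFIERS = {"id", "name"}


def deidentify(record):
    """Return a copy of the record with identifiers removed or masked."""
    # Removal: drop direct identifiers entirely.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Masking by changing values: keep only the year of birth.
    out["birth_year"] = out.pop("dob")[:4]
    # Masking by changing values: keep only the first two postcode digits.
    out["postcode"] = out["postcode"][:2] + "**"
    return out


def aggregate(rows, key):
    """Aggregation: publish counts per category instead of individual rows."""
    return dict(Counter(row[key] for row in rows))


cleaned = [deidentify(r) for r in records]
service_counts = aggregate(cleaned, "service")
```

Even after these steps, the remaining fields (birth year, partial postcode, service type) can still narrow down who a row belongs to, which is exactly what the experts disagree about.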

Australian Information and Privacy Commissioner Timothy Pilgrim recently hosted a workshop with privacy and data de-identification experts at the GovInnovate Conference in Canberra. Some argued that no method of de-identification exists that could guarantee the safety of sensitive information, while others were more confident in the anonymisation models produced by decades of statistical and computer science research.

Dr Vanessa Teague, a cryptologist and member of the University of Melbourne team that revealed the Medicare dataset’s encryption issue, said it is a myth that any de-identification algorithm works. On the other hand, Dr Khaled El Emam, a world-renowned expert in statistical de-identification and re-identification risk, said there are models and risk metrics through which an acceptable level of risk can be determined. He cited the statistical risk thresholds set by European and US health agencies, courts and regulators; importantly, these are never zero.

Is legislation the answer?

The Australian Federal Government is already working on tightened privacy laws to address re-identification issues. Attorney-General George Brandis introduced into the Senate an amendment to the Privacy Act 1988 – the Privacy Amendment (Re-identification Offence) Bill 2016. This legislation introduces criminal offences and civil penalty provisions for the re-identification of de-identified personal information published or released by Commonwealth entities.

But we’re still waiting for the data breach notification legislation to be passed, after it was promised in 2013, 2014 and now in 2016. Perhaps 2017 will be our lucky year?

No method of de-identifying data works perfectly 100% of the time.

Advances in technology and the increased availability of information make re-identification possible, albeit a relatively difficult task (mostly done by academics or journalists, not hackers). Yet the availability of data is essential for policy development, research and innovation. So what can be done?

Firstly, define an acceptable risk threshold for your organisation and the data in question. The risk of re-identification is never zero. Secondly, monitor technology advances. What has changed that might enable someone to re-identify your data? And finally, the best defence is always offence – don’t collect personal information if you don’t have to. It’s hard to disclose something that you don’t have!
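To make the idea of a risk threshold concrete, here is a minimal Python sketch of one simple, commonly used measure: group the records by the attributes an attacker might already know (often called quasi-identifiers) and treat the worst-case risk as one divided by the size of the smallest group. This is an illustration only, not necessarily the models the experts quoted above had in mind, and the data, field names and the 0.5 threshold are all hypothetical.

```python
from collections import Counter


def max_reidentification_risk(rows, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group sharing the same values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return 1 / min(groups.values())


def meets_threshold(rows, quasi_identifiers, threshold):
    """True if the worst-case risk is at or below the chosen threshold."""
    return max_reidentification_risk(rows, quasi_identifiers) <= threshold


# Hypothetical de-identified rows: three people share one combination, two share another.
rows = (
    [{"birth_year": "1984", "postcode": "30**"}] * 3
    + [{"birth_year": "1990", "postcode": "20**"}] * 2
)

QUASI_IDENTIFIERS = ["birth_year", "postcode"]
THRESHOLD = 0.5  # hypothetical value; set yours from your own risk assessment

risk = max_reidentification_risk(rows, QUASI_IDENTIFIERS)
acceptable = meets_threshold(rows, QUASI_IDENTIFIERS, THRESHOLD)

# Adding one person with a unique combination pushes the worst-case risk to 1.0.
rows_with_unique = rows + [{"birth_year": "1975", "postcode": "40**"}]
risk_with_unique = max_reidentification_risk(rows_with_unique, QUASI_IDENTIFIERS)
```

Note that the result can never reach zero with this measure, and a single unusual record is enough to push a dataset over the threshold, so a check like this is worth re-running whenever the data or the available outside information changes.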

Medicare image via AMA