Sydney University Law Society

A Delicate Balance: Medical Data Mining and Patient Privacy


Matthew King

4th Year BE (Biomedical) (Hons) (Adv)/LLB Candidate

I    Introduction

In 2015, the NHS Royal Free Trust in the United Kingdom entered into an agreement with Google’s artificial intelligence company DeepMind, granting it access to 1.6 million patient health records.[1] The arrangement was intended to assist DeepMind in developing an application, ‘Streams’, to monitor patients suffering from acute kidney injury. However, when New Scientist obtained the data-sharing agreement in 2016, it revealed that the Google-owned company had access to an alarming breadth of sensitive patient information, including HIV status and details of patients’ drug use and abortions.[2] The UK Information Commissioner's Office investigated the agreement and ruled it a breach of privacy law: there was a significant lack of transparency as to how DeepMind would use patient information, and a clear gap between what the data subjects could have “reasonably expected” would happen to their data and how it was actually used.[3]

“Data mining” can be generally understood as the process of collecting large volumes of data, performing analysis upon that data using specific algorithms (statistical models specifically designed to handle large, complex datasets) and arranging the results in a meaningful form.[4] Current uses of data mining include curating news feeds and search results, delivering targeted advertising and driving autonomous vehicles.[5] It offers substantial potential. However, the “mining” of data which contains specific details about individuals - particularly sensitive medical information - raises serious concerns over use and access.

As such, a legal framework that balances robust data protection with suitable data access must exist to enable realisation of the benefits data mining offers without compromising the privacy of individuals whose data is being used. The DeepMind scenario is an example of the scales being tipped too far towards technology and demonstrates the inadequacy of the law in protecting against large-scale breaches of privacy.

This article assesses the impact of privacy laws on the use of data mining in medicine. Three medical professionals, three legal professionals and three data scientists who work within the medical sector have provided insight into the interaction of data laws and machine learning methods within the healthcare industry. Their views and experiences have been incorporated into the discussion.

II    The Rise of Data Mining in the Medical Industry

Electronic patient information is becoming ubiquitous through the collection of health fund data and online medical data storage platforms such as My Health Record. By harnessing the value of this data alongside the responsible management of security and privacy, there is potential to provide medical practitioners with the tools to treat and manage their patients more effectively and efficiently.

A.    Benefits of Data Mining

When considering data mining from a legislative and regulatory standpoint, it is important to recognise that data mining systems can only produce meaningful results through access to a large volume of quality data. Data mining in medicine offers benefits such as a greater level of objectivity in decision making and the ability to personalise a patient’s treatment.  With the right data available, data scientists have produced remarkable systems, including Support Vector Machines capable of diagnosing Alzheimer’s from SPECT scans with 90% accuracy (outperforming specialists)[6] or classifying genes for diagnosing cancer;[7] and neural networks that reliably segment 3D medical scans[8] or diagnose diabetes and cardiovascular disease.[9]

B.    Data Mining Risks

While data mining promises a number of benefits, its use in medicine raises a number of potential issues. These need to be addressed before any clinical applications are implemented, and should be managed under an appropriate legal framework that ensures data is secure and private, and patients have autonomy over the use of their data.

Dr Brett Courtenay, an orthopaedic surgeon, expressed concern about the difficulty of monitoring the accuracy of conclusions reached by machine learning systems. Additionally, two of the data scientists, Dr Lamiae Azizi and Associate Professor Fabio Ramos, who work on medical machine learning methods, discussed the scenario of data mining delivering outputs which don’t necessarily benefit society, or which could be used against people. Recently published research reported an ability to predict a person’s sexual orientation, with a high degree of accuracy, based simply upon photos of their face.[10] Similarly, data mining may discover that a particular race or gender is more likely to be affected by a particular disease. While such outputs can be extremely beneficial if properly managed, there is potential for this information to be abused. If the phenomenon being studied is socially undesirable, results may foster prejudice towards groups and individuals.

Furthermore, research has shown that ‘de-identified’ data can be linked with other datasets to re-identify it, meaning a person’s data can be traced back to them.[11] As Dr Azizi pointed out, data can be ‘re-identified’ in more ways than tracing it back to an individual: it can also be linked to a demographic or geographical region, raising concerns similar to those in the paragraph above.
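The linkage risk described here can be illustrated with a minimal sketch. It assumes a ‘de-identified’ health table that still carries postcode, birth year and sex, and a separate public register containing the same fields alongside names. Every record, field name and function below is hypothetical and invented for illustration; the sketch shows the general idea of a linkage attack, not the method of any study cited in this article.

```python
# Hypothetical toy data: all names, fields and values are invented for illustration.
deidentified_health = [
    {"postcode": "2006", "birth_year": 1985, "sex": "F", "diagnosis": "condition A"},
    {"postcode": "2050", "birth_year": 1972, "sex": "M", "diagnosis": "condition B"},
    {"postcode": "2050", "birth_year": 1990, "sex": "F", "diagnosis": "condition C"},
]

public_register = [
    {"name": "Person One", "postcode": "2006", "birth_year": 1985, "sex": "F"},
    {"name": "Person Two", "postcode": "2050", "birth_year": 1972, "sex": "M"},
    {"name": "Person Three", "postcode": "2050", "birth_year": 1972, "sex": "M"},
]

# Fields that are not names, but that both datasets happen to share.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Build a comparison key from the shared quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(health_rows, register_rows):
    """Return {name: diagnosis} for health rows matched by exactly one register entry."""
    candidates = {}
    for person in register_rows:
        candidates.setdefault(key(person), []).append(person["name"])

    matches = {}
    for row in health_rows:
        names = candidates.get(key(row), [])
        if len(names) == 1:  # a unique match links the record back to one person
            matches[names[0]] = row["diagnosis"]
    return matches


reidentified = link(deidentified_health, public_register)
print(reidentified)
```

In this toy example only the first health record is matched to a single named person. The second record matches two register entries, so it is not attributed to an individual, although it is still narrowed to a small group, which is the kind of group-level linkage Dr Azizi describes.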


III    Australia's Current Legal Framework

Australia deals with the collection, use and disclosure of data through a framework of legislative and regulatory principles, under which organisations that deal with personal data must operate. This framework insufficiently addresses privacy protection - particularly in regard to health, an area where data is highly intimate. Laws protecting patient data are essential to maintaining society’s confidence in data mining applications.

The Privacy Act 1988 (Cth) (‘the Act’) is the primary legislative instrument that controls the handling of all forms of data, including medical data. To promote the protection of privacy, balanced against the interests of entities engaged in activities such as data mining,[12] the Act establishes a Commission to handle complaints[13] and prescribes thirteen Australian Privacy Principles (‘APPs’)[14] to control, amongst other things, the collection, management and disclosure of personal information by organisations or government agencies, collectively referred to under the Act as ‘APP entities’.[15] Australia’s states and territories have enacted legislation supplementing the Act and established commissions of their own to deal with instances that fall beyond the Commonwealth Parliament’s jurisdiction.[16]

Although the Act regulates the handling of all forms of personal data, it provides specific laws for health and genetic data.[17] It controls how organisations such as biomedical companies and public and private hospitals must operate when using medical data. An entity that analyses health data must adhere to the following stipulations (among other things):

  • All APP entities must maintain a privacy policy and ensure their purposes for collection, use and disclosure are transparent.[18] Additionally, entities must take reasonable measures towards ensuring data security.[19]
  • Health data may only be collected if either the individual consents to that collection and it is necessary for the organisation’s functions or activities;[20] or a permitted health situation exists.[21] (Permitted health situations are set out under s 16B of the Act. They exist where it is impracticable to obtain the individual’s consent and the information is necessary for: providing a health service to the individual;[22] conducting research, compilation or analysis that is relevant to public health or safety; or the management, funding or monitoring of a health service.[23])
  • Outside of the s 16B exceptions, health data collected may only be used for the primary purpose the individual has consented to, or for a purpose so directly related to that purpose that the individual would reasonably expect it be used in that manner.[24]
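The collection and use rules in the list above can be read as a simple decision procedure. The following is a rough sketch of that reading only: it simplifies the rules as summarised in this article, it is not a statement of the law, and every class, field and function name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CollectionRequest:
    """Hypothetical summary of the facts relevant to collecting health data."""
    consent_given: bool
    necessary_for_functions: bool
    permitted_health_situation: bool = False  # as described under s 16B


def may_collect(req: CollectionRequest) -> bool:
    """Consent plus necessity, or a permitted health situation."""
    return (req.consent_given and req.necessary_for_functions) or req.permitted_health_situation


@dataclass
class UseRequest:
    """Hypothetical summary of the facts relevant to using collected health data."""
    purpose: str
    consented_primary_purpose: str
    directly_related_and_expected: bool = False
    permitted_health_situation: bool = False


def may_use(req: UseRequest) -> bool:
    """Primary purpose, a directly related and reasonably expected purpose, or s 16B."""
    if req.permitted_health_situation:
        return True
    if req.purpose == req.consented_primary_purpose:
        return True
    return req.directly_related_and_expected


# Data collected with consent for one study, then proposed for an unrelated new use.
collected = may_collect(CollectionRequest(consent_given=True, necessary_for_functions=True))
reused = may_use(UseRequest(purpose="new data mining project",
                            consented_primary_purpose="kidney injury study"))
print(collected, reused)
```

On these assumed facts the collection is allowed but the new use is not, which is the situation that arises when a beneficial use is discovered after the original consent was given.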

In addition to the Privacy Act 1988 (Cth), the law of confidentiality influences the way in which health data is dealt with. The patient/doctor relationship invokes an obligation not to disclose sensitive information[25] and criminal charges exist for doing so.[26] As such, medical officers involved in using patient data for machine learning need to ensure they inform their patients of the purpose and obtain their consent.


A.    Consent

Under the APPs, consent is a central consideration for an organisation seeking to employ data mining in medicine, and rightfully so. That said, it can present difficulties. Stephen Bolinger, the Chief Privacy Officer at Cochlear Ltd, addressed the interaction of consent with purpose specifications, which are used to outline to individuals why their data is being collected and what it is going to be used for. For neural networks and complex AI systems (often dubbed ‘black boxes’ due to their opaque internal processes), it is difficult to understand what is being done with individuals’ data, and therefore hard to specify to someone what they need to consent to. Further, Bolinger noted that an obstacle arises when new, beneficial ways of using data are discovered after it was collected and the original consent does not directly cover the new purpose.[27] As such, these technologies present particular challenges for transparency, consent and purpose specifications.

Similarly, Dr Azizi explained that data collected for research is generally question specific. If, as is often the case, an individual’s consent only applies to a single study, then upon completion the data is effectively “dead” and can’t be reused. Sydney Law School’s Dr Belinda Reeve suggests that this issue could potentially be mitigated by allowing individuals to state what they don’t want their data to be used for, in addition to providing their ordinary consent. This may assist biomedical companies and researchers in determining whether they are likely to breach the privacy rights of individuals when planning to use data for novel applications or studies.

It is important when implementing data mining that patients know what is being done with their data, who has access to it and why. A theme common among the interview responses was that those within the legal profession, as well as the technical industry, have a role to play in educating the wider community about privacy protections and the applications of data mining. While some interviewees commented that data laws provide an important control in an environment where people don’t really know what’s being done with their data, this is far from ideal. Individuals cannot provide adequate consent without being fully informed.


B.    Medical Research

The Privacy Act 1988 (Cth) also concerns medical research. The Act grants the National Health and Medical Research Council (‘NHMRC’) the power to issue guidelines for research and granting ethics approval. These guidelines influence how researchers access and analyse data.[28]

One of the themes most commonly raised in the interviews was that the process of ethics approval should be improved. Dr Reeve, who deals extensively with information rights in healthcare, observed that while the APPs and the NHMRC's guidelines are an important community safeguard for research, the extent of their effectiveness is questionable. Ethics approval is a complex process which often leads to significant delays in research. Associate Professor Ramos commented that due to the considerable time spent waiting for medical ethics approval, it is simpler for data scientists to work on a project that does not require access to sensitive data. Furthermore, funding and project deadlines for researchers and students are somewhat incompatible with an approval process that takes between 6 months and 2 years. Discouraging researchers is hardly an ideal outcome. Uma Srinivasan, a research data scientist who has worked in the medical and health area for 25 years developing data mining applications and has reviewed the ethics approval processes, suggested that there is potential for a far more cohesive system. Srinivasan pointed out that as a consequence of ethics and privacy laws, medical data ‘sits in siloes’ in databases in the various jurisdictions, making it difficult to use.


IV    Beyond Australia: European and United States Law

International organisations which deal with medical data, such as large biomedical and pharmaceutical companies, are required to comply with the laws of all jurisdictions in which they are using personal data. As a result, a number of legal frameworks influence how they can harness medical data. Given the relevance of international jurisdictions to the implementation of data mining in medicine, the European and United States’ laws are reviewed against those of Australia.

In Europe, the processing and moving of personal data is currently governed by the Data Protection Directive.[29] Article 8 places health data in a special category, allowing it to be processed only in limited situations such as where explicit consent has been given or it is necessary to protect the vital interests of an individual. In addition, Convention 108 of the Council of Europe prohibits the automatic processing of health data unless sufficient safeguards exist under domestic law.[30] Data protection in Europe is set to be harmonised on 25 May 2018 under the General Data Protection Regulation (‘GDPR’).[31] The effectiveness of the GDPR is yet to be seen, but it represents a significant focus on the importance of data security and privacy, with a more rigid approach than the Australian system through a marked increase in the maximum applicable fines. For sensitive categories, such as health data, consent will need to be unambiguous and express, except where the processing of data is necessary for ‘public health reasons’, such as protecting against serious cross-border threats to health.[32] In circumstances where it is not possible for the specific purpose for processing data to be fully defined at the time of collection (as alluded to by Bolinger above) the recitals to the GDPR suggest that data subjects should be able to consent only to specific types of research that they are happy for their data to be used in.[33]

In the United States, the Health Insurance Portability and Accountability Act 1996 (‘HIPAA’) governs the use and disclosure of health data through the ‘Privacy Rule’.[34] It takes a similar approach to the EU frameworks and APPs in most respects. Interestingly, HIPAA’s Privacy Rule does not treat de-identified health data as sensitive data.[35] This could be problematic. As mentioned previously, research has shown that ‘de-identified’ data is not truly anonymous and has the potential to be ‘re-identified’, linking it back to those individuals whose data is being used.[36]

Dr Azizi, who has worked in the UK as a data scientist, commented that Australia’s laws are complicated, and “trickier” than those in the UK. Privacy lawyer Stephen Bolinger noted that for cross-border transfers of personal data, there is a remarkable difference in the approaches taken by different jurisdictions: the EU is very stringent; Australia recognises it as a concern but leaves the onus for protection on organisations; and the US fails to address it entirely. Further, where the EU and Australia provide individuals with clear rights to gain access to their own data,[37] the US fails to do so. Despite these shortcomings, the US is considered to have very active regulators, and the issuing of fines for privacy violations is not uncommon.


V    Conclusion

Australia’s privacy legislation was first enacted when machine learning and data mining methods were still in their infancy. Since 2014, when the current APPs came into force,[38] significant advancements have been made in data mining. The legal framework deals insufficiently with these developments.

Throughout the course of conducting interviews with professionals within medicine, law and data science, a clear consensus emerged that there is a need to unify the ethics approval process, review the efficacy of de-identification as a privacy measure and manage consent where new and worthwhile applications arise.

The law must be able to deal with the challenges presented by data mining and prevent scenarios like that of Google’s DeepMind in 2015 from recurring. Data protection laws should be flexible enough to withstand a landscape of rapidly transforming technology. The law must allow these technologies to flourish while protecting privacy and maintaining an individual’s ability to decide how their data is dealt with. While there is significant potential for data mining to contribute to the field of medicine, a delicate balance between privacy and technology is crucial to, and at the core of, a strong and effective legal framework.


[1] Arjun Kharpal, ‘Google DeepMind patient data deal with UK health service illegal, Watchdog says’, CNBC (Online), 3 July 2017 <>.

[2] Hal Hodson, ‘Revealed: Google AI has access to huge haul of NHS patient data’, New Scientist (Online), 29 April 2016.


[3] Letter from Elizabeth Denham, Information Commissioner, to Sir David Sloman, Chief Executive, Royal Free NHS Trust, 3 July 2017 <>.

[4] Igor Kononenko, ‘Machine learning for medical diagnosis: history, state of the art and perspective’ (2001) 23(1) Artificial Intelligence in Medicine 89.

[5] See also Concordia, The Power of Big Data and Psychographics (27 September 2016) YouTube <>.

[6] Glenn Fung and Jonathan Stoeckel, ‘SVM feature selection for classification of SPECT images of Alzheimer's disease using spatial information’, (2007) 11(2) Knowledge and Information Systems 243; Javier Ramírez et al, ‘Computer-aided diagnosis of Alzheimer’s type dementia combining support vector machines and discriminant set of features’ (2013) 237 Information Sciences 59.

[7] Isabelle Guyon et al, ‘Gene selection for cancer classification using support vector machines’, (2002) 46(1) Machine learning 389.

[8] Adhish Prasoon et al, ‘Deep Feature Learning for Knee Cartilage Segmentation Using a Triplanar Convolutional Neural Network’ (Paper presented at International conference on medical image computing and computer-assisted intervention, Berlin, Germany, 2013) 246.

[9] Filippo Amato et al, ‘Artificial neural networks in medical diagnosis’, (2013) 11 Journal of Applied Biomedicine 47.

[10] Michal Kosinski and Yilun Wang (in press), ‘Deep Neural Networks Are More Accurate Than Humans at Detecting Sexual Orientation From Facial Images’, (2017) Journal of Personality and Social Psychology.

[11] Arvind Narayanan and Vitaly Shmatikov, ‘Myths and fallacies of personally identifiable information’ (2010) 53(6) Communications of the ACM 6.

[12] Privacy Act 1988 (Cth) s 2A.

[13] Ibid pts IV, VII. See also Australian Information Commissioner Act 2010 (Cth).

[14] Privacy Act 1988 (Cth) sch 1.

[15] Ibid ss 6, 6C.

[16] Privacy and Personal Information Protection Act 1998 (NSW); Health Records and Information Privacy Act 2002 (NSW); Information Privacy Act 2014 (ACT); Health Records (Privacy and Access) Act 1997 (ACT); Information Privacy Act 2000 (Vic); Health Records Act 2001 (Vic); Privacy and Data Protection Act 2014 (Vic); Queensland Health Quality and Complaints Commission Act 1992 (Qld); Health Services Act 1991 (Qld); Information Privacy Act 2009 (Qld); Personal Information Protection Act 2004 (Tas); Information Act 2002 (NT); Freedom of Information Act 1992 (WA); Health Care Act 2008 (SA).

[17] Privacy Act 1988 (Cth) s 6FA.

[18] Ibid sch 1 cl 1.

[19] Ibid sch 1 cl 11.

[20] Ibid sch 1 cl 3.3(a).

[21] Ibid sch 1 cl 3.4(a).

[22] Ibid s 16B(1).

[23] Ibid s 16B(1).

[24] Ibid sch 1 cl 6.2.

[25] Hunter v Mann [1974] QB 767, 772F; Mid-City Skin Cancer & Laser Centre v Zahedi-Anarak [2006] NSWSC 844 [137].

[26] National Health Act 1953 (Cth) s 135A.

[27] See, eg, Megan Molteni, ‘23andMe Is Digging Through Your Data for a Parkinson’s Cure’, Wired (Online), 13 September 2017 <>.

[28] Privacy Act 1988 (Cth) s 95.

[29] European Parliament and Council Directive 95/46/EC of 24 October 1995 on The Protection of Individuals with Regard to the Processing of Personal Data and on The Free Movement of Such Data [1995] OJ L 281/31.

[30] Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data, opened for signature 28 January 1981, ETS No 108 (entered into force 1 October 1985), art 6.

[31] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC [2016] OJ L 119/1.

[32] Ibid art 9.

[33] Ibid recital 33.

[34] 45 CFR §§ 160, 164 sub-pts A, E.

[35] 45 CFR § 164.502(d)(2).

[36] Arvind Narayanan and Vitaly Shmatikov, above n 11.

[37] Privacy Act 1988 (Cth) sch 1 cl 12.

[38] Privacy Amendment (Enhancing Privacy Protection) Act 2012 (Cth).