Future smart services for homes, factories, cities, and governments rely on the sharing of large volumes of often personal data between individuals and organisations, or between individuals and governments.
The benefit is the ability to create locally optimised, or even individually personalised, services based on personal preferences as well as an understanding of the wider network of users and providers.
Data sharing comes with a wide range of challenges broadly categorised as: data format and meaning; legal obligations; privacy; data security; and concerns about unintended consequences of data sharing. This creates the need to develop sharing frameworks which address technical challenges, embed regulatory frameworks, and address personal or cultural concerns.
The ACS Data Sharing Committee was created one year ago to address the overarching challenge of developing privacy-preserving frameworks which support automated data sharing to facilitate smart services creation and deployment. These frameworks seek to address technical challenges and regulatory limitations, and to limit the unintended consequences of data sharing.
Rather than reinvent existing work, the approach is to identify best practice where it is known to exist; consider existing models in an Australian privacy context; or identify “whitespace” opportunities to develop new frameworks for Australia.
The objectives of the Committee are:
• Developing frameworks which characterise the challenges associated with data sharing and which describe data sets based on the degree of personal information they contain (nominally referred to as a Personal Information Factor);
• Developing frameworks which characterise “smart service” types based on the data sets used to create them and the associated Personal Information Factor;
• Developing trust (or risk) frameworks which allow data to be shared, joined, and used in operational environments whilst preserving individual privacy;
• Identifying technology driven approaches to address the challenges associated with personal information.
By creating these conceptual frameworks and exploring emerging privacy preserving technologies, the Committee hopes to describe data sharing challenges in a systematic way and so help improve the clarity around State and Commonwealth Privacy Acts to ensure all participants in service creation, deployment and use understand their responsibilities, obligations, and limitations.
What is at stake
Underpinning the transformation to a smarter, truly digital economy is the ability to share data beyond the boundaries of an organisation, company, or government agency. The ability to share data is highly dependent on whether personal information is present in the data sets involved.
A fundamental challenge to answering this question is that there is no way to unambiguously determine if personal information is present in linked data. Even if an unambiguous test were possible for a given data set, the practical reality is that data sharing does not occur in a vacuum. In almost any imaginable environment, aggregated data can be linked with data from other sources and so decomposed to a more personal level. The ability to increase the Personal Information Factor is limited only by the determination and ability to link extraneous data to the set which has been shared.
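The linkage risk described above can be made concrete with a minimal sketch. The data below is entirely hypothetical: a “de-identified” health extract is joined with a public data set on two quasi-identifiers (postcode and birth year), which is enough to re-attach names to diagnoses.

```python
# Illustrative sketch with hypothetical data: linking a "de-identified"
# data set with an external source can re-identify individuals.
shared = [  # de-identified records: (postcode, birth_year, diagnosis)
    ("2000", 1985, "diabetes"),
    ("2010", 1990, "asthma"),
]
external = [  # publicly available data: (name, postcode, birth_year)
    ("Alice", "2000", 1985),
    ("Bob", "2010", 1990),
]

# Join the two sets on the quasi-identifiers (postcode, birth_year).
reidentified = [
    (name, diagnosis)
    for (name, ext_pc, ext_by) in external
    for (pc, by, diagnosis) in shared
    if (ext_pc, ext_by) == (pc, by)
]
print(reidentified)  # [('Alice', 'diabetes'), ('Bob', 'asthma')]
```

Neither data set contains personal health information on its own; the join creates it, which is why the Personal Information Factor of a release cannot be assessed in isolation.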
Aggregation of data has been shown to be a very weak form of protection. The implications of this blunt instrument can nonetheless be profound, since use cases come in and out of scope depending on the level of aggregation used.
The technologies examined by this Committee – determining minimum cohort size, differential privacy, homomorphic encryption and privacy preserving linkage – all address concerns associated with re-identification of individuals from linked data sets. The space is moving rapidly and has the potential to alleviate privacy and data security concerns in areas as diverse as health care and smart cities, without disclosing our personal data.
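Of the technologies listed, differential privacy is the most amenable to a short illustration. The sketch below (not a production implementation) releases a count under the standard Laplace mechanism; the function name and parameters are illustrative.

```python
import math
import random

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy using the
    Laplace mechanism. A counting query has sensitivity 1, so the noise
    scale is 1/epsilon: smaller epsilon means stronger privacy
    guarantees and a noisier answer."""
    u = random.random() - 0.5          # uniform in [-0.5, 0.5)
    if u == -0.5:                      # avoid log(0) in the rare edge case
        u = 0.0
    sign = 1.0 if u >= 0 else -1.0
    # Inverse-CDF sampling from a zero-mean Laplace distribution.
    noise = -(1.0 / epsilon) * sign * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Each call returns a different noisy answer near the true value of 1000.
print(dp_count(1000, epsilon=0.5))
```

The key property is that any single individual's presence or absence changes the published answer's distribution only slightly, which bounds what an attacker can learn even when the output is linked with other data.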
The power of computational data analytics and the ability of new techniques to address expressed concerns about privacy actually surfaces a newer and bigger ethical concern. These privacy-preserving computational techniques enable applications that were not possible when privacy legislation was framed, before the concept of privacy had been considered in a joined-up digital economy. The unease that some privacy advocates feel about new personalised services is not readily addressed by discussions of minimum cohort size or homomorphic encryption. These concerns are best described by the question: Just because we can, should we?
The challenge to address head on is to identify the sources of this unease at their most fundamental level, develop practical frameworks which allow the creation of value while preserving our privacy, and then adapt these frameworks for jurisdictions in Australia.
The higher order challenge is to reframe the national conversation on data sharing to be around the service created from data and the rights and obligations of people creating, delivering and using these services. The prize is the opportunity to create benefit for Australian Industry, increased efficiency of government, greater decision making transparency for the citizens of Australia, while still protecting the rights and the sensitive, personal information associated with each of us as individuals.
Recommendations from the Committee so far
Recommendation 1: Regulatory clarification
Regulatory complexity is one of the major challenges associated with greater sharing of data. It is far too easy to read “not allowed” into existing regulations at one or more levels and so effectively prevent opening up of data. Clarifying regulations associated with the release and use of data will help encourage industry and different government agencies to open up and share data.
Clarification of existing legal frameworks needs to include quantified descriptions of acceptable levels of risk in ways which are meaningful for modern data analytics.
Recommendation 2: Research on Data Sharing – A framework should be developed which supports anonymisation of data and, in turn, facilitates sharing.
The areas which have the greatest potential to drive productivity in Australia are also the areas which require access to the most sensitive and personal data sets – health, superannuation, human services, and education. A focussed effort on mechanisms which allow data to be anonymised and shared with industry and the research community will open up many of the biggest challenges facing Australia to academic scrutiny and industry-led innovation.
The technologies explored by the Committee – determining minimum cohort size, differential privacy, homomorphic encryption and privacy preserving linkage – all address concerns associated with re-identification of individuals from linked data sets, and yet all are at relatively early stages of development. Maturing these technologies by encouraging pilot projects and safe trials would benefit all jurisdictions.
Recommendation 3: A test for Personally Identifiable Data – Develop a nationally accepted test for the existence of Personally Identifiable Data.
Collating data from millions of sensors operating at billions of cycles per second is fundamentally incompatible with relying on human judgement to determine the existence of personally identifiable information. Creating a nationally accepted test will greatly increase the scope for smart services whilst still leaving room for judgement in risky situations.
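To illustrate what an automated first-pass test might look like, the sketch below scans free text for two common Australian identifier patterns. This is purely illustrative: the pattern names and regular expressions are assumptions, and a genuine national test would need far richer rules, structured-data checks, and an assessment of linkage risk, not just pattern matching.

```python
import re

# Hypothetical patterns for two common identifiers; a real test would
# cover many more (names, addresses, Medicare numbers, device IDs, ...).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone_au": re.compile(r"\b(?:\+61|0)[23478]\d{8}\b"),
}

def contains_pii(text: str) -> bool:
    """Automated first-pass test for personally identifiable data."""
    return any(p.search(text) for p in PII_PATTERNS.values())

print(contains_pii("Contact jane@example.com for access"))  # True
print(contains_pii("Aggregate count: 1523"))                # False
```

Even a crude machine-applied test of this kind can triage sensor-scale data volumes, reserving human judgement for the borderline cases the Recommendation describes.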
Recommendation 4: Agreed standards for minimum cohort size based on data type
By its very nature, the claim that a cohort size of one is always the same as identification of an individual is unprovable. Given the increasing variety of data available and accelerating analytical capability, it is nonetheless tempting to treat the two as the same. In order to protect individual privacy and to acknowledge concerns about “likely” or “reasonable” re-identification, minimum cohort sizes should be agreed and communicated for different levels of data value. This would help data joining and minimise challenges around the use of widely varying levels of aggregation.
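A minimum-cohort-size standard is straightforward to check mechanically. The sketch below (hypothetical records and field names) groups records by their quasi-identifiers and reports the smallest group, which a release rule could then compare against an agreed threshold k.

```python
from collections import Counter

def min_cohort_size(records, quasi_identifiers):
    """Smallest group size when records are grouped by the given
    quasi-identifier fields. A release rule might require this to be
    at least an agreed k before the data can be shared."""
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(groups.values())

records = [  # hypothetical records: postcode and age band
    {"postcode": "2000", "age_band": "30-39"},
    {"postcode": "2000", "age_band": "30-39"},
    {"postcode": "2010", "age_band": "40-49"},
]
k = min_cohort_size(records, ["postcode", "age_band"])
print(k)  # 1 -- the lone ("2010", "40-49") record fails any k >= 2 standard
```

The open policy question the Recommendation raises is what k should be for each data type, not how to compute it.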
Recommendation 5: Agreed standards for Obfuscation / Perturbation
As a complement to Recommendation 4, standards should be agreed for obfuscation and perturbation. These can not only help provide confidence that data has been robustly de-identified; they can also assist with the creation of minimum cohort sizes.
Dr Ian Oppermann is the NSW Government’s Chief Data Scientist, and CEO of the NSW Data Analytics Centre. He is the Chair of the ACS Data Sharing Committee.