Building Trust in (and with) Big Data


For Data Privacy Day, the Ontario Information and Privacy Commissioner’s office held an event focused on Government and Big Data (recording available here).

Commissioner Brian Beamish’s opening remarks neatly encapsulate the struggle that is implicit in big data analysis (and indeed, in the notion of implied consent itself).

“I think the public wants the government – expects the government – to deliver services as effectively as possible,” he says. “That said, I think if the privacy risks aren’t recognized and addressed – if the public gets a sense that their privacy is not being respected – there is a definite possibility, or likelihood that public support for these activities will suffer.”

As an example, he spoke of the provincial provision of child and family services, in which multiple agencies and organizations may be involved in one file. Sharing information between parties may facilitate quick and effective service, and may allow for outcome measurement and overall system improvement, but it also depends on secondary uses and implied consent. This may result in clients of the system feeling surveilled and intruded upon – triggering the “creepy factor”.

Later in the event, during the panel discussion, panelists spoke about the importance of trust between citizen and government, and the risk that big data analysis can corrupt or diminish that trust.

The key question that emerged was this: how can we capitalize on the possibilities offered by big data while ensuring both that we respect individual privacy and that individuals know themselves to be protected, so that their trust relationship with government remains intact (or is even improved)?

I’m not sure this can reasonably be accomplished within current legislative parameters.  Big data will require law and policy developed with big data in mind. 

There needs to be a way to ensure that the information used is accurate and trustworthy. Information that is collected from secondary sources, drawn from publicly available data banks, automatically generated, or created through data mining runs the risk of being inaccurate or incomplete. Analytics may be hampered by a lack of information and a lack of context. In addition, both the source data and the resulting conclusions may reflect a range of problems. For example, the data may disproportionately represent specific populations while excluding others, or may have been poorly collected. Conclusions may carry the implicit societal biases of their time. Results may also be misinterpreted through pseudo-scientific reasoning, such as confusing correlation with causation.

Beamish suggested that an effective approach would necessarily include principle-based legislation governing both data linking and big data analytics.  He posited a combination of factors, including the creation of a central data institute with expertise in privacy, human rights, and data ethics; data minimization requirements; privacy impact and threat risk assessments; mandatory breach notification; and appropriate governance and audit oversight. 

As big data becomes more than just a buzzword, and analytics are increasingly integrated into both service delivery and system design and assessment, the need to address this challenge is becoming more urgent.

The Commissioner’s remarks and the subsequent panel discussions were part of an increasingly important conversation, one in which we all must engage. But engagement alone is not enough – it is time to explore and develop concrete policies and procedures. It’s time to set parameters and controls on big data, with enhancing the trust relationship between citizens and government as the guiding principle every step of the way.