We need to urgently review our data governance frameworks
By Arturo Muente-Kunigami, Inter-American Development Bank
How have governments been dealing with the exponential increase in data worldwide? Pew Research estimates that over 5 billion people have mobile phones, and more than half of those are smartphones. Facebook has more than one and a half billion daily users, while YouTube has two billion active users per month. Online retail, which is already used by a quarter of the population and generated a sales volume of USD 4 trillion just last year, is producing loads of data as we speak. Add to that the humongous amount of unstructured data stemming from the so-called Internet of Things, from smartwatches to TVs to heart implants to street cameras and sensors. Even the public sector has a share in this big data wave, digitalizing paper-based processes and registries as part of growing efforts to introduce more effective and efficient digital public services.
We are still trying to find our way around the Internet’s long memory. Almost everything we do online leaves behind a “data footprint”. Every swipe, click, like, purchase, search, stream… is captured and stored somewhere. The big data wave rose so fast that many governments took a reactive approach, scrambling together different regulations and safeguards that more often than not acted like temporary band-aids serving specific concerns at specific points in time. With just a few exceptions, most governments, even the most digitalized ones, have yet to develop policies that approach data in a holistic way.
For that to happen though, they need to take a step back and see the forest for the trees. In other words, countries need to start thinking about a holistic approach to data that brings together all data governance efforts under one single roof.
Wrong Tool for the Job: Vertical Governance for Horizontal Threats
Traditionally, data has been handled in sector-specific, “vertical silos”. Data from the transport sector is seen as different from energy-related data, which is handled separately from the data that stems from the education field. Each vertical silo has its own governance and its own data policies.
Worldwide data creation is accelerating exponentially. The United Nations’ report on the Data Revolution (2014) highlights the increase in traditional and new sources of data (surveys and mobile connections were used as proxies). And most of us have probably heard that 90% of existing data has been created in the last couple of years (the UK National Data Strategy team analyzed this statistic and concluded that even though it follows a well-documented trend, it is unverifiable). This growth has brought potential risks along with it, though. The alarming increase in data breaches, as well as growing concerns about data management in the public and private sectors, have given us a first glimpse of what could come in a fully digital world. Most security breaches respected neither geographic borders nor industry-specific silos, hitting financial services, health care providers, government agencies and retailers alike.
In other words: we may have been regulating data in vertical silos, but the real threats have become increasingly horizontal.
The “Fragmented Geography” of the Data Stakeholders Land
Lately, different groups have raised specific “horizontal” concerns related to the impact that the digital data era has on our daily lives. These groups have clustered into different communities with specific incentives and motivations.
Take, for example, the Transparency and Access to Information camp, demanding that virtually all information should be made available to the public if requested (with few exceptions like national security and personal information). Close to their tent is the Open Data camp, requiring a proactive publication of all data (with similar exceptions) in machine-readable formats and with an open license, both to improve transparency and accountability and to promote innovation. We also have the Data Protection cluster, primarily concerned about the potential violation of rights and discrimination that could arise from unregulated collection and use of personal data. Next, a growing group of Technology Enthusiasts, mostly from the private sector, promotes the use of all data available to fuel new technologies (notably, artificial intelligence, big data analytics, and automation) under the promise of a more effective and efficient delivery of goods and services. This group includes a subset of more aggressive vendors buying unstructured data to target consumers with personalized proposals. Digital Government Authorities are promoting the concept of “data sharing” (interoperability) so that citizens don’t have to present information to one public agency if another agency already has it. And close by, Cybersecurity folks are prompting everyone (governments and private companies) to put in place security measures, highlighting the potential threats that arise from the mere existence of these huge pools of data and the harm that criminals could inflict on the economy should they get their hands on them. There are surely other groups that have not been listed here.
All those groups have valid points of view, and yet they hardly talk to each other (granted, the competition for the attention of both donors and policymakers does not make for a collaborative environment). In fact, they have all intentionally tried not to encroach on each other’s territory. So far so good, right? But things are starting to get a little more complicated.
The Hidden Dangers of Algorithmic Decision-Making
As machines become more and more sophisticated, algorithms are making their way into the decision-making process of increasingly sensitive issues. For instance, insurance companies rely on algorithmic formulas to estimate which health coverage a user is entitled to and at what premium. In the US, there have been some experiments to test if algorithms are suitable to determine whether to release a criminal suspect on bail. The burning question here is: are we considering the biases those algorithmic recommendations may have? The European General Data Protection Regulation, which has quickly become a global reference, tries to mitigate algorithm-related biases by requiring that any natural person “shall have the right not to be subject to a decision based solely on automated processing, including profiling” (GDPR, Article 22). This regulation, however, still leaves plenty of room for interpretation.
Additionally, think about the access to personal information some initiatives require. For example, to find the different relationships required to improve medical diagnosis, algorithms may need very detailed information from many people, which could potentially be traced back to individuals. Anonymizing data, however, can be tricky: a 2000 paper argues that, for the 1990 US census, 87% (216 million of 248 million) of the population in the United States could likely be identified based only on their 5-digit ZIP code, gender, and date of birth. GDPR tries to address this by defining personal data as “any information relating to an identified or identifiable natural person (data subject); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person” (GDPR, Article 4). Despite this, it is not hard to imagine many national artificial intelligence strategies supporting such innovative initiatives without further reflecting on whether the information they are sharing and using makes individuals identifiable.
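To make the re-identification risk concrete, here is a minimal sketch in Python of how one might check whether a "de-identified" dataset still singles people out. It is an illustration only, not the method used in the paper cited above: the records are made-up toy data, and the function names (`unique_fraction`, `smallest_group`) are hypothetical helpers introduced for this example.

```python
from collections import Counter

# Hypothetical toy records with names removed: (zip_code, gender, date_of_birth).
# These are invented values for illustration, not real data.
RECORDS = [
    ("20001", "F", "1980-03-14"),
    ("20001", "F", "1980-03-14"),
    ("20001", "M", "1975-11-02"),
    ("33101", "F", "1990-07-21"),
    ("33101", "M", "1962-01-30"),
]


def unique_fraction(records):
    """Share of records whose combination of values appears exactly once."""
    counts = Counter(records)
    unique = sum(1 for record in records if counts[record] == 1)
    return unique / len(records)


def smallest_group(records):
    """Size of the smallest group of records sharing the same values
    (the 'k' in what is often called k-anonymity)."""
    return min(Counter(records).values())


if __name__ == "__main__":
    print(f"Records that are unique: {unique_fraction(RECORDS):.0%}")
    print(f"Smallest group size (k): {smallest_group(RECORDS)}")
```

In this toy dataset, three of the five records are unique on ZIP code, gender and date of birth alone, so anyone holding a second dataset with names and those same three fields could, in principle, link them back to a person. A real assessment would need to consider many more attributes and the auxiliary data an attacker might hold.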
I recently had a conversation with a local government authority about the installation of security cameras across her city. Eventually, the technology would certainly enable them to identify individuals on the streets and scan license plates from any car. Should they do it? What can they do — if anything — to protect citizens’ privacy while using this technology to identify potential threats and/or investigate crimes?
The exponentially increasing availability, specificity and sheer volume of data around us are shattering the artificial boundaries that we have created around traditional data silos, as well as the sectoral walls that have traditionally shaped our understanding of policymaking. Transport stakeholders need access to energy grid data to evaluate potential locations for electric car charging stations, and local schools need to access health data to check on a student’s immunization records. Data creation and collection are cutting across sectors and borders, thus demanding a more coordinated approach both within and between countries.
Digital Rules Cannot Be Carved into Stone
How can countries get the most out of all the data being produced, promote innovation and effective and efficient delivery of goods and services while protecting basic human rights? We need rules of engagement that can adapt over time to a dynamic environment in constant “beta” mode. These new national strategies on data management should take into consideration at least the following elements: (i) governance, (ii) talent, and (iii) use of data.
Figure 1. Dimensions of a National Data Strategy
Data Governance Framework: Any National Data Strategy should include the rules of engagement for the collection, use and disposal of data, regardless of who produces it or whether it is a public or private institution. It should also establish the institutional arrangements that allow a better assignment of responsibilities along the data life cycle to guarantee quality and responsible use. Fostering the adoption of data standards and coordination among existing entities and agendas in and between countries is required too. Some countries have already appointed a “Chief Data Officer” in charge of drafting, proposing, and adapting data policies to this ever-changing environment.
Talent for the Data Ecosystem: Globally, the need for more data scientists has been stressed for a while. Any solid national data strategy should include programs from formal education to continuous learning (bootcamps, online courses, specific training courses, among others) aimed at addressing the need for data-related skills across the economy. Furthermore, data-related talent should be redefined to span beyond data scientists: data privacy experts, evidence-based policy makers, and technology experts, among others, should be mapped as data-related professions as well.
Widespread use of data: At the end of the day, the objective of any data strategy is to increase the use of data in an impactful and responsible way. Even though some data should, as a rule, not be processed at all (Article 9 of GDPR prohibits, with limited exceptions, the processing of personal data revealing racial or ethnic origin, religious beliefs and political opinions, among others), data use can indeed create economic opportunities and improve the quality of the design, implementation and evaluation of public policies. Programs that improve data usage within and between sectors, including the public sector, should be included.
Data Governance for a Data-driven World: Building the Plane while Flying it
Data has an undeniable value. All of us can benefit from the realization of this value. However, we also need to do it in a responsible, articulated and coordinated way. How can we address this “Whole of Data” approach, aimed at keeping the delicate balance between innovation and protection of rights? The United Kingdom has been one of the first countries to start the process for a National Data Strategy. The D9 Group, a group of the most digitalized countries in the world currently chaired by Uruguay, is working on its “Data 360º Initiative”, which seeks to develop a holistic approach to data management in the public administration.
Governments’ response to the growing data economy is certainly a road that is being built as we go. But one thing is clear: stakeholders around data, both inside and outside the government, need to better coordinate in an open and collaborative way. And for many countries, the development of a National Data Strategy can be the natural first step.