The amount of data collected each day keeps growing, and this data is shared across many environments, sectors, and industries. In most scenarios, collecting data is not the hard part; managing the large volumes that are collected is an ongoing struggle. That struggle has brought the ethical use of data into greater prominence. It has also led to policies that provide checks and balances on how data is shared, with the aim of protecting the people to whom the information belongs. In addition, the people and organizations responsible for gathering, distributing, and using data continue to examine the ethics of their practices and, in many cases, must defend them in the face of public criticism. When data is managed unethically, human lives can be harmed, and trust in projects, deliverables, products, and relationships between organizations can be lost.
Data Ethics refers to systematizing, defending, and recommending concepts of right and wrong conduct in relation to data, especially personal data. It is becoming more relevant because of the growth in how much data is generated and shared, which is why ethical considerations need to be put in place. Data Ethics differs from Information Ethics: the former is concerned with those who collect and disseminate structured or unstructured data, such as data brokers, governments, and large corporations, while the latter focuses on issues related to intellectual property and on concerns of librarians, archivists, and information professionals.
Data Ethics is commonly organized around six major principles:
- Ownership – This concerns who owns the data; individuals are regarded as the owners of their personal data, even when others collect it.
- Transaction Transparency – If individuals' personal data is used, they should have transparent access to information about the product the data is being used for.
- Consent – Any individual whose data will be collected for further use must be fully informed and must explicitly agree to it.
- Privacy – Whenever data is exchanged, protection and security measures must be in place to protect individuals who could be harmed by its exposure.
- Currency – Individuals should be informed of any financial transactions resulting from the use of their personal data and of the scale of those transactions.
- Openness – Aggregate data sets should be freely available.
Risks Associated With the Data Situation
- Business Considerations Related to Data Sharing
This is a very important risk to consider in any data situation. Before data is shared with people outside an organization, it must be checked and validated to ensure that sharing it will not rebound negatively on the organization. Here, exposing the data to researchers raises concerns such as liability to government agencies, business costs, and the loss of confidential business information. Once the information has been gathered by the various government agencies and local authorities, then collated and transformed, in-depth analysis by researchers could reveal loopholes in the dataset, if any exist. For example, analysis might show that certain agencies tasked with providing basic amenities to families in the program failed to do so, which could have an adverse effect on those agencies. Likewise, access to the data could allow researchers to detect cases in which the data-gathering process was poorly conducted, which could jeopardize the conclusions of the report. Taken together, such findings could damage the image and reputation of the government organizations involved.
- Concerns About Adversarial Science
The term ‘adversarial science’ refers to science that supports conflicting, one-sided positions held by individuals, groups, or entire societies as inputs into conflict resolution, typically with rewards for prevailing in the outcome. Because the dataset may be exposed to a large number of researchers, their intentions could vary widely: some might set out to undermine the work and effort that went into collating the dataset, or to discredit the data and its source, which could in turn raise integrity issues for the government agencies.
- Reidentification
Reidentification is a major risk in any data situation. It refers to the process by which participants who were promised anonymity are identified again through deeper analysis of the data. Once the government makes the dataset available to researchers, there is no limit to how far they can explore it, and this could include matching it with other public or previously obtained datasets, which could uncover hidden information about specific people in the program. In most cases researchers are not to blame: a great deal of information is already publicly available, so anyone who knows where to look can find it. Moreover, the risk of reidentification is nearly impossible to eliminate, because researchers need a thorough understanding of the data to apply whatever algorithms are needed to produce the expected results. This makes releasing the dataset daunting, as it could create an integrity issue for the government: first, it exposes program participants to certain risks or attacks, and second, it makes it harder to attract participants in the future.
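To make the matching risk concrete, the following is a minimal sketch of a linkage attack in Python. All records, the field names (zip_code, birth_year, sex, benefit_status), and the function link_records are hypothetical and chosen only for illustration. The sketch assumes the released dataset has had names removed but still contains quasi-identifiers, that is, attributes that are not names but can single a person out in combination, and it joins them against a public dataset that does contain names.

```python
from typing import Dict, List, Tuple

# Hypothetical quasi-identifiers: not names, but identifying in combination.
QUASI_IDENTIFIERS = ("zip_code", "birth_year", "sex")


def key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Build the join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_records(released: List[Dict[str, str]],
                 public: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (name, benefit_status) pairs for released records whose
    quasi-identifiers match exactly one named person in the public data."""
    index: Dict[Tuple[str, ...], List[str]] = {}
    for person in public:
        index.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in released:
        candidates = index.get(key(record), [])
        if len(candidates) == 1:  # a unique match reidentifies the participant
            matches.append((candidates[0], record["benefit_status"]))
    return matches


# Hypothetical "anonymized" program data: names removed, quasi-identifiers kept.
released = [
    {"zip_code": "10001", "birth_year": "1984", "sex": "F", "benefit_status": "receiving"},
    {"zip_code": "10002", "birth_year": "1990", "sex": "M", "benefit_status": "denied"},
]

# Hypothetical public dataset (for example, a published register) with names.
public = [
    {"name": "A. Smith", "zip_code": "10001", "birth_year": "1984", "sex": "F"},
    {"name": "B. Jones", "zip_code": "10002", "birth_year": "1990", "sex": "M"},
    {"name": "C. Brown", "zip_code": "10002", "birth_year": "1990", "sex": "M"},
]

if __name__ == "__main__":
    print(link_records(released, public))  # [('A. Smith', 'receiving')]
```

Only the first participant is reidentified, because their combination of attributes is unique in the public data; the second record matches two people and stays ambiguous. This is why de-identification tries to ensure that no record is unique on its quasi-identifiers.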
- Fears Regarding Misuse of Shared Data
Another risk to be conscious of is how the data will be used once it becomes accessible. Data may be misused or used for a purpose different from the one for which it was released. Researchers often reuse data from past projects for secondary purposes, and this can have adverse effects. Likewise, occasional use of the data for private or personal study could lead researchers to compromise the security of program participants. Concerns about secondary use of the dataset could also lead to a series of legal actions against the government.
- Third-party Breaches
There is a risk of third-party breaches, in which data made accessible to researchers ends up, one way or another, in the hands of individuals outside the government agencies and the research team. As the data becomes available to more researchers, it becomes more likely that individuals without clearance, who should never have had access in the first place, will obtain it through those researchers. In addition, if datasets are released through insecure channels such as email, they are exposed to attackers who routinely target email accounts. The possibility of such a leak is therefore a serious concern.
Mitigating These Risks
- De-identification
De-identification is the process of removing any information that could allow researchers to immediately identify a participant; it involves masking or redacting the parts of the dataset that could lead to a person's identification. One way to mitigate the risk of participants being recognized is to set up a committee that reviews the collected data for mishaps. This committee serves as second-level support, and its primary job is to verify that participants were successfully de-identified. The information should be thoroughly checked to remove all identifying or otherwise sensitive information, so that the dataset contains enough information to complete the task at hand while remaining difficult to merge with other data sources to produce meaningful information about individuals.
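As one illustration of these steps, the sketch below removes direct identifiers, generalizes quasi-identifiers, and then applies a k-anonymity check, which requires every combination of quasi-identifiers to appear at least k times. k-anonymity is a named technique swapped in here as one possible test a review committee could apply; it is not prescribed by this article. The field names, the generalization rules (truncating the ZIP code, rounding the birth year to a decade), the records, and the choice of k = 2 are all hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical field classification for this illustration.
DIRECT_IDENTIFIERS = {"name", "phone"}          # removed outright
QUASI_IDENTIFIERS = ("zip_code", "birth_year")  # generalized, not removed


def generalize(record: Dict[str, str]) -> Dict[str, str]:
    """Drop direct identifiers and coarsen the quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["zip_code"] = record["zip_code"][:3] + "**"      # keep only the area prefix
    out["birth_year"] = record["birth_year"][:3] + "0s"  # e.g. "1984" -> "1980s"
    return out


def is_k_anonymous(records: List[Dict[str, str]], k: int) -> bool:
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(r[f] for f in QUASI_IDENTIFIERS) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical raw program records.
raw = [
    {"name": "A. Smith", "phone": "555-0101", "zip_code": "10001",
     "birth_year": "1984", "benefit_status": "receiving"},
    {"name": "B. Jones", "phone": "555-0102", "zip_code": "10002",
     "birth_year": "1987", "benefit_status": "denied"},
    {"name": "C. Brown", "phone": "555-0103", "zip_code": "10003",
     "birth_year": "1991", "benefit_status": "receiving"},
    {"name": "D. Green", "phone": "555-0104", "zip_code": "10004",
     "birth_year": "1995", "benefit_status": "receiving"},
]

released = [generalize(r) for r in raw]

if __name__ == "__main__":
    # The raw data fails the check (every record is unique); the generalized data passes.
    print(is_k_anonymous(raw, 2), is_k_anonymous(released, 2))  # False True
```

The trade-off mirrors the text: coarser generalization makes records harder to link with other sources but also leaves less detail for the task at hand, so the committee would need to pick rules that keep the dataset useful while still passing the check.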
- Building Trust
Regulations provide a minimum standard of behavior, but researchers need to do more than regulations mandate. Data-sharing policies can provide scaffolding, but the research community needs to set standards of excellence and strive to meet them.
- Enhanced IT Security
All communication and data exchange between the government and researchers should take place over encrypted channels, which reduces the risk of leaks and interception. The government should provide adequate funding for IT security. Where IT security measures already exist, they should be strengthened continually throughout the project, for example by keeping firewalls up to date.