A desert or a tsunami: fearing Big Data | Science in the net

A desert or a tsunami: fearing Big Data


What will the data environment look like in Horizon 2020? Modern science faces a huge challenge in managing data. On the one hand, every year a large share of scientific data is lost or becomes unreadable. On the other hand, the 80 billion euro investment in research (the expected budget for Horizon 2020) could produce an amount of data too large to manage easily, with a loss of efficiency and a huge waste of public money. Ultimately, one of the most important goals of Horizon 2020 will be to face Big Data.

 

Data loss

80% of data from scientific articles are lost within 20 years. This is the main finding of a study published in December 2013 by the University of British Columbia (Canada). These results immediately sparked a wide debate, as they exposed the “dark side” of technological progress in data storage. In fact, the study found that:

“For papers where the authors gave the status of their data, the odds of a data set being extant fell by 17% per year. In addition, the odds that we could find a working e-mail address for the first, last, or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives.”
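To get a feel for what a 17% yearly fall in the odds means over two decades, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names and the starting odds (9 to 1, i.e. a 90% chance that a data set is available at publication) are assumptions made here, not figures from the study, and the sketch assumes the decline compounds at a constant rate.

```python
def odds_after(initial_odds: float, years: int, yearly_decline: float = 0.17) -> float:
    """Odds that a data set is still extant after `years`,
    assuming the odds shrink by the same proportion every year."""
    return initial_odds * (1.0 - yearly_decline) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds (e.g. 9.0 for '9 to 1') into a probability between 0 and 1."""
    return odds / (1.0 + odds)


# Hypothetical starting point, NOT taken from the study:
# odds of 9 to 1 (a 90% chance) that the data exist at publication.
INITIAL_ODDS = 9.0

for years in (0, 5, 10, 20):
    odds = odds_after(INITIAL_ODDS, years)
    probability = odds_to_probability(odds)
    print(f"{years:>2} years: odds {odds:.2f}, probability {probability:.0%}")
```

Under this assumed starting point, the chance that a data set is still available after 20 years drops to a little under 20%. A different starting value would give a different figure, so the number should be read as an order of magnitude only.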

The study suggested that this could be seen as an economic problem, too. Part of research funding is spent on collecting data that have probably already been gathered but are locked away somewhere. This is particularly true for Western countries, where storage technologies are developed and replaced at a fast rate. Who is still using floppy disks? But how many scientists started collecting data on them?

The geopolitical aspect is not the only one. The Canadian institution that conducted the research went further: ecology and medicine are among the most affected fields, because they depend on historical datasets. Ecology and medicine are also the cross-cutting issues that Horizon 2020 deals with most. But, as noted above, Big Data does not only mean losing old data; it also means being overwhelmed by new data.

 

Data Tsunami

The development of sophisticated measuring instruments creates large sets of findings that have to be stored, managed and reused. So far, the scientific community has not found a standard procedure for the whole process. Moreover, analysts fear what is called a “data tsunami”. A data tsunami is generated when a small amount of data, once reused, reprocessed or simply linked together, creates a new, huge amount of data. Repeated thousands of times, this can create something impossible to manage.
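One way to picture how linking alone can inflate a small collection is to count the pairs. The Python sketch below is a toy model, not a description of any real repository: it assumes that every pair of data sets can be linked into one derived record, and the names `pairwise_links` and `dataset_0` ... `dataset_4` are invented for the example.

```python
from itertools import combinations


def pairwise_links(n_datasets: int) -> int:
    """Number of distinct pairs among n data sets: n * (n - 1) / 2."""
    return n_datasets * (n_datasets - 1) // 2


# Five hypothetical data sets, linked two by two.
datasets = [f"dataset_{i}" for i in range(5)]
links = list(combinations(datasets, 2))

print(f"{len(datasets)} data sets produce {len(links)} pairwise links")

# The count grows roughly with the square of the collection size.
for n in (10, 100, 1000):
    print(f"{n:>5} data sets -> {pairwise_links(n):>7} pairwise links")
```

In this toy model the number of derived records grows with the square of the number of originals, and that is before any reprocessing or further linking of the derived records themselves.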

With a budget of 80 billion euro, Horizon 2020 has good reason to fear a data tsunami or, at the least, an enormous waste of money if it is unprepared to disseminate the collected data (data loss). So authorities and institutions are attempting to set a new standard for the life cycle of scientific research and data management.

 

A living experiment

The European Union is aware of both challenges, data loss and data tsunami. Both affect the economy of H2020 in terms of efficiency and money. So the approach of H2020 is multi-pronged. It involves research on new technologies (mainly digital infrastructure) through e-Infrastructures research funding. It involves grant rules, as some calls will explicitly ask for a Data Management Plan. It involves policy making, as the H2020 Open Access policy has been discussed for a long time, mainly over the balance between open access and patenting scientific discoveries. Will it work? It is impossible to predict. For now, Horizon 2020 can be seen as a unique experiment in which, maybe for the first time, the Big Data issue is being faced seriously.



