
A desert or a tsunami: fearing Big Data



What will the data environment be under Horizon 2020? Modern science faces a huge challenge in managing data. On one hand, every year a large share of scientific data is lost or becomes unreadable. On the other hand, an 80 billion euro investment in research (the expected budget for Horizon 2020) could produce an amount of data too large to manage easily, with a loss of efficiency and a huge waste of public money. Ultimately, one of the main goals of Horizon 2020 will be to come to terms with Big Data.

 

Data loss

Eighty percent of the data behind scientific articles are lost within 20 years. This is the main finding of a study published in December 2013 by researchers at the University of British Columbia (Canada). The results immediately sparked a wide debate, as they exposed the “dark side” of technological progress in data storage. Indeed, the study reported that:

“For papers where the authors gave the status of their data, the odds of a data set being extant fell by 17% per year. In addition, the odds that we could find a working e-mail address for the first, last, or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives.”
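A rough, back-of-the-envelope illustration of what the quoted figure implies, assuming the 17% per-year decline in the odds compounds at a constant rate over two decades:

\[
0.83^{20} \approx 0.024
\]

In other words, under that assumption, twenty years after publication the odds that a dataset is still available would be only about 2% of what they were at the start.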

The study suggested that this can also be seen as an economic problem: part of research funding is spent recovering data that have probably already been collected but are locked away somewhere. This is particularly true in Western countries, where storage technologies are developed and replaced at a fast pace. Who still uses floppy disks? Yet how many scientists started collecting their data on them?

The geopolitical aspect is not the only one. The Canadian researchers went further: ecology and medicine are among the fields most affected, because of their dependence on historical datasets. Ecology and medicine are also the cross-cutting issues Horizon 2020 deals with most. But, as noted above, Big Data does not only mean losing old data; it also means being overwhelmed by new data.

 

Data tsunami

Sophisticated measuring instruments produce large sets of results that must be stored, managed, and reused, and the scientific community has not yet settled on a standard procedure for the whole process. Moreover, analysts fear what is called a “data tsunami”: a small amount of data that is reused, reprocessed, or simply linked together generates a new, much larger amount of data. Repeated thousands of times over, this can become impossible to manage.

With an 80 billion euro budget, Horizon 2020 has good reason to fear a data tsunami or, at the very least, an enormous waste of money if it is unprepared to disseminate the data it collects (data loss). Authorities and institutions are therefore trying to set a new standard for the life cycle of scientific research and data management.

 

A living experiment

The European Union is aware of both challenges, the loss and the tsunami of data, and both affect the economics of H2020 in terms of efficiency and money. The H2020 approach is therefore multi-pronged. It involves research on new technologies (mainly digital infrastructure) through e-Infrastructures funding. It involves grant rules, since some calls will explicitly require a Data Management Plan. And it involves policy making, as the H2020 Open Access policy was debated for a long time, mainly over the balance between open access and the patenting of scientific discoveries. Will it work? It is impossible to predict. For now, though, Horizon 2020 can be seen as a unique experiment in which, perhaps for the first time, the Big Data issue is being taken seriously.
