A novel approach to big data analysis has allowed a group of researchers from the Center for Complexity and Biosystems (CC&B) of the University of Milan to identify a kind of genetic signature shared by obesity, breast cancer and diabetes.
Obesity is increasing worldwide, with striking data showing that about 10% of children in the USA and Europe are overweight or obese. From a medical point of view, obesity is overtaking smoking as the leading cause of premature death. Obesity contributes to more than 70% of diabetes cases, and it has been associated with some types of tumours, such as breast cancer.
The link between obesity, diabetes and breast cancer is based on clinical and epidemiological evidence, but strong confirmation from gene expression data is still lacking. This is mainly due to the high variability between patients and the limits of in vitro models, but also to the massive amount of noise that is unavoidable in any available data set, which makes it difficult to extract a clear signature from a large set of genes.
“The huge number of experiments in the biomedical field has made it possible to establish public databases that gather a large quantity of biological data”, says Caterina La Porta – member of the CC&B and professor of General Pathology at the Department of Environmental Sciences and Policy of the University of Milan – who coordinated the research, just published in NPJ Systems Biology and Applications. “Merging data sets from different studies would be extremely useful to extract relevant information, but it is difficult because of what we call batch effects”, explains La Porta. “Each experiment introduces a bias in the data that is due to technical processing and is unrelated to biological factors. This noise can mask any biological differences when comparing samples coming from distinct batches, and this is a problem”.
The researchers coordinated by La Porta tried to alleviate this problem with a new approach, based on the combination of two techniques: singular value decomposition and pathway deregulation analysis. In this way, they managed to identify 38 genes that are differentially expressed in adipocytes from obese and lean subjects. Furthermore, this signature appears to be specific to the biological condition of obesity and is not linked to the gender of the subjects.
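To give a rough idea of why singular value decomposition helps with noisy expression data, here is a minimal, generic sketch (not the authors' pipeline, which also includes a pathway deregulation step not shown here). It assumes a genes × samples matrix in which the biological signal is concentrated in a few dominant components while the rest is mostly noise; the function name `svd_denoise`, the chosen rank and the synthetic "obese vs lean" data are all hypothetical.

```python
import numpy as np

def svd_denoise(expr, rank):
    """Hypothetical helper: keep only the `rank` strongest SVD components
    of a genes x samples matrix (assumes the signal lives in them)."""
    # Center each gene (row) so the SVD captures variation around its mean
    means = expr.mean(axis=1, keepdims=True)
    centered = expr - means
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Zero out every singular value beyond the leading `rank` ones
    s_trunc = np.zeros_like(s)
    s_trunc[:rank] = s[:rank]
    # Rebuild the low-rank matrix and restore the gene means
    return u @ np.diag(s_trunc) @ vt + means

# Synthetic example: 200 genes, 20 "obese" and 20 "lean" samples
rng = np.random.default_rng(0)
gene_effects = 3.0 * rng.normal(size=200)       # how strongly each gene responds
condition = np.repeat([1.0, -1.0], 20)          # +1 obese, -1 lean (hypothetical)
signal = np.outer(gene_effects, condition)      # rank-1 biological signal
noisy = signal + rng.normal(scale=1.0, size=signal.shape)  # add measurement noise

denoised = svd_denoise(noisy, rank=1)
print(np.linalg.norm(noisy - signal), np.linalg.norm(denoised - signal))
```

In this toy case the truncated reconstruction lies much closer to the true signal than the raw noisy matrix does. Real data sets are far messier, and choosing how many components to keep (or which ones reflect batches rather than biology) is exactly the kind of filtering decision the study had to make.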
These genes are mainly linked to inflammation and immunity and to well-known complications of obesity such as type 2 diabetes or fertility problems. Moreover, by comparing data from breast cancer tissue with healthy breast tissue, the researchers found them to be similarly deregulated in breast cancer and obesity, confirming the strong association between the two. Some of them might thus represent interesting biomarkers for further studies on these topics, or even for prognostic purposes.
“The strength of our work comes from the use of appropriate filtering and noise reduction methods that make it possible to mitigate batch effects. This general strategy can be naturally extended to other pathological conditions, providing a clear avenue to analyse the massive amount of data accumulating in the biomedical literature”, concludes La Porta. “In this case, our approach allowed us to detect a list of genes characteristic of obesity, which are also associated with type 2 diabetes and breast cancer, with a degree of precision similar to that used to identify the Higgs boson”.