Implementation and experiments for "Revealing heterogeneity of brain imaging phenotypes in Alzheimer's disease based on unsupervised clustering of blood protein profiles". Preprint at: bioRxiv.
For any issues with the code, contact me at gerard.marti(at)upf.edu.
This project is under the GNU GPLv3 license.
Repository of the code implementing the paper “Martí-Juan G, Sanroma G, Piella G et al. Revealing heterogeneity of brain imaging phenotypes in Alzheimer’s disease based on unsupervised clustering of blood marker profiles. PLoS ONE [In revision]. 2018 Jun 5. Available from: https://doi.org/10.1101/339614”
Alzheimer’s disease (AD) is a neurodegenerative pathology that degenerates the brain and causes cognitive deterioration and memory loss. It is one of the largest public health problems in the world, and while much effort has been invested in studying the disease, its causes, and its progression paths, much about it is still unknown.
One interesting area to explore is disease subtyping. Does the disease behave differently between patients? If so, why? Answering those two questions means:
This project tackles those two questions. We apply an unsupervised clustering technique [1] [2] to a space of blood markers, which are not typically used to detect the disease but are inexpensive and easy to obtain.
Detecting relevant subtypes of the disease could lead to a more personalized early treatment. Moreover, if the characteristics defining those subtypes are non-invasive markers, we could be closer to non-invasive testing and could gain more understanding of the hidden processes in the disease.
All code is under the GNU GPL license. The SIMLR/CIMLR code is forked from https://github.com/BatzoglouLabSU/SIMLR; its own license applies to that code.
Python 2.7+ is required. Jupyter is required to run the notebooks.
Packages:
The MATLAB Engine for Python is also required, along with a MATLAB installation, for the clustering simulation.
FreeSurfer and its fsPalm extension are also required for the cortical experiments.
The data used comes from the ADNI database and is available upon request. Under the ADNI data use agreement, the data cannot be redistributed; researchers must request access directly from ADNI. The file subjects.csv contains the list of patients used in the paper, for reproducibility.
The following files are needed for the experiment, all available on the ADNI website:
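As an illustration of how a subject list can restrict a downloaded table to the same patients, here is a minimal sketch using only the Python 3 standard library. The column name `RID`, the marker column, and the sample values are assumptions made for the example; the actual layout of subjects.csv and of the ADNI tables may differ.

```python
import csv
import io


def load_subject_ids(handle, id_column="RID"):
    """Return the set of subject IDs found in one column of a CSV file."""
    reader = csv.DictReader(handle)
    return {row[id_column].strip() for row in reader if row.get(id_column)}


def filter_rows(handle, subject_ids, id_column="RID"):
    """Keep only the rows of a CSV table whose ID is in subject_ids."""
    reader = csv.DictReader(handle)
    return [row for row in reader if row[id_column].strip() in subject_ids]


# In-memory stand-ins for subjects.csv and an ADNI table (made-up values,
# hypothetical column names).
subjects_csv = io.StringIO("RID\n2\n5\n")
table_csv = io.StringIO("RID,marker_a\n2,0.41\n3,0.37\n5,0.52\n")

ids = load_subject_ids(subjects_csv)
kept = filter_rows(table_csv, ids)
print([row["RID"] for row in kept])  # ['2', '5']
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles.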
Place the corresponding files in the data/ directory.
Run data/Data_preparation.ipynb to generate a file with the covariate data needed for the clustering. The script can be modified to include different covariates/patients.
Define a config file with the experiment parameters.
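For orientation, the sketch below shows how an INI-style config file can be read with Python 3's standard `configparser`. The section and key names (`data`, `covariate_file`, `output_directory`) are hypothetical placeholders, not the keys that simlr-ad.py actually expects; see the files under configs/ for the real ones.

```python
import configparser

# Hypothetical config contents; the real section and key names are defined
# by the repository's own config files and may differ.
EXAMPLE_CONFIG = """
[data]
covariate_file = data/covariates.csv
output_directory = results/
"""


def parse_config(text):
    """Parse INI text and return it as a nested dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = parse_config(EXAMPLE_CONFIG)
print(config["data"]["output_directory"])  # results/
```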
python simlr-ad.py --config_file configs/config_cimlr.ini --clusters 4 --output_directory_name test --cimlr
A new folder will be created in the folder defined in the configuration, with the name you have chosen, containing the results.
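The folder layout described above can be pictured with the following sketch, which joins a base directory with a run name and creates the result. It is an assumption about the behaviour, not the code used by simlr-ad.py; `make_results_dir` is a hypothetical helper, and a temporary directory stands in for the folder defined in the configuration.

```python
import tempfile
from pathlib import Path


def make_results_dir(base_dir, run_name):
    """Create base_dir/run_name (if missing) and return its path."""
    out_dir = Path(base_dir) / run_name
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


# A temporary directory plays the role of the configured output folder.
with tempfile.TemporaryDirectory() as base:
    results = make_results_dir(base, "test")
    created = results.is_dir()
    print(results.name, created)  # test True
```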
[1]: Wang, B., Zhu, J., Pierson, E., Ramazzotti, D., & Batzoglou, S. (2017). Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning. Nature Methods, 14(4), 414–416. http://doi.org/10.1038/nMeth.4207
[2]: Ramazzotti, D., Lal, A., Wang, B., Batzoglou, S., & Sidow, A. (2018). Multi-omic tumor data reveal diversity of molecular mechanisms underlying survival. bioRxiv, 267245. doi:10.1101/267245