Machine Learning in NeuroImaging (MLNI) is a Python package that performs various tasks using neuroimaging data: i) binary classification for disease diagnosis, following the good practices proposed in AD-ML; ii) regression prediction, such as age prediction; and iii) semi-supervised clustering with HYDRA.
Anaconda allows you to install, run, and update Python packages and their dependencies. We highly recommend installing Anaconda3 on your machine. After installing Anaconda3, there are three ways to use MLNI.
We recommend using a Conda virtual environment:
1) conda create --name mlni python=3.6
Activate the virtual environment:
2) source activate mlni
Install the other Python package dependencies (from the root folder of MLNI):
3) ./install_requirements.sh
Finally, install mlni from PyPI:
4) pip install mlni==0.1.4
After installing all dependencies in the requirements.txt file, go to the root folder of MLNI, where setup.py is located:
pip install -e .
This will allow you to run the software from the command line in the terminal. See an example here:
Advanced users can git clone the package locally and work from the source code:
python -m pip install git+https://github.com/anbai106/mlni.git
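Whichever installation route is used, a quick check that the package is importable in the active environment can save some debugging. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of MLNI.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate `package_name`."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "mlni" is the import name used in the usage examples of this README.
    if is_installed("mlni"):
        print("mlni is importable in this environment")
    else:
        print("mlni was not found; check that the 'mlni' environment is active")
```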
MLNI requires a specific input structure inspired by BIDS. Conventions for the group label/diagnosis: -1 represents healthy control (CN) and 1 represents patient (PT). Categorical variables, such as sex, should be encoded as numbers: for instance, 0 for female and 1 for male. For regression, simply replace the diagnosis column with the variable to be predicted, such as age in an age prediction task.
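The encoding conventions above can be sketched as a small mapping step applied before writing the TSV files. This is only an illustration: the raw labels ("CN", "PT", "F", "M"), the dictionaries, and the `encode_row` helper are assumptions made for the example, not part of MLNI.

```python
# Hypothetical raw labels; these mappings are illustrative, not part of MLNI.
DIAGNOSIS_CODES = {"CN": -1, "PT": 1}  # healthy control -> -1, patient -> 1
SEX_CODES = {"F": 0, "M": 1}           # female -> 0, male -> 1


def encode_row(raw: dict) -> dict:
    """Return a copy of `raw` with diagnosis (and sex, if present) as numbers."""
    encoded = dict(raw)
    encoded["diagnosis"] = DIAGNOSIS_CODES[raw["diagnosis"]]
    if "sex" in raw:
        encoded["sex"] = SEX_CODES[raw["sex"]]
    return encoded


raw_rows = [
    {"participant_id": "sub-CLNC0001", "session_id": "ses-M00",
     "diagnosis": "CN", "sex": "F"},
    {"participant_id": "sub-CLNC0002", "session_id": "ses-M00",
     "diagnosis": "PT", "sex": "F"},
]
encoded_rows = [encode_row(row) for row in raw_rows]
print(encoded_rows)
```

An unknown label raises a `KeyError`, which is usually preferable to silently writing an unencoded value into the table.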
Clustering
MLNI clusters on the ROI features in feature_tsv (covariate_tsv is optional). Example for feature_tsv:
participant_id session_id diagnosis ROI1 ROI2 ...
sub-CLNC0001 ses-M00 -1 432.1 596.9
sub-CLNC0002 ses-M00 1 398.2 601.3
sub-CLNC0003 ses-M00 -1 412.0 567.3
sub-CLNC0004 ses-M00 -1 487.4 600.1
sub-CLNC0005 ses-M00 1 346.5 529.5
sub-CLNC0006 ses-M00 1 443.2 663.2
sub-CLNC0007 ses-M00 -1 450.2 599.3
sub-CLNC0008 ses-M00 1 443.2 509.4
Example for covariate_tsv:
participant_id session_id diagnosis age sex ...
sub-CLNC0001 ses-M00 -1 56.1 0
sub-CLNC0002 ses-M00 1 57.2 0
sub-CLNC0003 ses-M00 -1 43.0 1
sub-CLNC0004 ses-M00 -1 25.4 1
sub-CLNC0005 ses-M00 1 74.5 1
sub-CLNC0006 ses-M00 1 44.2 0
sub-CLNC0007 ses-M00 -1 40.2 0
sub-CLNC0008 ses-M00 1 43.2 1
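Since feature_tsv and covariate_tsv describe the same participants, it can help to confirm that the two tables agree before clustering. The sketch below assumes tab-separated files and compares the first three columns row by row; whether MLNI itself requires identical row order is an assumption here, and `read_keys` and `rows_match` are hypothetical helpers.

```python
import csv
import io


def read_keys(tsv_text: str) -> list:
    """Return (participant_id, session_id, diagnosis) for each row of a TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(r["participant_id"], r["session_id"], r["diagnosis"]) for r in reader]


def rows_match(feature_text: str, covariate_text: str) -> bool:
    """True if both tables list the same subjects, sessions, and labels in order."""
    return read_keys(feature_text) == read_keys(covariate_text)


feature_text = (
    "participant_id\tsession_id\tdiagnosis\tROI1\tROI2\n"
    "sub-CLNC0001\tses-M00\t-1\t432.1\t596.9\n"
    "sub-CLNC0002\tses-M00\t1\t398.2\t601.3\n"
)
covariate_text = (
    "participant_id\tsession_id\tdiagnosis\tage\tsex\n"
    "sub-CLNC0001\tses-M00\t-1\t56.1\t0\n"
    "sub-CLNC0002\tses-M00\t1\t57.2\t0\n"
)
print(rows_match(feature_text, covariate_text))
```

For files on disk, the same functions can be fed the result of reading each file as text.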
Classification with ROI
Note: For classification, nested feature selection has also been implemented for ROI-wise and voxel-wise features. Only feature_tsv is required. Example for feature_tsv:
participant_id session_id diagnosis ROI1 ROI2 ...
sub-CLNC0001 ses-M00 -1 432.1 596.9
sub-CLNC0002 ses-M00 1 398.2 601.3
sub-CLNC0003 ses-M00 -1 412.0 567.3
sub-CLNC0004 ses-M00 -1 487.4 600.1
sub-CLNC0005 ses-M00 1 346.5 529.5
sub-CLNC0006 ses-M00 1 443.2 663.2
sub-CLNC0007 ses-M00 -1 450.2 599.3
sub-CLNC0008 ses-M00 1 443.2 509.4
Classification with voxel-wise images
Only participant_tsv is required. Example for participant_tsv for voxel-wise classification:
participant_id session_id diagnosis path ...
sub-CLNC0001 ses-M00 -1 path1
sub-CLNC0002 ses-M00 1 path2
sub-CLNC0003 ses-M00 -1 path3
sub-CLNC0004 ses-M00 -1 path4
sub-CLNC0005 ses-M00 1 path5
sub-CLNC0006 ses-M00 1 path6
sub-CLNC0007 ses-M00 -1 path7
sub-CLNC0008 ses-M00 1 path8
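For voxel-wise inputs, every row points to an image on disk, so a missing file is an easy mistake to make. The following sketch lists rows whose path does not exist. It assumes a tab-separated file with a `path` column as in the example above; `missing_image_paths` is a hypothetical helper, not an MLNI function.

```python
import csv
import io
import os


def missing_image_paths(tsv_text: str) -> list:
    """Return (participant_id, path) pairs whose path is not an existing file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["participant_id"], row["path"])
        for row in reader
        if not os.path.isfile(row["path"])
    ]


# "path1" and "path2" are the placeholders from the example table above.
participant_text = (
    "participant_id\tsession_id\tdiagnosis\tpath\n"
    "sub-CLNC0001\tses-M00\t-1\tpath1\n"
    "sub-CLNC0002\tses-M00\t1\tpath2\n"
)
for participant_id, path in missing_image_paths(participant_text):
    print(f"{participant_id}: image not found at {path}")
```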
Classification with multi-scale ROI from SOPNMF
After running images with SOPNMF, only participant_tsv is required as input. Example for participant_tsv for multi-scale ROI classification:
participant_id session_id diagnosis
sub-CLNC0001 ses-M00 -1
sub-CLNC0002 ses-M00 1
sub-CLNC0003 ses-M00 -1
sub-CLNC0004 ses-M00 -1
sub-CLNC0005 ses-M00 1
sub-CLNC0006 ses-M00 1
sub-CLNC0007 ses-M00 -1
sub-CLNC0008 ses-M00 1
Regression
Note: For regression with ROI-wise features, replace the diagnosis column with the variable to be predicted (e.g., age). Only feature_tsv is required. Example for feature_tsv:
participant_id session_id diagnosis ROI1 ROI2 ...
sub-CLNC0001 ses-M00 23 432.1 596.9
sub-CLNC0002 ses-M00 44 398.2 601.3
sub-CLNC0003 ses-M00 65 412.0 567.3
sub-CLNC0004 ses-M00 15 487.4 600.1
sub-CLNC0005 ses-M00 22 346.5 529.5
sub-CLNC0006 ses-M00 78 443.2 663.2
sub-CLNC0007 ses-M00 90 450.2 599.3
sub-CLNC0008 ses-M00 33 443.2 509.4
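When the variable to predict already lives in a covariate table, the regression feature_tsv can be derived from an existing classification feature_tsv by overwriting the diagnosis column. The sketch below assumes tab-separated tables keyed by participant_id and session_id; `make_regression_table` and its `target` parameter are hypothetical names used only for illustration.

```python
import csv
import io


def make_regression_table(feature_text: str, covariate_text: str,
                          target: str = "age") -> str:
    """Return feature TSV text whose diagnosis column holds the target values."""
    covariates = csv.DictReader(io.StringIO(covariate_text), delimiter="\t")
    target_by_key = {
        (row["participant_id"], row["session_id"]): row[target]
        for row in covariates
    }
    features = csv.DictReader(io.StringIO(feature_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=features.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in features:
        # The column keeps the name "diagnosis", as in the example table above.
        row["diagnosis"] = target_by_key[(row["participant_id"], row["session_id"])]
        writer.writerow(row)
    return out.getvalue()


feature_text = (
    "participant_id\tsession_id\tdiagnosis\tROI1\tROI2\n"
    "sub-CLNC0001\tses-M00\t-1\t432.1\t596.9\n"
    "sub-CLNC0002\tses-M00\t1\t398.2\t601.3\n"
)
covariate_text = (
    "participant_id\tsession_id\tdiagnosis\tage\tsex\n"
    "sub-CLNC0001\tses-M00\t-1\t56.1\t0\n"
    "sub-CLNC0002\tses-M00\t1\t57.2\t0\n"
)
print(make_regression_table(feature_text, covariate_text))
```

A participant missing from the covariate table raises a `KeyError` rather than producing a row with an empty target.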
We offer a toy dataset in the mlni/data folder.
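# Semi-supervised clustering with HYDRA on the toy dataset
from mlni.hydra_clustering import clustering

feature_tsv = "mlni/data/test_feature.tsv"
output_dir = "PATH_OUTPUT_DIR"  # replace with your output directory
k_min = 2  # smallest number of clusters to evaluate
k_max = 8  # largest number of clusters to evaluate
cv_repetition = 100  # number of cross-validation repetitions
clustering(feature_tsv, output_dir, k_min, k_max, cv_repetition)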
Note that the above example assumes that the input features have already been corrected for covariate effects, such as age and sex. If not, one can run:
from mlni.hydra_clustering import clustering

feature_tsv = "mlni/data/test_feature.tsv"
output_dir = "PATH_OUTPUT_DIR"
k_min = 2
k_max = 8
cv_repetition = 100
covariate_tsv = "mlni/data/test_covariate.tsv"
clustering(feature_tsv, output_dir, k_min, k_max, cv_repetition, covariate_tsv=covariate_tsv)
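# Binary classification with ROI features
from mlni.adml_classification import classification_roi

feature_tsv = "mlni/data/test_feature.tsv"
output_dir = "PATH_OUTPUT_DIR"  # replace with your output directory
cv_repetition = 250  # number of cross-validation repetitions
classification_roi(feature_tsv, output_dir, cv_repetition)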
or
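# Binary classification with voxel-wise images
from mlni.adml_classification import classification_voxel

feature_tsv = "mlni/data/test_feature_voxel.tsv"
output_dir = "PATH_OUTPUT_DIR"  # replace with your output directory
cv_repetition = 250  # number of cross-validation repetitions
classification_voxel(feature_tsv, output_dir, cv_repetition)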
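# Regression with ROI features (age prediction on the toy dataset)
from mlni.adml_regression import regression_roi

feature_tsv = "mlni/data/test_feature_regression_age.tsv"
output_dir = "PATH_OUTPUT_DIR"  # replace with your output directory
cv_repetition = 250  # number of cross-validation repetitions
regression_roi(feature_tsv, output_dir, cv_repetition)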
:warning: Please let me know if you use this package in a publication; I will add your paper to the section Publication using MLNI…
:warning: Please cite the software using the Cite this repository button on the right sidebar menu, as well as the original papers below for different tasks…
Wen, J., 2020. Reproducible evaluation of diffusion MRI features for automatic classification of patients with Alzheimer’s disease. Neuroinformatics, pp.1-22. doi:10.1007/s12021-020-09469-5
Wen, J., 2024. The Genetic Architecture of Multimodal Human Brain Age. Nature Communications, pp.1-22. doi:10.1038/s41467-024-46796-6
Varol, E., 2017. HYDRA: Revealing heterogeneity of imaging and genetic patterns through a multiple max-margin discriminative analysis framework. Neuroimage, 145, pp.346-364. doi:10.1016/j.neuroimage.2016.02.041
Wen, J., 2021. Multi-scale semi-supervised clustering of brain images: deriving disease subtypes. MedIA.
Wen, J., 2022. Characterizing Heterogeneity in Neuroimaging, Cognition, Clinical Symptoms, and Genetics Among Patients With Late-Life Depression. JAMA Psychiatry.
Lalousis, P.A., 2022. Neurobiologically Based Stratification of Recent Onset Depression and Psychosis: Identification of Two Distinct Transdiagnostic Phenotypes. Biological Psychiatry.