scparadise.sceve.train
- scparadise.sceve.train(mdata, first_modality_name, second_modality_name, first_layer=None, second_layer=None, detailed_annotation=None, path='', model_name='scEve_model', test_size=0.2, epochs=200, eval_metric=['rmse'], batch_size=128, patience=10, use_augmentation=True, aug_probability=0.5, prob=0.15, noise_std=0.1, dropout_aug=0.1, alpha=0.2, nc=4, nb=4, nh=8, ed_nh_ratio=32, ff_hd=512, regressor_hd=512, dropout=0.1, lr=0.0001, weight_decay=0.0001, device='auto', random_state=0, verbose=True, return_model=False)
Train a custom scEve model using a MuData object with different modalities.
- Parameters:
mdata (MuData) – MuData object.
path (str, path object (default: '')) – Path in which to create a model folder containing the training history, cell annotation dictionary, and genes used for training.
model_name (str (default: 'scEve_model')) – Name of the folder in which to save the model.
first_modality_name (str) – Name of the first modality in the MuData object, for example 'rna'.
second_modality_name (str) – Name of the second modality in the MuData object, for example 'prot'.
first_layer (str (default: None)) – If specified, use mdata.mod[first_modality_name].layers[first_layer] for expression values instead of mdata.mod[first_modality_name].X.
second_layer (str (default: None)) – If specified, use mdata.mod[second_modality_name].layers[second_layer] for expression values instead of mdata.mod[second_modality_name].X.
detailed_annotation (str (default: None)) – The most detailed level of cell annotation. Key in the mdata.obs dataframe. If given, it may increase the model evaluation score.
test_size (float or int (default: 0.2)) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test cells.
epochs (int (default: 200)) – Maximum number of epochs for scEve model training.
eval_metric (list (default: ['rmse'])) – Metrics to evaluate during training. The last metric in the list is used as the target and for early stopping. Available metrics: ‘mse’, ‘mae’, ‘rmse’, ‘rmsle’.
batch_size (int (default: 128)) – Number of examples per batch.
patience (int (default: 10)) – Number of consecutive epochs without improvement before performing early stopping. If patience is set to 0, then no early stopping will be performed. Note that if patience is enabled, then best weights from best epoch will automatically be loaded at the end of the training.
use_augmentation (bool (default: True)) – Use data augmentation or not.
aug_probability (float (default: 0.5)) – The probability of applying augmentation to a batch.
prob (float (default: 0.15)) – Gene masking probability.
noise_std (float (default: 0.1)) – Gaussian noise standard deviation.
dropout_aug (float (default: 0.1)) – Dropout probability for simulating technical noise.
alpha (float (default: 0.2)) – Alpha parameter for mixup augmentation.
nc (int (default: 4)) – Number of chunks for genes from adata.
nb (int (default: 4)) – Number of blocks in scEve model.
nh (int (default: 8)) – Number of heads in scEve model attention mechanism.
ed_nh_ratio (int (default: 32)) – Used for calculating embedding dimensionality (‘ed’) from ‘nh’. Default ed = nh * ed_nh_ratio = 8 * 32 = 256.
ff_hd (int (default: 512)) – Number of nodes in the feed-forward network of each scEve model layer.
regressor_hd (int (default: 512)) – Number of nodes in scEve regressor.
dropout (float (default: 0.1)) – Fraction of neurons that are temporarily ignored during training (prevents overfitting).
lr (float (default: 1e-4)) – Determines the step size at each iteration while moving toward a minimum of a loss function.
weight_decay (float (default: 1e-4)) – Weight decay coefficient.
device (str (default: 'auto')) – Device to use for model training (‘cpu’, ‘cuda’). Set ‘auto’ for automatic selection.
random_state (int (default: 0)) – Controls data shuffling, splitting into folds, and model training. Pass an int for reproducible output across multiple function calls.
verbose (bool (default: True)) – Show progress bar for each epoch during training.
return_model (bool (default: False)) – Whether to return the model after training.
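A few of the parameters above interact, and a minimal sketch may make that easier to see. It is not scparadise's implementation. The helper names `embedding_dim`, `early_stopping_epoch`, and `train_sceve` are hypothetical. The patience loop is a generic version of the behaviour described for `patience`. The import `from scparadise import sceve` is assumed from the qualified function name. The modality names 'rna' and 'prot' are only examples.

```python
def embedding_dim(nh=8, ed_nh_ratio=32):
    """Embedding dimensionality 'ed' as documented: ed = nh * ed_nh_ratio."""
    return nh * ed_nh_ratio


def early_stopping_epoch(scores, patience=10):
    """Generic patience loop (an illustration, not the library's code).

    `scores` holds one validation error per epoch for the last metric in
    eval_metric; all available metrics are errors, so lower is better.
    Returns (last_epoch_run, best_epoch), both 0-based. With patience=0
    no early stopping is performed and every epoch is run.
    """
    best_score, best_epoch = float("inf"), 0
    for epoch, score in enumerate(scores):
        if score < best_score:
            best_score, best_epoch = score, epoch
        elif patience > 0 and epoch - best_epoch >= patience:
            # `patience` consecutive epochs without improvement: stop here;
            # the weights from `best_epoch` are the ones that get restored.
            return epoch, best_epoch
    return len(scores) - 1, best_epoch


def train_sceve(mdata, **overrides):
    """Hypothetical wrapper calling scparadise.sceve.train with documented keywords."""
    # Assumed import path; imported lazily so the helpers above run without scparadise.
    from scparadise import sceve

    kwargs = {
        "first_modality_name": "rna",    # example modality names, adjust to your MuData
        "second_modality_name": "prot",
        "model_name": "scEve_model",
        "eval_metric": ["mae", "rmse"],  # the last metric, 'rmse', drives early stopping
        "patience": 10,
        "return_model": True,
    }
    kwargs.update(overrides)
    return sceve.train(mdata, **kwargs)
```

With the default `nh=8` and `ed_nh_ratio=32`, `embedding_dim()` gives 256, matching the `ed_nh_ratio` description. Changing `nh` without adjusting `ed_nh_ratio` therefore also changes the embedding size.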