scparadise.scadam.train_tuned
- scparadise.scadam.train_tuned(adata, celltype_keys, layer=None, path='', path_tuned='', model_name='scAdam_model_tuned', test_size=0.2, eval_metric=['accuracy', 'balanced_accuracy'], strategy='linear_offset', batch_size=None, epochs=None, patience=None, nc=None, nb=None, nh=None, ed_nh_ratio=None, ff_hd=None, classifier_hd=None, dropout=None, lr=None, weight_decay=None, adaptive_loss=True, use_augmentation=None, aug_probability=None, prob=None, noise_std=None, dropout_aug=None, alpha=None, from_unsupervised=None, pretrain_epochs=None, pretrain_data=None, device='auto', random_state=0, return_model=False, unknown_detection=True, verbose=True)
Train a custom scAdam model with tuned hyperparameters. By default, the function uses the hyperparameters stored in path_tuned by ‘scparadise.scadam.hyperparameter_tuning’. Any hyperparameter can be overridden by passing a value to the corresponding parameter; parameters left as None keep their tuned value.
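The override rule can be illustrated with a short plain-Python sketch. It is not scparadise's implementation: the helper `resolve_hyperparameters` and the tuned values below are hypothetical, and no particular on-disk format for the files in path_tuned is assumed. Arguments left as None keep the tuned value; any other value replaces it.

```python
def resolve_hyperparameters(tuned, **overrides):
    """Merge tuned hyperparameters with user overrides (hypothetical helper).

    Values passed as None keep the tuned value; any other value replaces it.
    """
    params = dict(tuned)  # copy so the tuned dict is left unchanged
    for name, value in overrides.items():
        if value is not None:
            params[name] = value
    return params


# Hypothetical tuned values, e.g. as they might be read from path_tuned
tuned = {"lr": 0.005, "batch_size": 256, "epochs": 100, "dropout": 0.2}
params = resolve_hyperparameters(tuned, lr=0.001, batch_size=None, epochs=50)
print(params)  # {'lr': 0.001, 'batch_size': 256, 'epochs': 50, 'dropout': 0.2}
```

Because None acts as the "not specified" sentinel, an explicit value such as use_augmentation=False still overrides the tuned setting in this sketch.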
- Parameters:
adata (AnnData) – Dataset with cell type annotations in adata.obs.
path (str, path object) – Path in which to create the model folder containing the training history, the cell annotation dictionary, and the genes used for training.
path_tuned (str, path object) – Path to the folder with hyperparameters tuned by the ‘scparadise.scadam.hyperparameter_tuning’ function.
model_name (str (default: 'scAdam_model_tuned')) – Name of the folder in which the model is saved.
celltype_keys (list) – List of adata.obs columns containing cell type annotations. Example: [‘lineage’, ‘cell type’, ‘cell state’]
layer (str (default: None)) – If specified, use adata.layers[layer] for expression values instead of adata.X.
test_size (float or int (default: 0.2)) – If float, must be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int, represents the absolute number of test cells.
epochs (int (default: None)) – Maximum number of epochs for scAdam model training. If specified, overrides the tuned value.
eval_metric (str or list (default: ['accuracy', 'balanced_accuracy'])) – Available evaluation metrics: ‘accuracy’, ‘balanced_accuracy’, ‘f1_score’. The last metric in the list is used as the target metric and for early stopping.
batch_size (int (default: None)) – Number of cells per batch. If specified, overrides the tuned value.
patience (int (default: None)) – Number of consecutive epochs without improvement before early stopping is triggered. If patience is 0, no early stopping is performed. If patience is enabled, the weights from the best epoch are automatically loaded at the end of training. If specified, overrides the tuned value.
use_augmentation (bool (default: None)) – Whether to use data augmentation. If specified, overrides the tuned value.
aug_probability (float (default: None)) – Probability of applying augmentation to a batch. If specified, overrides the tuned value.
prob (float (default: None)) – Gene masking probability. If specified, overrides the tuned value.
noise_std (float (default: None)) – Standard deviation of the Gaussian noise. If specified, overrides the tuned value.
dropout_aug (float (default: None)) – Dropout probability for simulating technical noise. If specified, overrides the tuned value.
alpha (float (default: None)) – Alpha parameter for mixup augmentation. If specified, overrides the tuned value.
nc (int (default: None)) – Number of chunks into which the genes from adata are divided. If specified, overrides the tuned value.
nb (int (default: None)) – Number of blocks in the scAdam model. If specified, overrides the tuned value.
nh (int (default: None)) – Number of attention heads in the scAdam model. If specified, overrides the tuned value.
ed_nh_ratio (int (default: None)) – Ratio used to calculate the embedding dimensionality (‘ed’) from ‘nh’. If specified, overrides the tuned value.
ff_hd (int (default: None)) – Number of nodes in the feed-forward network of each scAdam model layer. If specified, overrides the tuned value.
classifier_hd (int (default: None)) – Number of nodes in the scAdam classifier. If specified, overrides the tuned value.
dropout (float (default: None)) – Fraction of neurons temporarily ignored during training to prevent overfitting. If specified, overrides the tuned value.
lr (float (default: None)) – Learning rate: the step size at each iteration while moving toward a minimum of the loss function. If specified, overrides the tuned value.
weight_decay (float (default: None)) – Weight decay coefficient. If specified, overrides the tuned value.
from_unsupervised (bool (default: None)) – Use a previously self-supervised pretrained model as starting weights. Supervised training is also performed by this function.
pretrain_epochs (int (default: None)) – Number of pretraining epochs.
pretrain_data (AnnData (default: None)) – AnnData with an expression matrix of the same structure as adata, used for unsupervised pretraining. This additional AnnData does not need to contain cell type annotations.
device (str (default: 'auto')) – Device used to train the model (‘cpu’ or ‘cuda’). Set ‘auto’ for automatic selection.
random_state (int (default: 0)) – Controls data shuffling, splitting, and model training. Pass an int for reproducible output across multiple function calls.
verbose (bool (default: True)) – Show a progress bar for each epoch during training.
return_model (bool (default: False)) – Whether to return the trained model.
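The wording of test_size matches scikit-learn's `train_test_split` convention, so the sketch below assumes the same rule: a float is a fraction of the cells, rounded up, and an int is an absolute count. The helper `n_test_cells` is hypothetical and only computes the size of the test split; whether scparadise also stratifies the split by cell type is not covered here.

```python
import math


def n_test_cells(n_cells, test_size=0.2):
    """Number of test cells implied by test_size (hypothetical helper).

    Assumes the scikit-learn convention: a float is a fraction of the
    dataset (rounded up), an int is an absolute number of cells.
    """
    if isinstance(test_size, float):
        if not 0.0 < test_size < 1.0:
            raise ValueError("float test_size must be between 0.0 and 1.0")
        return math.ceil(n_cells * test_size)
    if isinstance(test_size, int):
        if not 0 < test_size < n_cells:
            raise ValueError("int test_size must be between 1 and n_cells - 1")
        return test_size
    raise TypeError("test_size must be a float or an int")


print(n_test_cells(10_000, 0.2))   # 2000
print(n_test_cells(10_000, 1500))  # 1500
```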
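With the default eval_metric=['accuracy', 'balanced_accuracy'], the last entry, balanced accuracy, drives early stopping. The plain-Python sketch below uses the standard definitions of the two metrics (hypothetical helpers, not scparadise code) to show why this matters for imbalanced cell type annotations: a classifier that ignores a rare cell type can still reach high accuracy.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so rare cell types weigh as much as common ones."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# 8 "T cell" and 2 "B cell" examples; the classifier never predicts B cells
y_true = ["T cell"] * 8 + ["B cell"] * 2
y_pred = ["T cell"] * 10

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```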
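The patience rule can be sketched as a loop over per-epoch validation scores of the target metric, assuming higher scores are better. The helper `early_stopping` is hypothetical; it returns the epoch with the best score and the epoch at which training stops.

```python
def early_stopping(scores, patience):
    """Return (best_epoch, stop_epoch) for per-epoch scores of the target metric.

    Hypothetical helper: higher is better, and patience=0 disables early stopping.
    """
    best_score, best_epoch, wait = float("-inf"), None, 0
    for epoch, score in enumerate(scores):
        if score > best_score:
            # Improvement: remember this epoch and reset the counter
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            # Stop after `patience` consecutive epochs without improvement
            if patience > 0 and wait >= patience:
                return best_epoch, epoch
    # No early stop: training ran for all epochs
    return best_epoch, len(scores) - 1


scores = [0.70, 0.75, 0.78, 0.77, 0.78, 0.76, 0.79]
print(early_stopping(scores, patience=3))  # (2, 5)
print(early_stopping(scores, patience=0))  # (6, 6)
```

With patience=3, training stops at epoch 5 and the weights from epoch 2 would be the ones restored. In this sketch, the tie at epoch 4 does not count as an improvement.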
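The augmentation parameters (use_augmentation, aug_probability, prob, noise_std, dropout_aug, alpha) can be illustrated with a NumPy sketch applied to one batch. This is an assumption-laden illustration, not scparadise's implementation: the helper `augment_batch`, the order of the steps, the clipping to non-negative values, and the choice to mask whole genes for prob versus individual values for dropout_aug are all hypothetical. Mixup draws its mixing weight from a Beta(alpha, alpha) distribution and blends the labels with the same weight as the expression values.

```python
import numpy as np


def augment_batch(x, y_onehot, rng, aug_probability=0.5, prob=0.1,
                  noise_std=0.1, dropout_aug=0.1, alpha=0.2):
    """Augment one batch (hypothetical sketch, not scparadise's code).

    x: (cells, genes) expression batch; y_onehot: (cells, classes) labels.
    Returns the inputs unchanged if the batch is not selected for augmentation.
    """
    # Augment this batch with probability aug_probability
    if rng.random() >= aug_probability:
        return x, y_onehot
    # Gaussian noise with standard deviation noise_std
    x = x + rng.normal(0.0, noise_std, size=x.shape)
    # Keep expression values non-negative after adding noise (assumption)
    x = np.clip(x, 0.0, None)
    # Gene masking (assumed column-wise): hide each gene for the whole batch with probability prob
    gene_mask = rng.random(x.shape[1]) < prob
    x[:, gene_mask] = 0.0
    # Dropout-style zeroing of individual values to mimic technical dropout events
    x[rng.random(x.shape) < dropout_aug] = 0.0
    # Mixup: blend each cell with a random partner, lambda ~ Beta(alpha, alpha)
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(x.shape[0])
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix


rng = np.random.default_rng(0)
x = rng.random((4, 6))            # 4 cells, 6 genes
y = np.eye(2)[[0, 1, 0, 1]]       # one-hot labels for 2 cell types
x_aug, y_aug = augment_batch(x, y, rng, aug_probability=1.0)
print(x_aug.shape, y_aug.shape)   # (4, 6) (4, 2)
```

Because mixup blends one-hot labels, each row of the augmented labels still sums to 1 and can be used as a soft target.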