scparadise.scadam.hyperparameter_tuning
- scparadise.scadam.hyperparameter_tuning(adata, celltype_keys, path='', layer=None, model_name='scAdam_model_tuning', storage='scadam_model_tuning.db', study_name='study', load_if_exists=True, eval_metric=['balanced_accuracy'], strategy='linear_offset', device='auto', tune_params='auto', adaptive_loss=True, num_trials=100, n_splits=5, epochs=None, patience=None, batch_size=None, use_augmentation=None, aug_probability=None, prob=None, noise_std=None, dropout_aug=None, alpha=None, nc=None, nb=None, nh=None, ed_nh_ratio=None, ff_hd=None, classifier_hd=None, dropout=None, lr=None, weight_decay=None, from_unsupervised=None, pretrain_epochs=None, pretrain_data=None, random_state=0, verbose=True)
Hyperparameter tuning for the scAdam model using Optuna, with k-fold cross-validation in each trial.
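The k-fold scheme can be illustrated in plain Python. This is a minimal sketch, not the scparadise implementation: it assumes a simple shuffled (non-stratified) split seeded by random_state, and a hypothetical `evaluate(train_idx, val_idx)` callback that stands in for training scAdam on one fold and scoring it. It also assumes, for illustration, that each Optuna trial reports the mean validation score over the n_splits folds; the actual splitter (for example a stratified one) is not stated here.

```python
import random
from statistics import mean
from typing import Callable, Iterator, List, Tuple


def kfold_indices(n_samples: int, n_splits: int = 5,
                  random_state: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)  # reproducible shuffle
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_validated_score(evaluate: Callable[[List[int], List[int]], float],
                          n_samples: int, n_splits: int = 5,
                          random_state: int = 0) -> float:
    """Average a per-fold score; `evaluate` is a hypothetical train-and-score callback."""
    return mean(evaluate(train, val)
                for train, val in kfold_indices(n_samples, n_splits, random_state))


# Each fold holds out 1/n_splits of the cells: 200 of 1000 when n_splits=5.
for train_idx, val_idx in kfold_indices(1000, n_splits=5):
    print(len(train_idx), len(val_idx))  # 800 200
```

Because every cell lands in exactly one validation fold, averaging over folds uses all labelled cells for validation once per trial, at the cost of n_splits training runs per trial.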
- Parameters:
adata (AnnData) – Dataset with cell type annotations in adata.obs
celltype_keys (list) – List of adata.obs column names that hold the cell type annotations. Example: [‘lineage’, ‘cell type’, ‘cell state’]
path (str, path object (default: '')) – Path at which to create a folder containing the best hyperparameters, a dictionary of cell type annotations, and the genes used for hyperparameter optimization.
layer (str (default: None)) – If specified, use adata.layers[layer] for expression values instead of adata.X.
model_name (str (default: 'scAdam_model_tuning')) – Name of the folder, created in path, in which the tuned hyperparameters are saved.
storage (str (default: 'scadam_model_tuning.db')) – Database URL for the Optuna study. If set to None, in-memory (RAM) storage is used and the study is not persistent, so optimization progress cannot be resumed after an interruption. We do not recommend in-memory storage.
study_name (str (default: 'study')) – Name of the study. If set to None, a unique name is generated automatically.
load_if_exists (bool (default: True)) – Controls how a study-name conflict is handled. If a study named study_name already exists in storage, a DuplicatedStudyError is raised when load_if_exists is False; otherwise, study creation is skipped and the existing study is loaded. Setting it to True therefore lets interrupted tuning (e.g. by a keyboard interrupt or an OS update) continue.
eval_metric (str or list (default: ['balanced_accuracy'])) – Evaluation metric(s). Available metrics: ‘accuracy’, ‘balanced_accuracy’, ‘f1_score’. If several are given, the last one is the optimization target and is also used for early stopping.
num_trials (int (default: 100)) – Number of trials run to find the optimal hyperparameters for model training.
n_splits (int (default: 5)) – Number of data splits (folds) per trial. The data is divided into n_splits parts; each part serves once as validation data while the remaining parts are used for training. The validation fraction is therefore 1/n_splits: 0.2 for n_splits = 5 and 0.25 for n_splits = 4.
adaptive_loss (bool (default: True)) – If True, enables adaptive weighting of hierarchical loss levels: each level’s loss is tracked over training and its contribution to the total loss is increased if this level remains hard (high loss) and decreased if it becomes easy (low loss). This helps the model focus more on poorly performing levels of the hierarchy instead of weighting all levels equally. If False, all levels are summed with a fixed weight of 1.0.
strategy (str (default: 'linear_offset')) – Weighting strategy for the cell type annotation levels. Available strategies: linear, exponential, linear_offset, equal, last. linear: the weight increases linearly from level to level. exponential: the weight increases exponentially from level to level. linear_offset: the weight increases linearly from level to level, with an offset. equal: all annotation levels have equal weight. last: only the last annotation level is used for model evaluation.
device (str (default: 'auto')) – Device used for model training (‘cpu’ or ‘cuda’). Set to ‘auto’ to select the device automatically.
tune_params (dict or 'auto' (default: 'auto')) – Either a dict specifying the search spaces (ranges and steps) for scAdam model and training parameters, or ‘auto’ to use the built-in defaults. The default search spaces are available via ‘scparadise.scadam.get_default_tune_params’; the differences between its settings are described in ‘?scparadise.scadam.get_default_tune_params’. For descriptions of the individual parameters, see the ‘scparadise.scadam.train’ function.
batch_size (int or None (default: None)) – If specified, this value is used and the parameter is not tuned.
epochs (int or None (default: None)) – If specified, this value is used and the parameter is not tuned.
patience (int or None (default: None)) – If specified, this value is used and the parameter is not tuned.
nc (int or None (default: None)) – If specified, this value is used and the parameter is not tuned.
nb (int or None (default: None)) – If specified, this value is used and the parameter is not tuned.
nh (int or None (default: None)) – If specified, this value is used and the parameter is not tuned.
ed_nh_ratio (int or None (default: None)) – If specified, this value is used and the parameter is not tuned.
ff_hd (int or None (default: None)) – If specified, this value is used and the parameter is not tuned.
classifier_hd (int or None (default: None)) – If specified, this value is used and the parameter is not tuned.
dropout (float or None (default: None)) – If specified, this value is used and the parameter is not tuned.
lr (float or None (default: None)) – If specified, this value is used and the parameter is not tuned.
weight_decay (float or None (default: None)) – If specified, this value is used and the parameter is not tuned.
use_augmentation (bool or None (default: None)) – If specified, this value is used and the parameter is not tuned.
aug_probability (float or None (default: None)) – If specified, this value is used and the parameter is not tuned.
prob (float or None (default: None)) – If specified, this value is used and the parameter is not tuned.
noise_std (float or None (default: None)) – If specified, this value is used and the parameter is not tuned.
dropout_aug (float or None (default: None)) – If specified, this value is used and the parameter is not tuned.
alpha (float or None (default: None)) – If specified, this value is used and the parameter is not tuned.
from_unsupervised (bool or None (default: None)) – If specified, this value is used and the parameter is not tuned. Supervised model training is included in the function.
pretrain_epochs (int or None (default: None)) – If specified, this value is used and the parameter is not tuned.
pretrain_data (AnnData or None (default: None)) – AnnData with the same expression matrix as adata, used for unsupervised pretraining. This additional AnnData does not need to contain cell type annotations.
random_state (int (default: 0)) – Controls data shuffling, splitting into folds, and model training. Pass an int for reproducible output across multiple function calls.
verbose (bool (default: True)) – Show a progress bar for each trial during hyperparameter tuning.
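The three eval_metric options can be written out from their usual definitions. This sketch assumes that ‘balanced_accuracy’ is the mean per-class recall and that ‘f1_score’ is the macro-averaged F1, matching scikit-learn's balanced_accuracy_score and f1_score(average='macro'); the averaging scparadise actually uses is not stated above. The helper `target_metric` is hypothetical and only shows that the last entry of eval_metric is the one optimized.

```python
from typing import List, Sequence, Union


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of cells whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean recall over the classes present in y_true, so rare cell types count equally."""
    recalls = []
    for c in sorted(set(y_true)):
        hits = [p == c for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(hits) / len(hits))
    return sum(recalls) / len(recalls)


def f1_score(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Macro F1: unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


METRICS = {"accuracy": accuracy, "balanced_accuracy": balanced_accuracy, "f1_score": f1_score}


def target_metric(eval_metric: Union[str, List[str]]) -> str:
    """The last listed metric is the optimization and early-stopping target."""
    return eval_metric if isinstance(eval_metric, str) else eval_metric[-1]


# A classifier that labels every cell "B cell" looks good on accuracy only.
y_true = ["B cell", "B cell", "B cell", "T cell"]
y_pred = ["B cell", "B cell", "B cell", "B cell"]
scores = {name: fn(y_true, y_pred) for name, fn in METRICS.items()}
print(scores)
print(target_metric(["accuracy", "balanced_accuracy"]))  # balanced_accuracy
```

The toy example shows why ‘balanced_accuracy’ is a sensible default for imbalanced cell populations: accuracy is 0.75 even though the T cell is never recognized, while balanced accuracy drops to 0.5.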
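The strategy options differ only in how much each annotation level counts. The formulas below are hypothetical, chosen to match the verbal descriptions of strategy (linear: 1, 2, 3, ...; exponential: 1, 2, 4, ...; linear_offset: linear plus a constant offset, assumed here to be 1; equal; last), and the normalization to a sum of 1 is also an assumption; scparadise's actual weights may differ. The sketch uses the weights to collapse one validation score per level (for example for ‘lineage’, ‘cell type’, ‘cell state’) into a single value, which is one reading of "last: only the last annotation level is used for model evaluation".

```python
from typing import List, Sequence


def level_weights(n_levels: int, strategy: str = "linear_offset",
                  offset: float = 1.0) -> List[float]:
    """Hypothetical per-level weights, normalized to sum to 1 (level 0 = first key)."""
    if n_levels < 1:
        raise ValueError("need at least one annotation level")
    if strategy == "linear":
        raw = [i + 1.0 for i in range(n_levels)]            # 1, 2, 3, ...
    elif strategy == "exponential":
        raw = [2.0 ** i for i in range(n_levels)]           # 1, 2, 4, ...
    elif strategy == "linear_offset":
        raw = [i + 1.0 + offset for i in range(n_levels)]   # 2, 3, 4, ... for offset=1
    elif strategy == "equal":
        raw = [1.0] * n_levels
    elif strategy == "last":
        raw = [0.0] * (n_levels - 1) + [1.0]                # only the last level counts
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    total = sum(raw)
    return [w / total for w in raw]


def weighted_score(level_scores: Sequence[float], strategy: str = "linear_offset") -> float:
    """Collapse one score per annotation level into a single objective value."""
    weights = level_weights(len(level_scores), strategy)
    return sum(w * s for w, s in zip(weights, level_scores))


# Illustrative per-level scores for ['lineage', 'cell type', 'cell state'].
per_level = [0.98, 0.90, 0.75]
for name in ["linear", "exponential", "linear_offset", "equal", "last"]:
    print(name, round(weighted_score(per_level, name), 4))
```

Under these assumed formulas the offset flattens the linear ramp (2:3:4 instead of 1:2:3), so the coarse levels keep more influence than with linear, while exponential and last push the objective toward the finest level.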
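The adaptive_loss behavior can be sketched as a weighting rule that tracks each level's loss with an exponential moving average (EMA) and gives levels that stay hard (high loss) larger weights. The EMA, the momentum of 0.9, and the rescaling so that the weights average 1.0 are assumptions made for illustration, not the scparadise implementation; plain floats stand in for per-batch loss tensors. With adaptive=False the sketch falls back to the fixed weight of 1.0 per level that adaptive_loss describes.

```python
from typing import List, Optional, Sequence


class AdaptiveLevelWeights:
    """Hypothetical adaptive weighting of per-level losses (illustration only)."""

    def __init__(self, n_levels: int, adaptive: bool = True, momentum: float = 0.9):
        self.n_levels = n_levels
        self.adaptive = adaptive
        self.momentum = momentum
        self.ema: Optional[List[float]] = None  # running average loss per level

    def weights(self) -> List[float]:
        if not self.adaptive or self.ema is None:
            return [1.0] * self.n_levels  # fixed weight of 1.0 per level
        mean_loss = sum(self.ema) / self.n_levels
        if mean_loss == 0:
            return [1.0] * self.n_levels
        # Levels with above-average running loss get weight > 1, easy levels < 1;
        # the weights average 1.0 so the overall loss scale is preserved.
        return [level_loss / mean_loss for level_loss in self.ema]

    def total_loss(self, level_losses: Sequence[float]) -> float:
        """Update the running averages, then return the weighted sum of level losses."""
        if self.adaptive:
            if self.ema is None:
                self.ema = list(level_losses)
            else:
                m = self.momentum
                self.ema = [m * e + (1 - m) * x for e, x in zip(self.ema, level_losses)]
        return sum(w * x for w, x in zip(self.weights(), level_losses))


# Losses for [lineage, cell type, cell state] over two batches; the finest level stays hardest.
balancer = AdaptiveLevelWeights(n_levels=3)
for batch_losses in [[0.2, 0.8, 1.4], [0.1, 0.7, 1.3]]:
    loss = balancer.total_loss(batch_losses)
print([round(w, 3) for w in balancer.weights()])
```

Smoothing with an EMA keeps a single noisy batch from swinging the weights; in a real training loop the weights would typically be computed from detached loss values so they act as constants during backpropagation.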
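The rule shared by batch_size, epochs, lr, dropout and the other tunable parameters ("if specified, the parameter is not tuned") amounts to merging user-fixed values over a sampled configuration. In this sketch the search-space layout, its ranges, and the helper `sample_params` are hypothetical, and a seeded `random.Random` stands in for an Optuna trial; with Optuna itself the sampling calls would be `trial.suggest_int`, `trial.suggest_float` and `trial.suggest_categorical`. The real default search spaces come from `scparadise.scadam.get_default_tune_params`, whose format is not assumed here.

```python
import random
from typing import Any, Dict, Optional, Tuple

# Hypothetical search space: name -> (kind, spec). Ranges are illustrative only.
SEARCH_SPACE: Dict[str, Tuple[str, Any]] = {
    "batch_size": ("categorical", [128, 256, 512]),
    "epochs": ("int", (10, 100)),
    "lr": ("float", (1e-4, 1e-2)),
    "dropout": ("float", (0.0, 0.5)),
}


def sample_params(space: Dict[str, Tuple[str, Any]],
                  fixed: Dict[str, Optional[Any]],
                  rng: random.Random) -> Dict[str, Any]:
    """Sample every parameter except those given a non-None fixed value."""
    params: Dict[str, Any] = {}
    for name, (kind, spec) in space.items():
        value = fixed.get(name)
        if value is not None:            # user-specified: skip tuning
            params[name] = value
        elif kind == "categorical":
            params[name] = rng.choice(spec)
        elif kind == "int":
            params[name] = rng.randint(*spec)   # inclusive bounds
        elif kind == "float":
            params[name] = rng.uniform(*spec)
        else:
            raise ValueError(f"unknown kind {kind!r} for {name!r}")
    return params


rng = random.Random(0)
# Analogous to passing batch_size=256, lr=None: batch_size is fixed, lr is tuned.
trial_params = sample_params(SEARCH_SPACE, {"batch_size": 256, "lr": None}, rng)
print(trial_params)
```

Checking `is not None` rather than truthiness matters here: a deliberately fixed value such as dropout=0.0 is kept instead of being treated as "not specified" and re-tuned.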