scparadise.scnoah.oversample

scparadise.scnoah.oversample(adata, celltype_keys, target_per_class=None, max_oversample_factor=7.0, min_oversample_cells=5, random_state=0)

Oversample minor cell types in an AnnData object. Returns adata_oversampled with an updated data matrix, and adata_oversampled.obs containing the given cell type levels and sample. The output matches the input data type: if you pass raw counts, the function returns counts; if you pass normalized data, it returns normalized data.

Parameters:
  • adata (AnnData) – Input dataset to be oversampled.

  • celltype_keys (list) – List of cell type annotation columns in adata.obs. Example: ['lineage', 'cell type', 'cell state']

  • target_per_class (int (default: None)) – Global target number of cells per cell type. If None, it is computed as the average number of cells per cell type at the annotation level with the most cell types.

  • max_oversample_factor (float (default: 7.0)) – Upper bound on how much a small class may be expanded relative to its original size.

  • min_oversample_cells (int (default: 5)) – Minimum cell type size required to allow substantial generation of new cells.

  • random_state (int (default: 0)) – Seed for random number generators to ensure reproducibility.
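
The parameter descriptions above imply some simple arithmetic. The sketch below illustrates it in plain Python. It is not the library's implementation: the helper names default_target and capped_size are hypothetical, the rounding and the exact capping rule are assumptions, and min_oversample_cells is left out because its precise rule is not stated here.

```python
import math
from collections import Counter


def default_target(obs, celltype_keys):
    """Average cells per cell type at the level with the most cell types.

    `obs` is a plain dict of label lists standing in for adata.obs
    (an assumption made to keep this sketch dependency-free).
    """
    # Count cells per label at every annotation level.
    counts_per_level = [Counter(obs[key]) for key in celltype_keys]
    # Pick the level with the largest number of distinct cell types.
    finest = max(counts_per_level, key=len)
    # Rounding to an integer is an assumption of this sketch.
    return round(sum(finest.values()) / len(finest))


def capped_size(n_cells, target, max_oversample_factor=7.0):
    """Assumed rule: grow toward `target`, never past n_cells * factor."""
    if n_cells >= target:
        return n_cells  # already at or above the target; nothing to add
    return min(target, math.floor(n_cells * max_oversample_factor))


# Toy annotations: 12 cells, 2 lineages, 4 cell types.
obs = {
    "lineage": ["T"] * 8 + ["B"] * 4,
    "cell type": ["CD4 T"] * 6 + ["CD8 T"] * 2 + ["B naive"] * 3 + ["B memory"] * 1,
}

target = default_target(obs, ["lineage", "cell type"])
sizes = {
    label: capped_size(n, target)
    for label, n in Counter(obs["cell type"]).items()
}
```

In this toy data the 'cell type' level has the most cell types (4), so the default target is 12 / 4 = 3 cells per type. With a smaller factor such as 2.0, a one-cell type could grow to at most 2 cells under the assumed rule, which is the kind of limit max_oversample_factor describes.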

Returns:

New AnnData with oversampled minor cell types. Original and synthetic cells are concatenated.

Return type:

AnnData
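
A minimal usage sketch follows, based only on the signature documented above. The wrapper oversample_minor_types and the DEFAULTS dictionary are hypothetical names introduced for illustration. The import is deferred so the snippet loads even where scparadise is not installed.

```python
# Keyword defaults copied from the documented signature.
DEFAULTS = {
    "target_per_class": None,
    "max_oversample_factor": 7.0,
    "min_oversample_cells": 5,
    "random_state": 0,
}


def oversample_minor_types(adata, celltype_keys, **overrides):
    """Call scparadise.scnoah.oversample with the documented defaults.

    Hypothetical convenience wrapper; any keyword in `overrides`
    replaces the matching default.
    """
    # Deferred import: only needed when the function is actually called.
    from scparadise import scnoah

    params = {**DEFAULTS, **overrides}
    return scnoah.oversample(adata, celltype_keys, **params)
```

Assuming adata is an AnnData object whose obs holds the listed annotation columns, a call could look like adata_oversampled = oversample_minor_types(adata, ['lineage', 'cell type', 'cell state'], random_state=42).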