anndict.utils.sample_and_drop


anndict.utils.sample_and_drop(adata, strata_keys, min_num_cells=0, n_largest_groups=None, **kwargs)

Sample adata within the strata defined by strata_keys and drop strata with fewer than min_num_cells cells. Optionally, retain only the n_largest_groups largest strata.

Parameters:

adata AnnData: An AnnData.
strata_keys list[str] | str: Column name, or list of column names, in adata.obs to use for stratification.
min_num_cells int (default: 0): Minimum number of cells required to retain a stratum.
n_largest_groups int | None (default: None): If specified, keep only the n_largest_groups largest strata. Ties can result in more strata being kept (see Notes).
kwargs: Additional keyword arguments passed to sample_adata() and sc.pp.subsample().

Return type:

AnnData

Returns:

Concatenated AnnData object after resampling and filtering.

Raises:

ValueError – If any of the specified strata_keys do not exist in adata.obs.
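
The kind of check that leads to this error can be sketched in plain Python. This is only an illustration under stated assumptions, not the anndict implementation: the helper name check_strata_keys and the obs_columns argument are hypothetical stand-ins for however the library inspects adata.obs.

```python
def check_strata_keys(strata_keys, obs_columns):
    """Hypothetical helper: normalize strata_keys and verify each one exists.

    obs_columns stands in for the column names of adata.obs (an assumption).
    """
    # Accept either a single column name or a list of names.
    keys = [strata_keys] if isinstance(strata_keys, str) else list(strata_keys)
    # Collect every key that is not a known column.
    missing = [k for k in keys if k not in obs_columns]
    if missing:
        raise ValueError(f"strata_keys not found in adata.obs: {missing}")
    return keys


# Toy column names, made up for this example.
columns = ["donor", "cell_type", "batch"]
valid_keys = check_strata_keys("cell_type", columns)
```

Passing a name that is absent from the toy column list, such as "tissue", would raise ValueError in this sketch.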

Notes

If groups tie in size when the largest groups are selected, all tied groups are kept, so the result may contain more than n_largest_groups groups.
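
The tie behavior can be illustrated with a small standalone sketch. It does not use anndict and is not the library's implementation: the helper name largest_groups_with_ties, the toy labels, and the choice to apply the size filter before ranking are all assumptions made for the example.

```python
from collections import Counter


def largest_groups_with_ties(labels, n_largest_groups=None, min_num_cells=0):
    """Hypothetical helper: choose the n largest groups, keeping all ties."""
    if n_largest_groups is not None and n_largest_groups < 1:
        raise ValueError("n_largest_groups must be at least 1")
    counts = Counter(labels)
    # Drop groups below the minimum size first (assumed order of operations).
    counts = {g: c for g, c in counts.items() if c >= min_num_cells}
    if n_largest_groups is None or len(counts) <= n_largest_groups:
        return set(counts)
    # The size of the n-th largest group is the cutoff; every group at or
    # above the cutoff is kept, so ties at the cutoff all survive.
    cutoff = sorted(counts.values(), reverse=True)[n_largest_groups - 1]
    return {g for g, c in counts.items() if c >= cutoff}


# Toy labels: "T" and "NK" tie for second place with 3 cells each.
labels = ["B"] * 5 + ["T"] * 3 + ["NK"] * 3 + ["Mono"] * 1
kept = largest_groups_with_ties(labels, n_largest_groups=2)
```

In this toy example, requesting the two largest groups keeps three groups ("B", "T", and "NK"), because "T" and "NK" are the same size.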