anndict.utils.sample_and_drop

anndict.utils.sample_and_drop(adata, strata_keys, min_num_cells=0, n_largest_groups=None, **kwargs)

Sample adata within the strata defined by strata_keys, and drop strata that contain fewer than min_num_cells cells. Optionally retain only the n_largest_groups largest strata.

Parameters:
adata AnnData

The AnnData object to sample and filter.

strata_keys list[str] | str

Column name, or list of column names, in adata.obs to use for stratification.

min_num_cells int (default: 0)

Minimum number of cells required to retain a stratum.

n_largest_groups int | None (default: None)

If specified, keep only the n_largest_groups largest strata. Tied groups are all kept (see Notes).

kwargs

Additional keyword arguments passed to sample_adata() and sc.pp.subsample().
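As a rough illustration of how strata_keys and min_num_cells interact, the plain-Python sketch below groups rows by the combination of values in the chosen columns and drops strata with fewer than min_num_cells rows. It is not anndict's implementation: the helper name drop_small_strata and the dict-of-lists stand-in for adata.obs are assumptions made for this example, and no subsampling is performed.

```python
from collections import Counter


def drop_small_strata(obs, strata_keys, min_num_cells=0):
    """Return indices of rows whose stratum has at least min_num_cells rows.

    `obs` is a dict of column name -> list of values (a stand-in for adata.obs).
    """
    # Accept a single column name as well as a list of names.
    keys = [strata_keys] if isinstance(strata_keys, str) else list(strata_keys)
    # One stratum label per row: the tuple of values across the chosen columns.
    labels = list(zip(*(obs[k] for k in keys)))
    sizes = Counter(labels)
    # Strata with fewer than min_num_cells rows are dropped, so a stratum of
    # exactly that size is kept.
    return [i for i, label in enumerate(labels) if sizes[label] >= min_num_cells]


obs = {
    "donor": ["d1", "d1", "d1", "d2", "d2", "d3"],
    "cell_type": ["T", "T", "B", "T", "T", "B"],
}
kept = drop_small_strata(obs, ["donor", "cell_type"], min_num_cells=2)
print(kept)
```

With the default min_num_cells=0, no stratum is dropped by this rule.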

Return type:

AnnData

Returns:

Concatenated AnnData object after resampling and filtering.
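A call might look like the sketch below, which uses only the parameters listed in the signature above. The wrapper name filter_by_donor_and_cell_type and the column names "donor" and "cell_type" are hypothetical; substitute columns that exist in your own adata.obs.

```python
def filter_by_donor_and_cell_type(adata):
    """Call sample_and_drop with a size threshold and a cap on the number of strata."""
    # Imported inside the function so that defining this helper does not
    # require anndict to be installed.
    from anndict.utils import sample_and_drop

    return sample_and_drop(
        adata,
        strata_keys=["donor", "cell_type"],  # hypothetical adata.obs columns
        min_num_cells=50,    # strata with fewer than 50 cells are dropped
        n_largest_groups=3,  # ties can yield more than 3 strata
    )
```

The result is a single AnnData object, so it can be passed on to downstream steps in the same way as the input.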

Raises:

ValueError – If any of the specified strata_keys do not exist in adata.obs.
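To report missing columns before calling the function, a check along these lines may be useful. This is a minimal sketch, not part of anndict: missing_strata_keys is a hypothetical helper, and it takes a plain list of column names here. With a real AnnData object, adata.obs.columns could be passed instead.

```python
def missing_strata_keys(obs_columns, strata_keys):
    """Return the strata keys that are not present among obs_columns."""
    # Accept a single column name as well as a list of names.
    keys = [strata_keys] if isinstance(strata_keys, str) else list(strata_keys)
    return [k for k in keys if k not in obs_columns]


missing = missing_strata_keys(["donor", "cell_type"], ["donor", "tissue"])
print(missing)
```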

Notes

If several groups tie for the last position among the largest groups, all of the tied groups are kept, so the result may contain more than n_largest_groups groups.
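The tie rule can be sketched in plain Python as follows. This illustrates the behaviour described in the note rather than the library's own code: the helper name largest_groups_with_ties and the example group sizes are assumptions.

```python
def largest_groups_with_ties(group_sizes, n_largest_groups=None):
    """Select the n largest groups, keeping every group tied at the cutoff size."""
    if n_largest_groups is None or n_largest_groups >= len(group_sizes):
        return set(group_sizes)
    if n_largest_groups <= 0:
        return set()
    # Size of the n-th largest group; anything at least this large is kept.
    ranked = sorted(group_sizes.values(), reverse=True)
    cutoff = ranked[n_largest_groups - 1]
    return {group for group, size in group_sizes.items() if size >= cutoff}


sizes = {"A": 5, "B": 3, "C": 3, "D": 1}
selected = largest_groups_with_ties(sizes, n_largest_groups=2)
print(sorted(selected))
```

Here "B" and "C" tie for second place, so asking for two groups yields three.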