anndict.AdataDict.add_stratification

AdataDict.add_stratification(strata_keys, *, desired_strata=None)

Split each value of an AnnData dictionary into further subsets based on additional desired strata.

Parameters:
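adata_dict AdataDict

The AdataDict to stratify. When called as a method, as in the signature above, this is the instance itself and is not passed explicitly.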

strata_keys list[str]

List of column names in adata.obs to use for further stratification.

desired_strata list | dict | None (default: None)

List of desired strata values or a dictionary where keys are strata keys and values are lists of desired strata values.

Return type:

AdataDict

Returns:

Nested AdataDict, where the top-level is now stratified by strata_keys.

Raises:

ValueError – If any of the strata_keys are already in the hierarchy.
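
The nesting, filtering, and error behaviour described above can be pictured with a small plain-Python model. This is only a sketch and not anndict's implementation: `add_stratification_sketch`, `groups`, and `hierarchy` are hypothetical names, lists of row dicts stand in for AnnData subsets, and only the dictionary form of desired_strata is modelled.

```python
from collections import defaultdict


def add_stratification_sketch(groups, hierarchy, strata_keys, desired_strata=None):
    """Toy model of the nesting; NOT anndict's implementation.

    groups: dict mapping a key tuple to a list of row dicts (stand-ins for
        the rows of adata.obs in each subset).
    hierarchy: names of the strata already used as keys, e.g. ["Donor"].
    desired_strata: only the dict form is modelled, e.g. {"Tissue": ["Tissue1"]}.
    """
    # Refuse to stratify twice by the same column
    clash = [key for key in strata_keys if key in hierarchy]
    if clash:
        raise ValueError(f"already in the hierarchy: {clash}")

    nested = defaultdict(dict)
    for old_key, rows in groups.items():
        split = defaultdict(list)
        for row in rows:
            # Skip rows whose value is not among the desired strata
            if desired_strata and any(
                row[k] not in desired_strata[k]
                for k in strata_keys
                if k in desired_strata
            ):
                continue
            split[tuple(row[k] for k in strata_keys)].append(row)
        # The new stratum becomes the top level; the old key moves one level down
        for new_key, subset in split.items():
            nested[new_key][old_key] = subset
    return dict(nested)


rows = [
    {"Donor": "Donor1", "Tissue": "Tissue1"},
    {"Donor": "Donor1", "Tissue": "Tissue2"},
    {"Donor": "Donor2", "Tissue": "Tissue1"},
]
by_donor = {("Donor1",): rows[:2], ("Donor2",): rows[2:]}
by_tissue = add_stratification_sketch(by_donor, ["Donor"], ["Tissue"])
```

In this model `by_tissue` has the same shape as the nested result in the example below: Tissue keys at the top level, each holding a dictionary keyed by Donor.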

Examples

Case 1: Build by Donor first, then add Tissue stratification after

import pandas as pd
import anndict as adt
from anndata import AnnData
# Create an example AnnData object
adata = AnnData(obs=pd.DataFrame({
    "Donor": ["Donor1", "Donor1", "Donor2"],
    "Tissue": ["Tissue1", "Tissue2", "Tissue1"]
}))
# First, build an AdataDict grouped/stratified by Donor
strata_keys = ["Donor"]
adata_dict = adt.build_adata_dict(adata, strata_keys)
print(adata_dict)
> {
>     ("Donor1",): adata_d1,
>     ("Donor2",): adata_d2,
> }
# Then, add a stratification by ``Tissue``
strata_keys = ["Tissue"]
adata_dict.add_stratification(strata_keys)
print(adata_dict)
> {
>     ("Tissue1",) : {
>         ("Donor1",) : adata_d1_t1,
>         ("Donor2",) : adata_d2_t1
>         },
>     ("Tissue2",) : {
>         ("Donor1",) : adata_d1_t2
>         }
> }
# Note 1: If you want a new object instead of modifying the original ``adata_dict``, you can instead do:
new_adata_dict = adt.add_stratification(adata_dict, strata_keys)

# Note 2: we can always flatten or rearrange the nesting structure
adata_dict.set_hierarchy(["Donor","Tissue"])
print(adata_dict)
> {
>     ("Donor1", "Tissue1"): adata_d1_t1,
>     ("Donor1", "Tissue2"): adata_d1_t2,
>     ("Donor2", "Tissue1"): adata_d2_t1,
> }
# For example, if we want Donor as the top-level index
adata_dict.set_hierarchy(["Donor", ["Tissue"]])
print(adata_dict)
> {
>     ("Donor1",) : {
>         ("Tissue1",) : adata_d1_t1,
>         ("Tissue2",) : adata_d1_t2
>         },
>     ("Donor2",) : {
>         ("Tissue1",) : adata_d2_t1
>         }
> }
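
To make the flattening and re-nesting in Note 2 concrete, here is a toy model on plain dictionaries. It is a sketch under stated assumptions, not how anndict implements set_hierarchy: the helpers `flatten`, `reorder`, and `nest` are hypothetical, and strings stand in for the AnnData subsets.

```python
def flatten(nested, levels):
    """Collapse `levels` layers of tuple-keyed dicts into one flat dict."""
    if levels == 1:
        return dict(nested)
    flat = {}
    for outer_key, inner in nested.items():
        for inner_key, leaf in flatten(inner, levels - 1).items():
            flat[outer_key + inner_key] = leaf
    return flat


def reorder(flat, current, wanted):
    """Permute each flat key from the `current` column order to `wanted`."""
    idx = [current.index(name) for name in wanted]
    return {tuple(key[i] for i in idx): leaf for key, leaf in flat.items()}


def nest(flat, spec):
    """Re-nest a flat dict following a spec such as ["Donor", ["Tissue"]]."""
    # Plain names form this level's key; a nested list describes the next level
    here = [s for s in spec if not isinstance(s, list)]
    rest = next((s for s in spec if isinstance(s, list)), None)
    if rest is None:
        return dict(flat)
    n = len(here)
    out = {}
    for key, leaf in flat.items():
        out.setdefault(key[:n], {})[key[n:]] = leaf
    return {key: nest(sub, rest) for key, sub in out.items()}


# Hypothetical stand-in for the Tissue -> Donor nesting shown above
by_tissue = {
    ("Tissue1",): {("Donor1",): "adata_d1_t1", ("Donor2",): "adata_d2_t1"},
    ("Tissue2",): {("Donor1",): "adata_d1_t2"},
}
flat = reorder(flatten(by_tissue, 2), ["Tissue", "Donor"], ["Donor", "Tissue"])
by_donor = nest(flat, ["Donor", ["Tissue"]])
```

Here `flat` corresponds to the flat hierarchy `["Donor", "Tissue"]` and `by_donor` to the nested hierarchy `["Donor", ["Tissue"]]` from the example.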