anndict.ai_unify_labels

anndict.ai_unify_labels(adata_dict, label_columns, new_label_column, simplification_level='unified, typo-fixed')

Unifies cell type labels across multiple AnnData objects by mapping them to a simplified, unified set of labels.

Parameters:
adata_dict AdataDict

An AdataDict containing the AnnData objects whose labels should be unified.

label_columns dict[tuple[str, ...], str]

A dict whose keys match the keys of adata_dict and whose values are the names of the columns in the corresponding adata.obs that contain the original labels.

new_label_column str

Name of the new column to be created in each adata.obs for storing the unified labels.

simplification_level str (default: 'unified, typo-fixed')

Free-text instructions describing how the labels should be unified.
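
Because label_columns is expected to use the same keys as adata_dict, it can help to compare the two key sets before calling the function. The following is a minimal sketch in plain Python, not part of anndict: the helper name check_label_columns and the example keys are hypothetical, and a plain list of tuple keys stands in for the keys of an AdataDict.

```python
def check_label_columns(adata_keys, label_columns):
    """Compare label_columns keys against the AdataDict keys.

    Returns (missing, extra):
      missing - keys present in adata_keys but absent from label_columns
      extra   - keys present in label_columns but absent from adata_keys
    """
    adata_keys = set(adata_keys)
    label_keys = set(label_columns)
    missing = sorted(adata_keys - label_keys)
    extra = sorted(label_keys - adata_keys)
    return missing, extra


# Hypothetical tuple keys, standing in for the keys of an AdataDict
adata_keys = [("adata1",), ("adata2",)]
label_columns = {
    ("adata1",): "cell_type",
    ("adata2",): "cell_type_label",
}

missing, extra = check_label_columns(adata_keys, label_columns)
print("missing:", missing, "extra:", extra)
```

If both lists come back empty, every AnnData object has exactly one label column assigned.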

Return type:

dict

Returns:

A mapping dict where the keys are the original labels and the values are the unified labels.

Notes

Modifies each adata in adata_dict in place by adding adata.obs[new_label_column], which contains the unified labels.
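
The relationship between the returned mapping and the new column can be pictured with a small stand-alone sketch. It does not call anndict or any LLM: the label values and the mapping are made up for illustration, and plain lists stand in for the original label columns in adata.obs.

```python
# Hypothetical original labels from two datasets
# (stand-ins for adata.obs['cell_type'] and adata.obs['cell_type_label'])
labels_adata1 = ["T cell", "t-cell", "B cell"]
labels_adata2 = ["T cells", "B-cell", "Bcell"]

# Hypothetical mapping of the kind described under Returns:
# original label -> unified label
mapping_dict = {
    "T cell": "T cell",
    "t-cell": "T cell",
    "T cells": "T cell",
    "B cell": "B cell",
    "B-cell": "B cell",
    "Bcell": "B cell",
}

# Looking up each original label gives the values of the new column
unified_adata1 = [mapping_dict[label] for label in labels_adata1]
unified_adata2 = [mapping_dict[label] for label in labels_adata2]

# After unification, both datasets share one label vocabulary
shared_labels = sorted(set(unified_adata1) | set(unified_adata2))
print(shared_labels)
```

In this toy case, six spelling variants collapse into two shared labels, which is the kind of result the default simplification_level of 'unified, typo-fixed' suggests.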

Example

#import package
import anndict as adt

#configure LLM backend
adt.configure_llm_backend('your-provider-name', 'your-provider-model-name', api_key='your-provider-api-key')

#load the data as an AdataDict
adata_paths = ['path/to/adata1.h5ad', 'path/to/adata2.h5ad']
adata_dict = adt.read_adata_dict_from_h5ad(adata_paths)

#define the label columns for each adata
label_columns = {
    ('adata1',): 'cell_type',
    ('adata2',): 'cell_type_label', # this could be different in each adata
}
new_label_column = 'unified_cell_type'

#unify the labels across the adata
mapping_dict = adt.ai_unify_labels(
    adata_dict,
    label_columns=label_columns,
    new_label_column=new_label_column,
    simplification_level="unified, typo-fixed"
)

# Now each adata in adata_dict has a new column 'unified_cell_type'

# Write the adata_dict to disk (an adata_dict on disk is just a directory containing .h5ad files)
adt.write_adata_dict(adata_dict, 'path/to/unified_adata_dict/')