anndict.annotate.ai_annotate_biological_process

anndict.annotate.ai_annotate_biological_process(adata, groupby, n_top_genes, new_label_column='ai_biological_process')

Annotate biological processes based on the top n marker genes for each cluster.

This function performs differential expression analysis to identify marker genes for each group in adata.obs[groupby], then labels each group's list of top marker genes with the biological process it represents. The annotations are added to adata, and the results are returned as a DataFrame.
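As a rough illustration of what "top n marker genes for each group" means, here is a minimal, library-free sketch. The scores, gene lists, and the helper top_marker_genes are hypothetical and are not part of anndict; the sketch assumes each gene already has a differential expression score per group and does not reproduce how the function computes those scores.

```python
# Hypothetical differential expression scores (higher = stronger marker).
# These values are made up for illustration only.
de_scores = {
    "treated": {"IL6": 4.2, "TNF": 3.9, "ACTB": 0.1, "CXCL8": 3.1},
    "control": {"MKI67": 2.8, "TOP2A": 3.5, "ACTB": 0.2, "CCNB1": 2.9},
}


def top_marker_genes(scores_by_group, n_top_genes):
    """Return the n highest-scoring genes for each group (hypothetical helper)."""
    return {
        group: sorted(scores, key=scores.get, reverse=True)[:n_top_genes]
        for group, scores in scores_by_group.items()
    }


markers = top_marker_genes(de_scores, n_top_genes=2)
print(markers)
```

Each per-group gene list of this kind is what then gets labeled with a biological process.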

Parameters:

adata (AnnData): An AnnData object.
groupby (str): Column in adata.obs to group by for differential expression analysis.
n_top_genes (int): The number of top marker genes to consider.
new_label_column (str, default: 'ai_biological_process'): The name of the new column in adata.obs where the biological process annotations will be stored.

Return type:

DataFrame

Returns:

A pd.DataFrame with a column for the top marker genes for each cluster.

Notes

This function also modifies the input adata in place, adding the annotations to adata.obs[new_label_column].
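The in-place update can be pictured with plain Python lists. This is a sketch under an assumption: that one label per group is broadcast to every cell in that group. The variable names and values below are hypothetical stand-ins, not anndict internals.

```python
# Hypothetical per-cell group assignments, standing in for adata.obs[groupby].
cell_groups = ["treated", "control", "treated", "control"]

# Hypothetical group-level labels; in the real function these are produced
# by the annotation step.
group_to_process = {
    "treated": "immune response",
    "control": "cell cycle regulation",
}

# Broadcast each group's label to its cells, standing in for
# adata.obs[new_label_column].
cell_processes = [group_to_process[group] for group in cell_groups]
print(cell_processes)
```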

Examples

import anndict as adt

# Annotate each group in 'treatment_vs_control' with a biological process,
# based on the top 10 differentially expressed genes in that group
adt.ai_annotate_biological_process(adata, groupby='treatment_vs_control',
    n_top_genes=10, new_label_column='ai_biological_process')

adata.obs['ai_biological_process']
# e.g. ['immune response', 'cell cycle regulation', ...]
