anndict.annotate.cells.de_novo.ai_annotate

anndict.annotate.cells.de_novo.ai_annotate(func, adata, groupby, n_top_genes, new_label_column, tissue_of_origin_col=None, **kwargs)

Annotate clusters based on the top marker genes for each cluster.

This function takes the marker genes for each cluster and applies func to the top n_top_genes of them to determine the label for that cluster. The results are added to the AnnData object and returned as a DataFrame.

If rank_genes_groups hasn’t been run on adata, this function runs sc.tl.rank_genes_groups automatically.
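The labelling step described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the helper label_clusters, the toy_labeler function, and the hand-written ranked gene lists are all hypothetical stand-ins (in the real function the ranked genes come from rank_genes_groups and the labels are written to adata.obs).

```python
from typing import Callable, Dict, List


def label_clusters(
    ranked_genes: Dict[str, List[str]],
    func: Callable[..., str],
    n_top_genes: int,
    **kwargs,
) -> Dict[str, str]:
    """Apply func to the top n marker genes of each cluster (hypothetical helper)."""
    labels = {}
    for cluster, genes in ranked_genes.items():
        top_genes = genes[:n_top_genes]  # keep only the n best-ranked genes
        labels[cluster] = func(top_genes, **kwargs)  # extra kwargs go to func
    return labels


def toy_labeler(gene_list: List[str]) -> str:
    """Stand-in for func: takes a list of gene names and returns one label."""
    if "CD3E" in gene_list:
        return "T cell"
    if "MS4A1" in gene_list:
        return "B cell"
    return "unknown"


# Hand-written marker rankings per cluster, standing in for rank_genes_groups output.
ranked = {
    "0": ["CD3E", "CD3D", "IL7R"],
    "1": ["MS4A1", "CD79A", "CD74"],
    "2": ["ACTB", "GAPDH", "MALAT1"],
}
cluster_labels = label_clusters(ranked, toy_labeler, n_top_genes=2)

# Broadcast each cluster label to its cells, as a per-cell label column would hold them.
cell_clusters = ["0", "1", "0", "2"]
cell_labels = [cluster_labels[c] for c in cell_clusters]
```

Every cell inherits the label of its cluster, so func is called once per cluster rather than once per cell.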

Parameters:
func callable

A function that takes gene_list : list [ str ] and returns annotation : str.

adata AnnData

An AnnData object.

groupby str

Column in adata.obs to group by for differential expression analysis.

n_top_genes int

The number of top marker genes to consider for each cluster.

new_label_column str

The name of the new column in adata.obs where the annotations will be stored.

tissue_of_origin_col str (default: None)

Name of a column in adata.obs that contains the tissue of origin. Used to provide context to the LLM.

**kwargs

Additional keyword arguments passed to func.

Return type:

DataFrame

Returns:

A pd.DataFrame with a column for the top marker genes for each cluster.

Notes

This function also modifies the input adata in place, adding annotations to adata.obs[new_label_column].
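Example

A minimal usage sketch, under stated assumptions. The marker table, overlap_labeler, and annotate_example are hypothetical names introduced for illustration; the import path is inferred from the dotted name documented on this page; and the "leiden" column and "cell_type_label" column name are placeholders for whatever your data uses. In practice func would usually wrap an LLM call rather than a lookup table.

```python
from typing import List

# Hypothetical marker table for illustration only; not part of anndict.
MARKERS = {
    "T cell": {"CD3D", "CD3E", "IL7R"},
    "B cell": {"MS4A1", "CD79A", "CD79B"},
    "monocyte": {"CD14", "LYZ", "S100A8"},
}


def overlap_labeler(gene_list: List[str], fallback: str = "unknown") -> str:
    """Return the cell type whose marker set overlaps gene_list the most."""
    genes = set(gene_list)
    best_label, best_hits = fallback, 0
    for label, markers in MARKERS.items():
        hits = len(genes & markers)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


def annotate_example(adata):
    """Label the clusters in adata.obs['leiden'] (assumed to exist) with overlap_labeler."""
    # Imported lazily so the labeler above can be tried without anndict installed.
    # The import path is an assumption based on the documented dotted name.
    from anndict.annotate.cells.de_novo import ai_annotate

    return ai_annotate(
        overlap_labeler,
        adata,
        groupby="leiden",
        n_top_genes=10,
        new_label_column="cell_type_label",
        fallback="unlabeled",  # extra keyword argument, forwarded to func
    )
```

After a call such as annotate_example(adata), the labels would be expected in adata.obs["cell_type_label"], with the returned DataFrame describing the marker genes per cluster.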