anndict.annotate.cells.de_novo.ai_annotate_by_comparison

anndict.annotate.cells.de_novo.ai_annotate_by_comparison(func, adata, groupby, n_top_genes, new_label_column, cell_type_of_origin_col=None, tissue_of_origin_col=None, **kwargs)

Annotate clusters based on the top marker genes for each cluster, in the context of the other clusters’ marker genes.

This function takes the top n_top_genes marker genes for each cluster and applies func to them to determine a label for each cluster. The results are added to the AnnData object and returned as a DataFrame.

If rank_genes_groups has not been run on adata, this function runs sc.tl.rank_genes_groups automatically.

Parameters:
func callable

A function that takes gene_lists : list[list[str]] and returns annotations : list[str], one for each list of genes in gene_lists.

adata AnnData

An AnnData object.

groupby str

Column in adata.obs to group by for differential expression analysis.

n_top_genes int

The number of top marker genes to consider for each cluster.

new_label_column str

The name of the new column in adata.obs where the annotations will be stored.

cell_type_of_origin_col str (default: None)

Name of a column in adata.obs that contains the cell type of origin. Used to provide context to the LLM.

tissue_of_origin_col str (default: None)

Name of a column in adata.obs that contains the tissue of origin. Used to provide context to the LLM.

**kwargs

Additional keyword arguments passed to func.

Return type:

DataFrame

Returns:

A pd.DataFrame with a column for the top marker genes for each cluster.
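Examples

The contract for func described under Parameters (one annotation per gene list) can be illustrated with a minimal stand-in. This is a sketch only: label_by_lookup and KNOWN_MARKERS are hypothetical names that are not part of anndict, and a real func would typically query an LLM rather than a lookup table.

```python
from typing import List

# Hypothetical marker-to-label table, for illustration only.
KNOWN_MARKERS = {
    "CD3E": "T cell",
    "MS4A1": "B cell",
    "LYZ": "Monocyte",
}


def label_by_lookup(gene_lists: List[List[str]], **kwargs) -> List[str]:
    """Return one label per gene list, in the same order as gene_lists."""
    annotations = []
    for genes in gene_lists:
        # Use the first gene found in the lookup table; fall back to "Unknown".
        label = next(
            (KNOWN_MARKERS[g] for g in genes if g in KNOWN_MARKERS),
            "Unknown",
        )
        annotations.append(label)
    return annotations


# One gene list per cluster; the output has the same length and order.
gene_lists = [["CD3E", "IL7R"], ["MS4A1", "CD79A"], ["GENE_X"]]
labels = label_by_lookup(gene_lists)
```

Because func receives all gene lists in a single call, it can compare clusters against each other when choosing labels, which is what distinguishes annotation by comparison from labelling each cluster in isolation.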

Notes

This function also modifies the input adata in place, adding annotations to adata.obs[new_label_column].
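The in-place update amounts to broadcasting one label per cluster onto every cell in that cluster. The sketch below shows the idea with a plain dict standing in for adata.obs. The column names and labels are hypothetical, and the library's actual implementation may differ.

```python
# Stand-in for adata.obs: a dict of per-cell columns (assumption for illustration).
obs = {"leiden": ["0", "1", "0", "2", "1"]}

# One annotation per cluster, as func would return them (hypothetical labels).
cluster_to_label = {"0": "T cell", "1": "B cell", "2": "Monocyte"}

# Hypothetical value for new_label_column.
new_label_column = "ai_cell_type"

# Each cell receives the label of the cluster it belongs to.
obs[new_label_column] = [cluster_to_label[c] for c in obs["leiden"]]
```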