anndict.annotate.cells.de_novo.ai_annotate
- anndict.annotate.cells.de_novo.ai_annotate(func, adata, groupby, n_top_genes, new_label_column, tissue_of_origin_col=None, **kwargs)
Annotate clusters based on the top marker genes for each cluster.
This uses marker genes for each cluster and applies func to determine the label for each cluster based on the top n marker genes. The results are added to the AnnData object and returned as a DataFrame.
If rank_genes_groups hasn’t been run on adata, this function automatically runs sc.tl.rank_genes_groups.
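The flow described above (take the top n marker genes per cluster, pass them to func, collect one label per cluster) can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: `annotate_clusters`, `ranked`, and `join_genes` are hypothetical names, and the ranked gene lists are made-up toy input standing in for the output of sc.tl.rank_genes_groups.

```python
from typing import Callable


def annotate_clusters(
    func: Callable[..., str],
    ranked_genes: dict[str, list[str]],
    n_top_genes: int,
    **kwargs,
) -> dict[str, str]:
    """Apply func to the top n marker genes of each cluster (illustrative only)."""
    labels = {}
    for cluster, genes in ranked_genes.items():
        top_genes = genes[:n_top_genes]  # keep only the n highest-ranked genes
        labels[cluster] = func(top_genes, **kwargs)  # extra kwargs go to func
    return labels


# Hypothetical, already-ranked marker genes per cluster (toy input).
ranked = {"0": ["CD3D", "CD3E", "IL7R"], "1": ["MS4A1", "CD79A", "CD74"]}


# Toy labelling function standing in for an LLM call.
def join_genes(gene_list: list[str]) -> str:
    return "+".join(gene_list)


labels = annotate_clusters(join_genes, ranked, n_top_genes=2)
print(labels)  # {'0': 'CD3D+CD3E', '1': 'MS4A1+CD79A'}
```

In the real function, the per-cluster labels are additionally written to adata.obs and a DataFrame is returned; the sketch leaves both out to stay self-contained.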
- Parameters:
  - func : callable
    A function that takes gene_list: list[str] and returns annotation: str.
  - adata : AnnData
    An AnnData object.
  - groupby : str
    Column in adata.obs to group by for differential expression analysis.
  - n_top_genes : int
    The number of top marker genes to consider for each cluster.
  - new_label_column : str
    The name of the new column in adata.obs where the annotations will be stored.
  - tissue_of_origin_col : str (default: None)
    Name of a column in adata.obs that contains the tissue of origin. Used to provide context to the LLM.
  - **kwargs
    Additional keyword arguments passed to func.
- Return type:
  DataFrame
- Returns:
  A pd.DataFrame with a column for the top marker genes for each cluster.
Notes
This function also modifies the input adata in place, adding annotations to adata.obs[new_label_column].
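As a rough illustration of the func contract (a list of gene names in, one annotation string out), here is a minimal sketch. The lookup table and the names `KNOWN_MARKERS`, `label_from_markers`, and `fallback` are hypothetical; a real func would typically query an LLM instead. The commented-out call at the end is an assumption as well: the import path is inferred from the qualified name on this page, and the "leiden" and "cell_type" column names are placeholders.

```python
# Hypothetical marker-to-label lookup; a real func would typically call an LLM.
KNOWN_MARKERS = {
    "CD3D": "T cell",
    "MS4A1": "B cell",
    "LYZ": "Monocyte",
}


def label_from_markers(gene_list: list[str], fallback: str = "Unknown") -> str:
    """Return the label of the first recognised marker gene, else fallback."""
    for gene in gene_list:
        if gene in KNOWN_MARKERS:
            return KNOWN_MARKERS[gene]
    return fallback


print(label_from_markers(["IL7R", "CD3D", "CD3E"]))           # T cell
print(label_from_markers(["GAPDH"], fallback="Unassigned"))   # Unassigned

# Assumed usage (not run here; requires anndict and an AnnData object `adata`):
# from anndict.annotate.cells.de_novo import ai_annotate
# df = ai_annotate(
#     func=label_from_markers,
#     adata=adata,
#     groupby="leiden",              # placeholder cluster column in adata.obs
#     n_top_genes=10,
#     new_label_column="cell_type",  # placeholder name for the output column
#     fallback="Unassigned",         # extra kwargs are forwarded to func
# )
```

Since extra keyword arguments are forwarded to func, options such as `fallback` can be set at the call site without wrapping the function.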