anndict.annotate.cells.de_novo.ai_annotate_by_comparison
anndict.annotate.cells.de_novo.ai_annotate_by_comparison(func, adata, groupby, n_top_genes, new_label_column, cell_type_of_origin_col=None, tissue_of_origin_col=None, **kwargs)
Annotate clusters based on the top marker genes for each cluster, in the context of the other clusters’ marker genes.
This uses marker genes for each cluster and applies func to determine the label for each cluster based on the top n marker genes. The results are added to the AnnData object and returned as a DataFrame.
If rank_genes_groups has not been run on adata, this function automatically runs sc.tl.rank_genes_groups.
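To make the "top n marker genes" step concrete, here is a minimal sketch, not the library's implementation. It assumes each cluster's genes are already ranked by differential expression (as sc.tl.rank_genes_groups would produce) and uses a plain dict, ranked_genes, as a hypothetical stand-in for those results. The helper name top_marker_genes is also an assumption made for illustration.

```python
def top_marker_genes(ranked_genes, n_top_genes):
    """Return the first n_top_genes genes for each cluster.

    ranked_genes: dict mapping cluster name -> list of gene names,
    already sorted from strongest to weakest marker (hypothetical input).
    """
    if n_top_genes < 1:
        raise ValueError("n_top_genes must be at least 1")
    return {cluster: genes[:n_top_genes] for cluster, genes in ranked_genes.items()}


# Toy rankings for three clusters (illustrative gene symbols only).
ranked_genes = {
    "0": ["CD3E", "CD3D", "IL7R", "CCR7"],
    "1": ["MS4A1", "CD79A", "CD79B", "CD74"],
    "2": ["LYZ", "CD14", "S100A8", "S100A9"],
}

top_genes = top_marker_genes(ranked_genes, n_top_genes=2)
gene_lists = list(top_genes.values())  # one list of genes per cluster
print(gene_lists)
```

Collecting every cluster's list into a single gene_lists value is what allows func to label each cluster in the context of the others.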
- Parameters:
- func (callable): A function that takes gene_lists: list[list[str]] and returns annotations: list[str], one for each list of genes in gene_lists.
- adata (AnnData): An AnnData object.
- groupby (str): Column in adata.obs to group by for differential expression analysis.
- n_top_genes (int): The number of top marker genes to consider for each cluster.
- new_label_column (str): The name of the new column in adata.obs where the annotations will be stored.
- cell_type_of_origin_col (str, default: None): Name of a column in adata.obs that contains the cell type of origin. Used to provide context to the LLM.
- tissue_of_origin_col (str, default: None): Name of a column in adata.obs that contains the tissue of origin. Used to provide context to the LLM.
- **kwargs: Additional keyword arguments passed to func.
- Return type: DataFrame
- Returns: A pd.DataFrame with a column for the top marker genes for each cluster.
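The contract for func can be illustrated with a toy stand-in. The sketch below is hypothetical: annotate_by_markers and its marker_table keyword are invented for illustration, and it uses a hard-coded lookup rather than an LLM. It only demonstrates the expected shape, a list of gene lists in and one label per list out, while accepting extra keyword arguments the way forwarded **kwargs would arrive.

```python
def annotate_by_markers(gene_lists, marker_table=None, **kwargs):
    """Toy func: label each gene list by its best-overlapping known marker set."""
    # Hypothetical reference markers; a real func would query an LLM instead.
    if marker_table is None:
        marker_table = {
            "T cell": {"CD3E", "CD3D", "IL7R"},
            "B cell": {"MS4A1", "CD79A", "CD79B"},
            "Monocyte": {"LYZ", "CD14", "S100A8"},
        }
    annotations = []
    for genes in gene_lists:
        # Count shared genes between this cluster and each reference set.
        overlaps = {
            label: len(markers & set(genes))
            for label, markers in marker_table.items()
        }
        best_label = max(overlaps, key=overlaps.get)
        annotations.append(best_label if overlaps[best_label] > 0 else "Unknown")
    return annotations


gene_lists = [["CD3E", "CD3D"], ["MS4A1", "CD79A"], ["GENE_X", "GENE_Y"]]
annotations = annotate_by_markers(gene_lists)
print(annotations)
```

Any callable with this shape should be usable as func; the returned list must line up one-to-one with gene_lists.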
Notes
This function also modifies the input adata in place, adding annotations to adata.obs[new_label_column].
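The in-place update amounts to mapping each cell's cluster to that cluster's label. A minimal sketch follows, assuming a plain dict of lists as a hypothetical stand-in for adata.obs (the real object is a pandas DataFrame) and illustrative column names and labels.

```python
# Hypothetical stand-in for adata.obs: one entry per cell.
obs = {"leiden": ["0", "1", "0", "2", "1"]}

groupby = "leiden"
new_label_column = "ai_cell_type"

# One label per cluster, as func would return them (illustrative values).
cluster_labels = {"0": "T cell", "1": "B cell", "2": "Monocyte"}

# Broadcast per-cluster labels to per-cell annotations; obs is modified in place.
obs[new_label_column] = [cluster_labels[cluster] for cluster in obs[groupby]]
print(obs[new_label_column])
```

Because the object is modified in place, the new column is available on the caller's adata after the call, independently of the returned DataFrame.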