anndict.annotate.transfer_labels_using_classifier

anndict.annotate.transfer_labels_using_classifier(origin_adata, destination_adata, origin_label_key, feature_key, classifier_class, new_column_name='predicted_label', random_state=None, **kwargs)

Transfers labels from origin_adata to destination_adata using a classifier of type classifier_class.

Supported classifiers include any sklearn classifier inheriting from sklearn.base.ClassifierMixin.

Parameters:
origin_adata AnnData

An AnnData containing the original labels. A classifier will be trained on this adata.

destination_adata AnnData

An AnnData containing the new cells to be labeled. Unless feature_key is 'use_X', it must provide the same .obsm[feature_key] representation as origin_adata.

origin_label_key str

Key in origin_adata.obs containing the original labels.

feature_key Union[str, Literal['use_X']]

Key of data to use in origin_adata.obsm, or 'use_X' to use origin_adata.X.

classifier_class Type[ClassifierMixin]

Any classifier inheriting from sklearn.base.ClassifierMixin. Pass as a class, e.g. LogisticRegression, and not an already-instantiated object.

new_column_name str (default: 'predicted_label')

The name of the new column in destination_adata.obs where the predicted labels will be stored.

random_state int | None (default: None)

Random state seed passed to stable_label_adata().

**kwargs

Additional keyword arguments passed to the classifier constructor.
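
The class-plus-kwargs convention described for classifier_class and **kwargs can be pictured with a small sketch. ToyClassifier and build_classifier are hypothetical names invented for this illustration; they are not part of anndict or scikit-learn, and the TypeError check is an assumption about how such a helper might guard against instances, not documented anndict behavior.

```python
from typing import Any, Optional, Type


class ToyClassifier:
    """Hypothetical stand-in for an sklearn classifier class."""

    def __init__(self, penalty: str = "l2", random_state: Optional[int] = None):
        self.penalty = penalty
        self.random_state = random_state


def build_classifier(classifier_class: Type[Any], **kwargs: Any) -> Any:
    """Hypothetical helper: instantiate the class with the forwarded kwargs."""
    if not isinstance(classifier_class, type):
        # An already-instantiated object was passed instead of a class.
        raise TypeError("pass the classifier class itself, not an instance")
    return classifier_class(**kwargs)


# The caller hands over the class; constructor arguments travel as kwargs.
clf = build_classifier(ToyClassifier, penalty="l1", random_state=0)
```

Passing the class rather than an instance leaves construction to the callee, which is why constructor arguments such as penalty are supplied as extra keyword arguments.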

Return type:

AdataPredicoder

Returns:

An AdataPredicoder that contains the trained classifier and automatically decodes predicted labels into text labels. It can be used to calculate class membership probabilities or to predict on other AnnData objects.
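
The encode, train, predict, decode cycle that the returned object wraps can be pictured roughly as follows. This is a minimal sketch using plain scikit-learn and NumPy arrays as stand-ins for the AnnData fields. It is an assumption about the general pattern, not AdataPredicoder's actual implementation, and none of the variable names below are part of anndict.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Toy stand-ins for origin_adata.obsm[feature_key] and
# origin_adata.obs[origin_label_key] (assumed data, two well-separated groups).
origin_features = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
origin_labels = np.array(["B cell", "B cell", "T cell", "T cell"])

# Toy stand-in for destination_adata.obsm[feature_key].
destination_features = np.array([[0.1, 0.1], [5.1, 5.0]])

# Encode text labels as integer codes before training.
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(origin_labels)

# Train on the origin data only.
clf = LogisticRegression(random_state=0)
clf.fit(origin_features, encoded_labels)

# Predict integer codes for the destination cells, then decode back to text.
predicted_codes = clf.predict(destination_features)
predicted_labels = encoder.inverse_transform(predicted_codes)

# Class membership probabilities, one row per cell and one column per class.
probabilities = clf.predict_proba(destination_features)
```

In this sketch, predicted_labels plays the role of the column written to destination_adata.obs[new_column_name].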

Notes

Modifies destination_adata in place: the predicted labels are written to destination_adata.obs[new_column_name].

See also

AdataPredicoder

The container class for classifier+label encoder/decoder.

train_label_classifier()

The function that trains the classifier on origin_adata.

Examples

Case 1: Using a logistic regression classifier

import anndict as adt
from sklearn.linear_model import LogisticRegression

# origin_adata and destination_adata are existing AnnData objects
adt.annotate.transfer_labels_using_classifier(
    origin_adata=origin_adata,
    destination_adata=destination_adata,
    origin_label_key='cell_type',
    feature_key='X_pca',
    classifier_class=LogisticRegression,
    new_column_name='predicted_label',
    penalty='l2',  # one kwarg for LogisticRegression
    fit_intercept=True,  # another kwarg for LogisticRegression
)

Case 2: Using a random forest classifier

import anndict as adt
from sklearn.ensemble import RandomForestClassifier

# origin_adata and destination_adata are existing AnnData objects
adt.annotate.transfer_labels_using_classifier(
    origin_adata=origin_adata,
    destination_adata=destination_adata,
    origin_label_key='cell_type',
    feature_key='X_pca',
    classifier_class=RandomForestClassifier,
    new_column_name='predicted_label',
    n_estimators=1000,  # one kwarg for RandomForestClassifier
    max_features='sqrt',  # another kwarg for RandomForestClassifier
)