anndict.annotate.train_label_classifier

anndict.annotate.train_label_classifier(adata, label_key, feature_key, classifier_class, *, random_state=None, **kwargs)

Trains a classifier on the given adata; used internally by transfer_labels_using_classifier().

Parameters:
adata AnnData

An AnnData containing the original labels. A classifier will be trained on this adata.

label_key str

Key in adata.obs containing the original labels.

feature_key Union[str, Literal['use_X']]

Key of data to use in adata.obsm, or 'use_X' to use adata.X.

classifier_class Type[ClassifierMixin]

Any classifier inheriting from sklearn.base.ClassifierMixin. Pass as a class, e.g. LogisticRegression, and not an already-instantiated object.

random_state int | None (default: None)

Random state seed passed to stable_label_adata().

**kwargs

Additional keyword arguments passed to the classifier constructor.
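
The parameter descriptions above can be illustrated with a minimal, library-free sketch. It shows only how feature_key selects the training matrix and why classifier_class is passed as a class: the constructor is called with **kwargs inside the function. Everything here is an assumption made for illustration and is not anndict's implementation: MajorityClassifier, select_features, and sketch_train are hypothetical names, and a SimpleNamespace stands in for a real AnnData. Label encoding and random_state handling are omitted.

```python
from collections import Counter
from types import SimpleNamespace


class MajorityClassifier:
    """Hypothetical stand-in for an sklearn-style classifier."""

    def __init__(self, default=None):
        self.default = default

    def fit(self, X, y):
        # Remember the most frequent training label (or the default if y is empty).
        self.majority_ = Counter(y).most_common(1)[0][0] if len(y) else self.default
        return self

    def predict(self, X):
        # Predict the remembered label for every row.
        return [self.majority_ for _ in X]


def select_features(adata, feature_key):
    """Return adata.X for 'use_X', otherwise the matrix stored in adata.obsm."""
    return adata.X if feature_key == "use_X" else adata.obsm[feature_key]


def sketch_train(adata, label_key, feature_key, classifier_class, **kwargs):
    """Hypothetical sketch: instantiate the class with kwargs, then fit it."""
    X = select_features(adata, feature_key)
    y = list(adata.obs[label_key])
    classifier = classifier_class(**kwargs)  # the class is instantiated here
    return classifier.fit(X, y)


# Minimal stand-in for an AnnData object with three cells.
adata = SimpleNamespace(
    X=[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    obsm={"X_pca": [[0.5], [0.4], [-0.9]]},
    obs={"cell_type": ["T cell", "T cell", "B cell"]},
)

clf = sketch_train(adata, "cell_type", "X_pca", MajorityClassifier, default="unknown")
print(clf.predict([[0.0]]))
```

Passing an already-instantiated object as classifier_class would fail in this sketch at the classifier_class(**kwargs) call, which is the reason the parameter expects a class.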

Return type:

AdataPredicoder

Returns:

An AdataPredicoder containing the trained classifier and the label encoder/decoder.

See also

AdataPredicoder

The container class that holds the classifier and the label encoder/decoder.

stable_label_adata()

The function that trains the classifier on adata.

Examples

Case 1: Using a logistic regression classifier

import anndict as adt
from sklearn.linear_model import LogisticRegression

# adata: an AnnData with labels in adata.obs['cell_type']
# and features in adata.obsm['X_pca']
adt.annotate.train_label_classifier(
    adata=adata,
    label_key='cell_type',
    feature_key='X_pca',
    classifier_class=LogisticRegression,
    penalty='l2',  # one kwarg for LogisticRegression
    fit_intercept=True,  # another kwarg for LogisticRegression
)

Case 2: Using a random forest classifier

import anndict as adt
from sklearn.ensemble import RandomForestClassifier

# adata: an AnnData with labels in adata.obs['cell_type']
# and features in adata.obsm['X_pca']
adt.annotate.train_label_classifier(
    adata=adata,
    label_key='cell_type',
    feature_key='X_pca',
    classifier_class=RandomForestClassifier,
    n_estimators=1000,  # one kwarg for RandomForestClassifier
    max_features='sqrt',  # another kwarg for RandomForestClassifier
)