Automated Label Management


This module contains functions that use LLMs to automate label management within and between DataFrames. All of these functions process categorical labels in pandas DataFrames. They are grouped into three categories based on their input; each category is described briefly below, and the full docs are in the left navigation sidebar.

All of these functions aim to reduce the friction of working with categorical labels. In the context of single-cell RNA sequencing data, common examples include:

  • “My cell type labels have typos.”
    • e.g. T cells and T cels

  • “I downloaded data from 3 different groups, and they each used a different label for the same cell type.”
    • e.g. macrophage, Macrophage.

  • “I want to coarsen the cell type labels.”
    • e.g. map CD8+ T cells and CD4+ T cells to a single category called T cells
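Whatever produces it, the result of each of these tasks can be pictured as a dictionary from old labels to new labels, applied to a pandas column. The minimal sketch below assumes a hand-written mapping for illustration only (in this module an LLM would propose the mapping); it covers all three examples above and applies the mapping with pandas `Series.map`, writing to a new column so the original labels are preserved.

```python
import pandas as pd

# Illustrative labels showing the three problems above: a typo,
# inconsistent capitalization/punctuation, and fine labels to coarsen.
obs = pd.DataFrame(
    {"cell_type": ["T cells", "T cels", "macrophage", "Macrophage.",
                   "CD8+ T cells", "CD4+ T cells"]}
)

# Hand-written mapping (an assumption for illustration); in practice an
# LLM would propose this from the unique labels in the column.
label_map = {
    "T cells": "T cells",
    "T cels": "T cells",            # typo fix
    "macrophage": "Macrophage",     # capitalization fix
    "Macrophage.": "Macrophage",    # stray punctuation fix
    "CD8+ T cells": "T cells",      # coarsening
    "CD4+ T cells": "T cells",      # coarsening
}

# Write the cleaned labels to a new column; the original is untouched.
obs["cell_type_clean"] = obs["cell_type"].map(label_map)
```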

Single Column

Input: a single column.
  • These functions may generate more than one output column or category mapping, but they only consider a single set of input categories/labels.
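The general shape of a single-column function can be sketched as follows. This is not the module's API: `map_column_labels` and `toy_mapper` are hypothetical names, and `toy_mapper` is a trivial rule standing in for the LLM call. The point it illustrates is that only the unique labels of one column are sent to the mapper, and the returned mapping is then applied to every row.

```python
from typing import Callable, Dict, List

import pandas as pd


def map_column_labels(
    df: pd.DataFrame,
    column: str,
    new_column: str,
    propose_mapping: Callable[[List[str]], Dict[str, str]],
) -> Dict[str, str]:
    """Hypothetical helper: build a mapping from the unique labels in one
    column and write the mapped labels to a new column (in place)."""
    # Only the unique labels go to the mapper (e.g. an LLM call), which keeps
    # the request small even when the column has millions of rows.
    unique_labels = df[column].astype(str).unique().tolist()
    mapping = propose_mapping(unique_labels)
    # Labels the mapper did not cover are kept unchanged.
    df[new_column] = df[column].astype(str).map(lambda x: mapping.get(x, x))
    return mapping


def toy_mapper(labels: List[str]) -> Dict[str, str]:
    # Stand-in for an LLM (assumption): coarsen any label containing "T cel".
    return {lab: "T cells" for lab in labels if "T cel" in lab}


obs = pd.DataFrame({"cell_type": ["CD8+ T cells", "CD4+ T cells", "B cells"]})
mapping = map_column_labels(obs, "cell_type", "cell_type_coarse", toy_mapper)
```

Returning the mapping alongside the new column makes it easy to inspect or hand-edit what the mapper proposed before trusting it.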

Within Adata

Input: multiple columns within a single DataFrame (e.g. the .obs of a single adata)
  • These functions process multiple columns within a single DataFrame, usually outputting a single column that resolves differences among the input columns.
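To show the input/output shape of this category (several label columns in, one resolved column out), the sketch below swaps in a much simpler technique than an LLM: a row-wise majority vote over crudely normalized labels. The helpers `normalize` and `resolve_columns` are hypothetical and are not part of this module.

```python
import pandas as pd


def normalize(label) -> str:
    # Crude normalization (an assumption): trim whitespace, drop a trailing
    # period, and lowercase.
    return str(label).strip().rstrip(".").lower()


def resolve_columns(df: pd.DataFrame, columns: list) -> pd.Series:
    """Hypothetical helper: one consensus label per row by majority vote."""
    normalized = df[columns].apply(lambda col: col.map(normalize))
    # mode(axis=1) returns the most frequent value(s) in each row, sorted;
    # on a tie, column 0 holds the alphabetically first of the tied labels.
    return normalized.mode(axis=1)[0]


obs = pd.DataFrame({
    "annotator_a": ["T cells", "Macrophage", "B cells"],
    "annotator_b": ["t cells", "macrophage.", "NK cells"],
    "annotator_c": ["T cells", "Monocyte", "B cells"],
})
obs["consensus"] = resolve_columns(obs, ["annotator_a", "annotator_b", "annotator_c"])
```

A majority vote cannot tell that, say, "Monocyte" and "Macrophage" are related, or pick a sensible label when every column disagrees; that kind of judgment is what an LLM-based resolver is meant to supply.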

Between Adata

Input: multiple DataFrames (e.g. the .obs of several adata)
  • These functions process columns across DataFrames, usually inserting into each DataFrame a new column whose labels/categories are shared across all of them.
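A minimal sketch of the between-DataFrames pattern, under stated assumptions: `harmonize_across` is a hypothetical name, and a simple string rule stands in for the LLM. The labels from every DataFrame are pooled first so that a single shared mapping is built, and that mapping is then used to insert the same new column into each DataFrame.

```python
import pandas as pd


def harmonize_across(dfs: list, columns: list, new_column: str) -> dict:
    """Hypothetical helper: map each DataFrame's label column onto one
    shared vocabulary and insert it as `new_column` (in place)."""
    # Pool the unique labels from every DataFrame so one mapping covers all.
    all_labels = set()
    for df, col in zip(dfs, columns):
        all_labels.update(df[col].astype(str).unique())
    # Stand-in for an LLM (assumption): trim, drop a trailing period, and
    # capitalize. Note str.capitalize lowercases the rest of the string, so
    # acronyms such as "NK" would be mangled by this rule.
    mapping = {lab: lab.strip().rstrip(".").capitalize() for lab in sorted(all_labels)}
    # Insert the shared labels as a new column in each DataFrame.
    for df, col in zip(dfs, columns):
        df[new_column] = df[col].astype(str).map(mapping)
    return mapping


# Each source uses its own column name and its own label spelling.
obs1 = pd.DataFrame({"cell_type": ["macrophage", "T cells"]})
obs2 = pd.DataFrame({"celltype": ["Macrophage.", "t cells"]})
obs3 = pd.DataFrame({"annotation": ["MACROPHAGE", "T cells"]})
mapping = harmonize_across(
    [obs1, obs2, obs3], ["cell_type", "celltype", "annotation"], "harmonized"
)
```

The weakness noted in the comment (acronyms, synonyms, singular vs. plural) is exactly where a fixed string rule breaks down and a model that understands the labels is more useful.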
