Automated Label Management
This module contains functions that use LLMs to automate label management within and between dataframes. All of these functions process category labels in pandas DataFrames. We group them into three categories based on their input; see the left navigation sidebar for the docs, and the short descriptions below.
These functions all aim to reduce friction when working with categorical labels. In single-cell RNA sequencing data, common examples include:
- “My cell type labels have typos.”
  e.g. `T cells` and `T cels`
- “I downloaded data from 3 different groups, and they each used a different label for the same cell type.”
  e.g. `macrophage`, `Macrophage.`, `MΦ`
- “I want to coarsen the cell type labels.”
  e.g. map `CD8+ T cells` and `CD4+ T cells` to a single category called `T cells`
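
All three situations above come down to applying a mapping from original labels to cleaned labels. The sketch below uses plain pandas and a hand-written dictionary as a stand-in for the mapping an LLM would propose; it does not call any function from this module, and the names `obs`, `cell_type`, `label_map`, and `cell_type_clean` are illustrative assumptions.

```python
import pandas as pd

# Toy per-cell metadata with typos, inconsistent spellings, and fine-grained labels.
obs = pd.DataFrame(
    {
        "cell_type": [
            "T cells", "T cels", "CD8+ T cells", "CD4+ T cells",
            "macrophage", "Macrophage.", "MΦ",
        ]
    }
)

# Hypothetical mapping, written by hand here. In this module's setting, an LLM
# would be asked to propose a mapping like this from the unique labels.
label_map = {
    "T cells": "T cells",
    "T cels": "T cells",          # typo fix
    "CD8+ T cells": "T cells",    # coarsening
    "CD4+ T cells": "T cells",    # coarsening
    "macrophage": "macrophage",
    "Macrophage.": "macrophage",  # harmonizing spelling variants
    "MΦ": "macrophage",           # harmonizing an abbreviation
}

# Apply the mapping and store the result as a new categorical column,
# leaving the original labels untouched.
obs["cell_type_clean"] = obs["cell_type"].map(label_map).astype("category")

print(obs)
```

Writing the result to a new column rather than overwriting the input keeps the original labels available for checking the mapping.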
Single Column
- Input: a single column.
These functions may generate more than one output column/category mapping, but they only look at a single set of input categories/labels.
Within Adata
- Input: multiple columns within a single `DataFrame` (i.e. `.obs` from a single `adata`).
These functions process multiple columns within a single `DataFrame`, usually outputting a single column that resolves differences among the input columns.
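
As a rough illustration of resolving differences among columns, the sketch below takes a per-row majority vote over three annotation columns using plain pandas. This is an assumed stand-in, not this module's method (which uses an LLM), and the column names `annot_a`, `annot_b`, `annot_c`, and `consensus` are hypothetical.

```python
import pandas as pd

# Three annotation columns for the same cells, as might sit side by side in `.obs`.
obs = pd.DataFrame(
    {
        "annot_a": ["T cells", "macrophage", "B cells"],
        "annot_b": ["T cells", "macrophage", "T cells"],
        "annot_c": ["T cells", "B cells", "B cells"],
    }
)
input_cols = ["annot_a", "annot_b", "annot_c"]

# For each row, count the labels across the input columns and keep the most
# frequent one. Ties are not handled specially in this sketch.
obs["consensus"] = (
    obs[input_cols]
    .apply(lambda row: row.value_counts().idxmax(), axis=1)
    .astype("category")
)

print(obs)
```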
Between Adata
- Input: multiple `DataFrame`s (i.e. `.obs` from several `adata`).
These functions process columns across `DataFrame`s, usually inserting a new column into each that shares labels/categories across the `DataFrame`s.
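
A minimal sketch of the "between" case, again in plain pandas with a hand-written mapping in place of one proposed by an LLM: each `DataFrame` receives a new column whose categories are identical across all of them. The names `frames`, `shared_map`, and `cell_type_shared` are illustrative assumptions, not part of this module.

```python
import pandas as pd

# Per-dataset metadata, each using its own vocabulary for the same cell types.
frames = {
    "group_1": pd.DataFrame({"cell_type": ["macrophage", "T cells"]}),
    "group_2": pd.DataFrame({"cell_type": ["Macrophage.", "T cels"]}),
    "group_3": pd.DataFrame({"cell_type": ["MΦ", "MΦ"]}),
}

# Hypothetical shared mapping, written by hand for this sketch.
shared_map = {
    "macrophage": "macrophage",
    "Macrophage.": "macrophage",
    "MΦ": "macrophage",
    "T cells": "T cells",
    "T cels": "T cells",
}

# One categorical dtype reused everywhere, so every DataFrame carries the same
# set of categories even if some labels are absent from a given dataset.
shared_dtype = pd.CategoricalDtype(categories=sorted(set(shared_map.values())))

for obs in frames.values():
    obs["cell_type_shared"] = obs["cell_type"].map(shared_map).astype(shared_dtype)

for name, obs in frames.items():
    print(name, list(obs["cell_type_shared"]))
```

Using a single `CategoricalDtype` means the datasets can later be concatenated or compared without their category sets drifting apart.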