anndict.simplify_var_index

anndict.simplify_var_index(adata, column, new_column_name, simplification_level='')

Simplifies the gene names in the index of adata.var for the genes selected by a boolean column, using map_gene_labels_to_simplified_set(), and stores the simplified labels in a new column of adata.var. This function assumes that the index of adata.var contains gene symbols (e.g. PER1, IL1A), not numeric indices or accession numbers.

Parameters:
adata AnnData

The AnnData object containing the data.

column str

The boolean column in adata.var used to select genes for simplification.

new_column_name str

The name of the new column to store the simplified labels.

simplification_level str (default: '')

A qualitative description of how much you want the labels to be simplified. Could be anything, like 'extremely', 'barely', or 'pathway-level'.

Return type:

dict

Returns:

A dict containing the map from the current labels to the simplified labels

Raises:

ValueError – If more than 1000 genes are selected for simplification or if the masking column (used to select genes) is not boolean.
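
Both failure conditions can be checked before calling the function. The following is a minimal sketch using only pandas; the helper name check_simplify_mask and its max_genes parameter are hypothetical and not part of anndict, and the limit of 1000 is taken from the description above.

```python
import pandas as pd

def check_simplify_mask(var: pd.DataFrame, column: str, max_genes: int = 1000) -> int:
    """Hypothetical pre-flight check; returns the number of selected genes."""
    # The masking column must have a boolean dtype (0/1 integers do not count).
    if not pd.api.types.is_bool_dtype(var[column]):
        raise ValueError(f"Column '{column}' must be boolean, got {var[column].dtype}.")
    # Count the genes that would be sent for simplification.
    n_selected = int(var[column].sum())
    if n_selected > max_genes:
        raise ValueError(f"{n_selected} genes selected; the limit is {max_genes}.")
    return n_selected

# Stand-in for adata.var with a boolean masking column.
var = pd.DataFrame({"simplify": [True, True, False]}, index=["PER1", "IL1A", "APOD"])
n_selected = check_simplify_mask(var, "simplify")
```

If the mask is stored as integers, converting it with var[column].astype(bool) before the call is one way to satisfy the dtype requirement.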

Notes

Modifies adata in-place by adding adata.var[new_column_name], which holds the new labels.
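
The resulting layout of adata.var can be pictured with plain pandas. This sketch only illustrates the shape of the result and is not anndict's implementation; the label_mapping dict is hand-written as a stand-in for the function's return value, and var stands in for adata.var.

```python
import pandas as pd

# Stand-in for adata.var: gene symbols in the index, boolean mask column.
var = pd.DataFrame({"simplify": [True, True, False]}, index=["IL1A", "IL6", "APOD"])

# Hand-written stand-in for the mapping returned by the function.
label_mapping = {"IL1A": "Interleukin", "IL6": "Interleukin"}

# Map each gene symbol to its simplified label; symbols missing from the
# mapping become NaN. The mask keeps labels only for the selected genes.
labels = var.index.to_series().map(label_mapping)
var["functional_category"] = labels.where(var["simplify"])
```

Genes that were not selected by the mask end up with a missing value in the new column.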

Example

import anndict as adt

print(adata.var)
>  index        simplify
> 'HSP90AA1'    1
> 'HSPA1A'      1
> 'HSPA1B'      1
> 'CLOCK'       1
> 'ARNTL'       1
> 'PER1'        1
> 'IL1A'        1
> 'IL6'         1
> 'APOD'        0
> 'CFD'         0

label_mapping = adt.simplify_var_index(adata,
                        'simplify',  # boolean column in adata.var that selects the genes
                        new_column_name = 'functional_category',
                        simplification_level='functional category level'
                        )

print(adata.var) # New column added
>  index        simplify        functional_category
> 'HSP90AA1'    1               'Heat Shock Proteins'
> 'HSPA1A'      1               'Heat Shock Proteins'
> 'HSPA1B'      1               'Heat Shock Proteins'
> 'CLOCK'       1               'Circadian Rhythm'
> 'ARNTL'       1               'Circadian Rhythm'
> 'PER1'        1               'Circadian Rhythm'
> 'IL1A'        1               'Interleukin'
> 'IL6'         1               'Interleukin'
> 'APOD'        0                NaN
> 'CFD'         0                NaN
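
The returned dict maps each current label to its simplified label, so it can be inverted to list the genes in each category. A minimal sketch using only the standard library; the label_mapping below is hand-written from a subset of the example output above and is not produced by running the function.

```python
from collections import defaultdict

# Hand-written stand-in for the returned mapping (subset of the example above).
label_mapping = {
    "HSP90AA1": "Heat Shock Proteins",
    "HSPA1A": "Heat Shock Proteins",
    "HSPA1B": "Heat Shock Proteins",
    "IL1A": "Interleukin",
    "IL6": "Interleukin",
}

# Invert the gene -> category mapping into category -> list of genes.
genes_by_category = defaultdict(list)
for gene, category in label_mapping.items():
    genes_by_category[category].append(gene)

for category, genes in genes_by_category.items():
    print(category, genes)
```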