Iterate Over an AdataDict
This tutorial demonstrates how to iterate over an AdataDict.
Iteration over an AdataDict is handled by the .fapply()
method, which wraps the function adata_dict_fapply()
(which can be used separately as a function call if you prefer).
In general, the way to think about working with an AdataDict is to:
1. Think about what you want to do in terms of operating on a single AnnData.
2. Write a function my_func that does the operations.
3. Apply it to each AnnData in the AdataDict with .fapply(my_func).
Below is an example of this design.
First, build the AdataDict.
import anndict as adt
import scanpy as sc
# Load an anndata
adata = sc.datasets.pbmc3k_processed()
# Rename obs column (for legibility)
adata.obs['cell_type'] = adata.obs['louvain']
# Build an AdataDict from this anndata
adata_dict = adt.build_adata_dict(adata, strata_keys=['cell_type'])
Then, let’s say we have some analysis pipeline called my_analysis_pipeline that we want to try running on each cell type separately. Here’s how we’d do that with an AdataDict.
First, define a function that takes adata and runs the analysis pipeline on it. That would look like this:
def my_analysis_pipeline(adata, some_param=0.5, other_param=None, **kwargs):
    """
    This is my analysis pipeline. It takes an anndata and two parameters for processing.
    """
    # First, normalize the adata to 10k reads per cell
    sc.pp.normalize_total(adata, target_sum=1e4)
    # Then, run some function on it
    # (my_custom_func is a placeholder for your own analysis code)
    result_of_my_pipeline = my_custom_func(adata, some_param=some_param)
    return result_of_my_pipeline
This pipeline does the following:
- normalizes the adata
- runs some other function that takes the keyword argument some_param
- returns the result of this function (whatever this result is; it could be a number, a plot, a dataframe, an anndata, etc.)
Then, to run this pipeline on each adata in adata_dict, we’d use .fapply as follows.
This code will run my_analysis_pipeline on each adata in adata_dict, passing some_param=0.5 each time.
# Run the function on each adata in adata_dict
all_results = adata_dict.fapply(my_analysis_pipeline, some_param=0.5)
The return behavior of adata_dict.fapply(func) is governed by the return behavior of func. .fapply will return:
- a dictionary with the same structure and keys as adata_dict, containing the return values of func as values, or
- None, if func returns None on each adata in adata_dict.
In this case, all_results will be a dictionary of return values because we’ve defined my_analysis_pipeline to return some value.
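To make the return rule concrete, here is a minimal sketch that mimics it with a plain Python dictionary. The helper toy_fapply and the stand-in data are hypothetical and are not part of anndict; the sketch only illustrates the behavior described above, not how the library implements it.

```python
from typing import Any, Callable, Dict, Hashable, Optional


def toy_fapply(
    data: Dict[Hashable, Any], func: Callable[..., Any], **kwargs: Any
) -> Optional[Dict[Hashable, Any]]:
    """Toy stand-in (not anndict): apply func to each value, keeping the keys."""
    results = {key: func(value, **kwargs) for key, value in data.items()}
    # Mirror the described rule: if func returned None for every entry,
    # return None instead of a dictionary full of None values.
    if all(result is None for result in results.values()):
        return None
    return results


# Plain lists stand in for the per-cell-type AnnData objects.
toy_dict = {("B cells",): [1, 2, 3], ("NK cells",): [4, 5]}


def total(values, scale=1.0):
    """Returns a value, so toy_fapply returns a dictionary."""
    return sum(values) * scale


def modify_only(values):
    """Returns nothing, like a function that only works in place."""
    return None


totals = toy_fapply(toy_dict, total, scale=0.5)
nothing = toy_fapply(toy_dict, modify_only)
```

Here totals has the same keys as toy_dict, while nothing is None because modify_only never returns a value.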
Now, let’s consider a slightly more complicated case. Let’s say we want the value of some_param to be different for each adata in adata_dict. In this case, you can pass a dictionary (with the same keys and structure as adata_dict) to the some_param argument, and .fapply will pass the right value of some_param for each adata.
# We'll need an entry for each cell type:
adata_dict.keys()
dict_keys([('CD4 T cells',), ('CD14+ Monocytes',), ('B cells',), ('CD8 T cells',), ('NK cells',), ('FCGR3A+ Monocytes',), ('Dendritic cells',), ('Megakaryocytes',)])
# Manually define the dictionary to pass to some_param
some_param_dict = {
('CD4 T cells',): 0.5,
('CD14+ Monocytes',): 0.6,
('B cells',): 0.4,
('CD8 T cells',): 0.7,
('NK cells',): 0.3,
('FCGR3A+ Monocytes',): 0.6,
('Dendritic cells',): 0.8,
('Megakaryocytes',): 0.9
}
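Because a hand-written dictionary needs an entry for every key, a quick comparison of the two key sets can catch typos before the pipeline runs. The following is a minimal sketch using plain Python sets; find_key_mismatches is a hypothetical helper, not an anndict function, and the keys are a shortened stand-in for the real ones.

```python
def find_key_mismatches(data_keys, param_dict):
    """Return (missing, unexpected) keys when comparing param_dict to data_keys."""
    expected = set(data_keys)
    provided = set(param_dict)
    # Keys present in the data but absent from the parameter dictionary,
    # and keys in the parameter dictionary that the data does not have.
    return expected - provided, provided - expected


# Shortened stand-in for adata_dict.keys()
data_keys = [("B cells",), ("NK cells",), ("Megakaryocytes",)]

# A draft parameter dictionary with a deliberate typo in the last key
draft_param_dict = {
    ("B cells",): 0.4,
    ("NK cells",): 0.3,
    ("Megakaryocyte",): 0.9,
}

missing, unexpected = find_key_mismatches(data_keys, draft_param_dict)
```

With the real objects, the equivalent call would be find_key_mismatches(adata_dict.keys(), some_param_dict); both returned sets should be empty.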
We can also take advantage of the fact that fapply returns a dictionary of the same structure as the adata_dict on which it was called to create the argument dictionary like this:
def determine_param(adata):
    """
    This function will calculate the argument to pass to some_param.
    """
    # some_func is a placeholder for your own calculation
    param_value = some_func(adata)
    return param_value

# This will return a dictionary with the correct structure and keys.
some_param_dict = adata_dict.fapply(determine_param)
Then pass the dictionary argument to .fapply():
# This gives a different value of some_param for each cell type.
all_results = adata_dict.fapply(my_analysis_pipeline, some_param=some_param_dict)
.fapply() can handle a mix of global and adata-specific arguments. For example, you can do this:
all_results = adata_dict.fapply(my_analysis_pipeline, some_param=some_param_dict, other_param=0.5)
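One way to picture the mix of global and adata-specific arguments is a small resolver that, for each key, picks the matching entry from dictionary-valued arguments and passes everything else through unchanged. This is only a sketch under an assumption: it treats an argument as adata-specific when it is a dictionary whose keys exactly match the data keys. The helper resolve_kwargs is hypothetical, and the matching rule anndict actually uses may differ.

```python
def resolve_kwargs(key, data_keys, kwargs):
    """Build the keyword arguments for one key (toy logic, not anndict's)."""
    resolved = {}
    for name, value in kwargs.items():
        # Assumption: a dict with exactly the same keys as the data is per-key.
        if isinstance(value, dict) and set(value) == set(data_keys):
            resolved[name] = value[key]
        else:
            # Anything else is treated as a global value shared by every call.
            resolved[name] = value
    return resolved


data_keys = [("B cells",), ("NK cells",)]
call_kwargs = {
    "some_param": {("B cells",): 0.4, ("NK cells",): 0.3},  # per-key
    "other_param": 0.5,                                      # global
}

per_key_kwargs = {
    key: resolve_kwargs(key, data_keys, call_kwargs) for key in data_keys
}
```

Each entry of per_key_kwargs holds its own some_param alongside the shared other_param.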
And finally, you can define your func to take a parameter called adt_key to make the adata_dict key available to func when func is passed to .fapply.
def my_analysis_pipeline_with_adt_key(adata, some_param=0.5, other_param=None, adt_key=None, **kwargs):
    """
    This is my analysis pipeline. It takes an anndata and two parameters for processing and prints the current key being processed.
    """
    print(f"Processing: {adt_key}")
    # First, normalize the adata to 10k reads per cell
    sc.pp.normalize_total(adata, target_sum=1e4)
    # Then, run some function on it
    # (my_custom_func is a placeholder for your own analysis code)
    result_of_my_pipeline = my_custom_func(adata, some_param=some_param)
    return result_of_my_pipeline

# This will now print out each key of adata_dict as it is processed.
all_results = adata_dict.fapply(my_analysis_pipeline_with_adt_key, some_param=some_param_dict, other_param=0.5)
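A plausible mental model for adt_key is that the caller inspects the function signature and supplies the key only when the function asks for it. The sketch below shows that idea with the standard-library inspect module. The helper call_with_optional_key is hypothetical, and the sketch checks only for a parameter named adt_key; it is not a description of how anndict handles this internally.

```python
import inspect


def call_with_optional_key(func, key, value, **kwargs):
    """Call func(value, ...), adding adt_key=key only if func declares it."""
    params = inspect.signature(func).parameters
    if "adt_key" in params:
        kwargs["adt_key"] = key
    return func(value, **kwargs)


def with_key(values, adt_key=None):
    """Declares adt_key, so it receives the current key."""
    return f"{adt_key}: {len(values)} items"


def without_key(values):
    """Does not declare adt_key, so it is called without it."""
    return len(values)


labelled = call_with_optional_key(with_key, ("B cells",), [1, 2, 3])
plain = call_with_optional_key(without_key, ("B cells",), [1, 2, 3])
```

In this toy version, labelled includes the key in its output, while plain is computed without the function ever seeing the key.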