API
Preprocessing
Preprocessing functions are used both to prepare the data for integration and to postprocess the integration output.
The most relevant preprocessing steps are:
Normalization
Scaling, batch-aware
Highly variable gene selection, batch-aware
Cell cycle scoring
Principal component analysis (PCA)
k-nearest neighbor graph (kNN graph)
UMAP
Clustering
Note that some preprocessing steps depend on each other. Please refer to the best_practices for more details.
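The note on step dependencies can be made concrete with a small ordering sketch. The dependency map below is an assumption based on a typical Scanpy-style workflow (normalization first, a kNN graph built on a PCA representation, UMAP and clustering built on the kNN graph). It is not taken from this package, and the best practices page remains the authoritative reference.

```python
from graphlib import TopologicalSorter

# ASSUMED dependencies (step -> steps that should run first).
# Illustrative only: not read from scib, and individual workflows may differ.
DEPENDS_ON = {
    "Normalization": set(),
    "Highly variable gene selection": {"Normalization"},
    "PCA": {"Normalization"},
    "kNN graph": {"PCA"},
    "UMAP": {"kNN graph"},
    "Clustering": {"kNN graph"},
}

# static_order() yields every step only after all of its prerequisites.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Any order produced this way places PCA before the kNN graph, and the kNN graph before UMAP and clustering.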
Normalise counts using the
Batch-aware scaling of count matrix
Highly variable gene selection
Batch-aware highly variable gene selection
Compute cell cycle scores for a given organism
Apply feature selection and dimensionality reduction steps
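To illustrate what "batch-aware" means for scaling, here is a minimal pure-Python sketch that standardizes one gene's values separately within each batch. The function name `scale_per_batch` is hypothetical, and the use of the population standard deviation is an assumption; the package's own scaling function may differ in these details.

```python
from statistics import mean, pstdev


def scale_per_batch(values, batches):
    """Scale one gene's values to zero mean and unit variance within each batch.

    Hypothetical helper for illustration; not part of scib.
    """
    # Group cell indices by batch label.
    groups = {}
    for index, batch in enumerate(batches):
        groups.setdefault(batch, []).append(index)

    scaled = [0.0] * len(values)
    for indices in groups.values():
        batch_values = [values[i] for i in indices]
        mu = mean(batch_values)
        sd = pstdev(batch_values)  # assumption: population standard deviation
        for i in indices:
            # Constant genes within a batch are mapped to 0 instead of dividing by zero.
            scaled[i] = (values[i] - mu) / sd if sd > 0 else 0.0
    return scaled


expression = [1.0, 3.0, 10.0, 30.0]
batch_labels = ["a", "a", "b", "b"]
scaled_expression = scale_per_batch(expression, batch_labels)
print(scaled_expression)
```

After per-batch scaling, both batches are centred at zero, so the tenfold offset between batch "a" and batch "b" no longer dominates the gene's variance.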
Integration
Integration method functions require the preprocessed anndata object (here adata) and the name of the batch column in adata.obs (here 'batch').
The methods can be called using the following, where integration_method is the name of the integration method.
scib.ig.integration_method(adata, batch="batch")
For example, to run Scanorama on a dataset, call:
scib.ig.scanorama(adata, batch="batch")
Warning
The following notation is deprecated.
scib.integration.runIntegrationMethod(adata, batch="batch")
Please use the snake_case naming without the run prefix.
Some integration methods (e.g. scgen(), scanvi()) also use cell type labels as input.
For these, you need to additionally provide the corresponding label column of adata.obs (here cell_type).
scib.ig.scgen(adata, batch="batch", cell_type="cell_type")
scib.ig.scanvi(adata, batch="batch", labels="cell_type")
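The calling convention above can be wrapped in a small dispatcher that looks a method up by name and adds the label argument only where it is needed. This is a sketch, not part of the package: `run_integration`, `LABEL_KEYWORD`, and the stand-in namespace `fake_ig` are hypothetical names, and the stand-in only records its arguments so that the example runs without scib installed.

```python
from types import SimpleNamespace

# Keyword used for the label column by the two label-aware methods shown above.
LABEL_KEYWORD = {"scgen": "cell_type", "scanvi": "labels"}


def run_integration(module, method, adata, batch="batch", label_key="cell_type"):
    """Call `module.<method>(adata, batch=...)`, adding the label column if required.

    Hypothetical helper; `module` is any namespace exposing the integration functions.
    """
    func = getattr(module, method)
    kwargs = {"batch": batch}
    if method in LABEL_KEYWORD:
        kwargs[LABEL_KEYWORD[method]] = label_key
    return func(adata, **kwargs)


def _recording_stub(name):
    """Build a stand-in function that returns the arguments it was called with."""
    def stub(adata, **kwargs):
        return (name, adata, kwargs)
    return stub


# Stand-in for the integration module, for illustration only.
fake_ig = SimpleNamespace(
    scanorama=_recording_stub("scanorama"),
    scgen=_recording_stub("scgen"),
    scanvi=_recording_stub("scanvi"),
)

calls = [run_integration(fake_ig, name, "adata") for name in ("scanorama", "scgen", "scanvi")]
for call in calls:
    print(call)
```

With the real package, the same helper would presumably be given scib.ig in place of `fake_ig` and an actual anndata object in place of the placeholder string.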
Functions
BBKNN wrapper function
ComBat wrapper function (
DESC wrapper function
Harmony wrapper function
MNN wrapper function (
SAUCIE wrapper function
Scanorama wrapper function
scANVI wrapper function
scGen wrapper function
scVI wrapper function
trVAE wrapper function
trVAE wrapper function (
Metrics
This package contains all the metrics used for benchmarking scRNA-seq data integration performance. They can be applied to integrated as well as unintegrated data and fall into two classes: biological conservation metrics and batch removal metrics. For a detailed description of the metrics implemented in this package, please see our publication.
Most metrics require specific, preprocessed inputs; these requirements are described in detail in the User Guide.
Biological conservation metrics quantify either how well biological structure is preserved, using metrics computed on clusterings of the integration output, or how much the feature spaces of the integrated and unintegrated data differ. By default, each metric is scaled to range from 0 to 1, where larger scores represent better conservation of the biological aspect that the metric addresses.
Adjusted Rand Index
Cell cycle conservation score
Cell-type LISI (cLISI) score
Highly variable gene overlap
Isolated label score ASW
Isolated label score F1
Normalized mutual information
Average silhouette width (ASW)
Trajectory conservation score
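As an illustration of a cluster-based conservation metric, the sketch below computes the Adjusted Rand Index between cell type labels and cluster assignments using the standard pair-counting formula. It is a self-contained teaching example, not this package's implementation, and the function name `adjusted_rand_index` is chosen here for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both label lists must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the partitions trivially agree

    # Count pairs of cells grouped together in both, or in either, partition.
    contingency = Counter(zip(labels_true, labels_pred))
    pairs_both = sum(comb(n, 2) for n in contingency.values())
    pairs_true = sum(comb(n, 2) for n in Counter(labels_true).values())
    pairs_pred = sum(comb(n, 2) for n in Counter(labels_pred).values())

    expected = pairs_true * pairs_pred / total_pairs
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in one group in both partitions
    return (pairs_both - expected) / (maximum - expected)


cell_types = ["T", "T", "B", "B"]
matching_clusters = [1, 1, 0, 0]    # same grouping, different names
crossing_clusters = [0, 1, 0, 1]    # every cluster mixes both cell types

ari_matching = adjusted_rand_index(cell_types, matching_clusters)
ari_crossing = adjusted_rand_index(cell_types, crossing_clusters)
print(ari_matching, ari_crossing)
```

The score depends only on which cells are grouped together, not on the names of the clusters, which is why the relabelled clustering still agrees perfectly with the cell types.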
By default, batch correction metric values are scaled between 0 and 1, where larger scores represent better batch removal.
Graph Connectivity
Integration LISI (iLISI) score
kBET score
Principal component regression score
Batch ASW
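The following sketch illustrates the idea behind a graph connectivity score on a toy kNN graph. It assumes the definition given in the benchmarking publication: for each cell identity label, take the fraction of that label's cells lying in the largest connected component of the label's subgraph, then average over labels. The function names and the dictionary-based graph representation are illustrative choices, not the package's API.

```python
from collections import deque


def largest_component_size(nodes, neighbors):
    """Size of the largest connected component using only edges between `nodes`."""
    nodes = set(nodes)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search restricted to the given node set.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for other in neighbors.get(node, ()):
                if other in nodes and other not in seen:
                    seen.add(other)
                    queue.append(other)
        best = max(best, size)
    return best


def graph_connectivity(neighbors, labels):
    """Mean over labels of (largest component size / number of cells with that label)."""
    cells_by_label = {}
    for cell, label in labels.items():
        cells_by_label.setdefault(label, []).append(cell)
    fractions = [
        largest_component_size(cells, neighbors) / len(cells)
        for cells in cells_by_label.values()
    ]
    return sum(fractions) / len(fractions)


# Toy undirected graph: T cells 0-1-2 form a chain, B cells 3-4 are linked, B cell 5 is isolated.
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3], 5: []}
toy_labels = {0: "T", 1: "T", 2: "T", 3: "B", 4: "B", 5: "B"}

connectivity = graph_connectivity(toy_neighbors, toy_labels)
print(connectivity)
```

Here the T cells are fully connected (3 of 3), while only 2 of the 3 B cells share a component, so the score is the average of 1 and 2/3.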
For convenience, scib provides wrapper functions that, given integrated and unintegrated adata objects, apply multiple metrics and return all the results in a pandas.DataFrame.
The main function is metrics(), which provides all the parameters for the different metrics.
scib.metrics.metrics(adata, adata_int, ari=True, nmi=True)
The remaining wrapper functions call metrics() with preconfigured subsets of metrics, chosen according to expected computation time:
metrics_fast() only computes metrics that require little preprocessing
metrics_slim() includes all metrics of metrics_fast() and adds clustering-based metrics
metrics_all() includes all metrics
Master metrics function
Only metrics with minimal preprocessing and runtime
All metrics apart from kBET and LISI scores
All metrics
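As a small worked example of how these subsets relate, the sketch below derives the "all metrics apart from kBET and LISI scores" subset from the metrics listed on this page. The strings are display labels copied from the lists above, not parameter names of metrics(), and the contents of the fast subset are deliberately left out because this page does not enumerate them.

```python
# Display labels taken from the metric lists on this page (not function or parameter names).
BIO_CONSERVATION = [
    "Adjusted Rand Index",
    "Cell cycle conservation score",
    "Cell-type LISI (cLISI) score",
    "Highly variable gene overlap",
    "Isolated label score ASW",
    "Isolated label score F1",
    "Normalized mutual information",
    "Average silhouette width (ASW)",
    "Trajectory conservation score",
]
BATCH_REMOVAL = [
    "Graph Connectivity",
    "Integration LISI (iLISI) score",
    "kBET score",
    "Principal component regression score",
    "Batch ASW",
]

all_metrics = BIO_CONSERVATION + BATCH_REMOVAL

# Per the description above: everything except kBET and the two LISI scores.
slim_metrics = [
    name for name in all_metrics
    if "kBET" not in name and "LISI" not in name
]

print(len(all_metrics), len(slim_metrics))
print(sorted(set(all_metrics) - set(slim_metrics)))
```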
Some parts of the metrics can be used individually; these are listed below.
cLISI and iLISI scores
Principal component regression for anndata object
Principal component regression
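To show what principal component regression measures, the sketch below computes a variance-weighted R² of principal component scores regressed on batch. It assumes the formulation described in the benchmarking publication, where each component's R² against the batch covariate is weighted by the variance that component explains. Function names and the toy data are hypothetical, and the package's own functions may differ in scaling and defaults.

```python
from statistics import mean, pvariance


def r_squared_categorical(scores, batches):
    """R^2 of a linear regression of `scores` on a categorical covariate.

    For a one-hot encoded covariate this equals 1 - SS_within / SS_total.
    """
    overall_mean = mean(scores)
    ss_total = sum((x - overall_mean) ** 2 for x in scores)
    if ss_total == 0:
        return 0.0  # a constant component cannot be explained by anything

    groups = {}
    for x, batch in zip(scores, batches):
        groups.setdefault(batch, []).append(x)

    ss_within = 0.0
    for values in groups.values():
        group_mean = mean(values)
        ss_within += sum((x - group_mean) ** 2 for x in values)
    return 1 - ss_within / ss_total


def pc_regression_score(components, batches):
    """Variance-weighted mean of per-component R^2 (hypothetical helper).

    `components` is a list with one list of cell scores per principal component.
    """
    variances = [pvariance(pc) for pc in components]
    r_squared = [r_squared_categorical(pc, batches) for pc in components]
    return sum(v * r for v, r in zip(variances, r_squared)) / sum(variances)


batch_labels = ["a", "a", "b", "b"]
pc1 = [-3.0, -3.0, 3.0, 3.0]   # separates the batches completely, variance 9
pc2 = [-1.0, 1.0, -1.0, 1.0]   # unrelated to batch, variance 1

batch_variance_share = pc_regression_score([pc1, pc2], batch_labels)
print(batch_variance_share)
```

In this toy case the batch explains all of the first component and none of the second, so the share of variance attributable to batch is 9 / (9 + 1).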