Release Notes#
Version 0.4.1: June 25, 2025#
- General:
New utility method to help with data preprocessing:
utils.convert_id2symbol
to map ensemble ID to gene symbol given a mapping table.Updated cell annotation tutorial.
Minor fixes and formatting.
- Training:
A small refactor to the tiledb and anndata dataloaders to allow for using
spawn
as the multiprocessing context.New scripts for training and cell annotation knn index from anndata.
Various fixes to the anndata dataloader.
Version 0.4.0: May 05, 2025#
- General:
A new training tutorial has been added which describes the new training workflow. This includes data preparation, training, and post-training data structures using the new scripts.
- Training:
A new training workflow has been added to use CellArr, a TileDB based framework, as the data store to streamline the end-to-end process. This replaces the old Zarr based workflows.
New data loaders and samplers for CellArr data have been added in the
tiledb_data_models
module.A example training script has been added to show how to train models as
scripts/train.py
.New scripts for creating all post-training data structures have been added in the folder
scripts
.New utility methods that make use of the CellArr store:
utils.query_tiledb_df
to query a tiledb dataframe,utils.adata_from_tiledb
to extract cells from the tiledb stores based on index, including raw counts.
Version 0.3.0: November 19, 2024#
- General:
Various changes to utility functions to improve efficiency and flexibility.
Simplification of many class constructor parameters.
Tutorials have been updated with new download links and analyses.
- Exhaustive queries:
Functionality to perform exhaustive queries has been added as new methods
cell_query.search_exhaustive
,cell_query.search_centroid_exhaustive
, andcell_query.search_cluster_centroids_exhaustive
.The kNN query method
cell_query.search
has been renamed tocell_query.search_nearest
.
- Query result filtering and interpretation:
The
cell_query.compile_sample_metadata
method has been expanded to allow grouping by tissue and disease (in addition to study and sample).The methods
utils.subset_by_unique_values
,utils.subset_by_frequency
, andutils.categorize_and_sort_by_score
have been added to provide tools for filtering, sorting and summarizing query results.The “query_stability” quality control metric has been renamed to “query_coherence” and is now deterministic (by setting a random seed).
Results from exhaustive queries can be constrained to specific metadata criteria (e.g., tissue, disease, in vitro vs in vivo, etc). using the
metadata_filter
argument tocell_query.search_exhaustive
.Results from exhaustive queries can be constrained by distance-to-query using the
max_dist
argument tocell_query.search_exhaustive
.The mappings in
utils.clean_tissues
andutils.clean_diseases
have been expanded.
- Optimizations to training:
The ASW and NMSE training evaluation metrics were added.
The
triplet_selector.get_asw
method was added to calculate ASW.The
ontologies.find_most_viable_parent
method was added to help coarse grain cell type ontology labels.Optimized sampling weights of study and cell type.
Version 0.2.0: March 22, 2024#
Updated version requirements for multiple dependencies and removed the
pegasuspy
dependency.Expanded API documentation and tutorials.
Simplified model file structure and model loader methods.
Added
search_centroid
method to classCellQuery
for cell queries using custom centroids which provides quality control statistics to assess the consistency of query results.
Version 0.1.0: August 13, 2023#
Initial public release.