Frequently Asked Questions (FAQ)¶
This FAQ addresses common questions about SPEX library usage, troubleshooting, and best practices.
General Questions¶
What is SPEX?¶
SPEX is a comprehensive Python library for spatial transcriptomics analysis. It provides tools for:
- Image segmentation and cell detection
- Feature extraction from spatial data
- Clustering analysis with spatial awareness
- Spatial analysis including co-localization and niche analysis
- Data preprocessing and quality control
What data formats does SPEX support?¶
SPEX supports multiple data formats:
- Expression data: H5AD files, CSV, TSV
- Spatial coordinates: CSV files with x,y coordinates
- Images: TIFF, PNG, JPEG formats
- Segmentation masks: TIFF, PNG formats
How do I install SPEX?¶
pip install spex
For development installation:
git clone https://github.com/your-repo/spex.git
cd spex
pip install -e .
Data Loading and Preprocessing¶
How do I load my spatial transcriptomics data?¶
import spex
# Load data with spatial coordinates
adata = spex.load_anndata(
    "expression_matrix.h5ad",
    spatial_data="spatial_coordinates.csv",
    image_path="tissue_image.tif"
)
What should my spatial coordinates file look like?¶
Your spatial coordinates CSV should have this format:
cell_id,x,y
cell_1,100,200
cell_2,150,250
cell_3,200,300
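Before loading, it can help to check that the file really follows this layout. The following is a minimal sketch using only the Python standard library; `find_coordinate_problems` is a hypothetical helper written for this FAQ, not part of SPEX, and it assumes the exact header `cell_id,x,y` shown above.

```python
import csv
import io


def find_coordinate_problems(csv_text):
    """Return a list of human-readable problems found in a coordinates CSV."""
    expected = ["cell_id", "x", "y"]
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != expected:
        return [f"header should be {expected}, found {reader.fieldnames}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        cell_id = row["cell_id"]
        if cell_id in seen:
            problems.append(f"line {line_no}: duplicate cell_id {cell_id!r}")
        seen.add(cell_id)
        for axis in ("x", "y"):
            try:
                float(row[axis])
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_no}: {axis} is not numeric ({row[axis]!r})"
                )
    return problems


good = "cell_id,x,y\ncell_1,100,200\ncell_2,150,250\ncell_3,200,300\n"
bad = "cell_id,x,y\ncell_1,100,200\ncell_1,abc,250\n"

print(find_coordinate_problems(good))
print(find_coordinate_problems(bad))
```

To check a file on disk, read it with `open(path).read()` and pass the text to the function. An empty list means no problems were found.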
How do I handle missing spatial coordinates?¶
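import pandas as pd
# Load without spatial data
adata = spex.load_anndata("expression_matrix.h5ad")
# Add spatial coordinates later
spatial_coords = pd.read_csv("spatial_coordinates.csv", index_col=0)
# Reorder rows to match the cell order in adata; raises KeyError if a cell has no coordinates
adata.obsm['spatial'] = spatial_coords.loc[adata.obs_names, ['x', 'y']].to_numpy()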
What preprocessing steps are recommended?¶
# Basic preprocessing
adata = spex.preprocess(
    adata,
    min_genes=10,
    min_cells=3,
    max_counts_per_cell=5000,
    normalize=True,
    log_transform=True
)
# Dimensionality reduction
adata = spex.reduce_dimensionality(
    adata,
    method='pca',
    n_components=50
)
Image Segmentation¶
Which segmentation method should I use?¶
Cellpose (recommended for most cases):
- Works well with various cell types
- Automatic parameter detection
- Good for fluorescence images
StarDist:
- Excellent for nuclear segmentation
- Good for brightfield images
- Requires more parameter tuning
Watershed:
- Fast and simple
- Good for well-separated cells
- Less accurate for complex images
How do I download Cellpose models?¶
# Download default models
spex.download_cellpose_models()
# Download specific model
spex.download_cellpose_models(model_type='cyto')
My segmentation is poor. What should I do?¶
- Preprocess the image:
image = spex.load_image("tissue.tif")
image_processed = spex.background_subtract(image)
image_processed = spex.median_denoise(image_processed)
- Adjust Cellpose parameters:
segmentation_mask = spex.cellpose_cellseg(
    image_processed,
    model_type='cyto',
    diameter=20,            # Manual diameter
    flow_threshold=0.3,     # Lower threshold
    cellprob_threshold=-2   # More permissive
)
- Post-process the segmentation:
segmentation_mask = spex.remove_small_objects(segmentation_mask, min_size=50)
segmentation_mask = spex.remove_large_objects(segmentation_mask, max_size=1000)
segmentation_mask = spex.rescue_cells(segmentation_mask, image_processed)
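To see what size-based post-processing does to a label mask, here is a small NumPy sketch. It is an illustration only and not how SPEX implements `remove_small_objects` or `remove_large_objects`; `filter_labels_by_size` is a hypothetical name, and the sketch assumes an integer mask where 0 is background and each cell has its own positive label.

```python
import numpy as np


def filter_labels_by_size(mask, min_size=50, max_size=1000):
    """Zero out labelled objects whose pixel area is outside [min_size, max_size]."""
    mask = np.asarray(mask)
    # areas[k] is the number of pixels carrying label k
    areas = np.bincount(mask.ravel(), minlength=1)
    keep = (areas >= min_size) & (areas <= max_size)
    keep[0] = False  # label 0 is background and stays 0
    return np.where(keep[mask], mask, 0)


mask = np.zeros((40, 40), dtype=np.int32)
mask[0:2, 0:2] = 1      # 4 pixels: below min_size
mask[5:15, 5:15] = 2    # 100 pixels: within range
mask[20:40, 0:40] = 3   # 800 pixels: above max_size

filtered = filter_labels_by_size(mask, min_size=50, max_size=500)
print(np.unique(filtered))
```

Checking the distribution of object areas (`np.bincount(mask.ravel())[1:]`) before choosing `min_size` and `max_size` usually gives better thresholds than guessing.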
How do I visualize segmentation results?¶
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].imshow(image, cmap='gray')
axes[0].set_title('Original Image')
axes[0].axis('off')
axes[1].imshow(image_processed, cmap='gray')
axes[1].set_title('Preprocessed Image')
axes[1].axis('off')
axes[2].imshow(segmentation_mask, cmap='tab20')
axes[2].set_title('Cell Segmentation')
axes[2].axis('off')
plt.tight_layout()
plt.show()
Clustering Analysis¶
Which clustering method should I use?¶
Leiden (recommended):
- Fast and scalable
- Good for large datasets
- Consistent results
Louvain:
- Classic method
- Good for smaller datasets
- More sensitive to resolution parameter
Phenograph:
- Advanced method
- Good for complex datasets
- Slower but more robust
How do I choose the right resolution parameter?¶
# Test different resolutions
resolutions = [0.1, 0.3, 0.5, 0.7, 1.0]
results = {}
for res in resolutions:
    adata = spex.cluster(adata, method='leiden', resolution=res)
    results[res] = len(adata.obs['leiden'].unique())
print("Number of clusters per resolution:")
for res, n_clusters in results.items():
    print(f"Resolution {res}: {n_clusters} clusters")
How do I validate clustering results?¶
import matplotlib.pyplot as plt
import scanpy as sc
from sklearn.metrics import silhouette_score
# Calculate silhouette score (requires at least two clusters)
silhouette_avg = silhouette_score(adata.obsm['X_pca'], adata.obs['leiden'])
print(f"Silhouette Score: {silhouette_avg:.3f}")
# Visualize clustering (requires a UMAP embedding, e.g. computed with sc.tl.umap)
sc.pl.umap(adata, color='leiden', show=False)
plt.title(f'Clustering (Silhouette: {silhouette_avg:.3f})')
plt.show()
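If you are unsure how to read the silhouette score, the sketch below recomputes it from scratch in NumPy on synthetic data. For each cell it compares the mean distance to its own cluster (`a`) with the mean distance to the nearest other cluster (`b`) and averages `(b - a) / max(a, b)`. `mean_silhouette` is a hypothetical helper for illustration; for real data, `sklearn.metrics.silhouette_score` is the better choice. The sketch needs at least two clusters and builds a full pairwise distance matrix, so it only suits small inputs.

```python
import numpy as np


def mean_silhouette(X, labels):
    """Mean silhouette coefficient using Euclidean distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton clusters score 0 by convention
        # Mean distance to the other members of the same cluster (self excluded)
        a = dist[i, own].sum() / (own.sum() - 1)
        # Mean distance to the closest other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(10, 0.5, (30, 2))])
good_labels = np.repeat([0, 1], 30)  # labels follow the two blobs
bad_labels = np.tile([0, 1], 30)     # labels ignore the blobs

print(f"matching labels:  {mean_silhouette(X, good_labels):.3f}")
print(f"scrambled labels: {mean_silhouette(X, bad_labels):.3f}")
```

Scores near 1 indicate compact, well-separated clusters; scores near 0 or below indicate that the labels do not reflect structure in the embedding.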
How do I find marker genes for clusters?¶
# Find marker genes
marker_genes = spex.differential_expression(
    adata,
    groupby='leiden',
    method='wilcoxon'
)
# Get top markers per cluster
for cluster in adata.obs['leiden'].unique():
    cluster_markers = marker_genes[marker_genes['cluster'] == cluster]
    top_markers = cluster_markers.head(5)['gene'].tolist()
    print(f"Cluster {cluster}: {', '.join(top_markers)}")
Spatial Analysis¶
What is Co-Localization Quotient (CLQ)?¶
CLQ measures the spatial association between different cell types:
- CLQ > 1: Cell types are co-localized
- CLQ = 1: Random spatial distribution
- CLQ < 1: Cell types avoid each other
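To make these thresholds concrete, the sketch below computes one common nearest-neighbor formulation of the quotient: the share of type A cells whose nearest neighbor is type B, divided by the share expected if labels were assigned at random, N_B / (N - 1). This is an assumption chosen for illustration and is not necessarily the formulation SPEX uses internally; `colocation_quotient` is a hypothetical name.

```python
import numpy as np


def colocation_quotient(coords, labels, a, b):
    """CLQ of type `a` toward type `b`, using each cell's single nearest neighbor."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    nearest = dist.argmin(axis=1)

    is_a = labels == a
    # When a == b, the cell itself cannot be its own neighbor
    n_b = int((labels == b).sum()) - (1 if a == b else 0)
    observed = (labels[nearest[is_a]] == b).mean()  # share of A cells next to a B cell
    expected = n_b / (len(labels) - 1)              # share expected under random labels
    return observed / expected


# Five tight A-B pairs spread along a line: A and B are co-localized
mixed_coords = [(10 * i + j, 0) for i in range(5) for j in (0, 1)]
mixed_labels = ["A", "B"] * 5

# A cells and B cells in two distant groups: A and B avoid each other
apart_coords = [(i, 0) for i in range(5)] + [(100 + i, 0) for i in range(5)]
apart_labels = ["A"] * 5 + ["B"] * 5

print(colocation_quotient(mixed_coords, mixed_labels, "A", "B"))
print(colocation_quotient(apart_coords, apart_labels, "A", "B"))
```

The quotient is directional: the value for A toward B generally differs from the value for B toward A when the two types have different abundances.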
How do I perform CLQ analysis?¶
import matplotlib.pyplot as plt
import seaborn as sns
# Calculate CLQ
clq_results = spex.CLQ_vec_numba(
    adata,
    cluster_key='leiden',
    spatial_key='spatial',
    n_permutations=1000
)
# Visualize results as a cluster-by-cluster heatmap centered on CLQ = 1
clq_matrix = clq_results.pivot(index='cluster1', columns='cluster2', values='clq')
sns.heatmap(clq_matrix, annot=True, cmap='RdBu_r', center=1)
plt.title('Co-Localization Quotient')
plt.show()
What is niche analysis?¶
Niche analysis identifies spatial microenvironments where specific cell types are enriched:
# Perform niche analysis
niche_results = spex.niche(
    adata,
    cluster_key='leiden',
    spatial_key='spatial',
    radius=100,
    min_cells=5
)
print("Niche analysis results:")
print(niche_results.head(10))
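The idea behind a radius-based neighborhood can be sketched in a few lines of NumPy: for every cell, count the cluster labels of all other cells within `radius` and convert the counts to fractions. This is only an illustration of the concept under that assumption, not a description of what `spex.niche` computes; `neighborhood_composition` is a hypothetical helper, and the dense distance matrix limits it to small datasets.

```python
import numpy as np


def neighborhood_composition(coords, labels, radius):
    """Per-cell counts and fractions of each cluster among neighbors within `radius`."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)

    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    near = dist <= radius
    np.fill_diagonal(near, False)  # exclude the cell itself

    one_hot = (labels[:, None] == clusters[None, :]).astype(float)
    counts = near.astype(float) @ one_hot  # shape: (n_cells, n_clusters)
    totals = counts.sum(axis=1, keepdims=True)
    # Cells without any neighbor keep an all-zero row
    fractions = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return clusters, counts, fractions


coords = [(0, 0), (1, 0), (0, 1), (50, 50)]
labels = [0, 1, 1, 0]
clusters, counts, fractions = neighborhood_composition(coords, labels, radius=2)

print(clusters)
print(counts)
print(fractions)
```

Cells whose composition rows look alike share a similar microenvironment, which is why composition vectors like these are a common starting point for grouping cells into niches.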
How do I analyze spatial autocorrelation?¶
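import numpy as np
from sklearn.neighbors import NearestNeighbors
# Calculate Moran's I for a gene
# Expression as a dense 1-D array (adata.X may be a sparse matrix)
X = adata[:, 'gene_name'].X
gene_expression = (X.toarray() if hasattr(X, 'toarray') else np.asarray(X)).ravel()
coords = adata.obsm['spatial']
# Calculate spatial weights: the 10 nearest neighbors of each cell.
# Calling kneighbors() without arguments excludes each cell from its own neighbor list.
nbrs = NearestNeighbors(n_neighbors=10).fit(coords)
distances, indices = nbrs.kneighbors()
# Moran's I calculation (simplified: binary neighbor weights, no significance test)
def calculate_morans_i(expression, indices):
    z = expression - expression.mean()
    k = indices.shape[1]
    # Sum of z_i * z_j over every cell i and each of its k neighbors j
    cross = (z[:, None] * z[indices]).sum()
    return cross / (k * (z ** 2).sum())
moran_i = calculate_morans_i(gene_expression, indices)
print(f"Moran's I: {moran_i:.3f}")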
Performance and Optimization¶
My analysis is slow. How can I speed it up?¶
- Use appropriate data types:
import numpy as np
adata.X = adata.X.astype(np.float32)
- Reduce dataset size:
import scanpy as sc
sc.pp.subsample(adata, fraction=0.5, random_state=42)
- Use parallel processing:
import multiprocessing as mp

def process_chunk(chunk_data):
    # Your processing function
    pass

# The guard is required on platforms that start worker processes with "spawn"
if __name__ == "__main__":
    # data_chunks: a list of picklable pieces of your data
    with mp.Pool(processes=4) as pool:
        results = pool.map(process_chunk, data_chunks)
- Reduce permutations for spatial analysis:
clq_results = spex.CLQ_vec_numba(
    adata,
    cluster_key='leiden',
    n_permutations=100  # Reduced from 1000
)
How do I handle large datasets?¶
- Process in chunks:
chunk_size = 1000
for i in range(0, len(adata), chunk_size):
    chunk = adata[i:i+chunk_size]
    # Process chunk
- Use memory-efficient operations:
import gc
# Clear unnecessary data
del large_variable
gc.collect()
- Use caching:
from joblib import Memory
# Cached results are stored on disk in ./cache_dir
memory = Memory("cache_dir", verbose=0)

@memory.cache
def expensive_computation(data):
    # Your computation here
    return result
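Chunked processing pays off when each chunk can be reduced to a small summary that is combined at the end. The sketch below shows that pattern on a plain NumPy matrix: column means are accumulated chunk by chunk, so only one chunk is processed at a time. `iter_chunk_slices` and `chunked_column_means` are hypothetical helpers, and the random matrix stands in for an expression matrix.

```python
import numpy as np


def iter_chunk_slices(n_items, chunk_size):
    """Yield slices that cover range(n_items) in consecutive chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_items, chunk_size):
        yield slice(start, min(start + chunk_size, n_items))


def chunked_column_means(matrix, chunk_size=1000):
    """Column means computed from per-chunk sums instead of one full pass."""
    total = np.zeros(matrix.shape[1], dtype=np.float64)
    for rows in iter_chunk_slices(matrix.shape[0], chunk_size):
        total += np.asarray(matrix[rows]).sum(axis=0)
    return total / matrix.shape[0]


rng = np.random.default_rng(0)
data = rng.random((25, 3))  # stand-in for a cells x genes matrix

print(list(iter_chunk_slices(10, 4)))
print(chunked_column_means(data, chunk_size=7))
```

The same structure works for any statistic that can be combined from partial results, such as sums, counts, minima, and maxima.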
Troubleshooting¶
I get a "No module named 'spex'" error¶
This means SPEX is not installed in the Python environment you are running. Activate the environment you intend to use, then install it with:
pip install spex
My segmentation fails with "CUDA out of memory"¶
- Reduce image size:
from skimage.transform import resize
# Halve each dimension; a manually set cell diameter must be halved as well
image = resize(image, (image.shape[0]//2, image.shape[1]//2))
- Use CPU instead of GPU:
import os
# Disable GPU; set this before the GPU library initializes CUDA, ideally at the top of the script
os.environ['CUDA_VISIBLE_DEVICES'] = ''
- Process smaller regions:
# Split image into tiles
tile_size = 512
for i in range(0, image.shape[0], tile_size):
    for j in range(0, image.shape[1], tile_size):
        tile = image[i:i+tile_size, j:j+tile_size]
        # Process tile
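When you process tiles, you usually also need to write each result back to the right place. The sketch below tiles an image and reassembles it with NumPy; the per-tile step is a placeholder that returns the tile unchanged, and `iter_tiles` is a hypothetical helper. For segmentation specifically, keep in mind that label numbers restart in every tile and that cells on a tile border are cut in two, so real pipelines typically offset labels and use overlapping tiles.

```python
import numpy as np


def iter_tiles(image, tile_size=512):
    """Yield (top, left, tile) for non-overlapping tiles covering the image."""
    height, width = image.shape[:2]
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            yield top, left, image[top:top + tile_size, left:left + tile_size]


image = np.arange(1000 * 1300, dtype=np.float32).reshape(1000, 1300)
rebuilt = np.zeros_like(image)
n_tiles = 0

for top, left, tile in iter_tiles(image, tile_size=512):
    processed = tile  # placeholder for the real per-tile computation
    # Edge tiles are smaller than tile_size, so use the tile's own shape
    rebuilt[top:top + tile.shape[0], left:left + tile.shape[1]] = processed
    n_tiles += 1

print(n_tiles, np.array_equal(rebuilt, image))
```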
My clustering produces too many/few clusters¶
Adjust the resolution parameter:
# For fewer clusters
adata = spex.cluster(adata, method='leiden', resolution=0.1)
# For more clusters
adata = spex.cluster(adata, method='leiden', resolution=1.0)
My spatial analysis fails¶
- Check spatial coordinates:
print("Spatial coordinates shape:", adata.obsm['spatial'].shape)
print("Coordinate range:", adata.obsm['spatial'].min(), adata.obsm['spatial'].max())
- Check for duplicate coordinates:
import numpy as np
coords = adata.obsm['spatial']
unique_coords = np.unique(coords, axis=0)
if len(unique_coords) < len(coords):
    print("Warning: Duplicate coordinates found")
- Check for NaN values:
if np.isnan(adata.obsm['spatial']).any():
    print("Warning: NaN values in spatial coordinates")
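The three checks above can be bundled into a single function that reports everything it finds. This is a minimal sketch assuming two-dimensional x,y coordinates; `check_spatial_coords` is a hypothetical helper, not part of SPEX.

```python
import numpy as np


def check_spatial_coords(coords, n_obs=None):
    """Return a list of problems found in an (n_cells, 2) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 2:
        return [f"expected shape (n_cells, 2), got {coords.shape}"]

    issues = []
    if n_obs is not None and coords.shape[0] != n_obs:
        issues.append(f"{coords.shape[0]} coordinate rows for {n_obs} cells")

    finite_rows = np.isfinite(coords).all(axis=1)
    n_bad = int((~finite_rows).sum())
    if n_bad:
        issues.append(f"{n_bad} cells have NaN or infinite coordinates")

    # Count duplicates among the valid rows only
    finite = coords[finite_rows]
    n_dup = len(finite) - len(np.unique(finite, axis=0))
    if n_dup:
        issues.append(f"{n_dup} cells share coordinates with another cell")
    return issues


clean = [[0, 0], [1, 1], [2, 2]]
messy = [[0, 0], [1, 1], [1, 1], [np.nan, 2]]

print(check_spatial_coords(clean))
print(check_spatial_coords(messy))
```

With an AnnData object, a call such as `check_spatial_coords(adata.obsm['spatial'], n_obs=adata.n_obs)` covers all three checks at once.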
My quality control removes too many cells¶
Adjust QC parameters:
# More permissive parameters
qc_params = {
    'min_counts': 50,           # Lower minimum
    'max_counts': 15000,        # Higher maximum
    'min_genes': 5,             # Lower minimum
    'max_genes': 8000,          # Higher maximum
    'max_mito_ratio': 0.3,      # Higher threshold
    'min_cells_per_gene': 2     # Lower minimum
}
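Before loosening every threshold, it helps to see which one removes the most cells. The sketch below applies the per-cell thresholds from the dictionary above to a few made-up cells and reports how many cells each threshold rejects. `qc_report` is a hypothetical helper, the thresholds are treated as inclusive (an assumption), and `min_cells_per_gene` is left out because it filters genes rather than cells.

```python
import numpy as np

qc_params = {
    'min_counts': 50,
    'max_counts': 15000,
    'min_genes': 5,
    'max_genes': 8000,
    'max_mito_ratio': 0.3,
}


def qc_report(total_counts, n_genes, mito_ratio, params):
    """Return the keep-mask and the number of cells rejected by each threshold."""
    total_counts = np.asarray(total_counts)
    n_genes = np.asarray(n_genes)
    mito_ratio = np.asarray(mito_ratio)
    checks = {
        'min_counts': total_counts >= params['min_counts'],
        'max_counts': total_counts <= params['max_counts'],
        'min_genes': n_genes >= params['min_genes'],
        'max_genes': n_genes <= params['max_genes'],
        'max_mito_ratio': mito_ratio <= params['max_mito_ratio'],
    }
    removed = {name: int((~passed).sum()) for name, passed in checks.items()}
    keep = np.logical_and.reduce(list(checks.values()))
    return keep, removed


# Made-up per-cell statistics for five cells
total_counts = [30, 500, 20000, 800, 1200]
n_genes = [4, 200, 6000, 300, 9000]
mito_ratio = [0.1, 0.05, 0.2, 0.5, 0.1]

keep, removed = qc_report(total_counts, n_genes, mito_ratio, qc_params)
print("cells kept:", int(keep.sum()), "of", len(keep))
print("rejected per threshold:", removed)
```

A cell can fail several thresholds at once, so the per-threshold numbers may add up to more than the total number of removed cells.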
Best Practices¶
Data Organization¶
- Use consistent file naming:
sample_001/
├── expression.h5ad
├── spatial_coordinates.csv
├── tissue_image.tif
└── metadata.json
- Document your analysis:
import json
# Save analysis parameters
analysis_params = {
    'segmentation': {'method': 'cellpose', 'diameter': 20},
    'clustering': {'method': 'leiden', 'resolution': 0.5},
    'spatial_analysis': {'n_permutations': 1000}
}
with open('analysis_parameters.json', 'w') as f:
    json.dump(analysis_params, f, indent=2)
Reproducibility¶
- Set random seeds:
import numpy as np
np.random.seed(42)
- Save intermediate results:
# Save at key steps, each to its own file
adata.write('preprocessed_data.h5ad')   # after preprocessing
adata.write('clustered_data.h5ad')      # after clustering
adata.write('final_results.h5ad')       # after spatial analysis
- Use version control:
git add .
git commit -m "Analysis step: clustering completed"
Visualization¶
- Use consistent color schemes:
# Define color palette
cluster_colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']
- Save high-quality figures:
plt.savefig('figure.png', dpi=300, bbox_inches='tight')
- Add proper labels and titles:
plt.xlabel('UMAP 1')
plt.ylabel('UMAP 2')
plt.title('Cell Clustering Results')
Getting Help¶
Where can I find more examples?¶
- Check the docs/examples/ directory
- Look at the Jupyter notebooks in notebooks/
- Visit the documentation website
How do I report a bug?¶
- Check if the issue is already reported
- Create a minimal reproducible example
- Include error messages and system information
- Submit an issue on GitHub
How do I contribute to SPEX?¶
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
Where can I ask questions?¶
- GitHub Issues for bug reports
- GitHub Discussions for questions
- Email the maintainers for specific issues
This FAQ covers the most common questions about SPEX. If you don't find your answer here, please check the documentation or ask in the GitHub discussions.