SIGnature.meta
- class SIGnature.meta.Meta(df)
Bases: object
A class for working with cell metadata files and output scores
- Parameters:
df (pandas.DataFrame) – Input dataframe to consider.
Examples
>>> meta_df = pd.read_csv(meta_path)
>>> meta = Meta(meta_df)
- add_hits(column_name, mode='percentile', cut_val=95.0, hit_type='above', string_append='__hit')
Flag hits above or below a cut-off in a numerical column. The cut-off is computed as a percentile or quantile of the column, or supplied directly as a value.
- Parameters:
column_name (str) – Name of column used for cut-offs.
mode (str, default: "percentile") – Mode in one of “percentile”, “quantile”, or “value” to calculate cut-offs.
cut_val (Union[float, int], default: 95.0) – The cut-off value used by percentile or quantile function on the column of interest.
hit_type (str, default: "above") – Hit type in one of “above” for hits above cut-off or “below” for hits below.
string_append (str, default: "__hit") – String to append to original category name to denote hits.
Examples
>>> meta.add_hits(column_name="GLI2")
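The cut-off logic described above can be sketched with plain pandas and NumPy. This is an illustration, not the SIGnature implementation: the helper `flag_hits`, the toy data, and the choice of strict comparisons (`>` and `<`) are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical metadata with a single numeric score column.
meta_df = pd.DataFrame(
    {"GLI2": [0.1, 0.5, 0.2, 0.9, 0.3, 0.7, 0.4, 0.8, 0.6, 1.0]}
)


def flag_hits(df, column_name, mode="percentile", cut_val=95.0,
              hit_type="above", string_append="__hit"):
    """Hypothetical helper: add a boolean hit column next to column_name."""
    values = df[column_name]
    if mode == "percentile":
        cutoff = np.percentile(values, cut_val)   # cut_val on a 0-100 scale
    elif mode == "quantile":
        cutoff = values.quantile(cut_val)         # cut_val on a 0-1 scale
    elif mode == "value":
        cutoff = cut_val                          # cut_val used as-is
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    if hit_type == "above":
        hits = values > cutoff    # assumption: strict comparison
    elif hit_type == "below":
        hits = values < cutoff
    else:
        raise ValueError(f"unknown hit_type: {hit_type!r}")

    out = df.copy()
    out[column_name + string_append] = hits
    return out


flagged = flag_hits(meta_df, "GLI2", mode="percentile", cut_val=80.0)
n_hits = int(flagged["GLI2__hit"].sum())
```

With the toy data, the 80th percentile falls between 0.8 and 0.9, so only the two highest-scoring rows are flagged.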
- add_scores(score_dict, mode='value')
Add attribution scores to metadata. Accounts for .npz sparse files (common for attributions) or numpy arrays.
- Parameters:
score_dict (dict) – Dictionary where keys are category names and values are either scores or file locations.
mode (str, default: "value") – Mode in one of “value” or “file” for how to load data.
Examples
>>> score_dict = {"GLI1": "/data/GLI1_score.npz", "GLI2": "/data/GLI2_score.npy"}
>>> meta.add_scores(score_dict, mode='file')
- cat_by_min(column_name='prediction', mode='percent', cut_val=1.0)
Get a list of categories that make up at least a minimum percentage or count of the cells in a column.
- Parameters:
column_name (str, default: "prediction") – Name of column used for cut-offs.
mode (str, default: "percent") – Mode in one of “percent” to look at the percent of all cells or “count” to look at the minimum count.
cut_val (float, default: 1.0) – Cut-off percent or count value to be used.
- Returns:
A list of valid categories in column.
- Return type:
list
Examples
>>> cats = meta.cat_by_min(column_name="prediction", mode="count", cut_val=50)
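A minimal sketch of this kind of category filter, assuming it amounts to counting values in the column and keeping those that meet the threshold. The helper `categories_by_min` and the toy labels are hypothetical, and treating the cut-off as inclusive (`>=`) is an assumption.

```python
import pandas as pd

# Hypothetical cell-level predictions: 60 + 38 + 2 = 100 cells.
meta_df = pd.DataFrame(
    {"prediction": ["T cell"] * 60 + ["B cell"] * 38 + ["mast cell"] * 2}
)


def categories_by_min(df, column_name="prediction", mode="percent", cut_val=1.0):
    """Hypothetical helper: categories meeting a minimum percent or count."""
    counts = df[column_name].value_counts()
    if mode == "percent":
        keep = counts / len(df) * 100 >= cut_val  # share of all cells, in %
    elif mode == "count":
        keep = counts >= cut_val                  # absolute number of cells
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return counts[keep].index.tolist()


cats_pct = categories_by_min(meta_df, mode="percent", cut_val=5.0)
cats_cnt = categories_by_min(meta_df, mode="count", cut_val=50)
```

Here "mast cell" (2% of cells) is dropped in percent mode, and only "T cell" survives a minimum count of 50.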
- columns()
Return columns of current dataframe
Examples
>>> columns = meta.columns()
- Return type:
list
- ncell()
Returns number of cells in current meta
Examples
>>> ncell = meta.ncell()
- Return type:
int
- samphit_boxplot(title='Hits Across Diseases per Sample', hit_label='Hits', swarm=False, title_fs=16, dotsize=3, fe=1, figsize=(6, 4), filename=None)
Plot a boxplot, optionally overlaid with a swarmplot, of hit percentages per sample for each disease.
- Parameters:
df (pandas.DataFrame) – A pandas dataframe in which each row is a sample labeled with its “disease” and a “Hit Percentage” indicating what proportion of cells are hits.
title (str, default: "Hits Across Diseases per Sample") – Plot title.
hit_label (str, default: "Hits") – Label for what to consider hits.
swarm (bool, default: False) – Whether to overlay a swarmplot on the boxplot.
title_fs (Union[int, float], default: 16) – Font size for the title.
dotsize (Union[int, float], default: 3) – Dot size for the swarmplot.
fe (Union[int, float], default: 1) – Scaling factor for various sizes.
figsize (Tuple, default: (6, 4)) – Figure size.
filename (Optional[str], default: None) – File name to use if the figure should be saved.
Examples
>>> Meta.samphit_boxplot(df=samphit_df)
- samphit_df(cell_min=50, samp_min=3, samp_groupby=['sample'], acats=['tissue', 'disease', 'study', 'sample'], dropna=True, hit_col='Hit Percentage', num_dis=15)
Compute the percentage of hits per sample and return a sample-level dataframe.
- Parameters:
cell_min (int, default: 50) – Minimum number of cells per sample to be considered.
samp_min (int, default: 3) – Minimum number of qualifying samples for a disease to be considered.
samp_groupby (list, default: ["sample"]) – Columns to group by when applying the minimum number of cells.
acats (list, default: ["tissue", "disease", "study", "sample"]) – Annotation categories to keep for plotting.
dropna (bool, default: True) – Drop all diseases named NA.
hit_col (str, default: "Hit Percentage") – Hit column name to consider.
num_dis (Optional[int], default: 15) – Number of diseases to include in the chart.
- Returns:
A sample level dataframe.
- Return type:
pandas.DataFrame
Examples
>>> df = meta.samphit_df(num_dis=10)
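The per-sample aggregation can be pictured as a pandas groupby. The sketch below is not the library's code: the toy dataframe, the `GLI2__hit` column, the small `cell_min`, and expressing “Hit Percentage” on a 0-100 scale are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical cell-level table: one row per cell, with a boolean hit flag.
cells = pd.DataFrame(
    {
        "sample": ["s1"] * 4 + ["s2"] * 4 + ["s3"] * 2,
        "disease": ["fibrosis"] * 4 + ["healthy"] * 4 + ["healthy"] * 2,
        "GLI2__hit": [True, True, False, False,
                      True, False, False, False,
                      True, True],
    }
)

cell_min = 3  # assumption: tiny threshold so the toy data shows the filter

# Count cells and average the hit flag within each (disease, sample) group.
per_sample = (
    cells.groupby(["disease", "sample"])
    .agg(n_cells=("GLI2__hit", "size"), hit_fraction=("GLI2__hit", "mean"))
    .reset_index()
)

# Drop samples with too few cells, then express hits as a percentage.
per_sample = per_sample[per_sample["n_cells"] >= cell_min].copy()
per_sample["Hit Percentage"] = per_sample["hit_fraction"] * 100
```

Sample `s3` has only two cells and is filtered out; the remaining rows give one hit percentage per sample, which is the shape a per-disease boxplot would consume.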
- subset_hq(cutoff=0.02, quality_column='prediction_nn_dist', mode='below')
Subset to high-quality cells, by default those whose prediction distance is below the cut-off.
- Parameters:
cutoff (float, default: 0.02) – Cut-off to use for quality.
quality_column (str, default: "prediction_nn_dist") – Column used to score quality metric.
mode (str, default: "below") – Mode in one of “below” to keep cells under the cut-off or “above” to keep cells over it.
Examples
>>> meta.subset_hq()
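As a rough picture of what a quality subset like this does, the following sketch filters a dataframe with a boolean mask. The helper `subset_by_quality` and the toy distances are hypothetical, and whether the real method treats the cut-off as strict or inclusive is not stated here, so the strict comparison is an assumption.

```python
import pandas as pd

# Hypothetical nearest-neighbor distances for four cells.
meta_df = pd.DataFrame(
    {"prediction_nn_dist": [0.005, 0.03, 0.015, 0.05]},
    index=["c1", "c2", "c3", "c4"],
)


def subset_by_quality(df, cutoff=0.02, quality_column="prediction_nn_dist",
                      mode="below"):
    """Hypothetical helper: keep rows on one side of a quality cut-off."""
    if mode == "below":
        mask = df[quality_column] < cutoff   # assumption: strict comparison
    elif mode == "above":
        mask = df[quality_column] > cutoff
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return df[mask]


hq = subset_by_quality(meta_df)
```

With the default cut-off of 0.02, cells `c1` and `c3` are kept.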
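- subset_invivo(column_name='in_vivo', in_vivo_val=True)
Subset to in vivo cells using standard SCimilarity columns.
- Parameters:
column_name (str, default: "in_vivo") – Name of the column that marks in vivo cells.
in_vivo_val (bool | str, default: True) – Value in that column that identifies an in vivo cell.
Examples
>>> meta.subset_invivo()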
- top_cells(ncell, column_name, return_df=True)
Get dataframe including only top cells by category.
- Parameters:
ncell (int) – Number of cells to keep.
column_name (str) – Name of column to consider.
return_df (bool, default: True) – Return a dataframe instead of modifying the class attribute.
- Returns:
A dataframe of top cells.
- Return type:
pandas.DataFrame
Examples
>>> meta_top100 = meta.top_cells(100, "GLI2", return_df=True)
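Selecting the top cells for a score column can be sketched with `DataFrame.nlargest`. This assumes that “top” means the highest values in the column, which the description does not spell out; the toy data is hypothetical.

```python
import pandas as pd

# Hypothetical scores for four cells.
meta_df = pd.DataFrame(
    {"GLI2": [0.2, 0.9, 0.5, 0.7]},
    index=["c1", "c2", "c3", "c4"],
)

# Keep the two highest-scoring cells, ordered from highest to lowest.
top2 = meta_df.nlargest(2, "GLI2")
```

The result keeps the original index labels, so the selected cells can still be matched back to the full metadata.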