SIGnature.meta

class SIGnature.meta.Meta(df)

Bases: object

A class for working with cell metadata files and output scores

Parameters: df (pandas.DataFrame) – input dataframe to consider

Examples

>>> meta_df = pd.read_csv(meta_path)
>>> meta = Meta(meta_df)

add_hits(column_name, mode='percentile', cut_val=95.0, hit_type='above', string_append='__hit')

Add hits above (or below) a percentile, quantile, or fixed-value cut-off in a numerical column.

Parameters:

column_name (str) – Name of column used for cut-offs.
mode (str, default: "percentile") – One of “percentile”, “quantile”, or “value”; determines how the cut-off is calculated.
cut_val (Union[float, int], default: 95.0) – The cut-off passed to the percentile or quantile function for the column of interest, or used directly as the threshold in “value” mode.
hit_type (str, default: "above") – One of “above” (hits above the cut-off) or “below” (hits below the cut-off).
string_append (str, default: "__hit") – String to append to original category name to denote hits.

Examples

>>> meta.add_hits(column_name="GLI2")
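The cut-off logic described above can be sketched with plain pandas and NumPy. This is an illustration only, not the SIGnature implementation: the helper `flag_hits` is hypothetical, and it assumes that `cut_val` is on a 0 to 100 scale for “percentile”, on a 0 to 1 scale for “quantile”, and is the threshold itself for “value”. It also assumes strict comparisons and a boolean hit column.

```python
import numpy as np
import pandas as pd


def flag_hits(df, column_name, mode="percentile", cut_val=95.0,
              hit_type="above", string_append="__hit"):
    """Hypothetical helper: return a copy of df with a boolean hit column."""
    values = df[column_name].to_numpy(dtype=float)

    # Turn cut_val into a threshold on the column's own scale (assumed scales).
    if mode == "percentile":
        cutoff = np.percentile(values, cut_val)  # cut_val in [0, 100]
    elif mode == "quantile":
        cutoff = np.quantile(values, cut_val)  # cut_val in [0, 1]
    elif mode == "value":
        cutoff = float(cut_val)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    # Strict comparison is an assumption of this sketch.
    if hit_type == "above":
        hits = values > cutoff
    elif hit_type == "below":
        hits = values < cutoff
    else:
        raise ValueError(f"unknown hit_type: {hit_type!r}")

    out = df.copy()
    out[column_name + string_append] = hits
    return out


# Toy scores 0..99: the 95th percentile leaves the five largest values as hits.
scores = pd.DataFrame({"GLI2": np.arange(100, dtype=float)})
flagged = flag_hits(scores, "GLI2")
n_hits = int(flagged["GLI2__hit"].sum())
```

With the default `string_append`, the new column is named after the scored column plus `__hit`, which keeps the original scores untouched.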

add_scores(score_dict, mode='value')

Add attribution scores to metadata. Accounts for .npz sparse files (common for attributions) or numpy arrays.

Parameters:

score_dict (dict) – Dictionary where keys are category names and values are either scores or file locations.
mode (str, default: "value") – One of “value” (scores are passed directly) or “file” (scores are loaded from the given file locations).

Examples

>>> score_dict = {"GLI1": "/data/GLI1_score.npz", "GLI2": "/data/GLI2_score.npy"}
>>> meta.add_scores(score_dict, mode='file')

append(df)

Append dataframe to current meta object

Examples

>>> meta.append(df)

cat_by_min(column_name='prediction', mode='percent', cut_val=1.0)

Get a list of categories that account for at least a minimum percentage or count of cells in a column.

Parameters:

column_name (str) – Name of column used for cut-offs.
mode (str) – One of “percent” to look at the percentage of all cells or “count” to look at the minimum count.
cut_val (float) – Cut-off percent or count value to be used.

Returns:

A list of valid categories in column.

Return type:

list

Examples

>>> cats = meta.cat_by_min(column_name="prediction", mode="count", cut_val=50)
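As a rough sketch of the two modes, the filter can be reproduced on a plain dataframe. The helper `categories_by_min` is hypothetical and not part of SIGnature; it assumes that percentages are taken over all rows of the dataframe and that the cut-off is inclusive.

```python
import pandas as pd


def categories_by_min(df, column_name="prediction", mode="percent", cut_val=1.0):
    """Hypothetical helper: categories meeting a minimum percent or count."""
    counts = df[column_name].value_counts()
    if mode == "percent":
        measure = 100.0 * counts / len(df)  # share of all rows, in percent
    elif mode == "count":
        measure = counts
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Inclusive cut-off is an assumption of this sketch.
    return measure[measure >= cut_val].index.tolist()


cells = pd.DataFrame(
    {"prediction": ["T cell"] * 60 + ["B cell"] * 38 + ["mast cell"] * 2}
)
by_count = categories_by_min(cells, mode="count", cut_val=50)
by_percent = categories_by_min(cells, mode="percent", cut_val=5.0)
```

Here only “T cell” reaches 50 cells, while both “T cell” and “B cell” exceed 5% of the 100 rows.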

columns()

Return columns of current dataframe

Examples

>>> columns = meta.columns()

Return type: list

copy()

Return copy of current meta object

Examples

>>> new_meta = meta.copy()

ncell()

Returns number of cells in current meta

Examples

>>> ncell = meta.ncell()

Return type: int

samphit_boxplot(title='Hits Across Diseases per Sample', hit_label='Hits', swarm=False, title_fs=16, dotsize=3, fe=1, figsize=(6, 4), filename=None)

Plot a boxplot, optionally overlaid with a swarmplot, of per-sample hit percentages by disease.

Parameters:

df (pandas.DataFrame) – A pandas dataframe in which each row is a sample labeled by its “disease” and by a “Hit Percentage” that indicates what proportion of its cells are hits.
title (str, default: "Hits Across Diseases per Sample") – Plot title
hit_label (str, default: "Hits") – label for what to consider hits
swarm (bool, default: False) – whether to overlay a swarmplot on top of the boxplot
title_fs (Union[int, float], default: 16) – font size for the title
dotsize (Union[int, float], default: 3) – dot size for swarmplot
fe (Union[int, float], default: 1) – scaling factor for various sizes
figsize (Tuple, default: (6,4)) – figure size
filename (Optional[str], default: None) – file name to save the figure to, if given

Examples

>>> Meta.samphit_boxplot(df=samphit_df)

samphit_df(cell_min=50, samp_min=3, samp_groupby=['sample'], acats=['tissue', 'disease', 'study', 'sample'], dropna=True, hit_col='Hit Percentage', num_dis=15)

Manipulate dataframe to calculate percentage of hits per sample

Parameters:

cell_min (int) – minimum number of cells per sample to be considered
samp_min (int) – minimum number of qualifying samples for disease to be considered.
samp_groupby (list) – columns to group by when applying the minimum number of cells
acats (list) – annotation categories that user cares about for plotting
dropna (bool) – drop all diseases named NA
hit_col (str) – hit column name to consider
num_dis (Optional[int]) – number of diseases to include in chart

Returns:

A sample level dataframe.

Return type:

pandas.DataFrame

Examples

>>> df = meta.samphit_df(num_dis=10)
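The general shape of a per-sample hit table can be sketched with a pandas groupby. This is not the library's implementation: the helper `sample_hit_table`, its `hit_column` argument, and the boolean column `GLI2__hit` are assumptions made for illustration, and the sketch covers only the `cell_min` and `samp_min` filters.

```python
import pandas as pd


def sample_hit_table(df, hit_column, cell_min=50, samp_min=3,
                     acats=("tissue", "disease", "study", "sample")):
    """Hypothetical helper: per-sample hit percentages with minimum filters."""
    # One row per sample: number of cells and number of hits.
    grouped = df.groupby(list(acats))[hit_column]
    per_sample = grouped.agg(n_cells="size", n_hits="sum").reset_index()

    # Drop samples with too few cells, then compute the hit percentage.
    per_sample = per_sample[per_sample["n_cells"] >= cell_min].copy()
    per_sample["Hit Percentage"] = (
        100.0 * per_sample["n_hits"] / per_sample["n_cells"]
    )

    # Keep only diseases with enough qualifying samples.
    samples_per_disease = per_sample.groupby("disease")["sample"].transform("nunique")
    return per_sample[samples_per_disease >= samp_min].reset_index(drop=True)


# Toy data: (sample, disease, number of cells, number of hits).
rows = []
for sample, disease, n_cells, n_hits in [
    ("s1", "fibrosis", 4, 1),
    ("s2", "fibrosis", 4, 2),
    ("s3", "fibrosis", 4, 4),
    ("s4", "healthy", 4, 0),   # only one healthy sample, dropped by samp_min
    ("s5", "fibrosis", 1, 1),  # too few cells, dropped by cell_min
]:
    for i in range(n_cells):
        rows.append({"tissue": "lung", "disease": disease, "study": "toy",
                     "sample": sample, "GLI2__hit": i < n_hits})
toy = pd.DataFrame(rows)

table = sample_hit_table(toy, "GLI2__hit", cell_min=2, samp_min=3)
```

Applying the cell filter before the sample filter means a disease is counted only by samples that already passed `cell_min`, which matches the wording “qualifying samples” above.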

subset_hq(cutoff=0.02, quality_column='prediction_nn_dist', mode='below')

Subset to high-quality cells whose quality metric falls below (or above) a cut-off.

Parameters:

cutoff (float, default: 0.02) – Cut-off to use for quality.
quality_column (str, default: "prediction_nn_dist") – Column used to score quality metric.
mode (str, default: "below") – One of “below” to keep cells under the cut-off or “above” to keep cells over it.

Examples

>>> meta.subset_hq()
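The same kind of filter can be written as a boolean mask on a plain dataframe. The helper `subset_by_quality` below is hypothetical; it assumes strict comparisons and returns a new dataframe, which may differ from how the method itself behaves.

```python
import pandas as pd


def subset_by_quality(df, cutoff=0.02, quality_column="prediction_nn_dist",
                      mode="below"):
    """Hypothetical helper: keep rows on one side of a quality cut-off."""
    if mode == "below":
        keep = df[quality_column] < cutoff
    elif mode == "above":
        keep = df[quality_column] > cutoff
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return df[keep].copy()


cells = pd.DataFrame({"prediction_nn_dist": [0.005, 0.019, 0.02, 0.05]})
hq = subset_by_quality(cells)
```

With the default cut-off of 0.02 and a strict comparison, the two cells at 0.005 and 0.019 are kept.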

subset_invivo(column_name='in_vivo', in_vivo_val=True)

Subset to in vivo cells using standard SCimilarity columns.

Examples

>>> meta.subset_invivo()

Parameters:

column_name (str)
in_vivo_val (bool | str)

top_cells(ncell, column_name, return_df=True)

Get dataframe including only top cells by category.

Parameters:

ncell (int) – Number of cells to keep.
column_name (str) – Name of column to consider.
return_df (bool, default: True) – Return a dataframe instead of modifying the class attribute.

Returns:

A dataframe of top cells.

Return type:

pandas.DataFrame

Examples

>>> meta_top100 = meta.top_cells(100, "GLI2", return_df=True)
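For intuition, selecting the top cells by a score column can be sketched with `DataFrame.nlargest`. The helper `top_cells_by_score` and the `cell_id` column are hypothetical, and the sketch assumes that “top” means the largest values in `column_name`.

```python
import pandas as pd


def top_cells_by_score(df, ncell, column_name):
    """Hypothetical helper: the ncell rows with the largest column values."""
    return df.nlargest(ncell, column_name)


cells = pd.DataFrame(
    {"cell_id": ["c1", "c2", "c3", "c4"], "GLI2": [0.1, 0.9, 0.4, 0.7]}
)
top2 = top_cells_by_score(cells, 2, "GLI2")
```

The result is ordered from the highest score down, so `top2` holds `c2` followed by `c4`.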