SIGnature.meta

class SIGnature.meta.Meta(df)

Bases: object

A class for working with cell metadata files and output scores

Parameters:

df (pandas.DataFrame) – Input dataframe to consider.

Examples

>>> meta_df = pd.read_csv(meta_path)
>>> meta = Meta(meta_df)
add_hits(column_name, mode='percentile', cut_val=95.0, hit_type='above', string_append='__hit')

Add hits above or below a cut-off in a numerical column. The cut-off is derived from a percentile or quantile of the column, or given directly as a value.

Parameters:
  • column_name (str) – Name of column used for cut-offs.

  • mode (str, default: "percentile") – One of “percentile”, “quantile”, or “value”; determines how the cut-off is calculated.

  • cut_val (Union[float, int], default: 95.0) – The cut-off value passed to the percentile or quantile function for the column of interest.

  • hit_type (str, default: "above") – One of “above” for hits above the cut-off or “below” for hits below it.

  • string_append (str, default: "__hit") – String to append to original category name to denote hits.

Examples

>>> meta.add_hits(column_name="GLI2")
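
As a rough illustration of the “percentile” mode, the sketch below flags values above the 95th percentile using plain pandas and NumPy. It is not SIGnature's implementation: the helper name `flag_hits`, the toy data, and the assumption that hits end up in a boolean column named `column_name + string_append` are all hypothetical.

```python
import numpy as np
import pandas as pd


def flag_hits(df, column_name, cut_val=95.0, hit_type="above", string_append="__hit"):
    """Hypothetical helper: flag rows above or below a percentile cut-off."""
    # Percentile cut-off over the whole column (cut_val is on a 0-100 scale).
    cutoff = np.percentile(df[column_name], cut_val)
    if hit_type == "above":
        hits = df[column_name] > cutoff
    elif hit_type == "below":
        hits = df[column_name] < cutoff
    else:
        raise ValueError("hit_type must be 'above' or 'below'")
    out = df.copy()
    # Assumed naming convention: original column name plus the suffix.
    out[column_name + string_append] = hits
    return out


# Toy scores for 100 cells: 0.0, 1.0, ..., 99.0
toy = pd.DataFrame({"GLI2": np.arange(100, dtype=float)})
flagged = flag_hits(toy, "GLI2")
n_hits = int(flagged["GLI2__hit"].sum())
```

With 100 evenly spaced toy values, only the five largest exceed the 95th-percentile cut-off, so `n_hits` is 5.
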
add_scores(score_dict, mode='value')

Add attribution scores to metadata. Accounts for .npz sparse files (common for attributions) or numpy arrays.

Parameters:
  • score_dict (dict) – Dictionary where keys are category names and values are either scores or file locations.

  • mode (str, default: "value") – One of “value” or “file”; determines whether the dictionary values are read as scores or as file locations.

Examples

>>> score_dict = {"GLI1": "/data/GLI1_score.npz", "GLI2": "/data/GLI2_score.npy"}
>>> meta.add_scores(score_dict, mode='file')
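
The two modes can be pictured with a minimal sketch, assuming each dictionary entry becomes one new column. The helper `add_scores_sketch` and the toy data are hypothetical and are not the library's code. Only dense `.npy` files are handled here; a sparse `.npz` file would additionally need something like `scipy.sparse.load_npz`.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def add_scores_sketch(df, score_dict, mode="value"):
    """Hypothetical helper: add one column per key in score_dict."""
    out = df.copy()
    for name, item in score_dict.items():
        if mode == "file":
            # Dense .npy case only (assumption); item is a file path.
            scores = np.load(item)
        elif mode == "value":
            # item already holds the scores.
            scores = np.asarray(item)
        else:
            raise ValueError("mode must be 'value' or 'file'")
        out[name] = scores.ravel()
    return out


toy = pd.DataFrame({"sample": ["s1", "s1", "s2"]})

# mode="value": scores are passed directly.
by_value = add_scores_sketch(toy, {"GLI1": np.array([0.1, 0.0, 0.7])})

# mode="file": scores are read from disk (a temporary file is used here).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "GLI2_score.npy")
    np.save(path, np.array([0.3, 0.2, 0.0]))
    by_file = add_scores_sketch(toy, {"GLI2": path}, mode="file")
```
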
append(df)

Append dataframe to current meta object

Examples

>>> meta.append(df)
cat_by_min(column_name='prediction', mode='percent', cut_val=1.0)

Get a list of categories that account for at least a minimum percentage, or a minimum number, of the cells in a column.

Parameters:
  • column_name (str) – Name of column used for cut-offs.

  • mode (str) – Mode in “percent” to look at the percent of all cells or “count” to look at the minimum count.

  • cut_val (float) – Cut-off percent or count value to be used.

Returns:

A list of valid categories in column.

Return type:

list

Examples

>>> cats = meta.cat_by_min(column_name="prediction", mode="count", cut_val=50)
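
The filtering logic can be sketched with `value_counts`. This is an illustration under assumptions, not the library's code: the helper `categories_by_min` is hypothetical, and it assumes “at least” means a non-strict `>=` comparison and that percentages are on a 0-100 scale.

```python
import pandas as pd


def categories_by_min(df, column_name="prediction", mode="percent", cut_val=1.0):
    """Hypothetical helper: categories meeting a minimum share or count."""
    counts = df[column_name].value_counts()
    if mode == "percent":
        # Share of all rows, on a 0-100 scale (assumption).
        measure = 100.0 * counts / len(df)
    elif mode == "count":
        measure = counts
    else:
        raise ValueError("mode must be 'percent' or 'count'")
    return measure[measure >= cut_val].index.tolist()


# Toy predictions: 60 T cells, 39 B cells, 1 mast cell
toy = pd.DataFrame({"prediction": ["T cell"] * 60 + ["B cell"] * 39 + ["mast cell"]})
by_count = categories_by_min(toy, mode="count", cut_val=30)
by_percent = categories_by_min(toy, mode="percent", cut_val=5.0)
```

In both calls the single mast cell falls below the cut-off, so only “T cell” and “B cell” are returned.
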
columns()

Return columns of current dataframe

Examples

>>> columns = meta.columns()
Return type:

list

copy()

Return copy of current meta object

Examples

>>> new_meta = meta.copy()
ncell()

Returns number of cells in current meta

Examples

>>> ncell = meta.ncell()
Return type:

int

samphit_boxplot(title='Hits Across Diseases per Sample', hit_label='Hits', swarm=False, title_fs=16, dotsize=3, fe=1, figsize=(6, 4), filename=None)

Plot a boxplot, optionally overlaid with a swarmplot, of per-sample hit percentages by disease.

Parameters:
  • df (pandas.DataFrame) – A pandas dataframe in which each row is a sample labeled by its “disease”, with a “Hit Percentage” column indicating what proportion of that sample's cells are hits.

  • title (str, default: "Hits Across Diseases per Sample") – Plot title

  • hit_label (str, default: "Hits") – label for what to consider hits

  • swarm (bool, default: False) – whether to overlay a swarmplot on the boxplot

  • title_fs (Union[int, float], default: 16) – font size for the title

  • dotsize (Union[int, float], default: 3) – dot size for swarmplot

  • fe (Union[int, float], default: 1) – scaling factor for various sizes

  • figsize (Tuple, default: (6,4)) – figure size

  • filename (Optional[str], default: None) – file name to use if the plot should be saved

Examples

>>> Meta.samphit_boxplot(df=samphit_df)
samphit_df(cell_min=50, samp_min=3, samp_groupby=['sample'], acats=['tissue', 'disease', 'study', 'sample'], dropna=True, hit_col='Hit Percentage', num_dis=15)

Manipulate dataframe to calculate percentage of hits per sample

Parameters:
  • cell_min (int) – minimum number of cells per sample to be considered

  • samp_min (int) – minimum number of qualifying samples for disease to be considered.

  • samp_groupby (list) – columns to group by when applying the minimum number of cells

  • acats (list) – annotation categories to keep for plotting

  • dropna (bool) – drop all diseases named NA

  • hit_col (str) – hit column name to consider

  • num_dis (Optional[int]) – number of diseases to include in chart

Returns:

A sample level dataframe.

Return type:

pandas.DataFrame

Examples

>>> df = meta.samphit_df(num_dis=10)
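
To make the roles of `cell_min` and `samp_min` concrete, here is a minimal pandas sketch of a per-sample hit percentage with both filters. It rests on assumptions and is not the library's implementation: the helper `sample_hit_table`, the boolean hit column `GLI2__hit`, and the fixed grouping by “disease” and “sample” are hypothetical.

```python
import pandas as pd


def sample_hit_table(df, hit_col, cell_min=50, samp_min=3):
    """Hypothetical helper: percent of hit cells per sample, with minimum filters."""
    per_sample = (
        df.groupby(["disease", "sample"])[hit_col]
        .agg(n_cells="size", n_hits="sum")
        .reset_index()
    )
    # Keep only samples with enough cells.
    per_sample = per_sample[per_sample["n_cells"] >= cell_min].copy()
    per_sample["Hit Percentage"] = 100.0 * per_sample["n_hits"] / per_sample["n_cells"]
    # Keep only diseases with enough qualifying samples.
    n_samples = per_sample.groupby("disease")["sample"].transform("nunique")
    return per_sample[n_samples >= samp_min].reset_index(drop=True)


# Toy data: disease A has two 4-cell samples; disease B has one 4-cell
# sample and one 1-cell sample.
toy = pd.DataFrame(
    {
        "disease": ["A"] * 8 + ["B"] * 5,
        "sample": ["a1"] * 4 + ["a2"] * 4 + ["b1"] * 4 + ["b2"],
        "GLI2__hit": [True, False, False, False,
                      True, True, False, False,
                      True, True, True, False,
                      True],
    }
)
result = sample_hit_table(toy, "GLI2__hit", cell_min=2, samp_min=2)
```

In this toy case sample `b2` is dropped for having too few cells, which leaves disease B with a single qualifying sample, so only the two samples of disease A remain.
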
subset_hq(cutoff=0.02, quality_column='prediction_nn_dist', mode='below')

Subset to high-quality cells: those whose quality metric falls below the cut-off or, in “above” mode, above it.

Parameters:
  • cutoff (float, default: 0.02) – Cut-off to use for quality.

  • quality_column (str, default: "prediction_nn_dist") – Column used to score quality metric.

  • mode (str, default: "below") – One of “below” to keep cells with values lower than the cut-off or “above” to keep cells with higher values.

Examples

>>> meta.subset_hq()
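
The effect of the cut-off can be sketched with a boolean mask. The helper `subset_by_quality` and the toy distances are hypothetical, and the strict comparison (`<` rather than `<=`) is an assumption rather than documented behavior.

```python
import pandas as pd


def subset_by_quality(df, cutoff=0.02, quality_column="prediction_nn_dist", mode="below"):
    """Hypothetical helper: keep rows on one side of a quality cut-off."""
    if mode == "below":
        keep = df[quality_column] < cutoff
    elif mode == "above":
        keep = df[quality_column] > cutoff
    else:
        raise ValueError("mode must be 'below' or 'above'")
    return df[keep]


toy = pd.DataFrame({"prediction_nn_dist": [0.005, 0.015, 0.03, 0.08]})
high_quality = subset_by_quality(toy)
```
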
subset_invivo(column_name='in_vivo', in_vivo_val=True)

Subset to in vivo cells using standard SCimilarity columns.

Examples

>>> meta.subset_invivo()
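Parameters:
  • column_name (str, default: "in_vivo") – Name of the column that marks in vivo cells.

  • in_vivo_val (bool | str, default: True) – Value in that column that denotes an in vivo cell.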

top_cells(ncell, column_name, return_df=True)

Get dataframe including only top cells by category.

Parameters:
  • ncell (int) – Number of cells to keep.

  • column_name (str) – Name of column to consider.

  • return_df (bool, default: True) – Return a dataframe instead of modifying the class attribute.

Returns:

A dataframe of top cells.

Return type:

pandas.DataFrame

Examples

>>> meta_top100 = meta.top_cells(100, "GLI2", return_df=True)
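
A comparable selection can be sketched in plain pandas with `DataFrame.nlargest`. This assumes that “top” means the largest values in the chosen column, which the description above does not state explicitly; the toy data are hypothetical.

```python
import pandas as pd

# Toy scores for five cells, indexed by hypothetical cell names.
toy = pd.DataFrame(
    {"GLI2": [0.1, 0.9, 0.4, 0.7, 0.2]},
    index=["cell_a", "cell_b", "cell_c", "cell_d", "cell_e"],
)

# Assumption: "top" cells are those with the largest values in the column.
top2 = toy.nlargest(2, "GLI2")
```

Here `top2` holds `cell_b` and `cell_d`, ordered from highest to lowest score.
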