Native Backend basic usage#
A simple dataframe#
Let’s start with a simple pandas dataframe of the kind you would obtain by summarizing some data. For simplicity, we enter the values manually here rather than computing them.
Using the function pandas_to_report_html, we can convert it to a nicely formatted HTML table.
import pandas as pd
from IPython.display import display
from pysummaries import pandas_to_report_html
# create the dataframe
nums = [["Male", "91 (67.9%)", "28 (49.1%)", "7 (50.0%)", "126 (61.5%)"],
        ["Female", "43 (32.1%)", "29 (50.9%)", "7 (50.0%)", "79 (38.5%)"]]
cols = ["Gender", "Alive", "Melanoma death", "Non-melanoma death", "Overall"]
df = pd.DataFrame(nums, columns=cols)
# transform the dataframe to a nice html table
table = pandas_to_report_html(df)
# show the report table
table
| | Gender | Alive | Melanoma death | Non-melanoma death | Overall |
|---|---|---|---|---|---|
| 0 | Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
| 1 | Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
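The values above were typed in by hand. As a rough sketch of how a similar "count (percent)" table could be computed with plain pandas, the example below starts from a small, made-up patient-level dataframe. The `raw` data and the variable names are assumptions for illustration only, and nothing here is part of pysummaries.

```python
import pandas as pd

# Hypothetical raw data: one row per patient (values invented for illustration)
raw = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Male", "Female"],
    "Status": ["Alive", "Alive", "Alive", "Alive",
               "Melanoma death", "Melanoma death"],
})

# Counts per Gender x Status; margins=True adds an "Overall" row and column
counts = pd.crosstab(raw["Gender"], raw["Status"],
                     margins=True, margins_name="Overall")
counts = counts.drop(index="Overall")  # keep only the per-gender rows

# Column percentages: each count divided by its column total
pct = counts / counts.sum(axis=0) * 100

# Combine counts and percentages into "n (pct%)" strings
summary = counts.astype(str) + " (" + pct.round(1).astype(str) + "%)"
summary = summary.reset_index()
summary.columns.name = None
print(summary)
```

The resulting `summary` has the same layout as `df` above (a Gender column followed by one column per status), so it could be passed to pandas_to_report_html in the same way.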
Hiding the row index#
As you can see, however, the row index of the dataframe is displayed. If that is not what you want, there are two options. The first is to hide the index with the parameter show_index=False, which gives a cleaner table.
table = pandas_to_report_html(df, show_index=False)
table
Gender | Alive | Melanoma death | Non-melanoma death | Overall |
---|---|---|---|---|
Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
Taking advantage of row indexes#
A second alternative is to set the categorical column as the row index. For this particular example it may not be the best option, because the word Gender is no longer shown, but this approach lets us build more powerful tables later.
df2 = df.set_index('Gender')
table = pandas_to_report_html(df2)
table
| | Alive | Melanoma death | Non-melanoma death | Overall |
|---|---|---|---|---|
| Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
| Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
Setting Numbers of cases for columns#
Now that we have set the row index, we can display the number of cases per column below the column names. We do this with the argument strat_numbers, a dictionary whose keys are the column names and whose values are the numbers of cases.
strat_numbers = {"Alive": "134",
                 "Melanoma death": "57",
                 "Non-melanoma death": "14",
                 "Overall": "205"}
table = pandas_to_report_html(df2, strat_numbers=strat_numbers)
table
| | Alive (N=134) | Melanoma death (N=57) | Non-melanoma death (N=14) | Overall (N=205) |
|---|---|---|---|---|
| Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
| Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
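Rather than typing the numbers of cases by hand, a dictionary of the same shape could be derived from the underlying data. The sketch below assumes a hypothetical patient-level dataframe `raw` with a `Status` column (both invented for illustration) and stores the counts as strings, as in the example above.

```python
import pandas as pd

# Hypothetical raw data: one row per patient (values invented for illustration)
raw = pd.DataFrame({
    "Status": ["Alive", "Alive", "Alive", "Melanoma death",
               "Non-melanoma death", "Alive"],
})

# Number of cases per stratum, converted to strings
counts = raw["Status"].value_counts()
strat_numbers = {name: str(n) for name, n in counts.items()}

# The "Overall" column covers every row
strat_numbers["Overall"] = str(len(raw))
print(strat_numbers)
```

The keys must match the column names of the dataframe being rendered, so this only works as-is when the columns are named after the values in `Status`.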
Multiple Row indices#
What if we would like to show statistics broken down not only by gender but also by age? In that case we can use a pandas MultiIndex to set the different levels.
There are many ways to create multi-indices in pandas. Here we create one from existing columns, but there are other ways (for example pd.MultiIndex.from_tuples or pd.MultiIndex.from_product). You can learn more about multi-indices in the pandas documentation.
# build the dataframe
nums = [["Gender", "Male", "91 (67.9%)", "28 (49.1%)", "7 (50.0%)", "126 (61.5%)"],
        ["Gender", "Female", "43 (32.1%)", "29 (50.9%)", "7 (50.0%)", "79 (38.5%)"],
        ["Age", "Mean (SD)", "50.0 (15.9)", "55.1 (17.9)", "65.3 (10.9)", "52.5 (16.7)"],
        ["Age", "Median [min max]", "52.0 [4.00, 84.0]", "56.0 [14.0, 95.0]", "65.0 [49.0, 86.0]", "54.0 [4.00, 95.0]"]]
cols = ["Category1", "Category2", "Alive", "Melanoma death", "Non-melanoma death", "Overall"]
df = pd.DataFrame(nums, columns=cols)
# set the multiindex
df = df.set_index(["Category1", "Category2"])
# strat numbers
strat_numbers = {"Alive": "134",
                 "Melanoma death": "57",
                 "Non-melanoma death": "14",
                 "Overall": "205"}
# beautify and show
table = pandas_to_report_html(df, strat_numbers=strat_numbers)
table
| | Alive (N=134) | Melanoma death (N=57) | Non-melanoma death (N=14) | Overall (N=205) |
|---|---|---|---|---|
| Gender | | | | |
| Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
| Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
| Age | | | | |
| Mean (SD) | 50.0 (15.9) | 55.1 (17.9) | 65.3 (10.9) | 52.5 (16.7) |
| Median [min max] | 52.0 [4.00, 84.0] | 56.0 [14.0, 95.0] | 65.0 [49.0, 86.0] | 54.0 [4.00, 95.0] |
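For comparison, the two alternative pandas constructors mentioned above can be sketched as follows. This uses only pandas. The `Section`/`Gender` product index is an invented illustration and is not used elsewhere on this page.

```python
import pandas as pd

# from_tuples: one (level 1, level 2) tuple per row, in display order
tuples = [("Gender", "Male"), ("Gender", "Female"),
          ("Age", "Mean (SD)"), ("Age", "Median [min max]")]
idx_tuples = pd.MultiIndex.from_tuples(tuples, names=["Category1", "Category2"])

# from_product: every combination of the given levels, so it only fits
# when each outer label has exactly the same inner labels
idx_product = pd.MultiIndex.from_product(
    [["Section 1", "Section 2"], ["Male", "Female"]],
    names=["Section", "Gender"],
)

print(idx_tuples)
print(idx_product)
```

Either index can be passed as `index=` when constructing a dataframe. from_tuples is the more general choice for tables like the one above, where the Gender and Age groups have different sub-rows.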
What if you need even more levels on rows? No problem! You can set an arbitrary number of index levels. Here, for the sake of the example, let’s duplicate the data into Section 1 and Section 2.
nums = [["Section 1", "Gender", "Male", "91 (67.9%)", "28 (49.1%)", "7 (50.0%)", "126 (61.5%)"],
        ["Section 1", "Gender", "Female", "43 (32.1%)", "29 (50.9%)", "7 (50.0%)", "79 (38.5%)"],
        ["Section 1", "Age", "Mean (SD)", "50.0 (15.9)", "55.1 (17.9)", "65.3 (10.9)", "52.5 (16.7)"],
        ["Section 1", "Age", "Median [min max]", "52.0 [4.00, 84.0]", "56.0 [14.0, 95.0]", "65.0 [49.0, 86.0]", "54.0 [4.00, 95.0]"],
        ["Section 2", "Gender", "Male", "91 (67.9%)", "28 (49.1%)", "7 (50.0%)", "126 (61.5%)"],
        ["Section 2", "Gender", "Female", "43 (32.1%)", "29 (50.9%)", "7 (50.0%)", "79 (38.5%)"],
        ["Section 2", "Age", "Mean (SD)", "50.0 (15.9)", "55.1 (17.9)", "65.3 (10.9)", "52.5 (16.7)"],
        ["Section 2", "Age", "Median [min max]", "52.0 [4.00, 84.0]", "56.0 [14.0, 95.0]", "65.0 [49.0, 86.0]", "54.0 [4.00, 95.0]"]]
cols = ["Section", "Category1", "Category2", "Alive", "Melanoma death", "Non-melanoma death", "Overall"]
df = pd.DataFrame(nums, columns=cols)
# set the multiindex
df = df.set_index(["Section", "Category1", "Category2"])
# strat numbers
strat_numbers = {"Alive": "134",
                 "Melanoma death": "57",
                 "Non-melanoma death": "14",
                 "Overall": "205"}
# beautify and show
table = pandas_to_report_html(df, strat_numbers=strat_numbers)
table
| | Alive (N=134) | Melanoma death (N=57) | Non-melanoma death (N=14) | Overall (N=205) |
|---|---|---|---|---|
| Section 1 | | | | |
| Gender | | | | |
| Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
| Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
| Age | | | | |
| Mean (SD) | 50.0 (15.9) | 55.1 (17.9) | 65.3 (10.9) | 52.5 (16.7) |
| Median [min max] | 52.0 [4.00, 84.0] | 56.0 [14.0, 95.0] | 65.0 [49.0, 86.0] | 54.0 [4.00, 95.0] |
| Section 2 | | | | |
| Gender | | | | |
| Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
| Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
| Age | | | | |
| Mean (SD) | 50.0 (15.9) | 55.1 (17.9) | 65.3 (10.9) | 52.5 (16.7) |
| Median [min max] | 52.0 [4.00, 84.0] | 56.0 [14.0, 95.0] | 65.0 [49.0, 86.0] | 54.0 [4.00, 95.0] |
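Instead of retyping every row with a section label, an extra outer index level can also be added with plain pandas. The sketch below stacks two copies of a small two-level table using pd.concat. The `base` table is a shortened stand-in for the data above, and its name and the name `stacked` are chosen for illustration.

```python
import pandas as pd

# A small two-level table (a shortened subset of the example data)
base = pd.DataFrame(
    [["Gender", "Male", "91 (67.9%)"],
     ["Gender", "Female", "43 (32.1%)"]],
    columns=["Category1", "Category2", "Alive"],
).set_index(["Category1", "Category2"])

# Stack two copies; the dict keys become a new outermost index level
stacked = pd.concat({"Section 1": base, "Section 2": base}, names=["Section"])
print(stacked)
```

In a real report the two sections would normally hold different tables. The only requirement for this approach is that they share the same columns.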
Multiple Column indices#
In the same way as we set multiple levels for rows, we can use multiple levels for columns by setting the columns as a multi-index.
Here we set the column and row multi-indices manually from tuples, but there are other ways (for example pd.MultiIndex.from_arrays or pd.MultiIndex.from_product). You can learn more about multi-indices in the pandas documentation.
Notice that the strat_numbers keys have to change to reflect the new column multi-index: each key is now the full tuple that identifies a column.
nums = [["91 (67.9%)", "28 (49.1%)", "7 (50.0%)", "126 (61.5%)"],
        ["43 (32.1%)", "29 (50.9%)", "7 (50.0%)", "79 (38.5%)"],
        ["50.0 (15.9)", "55.1 (17.9)", "65.3 (10.9)", "52.5 (16.7)"],
        ["52.0 [4.00, 84.0]", "56.0 [14.0, 95.0]", "65.0 [49.0, 86.0]", "54.0 [4.00, 95.0]"],
        ["91 (67.9%)", "28 (49.1%)", "7 (50.0%)", "126 (61.5%)"],
        ["43 (32.1%)", "29 (50.9%)", "7 (50.0%)", "79 (38.5%)"],
        ["50.0 (15.9)", "55.1 (17.9)", "65.3 (10.9)", "52.5 (16.7)"],
        ["52.0 [4.00, 84.0]", "56.0 [14.0, 95.0]", "65.0 [49.0, 86.0]", "54.0 [4.00, 95.0]"]]
cols = [("Specific", "Alive", ""),
        ("Specific", "Death", "Melanoma death"),
        ("Specific", "Death", "Non-melanoma death"),
        ("", "Overall", "")]
rows = [("Section 1", "Gender", "Male"), ("Section 1", "Gender", "Female"),
        ("Section 1", "Age", "Mean (SD)"), ("Section 1", "Age", "Median [min max]"),
        ("Section 2", "Gender", "Male"), ("Section 2", "Gender", "Female"),
        ("Section 2", "Age", "Mean (SD)"), ("Section 2", "Age", "Median [min max]")]
df = pd.DataFrame(nums,
                  columns=pd.MultiIndex.from_tuples(cols),
                  index=pd.MultiIndex.from_tuples(rows))
# notice that the strat_numbers keys must now be the full column tuples
strat_numbers = {("Specific", "Alive", ""): "134",
                 ("Specific", "Death", "Melanoma death"): "57",
                 ("Specific", "Death", "Non-melanoma death"): "14",
                 ("", "Overall", ""): "205"}
# beautify and show
table = pandas_to_report_html(df, strat_numbers=strat_numbers)
table
| | Specific | | | |
|---|---|---|---|---|
| | Alive | Death | | Overall |
| | (N=134) | Melanoma death (N=57) | Non-melanoma death (N=14) | (N=205) |
| Section 1 | | | | |
| Gender | | | | |
| Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
| Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
| Age | | | | |
| Mean (SD) | 50.0 (15.9) | 55.1 (17.9) | 65.3 (10.9) | 52.5 (16.7) |
| Median [min max] | 52.0 [4.00, 84.0] | 56.0 [14.0, 95.0] | 65.0 [49.0, 86.0] | 54.0 [4.00, 95.0] |
| Section 2 | | | | |
| Gender | | | | |
| Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
| Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
| Age | | | | |
| Mean (SD) | 50.0 (15.9) | 55.1 (17.9) | 65.3 (10.9) | 52.5 (16.7) |
| Median [min max] | 52.0 [4.00, 84.0] | 56.0 [14.0, 95.0] | 65.0 [49.0, 86.0] | 54.0 [4.00, 95.0] |
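Because the keys of strat_numbers must match the column tuples exactly, one way to avoid typing them twice is to build the dictionary from the column multi-index itself. The following is a minimal sketch using only pandas. The `counts` list is assumed to be in the same order as the columns.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([
    ("Specific", "Alive", ""),
    ("Specific", "Death", "Melanoma death"),
    ("Specific", "Death", "Non-melanoma death"),
    ("", "Overall", ""),
])

# Iterating over a MultiIndex yields one tuple per column, in order,
# so zipping it with the counts gives tuple-keyed entries
counts = ["134", "57", "14", "205"]
strat_numbers = dict(zip(cols, counts))
print(strat_numbers)
```

With an existing dataframe, `df.columns` can take the place of `cols`, which keeps the keys in sync if the column hierarchy is later renamed.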