Sample product marginals dataset
Arguments
- dat
A data.frame to sample from; it must include only covariates.
- n
Number of observations to sample.
- seed
NULL, or a seed for exact reproducibility.
Details
The product marginals dataset is built by sampling each column (feature) independently from the corresponding column of the original dataset. The aim is to disentangle the correlations between features and assess how each feature affects the model predictions individually. The result contains no new values within any column, but it may contain combinations of values not seen in the original data, so it can also be used to check how the model behaves on such unseen observations (new combinations of feature values).
Note that using the product marginals dataset for model sculpting only works if the features contribute approximately additively to the model predictions. In the quite rare case that they do not, models sculpted on the product marginals dataset are expected to perform significantly worse, and the conclusions drawn from them may be misleading.
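The idea can be sketched in a few lines of base R. This is a hypothetical re-implementation (sample_marginals_sketch is not part of the package), assuming each column is drawn with replacement; the repeated wt value 5.250 in the example output below is consistent with that, but the package's own sample_marginals() may differ in details such as RNG usage, so it will not necessarily reproduce the example output exactly. The usage part checks that the strong correlation between wt and hp in mtcars disappears in the sampled data.

```r
# Hypothetical sketch for illustration only; not the package's implementation.
sample_marginals_sketch <- function(dat, n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Draw every column independently, with replacement (an assumption):
  # values within a column come from the original data, but rows are
  # recombined at random, which breaks the dependence between features.
  cols <- lapply(dat, function(col) col[sample.int(length(col), n, replace = TRUE)])
  out <- as.data.frame(cols, stringsAsFactors = FALSE)
  names(out) <- names(dat)
  out
}

pm <- sample_marginals_sketch(mtcars, n = 5000, seed = 543)

# wt and hp are strongly correlated in mtcars ...
cor(mtcars$wt, mtcars$hp)
# ... but close to uncorrelated in the product marginals sample.
cor(pm$wt, pm$hp)
```

Because set.seed() is called inside the function, it changes the global RNG state; a more careful version could save and restore the previous state.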
One can also use the original data instead of the product marginals for model sculpting and compare how the results differ.
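Why additivity matters when comparing the two datasets can be illustrated with plain linear models as a stand-in for sculpted models (this uses lm() only and is not the package's sculpting procedure). For an additive model, the mean prediction depends only on the per-feature marginals, so it should be nearly the same on the original data and on a product marginals sample. With a wt:hp interaction, the mean prediction depends on how wt and hp co-vary, so it shifts once their correlation is removed.

```r
set.seed(1)
n <- 10000
# Product marginals sample: each mtcars column drawn independently with replacement
pm <- as.data.frame(lapply(mtcars, function(col) col[sample.int(length(col), n, replace = TRUE)]))

fit_add <- lm(mpg ~ wt + hp, data = mtcars)  # additive in wt and hp
fit_int <- lm(mpg ~ wt * hp, data = mtcars)  # includes a wt:hp interaction

# Mean prediction on the original data versus on the product marginals
compare_means <- function(fit) {
  c(original  = mean(predict(fit, newdata = mtcars)),
    marginals = mean(predict(fit, newdata = pm)))
}

compare_means(fit_add)  # the two means should be close
compare_means(fit_int)  # the means differ noticeably
```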
Examples
sample_marginals(mtcars, n = 5, seed = 543)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 14.3 6 145.0 113 3.07 5.250 17.98 0 0 4 1
#> 2 22.8 8 360.0 150 3.92 1.513 16.87 1 0 3 2
#> 3 15.5 8 301.0 97 3.15 1.835 16.70 1 0 3 6
#> 4 14.7 4 75.7 110 4.43 5.250 19.44 0 1 4 3
#> 5 19.7 8 472.0 66 2.76 2.780 17.60 1 0 4 8