Create a dataframe of performance metrics for a prediction model
create_metric_df.Rd
This function takes two data frames, train_df and test_df, which contain the training and test data for a prediction model, respectively. It also takes a character string specifying the performance metric to calculate (metric), a character string specifying the prediction model to use (method), an integer specifying the minimum number of neighbors to consider when using the "kknn" method (kmin), a character string specifying the name of the target variable in the data frames (target_variable), and a character vector specifying the names of the predictor variables in the data frames (predictors_vector). The function returns a data frame containing the specified performance metric, the predictor variables, the prediction model method, and the value of kmin (if applicable).
Usage
create_metric_df(
train_df,
test_df,
metric,
method,
kmin = "NA",
target_variable,
predictors_vector
)
Arguments
- train_df
A data frame containing the training data for the prediction model
- test_df
A data frame containing the test data for the prediction model
- metric
A character string specifying the performance metric to calculate: "rmse", "rsq", or "mae"
- method
A character string specifying the prediction model to use: "lm" or "kknn"
- kmin
An integer specifying the minimum number of neighbors to consider when using the "kknn" method; ignored if the "lm" method is used. Defaults to "NA"
- target_variable
A character string specifying the name of the target variable in the data frames
- predictors_vector
A character vector specifying the names of the predictor variables in the data frames
Value
A data frame containing the specified performance metric, the predictor variables, the prediction model method, and the value of kmin (if applicable)
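The shape of the returned data frame can be illustrated with a minimal sketch. The helper build_metric_row below is hypothetical and not part of the package; the column names and the " + " separator between predictors are inferred from the printed output in the Examples section, and the metric value is simply copied from Example 2 rather than computed.

```r
# Hypothetical helper (not part of the package): builds a one-row data frame
# with the same columns that appear in the documented example output.
build_metric_row <- function(target_variable, predictors_vector, metric,
                             metric_value, method, kmin = NA) {
  data.frame(
    outcome = target_variable,
    # Assumption: multiple predictors are joined with " + ", as in "am + vs"
    predictor = paste(predictors_vector, collapse = " + "),
    metric = metric,
    metric_value = metric_value,
    method = method,
    kmin = kmin,
    stringsAsFactors = FALSE
  )
}

# Metric value taken from the documented Example 2 output, not recomputed here
metric_row <- build_metric_row("gear", c("am", "vs"), "mae", 0.3541667,
                               "kknn", kmin = 3)
print(metric_row)
```

Keeping one row per model and metric in this layout would make it straightforward to stack results from several calls with rbind() and compare them.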
Details
The function calculates the specified performance metric ("rmse", "rsq", or "mae") for the prediction model specified by the method argument ("lm" or "kknn"). If method is "kknn", the function uses the kmin argument as the minimum number of neighbors to consider. If kmin is not specified, it defaults to "NA". The function assumes that the target variable and predictor variables have already been identified in the data.
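As a rough illustration of what the three metrics measure, the sketch below computes them in base R for a plain lm fit on raw mtcars rows. The helper compute_metric is hypothetical and not the package's implementation; in particular, treating "rsq" as the squared correlation between observed and predicted values is an assumption, and the data are not preprocessed with target_df, so the values need not match the documented examples.

```r
# Hypothetical helper (not the package's implementation): computes one
# performance metric from observed values and model predictions.
compute_metric <- function(truth, estimate, metric = c("rmse", "rsq", "mae")) {
  metric <- match.arg(metric)
  switch(metric,
    # Root mean squared error
    rmse = sqrt(mean((truth - estimate)^2)),
    # Assumption: R-squared as the squared correlation of truth and prediction
    rsq = cor(truth, estimate)^2,
    # Mean absolute error
    mae = mean(abs(truth - estimate))
  )
}

# Fit on the first 16 rows of mtcars and evaluate on the remaining 16
data(mtcars)
train <- mtcars[1:16, ]
test <- mtcars[17:32, ]
fit <- lm(gear ~ am, data = train)
pred <- predict(fit, newdata = test)

rmse_value <- compute_metric(test$gear, pred, "rmse")
mae_value <- compute_metric(test$gear, pred, "mae")
```

Both rmse and mae are in the units of the target variable, while rsq is unitless, so the metric column is needed to interpret metric_value correctly.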
Examples
# Load data
data(mtcars)
# Example 1: Using single variable regression with lm method
train_df <- target_df(mtcars[1:16, ], "gear", "am", "vs")
test_df <- target_df(mtcars[17:32, ], "gear", "am", "vs")
create_metric_df(train_df, test_df, metric = "rmse", method = "lm", target_variable = "gear", predictors_vector = "am")
#>   outcome predictor metric metric_value method kmin
#> 1    gear        am   rmse    0.5899178     lm   NA
# Example 2: Using k-nearest neighbor method with optimal k
create_metric_df(train_df, test_df, metric = "mae", method = "kknn", kmin = 3, target_variable = "gear", predictors_vector = c("am", "vs"))
#>   outcome predictor metric metric_value method kmin
#> 1    gear   am + vs    mae    0.3541667   kknn    3