
This function takes two data frames, train_df and test_df, that contain the training and test data for a prediction model, respectively. It also takes a character string specifying the performance metric to calculate (metric), a character string specifying the prediction model to use (method), an integer specifying the minimum number of neighbors to consider when using the "kknn" method (kmin), a character string specifying the name of the target variable in the data frames (target_variable), and a character vector specifying the names of the predictor variables in the data frames (predictors_vector). The function returns a data frame containing the specified performance metric, the predictor variables, the prediction model method, and the value of kmin (if applicable).

Usage

create_metric_df(
  train_df,
  test_df,
  metric,
  method,
  kmin = "NA",
  target_variable,
  predictors_vector
)

Arguments

train_df

A data frame containing the training data for the prediction model

test_df

A data frame containing the test data for the prediction model

metric

A character string specifying the performance metric to calculate: "rmse", "rsq", or "mae"

method

A character string specifying the prediction model to use: "lm" or "kknn"

kmin

An integer specifying the minimum number of neighbors to consider when using the "kknn" method; ignored if the "lm" method is used. Defaults to "NA"

target_variable

A character string specifying the name of the target variable in the data frames

predictors_vector

A character vector specifying the names of the predictor variables in the data frames

Value

A data frame containing the specified performance metric, the predictor variables, the prediction model method, and the value of kmin (if applicable)

Details

The function calculates the specified performance metric ("rmse", "rsq", or "mae") for the prediction model specified by the method argument ("lm" or "kknn"). If the method argument is "kknn", the function uses the kmin argument to determine the minimum number of neighbors to consider. If the kmin argument is not specified, it is set to "NA". The function assumes that the target variable and predictor variables have already been identified in the data.
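To make the three metrics concrete, here is a minimal base R sketch of how they might be computed for the "lm" method. The helper compute_metric() is hypothetical and not part of this package, and the way create_metric_df() computes these values internally is not documented here. In particular, the "rsq" definition below (squared correlation between observed and predicted values) is an assumption.

```r
# Hypothetical helper, not part of the package: computes one metric
# from observed and predicted values using base R only.
compute_metric <- function(observed, predicted,
                           metric = c("rmse", "rsq", "mae")) {
  metric <- match.arg(metric)
  errors <- observed - predicted
  switch(metric,
    rmse = sqrt(mean(errors^2)),   # root mean squared error
    mae  = mean(abs(errors)),      # mean absolute error
    # Assumed definition: squared correlation of observed and predicted.
    # Other definitions (such as 1 - SSE/SST) can differ on test data.
    rsq  = cor(observed, predicted)^2
  )
}

# Same split of mtcars as in the Examples section
train <- mtcars[1:16, ]
test  <- mtcars[17:32, ]

# Fit on the training rows, predict on the test rows
fit  <- lm(gear ~ am, data = train)
pred <- predict(fit, newdata = test)

compute_metric(test$gear, pred, "rmse")
compute_metric(test$gear, pred, "mae")
```

On this split the RMSE comes out at roughly 0.59, in line with the value reported in Example 1.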

Examples

# Load data
data(mtcars)

# Example 1: Using single variable regression with lm method
train_df <- target_df(mtcars[1:16, ], 'gear', "am", "vs")
test_df <- target_df(mtcars[17:32, ], 'gear', "am", "vs")
create_metric_df(train_df, test_df, metric = "rmse", method = "lm", target_variable = "gear", predictors_vector = "am")
#>   outcome predictor metric metric_value method kmin
#> 1    gear        am   rmse    0.5899178     lm   NA

# Example 2: Using k-nearest neighbor method with optimal k
create_metric_df(train_df, test_df, metric = "mae", method = "kknn", kmin = 3, target_variable = "gear", predictors_vector = c("am", "vs"))
#>   outcome predictor metric metric_value method kmin
#> 1    gear   am + vs    mae    0.3541667   kknn    3