Chapter 6, Question 2

For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

2(a) The lasso, relative to least squares, is:

2(a)i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

2(a)ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

2(a)iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

2(a)iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

Regarding i and ii: the lasso constrains the coefficient estimates, so it is less flexible than least squares, not more.
Regarding iv: decreasing flexibility decreases variance and increases bias, so the pairing in iv is backwards.
Statement iii applies to the lasso: prediction accuracy improves when its increase in bias is less than its decrease in variance.

2(b) Repeat (a) for ridge regression relative to least squares.

Statement iii also applies to ridge regression: like the lasso, it shrinks the coefficient estimates toward zero and is therefore less flexible than least squares.

2(c) Repeat (a) for non-linear methods relative to least squares.

Statement ii applies to non-linear methods: they are more flexible than least squares, so they improve prediction accuracy when their increase in variance is less than their decrease in bias.

Chapter 6, Question 9

In this exercise, we will predict the number of applications received using the other variables in the College data set.

9(a) Split the data set into a training set and a test set.

set.seed(2^17-1)
library(ISLR)
# shuffle the row indices, then take the first 80% for training
n.tot = nrow(College)
shuffle = sample(1:n.tot)
n.trn = round(0.8 * n.tot)
trn = College[shuffle[1:n.trn],]
tst = College[shuffle[(n.trn+1):n.tot],]
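
For reference, caret also offers a stratified splitter; a sketch of an equivalent split (not the one used for the results below):

# hypothetical alternative: stratify the 80/20 split on the response
idx = caret::createDataPartition(College$Apps, p = 0.8, list = FALSE)
trn2 = College[idx,]
tst2 = College[-idx,]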

9(b) Fit a linear model using least squares on the training set, and report the test error obtained.

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
# 5-fold cross-validation repeated 50 times (250 resamples per model)
trControl = trainControl("repeatedcv", number = 5, repeats = 50)
model.glm = train(Apps ~ ., data = trn, method = "glm", trControl = trControl)
model.glm
## Generalized Linear Model 
## 
## 622 samples
##  17 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 50 times) 
## Summary of sample sizes: 497, 499, 497, 497, 498, 496, ... 
## Resampling results:
## 
##   RMSE      Rsquared 
##   1130.731  0.9188572
## 
## 
mean((tst$Apps - predict(model.glm, newdata = tst))^2)
## [1] 1247649

9(c) Fit a ridge regression model on the training set, with lambda chosen by cross-validation. Report the test error obtained.
9(d) Fit a lasso model on the training set, with lambda chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

Parts (c) and (d) are answered together below by tuning glmnet over a grid of alpha and lambda values; alpha = 0 gives ridge regression and alpha = 1 gives the lasso.

model.glmnet = train(Apps ~ ., data = trn, method = "glmnet", trControl = trControl,
                     tuneGrid = expand.grid(alpha = seq(0, 1, 0.1), lambda = c(1, 10, 100)))
## Loading required package: glmnet
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-5
model.glmnet    # alpha = 0 for ridge regression, alpha = 1 for lasso
## glmnet 
## 
## 622 samples
##  17 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 50 times) 
## Summary of sample sizes: 498, 496, 499, 498, 497, 498, ... 
## Resampling results across tuning parameters:
## 
##   alpha  lambda  RMSE      Rsquared 
##   0.0      1     1229.021  0.9099608
##   0.0     10     1229.021  0.9099608
##   0.0    100     1229.021  0.9099608
##   0.1      1     1134.487  0.9172683
##   0.1     10     1134.804  0.9174515
##   0.1    100     1156.368  0.9179035
##   0.2      1     1133.915  0.9173020
##   0.2     10     1134.008  0.9175924
##   0.2    100     1153.589  0.9181616
##   0.3      1     1133.352  0.9173404
##   0.3     10     1133.226  0.9177263
##   0.3    100     1151.810  0.9181914
##   0.4      1     1132.827  0.9173748
##   0.4     10     1132.475  0.9178497
##   0.4    100     1152.478  0.9177699
##   0.5      1     1132.509  0.9173871
##   0.5     10     1131.735  0.9179661
##   0.5    100     1154.180  0.9171267
##   0.6      1     1132.429  0.9173738
##   0.6     10     1131.018  0.9180686
##   0.6    100     1154.635  0.9166191
##   0.7      1     1132.350  0.9173631
##   0.7     10     1130.272  0.9181694
##   0.7    100     1153.106  0.9163848
##   0.8      1     1132.185  0.9173623
##   0.8     10     1129.556  0.9182535
##   0.8    100     1150.472  0.9163189
##   0.9      1     1132.014  0.9173683
##   0.9     10     1128.805  0.9183462
##   0.9    100     1147.584  0.9162722
##   1.0      1     1131.715  0.9173789
##   1.0     10     1128.126  0.9184219
##   1.0    100     1145.034  0.9161823
## 
## RMSE was used to select the optimal model using  the smallest value.
## The final values used for the model were alpha = 1 and lambda = 10.
plot(model.glmnet)

mean((tst$Apps - predict(model.glmnet, newdata = tst))^2)
## [1] 1242704
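
Part (d) also asks for the number of non-zero coefficient estimates. Since the selected pair was alpha = 1 (the lasso), the count can be read off the final fit; a short sketch:

# coefficients at the selected lambda; the lasso sets some exactly to zero
beta.hat = coef(model.glmnet$finalModel, s = model.glmnet$bestTune$lambda)
sum(beta.hat != 0) - 1    # non-zero coefficients, excluding the intercept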

9(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

model.pcr = train(Apps ~ ., data = trn, method = "pcr", trControl = trControl,
                  tuneGrid = expand.grid(ncomp = 1:(ncol(trn)-1)))
## Loading required package: pls
## 
## Attaching package: 'pls'
## The following object is masked from 'package:stats':
## 
##     loadings
model.pcr
## Principal Component Analysis 
## 
## 622 samples
##  17 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 50 times) 
## Summary of sample sizes: 498, 496, 498, 497, 499, 499, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared  
##    1     3801.294  0.02097751
##    2     1815.887  0.79782108
##    3     1791.105  0.80350868
##    4     1757.499  0.81013587
##    5     1277.678  0.89843786
##    6     1194.210  0.90676880
##    7     1195.847  0.90645382
##    8     1197.849  0.90479897
##    9     1199.573  0.90449384
##   10     1168.731  0.90949458
##   11     1146.874  0.91298671
##   12     1148.171  0.91278272
##   13     1154.656  0.91235840
##   14     1153.297  0.91267723
##   15     1130.702  0.91620221
##   16     1134.364  0.91562785
##   17     1128.717  0.91690167
## 
## RMSE was used to select the optimal model using  the smallest value.
## The final value used for the model was ncomp = 17.
mean((tst$Apps - predict(model.pcr, newdata = tst))^2)
## [1] 1247649

9(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

model.pls = train(Apps ~ ., data = trn, method = "pls", trControl = trControl,
                  tuneGrid = expand.grid(ncomp = 1:(ncol(trn)-1)))
model.pls
## Partial Least Squares 
## 
## 622 samples
##  17 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 50 times) 
## Summary of sample sizes: 497, 498, 498, 498, 497, 497, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared 
##    1     1778.728  0.8092939
##    2     1636.100  0.8386257
##    3     1416.261  0.8790881
##    4     1224.444  0.9069937
##    5     1192.859  0.9101475
##    6     1193.877  0.9095076
##    7     1194.862  0.9083396
##    8     1189.284  0.9089712
##    9     1180.327  0.9105525
##   10     1141.227  0.9165210
##   11     1135.013  0.9176930
##   12     1131.452  0.9187883
##   13     1122.976  0.9196680
##   14     1123.472  0.9195098
##   15     1124.836  0.9193441
##   16     1125.491  0.9192675
##   17     1120.640  0.9203231
## 
## RMSE was used to select the optimal model using  the smallest value.
## The final value used for the model was ncomp = 17.
mean((tst$Apps - predict(model.pls, newdata = tst))^2)
## [1] 1247649

9(g) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

# collect, for each model, the CV RMSE of its best tuning row, that row's
# RMSE standard deviation, and the test-set MSE
best.row = function(model) model$results[which.min(model$results[,"RMSE"]),]
tst.mse = function(model) mean((tst$Apps - predict(model, newdata = tst))^2)
models = list(model.glm, model.glmnet, model.pcr, model.pls)
results = data.frame(
  cv.rmse    = sapply(models, function(m) best.row(m)[,"RMSE"]),
  cv.rmse.sd = sapply(models, function(m) best.row(m)[,"RMSESD"]),
  tst.mse    = sapply(models, tst.mse)
)
results
##    cv.rmse cv.rmse.sd tst.mse
## 1 1130.731   239.3209 1247649
## 2 1128.126   229.0373 1242704
## 3 1128.717   234.4598 1247649
## 4 1120.640   235.2747 1247649

The cross-validation error is similar for these models. The PLS model has the lowest cross-validation error; however, the cross-validation error for the regularized glmnet model (the lasso with lambda = 10) is within one standard error of the lowest, and the glmnet fit is less complex than the PLS model. (Note that PCR and PLS both selected ncomp = 17, i.e. all components, which reproduces the least squares fit; that is why their test MSEs match the value from 9(b).)
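
A rough check of the one-standard-error claim, treating the 250 resamples (5 folds x 50 repeats) as independent:

best = which.min(results$cv.rmse)
# TRUE for each model whose cv.rmse is within one standard error of the lowest
results$cv.rmse <= results$cv.rmse[best] + results$cv.rmse.sd[best] / sqrt(250)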

Based on the test set, we expect the root mean squared error of the lasso model's predictions to be about sqrt(1242704) ≈ 1115 applications.

Kaggle: https://inclass.kaggle.com/c/ml210-twitter

set.seed(2^17-1)
setwd("C:/Data/Twitter")
start.time = Sys.time()

# features as headerless CSVs, response as one value per line
trn_X = read.csv("trn_X.csv", header = F)
trn_y = scan("trn_y.txt")
tst_X = read.csv("tst_X.csv", header = F)

# vif() will go nuts with this much collinearity -- instead, screen every
# pair of features for high correlation and for identical values
relationships = array(0, c(choose(ncol(trn_X), 2), 4))
colnames(relationships) = c("var i", "var j", "correlation", "% equal")
index = 1
for (i in 1:(ncol(trn_X) - 1)) {
    for (j in (i+1):ncol(trn_X)) {
        r = abs(cor(trn_X[,i], trn_X[,j]))
        relationships[index,1] = i
        relationships[index,2] = j
        relationships[index,3] = r
        relationships[index,4] = mean(trn_X[,i] == trn_X[,j])    # fraction of identical rows
        index = index + 1
    }
}
relationships[relationships[,"correlation"] > 0.98,]
##       var i var j correlation   % equal
##  [1,]     1    29   0.9966277 0.6583386
##  [2,]     1    71   0.9999940 0.8379700
##  [3,]     2    30   0.9967411 0.6641543
##  [4,]     2    72   0.9999946 0.8461174
##  [5,]     3    31   0.9970468 0.6487544
##  [6,]     3    73   0.9999953 0.8425306
##  [7,]     4    32   0.9972989 0.6354736
##  [8,]     4    74   0.9999959 0.8340129
##  [9,]     5    33   0.9975812 0.6228890
## [10,]     5    75   0.9999962 0.8254608
## [11,]     6    34   0.9977051 0.6055551
## [12,]     6    76   0.9999966 0.8177660
## [13,]     7    35   0.9977106 0.6058088
## [14,]     7    77   0.9999964 0.8133665
## [15,]    22    43   0.9871472 0.8379700
## [16,]    23    44   0.9897143 0.8461174
## [17,]    24    45   0.9926254 0.8425306
## [18,]    25    46   0.9913779 0.8340129
## [19,]    26    47   0.9902731 0.8254608
## [20,]    27    48   0.9834683 0.8177660
## [21,]    28    49   0.9822105 0.8133665
## [22,]    29    71   0.9968107 0.6678028
## [23,]    30    72   0.9969087 0.6732688
## [24,]    31    73   0.9971936 0.6570184
## [25,]    32    74   0.9974353 0.6434736
## [26,]    33    75   0.9977039 0.6312319
## [27,]    34    76   0.9978209 0.6140763
## [28,]    35    77   0.9978283 0.6148238
# redundant feature sets!
# 1, 29, 71
# 2, 30, 72
# 3, 31, 73
# 4, 32, 74
# 5, 33, 75
# 6, 34, 76
# 7, 35, 77
# 22, 43
# 23, 44
# 24, 45
# 25, 46
# 26, 47
# 27, 48
# 28, 49
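
For reference, the same screen can be written without the explicit double loop; a vectorized sketch:

r = abs(cor(trn_X))
high = which(r > 0.98 & upper.tri(r), arr.ind = TRUE)    # same pairs as above
high[order(high[,"row"]),]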

# keep the first member of each redundant set, drop the rest
trn_X = trn_X[, c(1:28, 36:42, 50:70)]
tst_X = tst_X[, c(1:28, 36:42, 50:70)]

trn = data.frame(trn_X, y = trn_y)
model = lm(y ~ ., data = trn)
library(car)
# Bonferroni-adjusted test on the studentized residuals
outliers = as.integer(names(outlierTest(model)$rstudent))
outliers
##  [1]  34642  63575 122239 185980 206771 218091 245314 255108 271907 273154
leverage = which(cooks.distance(model) > 1)    # high-influence points
leverage
##  29342  34642  63575  66084  89884 122239 123068 123989 134191 138117 
##  29342  34642  63575  66084  89884 122239 123068 123989 134191 138117 
## 177483 185980 204689 266127 271907 273154 
## 177483 185980 204689 266127 271907 273154
remove = unique(sort(c(outliers, leverage)))
trn_X = trn_X[-remove,]
trn_y = trn_y[-remove]

# unnormalized Haar wavelet matrix
H = matrix(c( 1,  1,  1,  1,  1,  1,  1,  1,    # sum
              1,  1,  1,  1, -1, -1, -1, -1,    # difference of two halves
              1,  1, -1, -1,  0,  0,  0,  0,    # ...
              0,  0,  0,  0,  1,  1, -1, -1,
              1, -1,  0,  0,  0,  0,  0,  0,
              0,  0,  1, -1,  0,  0,  0,  0,
              0,  0,  0,  0,  1, -1,  0,  0,
              0,  0,  0,  0,  0,  0,  1, -1),
           nrow = 8, byrow = T)

# keep the lower-right 7x7 block: applied to 7 values, it behaves as if each
# block were padded with a leading zero (and drops the overall sum)
Hp = H[2:8,2:8]
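
As a sanity check, the rows of H are mutually orthogonal, so H %*% t(H) is diagonal, with the squared row norms on the diagonal:

H %*% t(H)    # diag(8, 8, 4, 4, 2, 2, 2, 2), zeros off the diagonal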

# For each of the eight 7-column blocks, replace the raw values with their
# Haar coefficients and append the block's per-row median, so every block
# contributes 8 columns (H1-H7, M) matching the names assigned below.
haar.median = function(X) {
    do.call(cbind, lapply(0:7, function(b) {
        block = as.matrix(X[, b*7 + 1:7])
        cbind(block %*% t(Hp), apply(block, 1, median))
    }))
}
trn_X_new = haar.median(trn_X)
tst_X_new = haar.median(tst_X)

vars = c("NCD", "AI", "ASNA", "BL", "ASNAC", "AT", "NA", "ADL")
type = c("H1", "H2", "H3", "H4", "H5", "H6", "H7", "M")
namelist = c(paste(vars[1], type, sep = "_"),
             paste(vars[2], type, sep = "_"),
             paste(vars[3], type, sep = "_"),
             paste(vars[4], type, sep = "_"),
             paste(vars[5], type, sep = "_"),
             paste(vars[6], type, sep = "_"),
             paste(vars[7], type, sep = "_"),
             paste(vars[8], type, sep = "_"))
colnames(trn_X_new) = namelist
colnames(tst_X_new) = namelist

max.p = ncol(trn_X_new) / 2    # consider subsets of up to half the features
library(leaps)
subsets = regsubsets(trn_X_new, trn_y, nvmax = max.p, method = "backward")

p = max.p
# one row per subset size: size, mean RMSE, RMSE SD, then the 25 fold-wise RMSEs
results = array(0, c(p, 28))
colnames(results) = list("Variable Count", "RMSE", "RMSE SD",
                         "RMSE R01", "RMSE R02", "RMSE R03", "RMSE R04", "RMSE R05",
                         "RMSE R06", "RMSE R07", "RMSE R08", "RMSE R09", "RMSE R10",
                         "RMSE R11", "RMSE R12", "RMSE R13", "RMSE R14", "RMSE R15",
                         "RMSE R16", "RMSE R17", "RMSE R18", "RMSE R19", "RMSE R20",
                         "RMSE R21", "RMSE R22", "RMSE R23", "RMSE R24", "RMSE R25")
best.loss = Inf
best.subset = NA
best.model = NA
start.time = Sys.time()    # restart the clock for the model-selection loop
library(caret)
trControl = trainControl("repeatedcv", number = 5, repeats = 5)
for (i in 1:p) {
    # refit using only the variables backward selection kept at size i
    selected = which(summary(subsets)$outmat[i,] == "*")
    model = train(as.matrix(trn_X_new[,selected]), trn_y, method = "glm", trControl = trControl)
    results[i,1] = i
    results[i,2] = model$results[,"RMSE"]
    results[i,3] = model$results[,"RMSESD"]
    results[i,4:28] = model$resample[,"RMSE"]
    if (results[i,2] < best.loss) {
        best.loss = results[i,2]
        best.subset = selected
        best.model = model
    }
    if (i == 1) {
        # freeze the first model's folds so every later model is evaluated
        # on identical resamples (enables the paired test further down)
        trControl = trainControl("repeatedcv", number = 5, repeats = 5, index = model$control$index)
    }
}
results = results[order(results[,2]),]
results[,1:3]
##       Variable Count     RMSE   RMSE SD
##  [1,]             23 124.0509  2.850015
##  [2,]             25 124.0515  2.825771
##  [3,]             24 124.0549  2.823891
##  [4,]             26 124.0633  2.829938
##  [5,]             27 124.0652  2.835105
##  [6,]             22 124.0711  2.902515
##  [7,]             28 124.0755  2.839648
##  [8,]             32 124.0887  2.824201
##  [9,]             29 124.0895  2.831793
## [10,]             31 124.0895  2.825100
## [11,]             30 124.0933  2.830076
## [12,]             21 124.1285  2.912591
## [13,]             20 124.1992  2.941170
## [14,]             19 124.2875  2.974100
## [15,]             18 124.3825  2.972516
## [16,]             17 124.4857  3.018326
## [17,]             16 124.4980  3.025227
## [18,]             15 124.6198  3.108199
## [19,]             14 124.7121  3.160405
## [20,]             13 124.9464  3.084340
## [21,]             12 125.1789  3.128515
## [22,]             11 125.7637  3.150650
## [23,]             10 127.4467  3.349085
## [24,]              9 129.8103  3.535201
## [25,]              8 130.5367  3.672816
## [26,]              7 133.5558  3.579327
## [27,]              6 137.1651  3.735532
## [28,]              5 140.8570  4.252061
## [29,]              4 147.7805  5.507917
## [30,]              3 164.6736  8.504385
## [31,]              2 178.6657  8.131986
## [32,]              1 241.3267 10.763414
resample.count = nrow(best.model$resample)
best.model$results
##   parameter     RMSE  Rsquared   RMSESD  RsquaredSD
## 1      none 124.0509 0.9534889 2.850015 0.003240858
best.model$resample
##        RMSE  Rsquared   Resample
## 1  124.5772 0.9524852 Fold1.Rep1
## 2  127.1163 0.9563523 Fold2.Rep1
## 3  120.2275 0.9519246 Fold3.Rep1
## 4  121.2666 0.9559187 Fold4.Rep1
## 5  127.1506 0.9513295 Fold5.Rep1
## 6  125.6395 0.9516227 Fold1.Rep2
## 7  123.1399 0.9532161 Fold2.Rep2
## 8  122.6864 0.9561940 Fold3.Rep2
## 9  122.1094 0.9501842 Fold4.Rep2
## 10 127.5716 0.9566635 Fold5.Rep2
## 11 126.5753 0.9539469 Fold1.Rep3
## 12 121.2510 0.9480023 Fold2.Rep3
## 13 123.1746 0.9555970 Fold3.Rep3
## 14 126.1254 0.9483468 Fold4.Rep3
## 15 123.0001 0.9602151 Fold5.Rep3
## 16 129.7803 0.9500548 Fold1.Rep4
## 17 122.2231 0.9540476 Fold2.Rep4
## 18 122.9460 0.9525887 Fold3.Rep4
## 19 121.0668 0.9608974 Fold4.Rep4
## 20 123.5350 0.9500228 Fold5.Rep4
## 21 127.8841 0.9557397 Fold1.Rep5
## 22 121.5952 0.9517692 Fold2.Rep5
## 23 118.2577 0.9521097 Fold3.Rep5
## 24 125.3027 0.9547981 Fold4.Rep5
## 25 127.0693 0.9531948 Fold5.Rep5
mean(best.model$resample[,"RMSE"])
## [1] 124.0509
sd(best.model$resample[,"RMSE"])
## [1] 2.850015
# standard error of the mean: check out "?oneSE" 
sd(best.model$resample[,"RMSE"]) / sqrt(resample.count)
## [1] 0.5700031
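
Applying the one-SE rule by hand: among subset sizes whose cross-validated RMSE falls within one standard error of the best, prefer the smallest. A sketch:

se = sd(best.model$resample[,"RMSE"]) / sqrt(resample.count)
ok = results[results[,"RMSE"] <= best.loss + se, , drop = FALSE]
ok[which.min(ok[,"Variable Count"]), 1:3]    # smallest subset within one SE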
# two-sample (Welch) t test, treating the two sets of resampled RMSEs as independent
t.test(results[1,4:28], results[2,4:28])
## 
##  Welch Two Sample t-test
## 
## data:  results[1, 4:28] and results[2, 4:28]
## t = -0.00079593, df = 47.996, p-value = 0.9994
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.614548  1.613270
## sample estimates:
## mean of x mean of y 
##  124.0509  124.0515
difference = results[1,"RMSE"] - results[2,"RMSE"]
# standard error of the difference of two independent sample means
difference.stderr = sqrt((results[1,"RMSE SD"]^2) / resample.count +
                         (results[2,"RMSE SD"]^2) / resample.count)
test.statistic = difference / difference.stderr
test.statistic
##          RMSE 
## -0.0007959327
# Welch-Satterthwaite approximation to the degrees of freedom
df = (difference.stderr^4) / 
     ((results[1,"RMSE SD"]^4) / ((resample.count^2)*(resample.count-1)) +
      (results[2,"RMSE SD"]^4) / ((resample.count^2)*(resample.count-1)))
df
## RMSE SD 
## 47.9965
2*pt(-abs(test.statistic), df)
##      RMSE 
## 0.9993682
# paired (one-sample) t test on the fold-wise differences; valid because the
# index argument to trainControl() gave every model identical folds
difference = results[1,4:28] - results[2,4:28]
t.test(difference)
## 
##  One Sample t-test
## 
## data:  difference
## t = -0.035759, df = 24, p-value = 0.9718
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.03751298  0.03623521
## sample estimates:
##    mean of x 
## -0.000638883
mean.difference = mean(difference)
difference.stderr = sqrt(var(difference) / resample.count)
test.statistic = mean.difference / difference.stderr
test.statistic
## [1] -0.03575924
df = resample.count - 1
df
## [1] 24
2*pt(-abs(test.statistic), df)
## [1] 0.9717701
# comparing the t and normal distributions
x = seq(-4, 4, 0.001)
plot(x, dt(x, 30), typ = "l", lty = "dotted", xlab = "", ylab = "")
lines(x, dnorm(x))
lines(x, dt(x, 5), lty="dotted", col = "red")
legend("topleft", legend = c("t(5)", "t(30)", "norm(0,1)"),
       lty = c("dotted", "dotted", "solid"),
       col = c("red", "black", "black"))

# elastic net over the selected subset: alpha mixes ridge (0) and lasso (1)
tuneGrid = expand.grid(alpha = c(0, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 1),
                       lambda = c(0, 0.1, 1, 10, 100))
model = train(as.matrix(trn_X_new[,best.subset]), trn_y, method = "glmnet",
              trControl = trControl, tuneGrid = tuneGrid)
model
## glmnet 
## 
## 291605 samples
##     23 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 233284, 233285, 233284, 233283, 233284, 233286, ... 
## Resampling results across tuning parameters:
## 
##   alpha  lambda  RMSE      Rsquared 
##   0.00     0.0   133.1263  0.9469664
##   0.00     0.1   133.1263  0.9469664
##   0.00     1.0   133.1263  0.9469664
##   0.00    10.0   133.1263  0.9469664
##   0.00   100.0   139.4779  0.9422774
##   0.10     0.0   124.5517  0.9531257
##   0.10     0.1   124.5517  0.9531257
##   0.10     1.0   124.5642  0.9531170
##   0.10    10.0   127.8952  0.9506725
##   0.10   100.0   146.4523  0.9375175
##   0.25     0.0   124.2576  0.9533400
##   0.25     0.1   124.2576  0.9533400
##   0.25     1.0   124.6625  0.9530466
##   0.25    10.0   128.7765  0.9500575
##   0.25   100.0   151.5284  0.9360753
##   0.50     0.0   124.3212  0.9532924
##   0.50     0.1   124.3212  0.9532924
##   0.50     1.0   125.0899  0.9527347
##   0.50    10.0   130.8518  0.9485646
##   0.50   100.0   163.3900  0.9324531
##   0.75     0.0   124.5133  0.9531590
##   0.75     0.1   124.5133  0.9531590
##   0.75     1.0   125.4925  0.9524360
##   0.75    10.0   133.8025  0.9463458
##   0.75   100.0   177.2371  0.9283531
##   0.90     0.0   124.4005  0.9532382
##   0.90     0.1   124.4054  0.9532352
##   0.90     1.0   125.7694  0.9522303
##   0.90    10.0   135.0048  0.9454623
##   0.90   100.0   187.9930  0.9229503
##   0.95     0.0   124.3848  0.9532486
##   0.95     0.1   124.3924  0.9532433
##   0.95     1.0   125.8666  0.9521577
##   0.95    10.0   135.3254  0.9452351
##   0.95   100.0   192.1861  0.9201155
##   1.00     0.0   124.4168  0.9532236
##   1.00     0.1   124.4243  0.9532192
##   1.00     1.0   125.9606  0.9520874
##   1.00    10.0   135.6534  0.9450020
##   1.00   100.0   195.7072  0.9179372
## 
## RMSE was used to select the optimal model using  the smallest value.
## The final values used for the model were alpha = 0.25 and lambda = 0.1.
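
Because trControl still carries the frozen fold index, this fit can be compared with best.model using the same paired test as above; a sketch (merging on the Resample label to align folds):

m = merge(model$resample, best.model$resample, by = "Resample")
t.test(m$RMSE.x - m$RMSE.y)    # paired difference in fold-wise RMSE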
predictions = predict(model, tst_X_new[, best.subset])
predictions[predictions < 0] = 0.0    # the response is non-negative, so clamp
output = data.frame(Index = 1:length(predictions), Prediction = predictions)
write.csv(output, "predictions.csv", quote=F, row.names = F)

Sys.time() - start.time
## Time difference of 18.7484 mins
lambda = model$bestTune$lambda    # lambda of the winning (alpha, lambda) pair
coef(model$finalModel, s = lambda)
## 24 x 1 sparse Matrix of class "dgCMatrix"
##                         1
## (Intercept)  2.166374e+00
## NCD_H1      -2.395074e-01
## NCD_H2      -1.960525e-01
## NCD_H3      -2.254694e-01
## NCD_H4      -4.295290e-01
## NCD_H7      -3.657048e-01
## NCD_M       -1.395812e-01
## AI_H2       -7.892024e-02
## AI_H3        1.248439e-01
## AI_H4       -1.681806e-01
## AI_H7        5.151869e+04
## AI_M         3.740595e+04
## ASNA_H1      9.012873e+04
## ASNA_H2     -2.326409e+04
## ASNA_H3      8.965083e+04
## ASNA_H5      1.512967e+05
## BL_M         4.176294e+04
## ASNAC_H1    -1.397524e+05
## AT_H3        1.601273e-01
## AT_H5        5.447866e-02
## AT_H6        6.638809e-02
## AT_H7        1.189471e-01
## ADL_H1       1.739765e-01
## ADL_H3       1.795228e+05