Neural networks applied to the internet demographics dataset.

Key point. Non-linear methods often have an inherent "instability": they are sensitive to small changes in initial conditions. Here, one component of the initial conditions is the set of starting weights for the network. If the starting weights change by a small amount, the fitted weights can change by a large amount.

Approach: fit the network many times, each time from a new starting position, and let the networks "vote" on the classification. Calibrate the voting rule within sample and see how it works out of sample.

We run the net 10 times. Our rule: if 5 or more of the 10 nets vote for a mailing, then we mail that person. This is an example of a "majority rule"; such rules are popular ways of combining forecasts.

The main S-Plus loop for refitting the network:

    votes <- rep(0, nrow(uva))   # running vote count, one slot per person
    for (i in 1:10) {
      # each call to nnet() starts from fresh random weights
      nnet.out <- nnet(as.factor(Newbie) ~ Age + Household.Income + Gender +
                         Major.Occupation + Marital.Status + Education.Attainment,
                       data = uva, maxit = 100, subset = samp,
                       size = 9, skip = T, decay = .0001)
      # a net votes "mail" when its estimated P(Newbie) exceeds 1/3
      votes <- votes + ifelse(predict(nnet.out, uva, type = "raw") > .3333, 1, 0)
    }

First, some naive approaches.

1. The marginal probability of being a Newbie is 1/4. Each mailing costs $10 and each Newbie reached brings in $30, so the expected profit per mailing is 30p - 10, which is positive only when p > 1/3. Since 1/4 < 1/3, mail to no one. Expected profit = $0.

2. Since the marginal probability of being a Newbie is 1/4, randomly mail to 1/4 of the people. Expected profit on 11414 people = -$7129.

3. Run a logistic regression with interactions. My logistic regression with interaction reaches 1481 Newbies out of 3112 people mailed, making 1481 * 30 - 3112 * 10 = $13310.

Network with 3 nodes in the hidden layer.

Training sample calibration table (rows: actual status; columns: number of votes out of 10):

    Votes    0    1    2    3    4    5    6    7    8    9   10
    OLD   3695  423  231  188  194  125  121  165  182  203  272
    NEW    420  128   67   80   97   80   86  111  130  165  445

Interpretation: 717 points received unanimous "Newbie" verdicts, that is, all 10 nets voted these points as Newbies. Of these, 445 actually were Newbies, whereas 272 were not. With our specific cost function, it makes sense to mail (expected profit > 0) if the probability of being a Newbie is greater than one third. This cutoff happens at around 5 votes: the in-sample Newbie fraction is 80/205 = 0.39 at 5 votes, but only 97/291 = 0.33 at 4 votes.

See how the rule works out of sample.

Validation sample table (rows: actual status, 0 = old, 1 = Newbie):

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     5207  569  311  298  277  202  210  256  263  314  522
    1      783  200  132  148  149  137  154  153  201  289  639

Mailing everyone with 5 or more votes gives the final 2 x 2 table

              Predict
                 0     1
    Actual 0  6662  1767
    Actual 1  1412  1573

and a profit of 1573 * 30 - 3340 * 10 = $13790, a 3.6 percent improvement over the logistic regression.

Network with 5 nodes in the hidden layer (size = 5).

Training sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     3629  416  282  212  200  187  170  185  157  149  212
    1      363  116   97   95   72   87  101  136  143  191  408

Validation sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     5069  584  383  300  245  305  247  298  232  339  427
    1      709  221  170  158  158  181  191  230  217  252  498

Profit: $12900 (1569 Newbies among 3417 mailed).

Network with 10 nodes in the hidden layer.

Training sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     3314  588  397  322  247  227  186  129  132  113  144
    1      225   94  103  106  103  106  129  125  152  225  441

Validation sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     4445  775  516  422  396  375  348  272  265  277  338
    1      555  253  251  205  194  221  227  220  215  227  417

              Predict
                 0     1
    Actual 0  6554  1875
    Actual 1  1458  1527

Profit: $11,790.

Network with 20 nodes in the hidden layer.

Training sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     3494  577  336  234  207  162  160  117  107  176  229
    1      258   97   83   54   87   93  111   96  128  255  547

Validation sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     4506  758  519  381  318  265  268  228  304  387  495
    1      659  258  212  198  179  150  188  176  183  290  492

              Predict
                 0     1
    Actual 0  6482  1947
    Actual 1  1506  1479

Profit = $10,110.
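The profit bookkeeping can be checked mechanically. Below is a minimal S-Plus/R sketch, not part of the original session: vote.profit and nn3.val are hypothetical names, and the row labels are assumed to be "0" (old) and "1" (Newbie). It recovers the $13790 figure from the 3-node validation table.

    # Hypothetical helper: profit from a cross-table of actual status
    # (rows "0" = old, "1" = Newbie) by vote count (columns for 0..10
    # votes), mailing everyone with at least "cut" votes.
    # Economics: $30 revenue per Newbie reached, $10 cost per mailing.
    vote.profit <- function(tab, cut = 5) {
      mailed.cols <- (cut + 1):ncol(tab)      # column j holds j - 1 votes
      n.mailed <- sum(tab[, mailed.cols])     # everyone we mail
      n.newbie <- sum(tab["1", mailed.cols])  # Newbies among the mailed
      30 * n.newbie - 10 * n.mailed
    }

    # Validation table for the 3-node network, copied from above
    nn3.val <- rbind("0" = c(5207, 569, 311, 298, 277, 202, 210, 256, 263, 314, 522),
                     "1" = c( 783, 200, 132, 148, 149, 137, 154, 153, 201, 289, 639))
    vote.profit(nn3.val)   # 1573 * 30 - 3340 * 10 = 13790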
Summary of out-of-sample profits:

              Logistic   NN-1   NN-2   NN-3   NN-5  NN-10  NN-20
    Profit       13310  13870  14000  13790  12900  11790  10110

Over-complexity has two problems:

1. Seriously exaggerated within-sample performance (too many degrees of freedom).
2. A worse bottom-line out-of-sample performance.

PARSIMONY.
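For completeness, here is a sketch of how the whole comparison might be automated. It assumes the uva data frame, an integer training index samp, and the hypothetical vote.profit helper from the sketch above; the loop bounds and variable names are illustrative, not the original session.

    library(nnet)   # Venables & Ripley's nnet library

    for (size in c(3, 5, 10, 20)) {      # hidden-layer sizes to compare
      votes <- rep(0, nrow(uva))
      for (i in 1:10) {                  # 10 refits from random starting weights
        nnet.out <- nnet(as.factor(Newbie) ~ Age + Household.Income + Gender +
                           Major.Occupation + Marital.Status + Education.Attainment,
                         data = uva, maxit = 100, subset = samp,
                         size = size, skip = T, decay = .0001)
        votes <- votes + ifelse(predict(nnet.out, uva, type = "raw") > .3333, 1, 0)
      }
      # out-of-sample cross-table: actual status by vote count (forced to 0..10)
      val <- table(uva$Newbie[-samp], factor(votes[-samp], levels = 0:10))
      cat("size =", size, " out-of-sample profit =", vote.profit(val), "\n")
    }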