Neural networks applied to the internet demographics dataset.

Key point. Non-linear methods often have an inherent "instability": they are sensitive to small changes in initial conditions. Here, one component of the initial conditions is the set of starting weights for the network. If the starting weights change by a small amount, the fitted weights can change by a large amount.

Approach: fit the network many times, each time from a new starting position, and let the networks "vote" on the classification. Calibrate the voting rule within sample and see how it works out of sample.

We run the net 10 times. Our rule: if 5 or more of the 10 nets vote for a mailing, then we mail that person. This is an example of a "majority rule"; such rules are popular ways of combining forecasts.

The main S-Plus loop for refitting the network:

    votes <- rep(0, nrow(uva))   # running vote count, one slot per person
    for (i in 1:10) {
      # each call to nnet() starts from fresh random weights
      nnet.out <- nnet(as.factor(Newbie) ~ Age + Household.Income + Gender +
                         Major.Occupation + Marital.Status + Education.Attainment,
                       data = uva, maxit = 100, subset = samp,
                       size = 9, skip = T, decay = .0001)
      # a net votes "mail" when its estimated P(Newbie) exceeds 1/3
      votes <- votes + ifelse(predict(nnet.out, uva, type = "raw") > .3333, 1, 0)
    }

First, some naive approaches.

1. The marginal probability of being a Newbie is 1/4. Each mailing costs $10 and each Newbie reached brings in $30, so the expected profit per mailing is 30p - 10, which is positive only when p > 1/3. Since 1/4 < 1/3, mail to no one. Expected profit = $0.

2. Since the marginal probability of being a Newbie is 1/4, randomly mail to 1/4 of the people. Expected profit on 11414 people = -$7129.

3. Run a logistic regression with interactions. My logistic regression with interaction reaches 1481 Newbies out of 3112 people mailed, making 1481 * 30 - 3112 * 10 = $13310.

Network with 3 nodes in the hidden layer.

Training sample calibration table (rows: actual status; columns: number of votes out of 10):

    Votes    0    1    2    3    4    5    6    7    8    9   10
    OLD   3695  423  231  188  194  125  121  165  182  203  272
    NEW    420  128   67   80   97   80   86  111  130  165  445

Interpretation: 717 points received unanimous "Newbie" verdicts, that is, all 10 nets voted these points as Newbies. Of these, 445 actually were Newbies, whereas 272 were not. With our specific cost function, it makes sense to mail (expected profit > 0) if the probability of being a Newbie is greater than one third. This cutoff happens at around 5 votes: the in-sample Newbie fraction is 80/205 = 0.39 at 5 votes, but only 97/291 = 0.33 at 4 votes.

See how the rule works out of sample.

Validation sample table (rows: actual status, 0 = old, 1 = Newbie):

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     5207  569  311  298  277  202  210  256  263  314  522
    1      783  200  132  148  149  137  154  153  201  289  639

Mailing everyone with 5 or more votes gives the final 2 x 2 table

              Predict
                 0     1
    Actual 0  6662  1767
    Actual 1  1412  1573

and a profit of 1573 * 30 - 3340 * 10 = $13790, a 3.6 percent improvement over the logistic regression.

Network with 5 nodes in the hidden layer (size = 5).

Training sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     3629  416  282  212  200  187  170  185  157  149  212
    1      363  116   97   95   72   87  101  136  143  191  408

Validation sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     5069  584  383  300  245  305  247  298  232  339  427
    1      709  221  170  158  158  181  191  230  217  252  498

Profit: $12900 (1569 Newbies among 3417 mailed).

Network with 10 nodes in the hidden layer.

Training sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     3314  588  397  322  247  227  186  129  132  113  144
    1      225   94  103  106  103  106  129  125  152  225  441

Validation sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     4445  775  516  422  396  375  348  272  265  277  338
    1      555  253  251  205  194  221  227  220  215  227  417

              Predict
                 0     1
    Actual 0  6554  1875
    Actual 1  1458  1527

Profit: $11,790.

Network with 20 nodes in the hidden layer.

Training sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     3494  577  336  234  207  162  160  117  107  176  229
    1      258   97   83   54   87   93  111   96  128  255  547

Validation sample:

    Votes    0    1    2    3    4    5    6    7    8    9   10
    0     4506  758  519  381  318  265  268  228  304  387  495
    1      659  258  212  198  179  150  188  176  183  290  492

              Predict
                 0     1
    Actual 0  6482  1947
    Actual 1  1506  1479

Profit = $10,110.
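The profit bookkeeping can be checked mechanically. Below is a minimal S-Plus/R sketch, not part of the original session: vote.profit and nn3.val are hypothetical names, and the row labels are assumed to be "0" (old) and "1" (Newbie). It recovers the $13790 figure from the 3-node validation table.

    # Hypothetical helper: profit from a cross-table of actual status
    # (rows "0" = old, "1" = Newbie) by vote count (columns for 0..10
    # votes), mailing everyone with at least "cut" votes.
    # Economics: $30 revenue per Newbie reached, $10 cost per mailing.
    vote.profit <- function(tab, cut = 5) {
      mailed.cols <- (cut + 1):ncol(tab)      # column j holds j - 1 votes
      n.mailed <- sum(tab[, mailed.cols])     # everyone we mail
      n.newbie <- sum(tab["1", mailed.cols])  # Newbies among the mailed
      30 * n.newbie - 10 * n.mailed
    }

    # Validation table for the 3-node network, copied from above
    nn3.val <- rbind("0" = c(5207, 569, 311, 298, 277, 202, 210, 256, 263, 314, 522),
                     "1" = c( 783, 200, 132, 148, 149, 137, 154, 153, 201, 289, 639))
    vote.profit(nn3.val)   # 1573 * 30 - 3340 * 10 = 13790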
Summary of out-of-sample profits:

              Logistic   NN-1   NN-2   NN-3   NN-5  NN-10  NN-20
    Profit       13310  13870  14000  13790  12900  11790  10110

Over-complexity has two problems:

1. Seriously exaggerated within-sample performance (too many degrees of freedom).
2. A worse bottom-line out-of-sample performance.

PARSIMONY.
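For completeness, here is a sketch of how the whole comparison might be automated. It assumes the uva data frame, an integer training index samp, and the hypothetical vote.profit helper from the sketch above; the loop bounds and variable names are illustrative, not the original session.

    library(nnet)   # Venables & Ripley's nnet library

    for (size in c(3, 5, 10, 20)) {      # hidden-layer sizes to compare
      votes <- rep(0, nrow(uva))
      for (i in 1:10) {                  # 10 refits from random starting weights
        nnet.out <- nnet(as.factor(Newbie) ~ Age + Household.Income + Gender +
                           Major.Occupation + Marital.Status + Education.Attainment,
                         data = uva, maxit = 100, subset = samp,
                         size = size, skip = T, decay = .0001)
        votes <- votes + ifelse(predict(nnet.out, uva, type = "raw") > .3333, 1, 0)
      }
      # out-of-sample cross-table: actual status by vote count (forced to 0..10)
      val <- table(uva$Newbie[-samp], factor(votes[-samp], levels = 0:10))
      cat("size =", size, " out-of-sample profit =", vote.profit(val), "\n")
    }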