Class 19 Stat701 Fall 1997
Introduction to Neural Networks
Today's class:
- Classification in the internet demographics dataset.
- Introduction to Neural Nets.
- Discussion points from the Lo paper on Neural Nets.
- Installing the nnet library. Key S-Plus command: library(nnet, first=T, lib.loc="C:/public")
- Neural nets in action for classification - an example for which the true classification probabilities are known.
- NNet classification example on the internet demographics data set.
- Adding gold to the data - does the net find it?
- A pre-prepared page of graphics - just in case we bomb in class.
Classification in the internet demographics dataset.
- Building classification models.
- Judging models on out of sample prediction.
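The out-of-sample idea can be sketched in a few lines. This is a Python stand-in for what we do in S-Plus, and the data and the cutoff rule are made up purely for illustration: the point is only that the model is built on the training half and judged on the half it never saw.

```python
import random

# A made-up one-predictor classification problem standing in for the
# demographics data: y tends to be 1 when x is large.
random.seed(1)
xs = [random.gauss(0, 1) for _ in range(200)]
data = [(x, int(x + random.gauss(0, 0.5) > 0)) for x in xs]

# Split once: build the model on the training half only.
train, test = data[:100], data[100:]

# A minimal "model": classify by the midpoint of the two group means,
# estimated from the training half alone.
m1 = sum(x for x, y in train if y == 1) / sum(y == 1 for _, y in train)
m0 = sum(x for x, y in train if y == 0) / sum(y == 0 for _, y in train)
cutoff = (m0 + m1) / 2

# Judge the model by its error rate on the held-out half.
errors = sum(int(x > cutoff) != y for x, y in test)
rate = errors / len(test)
```

A model that has merely memorized the training noise will look good on the training half and poor on the held-out half; the held-out error rate is the honest one.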
Introduction to Neural Nets.
We will consider the Feedforward Perceptron with a Single Hidden Layer.
Its objective is to output probabilities of group membership based
on a set of inputs.
Figure 1 is a diagram of a single hidden layer network.
Essentially it
links inputs to outputs through a set of weights and activation functions.
In class we will only use the "logistic" activation function, though others
are available. In the displayed network there are two inputs and three
units in the hidden layer.
Therefore between the input layer and the hidden layer we are looking for the
following weights
Link X1 and X2 to H1. Weights g11 and g12.
Link X1 and X2 to H2. Weights g21 and g22.
Link X1 and X2 to H3. Weights g31 and g32.
Now apply the activation function T (logistic) to each hidden unit to get
T(g11 X1 + g12 X2)
T(g21 X1 + g22 X2)
T(g31 X1 + g32 X2)
The procedure also looks for weights d1, d2 and d3 to form
d1 T(g11 X1 + g12 X2) + d2 T(g21 X1 + g22 X2) + d3 T(g31 X1 + g32 X2),
and finally the output is given by
T(d1 T(g11 X1 + g12 X2) + d2 T(g21 X1 + g22 X2) + d3 T(g31 X1 + g32 X2))
Figure 1.
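The forward pass just described is short enough to write out directly. The sketch below is in Python rather than S-Plus, the weight values are made up for illustration (nothing has been fitted), and, matching the formulas above, bias terms are omitted.

```python
import math

def T(z):
    """Logistic activation: T(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def net_output(x1, x2, g, d):
    """Output of the Figure 1 net: two inputs, three logistic hidden
    units, one logistic output.  g[j] holds (g_j1, g_j2) for hidden
    unit j; d holds (d1, d2, d3)."""
    hidden = [T(gj1 * x1 + gj2 * x2) for gj1, gj2 in g]
    return T(sum(dj * hj for dj, hj in zip(d, hidden)))

# Illustrative, made-up weights:
g = [(0.5, -1.0), (1.5, 0.3), (-0.7, 2.0)]
d = (1.0, -2.0, 0.5)
p = net_output(0.2, 0.8, g, d)  # strictly between 0 and 1
```

Because the final activation is also logistic, the output p is always a number strictly between 0 and 1, which is what lets us read it as a probability of group membership.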
Fitting a given net involves finding the set of weights that best
matches observed and predicted outputs.
It is the inclusion of the hidden layer that makes the network a very flexible
fitting tool.
Compare the neural network diagram to the previous
logistic regression diagram.
Discussion points from the Lo paper on Neural Nets.
- It is a non-parametric technique: it does not assume an explicit functional
form for the relationship between inputs and outputs.
- This is useful for capturing possible non-linearities in the relationship.
- The nets relate inputs to outputs through weights and activation functions
- A hidden layer enables them to be extremely flexible - universal
approximators
- Downside: if the network can fit anything, it can fit the
noise as well as the signal.
- Do they learn? What does "learn" mean anyway? The weights change as the data
are updated.
- Standard statistical inference on the model parameters (weights)
is not readily available.
- But inference is not such a big deal if sole objective is prediction
and out of sample validation is followed.
- Models are not generally well defined; small changes in the data
can lead to large changes in the weights.
- Weights are not interpretable in the sense that regression weights
(partial slopes) are.
- The objective function (the criterion that is maximized to choose the
weights) may have local maxima. Practical solution: start the algorithm off at
different starting values.
- It is one of many non-parametric techniques - similar in this sense to
splines and kernel methods.
- Practically, they may need much human supervision to
produce coherent results - as with many new technologies,
they do new things but require more work.
- A useful additional member of the quantitative tool box -
but no panacea, probably more dangerous than regression in the wrong hands!
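The multiple-starting-values fix for local maxima can be sketched concretely. The toy objective below is a made-up one-dimensional function standing in for a network's error surface (which would live in the high-dimensional space of all the weights); the principle - run a local optimizer from several random starts and keep the best answer - is the same.

```python
import math, random

def loss(w):
    # A toy multimodal objective standing in for a net's error surface.
    return math.sin(3.0 * w) + 0.1 * w * w

def descend(w, step=0.01, iters=2000):
    """Crude gradient descent using a numerical derivative."""
    for _ in range(iters):
        grad = (loss(w + 1e-6) - loss(w - 1e-6)) / 2e-6
        w -= step * grad
    return w

random.seed(0)
starts = [random.uniform(-4, 4) for _ in range(10)]
fits = [descend(w0) for w0 in starts]
best = min(fits, key=loss)
# Different starts converge to different local minima;
# keeping the best guards against reporting a poor one.
```

No single restart is guaranteed to find the global optimum, but each extra start gives another chance of landing in a good basin of attraction, which is why it is the standard practical remedy.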
Richard Waterman
Mon Nov 10 22:31:10 EST 1997