Deliverables (by 05/07/99):
A. Recall that the objective of this homework is to take a corpus of bulletin board posts and classify each post into one of two categories, Bullish or Other.
A sample of raw text documents can be found in the directory
/~waterman/public_html/Teaching/540s99/Posts
This directory also contains a file named idtc.txt, which holds an additional set of potential covariates describing the daily history of the stock price.
View each bulletin board post as an observation. From each post you must extract a set of covariates, of your own choosing and construction, that may be useful for the classification.
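As a rough illustration of what extracting covariates from a post might look like, here is a minimal Python sketch. The word lists, the function name extract_covariates, and the particular covariates are all hypothetical choices made for this example; the assignment leaves the choice of covariates entirely to you.

```python
import re

# Hypothetical word lists, invented for illustration only.
BULLISH_WORDS = {"buy", "long", "bullish", "undervalued", "breakout"}
BEARISH_WORDS = {"sell", "short", "bearish", "overvalued", "dump"}


def extract_covariates(post_text):
    """Turn the raw text of one post into a dict of numeric covariates."""
    # Lowercase the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", post_text.lower())
    letters = [c for c in post_text if c.isalpha()]
    n_upper = sum(c.isupper() for c in letters)
    return {
        "n_tokens": len(tokens),
        "n_bullish": sum(t in BULLISH_WORDS for t in tokens),
        "n_bearish": sum(t in BEARISH_WORDS for t in tokens),
        "n_exclaim": post_text.count("!"),
        # Share of letters that are upper case; 0.0 for a post with no letters.
        "frac_upper": n_upper / len(letters) if letters else 0.0,
    }


example = extract_covariates("BUY now! This stock is undervalued.")
```

Each post then contributes one such dict, that is, one row of covariates. Counts of this kind are only a starting point; covariates built from idtc.txt could be added to the same dict.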
The necessary next step is to construct a "spreadsheet" of classes and covariates, with one row per post, on which to train your classifier. Your classifier could be a logistic regression, a neural network, or a tree classifier. You may also wish to try boosting your learner.
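The spreadsheet and training step might be sketched as follows, using only the Python standard library. The four rows, the column names, and the helper names are made up for illustration, and the logistic regression is fitted by plain batch gradient descent, which is just one of several reasonable ways to fit it. A real submission would use your own covariates and would assess the classifier on posts held out from training.

```python
import csv
import io
import math

# Toy rows standing in for the real spreadsheet, one row per post.
# Column names and values are invented for illustration.
ROWS = [
    {"label": "Bullish", "n_bullish": 3, "n_bearish": 0},
    {"label": "Bullish", "n_bullish": 2, "n_bearish": 1},
    {"label": "Other", "n_bullish": 0, "n_bearish": 2},
    {"label": "Other", "n_bullish": 1, "n_bearish": 3},
]
FEATURES = ["n_bullish", "n_bearish"]


def write_spreadsheet(rows, handle):
    """Write the class column followed by the covariate columns as CSV."""
    writer = csv.DictWriter(handle, fieldnames=["label"] + FEATURES)
    writer.writeheader()
    writer.writerows(rows)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def design_vector(row):
    """Prepend a constant 1.0 so that weights[0] acts as the intercept."""
    return [1.0] + [float(row[name]) for name in FEATURES]


def fit_logistic(rows, n_steps=2000, rate=0.1):
    """Fit a logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * (len(FEATURES) + 1)
    for _ in range(n_steps):
        grad = [0.0] * len(weights)
        for row in rows:
            x = design_vector(row)
            y = 1.0 if row["label"] == "Bullish" else 0.0
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad[j] += err * xi
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict(weights, row):
    """Classify a row as Bullish when its fitted probability is at least 0.5."""
    x = design_vector(row)
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
    return "Bullish" if p >= 0.5 else "Other"


buffer = io.StringIO()  # stands in for a real file on disk
write_spreadsheet(ROWS, buffer)
weights = fit_logistic(ROWS)
predictions = [predict(weights, row) for row in ROWS]
```

The same spreadsheet could equally be fed to a tree classifier or a neural network; only the fitting and prediction functions would change.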
Last update 4/15/99.