Final Project, Stat 540 Spring 1999.

Deliverables (by 05/07/99):

  • Here is a link to the training data set: bull.txt.

    A. Recall that the objective of this homework is to take a corpus of bulletin board posts and classify each post into one of two categories, Bullish or Other.

    A sample of raw text documents can be found in the directory

     /~waterman/public_html/Teaching/540s99/Posts

    This directory also contains a file named idtc.txt, which holds an additional set of potential covariates describing the daily history of the stock price.

    View each bulletin board post as an observation. From each post you must extract a set of covariates, of your own choosing and construction, that may be useful for the classification.
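
    As one possible illustration of turning a post into covariates, here is a minimal Python sketch. The word lists, the function name extract_covariates, and the particular features (token count, counts of bullish and bearish words, exclamation marks, fraction of upper-case letters) are assumptions made for this example only; they are not prescribed by the assignment, and you should choose your own.

```python
import re

# Hypothetical word lists, chosen only for illustration.
BULLISH_WORDS = {"buy", "long", "rally", "breakout", "undervalued"}
BEARISH_WORDS = {"sell", "short", "dump", "overvalued", "crash"}


def extract_covariates(post: str) -> dict:
    """Turn one raw post into a dictionary of numeric covariates."""
    # Lower-case the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z']+", post.lower())
    letters = [c for c in post if c.isalpha()]
    n_upper = sum(c.isupper() for c in letters)
    return {
        "n_tokens": len(tokens),
        "n_bullish": sum(t in BULLISH_WORDS for t in tokens),
        "n_bearish": sum(t in BEARISH_WORDS for t in tokens),
        "n_exclaim": post.count("!"),
        # Share of letters that are upper case (0.0 for a post with no letters).
        "frac_upper": n_upper / len(letters) if letters else 0.0,
    }


# An invented post, used only to show the shape of the output.
example = extract_covariates("BUY now! This stock is undervalued, do not sell.")
```

    Each call returns one row of covariates; applying the function to every post in the corpus gives the columns of the training table.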

    The essential step is to construct a "spreadsheet" of classes and covariates on which to train your classifier. Your classifier could be a logistic regression, neural network, or tree classifier. You may wish to try boosting your learner.
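
    To make the "spreadsheet" idea concrete, the sketch below fits a logistic regression by plain batch gradient descent on a tiny table. Everything in it is an assumption for illustration: the six rows are made up (they are not taken from bull.txt), the two columns are hypothetical covariates, and the learning rate and iteration count are arbitrary. In practice you would likely use a statistical package rather than hand-written optimization.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Estimated probability of class 1; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit weights (intercept first) by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Made-up toy table: columns are (n_bullish, n_bearish); class 1 = Bullish, 0 = Other.
X = [[3, 0], [2, 0], [2, 1], [0, 2], [0, 1], [1, 3]]
y = [1, 1, 1, 0, 0, 0]

w = fit_logistic(X, y)
predictions = [int(predict_proba(w, x) >= 0.5) for x in X]
```

    Here the predictions are evaluated on the same rows used for fitting, which only shows the mechanics; a held-out set of posts is needed to judge how well a classifier really performs.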

    Last update 4/15/99.