Homework 10. Stat 540 Spring 1999.

Deliverables (by 4/9/99):

A. Use the dataset uvatoday.txt to perform the boosting. For this homework you do not have to do out-of-sample prediction; the didactic objective is to code the boosting algorithm. Use a multiple logistic regression (without interactions) as the weak learner. Notice how the misclassification probability increases toward 0.5 from one iteration to the next, and how the corresponding log odds approach 0. R commands that may be useful are:

 glm predict ifelse sum abs log
Your algorithm should perform the boosting and come up with a final classification.
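
The bookkeeping of the boosting loop can be sketched as follows. This is only an illustration under stated assumptions, not the required solution: it is written in Python rather than R, it assumes labels coded as -1/+1, and it uses the common discrete AdaBoost update (vote weight equal to half the log odds of a correct classification), which may differ from the variant presented in class. The weak learner fit_stump is a hypothetical stand-in (a one-variable threshold rule) so that the block runs without any libraries; for the homework you would replace it with a weighted logistic regression fit. The names adaboost, classify, and fit_stump are made up for this sketch.

```python
import math

def adaboost(X, y, fit_weak, rounds=10):
    """Discrete AdaBoost for labels in {-1, +1} (assumed coding).

    fit_weak(X, y, w) must return a function mapping a row to -1 or +1.
    Returns a list of (alpha, predictor) pairs.
    """
    n = len(y)
    w = [1.0 / n] * n                       # start with uniform case weights
    ensemble = []
    for _ in range(rounds):
        h = fit_weak(X, y, w)
        pred = [h(row) for row in X]
        # weighted misclassification probability of this round's learner
        err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
        if err <= 0.0:                      # perfect learner: keep it and stop
            ensemble.append((1.0, h))
            break
        if err >= 0.5:                      # no better than chance: stop
            break
        alpha = 0.5 * math.log((1.0 - err) / err)   # half the log odds
        ensemble.append((alpha, h))
        # up-weight the mistakes, down-weight the rest, then renormalise
        w = [wi * math.exp(-alpha * yi * p) for wi, p, yi in zip(w, pred, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def classify(ensemble, row):
    """Final classification: sign of the alpha-weighted vote."""
    score = sum(alpha * h(row) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

def fit_stump(X, y, w):
    """Stand-in weak learner (NOT logistic regression): the threshold rule
    on a single variable with the smallest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            for sign in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if (sign if row[j] > t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    _, j, t, sign = best
    return lambda row: sign if row[j] > t else -sign

# Toy data (made up): no single threshold separates the two classes.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, fit_stump, rounds=3)
fitted = [classify(ensemble, row) for row in X]
```

The log odds computed in each round are what the note above refers to: as the weighted error of the weak learner moves toward 0.5, alpha moves toward 0 and that round contributes little to the final vote.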

B. Use the LWP module to obtain a stock price web page. You will have to find two identifiers that bracket the stock price, and then go to work on extracting and manipulating what lies in between them. A good way to find the two identifiers is to obtain the page using Netscape, then use the VIEW SOURCE function to locate them.
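
The bracketing idea can be sketched as follows. This is an illustration only, written in Python rather than Perl; in Perl the download step would be something like get from LWP::Simple, and the extraction could be done with index/substr or a regular expression. The function names, the sample page fragment, and the two identifiers are all hypothetical; the real identifiers have to be read off the page source.

```python
import urllib.request

def fetch(url):
    """Download a page and return it as text (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("latin-1")

def extract_between(page, left, right):
    """Return the text between the first `left` and the next `right`,
    or None if either identifier is missing."""
    start = page.find(left)
    if start == -1:
        return None
    start += len(left)
    end = page.find(right, start)
    if end == -1:
        return None
    return page[start:end]

def to_price(raw):
    """Strip whitespace, dollar signs and thousands separators, then
    convert to a number. Assumes a decimal quote, not a fraction."""
    return float(raw.strip().replace("$", "").replace(",", ""))

# Hypothetical page fragment and identifiers, for illustration only.
sample_page = "<tr><td>Last Trade</td><td><b>104.50</b></td></tr>"
raw = extract_between(sample_page, "Last Trade</td><td><b>", "</b>")
price = to_price(raw)
```

Choosing identifiers that are long enough to be unique on the page matters more than the extraction code itself: a short left identifier such as a bare tag is likely to match earlier in the page than intended.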

C. You can probably find examples of this on the web, but if you do, make sure that you know what they are doing.

D. The above link takes you into the Patent database, but you can get an individual page by using a URL of the form http://patents.uspto.gov/cgi-bin/ilink4?INDEX+0+5615225+F. The last number (5615225 here) is the citation number. Once you have obtained one of these records, parse it, pulling off all the patents it cites. Your code should produce a list of patent numbers, in a format suitable for re-submission as a URL.
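
A possible shape for the parsing step is sketched below, in Python for illustration. It rests on an assumption that has to be checked against a real record: that each cited patent appears in the page as a link of the same ilink4?INDEX+0+NUMBER+F form as the URL above. If the record lists citations differently, the regular expression has to change. The function names and the sample record with its patent numbers are made up.

```python
import re

# URL form taken from the assignment text; {} is filled with a patent number.
URL_TEMPLATE = "http://patents.uspto.gov/cgi-bin/ilink4?INDEX+0+{}+F"

# ASSUMPTION: cited patents show up as links of this same form.
LINK_RE = re.compile(r"ilink4\?INDEX\+0\+(\d+)\+F")

def cited_patents(record_html, own_number=None):
    """Return the distinct patent numbers linked from a record, in order
    of first appearance, skipping the record's own number if given."""
    seen = []
    for num in LINK_RE.findall(record_html):
        if num != own_number and num not in seen:
            seen.append(num)
    return seen

def to_urls(numbers):
    """Format patent numbers so each can be re-submitted as a URL."""
    return [URL_TEMPLATE.format(n) for n in numbers]

# Made-up record fragment; the cited numbers are placeholders.
sample_record = (
    '<a href="/cgi-bin/ilink4?INDEX+0+5615225+F">5615225</a>'
    '<a href="/cgi-bin/ilink4?INDEX+0+4000001+F">4000001</a>'
    '<a href="/cgi-bin/ilink4?INDEX+0+4000002+F">4000002</a>'
    '<a href="/cgi-bin/ilink4?INDEX+0+4000001+F">4000001</a>'
)
numbers = cited_patents(sample_record, own_number="5615225")
urls = to_urls(numbers)
```

Keeping the numbers as strings avoids losing leading characters or separators if the real records turn out to contain them.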