Assignment 3: Regression Project

Introduction

You were recently hired by a small firm headquartered near Philadelphia. The firm is the country's eleventh largest manufacturer of polished metal blocks. The company is a job shop. That is, all its sales come from one-time jobs rather than continuing product lines. The basic procedure begins with a call from a potential customer, who describes the basic job characteristics to a sales representative. The sales representative quotes a price. If the price is accepted, production is scheduled at either an old plant in suburban New Jersey or a new plant in Arkansas.

At your job interview, the director of management information systems for the company described two regression models that were under development for the company's use:

  1. A cost estimation model used to develop prices for potential customers.
  2. A cost analysis model used to explain the actual cost of a completed job.

The director gave you a brief summary of the plans for both models.

THE COST ESTIMATION MODEL This model will be used to estimate the average cost per finished block in an order. It will be based on the job characteristics that the customer specifies. The relevant information is the following:

[Table not reproduced: customer-specified job characteristics.]

THE COST ANALYSIS MODEL This model will be used to assess the actual average cost per block of a completed job. The model will be based on production data and should be useful for analyzing how the various aspects of production affect costs. The variables that will be considered are:

[Table not reproduced: production variables.]

The information systems director told you that gathering the data had been a considerable problem. Because the required information was not directly available from the firm's accounting system, several of the variables had to be collected through "special studies". As a result, data were available for only 200 of the firm's recently completed jobs.

When you arrived for your first day on the job, you received a major shock. The firm had recently been taken over by Megabite Industries. The staff had been downsized. The entire management information systems staff had been fired, leaving you to complete development of the models on your own.

Analysis

Before starting to analyze the data, you search for any existing information on the two models. The only thing of possible relevance that you find is a handwritten note.

I agree that we shouldn't try any transformations of average cost as our dependent variable.

In the cost estimation part, we've got to keep most of the predictor variables untransformed. The sales representatives think that UNITS isn't a straight-line predictor. One guy is taking an economics course and keeps talking about diminishing marginal returns and fixed costs. That might be worth a try.

The sales representatives have to come up with a quote fast, so we'd better keep the number of predictors to four, maybe less if we can.

In the cost analysis part, we can use more transformation ideas and we've got more freedom to use variables. Still, anything more than ten variables will be too complicated for the operations people.

I've put the data in time order, so job number 1 is the oldest and the last job in the data set is the most recent. It shouldn't matter, because our costs have been pretty stable over time.
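The fixed-cost idea in the note can be made concrete. If the total cost of a job were a fixed setup cost F plus a constant cost c per block, the average cost per block would be c + F/UNITS, which is a straight line in 1/UNITS rather than in UNITS. The sketch below is only an illustration under that assumption: the numbers are made up, not taken from the assignment data, and fit_line is a helper written for this example.

```python
import random

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy ** 2 / (sxx * syy)

# Made-up illustration values, NOT taken from the assignment data.
random.seed(0)
units = [random.randint(50, 2000) for _ in range(200)]
fixed_cost, unit_cost = 5000.0, 12.0
avg_cost = [unit_cost + fixed_cost / u + random.gauss(0.0, 0.5) for u in units]

# Compare UNITS itself with its reciprocal as the single predictor.
a_raw, b_raw, r2_raw = fit_line(units, avg_cost)
a_rec, b_rec, r2_rec = fit_line([1.0 / u for u in units], avg_cost)

print(f"UNITS:   R^2 = {r2_raw:.3f}")
print(f"1/UNITS: R^2 = {r2_rec:.3f}, intercept = {a_rec:.2f}, slope = {b_rec:.0f}")
```

Under this assumption the intercept of the reciprocal fit estimates the per-block cost and the slope estimates the fixed cost. The reciprocal is one candidate transformation; a logarithm is another worth comparing against the real data.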

You are on your own. You have both a problem and an opportunity. You have only a few weeks, a minimal number of observations, and no help. If you can come up with useful regression models based on these data, you will be regarded as a most promising manager.

Downloading the data

The data are available here. They are in tab-delimited plain text format.
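If you work in a general-purpose language rather than a statistics package, a tab-delimited file can be read with a few lines. The following is a minimal sketch in Python using only the standard library. It assumes the file has a header row of variable names, which the assignment does not confirm; the function name load_jobs, the added JOB_ORDER field, and the file name in the usage comment are hypothetical.

```python
import csv

def load_jobs(path):
    """Read a tab-delimited file with a header row into a list of dicts.

    Numeric-looking fields are converted to float; anything else is kept
    as text. A JOB_ORDER field (1 = first row) is added so the row order
    of the file is preserved as a variable.
    """
    rows = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for order, record in enumerate(reader, start=1):
            row = {}
            for key, value in record.items():
                try:
                    row[key] = float(value)
                except (TypeError, ValueError):
                    row[key] = value  # leave non-numeric fields as text
            row["JOB_ORDER"] = order
            rows.append(row)
    return rows

# Usage (hypothetical file name):
# jobs = load_jobs("jobs.txt")
# print(len(jobs), sorted(jobs[0]))
```

Because the jobs are listed in time order, keeping the row position lets you plot costs or residuals against it and check whether costs really have been stable.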

Writeup

You are to prepare a memo that can be passed on to senior management. The memo should be no longer than five double-spaced pages and should start with an executive summary. The senior managers are intelligent people but have no statistical training, so do not overuse jargon. If you have any caveats about your findings, state them in straightforward terms.

You are strongly advised to spend some time thinking about the substance of the problem before jumping into the data analysis. Which variables do you think should be important? Can you classify them? What sorts of relationships do you think are plausible?

Your boss also wants technical backup of about 10 pages. A suggested format for the report is as follows:

Any computer output that you feel is appropriate to include should be interesting enough to warrant at least a paragraph of discussion. If not, leave it out. Also remember that you have been looking over these graphs for hours if not days and weeks. What is now obvious to you will not be immediately obvious to your boss. Any method of help would be appreciated:

are a few ways of making a figure communicate more quickly.



Richard Waterman
Mon Nov 17 23:14:21 EST 1997