FINAL EXAM

This is the evolving page of the "Final Exam" .

New Rules: 

1) You must work individually for the rest of the semester.
2) You can consult with me, but not with each other.
3) You must declare whatever external resources you use: 
   web pages, books, papers, advice from external sources 
   (in case you happen on someone who gives you unwanted advice).
4) You are under honor code to follow 1)-3).

No longer post your answers on your web page.
Send them to me by email instead.

----------------------------------------------------------------

Problems/Questions:

1) Python semantics: Interpret the following three lines of Python code:
a = [1,2,3]
b = a
del a
   What are the two possible semantics of these three statements?
   Which one was chosen by the Python designer?
   What Python statements are needed to achieve the results of the second possibility?

2) The low-level nature of C:
   What would one have to do in C to reconstruct something like the following Python statement?
a = [1,2,3]
   Consider the case where you know at compile time how long  "a"  is, 
   and the more difficult case where you do not know.  
   Explain in English; don't write a program.
   If you don't know enough C, explain what facility C would have to offer.

3) Write a Python one-liner that creates a list containing 100 samples from the
   standard Gaussian, conditional on being in the interval [0,2].
   It is sufficient if you achieve the 100 size with very high probability.
   Once you try to solve the problem you will understand what this cryptic remark means.

4) Download the file "Numeric-22.0.tar.gz" by going to, for example,
     http://prdownloads.sourceforge.net/numpy/Numeric-22.0.tar.gz?use_mirror=telia
   Alternatively, go to "www.python.org/" -> Vaults of Parnassus -> math -> 
     -> Numerical Python -> numpy ... "Download" -> Numeric-22.0.tar.gz -> Reston, VA (e.g.)
   Save to this directory:
     c:/cygwin/lib/python2.2/site-packages/
   In a bash say:
     cd c:/cygwin/lib/python2.2/site-packages/
     gunzip Numeric-22.0.tar.gz
     tar xfv Numeric-22.0.tar
     cd Numeric-22.0/
     python setup.py install  
   If this does not work, let me know.
   If it works, you should be able to run python and import the modules
     Numeric, LinearAlgebra, RandomArray
   Abbreviate them as "N", "L", and "R" in what follows.

5) Generate a Numeric matrix "mat1" of size 3x3 containing the integers 1...9.  
   Define "mat2" to be the slice "mat1[0:2,:]".  
   Assign 100 to "mat2[0,0]".  Check "mat1".
   Then generate the same submatrix of "mat1" with "N.take" and call it "mat3".
   Assign 200 to "mat3[0,1]".  Check "mat1".
   Comment on the semantics of slicing and "take"ing.

6) What is the largest size of Numeric vector (= 1-dim array) that you can reasonably 
   create and sort in Python using Numeric?  Once you run into memory errors, quit Python,
   re-enter and try again.  How much physical memory does your machine have?
   How does the size of the largest array compare with the size of the memory?

7) Recall: A database index is a pairing of keys (= values of an attribute) and 
           addresses (of cases) such that finding the addresses of cases 
           with a given key is quick.
           We use indices that are sorted on the keys, for bisection search.
   Implement a class that is a database consisting of two attributes
   such as "attr1" and "attr2", implemented as vectors:
attr1 = R.permutation(100000) + 10000000
attr2 = R.permutation(100000) + 20000000
   The i'th case has values "attr1[i-1]" and "attr2[i-1]" on the two attributes.
   On instantiation with two attribute vectors, the class instance should
   build two indices for looking up the second attribute given a key on the first, 
   and vice versa.
   Use the function "N.argsort" to build the indices, 
   and the function "N.searchsorted" for methods that do the lookup.
   Implement each index not as a list of pairs, but as two conformal integer vectors, 
   one for the sorted keys, the other for the corresponding addresses.

8) When implementing an orthogonal projection, it is not recommended to actually form
   P = U U^t,  where  U  is an orthonormal frame.  Write a function 
   a) that takes an argument  U  and a vector  x;
   b) that checks whether  U  is an orthonormal frame and  x  is conforming with  U,
      and bails out if not;
   c) that projects  x  onto the space spanned by the columns of  U  without 
      computing  U U^t.

9) Someone who does not know multiple linear regression very well thinks that 
   x_1 + x_2  is a very meaningful quantity and should be added to the predictors  
   x_1, x_2,..., x_p  as a p+1'st predictor.
   He doesn't tell you, but out of caution you run an svd on the Nx(p+1) matrix.
   What will you discover?