##================================================================
##
##
## ---- SOME BASICS: HISTORY, NUMBERS, OPERATIONS, FUNCTIONS ----
##
##
## WHY R?
##
## . We need standards -- R is one of them.
## . Huge developer community
## . New stats algorithms appear first as R packages.
## . Growing user community, also in industry
## . Powerful language -- quite different from Python
## . Probably the best language for 'one-shot programming'
## (which comprises much of data analysis)
##
## HISTORY OF R: C --> S --> R
##
## . Where: Bell Labs (C, S)
## . Related: Unix --> Linux, MacOS
##
## R AND OTHER LANGUAGES:
## . R, Python, Matlab are ... interpreted, high-level <<< human efficiency
## . C, C++, Java, Fortran are ... compiled, low-level <<< machine efficiency
##
## GETTING RSTUDIO:
##
## . Search and install,
## . or see the syllabus for the URL or search 'RStudio'.
## ... right now!
## https://www.rstudio.com/products/rstudio/download/
##
## . NOTE! The instructor does not use RStudio.
## He uses the so-called 'emacs' environment.
## He does not recommend that you use it because
## its learning curve is too steep.
##
## . Problems specific to some MAC users:
##
## ~ RStudio gives an error message at the end of installing:
## Install plain R from www.cran.r-project.org for MacOS as well.
## Thereafter RStudio seems to work ok.
##
## ~ If you don't see syntax coloring in the code editor window
## of RStudio, you will probably see that the file has been
## renamed with another extension (e.g., '.txt') added to the file name.
## You should rename the file back to its original name, which is
## Chapter-01-First-Basics.R
## RStudio only provides syntax coloring for files with extension '.R'.
##
## GETTING THESE NOTES:
##
## . For now, find the instructor's webpage (search 'buja wharton').
## In the Section on Stat 470/503/770 click on "Chapter 1".
## (MAC users: Make sure the download does not attach the extension '.txt'
## to the filename! It is important that the extension is '.R'.)
## . Next week chapter files will be on Canvas.
##
##
##================================================================
##
##
## SOME ANSWERS TO NATURAL QUESTIONS:
##
## - The basic work cycle in RStudio:
##
## . Edit an R code file with extension ".R" in the upper left
## pane of RStudio (Files > Open File > .... or ctrl-O)
## The present chapter file is indeed an R code file,
## but it also contains a lot of non-executable text called 'comments'.
##
## . Copy/paste code lines from the editor pane (upper left pane)
## into the R console (lower left pane) and execute them.
## Instructions for doing this efficiently are given below.
##
## - Q: Why is this file a crude text file?
## What do the hash marks do at the beginning of the lines?
##
## A: All R code files are essentially txt files without formatting.
## The .R extension tells RStudio and some editors to use
## ``syntax highlighting''.
##
## Any content in a line that follows a hash sign is
## NOT interpreted as code but as mere comment.
## If you type or copy lines with hashes at the beginning
## into an R interpreter/console, nothing gets computed, just copied back.
## Anything before a hash is interpreted as code,
## and when typed or copied into an R interpreter/console,
## R will try to compute something.
## AYT? What is code and what is comment in the following line?
10+20 # Some calculation...
## Syntax highlighting shows the difference between code and comment.
## The color scheme in RStudio will be different from the instructor's.
## You can choose your own scheme as follows:
## Tools > Global Options... > Appearance
## Then play with fonts, font sizes and editor themes.
## Finally, click "Apply" or "Cancel".
##
## - Q: Isn't there a more convenient way to copy/paste a line of code
## into the R interpreter/console?
##
## A: There is!
## Place the pointer on the code line and hit
## ---------------------
## | Ctrl-Enter | Windows !!!!!!!!!!!!!!!
## | Command-Enter | MacOS !!!!!!!!!!!!!!!
## ---------------------
## This copies the line into the R console and
## moves the pointer to the next line.
## Example: Copy the following lines into the R console.
10 + 20
3*123
## So if you wish to execute multiple consecutive lines,
## simply hit Ctrl/Command-Enter multiple times.
##
## - Q: And where are the solutions to '...'?
##
## A: The text below has no answers to the questions/problems.
## We give the solutions here in class.
## If you miss a class, get your answers from a fellow student or TA.
## Knowing how to get human help is a fundamental skill in life.
## [No, solutions to '...' will NOT be posted ever.]
##
##
## - Finding your way around in code files:
##
## . Primary way to find specific code snippets:
## Text search using ctrl-F (Windows) or ... (MAC?)
## Example: Search for "3*123".
##
## . Secondary: by line numbers
## Compare instructor's current line number in the white bar below the code.
## It has the form (133,32), which means 'line 133, position 32 in line'.
## Find the line number in RStudio,
## but it may only be approximate due to editing differences.
## Example: Find line 110.
##
## - WARNING!
## If you try to execute an incomplete expression,
## the R interpreter will ask you to complete it on the following line(s).
## It will change the prompt from ">" to "+". Try:
10 /
## You may abort with the ESC key, or by completing the expression.
## Typical case: Forget a closing parenthesis, as in
log(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10
## You can also use this mechanism by spreading long expressions over multiple lines:
log(1 + 2 + 3 + # incomplete
4 + 5 + 6 + # incomplete
7 + 8 + 9 + 10) # complete, finally
##
##
##================================================================
##
##
## WHAT YOU CAN DO WITH R RIGHT AWAY:
##
## - Use R as a pocket calculator using math notation.
## (Every R class starts this way.)
##
##
## EXAMPLES:
##
## - What is 1.25% interest on a $1,213.85 bank account balance?
...
1213.85 * 0.0125
1213.85 * (1.25 / 100)
##
## . Syntax Rule: Make sure to omit the thousands-separator commas in R!
## Commas inside numbers are illegal.
##
## . Strange: What is the meaning of '[1]' when R prints an answer?
## Explanation: R considers a single number as a vector (sequence) of length 1.
##
## Compare:
1:100
## This expression generated a ``vector'' of length 100.
## R prints it by showing the position in brackets for
## the first value on each line.
##
## - Calculate an 18% tip on a $28.50 bill:
...
## What is the total amount of bill plus tip?
...
##
## - What is the 10th power of 2?
...
## What do computer geeks call this number?
## ... kilo
##
## ----------------------------------------------------
## | First bit of new syntax, actually, an operation: |
## | m:n |
## | generates a 'vector' or 'sequence' of numbers |
## | spaced by 1, starting at m, ending at/before n. |
## | We call them 'ladders'. |
## ----------------------------------------------------
##
## - Show all integers from 1,001 to 1,100:
...
##
## - Show all multiples of 3 for the previous sequence:
...
##
## - Show all numbers divisible by 3 below 100:
...
## (Later we will learn about a function that is easier
## for this purpose. For now this exercise illustrates
## how to obtain a solution to a problem with the tools
## that you know.)
##
## - Calculate the powers of 2 for 0 up to 20:
...
##
## - Explorations for ladders:
##
## . What do you expect to see when you generate
## a ladder of numbers starting at 1.3 up to 10?
## ...
##
## . What do you expect if you start the sequence with
## a negative integer such as -3?
-3:10
## An issue arises: Does R interpret the above as
(-3):10
## or as
-(3:10)
## ???
## That is, is "-" executed first, and then ":",
## or the other way round?
## Do you know a technical term for this general issue?
## You might know it from high school math:
## ...
## So which operation binds stronger (is first executed), '-' or ':'?
## ...
##
## . INCREMENTS:
## What do you expect if you try a sequence that ends lower
## than its starting value, such as from 10 to 5?
...
## Do you expect the following to work?
5:-3
##
## . BLANKS/SPACES: Liberal use of blanks permitted!
5 : - 3
5 : -3
5:-3
## However, the following is NOT a good way to write 1,285.23:
1 285 . 23
## No blanks/spaces within atomic expressions (e.g., numbers)!
##
## ---------------------------------------------------------------
## | |
## | R SYNTAX: |
## | |
## | - Just like in math, computer languages require so-called |
## | "order of operations" or "operator precedence". |
## | |
## | - Again like in math, use ROUND parentheses "()" to force | <<<<<<<
## | the order of operations according to your intentions. |
## | Do NOT use brackets "[]" or curly braces "{}" !!! |
## | |
## | - Even if the default precedences agree with your intentions, | <<<<<
## | avoid ambiguity for the human reader by using parens |
## | even where they may not be needed. |
## | |
## | - You can insert blanks liberally for clarity, but not inside |
## | numbers. |
## | |
## | - You can learn a lot by experimenting creatively in R. |
## | |
## ---------------------------------------------------------------
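##
## - A minimal sketch of how round parentheses change the order of
## operations (the numbers are arbitrary and chosen only for illustration):
2 + 3 * 4     # multiplication binds stronger than addition: 14
(2 + 3) * 4   # parentheses force the addition first: 20
2 + (3 * 4)   # same value as the first line, but unambiguous for the reader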
##
## - What is the number 'pi'?
## There is a symbol in the language for this number:
## Just type 'pi'!
...
##
## - What is the 'sin' of pi? of pi/4?
## There is a function sin();
## its argument must be in radians (arcs, multiples of pi), not degrees:
...
## Discuss the result and ponder the wonders of machine precision!!!
## Ask yourself:
## . Can the machine know the number pi exactly? ...
## . Can the function sin() compute exact results? ...
## ==> Never assume that operations involving decimal numbers are exact!
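##
## A small sketch of inexact decimal arithmetic (the numbers 0.1, 0.2
## and 0.3 are arbitrary choices for illustration):
0.1 + 0.2 - 0.3   # in exact math this is 0; R returns a tiny nonzero number
## The likely reason: 0.1, 0.2 and 0.3 cannot be stored exactly in binary.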
##
## - What is half of the square root of 2?
## There is a function sqrt(...) that you can use,
## or you can use the power with exponent 0.5.
## Write several versions, using 'sqrt()' and
## ways of writing the exponent:
...
...
...
##
## - What is the reciprocal of the square root of 2?
...
##
## - Compare the previous results and explain!
## ...
## - What is the number 'e'?
## You will need to use the function 'exp()'.
## There is no fixed symbol for this one.
...
##
## - What is the natural exponential (base e) of 10?
...
##
## - Write the natural exponential of 10 as 'e' to the power 10.
...
##
## - What is the justification for identical results in the
## previous two questions?
## ...
#---------------------------------------------------------------- end of lecture 2
## - What is the natural logarithm of the previous two results?
## You know the answer, but 'confirm' with actual R code.
...
##
## - What are the 10-based logarithms of 1,000 and 5,000 and 10,000?
...
## - Why would the 10-based log of 5,000 be higher than 3.5?
## ...
## - What is the interest on a $1,000 initial investment
## after 4 years with the following annual financial returns?
##
## +7%, +9%, -10%, -6%
##
## Recall: These are yearly percentage gains and losses.
##
## Biologists: Translate this to cultures of uni-cellular organisms
## starting with 1,000 cells and the above percentages
## interpreted as minute-to-minute changes.
##
...
##
## - Comprehension question: Do you expect to be back to 1,000?
## ...
##
## - Same question for the following returns:
##
## -6%, -10%, +9%, +7%
##
...
##
## - Are you surprised?
## ...
##
## - Quantitative literacy, side remark:
##
## A percentage change is a multiplicative change! <<<<<<<<<<<<<<<<
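##
## A minimal sketch with made-up numbers: a 50% gain followed by
## a 50% loss on a balance of 200 does not lead back to 200.
200 * (1 + 0.50) * (1 - 0.50)   # 150, not 200
## Each percentage change multiplies the current balance by (1 + change).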
##
##
## - Summary of arithmetic operations, as well as the ladder operation:
## -----------------------------------------
## | ARITHMETIC OPERATIONS: |
## | |
## | Power: ^ | 2^10
## | Unary sign operation: - | -(2); -(1/2)
## | Sequence/ladder: : | 2:5
## | Integer division, remainder: %/%, %% | [to be explained]
## | Multiplication/division: *, / | 5/2
## | Addition/subtraction: +, - | 10-12
## -----------------------------------------
##
## - Can you guess why the operations are listed in this order?
## ...
##
## - Try to guess what the following does before you execute:
-2^0.5
#
## - Not yet seen: integer division %/% and remainder %% operation:
10 %/% 3 # integer division
10 %% 3 # remainder operation
## For the intellectually curious student, here are some math strangenesses:
(-10) %/% 3 # strange, isn't it?
(-10) %% 3 # consistent with the strangeness of (-10)%/%3
## Mathematicians usually use %/% and %% only for integers.
## However, R allows us to use them sensibly for decimal numbers as well:
2.6 %/% 0.5 # makes sense
2.6 %% 0.5 # this, too
## %/% always produces an integer-valued result and %% the associated remainder.
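## A sketch of how the two operations fit together, using the divisor 3
## from the examples above: multiplying the integer quotient by the divisor
## and adding the remainder should reproduce the original number.
3 * (10 %/% 3) + (10 %% 3)         # 10
3 * ((-10) %/% 3) + ((-10) %% 3)   # -10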
##
## ==> For general interest, sometimes useful, but not to
## appear in quizzes.
## - Patterned sequences using '%/%' and '%%':
## Apply ...%/%3 and ...%%3 to the ladder 0:20
...
...
## What kinds of sequence patterns can you generate this way?
## ...
## ...
##
## The operations %/% and %% will NOT BE ON QUIZZES!
##
## ---------------------------------------
## | SUMMARY OF MATH FUNCTIONS: |
## | |
## | Square root: sqrt() |
## | Exp, log: exp(), log(), log10() |
## | Trig: sin(), cos(), tan() |
## | Trig inverses: asin(), acos(), atan() | [Not on quiz 1]
## ---------------------------------------
##
## Notes:
## - Yes, we will have occasion to use logs and trig functions!
## - Trig functions take 'arc' as an argument, not degrees.
## Reminder: arc(degree) = degree / 180 * pi
## Obtain the sin of 30 degrees by translating to arc first:
sin(30 / 180 * pi)
## Obtain the sin of 60 degrees:
...
##
##
## MISSING NUMERIC VALUES: They are the results of undefined operations.
##
## - R has 'values' for three kinds of 'missing numbers':
Inf; -Inf; NaN
## Can you intuit how they will be used?
##
## - Examples: Guess in each case the result!
1/0 # ...
0/0 # ...
-1/0 # ...
1+1/0 # ...
1+Inf # ...
1/Inf # ...
1/(-Inf) # ...
1/NaN # ...
log(0) # ...
log(-1) # ...
sqrt(-1) # ...
##
## Note:
##
## - R may issue a 'Warning' when missing numeric values arise.
## This does not mean the computation didn't go through!
## In fact, it did, but you are warned about missing values.
## Actual errors and aborted computations arise, for example,
## from bad syntax:
/(2+3)
##
## - The values Inf, -Inf and NaN can be used like numbers.
## If you type them into an R console, or make them part
## of computations, no error will occur. Instead,
## results will be produced as in the above examples.
##
## - Fun experiment:
## Is it possible to write a number large enough to be turned into Inf by R?
## Background fact: Numbers are stored in 64 bit 'words'.
## So something has to happen if we type a number with too many digits.
##
##
## ----------------------------------------------------------
## | MISSING NUMERIC VALUES IN R: |
## | |
## | Inf : 1/0 |
## | -Inf : -1/0; log(0) |
## | NaN : 0/0; log(-1); sqrt(-1); Inf-Inf; NaN+3 |
## | |
## | Do not expect illegal math operations to cause an error. |
## | They usually generate a form of missing numeric value. |
## ----------------------------------------------------------
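##
## The box lists two cases not tried in the examples above: Inf-Inf and NaN+3.
## A quick check (a sketch; nothing beyond the values in the box is assumed):
Inf - Inf   # NaN: 'infinity minus infinity' is undefined
NaN + 3     # NaN: arithmetic with NaN stays NaN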
##
##
## MISSING DATA:
##
## - When DATA are missing, R will use the following 'value':
NA
## This value can appear in data files or it may be generated
## if a field in a data file is empty.
##
## - You can type
NA
## into the R console, and R will happily copy it back
## as a vector containing one element, NA,
## as if it had done a computation for you.
##
## - Distinguish:
##
## ---------------------------------------------------------------
## | . Missing values resulting from computations: Inf; -Inf; NaN |
## | . Missing values resulting from absent data: NA |
## ---------------------------------------------------------------
##
## We will encounter NA values later when we deal with data files.
## Another situation arises when we need to allocate a data table
## but the values of the table will be filled in later.
## In this case, NA makes a good default value.
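##
## A preview sketch of NA as a default value. The function matrix() has
## not been introduced yet, and the table size 2 x 3 is an arbitrary
## assumption made for illustration.
matrix(NA, nrow = 2, ncol = 3)   # a 2-by-3 table filled with NA
NA + 1                           # arithmetic with missing data yields NA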
##
##
## REPRESENTATION OF NUMBERS -- DEALING WITH FINITE PRECISION
##
## - Examples of EXTREME NUMBERS:
0.000000001
1000000000
-999999999999
## In all cases R rendered the values in exponential form,
## also called 'scientific notation'.
## What does the symbol 'e' stand for?
## ...
## Instead of 'e' we can also use 'E' when writing numbers:
1e6
1E6
1e+6
1E+6
1000000
10^6
## Why are you not surprised about the results?
## ...
## Which of the six does an actual arithmetic calculation?
## ...
## Hint: R sees all but one as simple numbers.
##
## - Example of a VERY SMALL NUMBER: one millionth can be written as
0.000001
1e-6
1E-6
1.0000E-6
10E-7
## Why are you not surprised about the results?
## ...
## Does R see any arithmetic operations, or just simple numbers?
## ...
##
## - CAUTION: Syntax!
## The following is incorrect:
1e(-6)
## No parens in number syntax!
## The following is correct, though:
1*10^(-6)
## What is the difference between this and the following?
1e-6
## ...
##
## - R by default shows decimal numbers to 7 digits precision:
pi # 7 digits
0.999999999 # 9 digits, hence rounded to 7 digits
0.000000999 # full precision due to exponential notation
0.0000009999999 # 7 digits, still full precision
0.00000099999999 # 8 digits, hence rounded to 7 digits
##
## Numbers are represented internally to about 16 significant digits.
##
## --------------------------------------------------------------------------
## | There is a difference between machine precision and printed precision! |
## --------------------------------------------------------------------------
##
## If you want to see more precision, do the following:
print(pi, 20)
## This is a call to a function print(), asking to print 'pi' to 20 digits.
## R may print more than 16 digits here, but only about the first 16 are
## meaningful; any further digits are artifacts of the binary representation.
## Another example:
print(sqrt(2), 12)
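## One more illustration of printed versus stored precision
## (a sketch; the fraction 1/3 is an arbitrary choice):
1/3              # printed with the default 7 digits: 0.3333333
print(1/3, 16)   # the same stored number, printed with 16 digits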
## - It is possible to write numbers so extreme that R can't represent them.
## Examples:
99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
## Try:
2e308
## However, the following still works:
1e308
## Hence somewhere between 1e308 and 2e308 the computer runs
## out of bits to represent the number, in which case it is
## rendered as 'Inf'.
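## For the curious, a preview sketch: R keeps the largest representable
## number in the built-in list '.Machine' (not yet introduced in these notes).
.Machine$double.xmax   # about 1.797693e+308
1e308 * 10             # exceeds that limit, hence Inf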
## - [Hint for the future:
## The default printing precision of 7 digits can be changed
## using the function 'options()'.
## Example:
## options(digits=10)
## We haven't talked about R's functions yet, so don't worry.
## ]
##
##
## ----------------------------------------------------------
## | DECIMAL NUMBERS IN R: |
## | |
## | - Most general form using 10-based exponential notation: |
## | 123.4567E30; -123.4567e-10 |
## | |
## | - The exponential part can be missing; |
## | the decimal part cannot be missing: |
## | 10; -5; 0.01234; -12.345 # no exponential part |
## | 1e10; # decimal part 1 |
## | |
## | - Default printing precision in R: 7 significant digits |
## | Internal precision: about 16 significant digits |
## ----------------------------------------------------------
##
##
##================================================================