##================================================================
##
##
## ---- SOME BASICS: HISTORY, NUMBERS, OPERATIONS, FUNCTIONS ----
##
##
## WHY R?
##
##   . We need standards -- R is one of them.
##   . Huge developer community
##   . New stats algorithms appear first as R packages.
##   . Growing user community, also in industry
##   . Powerful language -- quite different from Python
##   . Probably the best language for 'one-shot programming'
##     (which comprises much of data analysis)
##
## HISTORY OF R:  C --> S --> R
##
##   . Where: Bell Labs (C, S)
##   . Related: Unix --> Linux, MacOS
##
## R AND OTHER LANGUAGES:
##   . R, Python, Matlab are ... interpreted, high-level  <<<  human efficiency
##   . C, C++, Java, Fortran are ... compiled, low-level  <<<  machine efficiency
##
## GETTING RSTUDIO:
##
##   . Search for 'RStudio' and install it,
##   . or see the syllabus for the URL.
##     ... right now!
##     https://www.rstudio.com/products/rstudio/download/
##
##   . NOTE! The instructor does not use RStudio.
##     He uses the so-called 'emacs' environment.
##     He does not recommend that you use it because
##     its learning curve is too steep.
##
##   . Problems specific to some MAC users:
##
##     ~ RStudio gives an error message at the end of installing:
##       Install plain R from www.cran.r-project.org for MacOS as well.
##       Thereafter RStudio seems to work ok.
##
##     ~ If you don't see syntax coloring in the code editor window
##       of RStudio, you will probably see that the file has been
##       renamed with another extension (e.g., '.txt') added to the file name.
##       You should rename the file back to its original name, which is
##          Chapter-01-First-Basics.R
##       RStudio only provides syntax coloring for files with extension '.R'.
##
## GETTING THESE NOTES:
##
##   . For now, find the instructor's webpage (search 'buja wharton').
##     In the Section on Stat 470/503/770 click on "Chapter 1".
##     (MAC users: Make sure the download does not attach the extension '.txt'
##     to the filename!
##     It is important that the extension is '.R'.)
##   . Next week chapter files will be on Canvas.
##
##
##================================================================
##
##
## SOME ANSWERS TO NATURAL QUESTIONS:
##
## - The basic work cycle in RStudio:
##
##   . Edit an R code file with extension ".R" in the upper left
##     pane of RStudio (Files > Open File > .... or ctrl-O)
##     The present chapter file is indeed an R code file,
##     but it also contains a lot of non-executable text called 'comments'.
##
##   . Copy/paste code lines from the editor pane (upper left pane)
##     into the R console (lower left pane) and execute them.
##     Instructions for doing this efficiently are given below.
##
## - Q: Why is this file a crude text file?
##      What do the hash marks do at the beginning of the lines?
##
##   A: All R code files are essentially txt files without formatting.
##      The .R extension tells RStudio and some editors to use
##      ``syntax highlighting''.
##
##      Any content in a line that follows a hash sign is
##      NOT interpreted as code but as mere comment.
##      If you type or copy lines with hashes at the beginning
##      into an R interpreter/console, nothing gets computed, just copied back.
##      Anything before a hash is interpreted as code,
##      and when typed or copied into an R interpreter/console,
##      R will try to compute something.
##      AYT? What is code and what is comment in the following line?
10+20   # Some calculation...
##      Syntax highlighting shows the difference between code and comment.
##      The color scheme in RStudio will be different from the instructor's.
##      You can choose your own scheme as follows:
##         Tools > Global Options... > Appearance
##      Then play with fonts, font sizes and editor themes.
##      Finally, click "Apply" or "Cancel".
##
## - Q: Isn't there a more convenient way to copy/paste a line of code
##      into the R interpreter/console?
##
##   A: There is!
##      Place the pointer on the code line and hit
##         ---------------------
##         | Ctrl-Enter        |   Windows  !!!!!!!!!!!!!!!
##         | Command-Enter     |   MacOS    !!!!!!!!!!!!!!!
##         ---------------------
##      This copies the line into the R console and
##      moves the pointer to the next line.
##      Example: Copy the following lines into the R console.
10 + 20
3*123
##      So if you wish to execute multiple consecutive lines,
##      simply hit Ctrl/Command-Enter multiple times.
##
## - Q: And where are the solutions to '...'?
##
##   A: The text below has no answers to the questions/problems.
##      We give the solutions here in class.
##      If you miss a class, get your answers from a fellow student or TA.
##      Knowing how to get human help is a fundamental skill in life.
##      [No, solutions to '...' will NOT be posted ever.]
##
##
## - Finding your way around in code files:
##
##   . Primary way to find specific code snippets:
##     Text search using ctrl-F (Windows) or ... (MAC?)
##     Example: Search for "3*123".
##
##   . Secondary: by line numbers
##     Compare instructor's current line number in the white bar below the code.
##     It has the form (133,32), which means 'line 133, position 32 in line'.
##     Find the line number in RStudio,
##     but it may only be approximate due to editing differences.
##     Example: Find line 110.
##
## - WARNING!
##   If you try to execute an incomplete expression,
##   the R interpreter will ask you to complete it on the following line(s).
##   It will change the prompt from ">" to "+". Try:
10 /
##   You may abort with the ESC key, or by completing the expression.
##   Typical case: Forget a closing parenthesis, as in
log(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10
##   You can also use this mechanism by spreading long expressions over multiple lines:
log(1 + 2 + 3 +       # incomplete
    4 + 5 + 6 +       # incomplete
    7 + 8 + 9 + 10)   # complete, finally
##
##
##================================================================
##
##
## WHAT YOU CAN DO WITH R RIGHT AWAY:
##
## - Use R as a pocket calculator using math notation.
##   (Every R class starts this way.)
##
##
## EXAMPLES:
##
## - What is 1.25% interest on a $1,213.85 bank account balance?
...
1213.85 * 0.0125
1213.85 * (1.25 / 100)
##
##   . Syntax Rule: Make sure to omit the thousands-separator commas in R!
##     Commas in numbers are illegal.
##
##   . Strange: What is the meaning of '[1]' when R prints an answer?
##     Explanation: R considers a single number as a vector (sequence) of length 1.
##
##     Compare:
1:100
##     This expression generated a ``vector'' of length 100.
##     R prints it by showing the position in brackets for
##     the first value on each line.
##
## - Calculate an 18% tip on a $28.50 bill:
...
##   What is the total amount of bill plus tip?
...
##
## - What is the 10th power of 2?
...
##   What do computer geeks call this number?
##   ... kilo
##
## ----------------------------------------------------
## | First bit of new syntax, actually, an operation: |
## |    m:n                                           |
## | generates a 'vector' or 'sequence' of numbers    |
## | spaced by 1, starting at m, ending at/before n.  |
## | We call them 'ladders'.                          |
## ----------------------------------------------------
##
## - Show all integers from 1,001 to 1,100:
...
##
## - Show all multiples of 3 for the previous sequence:
...
##
## - Show all numbers divisible by 3 below 100:
...
##   (Later we will learn about a function that is easier
##   for this purpose. For now this exercise illustrates
##   how to obtain a solution to a problem with the tools
##   that you know.)
##
## - Calculate the powers of 2 for 0 up to 20:
...
##
## - Explorations for ladders:
##
##   . What do you expect to see when you generate
##     a ladder of numbers starting at 1.3 up to 10?
##     ...
##
##   . What do you expect if you start the sequence with
##     a negative integer such as -3?
-3:10
##     There arises an issue: Does R interpret the above as
(-3):10
##     or as
-(3:10)
##     ???
##     That is, is "-" executed first, and then ":",
##     or the other way round?
##     Do you know a technical term for this general issue?
##     You might know it from high school math:
##     ...
##     So which operation binds stronger (is first executed), '-' or ':'?
##     ...
##
##   . INCREMENTS:
##     What do you expect if you try a sequence that ends lower
##     than its starting value, such as from 10 to 5?
...
##     Do you expect the following to work?
5:-3
##
##   . BLANKS/SPACES: Liberal use of blanks permitted!
5 : - 3
5 : -3
5:-3
##     However, the following is NOT a good way to write 1,285.23:
1 285 . 23
##     No blanks/spaces within atomic expressions (e.g., numbers)!
##
## -----------------------------------------------------------------
## |                                                               |
## | R SYNTAX:                                                     |
## |                                                               |
## | - Just like in math, computer languages require so-called     |
## |   "order of operations" or "operator precedence".             |
## |                                                               |
## | - Again like in math, use ROUND parentheses "()" to force     |  <<<<<<<
## |   the order of operations according to your intentions.       |
## |   Do NOT use brackets "[]" or curly braces "{}" !!!           |
## |                                                               |
## | - Even if the default precedences agree with your intentions, |  <<<<<
## |   avoid ambiguity for the human reader by using parens        |
## |   even where they may not be needed.                          |
## |                                                               |
## | - You can insert blanks liberally for clarity, but not inside |
## |   numbers.                                                    |
## |                                                               |
## | - You can learn a lot by experimenting creatively in R.       |
## |                                                               |
## -----------------------------------------------------------------
##
## - What is the number 'pi'?
##   There is a symbol in the language for this number:
##   Just type 'pi'!
...
##
## - What is the 'sin' of pi? of pi/4?
##   There is a function sin();
##   its argument must be in radians (arcs, multiples of pi), not degrees:
...
##   Discuss the result and ponder the wonders of machine precision!!!
##   Ask yourself:
##   . Can the machine know the number pi exactly? ...
##   . Can the function sin() compute exact results? ...
##   ==> Never assume that operations involving decimal numbers are exact!
##
## - What is half of the square root of 2?
##   There is a function sqrt(...) that you can use,
##   or you can use the power with exponent 0.5.
##   Write several versions, using 'sqrt()' and
##   different ways of writing the exponent:
...
...
...
##
## - What is the reciprocal of the square root of 2?
...
##
## - Compare the previous results and explain!
##   ...
## - What is the number 'e'?
##   You will need to use the function 'exp()'.
##   There is no fixed symbol for this one.
...
##
## - What is the natural exponential (base e) of 10?
...
##
## - Write the natural exponential of 10 as 'e' to the power 10.
...
##
## - What is the justification for identical results in the
##   previous two questions?
##   ...
#---------------------------------------------------------------- end of lecture 2
## - What is the natural logarithm of the previous two results?
##   You know the answer, but 'confirm' with actual R code.
...
##
## - What are the 10-based logarithms of 1,000 and 5,000 and 10,000?
...
## - Why would the 10-based log of 5,000 be higher than 3.5?
##   ...
## - What is the interest on a $1,000 initial investment
##   after 4 years with the following annual financial returns?
##
##        +7%, +9%, -10%, -6%
##
##   Recall: These are yearly percentage gains and losses.
##
##   Biologists: Translate this to cultures of uni-cellular organisms
##               starting with 1,000 cells and the above percentages
##               interpreted as minute-to-minute changes.
##   ...
##
## - Comprehension question: Do you expect to be back to 1,000?
##   ...
##
## - Same question for the following returns:
##
##        -6%, -10%, +9%, +7%
##   ...
##
## - Are you surprised?
##   ...
##
## - Quantitative literacy, side remark:
##
##     A percentage change is a multiplicative change!
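##
##   Illustration of the side remark (an added sketch, not one of the
##   class solutions). The figures are made up for this purpose:
##   a $1,000 balance gains 50% and then loses 50%.
1000 * (1 + 0.50) * (1 - 0.50)   # multiplicative: the two factors 1.5 and 0.5 are applied in turn
1000 * (1 + 0.50 - 0.50)         # naive additive reasoning: the percentages 'cancel'
##   Compare the two results: only the first line describes what
##   actually happens to the balance.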
##
##
## - Summary of arithmetic operations, as well as the ladder operation:
##     ------------------------------------------
##     | ARITHMETIC OPERATIONS:                 |
##     |                                        |
##     | Power:                        ^        |   2^10
##     | Unary sign operation:         -        |   -(2); -(1/2)
##     | Sequence/ladder:              :        |   2:5
##     | Integer division, remainder:  %/%, %%  |   [to be explained]
##     | Multiplication/division:      *, /     |   5/2
##     | Addition/subtraction:         +, -     |   10-12
##     ------------------------------------------
##
## - Can you guess why the operations are listed in this order?
##   ...
##
## - Try to guess what the following does before you execute:
-2^0.5
##
## - Not yet seen: integer division %/% and remainder %% operation:
10 %/% 3   # integer division
10 %% 3    # remainder operation
##   For the intellectually curious student, here are some mathematical oddities:
(-10) %/% 3   # strange, isn't it?
(-10) %% 3    # consistent with the strangeness of (-10)%/%3
##   Mathematicians usually use %/% and %% only for integers.
##   However, R allows us to use them sensibly for decimal numbers as well:
2.6 %/% 0.5   # makes sense
2.6 %% 0.5    # this, too
##   %/% always produces a whole number and %% the associated remainder.
##
##   ==> For general interest, sometimes useful, but not to
##       appear in quizzes.
## - Patterned sequences using '%/%' and '%%':
##   Apply ...%/%3 and ...%%3 to the ladder 0:20
...
...
##   What kinds of sequence patterns can you generate this way?
##   ...
##   ...
##
##   The operations %/% and %% will NOT BE ON QUIZZES!
##
##   ------------------------------------------
##   | SUMMARY OF MATH FUNCTIONS:             |
##   |                                        |
##   | Square root:    sqrt()                 |
##   | Exp, log:       exp(), log(), log10()  |
##   | Trig:           sin(), cos(), tan()    |
##   | Trig inverses:  asin(), acos(), atan() |   [Not on quiz 1]
##   ------------------------------------------
##
## Notes:
##   - Yes, we will have occasion to use logs and trig functions!
##   - Trig functions take 'arc' (radians) as an argument, not degrees.
##     Reminder: arc(degree) = degree / 180 * pi
##     Obtain the sin of 30 degrees by translating to arc first:
sin(30 / 180 * pi)
##     Obtain the sin of 60 degrees:
...
##     [There was something about asin() and acos() in previous versions
##     of this file; remove it.]
##
##
## MISSING NUMERIC VALUES: They are the results of undefined operations.
##
## - R has 'values' for three kinds of 'missing numbers':
Inf; -Inf; NaN
##   Can you intuit how they will be used?
##
## - Examples: Guess in each case the result!
1/0        # ...
0/0        # ...
-1/0       # ...
1+1/0      # ...
1+Inf      # ...
1/Inf      # ...
1/(-Inf)   # ...
1/NaN      # ...
log(0)     # ...
log(-1)    # ...
sqrt(-1)   # ...
##
## Note:
##
## - R may issue a 'Warning' when missing numeric values arise.
##   This does not mean the computation didn't go through!
##   In fact, it did, but you are warned about missing values.
##   Actual errors and aborted computations arise, for example,
##   from bad syntax:
/(2+3)
##
## - The values Inf, -Inf and NaN can be used like numbers.
##   If you type them into an R console, or make them part
##   of computations, no error will occur. Instead,
##   results will be produced as in the above examples.
##
## - Fun experiment:
##   Is it possible to write a number large enough to be turned into Inf by R?
##   Background fact: Numbers are stored in 64 bit 'words'.
##   So something has to happen if we type a number with too many digits.
##
##
## ------------------------------------------------------------
## | MISSING NUMERIC VALUES IN R:                             |
## |                                                          |
## |   Inf : 1/0                                              |
## |  -Inf : -1/0; log(0)                                     |
## |   NaN : 0/0; log(-1); sqrt(-1); Inf-Inf; NaN+3           |
## |                                                          |
## | Do not expect illegal math operations to cause an error. |
## | They usually generate a form of missing numeric value.   |
## ------------------------------------------------------------
##
##
## MISSING DATA:
##
## - When DATA are missing, R will use the following 'value':
NA
##   This value can appear in data files or it may be generated
##   if a field in a data file is empty.
##
## - You can type
NA
##   into the R console, and R will happily copy it back
##   as a vector containing one element, NA,
##   as if it had done a computation for you.
##
## - Distinguish:
##
## ----------------------------------------------------------------
## | . Missing values resulting from computations: Inf; -Inf; NaN |
## | . Missing values resulting from absent data:  NA             |
## ----------------------------------------------------------------
##
##   We will encounter NA values later when we deal with data files.
##   Another situation arises when we need to allocate a data table
##   but the values of the table will be filled in later.
##   In this case, NA makes a good default value.
##
##
## REPRESENTATION OF NUMBERS -- DEALING WITH FINITE PRECISION
##
## - Examples of EXTREME NUMBERS:
0.000000001
1000000000
-999999999999
##   In all cases R rendered the values in exponential form,
##   also called 'scientific notation'.
##   What does the symbol 'e' stand for?
##   ...
##   Instead of 'e' we can also use 'E' when writing numbers:
1e6
1E6
1e+6
1E+6
1000000
10^6
##   Why are you not surprised about the results?
##   ...
##   Which of the six does an actual arithmetic calculation?
##   ...
##   Hint: R sees all but one as simple numbers.
##
## - Example of a VERY SMALL NUMBER: one millionth can be written as
0.000001
1e-6
1E-6
1.0000E-6
10E-7
##   Why are you not surprised about the results?
##   ...
##   Does R see any arithmetic operations, or just simple numbers?
##   ...
##
## - CAUTION: Syntax!
##   The following is incorrect:
1e(-6)
##   No parens in number syntax!
##   The following is correct, though:
1*10^(-6)
##   What is the difference between this and the following?
1e-6
##   ...
## ## - R by default shows decimal numbers to 7 digits precision: pi # 7 digits 0.999999999 # 8 digits, hence rounded to 7 digits 0.000000999 # full precision due to exponential notation 0.0000009999999 # 7 digits, still full precision 0.00000099999999 # 8 digits, hence rounded to 7 digits ## ## Numbers are represented internally to about 16 significant digits. ## ## -------------------------------------------------------------------------- ## | There is a difference between machine precision and printed precision! | ## -------------------------------------------------------------------------- ## ## If you want to see more precision, do the following: print(pi, 20) ## This is a call to a function print(), asking to print 'pi' to 20 digits. ## It can't do better than 16 digits, so this is what it prints. ## Another example: print(sqrt(2), 12) ## - It is possible to write numbers so extreme that R can't represent them. ## Examples: 99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 ## Try: 2e308 ## However, the following still works: 1e308 ## Hence somewhere between 1e308 and 2e308 the computer runs ## out of bits to represent the number, in which case it is ## rendered as 'Inf'. ## - [Hint for the future: ## This default behavior can be changed ## using the function 'options()'. ## Example: ## options(digits=10) ## We haven't talked about R's functions yet, so don't worry. 
##   ]
##
##
## ------------------------------------------------------------
## | DECIMAL NUMBERS IN R:                                    |
## |                                                          |
## | - Most general form using 10-based exponential notation: |
## |     123.4567E30; -123.4567e-10                           |
## |                                                          |
## | - The exponential part can be missing;                   |
## |   the decimal part cannot be missing:                    |
## |     10; -5; 0.01234; -12.345   # no exponential part     |
## |     1e10;                      # decimal part 1          |
## |                                                          |
## | - Default printing precision in R: 7 significant digits  |
## |   Internal precision: about 16 significant digits        |
## ------------------------------------------------------------
##
##
##================================================================
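##
##
## FINITE PRECISION IN PRACTICE (an added sketch, not part of the lecture):
##
## - The lines below illustrate the difference between printed and
##   machine precision using base R only. The test values 0.1, 0.2
##   and 0.3 are our own choice. The operator '==' tests for equality;
##   '.Machine' is a built-in list describing the number format.
0.1 + 0.2 == 0.3                    # FALSE: none of these decimals is stored exactly
print(0.1 + 0.2, digits = 17)       # shows digits that default printing hides
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equality up to a small tolerance
.Machine$double.xmax                # largest representable number, between 1e308 and 2e308
.Machine$double.eps                 # spacing of representable numbers near 1, about 2.2e-16
##   ==> Compare decimal results with a tolerance, not with exact equality.
##
##================================================================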