MATH 4910/5010 - R Lab 1
This lab is designed to help you learn the basics of coding in R. Because R is not a prerequisite for the course, this guide assumes no background in the language.
Objectives:
- Get familiar with the RStudio interface
- Learn Basic R Expressions
- Understand R basic data types
- Understand R data structures: vectors and data frames
- Understand R functions and control structures
You can find the R markdown template for this lab on Canvas in the course files under the "R Lab 1" folder. Follow the instructions below. Instructions in green indicate tasks that should be completed in the R markdown file for this lab.
Basic R Expressions and Data Types
Expressions can be typed directly into the R console. When you press Enter, the result is printed immediately underneath. These expressions can be numerical, character, or logical.
- Type some expressions into the R console to see what happens.
Below are some sample expressions along with their output.
Code after the '>' symbol indicates commands input in the R console. Expressions following the '##' characters indicate what the R console should output. For a list of R operators, see this webpage.
Numerical Expressions
> 5+6
## [1] 11
> (7/3)^2
## [1] 5.444444
> pi
## [1] 3.141593
Character Expressions
> "Go Pokes!!"
## [1] "Go Pokes!!"
> 'You can also use single quotes'
## [1] "You can also use single quotes"
> 'Just say "No"'
## [1] "Just say \"No\""
Logical Expressions
> 11>23
## [1] FALSE
> (3^2+4^2)==5^2
## [1] TRUE
> !(TRUE & FALSE)
## [1] TRUE
Like most programming languages, expressions can be assigned to variables.
The assignment operator in R is '<-'.
The following code assigns the value of the expression \(4-11\) to the variable 'x'.
> x <- 4-11
Assignment can also be written in the other direction using '->' as follows.
> "Topology is cool!" -> y
When you assign a variable, no output is produced. To see the value of a variable, just input the variable.
> x
## [1] -7
> y
## [1] "Topology is cool!"
> x^2
## [1] 49
- Assign the value of \( {2 \over 3} + {3 \over 4}\) to a variable 'w'. Then, output the value \(6w\).
The class() function returns the data type of a variable or expression.
> x <- 5!=5
> class(x)
## [1] "logical"
> class(3/4)
## [1] "numeric"
> class("3/4")
## [1] "character"
There's also a fourth data type called a factor. Factors are sort of a category or class of objects. We won't be using factors much in this course.
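For readers curious what a factor looks like, here is a minimal sketch. The vector name sizes and its values are made up for illustration and are not part of the lab.

```r
# Hypothetical data: shirt sizes recorded as character strings
sizes <- c("small", "large", "small", "medium")

# factor() converts the strings into a categorical variable
size_factor <- factor(sizes)

class(size_factor)
## [1] "factor"

# levels() lists the distinct categories
levels(size_factor)
## [1] "large" "medium" "small"
```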
Vectors
Most of the time, we will be working with variables that store several values at once. These are called arrays. Arrays consist of a set of values paired with a set of indices. Arrays are indexed by the positive integers starting at 1. (This is different from most other programming languages, which begin indexing at 0.) For example, here's an array with the first 8 Fibonacci numbers.
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
Value | 1 | 1 | 2 | 3 | 5 | 8 | 13 | 21 |
There are a few different types of arrays in R.
In this lab, we will focus on a couple of them.
Vectors are arrays of values with the same data type.
Vectors can be defined using the c() function.
- Create a vector with the first eight Fibonacci numbers using the code below.
> c(1,1,2,3,5,8,13,21)
## [1] 1 1 2 3 5 8 13 21
- Create a vector with the strings "one", "two", and "three" with the following code.
> c("one","two","three")
## [1] "one" "two" "three"
- Vectors always have a single data type. See what happens when you run the following code.
> vect <- c(2,"Susan",TRUE)
> vect
What is the data type of vect?
- You can see the length of a vector using the length() function.
> vect2 <- c(5,6,7,8)
> length(vect2)
## [1] 4
What is the length of vect?
A common type of vector you may want to define is a vector of numbers incremented by a constant value.
The ':' operator can be used to create vectors which increment by 1.
> 1:100
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## [35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
## [52] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
## [69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
## [86] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
When using the ':' operator, we can use different values to start and end at different places.
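A few illustrative calls, as a sketch with values chosen only for this example. The last two lines show a common surprise: ':' is evaluated before subtraction, so parentheses change the result.

```r
# Count up from 5 to 9
5:9
## [1] 5 6 7 8 9

# When the start is larger than the end, ':' counts down by 1
3:-2
## [1] 3 2 1 0 -1 -2

# ':' is applied before subtraction, so parentheses matter
n <- 4
1:n-1
## [1] 0 1 2 3
1:(n-1)
## [1] 1 2 3
```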
We can look at the documentation for an operator to understand its functionality using '?'.
- Check the documentation for ':' by inputting the following command into the R console.
> ?':'
- Use ':' to create a sequence which starts at 2.5 and counts down by one to -6.5.
If you want to see specific entries of a vector, you can give the indices you want in square brackets.
> x <- c(4,8,15,16,23,42)
> x[3]
## [1] 15
> x[2:4]
## [1] 8 15 16
> x[c(3,5,6)]
## [1] 15 23 42
- Use the following code to store a sequence to the variable my_seq.
> my_seq<-c(6, -37, 11, 43, 44, 34, 37, 36, 15, 46, -1, 26, -19, -43, -33, 1, -42, 25, 16, -18, 21, 27, -48, 7, 20, 12, -14, 39, 13, -24, 28)
Another way to create iterative sequences is the seq() function.
Just like with operators, we can use '?' to look at the documentation for a function.
- Check the documentation for 'seq()' by inputting the following command into the R console.
> ?seq
- Use 'seq()' to create a sequence which counts by 0.25 from 2 to 4.
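As a further illustration of seq(), here is a short sketch using its by and length.out arguments. The particular start, end, and step values are arbitrary choices for this example.

```r
# Count from 10 to 30 in steps of 5
seq(10, 30, by = 5)
## [1] 10 15 20 25 30

# Ask for a fixed number of evenly spaced values instead of a step size
seq(0, 1, length.out = 5)
## [1] 0.00 0.25 0.50 0.75 1.00

# A negative step counts down
seq(3, 1, by = -0.5)
## [1] 3.0 2.5 2.0 1.5 1.0
```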
Before, when we were using the function c(), we were actually concatenating vectors of length one together.
However, we can concatenate longer vectors together using the same function.
- Try using c() to concatenate some vectors together.
> c(2:7,14:10)
## [1] 2 3 4 5 6 7 14 13 12 11 10
> c(0,100:90)
## [1] 0 100 99 98 97 96 95 94 93 92 91 90
> v1 <- c(9,3/4,-pi)
> c(v1,22,seq(6.3,7.5,0.2))
## [1] 9.000000 0.750000 -3.141593 22.000000 6.300000 6.500000 6.700000
## [8] 6.900000 7.100000 7.300000 7.500000
Vectors can be used in expressions to do computations. Operations will be applied to each value of the vector.
> x <- 1:6
> x*2
## [1] 2 4 6 8 10 12
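The same entry-by-entry behavior holds for other operators and for many built-in functions. The following sketch uses arbitrary values to illustrate this.

```r
x <- 1:6

# Each arithmetic operation is applied entry by entry
x + 10
## [1] 11 12 13 14 15 16
x^2
## [1] 1 4 9 16 25 36

# Comparisons are applied entry by entry too, giving a logical vector
x > 3
## [1] FALSE FALSE FALSE TRUE TRUE TRUE

# Many built-in functions work the same way
sqrt(c(4, 9, 16))
## [1] 2 3 4
```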
- What happens if we try to type in the following code?
> x <- 1:6
> y <- 0:5
> x * y
- As you may have noticed, when more than one vector is used in an expression, R starts by using the first term of every vector.
Then, for each subsequent term of the output, the index into each vector is increased by one.
But what happens if the vectors have different lengths?
Well, let's see.
> x <- 1:6
> y <- c(1,0)
> x * y
- Let \((a_n)\) be the following sequence: \[ a_n=2,1,2,1,2,1,\ldots=\begin{cases}2 & n\text{ is odd}\\ 1 & n\text{ is even}\end{cases} \] Define the terms of the sequence \((b_n)\) as follows: \[ b_n={a_n n^2 \over 2} \] Using vectors, write code that displays the first 50 terms of \((b_n)\) in code block 1.
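The reuse of a shorter vector from its beginning is usually called recycling. Here is a small sketch of it; the names long and short and their values are made up for illustration.

```r
long <- c(10, 20, 30, 40)
short <- c(1, 2)

# short is reused as 1, 2, 1, 2 to match the length of long
long + short
## [1] 11 22 31 42

# A single number is a vector of length one, so it is recycled as well
long * 3
## [1] 30 60 90 120
```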
Data Frames
When analyzing data, we usually don't just have an arbitrary sequence of values. What we usually have is several related variables collected from observations. Each observation has a value for each of these variables (some variables could be missing if they were not observed). Essentially, what we want is a table with a column for each variable and a row for each observation. To handle this type of information we can use an R data type called a data frame.
A data frame is a collection of equal length vectors with a header row of column labels.
We usually think of data frames as having each row correspond to some observation and each column correspond to some property of the observations.
Data frames can be defined directly using the data.frame() function.
- Use the following code to create a data frame called nfc_east.
> city <- c("Dallas","New York","Washington","Philadelphia")
> nickname <- c("Cowboys","Giants","Commanders","Eagles")
> super_bowls <- c(5:3,1)
> nfc_east <- data.frame(city,nickname,super_bowls)
> nfc_east
## city nickname super_bowls
## 1 Dallas Cowboys 5
## 2 New York Giants 4
## 3 Washington Commanders 3
## 4 Philadelphia Eagles 1
- Confirm the data type of nfc_east using the class() function.
- You can use the View() function to see the data inside nfc_east in a worksheet.
- The colnames() function shows us the column labels of our data.
> colnames(nfc_east)
## [1] "city" "nickname" "super_bowls"
- To see the dimensions of our data we can use the dim() function.
> dim(nfc_east)
## [1] 4 3
- Since our data is indexed by two numbers, a row and a column, we can use ordered pairs to call specific entries in the data frame.
> nfc_east[3,2]
## [1] "Commanders"
> nfc_east[2,]
## city nickname super_bowls
## 2 New York Giants 4
> nfc_east[,3]
## [1] 5 4 3 1
- You can also pull out a single column by name using the '$' operator.
> nfc_east$city
## [1] "Dallas" "New York" "Washington" "Philadelphia"
- A very neat function you can use to do basic statistical analysis on data is the summary() function. Give it a try!
> summary(nfc_east)
## city nickname super_bowls
## Length:4 Length:4 Min. :1.00
## Class :character Class :character 1st Qu.:2.50
## Mode :character Mode :character Median :3.50
## Mean :3.25
## 3rd Qu.:4.25
## Max. :5.00
- In code block 2, write code to store the data from the following table into a data frame. Then, summarize the data using the summary() function.
Student | Exam 1 | Exam 2 | Exam 3 |
---|---|---|---|
Olive Schaefer | 77 | 76 | 61 |
Ishaan Hoover | 92 | 90 | 97 |
Virginia Moss | 75 | 94 | 60 |
Porter Navarro | 95 | 79 | 81 |
Winter Ochoa | 91 | 89 | 80 |
Winston Jennings | 71 | 78 | 85 |
Palmer Dunn | 74 | 96 | 84 |
Dawson McCormick | 63 | 72 | 73 |
Macie Stein | 82 | 69 | 64 |
Creed Newton | 83 | 98 | 68 |
- What was the mean score for Exam 1? Write your answer in the space marked [Answer Here 1].
We will learn more about data frames in the next lab.
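To tie data frames back to vectors, the sketch below builds a small data frame from made-up values (the names weather, city_name, high_temp, and low_temp are hypothetical) and shows that a single column behaves like an ordinary vector. It assumes R 4.0 or later, where strings are kept as characters rather than converted to factors.

```r
# Hypothetical data: three made-up cities and temperatures
city_name <- c("Alpha", "Beta", "Gamma")
high_temp <- c(71, 64, 80)
low_temp <- c(50, 48, 61)

weather <- data.frame(city_name, high_temp, low_temp)

# One column, pulled out by name with '$', is an ordinary vector
weather$high_temp
## [1] 71 64 80

# so vector functions and arithmetic apply to it directly
mean(weather$high_temp)
## [1] 71.66667
weather$high_temp - weather$low_temp
## [1] 21 16 19

# Row 2, column 1 by position
weather[2, 1]
## [1] "Beta"
```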
Functions
We've already learned about several functions. Here are a few more convenient ones.
The functions abs(), log(), and sqrt() return, respectively, the absolute value, the natural log, and the square root of a number.
> abs(-3)
## [1] 3
> log(10)
## [1] 2.302585
> sqrt(2)
## [1] 1.414214
The functions sum(), mean(), and sd() return, respectively, the sum, the mean, and the standard deviation of the entries in a vector.
> sum(1:10)
## [1] 55
> mean(1:10)
## [1] 5.5
> sd(1:10)
## [1] 3.02765
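Note that sd() computes the sample standard deviation, which divides by \(n-1\) rather than \(n\). As a check, the sketch below rebuilds the same value from sum(), mean(), and sqrt(); the name manual_sd is just for illustration.

```r
x <- 1:10
n <- length(x)

# Sample standard deviation built from sum(), mean(), and sqrt()
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))
manual_sd
## [1] 3.02765

# Agrees with the built-in function
all.equal(manual_sd, sd(x))
## [1] TRUE
```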
The objects between the parentheses of a function are called arguments.
To see what arguments can be passed to a function, check its documentation using '?'.
- View the documentation for the rnorm() function. What arguments can be passed to the rnorm() function, and what does the function do with them? Write your response in the space marked [Answer Here 2].
- Check out the documentation for the rep() function to see what it does.
- When passing arguments, R assumes that you are listing them in order. However, you can also pass a specific argument by name using '='. Input the following commands to see what happens.
> rep(3,4)
> rep(4,3)
> rep(times=4,x=3)
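The same idea applies to any function with named arguments. As a sketch, here it is with seq(), whose first three arguments are named from, to, and by; the values are arbitrary.

```r
# Positional: the arguments are matched in order as from, to, by
seq(2, 20, 6)
## [1] 2 8 14 20

# Named: the same call with the arguments listed in a different order
seq(by = 6, to = 20, from = 2)
## [1] 2 8 14 20

# Mixed: positional arguments first, then a named one
seq(2, 20, length.out = 4)
## [1] 2 8 14 20
```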
You can also create your own functions.
- Define a function plus1() using the following code.
> plus1 <- function(x) x+1
> plus1(7)
## [1] 8
- Here's a function that computes dot products.
> dot_prod <- function(x,y) sum(x*y)
> dot_prod(c(1,2,3),c(3,2,1))
## [1] 10
- In code block 3, create a function d_s() which can be passed two numerical vector arguments of the same length and returns the distance between them in the standard metric. Recall that given two points \(x,y\in\mathbb{R}^n\) where \(x=(x_1,\ldots,x_n)\) and \(y=(y_1,\ldots,y_n)\), \[ d_s(x,y)=\sqrt{(x_1-y_1)^2+\cdots+(x_n-y_n)^2} . \] Use this function to find the distance between the points \((-1,2,0,5)\) and \((1,-5,1,1)\).
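Longer functions are usually written over several lines inside curly braces. The sketch below shows that form; range_width() and scale_by() are hypothetical helpers invented for this example, and the second one illustrates an argument with a default value.

```r
# Hypothetical helper: the spread between the largest and smallest entry
range_width <- function(v) {
  largest <- max(v)
  smallest <- min(v)
  largest - smallest   # the last expression evaluated is the return value
}

range_width(c(4, 8, 15, 16, 23, 42))
## [1] 38

# A second argument with a default value, used when the caller omits it
scale_by <- function(v, factor = 2) {
  v * factor
}

scale_by(c(1, 2, 3))
## [1] 2 4 6
scale_by(c(1, 2, 3), factor = 10)
## [1] 10 20 30
```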
Control Structures
In R, you can get very far just using expressions and functions.
However, sometimes, you may want to use more complex programming structures.
Two of the most basic control structures are if statements, which are used to select a choice of commands based on a condition, and loops, which allow us to execute the same set of commands repeatedly while iterating through data.
Here's the if statement syntax:
if statement:
if (<boolean expression>) {<expression if true>}
if-else statement:
if (<boolean expression>) {<expression if true>} else {<expression if false>}
Here's the loop syntax:
for loop:
for (<iterated element> in <vector>) {<looped expression>}
while statement:
while (<boolean expression>) {<looped expression>}
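To make the templates concrete, here is a sketch that fills each one in with a small made-up task. All variable names are illustrative, and '%%' is R's remainder operator.

```r
# if-else: choose a label based on a condition
n <- 7
if (n %% 2 == 0) {
  label <- "even"
} else {
  label <- "odd"
}
label
## [1] "odd"

# for loop: add up the squares of 1 through 5
total <- 0
for (k in 1:5) {
  total <- total + k^2
}
total
## [1] 55

# while loop: keep halving until the value drops below 1
value <- 20
steps <- 0
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}
steps
## [1] 5
```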
- Assign a number to the variable x. Test the following code.
> if (x>0) {print("positive")}
> if (x<0) {print("negative")}
> if (x>0) {print("positive");} else {if (x<0) {print("negative");}else{print("zero");}}
- Here's an example of some loops. See what they do!
> for(i in 1:10) {print(rep("Hi",i))}
> s <- 0
> i <- 0
> diff <- 1
> while(diff>0.0005) {
    diff <- 1/factorial(i)
    s <- s+diff
    i <- i+1
  }
> s
- This loop creates a vector of the prime numbers between 1 and 50.
> primes <- c()
> for(i in 1:50) {if(i > 1 & (i%%2 != 0 | i==2) & (i%%3 != 0 | i==3) & (i%%5 != 0 | i==5) & (i%%7 != 0 | i==7)){primes<-c(primes,i);}}
> primes
## [1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47
- In code block 4, use if statements and for loops to create a vector of all the numbers between 1 and 100 which are both multiples of 3 and factors of 288.
Comments
Sometimes R code can be hard to read.
To assist readers, you can add comments to your code using '#'.
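As a brief sketch of how comments behave, the following made-up snippet shows a comment on its own line, a comment at the end of a line of code, and a line of code disabled by turning it into a comment.

```r
# A comment on its own line describes the code that follows.
radius <- 3

area <- pi * radius^2   # A comment can also follow code on the same line.

# R ignores everything after '#', so the line below does not run.
# area <- 0

area
## [1] 28.27433
```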
- Copy the definition below into code block 5.
ps <- function(s,n){sum(s[1:n])}
# The function ps(s,n) returns the sum of the first n terms of the vector s.
- Use the ps() function to display the first 10 partial sums of the following sequence. \[ (a_n)=(1,-{1 \over 2},{1 \over 3},-{1 \over 4},{1 \over 5},-{1 \over 6},\ldots) \]
Congratulations! You've mastered the basics of R programming.