Chapter 2 R Basics

2.1 What are R and RStudio?

version$major # major version number of your R
  • If not “4” or above, please upgrade your R.
rstudioapi::versionInfo()$version
  • If not “1.4.1717” or above, please upgrade your RStudio.

An analogy for the difference between R and RStudio:

  • R: the engine, i.e. all the hardware of the car other than its dashboard.

  • RStudio: the dashboard, with a lot of user-friendly features that let you drive the car easily.

2.2 Environment setup

2.2.1 Project

Always start RStudio as a project, by

  • clicking a .Rproj file; or
  • clicking the project menu at the top-right corner of RStudio.

Once a project folder is created, every document in the folder can be referred to relative to where the xxx.Rproj file sits.

For example, you have a project folder at:

C:\martin\learningR

which means there is a xxx.Rproj file as:

C:\martin\learningR\xxx.Rproj

If there is a file:

C:\martin\learningR\others\demo.R

we can refer to the demo.R file in two different ways:

  • Absolute path:
C:\martin\learningR\others\demo.R
  • Relative path (relative to the folder where .Rproj is) using .:
.\others\demo.R

Relative paths make your programs portable: whoever copies your project folder does not have to modify the file paths in its programs.
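As a small sketch of the idea, the relative path can be assembled in R with file.path(), which joins path components with forward slashes (R accepts these on Windows too). The folder and file names are the ones from the example above; the code only builds the path strings and does not check that the file exists. It also assumes the working directory is the project root, which is what RStudio sets when you open a project.

```r
# Relative path: "." stands for the folder where the .Rproj file sits
relative_path <- file.path(".", "others", "demo.R")
relative_path

# Absolute path: differs from machine to machine, here assembled
# from the current working directory (assumed to be the project root)
absolute_path <- file.path(getwd(), "others", "demo.R")
absolute_path
```

Only relative_path stays the same when the project folder is copied to another computer.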

2.2.2 Start a project in RStudio

On the top right corner: New Project… > New Directory > New Project

If you are in a project,

  • your top right corner will show your project name.

  • Your File panel will have a return-to-project-root icon.

  • you should save all your files within the project folder.

2.2.3 RStudio

An Integrated Development Environment (IDE)

Tools > Global Options

  • Show white space: Code > Display > “Show whitespace characters”

  • Protect your eyes: Appearance > Editor theme: Tomorrow Night

  • Layout (optional): Pane Layout > Put Source and Console as the top two panes.

Save with encoding: Tools > Global Options > Code > Saving:

  • Default text encoding: UTF-8

This is very important if you are using non-English characters in your program.


For R Markdown: File > New File > R Markdown…

Tools >

  • Modify Keyboard Shortcuts: search for the shortcut named Insert Chunk.

2.2.4 R

The R environment setup we will discuss here can all be set up automatically via package econR2’s:

Addins > Setup your environment.

2.2.4.1 Package installation

Check if you can install packages successfully:

RStudio Packages tab > Install

2.2.4.2 Error message in English

Every programmer learns a lot from error messages.

Execute the following and see its error message:

2+x

If the message is not fully in English, run either

Sys.getenv("LANG")

or

Sys.getenv("LANGUAGE")

to show the current language setting.

You can change that by:

  • open the “.Rprofile” file in your project folder via
file.edit(".Rprofile")
  • then add the following line to the file and save it.
Sys.setenv(LANG = "en")
  • restart your project.
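As a quick check, the same setting can be tried for the current session only, without editing .Rprofile. This is a sketch: whether messages switch language immediately may depend on your platform, and current_lang is just an illustrative name.

```r
# Set the language environment variable for this session only
Sys.setenv(LANG = "en")

# Read the setting back to confirm it was stored
current_lang <- Sys.getenv("LANG")
current_lang
```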

2.3 R Script/R Markdown

  • The Console window is an interactive environment. Once you hit enter, the command is executed.

If you want to save a bunch of commands (called a script) in one file, you can choose File > New File > :

  • R script (.R file): purely R commands. Good for developing an app.

  • R Markdown or R Notebook (.Rmd file): a mixture of R commands and text (in a style called markdown). Good for learning and for experimenting with code (which code lines to keep? which not?)

2.3.1 Code chunk

  • Run current code chunk to check if your program works.

  • You can run just one command line by hitting Ctrl+Enter or Cmd+Enter (Mac).

  • R executes your script from the top all the way to the bottom. To check if the entire program works, you should:

    • Clean Environment; and

    • Run all chunks

2.4 Basic data type

For ONE value, there are three major basic types of storage:

  • character (sometimes called string):
"Mary"
'Friday'
  • numeric:
2L # integer
3.1412 # double (non-integer real number)
2 # double
  • logical
TRUE
FALSE
T
F
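The storage type of any single value can be checked with typeof(). The short sketch below applies it to the values listed above.

```r
typeof("Mary")    # "character"
typeof(2L)        # "integer"
typeof(3.1412)    # "double"
typeof(2)         # "double": a number without L is stored as double
typeof(TRUE)      # "logical"

# integer and double are both numeric
is.numeric(2L)
is.numeric(2)

# T and F are shorthand for TRUE and FALSE
identical(T, TRUE)
```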

2.5 Collection of values

When facing a collection of values, R has two types of data collection:

  • Atomic vector: vector with all values of the same type
c("John", "Mary")
c(2, 3.1412)
c(TRUE, TRUE, F)
  • list (or general vector): vector of values of different types (though the same type is allowed).
list("John", 178, TRUE)

Atomic vector’s type is determined by its element values:

typeof(c("John", "Mary"))

list’s type is list:

typeof(list("John", 178, TRUE))

2.6 Values and element values

A single basic type value (aka singleton) such as a character ("John"), a number (5), or a logical (TRUE) is a value, but a vector itself (whether an atomic vector or a list) can count as one value too.

The following can all be considered one value:

"John"
5
TRUE
list("John", 5, TRUE)
c(5, 7, 9)

Within a vector, the values inside the vector are called its element values.

list("John", 5, TRUE) # has 3 element values.

A list can contain almost anything, even another list, such as:

list("John", 42, list("Mary", 35)) # big list (a nested list)

Within a vector, each element value is:

  • separated by , ; and

  • not sitting inside another vector.

Therefore, the above vector has the following three element values:

"John"
42
list("Mary", 35)
  • You CANNOT say "Mary" is an element value of the big list, since it sits inside another vector.
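One way to check this count is length(), which counts the element values of a vector, and lengths(), which reports the length of each element value. This is only an illustrative sketch using the big list from above.

```r
bigList <- list("John", 42, list("Mary", 35))

# The big list has 3 element values; the inner list counts as one of them
length(bigList)     # 3

# Length of each element value: the third one is itself a vector of 2
lengths(bigList)    # 1 1 2
```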

2.7 Retrieve ONE element value by Position

Position: For a collection of element values separated by ,, each value has a position (or location) reference, with 1 being the 1st value, 2 being the 2nd value, 3 being the 3rd value, etc. (i.e. the counting of positions starts from 1.)

We can retrieve a value from the vector (no matter atomic or list) using [[]] double bracket operator:

c("John", "Mary")[[1]]
c("John", "Mary")[[2]]
list("John", 178, TRUE)[[1]]
list("John", 178, TRUE)[[2]]
list("John", 178, TRUE)[[3]]

Exercise 2.1 Try [[]] on:

list("John",42, list("Mary", 35))

Chained extraction operations

[[x]] can be followed immediately by another [[y]],

  • R will resolve the extractions sequentially from the leftmost extraction ([[x]]) to the rightmost extraction ([[y]])
bigList <- list("John",42, list("Mary", 35))
bigList[[3]][[2]]
  • The leftmost extraction is [[3]], so R resolves bigList[[3]] first. It returns list("Mary", 35).

  • The rightmost extraction is [[2]], so R now resolves list("Mary", 35)[[2]].

  • At the end we get 35.
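The two steps above can be written out one at a time to see that the chained form gives the same result. The name innerList is only an illustrative helper, not part of the original example.

```r
bigList <- list("John", 42, list("Mary", 35))

# Step 1: the leftmost extraction returns the inner list
innerList <- bigList[[3]]
innerList

# Step 2: the rightmost extraction works on that inner list
innerList[[2]]

# The chained form does both steps in one line
identical(bigList[[3]][[2]], innerList[[2]])
```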

2.8 Binding

To operate on a value, it is often more convenient to create a name as a reference to the value. When we call that name, R knows we are asking for the value it represents. This requires binding a name to a value.

When we assign a name with a value, it is called binding.

personName = c("John", "Mary")
interestingNumber = c(2, 3.1412)
covidPositive = c(TRUE, TRUE, F)

R’s special arrow binding:

personName <- c("John", "Mary")
interestingNumber <- c(2, 3.1412)
covidPositive <- c(TRUE, TRUE, F)
c("John", "Mary") -> personName
c(2, 3.1412) -> interestingNumber
c(TRUE, TRUE, F) -> covidPositive

When we bind a collection of values to a name, the name can be used to recall its underlying values:

personName <- c("John", "Mary")
personName

In addition, to retrieve:

c("John", "Mary")[[2]]

you can:

personName[[2]]

Whenever there is a name call in a code line, R will retrieve the value bound to the name first, then continue with the rest of the code execution.

That is why when R sees

personName[[2]]

it will first resolve the value of the name (which is c("John", "Mary")), and then implicitly execute the following:

c("John", "Mary")[[2]]

Consider

element1 <- "a"
element2 <- "b"

The following code line involves two names, element1 and element2:

c(element1, element2) # there are two name calls

When running the line R will replace element1 with “a” and element2 with “b” first, then execute the resulting line:

c("a", "b")

Common naming styles:

  • camel: personName <- c("John", "Mary")

  • snake: person_name <- c("John", "Mary")

Some common practice:

  • a regular name starts with a lower-case letter.

  • a constructor (will be explained in the future) name starts with a capital letter.


A regular valid name starts with:

  • a letter;

  • or the dot not followed by a number.

A valid name (also called symbol) consists of: letters, numbers and the dot (.) or underline (_) characters.

Which one is valid? Which one is valid but not regular?

my_108_total_credits <- 15
_108_total_credits <- 15
108_total_credits <- 15
_my_108_total_credits <- 15
my.108.total_credits <- 15
.108.total_credits <- 15 
.my.108.total_credits <- 15 # start with . will hide name
`.108.total_credits` <- 15 # irregular name, ` is not part of the name
`.108.total_credits` <- 15
`108 total credits` <- 15
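If you want to check your answers, one possible approach is to compare a name with what make.names() returns: make.names() rewrites any name that cannot be used without backticks, so an unchanged result means the name is usable as is. The helper name is_usable_without_backticks is hypothetical, and only a few of the candidates above are tested here.

```r
# Hypothetical helper: TRUE when make.names() leaves the name unchanged,
# i.e. the name can be typed directly without backticks
is_usable_without_backticks <- function(name) make.names(name) == name

candidates <- c(
  "my_108_total_credits",
  "108_total_credits",
  ".my.108.total_credits",
  "108 total credits"
)

is_usable_without_backticks(candidates)
```

Names that fail this check can still be bound, but only when wrapped in backticks.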

2.9 Concatenate

c() where c comes from concatenate, meaning chaining together all element values of the input values as element values in one vector:

  • its input values should be of the same type (otherwise they will be coerced into a single type).

  • if an input value is a vector, its element values will be taken out. All taken-out element values will be packed together as one vector.

  • The newly packed vector has the same type as the source input values.

typeof(c("a", "b"))
typeof(c("c", "d"))
c(c("a", "b"), c("c", "d")) 
  • source values c("a", "b") and c("c", "d")

  • c(two source values) chains all values together as

c("a", "b", "c", "d")

Concatenating atomic vectors (i.e. vectors of c(...)) will always result in an atomic vector (i.e. still a c(...) vector).


c(list("a", 1), list("c", 2))
  • source values: list("a", 1) and list("c", 2).

  • c(two source values) chains all values together as

list("a", 1, "c", 2) 

Concatenating list vectors (i.e. vectors of list(...)) will always result in a list vector (i.e. still a list(...) vector).
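Both claims can be checked with identical() and typeof(). The names chars and lists below are only illustrative.

```r
# Concatenating two atomic vectors gives one flat atomic vector
chars <- c(c("a", "b"), c("c", "d"))
identical(chars, c("a", "b", "c", "d"))
typeof(chars)     # "character"

# Concatenating two lists gives one flat list
lists <- c(list("a", 1), list("c", 2))
identical(lists, list("a", 1, "c", 2))
typeof(lists)     # "list"
length(lists)     # 4
```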

2.10 List

list

  • can take in all types of values – even a list.
list(c("a", "b"), c("c", "d")) # (1)
  • but it does not chain together the element values inside each of its element values.
# so (1) is not the same as
list("a", "b", "c", "d")
c(c("a", "b"), c("c", "d"))[[1]]
c(c("a", "b"), c("c", "d"))[[2]]
c(c("a", "b"), c("c", "d"))[[3]]
c(c("a", "b"), c("c", "d"))[[4]]
list(c("a", "b"), c("c", "d"))[[1]]
list(c("a", "b"), c("c", "d"))[[2]]
list(c("a", "b"), c("c", "d"))[[3]] # Error

Data type coercion.

coercion <- c(c("a", "b"), c(1, 22))

typeof(coercion[[3]])

When you concatenate values of different types, c() can only return a vector whose element values all have the same type, so it chooses one type and coerces values of the other types into that chosen type.

Each type is designed for different operations; for example, + is for numeric values. Operations on the wrong type of value will produce an error:

1 + 2
coercion[[3]] + 2 # Error: coercion[[3]] is the character "1", not a number
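The sketch below makes the coercion visible and catches the error with tryCatch() so the script keeps running. The name result is illustrative, and the exact wording of the error message may differ with your language setting.

```r
coercion <- c(c("a", "b"), c(1, 22))

# The numbers were coerced to character
coercion
typeof(coercion[[3]])    # "character"

# Adding to a character value fails; tryCatch() captures the error message
result <- tryCatch(
  coercion[[3]] + 2,
  error = function(e) conditionMessage(e)
)
result

# Converting back to numeric first makes the addition work
as.numeric(coercion[[3]]) + 2    # 3
```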

2.11 Sampled data

In social science, it is common to survey a sampled group of people from the entire population. For each interviewee, we may collect his/her age, gender, residential district, etc. These are called features of the interviewee.

  • Each feature has its own type of information.

Suppose we interviewed four people. How do we store these four persons’ data? (Each person’s data is called one observation.) There are two common ways of storing their data:

  • Observation by observation

  • Feature by feature

2.11.1 Observation by observation

  • store each observation completely, then put them together
list(54, "male", "north", 100) # (1)
list(32, "female", "south", 25) # (2)
list(28, "female", "east") # (3)
list(20, "male", "east", 77) # (4)

Observation (3) does not have a fourth element because that value is missing. However, the fact that it is missing is still valuable information. We normally put down NA, i.e. 

list(28, "female", "east", NA) # (3)

Put them together

list(
  list(54, "male", "north", 100), # (1)
  list(32, "female", "south", 25), # (2)
  list(28, "female", "east", NA), # (3)
  list(20, "male", "east", 77) # (4)
)

Exercise 2.2 Why don’t we use:

c(
  list(54, "male", "north", 100), # (1)
  list(32, "female", "south", 25), # (2)
  list(28, "female", "east", NA), # (3)
  list(20, "male", "east", 77) # (4)
)

# marks a comment. R will ignore # and anything that follows it on that line. So the commented list(...) code before Exercise 2.2 looks like this in R’s eyes:

list(
  list(54, "male", "north", 100), 
  list(32, "female", "south", 25), 
  list(28, "female", "east", NA), 
  list(20, "male", "east", 77)
)

In addition, R gives users great flexibility in line breaking. You can break a line of code pretty much anywhere before the line end so long as

  • R finds the command incomplete at the break.
list(list(54, "male", "north", 100), 
  list(32, "female", "south", 25)) # is the same as

list(
  list(54, "male", "north", 100), 
  list(32, "female", "south", 25)) # also

list(
  list(54, "male", "north", 100), 
  list(32, "female", "south", 25)
  ) # also

list(
  list(54, 
    "male", 
    "north", 
    100), 
  list(32, "female", "south", 25)
  ) # also
3 + 2 # should not be broken into

3  # because the line 3 alone is already a complete command
 +2 # so this line is run as a separate command
  • R does not care about indentation, so the following are the same as above:
list(
list(54, "male", "north", 100), 
list(32, "female", "south", 25)
) # also

list(
list(54, 
  "male", 
  "north", 
  100), 
list(32, "female", "south", 25)
) # also
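To see how R splits commands at line breaks, one option is parse(), which turns code text into a list of commands without running them. This is only a sketch; the names complete_first and incomplete_first are illustrative.

```r
# Breaking after a complete command: R sees two separate commands, 3 and +2
complete_first <- parse(text = "3\n +2")
length(complete_first)      # 2

# Breaking where the command is still incomplete: R keeps reading, one command
incomplete_first <- parse(text = "3 +\n 2")
length(incomplete_first)    # 1
eval(incomplete_first[[1]]) # 5
```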

2.11.2 Feature by feature

  • store each feature completely, then put them together
c(54, 32, 28, 20) # age
c("male", "female", "female", "male") # gender
c("north", "south", "east", "east") # residence
c(100, 25, NA, 77) # income

Putting together

list(
  c(54, 32, 28, 20), # age
  c("male", "female", "female", "male"), # gender
  c("north", "south", "east", "east"), # residence
  c(100, 25, NA, 77) # income  
)

You cannot use c() to collect all features together.

  • It will chain all inputs’ element values together.

  • Since it only accommodates element values of the same type, it will coerce them into a single type (mostly character).

c(
  c(54, 32, 28, 20), # age
  c("male", "female", "female", "male"), # gender
  c("north", "south", "east", "east"), # residence
  c(100, 25, NA, 77) # income  
)

It will become:

c(
  "54", "32", "28", "20", 
  "male", "female", "female", "male", 
  "north", "south", "east", "east", 
  "100", "25", NA, "77"   
)
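The difference between the two ways of putting features together can be checked by looking at the types. In this sketch, by_feature and flattened are illustrative names.

```r
# list() keeps each feature as its own element value with its own type
by_feature <- list(
  c(54, 32, 28, 20),                      # age
  c("male", "female", "female", "male"),  # gender
  c("north", "south", "east", "east"),    # residence
  c(100, 25, NA, 77)                      # income
)
length(by_feature)                         # 4
vapply(by_feature, typeof, character(1))   # type of each feature

# c() chains everything into one vector and coerces it to character
flattened <- c(
  c(54, 32, 28, 20),
  c("male", "female", "female", "male"),
  c("north", "south", "east", "east"),
  c(100, 25, NA, 77)
)
length(flattened)    # 16
typeof(flattened)    # "character"
```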

Exercise 2.3 Name its values:

list(
  c(54, 32, 28, 20), # age
  c("male", "female", "female", "male"), # gender
  c("north", "south", "east", "east"), # residence
  c(100, 25, NA, 77) # income  
)

2.12 Named element values

In a vector, we can give each element value a name, using "name"=value assignment.

# atomic vector
c("John"=177, "Mary"=160, "Bill"=170)
# list
list("John"=177, "Mary"=160, "Bill"=170)
  • The name is given to the element value, whether or not the element value is a singleton.
list(
  "1st Observation"=list(54, "male", "north", 100), 
  "2nd Observation"=list(32, "female", "south", 25), 
  "3rd Observation"=list(28, "female", "east", NA), 
  "4th Observation"=list(20, "male", "east", 77)
)
  • You can also use ``, such as list(`first Observation`=..., `second Observation`=...)

  • If the name is regular, you can even omit the quotation marks, such as

list(
  firstObservation=list(54, "male", "north", 100), 
  secondObservation=list(32, "female", "south", 25))

When you give element values names, you can retrieve a value not only by its position, but also by its name:

personName_nameValuePair <- c("the_1st_person"="John", "the_second_person"="Mary")
personName[[2]]
personName_nameValuePair[[2]]
personName_nameValuePair[["the_second_person"]]

To give element value a name,

  • You MUST use =,

  • you CANNOT use <- or ->.

# The following do NOT give the element values names:
personName_nameValuePair <- c("the_1st_person" <- "John", "the_second_person" <- "Mary")
list(
  firstObservation <- list(54, "male", "north", 100), 
  secondObservation <- list(32, "female", "south", 25))
  • <- or -> is merely for binding purposes.

  • a vector with bindings inside will be interpreted as name-value bindings followed immediately by name calls. In the list case, it is equivalent to:

# name-value binding
firstObservation <- list(54, "male", "north", 100)
secondObservation <- list(32, "female", "south", 25)
# name call
list(
  firstObservation, secondObservation 
  )
  • The element values in the list are NOT named as a result.
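The difference can be checked with names(), which returns the element names of a vector, or NULL when there are none. The names named and unnamed are illustrative.

```r
# Using = gives the element value a name
named <- list(firstObservation = list(54, "male", "north", 100))
names(named)      # "firstObservation"

# Using <- only binds firstObservation in the environment;
# the element value inside the list stays unnamed
unnamed <- list(firstObservation <- list(54, "male", "north", 100))
names(unnamed)    # NULL

# The binding created by <- does exist as a side effect
firstObservation
```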

2.13 Retrieve element value by element name

If an element value has a name (more precisely, an element name), you can extract it by its name:

sample_data <- 
  list(
    "1st Observation"=list(54, "male", "north", 100), 
    "2nd Observation"=list(32, "female", "south", 25), 
    "3rd Observation"=list(28, "female", "east", NA), 
    "4th Observation"=list(20, "male", "east", 77)
  )
sample_data[["4th Observation"]]

So far we know we can use [[x]] to extract an element value of an object, where x is either

  • a location index: 1, 2, 3, etc. or;

  • an element name if the element value has a name: "value_name", where the quotation marks " " are required whether value_name is regular or not.

a <- c(1, 3, elementName=7)
a[[1]]
a[["elementName"]]

But not

a[[elementName]] 
a[[`elementName`]]
  • elementName and `elementName` are both considered name calls. R would try to find the value bound to the name, and replace the name with that value.

Exercise 2.4 Do you understand why the following works:

targetName <- "elementName"
a[[targetName]]
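The three cases above can be compared side by side. This sketch assumes no object called elementName has been bound in your session; tryCatch() is used so the failing case does not stop the script, and outcome is an illustrative name.

```r
a <- c(1, 3, elementName = 7)

# A quoted element name is used as is
a[["elementName"]]    # 7

# A name call is first replaced by the value bound to the name
targetName <- "elementName"
a[[targetName]]       # same as a[["elementName"]]

# An unquoted elementName is a name call with no binding, so it fails
outcome <- tryCatch(
  a[[elementName]],
  error = function(e) "error: elementName has no binding"
)
outcome
```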

2.14 The list-only $ extractor

A list has an extra option when you want to extract by name. Other than [["value_name"]], you can also use $"value_name" or $`value_name`, or, when value_name is regular, $value_name.

sample_data <- list(
  "observation1"=list(54, "male", "north", 100), 
  "observation2"=list(32, "female", "south", 25), 
  "observation3"=list(28, "female", "east", NA), 
  "observation4"=list(20, "male", "east", 77)
)
sample_data[["observation4"]]

sample_data$"observation4"

sample_data$`observation4`

sample_data$observation4
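A short check that $ and [[ ]] return the same element value, and that $ is not available for atomic vectors. The shortened sample_data and the names heights, by_bracket, by_dollar and atomic_dollar are illustrative.

```r
sample_data <- list(
  observation1 = list(54, "male", "north", 100),
  observation4 = list(20, "male", "east", 77)
)

# Both extractors return the same element value
by_bracket <- sample_data[["observation4"]]
by_dollar <- sample_data$observation4
identical(by_bracket, by_dollar)

# $ is list only: on an atomic vector it raises an error
heights <- c(John = 177, Mary = 160)
atomic_dollar <- tryCatch(heights$John, error = function(e) "error")
atomic_dollar

# [[ ]] with a quoted name works for both kinds of vector
heights[["John"]]    # 177
```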