Chapter 3 Element Values
3.1 More on list
When we use list(), it is like we are making a list. There are two types of list:
- Non-labelled items
- Labelled items
Shopping list (non-labelled items)
1. milk
2. apple
3. pork
Here 1, 2 and 3 are not labels. They simply represent the position of each item on the shopping list, which is like the position of an element value in a list.
It is like
list(
"milk",
"apple",
"pork"
)
My enrolled course list (labelled list)
Compulsory
Principle of Economics,
Calculus,
Accounting
Selective
Philosophy,
Chinese Literature
It is like
list(
Compulsory =
c("Principle of Economics",
"Calculus",
"Accounting"),
Selective =
c("Philosophy",
"Chinese Literature")
)
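A quick way to see the difference between the two kinds of list is to ask R for the element names. The sketch below rebuilds the two lists above; the object names shoppingList and courseList are made up for illustration.

```r
# Non-labelled list: element values are identified by position only
shoppingList <- list("milk", "apple", "pork")

# Labelled list: each element value carries a name
courseList <- list(
  Compulsory = c("Principle of Economics", "Calculus", "Accounting"),
  Selective = c("Philosophy", "Chinese Literature")
)

names(shoppingList)   # NULL, since no labels were given
names(courseList)     # "Compulsory" "Selective"
length(shoppingList)  # 3
```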
When we make a list it can be nested, like a concert event:
name
- Gianandrea Noseda conducts Schumann and Mendelssohn — With Mikhail Pletnev
time
- 2021, Oct, 01
Program
Robert Schumann, Piano Concerto in A Minor, Op. 54
Felix Mendelssohn-Bartholdy, Symphony No. 4 in A Major, Op. 90, “Italian”
list(
name="Gianandrea Noseda conducts Schumann and Mendelssohn — With Mikhail Pletnev",
time="2021, Oct, 01",
program=list(
"Robert Schumann, Piano Concerto in A Minor, Op. 54",
"Felix Mendelssohn-Bartholdy, Symphony No. 4 in A Major, Op. 90, Italian"
) )
When there are several concerts, the list can look like
1. name
   time
   program
2. name
   time
   program
Here the numbers indicate the position of each element value. As a list, it would be like:
list(
list(
name="name1",
time="time1",
program=list(
"music1",
"music2"
)
),list(
name="name2",
time="time2",
program=list(
"musicA",
"musicB"
)
) )
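To check that the nested structure behaves as described, the following sketch stores the same placeholder concerts in an object (the name concerts is an assumption made for this example) and walks down one level at a time.

```r
concerts <- list(
  list(name = "name1", time = "time1", program = list("music1", "music2")),
  list(name = "name2", time = "time2", program = list("musicA", "musicB"))
)

length(concerts)            # 2 concerts, identified by position
names(concerts[[1]])        # "name" "time" "program"
concerts[[2]]$program[[1]]  # "musicA": 2nd concert, 1st piece of its program
```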
Exercise 3.1 How would you store the following lists:
- Shopping list: egg x 2 dozen, milk x 1L, vegetable x 2 kinds.
- Make a list of two courses you take this semester with course name, time, and place.
3.1.1 JSON data
JSON (JavaScript Object Notation) is one of the most common formats for data transmitted across the internet.
browseURL("https://data.gov.tw/dataset/6013")
install.packages("jsonlite")
To use a function from a package without attaching the package first, we can use :: in the form:
package_name::function_name
# Observation by observation
concerts_obo <-
  jsonlite::fromJSON("https://cloud.culture.tw/frontsite/trans/SearchShowAction.do?method=doFindTypeJ&category=17", simplifyDataFrame = F)
# Feature by feature
concerts_fbf <-
  jsonlite::fromJSON("https://cloud.culture.tw/frontsite/trans/SearchShowAction.do?method=doFindTypeJ&category=17", simplifyDataFrame = T)
- fromJSON can take either a URL or a file path of a .json file.
- simplifyDataFrame controls whether a JSON data source is imported observation by observation (F) or feature by feature (T).
- A feature-by-feature data set is called a data frame.
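To see what the two layouts mean without downloading anything, the sketch below builds a tiny made-up data set by hand in both forms. The values and the object names toy_obo and toy_fbf are invented for illustration; they are not what fromJSON returns for the URL above.

```r
# Observation by observation: one list per observation
toy_obo <- list(
  list(title = "concert A", category = "17"),
  list(title = "concert B", category = "17")
)

# Feature by feature: one column per feature (a data frame)
toy_fbf <- data.frame(
  title = c("concert A", "concert B"),
  category = c("17", "17")
)

toy_obo[[2]]$title      # go to the 2nd observation first, then its title
toy_fbf$title[[2]]      # go to the title feature first, then its 2nd value
is.data.frame(toy_fbf)  # TRUE
```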
concerts_obo:
Take the first observation as an example. Its list is like:
version: 1.4
UID: 5f96275fd083a34ec4da005b
title: 2021爛泥發芽高雄場 【我就爛 ! 泥發財 !】
category: 17
showInfo:
time
location
locationName
⋮
endTime
concerts_fbf
View(concerts_fbf)
A lot of things in the universe can be recorded in human language, especially computer language. Taking Programming for Data Science is like taking the red pill in the movie The Matrix. This is how you will eventually see the world, provided the red pill does not kill you.
3.2 Retrieve multiple element values
[x]
where x is either
- a numeric atomic vector, such as c(1,2,5); or
- a character atomic vector with element names.
JohnsFamily <- list(
name = "John",
age = 35,
spouse = list(
name = "Mary",
age = 32),
children = list(
list(
name = "Bill",
age = 5),
list(
name = "Jane",
age = 3)
) )
name: “John”
age: 35
spouse:
- name: “Mary”
- age: 32
children:
1. name: “Bill”
   age: 5
2. name: “Jane”
   age: 3
- There are 4 element values inside the JohnsFamily list.
# checking how many element values
length(JohnsFamily)
# checking element names
names(JohnsFamily)
In computer language, objects like length and names are function-type objects. A function object can be called in two different ways:
- name call: shows what is inside the function, which is like the value bound to the name
length
names
A function's value is a body of code that can be executed with a proper call, which is the
- function call
fncallExample <- list("John", 35, covid19Positive=FALSE)
length(fncallExample)
names(fncallExample)
A function call has (), which sometimes needs input values for the code inside the function body to work properly. Even if the body code does not require any input value, you still need to attach () to the function name to form a function call.
Sys.Date() # show the date today
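The difference between the two kinds of call can be sketched as follows. The object names f and today are made up for this example.

```r
# Name call: evaluates to the function object itself; nothing is executed
f <- length
is.function(f)       # TRUE

# Function call: () executes the function body with the supplied input
f(c("a", "b", "c"))  # 3

# A function that needs no input still requires () to be called
today <- Sys.Date()
class(today)         # "Date"
```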
John’s name and age
JohnsFamily[c(1,2)]
JohnsFamily[c("name", "age")]
When you want to retrieve multiple element values from an object, the resulting vector will be of the same type as its source object.
- The source object JohnsFamily is of list type.
- The returned vector will also be of list type.
When retrieving one element value, it is better to use [[]] (or $, which works only for list) instead of [].
John’s age
JohnsFamily[["age"]] # or JohnsFamily[[2]]
JohnsFamily["age"] # or JohnsFamily[2]
- JohnsFamily[["age"]] retrieves the element value itself.
- JohnsFamily["age"] creates a vector of the same type as JohnsFamily to store the retrieved element value.
[] is designed to retrieve multiple element values. Its resulting vector will always follow the same type as the source object.
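The contrast between the two operators can be checked with class(). This is a minimal sketch that rebuilds JohnsFamily so it runs on its own; the names age_value, age_list and heights are introduced only for this example.

```r
JohnsFamily <- list(
  name = "John", age = 35,
  spouse = list(name = "Mary", age = 32),
  children = list(list(name = "Bill", age = 5), list(name = "Jane", age = 3))
)

age_value <- JohnsFamily[["age"]]  # the element value itself
age_list  <- JohnsFamily["age"]    # a list of length 1 holding that value

class(age_value)  # "numeric"
class(age_list)   # "list"

# The same rule holds for atomic vectors: [] keeps the source type
heights <- c(a = 177, b = 183, c = 173)
class(heights[c("a", "c")])  # "numeric"
```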
Exercise 3.2 From concerts_obo,
Retrieve the 3rd and 5th concert observations and bind the result value with an object name concerts_sampled.
For the second concert, what are its first show’s time and location? Retrieve the information and bind the value with firstShowInfo.
3.3 Replacement
[[]]<-, []<-, $<-
JohnsFamilyCopy <- JohnsFamily
JohnsFamilyCopy[[1]] <- "Watson"
JohnsFamilyCopy[[1]]
When you replace multiple element values at once, you need to respect their source structure, that is,
- the replacement value has to be of the same vector type (atomic vector or list) as what you get when you retrieve those element values.
Elements 1 and 2 come from a list source, with names name and age.
JohnsFamilyCopy[c(1,2)] <-
  list(name="Watson", age=37)
JohnsFamilyCopy[c(1,2)]
Replacement is about replacing element values, not element names. Therefore, you can ignore element names and simply write:
JohnsFamilyCopy[c(1,2)] <-
  list("Dickens", 32)
JohnsFamilyCopy[c(1,2)]
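The following sketch illustrates why the replacement value should be a list when the source is a list. The object family and its city element are hypothetical and only serve this example.

```r
family <- list(name = "John", age = 35, city = "Taipei")

# The source is a list, so the replacement value is a list as well
family[c(1, 2)] <- list("Dickens", 32)

family$name    # "Dickens"
family$age     # 32, still numeric
names(family)  # element names are unchanged: "name" "age" "city"

# Caution: c("Dickens", 32) is an atomic vector, so c() would first coerce
# both values to one common type and 32 would become the string "32"
```

Using list() on the right-hand side keeps each value in its own type, which is usually what you want for records that mix text and numbers.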
You can also use
-> [[]], -> [], -> $
"Watson" -> JohnsFamilyCopy[[1]]
or
[[]]=, []=, $=
JohnsFamilyCopy[[1]] = "Watson"
You can chain retrieval operators:
JohnsFamilyCopy$spouse[["age"]]
JohnsFamilyCopy$children[[1]]$name
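A chain can sit on the left-hand side of <- as well, so retrieval and replacement combine naturally. The sketch below uses a self-contained copy named JohnsFamilyDemo, a name chosen only for this example.

```r
JohnsFamilyDemo <- list(
  name = "John", age = 35,
  spouse = list(name = "Mary", age = 32),
  children = list(list(name = "Bill", age = 5), list(name = "Jane", age = 3))
)

# Chained retrieval: children -> 2nd child -> age
JohnsFamilyDemo$children[[2]]$age       # 3

# Chained replacement: only the innermost element value changes
JohnsFamilyDemo$children[[2]]$age <- 4
JohnsFamilyDemo$children[[2]]$age       # 4
JohnsFamilyDemo$children[[1]]$age       # 5, untouched
```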
Exercise 3.3 From concerts_obo, due to Covid-19 the 3rd concert’s 1st show time is changed to “2022/01/31 19:30:00” and the location is changed to “Taichung Opera House”. Please update the information accordingly.
3.4 Add element values
3.4.1 How to
Adding an element value is like retrieving a non-existent element value and binding a value to it.
example1 <- c("John", "Mary", "Bill")
# retrieve a non-existing element name "person4"
example1[["person4"]] <- "Ken"
# retrieve two non-existing element positions
example1[c(7,8)] <- c("person7"="Jack", "Janem")
example1
- Be aware that [] <- has to bind a value that is consistent with the source object type.
- example1[c(7, 8)] is a character atomic vector, so example1[c(7, 8)] <- can only bind with a character atomic vector.
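When you add at positions beyond the current length, R has to fill the skipped positions with something. The sketch below repeats the idea on a fresh object (the name example_gap and the value "Jane" are assumptions for this example) to show what happens at positions 5 and 6.

```r
example_gap <- c("John", "Mary", "Bill")
example_gap[["person4"]] <- "Ken"          # position 4, with a name
example_gap[c(7, 8)] <- c("Jack", "Jane")  # positions 5 and 6 are skipped

length(example_gap)      # 8
example_gap[c(5, 6)]     # the skipped positions hold NA (missing values)
names(example_gap)[[4]]  # "person4"
```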
JohnsFamilyCopy2 <- JohnsFamily
# Add a new born
JohnsFamilyCopy2$children[[3]] <-
  list(name="Lisa", age=0) # $ is for list only
Exercise 3.4 How do you add the new born if you use JohnsFamilyCopy3$children[3] <- :
JohnsFamilyCopy3 <- JohnsFamily
JohnsFamilyCopy3$children[3] <-
3.4.2 Data construction
Adding element values is commonly used to construct a whole vector.
For example, we can directly construct the following whole vector
height <- c("001"=177, "002"=183, "003"=173)
Or we can
height <- numeric(0) # Declaration
height[["001"]] <- 177
height[c("002", "003")] <- c(183, 173)
- The first step creates an empty vector of type numeric and binds that value to a name called height.
When you bind a name to an empty value that only carries its value type, it is called declaration.
- Declaration is necessary since without it there is no value to operate retrieval on, even though our goal is to add values.
Each type of vector has a different declaration function to call:
# declare a numeric object
object_numeric <- numeric(0)
# declare a character object
object_character <- character(0)
# declare a logical object
object_logical <- logical(0)
- You can omit 0 if you want.
Be aware! To declare an empty list object:
object_list <- list()
- NOT list(0).
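The warning about list(0) can be verified with length(). This is a small check rather than part of the original notes.

```r
# Declared atomic vectors are empty
length(numeric(0))                # 0
identical(numeric(), numeric(0))  # TRUE: the 0 can be omitted

# An empty list versus a list that already holds one element value
length(list())   # 0
length(list(0))  # 1, because the number 0 is stored as the 1st element
```

For numeric(), character() and logical() the argument is the desired length, whereas list() treats its arguments as element values, which is why list(0) is not empty.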
3.4.3 Data construction example
The declare-then-add method is extremely convenient for list construction, since we often know in advance how we want to retrieve the information later.
Consider a course a student takes in school year 108, semester 1. The course is
name: “programming for data science”
credit: 2
We know this course information can be formed as:
course1 <- list(
name="programming for data science",
credit=2
)
Since the student can take many courses in the semester, we want to construct a data set classSchedule that contains all the courses’ information. To do so, we can think about:
- How do we want to retrieve course1 later from classSchedule?
Suppose the following is the retrieval method you plan to use to retrieve course1 later:
# Retrieve the 1st course in semester 1, school year 108
classSchedule$yr108$semester1[[1]]
Then you can:
# step 1 (do it only once): declare
classSchedule <- list()
- Be careful: it is not list(0).
# step 2: add
classSchedule$yr108$semester1[[1]] <- course1
This will result in a list data set that is the same as what the direct approach delivers (shown below):
classSchedule <- list(
  yr108 = list(
    semester1 = list(
      list(
        name="programming for data science",
        credit=2
      )
    )
  )
)
The direct approach requires strong familiarity with nested list usage. Alternatively, we can declare an empty list and use the add method to complete the list one element value at a time:
classSchedule <- list()
- Be careful: it is not list(0).
# Add the course name of the 1st course in semester 1, school year 108
classSchedule$yr108$semester1[[1]]$name <- "programming for data science"
# Add the course credit of the 1st course in semester 1, school year 108
classSchedule$yr108$semester1[[1]]$credit <- 2
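The sketch below runs the declare-then-add steps end to end and inspects the intermediate levels that R creates along the way. The second course is hypothetical and is added only to show that the same pattern extends to more courses.

```r
course1 <- list(name = "programming for data science", credit = 2)

# Declare, then add at the location we plan to retrieve from later
classSchedule <- list()
classSchedule$yr108$semester1[[1]] <- course1

# The intermediate levels yr108 and semester1 are created on the way
names(classSchedule)                       # "yr108"
names(classSchedule$yr108)                 # "semester1"
classSchedule$yr108$semester1[[1]]$credit  # 2

# A hypothetical 2nd course goes in with the same pattern
classSchedule$yr108$semester1[[2]] <- list(name = "hypothetical course", credit = 3)
length(classSchedule$yr108$semester1)      # 2
```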
Exercise 3.5 The 5th concert decides to add one more show which shares the same show information as its first show except the date is 2 days later.
3.5 Remove element values
It is a save-what-you-want method:
Retrieve the element values you want and bind them with the source object again.
source_object <- source_object_retrieve_what_you_want
Remove “Jack”
example2 <- c("John", "Mary", "Bill", person4="Jack")
1. Retrieve the element values you want
example2[c(1, 2, 3)]
2. Bind the result of step 1 with the source object again
example2 <- example2[c(1, 2, 3)]
Sequence generator.
From number n to number m, each increases (or decreases if m<n) by 1 :
1:3 # from 1 to 3, each increases by 1
7:11
2:-1 # from 2 to -1, each decreases by 1
example1[c(1,2,3)] # the same as
example1[1:3]
From number n to number m, each increases by k :
1:3 # the same as
seq(from=1, to=3, by=1) # from 1 to 3 increase by 1
seq(from=3, to=11, by=4) # from 3 to 11 increase by 4
Divide the interval from number n to number m with q equally spaced points (including the end points n and m):
seq(from=3, to=11, length.out=10) # 10 equally spaced points from 3 to 11, both ends included
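The three ways of generating a sequence can be compared side by side. The numbers below are chosen so that the results are easy to verify by hand.

```r
# Step of 1 (or -1 when going down)
1:3    # 1 2 3
2:-1   # 2 1 0 -1

# Fixed step size with by
seq(from = 3, to = 11, by = 4)  # 3 7 11

# Fixed number of points with length.out; both end points are included
seq(from = 3, to = 11, length.out = 5)            # 3 5 7 9 11
length(seq(from = 3, to = 11, length.out = 10))   # 10
```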
R’s [] retrieval can take position-exclusive indices, written as -c(position indices), which retrieve all element values except those at the specified positions.
example3 <- c("John", "Mary", "Bill", person4="Jack")

## inclusive approach
example3[c(1, 2, 3)]

## exclusive approach
example3[-c(4)]

## multiple exclusion is possible
example3[-c(1, 4)]
Removal with exclusive indexation:
example4 <- c("John", "Mary", "Bill", person4="Jack")

# Object source <- retrieval of elements to keep
example4 <-
  example4[-c(4)]

example4
Be careful. Exclusive indexation does not work on element names, since you cannot apply a negative sign to characters:
example5 <- c("John", "Mary", "Bill", person4="Jack")

# ERROR
example5[-c("person4")]
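One possible workaround, sketched here as a suggestion rather than as part of the original notes, is to translate the name into its position first and then exclude that position. It relies on which(), and it assumes the name actually exists in the vector; the names pos and example5_kept are made up for this example.

```r
example5 <- c("John", "Mary", "Bill", person4 = "Jack")

# Find the position whose element name is "person4"
pos <- which(names(example5) == "person4")
pos  # 4

# Exclude by position, as before
example5_kept <- example5[-pos]
unname(example5_kept)  # "John" "Mary" "Bill"
```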
If you use : to generate the exclusion sequence, be careful:
## exclude 2 to 4
example3[-c(2, 3, 4)]
The negative indices
-c(2, 3, 4)
are not the same as
-2:4 # which is from -2 to 4
If you want R to do part of the command first, you can enclose that part in ():
# generate 2:4 first then take negative sign
-(2:4)
- R will do 2:4 first and then apply -.
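The difference matters in practice because mixing negative and positive positions is not allowed. The sketch below uses a made-up vector x and tryCatch() to capture the error instead of stopping the script.

```r
x <- c("a", "b", "c", "d", "e")

-2:4    # -2 -1 0 1 2 3 4 : the minus sign applies to 2 only
-(2:4)  # -2 -3 -4        : the minus sign applies to the whole sequence

x[-(2:4)]  # "a" "e"

# x[-2:4] mixes negative and positive positions, which R rejects
result <- tryCatch(x[-2:4], error = function(e) "error")
result     # "error"
```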
Exercise 3.6 Due to Covid-19 concerts 2, 5, 7 are cancelled. Remove those three concerts from concerts_obo.
Exercise 3.7 If John divorced Mary, how do you change the record by removing the spouse element?
JohnsFamilyCopy4 <- JohnsFamily
list has an additional removal technique, which is to bind NULL with the retrieved element value.
JohnsFamilyCopy5 <- JohnsFamilyCopy6 <- JohnsFamilyCopy7 <- JohnsFamilyCopy8 <- JohnsFamily
Techniques that apply to both atomic vector and list:
JohnsFamilyCopy5 <-
  JohnsFamilyCopy5[-c(3)]
Techniques that apply only to list:
JohnsFamilyCopy6$spouse <- NULL
Remove multiple elements at once:
JohnsFamilyCopy7[c(3, 4)] <- NULL
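The removal techniques can be compared on a small stand-in list. The object family and the copies copyA, copyB and copyC are hypothetical names used only for this sketch.

```r
family <- list(
  name = "John", age = 35,
  spouse = list(name = "Mary", age = 32),
  children = list()
)

# Exclusive indexation: works on atomic vectors and lists
copyA <- family[-c(3)]
names(copyA)  # "name" "age" "children"

# Binding NULL: list only
copyB <- family
copyB$spouse <- NULL
names(copyB)  # "name" "age" "children"

# Binding NULL to several positions at once
copyC <- family
copyC[c(3, 4)] <- NULL
names(copyC)  # "name" "age"
```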
3.6 Example on data.taipei
There are two ways to access the data:
1. Download .csv (comma-separated values file) and import to R:
- Environment > Import Dataset > From Text (readr)
2. API approach
3.6.1 Download .csv approach
Sometimes you need to adjust encoding for data import:
- Locale > Encoding: BIG5
3.6.2 API approach
install.packages("httr")
mrtStationAds <-
  httr::content(httr::GET("https://data.taipei/api/v1/dataset/91290609-2b8b-4130-8ce9-e6085529bc46?scope=resourceAquire&limit=1000"))
- Resource endpoint: https://data.taipei/api/v1/dataset/91290609-2b8b-4130-8ce9-e6085529bc46
- Query parameters and parameter values:
  - They follow the resource endpoint after ? (the query sign).
  - They are written as query_parameter=parameter_value pairs, with each pair separated by &.
- HTTP verb GET:
  - The web action to take regarding that resource.
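To make the anatomy of the request URL concrete, the sketch below reassembles it from its parts using only base R. The object names endpoint, query, pairs and request_url are made up for this example, and it assumes the parameter values need no URL encoding.

```r
# Resource endpoint
endpoint <- "https://data.taipei/api/v1/dataset/91290609-2b8b-4130-8ce9-e6085529bc46"

# Query parameters and their values, as a named character vector
query <- c(scope = "resourceAquire", limit = "1000")

# Form parameter=value pairs, join them with &, and attach them after ?
pairs <- paste(names(query), query, sep = "=")
request_url <- paste0(endpoint, "?", paste(pairs, collapse = "&"))
request_url
```

The resulting string is the same URL that is passed to httr::GET above, so changing a value in query (for example the limit) is a convenient way to vary the request.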
The Maokong Gondola (貓空纜車) Station Data: https://ptx.transportdata.tw/MOTC?t=Rail&v=2#!/Metro/MetroApi_Station
maokongGondola <- httr::content(httr::GET("https://ptx.transportdata.tw/MOTC/v2/Rail/Metro/Station/TRTC?$top=30&$format=JSON"))
The API approach obtains data on the fly. When the endpoint is a real-time (即時) data endpoint, you can build a real-time service app based on API requests, like the app Taipei Bus (台北等公車).
API-retrieved data mostly follows UTF-8 encoding (a standard encoding system that covers non-English characters), which avoids the encoding/decoding issues we face in the .csv download situation.
3.6.3 Saving your data
Throughout the semester we will build an app. Whatever information the app needs is better organized as a list. We can use the declare-then-add approach to build up the app's data, say myApp.
myApp <- list()
myApp$data[["臺北捷運車站廣告契約出租資料"]] <- mrtStationAds
myApp$data[["貓空纜車車站資料"]] <- maokongGondola
Save myApp
saveRDS(myApp, file="110-1-r4ds-app.rds")
Load your data
myApp = readRDS("110-1-r4ds-app.rds")
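A save-and-load round trip can be tried without touching your real data. The sketch below uses a small stand-in list and a temporary file; the names myAppDemo, demo_file and myAppRestored, as well as the stored values, are made up for this example.

```r
# Build a small stand-in for the app data with declare-then-add
myAppDemo <- list()
myAppDemo$data[["demo"]] <- list(a = 1, b = "two")

# Save to a temporary .rds file, then load it back
demo_file <- tempfile(fileext = ".rds")
saveRDS(myAppDemo, file = demo_file)
myAppRestored <- readRDS(demo_file)

# The restored object matches the one that was saved
identical(myAppDemo, myAppRestored)  # TRUE
```

Note that readRDS() returns the value without a name, so you choose the object name when loading.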