Basic Data Frame Manipulation

A data frame is the most common way of storing data in R, and if used systematically makes data analysis easier.

Hadley Wickham

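Once data has been loaded into R, beginners spend much of their time manipulating it, so we have collected the basic techniques in one chapter. The techniques focus on the data frame (data.frame), the structure that most R users work with most often in data science applications.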

Dimensions and Appearance of a Data Frame

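The read.csv() function reads a text file with the .csv extension; this one records the roster of the Chicago Bulls (NBA) for the 1995-96 season along with some basic player information. The built-in functions nrow(), ncol(), and dim() then report the number of rows, the number of columns, and the dimensions of the data frame. nrow is short for number of rows, ncol for number of columns, and dim for dimensions; knowing the full names makes the functions easier to remember.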

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url)
nrow(chicago_bulls)
ncol(chicago_bulls)
dim(chicago_bulls)
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url)
## > nrow(chicago_bulls)
## [1] 15
## > ncol(chicago_bulls)
## [1] 7
## > dim(chicago_bulls)
## [1] 15  7
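
As a small self-contained sketch of how these three functions relate, the snippet below builds a three-row data frame by hand and checks that dim() is simply nrow() followed by ncol(). The name bulls_sample and the choice of columns are assumptions made for illustration; the values are copied from the roster.

```r
# Hypothetical three-row sample; values copied from the roster file
bulls_sample <- data.frame(
  Player = c("Michael Jordan", "Scottie Pippen", "Dennis Rodman"),
  Pos = c("SG", "SF", "PF"),
  Wt = c(195, 210, 210),
  stringsAsFactors = FALSE
)
shape <- dim(bulls_sample)  # integer vector: rows first, then columns
rows_match <- shape[1] == nrow(bulls_sample)
cols_match <- shape[2] == ncol(bulls_sample)
rows_match
cols_match
```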

The View() function displays the whole data frame in a readable layout in the Script pane.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url)
View(chicago_bulls)
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url)
## > View(chicago_bulls)

Figure: View() displays the whole data frame in a readable layout.

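For a data frame with many observations we avoid calling View(), because it is slow and uses more computing resources. Instead, head() and tail() show part of the data frame in the Console pane; by default they print the first six or the last six observations (along with the variable names).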

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url)
head(chicago_bulls)
tail(chicago_bulls)
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url)
## > head(chicago_bulls)
##   No.        Player Pos   Ht  Wt        Birth.Date                                            College
## 1   0   Randy Brown  PG  6-2 190      May 22, 1968 University of Houston, New Mexico State University
## 2  30  Jud Buechler  SF  6-6 220     June 19, 1968                              University of Arizona
## 3  35  Jason Caffey  PF  6-8 255     June 12, 1973                              University of Alabama
## 4  53 James Edwards   C  7-0 225 November 22, 1955                           University of Washington
## 5  54    Jack Haley   C 6-10 240  January 27, 1964              University of California, Los Angeles
## 6   9    Ron Harper  PG  6-6 185  January 20, 1964                                   Miami University
## > tail(chicago_bulls)
##    No.          Player Pos   Ht  Wt         Birth.Date                                College
## 10  13     Luc Longley   C  7-2 265   January 19, 1969               University of New Mexico
## 11  33  Scottie Pippen  SF  6-8 210 September 25, 1965         University of Central Arkansas
## 12  91   Dennis Rodman  PF  6-7 210       May 13, 1961 Southeastern Oklahoma State University
## 13  22     John Salley  PF 6-11 230       May 16, 1964        Georgia Institute of Technology
## 14   8 Dickey Simpkins  PF  6-9 248      April 6, 1972                     Providence College
## 15  34 Bill Wennington   C  7-0 245     April 26, 1963                  St. John's University
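
The default of six rows can be changed with the n argument of head() and tail(). The sketch below uses a hand-built eight-row data frame (the name roster_sample is an assumption for illustration; the values come from the output above) so that it runs without downloading the file.

```r
# Hypothetical eight-row sample built from the first rows of the roster
roster_sample <- data.frame(
  No. = c(0, 30, 35, 53, 54, 9, 23, 25),
  Player = c("Randy Brown", "Jud Buechler", "Jason Caffey", "James Edwards",
             "Jack Haley", "Ron Harper", "Michael Jordan", "Steve Kerr"),
  stringsAsFactors = FALSE
)
first_three <- head(roster_sample, n = 3)  # first three observations
last_two <- tail(roster_sample, n = 2)     # last two observations
first_three
last_two
```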

The colnames() function returns the variable names of a data frame, and row.names() returns its row index values.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url)
colnames(chicago_bulls)
row.names(chicago_bulls)
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url)
## > colnames(chicago_bulls)
## [1] "No."        "Player"     "Pos"        "Ht"         "Wt"         "Birth.Date" "College"   
## > row.names(chicago_bulls)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"

Detailed Information About a Data Frame

The summary() function reports descriptive statistics for each variable.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
summary(chicago_bulls)
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > summary(chicago_bulls)
##       No.           Player              Pos                 Ht                  Wt         Birth.Date       
##  Min.   : 0.00   Length:15          Length:15          Length:15          Min.   :175.0   Length:15         
##  1st Qu.:11.00   Class :character   Class :character   Class :character   1st Qu.:193.5   Class :character  
##  Median :25.00   Mode  :character   Mode  :character   Mode  :character   Median :220.0   Mode  :character  
##  Mean   :29.13                                                            Mean   :219.0                     
##  3rd Qu.:34.50                                                            3rd Qu.:242.5                     
##  Max.   :91.00                                                            Max.   :265.0                     
##    College         
##  Length:15         
##  Class :character  
##  Mode  :character

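The str() function reports composite information about a data frame, including the kind of data structure, the number of observations, the number of variables, the variable names, and the first few observations. The name str is short for structure.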

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
str(chicago_bulls)
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > str(chicago_bulls)
## 'data.frame': 15 obs. of  7 variables:
##  $ No.       : int  0 30 35 53 54 9 23 25 7 13 ...
##  $ Player    : chr  "Randy Brown" "Jud Buechler" "Jason Caffey" "James Edwards" ...
##  $ Pos       : chr  "PG" "SF" "PF" "C" ...
##  $ Ht        : chr  "6-2" "6-6" "6-8" "7-0" ...
##  $ Wt        : int  190 220 255 225 240 185 195 175 192 265 ...
##  $ Birth.Date: chr  "May 22, 1968" "June 19, 1968" "June 12, 1973" "November 22, 1955" ...
##  $ College   : chr  "University of Houston, New Mexico State University" "University of Arizona" "University of Alabama" "University of Washington" ...
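
str() prints its report for a human reader. When only the column types are needed as a value that later code can use, one possible approach is to map class() over the columns with sapply(). This is a sketch on a hand-built sample (bulls_sample is a hypothetical name), not something the original text prescribes.

```r
# Hypothetical three-row sample; values copied from the roster
bulls_sample <- data.frame(
  Player = c("Michael Jordan", "Scottie Pippen", "Dennis Rodman"),
  Pos = c("SG", "SF", "PF"),
  Wt = c(195, 210, 210),
  stringsAsFactors = FALSE
)
# A data frame is a list of columns, so sapply() visits each column in turn
col_types <- sapply(bulls_sample, class)
col_types
```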

Breaking Down a Data Frame

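A data frame can be broken down into a vector of length 1, a vector of length greater than 1, or a smaller data frame (subset). These were introduced in the chapter on data structures, but for completeness let us review the techniques. First, [m, n] extracts the value at row m, column n of the data frame as a vector of length 1.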

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
mj <- chicago_bulls[7, "Player"]
mj
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > mj <- chicago_bulls[7, "Player"]
## > mj
## [1] "Michael Jordan"

[, n] extracts column n of the data frame as a vector of length m, where m is the number of rows.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
players <- chicago_bulls[, "Player"]
players
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > players <- chicago_bulls[, "Player"]
## > players
##  [1] "Randy Brown"     "Jud Buechler"    "Jason Caffey"    "James Edwards"   "Jack Haley"      "Ron Harper"     
##  [7] "Michael Jordan"  "Steve Kerr"      "Toni Kukoc"      "Luc Longley"     "Scottie Pippen"  "Dennis Rodman"  
## [13] "John Salley"     "Dickey Simpkins" "Bill Wennington"
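
Selecting a single column with [, n] simplifies the result to a vector. The sketch below, on a hypothetical hand-built sample, shows that $ and [[ ]] return the same vector, and that adding drop = FALSE keeps the result as a one-column data frame when that is what the next step expects.

```r
# Hypothetical three-row sample; values copied from the roster
bulls_sample <- data.frame(
  Player = c("Michael Jordan", "Scottie Pippen", "Dennis Rodman"),
  Wt = c(195, 210, 210),
  stringsAsFactors = FALSE
)
players_vec <- bulls_sample[, "Player"]                # character vector
players_df <- bulls_sample[, "Player", drop = FALSE]   # one-column data frame
same_as_dollar <- identical(players_vec, bulls_sample$Player)
same_as_double <- identical(players_vec, bulls_sample[["Player"]])
is.data.frame(players_vec)
is.data.frame(players_df)
```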

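[m, ] extracts row m of the data frame as a smaller 1 × n data frame; passing a vector of row indices, as in the example here, returns one row per index.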

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
trio <- chicago_bulls[c(7, 11, 12), ]
trio
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > trio <- chicago_bulls[c(7, 11, 12), ]
## > trio
##    No.         Player Pos  Ht  Wt         Birth.Date                                College
## 7   23 Michael Jordan  SG 6-6 195  February 17, 1963           University of North Carolina
## 11  33 Scottie Pippen  SF 6-8 210 September 25, 1965         University of Central Arkansas
## 12  91  Dennis Rodman  PF 6-7 210       May 13, 1961 Southeastern Oklahoma State University
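
A quick way to confirm the shape of a row selection is to call dim() on the result. In this sketch (bulls_sample is a hypothetical hand-built sample), a single index yields a 1 × n data frame and a vector of two indices yields a 2 × n data frame.

```r
# Hypothetical three-row sample; values copied from the roster
bulls_sample <- data.frame(
  Player = c("Michael Jordan", "Scottie Pippen", "Dennis Rodman"),
  Pos = c("SG", "SF", "PF"),
  Wt = c(195, 210, 210),
  stringsAsFactors = FALSE
)
one_row <- bulls_sample[1, ]         # still a data frame, not a vector
two_rows <- bulls_sample[c(1, 3), ]  # one row per index
dim(one_row)
dim(two_rows)
```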

Besides [m, ], a smaller data frame can also be selected with a logical vector produced by a condition, which is the more common approach in practice.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
the_trio <- chicago_bulls$Player %in% c("Michael Jordan", "Scottie Pippen", "Dennis Rodman")
the_trio
trio <- chicago_bulls[the_trio, ]
trio
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > the_trio <- chicago_bulls$Player %in% c("Michael Jordan", "Scottie Pippen", "Dennis Rodman")
## > the_trio
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
## > trio <- chicago_bulls[the_trio, ]
## > trio
##    No.         Player Pos  Ht  Wt         Birth.Date                                College
## 7   23 Michael Jordan  SG 6-6 195  February 17, 1963           University of North Carolina
## 11  33 Scottie Pippen  SF 6-8 210 September 25, 1965         University of Central Arkansas
## 12  91  Dennis Rodman  PF 6-7 210       May 13, 1961 Southeastern Oklahoma State University
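
Two base R helpers are closely related to logical selection: which() turns a logical vector into the positions of its TRUE values, and subset() applies a condition written in terms of the column names. The sketch below uses a hypothetical four-row sample (roster_sample and trio_names are illustrative names) and is one alternative spelling, not a replacement for the approach shown above.

```r
# Hypothetical four-row sample; values copied from the roster
roster_sample <- data.frame(
  Player = c("Ron Harper", "Michael Jordan", "Steve Kerr", "Scottie Pippen"),
  Pos = c("PG", "SG", "PG", "SF"),
  stringsAsFactors = FALSE
)
trio_names <- c("Michael Jordan", "Scottie Pippen", "Dennis Rodman")
is_trio <- roster_sample$Player %in% trio_names
trio_rows <- which(is_trio)                               # positions of TRUE values
via_subset <- subset(roster_sample, Player %in% trio_names)
trio_rows
via_subset
```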

To combine more conditions, use the & (and) or | (or) operator to take the intersection or the union of logical vectors.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
# Union
logical_union <- chicago_bulls$Player == "Michael Jordan" | chicago_bulls$Player == "Scottie Pippen" | chicago_bulls$Player == "Dennis Rodman"
logical_union
trio <- chicago_bulls[logical_union, ]
trio
# Intersection
logical_intersection <- chicago_bulls$Pos != "PG" & chicago_bulls$Player != "Toni Kukoc" & chicago_bulls$Wt <= 210
logical_intersection
trio <- chicago_bulls[logical_intersection, ]
trio
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > # Union
## > logical_union <- chicago_bulls$Player == "Michael Jordan" | chicago_bulls$Player == "Scottie Pippen" | chicago_bulls$Player == "Dennis Rodman"
## > logical_union
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
## > trio <- chicago_bulls[logical_union, ]
## > trio
##    No.         Player Pos  Ht  Wt         Birth.Date                                College
## 7   23 Michael Jordan  SG 6-6 195  February 17, 1963           University of North Carolina
## 11  33 Scottie Pippen  SF 6-8 210 September 25, 1965         University of Central Arkansas
## 12  91  Dennis Rodman  PF 6-7 210       May 13, 1961 Southeastern Oklahoma State University
## > # Intersection
## > logical_intersection <- chicago_bulls$Pos != "PG" & chicago_bulls$Player != "Toni Kukoc" & chicago_bulls$Wt <= 210
## > logical_intersection
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
## > trio <- chicago_bulls[logical_intersection, ]
## > trio
##    No.         Player Pos  Ht  Wt         Birth.Date                                College
## 7   23 Michael Jordan  SG 6-6 195  February 17, 1963           University of North Carolina
## 11  33 Scottie Pippen  SF 6-8 210 September 25, 1965         University of Central Arkansas
## 12  91  Dennis Rodman  PF 6-7 210       May 13, 1961 Southeastern Oklahoma State University
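
Long conditions are often easier to read when each part is stored under its own name before being combined. The sketch below mixes | and & on a hypothetical five-row sample (roster_sample, is_guard, and is_light are illustrative names) to select guards who weigh less than 190 pounds.

```r
# Hypothetical five-row sample; values copied from the roster
roster_sample <- data.frame(
  Player = c("Randy Brown", "Ron Harper", "Michael Jordan", "Steve Kerr", "Scottie Pippen"),
  Pos = c("PG", "PG", "SG", "PG", "SF"),
  Wt = c(190, 185, 195, 175, 210),
  stringsAsFactors = FALSE
)
is_guard <- roster_sample$Pos == "PG" | roster_sample$Pos == "SG"  # union of two positions
is_light <- roster_sample$Wt < 190
light_guards <- roster_sample[is_guard & is_light, ]               # intersection
light_guards
```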

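Many R users with experience in other programming languages ask a question at this point: what is the difference between & and &&, or between | and ||? In R, && and || are used to connect logical vectors of length 1.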

player_name <- "Michael Jordan"
player_name == "Michael Jordan" || player_name == "Scottie Pippen"
player_name == "Michael Jordan" && player_name == "Scottie Pippen"
## > player_name <- "Michael Jordan"
## > player_name == "Michael Jordan" || player_name == "Scottie Pippen"
## [1] TRUE
## > player_name == "Michael Jordan" && player_name == "Scottie Pippen"
## [1] FALSE

To connect logical vectors of length greater than 1, use & and |.

trio <- c("Michael Jordan", "Scottie Pippen", "Dennis Rodman")
trio == "Michael Jordan" | trio == "Scottie Pippen"
trio == "Michael Jordan" & trio == "Scottie Pippen"
## > trio <- c("Michael Jordan", "Scottie Pippen", "Dennis Rodman")
## > trio == "Michael Jordan" | trio == "Scottie Pippen"
## [1]  TRUE  TRUE FALSE
## > trio == "Michael Jordan" & trio == "Scottie Pippen"
## [1] FALSE FALSE FALSE

Using & and | to connect logical vectors of length 1 does not change the result.

player_name <- "Michael Jordan"
player_name == "Michael Jordan" | player_name == "Scottie Pippen"
player_name == "Michael Jordan" & player_name == "Scottie Pippen"
## > player_name <- "Michael Jordan"
## > player_name == "Michael Jordan" | player_name == "Scottie Pippen"
## [1] TRUE
## > player_name == "Michael Jordan" & player_name == "Scottie Pippen"
## [1] FALSE

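However, when && or || is used to connect logical vectors of length greater than 1, older versions of R evaluate only the first element of each vector, which is not the result we want. The output shown here comes from such a version; from R 4.3.0 onward the same call raises an error.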

trio <- c("Michael Jordan", "Scottie Pippen", "Dennis Rodman")
trio == "Michael Jordan" || trio == "Scottie Pippen"
trio == "Michael Jordan" && trio == "Scottie Pippen"
## > trio <- c("Michael Jordan", "Scottie Pippen", "Dennis Rodman")
## > trio == "Michael Jordan" || trio == "Scottie Pippen"
## [1] TRUE
## > trio == "Michael Jordan" && trio == "Scottie Pippen"
## [1] FALSE
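
When a single TRUE or FALSE is needed from a longer logical vector, for example inside an if statement, one common approach is to reduce the vector with any() or all() first and only then combine the results with && or ||. The variable names below are illustrative.

```r
trio <- c("Michael Jordan", "Scottie Pippen", "Dennis Rodman")
has_jordan <- any(trio == "Michael Jordan")   # is at least one element a match?
all_jordan <- all(trio == "Michael Jordan")   # is every element a match?
# Both operands have length 1, so && is appropriate here
if (has_jordan && !all_jordan) {
  verdict <- "Jordan is on the list, along with others"
} else {
  verdict <- "Either no Jordan or only Jordan"
}
verdict
```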

If you are particular about syntax, this explanation was written for you; if you are more easy-going, we suggest simply using & and | everywhere.

Adding and Removing Variables and Observations

To add a variable, declare it as a vector and assign it to the existing data frame.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
chicago_bulls$is_starting <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
chicago_bulls
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > chicago_bulls$is_starting <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
## > chicago_bulls
##    No.          Player Pos   Ht  Wt         Birth.Date                                            College is_starting
## 1    0     Randy Brown  PG  6-2 190       May 22, 1968 University of Houston, New Mexico State University       FALSE
## 2   30    Jud Buechler  SF  6-6 220      June 19, 1968                              University of Arizona       FALSE
## 3   35    Jason Caffey  PF  6-8 255      June 12, 1973                              University of Alabama       FALSE
## 4   53   James Edwards   C  7-0 225  November 22, 1955                           University of Washington       FALSE
## 5   54      Jack Haley   C 6-10 240   January 27, 1964              University of California, Los Angeles       FALSE
## 6    9      Ron Harper  PG  6-6 185   January 20, 1964                                   Miami University        TRUE
## 7   23  Michael Jordan  SG  6-6 195  February 17, 1963                       University of North Carolina        TRUE
## 8   25      Steve Kerr  PG  6-3 175 September 27, 1965                              University of Arizona       FALSE
## 9    7      Toni Kukoc  SF 6-10 192 September 18, 1968                                                          FALSE
## 10  13     Luc Longley   C  7-2 265   January 19, 1969                           University of New Mexico        TRUE
## 11  33  Scottie Pippen  SF  6-8 210 September 25, 1965                     University of Central Arkansas        TRUE
## 12  91   Dennis Rodman  PF  6-7 210       May 13, 1961             Southeastern Oklahoma State University        TRUE
## 13  22     John Salley  PF 6-11 230       May 16, 1964                    Georgia Institute of Technology       FALSE
## 14   8 Dickey Simpkins  PF  6-9 248      April 6, 1972                                 Providence College       FALSE
## 15  34 Bill Wennington   C  7-0 245     April 26, 1963                              St. John's University       FALSE

Alternatively, write the derivation logic as a function, map it over an existing vector with sapply(), and add the result to the data frame.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
chicago_bulls$is_starting <- sapply(chicago_bulls$Player, FUN = function(x) x %in% c("Ron Harper", "Michael Jordan", "Scottie Pippen", "Dennis Rodman", "Luc Longley"))
chicago_bulls
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > chicago_bulls$is_starting <- sapply(chicago_bulls$Player, FUN = function(x) x %in% c("Ron Harper", "Michael Jordan", "Scottie Pippen", "Dennis Rodman", "Luc Longley"))
## > chicago_bulls
##    No.          Player Pos   Ht  Wt         Birth.Date                                            College is_starting
## 1    0     Randy Brown  PG  6-2 190       May 22, 1968 University of Houston, New Mexico State University       FALSE
## 2   30    Jud Buechler  SF  6-6 220      June 19, 1968                              University of Arizona       FALSE
## 3   35    Jason Caffey  PF  6-8 255      June 12, 1973                              University of Alabama       FALSE
## 4   53   James Edwards   C  7-0 225  November 22, 1955                           University of Washington       FALSE
## 5   54      Jack Haley   C 6-10 240   January 27, 1964              University of California, Los Angeles       FALSE
## 6    9      Ron Harper  PG  6-6 185   January 20, 1964                                   Miami University        TRUE
## 7   23  Michael Jordan  SG  6-6 195  February 17, 1963                       University of North Carolina        TRUE
## 8   25      Steve Kerr  PG  6-3 175 September 27, 1965                              University of Arizona       FALSE
## 9    7      Toni Kukoc  SF 6-10 192 September 18, 1968                                                          FALSE
## 10  13     Luc Longley   C  7-2 265   January 19, 1969                           University of New Mexico        TRUE
## 11  33  Scottie Pippen  SF  6-8 210 September 25, 1965                     University of Central Arkansas        TRUE
## 12  91   Dennis Rodman  PF  6-7 210       May 13, 1961             Southeastern Oklahoma State University        TRUE
## 13  22     John Salley  PF 6-11 230       May 16, 1964                    Georgia Institute of Technology       FALSE
## 14   8 Dickey Simpkins  PF  6-9 248      April 6, 1972                                 Providence College       FALSE
## 15  34 Bill Wennington   C  7-0 245     April 26, 1963                              St. John's University       FALSE
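
For this particular derivation sapply() is optional, because %in% is already vectorized over its left-hand side. The sketch below compares the two spellings on a hypothetical four-row sample (roster_sample and starters are illustrative names); sapply() remains useful when the function being mapped only handles one value at a time.

```r
# Hypothetical four-row sample; values copied from the roster
roster_sample <- data.frame(
  Player = c("Jack Haley", "Ron Harper", "Michael Jordan", "Steve Kerr"),
  stringsAsFactors = FALSE
)
starters <- c("Ron Harper", "Michael Jordan", "Scottie Pippen", "Dennis Rodman", "Luc Longley")
via_sapply <- sapply(roster_sample$Player, FUN = function(x) x %in% starters)
via_vector <- roster_sample$Player %in% starters
# sapply() attaches the player names to its result; the logical values are the same
same_values <- identical(unname(via_sapply), via_vector)
roster_sample$is_starting <- via_vector
roster_sample
```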

Assigning NULL to a variable deletes it from the data frame.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
chicago_bulls$is_starting <- sapply(chicago_bulls$Player, FUN = function(x) x %in% c("Ron Harper", "Michael Jordan", "Scottie Pippen", "Dennis Rodman", "Luc Longley"))
chicago_bulls
chicago_bulls$is_starting <- NULL
chicago_bulls
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > chicago_bulls$is_starting <- sapply(chicago_bulls$Player, FUN = function(x) x %in% c("Ron Harper", "Michael Jordan", "Scottie Pippen", "Dennis Rodman", "Luc Longley"))
## > chicago_bulls
##    No.          Player Pos   Ht  Wt         Birth.Date                                            College is_starting
## 1    0     Randy Brown  PG  6-2 190       May 22, 1968 University of Houston, New Mexico State University       FALSE
## 2   30    Jud Buechler  SF  6-6 220      June 19, 1968                              University of Arizona       FALSE
## 3   35    Jason Caffey  PF  6-8 255      June 12, 1973                              University of Alabama       FALSE
## 4   53   James Edwards   C  7-0 225  November 22, 1955                           University of Washington       FALSE
## 5   54      Jack Haley   C 6-10 240   January 27, 1964              University of California, Los Angeles       FALSE
## 6    9      Ron Harper  PG  6-6 185   January 20, 1964                                   Miami University        TRUE
## 7   23  Michael Jordan  SG  6-6 195  February 17, 1963                       University of North Carolina        TRUE
## 8   25      Steve Kerr  PG  6-3 175 September 27, 1965                              University of Arizona       FALSE
## 9    7      Toni Kukoc  SF 6-10 192 September 18, 1968                                                          FALSE
## 10  13     Luc Longley   C  7-2 265   January 19, 1969                           University of New Mexico        TRUE
## 11  33  Scottie Pippen  SF  6-8 210 September 25, 1965                     University of Central Arkansas        TRUE
## 12  91   Dennis Rodman  PF  6-7 210       May 13, 1961             Southeastern Oklahoma State University        TRUE
## 13  22     John Salley  PF 6-11 230       May 16, 1964                    Georgia Institute of Technology       FALSE
## 14   8 Dickey Simpkins  PF  6-9 248      April 6, 1972                                 Providence College       FALSE
## 15  34 Bill Wennington   C  7-0 245     April 26, 1963                              St. John's University       FALSE
## > chicago_bulls$is_starting <- NULL
## > chicago_bulls
##    No.          Player Pos   Ht  Wt         Birth.Date                                            College
## 1    0     Randy Brown  PG  6-2 190       May 22, 1968 University of Houston, New Mexico State University
## 2   30    Jud Buechler  SF  6-6 220      June 19, 1968                              University of Arizona
## 3   35    Jason Caffey  PF  6-8 255      June 12, 1973                              University of Alabama
## 4   53   James Edwards   C  7-0 225  November 22, 1955                           University of Washington
## 5   54      Jack Haley   C 6-10 240   January 27, 1964              University of California, Los Angeles
## 6    9      Ron Harper  PG  6-6 185   January 20, 1964                                   Miami University
## 7   23  Michael Jordan  SG  6-6 195  February 17, 1963                       University of North Carolina
## 8   25      Steve Kerr  PG  6-3 175 September 27, 1965                              University of Arizona
## 9    7      Toni Kukoc  SF 6-10 192 September 18, 1968                                                   
## 10  13     Luc Longley   C  7-2 265   January 19, 1969                           University of New Mexico
## 11  33  Scottie Pippen  SF  6-8 210 September 25, 1965                     University of Central Arkansas
## 12  91   Dennis Rodman  PF  6-7 210       May 13, 1961             Southeastern Oklahoma State University
## 13  22     John Salley  PF 6-11 230       May 16, 1964                    Georgia Institute of Technology
## 14   8 Dickey Simpkins  PF  6-9 248      April 6, 1972                                 Providence College
## 15  34 Bill Wennington   C  7-0 245     April 26, 1963                              St. John's University
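
Assigning NULL modifies the data frame in place. When the original should stay untouched, one alternative is to select every column except the unwanted ones into a new object. This sketch assumes a hypothetical hand-built sample and uses setdiff() on the column names.

```r
# Hypothetical three-row sample; values copied from the roster
bulls_sample <- data.frame(
  Player = c("Michael Jordan", "Steve Kerr", "Scottie Pippen"),
  Pos = c("SG", "PG", "SF"),
  is_starting = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
keep_cols <- setdiff(colnames(bulls_sample), "is_starting")  # all names except the dropped one
# drop = FALSE keeps a data frame even if only one column remains
without_flag <- bulls_sample[, keep_cols, drop = FALSE]
colnames(without_flag)
colnames(bulls_sample)  # the original still has three columns
```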

To add observations, declare them as a data frame and append them with rbind().

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
front_courts <- chicago_bulls[chicago_bulls$Pos %in% c("SF", "PF", "C"), ]
back_courts <- chicago_bulls[chicago_bulls$Pos %in% c("PG", "SG"), ]
back_courts
front_courts
rbind(back_courts, front_courts)
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > front_courts <- chicago_bulls[chicago_bulls$Pos %in% c("SF", "PF", "C"), ]
## > back_courts <- chicago_bulls[chicago_bulls$Pos %in% c("PG", "SG"), ]
## > back_courts
##   No.         Player Pos  Ht  Wt         Birth.Date                                            College
## 1   0    Randy Brown  PG 6-2 190       May 22, 1968 University of Houston, New Mexico State University
## 6   9     Ron Harper  PG 6-6 185   January 20, 1964                                   Miami University
## 7  23 Michael Jordan  SG 6-6 195  February 17, 1963                       University of North Carolina
## 8  25     Steve Kerr  PG 6-3 175 September 27, 1965                              University of Arizona
## > front_courts
##    No.          Player Pos   Ht  Wt         Birth.Date                                College
## 2   30    Jud Buechler  SF  6-6 220      June 19, 1968                  University of Arizona
## 3   35    Jason Caffey  PF  6-8 255      June 12, 1973                  University of Alabama
## 4   53   James Edwards   C  7-0 225  November 22, 1955               University of Washington
## 5   54      Jack Haley   C 6-10 240   January 27, 1964  University of California, Los Angeles
## 9    7      Toni Kukoc  SF 6-10 192 September 18, 1968                                       
## 10  13     Luc Longley   C  7-2 265   January 19, 1969               University of New Mexico
## 11  33  Scottie Pippen  SF  6-8 210 September 25, 1965         University of Central Arkansas
## 12  91   Dennis Rodman  PF  6-7 210       May 13, 1961 Southeastern Oklahoma State University
## 13  22     John Salley  PF 6-11 230       May 16, 1964        Georgia Institute of Technology
## 14   8 Dickey Simpkins  PF  6-9 248      April 6, 1972                     Providence College
## 15  34 Bill Wennington   C  7-0 245     April 26, 1963                  St. John's University
## > rbind(back_courts, front_courts)
##    No.          Player Pos   Ht  Wt         Birth.Date                                            College
## 1    0     Randy Brown  PG  6-2 190       May 22, 1968 University of Houston, New Mexico State University
## 6    9      Ron Harper  PG  6-6 185   January 20, 1964                                   Miami University
## 7   23  Michael Jordan  SG  6-6 195  February 17, 1963                       University of North Carolina
## 8   25      Steve Kerr  PG  6-3 175 September 27, 1965                              University of Arizona
## 2   30    Jud Buechler  SF  6-6 220      June 19, 1968                              University of Arizona
## 3   35    Jason Caffey  PF  6-8 255      June 12, 1973                              University of Alabama
## 4   53   James Edwards   C  7-0 225  November 22, 1955                           University of Washington
## 5   54      Jack Haley   C 6-10 240   January 27, 1964              University of California, Los Angeles
## 9    7      Toni Kukoc  SF 6-10 192 September 18, 1968                                                   
## 10  13     Luc Longley   C  7-2 265   January 19, 1969                           University of New Mexico
## 11  33  Scottie Pippen  SF  6-8 210 September 25, 1965                     University of Central Arkansas
## 12  91   Dennis Rodman  PF  6-7 210       May 13, 1961             Southeastern Oklahoma State University
## 13  22     John Salley  PF 6-11 230       May 16, 1964                    Georgia Institute of Technology
## 14   8 Dickey Simpkins  PF  6-9 248      April 6, 1972                                 Providence College
## 15  34 Bill Wennington   C  7-0 245     April 26, 1963                              St. John's University
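
The same call works for a single new observation declared from scratch, as long as its column names match those of the existing data frame. In this sketch roster_sample and new_player are hypothetical names, and the values are copied from the roster.

```r
# Hypothetical two-row starting point
roster_sample <- data.frame(
  Player = c("Michael Jordan", "Scottie Pippen"),
  Pos = c("SG", "SF"),
  Wt = c(195, 210),
  stringsAsFactors = FALSE
)
# The new observation is a one-row data frame with the same column names
new_player <- data.frame(
  Player = "Dennis Rodman",
  Pos = "PF",
  Wt = 210,
  stringsAsFactors = FALSE
)
roster_sample <- rbind(roster_sample, new_player)
roster_sample
```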

Deleting observations uses the same techniques as breaking down a data frame.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
front_courts <- chicago_bulls[chicago_bulls$Pos %in% c("SF", "PF", "C"), ]
front_court_rows <- as.numeric(row.names(front_courts))
front_court_rows
back_courts <- chicago_bulls[-front_court_rows, ]
back_courts
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > front_courts <- chicago_bulls[chicago_bulls$Pos %in% c("SF", "PF", "C"), ]
## > front_court_rows <- as.numeric(row.names(front_courts))
## > front_court_rows
##  [1]  2  3  4  5  9 10 11 12 13 14 15
## > back_courts <- chicago_bulls[-front_court_rows, ]
## > back_courts
##   No.         Player Pos  Ht  Wt         Birth.Date                                            College
## 1   0    Randy Brown  PG 6-2 190       May 22, 1968 University of Houston, New Mexico State University
## 6   9     Ron Harper  PG 6-6 185   January 20, 1964                                   Miami University
## 7  23 Michael Jordan  SG 6-6 195  February 17, 1963                       University of North Carolina
## 8  25     Steve Kerr  PG 6-3 175 September 27, 1965                              University of Arizona
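
Converting row names to numbers works here because the row names are still the default 1 to 15 and therefore match the row positions. An alternative sketch that does not depend on row names negates the logical vector with ! instead (roster_sample and is_front are illustrative names).

```r
# Hypothetical five-row sample; values copied from the roster
roster_sample <- data.frame(
  Player = c("Randy Brown", "Ron Harper", "Michael Jordan", "Steve Kerr", "Scottie Pippen"),
  Pos = c("PG", "PG", "SG", "PG", "SF"),
  stringsAsFactors = FALSE
)
is_front <- roster_sample$Pos %in% c("SF", "PF", "C")
back_courts <- roster_sample[!is_front, ]  # keep every row that is NOT a front-court player
back_courts
```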

Adjusting Variables

The colnames() function can also be used to rename the variables of a data frame.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
colnames(chicago_bulls)
col_names <- gsub(tolower(colnames(chicago_bulls)), pattern = "\\.", replacement = "_")
colnames(chicago_bulls) <- col_names
colnames(chicago_bulls)
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > colnames(chicago_bulls)
## [1] "No."        "Player"     "Pos"        "Ht"         "Wt"         "Birth.Date" "College"   
## > col_names <- gsub(tolower(colnames(chicago_bulls)), pattern = "\\.", replacement = "_")
## > colnames(chicago_bulls) <- col_names
## > colnames(chicago_bulls)
## [1] "no_"        "player"     "pos"        "ht"         "wt"         "birth_date" "college"
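
Two small variations on the renaming above may be useful. The first renames a single column by matching its current name; the second adds a sub() step that removes the trailing underscore left behind by names such as "No.". The sample data frame and the new name "number" are assumptions for illustration.

```r
# Hypothetical two-row sample; values copied from the roster
bulls_sample <- data.frame(
  No. = c(23, 33),
  Birth.Date = c("February 17, 1963", "September 25, 1965"),
  stringsAsFactors = FALSE
)
# Rename one column by name rather than by position
colnames(bulls_sample)[colnames(bulls_sample) == "No."] <- "number"
colnames(bulls_sample)

# Lower-case, replace dots with underscores, then strip a trailing underscore
raw_names <- c("No.", "Birth.Date")
clean_names <- sub("_$", "", gsub("\\.", "_", tolower(raw_names)))
clean_names
```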

Reordering variables uses the same techniques as breaking down a data frame.

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
chicago_bulls
ordered_cols <- sort(colnames(chicago_bulls))
chicago_bulls <- chicago_bulls[, ordered_cols]
chicago_bulls
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > chicago_bulls
##    No.          Player Pos   Ht  Wt         Birth.Date                                            College
## 1    0     Randy Brown  PG  6-2 190       May 22, 1968 University of Houston, New Mexico State University
## 2   30    Jud Buechler  SF  6-6 220      June 19, 1968                              University of Arizona
## 3   35    Jason Caffey  PF  6-8 255      June 12, 1973                              University of Alabama
## 4   53   James Edwards   C  7-0 225  November 22, 1955                           University of Washington
## 5   54      Jack Haley   C 6-10 240   January 27, 1964              University of California, Los Angeles
## 6    9      Ron Harper  PG  6-6 185   January 20, 1964                                   Miami University
## 7   23  Michael Jordan  SG  6-6 195  February 17, 1963                       University of North Carolina
## 8   25      Steve Kerr  PG  6-3 175 September 27, 1965                              University of Arizona
## 9    7      Toni Kukoc  SF 6-10 192 September 18, 1968                                                   
## 10  13     Luc Longley   C  7-2 265   January 19, 1969                           University of New Mexico
## 11  33  Scottie Pippen  SF  6-8 210 September 25, 1965                     University of Central Arkansas
## 12  91   Dennis Rodman  PF  6-7 210       May 13, 1961             Southeastern Oklahoma State University
## 13  22     John Salley  PF 6-11 230       May 16, 1964                    Georgia Institute of Technology
## 14   8 Dickey Simpkins  PF  6-9 248      April 6, 1972                                 Providence College
## 15  34 Bill Wennington   C  7-0 245     April 26, 1963                              St. John's University
## > ordered_cols <- sort(colnames(chicago_bulls))
## > chicago_bulls <- chicago_bulls[, ordered_cols]
## > chicago_bulls
##            Birth.Date                                            College   Ht No.          Player Pos  Wt
## 1        May 22, 1968 University of Houston, New Mexico State University  6-2   0     Randy Brown  PG 190
## 2       June 19, 1968                              University of Arizona  6-6  30    Jud Buechler  SF 220
## 3       June 12, 1973                              University of Alabama  6-8  35    Jason Caffey  PF 255
## 4   November 22, 1955                           University of Washington  7-0  53   James Edwards   C 225
## 5    January 27, 1964              University of California, Los Angeles 6-10  54      Jack Haley   C 240
## 6    January 20, 1964                                   Miami University  6-6   9      Ron Harper  PG 185
## 7   February 17, 1963                       University of North Carolina  6-6  23  Michael Jordan  SG 195
## 8  September 27, 1965                              University of Arizona  6-3  25      Steve Kerr  PG 175
## 9  September 18, 1968                                                    6-10   7      Toni Kukoc  SF 192
## 10   January 19, 1969                           University of New Mexico  7-2  13     Luc Longley   C 265
## 11 September 25, 1965                     University of Central Arkansas  6-8  33  Scottie Pippen  SF 210
## 12       May 13, 1961             Southeastern Oklahoma State University  6-7  91   Dennis Rodman  PF 210
## 13       May 16, 1964                    Georgia Institute of Technology 6-11  22     John Salley  PF 230
## 14      April 6, 1972                                 Providence College  6-9   8 Dickey Simpkins  PF 248
## 15     April 26, 1963                              St. John's University  7-0  34 Bill Wennington   C 245
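Alphabetical order is only one possibility. The sketch below shows two other ways the same column subsetting might be used: spelling out the order by hand, and moving a single column to the front. The data frame bulls is a hypothetical three-row stand-in for chicago_bulls, with values copied from the output above, so the example runs without the CSV file.

```r
# A three-row, three-column stand-in for chicago_bulls (illustrative only)
bulls <- data.frame(
  No. = c(23, 33, 91),
  Player = c("Michael Jordan", "Scottie Pippen", "Dennis Rodman"),
  Pos = c("SG", "SF", "PF"),
  stringsAsFactors = FALSE
)

# Option 1: spell out the desired column order explicitly
bulls_explicit <- bulls[, c("Player", "Pos", "No.")]
colnames(bulls_explicit)

# Option 2: move one column to the front and keep the others in their original order
front <- "Pos"
bulls_front <- bulls[, c(front, setdiff(colnames(bulls), front))]
colnames(bulls_front)
```

The setdiff() form is handy when a data frame has many columns and only one or two need to move.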

Sorting a Data Frame

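The order() function returns the row indices that would arrange a variable by value (alphabetically for text). Passing those indices as the row index, again using the subsetting approach for deconstructing a data frame, sorts the whole data frame.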

csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
# unsorted
chicago_bulls
ordered_rows <- order(chicago_bulls$Pos)
ordered_rows
chicago_bulls <- chicago_bulls[ordered_rows, ]
# sorted by Position
chicago_bulls
## > csv_url <- "https://s3-ap-northeast-1.amazonaws.com/r-essentials/chicago_bulls_1995_1996.csv"
## > chicago_bulls <- read.csv(csv_url, stringsAsFactors = FALSE)
## > # unsorted
## > chicago_bulls
##    No.          Player Pos   Ht  Wt         Birth.Date                                            College
## 1    0     Randy Brown  PG  6-2 190       May 22, 1968 University of Houston, New Mexico State University
## 2   30    Jud Buechler  SF  6-6 220      June 19, 1968                              University of Arizona
## 3   35    Jason Caffey  PF  6-8 255      June 12, 1973                              University of Alabama
## 4   53   James Edwards   C  7-0 225  November 22, 1955                           University of Washington
## 5   54      Jack Haley   C 6-10 240   January 27, 1964              University of California, Los Angeles
## 6    9      Ron Harper  PG  6-6 185   January 20, 1964                                   Miami University
## 7   23  Michael Jordan  SG  6-6 195  February 17, 1963                       University of North Carolina
## 8   25      Steve Kerr  PG  6-3 175 September 27, 1965                              University of Arizona
## 9    7      Toni Kukoc  SF 6-10 192 September 18, 1968                                                   
## 10  13     Luc Longley   C  7-2 265   January 19, 1969                           University of New Mexico
## 11  33  Scottie Pippen  SF  6-8 210 September 25, 1965                     University of Central Arkansas
## 12  91   Dennis Rodman  PF  6-7 210       May 13, 1961             Southeastern Oklahoma State University
## 13  22     John Salley  PF 6-11 230       May 16, 1964                    Georgia Institute of Technology
## 14   8 Dickey Simpkins  PF  6-9 248      April 6, 1972                                 Providence College
## 15  34 Bill Wennington   C  7-0 245     April 26, 1963                              St. John's University
## > ordered_rows <- order(chicago_bulls$Pos)
## > ordered_rows
##  [1]  4  5 10 15  3 12 13 14  1  6  8  2  9 11  7
## > chicago_bulls <- chicago_bulls[ordered_rows, ]
## > # sorted by Position
## > chicago_bulls
##    No.          Player Pos   Ht  Wt         Birth.Date                                            College
## 4   53   James Edwards   C  7-0 225  November 22, 1955                           University of Washington
## 5   54      Jack Haley   C 6-10 240   January 27, 1964              University of California, Los Angeles
## 10  13     Luc Longley   C  7-2 265   January 19, 1969                           University of New Mexico
## 15  34 Bill Wennington   C  7-0 245     April 26, 1963                              St. John's University
## 3   35    Jason Caffey  PF  6-8 255      June 12, 1973                              University of Alabama
## 12  91   Dennis Rodman  PF  6-7 210       May 13, 1961             Southeastern Oklahoma State University
## 13  22     John Salley  PF 6-11 230       May 16, 1964                    Georgia Institute of Technology
## 14   8 Dickey Simpkins  PF  6-9 248      April 6, 1972                                 Providence College
## 1    0     Randy Brown  PG  6-2 190       May 22, 1968 University of Houston, New Mexico State University
## 6    9      Ron Harper  PG  6-6 185   January 20, 1964                                   Miami University
## 8   25      Steve Kerr  PG  6-3 175 September 27, 1965                              University of Arizona
## 2   30    Jud Buechler  SF  6-6 220      June 19, 1968                              University of Arizona
## 9    7      Toni Kukoc  SF 6-10 192 September 18, 1968                                                   
## 11  33  Scottie Pippen  SF  6-8 210 September 25, 1965                     University of Central Arkansas
## 7   23  Michael Jordan  SG  6-6 195  February 17, 1963                       University of North Carolina
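The example above sorts by one variable in ascending order. As a hedged sketch of two common variations, order() also accepts decreasing = TRUE and more than one sort key, where later keys break ties in earlier ones. The data frame bulls is a hypothetical five-row stand-in with values copied from the output above.

```r
# A five-row stand-in for chicago_bulls (illustrative only)
bulls <- data.frame(
  Player = c("Randy Brown", "Jud Buechler", "Jason Caffey", "Ron Harper", "Michael Jordan"),
  Pos = c("PG", "SF", "PF", "PG", "SG"),
  Wt = c(190, 220, 255, 185, 195),
  stringsAsFactors = FALSE
)

# Descending order: heaviest player first
by_weight_desc <- bulls[order(bulls$Wt, decreasing = TRUE), ]
by_weight_desc$Player

# Two keys: sort by position, then by weight (lightest first) within each position
by_pos_then_wt <- bulls[order(bulls$Pos, bulls$Wt), ]
by_pos_then_wt$Player
```

As in the output above, the sorted data frames keep their original row names. If consecutive row names are preferred, rownames(by_pos_then_wt) <- NULL resets them.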

Summary

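This chapter introduced basic data frame techniques in R: inspecting a data frame's dimensions and appearance, querying its details, deconstructing it, adjusting its variables, and sorting it.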

Further Reading