4.2 Functions in the dplyr package

The example data is the Cars93 data frame from the MASS package.

The original data has 93 car observations and 27 variables. To keep the examples short, we use only the first 8 variables (the Cars93_1 data frame).

4.2.1 Loading the packages

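library(MASS)   # provides the Cars93 data
library(dplyr)  # loaded after MASS so that dplyr::select() is not masked by MASS::select()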
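
The library() calls above assume that both packages are already installed. A minimal sketch of installing whatever is missing first is shown below; the variable names pkgs and missing_pkgs and the CRAN mirror URL are illustrative choices, not part of the original text.

```r
# Packages used in this section
pkgs <- c("dplyr", "MASS")

# requireNamespace() returns FALSE (instead of raising an error) when a package is not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only what is missing (the mirror URL is an assumption; any CRAN mirror works)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```

Checking first avoids reinstalling packages on every run of the script.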

4.2.2 Inspecting the example data

head(MASS::Cars93)
##   Manufacturer   Model    Type Min.Price Price Max.Price MPG.city MPG.highway
## 1        Acura Integra   Small      12.9  15.9      18.8       25          31
## 2        Acura  Legend Midsize      29.2  33.9      38.7       18          25
## 3         Audi      90 Compact      25.9  29.1      32.3       20          26
## 4         Audi     100 Midsize      30.8  37.7      44.6       19          26
## 5          BMW    535i Midsize      23.7  30.0      36.2       22          30
## 6        Buick Century Midsize      14.2  15.7      17.3       22          31
##              AirBags DriveTrain Cylinders EngineSize Horsepower  RPM
## 1               None      Front         4        1.8        140 6300
## 2 Driver & Passenger      Front         6        3.2        200 5500
## 3        Driver only      Front         6        2.8        172 5500
## 4 Driver & Passenger      Front         6        2.8        172 5500
## 5        Driver only       Rear         4        3.5        208 5700
## 6        Driver only      Front         4        2.2        110 5200
##   Rev.per.mile Man.trans.avail Fuel.tank.capacity Passengers Length Wheelbase
## 1         2890             Yes               13.2          5    177       102
## 2         2335             Yes               18.0          5    195       115
## 3         2280             Yes               16.9          5    180       102
## 4         2535             Yes               21.1          6    193       106
## 5         2545             Yes               21.1          4    186       109
## 6         2565              No               16.4          6    189       105
##   Width Turn.circle Rear.seat.room Luggage.room Weight  Origin          Make
## 1    68          37           26.5           11   2705 non-USA Acura Integra
## 2    71          38           30.0           15   3560 non-USA  Acura Legend
## 3    67          37           28.0           14   3375 non-USA       Audi 90
## 4    70          37           31.0           17   3405 non-USA      Audi 100
## 5    69          39           27.0           13   3640 non-USA      BMW 535i
## 6    69          41           28.0           16   2880     USA Buick Century
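  • The data consists of 93 observations and 27 variables; head() shows only the first 6 rows, and the str() and glimpse() output below confirms the full dimensions.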
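
The dimensions can also be checked directly. A minimal sketch using only base R's dim(), nrow() and ncol(); the data is read with the MASS:: prefix so the block runs without attaching the package:

```r
cars <- MASS::Cars93   # 'cars' is just a local name for this example

dim(cars)    # rows and columns together: 93 27
nrow(cars)   # number of observations: 93
ncol(cars)   # number of variables: 27
```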

Several other functions are also useful for inspecting the data:

# Check summary statistics for Cars93
summary(Cars93)
##     Manufacturer     Model         Type      Min.Price         Price      
##  Chevrolet: 8    100    : 1   Compact:16   Min.   : 6.70   Min.   : 7.40  
##  Ford     : 8    190E   : 1   Large  :11   1st Qu.:10.80   1st Qu.:12.20  
##  Dodge    : 6    240    : 1   Midsize:22   Median :14.70   Median :17.70  
##  Mazda    : 5    300E   : 1   Small  :21   Mean   :17.13   Mean   :19.51  
##  Pontiac  : 5    323    : 1   Sporty :14   3rd Qu.:20.30   3rd Qu.:23.30  
##  Buick    : 4    535i   : 1   Van    : 9   Max.   :45.40   Max.   :61.90  
##  (Other)  :57    (Other):87                                               
##    Max.Price       MPG.city      MPG.highway                  AirBags  
##  Min.   : 7.9   Min.   :15.00   Min.   :20.00   Driver & Passenger:16  
##  1st Qu.:14.7   1st Qu.:18.00   1st Qu.:26.00   Driver only       :43  
##  Median :19.6   Median :21.00   Median :28.00   None              :34  
##  Mean   :21.9   Mean   :22.37   Mean   :29.09                          
##  3rd Qu.:25.3   3rd Qu.:25.00   3rd Qu.:31.00                          
##  Max.   :80.0   Max.   :46.00   Max.   :50.00                          
##                                                                        
##  DriveTrain  Cylinders    EngineSize      Horsepower         RPM      
##  4WD  :10   3     : 3   Min.   :1.000   Min.   : 55.0   Min.   :3800  
##  Front:67   4     :49   1st Qu.:1.800   1st Qu.:103.0   1st Qu.:4800  
##  Rear :16   5     : 2   Median :2.400   Median :140.0   Median :5200  
##             6     :31   Mean   :2.668   Mean   :143.8   Mean   :5281  
##             8     : 7   3rd Qu.:3.300   3rd Qu.:170.0   3rd Qu.:5750  
##             rotary: 1   Max.   :5.700   Max.   :300.0   Max.   :6500  
##                                                                       
##   Rev.per.mile  Man.trans.avail Fuel.tank.capacity   Passengers   
##  Min.   :1320   No :32          Min.   : 9.20      Min.   :2.000  
##  1st Qu.:1985   Yes:61          1st Qu.:14.50      1st Qu.:4.000  
##  Median :2340                   Median :16.40      Median :5.000  
##  Mean   :2332                   Mean   :16.66      Mean   :5.086  
##  3rd Qu.:2565                   3rd Qu.:18.80      3rd Qu.:6.000  
##  Max.   :3755                   Max.   :27.00      Max.   :8.000  
##                                                                   
##      Length        Wheelbase         Width        Turn.circle   
##  Min.   :141.0   Min.   : 90.0   Min.   :60.00   Min.   :32.00  
##  1st Qu.:174.0   1st Qu.: 98.0   1st Qu.:67.00   1st Qu.:37.00  
##  Median :183.0   Median :103.0   Median :69.00   Median :39.00  
##  Mean   :183.2   Mean   :103.9   Mean   :69.38   Mean   :38.96  
##  3rd Qu.:192.0   3rd Qu.:110.0   3rd Qu.:72.00   3rd Qu.:41.00  
##  Max.   :219.0   Max.   :119.0   Max.   :78.00   Max.   :45.00  
##                                                                 
##  Rear.seat.room   Luggage.room       Weight         Origin              Make   
##  Min.   :19.00   Min.   : 6.00   Min.   :1695   USA    :48   Acura Integra: 1  
##  1st Qu.:26.00   1st Qu.:12.00   1st Qu.:2620   non-USA:45   Acura Legend : 1  
##  Median :27.50   Median :14.00   Median :3040                Audi 100     : 1  
##  Mean   :27.83   Mean   :13.89   Mean   :3073                Audi 90      : 1  
##  3rd Qu.:30.00   3rd Qu.:15.00   3rd Qu.:3525                BMW 535i     : 1  
##  Max.   :36.00   Max.   :22.00   Max.   :4105                Buick Century: 1  
##  NA's   :2       NA's   :11                                  (Other)      :87
DT::datatable(Cars93)
str(Cars93)
## 'data.frame':    93 obs. of  27 variables:
##  $ Manufacturer      : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
##  $ Model             : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
##  $ Type              : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
##  $ Min.Price         : num  12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
##  $ Price             : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
##  $ Max.Price         : num  18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
##  $ MPG.city          : int  25 18 20 19 22 22 19 16 19 16 ...
##  $ MPG.highway       : int  31 25 26 26 30 31 28 25 27 25 ...
##  $ AirBags           : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
##  $ DriveTrain        : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
##  $ Cylinders         : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
##  $ EngineSize        : num  1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
##  $ Horsepower        : int  140 200 172 172 208 110 170 180 170 200 ...
##  $ RPM               : int  6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
##  $ Rev.per.mile      : int  2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
##  $ Man.trans.avail   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
##  $ Fuel.tank.capacity: num  13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
##  $ Passengers        : int  5 5 5 6 4 6 6 6 5 6 ...
##  $ Length            : int  177 195 180 193 186 189 200 216 198 206 ...
##  $ Wheelbase         : int  102 115 102 106 109 105 111 116 108 114 ...
##  $ Width             : int  68 71 67 70 69 69 74 78 73 73 ...
##  $ Turn.circle       : int  37 38 37 37 39 41 42 45 41 43 ...
##  $ Rear.seat.room    : num  26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
##  $ Luggage.room      : int  11 15 14 17 13 16 17 21 14 18 ...
##  $ Weight            : int  2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
##  $ Origin            : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
##  $ Make              : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...
dplyr::glimpse(Cars93)
## Rows: 93
## Columns: 27
## $ Manufacturer       <fct> Acura, Acura, Audi, Audi, BMW, Buick, Buick, Bui...
## $ Model              <fct> Integra, Legend, 90, 100, 535i, Century, LeSabre...
## $ Type               <fct> Small, Midsize, Compact, Midsize, Midsize, Midsi...
## $ Min.Price          <dbl> 12.9, 29.2, 25.9, 30.8, 23.7, 14.2, 19.9, 22.6, ...
## $ Price              <dbl> 15.9, 33.9, 29.1, 37.7, 30.0, 15.7, 20.8, 23.7, ...
## $ Max.Price          <dbl> 18.8, 38.7, 32.3, 44.6, 36.2, 17.3, 21.7, 24.9, ...
## $ MPG.city           <int> 25, 18, 20, 19, 22, 22, 19, 16, 19, 16, 16, 25, ...
## $ MPG.highway        <int> 31, 25, 26, 26, 30, 31, 28, 25, 27, 25, 25, 36, ...
## $ AirBags            <fct> None, Driver & Passenger, Driver only, Driver & ...
## $ DriveTrain         <fct> Front, Front, Front, Front, Rear, Front, Front, ...
## $ Cylinders          <fct> 4, 6, 6, 6, 4, 4, 6, 6, 6, 8, 8, 4, 4, 6, 4, 6, ...
## $ EngineSize         <dbl> 1.8, 3.2, 2.8, 2.8, 3.5, 2.2, 3.8, 5.7, 3.8, 4.9...
## $ Horsepower         <int> 140, 200, 172, 172, 208, 110, 170, 180, 170, 200...
## $ RPM                <int> 6300, 5500, 5500, 5500, 5700, 5200, 4800, 4000, ...
## $ Rev.per.mile       <int> 2890, 2335, 2280, 2535, 2545, 2565, 1570, 1320, ...
## $ Man.trans.avail    <fct> Yes, Yes, Yes, Yes, Yes, No, No, No, No, No, No,...
## $ Fuel.tank.capacity <dbl> 13.2, 18.0, 16.9, 21.1, 21.1, 16.4, 18.0, 23.0, ...
## $ Passengers         <int> 5, 5, 5, 6, 4, 6, 6, 6, 5, 6, 5, 5, 5, 4, 6, 7, ...
## $ Length             <int> 177, 195, 180, 193, 186, 189, 200, 216, 198, 206...
## $ Wheelbase          <int> 102, 115, 102, 106, 109, 105, 111, 116, 108, 114...
## $ Width              <int> 68, 71, 67, 70, 69, 69, 74, 78, 73, 73, 74, 66, ...
## $ Turn.circle        <int> 37, 38, 37, 37, 39, 41, 42, 45, 41, 43, 44, 38, ...
## $ Rear.seat.room     <dbl> 26.5, 30.0, 28.0, 31.0, 27.0, 28.0, 30.5, 30.5, ...
## $ Luggage.room       <int> 11, 15, 14, 17, 13, 16, 17, 21, 14, 18, 14, 13, ...
## $ Weight             <int> 2705, 3560, 3375, 3405, 3640, 2880, 3470, 4105, ...
## $ Origin             <fct> non-USA, non-USA, non-USA, non-USA, non-USA, USA...
## $ Make               <fct> Acura Integra, Acura Legend, Audi 90, Audi 100, ...
# Subset Cars93: keep only the first 8 columns
Cars93_1 <- Cars93[, 1:8]
str(Cars93_1)
## 'data.frame':    93 obs. of  8 variables:
##  $ Manufacturer: Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
##  $ Model       : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
##  $ Type        : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
##  $ Min.Price   : num  12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
##  $ Price       : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
##  $ Max.Price   : num  18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
##  $ MPG.city    : int  25 18 20 19 22 22 19 16 19 16 ...
##  $ MPG.highway : int  31 25 26 26 30 31 28 25 27 25 ...
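
Since this section is about dplyr, the same subset can also be written with select(). The sketch below is an illustration, not the method used in the text; the names by_position and by_name are hypothetical. select() is called with the dplyr:: prefix so the example works even when MASS::select() masks it.

```r
cars <- MASS::Cars93

# Select the first 8 columns by position
by_position <- dplyr::select(cars, 1:8)

# Select the same columns as a range of names
by_name <- dplyr::select(cars, Manufacturer:MPG.highway)

# Both give the same columns as the base R subset Cars93[, 1:8]
identical(names(by_position), names(cars[, 1:8]))
identical(names(by_name), names(cars[, 1:8]))
```

Selecting by name is usually more robust than selecting by position, because it keeps working if the column order of the source data changes.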

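The structure of the Cars93_1 data frame can be checked with str(Cars93_1).

The output shows the number of columns (variables), the column names, the number of observations, and a preview of the values:

  • Data structure: 'data.frame'
  • Number of columns (variables): 8 variables
  • Column (variable) names: the 8 names $ Manufacturer, $ Model, $ Type, $ Min.Price, $ Price, $ Max.Price, $ MPG.city, $ MPG.highway
  • Number of observations: 93 obs.
  • Preview of observations: the data type of each column and its first few values.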
    • $ Manufacturer: Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
    • $ Model : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
    • $ Type : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
    • $ Min.Price : num 12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
    • $ Price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
    • $ Max.Price : num 18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
    • $ MPG.city : int 25 18 20 19 22 22 19 16 19 16 ...
    • $ MPG.highway : int 31 25 26 26 30 31 28 25 27 25 ...
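
Each item read off the str() output above can also be retrieved programmatically. A minimal sketch using base R only; col_types is a hypothetical name introduced for this example:

```r
Cars93_1 <- MASS::Cars93[, 1:8]

class(Cars93_1)   # data structure: "data.frame"
ncol(Cars93_1)    # number of columns (variables): 8
names(Cars93_1)   # column (variable) names
nrow(Cars93_1)    # number of observations: 93

# Data type of each column, as a named character vector
col_types <- sapply(Cars93_1, class)
col_types
```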

The actual data (observations) can be browsed with View(Cars93_1), which opens a spreadsheet-style viewer.

View(Cars93_1)
  • 8 columns (variables)
  • 93 rows (observations)

4.2.3 Main functions of the dplyr package

The dplyr functions that operate on a single table (single table verbs) are summarised in the table below.

| dplyr verb | Description | Similar {package} function |
|---|---|---|
| filter() | Filter rows by condition | {base} subset |
| slice() | Select rows by position | {base} subset |
| arrange() | Re-order (sort) rows | {base} order |
| select() | Select columns | {base} subset |
| select(df, starts_with()) | Select columns whose names start with a prefix | |
| select(df, ends_with()) | Select columns whose names end with a suffix | |
| select(df, contains()) | Select columns whose names contain a character string | |
| select(df, matches()) | Select columns whose names match a regular expression | |
| select(df, one_of()) | Select columns that are from a group of names | |
| select(df, num_range()) | Select columns named with a prefix followed by a number in a given range (e.g. x1 to x5) | |
| rename() | Rename columns | {reshape} rename |
| distinct() | Extract distinct (unique) rows | {base} unique |
| sample_n() | Randomly sample a fixed number of rows | {base} sample |
| sample_frac() | Randomly sample a fixed fraction of rows | {base} sample |
| mutate() | Create (add) new columns; can refer to columns created earlier in the same call | {base} transform |
| transmute() | Create new columns and keep only those columns | {base} transform |
| summarise() | Summarise values | {base} summary |
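
To make the verbs listed above concrete, here is a short sketch applying several of them to Cars93_1. It is an illustration only: the result names (compact, first_three, by_mpg and so on) and the derived columns Price.range, Range.ratio and MPG.mean are hypothetical names introduced for this example.

```r
library(dplyr)

Cars93_1 <- MASS::Cars93[, 1:8]

# filter(): keep rows that satisfy a condition
compact <- filter(Cars93_1, Type == "Compact")

# slice(): keep rows by position
first_three <- slice(Cars93_1, 1:3)

# arrange(): sort rows, here by highway mileage in descending order
by_mpg <- arrange(Cars93_1, desc(MPG.highway))

# select() with helpers: pick columns by name pattern
mpg_cols   <- select(Cars93_1, starts_with("MPG"))
price_cols <- select(Cars93_1, ends_with("Price"))

# rename(): new_name = old_name
renamed <- rename(Cars93_1, Maker = Manufacturer)

# distinct(): unique values of a column
types <- distinct(Cars93_1, Type)

# mutate(): add columns; Range.ratio refers to Price.range created in the same call
with_range <- mutate(Cars93_1,
                     Price.range = Max.Price - Min.Price,
                     Range.ratio = Price.range / Price)

# transmute(): keep only the columns named in the call
mpg_mean <- transmute(Cars93_1, Model, MPG.mean = (MPG.city + MPG.highway) / 2)

# summarise(): collapse the table to one row of summary values
price_summary <- summarise(Cars93_1, n = n(), mean_price = mean(Price))
```

Each verb takes a data frame as its first argument and returns a new data frame, so Cars93_1 itself is never modified.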