Tidy data
Tidy data (as understood and used in the
tidyverse):
- Each variable in a column
- Each observation in a row
- Each value is a cell
The concept of tidy data is related to that of a relational
database (and by extension to Codd's
relational algebra). The properties of tidy data are perfectly accomodated by R because of its vectorized programming paradigm.
Tidy data was popularized by
Hadley Wickham (see
his paper).
tidyr
aims at helping to create tidy data.
Normalize data with spread()
spread()
can be used to
normalize data (that is: to create a
pivot, aka tidy data):
library(tidyr)
df <- read.table(
header = TRUE,
text =
'fruit period K V
Apple Q1 items-sold 501
Apple Q1 price 1.09
Apple Q2 items-sold 873
Apple Q2 price 0.97
Apple Q3 items-sold 724
Apple Q3 price 0.81
Apple Q4 items-sold 619
Apple Q4 price 0.55
Banana Q1 items-sold 109
Banana Q1 price 2.11
Banana Q2 items-sold 187
Banana Q2 price 2.08
Banana Q3 items-sold 179
Banana Q3 price 1.94
Banana Q4 items-sold 155
Banana Q4 price 2.01
Cherry Q1 items-sold 58
Cherry Q1 price 3.55
Cherry Q2 items-sold 218
Cherry Q2 price 3.07
Cherry Q3 items-sold 209
Cherry Q3 price 2.88
Cherry Q4 items-sold 74
Cherry Q4 price 3.00'
)
normalized <- spread(df, key = K, value = V)
normalized
#
# fruit period items-sold price
# 1 Apple Q1 501 1.09
# 2 Apple Q2 873 0.97
# 3 Apple Q3 724 0.81
# 4 Apple Q4 619 0.55
# 5 Banana Q1 109 2.11
# 6 Banana Q2 187 2.08
# 7 Banana Q3 179 1.94
# 8 Banana Q4 155 2.01
# 9 Cherry Q1 58 3.55
# 10 Cherry Q2 218 3.07
# 11 Cherry Q3 209 2.88
# 12 Cherry Q4 74 3.00
#
Un-pivot data with gather()
gather()
un-pivots data:
library(tidyr)
df <- read.table(
header = TRUE,
text =
'fruit items_sold_Q1 items_sold_Q2 items_sold_Q3 items_sold_Q4
Apple 501 873 724 619
Banana 109 187 179 155
Cherry 58 218 209 74'
)
df %>% gather(
key = 'period',
value = 'items_sold',
c('items_sold_Q1':'items_sold_Q4')
)
#
# fruit period items_sold
# 1 Apple items_sold_Q1 501
# 2 Banana items_sold_Q1 109
# 3 Cherry items_sold_Q1 58
# 4 Apple items_sold_Q2 873
# 5 Banana items_sold_Q2 187
# 6 Cherry items_sold_Q2 218
# 7 Apple items_sold_Q3 724
# 8 Banana items_sold_Q3 179
# 9 Cherry items_sold_Q3 209
# 10 Apple items_sold_Q4 619
# 11 Banana items_sold_Q4 155
# 12 Cherry items_sold_Q4 74
#