Stacking Data Vertically from Multiple Columns in R Studio (feat. reshape package)
Previously, I posted about how to change the data structure in the following scenario.
□ Combining Factors from Separate Datasets into a Single Column Using R Studio (feat. dplyr package)
This time, I will introduce a method for changing the structure of data as shown below. Specifically, this is about cases where there are multiple columns within a single dataset, not two different datasets.
I will create a simple dataset for illustration purposes.
Genotype= c ("CV1","CV2","CV3","CV4","CV5")
dataA= data.frame (Genotype, Experiment=c(rep("Defoliation",5), rep("De_graining",5)),
Field1=c('30','31','32','34','35','33','34','31','33','32'),
Field2=c('30','32','31','34','31','35','36','31','34','31'),
Field3=c('32','31','32','33','34','33','35','31','32','33'))
dataA
Genotype Experiment Field1 Field2 Field3
1 CV1 Defoliation 30 30 32
2 CV2 Defoliation 31 32 31
3 CV3 Defoliation 32 31 32
4 CV4 Defoliation 34 34 33
5 CV5 Defoliation 35 31 34
6 CV1 De_graining 33 35 33
7 CV2 De_graining 34 36 35
8 CV3 De_graining 31 31 31
9 CV4 De_graining 33 34 32
10 CV5 De_graining 32 31 33
This is a method for taking input values that are divided into several columns in a structure like the one shown above and placing them into a single column. To achieve this, you can install the ‘reshape2’ package and use the reshape2::melt()
function.
install.packages("reshape")
library(reshape)
dataB= reshape::melt(dataA, id.vars=c("Genotype", "Experiment"))
dataB
Genotype Experiment variable value
1 CV1 Defoliation Field1 30
2 CV2 Defoliation Field1 31
3 CV3 Defoliation Field1 32
4 CV4 Defoliation Field1 34
5 CV5 Defoliation Field1 35
6 CV1 De_graining Field1 33
7 CV2 De_graining Field1 34
8 CV3 De_graining Field1 31
9 CV4 De_graining Field1 33
10 CV5 De_graining Field1 32
11 CV1 Defoliation Field2 30
12 CV2 Defoliation Field2 32
13 CV3 Defoliation Field2 31
14 CV4 Defoliation Field2 34
15 CV5 Defoliation Field2 31
16 CV1 De_graining Field2 35
17 CV2 De_graining Field2 36
18 CV3 De_graining Field2 31
19 CV4 De_graining Field2 34
20 CV5 De_graining Field2 31
21 CV1 Defoliation Field3 32
22 CV2 Defoliation Field3 31
23 CV3 Defoliation Field3 32
24 CV4 Defoliation Field3 33
25 CV5 Defoliation Field3 34
26 CV1 De_graining Field3 33
27 CV2 De_graining Field3 35
28 CV3 De_graining Field3 31
29 CV4 De_graining Field3 32
30 CV5 De_graining Field3 33
I will change the column names and column texts of the above data to create a clearer data structure.
colnames(dataB)[3]= c("Field")
colnames(dataB)[4]= c("Yield (kg)")
dataB
Genotype Experiment Field Yield (kg)
1 CV1 Defoliation Field1 30
2 CV2 Defoliation Field1 31
3 CV3 Defoliation Field1 32
4 CV4 Defoliation Field1 34
5 CV5 Defoliation Field1 35
6 CV1 De_graining Field1 33
7 CV2 De_graining Field1 34
8 CV3 De_graining Field1 31
9 CV4 De_graining Field1 33
10 CV5 De_graining Field1 32
11 CV1 Defoliation Field2 30
12 CV2 Defoliation Field2 32
13 CV3 Defoliation Field2 31
14 CV4 Defoliation Field2 34
15 CV5 Defoliation Field2 31
16 CV1 De_graining Field2 35
17 CV2 De_graining Field2 36
18 CV3 De_graining Field2 31
19 CV4 De_graining Field2 34
20 CV5 De_graining Field2 31
21 CV1 Defoliation Field3 32
22 CV2 Defoliation Field3 31
23 CV3 Defoliation Field3 32
24 CV4 Defoliation Field3 33
25 CV5 Defoliation Field3 34
26 CV1 De_graining Field3 33
27 CV2 De_graining Field3 35
28 CV3 De_graining Field3 31
29 CV4 De_graining Field3 32
30 CV5 De_graining Field3 33
The column names have been changed like this. Next, I will also modify the text within the columns.
dataB$Field[dataB$Field=="Field1"]= "Spain"
dataB$Field[dataB$Field=="Field2"]= "Korea"
dataB$Field[dataB$Field=="Field3"]= "Germany"
When I typically use the above code to change the text names within the columns, I encounter an error along with the following message.
Warning message:
In [<-.factor
(*tmp*
, Data_B$Field == "Field1", value = c(NA, :
invalid factor level, NA generated
So, I will attempt to change the text names within the columns using the stringr package.
install.packages ("stringr")
library (stringr)
dataB$Field= str_replace_all (dataB$Field, 'Field1', 'Spain')
dataB$Field= str_replace_all (dataB$Field, 'Field2', 'Korea')
dataB$Field= str_replace_all (dataB$Field, 'Field3', 'Germany')
dataB
Genotype Experiment Field Yield (kg)
1 CV1 Defoliation Spain 30
2 CV2 Defoliation Spain 31
3 CV3 Defoliation Spain 32
4 CV4 Defoliation Spain 34
5 CV5 Defoliation Spain 35
6 CV1 De_graining Spain 33
7 CV2 De_graining Spain 34
8 CV3 De_graining Spain 31
9 CV4 De_graining Spain 33
10 CV5 De_graining Spain 32
11 CV1 Defoliation Korea 30
12 CV2 Defoliation Korea 32
13 CV3 Defoliation Korea 31
14 CV4 Defoliation Korea 34
15 CV5 Defoliation Korea 31
16 CV1 De_graining Korea 35
17 CV2 De_graining Korea 36
18 CV3 De_graining Korea 31
19 CV4 De_graining Korea 34
20 CV5 De_graining Korea 31
21 CV1 Defoliation Germany 32
22 CV2 Defoliation Germany 31
23 CV3 Defoliation Germany 32
24 CV4 Defoliation Germany 33
25 CV5 Defoliation Germany 34
26 CV1 De_graining Germany 33
27 CV2 De_graining Germany 35
28 CV3 De_graining Germany 31
29 CV4 De_graining Germany 32
30 CV5 De_graining Germany 33
The following post is a similar method to this one and also introduces how to transpose data from columns to rows using Power Query in Excel.