Stacking Data Vertically from Multiple Columns in R Studio (feat. reshape package)

Stacking Data Vertically from Multiple Columns in R Studio (feat. reshape package)


Previously, I posted about how to change the data structure in the following scenario.


Combining Factors from Separate Datasets into a Single Column Using R Studio (feat. dplyr package)



This time, I will introduce a method for changing the structure of data as shown below. Specifically, this is about cases where there are multiple columns within a single dataset, not two different datasets.

I will create a simple dataset for illustration purposes.

Genotype= c ("CV1","CV2","CV3","CV4","CV5")
dataA= data.frame (Genotype, Experiment=c(rep("Defoliation",5), rep("De_graining",5)),
                   Field1=c('30','31','32','34','35','33','34','31','33','32'),
                   Field2=c('30','32','31','34','31','35','36','31','34','31'),
                   Field3=c('32','31','32','33','34','33','35','31','32','33'))

dataA
   Genotype  Experiment Field1 Field2 Field3
1       CV1 Defoliation     30     30     32
2       CV2 Defoliation     31     32     31
3       CV3 Defoliation     32     31     32
4       CV4 Defoliation     34     34     33
5       CV5 Defoliation     35     31     34
6       CV1 De_graining     33     35     33
7       CV2 De_graining     34     36     35
8       CV3 De_graining     31     31     31
9       CV4 De_graining     33     34     32
10      CV5 De_graining     32     31     33

This is a method for taking input values that are divided into several columns in a structure like the one shown above and placing them into a single column. To achieve this, you can install the ‘reshape2’ package and use the reshape2::melt() function.

install.packages("reshape")
library(reshape)
dataB= reshape::melt(dataA, id.vars=c("Genotype", "Experiment"))

dataB
   Genotype  Experiment variable value
1       CV1 Defoliation   Field1    30
2       CV2 Defoliation   Field1    31
3       CV3 Defoliation   Field1    32
4       CV4 Defoliation   Field1    34
5       CV5 Defoliation   Field1    35
6       CV1 De_graining   Field1    33
7       CV2 De_graining   Field1    34
8       CV3 De_graining   Field1    31
9       CV4 De_graining   Field1    33
10      CV5 De_graining   Field1    32
11      CV1 Defoliation   Field2    30
12      CV2 Defoliation   Field2    32
13      CV3 Defoliation   Field2    31
14      CV4 Defoliation   Field2    34
15      CV5 Defoliation   Field2    31
16      CV1 De_graining   Field2    35
17      CV2 De_graining   Field2    36
18      CV3 De_graining   Field2    31
19      CV4 De_graining   Field2    34
20      CV5 De_graining   Field2    31
21      CV1 Defoliation   Field3    32
22      CV2 Defoliation   Field3    31
23      CV3 Defoliation   Field3    32
24      CV4 Defoliation   Field3    33
25      CV5 Defoliation   Field3    34
26      CV1 De_graining   Field3    33
27      CV2 De_graining   Field3    35
28      CV3 De_graining   Field3    31
29      CV4 De_graining   Field3    32
30      CV5 De_graining   Field3    33

I will change the column names and column texts of the above data to create a clearer data structure.

colnames(dataB)[3]= c("Field")
colnames(dataB)[4]= c("Yield (kg)")

dataB
   Genotype  Experiment Field  Yield (kg)
1       CV1 Defoliation Field1         30
2       CV2 Defoliation Field1         31
3       CV3 Defoliation Field1         32
4       CV4 Defoliation Field1         34
5       CV5 Defoliation Field1         35
6       CV1 De_graining Field1         33
7       CV2 De_graining Field1         34
8       CV3 De_graining Field1         31
9       CV4 De_graining Field1         33
10      CV5 De_graining Field1         32
11      CV1 Defoliation Field2         30
12      CV2 Defoliation Field2         32
13      CV3 Defoliation Field2         31
14      CV4 Defoliation Field2         34
15      CV5 Defoliation Field2         31
16      CV1 De_graining Field2         35
17      CV2 De_graining Field2         36
18      CV3 De_graining Field2         31
19      CV4 De_graining Field2         34
20      CV5 De_graining Field2         31
21      CV1 Defoliation Field3         32
22      CV2 Defoliation Field3         31
23      CV3 Defoliation Field3         32
24      CV4 Defoliation Field3         33
25      CV5 Defoliation Field3         34
26      CV1 De_graining Field3         33
27      CV2 De_graining Field3         35
28      CV3 De_graining Field3         31
29      CV4 De_graining Field3         32
30      CV5 De_graining Field3         33

The column names have been changed like this. Next, I will also modify the text within the columns.

dataB$Field[dataB$Field=="Field1"]= "Spain"
dataB$Field[dataB$Field=="Field2"]= "Korea"
dataB$Field[dataB$Field=="Field3"]= "Germany"

When I typically use the above code to change the text names within the columns, I encounter an error along with the following message.

Warning message:
In [<-.factor(*tmp*, Data_B$Field == "Field1", value = c(NA, :
invalid factor level, NA generated

So, I will attempt to change the text names within the columns using the stringr package.

install.packages ("stringr")
library (stringr)
dataB$Field= str_replace_all (dataB$Field, 'Field1', 'Spain')
dataB$Field= str_replace_all (dataB$Field, 'Field2', 'Korea')
dataB$Field= str_replace_all (dataB$Field, 'Field3', 'Germany')

dataB
   Genotype  Experiment   Field Yield (kg)
1       CV1 Defoliation   Spain         30
2       CV2 Defoliation   Spain         31
3       CV3 Defoliation   Spain         32
4       CV4 Defoliation   Spain         34
5       CV5 Defoliation   Spain         35
6       CV1 De_graining   Spain         33
7       CV2 De_graining   Spain         34
8       CV3 De_graining   Spain         31
9       CV4 De_graining   Spain         33
10      CV5 De_graining   Spain         32
11      CV1 Defoliation   Korea         30
12      CV2 Defoliation   Korea         32
13      CV3 Defoliation   Korea         31
14      CV4 Defoliation   Korea         34
15      CV5 Defoliation   Korea         31
16      CV1 De_graining   Korea         35
17      CV2 De_graining   Korea         36
18      CV3 De_graining   Korea         31
19      CV4 De_graining   Korea         34
20      CV5 De_graining   Korea         31
21      CV1 Defoliation Germany         32
22      CV2 Defoliation Germany         31
23      CV3 Defoliation Germany         32
24      CV4 Defoliation Germany         33
25      CV5 Defoliation Germany         34
26      CV1 De_graining Germany         33
27      CV2 De_graining Germany         35
28      CV3 De_graining Germany         31
29      CV4 De_graining Germany         32
30      CV5 De_graining Germany         33

The following post is a similar method to this one and also introduces how to transpose data from columns to rows using Power Query in Excel.


Transforming Data: Stacking Multiple Columns into Rows Using R



Comments are closed.