Simplifying Data Manipulation: Transposing Columns into Rows with Ease
Sometimes, I see many people managing their data as columns like the example below. It seems convenient because we can see our data all at once. However, this data format is problematic for data analysis, which fundamentally relies on variables, namely independent and dependent variables.
Download data file (.csv) https://github.com/agronomy4future/raw_data_practice/blob/main/yield_per_location.csv
In the given data format, each level for the independent variable (i.e., location) was not combined in one column, and therefore we need to rearrange the data format. If the number of columns is small, copying and pasting to rows would be feasible. However, with hundreds of columns, it would be impractical to copy and paste them one by one.
Today, I will introduce the simplest and most efficient way to transpose column data into row data.
1) Power Query in Excel
Using the Query function, we can quickly transpose column data into row data. First, select the entire data you want to transpose by dragging over it. Then, choose Data
> From Table/Range
you will see it displayed. Then, click Ok
.
A new window, called Power Query Editor, will open up.
Let’s now select all of the location columns (from Location1 to Location12).
After selecting all of the location columns, click on Transform
> Unpivot Columns
Automatically, all of the column data will be transposed into rows.
We are currently in the Power Query Editor. To exit this window and apply the changes we’ve made to the original data, click on the Close & Load
button.
Great! The new data format is now ready.
2) Using the reshape2 package in R
Now, let’s explore how to use R to achieve the same result. First, we need to upload the Excel file.
install.packages("readr")
library (readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/yield_per_location.csv"
dataA=data.frame(read_csv(url(github),show_col_types = FALSE))
To transpose all of the column data into rows using the reshape2
package, we need to install the package first.
install.packages("reshape2")
library(reshape2)
and run the following code.
dataB=reshape2::melt(dataA, id.vars=c("Genotype", "Nitrogen", "Block"))
All of the locations have now been transposed into rows.
Next, I’ll change the column names.
colnames(dataB)[4]= c("Location")
colnames(dataB)[5]= c("Yield")
After that, I’ll export this data to an Excel file on my PC and save it with the name ‘Yield_data’.
install.packages("writexl")
library(writexl)
write_xlsx (dataB,"C:/Users/OneDrive/Desktop/Yield_data.xlsx")
# Please check the pathway in your computer
Let’s now compare the new data format generated by Power Query and R. They should be identical.
Starting now, let’s avoid manually copying and pasting data to transpose it from columns to rows.
Extra Tip!! When only one variable is present
If we have only one variable, we can simply transpose the columns to rows using the following code. For example, if the data is formatted as shown below:
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/yield_per_location.csv"
dataA=data.frame(read_csv(url(github),show_col_types = FALSE))
dataA=dataA[,c(-1,-2,-3)]
dataA
Location1 Location2 Location3 Location4 Location5 . . .
1 98.0 96.5 115.8 94.1 82.8
2 97.0 95.5 114.7 93.1 81.9
3 95.2 93.8 112.5 91.4 80.4
4 96.0 94.6 113.5 92.2 81.1
5 98.0 96.5 115.8 94.1 82.8
.
.
dataB= stack(dataA)
values ind
1 98.0 Location1
2 97.0 Location1
3 95.2 Location1
4 96.0 Location1
5 98.0 Location1
names(dataB)= c("Yield","Location")
Yield Location
1 98.0 Location1
2 97.0 Location1
3 95.2 Location1
4 96.0 Location1
5 98.0 Location1