How to select/delete specific columns using R Studio?
It would be helpful to read the post below before starting!
□ Data filtering using R Studio
First, I’ll create a sample data set.
name=c("Jack","Kate","John","Jane","David","Min","Hyuk","Jisoo")
math=c(90,85,95,75,80,90,90,85)
eng=c(85,90,90,88,95,85,87,88)
country=c("USA","Spain","France","Germany","Netherlands", rep("Korea",3))
gender=c(rep(c("Male","Female"),times=4))
enroll=c(rep(c("Yes","No"),each=4))
grade=data.frame(name,math,eng,country,gender,enroll)
grade
name math eng country gender enroll
1 Jack 90 85 USA Male Yes
2 Kate 85 90 Spain Female Yes
3 John 95 90 France Male Yes
4 Jane 75 88 Germany Female Yes
5 David 80 95 Netherlands Male No
6 Min 90 85 Korea Female No
7 Hyuk 90 87 Korea Male No
8 Jisoo 85 88 Korea Female No
Let’s say these are the math and English scores of 8 students from different countries.
Let’s do several things with this data.
■ How to delete certain columns?
I’d like to delete the math column. I use the code below.
grade2=subset(grade, select=-math)
grade2
name eng country gender enroll
1 Jack 85 USA Male Yes
2 Kate 90 Spain Female Yes
3 John 90 France Male Yes
4 Jane 88 Germany Female Yes
5 David 95 Netherlands Male No
6 Min 85 Korea Female No
7 Hyuk 87 Korea Male No
8 Jisoo 88 Korea Female No
To delete both the math and eng columns, I use the code below.
grade3=subset(grade,select=c(-math,-eng))
grade3
name country gender enroll
1 Jack USA Male Yes
2 Kate Spain Female Yes
3 John France Male Yes
4 Jane Germany Female Yes
5 David Netherlands Male No
6 Min Korea Female No
7 Hyuk Korea Male No
8 Jisoo Korea Female No
Without using subset(), we can also delete columns with the code below.
variable name[-row number, -column number]
For example, grade[, -2] deletes the 2nd column. In the same way, grade[-2, ] deletes the 2nd row.
grade4=grade[,-2]
grade4
name eng country gender enroll
1 Jack 85 USA Male Yes
2 Kate 90 Spain Female Yes
3 John 90 France Male Yes
4 Jane 88 Germany Female Yes
5 David 95 Netherlands Male No
6 Min 85 Korea Female No
7 Hyuk 87 Korea Male No
8 Jisoo 88 Korea Female No
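The row form mentioned above, grade[-2, ], is not demonstrated in this post, so here is a minimal sketch of it. It rebuilds the same grade data so it can be run on its own. The name grade_row is made up for this illustration.

```r
# Rebuild the sample data from the top of the post
name    <- c("Jack","Kate","John","Jane","David","Min","Hyuk","Jisoo")
math    <- c(90,85,95,75,80,90,90,85)
eng     <- c(85,90,90,88,95,85,87,88)
country <- c("USA","Spain","France","Germany","Netherlands", rep("Korea",3))
gender  <- rep(c("Male","Female"), times=4)
enroll  <- rep(c("Yes","No"), each=4)
grade   <- data.frame(name, math, eng, country, gender, enroll)

# A negative index BEFORE the comma drops a row (here the 2nd row, Kate).
# All 6 columns are kept because nothing is given after the comma.
grade_row <- grade[-2, ]   # grade_row is a hypothetical name

nrow(grade_row)    # 7 rows remain
grade_row$name     # Kate is no longer listed
```

The original row names are kept, so the remaining rows are still labelled 1, 3, 4, and so on.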
How about deleting both the 2nd and 3rd columns? The code is below.
grade5=grade[,c(-2,-3)]
grade5
name country gender enroll
1 Jack USA Male Yes
2 Kate Spain Female Yes
3 John France Male Yes
4 Jane Germany Female Yes
5 David Netherlands Male No
6 Min Korea Female No
7 Hyuk Korea Male No
8 Jisoo Korea Female No
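Position numbers point at the wrong column if the column order ever changes, so it can be safer to drop columns by name. A minus sign does not work on character names inside [ ], so the sketch below uses %in% with a logical negation instead. This is one possible approach, not the only one. The names drop_cols and grade_byname are assumptions made up for this example.

```r
# Rebuild the sample data from the top of the post
name    <- c("Jack","Kate","John","Jane","David","Min","Hyuk","Jisoo")
math    <- c(90,85,95,75,80,90,90,85)
eng     <- c(85,90,90,88,95,85,87,88)
country <- c("USA","Spain","France","Germany","Netherlands", rep("Korea",3))
gender  <- rep(c("Male","Female"), times=4)
enroll  <- rep(c("Yes","No"), each=4)
grade   <- data.frame(name, math, eng, country, gender, enroll)

# Columns to drop, given by name (hypothetical helper variable)
drop_cols <- c("math", "eng")

# names(grade) %in% drop_cols is TRUE for math and eng;
# the ! flips it, so only the other columns are kept
grade_byname <- grade[, !(names(grade) %in% drop_cols)]

names(grade_byname)
```

The result has the same columns as grade5, but the code keeps working even if math and eng are moved to other positions.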
Using the dplyr package
# Install dplyr only if it is not already installed
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library(dplyr)
grade_1=grade %>%
  dplyr::select(-math,-eng)
grade_1
name country gender enroll
1 Jack USA Male Yes
2 Kate Spain Female Yes
3 John France Male Yes
4 Jane Germany Female Yes
5 David Netherlands Male No
6 Min Korea Female No
7 Hyuk Korea Male No
8 Jisoo Korea Female No
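When the column names are stored in a character vector (for example, read from a file), dplyr's all_of() helper can be combined with the minus sign. The sketch below assumes dplyr 1.0.0 or later is installed. The names drop_cols and grade_chr are made up for this illustration.

```r
library(dplyr)

# Rebuild the sample data from the top of the post
name    <- c("Jack","Kate","John","Jane","David","Min","Hyuk","Jisoo")
math    <- c(90,85,95,75,80,90,90,85)
eng     <- c(85,90,90,88,95,85,87,88)
country <- c("USA","Spain","France","Germany","Netherlands", rep("Korea",3))
gender  <- rep(c("Male","Female"), times=4)
enroll  <- rep(c("Yes","No"), each=4)
grade   <- data.frame(name, math, eng, country, gender, enroll)

# Column names held as strings (hypothetical variable)
drop_cols <- c("math", "eng")

# all_of() tells select() to treat the strings as column names;
# it raises an error if any listed column does not exist
grade_chr <- grade %>%
  dplyr::select(-all_of(drop_cols))

names(grade_chr)
```

If some of the listed columns might be missing from the data, any_of() can be used in place of all_of() so that missing names are ignored instead of causing an error.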
■ How to select certain columns?
Now, I’ll explain how to select certain columns. I’d like to select the name, math and country columns.
grade6=subset(grade,select=c(name,math,country))
grade6
name math country
1 Jack 90 USA
2 Kate 85 Spain
3 John 95 France
4 Jane 75 Germany
5 David 80 Netherlands
6 Min 90 Korea
7 Hyuk 90 Korea
8 Jisoo 85 Korea
Alternatively, the same columns can be selected by their position numbers.
grade7=grade[,c(1,2,4)]
grade7
name math country
1 Jack 90 USA
2 Kate 85 Spain
3 John 95 France
4 Jane 75 Germany
5 David 80 Netherlands
6 Min 90 Korea
7 Hyuk 90 Korea
8 Jisoo 85 Korea
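The brackets also accept column names instead of position numbers. One thing to watch for: when only a single column is selected this way, a data frame returns a plain vector unless drop = FALSE is added. The sketch below illustrates both points. The names grade_named, math_vec and math_df are made up for this example.

```r
# Rebuild the sample data from the top of the post
name    <- c("Jack","Kate","John","Jane","David","Min","Hyuk","Jisoo")
math    <- c(90,85,95,75,80,90,90,85)
eng     <- c(85,90,90,88,95,85,87,88)
country <- c("USA","Spain","France","Germany","Netherlands", rep("Korea",3))
gender  <- rep(c("Male","Female"), times=4)
enroll  <- rep(c("Yes","No"), each=4)
grade   <- data.frame(name, math, eng, country, gender, enroll)

# Select by name: same columns as grade[, c(1,2,4)]
grade_named <- grade[, c("name", "math", "country")]

# Selecting ONE column returns a vector by default...
math_vec <- grade[, "math"]
is.data.frame(math_vec)   # FALSE

# ...while drop = FALSE keeps it as a one-column data frame
math_df <- grade[, "math", drop = FALSE]
is.data.frame(math_df)    # TRUE
```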
Using the dplyr package
# Install dplyr only if it is not already installed
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library(dplyr)
grade_2=grade %>%
  dplyr::select(name,math,country)
grade_2
name math country
1 Jack 90 USA
2 Kate 85 Spain
3 John 95 France
4 Jane 75 Germany
5 David 80 Netherlands
6 Min 90 Korea
7 Hyuk 90 Korea
8 Jisoo 85 Korea