Equivalent Functions: IF function in Excel vs. ifelse() in R
When working with Excel, I believe you use the IF function from time to time, especially when categorizing values. The IF function is particularly useful for this purpose.
Here is one example.
=IF(AND(E2<1,E2>=0),"<1.0",IF(AND(E2>=1,E2<2),"1.0≤OM<2.0",IF(AND(E2>=2,E2<3),"2.0≤OM<3.0",IF(AND(E2>=3,E2<4),"3.0≤OM<4.0",IF(AND(E2>=4,E2<5),"4.0≤OM<5.0","OM≥5.0")))))
I want to categorize organic matter (%) by unit 1.0. This process involves converting numeric variables to categorical variables. To achieve this, I have used the IF function as shown above.
Then, you can categorize organic matter in the F column as shown above. Now, my next question is how to perform this process in R.
Let’s practice together. First, I’ll upload the same dataset as above.
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/organic_matter.csv"
dataA= data.frame(read_csv(url(github), show_col_types= FALSE))
dataA
Crop Block Treatment Yield_Mg_ha Organic_Matter_.
1 Wheat 1 Control 3.90 3.2
2 Wheat 2 Control 4.03 3.5
3 Wheat 3 Control 3.80 3.4
4 Wheat 1 TR1 2.80 1.2
5 Wheat 2 TR1 0.90 0.9
6 Wheat 3 TR1 1.10 4.5
7 Wheat 1 TR2 3.99 3.9
8 Wheat 2 TR2 4.03 4.0
9 Wheat 3 TR2 4.10 3.7
10 Wheat 1 TR3 3.99 3.6
11 Wheat 2 TR3 2.50 4.1
12 Wheat 3 TR3 4.09 4.7
13 Wheat 1 TR4 4.07 3.8
14 Wheat 2 TR4 4.13 3.4
15 Wheat 3 TR4 4.10 4.6
16 Wheat 1 TR5 4.09 3.8
17 Wheat 2 TR5 4.03 4.0
18 Wheat 3 TR5 3.90 4.1
19 Wheat 1 TR6 3.94 4.2
20 Wheat 2 TR6 4.06 4.4
21 Wheat 3 TR6 4.02 2.8
22 Wheat 1 TR7 3.98 4.8
23 Wheat 2 TR7 4.00 1.9
24 Wheat 3 TR7 3.96 5.3
25 Wheat 1 TR8 4.10 4.0
26 Wheat 2 TR8 4.07 4.6
27 Wheat 3 TR8 5.10 3.3
28 Wheat 1 TR9 5.50 5.1
29 Wheat 2 TR9 4.50 3.3
30 Wheat 3 TR9 4.90 3.7
Let’s perform a similar operation to the Excel IF function in R. I’ll use ifelse()
.
When uploading data to R, the column name ‘Organic_Matter_%’ was changed to ‘Organic_Matter_’—missing the ‘%.’ To avoid any text errors, I’ll further change the column name to ‘OM.’
Tip: When organizing data, it’s better to avoid using symbols like % and (), or spaces in column names, as they can lead to changes in text formatting.
colnames(dataA)[5]=c("OM")
Then, I’ll use the code below.
dataA$OM_Category= ifelse(dataA$OM < 1 & dataA$OM >= 0, "<1.0",
ifelse(dataA$OM >= 1.0 & dataA$OM < 2.0, "1.0≤OM<2.0",
ifelse(dataA$OM >= 2.0 & dataA$OM < 3.0, "2.0≤OM<3.0",
ifelse(dataA$OM >= 3.0 & dataA$OM < 4.0, "3.0≤OM<4.0",
ifelse(dataA$OM >= 4.0 & dataA$OM < 5.0, "4.0≤OM<5.0",
ifelse(dataA$OM >= 5.0 & dataA$OM < 6.0, "5.0≤OM<6.0",
ifelse(dataA$OM >= 6.0 & dataA$OM < 7.0, "6.0≤OM<7.0",
"OM≥7.0")))))))
Let’s check data again.
Crop Block Treatment Yield_Mg_ha OM OM_Category
1 Wheat 1 Control 3.90 3.2 3.0≤OM<4.0
2 Wheat 2 Control 4.03 3.5 3.0≤OM<4.0
3 Wheat 3 Control 3.80 3.4 3.0≤OM<4.0
4 Wheat 1 TR1 2.80 1.2 1.0≤OM<2.0
5 Wheat 2 TR1 0.90 0.9 <1.0
6 Wheat 3 TR1 1.10 4.5 4.0≤OM<5.0
7 Wheat 1 TR2 3.99 3.9 3.0≤OM<4.0
8 Wheat 2 TR2 4.03 4.0 4.0≤OM<5.0
9 Wheat 3 TR2 4.10 3.7 3.0≤OM<4.0
10 Wheat 1 TR3 3.99 3.6 3.0≤OM<4.0
11 Wheat 2 TR3 2.50 4.1 4.0≤OM<5.0
12 Wheat 3 TR3 4.09 4.7 4.0≤OM<5.0
13 Wheat 1 TR4 4.07 3.8 3.0≤OM<4.0
14 Wheat 2 TR4 4.13 3.4 3.0≤OM<4.0
15 Wheat 3 TR4 4.10 4.6 4.0≤OM<5.0
16 Wheat 1 TR5 4.09 3.8 3.0≤OM<4.0
17 Wheat 2 TR5 4.03 4.0 4.0≤OM<5.0
18 Wheat 3 TR5 3.90 4.1 4.0≤OM<5.0
19 Wheat 1 TR6 3.94 4.2 4.0≤OM<5.0
20 Wheat 2 TR6 4.06 4.4 4.0≤OM<5.0
21 Wheat 3 TR6 4.02 2.8 2.0≤OM<3.0
22 Wheat 1 TR7 3.98 4.8 4.0≤OM<5.0
23 Wheat 2 TR7 4.00 1.9 1.0≤OM<2.0
24 Wheat 3 TR7 3.96 5.3 5.0≤OM<6.0
25 Wheat 1 TR8 4.10 4.0 4.0≤OM<5.0
26 Wheat 2 TR8 4.07 4.6 4.0≤OM<5.0
27 Wheat 3 TR8 5.10 3.3 3.0≤OM<4.0
28 Wheat 1 TR9 5.50 5.1 5.0≤OM<6.0
29 Wheat 2 TR9 4.50 3.3 3.0≤OM<4.0
30 Wheat 3 TR9 4.90 3.7 3.0≤OM<4.0
We can obtain the exact same data as we did in Excel using the IF function.