Converting Character Values to Numeric in R: A How-To Guide
First, let’s create a dataset.
df= data.frame(
location= rep(c("Illinois", "Iowa"), each = 12),
fertilizer= rep(c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"), times = 2),
block= rep(c("I", "II", "II"), times = 8),
yield= c('100', '99', '98', '125', "NA", '135', '140', '138', '128', '80', '87', '76', '130', "NA", '140', '145', '148', '152', "#DIV/0!", '158', '165', '80', '78', "NA"),
yield1= c(100, 99, 98, 125, NA, 135, 140, 138, 128, 80, 87, 76, 130, NA, 140, 145, 148, 152, NA, 158, 165, 80, 78, NA)
)
df
location fertilizer block yield yield1
Illinois A I 100 100
Illinois A II 99 99
Illinois A II 98 98
Illinois B I 125 125
Illinois B II NA NA
.
.
Iowa B II 152 152
Iowa C I #DIV/0! NA
and observe the different data formats of each value.
'data.frame': 24 obs. of 5 variables:
$ location : chr "Illinois" "Illinois" "Illinois" "Illinois" ...
$ fertilizer: chr "A" "A" "A" "B" ...
$ block : chr "I" "II" "II" "I" ...
$ yield : chr "100" "99" "98" "125" ...
$ yield1 : num 100 99 98 125 NA 135 140 138 128 80 ...
I have two sets of yield data: one in character format (yield
column) and the other in numeric format (yield1
column).
How to convert missing value to 0 when data is numeric?
When data is numeric (yield1
column), and if there are missing values, how can we replace it to 0?
library(dplyr)
dataA= df %>%
dplyr::mutate(yield_corrected= if_else(is.na(yield1), 0, as.numeric(yield1)))
dataA
location fertilizer block yield yield1 yield_corrected
Illinois A I 100 100 100
Illinois A II 99 99 99
Illinois A II 98 98 98
Illinois B I 125 125 125
Illinois B II NA NA 0
.
.
Iowa B II 152 152 152
Iowa C I #DIV/0! NA 0
or you can also use the following code.
dataA1 = df %>%
dplyr:mutate(yield1=coalesce(yield1, 0))
How to convert missing values to 0 when the data is character?
Here is a problem: sometimes, data appears to be numerical, but its actual format is character (as seen in the yield1
column). This often occurs when downloading data from websites. In such cases, the above code may not be applicable.
dataA= df %>%
dplyr::mutate(yield_corrected= if_else(is.na(yield1), 0, as.numeric(yield1)))
Warning message:
“There was 1 warning in `mutate()`.
ℹ In argument: `yield_corrected = if_else(is.na(yield), 0, as.numeric(yield))`.
Caused by warning in `if_else()`:
! NAs introduced by coercion”
dataA
location fertilizer block yield yield1 yield_corrected
Illinois A I 100 100 100
Illinois A II 99 99 99
Illinois A II 98 98 98
Illinois B I 125 125 125
Illinois B II NA NA NA
.
.
Iowa B II 152 152 152
Iowa C I #DIV/0! NA NA
Now, NA was not replaced by 0. This is because NA (missing value) is now being treated as text. In such cases, we can use the following code.
dataB= df %>%
dplyr::mutate(yield_corrected= suppressWarnings(if_else(yield=="NA", 0, as.numeric(yield))))
dataB
location fertilizer block yield yield1 yield_corrected
Illinois A I 100 100 100
Illinois A II 99 99 99
Illinois A II 98 98 98
Illinois B I 125 125 125
Illinois B II NA NA 0
.
.
Iowa B II 152 152 152
Iowa C I #DIV/0! NA NA
Now, I’ll also convert the error type, #DIV/0!
, to 0. I used the code if_else(yield == "#DIV/0!" | yield == "NA"
to designate these two text values as 0.
dataC= df %>%
mutate(yield_corrected5= suppressWarnings(if_else(yield=="#DIV/0!" | yield=="NA", 0, as.numeric(yield))))
dataC
location fertilizer block yield yield1 yield_corrected
Illinois A I 100 100 100
Illinois A II 99 99 99
Illinois A II 98 98 98
Illinois B I 125 125 125
Illinois B II NA NA 0
.
.
Iowa B II 152 152 152
Iowa C I #DIV/0! NA 0
full code: https://github.com/agronomy4future/r_code/blob/main/Converting_Character_Values_to_Numeric_in_R_A_How_To_Guide.ipynb