Converting Character Values to Numeric in R: A How-To Guide

Converting Character Values to Numeric in R: A How-To Guide


First, let’s create a dataset.

df= data.frame(
  location= rep(c("Illinois", "Iowa"), each = 12),
  fertilizer= rep(c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"), times = 2),
  block= rep(c("I", "II", "II"), times = 8),
  yield= c('100', '99', '98', '125', "NA", '135', '140', '138', '128', '80', '87', '76', '130', "NA", '140', '145', '148', '152', "#DIV/0!", '158', '165', '80', '78', "NA"),
  yield1= c(100, 99, 98, 125, NA, 135, 140, 138, 128, 80, 87, 76, 130, NA, 140, 145, 148, 152, NA, 158, 165, 80, 78, NA)
)

df
location	fertilizer  block  yield    yield1
Illinois	A	    I	   100	    100
Illinois	A	    II	   99	    99
Illinois	A	    II	   98	    98
Illinois	B	    I	   125	    125
Illinois	B	    II	   NA	    NA
.
.
Iowa	        B	    II	   152	    152
Iowa	        C	    I	   #DIV/0!  NA

and observe the different data formats of each value.

'data.frame':	24 obs. of  5 variables:
 $ location  : chr  "Illinois" "Illinois" "Illinois" "Illinois" ...
 $ fertilizer: chr  "A" "A" "A" "B" ...
 $ block     : chr  "I" "II" "II" "I" ...
 $ yield     : chr  "100" "99" "98" "125" ...
 $ yield1    : num  100 99 98 125 NA 135 140 138 128 80 ...

I have two sets of yield data: one in character format (yield column) and the other in numeric format (yield1 column).



How to convert missing value to 0 when data is numeric?

When data is numeric (yield1 column), and if there are missing values, how can we replace it to 0?

library(dplyr)

dataA= df %>%
       dplyr::mutate(yield_corrected= if_else(is.na(yield1), 0, as.numeric(yield1)))

dataA
location  fertilizer  block	yield	yield1	yield_corrected
Illinois  A	      I	        100	100	100
Illinois  A	      II	99	99	99
Illinois  A	      II	98	98	98
Illinois  B	      I	        125	125	125
Illinois  B	      II	NA	NA	0
.
.
Iowa	  B	      II	152	152	152
Iowa	  C	      I	        #DIV/0!	NA	0

or you can also use the following code.

dataA1 = df %>%
         dplyr:mutate(yield1=coalesce(yield1, 0))


How to convert missing values to 0 when the data is character?

Here is a problem: sometimes, data appears to be numerical, but its actual format is character (as seen in the yield1 column). This often occurs when downloading data from websites. In such cases, the above code may not be applicable.

dataA= df %>%
       dplyr::mutate(yield_corrected= if_else(is.na(yield1), 0, as.numeric(yield1)))

Warning message:
“There was 1 warning in `mutate()`.
ℹ In argument: `yield_corrected = if_else(is.na(yield), 0, as.numeric(yield))`.
Caused by warning in `if_else()`:
! NAs introduced by coercion”

dataA
location  fertilizer  block	yield	yield1	yield_corrected
Illinois  A	      I	        100	100	100
Illinois  A	      II	99	99	99
Illinois  A	      II	98	98	98
Illinois  B	      I	        125	125	125
Illinois  B	      II	NA	NA	NA
.
.
Iowa	  B	      II	152	152	152
Iowa	  C	      I	        #DIV/0!	NA	NA

Now, NA was not replaced by 0. This is because NA (missing value) is now being treated as text. In such cases, we can use the following code.

dataB= df %>%
       dplyr::mutate(yield_corrected= suppressWarnings(if_else(yield=="NA", 0, as.numeric(yield))))

dataB
location  fertilizer  block	yield	yield1	yield_corrected
Illinois  A	      I	        100	100	100
Illinois  A	      II	99	99	99
Illinois  A	      II	98	98	98
Illinois  B	      I	        125	125	125
Illinois  B	      II	NA	NA	0
.
.
Iowa	  B	      II	152	152	152
Iowa	  C	      I	        #DIV/0!	NA	NA

Now, I’ll also convert the error type, #DIV/0!, to 0. I used the code if_else(yield == "#DIV/0!" | yield == "NA" to designate these two text values as 0.

dataC= df %>%
       mutate(yield_corrected5= suppressWarnings(if_else(yield=="#DIV/0!" | yield=="NA", 0, as.numeric(yield))))

dataC
location  fertilizer  block	yield	yield1	yield_corrected
Illinois  A	      I	        100	100	100
Illinois  A	      II	99	99	99
Illinois  A	      II	98	98	98
Illinois  B	      I	        125	125	125
Illinois  B	      II	NA	NA	0
.
.
Iowa	  B	      II	152	152	152
Iowa	  C	      I	        #DIV/0!	NA	0
full code: https://github.com/agronomy4future/r_code/blob/main/Converting_Character_Values_to_Numeric_in_R_A_How_To_Guide.ipynb


Comments are closed.