Category: R programming

How to reorder variables in R for data analysis?

I’ll now show you how to change the order of variables (factor levels) in a graph. For this example, I have prepared some data and will create a graph using it. Let’s reorder the variables so that Calcium Nitrate comes first, followed by Urea, then Sodium Nitrate, and lastly Ammonium Sulfate. First, let’s take a look at the variables: by default, R orders them alphabetically. The following code will change the order of the variables as desired:…
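
A minimal base R sketch of the reordering step, assuming a small hypothetical data frame (the `df` name and the yield values are made up, not the post’s data). The order is controlled by the `levels` argument of `factor()`; ggplot2 draws a discrete axis in factor-level order, so a graph built from `df` follows the new order:

```r
# Hypothetical data: yields are made up for illustration
df <- data.frame(
  fertilizer = c("Urea", "Ammonium Sulfate", "Calcium Nitrate", "Sodium Nitrate"),
  yield = c(3.1, 2.8, 3.5, 3.0),
  stringsAsFactors = FALSE
)

# Default: factor() sorts the levels alphabetically
levels(factor(df$fertilizer))
# "Ammonium Sulfate" "Calcium Nitrate" "Sodium Nitrate" "Urea"

# Set the desired order explicitly through the levels argument
df$fertilizer <- factor(
  df$fertilizer,
  levels = c("Calcium Nitrate", "Urea", "Sodium Nitrate", "Ammonium Sulfate")
)
levels(df$fertilizer)
```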

How to Rename Variables within Columns in R?

If you need to change column names, or the text within a specific column, while analyzing data in R, here is how to do it. First, let’s create a simple dataset. Next, let’s rename the columns: we will change the ‘Nation’ column name to ‘Country’ and the ‘Sex’ column name to ‘Gender’. If you enter the following code, the column names will be updated accordingly. If DAVID’s nationality is Canada instead of Germany, you can update it by entering the…
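
A minimal base R sketch of both steps, assuming a hypothetical three-person dataset (only DAVID, the column names, and the Germany → Canada change come from the post; the other rows are made up):

```r
# Hypothetical dataset
df <- data.frame(
  Name   = c("JOHN", "DAVID", "MARIA"),
  Nation = c("USA", "Germany", "Spain"),
  Sex    = c("M", "M", "F"),
  stringsAsFactors = FALSE
)

# Rename columns: Nation -> Country, Sex -> Gender
names(df)[names(df) == "Nation"] <- "Country"
names(df)[names(df) == "Sex"]    <- "Gender"

# Change one value inside a column: DAVID's nationality becomes Canada
df$Country[df$Name == "DAVID"] <- "Canada"
df
```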

Augment Models: How to Calculate Contrasts and Analyze Your Data with Excel and R?

I have the following data.

Nitrogen  Sulphur  Rep  Yield
0         0        1    1.0
0         0        2    0.9
0         0        3    0.8
N1        S1       1    1.0
N1        S1       2    1.2
N1        S1       3    1.3
N1        S2       1    2.1
N1        S2       2    2.2
N1        S2       3    2.3
N2        S1       1    1.4
N2        S1       2    1.6
N2        S1       3    1.7
N2        S2       1    2.5
N2        S2       2    2.6
N2        S2       3    2.8

Let’s assume that this data is the result of investigating how…
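
As a hedged sketch of one generic way to test a contrast in R (not necessarily the method the post goes on to use), the code below compares the untreated control with the average of the four N × S treatments, using the data in the table. The treatment codes and the contrast vector `cc` are illustrative choices:

```r
# Data from the table; "0 0" is the untreated control
d <- data.frame(
  Nitrogen = rep(c("0", "N1", "N1", "N2", "N2"), each = 3),
  Sulphur  = rep(c("0", "S1", "S2", "S1", "S2"), each = 3),
  Rep      = rep(1:3, times = 5),
  Yield    = c(1.0, 0.9, 0.8,  1.0, 1.2, 1.3,  2.1, 2.2, 2.3,
               1.4, 1.6, 1.7,  2.5, 2.6, 2.8)
)

# One factor with five treatments: the control plus the 2 x 2 factorial
d$trt <- factor(paste0(d$Nitrogen, d$Sulphur),
                levels = c("00", "N1S1", "N1S2", "N2S1", "N2S2"))
fit <- lm(Yield ~ trt, data = d)

# Contrast: control vs. the average of the four fertilized treatments
cc <- c(-1, 1/4, 1/4, 1/4, 1/4)
means <- tapply(d$Yield, d$trt, mean)
estimate <- sum(cc * means)                     # about 0.99

# Standard error from the pooled residual variance (3 reps per treatment)
se <- sqrt(sigma(fit)^2 * sum(cc^2 / 3))
t_value <- estimate / se
p_value <- 2 * pt(-abs(t_value), df = df.residual(fit))
c(estimate = estimate, t = t_value, p = p_value)
```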

Simplifying Data Manipulation: Transposing Columns into Rows with Ease

Sometimes, I see people managing their data in columns, like the example below. This seems convenient because we can see all of our data at once. However, this format is problematic for data analysis, which fundamentally relies on variables, namely independent and dependent variables. Download data file (.csv): https://github.com/agronomy4future/raw_data_practice/blob/main/yield_per_location.csv In the given data format, the levels of the independent variable (i.e., location) are not combined in one column, so we need to rearrange the data. If the…
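
A minimal base R sketch of the wide-to-long reshape, assuming a hypothetical table with one yield column per location (the location and genotype names are made up; the post’s CSV may be laid out differently). `tidyr::pivot_longer()` is a common alternative:

```r
# Hypothetical wide data: one column per location
wide <- data.frame(
  genotype   = c("CV1", "CV2", "CV3"),
  Location_A = c(5.1, 4.8, 5.6),
  Location_B = c(4.2, 4.5, 4.9),
  Location_C = c(6.0, 5.7, 6.3),
  stringsAsFactors = FALSE
)

loc_cols <- setdiff(names(wide), "genotype")

# stack() puts all location columns into one 'values' column plus an 'ind' column
long <- stack(wide[loc_cols])
names(long) <- c("yield", "location")

# stack() goes column by column, so repeat the genotype labels once per location
long$genotype <- rep(wide$genotype, times = length(loc_cols))
long <- long[, c("genotype", "location", "yield")]
long
```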

The Best Linear Unbiased Estimator (BLUE): Step-by-Step Guide using R (with AllInOne Package)

In this post, I will introduce how to calculate the Best Linear Unbiased Estimator (BLUE). Instead of simply listing formulas, as many websites do to explain BLUE, this post aims to help readers understand the process of calculating BLUE with an actual dataset in R. I have the following data.

location  sulphur (kg/ha)  block  yield
Cordoba   0                1      750
Cordoba   24               1      1250
Cordoba   36               1      1550
Cordoba   48               1      1120
Cordoba   0                2      780
Cordoba   24               2      1280…

How to create separate linear and quadratic regression graphs for each group in the same panel using R?

When we draw regression lines for several groups, they are usually all of the same type, such as simple linear regression. Here is an example using yield data for different nitrogen rates per genotype. The regression graph for each group is shown below. I think it would be better to show a quadratic regression line for genotype A. In this case, how can we create separate linear and quadratic regression graphs for each group in the same panel? Data…
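
A hedged base R sketch of the idea, with made-up yields for two genotypes: fit a quadratic model for genotype A and a linear one for genotype B, then overlay both predicted curves in one panel. The data and the model choices are illustrative only:

```r
# Hypothetical yield response to nitrogen for two genotypes
d <- data.frame(
  genotype = rep(c("A", "B"), each = 5),
  nitrogen = rep(c(0, 50, 100, 150, 200), times = 2),
  yield    = c(2.0, 3.4, 4.2, 4.4, 4.1,    # A: rises, then levels off
               2.1, 2.6, 3.2, 3.7, 4.3)    # B: roughly linear
)

# Quadratic model for genotype A, linear model for genotype B
fit_A <- lm(yield ~ nitrogen + I(nitrogen^2), data = subset(d, genotype == "A"))
fit_B <- lm(yield ~ nitrogen, data = subset(d, genotype == "B"))

# Predictions on a fine grid so both curves can share one panel
grid <- data.frame(nitrogen = seq(0, 200, by = 5))
pred_A <- predict(fit_A, newdata = grid)
pred_B <- predict(fit_B, newdata = grid)

plot(yield ~ nitrogen, data = d, pch = ifelse(d$genotype == "A", 16, 1),
     xlab = "Nitrogen", ylab = "Yield")
lines(grid$nitrogen, pred_A, lty = 1)
lines(grid$nitrogen, pred_B, lty = 2)
legend("bottomright", legend = c("A (quadratic)", "B (linear)"),
       pch = c(16, 1), lty = c(1, 2))
```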

What is the statistical method for comparing whether the slopes and y-intercepts in a regression model are the same or not (Feat. ANCOVA using R and SAS)?

To gain a basic understanding of the topic, I recommend reading the following posts: Analysis of Covariance (ANCOVA). I have a dataset as shown below, and I would like to analyze crop yield and height based on different fertilizer types (Control, Slow-release, and Fast-release). The experimental design is a Completely Randomized Design (CRD) with 10 replicates.

Rep  Fertilizer  Yield  Height  Fertilizer  Yield  Height  Fertilizer  Yield  Height
1    Control     12.2   45.0    Slow        16.6   63.0    Fast        9.5    52.0
2    Control     12.4   52.0…
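
A minimal sketch of the usual homogeneity-of-slopes test, assuming simulated data shaped like the table (the numbers and the `shift` effects are generated, not the post’s). Comparing a common-slope model with a separate-slopes model tests whether the slopes differ; comparing a single line with the common-slope model tests the intercepts:

```r
set.seed(1)
# Simulated data: 3 fertilizers x 10 replicates
d <- data.frame(
  Fertilizer = factor(rep(c("Control", "Slow", "Fast"), each = 10)),
  Height     = round(runif(30, min = 40, max = 70), 1)
)
shift <- c(Control = 0, Slow = 3, Fast = -2)     # made-up fertilizer effects
d$Yield <- 2 + 0.2 * d$Height + unname(shift[as.character(d$Fertilizer)]) +
  rnorm(30, sd = 1)

# Common slope, separate intercepts (parallel lines)
m_parallel <- lm(Yield ~ Fertilizer + Height, data = d)
# Separate slope and intercept for each fertilizer
m_separate <- lm(Yield ~ Fertilizer * Height, data = d)

# F-test of the Fertilizer:Height interaction, i.e. are the slopes equal?
slope_test <- anova(m_parallel, m_separate)
slope_test

# F-test of the intercepts, given a common slope
m_single <- lm(Yield ~ Height, data = d)
intercept_test <- anova(m_single, m_parallel)
intercept_test
```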

What is the F-ratio in statistics?

Today, I will explain the meaning of the F-value in significance testing. Let me give you an example. Suppose we want to determine whether yield differs among varieties (A, B, C). There are 12 experimental units in total (3 varieties × 4 replicates). What would happen if there is a significant difference in yield between varieties A and C? If there is a large difference in yield between these varieties, the…
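
To make the idea concrete, here is a sketch that computes the F-ratio by hand as the treatment mean square divided by the error mean square, for a hypothetical 3 varieties × 4 replicates dataset (the yields are made up), and checks it against `aov()`:

```r
# Hypothetical yields: 3 varieties x 4 replicates = 12 experimental units
yield <- c(10, 12, 11, 13,    # A
           14, 15, 13, 16,    # B
           18, 17, 19, 20)    # C
variety <- factor(rep(c("A", "B", "C"), each = 4))

grand_mean  <- mean(yield)
group_means <- tapply(yield, variety, mean)

# Between-variety (treatment) sum of squares and mean square, df = 3 - 1
ss_trt <- sum(4 * (group_means - grand_mean)^2)
ms_trt <- ss_trt / (3 - 1)

# Within-variety (error) sum of squares and mean square, df = 12 - 3
ss_err <- sum((yield - group_means[variety])^2)
ms_err <- ss_err / (12 - 3)

F_ratio <- ms_trt / ms_err
p_value <- pf(F_ratio, df1 = 2, df2 = 9, lower.tail = FALSE)
c(F = F_ratio, p = p_value)

# The same F appears in the built-in ANOVA table
summary(aov(yield ~ variety))
```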

Advanced Text Formatting in R STUDIO Graphs: Superscripts and Subscripts

Sometimes, when creating graphs in R, you may need to include superscripts or subscripts in axis text or titles. In this post, I will show how to enter text with superscripts or subscripts. I will generate a simple dataset and draw a graph to demonstrate. Here, I want to add superscripts or subscripts to the axis titles of the graph. For example, for the x-axis, I want to name it “Genotype” followed by a superscript “TM”, and for the y-axis, I…
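
A minimal base R sketch using plotmath expressions, assuming a made-up dataset; the y-axis label is a hypothetical placeholder because the post’s y-axis title is cut off. The same `expression()` objects can be passed to `labs()` in ggplot2:

```r
# Hypothetical data
d <- data.frame(genotype = c("G1", "G2", "G3"), yield = c(4.2, 5.1, 4.7))

# plotmath syntax: ^ makes a superscript, [ ] makes a subscript, ~ adds a space
x_lab <- expression(Genotype^TM)
y_lab <- expression(Yield[dry] ~ (g ~ m^-2))   # placeholder label

barplot(d$yield, names.arg = d$genotype, xlab = x_lab, ylab = y_lab)
```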

What is logistic regression (feat. odds, odds ratio and model equation)?

Logistic regression is a type of statistical analysis used to model the relationship between a binary (yes/no) dependent variable and independent variables. The goal of logistic regression is to find a relationship between the independent variables (x) and the probability of a particular outcome for the dependent variable (y). The logistic regression model calculates the probability of a certain outcome by applying a logistic function to the linear combination of the independent variables. Here is one example. Sulphur improves plant…
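
A small sketch of odds and the odds ratio from a fitted logistic regression, assuming made-up binary response data at four sulphur rates (not the post’s example). The coefficients are on the log-odds scale, so exponentiating the slope gives the odds ratio per unit of sulphur:

```r
# Hypothetical data: did a plant respond (1) or not (0) at each sulphur rate?
d <- data.frame(
  sulphur  = rep(c(0, 10, 20, 30), each = 4),
  response = c(0, 0, 0, 1,  0, 0, 1, 0,  1, 0, 1, 1,  1, 1, 1, 0)
)

fit <- glm(response ~ sulphur, family = binomial, data = d)
coef(fit)   # log(p / (1 - p)) = b0 + b1 * sulphur

# Odds ratio for a 1-unit increase in sulphur
odds_ratio <- exp(coef(fit)["sulphur"])
odds_ratio

# Predicted probability at 15 units of sulphur, and the corresponding odds
p15    <- predict(fit, newdata = data.frame(sulphur = 15), type = "response")
odds15 <- p15 / (1 - p15)
```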

In R, how to adjust the unit of axis in graph?

When axis values are large, the tick labels can overlap. Here is an example. Now, I’d like to change the unit of the numbers. For example, I want to divide each value by 1000 so that the x-axis shows 5 to 30. We can add the code below.
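
As a hedged base R sketch with made-up data, one way to display 5 to 30 instead of 5000 to 30000 is to suppress the default x-axis and relabel the ticks. In ggplot2, passing a function such as `labels = function(x) x / 1000` to `scale_x_continuous()` does the same job:

```r
# Hypothetical data with large x values (5000 to 30000)
x <- seq(5000, 30000, by = 5000)
y <- c(1.2, 2.0, 2.9, 3.5, 4.1, 4.4)

plot(x, y, xaxt = "n", xlab = "x (thousands)")   # suppress the default x-axis
ticks <- seq(5000, 30000, by = 5000)
axis(1, at = ticks, labels = ticks / 1000)       # ticks stay at data scale, labels read 5 ... 30
```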

What is split-split-plot design in agronomy research (feat. using R and SAS)?

In my previous post, What is split-plot design in agronomy research?, I explained what a split-plot design and its statistical model are, and how it differs from an RCBD. I explained that the main difference between a split-plot design and an RCBD is that in a split-plot design, the error is divided into two parts (error a and error b), which makes the test of the interaction between the main plot and subplot more sensitive. Now our interest lies in cases where we have three factors. In a split-plot design, we typically…

What is split-plot design in agronomy research?

Split-plot designs have been widely used, particularly in agronomy research. In a split-plot design, the experimental units are divided into smaller units. Split-plot designs are useful when some factors are difficult or expensive to change, or when the levels of the factors cannot be fully randomized (I’ll explain this in detail later). A split-plot design consists of a whole-plot factor and a subplot factor. The whole-plot factor is randomly assigned to the experimental units, while the subplot factor is applied to a smaller…
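
A minimal sketch of a split-plot ANOVA in R, assuming a hypothetical layout with 3 blocks, irrigation as the whole-plot factor and variety as the subplot factor (all factor names and yields are made up). The `Error(block/irrigation)` term gives the whole-plot factor its own error stratum (error a), separate from the within-plot error (error b):

```r
set.seed(42)
# Hypothetical layout: 3 blocks x 2 irrigation levels (whole plots) x 3 varieties (subplots)
d <- expand.grid(
  variety    = factor(c("V1", "V2", "V3")),
  irrigation = factor(c("I0", "I1")),
  block      = factor(1:3)
)
d$yield <- round(5 + (d$irrigation == "I1") * 1.5 + as.integer(d$variety) * 0.3 +
                 rnorm(nrow(d), sd = 0.4), 2)

# Whole-plot effects are tested against block:irrigation (error a);
# subplot effects and the interaction are tested in the Within stratum (error b)
fit <- aov(yield ~ irrigation * variety + Error(block/irrigation), data = d)
summary(fit)
```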

An Introduction to Residual Analysis in Simple Linear Regression Models

Sample No.  x   y
1           10  30
2           20  40
3           30  50
4           40  80
5           50  90
6           60  100
7           70  120

Here is a dataset that allows us to analyze the relationship between x and y and obtain the model equation, y = β0 + β1x. Although statistical programs can provide us with results in just 10 seconds, it is more important to understand the principles behind the calculations than to simply know how to run the…
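
Using the seven observations in the table, this sketch computes the least-squares slope and intercept by hand, derives the residuals, and checks them against `lm()`. The variable names are illustrative:

```r
x <- c(10, 20, 30, 40, 50, 60, 70)
y <- c(30, 40, 50, 80, 90, 100, 120)

# Least-squares estimates by hand
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # 4300 / 2800, about 1.536
b0 <- mean(y) - b1 * mean(x)                                      # about 11.43

fitted_y   <- b0 + b1 * x
resid_hand <- y - fitted_y      # residual = observed - fitted

# Compare with R's built-in fit
fit <- lm(y ~ x)

# Residuals vs. fitted values: look for patterns around zero
plot(fitted_y, resid_hand, xlab = "Fitted y", ylab = "Residual")
abline(h = 0, lty = 2)
```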

Data filtering using R Studio

When you conduct a statistical analysis, you might want to include or exclude some variables. For example, here is a dataset. It shows how yield, grain number (GN), and average grain weight (AGW) differ between two fertilizer treatments (N0, N1) in five genotypes (CV1 – CV5). That is, there are 10 treatments [Genotype (5) × Nitrogen (2) = 10]. There are 3 replicates as blocks, so there are 30 experimental units [10 treatments × 3 blocks = 30]. What…
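
A minimal base R sketch of filtering, assuming a simulated dataset with the layout described (5 genotypes × 2 nitrogen levels × 3 blocks; the yields are random numbers, not the post’s data). `dplyr::filter()` works the same way with unquoted column names:

```r
set.seed(7)
# Simulated data: 5 genotypes x 2 nitrogen levels x 3 blocks
d <- expand.grid(
  block    = 1:3,
  nitrogen = c("N0", "N1"),
  genotype = paste0("CV", 1:5),
  stringsAsFactors = FALSE
)
d$yield <- round(rnorm(nrow(d), mean = 600, sd = 50))
nrow(d)   # 30 experimental units

# Keep only the N1 plots
n1 <- subset(d, nitrogen == "N1")

# Exclude two genotypes (! negates the %in% test)
no_cv35 <- subset(d, !genotype %in% c("CV3", "CV5"))

# Combine conditions with &
cv1_n0 <- d[d$genotype == "CV1" & d$nitrogen == "N0", ]
```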

How to analyze quadratic plateau model in R Studio?

Previous post: How to analyze linear plateau model in R Studio? In my previous post, I explained how to analyze a linear plateau model. I simulated yield data for five different crop varieties with different sulphur applications, and suggested that the optimum sulphur application would be 23.3 kg/ha based on the linear plateau model. This time, I’ll explain how to analyze a quadratic plateau model with the same data using R Studio. 1) Data upload: If you run the code below, the…

How to analyze linear plateau model in R Studio?

When we talk about regression, it’s usually about the simple linear regression model, which describes the relationship between two variables. FYI: Simple linear regression (1/5) - correlation and covariance; Simple linear regression (2/5) - slope and intercept of linear regression model. The linear plateau model is similar to the simple linear model, but it is a segmented model, and it focuses on the critical value (the x-value above which there is no further increase in y), indicating the plateau value (the statistically highest value…
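
One way to fit a linear plateau model, not necessarily the approach used in the post, is to profile the breakpoint: for a fixed breakpoint c, yield = a + b × min(sulphur, c) is an ordinary linear model, so we can search a grid of c values for the smallest residual sum of squares. The data below are made up:

```r
# Hypothetical yield response to sulphur (kg/ha)
sulphur <- c(0, 0, 10, 10, 20, 20, 30, 30, 40, 40, 50, 50)
yield   <- c(2.0, 2.2, 3.1, 2.9, 4.1, 3.9, 4.4, 4.6, 4.5, 4.4, 4.6, 4.5)

# Linear plateau: yield = a + b * min(sulphur, c)
# For each candidate breakpoint, fit the linear model and record its RSS
candidates <- seq(5, 45, by = 0.1)
rss <- sapply(candidates, function(cp) deviance(lm(yield ~ pmin(sulphur, cp))))
cp_hat <- candidates[which.min(rss)]          # estimated critical value

fit <- lm(yield ~ pmin(sulphur, cp_hat))
plateau <- unname(coef(fit)[1] + coef(fit)[2] * cp_hat)   # predicted yield on the plateau
c(critical_value = cp_hat, plateau = plateau)
```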

In R, how to subtract the mean from each value?

In my previous post, In R, how to add extra column and row to calculate mean respectively?, I explained how to add an extra column and row to calculate row and column means. Now, I’d like to subtract the mean from each value in each column. This will be the genotypic effect.
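
A minimal base R sketch, assuming a hypothetical matrix with environments as rows and cultivars as columns (values made up). `sweep()` subtracts each column’s mean from every value in that column; `scale(m, scale = FALSE)` gives the same numbers:

```r
# Hypothetical yields: rows = environments, columns = cultivars
m <- matrix(c(5.2, 4.8, 6.1,
              4.9, 4.5, 5.8,
              5.6, 5.0, 6.4),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("ENV", 1:3), paste0("CV", 1:3)))

# Subtract each column mean from every value in that column (MARGIN = 2)
centered <- sweep(m, 2, colMeans(m))
centered
```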

In R, how to add extra column and row to calculate mean respectively?

Let’s generate a data table. Now, I’d like to calculate the mean of each column and row. For example, I want to calculate the means of ENV1 to ENV5 and of CV1 to CV5. First, I’ll calculate the mean of each row (ENV1 to ENV5). I excluded the Environment column (dataA %>% select(-Environment)) because it is not numeric. Now, I’ll calculate the mean of each column (CV1 to CV5). The mean of each column and row has now been calculated.
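
The post uses dplyr; as a base R alternative, here is a sketch assuming a small hypothetical `dataA` with a non-numeric `Environment` column and three cultivar columns (values made up):

```r
# Hypothetical table with a non-numeric Environment column
dataA <- data.frame(
  Environment = paste0("ENV", 1:3),
  CV1 = c(5.2, 4.9, 5.6),
  CV2 = c(4.8, 4.5, 5.0),
  CV3 = c(6.1, 5.8, 6.4),
  stringsAsFactors = FALSE
)

num_cols <- setdiff(names(dataA), "Environment")

# Row means (one per environment), ignoring the Environment column
dataA$Mean <- rowMeans(dataA[num_cols])

# Column means (one per cultivar, plus the Mean column), appended as a new row
mean_row <- data.frame(Environment = "Mean",
                       as.list(colMeans(dataA[c(num_cols, "Mean")])),
                       stringsAsFactors = FALSE)
dataA <- rbind(dataA, mean_row)
dataA
```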

What is Probability Density Function (PDF) and Cumulative Distribution Function (CDF): How to calculate using Excel and R?

When we analyze data, we may need to show graphs depicting normal distributions. These graphs convey concepts that simple bar graphs cannot. While it is easy to draw these graphs in Excel, understanding the underlying concepts is crucial. In this article, I will explain what the Probability Density Function (PDF) is, and I will show how to calculate it in both Excel and R. Here is a dataset of 1,000 individual wheat…
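
A short sketch of the normal PDF and CDF in R, assuming hypothetical parameters (mean 45, SD 5; for real data you would use `mean()` and `sd()`). It checks `dnorm()` against the PDF formula and `pnorm()` against numerical integration of the PDF; the comments give the Excel equivalents:

```r
mu <- 45; sigma <- 5    # hypothetical mean and standard deviation
x  <- 50

# PDF written out from the formula
pdf_manual  <- 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
pdf_builtin <- dnorm(x, mean = mu, sd = sigma)   # Excel: NORM.DIST(x, mu, sigma, FALSE)

# CDF: probability of a value <= x
cdf_builtin <- pnorm(x, mean = mu, sd = sigma)   # Excel: NORM.DIST(x, mu, sigma, TRUE)

# The CDF is the area under the PDF up to x
cdf_numeric <- integrate(dnorm, lower = -Inf, upper = x, mean = mu, sd = sigma)$value

c(pdf = pdf_builtin, cdf = cdf_builtin)
```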

How to change the name of columns in R?

Let’s upload a dataset to R. Now, I’d like to change the column names as follows: field → location, genotype → variety, block → reps, treatment → experiment, shoot → branch, grain_number → GN, grain_weight → GW. I introduce two ways to change column names: 1) using colnames() and 2) using rename() in the dplyr package. This time, I’ll use the dplyr package.
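
A sketch of the first approach (`colnames()`) using the mapping above, assuming a one-row hypothetical data frame. The dplyr equivalent is shown as a comment; note that `rename()` takes `new = old` pairs:

```r
# Hypothetical data frame with the original column names
df <- data.frame(field = "A", genotype = "CV1", block = 1, treatment = "N0",
                 shoot = 3, grain_number = 25, grain_weight = 40.2,
                 stringsAsFactors = FALSE)

# Lookup vector: names are the old column names, values are the new ones
new_names <- c(field = "location", genotype = "variety", block = "reps",
               treatment = "experiment", shoot = "branch",
               grain_number = "GN", grain_weight = "GW")

# Replace only the names found in the lookup; leave any others unchanged
old <- colnames(df)
colnames(df) <- ifelse(old %in% names(new_names), new_names[old], old)

# dplyr equivalent:
# dplyr::rename(df, location = field, variety = genotype, reps = block,
#               experiment = treatment, branch = shoot,
#               GN = grain_number, GW = grain_weight)
```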

How to upload a file from GitHub to R?

I uploaded a .csv file to GitHub, and now I want to analyze this data in R. I could simply download the file and load it into R, but let’s load it directly from GitHub instead. First, we need to know the URL of this file. If you click your file name in GitHub, you can find the “Raw” button. Let’s click this button. Then, in the address bar of your web browser, you can obtain the URL…

How to easily change legend name inside a graph in R?

I’ll generate a dataset. Then, I’ll make a bar graph of this data. To make a bar graph, the data should first be summarized. Now, I want to change the legend labels from N0 to 0kg N/ha, and from N1 to 200kg N/ha. We can simply add code like this: scale_fill_manual(labels=c("0kg N/ha","200kg N/ha"), values=c("grey75","grey25")) What if we want to change the title of the legend from Fertilizer to Treatment? Just add one line like this: labs(fill="Treatment", x="Genotype", y="Yield")

[STAT Article] Mastering RMSE Calculation with Excel and R: A Comprehensive Guide

When running statistical programs, you might come across RMSE (Root Mean Square Error). For instance, the table below displays RMSE values obtained from SAS, which indicate that RMSE is approximately 2.72. How is RMSE calculated? The equation for RMSE is shown below. First, calculate the difference between the estimated and observed values, (ŷi – yi), and then square the difference, (ŷi – yi)². Second, calculate the sum of squares, Σ(ŷi – yi)². Third, divide the…
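
RMSE is computed with two different divisors in practice, so this hedged sketch computes both from made-up data: dividing the sum of squares by n, and dividing by the residual degrees of freedom (n - p), which is what SAS reports as Root MSE and what R’s `sigma()` returns for an `lm` fit:

```r
# Hypothetical data and a simple linear fit
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)
fit <- lm(y ~ x)

y_hat <- fitted(fit)
sse <- sum((y_hat - y)^2)            # sum of squared differences
n <- length(y)
p <- length(coef(fit))               # number of estimated parameters (2 here)

rmse_n  <- sqrt(sse / n)             # divide by n
rmse_df <- sqrt(sse / (n - p))       # divide by residual df (SAS "Root MSE")

c(rmse_n = rmse_n, rmse_df = rmse_df, sigma = sigma(fit))
```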

What is Finlay-Wilkinson Regression Model?

The performance of a genotype depends on environmental changes. One genotype may respond strongly to certain environmental conditions, while another genotype may respond weakly to the same conditions. If some genotypes respond strongly under better conditions, they would be considered adaptable to the environment. Adaptability refers to the flexibility of a genotype in its response to improved environments. If a certain genotype exhibits high performance across a wide range of environmental conditions, it would be considered to have broad adaptation. To achieve this…
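
A minimal sketch of the Finlay-Wilkinson approach as it is commonly described: the environmental index is the mean yield of all genotypes in each environment, and each genotype’s yield is regressed on that index. The yields below are made up. A slope above 1 indicates a genotype that responds strongly to better environments; a slope below 1 indicates a weaker response:

```r
# Hypothetical yields: rows = genotypes, columns = environments
y <- matrix(c(3.0, 4.0, 5.5, 6.5,
              3.5, 4.1, 4.6, 5.0,
              2.5, 3.9, 5.9, 7.5),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("G1", "G2", "G3"), paste0("E", 1:4)))

# Environmental index: mean yield of all genotypes in each environment
env_index <- colMeans(y)

# Regress each genotype's yields on the index and keep the slope
slopes <- unname(apply(y, 1, function(g) coef(lm(g ~ env_index))[2]))
names(slopes) <- rownames(y)
slopes

# With this index, the slopes always average exactly 1
mean(slopes)
```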

What is a nested model in statistics?

A tomato farmer is growing tomato seedlings, and all of a sudden he wants to investigate the amount of calcium in the leaves. So, he selected four tomato seedlings, randomly chose three leaves from each seedling, and measured the amount of calcium twice in each leaf. This experimental design can be explained by the table below. y₁₁₁ means the amount of calcium in the 1st seedling – 1st leaf – 1st replicate. Then, y₄₃₂ will mean the outcome of…
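
A sketch of a nested ANOVA for this design, assuming simulated calcium values (4 seedlings × 3 leaves × 2 measurements; the numbers are random, not the post’s). The `seedling / leaf` term tells R that leaf 1 of seedling 1 is a different leaf from leaf 1 of seedling 2:

```r
set.seed(3)
# Simulated data: 4 seedlings x 3 leaves per seedling x 2 measurements per leaf
d <- expand.grid(rep = 1:2, leaf = 1:3, seedling = 1:4)
d$seedling <- factor(d$seedling)
d$leaf     <- factor(d$leaf)
d$calcium  <- round(rnorm(nrow(d), mean = 3, sd = 0.3), 2)

# seedling / leaf expands to seedling + seedling:leaf (leaf nested in seedling)
fit <- aov(calcium ~ seedling / leaf, data = d)
summary(fit)
# Note: the default F for seedling uses the residual mean square; if seedlings
# are a random sample, test seedling against the seedling:leaf mean square instead
```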

How to calculate the optimum sample size for 2-Sample t test (using R and G*Power program)?

When we set up our experimental design, it is not easy to decide the sample size because we don’t know exactly how many samples our experiments require. Of course, the more, the better. However, we ultimately need to choose an appropriate sample size given our time and resources. For example, if we want to know the average height of students at the University of Guelph, the best way is to measure every student’s height. According to Wikipedia, total number…
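
Base R’s `power.t.test()` solves for the sample size of a two-sample t test. This sketch assumes hypothetical planning values: a 5 cm difference in mean height, a standard deviation of 10 cm, α = 0.05 and power = 0.80:

```r
# Hypothetical planning values
res <- power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.80,
                    type = "two.sample", alternative = "two.sided")

res$n            # required n per group, about 63.8
ceiling(res$n)   # round up to whole individuals: 64 per group
```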

What is ANCOVA (1/3)? The basic concept

Today, I will explain Analysis of Covariance (ANCOVA). ANCOVA is a statistical technique that involves including covariates, which are additional variables that may impact the dependent variable (y) in addition to the independent variable (x). I have a dataset as shown below, and I would like to analyze crop yield based on different fertilizer types (Control, Slow-release, and Fast-release). The experimental design is a Completely Randomized Design (CRD) with 10 replicates.

Rep  Fertilizer  Yield  Fertilizer  Yield  Fertilizer  Yield
1    Control…

In R STUDIO, how to reverse the order of x-axis (numeric), and also change the direction of graph?

Here is a dataset. Then I make a regression graph. Now, I’d like to reverse the order of the x-axis so that it is descending (60 to 0). So I delete the code scale_x_continuous(breaks = seq(-0, 60, 10), limits = c(-0, 60)) and add new code, scale_x_reverse(limits = c(60,0)). The whole code is below. Now you can see that the order of the x-axis has changed, and so has the direction of the graph. How do you adjust the unit of the axis? To adjust…
