Browsed by
Category: R programming

Efficient Multivariate Summary in R: A Guide to Analyzing Multiple Independent Variables

In my previous post, Streamlined Data Summary in R STUDIO: Enhancing Bar Graphs with Error Bars, I introduced how to compute summary statistics such as the mean, standard deviation, and standard error. However, that post demonstrated how to summarize only one variable. Now, let’s take this further with a dataset. I would like to summarize the Yield data, including the mean, standard deviation, and standard error, using ddply(). Next, I also want to summarize the variables GN and AGW…
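The post itself uses ddply() from plyr. As a dependency-free sketch, the same three statistics can be computed for several variables at once with base R's aggregate(). The data values and the grouping column `Genotype` are hypothetical; only the variable names Yield, GN, and AGW come from the post.

```r
# Hypothetical data: two genotypes, three replicates each
df <- data.frame(
  Genotype = rep(c("A", "B"), each = 3),
  Yield    = c(5.2, 5.6, 5.0, 6.1, 6.4, 6.0),
  GN       = c(12000, 12500, 11800, 13200, 13600, 13100),
  AGW      = c(43.1, 44.0, 42.5, 45.2, 46.0, 44.8)
)

# Standard error of the mean: sd / sqrt(n), ignoring NAs
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

vars <- c("Yield", "GN", "AGW")

# aggregate() applies FUN to every listed column within each Genotype;
# FUN returns a named vector, so each variable becomes a matrix column
summary_df <- aggregate(
  df[vars],
  by  = list(Genotype = df$Genotype),
  FUN = function(x) c(mean = mean(x, na.rm = TRUE),
                      sd   = sd(x, na.rm = TRUE),
                      se   = se(x))
)

# Flatten the matrix columns into Yield.mean, Yield.sd, Yield.se, GN.mean, ...
summary_df <- do.call(data.frame, summary_df)
print(summary_df)
```

Adding another variable only requires adding its name to `vars`.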

Converting Rows to Columns in R: A Guide to Transposing Data (feat. pivot_wider and pivot_longer)

Data can be structured either vertically (row-based, or long format) or horizontally (column-based, or wide format), depending on how you prefer to organize it. However, when running statistical analyses, data should be arranged row-based, because each variable needs to be in a single column. On the other hand, when calculating values per variable, a column-based layout makes the calculations simpler. Regardless of the approach, well-organized data is essential, and the ability to restructure data is a valuable skill. Today,…
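The post uses tidyr's pivot_wider() and pivot_longer(). The same round trip can be sketched with base R's reshape(), which is shown here only as an alternative; the data are hypothetical.

```r
# Hypothetical long (row-based) data: one row per Genotype x Variable
long <- data.frame(
  Genotype = rep(c("A", "B"), each = 2),
  Variable = rep(c("Yield", "GN"), times = 2),
  Value    = c(5.1, 12000, 4.8, 11500)
)

# Long -> wide: one column per level of Variable (similar in spirit to pivot_wider)
wide <- reshape(long,
                idvar     = "Genotype",
                timevar   = "Variable",
                v.names   = "Value",
                direction = "wide")
print(wide)   # columns: Genotype, Value.Yield, Value.GN

# Wide -> long again: reshape() stores what it needs to reverse itself
long_again <- reshape(wide, direction = "long")
print(long_again)
```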

A Guide to Normalizing Data for Different Treatments in R

I have data, as shown below, on iron content in soil and plant iron uptake at different growth stages of winter wheat. I want to analyze the relationship between soil iron content and plant iron uptake at each growth stage. We could simply draw a regression graph; however, before doing that, we need to reshape the data. I’ll transpose the data from rows to columns based on the variables in…

How to delete and change specific texts within a column in R?

When we want to change text within a column, there are several methods, which I introduced in a previous post, How to Rename Variables within Columns in R? However, replacing an entire value is different from changing only part of the text. Let’s upload some data. Now, we can change the variable names with the following code: How about changing the text in the ID column? I want to remove ‘Delta_’ and keep only the numbers. Will you change the text one by one as…
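Removing a fixed prefix from every value at once is a one-line job with base R's sub(). This is a minimal sketch with hypothetical IDs; only the ‘Delta_’ prefix comes from the post, and the post may use a different function.

```r
# Hypothetical data: IDs carry a "Delta_" prefix
df <- data.frame(ID    = c("Delta_101", "Delta_102", "Delta_103"),
                 Yield = c(5.1, 4.9, 5.4),
                 stringsAsFactors = FALSE)

# Remove the prefix from every ID at once; "^" anchors the match at the start
df$ID <- sub("^Delta_", "", df$ID)

# Optionally convert the remaining text to numbers
df$ID <- as.numeric(df$ID)
print(df)
```

gsub() would also work here; sub() is enough because the prefix occurs only once per value.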

How to Upload and Combine Multiple Files In R?

In a folder, I have 5 different .csv files. I want to upload these files to R and combine all of them because the data format (number of columns and structure) is the same. While you can certainly upload them one by one, imagine a scenario where you have 100 datasets. Will you upload all 100 of them individually? No! That would be a waste of time. In such cases, you can use a simple code to upload multiple files…
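A common base R pattern for this is list.files() plus lapply(read.csv) plus do.call(rbind). The sketch below first writes five small hypothetical .csv files to a temporary folder so it runs on its own; with real data, `folder` would point at your own directory.

```r
# For a self-contained demo, write five small hypothetical .csv files
set.seed(1)
folder <- file.path(tempdir(), "csv_demo")
dir.create(folder, showWarnings = FALSE)
for (i in 1:5) {
  write.csv(data.frame(File = i, Rep = 1:3, Yield = round(runif(3, 4, 6), 2)),
            file.path(folder, paste0("data_", i, ".csv")),
            row.names = FALSE)
}

# List every .csv file in the folder (full paths so read.csv can find them)
files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)

# Read all files into a list, then stack them row-wise;
# this assumes every file has the same columns in the same order
combined <- do.call(rbind, lapply(files, read.csv))
print(combined)
```

The same three lines work unchanged for 5 or 100 files.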

Calculating Predicted Values for Each Group in Basic Modeling

In my previous post, The Best Linear Unbiased Estimator (BLUE): Step-by-Step Guide using R (with AllInOne Package), I explained how to estimate values of the dependent variable from fitted models. Now I’ll explain how to add these predicted values to the original data using R. First, let’s upload the data to R. Next, I’ll predict yield using the model. I believe that ‘row’ represents a random factor for each treatment, so I’d like to adjust the residuals using BLUP (Best Linear Unbiased Predictor),…
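The general step of attaching predictions to the original rows can be sketched with predict(). This is a simplified, fixed-effects version on hypothetical data: it treats `Row` as a fixed factor with lm(), whereas the post treats ‘row’ as random and uses BLUP, which would need a mixed-model package.

```r
# Hypothetical data: yield under three nitrogen treatments, two rows
df <- data.frame(
  Treatment = rep(c("N0", "N1", "N2"), each = 4),
  Row       = rep(c("R1", "R2"), times = 6),
  Yield     = c(4.1, 4.4, 4.0, 4.3, 5.2, 5.6, 5.1, 5.5, 5.9, 6.3, 6.0, 6.2)
)

# Fixed-effects sketch: Row is fixed here, not random as in a BLUP model
fit <- lm(Yield ~ Treatment + Row, data = df)

# Attach the predicted value for every original observation as a new column
df$Predicted <- predict(fit, newdata = df)
df$Residual  <- df$Yield - df$Predicted
print(df)
```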

How to calculate responsiveness in response to control using R?

In my previous post, Quantifying Phenotypic Plasticity of Crops, I explained how to quantify phenotypic plasticity and introduced the concept of ‘responsiveness.’ I introduced a formula to calculate responsiveness as (Treatment – Control) / Control.

Genotype  Control  Treatment  Responsiveness
A         100      90         -10.0%
B         120      70         -41.7%
C         115      90         -21.7%
D         95       85         -10.5%
E         110      105        -4.5%

However, when analyzing data, the format may not always be the same as above. In most datasets, treatments (the independent variable) are arranged in…
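When treatments are stacked in rows rather than side by side, one way to compute responsiveness is to reshape first and then apply the formula. This base R sketch uses the Control and Treatment values listed above; the long layout and the column names are assumptions, and the post's own code may differ.

```r
# Values from the table above, arranged in long format:
# one row per Genotype x Treatment level
long <- data.frame(
  Genotype  = rep(c("A", "B", "C", "D", "E"), times = 2),
  Treatment = rep(c("Control", "Treatment"), each = 5),
  Value     = c(100, 120, 115, 95, 110,   # Control
                 90,  70,  90, 85, 105)   # Treatment
)

# Put Control and Treatment side by side for each genotype
wide <- reshape(long, idvar = "Genotype", timevar = "Treatment",
                v.names = "Value", direction = "wide")

# Responsiveness (%) = (Treatment - Control) / Control * 100
wide$Responsiveness <- with(wide,
  (Value.Treatment - Value.Control) / Value.Control * 100)
print(wide[, c("Genotype", "Responsiveness")])
```

Rounded to one decimal, the results reproduce the Responsiveness column of the table.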

How to customize the title format in facet_wrap()?

By following my previous post, Graph Partitioning Using facet_wrap() in R Studio, you can obtain the figure below. If you copy and paste the code above into your R console, you will get the same figure. Now, I’d like to change the title format by removing the border around the panel titles. Next, I’d like to draw a line in the title. Please refer to the code below. Full code: https://github.com/agronomy4future/r_code/blob/main/How_to_customize_the_title_format_in_facet_wrap().ipynb
© 2022 – 2023 https://agronomy4future.com

Variable-Dependent Manipulation of Point and Line Sizes in R

I will create some random data and then plot it as a line graph with points. Point colors and shapes are differentiated by the variable “Genotype”. In the above code, geom_point(size=5) sets the point size to 5 for both GenotypeA and GenotypeB. However, I would like to increase the point size only for GenotypeA. To do this, I will change the code from geom_point(size=5) to geom_point(aes(size=Genotype)). This means that I will adjust the point…

Converting an Excel File to an R file: Optimizing File Size

Today, I will introduce a method for converting an Excel file into an R file. I have placed an Excel file in a folder named ‘DataBase’ on the desktop. This file contains wheat grain size data, with 96,320 rows and a size of approximately 15MB. When an Excel file is large, you may experience performance issues, such as Excel slowing down during data operations, especially if your computer has limited memory. It would be more convenient to convert this Excel…
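The excerpt does not show which functions the post uses. One common approach is to read the sheet (for example with readxl::read_excel(), not run here) and save it with saveRDS(). In the sketch below, a hypothetical data frame stands in for the Excel data so it runs without extra packages. It also writes a CSV copy so you can compare file sizes on your own data.

```r
# Stand-in for data read from Excel, e.g. readxl::read_excel() (not run here)
set.seed(1)
grain <- data.frame(
  ID     = seq_len(10000),
  Length = round(rnorm(10000, mean = 6.5, sd = 0.4), 3),
  Width  = round(rnorm(10000, mean = 3.4, sd = 0.2), 3)
)

# Save as a compressed R data file (.rds) and, for comparison, as CSV
rds_file <- file.path(tempdir(), "grain.rds")
csv_file <- file.path(tempdir(), "grain.csv")
saveRDS(grain, rds_file)
write.csv(grain, csv_file, row.names = FALSE)

# File sizes in kilobytes
sizes <- c(rds = file.size(rds_file), csv = file.size(csv_file)) / 1024
print(sizes)

# Reload: the object comes back exactly as saved, column types included
grain_back <- readRDS(rds_file)
```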

Drawing Lines in ggplot()

When creating multiple graphs with ggplot(), you may sometimes want to add separate lines to them. Today, I’ll show how to draw additional lines on graphs. Let’s start by generating a simple dataset. Next, I will draw a regression graph for this data. 1) Drawing a 1:1 line. To compare the slope of the regression line against a 1:1 relationship, I would like to draw a 1:1 line. geom_abline(slope=1, linetype…

Two-Way ANOVA Tutorial Using SAS Studio

I will introduce how to perform a Two-Way ANOVA using SAS Studio. Here is the data: Upload this Excel file to SAS Studio. After uploading it, create a data table named “EXP1” in My Libraries. Then, click on the EXP1 data table and select the code-generation icon at the top. A new tab named “Program 1” will be created, allowing you to generate the…

Quantifying Phenotypic Plasticity of Crops

Phenotypic plasticity refers to the ability of an individual organism, in this case, a plant, to display varying phenotypic traits or characteristics in response to different environmental conditions. These traits can include physical features, physiological processes, and behaviors. Phenotypic plasticity is a crucial adaptive mechanism that allows organisms to optimize their survival and reproduction in varying environments. Crops are particularly reliant on phenotypic plasticity to cope with changes in factors such as light, temperature, moisture, nutrient availability, and other environmental…

Statistical Inference on Binomially Distributed Data

The primary purpose of an experiment is to test hypotheses about the population under study. The experimenter must therefore decide whether to accept or reject these hypotheses based on the experiment’s results. The appropriate statistical method depends on whether the sample data follow a normal distribution or a binomial distribution. Today, I will introduce statistical testing methods for data that follow a binomial distribution. Let’s delve into an…
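For binomial data (success/failure counts), base R offers an exact test through binom.test(). The germination scenario below is hypothetical and is not the post's example; the post may instead use a normal-approximation test.

```r
# Hypothetical example: a seed lot is claimed to have 80% germination;
# in a test, 70 of 100 seeds germinated
germinated <- 70
n_seeds    <- 100

# Exact binomial test of H0: p = 0.8 (two-sided)
res <- binom.test(x = germinated, n = n_seeds, p = 0.8)
res$estimate   # observed proportion of germinated seeds
res$p.value    # compare with the chosen significance level, e.g. 0.05
res$conf.int   # exact (Clopper-Pearson) 95% confidence interval
```

A p-value below 0.05 would lead us to reject the claimed 80% germination rate for this lot.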

Graph Partitioning Using facet_grid() in R Studio

In my previous post, Graph Partitioning Using facet_wrap() in R Studio, I introduced how to partition graphs using facet_wrap(). Today, I’ll introduce facet_grid(). The two functions serve the same purpose, but there are subtle differences between facet_wrap() and facet_grid(), which I’ll explain today. Let’s upload some data. I measured chlorophyll content in the leaves of two wheat genotypes under both stress and normal conditions. In this case, there are two factors (stress treatment and genotype). If you’ve read my previous…

Displaying Both Dates and Numeric Units on the X-Axis in R (feat. patchwork package)

When we create time series graphs in R, it is sometimes necessary to display both dates and numbers on the x-axis. When the x-axis shows dates only, it can be difficult to add text or other elements directly to the graph. By using both dates and numbers on the x-axis, however, we can easily insert text, lines, and other annotations. Let’s work through an example with data. I made a line graph over dates. But I…
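The key step behind a numeric x-axis is converting each date to a number, such as days after the first sampling date. You can then keep a lookup table that pairs those numbers with date labels. This is a minimal base R sketch with hypothetical dates. Combining a date-labelled and a number-labelled panel with patchwork, as the post does, is not shown.

```r
# Hypothetical sampling dates for a time series
dates <- as.Date(c("2023-04-01", "2023-04-15", "2023-05-01", "2023-05-20"))
value <- c(1.2, 2.5, 4.1, 5.0)

# Convert each date to a number: days after the first sampling date
day_num <- as.numeric(dates - min(dates))
print(day_num)

# Lookup table pairing numeric positions with date labels,
# e.g. for axis breaks/labels or for placing annotations by number
axis_labels <- data.frame(Day = day_num, Date = format(dates, "%b %d"))
print(axis_labels)
```

With numeric x values, an annotation can be positioned at, say, day 30 instead of a Date object.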

[Data Column] Why Is Data Normalization Necessary for Data Visualization?

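Data normalization is needed when visualizing data for several key reasons, the most important of which is scale uniformity. Different variables can have very different scales and units. For example, grain yield may be expressed in Mg/ha, while nutrient content is usually expressed as a percentage (%). Normalizing such data allows multiple variables with different units to be compared and visualized on the same graph. Normalization also improves visualization interpretability. Normalized data make it easier to interpret patterns…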
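The column does not name a specific method. Min-max scaling to the 0–1 range is one common choice and is assumed here; z-scores from base R's scale() are another. The yield and nitrogen values below are hypothetical.

```r
# Hypothetical data with different units: grain yield (Mg/ha) and nitrogen content (%)
df <- data.frame(
  Yield    = c(4.2, 5.8, 6.5, 7.1, 8.0),
  Nitrogen = c(1.6, 1.9, 2.1, 2.0, 2.4)
)

# Min-max normalization: rescale each variable to the range 0-1
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
df_norm <- as.data.frame(lapply(df, min_max))
print(df_norm)

# Alternative: z-scores (mean 0, standard deviation 1)
df_z <- as.data.frame(scale(df))
print(df_z)
```

After either transformation, both variables share a unitless scale and can be drawn on the same axis.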

[STAT Article] Easy Guide to Cook’s Distance Calculation Using Excel and R

I have 1,000 measurements of the length (mm) and weight (mg) of wheat grains. With this data, I want to analyze the relationship between grain length and weight in order to propose a model equation that can predict grain weight. First, I will draw a graph to visualize the data. If you are new to R, you can copy and paste the following code into your R script window to obtain the same graph as shown…
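Base R's cooks.distance() returns Cook's distance for every observation of a linear model. To show what it computes, the sketch below also reproduces the values by hand with D_i = e_i² / (p · MSE) · h_ii / (1 − h_ii)². The 50 grains are simulated and hypothetical, and the 4/n cutoff is only a common rule of thumb.

```r
# Hypothetical grain data: length (mm) and weight (mg)
set.seed(42)
length_mm <- runif(50, 5.5, 7.5)
weight_mg <- 8 * length_mm - 10 + rnorm(50, sd = 2)
weight_mg[50] <- weight_mg[50] + 15   # one deliberately unusual grain

fit <- lm(weight_mg ~ length_mm)

# Cook's distance from base R
d <- cooks.distance(fit)

# The same values by hand: D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2
e   <- residuals(fit)
h   <- hatvalues(fit)                    # leverages h_ii
p   <- length(coef(fit))                 # number of estimated coefficients
mse <- sum(e^2) / df.residual(fit)
d_manual <- e^2 / (p * mse) * h / (1 - h)^2

# Rule of thumb: flag points with D_i > 4 / n
which(d > 4 / length(d))
```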

R-Squared Calculation in Linear Regression with Zero Intercept

Previously, I scanned wheat grains to obtain the area of each grain, and then measured the weight of each grain to develop a model equation relating weight to area. The following regression shows the relationship between grain area and weight. Data download: https://www.kaggle.com/datasets/agronomy4future/wheat-grain-area-vs-weight I obtained the equation y = 3.3333x – 13.7155, where y is the grain weight (mg) and x is the grain area (mm²), using both Excel and R. However, this model predicts negative values…
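When the intercept is forced to zero, R's summary() reports an R² based on the uncentered total sum of squares, Σy², rather than Σ(y − ȳ)². The sketch below checks this on hypothetical area/weight values and also computes the centered version for comparison. Which one is appropriate is the question the post discusses.

```r
# Hypothetical grain area (mm^2) and weight (mg) values
area   <- c(10.2, 11.5, 12.8, 13.6, 14.9, 15.7, 16.4)
weight <- c(21.0, 24.9, 28.8, 31.1, 36.0, 38.2, 40.9)

fit0 <- lm(weight ~ 0 + area)          # regression forced through the origin

# Without an intercept, summary() uses the uncentered total sum of squares:
# R^2 = 1 - SS_res / sum(y^2), not 1 - SS_res / sum((y - mean(y))^2)
ss_res        <- sum(residuals(fit0)^2)
r2_uncentered <- 1 - ss_res / sum(weight^2)
r2_centered   <- 1 - ss_res / sum((weight - mean(weight))^2)

c(summary    = summary(fit0)$r.squared,
  uncentered = r2_uncentered,
  centered   = r2_centered)
```

Because Σy² is always at least as large as Σ(y − ȳ)², the uncentered R² is never smaller. This is why zero-intercept models often report R² values that look impressively high.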

[STAT article] Two-Way ANOVA: An Essential Tool for Understanding Factorial Experiments

A factorial experiment involves simultaneously manipulating multiple factors, or independent variables (x), to study their effects on a dependent variable (y). The experiment is called factorial because the levels of the factors are tested in combination. In factorial experiments, each unique combination of the levels of the factors being tested is called a factorial. For instance, N0_Genotype1, N0_Genotype2, N1_Genotype1, N1_Genotype2, etc. are different factorials used to conduct the experiment and analyze…
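A two-way ANOVA on a factorial layout can be sketched in base R with lm() and anova(). The `*` operator fits both main effects and their interaction. The 2 × 2 design (N0/N1 × Genotype1/Genotype2, three replicates) and the yields are hypothetical, with labels following the naming in the text.

```r
# Hypothetical 2 x 2 factorial: nitrogen (N0, N1) x genotype, 3 replicates each
df <- expand.grid(Rep      = 1:3,
                  Genotype = c("Genotype1", "Genotype2"),
                  Nitrogen = c("N0", "N1"))
df$Yield <- c(4.1, 4.3, 4.0,  4.6, 4.8, 4.5,   # N0: Genotype1, Genotype2
              5.5, 5.8, 5.6,  5.2, 5.0, 5.3)   # N1: Genotype1, Genotype2

# Each factorial (treatment combination), e.g. N0_Genotype1
df$Factorial <- paste(df$Nitrogen, df$Genotype, sep = "_")

# '*' fits both main effects and their interaction (Nitrogen:Genotype)
fit <- lm(Yield ~ Nitrogen * Genotype, data = df)
anova_tab <- anova(fit)
print(anova_tab)
```

A significant Nitrogen:Genotype row would indicate that the nitrogen effect depends on the genotype.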

[Coding article] A Guide to Analyzing Statistical Tests for Each Level of a Factor in R without Manual Specification

This is my experimental data. There are 10 corn varieties, and I want to analyze the effect of nitrogen treatments (N0, N1) on grain yield for each variety. This is a One-Way ANOVA. Let’s assume that there are no blocks for the replicates, so the statistical model is a One-Way ANOVA without blocks. If we run the above analysis, we can observe the overall effect of nitrogen treatments on grain yield across all varieties, as they are pooled…
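Instead of subsetting each variety by hand, one approach is to split the data by variety and run the same One-Way ANOVA on every subset with lapply(). The simulated data (10 varieties × 2 nitrogen levels × 4 replicates) are hypothetical, and the post's own code may differ.

```r
# Hypothetical data: 10 corn varieties x 2 nitrogen levels x 4 replicates
set.seed(123)
df <- expand.grid(Rep = 1:4, Nitrogen = c("N0", "N1"),
                  Variety = paste0("V", 1:10))
df$Yield <- 9 + ifelse(df$Nitrogen == "N1", 1.5, 0) + rnorm(nrow(df), sd = 0.8)

# Split by Variety and fit the same One-Way ANOVA (no blocks) to each subset
res_list <- lapply(split(df, df$Variety), function(d) {
  tab <- anova(lm(Yield ~ Nitrogen, data = d))
  data.frame(F_value = tab["Nitrogen", "F value"],
             p_value = tab["Nitrogen", "Pr(>F)"])
})

# Combine into one table, one row per variety
results <- do.call(rbind, res_list)
results$Variety <- names(res_list)
results <- results[, c("Variety", "F_value", "p_value")]
print(results)
```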

How to reorder variables in R for data analysis?

I’ll now show you how to change the order of variables in a graph. For this example, I have prepared some data and will create a graph using it. Now, let’s reorder the variables so that Calcium Nitrate comes first, followed by Urea, then Sodium Nitrate, and lastly Ammonium Sulfate. First, let’s take a look at the variables: by default, R orders them alphabetically. The following code will change the order of the variables as desired:…
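In R, the display order follows the factor levels, so reordering means setting `levels` explicitly with factor(). This sketch uses the four fertilizer names from the post with hypothetical yields.

```r
# Hypothetical data; fertilizer names are from the post
df <- data.frame(
  Fertilizer = c("Ammonium Sulfate", "Calcium Nitrate", "Sodium Nitrate", "Urea"),
  Yield      = c(5.1, 6.0, 5.4, 5.8),
  stringsAsFactors = FALSE
)

levels(factor(df$Fertilizer))   # default: alphabetical order

# Set the desired order explicitly; ggplot2 and most R functions follow it
df$Fertilizer <- factor(df$Fertilizer,
                        levels = c("Calcium Nitrate", "Urea",
                                   "Sodium Nitrate", "Ammonium Sulfate"))
levels(df$Fertilizer)
```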

How to Rename Variables within Columns in R?

In this post, I will introduce how to change the text of a specific column while analyzing data in R. First, let’s create a simple dataset. Next, let’s rename the columns: we will change the ‘Nation’ column name to ‘Country’ and the ‘Sex’ column name to ‘Gender’. If you enter the following code, the column names will be updated accordingly. If DAVID’s nationality is Canada instead of Germany, you can update it by entering the…
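Both steps can be sketched in base R. Columns are renamed by matching their current names, and a single value is changed with logical indexing. The column names Nation and Sex and the DAVID example come from the post; the rest of the dataset is hypothetical.

```r
# Hypothetical dataset
df <- data.frame(
  Name   = c("JOHN", "DAVID", "MARIA"),
  Nation = c("USA", "Germany", "Spain"),
  Sex    = c("M", "M", "F"),
  stringsAsFactors = FALSE
)

# Rename columns by matching their current names
names(df)[names(df) == "Nation"] <- "Country"
names(df)[names(df) == "Sex"]    <- "Gender"

# Change one value: DAVID's country becomes Canada
df$Country[df$Name == "DAVID"] <- "Canada"
print(df)
```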

Augment Models: How to Calculate Contrasts and Analyze Your Data with Excel and R?

I have the following data.

Nitrogen  Sulphur  Rep  Yield
0         0        1    1.0
0         0        2    0.9
0         0        3    0.8
N1        S1       1    1.0
N1        S1       2    1.2
N1        S1       3    1.3
N1        S2       1    2.1
N1        S2       2    2.2
N1        S2       3    2.3
N2        S1       1    1.4
N2        S1       2    1.6
N2        S1       3    1.7
N2        S2       1    2.5
N2        S2       2    2.6
N2        S2       3    2.8

Let’s assume that this data is the result of investigating how…
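The excerpt stops before the analysis, so the following is an assumed illustration rather than the post's method. It is a single-degree-of-freedom contrast comparing the untreated control (0, 0) with the average of the four N × S treatments, using the yields listed above. With these numbers, the control mean is 0.9 and the factorial mean is 22.7/12, so the contrast estimate is about -0.99.

```r
# Data from the table above: untreated control (0, 0) plus a 2 x 2 N x S factorial
df <- data.frame(
  Nitrogen = rep(c("0", "N1", "N1", "N2", "N2"), each = 3),
  Sulphur  = rep(c("0", "S1", "S2", "S1", "S2"), each = 3),
  Yield    = c(1.0, 0.9, 0.8,  1.0, 1.2, 1.3,  2.1, 2.2, 2.3,
               1.4, 1.6, 1.7,  2.5, 2.6, 2.8)
)
df$Treatment <- factor(paste(df$Nitrogen, df$Sulphur, sep = "_"),
                       levels = c("0_0", "N1_S1", "N1_S2", "N2_S1", "N2_S2"))

# Treatment means and a contrast: control vs. average of the four factorial treatments
means <- tapply(df$Yield, df$Treatment, mean)
coefs <- c(1, -1/4, -1/4, -1/4, -1/4)     # coefficients sum to zero
L     <- sum(coefs * means)               # contrast estimate

# Standard error and t-test using the one-way ANOVA error mean square
fit     <- lm(Yield ~ Treatment, data = df)
mse     <- sum(residuals(fit)^2) / df.residual(fit)
n       <- 3                              # replicates per treatment
se      <- sqrt(mse * sum(coefs^2) / n)
t_value <- L / se
p_value <- 2 * pt(-abs(t_value), df = df.residual(fit))
c(estimate = L, t = t_value, p = p_value)
```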

Simplifying Data Manipulation: Transposing Columns into Rows with Ease

Sometimes, I see people managing their data in columns, like the example below. This seems convenient because we can see all the data at once. However, this format is problematic for data analysis, which fundamentally relies on variables, namely independent and dependent variables. Download data file (.csv): https://github.com/agronomy4future/raw_data_practice/blob/main/yield_per_location.csv In this format, the levels of the independent variable (i.e., location) are not combined into one column, so we need to rearrange the data. If the…
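One base R way to gather several location columns into a single column is stack(), which returns the values plus a column naming their source. The location and genotype names below are hypothetical and are not taken from the linked file.

```r
# Hypothetical wide data: one yield column per location
wide <- data.frame(
  Genotype = c("G1", "G2", "G3"),
  Loc_A    = c(5.2, 4.8, 5.5),
  Loc_B    = c(6.1, 5.9, 6.4),
  Loc_C    = c(4.3, 4.0, 4.6),
  stringsAsFactors = FALSE
)

loc_cols <- c("Loc_A", "Loc_B", "Loc_C")

# stack() puts all location columns into one 'values' column (column by column)
# and records the source column name in 'ind'
long <- data.frame(Genotype = rep(wide$Genotype, times = length(loc_cols)),
                   stack(wide[loc_cols]))
names(long)[names(long) == "values"] <- "Yield"
names(long)[names(long) == "ind"]    <- "Location"
print(long)
```

Location is now a single independent-variable column, which is the layout most model functions expect.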

The Best Linear Unbiased Estimator (BLUE): Step-by-Step Guide using R (with AllInOne Package)

In this session, I will introduce how to calculate the Best Linear Unbiased Estimator (BLUE). Instead of simply listing formulas, as many websites do when explaining BLUE, this post aims to help readers understand the calculation process with an actual dataset in R. I have the following data.

location  sulphur (kg/ha)  block  yield
Cordoba   0                1      750
Cordoba   24               1      1250
Cordoba   36               1      1550
Cordoba   48               1      1120
Cordoba   0                2      780
Cordoba   24               2      1280
…

How to create separate linear and quadratic regression graphs for each group in the same panel using R?

When we draw regression lines for several groups, they are usually of the same type, such as simple linear regression. Here is an example using yield data for different nitrogen rates per genotype. The regression graph for each group is shown below. I think it would be better to show a quadratic regression line for genotype A. In this case, how can we create separate linear and quadratic regression graphs for each group in the same panel? Data…
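The modelling side can be sketched in base R. Fit a quadratic model to genotype A and a linear model to genotype B, then stack both predicted curves into one data frame that can be drawn in a single panel, for example with ggplot2's geom_line() (not run here). The yields and nitrogen rates are hypothetical.

```r
# Hypothetical yield response to nitrogen rate for two genotypes
df <- data.frame(
  Genotype = rep(c("A", "B"), each = 5),
  N        = rep(c(0, 50, 100, 150, 200), times = 2),
  Yield    = c(3.0, 4.6, 5.5, 5.7, 5.4,    # A: levels off, then declines
               3.2, 3.8, 4.5, 5.1, 5.8)    # B: roughly linear
)

# Quadratic model for genotype A, linear model for genotype B
fit_A <- lm(Yield ~ N + I(N^2), data = subset(df, Genotype == "A"))
fit_B <- lm(Yield ~ N,          data = subset(df, Genotype == "B"))

# Predicted curves on a fine grid, stacked so both fit in one panel
grid <- seq(0, 200, by = 10)
curves <- rbind(
  data.frame(Genotype = "A", N = grid, Fit = predict(fit_A, data.frame(N = grid))),
  data.frame(Genotype = "B", N = grid, Fit = predict(fit_B, data.frame(N = grid)))
)
head(curves)
```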

What is the statistical method for comparing whether the slopes and y-intercepts in a regression model are the same or not (Feat. ANCOVA using R and SAS)?

For a basic understanding of the topic, I recommend first reading the following post: Analysis of Covariance (ANCOVA). I have a dataset as shown below, and I would like to analyze crop yield and height for different fertilizer types (Control, Slow-release, and Fast-release). The experimental design is a Completely Randomized Design (CRD) with 10 replicates.

Rep  Fertilizer  Yield  Height  Fertilizer  Yield  Height  Fertilizer  Yield  Height
1    Control     12.2   45.0    Slow        16.6   63.0    Fast        9.5    52.0
2    Control     12.4   52.0…
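One standard way to test whether slopes and then intercepts differ among groups is to compare nested linear models with anova(). The sketch below uses simulated, hypothetical data, not the post's table, and it does not reproduce the post's R or SAS code.

```r
# Hypothetical data: 3 fertilizer types x 10 replicates
set.seed(7)
fert   <- rep(c("Control", "Slow", "Fast"), each = 10)
height <- round(runif(30, 45, 65), 1)
yield  <- ifelse(fert == "Control", 2, ifelse(fert == "Slow", 4, 1)) +
          0.2 * height + rnorm(30, sd = 0.8)
df <- data.frame(Fertilizer = factor(fert), Height = height, Yield = yield)

# Separate slopes per fertilizer (Height x Fertilizer interaction)
fit_sep <- lm(Yield ~ Height * Fertilizer, data = df)
# Common slope, separate intercepts (classic ANCOVA model)
fit_com <- lm(Yield ~ Height + Fertilizer, data = df)
# Common slope and common intercept
fit_one <- lm(Yield ~ Height, data = df)

# H0: slopes are equal (interaction terms are zero)
slope_test <- anova(fit_com, fit_sep)
print(slope_test)

# If slopes can be treated as equal, H0: intercepts are equal
intercept_test <- anova(fit_one, fit_com)
print(intercept_test)
```

The intercept comparison is only meaningful when the slope test does not reject equal slopes.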

What is the F-ratio in statistics?

Today, I will explain what the F-value means when testing for significance. Let me give you an example. Suppose we want to determine whether yield differs among varieties (A, B, C). There are 12 experimental units in total (3 varieties × 4 replicates). What would happen if there were a significant difference in yield between varieties A and C? If there is a large difference in yield between these varieties, the…
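For the 3 varieties × 4 replicates layout described above, the F-ratio is the between-variety mean square divided by the within-variety (error) mean square, with 2 and 9 degrees of freedom. The sketch computes it by hand on hypothetical yields and checks it against base R's ANOVA table.

```r
# Hypothetical yields: 3 varieties x 4 replicates = 12 experimental units
df <- data.frame(
  Variety = rep(c("A", "B", "C"), each = 4),
  Yield   = c(5.1, 5.4, 4.9, 5.2,  5.6, 5.8, 5.5, 5.9,  6.3, 6.6, 6.1, 6.4)
)

grand_mean <- mean(df$Yield)
grp_mean   <- ave(df$Yield, df$Variety)      # each row's variety mean

# Between-variety (treatment) and within-variety (error) sums of squares
ss_trt <- sum((grp_mean - grand_mean)^2)
ss_err <- sum((df$Yield - grp_mean)^2)

ms_trt  <- ss_trt / (3 - 1)       # df = t - 1 = 2
ms_err  <- ss_err / (12 - 3)      # df = N - t = 9
F_ratio <- ms_trt / ms_err        # large F: variety differences exceed replicate noise
p_value <- pf(F_ratio, 2, 9, lower.tail = FALSE)

# Same result from base R's ANOVA table
aov_tab <- anova(lm(Yield ~ Variety, data = df))
print(aov_tab)
```

If varieties did not differ, both mean squares would estimate the same error variance and F would be near 1.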
