Browsed by
Category: Statistics

How to calculate PCA (Principal Components Analysis) by hand?

When you run PCA (Principal Component Analysis), do you simply accept the results that the software provides? If we accept results without question, we never understand the principle behind PCA. In this post, I’ll show how PCA is calculated step by step; after reading it, I believe you will understand the concept of PCA. Here is a dataset. Let’s say we measured kernel number per ear (KN), average kernel weight per ear (KW)…

What is Wilcoxon Rank-Sum Test?

We use the t-test rather than the z-test because we don’t know the population variance (σ²). The sample mean (x̅) is an unbiased estimator of the population mean (μ), so we can estimate μ from x̅ (E(x̅) = μ). What about the variance? If we know σ², the variance of the sample mean is obtained by dividing σ² by the sample size (n): σ²_x̄ = σ²/n. However, if we don’t know σ², we must use the sample standard deviation (s) instead: s²_x̄ = s²/n…
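
As a rough illustration of the last formula, the sketch below computes the estimated variance of the sample mean (s²/n) and its square root, the standard error. The data values and variable names are hypothetical and do not come from the post.

```python
import math
import statistics

# Hypothetical sample (illustrative values only, not from the post).
sample = [12.1, 11.4, 13.0, 12.6, 11.9]
n = len(sample)

# Sample variance s^2 (statistics.variance divides by n - 1).
s2 = statistics.variance(sample)

# Estimated variance of the sample mean: s^2 / n.
var_of_mean = s2 / n

# Standard error of the mean: sqrt(s^2 / n), i.e. s / sqrt(n).
std_error = math.sqrt(var_of_mean)

print(f"s^2 = {s2:.4f}, s^2/n = {var_of_mean:.4f}, SE = {std_error:.4f}")
```

The standard error is the quantity that appears in the denominator of the one-sample t statistic when σ² is unknown.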

[STAT Article] Mastering RMSE Calculation with Excel and R: A Comprehensive Guide

When running statistical programs, you might come across RMSE (Root Mean Square Error). For instance, the table below shows SAS output in which the RMSE is approximately 2.72. How is RMSE calculated? The equation for RMSE is shown below. First, calculate the difference between the estimated and observed values, (ŷi – yi), and square it: (ŷi – yi)². Second, calculate the sum of squares: Σ(ŷi – yi)². Third, divide the…
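
The steps above can be sketched in a few lines. The excerpt is cut off before it names the divisor, so the divisor here is an assumption: the function divides by n by default (the textbook definition) and accepts a hypothetical `n_params` argument for cases where a degrees-of-freedom divisor (n minus the number of estimated parameters) is wanted. The data values are made up for illustration.

```python
import math

def rmse(observed, predicted, n_params=0):
    """Root mean square error.

    The divisor is len(observed) - n_params. n_params=0 gives the
    textbook definition; a positive value gives a degrees-of-freedom
    divisor (an assumption, since the excerpt does not state which).
    """
    # Steps 1 and 2: squared differences, then their sum.
    sum_sq = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # Step 3: divide, then take the square root.
    return math.sqrt(sum_sq / (len(observed) - n_params))

# Hypothetical observed and estimated values.
observed = [3, 5, 7, 9]
predicted = [2, 5, 8, 11]

print(rmse(observed, predicted))              # divisor n
print(rmse(observed, predicted, n_params=2))  # divisor n - 2
```

The two calls differ only in the divisor, which is often why an RMSE computed by hand does not match the value a statistics package reports.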

What is Finlay-Wilkinson Regression Model?

A genotype’s performance depends on its environment. One genotype may respond strongly to certain environmental conditions, while another responds weakly to the same conditions. Genotypes that respond strongly under better conditions are considered adaptable to the environment. Adaptability refers to the flexibility of a genotype in its response to improved environments. A genotype that performs well across a wide range of environmental conditions is considered to have broad adaptation. To achieve this…

What is a nested model in statistics?

A tomato farmer is growing tomato seedlings and suddenly wants to investigate the amount of calcium in the leaves. He selects four seedlings, randomly chooses three leaves from each seedling, and measures the calcium content of each leaf twice. This experimental design is shown in the table below. y111 means the amount of calcium in the 1st seedling – 1st leaf – 1st replicate. Then, y432 will mean the outcome of…
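
To make the indexing concrete, here is a minimal sketch that enumerates the observation labels for this design (4 seedlings, 3 leaves per seedling, 2 measurements per leaf). The label format follows the pattern given for y111; the variable names are illustrative assumptions.

```python
from itertools import product

N_SEEDLINGS, N_LEAVES, N_REPS = 4, 3, 2

# Each label is y<seedling><leaf><replicate>. Leaves are nested within
# seedlings: leaf 1 of seedling 1 and leaf 1 of seedling 2 are
# different physical leaves that only share an index.
labels = [
    f"y{seedling}{leaf}{rep}"
    for seedling, leaf, rep in product(
        range(1, N_SEEDLINGS + 1),
        range(1, N_LEAVES + 1),
        range(1, N_REPS + 1),
    )
]

print(len(labels))   # total number of measurements
print(labels[:6])    # all measurements taken on seedling 1
```

Because the leaf index restarts within every seedling, leaf is a nested factor rather than a crossed one, which is what distinguishes this layout from a factorial design.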

In agronomy research, how to estimate a missing value in collected field data?

Here is an example of missing data. There are four different cultivars and I would like to determine if there is a difference in yield among them. I have five replicates as blocks, so the experimental design is a Randomized Complete Block Design (RCBD), and we can analyze the data using one-way ANOVA with blocks. You can download the data using R by copying and pasting the code below into your R script. After running the code, an Excel file…

How to calculate pooled variance when including block in the experimental design?

New equation I suggest!! You might already be familiar with how to calculate pooled variance. This post is about pooled variance when blocks exist. If you run a statistics program, you’ll simply obtain the pooled variance (also known as MSE), but you’ll never understand the concept if you only run software. Here is an example dataset. Let’s say these are yield data. Cultivar A: 120, 130, 110 (mean: 120; variance (s²): 100). Cultivar B: 70, 90, 50 (mean:…
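
As a baseline, the sketch below computes the ordinary pooled variance for the two cultivars, without any block term. This is the standard textbook formula, not the block-adjusted equation the post proposes (which this excerpt does not show). The function name is an illustrative assumption.

```python
import statistics

# Yield data from the excerpt.
cultivar_a = [120, 130, 110]
cultivar_b = [70, 90, 50]

def pooled_variance(*groups):
    """Ordinary pooled variance: each group's sample variance weighted
    by its degrees of freedom (n_i - 1). No block effect is removed."""
    numerator = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return numerator / denominator

print(statistics.variance(cultivar_a))          # matches the s² of 100 quoted above
print(pooled_variance(cultivar_a, cultivar_b))
```

With equal group sizes, this pooled value is simply the average of the group variances, and it equals the MSE of a one-way ANOVA on the same data.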

How to calculate the optimum sample size for 2-Sample t test (using R and G*Power program)?

When we set up an experimental design, deciding the sample size is not easy because we don’t know exactly how many samples the experiment requires. Of course, the more the better. Eventually, however, we need to choose an appropriate sample size given our time and resources. For example, if we want to know the average height of students at the University of Guelph, the best way is to measure every student’s height. According to Wikipedia, total number…

What is ANCOVA (2/3)? How to interpret Parameter Estimates

Previous post: What is ANCOVA (1/3)? The basic concept. In the previous post, I explained how to interpret the ANCOVA table (red box in the tables below). In this post, I’ll explain how to interpret the Parameter Estimates (blue box in the table below). Let’s check the ‘Parameter Estimates’ table. Most statistical programs set one level of an experimental factor to “zero” and estimate the other levels relative to that level. This reference-level approach is used in the General Linear Model (GLM). If you…

What is ANCOVA (1/3)? The basic concept

Today, I will explain Analysis of Covariance (ANCOVA). ANCOVA is a statistical technique that includes covariates: additional variables that may affect the dependent variable (y) besides the independent variable (x). I have the dataset shown below, and I would like to analyze crop yield by fertilizer type (Control, Slow-release, and Fast-release). The experimental design is a Completely Randomized Design (CRD) with 10 replicates. Table columns: Rep, Fertilizer, Yield…

What is ANCOVA (3/3)? The common slope and adjusted mean

Previous posts: What is ANCOVA (1/3)? The basic concept; What is ANCOVA (2/3)? How to interpret Parameter Estimates. In the previous posts, I explained the basic concept of ANCOVA and how to interpret the statistical results. Now I will discuss the most important concept, which is not commonly mentioned. The statistical program provided the following models in the previous posts. Control: y = 9.53 + 0.0558x; Fast: y = 6.39 + 0.0558x; Slow: y = 13.10 + 0.0558x. Then, if we apply the mean…
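
The three models share one slope and differ only in their intercepts, which the sketch below makes explicit. The intercepts and slope are the ones quoted above; the covariate value `x_ref` is a hypothetical placeholder, because the excerpt is cut off before the actual mean is given.

```python
# Coefficients quoted in the excerpt: one common slope, three intercepts.
COMMON_SLOPE = 0.0558
INTERCEPTS = {"Control": 9.53, "Fast": 6.39, "Slow": 13.10}

def predict(group, x):
    """Predicted y for a fertilizer group at covariate value x."""
    return INTERCEPTS[group] + COMMON_SLOPE * x

# Hypothetical covariate value (NOT the mean from the post).
x_ref = 100.0

# Evaluating every group at the same x gives directly comparable values.
adjusted = {group: predict(group, x_ref) for group in INTERCEPTS}

for group, value in adjusted.items():
    print(f"{group}: {value:.2f}")
```

Because the slope is common, the gap between any two groups is the same at every x and equals the difference of their intercepts.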

[Data Column] How does the coefficient of determination change when the intercept of a regression model is set to 0?

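Previously, to obtain the area of wheat grains, I image-scanned the grains and then measured the weight corresponding to the area of each grain. The following regression analysis shows the relationship between wheat grain area and weight. # Data download https://www.kaggle.com/datasets/agronomy4future/wheat-grain-area-vs-weight I will load the above data into R from my Github and then run the statistical analysis. I obtained the regression model y = 3.3333x - 13.7155 using Excel and R, where y is wheat grain weight (mg) and x is wheat grain area (mm²). However, in this model, from some point as x becomes smaller, y…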

Let’s explain the coefficient of determination (R²) in simple linear regression in the easiest way

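Here is a dataset of x and y. I want to know how y changes as x changes, so I will run a regression analysis. (x, y): (10, 30), (20, 40), (30, 50), (40, 80), (50, 90), (60, 100), (70, 120). I use SAS. First, I create the data table, and then I run a simple linear regression. The statistical program returned the regression equation y = 11.429 + 1.5357x. That is, when x increases by 1, y increases by 1.5357. And this regression model’s…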
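
The regression equation can be checked by hand with the least-squares formulas. The sketch below uses the seven data pairs from the excerpt, assuming the flattened table reads as row number, x, y. It also computes R² as 1 - SS_residual / SS_total, which is the standard definition and not necessarily the route the post takes.

```python
# Data from the excerpt, read as (x, y) pairs.
x = [10, 20, 30, 40, 50, 60, 70]
y = [30, 40, 50, 80, 90, 100, 120]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Coefficient of determination: share of total variation explained.
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ss_resid / ss_total

print(f"y = {intercept:.3f} + {slope:.4f}x")  # prints: y = 11.429 + 1.5357x
print(f"R^2 = {r_squared:.4f}")
```

The fitted coefficients reproduce the equation reported by SAS, which supports reading the table as the pairs listed above.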

How to conduct Least Significant difference (LSD) test using R STUDIO?

For comparing means among treatments, the Least Significant Difference (LSD) test is one of the most common methods. Today I’ll introduce the LSD test using RStudio. Here is a dataset on the yield of CV1 in response to four nitrogen fertilizer levels (N0, N1, N2, N3). First, let’s check the mean for each nitrogen level. Yield appears to differ among nitrogen levels, but we need to confirm this statistically. First, I’ll run a one-way ANOVA…
