Category: Statistics

[STAT Article] Steps to Calculate Log-Likelihood Prior to AIC and BIC: [Part 2] ANOVA model

In my previous post, I explained how to calculate the Log-Likelihood, AIC, and BIC in a regression model. In this post, I will demonstrate the same concepts, but in the context of an ANOVA model. Here I have one dataset. Let’s say this data represents yield in response to different fertilizer types (Control, Slow, and Fast), and I want to determine the effect of fertilizer type on yield. Therefore, I will perform a one-way ANOVA. Now, I observe that the…

[STAT Article] Steps to Calculate Log-Likelihood Prior to AIC and BIC: [Part 1] regression model

Here I have one dataset. I want to predict grain weight using grain dimension data such as length, width, and area, and identify the best prediction model for estimating grain weight. To that end, I developed the following models, and I’ll calculate the log-likelihood for each one. To do that, I need to know each model equation. Now that I have each model equation, I’ll calculate the log-likelihood. For a linear regression model, the log-likelihood (LL) is defined as: where: n is the…
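
The excerpt breaks off before the formula, so here is a minimal sketch that assumes the standard Gaussian log-likelihood with the maximum-likelihood variance estimate (RSS / n). The observed and fitted values are hypothetical, and counting k = 3 parameters (intercept, slope, and error variance) is one common convention, not necessarily the one used in the post.

```python
import math

def gaussian_log_likelihood(observed, fitted):
    """Log-likelihood of a linear model with normal errors (sigma^2 = RSS / n)."""
    n = len(observed)
    rss = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    return (-n / 2 * math.log(2 * math.pi)
            - n / 2 * math.log(sigma2)
            - rss / (2 * sigma2))

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2LL."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln(n) - 2LL."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical grain weights and model predictions (not from the post).
observed = [30.1, 32.4, 35.0, 36.2, 39.8]
fitted = [30.5, 32.0, 34.6, 36.9, 39.5]
k = 3  # assumption: intercept, slope, and error variance

ll = gaussian_log_likelihood(observed, fitted)
model_aic = aic(ll, k)
model_bic = bic(ll, k, len(observed))
```

When comparing candidate models fitted to the same data, the one with the lower AIC or BIC is preferred.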

[STAT Article] Step-by-Step Guide to Calculating and Analyzing Principal Component Analysis (PCA) by Hand

Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction while preserving as much variability in the data as possible. It transforms the original variables in a dataset into a new set of uncorrelated variables called principal components, ordered by the amount of variance they capture from the original dataset. Here are the steps of PCA. 1. Standardize the Data: Since PCA is affected by the scale of the variables, it often begins with standardizing the…
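
To make the steps concrete, here is a small by-hand sketch for the two-variable case, where the correlation matrix [[1, r], [r, 1]] has the closed-form eigenvalues 1 + |r| and 1 - |r|. The grain length and width values are hypothetical, and the function names are illustrative only.

```python
import math
import statistics

def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def pca_two_variables(x1, x2):
    """PCA on two standardized variables using the closed-form 2x2 solution."""
    z1, z2 = standardize(x1), standardize(x2)
    n = len(z1)
    r = sum(a * b for a, b in zip(z1, z2)) / (n - 1)  # sample correlation
    sign = 1.0 if r >= 0 else -1.0
    eigenvalues = (1 + abs(r), 1 - abs(r))  # variance captured by PC1, PC2
    root2 = math.sqrt(2)
    # Eigenvectors are (1, sign) / sqrt(2) and (1, -sign) / sqrt(2).
    pc1 = [(a + sign * b) / root2 for a, b in zip(z1, z2)]
    pc2 = [(a - sign * b) / root2 for a, b in zip(z1, z2)]
    explained = tuple(ev / 2 for ev in eigenvalues)  # share of total variance
    return eigenvalues, pc1, pc2, explained

# Hypothetical measurements (not from the post).
length_mm = [5.1, 5.8, 6.0, 6.4, 6.9, 7.3]
width_mm = [2.9, 3.1, 3.0, 3.4, 3.5, 3.8]

eigenvalues, pc1, pc2, explained = pca_two_variables(length_mm, width_mm)
```

With more than two variables the eigen-decomposition no longer has such a simple closed form, and a numerical routine is the practical choice.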

Understanding Mean Absolute Error (MAE) in ANOVA: A Step-by-Step Guide to Calculation in Excel

Mean Absolute Error (MAE) is a metric used to measure the accuracy of a model’s predictions. It calculates the average magnitude of the errors in a set of predictions, without considering their direction. In other words, MAE measures the average absolute difference between the actual values and the predicted values. MAE is typically used in the context of regression analysis and prediction error evaluation, rather than in ANOVA (Analysis of Variance), which focuses on comparing the means of different groups…
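
As a rough illustration of the definition above, the sketch below treats each group mean as the predicted value for the observations in that group, which is how a one-way ANOVA model fits the data. The yields and group names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def anova_mae(groups):
    """MAE when each observation is predicted by its own group mean."""
    actual, predicted = [], []
    for values in groups.values():
        group_mean = sum(values) / len(values)
        actual.extend(values)
        predicted.extend([group_mean] * len(values))
    return mean_absolute_error(actual, predicted)

# Hypothetical yields per treatment (not from the post).
yields = {
    "Control": [10.0, 12.0],
    "Slow": [15.0, 17.0],
    "Fast": [8.0, 12.0],
}

mae = anova_mae(yields)  # absolute deviations 1, 1, 1, 1, 2, 2 -> 8 / 6
```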

Explaining Bayes’ Theorem in the Simplest Way

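Lately I have been focusing on Bayesian statistics. So, partly to organize the concepts for myself, I will try to explain Bayes’ theorem as simply as I can. Let’s take a dataset from Kaggle. https://www.kaggle.com/datasets/cameronseamons/electronic-sales-sep2023-sep2024 This dataset analyzes customers’ spending patterns at an electronics store. You can download it after signing up on Kaggle. I will load the data directly with Python code. For reference, I use Google Colab. It looks complicated, but there is nothing difficult about it. Copy and paste the code below into your own Google Colab and change only the file path to match your own Colab. Now, like this…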
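
Before working with the real data, the theorem itself can be sketched in a few lines. The events and probabilities below are hypothetical and are not taken from the Kaggle dataset: A stands for "the customer is a loyalty member" and B for "the customer buys an add-on".

```python
def bayes_posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """P(A | B) from Bayes' theorem, expanding P(B) by total probability."""
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1 - prior_a))
    return likelihood_b_given_a * prior_a / evidence

# Hypothetical probabilities (assumptions for illustration only).
p_member = 0.3                 # P(A): customer is a loyalty member
p_addon_given_member = 0.6     # P(B | A)
p_addon_given_nonmember = 0.2  # P(B | not A)

# P(A | B): probability that a customer who bought an add-on is a member.
posterior = bayes_posterior(p_member, p_addon_given_member,
                            p_addon_given_nonmember)
```

Here P(B) = 0.6 × 0.3 + 0.2 × 0.7 = 0.32, so the posterior is 0.18 / 0.32 = 0.5625, up from the prior of 0.3.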

Stepwise Regression: A Practical Approach for Model Selection using R

Stepwise selection, forward selection, and backward elimination are all methods used when building statistical models, specifically regression models, where the goal is to select the most relevant predictors. In this section, I’ll introduce them one by one. Let’s generate one dataset. This dataset includes grain yield data, along with measurements of stem biomass, grain weight (agw), and grain number (gn). I would now like to determine which variables are the most critical factors in influencing the final grain…

A Practical Approach to Linear Mixed-Effects Modeling in R

A Linear Mixed-Effects Model (LMM) is a statistical model that combines both fixed effects and random effects to analyze data with repeated measurements or hierarchical structure. Let’s break down the key components and concepts of a Linear Mixed-Effects Model: 1) Fixed Effects: 2) Random Effects: 3) Linear Mixed-Effects Model Equation: The general equation of a Linear Mixed-Effects Model can be written as: Y = Xβ + Zb + ε 4) Estimation: In summary, Linear Mixed-Effects Models are a powerful statistical tool…

Understanding Multiple Linear Regression Easily (Part 2: Calculating the Coefficient of Determination Manually)

□ Understanding Multiple Linear Regression Easily (Part 1: Calculating the Regression Equation Manually)

In the previous post, we explained how to manually calculate the regression equation in multiple linear regression analysis. Now, in this post, I will explain how to calculate the coefficient of determination (R²) in multiple linear regression analysis.

No.  Yield (yi)  Time (xi1)  Moisture (xi2)
1    4.3         4           0.2
2    5.5         5           0.2
3    6.8         6           0.2
4    8.0         7           0.2
5    4.0         4           0.3
6    5.2…

Understanding Multiple Linear Regression Easily (Part 1: Calculating the Regression Equation Manually)

In my previous posts, I explained the simple linear regression model in five parts. I recommend reading the following posts first.
□ Simple linear regression (1/5)- correlation and covariance
□ Simple linear regression (2/5)- slope and intercept of linear regression model
□ Simple linear regression (3/5)- standard error of slope and intercept
□ Simple linear regression (4/5)- t value on the slope and intercept
□ Simple linear regression (5/5)- R_squared
In this session, I will explain multiple regression analysis. Multiple regression analysis refers to…

Step-by-Step Guide: Uploading Data and Conducting Statistical Analysis in SAS Studio

SAS Studio is a web version of the SAS program, and it can be used for free. As my current license for the statistical program I’ve been using is about to expire, I was searching for alternatives. Upon discovering SAS Studio, I decided to give it a try. Although I have never used SAS before, I’ve decided to take this opportunity to learn. I will now summarize the very basic learning materials I have covered up to this point. First,…

Easy-to-Understand Guide to Factorial Experiments and Two-Way ANOVA

Today, I’ll try to explain factorial experiments in the simplest way. When you apply multiple different factors simultaneously to derive experimental results, it is called a factorial experiment. The different treatments within the experiment are referred to as ‘factorials.’ In other words, a factorial is a combination of factor levels. [Note 1] A factorial experiment is a research design in which multiple independent variables, also known as factors, are manipulated simultaneously to analyze their combined effects on a dependent variable. The goal of…

Two-Way ANOVA Tutorial Using SAS Studio

I will introduce how to perform a Two-Way ANOVA analysis using SAS Studio. Here is the data that you have available: Upload this Excel file to SAS Studio. After uploading the Excel file to SAS Studio, create a data table named “EXP1” in My Libraries. Then, click on the EXP1 data table. Then, select the icon for generating code located at the top. By doing so, a new tab named “Program 1” will be created, allowing you to generate the…

Quantifying Phenotypic Plasticity of Crops

Phenotypic plasticity refers to the ability of an individual organism, in this case, a plant, to display varying phenotypic traits or characteristics in response to different environmental conditions. These traits can include physical features, physiological processes, and behaviors. Phenotypic plasticity is a crucial adaptive mechanism that allows organisms to optimize their survival and reproduction in varying environments. Crops are particularly reliant on phenotypic plasticity to cope with changes in factors such as light, temperature, moisture, nutrient availability, and other environmental…

Statistical Inference on Binomially Distributed Data

The primary purpose of our experiment is to validate hypotheses regarding the population of the subjects under study. As a result, the experimenter must determine whether to accept or reject these hypotheses based on the experiment’s results. In this context, the method of statistical analysis will vary depending on whether the sample data follows a normal distribution or a binomial distribution. Today, we will introduce statistical testing methods for data that conform to a binomial distribution. Let’s delve into an…

[STAT Article] Easy Guide to Cook’s Distance Calculation Using Excel and R

I have 1,000 data points of measurements of the length (mm) and weight (mg) of wheat grains. With this data, I want to analyze the relationship between the length and weight of the wheat grain to propose an equation model that can predict grain weight. I will draw a graph to visualize the data. If you are new to R, you can copy and paste the following code into your R script window to obtain the same graph as shown…
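
The excerpt stops before the Cook’s distance calculation, so the following is only a sketch using the textbook formula for simple linear regression, D_i = e_i² / (p · MSE) × h_ii / (1 − h_ii)², with p = 2 coefficients. The six length and weight values are hypothetical stand-ins for the 1,000-point dataset.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def cooks_distance(x, y):
    """Cook's distance of every observation in a simple linear regression."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e ** 2 / (p * mse) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverage)]

# Hypothetical grain measurements (not from the post).
length_mm = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
weight_mg = [30.0, 34.0, 37.0, 41.0, 52.0, 48.0]

distances = cooks_distance(length_mm, weight_mg)
```

Observations with a comparatively large distance are the ones whose removal would change the fitted line the most.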

R-Squared Calculation in Linear Regression with Zero Intercept

Previously, I scanned wheat grains to obtain the area of each grain, and then measured the weight of each grain corresponding to its area in order to develop a model equation. The following regression demonstrates the relationship between grain area and weight. # Data download https://www.kaggle.com/datasets/agronomy4future/wheat-grain-area-vs-weight I obtained the equation y = 3.3333x – 13.7155, where y is the grain weight (mg) and x is the grain area (mm²), using both Excel and R. However, this model predicts negative values…

[STAT article] Two-Way ANOVA: An Essential Tool for Understanding Factorial Experiments

A factorial experiment involves the simultaneous manipulation of multiple factors or independent variables (x) to study their effects on a dependent variable (y). The experiment is called factorial because it involves testing multiple factors simultaneously. In factorial experiments, the combination of the different levels of each factor being tested is called a factorial, and each factorial represents a unique combination of these levels. For instance, N0_Genotyp1, N0_Genotyp2, N1_Genotyp1, N1_Genotyp2, etc. are different factorials used to conduct the experiment and analyze…

Augment Models: How to Calculate Contrasts and Analyze Your Data with Excel and R?

I have the following data.

Nitrogen  Sulphur  Rep  Yield
0         0        1    1.0
0         0        2    0.9
0         0        3    0.8
N1        S1       1    1.0
N1        S1       2    1.2
N1        S1       3    1.3
N1        S2       1    2.1
N1        S2       2    2.2
N1        S2       3    2.3
N2        S1       1    1.4
N2        S1       2    1.6
N2        S1       3    1.7
N2        S2       1    2.5
N2        S2       2    2.6
N2        S2       3    2.8

Let’s assume that this data is the result of investigating how…

The Best Linear Unbiased Estimator (BLUE): Step-by-Step Guide using R (with AllInOne Package)

In this session, I will introduce the method of calculating the Best Linear Unbiased Estimator (BLUE). Instead of simply listing formulas as many websites do to explain BLUE, this post aims to help readers understand the process of calculating BLUE with an actual dataset using R. I have the following data.

location  sulphur (kg/ha)  block  yield
Cordoba   0                1      750
Cordoba   24               1      1250
Cordoba   36               1      1550
Cordoba   48               1      1120
Cordoba   0                2      780
Cordoba   24               2      1280…

What is the statistical method for comparing whether the slopes and y-intercepts in a regression model are the same or not (Feat. ANCOVA using R and SAS)?

To gain a basic understanding of the topic, I recommend reading the following post.
□ Analysis of Covariance (ANCOVA)
I have a dataset as shown below, and I would like to analyze crop yield and height based on different fertilizer types (Control, Slow-release, and Fast-release). The experimental design is a Completely Randomized Design (CRD) with 10 replicates.

Rep  Fertilizer  Yield  Height  Fertilizer  Yield  Height  Fertilizer  Yield  Height
1    Control     12.2   45.0    Slow        16.6   63.0    Fast        9.5    52.0
2    Control     12.4   52.0…

What is the F-ratio in statistics?

Today, I will explain the meaning of the F-value in significance testing. Let me give you an example. Suppose we want to determine whether yield differs among varieties (A, B, C). There are 12 experimental units in total (3 varieties x 4 replicates). What would happen if there is a significant difference in yield between varieties A and C? If there is a large difference in yield between these varieties, the…
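
The idea can be sketched numerically as the ratio of between-variety variance to within-variety variance. The layout below follows the example (3 varieties x 4 replicates), but the yield values themselves are hypothetical.

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio: mean square between / mean square within."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(groups) - 1               # 3 varieties -> 2
    df_within = len(all_values) - len(groups)  # 12 units - 3 -> 9
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical yields for 3 varieties x 4 replicates (not from the post).
yields = {
    "A": [40.0, 42.0, 41.0, 43.0],
    "B": [45.0, 47.0, 46.0, 48.0],
    "C": [50.0, 52.0, 51.0, 53.0],
}

f_value = f_ratio(yields)  # MS between = 200 / 2, MS within = 15 / 9
```

In this made-up case the variety means sit far apart relative to the spread within each variety, so the F-ratio is large (60).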

Simple linear regression (5/5)- Coefficient of determination

Here is data for x and y. I would like to perform regression analysis to understand how y changes with x.

n  x   y
1  10  30
2  20  40
3  30  50
4  40  80
5  50  90
6  60  100
7  70  120

I have data for x and y as described above, and want to determine the regression model for this data, where the dependent variable y changes according to the independent variable x, in the form…
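
Using the seven (x, y) pairs listed above, a minimal sketch of the least-squares fit and the coefficient of determination looks like this. The function name is illustrative, and R² is computed as 1 − SSE / SST.

```python
def simple_linear_regression(x, y):
    """Return (intercept, slope, r_squared) for the model y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Total and residual sums of squares.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / sst
    return intercept, slope, r_squared

# Data from the post.
x = [10, 20, 30, 40, 50, 60, 70]
y = [30, 40, 50, 80, 90, 100, 120]

intercept, slope, r_squared = simple_linear_regression(x, y)
```

For this data the slope is 4300 / 2800 ≈ 1.536, the intercept is 80 / 7 ≈ 11.43, and R² ≈ 0.979.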

What is logistic regression (feat. odds, odds ratio and model equation)?

Logistic regression is a type of statistical analysis used to model the relationship between a binary (yes/no) dependent variable and independent variables. The goal of logistic regression is to find a relationship between the independent variables (x) and the probability of a particular outcome for the dependent variable (y). The logistic regression model calculates the probability of a certain outcome by applying a logistic function to the linear combination of the independent variables. Here is one example. Sulphur improves plant…
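
As a small sketch of the model equation described above, the code below applies the logistic function to a linear combination b0 + b1·x and shows that the odds ratio for a one-unit increase in x equals exp(b1). The coefficients are hypothetical and are not estimates from the post’s sulphur example.

```python
import math

def logistic_probability(x, b0, b1):
    """P(y = 1 | x) under the model logit(p) = b0 + b1 * x."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1 - p)

# Hypothetical coefficients (assumptions for illustration only).
B0, B1 = -2.0, 0.5

p_low = logistic_probability(2.0, B0, B1)   # probability at x = 2
p_high = logistic_probability(3.0, B0, B1)  # probability at x = 3

# Odds ratio for a one-unit increase in x; equals exp(B1) under this model.
odds_ratio = odds(p_high) / odds(p_low)
```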

What is split-split-plot design in agronomy research (feat. using R and SAS)?

In my previous post, I explained what a split-plot design and its statistical model are, and how it differs from an RCBD. □ What is split-plot design in agronomy research? I explained that the main difference between a split-plot design and an RCBD is that in a split-plot design the error is divided into two (error a and error b), increasing the significance of the interaction between the main plot and sub-plot. Now our interest lies in cases where we have three factors. In a split-plot design, we typically…

What is split-plot design in agronomy research?

Split-plot design has been widely used, particularly in agronomy research. In a split-plot design, the experimental units are divided into smaller units. Split-plot designs are useful when some factors are difficult or expensive to change or when the levels of the factors cannot be randomized (I’ll explain in detail later). A split-plot design consists of one whole plot and one subplot. The whole plot factor is randomly assigned to the experimental units, while the subplot factor is applied to a smaller…

An Introduction to Residual Analysis in Simple Linear Regression Models

Sample No.  x   y
1           10  30
2           20  40
3           30  50
4           40  80
5           50  90
6           60  100
7           70  120

Here is a dataset that allows us to analyze the relationship between x and y and obtain the model equation, y = β0 + β1x. Although statistical programs can provide us with results in just 10 seconds, it is more important to understand the principles behind the calculations than to simply know how to run the…

What is odds, log odds and logit (feat. Slam Dunk story)?

Odds and logit are the basic concepts needed to understand logistic regression. Today I’ll explain them as simply as I can. Do you know the comic book ‘Slam Dunk’? I’ll explain odds with this story. 1) Odds Now, Shohoku high school is playing games with other high schools in the tournament. In the first round, Shohoku high school won 4 games and lost 6 games out of 10 games. Now the winning odds of Shohoku high school are 4/6 ≈…
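
The arithmetic for the 4 wins and 6 losses in the example can be sketched as follows. The function names are illustrative, and the log odds use the natural logarithm.

```python
import math

def odds_from_counts(wins, losses):
    """Odds of winning: number of wins divided by number of losses."""
    return wins / losses

def logit(p):
    """Log odds (natural log) of a probability p."""
    return math.log(p / (1 - p))

wins, losses = 4, 6                           # from the example in the post
win_probability = wins / (wins + losses)      # 4 / 10 = 0.4
winning_odds = odds_from_counts(wins, losses) # 4 / 6, about 0.667
log_odds = logit(win_probability)             # ln(4 / 6), about -0.405
```

Odds below 1 (and therefore negative log odds) mean that losing is more likely than winning.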

How to analyze quadratic plateau model in R Studio?

Previous post □ How to analyze linear plateau model in R Studio? In my previous post, I explained how to analyze a linear plateau model. I simulated yield data for five different crop varieties with different sulphur applications, and suggested that the optimum sulphur application would be 23.3 kg/ha based on the linear plateau model. This time, I’ll explain how to analyze a quadratic plateau model with the same data using R Studio. 1) Data upload If you run the below code, the…

How to analyze linear plateau model in R Studio?

When we talk about regression, it’s usually about the simple linear regression model, which describes the relationship between two variables. FYI □ Simple linear regression (1/5)- correlation and covariance □ Simple linear regression (2/5)- slope and intercept of linear regression model The linear plateau model is similar to the simple linear model, but it is a segmented model, and it is interested in the critical value (the x-value above which there is no further increase in y), indicating the plateau value (the statistically highest value…
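
The shape of a linear plateau model can be sketched as a piecewise function: a straight line up to the critical value and a constant afterwards. The critical value of 23.3 kg/ha is the figure quoted in the follow-up quadratic plateau post, while the intercept and slope below are hypothetical.

```python
def linear_plateau(x, intercept, slope, critical_x):
    """Linear response up to critical_x, then a flat plateau."""
    return intercept + slope * min(x, critical_x)

INTERCEPT = 1000.0  # hypothetical yield with no sulphur applied
SLOPE = 25.0        # hypothetical yield gain per kg/ha of sulphur
CRITICAL_X = 23.3   # kg/ha, the optimum quoted in the follow-up post

plateau_yield = linear_plateau(CRITICAL_X, INTERCEPT, SLOPE, CRITICAL_X)
yield_at_10 = linear_plateau(10.0, INTERCEPT, SLOPE, CRITICAL_X)
yield_at_40 = linear_plateau(40.0, INTERCEPT, SLOPE, CRITICAL_X)  # on plateau
```

Fitting the model means estimating the intercept, slope, and critical value from data, typically with nonlinear least squares; this sketch only evaluates the function for given parameters.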

Simple linear regression (4/5)- t value on the slope and intercept

Simple Linear Regression Series 1) Simple linear regression (1/5)- correlation and covariance 2) Simple linear regression (2/5)- slope and intercept of linear regression model 3) Simple linear regression (3/5)- standard error of slope and intercept 4) Simple linear regression (4/5)- t value on the slope and intercept 5) Simple linear regression (5/5)- Coefficient of determination In my previous post, I explained how to calculate standard error of slope and intercept in simple linear regression model. Now, I’ll explain how to calculate t…
