[STAT Article] Step-by-Step Guide to Calculating and Analyzing Principal Component Analysis (PCA) by Hand

Principal Component Analysis (PCA) is a statistical technique used to reduce dimensionality while preserving as much of the variability in the data as possible. It transforms the original variables in a dataset into a new set of uncorrelated variables called principal components, ordered by the amount of variance they capture from the original dataset. Here are the steps of PCA. 1. Standardize the Data: Since PCA is affected by the scale of the variables, it often begins with standardizing the…
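
This is a minimal Python sketch of the hand calculation, using only the standard library. It covers the simplest case of two variables. First it standardizes the data. Next it builds the covariance matrix of the standardized data, which equals the correlation matrix. Then it solves for the eigenvalues and eigenvectors, and finally it projects the data onto the components. The closed-form eigen solution works only for a 2×2 matrix. The function name pca_2d is a hypothetical choice. With more variables you would use a numerical solver such as numpy.linalg.eigh instead.

```python
import math
from statistics import mean, stdev


def standardize(values):
    """Step 1: convert values to z-scores (sample SD, n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def covariance(x, y):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pca_2d(x, y):
    """Hand-style PCA for two variables (illustrative sketch)."""
    zx, zy = standardize(x), standardize(y)

    # Step 2: covariance matrix of the standardized data, [[a, b], [b, c]]
    a, b, c = covariance(zx, zx), covariance(zx, zy), covariance(zy, zy)

    # Step 3: eigenvalues of a symmetric 2x2 matrix (largest first)
    centre = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [centre + radius, centre - radius]

    # Unit eigenvectors: (b, lambda - a) solves (A - lambda*I)v = 0 when b != 0
    if abs(b) < 1e-12:
        # Uncorrelated variables: the original axes are already the components
        vectors = [(1.0, 0.0), (0.0, 1.0)]
    else:
        vectors = []
        for lam in eigenvalues:
            norm = math.hypot(b, lam - a)
            vectors.append((b / norm, (lam - a) / norm))

    # Proportion of total variance captured by each component
    total = sum(eigenvalues)
    explained = [lam / total for lam in eigenvalues]

    # Step 4: principal component scores (projection of each observation)
    scores = [[vx * p + vy * q for p, q in zip(zx, zy)] for vx, vy in vectors]
    return eigenvalues, vectors, explained, scores


# Hypothetical example data
eigenvalues, vectors, explained, scores = pca_2d([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(eigenvalues, explained)
```

Because the data are standardized first, the eigenvalues always sum to the number of variables (here 2). Each entry of `explained` is therefore the share of total variance captured by that component.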

Understanding Mean Absolute Error (MAE) in ANOVA: A Step-by-Step Guide to Calculation in Excel

Mean Absolute Error (MAE) is a metric used to measure the accuracy of a model’s predictions. It calculates the average magnitude of the errors in a set of predictions, without considering their direction. In other words, MAE measures the average absolute difference between the actual values and the predicted values. MAE is typically used in the context of regression analysis and prediction error evaluation, rather than in ANOVA (Analysis of Variance), which focuses on comparing the means of different groups….
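
The definition translates directly into a few lines of code. This is a minimal Python sketch. The function name and the actual/predicted values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|; the sign of each error is ignored."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical example values
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]
print(mean_absolute_error(actual, predicted))  # → 0.875
```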

Practices in Data Normalization using normtools() in R

■ [R package] Normalization Methods for Data Scaling (Feat. normtools)
In my previous post, I introduced the R package normtools(), which I developed to normalize data using various methods. This time, I’ll demonstrate how to use normtools() for data normalization.
1. Data upload: This dataset includes kernel number (KN), average kernel weight (AGW), and grain yield (GY) for different corn varieties across various years, populations, and locations.
2. Data normalization: This is the normtools() package. First, I’ll…

Sorghum panicle damage

Damage to sorghum grain can result from a variety of causes, including environmental, biological, and mechanical factors. Here are some common causes:
1. Excessive Rainfall and Humidity
2. Pest Infestation
3. Temperature Stress
4. Mechanical Damage During Harvest
5. Soil Conditions
6. Delayed Harvest
7. Post-Harvest Factors
To minimize sorghum grain damage, it is crucial to manage environmental conditions, ensure proper timing of harvest, and implement effective pest control and storage techniques.
■ References
□ Rain devastates Downs sorghum…

How to install Llama 3 on your PC?

Llama 3, or Large Language Model Meta AI 3, is an advanced iteration of Meta’s language models, designed to support a wide array of natural language processing tasks. It uses deep learning and transformer architectures, providing improved performance in text generation, comprehension, and contextual awareness. You can install Llama 3 on your PC. 1. Visit ollama.com and click the Download button. Select your OS and download the installer. https://ollama.com After downloading, run the OllamaSetup file.…

[R package] Normalization Methods for Data Scaling (Feat. normtools)

■ [Data article] Data Normalization Techniques: Excel and R as the Initial Steps in Machine Learning
In my previous post, I explained how to normalize data using various methods and demonstrated how to perform the calculations for each method. To simplify these calculations, I recently developed an R package that easily generates normalized data.
1. Install the normtools() package
2. Basic code format
3. Practice with an actual dataset (data upload)
4. Normalize data
4.1. Z-score normalization
4.2. Robust scaling
4.3.…
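
To show the arithmetic behind the first two methods in the outline, here is a minimal Python sketch of z-score normalization and robust scaling. It is not the normtools() API, and the function names are hypothetical. The z-score uses the sample standard deviation, as R's scale() does. For robust scaling, the quartiles come from statistics.quantiles(method="inclusive"). That quartile method is an assumption, and normtools() may compute quartiles differently.

```python
from statistics import mean, median, quantiles, stdev


def z_score_normalize(values):
    """(x - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def robust_scale(values):
    """(x - median) / IQR, where IQR = Q3 - Q1."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(values)
    return [(v - med) / iqr for v in values]


data = [1, 2, 3, 4, 5, 100]  # hypothetical values with one outlier
print(z_score_normalize(data))
print(robust_scale(data))
```

The median and IQR are barely moved by the single extreme value. As a result, robust scaling keeps the ordinary values spread out, while the z-scores of those values get compressed by the inflated standard deviation.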

[Coding Education Platform Recommendation] Code Tree (코드트리)

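As a non-CS major, I often run into my limits when studying coding. I taught myself to code and use the programming I need for my work, but I’ve always wondered, “Why does this code work this way?” So I’ve taken online courses on various education platforms, but most of them are aimed at CS majors, and non-majors like me often struggle to keep up. Recently, I found a coding education platform that is very helpful for non-majors like me. It’s called Code Tree. Today, I’d like to introduce this platform. For reference…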

Explaining Bayes’ Theorem as Simply as Possible

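Lately, I’ve been focusing on Bayesian statistics. So, partly to organize the concepts for myself, I’ll try to explain Bayes’ theorem as simply as possible. Let’s get a dataset from Kaggle. https://www.kaggle.com/datasets/cameronseamons/electronic-sales-sep2023-sep2024 This dataset covers customer purchasing behavior at an electronics store. You can download it after signing up for Kaggle. I’ll load the data directly with Python code. For reference, I use Google Colab. It looks complicated, but there’s nothing difficult about it. Just copy and paste the code below into your own Google Colab and change only the file path to match your Colab environment. Now, like this…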
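
Before working with the Kaggle data, note that the theorem itself fits in a few lines. This Python sketch computes P(A|B) = P(B|A)·P(A) / P(B), with P(B) expanded by the law of total probability. The function name, the event descriptions, and the probabilities in the example are hypothetical. They are not values from the electronics-sales dataset.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) using Bayes' theorem.

    P(B) comes from the law of total probability:
    P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical example: A = "customer belongs to some group",
# B = "customer buys a given product"
posterior = bayes_posterior(prior_a=0.2, p_b_given_a=0.9, p_b_given_not_a=0.1)
print(round(posterior, 4))
```

Even though 90% of group A buys the product, only about 69% of buyers belong to A. This is because A is a minority (prior 0.2), and some non-members also buy.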

R GIS: Interpolating and Plotting Corn Grain Yield Data

■ Python GIS: Interpolating and Plotting Corn Grain Yield Data
In my previous post, I explained how to create a GIS map using Python. Today, I’ll show how to create the same GIS map using R. First, let’s install all the required packages. Then, I’ll upload a dataset for practice. Next, I’ll extract the latitude, longitude, and y (output) columns and interpolate the data. Finally, I’ll create a GIS map using ggplot().
■ Full code
If you copy and paste the…

Graphing Normal Distributions with Varied Variances

I want to create a normal distribution graph with a specific variance. First, I need to create the data. I’ll generate data with a mean of 100 and a variance of 100 (i.e., a standard deviation of 10). It’s also important to set the range of x values: I’ll use a range of 6σ, and the dataset will contain 1,000 rows. Then, I’ll create the normal distribution graph. These are graphs with different variances, ranging from 1σ to…
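
The same setup can be sketched in Python with the standard library's statistics.NormalDist. Here, "a range of 6σ" is interpreted as μ ± 3σ, a total width of 6σ; that reading is an assumption. The function name normal_curve is hypothetical. Plotting (for example with matplotlib) is left out to keep the example dependency-free.

```python
from statistics import NormalDist


def normal_curve(mean=100.0, variance=100.0, n_points=1000, width_sd=6.0):
    """Evenly spaced x values spanning width_sd standard deviations, plus their PDF values."""
    sd = variance ** 0.5  # variance 100 -> standard deviation 10
    dist = NormalDist(mu=mean, sigma=sd)
    low = mean - (width_sd / 2) * sd
    high = mean + (width_sd / 2) * sd
    step = (high - low) / (n_points - 1)
    xs = [low + i * step for i in range(n_points)]
    return xs, [dist.pdf(x) for x in xs]


xs, density = normal_curve()
print(len(xs), xs[0], round(xs[-1], 6))

# Curves with other variances (same mean) for comparison
curves = {v: normal_curve(variance=v) for v in (25.0, 100.0, 400.0)}
```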

[R package] Calculation for Growing Degree Days (GDDs, °Cd)

Growing Degree Days (GDDs) are a measure of heat accumulation used to predict crop development rates. GDDs provide a simple model for estimating the growth and development of plants, especially crops, from daily temperature. To calculate GDDs, you first need to identify the base temperature for each crop. The base temperature is the temperature below which crop growth is minimal or stops, and it varies by crop. For example,…
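
A common simple form of the calculation is the averaging method. Under it, daily GDD = max((Tmax + Tmin) / 2 − Tbase, 0), summed over the season. The Python sketch below implements that form. The function names and temperatures are hypothetical, and the 10 °C base is only an illustrative value. The R package described in the post may use a different variant; some methods, for example, also cap temperatures at an upper threshold.

```python
def daily_gdd(t_max, t_min, t_base):
    """Daily GDD (°C·d) by the simple averaging method, floored at zero."""
    return max((t_max + t_min) / 2 - t_base, 0.0)


def cumulative_gdd(t_max_series, t_min_series, t_base):
    """Running total of daily GDD across the season."""
    total = 0.0
    running = []
    for t_max, t_min in zip(t_max_series, t_min_series):
        total += daily_gdd(t_max, t_min, t_base)
        running.append(total)
    return running


# Hypothetical daily maximum and minimum temperatures (°C)
t_max = [28.0, 30.0, 12.0, 25.0]
t_min = [16.0, 20.0, 4.0, 15.0]
print(cumulative_gdd(t_max, t_min, t_base=10.0))  # → [12.0, 27.0, 27.0, 37.0]
```

The third day averages 8 °C, below the base temperature, so it adds nothing to the running total.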

[R package] Prediction of Grain Weight and Area in Bread Wheat (feat. kimindex)

These days, image analysis equipment can easily provide grain area measurements (mm²), and the large datasets this equipment produces almost instantly offer more insight into wheat grains. While grain weight can be a good indicator of wheat yield, obtaining grain weight data is challenging with the available equipment. Currently, average grain weight is calculated from thousand kernel weight (TKW), a process that is time-consuming and labor-intensive. Therefore, predicting wheat grain weight from grain area would allow us to…

[R package] Probability Distribution and Z-Score Calculation Function (feat. probdistz)

■ Introduction
■ What is Probability Density Function (PDF) and Cumulative Distribution Function (CDF): How to calculate using Excel and R?
In my previous post, I explained what the Probability Density Function (PDF) and the Cumulative Distribution Function (CDF) are. I also explained the formula for the PDF and demonstrated how to calculate it manually in Excel. Additionally, I mentioned the Excel function that performs the same PDF calculation. I then introduced how to create a probability distribution…
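
As a compact reference, the z-score, PDF, and CDF of a normal distribution can be written out in Python using only the math module. This is a minimal sketch, not the probdistz package's API, and the function names are hypothetical. For comparison, Excel's NORM.DIST(x, mean, standard_dev, FALSE) returns the PDF, and NORM.DIST(x, mean, standard_dev, TRUE) returns the CDF.

```python
import math


def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd


def normal_pdf(x, mean, sd):
    """f(x) = exp(-z^2 / 2) / (sd * sqrt(2*pi))."""
    z = z_score(x, mean, sd)
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def normal_cdf(x, mean, sd):
    """F(x) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + math.erf(z_score(x, mean, sd) / math.sqrt(2)))


# Hypothetical example: mean 100, SD 10, x = 115
print(z_score(115, 100, 10), normal_pdf(115, 100, 10), normal_cdf(115, 100, 10))
```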

[R package] Finlay-Wilkinson Regression model (feat. fwrmodel)

■ What is Finlay-Wilkinson Regression Model?
In my previous post, I introduced the Finlay-Wilkinson regression model and showed how to calculate adaptability (or stability). In fact, adaptability and stability are opposite concepts derived from the same data. Have you ever heard of heritability (h²)? Heritability is a key concept in genetics and breeding that measures how much of the variation in a trait within a population is due to genetic differences among individuals. In other words, it quantifies the proportion of phenotypic variation…
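
In a common formulation of the Finlay-Wilkinson approach, each genotype's yield is regressed on an environmental index, usually the mean yield of all genotypes in each environment. The slope is interpreted as follows:
- A slope near 1 indicates an average response.
- A slope above 1 indicates a genotype that responds more strongly to better environments.
- A slope below 1 indicates a more stable genotype.

The Python sketch below shows that calculation under those assumptions. The function name and yields are hypothetical, and this is not the fwrmodel package's interface.

```python
from statistics import mean


def fw_slopes(yields):
    """Finlay-Wilkinson slopes.

    yields: dict mapping genotype -> list of yields, one per environment
    (all lists in the same environment order).
    """
    genotypes = list(yields)
    n_env = len(yields[genotypes[0]])
    # Environmental index: mean yield of all genotypes in each environment
    env_index = [mean(yields[g][e] for g in genotypes) for e in range(n_env)]
    x_bar = mean(env_index)
    sxx = sum((x - x_bar) ** 2 for x in env_index)

    slopes = {}
    for g in genotypes:
        y_bar = mean(yields[g])
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(env_index, yields[g]))
        slopes[g] = sxy / sxx  # least-squares slope of genotype yield on the index
    return env_index, slopes


# Hypothetical grain yields (Mg/ha) for three genotypes in four environments
data = {
    "G1": [3.1, 4.5, 6.0, 7.2],
    "G2": [3.8, 4.6, 5.3, 5.9],
    "G3": [2.9, 4.9, 6.4, 8.1],
}
index, b = fw_slopes(data)
print(index)
print(b)
```

Because the index is the mean of all genotypes, the slopes always average to 1 across genotypes. This makes "above 1" and "below 1" natural reference points.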

Python GIS: Interpolating and Plotting Corn Grain Yield Data

I have corn yield data (Mg/ha) that I want to visualize. First, let’s upload the data. Next, I’ll create the yield distribution data. We can see that grain yield generally varies from 10 to 30 Mg/ha, with some outliers. I’ll create a yield map to visualize this variation. https://github.com/agronomy4future/python_code/blob/main/Python_GIS_Interpolating_and_Plotting_Corn_Grain_Yield_Data.ipynb If you have ArcGIS, you can create a more detailed GIS map.
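
To illustrate the interpolation idea without dependencies, here is inverse distance weighting (IDW). IDW is one common way to estimate yield at unsampled grid points. It is a stand-in chosen for this sketch, not necessarily the method used in the linked notebook. The sketch also treats latitude and longitude as planar x/y coordinates, which is reasonable only over a small field. The points and function names are hypothetical.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points: list of (x, y, value) observations.
    """
    numerator = denominator = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value  # exact hit on an observation
        w = 1.0 / d ** power
        numerator += w * value
        denominator += w
    return numerator / denominator


def idw_grid(points, xs, ys, power=2.0):
    """Interpolated values on a grid: one row per y, one column per x."""
    return [[idw(points, x, y, power) for x in xs] for y in ys]


# Hypothetical yield observations (x, y, Mg/ha)
obs = [(0.0, 0.0, 12.0), (10.0, 0.0, 18.0), (0.0, 10.0, 15.0), (10.0, 10.0, 25.0)]
grid = idw_grid(obs, xs=[0.0, 5.0, 10.0], ys=[0.0, 5.0, 10.0])
for row in grid:
    print([round(v, 2) for v in row])
```

A higher `power` makes each estimate depend more on the nearest observations. The resulting grid can then be passed to any plotting library to draw the map.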

Machine Learning: Predicting Values with Multiple Models- Part II

In my previous post, I predicted grain weight from grain length and width using Random Forest.
■ Machine Learning: Predicting Values with Multiple Models- Part I
Now, my next question is how model accuracy changes when grain area and genotype are added. If you followed my previous post closely, you should be able to understand the code below.
■ Data upload
■ Data Splitting
Unlike the previous data, I have now added genotype and grain area to the model.
■ Machine…
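
The "Data Splitting" step usually means holding out part of the data to test the model on observations it has not seen. In practice, scikit-learn's sklearn.model_selection.train_test_split does this. As a minimal, dependency-free sketch, here is a shuffled 80/20 split. The 20% test fraction, the fixed seed, the function name, and the rows are assumptions, not settings or data from the post.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical rows: (genotype, grain_area_mm2, grain_weight_mg)
rows = [("G%d" % (i % 4), 10.0 + i * 0.1, 30.0 + i * 0.3) for i in range(50)]
train, test = split_rows(rows)
print(len(train), len(test))
```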

Machine Learning: Predicting Values with Multiple Models- Part I

Machine learning (ML) is a field of artificial intelligence (AI) that enables computers to learn from and make predictions or decisions based on data. Rather than being explicitly programmed to perform a specific task, ML algorithms use data to identify patterns and make inferences or predictions. Machine learning can be divided into supervised and unsupervised learning. In supervised learning, the model is trained on labeled data, which means the input data is paired with the correct output, and it can…

Machine Learning: How to Perform Classification with Different Models?

Machine learning (ML) is a field of artificial intelligence (AI) that enables computers to learn from and make predictions or decisions based on data. Rather than being explicitly programmed to perform a specific task, ML algorithms use data to identify patterns and make inferences or predictions. What is Classification in Machine Learning? Classification is a type of supervised learning where the goal is to categorize data into predefined classes. For example, classifying emails as “spam” or “not spam.” Different models…
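
To show what categorizing data into predefined classes looks like in code, here is a standard-library Python sketch of k-nearest neighbours, one of the simplest classification models. The two numeric features and the "spam"/"not spam" training examples are hypothetical, and k = 3 is an arbitrary choice.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict a label by majority vote among the k nearest training points.

    training: list of (feature_tuple, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (number of links, number of exclamation marks)
training = [
    ((8, 6), "spam"), ((7, 9), "spam"), ((9, 7), "spam"),
    ((1, 0), "not spam"), ((0, 1), "not spam"), ((2, 1), "not spam"),
]
print(knn_predict(training, (8, 8)))
print(knn_predict(training, (1, 1)))
```

Other classifiers, such as decision trees, random forests, and logistic regression, learn the class boundary differently. All of them, however, map a feature vector to one of the predefined labels.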

Sorghum grain weight in response to assimilate availability

Sorghum panicle de-graining is an experimental technique used to study grain size and assimilate availability. De-graining is used to artificially increase assimilate availability to the remaining grains by removing a portion of the panicle. Typically, researchers remove the top half of selected panicles at anthesis (flowering stage). This is done before significant grain development has occurred. Removing part of the panicle generally results in an increase in grain weight for the remaining grains. This technique helps isolate genetic effects associated with…

Is no-tillage farming beneficial to farmers?

This year, in the no-tillage field, the farmer has sprayed herbicide three times since planting. These days, carbon farming is a hot trend. Its approaches aim to mitigate greenhouse gas (GHG) emissions, and no-tillage is promoted as beneficial for carbon sequestration. Although the academic trend consistently indicates that carbon farming is important, in reality, farmers have been using more herbicides to suppress weeds. Is carbon farming good for farmers and real farming systems? No-tillage might mitigate GHG emissions, but more herbicides are…

Soybean Growing Stage at R1 @ Illinois, Champaign (26 June 2024)

Soybean growth is divided into two main phases: Vegetative (V) stages and Reproductive (R) stages. The vegetative stages are characterized by leaf and node development, while the reproductive stages begin with flowering and include pod development, seed development, and plant maturation. The reproductive stage R1, also known as the beginning bloom stage, is defined as having one open flower at any node on the main stem. This marks the start of the reproductive phase, even if the plant continues to produce new…

Urea Application in Sorghum Field @ Champaign, Illinois (25 June 2024)

■ Unit conversion
□ Area (read across: 1 row unit equals the listed amount of each column unit)
          Acre          ha            m2          ft2
Acre      1             0.404686      4,046.86    43,560
ha        2.47105       1             10,000      107,639
m2        0.000247105   0.0001        1           10.7639
ft2       0.000022956   0.00000929    0.092903    1

□ Weight
          lbs           kg
lbs       1             0.453592
kg        2.20462       1

bushel [corn]      56 lbs = 56 * 0.453592 = 25.4 kg
bushel [wheat]     60 lbs = 60 * 0.453592 = 27.2 kg
bushel [soybean]   60 lbs = 60 * 0.453592 = 27.2 kg

My target nitrogen application rate is 100 lbs N/acre in sorghum, and my plot size is…
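
The conversion factors above can be chained to turn a target N rate into the amount of fertilizer needed per plot. This Python sketch assumes urea contains 46% N, its usual grade (46-0-0). The function names are hypothetical. The plot area in the example is also hypothetical, since the actual plot size is not given here.

```python
LB_TO_KG = 0.453592     # 1 lb = 0.453592 kg
ACRE_TO_HA = 0.404686   # 1 acre = 0.404686 ha
UREA_N_FRACTION = 0.46  # assumption: urea is 46% N


def lbs_per_acre_to_kg_per_ha(rate_lbs_per_acre):
    """Convert an application rate from lbs/acre to kg/ha."""
    return rate_lbs_per_acre * LB_TO_KG / ACRE_TO_HA


def urea_per_plot_kg(n_rate_lbs_per_acre, plot_area_m2):
    """Kilograms of urea needed to supply the target N rate on one plot."""
    n_kg_per_ha = lbs_per_acre_to_kg_per_ha(n_rate_lbs_per_acre)
    urea_kg_per_ha = n_kg_per_ha / UREA_N_FRACTION
    return urea_kg_per_ha * plot_area_m2 / 10_000  # 1 ha = 10,000 m2


print(round(lbs_per_acre_to_kg_per_ha(100), 2))  # → 112.08 (kg N/ha)
print(round(urea_per_plot_kg(100, plot_area_m2=30.0), 3))  # hypothetical 30 m2 plot
```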

[Paper review] Weight of individual wheat grains estimated from high-throughput digital images of grain area

Kim, J., Savin, R. and Slafer, G.A., 2021. Weight of individual wheat grains estimated from high-throughput digital images of grain area. European Journal of Agronomy, 124, p.126237. https://www.sciencedirect.com/science/article/pii/S1161030121000095 ■ Context and Objective This study focuses on estimating the weight of individual wheat grains using high-throughput digital images of grain area. Given the importance of average grain weight (AGW) as a key component of wheat yield, the researchers aimed to develop a reliable model to convert grain dimensions from 2D images…

Predicting Intermediate Data Points with Linear Interpolation in Excel and R

Today, I’ll explain the interpolation technique used to predict in-between data points. For example, when collecting field data, we might not be able to gather information every day, so we establish our own interval (e.g., weekly or bi-weekly). However, when presenting the data, it might be necessary to show it on a daily basis. As another example, consider investigating yield differences in response to varying continuous variables, such as nitrogen at levels of 0, 30, 60, 120. What if we…
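
Linear interpolation estimates a value between two known points (x0, y0) and (x1, y1) as y = y0 + (y1 − y0)(x − x0)/(x1 − x0). A minimal Python sketch follows. The function name and the yield values paired with the nitrogen levels are hypothetical. In R, the base function approx() performs the same linear interpolation.

```python
from bisect import bisect_left


def interpolate(xs, ys, x):
    """Linearly interpolate y at x from sorted known points (xs, ys)."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the measured range; this would be extrapolation")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]  # x is an observed point
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Nitrogen levels from the text; yields (Mg/ha) are hypothetical
n_levels = [0, 30, 60, 120]
yields = [5.0, 7.0, 9.0, 10.0]
print(interpolate(n_levels, yields, 45))  # between 30 and 60
print(interpolate(n_levels, yields, 90))  # between 60 and 120
```

The same function fills in daily values between weekly field measurements: pass the sampling days as `xs` and query each day in between.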

Long and Short-Day Plants: The Significance of Photoperiodicity

Photoperiodicity refers to the response of plants to the relative lengths of day and night, which regulates their growth and flowering. It is an important factor in agriculture as it allows growers to optimize crop yields and quality by understanding and manipulating the photoperiod requirements of different plant species. In terms of photoperiodicity, plants can be divided into long-day and short-day plants. ■ Long-day Plant For example, wheat is a long-day plant, meaning it requires extended periods of daylight (typically…

Understanding Autotrophic and Heterotrophic Respiration in Crop Science: Importance and Impact

Understanding the difference between autotrophic and heterotrophic respiration is crucial in crop science. These processes play a vital role in the carbon cycle and have significant implications for carbon emissions, climate change, and sustainable agriculture. 1. Autotrophic Respiration Autotrophic respiration is the process by which plants convert the carbohydrates produced during photosynthesis into energy. This energy is used for growth, maintenance, and reproduction. There are three main types of autotrophic respiration: During autotrophic respiration, plants release carbon dioxide back into…

How to identify soybean vegetative growth stages?

Identifying the vegetative growth stages of soybean is crucial for effective crop management. These stages are marked by the development of trifoliolate leaves, which are a key indicator starting from the V1 stage. Here’s a brief guide on how to recognize these stages: ■ Emergence (VE) The first stage is emergence (VE), where the soybean seedling breaks through the soil surface. At this stage, the cotyledons, or seed leaves, are visible. They supply nutrients to the young plant before the…

Machine Learning: Modeling with Random Forest Using Python

In my previous post, I introduced stepwise regression to select the best model. I suggested that grain yield = -4616.47 + 10.53 * stem biomass + 41.03 * height, indicating that stem biomass and height are the most important variables affecting grain yield. ■ Stepwise Regression: A Practical Approach for Model Selection using R Now, I’ll find the best model using machine learning. This is a small dataset, which might not be suitable for machine learning, but it serves as…

In RStudio, how to exclude missing values (NA)?

I’ll create a small dataset. For genotype D, the yield data was missing, so it was recorded as NA. Now I’ll calculate the mean yield across all genotypes. As you can see above, we can’t calculate the mean because of the NA. To obtain the mean yield, we need to exclude NA. Using subset(), we can simply exclude genotype D. However, a much simpler way is to use na.rm=TRUE, which lets you avoid using subset(). When the data…
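
For readers working in Python rather than R, the equivalent of mean(x, na.rm = TRUE) is to drop missing entries before averaging. Below is a minimal sketch with hypothetical yields, where genotype D is missing and represented as None (float('nan') is handled too). The function name is hypothetical.

```python
import math


def mean_ignoring_missing(values):
    """Mean of values, skipping None and NaN (like R's mean(x, na.rm = TRUE))."""
    kept = [v for v in values if v is not None and not math.isnan(v)]
    if not kept:
        return math.nan  # nothing left to average; R's mean() of an empty vector is also NaN
    return sum(kept) / len(kept)


# Hypothetical yields for genotypes A-D; D is missing
yields = {"A": 10.0, "B": 12.0, "C": 14.0, "D": None}
print(mean_ignoring_missing(yields.values()))  # → 12.0
```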
