How to summarize data using Python?

February 25, 2024 JK

In my previous post, I demonstrated how to create a data table using Python. If you’re interested, please refer to the post below.

■ How to create a data table in Python?

import pandas as pd

genotypes = ["Genotype_A", "Genotype_B", "Genotype_C", "Genotype_D"] * 16
blocks = ["I", "II", "III", "IV"] * 16
treatment = ["Control", "Fertilizer1", "Fertilizer2", "Fertilizer3"] * 16
grain_yield = [
    42.9, 41.6, 28.9, 30.8, 53.3, 69.6, 45.4, 35.1, 62.3, 58.5, 44.6,
    50.3, 75.4, 65.6, 54, 52.7, 53.8, 58.5, 43.9, 46.3, 57.6, 69.6, 42.4,
    51.9, 63.4, 50.4, 45, 46.7, 70.3, 67.3, 57.6, 58.5, 49.5, 53.8, 40.7,
    39.4, 59.8, 65.8, 41.4, 45.4, 64.5, 46.1, 62.6, 50.3, 68.8, 65.3, 45.6,
    51, 44.4, 41.8, 28.3, 34.7, 64.1, 57.4, 44.1, 51.6, 63.6, 56.1, 52.7,
    51.8, 71.6, 69.4, 56.6, 47.4
]

# Combine the four lists (64 items each) into one data table
df = pd.DataFrame({
    "genotype": genotypes,
    "block": blocks,
    "treatment": treatment,
    "grain_yield": grain_yield
})

df
    genotype    block  treatment    grain_yield
0   Genotype_A  I      Control      42.9
1   Genotype_B  II     Fertilizer1  41.6
2   Genotype_C  III    Fertilizer2  28.9
3   Genotype_D  IV     Fertilizer3  30.8
4   Genotype_A  I      Control      53.3
..  ...         ...    ...          ...
59  Genotype_D  IV     Fertilizer3  51.8
60  Genotype_A  I      Control      71.6
61  Genotype_B  II     Fertilizer1  69.4
62  Genotype_C  III    Fertilizer2  56.6
63  Genotype_D  IV     Fertilizer3  47.4

64 rows × 4 columns
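
Before summarizing, it can help to check how many rows fall into each genotype and treatment combination. The short sketch below rebuilds only the two label columns from the lists above and counts the rows per combination. The names `labels` and `group_sizes` are my own and do not appear in the original notebook.

```python
import pandas as pd

# Same label lists as in the data table above
genotypes = ["Genotype_A", "Genotype_B", "Genotype_C", "Genotype_D"] * 16
treatment = ["Control", "Fertilizer1", "Fertilizer2", "Fertilizer3"] * 16

# Hypothetical helper table holding only the two grouping columns
labels = pd.DataFrame({"genotype": genotypes, "treatment": treatment})

# Count the rows in each observed genotype/treatment combination
group_sizes = labels.groupby(["genotype", "treatment"]).size()
print(group_sizes)
```

Because both lists repeat the same four-item pattern, each genotype is always paired with a single treatment, so only four combinations occur, each with 16 rows.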

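Next, I’ll summarize grain yield for each genotype and treatment combination by its mean and standard error, where the standard error is the sample standard deviation divided by the square root of the number of observations.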

import numpy as np

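# Group by genotype and treatment, then summarize grain yield
summary_stats = df.groupby(['genotype', 'treatment']).agg(
    # the string 'mean' gives the same result as np.mean; recent pandas versions warn when np.mean is passed here
    mean_value=('grain_yield', 'mean'),
    # standard error = sample standard deviation (ddof=1) / sqrt(n)
    std_error=('grain_yield', lambda x: np.std(x, ddof=1) / np.sqrt(len(x)))
).reset_index()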

summary_stats

   genotype    treatment    mean_value  std_error
0  Genotype_A  Control      60.33125    2.376547
1  Genotype_B  Fertilizer1  58.55000    2.419969
2  Genotype_C  Fertilizer2  45.86250    2.336733
3  Genotype_D  Fertilizer3  46.49375    1.918147
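
As a cross-check on the standard error, pandas also offers a built-in `sem` aggregation, which by default uses the same sample standard deviation (ddof=1) divided by the square root of the group size. The sketch below rebuilds the data from this post and compares the built-in result with the manual formula. The names `summary_builtin` and `manual_se` are my own, and this is one possible alternative rather than the approach used in the original notebook.

```python
import numpy as np
import pandas as pd

# Same data as in the post
genotypes = ["Genotype_A", "Genotype_B", "Genotype_C", "Genotype_D"] * 16
treatment = ["Control", "Fertilizer1", "Fertilizer2", "Fertilizer3"] * 16
grain_yield = [
    42.9, 41.6, 28.9, 30.8, 53.3, 69.6, 45.4, 35.1, 62.3, 58.5, 44.6,
    50.3, 75.4, 65.6, 54, 52.7, 53.8, 58.5, 43.9, 46.3, 57.6, 69.6, 42.4,
    51.9, 63.4, 50.4, 45, 46.7, 70.3, 67.3, 57.6, 58.5, 49.5, 53.8, 40.7,
    39.4, 59.8, 65.8, 41.4, 45.4, 64.5, 46.1, 62.6, 50.3, 68.8, 65.3, 45.6,
    51, 44.4, 41.8, 28.3, 34.7, 64.1, 57.4, 44.1, 51.6, 63.6, 56.1, 52.7,
    51.8, 71.6, 69.4, 56.6, 47.4
]
df = pd.DataFrame({
    "genotype": genotypes,
    "treatment": treatment,
    "grain_yield": grain_yield,
})

grouped = df.groupby(["genotype", "treatment"])

# Built-in aggregations: 'mean' and 'sem' (standard error of the mean, ddof=1)
summary_builtin = grouped.agg(
    mean_value=("grain_yield", "mean"),
    std_error=("grain_yield", "sem"),
).reset_index()

# Manual standard error for comparison: sample SD / sqrt(n)
manual_se = (
    grouped["grain_yield"]
    .agg(lambda x: x.std(ddof=1) / np.sqrt(len(x)))
    .reset_index(drop=True)
)

print(summary_builtin)
print("Built-in and manual standard errors agree:",
      np.allclose(summary_builtin["std_error"], manual_se))
```

Using the string aliases keeps the aggregation short, while the manual formula remains useful when a different definition is needed, for example a different `ddof`.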

Full code: https://github.com/agronomy4future/python_code/blob/main/How_to_summarize_data_using_Python.ipynb

