Efficient Data Management: Variable Filtering in SAS Studio
Today, I’ll introduce how to filter variables after uploading data to SAS Studio. First, let’s upload data to SAS Studio.
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/sorghum_grain.csv"
dataA= data.frame(read_csv(url(github), show_col_types=FALSE))
dataA
Location Crop Plot Block Treatment Treatment_modified Total_DW Grain_yield AGW GN
1 Location_A Sorghum 15 I Control Control 852.3 4303.6 28.1 31627
2 Location_A Sorghum 17 II Control Control 1295.7 6058.4 26.3 40157
3 Location_A Sorghum 19 III Control Control 1488.4 7438.7 20.8 45417
4 Location_A Sorghum 22 IV Control Control 925.6 5001.3 20.8 20770
5 Location_A Sorghum 1 I TR1 TR1_2 807.4 3285.2 30.7 14488
6 Location_A Sorghum 4 II TR1 TR1_2 907.4 3896.1 28.5 20323
7 Location_A Sorghum 9 III TR1 TR1_2 778.6 3207.3 28.8 15636
8 Location_A Sorghum 10 IV TR1 TR1_2 895.2 4418.2 28.6 20807
9 Location_A Sorghum 1 I TR2 TR1_2 748.9 3130.9 49.1 11556
10 Location_A Sorghum 4 II TR2 TR1_2 1032.4 4809.1 24.8 16821
11 Location_A Sorghum 9 III TR2 TR1_2 1388.2 6409.9 26.7 32546
12 Location_A Sorghum 10 IV TR2 TR1_2 1215.1 5183.3 29.3 20720
13 Location_A Sorghum 1 I TR3 TR3 668.7 3008.1 25.5 10390
14 Location_A Sorghum 4 II TR3 TR3 1564.1 7160.5 23.9 42890
15 Location_A Sorghum 9 III TR3 TR3 1283.4 6284.1 23.9 23713
16 Location_A Sorghum 10 IV TR3 TR3 1612.0 7445.1 20.9 34173
17 Location_A Sorghum 1 I TR4 TR4_5 1010.5 4340.3 29.2 23492
18 Location_A Sorghum 4 II TR4 TR4_5 757.4 2865.3 29.2 17564
19 Location_A Sorghum 9 III TR4 TR4_5 1087.1 5106.0 25.4 30184
20 Location_A Sorghum 10 IV TR4 TR4_5 1167.7 5383.6 27.7 23543
21 Location_A Sorghum 1 I TR5 TR4_5 783.3 2765.0 26.5 22890
22 Location_A Sorghum 4 II TR5 TR4_5 1002.5 3965.2 28.8 20570
23 Location_A Sorghum 9 III TR5 TR4_5 733.4 3445.1 29.5 15980
24 Location_A Sorghum 10 IV TR5 TR4_5 1067.9 5000.5 27.7 28911
I’ll summarize data based on Treatment_modified
.
library(dplyr)
summary=data.frame(dataA %>%
group_by(Location,Block, Treatment_modified) %>%
dplyr::summarize(across(c(Grain_yield,GN,AGW),
.fns= list(Mean=~mean(., na.rm=TRUE)))))
summary
Location Block Treatment_modified Grain_yield_Mean GN_Mean AGW_Mean
1 Location_A I Control 4303.60 31627.0 28.10
2 Location_A I TR1_2 3208.05 13022.0 39.90
3 Location_A I TR3 3008.10 10390.0 25.50
4 Location_A I TR4_5 3552.65 23191.0 27.85
5 Location_A II Control 6058.40 40157.0 26.30
6 Location_A II TR1_2 4352.60 18572.0 26.65
7 Location_A II TR3 7160.50 42890.0 23.90
8 Location_A II TR4_5 3415.25 19067.0 29.00
9 Location_A III Control 7438.70 45417.0 20.80
10 Location_A III TR1_2 4808.60 24091.0 27.75
11 Location_A III TR3 6284.10 23713.0 23.90
12 Location_A III TR4_5 4275.55 23082.0 27.45
13 Location_A IV Control 5001.30 20770.0 20.80
14 Location_A IV TR1_2 4800.75 20763.5 28.95
15 Location_A IV TR3 7445.10 34173.0 20.90
16 Location_A IV TR4_5 5192.05 26227.0 27.70
I want to summarize for each unique ID. So, I’ll add numbers from 1 up to the end.
dataA= dataA %>%
mutate(No=1:nrow(dataA))
dataA
Location Crop Plot Block Treatment Treatment_modified Total_DW Grain_yield AGW GN No
1 Location_A Sorghum 15 I Control Control 852.3 4303.6 28.1 31627 1
2 Location_A Sorghum 17 II Control Control 1295.7 6058.4 26.3 40157 2
3 Location_A Sorghum 19 III Control Control 1488.4 7438.7 20.8 45417 3
4 Location_A Sorghum 22 IV Control Control 925.6 5001.3 20.8 20770 4
5 Location_A Sorghum 1 I TR1 TR1_2 807.4 3285.2 30.7 14488 5
6 Location_A Sorghum 4 II TR1 TR1_2 907.4 3896.1 28.5 20323 6
7 Location_A Sorghum 9 III TR1 TR1_2 778.6 3207.3 28.8 15636 7
8 Location_A Sorghum 10 IV TR1 TR1_2 895.2 4418.2 28.6 20807 8
9 Location_A Sorghum 1 I TR2 TR1_2 748.9 3130.9 49.1 11556 9
10 Location_A Sorghum 4 II TR2 TR1_2 1032.4 4809.1 24.8 16821 10
11 Location_A Sorghum 9 III TR2 TR1_2 1388.2 6409.9 26.7 32546 11
12 Location_A Sorghum 10 IV TR2 TR1_2 1215.1 5183.3 29.3 20720 12
13 Location_A Sorghum 1 I TR3 TR3 668.7 3008.1 25.5 10390 13
14 Location_A Sorghum 4 II TR3 TR3 1564.1 7160.5 23.9 42890 14
15 Location_A Sorghum 9 III TR3 TR3 1283.4 6284.1 23.9 23713 15
16 Location_A Sorghum 10 IV TR3 TR3 1612.0 7445.1 20.9 34173 16
17 Location_A Sorghum 1 I TR4 TR4_5 1010.5 4340.3 29.2 23492 17
18 Location_A Sorghum 4 II TR4 TR4_5 757.4 2865.3 29.2 17564 18
19 Location_A Sorghum 9 III TR4 TR4_5 1087.1 5106.0 25.4 30184 19
20 Location_A Sorghum 10 IV TR4 TR4_5 1167.7 5383.6 27.7 23543 20
21 Location_A Sorghum 1 I TR5 TR4_5 783.3 2765.0 26.5 22890 21
22 Location_A Sorghum 4 II TR5 TR4_5 1002.5 3965.2 28.8 20570 22
23 Location_A Sorghum 9 III TR5 TR4_5 733.4 3445.1 29.5 15980 23
24 Location_A Sorghum 10 IV TR5 TR4_5 1067.9 5000.5 27.7 28911 24
and I’ll summarize data again.
summary=data.frame(dataA %>%
group_by(No, Location,Block, Treatment_modified) %>%
dplyr::summarize(across(c(Grain_yield,GN,AGW),
.fns= list(Mean=~mean(., na.rm=TRUE)))))
summary
No Location Block Treatment_modified Grain_yield_Mean GN_Mean AGW_Mean
1 1 Location_A I Control 4303.6 31627 28.1
2 2 Location_A II Control 6058.4 40157 26.3
3 3 Location_A III Control 7438.7 45417 20.8
4 4 Location_A IV Control 5001.3 20770 20.8
5 5 Location_A I TR1_2 3285.2 14488 30.7
6 6 Location_A II TR1_2 3896.1 20323 28.5
7 7 Location_A III TR1_2 3207.3 15636 28.8
8 8 Location_A IV TR1_2 4418.2 20807 28.6
9 9 Location_A I TR1_2 3130.9 11556 49.1
10 10 Location_A II TR1_2 4809.1 16821 24.8
11 11 Location_A III TR1_2 6409.9 32546 26.7
12 12 Location_A IV TR1_2 5183.3 20720 29.3
13 13 Location_A I TR3 3008.1 10390 25.5
14 14 Location_A II TR3 7160.5 42890 23.9
15 15 Location_A III TR3 6284.1 23713 23.9
16 16 Location_A IV TR3 7445.1 34173 20.9
17 17 Location_A I TR4_5 4340.3 23492 29.2
18 18 Location_A II TR4_5 2865.3 17564 29.2
19 19 Location_A III TR4_5 5106.0 30184 25.4
20 20 Location_A IV TR4_5 5383.6 23543 27.7
21 21 Location_A I TR4_5 2765.0 22890 26.5
22 22 Location_A II TR4_5 3965.2 20570 28.8
23 23 Location_A III TR4_5 3445.1 15980 29.5
24 24 Location_A IV TR4_5 5000.5 28911 27.7
Then, I’ll download this data to my PC.
library(writexl)
write_xlsx (summary,"C:/Users/Desktop/sorghum.xlsx")
1) to upload data to SAS
First, let’s upload the data to SAS. I’ll assign this data to the Test
table that I created.
If you click Run
, you can see the uploaded data in the Output Data
. Then, click the Code
tab.
2) Coding
You can see the code that allows you to display the data. Click Edit
.
Then, you can add more codes. I’ll add the following codes:
proc sql;
create table Work.Control as
select *
from Work.Test
where Treatment_modified="Control";
quit;
PROC CONTENTS DATA=WORK.Control;
RUN;
proc sql;
create table Work.TR3 as
select *
from Work.Test
where Treatment_modified="TR3";
quit;
PROC CONTENTS DATA=WORK.TR3;
RUN;
and run the code. You can find the newly created tables in the Libraries. In this code, I created two tables named Control and TR3, filtering specific variables.
/* Generated Code (IMPORT) */
/* Source File: sorghum.xlsx */
/* Source Path: /Users/***@illinois.edu/My Folder/Practice/sorghum.xlsx */
/* Code generated on: Feb 20, 2024, 7:27:03 PM */
proc sql;
%if %sysfunc(exist(WORK.Test)) %then %do;
drop table WORK.Test;
%end;
%if %sysfunc(exist(WORK.Test,VIEW)) %then %do;
drop view WORK.Test;
%end;
quit;
FILENAME REFFILE FILESRVC FOLDERPATH='/Users/***@illinois.edu/My Folder/Practice' FILENAME='sorghum.xlsx';
PROC IMPORT DATAFILE=REFFILE
DBMS=XLSX
OUT=WORK.Test;
GETNAMES=YES;
RUN;
PROC CONTENTS DATA=WORK.Test; RUN;
proc sql;
create table Work.Control as
select *
from Work.Test
where Treatment_modified="Control";
quit;
PROC CONTENTS DATA=WORK.Control;
RUN;
proc sql;
create table Work.TR3 as
select *
from Work.Test
where Treatment_modified="TR3";
quit;
PROC CONTENTS DATA=WORK.TR3;
RUN;