- DATA step
- A DATA step creates a SAS dataset (a collection
of data together with a "data dictionary",
which defines the variables and their properties).
Data must be in the form of a SAS dataset before it
can be analyzed by SAS procedures.
In the example SAS program, these lines create the dataset CLASS from raw data input:

    DATA CLASS;
       INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
    CARDS;
    JOHN M 12 59.0 99.5
    JAMES M 12 57.3 83.0
    ... (more data lines)

- PROC step
- A PROCedure step calls a SAS procedure to analyze
or process a SAS dataset.
In the example SAS program, these lines call two SAS procedures to analyze the CLASS dataset:

    PROC PRINT;
    PROC MEANS;
       VARIABLES HEIGHT WEIGHT;
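Put together, a DATA step followed by PROC steps forms a complete program. The sketch below is one way to write it out so that it runs as is. The first two data lines are the ones shown above; the last two are illustrative rows added here only so the procedures have something to summarize. The `RUN;` statements and the explicit `DATA=` option are not part of the original listing and are included as common practice.

```sas
* A complete program: one DATA step followed by two PROC steps;
DATA CLASS;
   INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
CARDS;
JOHN M 12 59.0 99.5
JAMES M 12 57.3 83.0
ALICE F 13 56.5 84.0
JANE F 12 59.8 84.5
;

* List every observation in the dataset;
PROC PRINT DATA=CLASS;
RUN;

* Summary statistics for the two numeric variables of interest;
PROC MEANS DATA=CLASS;
   VAR HEIGHT WEIGHT;
RUN;
```

The line containing only a semicolon marks the end of the in-line data.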

- All SAS statements start with a **keyword**.
- **All SAS statements end with a semicolon (;)**.
- SAS statements can be entered in **free-format**.
- **Uppercase and lowercase are equivalent, except inside quote marks** (`sex = 'm';` is not the same as `sex = 'M';`).
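These rules can be seen in a short sketch. The dataset and variable names (`A`, `B`, `C`, `X`, `Y`, `SAME`) are hypothetical and chosen only for illustration. The first two steps are equivalent despite their different layout and case; the third shows that case does matter inside quotes.

```sas
* Conventional layout: one statement per line, uppercase keywords;
DATA A;
   X = 1;
   Y = X + 1;
RUN;

* The same step in lowercase, with several statements on one line;
data b; x = 1; y = x + 1; run;

* Inside quote marks, case is significant;
DATA C;
   SEX  = 'M';
   SAME = (SEX = 'm');   /* 0 (false), because 'M' and 'm' differ */
RUN;
```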

    PROC MEANS DATA=CLASS;
       VAR HEIGHT WEIGHT;

Some other statements that can be used with most SAS procedure steps are:

- BY variable(s);
- Causes the procedure to be repeated automatically for **each** distinct value of the BY variable(s).
- ID variable(s);
- Gives the name of a variable to be used as an observation **ID**entifier.
- LABEL var='label';
- Assigns a descriptive label to a variable.
- WHERE (expression);
- Selects only those observations for which the expression is true.

    PROC SORT DATA=CLASS;
       BY SEX;
    PROC MEANS DATA=CLASS;
       VAR HEIGHT WEIGHT;
       BY SEX;
       LABEL SEX='Gender';
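The WHERE and ID statements can be sketched in the same way. To keep the example runnable on its own, it assumes the `SASHELP.CLASS` sample dataset that ships with SAS, which has the same variable names (Name, Sex, Age, Height, Weight) as the CLASS dataset used here. The cutoff age and the label text are arbitrary choices for illustration.

```sas
* Assumes the SASHELP.CLASS sample dataset is available;
PROC PRINT DATA=SASHELP.CLASS LABEL;
   WHERE AGE >= 13 AND SEX = 'F';   /* only observations where this is true */
   ID NAME;                         /* NAME identifies each row instead of Obs */
   VAR AGE HEIGHT WEIGHT;
   LABEL AGE = 'Age in years';      /* printed because of the LABEL option */
RUN;
```

Unlike BY, neither WHERE nor ID requires the dataset to be sorted first.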

- PROC CORR
- Correlations among a set of variables.
**PROC CORR** DATA=*SASdataset* *options*;
*options:* NOMISS ALPHA
VAR *variable(s)*;
WITH *variable(s)*;

- PROC FREQ
- Frequency tables, *χ²* tests
**PROC FREQ** DATA=*SASdataset*;
TABLES *variable(s) / options*;
*options:* NOCOL NOROW NOPERCENT
OUTPUT OUT=*SASdataset*;

- PROC MEANS
- Means, standard deviations, and a host of other univariate statistics for a set of variables.
**PROC MEANS** DATA=*SASdataset* *options*;
*options:* N MEAN STD MIN MAX SUM VAR CSS USS
VAR *variable(s)*;
BY *variable(s)*;
OUTPUT OUT=*SASdataset* *keyword=variablename ...*;

Statistical options on the PROC MEANS statement determine which statistics are printed. The (optional) OUTPUT statement is used to create a SAS dataset containing the values of these statistics.

- PROC UNIVARIATE
- Univariate statistics and displays for a set of variables.
**PROC UNIVARIATE** DATA=*SASdataset* *options*;
*options:* PLOT
VAR *variable(s)*;
BY *variable(s)*;
OUTPUT OUT=*SASdataset* *keyword=variablename ...*;
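As a rough illustration of the descriptive procedures, the sketch below uses PROC MEANS with statistical options and an OUTPUT statement, then PROC FREQ with a two-way table. It assumes the `SASHELP.CLASS` sample dataset shipped with SAS; the output dataset name `STATS` and the variable names `MEANHT`, `MEANWT`, `SDHT` and `SDWT` are hypothetical.

```sas
* Sort a copy of the sample data so that BY processing can be used;
PROC SORT DATA=SASHELP.CLASS OUT=CLASS;
   BY SEX;
RUN;

* Print N, mean and standard deviation, and also save them in STATS;
PROC MEANS DATA=CLASS N MEAN STD;
   VAR HEIGHT WEIGHT;
   BY SEX;
   /* names after each keyword pair up, in order, with the VAR variables */
   OUTPUT OUT=STATS MEAN=MEANHT MEANWT STD=SDHT SDWT;
RUN;

* STATS holds one observation per value of SEX;
PROC PRINT DATA=STATS;
RUN;

* Two-way frequency table showing cell counts only;
PROC FREQ DATA=CLASS;
   TABLES SEX*AGE / NOCOL NOROW NOPERCENT;
RUN;
```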

- PROC ANOVA
- Analysis of variance (balanced designs)
**PROC ANOVA** DATA=*SASdataset* *options*;
**CLASS** **variable(s)**;
**MODEL** **dependent(s)** = **effect(s)**;

- PROC GLM
- General linear models, including ANOVA, regression and analysis of covariance models.
**PROC GLM** DATA=*SASdataset* *options*;
CLASS *variable(s)*;
**MODEL** **dependent(s)** = **effect(s)**;
OUTPUT OUT=*SASdataset* *keyword=variablename ...*;

- PROC REG
- Regression analysis
**PROC REG** DATA=*SASdataset* *options*;
**MODEL** **dependent(s) = regressors** */ options*;
PLOT *variable | keyword.* \* *variable | keyword.* = *symbol*;
OUTPUT OUT=*SASdataset* P=*name* R=*name* ... ;
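A minimal sketch of the MODEL statement in two of these procedures follows. It assumes the `SASHELP.CLASS` sample dataset; the choice of WEIGHT as the response and the names `RESULTS`, `PREDICT` and `RESID` are illustrative only. The `QUIT;` statements are added because both procedures otherwise remain active after `RUN;`.

```sas
* Regression of WEIGHT on two numeric regressors;
PROC REG DATA=SASHELP.CLASS;
   MODEL WEIGHT = HEIGHT AGE;
   /* save predicted values (P=) and residuals (R=) in a new dataset */
   OUTPUT OUT=RESULTS P=PREDICT R=RESID;
RUN;
QUIT;

* Analysis of covariance: SEX is a classification effect, HEIGHT a covariate;
PROC GLM DATA=SASHELP.CLASS;
   CLASS SEX;
   MODEL WEIGHT = SEX HEIGHT;
RUN;
QUIT;
```

Variables named in the CLASS statement are treated as categorical; all other effects in the MODEL statement are treated as numeric regressors.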

- PROC CHART
- Histograms and bar charts
**PROC CHART** DATA=*SASdataset* *options*;
VBAR *variable / options*;
HBAR *variable / options*;
*options:* MIDPOINTS= GROUP= SUMVAR=

- PROC PLOT
- Scatter plots
**PROC PLOT** DATA=*SASdataset* *options*;
*options:* HPERCENT= VPERCENT=
**PLOT** **yvariable** \* **xvariable** = *symbol / options*;
PLOT *(yvariables)* \* *(xvariables)* = *symbol / options*;
*PLOT options:* BOX OVERLAY VREF= HREF=
BY *variable(s)*;

Note that the parenthesized form in the PLOT statement plots **each** y-variable listed against each x-variable.
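The two forms of the PLOT statement might be used as in the sketch below, which again assumes the `SASHELP.CLASS` sample dataset. Using SEX as the plotting symbol and `'*'` as a literal symbol are illustrative choices.

```sas
PROC PLOT DATA=SASHELP.CLASS;
   /* one plot; each point is drawn with the first character of SEX */
   PLOT WEIGHT*HEIGHT = SEX;
   /* parenthesized form: HEIGHT against AGE, then WEIGHT against AGE */
   PLOT (HEIGHT WEIGHT)*(AGE) = '*';
RUN;
QUIT;
```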

- PROC PRINT
- Print a SAS data set
**PROC PRINT** DATA=*SASdataset* *options*;
*options:* UNIFORM LABEL SPLIT='char'
VAR *variable(s)*;
BY *variable(s)*;
SUM *variable(s)*;

- PROC SORT
- Sort a SAS data set according to one or more variables.
**PROC SORT** DATA=*SASdataset* *options*;
*options:* OUT=
**BY** **variable(s)**;
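A short sketch combining the two utility procedures is given below. It assumes the `SASHELP.CLASS` sample dataset; the output name `BYAGE` and the label text are hypothetical. The DESCENDING keyword is not listed above but is a standard part of the BY statement in PROC SORT.

```sas
* Write the sorted observations to a new dataset, leaving the input unchanged;
PROC SORT DATA=SASHELP.CLASS OUT=BYAGE;
   BY AGE DESCENDING HEIGHT;   /* youngest first; tallest first within age */
RUN;

* Print selected variables, with labels as headings and a total for WEIGHT;
PROC PRINT DATA=BYAGE LABEL;
   ID NAME;
   VAR AGE HEIGHT WEIGHT;
   SUM WEIGHT;
   LABEL HEIGHT = 'Student height'
         WEIGHT = 'Student weight';
RUN;
```

Without `OUT=`, PROC SORT replaces the input dataset with the sorted version.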

- 1 - 8 characters in length
- begin with A-Z or _ (underscore)
- cannot contain blanks or special symbols (e.g., &, %, $, #, etc.)

- Character variables (e.g., `NAME='Michael';`)
- Numeric variables
- Missing data: represented by `.` (a period) for numeric variables; by `' '` (a space) for character variables.
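The two variable types and their missing values can be seen in a small sketch. The dataset name `TYPES` and the values are hypothetical; the LENGTH statement is included so that NAME can hold up to eight characters.

```sas
DATA TYPES;
   LENGTH NAME $ 8;
   NAME = 'Michael';   /* character value, written in quotes */
   AGE  = 14;          /* numeric value */
   OUTPUT;
   NAME = ' ';         /* missing character value: a blank */
   AGE  = .;           /* missing numeric value: a period */
   OUTPUT;
RUN;

* The second observation prints as a blank NAME and a period for AGE;
PROC PRINT DATA=TYPES;
RUN;
```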

Some of the (many) statements that can be used in the DATA step are:

- DATA
- The DATA statement signals the start of a DATA step and names the dataset(s) to be created.
DATA *SASdataset(s)*;

- INPUT
- The INPUT statement specifies how raw data is to be read.
**List input:** A `$` sign following the name of any variable indicates that the variable is to be read as characters.

    INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;

**Column input:** Each variable name is followed by the column(s) in which its value appears.

    INPUT NAME $1-8 SEX $11 AGE 13-14 HEIGHT 16-19 WEIGHT 22-25;
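The difference between the two input styles shows up in how the data lines must be laid out. In the sketch below, the dataset names `LISTIN` and `COLIN` are hypothetical and the data lines are the two shown in the example program. With list input, values only need to be separated by blanks; with column input, each value must sit in exactly the columns named on the INPUT statement.

```sas
* List input: values are separated by one or more blanks;
DATA LISTIN;
   INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
CARDS;
JOHN M 12 59.0 99.5
JAMES M 12 57.3 83.0
;

* Column input: NAME in columns 1-8, SEX in 11, AGE in 13-14,
  HEIGHT in 16-19 and WEIGHT in 22-25;
DATA COLIN;
   INPUT NAME $ 1-8 SEX $ 11 AGE 13-14 HEIGHT 16-19 WEIGHT 22-25;
CARDS;
JOHN      M 12 59.0  99.5
JAMES     M 12 57.3  83.0
;

PROC PRINT DATA=COLIN;
RUN;
```

Both steps should produce the same two observations.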

- SET
- The SET statement reads observations from an existing SAS dataset. These statements simply make a copy of the CLASS dataset.

    data newclass;
       set class;

- Assignment
- The assignment statement creates
new variables or changes existing variables. All the
usual arithmetic operations, and many SAS functions
can be used.

| Symbol | Operation | Example |
|---|---|---|
| `**` | Exponentiation | `Y = X**2;` |
| `*` | Multiplication | `AREA = LEN * WIDTH;` |
| `/` | Division | `DENSITY = MASS / VOLUME;` |
| `+` | Addition | `PRICE = COST + MARKUP;` |
| `-` | Subtraction | `COST = PRICE - MARKUP;` |

Use parentheses to indicate grouping in complex expressions:

    AVG = (TEST1 + 2*TEST2 + 5*FINAL) / 8 + BONUS;
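To see why the parentheses matter, the weighted average can be worked through with some made-up scores. The values 70, 80, 90 and 2 and the names `GRADES` and `WRONG` are hypothetical; the PUT statement simply writes the results to the SAS log.

```sas
DATA GRADES;
   TEST1 = 70;
   TEST2 = 80;
   FINAL = 90;
   BONUS = 2;

   /* (70 + 160 + 450) / 8 + 2 = 680 / 8 + 2 = 87 */
   AVG = (TEST1 + 2*TEST2 + 5*FINAL) / 8 + BONUS;

   /* without parentheses only 5*FINAL is divided by 8:
      70 + 160 + 56.25 + 2 = 288.25 */
   WRONG = TEST1 + 2*TEST2 + 5*FINAL / 8 + BONUS;

   PUT AVG= WRONG=;
RUN;
```

Multiplication and division are carried out before addition and subtraction, so the second expression is not a weighted average at all.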

- IF
- The IF statement is used for conditional
processing.
IF expression THEN statement; ELSE statement;

The ELSE statement is optional. The IF ... THEN parts comprise a single statement. For example,

    If age < 13 then group = 'preteen';
               else group = 'teen';
    If sex = 'F' then SX = 1;    /* Dummy variable for sex */
                 else SX = 0;
    SX = (sex='F');              /* same as above (if no missing) */

SAS comparison operators are shown below. You can use either the symbol or the two-letter abbreviation.

| Symbol | Abbrev | Meaning |
|---|---|---|
| `<`, `<=` | LT, LE | less than, less than or equal |
| `>`, `>=` | GT, GE | greater than, greater than or equal |
| `=`, `^=` | EQ, NE | equal, not equal |

A special form of the IF statement is used for **subsetting a dataset**. To extract the males from the CLASS dataset:

    DATA MALES;
       SET CLASS;
       IF SEX='M';

The statement `IF SEX='M';` is equivalent to each of the statements:

    IF SEX='M' THEN OUTPUT;
    IF SEX^='M' THEN DELETE;
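The three equivalent forms can be tried side by side. This sketch assumes the `SASHELP.CLASS` sample dataset; the dataset names `MALES1` to `MALES3` are hypothetical, and the abbreviation `NE` is used in place of `^=` since either is accepted.

```sas
* Subsetting IF: only observations satisfying the condition are kept;
DATA MALES1;
   SET SASHELP.CLASS;
   IF SEX = 'M';
RUN;

* Explicit OUTPUT: an observation is written only when the condition holds;
DATA MALES2;
   SET SASHELP.CLASS;
   IF SEX = 'M' THEN OUTPUT;
RUN;

* DELETE: observations failing the condition are dropped;
DATA MALES3;
   SET SASHELP.CLASS;
   IF SEX NE 'M' THEN DELETE;
RUN;

* Check that the first two datasets are identical;
PROC COMPARE BASE=MALES1 COMPARE=MALES2;
RUN;
```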

- Comments
- Two types of comments: the comment statement (`* ... ;`) and comment stuff (`/* ... */`). The comment statement (like all SAS statements) must end with a semicolon. Comment stuff can appear anywhere a single blank can appear. In the example below, the first comment is a comment statement and the second is comment stuff. Note that the second one causes an entire statement to be treated as a comment.

    data class;
       * Read in the variables;
       input name $ sex $ age height weight;
       /* ignore next statement
       age = age + 3;
       */

- ABS(x)
- Absolute value, $|x|$.
- EXP(x)
- Exponential, $e^{x}$; EXP(1) = 2.71828183....
- INT(x)
- Truncate x to an integer; INT(3.145) = 3.
- LOG(x)
- Natural logarithm, $\log_{e}(x)$; LOG(10) = 2.30258509....
- LOG10(x)
- Common logarithm, $\log_{10}(x)$; LOG10(10) = 1.
- MOD(x,d)
- Remainder when x is divided by d; MOD(10,3) = 1.
- ROUND(num)
- ROUND(num, unit)
- Round a number to the nearest integer (or nearest specified unit); ROUND(3.678) = 4; ROUND (3.678,.1) = 3.7.
- SQRT(x)
- Calculate the square root of x.
- NORMAL(seed)
- Return a normally distributed random number
- UNIFORM(seed)
- Return a uniform [0,1] random number.
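The functions above can be checked in a DATA step that creates no dataset and just writes values to the log. This is only a sketch: the variable names are hypothetical and the seed 12345 is an arbitrary choice. The values in the comments for the deterministic functions are those given in the list above or follow directly from the definitions.

```sas
DATA _NULL_;
   A = ABS(-3.5);          /* 3.5 */
   B = INT(3.145);         /* 3 */
   C = MOD(10, 3);         /* 1 */
   D = ROUND(3.678);       /* 4 */
   E = ROUND(3.678, .1);   /* 3.7 */
   F = LOG10(10);          /* 1 */
   G = SQRT(16);           /* 4 */
   H = EXP(LOG(5));        /* 5, up to rounding error */
   U = UNIFORM(12345);     /* a random number between 0 and 1 */
   Z = NORMAL(12345);      /* a random standard normal value */
   PUT A= B= C= D= E= F= G= H= U= Z=;
RUN;
```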

- MEAN(v1,v2,...)
- MEAN(OF age ht)
- MEAN(OF V1-V6)
- Mean of the non-missing values of the variables
- MAX
- Maximum
- MIN
- Minimum
- STD
- Standard deviation
- VAR
- Variance
- USS
- Uncorrected sum of squares, $\sum v_{i}^{2}$.
- CSS
- Corrected sum of squares, $\sum (v_{i} - \bar{v})^{2}$.
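Because these functions use only the non-missing values, they behave differently from the equivalent arithmetic expression when data are missing. The sketch below uses hypothetical data (dataset `SCORES`, variables `V1` to `V6`) to show this.

```sas
DATA SCORES;
   INPUT V1-V6;
   AVG  = MEAN(OF V1-V6);               /* mean of the non-missing values */
   SUM6 = (V1+V2+V3+V4+V5+V6) / 6;      /* missing if any value is missing */
   HIGH = MAX(OF V1-V6);
   LOW  = MIN(OF V1-V6);
   SD   = STD(OF V1-V6);
   USQ  = USS(OF V1-V6);                /* sum of squared values */
   CSQ  = CSS(OF V1-V6);                /* sum of squared deviations from the mean */
CARDS;
1 2 3 4 5 6
2 4 . 8 10 .
;

PROC PRINT DATA=SCORES;
RUN;
```

For the first data line both AVG and SUM6 are 3.5. For the second, AVG is 6 (the mean of 2, 4, 8 and 10), while SUM6 is missing.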

    DATA CLASS;
       INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
       If age < 13 then group = 'preteen';
                   else group = 'teen';
       logwt = log10(weight);   /* transform variables */
       rootht= sqrt(height);
    CARDS;
    JOHN M 12 59.0 99.5
    JAMES M 12 57.3 83.0
    ... (more data lines)