This is an outline for a 'short course' on SAS/IML developed by Walter Davis for the Institute for Research in Social Science at the University of North Carolina - Chapel Hill. The outline itself is based in part on one put together by Tim Dorcey while he was at the Purdue University Computation Center. I have modified it further. Please direct all comments, suggestions and other correspondence on this version to Michael Friendly, <friendly@yorku.ca>.

- Introduction
- Using SAS/IML
- Defining and indexing matrices
- Reading and Creating SAS datasets in IML
- Introduction to IML programming
- Storage of IML modules and matrices

- IML is a matrix language - similar to Gauss, APL, and MATLAB
- built-in operators and functions for most standard matrix operations
- you can define your own modules (subroutines) and functions.
- But IML has only 2-dimensional matrices, not multi-dimensional ones.

- expectations of users
- basic SAS knowledge is assumed
- matrix algebra will not be covered, so familiarity with it is assumed

- when to use IML
- for programming statistical procedures that SAS does not have (including iterative procedures)
- for doing matrix operations
- for doing operations on rows **and** columns of a data table.
- IML allows you to construct (novel) graphics which could not be created with SAS/GRAPH.

- when not to use IML
- in general, if 'regular' SAS can do it, don't use IML.
- IML can be used for data management and graphics, but regular SAS data step and SAS/GRAPH are often more high-level, and therefore easier to use for many graphs.
- SAS macro facility can provide some of the programming capabilities of IML.

    proc iml;
    IML ready
    > reset log print;
    > x = 12.3;
    X    1 row    1 col    (numeric)
    12.3
    > quit;
    Exiting IML

- IML supports both character and numeric matrices, vectors and scalars. Numeric and character information cannot be combined in one matrix; however, character vectors can be used for row and column labels of a matrix.
- scalars
- Numeric values can be used as is, or surrounded by braces or squiggly brackets, `{ }`. Character values must be within single or double quotes. The following are valid:

      > x = 12.3;
      X    1 row    1 col    (numeric)
      12.3
      > y = {57};
      Y    1 row    1 col    (numeric)
      57
      > name = 'Bob';
      NAME    1 row    1 col    (character, size 3)
      Bob

- if you use `name=Bob;` rather than `name='Bob';`, IML will look for a matrix called bob.

- matrices (including vectors). To define a matrix, squiggly brackets `{}` must be used. (You can also use `(|` and `|)` as matrix delimiters.) Commas are used to separate rows. Some examples:

      > a = { 2 4, 3 1};
      A    2 rows    2 cols    (numeric)
      2 4
      3 1
      > b = { 4 5, 0 1};
      B    2 rows    2 cols    (numeric)
      4 5
      0 1
      aa={'a' 'b' 'c', 'd' 'e' 'f'};   /* a 2 x 3 char matrix */
      b = { 1 2 3 4 5 };               /* row vector */
      c = { 1, 2, 3, 4, 5};            /* column vector */

- index operator (:) - creates a row vector of consecutive integers or character strings. Use the transpose operation (`t()` or the backquote character, `` ` ``) to get a column vector. Parentheses around the range make sure the transpose applies to the whole vector rather than to the last value only.

      > index = 1:5;
      INDEX    1 row    5 cols    (numeric)
      1 2 3 4 5
      > col = (4:6)`;
      COL    3 rows    1 col    (numeric)
      4
      5
      6
      > rindex= 5:1;
      RINDEX    1 row    5 cols    (numeric)
      5 4 3 2 1
      > vars = 'XX1': 'XX7';
      VARS    1 row    7 cols    (character, size 3)
      XX1 XX2 XX3 XX4 XX5 XX6 XX7

- The `do()` function creates an arithmetic series with any increment.

      > series= do(12,72,12);
      SERIES    1 row    6 cols    (numeric)
      12 24 36 48 60 72

- identity matrix: `I(size)`

      a=I(6);   /* a 6x6 identity matrix */

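- constant matrix: `J(nrow, ncol, value)`. If the number of columns or the value is omitted, the result is filled with 1's, as in the second example.

      a=j(5,5,0);   /* a 5x5 matrix of zeroes */
      b=j(6,1);     /* a 6x1 column vector of 1's */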

- diagonal matrices: `DIAG(vector)` or `DIAG(matrix)` and `VECDIAG(matrix)`

  `DIAG(vector)` - creates a square matrix with the elements of the vector along the main diagonal and zeroes elsewhere.

      > d = diag( {1 2 4} );
      D    3 rows    3 cols    (numeric)
      1 0 0
      0 2 0
      0 0 4

  `DIAG(matrix)` - creates a square matrix which retains the main diagonal of the argument matrix, but places zeroes elsewhere.

      > d = diag( {1 2, 3 4} );
      D    2 rows    2 cols    (numeric)
      1 0
      0 4

  `VECDIAG(matrix)` - returns a column vector which equals the main diagonal of the argument matrix.
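The text gives no session for `VECDIAG`, so here is a minimal sketch of how it pairs with `DIAG`. The matrix `m` and the names `v` and `d` are made up for illustration.

```sas
proc iml;
m = {1 2 3, 4 5 6, 7 8 9};
v = vecdiag(m);   /* column vector of the main diagonal: {1, 5, 9} */
d = diag(v);      /* 3x3 matrix with 1, 5, 9 on the diagonal, zeroes elsewhere */
print v d;
quit;
```

Applying `DIAG` to the result of `VECDIAG` gives the same matrix as `diag(m)`.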

- Transpose: `t(mat)` or `` mat` `` returns a matrix with rows and columns interchanged.

- Elementwise sum (+) and difference (-):

      a = { 2 4, 3 1};
      b = { 4 5, 0 1};
      sum = a + b;
      SUM    2 rows    2 cols    (numeric)
      6 9
      3 2
      diff = a - b;
      DIFF    2 rows    2 cols    (numeric)
      -2 -1
       3  0

- Elementwise product (`#`) and matrix product (`*`)

      times = a # b;
      TIMES    2 rows    2 cols    (numeric)
      8 20
      0  1
      prod = a * b;
      PROD    2 rows    2 cols    (numeric)
       8 14
      12 16

- Both scalars and vectors may be used as arguments for indexing. Some examples:

      a[1,2]=0;       /* changes element [1,2] to zero */
      a[1,]=0;        /* changes first row values to 0 */
      a[1,1:3]=0;     /* 1:3 creates a vector of values from 1 to 3. So this
                         changes values of the first row, in columns 1-3, to zero */
      ind={2 3};
      b=a[ind,ind];   /* set b equal to rows 2,3 and cols 2,3 of a */

- element operations may also be used as the index of a matrix to perform reduction operations:

      b=a[+,];    /* b is the column sums of a */
      b=a[##,];   /* b is the column sums of squares of a */
      b=a[,:];    /* when used alone as an index, the : operator gives the mean.
                     So b will be a vector containing the mean of each row */

**Note:** IML functions will often perform the same task more quickly and efficiently than using index operators as above. For example:

    b=a[+,+];
    c=sum(a);   /* b and c will have the same value, but the sum
                   function will be more efficient. */
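Subscript reduction is not limited to sums and means. The sketch below uses the maximum (`<>`) and minimum (`><`) operators as subscripts as well. The matrix and the result names are made up for illustration.

```sas
proc iml;
a = {1 5 3, 4 2 6};
colmax  = a[<>, ];   /* column maxima: {4 5 6} */
rowmin  = a[ ,><];   /* row minima: {1, 2} */
rowmean = a[ ,:];    /* row means: {3, 4} */
total   = a[+];      /* sum of all elements: 21 */
print colmax, rowmin rowmean, total;
quit;
```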

- the LOC function is often very useful for subsetting vectors
and matrices. This function is used for locating elements
which meet a given condition. The positions of the elements
are returned in row-major order. For vectors, this is
simply the position of the element. For matrices, some
manipulation is often required in order to use the result of
the LOC function as an index. The syntax of the function
is:
matrix2=LOC(matrix1=value);

For example:

    a={1 . 1 1, 2 2 2 2, 3 3 3 3, 4 4 4 4};
    notmiss=loc( a[,2] ^= .);   /* notmiss will equal the locations (rows) in
                                   which the second column of a is not missing */
    newa=a[notmiss,];           /* newa contains the rows of a with no missing
                                   elements in the second column. So,
                                   newa = {2 2 2 2, 3 3 3 3, 4 4 4 4}; */

Even more efficient (but more confusing to follow) is to bypass the intermediate step of creating the vector notmiss:

    newa = a[ loc(a[,2]^=.), ];
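One practical caution when subsetting with `LOC`: if no element meets the condition, `LOC` returns an empty matrix, and using that as an index is an error. A minimal sketch of a guard, with illustrative names (`v`, `idx`, `big`):

```sas
proc iml;
v = {3 8 1 12 7};
idx = loc(v > 10);          /* positions of elements greater than 10; here 4 */
if ncol(idx) > 0 then do;
   big = v[idx];            /* here 12 */
   print idx big;
end;
else print "no elements satisfy the condition";
quit;
```

Checking `ncol(idx)` before indexing keeps the program from stopping when the condition selects nothing.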

- element operations - for operations which work on elements
(e.g. addition, element multiplication, minimum/maximum,
etc.), a missing value will be assigned to each element for
which a missing value is used. For example:
      a={1 . 1 1, 2 2 2 2, 3 3 3 3, 4 4 4 4};
      b=a+a;   /* b equals {2 . 2 2, 4 4 4 4, 6 6 6 6, 8 8 8 8} */
      c=a#a;   /* c equals {1 . 1 1, 4 4 4 4, 9 9 9 9, 16 16 16 16} */

- matrix algebra - for operations which are uniquely matrix operations (e.g. multiplication, inversion, etc.), IML will not accept matrices with missing values.
- IML functions - some IML functions will ignore missing values (e.g. SUM), while others will treat them as missing (e.g. INV). Unfortunately, there is little in the manual on this.
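Since the manual says little on this, it can help to test a function on a small vector before relying on it. The sketch below contrasts elementwise propagation with `SUM`; the vector `x` is made up, and the value noted for `s` is simply what the description above implies.

```sas
proc iml;
x = {1 . 3};
y = x + 1;         /* elementwise: the missing value propagates, {2 . 4} */
s = sum(x);        /* 4 if SUM ignores the missing value, as described above */
n = sum(x ^= .);   /* number of nonmissing elements: 2 */
print y s n;
quit;
```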

- Define a matrix representing number of cups of coffee drunk by three staff members on each day of the week. Define character vectors for row and column labels.

      coffee = { 4 2 2 3 2, 3 3 1 2 2, 2 1 0 4 5 };
      COFFEE    3 rows    5 cols    (numeric)
      4 2 2 3 2
      3 3 1 2 2
      2 1 0 4 5
      days = { Mon Tue Wed Thu Fri };
      DAYS    1 row    5 cols    (character, size 3)
      MON TUE WED THU FRI
      names = { 'Lenny', 'Linda', 'Sue'};
      NAMES    3 rows    1 col    (character, size 5)
      Lenny
      Linda
      Sue

- Print the matrix using row and column labels. Note that `rowname` and `colname` can be abbreviated.

      print coffee[r=names c=days];

      COFFEE   MON   TUE   WED   THU   FRI
      Lenny      4     2     2     3     2
      Linda      3     3     1     2     2
      Sue        2     1     0     4     5

- Calculate daily and weekly cost at $.50/cup

      daycost = .50 # coffee;
      DAYCOST    3 rows    5 cols    (numeric)
        2   1     1   1.5     1
      1.5 1.5   0.5     1     1
        1 0.5     0     2   2.5
      ones = j(5,1);
      weektot = daycost * ones;
      WEEKTOT    3 rows    1 col    (numeric)
      6.5
      5.5
        6
      weektot = daycost[,+];
      WEEKTOT    3 rows    1 col    (numeric)
      6.5
      5.5
        6
      daytot = daycost[+,];
      DAYTOT    1 row    5 cols    (numeric)
      4.5 3 1.5 4.5 4.5
      total = daycost[+,+];
      TOTAL    1 row    1 col    (numeric)
      18

- Print a formatted table. Specify a format by `[format= ]` following the name of the matrix.

      print coffee[r=names c=days] weektot[format=dollar7.2] ,
            daytot[c=days f=dollar8.2] ' ' total[f=dollar7.2];

      COFFEE   MON   TUE   WED   THU   FRI   WEEKTOT
      Lenny      4     2     2     3     2     $6.50
      Linda      3     3     1     2     2     $5.50
      Sue        2     1     0     4     5     $6.00

      DAYTOT     MON     TUE     WED     THU     FRI     TOTAL
               $4.50   $3.00   $1.50   $4.50   $4.50    $18.00

    use psy303.fitness;
    read all into mat[rowname=name];

For output to a SAS dataset, use the `create` and `append` statements, as in

    *-- Output results to data set out ;
    xys = yhat || res || weight;
    cname = {"_YHAT_" "_RESID_" "_WEIGHT_" };
    create out from xys [ colname=cname ];
    append from xys;

This creates the SAS dataset,

- the USE statement - read access to an existing SAS dataset:
USE libref.dataset (dataset options);

- the EDIT statement - read/write access to an existing SAS
dataset:
EDIT libref.dataset (dataset options);

**Note:** IML can have only one input and one output dataset at a time. The EDIT statement will assign one dataset as the current input and current output dataset. The USE statement will assign the dataset as the current input dataset without changing the current output dataset.

- the READ statement - reading variables into IML vectors/matrices.

      READ <range> <VAR operand> <WHERE (expression)> <INTO name <[rowname=variable colname=matrix]>>

- range : specifies the range of observations to be read
from the dataset. Valid values for range are:
  - ALL: all observations. This is the usual case when you want to read all observations into a matrix.
  - CURRENT: current observation (default)
  - NEXT n: next observation or next n observations
  - AFTER: all observations after the current one
  - POINT operand: observations specified by number, where operand may be:
    - `point 5`: number
    - `point {2 5 10}`: list of observations
    - `point p`: matrix containing observation numbers
    - `point (p+1)`: expression yielding observation numbers

- VAR operand : specifies variables to be read in.
Default is all numeric variables. The operand may
take the following values
  - `_ALL_`: all variables (must be same type)
  - `_NUM_`: all numeric variables (default)
  - `_CHAR_`: all character variables
  - `{var1 var2..}`: list of variable names
  - matrix: character matrix containing names of desired variables.

- WHERE(expression) : conditional selection of
observations. Expression is a logical condition
which is evaluated as either true or false. For
example:
WHERE (var1^=.)

**Note:** the WHERE clause does not 'override' the range specification. If range is not specified (default is current), the WHERE clause will only evaluate the current observation.

- INTO matrix [rowname=varname colname=matrix] : names the target matrix for the data read in. Only one target matrix can be specified per READ statement. If the INTO clause is not used, the default is to create a single vector for every variable read in and give that vector the name of the variable. The rowname option allows you to permanently assign a CHARACTER variable in the incoming dataset as a vector of rownames to be used whenever the target matrix is printed. The colname option creates a character vector containing the names of the variables read in and also permanently assigns that colname to the target matrix.

## Examples of READ statements

- Read all numeric variables for all observations from the dataset `psy303.fitness` into a matrix named `mat`. Create a character vector of observation labels from the dataset variable `name`.

      use psy303.fitness;
      read all into mat[rowname=name];

- Read only the males. The condition is written in parentheses, following the `WHERE(expression)` syntax given above.

      use psy303.fitness;
      read all into mat[rowname=name] where(sex='M');

- Read the variables x1, x2 and x3 for all observations in the dataset `work.data1` into the matrix X.

      use data1;
      read all var {x1 x2 x3} into x;

- Read every 10th observation up to 100 into a matrix X.
A character vector id will be created from the
variable id and will be permanently associated as the
rowname for x. A character vector containing the
names of all variables read into x will be called
coln and will be permanently assigned as the colname
for x.
      keepobs=(1:10)#10;
      READ point keepobs into x[rowname=id colname=coln];

## IML output statements

- CREATE statement - creating new datasets from IML:

      CREATE libref.dataset <VAR operand>;
      CREATE libref.dataset <FROM matrix <[r=vector1 c=vector2]>>;

- VAR operand : structures a dataset from the listed IML vectors (and only vectors may be used). The operand may take the same form as the operands used on the VAR clause in the READ statement. The dataset variables will be given the same name as the IML vectors.
- FROM matrix <[r=vector1 c=vector2]> : structures a dataset from the IML matrix named. Since an IML matrix can only be all numeric or all character, the dataset created will contain variables of only one type. The r(owname) option will create a single character variable from a character vector and the variable will have the same name as the vector. The c(olname) option will assign names to the variables in the new dataset (other than the one created by the r option). If c= is not specified, the variables will be called COL1, COL2, etc.

**Note:** The CREATE statement does not put data into the dataset but only defines the structure of the dataset. The VAR clause and the FROM clause are mutually exclusive.

- the APPEND statement - place data from IML vectors or matrices into the current output dataset:

      APPEND <VAR operand>;

  OR

      APPEND <FROM matrix <[r=vector1]>>;

- the VAR clause and the FROM clause operate as they do in the CREATE statement. Note that the FROM clause does not have a c= option, since no data need actually be read from the colname vector. When the VAR clause has been used on the CREATE statement, it need not be specified on the APPEND statement.

**Note:** The VAR clause and the FROM clause are mutually exclusive. The APPEND statement will always output to the current output dataset, whether that dataset has been specified via the CREATE or the EDIT statement.

**Note:** It is possible to use external files (i.e. not SAS datasets) in IML. In terms of syntax, this is very similar to reading and writing these files in the SAS data step, but it is usually easier to do this in a SAS data step.
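The earlier output example uses the FROM clause. As a complement, here is a minimal sketch of the VAR form, which builds a dataset from separate IML vectors and so can mix numeric and character variables. The dataset name `work.scores` and the vectors `id`, `name` and `score` are hypothetical.

```sas
proc iml;
id    = {1, 2, 3};
name  = {"Ann", "Bob", "Cy"};
score = {90, 85, 77};
create work.scores var {id name score};   /* defines the structure only */
append;                                   /* writes the three observations */
close work.scores;
quit;

proc print data=work.scores;
run;
```

Because the VAR clause was given on CREATE, the APPEND statement needs no operands.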

This section focusses on IML programming features, namely iterative and conditional processing. IML programming can take place in 'open' code or within modules (compiled programs). If the program is only used once then open code is generally preferable. If the program is used often, whether in one session or across sessions, the module format is probably preferred. Modules may be stored permanently in compiled form.

    IF expression THEN statement1;
    ELSE IF expression THEN statement2;

**Note:** IML uses the symbol | for OR and the
symbol & for AND. It will not accept the words as alternatives for
logical operators as in the data step.

    x=3;
    if x=3 then print 'x=' x;
    else if x=4 then print 'x is 4';
    else print 'x is bad';

    x= 3

    x=4;
    if x=3 then print 'x=' x;
    else if x=4 then print 'x is 4';
    else print 'x is bad';

    x is 4

    x=5;
    if x=3 then print 'x=' x;
    else if x=4 then print 'x is 4';
    else print 'x is bad';

    x is bad
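The examples above test a single condition. A short sketch of compound conditions using the `|` and `&` symbols mentioned in the note, plus `^` for NOT; the value of `x` is arbitrary.

```sas
proc iml;
x = 4;
if (x = 3) | (x = 4) then print "x is 3 or 4";
if (x > 0) & (x < 10) then print "x is between 0 and 10";
if ^(x = 3) then print "x is not 3";
quit;
```

With `x = 4`, all three messages are printed. The parentheses are not strictly required, but they make the grouping explicit.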

    DO variable = start TO stop <BY increment>;

e.g.,

    do i=1 to 100 by 10; ... end;
    do j=1 to 10; ... end;

    DO WHILE (expression);

e.g.,

    count=1;
    do while (count<5); ... end;

    DO UNTIL (expression);

e.g.,

    do until (count<5); ... end;

**Note:** the `DO WHILE` loop is evaluated at the top, meaning that if count was 10 in this example, the loop would not execute. The `DO UNTIL` loop is evaluated at the bottom, meaning that it will **always** execute at least once. In the above example, if count equals 1 to start, the DO loop will still execute once even though count is less than 5 to start with.

    reset name;
    x=1;
    do while (x<2);
       print x;
       x=x+1;
    end;

    X
    1

    x=3;
    do while (x<2);   /* note this loop does not execute */
       print x;
       x=x+1;
    end;

    do until (x<4);
       print '** do until loop executes although X is less than 4', x;
       x=x-1;
    end;

    ** do until loop executes although X is less than 4
    X
    3
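Loops like these are the basis for the iterative procedures mentioned in the introduction. As an illustration only, the sketch below uses `DO UNTIL` for a Newton iteration for a square root. The names `s`, `x`, `iter`, the tolerance and the iteration cap are arbitrary choices, not part of the course material.

```sas
proc iml;
s = 2;               /* number whose square root is wanted */
x = 1;               /* starting value */
iter = 0;
do until ( (abs(x#x - s) < 1e-10) | (iter >= 50) );
   x = (x + s/x) / 2;     /* Newton update */
   iter = iter + 1;
end;
print x iter;
quit;
```

The iteration cap in the `UNTIL` condition guards against an endless loop if the convergence test is never met.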

- Program modules

  The following syntax DEFINES a program module:

      START module-name <(argument1, argument2,...)>;
         IML statements;
      FINISH;

  To run a program module:

      RUN module-name <(argument1, argument2, ...)>;

### Example

MODULE RMISS: delete rows of a matrix which contain missing values. Arguments are `mat1` (original matrix), `mat2` (target matrix) and `miss` (missing value indicator; `.` is the default). `miss` must be specified in the argument list even if the default is to be used.

    start rmiss(mat1, mat2, miss);
       if nrow(miss)=0 then miss={.};
       badpos=loc(mat1=miss);
       print badpos;    /* positions of missing values in row-major order */
       badrow=ceil(badpos/ncol(mat1));
       print badrow;    /* badrow will be rows with at least one msg value */
       keeprow=remove(1:nrow(mat1),badrow);
       print keeprow;   /* 1:nrow(mat1) creates vector of values from 1 to
                           the number of rows of mat1. Then badrow numbers
                           are removed from this vector */
       mat2=mat1[keeprow,];
       print mat2;      /* mat2 is subset of mat1 containing only rows
                           with no msg values */
    finish;

The RMISS module is used as follows:

    x={1 . 1 1, 2 2 2 2, 3 . 3 ., 4 4 4 4};
    run rmiss(x,y,miss);

    BADPOS
    2 10 12
    BADROW
    1 3 3
    KEEPROW
    2 4
    MAT2
    2 2 2 2
    4 4 4 4

- Function (assignment) modules
The purpose of such a module is to return a value (scalar,
vector or matrix) and assign that value to the specified
matrix. In general, it is best to specify arguments for
this type of module. The syntax is very similar, except
note that the RETURN statement is required for this type of
module:
      START module-name <(argument1, argument2, ...)>;
         IML statements;
         RETURN matrix;
      FINISH;

  To use an assignment module:

      mat1=module-name <(argument1, argument2, ...)>;

### Example

LEN : A function module to return the vector length of each column of a matrix, X.

    *-- Define a length function (LEN);
    start len(X);
       ssq = X[##,];
       return (sqrt( ssq ));
    finish;
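A brief usage sketch for LEN follows. The module definition is repeated so the block runs on its own; the test matrix `x` and the names `l` and `unit` are made up for illustration.

```sas
proc iml;
start len(X);
   ssq = X[##, ];          /* column sums of squares */
   return (sqrt(ssq));
finish;

x = {3 0, 4 5};
l = len(x);                /* row vector of column lengths: {5 5} */
unit = x / l;              /* each column scaled to unit length */
print l, unit;
quit;
```

Dividing the matrix by the row vector `l` scales column by column, so `unit` is {0.6 0, 0.8 1}.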

**Note:** It is not possible to directly assign default values for module arguments. It seems to be completely impossible for function modules. For an example of how this can be done in a program module, see the RMISS module example.

- Differences between Program and Function modules

  There is really only one main difference. The purpose of a function module is to return one and only one value. Consequently, it contains a RETURN statement and cannot accept arguments which do not exist. Program modules can accept arguments which do not (yet) exist (see RMISS example) and create them.

    a=10; b=20; c=30;   /* A,B,C are all global */
    start mod1;         /* module uses global table */
       p=a+b;           /* p is global */
       c=40;            /* c already global */
    finish;
    run mod1;
    print a b c p;      /* note c changed to 40 */

    A    B    C    P
    10   20   40   30

When a module is defined

    start mod2(a,b);    /* module with args creates local table */
       p=2*(a+b);       /* p is local */
       b=50;            /* b is local */
    finish;
    run mod2(a,c);      /* note that b (global) remains the same. Since
                           C (global) is defined as b (local) and b is
                           changed in the module, C (global) is changed.
                           Note that p also remains the same. */
    print a b c p;

    A    B    C    P
    10   20   50   30

IML storage catalogs are useful for saving large intermediate results for later use when memory is a concern. Also, these catalogs are necessary for having access to IML matrices and modules after an IML session is completed.

- RESET STORAGE=libref.catalog; This specifies the name of the currently open catalog (only one catalog may be open at a time). You can both store items to and load items from this catalog. SAS will provide a temporary catalog by default, but you would need to use this statement, for example, to change the open catalog to a permanent one.
- SHOW STORAGE; This will list all items stored in the current catalog.
- STORE; This stores the named matrices and/or modules in the open catalog. Modules are stored in compiled form. The stored matrix or module still remains in the active workspace after storing. If you are using STORE to conserve memory, first STORE the matrix then FREE it. The following are examples of STORE statements:

      STORE a b c;                /* stores matrices A,B and C */
      STORE module=mod1;          /* stores module MOD1 */
      STORE module=(mod1 mod2);   /* stores modules MOD1 and MOD2 */
      STORE;                      /* stores EVERYTHING */

- LOAD; The opposite of STORE, this loads matrices and modules into the active workspace. The syntax for the arguments is the same as for STORE. The matrix or module will still be present in the catalog after it has been loaded.
- REMOVE; Erases matrices and modules from the storage catalog. Arguments take the same form as the STORE statement.
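The statements above can be combined into a store, free, load cycle. This is a minimal sketch using the default temporary catalog; the matrix `a` and the module `twice` are hypothetical names.

```sas
proc iml;
a = {1 2, 3 4};
start twice(x);
   return (2 # x);
finish;

store a module=twice;   /* save matrix A and module TWICE in the open catalog */
show storage;           /* list what the catalog now holds */
free a;                 /* release A from the active workspace */
load a;                 /* bring A back from the catalog */
b = twice(a);
print a b;
quit;
```

To keep the items after the session ends, a `RESET STORAGE=libref.catalog;` statement pointing at a permanent library would be issued before the STORE.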