PROC CALIS

Michael Friendly
SCS Short Course



The CALIS procedure is designed for the analysis of covariance structure models (confirmatory factor analysis), linear structural equations with latent variables, and path analysis models. These are models in which hypothesized relations among variables are specified in terms of the variances and covariances of the variables and are fit to an observed covariance matrix. PROC CALIS adds to SAS/STAT the kinds of analyses performed by the LISREL program under SPSSX and the EQS program in BMDP.

Model specification with PROC CALIS

Structural equation models allow free, fixed, and constrained parameters. These models can be represented as linear equations, as path diagrams, or as parameter matrices. The CALIS procedure provides a syntax for each of these ways of specifying a model.

Examples

The examples below simply show the flavors of model specification with the CALIS procedure. No attempt is made here to explain the theory of linear structural equations.

Errors-in-variables regression

Consider fitting a linear equation to two observed variables, Y and X. If the variable X is fixed and error free (but Y is assumed to be random and subject to error) simple linear regression may be used to fit the model,
$$ Y = \alpha + \beta X + e_y $$
However, if both X and Y are contaminated by error, and you want to estimate the relation between their true, error-free parts, the appropriate model consists of the structural equations,
$$ Y = \alpha + \beta F_X + E_Y $$
$$ X = F_X + E_X \qquad (1) $$
$$ \mathrm{cov}(F_X, E_X) = \mathrm{cov}(F_X, E_Y) = \mathrm{cov}(E_X, E_Y) = 0 $$

where $E_Y$ and $E_X$ are the random errors associated with Y and X, and $F_X$ is the latent variable representing the true value of X. This model can be expressed as a set of linear equations in CALIS as follows:
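proc calis;                   /* no DATA= option: uses the most recently created data set */
   lineqs y = beta fx + ey,   /* equation for Y: BETA is a free parameter */
          x = fx + ex;        /* equation for X: coefficient of FX is fixed at 1 */
   std   fx = vfx,            /* named variances are free parameters */
         ey = vey,
         ex = vex;
run;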
The LINEQS statement specifies the structural equations for Y and X. The STD statement specifies the variance of each latent variable in the model, including the error terms. When a variance is given as a name, as in fx = vfx, it is treated as a free parameter and is estimated along with the other parameters, such as BETA. The intercept (alpha) does not appear in the LINEQS statement because the model is fit to the covariance matrix, which does not depend on the intercept.
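
As a sketch of what is being fit (a short derivation from the assumptions in (1), added for illustration), the zero covariances among $F_X$, $E_X$, and $E_Y$ imply the following variances and covariance for the observed variables:
$$ \mathrm{var}(Y) = \beta^2 \, \mathrm{var}(F_X) + \mathrm{var}(E_Y) $$
$$ \mathrm{var}(X) = \mathrm{var}(F_X) + \mathrm{var}(E_X) $$
$$ \mathrm{cov}(X, Y) = \beta \, \mathrm{var}(F_X) $$
In the CALIS code, $\mathrm{var}(F_X)$, $\mathrm{var}(E_Y)$, and $\mathrm{var}(E_X)$ correspond to the parameters VFX, VEY, and VEX. Two observed variables supply only these three distinct moments, while the specification names four free parameters (BETA, VFX, VEY, VEX). In practice, at least one parameter would presumably have to be fixed or constrained (for example, from a known measurement-error variance) before unique estimates could be obtained.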

Factor analysis models

Factor analysis models attempt to explain the relations among observed variables in terms of underlying latent variables called factors. This example considers a confirmatory factor model for four vocabulary tests.

Two tests, X1 and X2, had 15 items and were administered under liberal time limits; tests Y1 and Y2 had 75 items and were given under time pressure. The data (from Lord, 1957) consist of the covariance matrix of the scores of 649 examinees given these four tests. The covariance matrix is read in with the following DATA step:

data lord(type=cov);
   input _type_ $ _name_ $  x1 x2 y1 y2;
datalines;
n   .  649       .       .       .
cov x1 86.3937   .       .       .
cov x2 57.7751 86.2632   .       .
cov y1 56.8651 59.3177 97.2850   .
cov y2 58.8986 59.6683 73.8201 97.8192
;
(Note that the first observation, with _TYPE_='N', is necessary to establish the sample size for a covariance matrix.)

The statistical problem is to find a model that accounts for the variances and covariances among these tests. One model states that tests X1 and X2 are determined by a single common factor, F1, and that Y1 and Y2 are determined by a second common factor, F2. The two common factors are assumed to be correlated, and the goal is to estimate this correlation. This model can be specified by the following structural equations:

$$ X_1 = \beta_1 F_1 + e_1 $$
$$ X_2 = \beta_2 F_1 + e_2 \qquad (2) $$
$$ Y_1 = \beta_3 F_2 + e_3 $$
$$ Y_2 = \beta_4 F_2 + e_4 $$

with $\mathrm{corr}(F_1, F_2) = \rho$ and $\mathrm{var}(e_i) = \sigma_i^2$. This model can also be expressed in matrix form as:
$$
\begin{bmatrix} x_1 \\ x_2 \\ y_1 \\ y_2 \end{bmatrix}
=
\begin{bmatrix} \beta_1 & 0 \\ \beta_2 & 0 \\ 0 & \beta_3 \\ 0 & \beta_4 \end{bmatrix}
\begin{bmatrix} F_1 \\ F_2 \end{bmatrix}
+
\begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix}
= \boldsymbol{\Lambda} F + e
$$

Model (2) can be estimated by PROC CALIS by transcribing the structural equations into the LINEQS statement shown below:

title "Lord's data: H4- unconstrained two-factor model";
proc calis data=lord cov;
   lineqs  x1 = beta1 F1  + e1,
           x2 = beta2 F1  + e2,
           y1 = beta3 F2  + e3,
           y2 = beta4 F2  + e4;
   std  F1 F2 = 1,
        e1 e2 e3 e4 = ve1 ve2 ve3 ve4;
   cov  F1 F2 = rho;
run;
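
To make the link between model (2) and the covariance matrix concrete, the following is a minimal sketch in Python (used here only for illustration; it is not part of CALIS). It computes the covariance matrix implied by the two-factor model, Lambda * Phi * Lambda' plus a diagonal matrix of error variances, and counts the degrees of freedom. The function names and all parameter values are hypothetical, chosen for readability; they are not estimates from Lord's data.

```python
def implied_covariance(loadings, factor_cov, error_vars):
    """Return Lambda * Phi * Lambda' + diag(error_vars) as nested lists."""
    p = len(loadings)        # number of observed variables
    k = len(factor_cov)      # number of factors
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            s = sum(loadings[i][a] * factor_cov[a][b] * loadings[j][b]
                    for a in range(k) for b in range(k))
            if i == j:
                s += error_vars[i]   # error variances enter on the diagonal only
            sigma[i][j] = s
    return sigma


def degrees_of_freedom(n_observed, n_free_params):
    """Distinct variances and covariances minus free parameters."""
    moments = n_observed * (n_observed + 1) // 2
    return moments - n_free_params


# Hypothetical parameter values (assumptions for illustration only).
beta1, beta2, beta3, beta4 = 7.0, 7.0, 8.0, 8.0
rho = 0.9
error_vars = [30.0, 30.0, 30.0, 30.0]          # ve1 .. ve4

loadings = [[beta1, 0.0],                      # x1 loads on F1 only
            [beta2, 0.0],                      # x2 loads on F1 only
            [0.0, beta3],                      # y1 loads on F2 only
            [0.0, beta4]]                      # y2 loads on F2 only
factor_cov = [[1.0, rho],                      # factor variances fixed at 1,
              [rho, 1.0]]                      # so rho is a correlation

sigma = implied_covariance(loadings, factor_cov, error_vars)
for row in sigma:
    print(" ".join("%8.2f" % v for v in row))

# 4 observed variables give 10 distinct moments; the model has 9 free
# parameters (4 loadings, 4 error variances, and rho).
df = degrees_of_freedom(4, 9)
print("degrees of freedom:", df)
```

Fitting the model amounts to choosing the parameter values so that this implied matrix is as close as possible to the observed covariance matrix. With 10 distinct moments and 9 free parameters, the unconstrained two-factor model leaves 1 degree of freedom for testing fit.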

Path diagrams

The structural equations model for the vocabulary tests can also be represented in a path diagram as shown in Figure 6. The boxes in the path diagram represent observed variables; the latent variables are represented by ovals or circles. Single-headed arrows represent regression parameters connecting the observed and latent variables; double-headed arrows represent variances or covariances.
[Figure 6: Path diagram for Lord's vocabulary tests]

In the CALIS procedure, a path diagram is described by a RAM statement. The model for the vocabulary tests shown in Figure 6 is specified by the RAM statement shown below. Each line in the statement describes one link or arrow between nodes in the path diagram. The last item on the line is either the name of a parameter to be estimated (e.g., BETA1) or a constant, indicating a fixed parameter (e.g., 1.0, for the variances of $F_1$ and $F_2$).

*-- Same model represented as a path (RAM) model;
proc calis data=lord cov;
/*  number of heads on arrow in path diagram (matrix number)
        |  node pointed to (matrix row)
        |   |  node arrow leaves from (matrix col)
        |   |   |   parameter or value
        |   |   |     |
        v   v   v     v                                 */
   ram  1   1   5   beta1,
        1   2   5   beta2,
        1   3   6   beta3,
        1   4   6   beta4,
        2   1   1   ve1,
        2   2   2   ve2,
        2   3   3   ve3,
        2   4   4   ve4,
        2   5   5   1.0,
        2   6   6   1.0,
        2   5   6   rho;
run;
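
The way each RAM line maps to a matrix entry can be sketched in Python (again for illustration only; this is not how CALIS is implemented). The sketch assumes the usual RAM algebra: one-headed arrows fill a matrix A, two-headed arrows fill a symmetric matrix P, and the implied covariance matrix of the observed variables is the upper-left block of (I - A)^(-1) P (I - A)^(-1)'. The function names and parameter values are hypothetical.

```python
def ram_to_matrices(ram_rows, n_nodes, values):
    """Build A (one-headed arrows) and P (two-headed arrows) from RAM rows.

    Each row is (heads, to_node, from_node, parameter), with nodes numbered
    from 1 as in the RAM statement. A parameter is either a number (fixed
    value) or a name that is looked up in `values`.
    """
    A = [[0.0] * n_nodes for _ in range(n_nodes)]
    P = [[0.0] * n_nodes for _ in range(n_nodes)]
    for heads, to_node, from_node, param in ram_rows:
        value = values[param] if isinstance(param, str) else float(param)
        r, c = to_node - 1, from_node - 1
        if heads == 1:
            A[r][c] = value          # regression coefficient
        else:
            P[r][c] = value          # variance or covariance
            P[c][r] = value          # keep P symmetric
    return A, P


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def implied_cov_ram(A, P, n_observed):
    """Observed block of (I - A)^(-1) P (I - A)^(-1)' for a recursive model."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # For a model without feedback loops, A is nilpotent, so
    # (I - A)^(-1) = I + A + A^2 + ... and the series ends after at most n terms.
    total = [row[:] for row in identity]
    power = [row[:] for row in identity]
    for _ in range(n):
        power = matmul(power, A)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    total_t = [list(col) for col in zip(*total)]
    full = matmul(matmul(total, P), total_t)
    return [row[:n_observed] for row in full[:n_observed]]


# The RAM list from the text: nodes 1-4 are x1, x2, y1, y2; nodes 5-6 are F1, F2.
ram_rows = [
    (1, 1, 5, "beta1"), (1, 2, 5, "beta2"), (1, 3, 6, "beta3"), (1, 4, 6, "beta4"),
    (2, 1, 1, "ve1"), (2, 2, 2, "ve2"), (2, 3, 3, "ve3"), (2, 4, 4, "ve4"),
    (2, 5, 5, 1.0), (2, 6, 6, 1.0), (2, 5, 6, "rho"),
]
# Hypothetical parameter values (assumptions, not estimates from Lord's data).
values = {"beta1": 7.0, "beta2": 7.0, "beta3": 8.0, "beta4": 8.0,
          "ve1": 30.0, "ve2": 30.0, "ve3": 30.0, "ve4": 30.0, "rho": 0.9}

A, P = ram_to_matrices(ram_rows, 6, values)
sigma = implied_cov_ram(A, P, 4)
for row in sigma:
    print(" ".join("%8.2f" % v for v in row))
```

Under these assumptions, the RAM list and the LINEQS specification describe the same implied covariance matrix; only the bookkeeping differs.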