PROC PRINQUAL and PROC TRANSREG

Michael Friendly
SCS Short Course



PROC PRINQUAL and PROC TRANSREG are two new procedures designed to find transformations of data to optimize some criterion.
PRINQUAL
The PRINQUAL procedure provides methods to find transformations that decrease the rank of the covariance matrix of the transformed variables or maximize the variance accounted for by a few linear combinations. As such it generalizes principal components analysis to ordinal or qualitative data.
TRANSREG
The TRANSREG procedure provides methods to find transformations that optimize the $R^2$ between a transformation of one or more dependent variables and transformations of a set of predictor variables. It generalizes the methods of linear regression, canonical correlation, and analysis of variance.
The data for both procedures can contain variables with nominal, ordinal, interval, or ratio scales. Any mix is allowed with all methods. In addition, both procedures provide methods for estimating missing data, and missing data can be estimated for all transformations.

Transformations

PRINQUAL and TRANSREG can carry out transformations in various families. The optimization carried out by the procedures takes place only for the optimal transformations.

PROC PRINQUAL

PROC PRINQUAL is primarily a scoring procedure, like PROC RANK or PROC SCORE. It produces very little printed output. It creates an output data set containing the transformed (optimally scaled) variables. You can then use this data set as the input to other SAS procedures, such as PROC FACTOR.

The PRINQUAL procedure is controlled by these statements:

PROC PRINQUAL DATA=SAS-data-set < options > ;
   TRANSFORM transform (variables)
        <... transform (variables) > ;
   BY variables;
   ID variables;
   OUTPUT  ;

Example

This example illustrates the use of PROC PRINQUAL to perform a nonmetric multidimensional preference (MDPREF) analysis. This is a principal components analysis of a rectangular data matrix whose columns correspond to people (judges) and whose rows are the objects judged. In this case, 25 judges make preference ratings of 17 automobile models on a scale of 0 to 9 (9=most preferred).
title 'Preference Ratings for Automobiles Manufactured in 1980';
data carpref;
   input make $ 1-10 model $ 12-22 @25 (judge1-judge25) (1.);
   datalines;
Cadillac   Eldorado     8007990491240508971093809
Chevrolet  Chevette     0051200423451043003515698
Chevrolet  Citation     4053305814161643544747795
Chevrolet  Malibu       6027400723121345545668658
Ford       Fairmont     2024006715021443530648655
Ford       Mustang      5007197705021101850657555
Ford       Pinto        0021000303030201500514078
Honda      Accord       5956897609699952998975078
Honda      Civic        4836709507488852567765075
Lincoln    Continental  7008990592230409962091909
Plymouth   Gran Fury    7006000434101107333458708
Plymouth   Horizon      3005005635461302444675655
Plymouth   Volare       4005003614021602754476555
Pontiac    Firebird     0107895613201206958265907
Volkswagen Dasher       4858696508877795377895000
Volkswagen Rabbit       4858509709695795487885000
Volvo      DL           9989998909999987989919000
;

For a nonmetric principal components analysis, PROC PRINQUAL is used to find monotonic transformations of the judges' ratings so that two principal components account for the greatest amount of variance of the transformed ratings. The following SAS statements carry out this analysis:

   /* Transform the ratings to better fit a 2-component model */
proc prinqual data=carpref
              out=results n=2
              replace standard scores correlations;
   id model;
   transform monotone(judge1-judge25);
   title2 'Multidimensional Preference (MDPREF) Analysis';
run;

The printed output from PRINQUAL is shown in Figure 5. The "Proportion of Variance" column shows the amount of variance of the ratings accounted for by two principal components. The value 0.66946 on the first iteration is the same as that given by a principal components analysis of the raw data. The value 0.82995 on the last iteration indicates a substantial improvement in fit due to the monotone transformations.


+-----------------------------------------------------------------+
|                                                                 |
|     Preference Ratings for Automobiles Manufactured in 1980     |
|          Multidimensional Preference (MDPREF) Analysis          |
|                                                                 |
|                  PRINQUAL MTV Iteration History                 |
|                                                                 |
|   Iteration    Average    Maximum    Proportion of    Variance  |
|    Number      Change     Change       Variance        Change   |
|   ------------------------------------------------------------  |
|       1        0.24994    1.28017       0.66946        .        |
|       2        0.07223    0.36958       0.80194       0.13249   |
|       3        0.04522    0.29026       0.81598       0.01404   |
|       4        0.03096    0.25213       0.82178       0.00580   |
|       5        0.02182    0.23045       0.82493       0.00315   |
|       6        0.01602    0.19017       0.82680       0.00187   |
|       ...       ...        ...           ...           ...      |
|      25        0.00059    0.00871       0.82994       0.00001   |
|      26        0.00050    0.00720       0.82995       0.00000   |
|      27        0.00043    0.00642       0.82995       0.00000   |
|      28        0.00037    0.00573       0.82995       0.00000   |
|      29        0.00031    0.00510       0.82995       0.00000   |
|      30        0.00027    0.00454       0.82995       0.00000   |
|                                                                 |
|                                                                 |
+-----------------------------------------------------------------+
Figure 5: PRINQUAL transformation for Automobile Preference data
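
The claim that the first-iteration value matches an ordinary principal components analysis can be checked directly. The following is a minimal sketch, assuming the CARPREF data set created above is still available; the TITLE2 text is only illustrative. With the default correlation-matrix analysis, the cumulative proportion for the first two components is the sum of the two largest eigenvalues divided by 25 (the number of standardized ratings variables), and according to the text above it should agree with the 0.66946 reported at iteration 1.

```sas
   /* PCA of the untransformed ratings, for comparison with iteration 1 */
proc princomp data=carpref n=2;
   var judge1-judge25;
   title2 'Principal Components of the Raw Ratings';
run;
```

Comparing the cumulative proportion from this run with the final PRINQUAL value of 0.82995 shows how much of the fit is due to the monotone transformations rather than to the two-component model itself.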

The output data set RESULTS from the PRINQUAL procedure can be used to perform the principal components analysis of the transformed data as follows:

   /*---Final Principal Component Analysis---*/
proc factor data=results nfactors=2 scree;
   var judge1-judge25;
   where _type_='SCORE';
   title3 'Principal Components of Monotonically Transformed Data';
run;

PROC TRANSREG

PROC TRANSREG is also a scoring procedure. It creates an output data set containing the transformed (optimally scaled) variables. You can then use this data set as the input to other SAS procedures, such as PROC PLOT, PROC REG, or PROC GLM. However, all significance tests and p-values from these confirmatory procedures should be ignored, since they do not take into account that the data have been optimally transformed.

The TRANSREG procedure is controlled by the statements:

PROC TRANSREG DATA=SAS-data-set < options > ;
   MODEL transform (dependents) = transform (independents)
                             <... transform (independents) > ;
   BY variables;
   ID variables;
   OUTPUT  ;

MODEL transformations

The MODEL statements below illustrate some of the models which can be handled by the TRANSREG procedure.
model linear(y1-y3) = linear(x1-x4);    ordinary multivariate
                                        multiple regression

model monotone(y) = linear(x1-x4);      monotone multiple regression

model mspline(y) = linear(x1-x4);       smooth monotone (quadratic
                                        spline) multiple regression

model linear(y) = class(gp)             parallel monotone curves,
                  spline(x/degree=1);   separate intercepts

model monotone(y1) = class(x1-x3);      monotone ANOVA (conjoint
                                        analysis), main effects model

model monotone(y1) = point(x1-x3);      PREFMAP or ideal point
                                        regression
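
As an illustration of how one of these MODEL statements fits into a complete program, here is a minimal sketch of the monotone multiple regression model. The data set DEMO, its generating coefficients, and the output data set name DEMO_TR are all hypothetical and invented for this example. The sketch also assumes the default naming of transformed variables, in which the transformed version of Y is called TY.

```sas
   /* Hypothetical data: y is a monotone (exponential) function of a   */
   /* linear combination of x1-x4, plus noise. All values are made up. */
data demo;
   array x{4} x1-x4;
   do i = 1 to 50;
      eta = 0;
      do j = 1 to 4;
         x{j} = rannor(12345);
         eta = eta + 0.5 * x{j};
      end;
      y = exp(eta + 0.1 * rannor(12345));
      output;
   end;
   drop i j eta;
run;

   /* Monotone multiple regression: find the monotone transformation */
   /* of y that is best predicted by a linear function of x1-x4      */
proc transreg data=demo;
   model monotone(y) = linear(x1-x4);
   output out=demo_tr;
run;

   /* Inspect the transformation: transformed y (TY, assumed default name) vs. y */
proc plot data=demo_tr;
   where _type_ = 'SCORE';
   plot ty * y;
run;
```

Because the simulated Y is an exponential function of the linear predictor, the plot of TY against Y would be expected to look roughly logarithmic. The other MODEL statements listed above can be substituted in the same program.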

Example: monotone ANOVA

The following example shows how TRANSREG is used to find a transformation for two-way ANOVA data with one observation per cell. The (hypothetical) data represent the reaction times of three subjects to make a decision about each of three types of sentences.
Title 'Response times for 3 types of sentences';
data rt;
     input subj @;
     do sent=1 to 3;
        input y @;
        output;
        end;
datalines;
 1  1.7  1.9  2.0
 2  4.4  4.5  5.7
 3  6.6  7.4 10.5
;
proc glm data=rt;
   class subj sent;
   model y = subj sent;
run;

The two-way ANOVA carried out with PROC GLM (see Extended Example TREGSENT SAS) shows that the effect of SENTence is not significant. However, with n = 1 per cell, it is suspected that the data are non-additive (interaction between subjects and sentences). The data are transformed by TRANSREG with these statements:
proc transreg data=rt;
   model monotone(Y) = class(subj sent) / intercept;
   output out=trans coefficients;
run;
The transformed Y variable is now perfectly fit ($R^2 = 1$) by an additive model. Since the data were optimally transformed, significance tests on the transformed variable are inappropriate. However, the plot of transformed Y ($Y'$) vs. Y suggests that the relationship is
$$ Y' \approx \sqrt{Y} $$
An analysis of variance on $\sqrt{Y}$ is both sensible and appropriate.
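
The two follow-up steps described here can be sketched as below. This assumes the TRANS and RT data sets created above, and that the transformed response in TRANS carries the default name TY. The data set RT2 and the variable SQRTY are names introduced only for this illustration. The WHERE statement restricts the plot to the score observations, since the COEFFICIENTS option also adds coefficient observations to TRANS.

```sas
   /* Plot the transformed response (TY, assumed default name) against Y */
proc plot data=trans;
   where _type_ = 'SCORE';
   plot ty * y;
run;

   /* Apply the suggested square-root transformation and redo the ANOVA */
data rt2;
   set rt;
   sqrty = sqrt(y);
run;

proc glm data=rt2;
   class subj sent;
   model sqrty = subj sent;
run;
quit;
```

Unlike tests on the optimally transformed TY, the tests from this PROC GLM run refer to a transformation fixed in advance of the fit, although the choice of the square root was itself guided by the data.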