REWEIGHT statement can be used to exclude
observations from the analysis. The ADD and DELETE
statements can be used to add or delete variables from the
model.
This means you can submit a model, look at the output, then submit additional statements without starting over. A procedure is ended by another PROC step, a DATA step, or the QUIT; statement.
The interactive statements in Version 6 are:
Note: A number of the interactive statements have a lasting effect. For example, when you REWEIGHT or PAINT an observation, that action continues until you explicitly undo it with another statement.
/* Interactive Analysis */
proc reg data=class;
   model weight = age height;
run;                  /* fit initial model */
   delete age;        /* delete AGE from model */
   print;             /* ANOVA summary and parameter estimates */
run;
   add age;           /* put AGE back in */
   plot r.*p.;        /* plot residual * predicted */
run;
   reweight r.>20;    /* refit, ignoring obs with residual > 20 */
   plot;              /* plot residual * predicted again */
run;
Note that the same plot specification (R. * P.) is used again in the second PLOT statement. In the same way, the REWEIGHT condition (R. > 20) would continue to be used in subsequent steps until it was changed with a new REWEIGHT statement.
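Because REWEIGHT and PAINT actions persist, it can help to see how they might be reversed. The following is a minimal sketch, assuming the same CLASS data set and assuming that the UNDO keyword and the ALLOBS / RESET form behave as described in the PROC REG documentation for your release; check the syntax against your own version before relying on it.

```sas
/* Sketch: reversing REWEIGHT and PAINT actions (assumed syntax) */
proc reg data=class;
   model weight = age height;
run;
   reweight r.>20;                  /* exclude obs with large residuals */
   print;                           /* results for the reweighted fit */
run;
   reweight undo;                   /* reverse the most recent REWEIGHT */
   print;                           /* results using all obs again */
run;
   paint student.>=2 / symbol='$';  /* mark large studentized residuals */
   plot student.*p.;
run;
   paint allobs / reset;            /* return all points to default symbol */
   plot;                            /* same plot specification, unpainted */
run;
quit;                               /* end the interactive session */
```

The QUIT statement at the end closes the procedure explicitly, so no later statement is interpreted as part of the same PROC REG session.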
The example below produces two sets of scatter plots using the PAINT statement to identify individual points. The first plot, shown in Figure 2, identifies the observation with name='Henry' and all observations with large absolute studentized residuals. The plotting symbols are specified by the SYMBOL= option in each PAINT statement.
/* ---Painting Scatter Plots--- */
proc reg data=class;
   model weight=age height / noprint;
run;
   paint name='Henry'                  /* identify Henry with 'H' */
         / symbol='H';
   paint student.>=2 or student.<=-2   /* identify obs with large abs */
         / symbol='$';                 /* studentized residuals */
   plot student.*p.;                   /* plot studentized residual vs. yhat */
run;
   paint student.>=1 / symbol='p';
   paint student.<1 and student.>-1
         / symbol='s';
   paint student.<=-1 / symbol='n';
   plot student.*p. cookd.*h. / hplots=2 vplots=2;
run;
[Printer plot: studentized residual (STUDENT, axis from -2 to 3) versus predicted value of WEIGHT (PRED, axis from 50 to 140). Henry is plotted as 'H', the one observation with a studentized residual above 2 is plotted as '$', and the remaining observations use the default plotting symbols.]
Figure 2: Painting observations
The second PLOT statement in the example above produces two small scatter plots on a single page, as shown in Figure 3. The PAINT statements identify observations with large positive residuals by the character 'p', those with large negative residuals by the character 'n', and those with small absolute residuals by the character 's'. The options HPLOTS=2 VPLOTS=2 allow for four plots on a page, of which two are actually used. The same plotting symbols are used in both plots, so you can relate the observations in the two plots.
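To illustrate how HPLOTS= and VPLOTS= divide the page, the sketch below fills all four positions of the 2 x 2 layout. The two extra plot requests (residuals against AGE and against HEIGHT) are illustrative choices that do not appear in the original example.

```sas
/* Sketch: four small plots on one page (extra plots are illustrative) */
proc reg data=class;
   model weight = age height / noprint;
run;
   paint student.>=1  / symbol='p';   /* large positive residuals */
   paint student.<=-1 / symbol='n';   /* large negative residuals */
   plot student.*p. cookd.*h.         /* the two plots used above */
        r.*age r.*height              /* two more to fill the page */
        / hplots=2 vplots=2;
run;
```

Since painting applies to observations rather than to a particular plot, the 'p' and 'n' symbols should mark the same observations in all four panels.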
[Two printer plots side by side. Left: studentized residual (STUDENT, axis from -2 to 4) versus predicted value (PRED, axis from 40 to 140). Right: Cook's D (COOKD, axis from 0.00 to 1.00) versus leverage (H, axis from 0.0 to 0.4). Observations are plotted as 'p', 's', or 'n' in both plots.]
Figure 3: Painting observations in two plots
These methods are all provided in PROC REG in Version 6, using the SELECTION= option on the MODEL statement. In addition, a modification of the syntax for the MODEL statement and a GROUPNAMES option allows groups of variables to be entered or removed as a whole in stepwise methods.
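As an illustration of the grouping syntax, the sketch below treats the three pulse variables in the fitness data as a single group in a stepwise selection. It assumes that groups are written in braces on the MODEL statement and named with GROUPNAMES=, as in later PROC REG documentation; the group name 'pulse' is made up for this example, and the exact Version 6 syntax should be confirmed before use.

```sas
/* Sketch: entering or removing a group of variables as a whole */
/* (assumed syntax; the name 'pulse' is hypothetical)           */
proc reg data=fitness;
   model oxy = runtime age weight {runpulse maxpulse rstpulse}
         / selection=stepwise
           groupnames='runtime' 'age' 'weight' 'pulse';
run;
```

Under these assumptions, RUNPULSE, MAXPULSE, and RSTPULSE would enter or leave the model together, and the output would refer to them by the group name.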
Note: For compatibility across versions, if PROC STEPWISE or PROC RSQUARE is requested in Version 6, PROC REG with the appropriate model-selection method is actually used.
For each of these methods, you can use the START= and STOP= options to specify the smallest and largest number of variables in a given model. The BEST= option specifies the maximum number of models to be printed for a given number of predictors.
The following example uses the RSQUARE criterion to identify the best four models for each of 2, 3, 4, and 5 predictors for the fitness data.
proc reg data=fitness;
model oxy = runtime age weight runpulse maxpulse rstpulse
/ selection=rsquare
start=2 stop=5 best=4;