Introduction to SAS/Graph
Michael Friendly
Statistical Consulting Service
ANNOTATE= data sets are special SAS data sets that enable you to
customize SAS/GRAPH procedure output or to create your own
individualized graphics output.
ANNOTATE= data sets contain the commands, or functions, that
instruct SAS/GRAPH software how to enhance your output. Two basic
functions, MOVE and DRAW, handle moving and drawing. Other
functions position labels, and more complex functions create bars,
pies, and polygons. An ANNOTATE= data set contains one observation
per function, and predefined variable names (FUNCTION, X, Y, and so
on) specify what each function does and where.
You can use ANNOTATE= data sets to position text labels in a
graph, to place symbols or city names on maps, to draw lines
between points, and to compose special presentation graphics. You
annotate your graphs by using a DATA step to place your commands
into an ANNOTATE= data set and then specifying that data set with
the ANNOTATE= option in any of the following procedures: GANNO,
GCHART, GCONTOUR, GMAP, GPLOT, GSLIDE, and G3D.
For example, if you have an ANNOTATE= data set named GDATA, you
can specify it in the PROC statement to customize the plot:

```sas
PROC GPLOT ANNOTATE=GDATA;
   /* more SAS statements */
```

or you can specify the ANNOTATE= option in the PLOT statement of
GPLOT, for example:

```sas
PROC GPLOT;
   PLOT Y * X / ANNOTATE=GDATA;
   /* more SAS statements */
```
When you specify the ANNOTATE= option in the PROC statement, the
option remains in effect until the end of the PROC step, so it
applies to every graph the step produces. When you specify it in an
action statement such as PLOT, its output is placed only on the
graph produced by that statement, in addition to any ANNOTATE=
output from the PROC statement specification.
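The sketch below illustrates the two placements. It assumes the SASHELP.CLASS sample data set that ships with current SAS releases; STAMP and LABELS are hypothetical data set names. STAMP is named in the PROC statement, so it should appear on both graphs, while LABELS is named only in the first PLOT statement.

```sas
/* STAMP (hypothetical name): a note centered near the bottom of the
   graphics area; XSYS/YSYS = '3' means percent of the whole SCREEN area */
data stamp;
  length function $8 text $30;
  xsys = '3'; ysys = '3';
  function = 'LABEL';
  x = 50; y = 2;
  text = 'Data: SASHELP.CLASS';
run;

/* LABELS (hypothetical name): each student's name at its data point */
data labels;
  length function $8 text $8;
  set sashelp.class;
  xsys = '2'; ysys = '2';          /* X and Y are data values */
  function = 'LABEL';
  x = height; y = weight;
  text = name;
  position = '2';
run;

proc gplot data=sashelp.class annotate=stamp;  /* STAMP on every graph */
  plot weight*height / annotate=labels;        /* STAMP plus LABELS   */
  plot weight*age;                             /* STAMP only          */
run;
quit;
```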
ANNOTATE= data sets now support BY statement processing. The
same BY variable(s) must be present in the PROC input data and the
ANNOTATE= data set.
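A minimal sketch of BY-group annotation, again assuming SASHELP.CLASS as input (ANNOBY is a hypothetical name). Building the annotate data set from the sorted plot data is one simple way to make sure it contains the BY variable SEX and is in the same BY order.

```sas
/* Sort the plot data by the BY variable */
proc sort data=sashelp.class out=class;
  by sex;
run;

/* ANNOBY (hypothetical name) is built from the sorted data, so it
   contains SEX and is already in BY order */
data annoby;
  length function $8 text $8;
  set class;
  xsys = '2'; ysys = '2';     /* X and Y are data values */
  function = 'LABEL';
  x = height; y = weight;
  text = name;
  position = '2';
  keep sex function text xsys ysys x y position;
run;

/* One plot per value of SEX; each plot receives only the
   annotate observations with the matching SEX value */
proc gplot data=class;
  plot weight*height / annotate=annoby;
  by sex;
run;
quit;
```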
The following variables are used to create an ANNOTATE= data set:
- FUNCTION: the particular feature you want to activate
- X: the X or horizontal coordinate
- Y: the Y or vertical coordinate
In addition to FUNCTION, X, and Y, several other variables can be
placed in an ANNOTATE= data set to control attributes such as
color, font, and line style. When you create an ANNOTATE= data
set, you should also use a LENGTH statement to set the lengths of
its character variables. Otherwise, each character variable takes
its length from the first value assigned to it, and longer values
assigned later (a FUNCTION value of 'POLYCONT' after 'MOVE', for
example) are truncated.
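To see why the LENGTH statement matters: a DATA step gives a character variable the length of the first value assigned to it. In the hypothetical data set BAD below, FUNCTION gets length 4 from 'MOVE', so 'LABEL' is stored as 'LABE', which is not a valid function name; GOOD declares the lengths first. This is a sketch, and the lengths chosen are only examples ($8 is enough for the longest basic function name, POLYCONT).

```sas
/* BAD (hypothetical): no LENGTH statement, so FUNCTION gets length 4
   from its first value, 'MOVE', and 'LABEL' is stored as 'LABE' */
data bad;
  function = 'MOVE';  x = 10; y = 10; output;
  function = 'LABEL'; x = 50; y = 50; output;
run;

/* GOOD (hypothetical): lengths declared before any assignment */
data good;
  length function $8 text $20 color $8;
  xsys = '3'; ysys = '3';          /* percent of the SCREEN area */
  function = 'MOVE';  x = 10; y = 10; output;
  function = 'LABEL'; x = 50; y = 50;
  text = 'Centered note'; color = 'BLUE'; output;
run;

proc print data=bad;  run;
proc print data=good; run;
```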
Say you have data on height, weight, and age for a class of
students, as in the data set CLASS below, and you want to make a
plot of weight against height.
```sas
DATA CLASS;
   INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
   CARDS;
JOHN M 12 59 99.5
JAMES M 12 57.3 83
ALFRED M 14 69 112.5
...
BARBARA F 13 65.3 98
MARY F 15 66.5 112
ALICE F 13 56.5 84
;
RUN;
```
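If you wanted to label each point in the plot with the person's
name, you could construct an ANNOTATE= data set as shown below. In
this DATA step, FUNCTION='LABEL' means to write the TEXT at the
position given by X and Y in the plot. XSYS='2' and YSYS='2' tell
SAS/GRAPH that X and Y are data values, so each name lands at that
student's HEIGHT and WEIGHT, and POSITION='2' centers the text just
above the point.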
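```sas
DATA AnnoName;
   LENGTH FUNCTION $8 TEXT $8;   /* declare character lengths first */
   SET CLASS;
   XSYS = '2'; YSYS = '2';       /* X and Y are data values */
   X = HEIGHT; Y = WEIGHT;
   FUNCTION = 'LABEL';
   TEXT = NAME;
   POSITION = '2';               /* centered just above the point */
RUN;

PROC PRINT DATA=AnnoName;
RUN;

PROC GPLOT DATA=CLASS;
   PLOT WEIGHT*HEIGHT=SEX / ANNO=AnnoName;
RUN;
QUIT;
```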
The FUNCTION variable tells SAS/GRAPH what action to perform. When
you create an ANNOTATE= data set in a DATA step, you specify the
function you want to use like this:

```sas
FUNCTION='functionname';
```

Some basic functions need only the FUNCTION variable and the X and
Y variables to perform an action. Other, more advanced functions
require additional variables but give you more control over your
output and more flexibility. The basic Annotate functions are listed
and described briefly below.
Basic Functions
- FUNCTION='BAR' constructs a rectangle with one corner at the
  previous point (XLAST,YLAST) and the opposite corner at (X,Y). You
  can define the color of the fill, the fill pattern, and the edge
  lines to be drawn.
- FUNCTION='COMMENT' places comments in your program. The text of
  the comment is ignored when the data set is processed.
- FUNCTION='DRAW' draws a line on the display area from the previous
  point to (X,Y). You can define the color, line style, and thickness
  of the lines to be drawn.
- FUNCTION='FRAME' draws a border around the outside of the defined
  display area. You can optionally specify a background color for the
  area of the display enclosed by the frame.
- FUNCTION='LABEL' places text on the display area. You can specify
  the color, size, font, base angle, and rotation of the characters
  displayed.
- FUNCTION='MOVE' moves to a specific point on the display area
  without drawing a line. MOVE is most often used to prepare for a
  DRAW or BAR function or for advanced text functions.
- FUNCTION='PIE' draws pie slices on the display area. You can
  specify the color, fill pattern, arc angle, radius, and edge lines
  of the slice being drawn.
- FUNCTION='POINT' places a single point at the (X,Y) coordinates
  using the specified color.
- FUNCTION='POLY' specifies the beginning of a polygon definition.
  You can define the fill pattern and color, as well as the line type
  for the outline. The POLY function is used with the POLYCONT
  function to define and fill areas on the display area.
- FUNCTION='POLYCONT' specifies successive points in the polygon
  definition in separate continuation observations. The color for the
  outline is specified in the first POLYCONT observation.
- FUNCTION='SYMBOL' places special symbols on the display area. You
  can specify the symbol, font, height, and color to be used.
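A sketch that combines MOVE, DRAW, and BAR on the weight-height plot, assuming SASHELP.CLASS. MOVE sets a starting point; DRAW then draws a line from it to (X,Y), and BAR uses it as one corner of a rectangle whose opposite corner is (X,Y). The reference weight of 100, the box limits, and the data set name REFS are arbitrary choices for illustration.

```sas
data refs;
  length function $8 color $8 style $8;
  when = 'A';                       /* draw after the plot */

  /* Broken line (line type 2) at WEIGHT = 100 across the data area:
     X in percent of the data area ('1'), Y in data values ('2') */
  xsys = '1'; ysys = '2';
  color = 'RED'; line = 2;
  function = 'MOVE'; x = 0;   y = 100; output;
  function = 'DRAW'; x = 100; y = 100; output;

  /* Outlined box for HEIGHT 60-65, WEIGHT 90-110 (data values):
     MOVE sets one corner, BAR draws to the opposite corner (X,Y) */
  xsys = '2'; ysys = '2';
  color = 'BLUE'; line = .; style = 'EMPTY';
  function = 'MOVE'; x = 60; y = 90;  output;
  function = 'BAR';  x = 65; y = 110; output;
run;

proc gplot data=sashelp.class;
  plot weight*height / annotate=refs;
run;
quit;
```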
X, Y, and Z variables
You can use the variables X, Y,
and (in PROC G3D only) Z to specify the coordinates on the graph to
which a function is applied. X, Y, and Z must be numeric. The X
variable defines the horizontal coordinate and the Y variable
defines the vertical coordinate. The Z variable references the
three-dimensional Z data values in PROC G3D. Some functions do not
require or use the X and Y variables.
XC and YC variables
XC and YC are the character-type
equivalents of X and Y. They are used when you specify a coordinate
system based on data values (see the XSYS and YSYS variables below)
and the corresponding axis is based on a character variable. XSYS
and YSYS define the coordinate system for XC and YC exactly as they
do for X and Y. XC and YC are ignored in all procedures if the axes
are numeric.
Utility variables
The two utility pairs (XLAST, YLAST)
and (XLSTT, YLSTT) are used to supply default values when X or Y
contains a missing value. Both pairs are initially set to zero and
remain zero until a valid function updates the values.
The variables XLAST and YLAST are an internal coordinate pair that
track the last values specified for X and Y. Because the
(XLAST,YLAST) coordinates are updated internally, you cannot
specify values for them. However, they can be manipulated by the
advanced utility functions documented in the Advanced Functions
section. The pair (XLAST,YLAST) is updated automatically by certain
functions and is available for use by the functions that follow.
The coordinate pair (XLSTT,YLSTT) is similar to (XLAST,YLAST),
except that it is updated only by the text-handling function LABEL.
Thus, it is possible to maintain two different coordinate pairs: one
for text-handling functions and one for non-text-handling functions.
The (XLSTT,YLSTT) coordinates are also updated automatically, and you
cannot specify values for them. However, they can be manipulated
indirectly by the advanced functions described in the Advanced
Functions section.
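A small sketch of the missing-value defaults described above, assuming they apply to DRAW as stated (LASTPT and the coordinates are hypothetical; SASHELP.CLASS is assumed as the plot data). The missing X in the second observation should reuse XLAST and give a vertical line; the missing Y in the third should reuse YLAST and give a horizontal one.

```sas
data lastpt;
  length function $8;
  xsys = '2'; ysys = '2';                      /* data values */
  function = 'MOVE'; x = 60; y = 80;  output;  /* XLAST=60, YLAST=80 */
  function = 'DRAW'; x = .;  y = 140; output;  /* X missing: XLAST (60) used */
  function = 'DRAW'; x = 70; y = .;   output;  /* Y missing: YLAST (140) used */
run;

proc gplot data=sashelp.class;
  plot weight*height / annotate=lastpt;
run;
quit;
```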
When you specify values for the X and Y variables (and Z variable
in PROC G3D), you can also specify the coordinate system to use to
place this information on the display area. A coordinate system
defines the graphics area, units, and location for displaying
Annotate information.
Defining the graphics area
Three types of area
definition tell SAS/GRAPH software what portion of the display area
to use to display the Annotate information: the DATA graphics
area, the SCREEN graphics area, and the WINDOW graphics area.
The DATA graphics area is the area that is bordered by the
horizontal and/or vertical axis (and in PROC GMAP, the range of map
coordinates) specified in a procedure step. For example, if you
customize output from PROC GPLOT and specify the DATA coordinate
system, the available area is the area enclosed by the axis lines.
The SCREEN graphics area is the entire area available for
graphics on the device. When you specify the HSIZE= and VSIZE=
graphics options (in a GOPTIONS statement), these values set the
physical limits of the SCREEN coordinate system.
The WINDOW graphics area is the SCREEN area minus the space
required by TITLE and FOOTNOTE statements.
Defining units for X and Y
After defining the area to be
used to display Annotate information, you can define the units used
to interpret X and Y values and place information on the display.
X or Y can be interpreted as character cells (for SCREEN and WINDOW
areas), as data values (for DATA areas), or as percentages (on a
scale from 0% to 100% for DATA, SCREEN, and WINDOW areas).
Defining location
When you specify values for X and Y,
you can also specify whether they are to be interpreted as absolute
values (placed the specified distance from a fixed origin) or
relative values (placed the specified distance from the last point
referenced).
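To illustrate absolute versus relative location, the sketch below (assuming SASHELP.CLASS; CROSSES is a hypothetical name) moves to each point in absolute data values and then draws a small cross with relative offsets in percent of the data area. It uses the coordinate-system codes '2' (DATA area, data values, absolute) and '7' (DATA area, percent, relative) for XSYS and YSYS. Because the offsets are percentages of the width and height separately, the cross is only square if the data area is.

```sas
data crosses;
  length function $8;
  set sashelp.class;
  /* Absolute: go to the student's point (data values) */
  xsys = '2'; ysys = '2';
  function = 'MOVE'; x = height; y = weight; output;
  /* Relative: offsets in percent of the data area ('7'), each
     measured from the point reached by the previous observation */
  xsys = '7'; ysys = '7';
  function = 'MOVE'; x = -1; y = 0;  output;   /* 1% left of the point   */
  function = 'DRAW'; x = 2;  y = 0;  output;   /* horizontal stroke      */
  function = 'MOVE'; x = -1; y = -1; output;   /* back to center, 1% down */
  function = 'DRAW'; x = 0;  y = 2;  output;   /* vertical stroke        */
  keep function xsys ysys x y;
run;

symbol1 value=none;   /* hide the default plot symbols */
proc gplot data=sashelp.class;
  plot weight*height / annotate=crosses;
run;
quit;
```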
Combining the three areas, the two kinds of units available for
each area (percentages, plus either data values or character cells),
and the two kinds of location gives 3 x 2 x 2 = 12 coordinate
systems. Each unique combination of area, unit, and location has
been assigned a value that you specify with the XSYS, YSYS, ZSYS,
and HSYS variables.
XSYS, YSYS, and (in PROC G3D only) ZSYS are character variables
that define the area and coordinate system used by the X, Y, and Z
variables to display the Annotate information. Thus, the values
for X, Y, and (in PROC G3D) Z can be interpreted in a variety of
ways depending upon the coordinate system you specify with the
XSYS, YSYS, and ZSYS variables.
You can use these coordinate systems in any combination to specify
display area locations in the ANNOTATE= data set. The X and Y
variables need not be referenced with the same system value. X can
be referenced as 'data value' and Y referenced as 'window
percentage' in one observation, and both coordinates can be
referenced as 'screen value' in the next.
The HSYS variable determines what coordinate system the variable
SIZE uses. The coordinate system values specified with XSYS, YSYS,
and ZSYS are valid with HSYS.
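The sketch below mixes coordinate systems in the way just described and uses HSYS to size the text. Its comment block lists the twelve XSYS/YSYS/ZSYS/HSYS codes as they are usually documented; check them against the documentation for your SAS release. MIXED and the coordinate values are hypothetical, and SASHELP.CLASS is assumed as the plot data.

```sas
/* XSYS/YSYS/ZSYS/HSYS codes (SCREEN = whole graphics area,
   WINDOW = SCREEN minus titles and footnotes):
     absolute:  '1' DATA %        '2' DATA values   '3' SCREEN %
                '4' SCREEN cells  '5' WINDOW %      '6' WINDOW cells
     relative:  '7' DATA %        '8' DATA values   '9' SCREEN %
                'A' SCREEN cells  'B' WINDOW %      'C' WINDOW cells */
data mixed;
  length function $8 text $24;
  function = 'LABEL';

  /* X as a data value ('2'), Y as a WINDOW percentage ('5');
     HSYS='3' makes SIZE a percentage of the SCREEN area */
  xsys = '2'; ysys = '5'; hsys = '3';
  x = 60; y = 95; size = 3;
  text = 'HEIGHT = 60'; output;

  /* Both coordinates as SCREEN percentages ('3');
     HSYS='4' makes SIZE a number of SCREEN character cells */
  xsys = '3'; ysys = '3'; hsys = '4';
  x = 85; y = 3; size = 1;
  text = 'CLASS data'; output;
run;

proc gplot data=sashelp.class;
  plot weight*height / annotate=mixed;
run;
quit;
```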
The following variables can be used in addition to the FUNCTION, X,
Y, XSYS, YSYS, ZSYS, and HSYS variables to create an ANNOTATE= data
set.
- ANGLE is a function-dependent numeric variable. ANGLE can be used
  with the functions LABEL and PIE. ANGLE is measured in degrees.
- COLOR is a character variable of length 8 that specifies the color
  used by the function. COLOR can be used with the functions BAR,
  DRAW, FRAME, LABEL, PIE, POLY, and POLYCONT.
- GROUP is a variable used to reference the GCHART GROUP data value.
  GROUP should be of the same type (character or numeric) as the
  GCHART variable it references. GROUP is valid only with PROC
  GCHART. The GROUP variable is used much like the X and Y variables
  for data-dependent placement.
- LINE is a function-dependent variable that usually specifies the
  line type. LINE can be used with the functions BAR, DRAW, FRAME,
  PIE, and POLY.
- MIDPNT is a variable used to reference the GCHART MIDPOINT data
  value. MIDPNT should be of the same type (character or numeric) as
  the GCHART variable it references. MIDPNT is valid only with PROC
  GCHART. The MIDPNT variable is used much like the X and Y variables
  for data-dependent placement.
- POSITION is a character variable used to control the placement and
  alignment of a text string. POSITION can be used only with the
  LABEL function.
- ROTATE is a function-dependent numeric variable. ROTATE can be used
  with the functions LABEL and PIE. ROTATE can be used with the ANGLE
  variable and is measured in degrees.
- SIZE is a function-dependent numeric variable. SIZE can be used
  with the functions DRAW, FRAME, LABEL, PIE, and PIEXY. The value of
  SIZE defaults to 1.00 in all cases.
- STYLE is a function-dependent character variable of length 8 that
  normally specifies a font or pattern for the function. STYLE can be
  used with the functions BAR, FRAME, LABEL, PIE, and POLY.
- SUBGRP is a variable used to reference the actual GCHART SUBGROUP
  data value. SUBGRP should be of the same type (character or
  numeric) as the GCHART variable it references. SUBGRP is valid only
  with PROC GCHART. The SUBGRP variable is used much like the X and Y
  variables for data-dependent placement.
- WHEN is a sequencing variable that specifies when the function is
  performed in relation to generating graphics output for the
  procedure. WHEN can take the values 'A' (AFTER the actual graph is
  drawn) or 'B' (BEFORE the graph is drawn). A missing value is
  equivalent to specifying BEFORE. Normally, observations in an
  ANNOTATE= data set are processed sequentially. If the variable WHEN
  is used, all observations with a WHEN value of 'B' are processed
  first, the graph is then processed (if there is one to be
  produced), and finally the observations with a WHEN value of 'A'
  are processed. WHEN should always be a character variable of
  length 1.
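A sketch of WHEN sequencing, assuming SASHELP.CLASS (BEHIND and the band limits are hypothetical). The gray band has WHEN='B', so it is drawn before the plot and the points appear on top of it; the label has WHEN='A', so it is drawn last.

```sas
data behind;
  length function $8 color $8 style $8 text $20;
  xsys = '1'; ysys = '1';          /* percent of the DATA area */

  /* WHEN='B': a solid band drawn BEFORE the plot, so the
     plotted points appear on top of it */
  when = 'B';
  color = 'GRAY'; style = 'SOLID';
  function = 'MOVE'; x = 0;   y = 40; output;
  function = 'BAR';  x = 100; y = 60; output;

  /* WHEN='A': a label drawn AFTER the plot, on top of everything */
  when = 'A';
  color = 'BLACK'; style = ' ';    /* blank STYLE: default font */
  function = 'LABEL'; x = 50; y = 50; position = '5';
  text = 'Middle band'; output;
run;

proc gplot data=sashelp.class;
  plot weight*height / annotate=behind;
run;
quit;
```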
© 1995 Michael Friendly
Author: Michael Friendly
Last updated: Friday, July 19, 1995