Introduction to SAS/Graph

Michael Friendly
Statistical Consulting Service

ANNOTATE= data sets

ANNOTATE= data sets are special SAS data sets that enable you to customize SAS/GRAPH procedure output or to create your own individualized graphics output.

ANNOTATE= data sets contain the commands or functions that instruct SAS/GRAPH software on how to enhance your output. Two basic functions allow for moving and drawing. Other functions are used to position labels. More complex functions are used to create bars, pies, and polygons. An ANNOTATE= data set consists of one observation per function, and predetermined variable names are used to define functions.

You can use ANNOTATE= data sets to position text labels in a graph, to place symbols or city names on maps, to draw lines between points, and to compose special presentation graphics. You annotate your graphs by using a DATA step to place your commands into an ANNOTATE= data set and then specifying that data set with the ANNOTATE= option in any of the following procedures: GANNO, GCHART, GCONTOUR, GMAP, GPLOT, GSLIDE, and G3D.

For example, if you have an ANNOTATE= data set named GDATA, you can specify

    PROC GPLOT ANNOTATE=GDATA;
       more SAS statements to customize the plot

or you can specify the ANNOTATE= option in the PLOT statement in GPLOT, for example:

    PROC GPLOT;
       PLOT Y * X / ANNOTATE=GDATA;
       more SAS statements

When you specify the ANNOTATE= option in a PROC statement, the option remains in effect until the end of the PROC step. When you specify it in a PLOT statement, it applies only to the graphs produced by that statement, and its output is placed on the graph in addition to any annotation requested by ANNOTATE= in the PROC statement.

ANNOTATE= data sets now support BY statement processing. The same BY variable(s) must be present in both the PROC input data set and the ANNOTATE= data set, and both data sets must be sorted by them.
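A minimal sketch of BY-group annotation follows, assuming the sample CLASS rows from the example later in this section and a hypothetical Annotate data set named ANNOBY. Because ANNOBY is built with SET from the already sorted CLASS data, it carries SEX in the same order, so each name label should land only on the graph for its own BY group.

```sas
/* Sample rows copied from the CLASS example in this document */
data class;
  input name $ sex $ age height weight;
  cards;
JOHN M 12 59 99.5
JAMES M 12 57.3 83
ALFRED M 14 69 112.5
BARBARA F 13 65.3 98
MARY F 15 66.5 112
ALICE F 13 56.5 84
;
run;

/* BY processing needs the plot data sorted by the BY variable */
proc sort data=class;
  by sex;
run;

/* ANNOBY (hypothetical name) keeps SEX in the same sorted order,
   so each label is matched to the graph for its BY group */
data annoby;
  set class;
  length function $8 text $8 xsys ysys $1;
  xsys = '2'; ysys = '2';   /* X and Y are data values */
  function = 'LABEL';
  x = height;
  y = weight;
  text = name;
  position = '2';
run;

proc gplot data=class annotate=annoby;
  plot weight*height;
  by sex;
run;
quit;
```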

ANNOTATE= VARIABLES

The following variables are used to create an ANNOTATE= data set:

FUNCTION
the particular feature you want to activate

X
the X or horizontal coordinate

Y
the Y or vertical coordinate

In addition to FUNCTION, X, and Y, several other variables can be placed in an ANNOTATE= data set to control such attributes as color, font, line style, and so on. When you are creating an ANNOTATE= data set, you should also specify a LENGTH statement to indicate the length of the character variables to be included in the data set. Without it, each character variable takes its length from the first value assigned to it, so a later, longer value (for example, FUNCTION='LABEL' after FUNCTION='MOVE') is truncated.
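The truncation problem can be seen in a small sketch; the data set names NOLEN and WITHLEN are hypothetical. In the first step FUNCTION gets length 4 from 'MOVE', so the second value is stored as 'LABE', which is not a valid function name. Declaring the length first avoids this.

```sas
/* No LENGTH statement: FUNCTION gets length 4 from the first value, 'MOVE' */
data nolen;
  function = 'MOVE';  x = 10; y = 10; output;
  function = 'LABEL'; x = 50; y = 50; output;   /* stored as 'LABE' */
run;

/* LENGTH declared before any assignment: every function name fits */
data withlen;
  length function $8;
  function = 'MOVE';  x = 10; y = 10; output;
  function = 'LABEL'; x = 50; y = 50; output;   /* stored as 'LABEL' */
run;
```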

ANNOTATE example

Say you have data on height, weight, and age for a class of students, as in the data set CLASS below, and you want to make a plot of weight against height.

    DATA CLASS;
       INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
       CARDS;
    JOHN    M 12 59   99.5
    JAMES   M 12 57.3 83
    ALFRED  M 14 69   112.5
    ...
    BARBARA F 13 65.3 98
    MARY    F 15 66.5 112
    ALICE   F 13 56.5 84
    ;

If you wanted to label each point in the plot with the person's name, you could construct an ANNOTATE data set as shown below. In this DATA step, XSYS='2' and YSYS='2' mean that X and Y are given in data values, FUNCTION='LABEL' means to write the TEXT at the position given by X and Y in the plot, and POSITION='2' centers the text just above that point.

    DATA AnnoName;
       SET CLASS;
       LENGTH FUNCTION $8 TEXT $8;
       XSYS = '2'; YSYS = '2';
       X = HEIGHT;
       Y = WEIGHT;
       FUNCTION = 'LABEL';
       TEXT = NAME;
       POSITION = '2';
    RUN;

    PROC PRINT DATA=AnnoName;
    RUN;

    PROC GPLOT DATA=CLASS;
       PLOT WEIGHT*HEIGHT=SEX / ANNOTATE=AnnoName;
    RUN;
    QUIT;

FUNCTION Variable

The FUNCTION variable tells SAS/GRAPH what action to perform. When you create an ANNOTATE= data set in a DATA step, you specify the function you want to use like this: FUNCTION='functionname'; Some basic functions need only the FUNCTION variable and the X and Y variables to perform an action. Other more advanced functions require additional variables but allow you more control over your output and more flexibility in use. The basic Annotate functions are listed and described briefly below.

Basic Functions

FUNCTION='BAR'
constructs a rectangle. You can define the color of the fill, the fill pattern, and the edge lines to be drawn.

FUNCTION='COMMENT'
places comments in your program. The text of the comment is ignored when the data set is processed.

FUNCTION='DRAW'
draws a line on the display area. You can define the color, line style, and thickness of the lines to be drawn.

FUNCTION='FRAME'
draws a border around the outside of the defined display area. You can optionally specify a background color for the area of the display enclosed by the frame.

FUNCTION='LABEL'
places text on the display area. You can specify the color, size, font, base angle, and rotation of the characters displayed.

FUNCTION='MOVE'
allows you to move to a specific point on the display area without drawing a line. MOVE is most often used to prepare for a DRAW command or advanced text functions.

FUNCTION='PIE'
draws pie slices on the display area. You can specify the color, fill pattern, arc angle, radius, and edge lines of the slice being drawn.

FUNCTION='POINT'
places a single point at the (X,Y) coordinates using the specified color.

FUNCTION='POLY'
specifies the beginning of a polygon definition. You can define the fill pattern and color, as well as the line type for the outline. The POLY function is used with the POLYCONT function to define and fill areas on the display area.

FUNCTION='POLYCONT'
specifies successive points in the polygon definition in separate continuation observations. The color for the outline is specified in the first POLYCONT observation.

FUNCTION='SYMBOL'
places special symbols on the display area. You can specify the symbol, font, height, and color to be used.
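The basic functions can be combined without any underlying graph by passing an Annotate data set to PROC GANNO. The sketch below uses a hypothetical data set SHAPES in SCREEN percentage coordinates (XSYS='3'); it assumes BAR and POLY accept STYLE='SOLID' as a fill pattern, which is worth checking against the Annotate reference for your release. BAR uses the current point set by MOVE as one corner and (X,Y) as the opposite corner, and DRAW runs from the current point to (X,Y).

```sas
data shapes;
  length function $8 style $8 color $8 text $20 xsys ysys hsys $1 position $1;
  retain xsys ysys hsys '3';          /* all coordinates and sizes in SCREEN percent */

  function = 'FRAME'; color = 'BLACK'; output;          /* border around the display area */

  /* BAR: MOVE to one corner, then BAR to the opposite corner */
  function = 'MOVE'; x = 10; y = 10; output;
  function = 'BAR';  x = 30; y = 50; color = 'BLUE'; style = 'SOLID'; output;

  /* DRAW: a line from the current point to (X,Y) */
  function = 'MOVE'; x = 40; y = 10; output;
  function = 'DRAW'; x = 90; y = 50; color = 'RED'; line = 1; size = 2; output;

  /* POLY starts the polygon (COLOR = fill); each POLYCONT adds a vertex,
     and the first POLYCONT's COLOR gives the outline */
  function = 'POLY';     x = 50; y = 70; color = 'GREEN'; style = 'SOLID'; output;
  function = 'POLYCONT'; x = 70; y = 90; color = 'BLACK'; output;
  function = 'POLYCONT'; x = 90; y = 70; output;

  /* LABEL: text centered on (X,Y); blank STYLE means the default font */
  function = 'LABEL'; x = 20; y = 80; text = 'Annotate demo';
  style = ' '; size = 4; position = '5'; output;
run;

proc ganno annotate=shapes;
run;
```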

X, Y, Z and Related Variables

X, Y, and Z variables

You can use the variables X, Y, and (in PROC G3D only) Z to specify the coordinates on the graph to which a function is applied. X, Y, and Z must be numeric. The X variable defines the horizontal coordinate and the Y variable defines the vertical coordinate. The Z variable references the three-dimensional Z data values in PROC G3D. Some functions do not require or use the X and Y variables.

XC and YC variables

XC and YC are character-type equivalents of X and Y. They are used when you specify a coordinate system based on data values (see the XSYS and YSYS variables below) and the data axis is based on a character variable. XSYS and YSYS apply to XC and YC in the same way as to X and Y. XC and YC are ignored in all procedures if the axes are numeric.

Utility variables

The two utility pairs (XLAST, YLAST) and (XLSTT, YLSTT) are used to supply default values when X or Y contains a missing value. Both pairs are initially set to zero and remain zero until a valid function updates the values.

The variables XLAST and YLAST are an internal coordinate pair that tracks the last values specified for X and Y. Since the (XLAST,YLAST) coordinates are updated internally, you cannot specify values for them. However, these variables can be manipulated by the advanced utility functions documented in the Advanced Functions section. The coordinate pair (XLAST,YLAST) is automatically updated by certain functions and is available for use by the functions that follow.

The coordinate pair (XLSTT,YLSTT) is similar to the (XLAST,YLAST) coordinate pair except that the (XLSTT,YLSTT) pair is only updated by the text-handling function LABEL. Thus, it is possible to maintain two different coordinate pairs: one for text-handling functions and one for non-text-handling functions. The (XLSTT,YLSTT) coordinates are updated automatically, and you cannot specify values for them. However, these values can be manipulated indirectly by the advanced functions described in the Advanced Functions section.
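As a sketch of the missing-value defaults described above, the hypothetical data set HLINE below draws an L-shaped line. It assumes that a missing X or Y in a DRAW observation is replaced by XLAST or YLAST, as this section states; the first DRAW therefore stays at the height of the MOVE, and the second stays at the X reached by the first DRAW.

```sas
data hline;
  length function $8 xsys ysys $1;
  retain xsys ysys '3';                       /* SCREEN percentage coordinates */
  function = 'MOVE'; x = 10; y = 40; output;  /* (XLAST,YLAST) becomes (10,40) */
  function = 'DRAW'; x = 90; y = .;  output;  /* missing Y: YLAST (40) is used, horizontal line */
  function = 'DRAW'; x = .;  y = 80; output;  /* missing X: XLAST (90) is used, vertical line */
run;

proc ganno annotate=hline;
run;
```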

X and Y Coordinate Systems

When you specify values for the X and Y variables (and Z variable in PROC G3D), you can also specify the coordinate system to use to place this information on the display area. A coordinate system defines the graphics area, units, and location for displaying Annotate information.

Defining the graphics area

Three types of area definition tell SAS/GRAPH software what portion of the display area to use to display the Annotate information: the DATA graphics area, the SCREEN graphics area, and the WINDOW graphics area.

The DATA graphics area is the area that is bordered by the horizontal and/or vertical axis (and in PROC GMAP, the range of map coordinates) specified in a procedure step. For example, if you customize output from PROC GPLOT and specify the DATA coordinate system, the available area is the area enclosed by the axis lines.

The SCREEN graphics area is the entire area available for graphics on the device. When you specify the HSIZE= and VSIZE= options, these values set the physical limit of the SCREEN coordinate system.

The WINDOW graphics area is the SCREEN area minus the space taken up by the titles and footnotes defined with TITLE and FOOTNOTE statements.
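One way to see the three areas is to draw a FRAME around each of them on the same plot. In this sketch the data set name FRAMES is hypothetical, and it assumes FRAME outlines whichever area is selected by XSYS and YSYS, using the percentage codes '1' (DATA), '5' (WINDOW), and '3' (SCREEN).

```sas
/* Sample rows copied from the CLASS example in this document */
data class;
  input name $ sex $ age height weight;
  cards;
JOHN M 12 59 99.5
JAMES M 12 57.3 83
ALFRED M 14 69 112.5
BARBARA F 13 65.3 98
MARY F 15 66.5 112
ALICE F 13 56.5 84
;
run;

data frames;
  length function $8 color $8 xsys ysys $1;
  function = 'FRAME';
  line = 1;                                          /* solid outline */
  xsys = '1'; ysys = '1'; color = 'RED';   output;   /* DATA area: inside the axes */
  xsys = '5'; ysys = '5'; color = 'GREEN'; output;   /* WINDOW area: screen minus titles */
  xsys = '3'; ysys = '3'; color = 'BLUE';  output;   /* SCREEN area: whole graphics area */
run;

title 'Three annotate frames';
proc gplot data=class annotate=frames;
  plot weight*height;
run;
quit;
```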

Defining units for X and Y

After defining the area to be used to display Annotate information, you can define the units used to interpret X and Y values and place information on the display. You can specify that X or Y be interpreted in character cells (for SCREEN and WINDOW areas), in data values (for DATA areas), or in percentages of the area (on a scale from 0% to 100%, for DATA, SCREEN, and WINDOW areas).

Defining location

When you specify values for X and Y, you can also specify whether they are to be interpreted as absolute values (placed the specified distance from a fixed origin) or relative values (placed the specified distance from the last point referenced).

By combining an area (DATA, SCREEN, or WINDOW), a unit (percentage, or else data values or character cells), and a location (absolute or relative), you can specify twelve coordinate systems (3 × 2 × 2). Each unique combination of area, unit, and location is identified by a one-character value that you assign to the XSYS, YSYS, ZSYS, and HSYS variables.
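The difference between absolute and relative location can be sketched by drawing a short horizontal tick to the right of every plotted point. This assumes the usual SAS/GRAPH codes '2' (absolute data values) and '7' (relative data-area percentage); the data set name TICKS is hypothetical. The MOVE goes to the point itself, and the relative DRAW then moves 3% of the data area to the right with no vertical change.

```sas
/* Sample rows copied from the CLASS example in this document */
data class;
  input name $ sex $ age height weight;
  cards;
JOHN M 12 59 99.5
JAMES M 12 57.3 83
ALFRED M 14 69 112.5
BARBARA F 13 65.3 98
MARY F 15 66.5 112
ALICE F 13 56.5 84
;
run;

data ticks;
  set class;
  length function $8 xsys ysys $1;
  /* Absolute data values ('2'): move to the plotted point */
  function = 'MOVE'; xsys = '2'; ysys = '2'; x = height; y = weight; output;
  /* Relative data percentage ('7'): 3% of the data area to the right, no vertical offset */
  function = 'DRAW'; xsys = '7'; ysys = '7'; x = 3; y = 0; output;
run;

proc gplot data=class annotate=ticks;
  plot weight*height;
run;
quit;
```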

XSYS, YSYS, ZSYS, and HSYS Variables

XSYS, YSYS, and (in PROC G3D only) ZSYS are character variables that define the area and coordinate system used by the X, Y, and Z variables to display the Annotate information. Thus, the values for X, Y, and (in PROC G3D) Z can be interpreted in a variety of ways depending upon the coordinate system you specify with the XSYS, YSYS, and ZSYS variables.

You can use these coordinate systems in any combination to specify display area locations in the ANNOTATE= data set. The X and Y variables need not be referenced with the same system value. X can be referenced as 'data value' and Y as 'window percentage' in one observation, and both coordinates can be referenced as 'screen percentage' in the next.

The HSYS variable determines what coordinate system the variable SIZE uses. The coordinate system values specified with XSYS, YSYS, and ZSYS are valid with HSYS.
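Mixing systems is handy for reference lines. The sketch below, with hypothetical data set names STATS and REFLINE, gives X in data-area percentage ('1', so 0 to 100 spans the whole horizontal axis) and Y in data values ('2', the mean weight). It then sets HSYS='3' so that SIZE=3 on the label is read as 3% of the SCREEN area; these code values follow the usual SAS/GRAPH assignments and are worth confirming in the reference.

```sas
/* Sample rows copied from the CLASS example in this document */
data class;
  input name $ sex $ age height weight;
  cards;
JOHN M 12 59 99.5
JAMES M 12 57.3 83
ALFRED M 14 69 112.5
BARBARA F 13 65.3 98
MARY F 15 66.5 112
ALICE F 13 56.5 84
;
run;

/* Mean weight, used as the height of the reference line */
proc means data=class noprint;
  var weight;
  output out=stats mean=meanwt;
run;

data refline;
  set stats;
  length function $8 text $16 xsys ysys hsys $1 position $1;
  /* X in data-area percentage ('1'), Y in data values ('2') */
  xsys = '1'; ysys = '2';
  function = 'MOVE'; x = 0;   y = meanwt; output;
  function = 'DRAW'; x = 100; y = meanwt; line = 2; output;   /* dashed line across the data area */
  /* SIZE is read in HSYS units: 3 percent of the SCREEN area */
  hsys = '3'; size = 3;
  function = 'LABEL'; x = 90; y = meanwt; text = 'Mean weight'; position = '2'; output;
run;

proc gplot data=class annotate=refline;
  plot weight*height;
run;
quit;
```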

Additional Variables

The following variables can be used in addition to the FUNCTION, X, Y, XSYS, YSYS, ZSYS, and HSYS variables to create an ANNOTATE= data set.

ANGLE
is a function-dependent numeric variable. ANGLE can be used with the functions LABEL and PIE. ANGLE is measured in degrees.

COLOR
is a character variable of length 8 that specifies the color used by the function. COLOR can be used with the functions BAR, DRAW, FRAME, LABEL, PIE, POLY, and POLYCONT.

GROUP
is a variable used to reference the GCHART GROUP data value. GROUP should be of the same type (character or numeric) as the GCHART variable it references. GROUP is only valid with PROC GCHART. The use of the GROUP variable is similar to that of the X and Y variables when used for data-dependent placement.

LINE
is a function-dependent variable that usually specifies the line type. LINE can be used with the functions BAR, DRAW, FRAME, PIE, and POLY.

MIDPNT
is a variable used to reference the GCHART MIDPOINT data value. MIDPNT should be of the same type (character or numeric) as the GCHART variable it references. MIDPNT is only valid with PROC GCHART. The use of the MIDPNT variable is similar to that of the X and Y variables when used for data-dependent placement.

POSITION
is a character variable used to control placement and alignment of a text string. POSITION can be used only with the LABEL function.

ROTATE
is a function-dependent numeric variable. ROTATE can be used with the functions LABEL and PIE. ROTATE can be used with the ANGLE variable and is measured in degrees.

SIZE
is a function-dependent numeric variable. SIZE can be used with the functions DRAW, FRAME, LABEL, PIE, and PIEXY. The value of the variable SIZE defaults to 1.00 in all cases.

STYLE
is a function-dependent character variable of length 8 that normally specifies a font or pattern for the function. STYLE can be used with the functions BAR, FRAME, LABEL, PIE, and POLY.

SUBGRP
is a variable used to reference the actual GCHART SUBGROUP data value. SUBGRP should be of the same type (character or numeric) as the GCHART variable it references. SUBGRP is only valid with PROC GCHART. The use of the SUBGRP variable is similar to that of the X and Y variables when used for data-dependent placement.

WHEN
is a sequencing variable that specifies when the function is performed in relation to generating graphics output for the procedure. WHEN can take the values 'A' (AFTER the actual graph is drawn) or 'B' (BEFORE the graph is drawn). A missing value is equivalent to specifying BEFORE. Normally, observations in an ANNOTATE= data set are processed sequentially. If the variable WHEN is used, all those observations with a WHEN value of 'B' are processed first, the graph is then processed (if there is one to be produced), and finally the observations with a WHEN value of 'A' are processed. WHEN should always be a character variable of length 1.
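Several of these variables work together when labeling bars. This sketch, with hypothetical data set names MEANS and ANNOBAR, writes each bar's value just above it in a PROC GCHART vertical bar chart: MIDPNT takes the character value of SEX (the same type as the chart variable), Y gives the bar height in data values, POSITION='2' centers the text above that point, and WHEN='A' draws the labels after the bars.

```sas
/* Sample rows copied from the CLASS example in this document */
data class;
  input name $ sex $ age height weight;
  cards;
JOHN M 12 59 99.5
JAMES M 12 57.3 83
ALFRED M 14 69 112.5
BARBARA F 13 65.3 98
MARY F 15 66.5 112
ALICE F 13 56.5 84
;
run;

/* One row per SEX with the mean weight */
proc means data=class noprint nway;
  class sex;
  var weight;
  output out=means mean=mweight;
run;

data annobar;
  set means;
  length function $8 text $8 xsys ysys $1 position $1 when $1;
  xsys = '2'; ysys = '2';          /* data values */
  midpnt = sex;                    /* same type as the chart variable SEX */
  y = mweight;                     /* top of the bar */
  function = 'LABEL';
  text = left(put(mweight, 5.1));  /* bar value as text */
  position = '2';                  /* centered above the bar top */
  when = 'A';                      /* drawn after the bars */
run;

proc gchart data=means annotate=annobar;
  vbar sex / sumvar=mweight;
run;
quit;
```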

© 1995 Michael Friendly

Author Michael Friendly
Last Updated: Friday, July 19, 1995
