Saturday, March 15, 2008

10 Essential Proc's


List of 10 "Essential Proc's" from net, the general syntax for a PROC step is:

PROC name [DATA=dataset [dsoptions] ] [options];
[other PROC-specific statements;]
[BY varlist;]
RUN;


where:
name:------ identifies the procedure you want to use.
dataset:---- identifies the SAS data set to be used by the procedure; if omitted, the last data set to have been created during the session is used.
dsoptions:-- specifies the data set options to be used.
varlist:----- specifies the variables that define the groups to be processed separately. The data set must already be sorted by these same variables.
options:---- specifies the PROC-specific options to be used.

/***********************************************************/

1)
PROC COMPARE compares two SAS datasets with each other and detects observations / variables that do not match with each other. The BASE defines the data set that SAS will use as a basis for comparison.

PROC COMPARE BASE = mydataset
COMPARE = otherds;
RUN;

/***********************************************************/

2) SAS Language allows users to assign their own formats to values in their data. Usually, this involves users asking SAS to replace numbers in their data with some kind of labels. The most common example is the user request that: SAS take '1's and '2's in a variable called SEX and format those values as the words 'female' and 'male' when they are displayed in output.

PROC FORMAT;
VALUE sexfmt
1="Female"
2="Male";
RUN;


FORMAT sex sexfmt.;
O U
-----------------------
1 female
2 male
2 male
1 female
2 male

O: original data, U: output using user-defined format

The original data stay the same -- coded as numbers -- but whenever the format is requested, the output shows the assigned labels instead of the original numbers. These kinds of formats are called 'user-defined formats'. The formats already known to SAS are called SAS formats.
-
Ten Things You Should Know About PROC FORMAT:

/***********************************************************/

3)
Every PROC SQL query must have at least one SELECTstatement. The purpose of the SELECT statement is to name the columns that will appear on the report and the order in which theywill appear. The FROM clause names the data set from which the information will be extracted from.

PROC SQL;
SELECT STATE, SALES,
(SALES * .05) AS TAX
FROM USSALES;
QUIT;


It has the functionality of DATA and PROC steps into a single step. PROC SQL can sort, summarize, subset, join (merge), and concatenate datasets, create new variables, and print the results or create a new table or view all in one step.

/***********************************************************/

4)
PROC CONTENTS describes the structure of the data set rather than the data values. Displays valuable information at the...
  • Data set level
    Name
    Engine
    Creation date
    Number of observations
    Number of variables
    File size (bytes)
  • Variable level
    Name
    Type
    Length
    Formats
    Position
    Label

PROC CONTENTS DATA=temp (options);
RUN;

Few options:-
Position: output lists the variables by their position in the data set (default is alphabetical).
Short: output is just the variable names in a row by row format.
Out=filename: creates a data set where each observation is a variable from the original data set.

/***********************************************************/

5) Values of all the variables in the data set will be listed (by PROC PRINT) unless a variables (VAR) statement accompanies the procedure. The VAR statement can also be used to specify the order in which variables are to be printed.

PROC PRINT DATA=temp (options);
RUN;

Few options:-
N: The number of observations is printed after the listing.
UNIFORM: This option specifies that the values for each variable be printed in the same columns on every page of the output.
DOUBLE: This option forces SAS to double space the output.
ROUND: This option causes SAS to round variables being summed. NOOBS: This option causes suppression of the observation number in the printed output.
LABEL: Use information from LABEL statements (where defined) as the column headings rather than the variable names.

/***********************************************************/

6) The FREQ procedure produces one-way frequency tables and n-way cross tabulations. For example, to obtain frequency counts for all variables in the SURVEY data set, enter: PROC FREQ;

PROC FREQ DATA=CLASS;
TABLES AGE*HEIGHT;
RUN;

Observe that:

  • Tables can be produced for numeric and character variables.
  • More than one TABLES statement can be used.
  • When the TABLES statement is omitted, one-way frequency tables are printed for each variable in the data set.
  • There is no limit to the number of variables in a TABLES request.
  • For a table of two or more variables an * must be given between each pair of variable names.

The following example illustrates the generation of one-, two-, and three-way frequency tables:
PROC FREQ;
TABLES sex;
TABLES sex * r1;
TABLES sex * r1 * r2;
The first TABLES statement generates a one-way frequency table for the variable SEX. The second TABLES statement generates a two-way frequency table for the row variable SEX and the column variable R1. The third TABLES statement generates a three-way frequency table for the variables SEX, R1, and R2; R1 is the row variable and R2 is the column; a new table is printed for each different value of the variable SEX.

/***********************************************************/

7) The MEANS procedure is used to produce simple descriptive statistics on numeric variables in a SAS data set.

PROC MEANS DATA=CLASS;
VAR HEIGHT WEIGHT;
RUN;

/***********************************************************/

8) PROC TRANSPOSE helps to reshape the long data to wide one.

data long1;
input famid year faminc;
cards;
1111 96 40000
1111 97 40500
1111 98 41000
2222 96 45000
2222 97 45400
2222 98 45800
3333 96 75000
3333 97 76000
3333 98 77000 ;

run;
proc sort data=long1; by famid; run;

*incyr: income year, prefix is used for identification;
PROC TRANSPOSE DATA=long1 OUT=wide1 PREFIX=incyr;
*same sort order is used as in proc sort statement;
BY FAMID;
*this is the variable that would become as: identifier/header;
ID YEAR;
*this is the variable that would become as: observation;
VAR FAMINC;
RUN;

Result:

Obs famid _NAME_ incyr96 incyr97 incyr98

1--- 1111 --faminc- 40000 -40500 -41000
2--- 2222 --faminc- 45000 -45400 -45800
3--- 3333 --faminc- 75000 -76000 -77000

/***********************************************************/

9) PROC SORT: sort's the data file. The program below sorts the data file called "DRAFT" on the variable "USUBJID and VISIT" and saves the sorted file as "DRAFT1". The original file remains unchanged since we used out=DRAFT1 to specify that the sorted data should be placed in DRAFT1.

PROC SORT DATA=DRAFT OUT=DRAFT1;
BY USUBJID VISIT;
RUN ;

*new data;
DATA DRAFT2;
SET DRAFT1;
BY USUBJID VISIT;

*if first dot is used to pick out the first obs;
IF FIRST.VISIT;
RUN;

(NODUPKEY option: is used with proc sort to remove duplicates)

/***********************************************************/

10) PROC REPORT: Influence the way in which a data has to be presented and the compute block introduces lines of text into the report.

PROC REPORT .......;
column sex;
.....;
COMPUTE BEFORE sex;
line@1 'Gender';
line@1 sex $200.;
endcomp;
RUN;

No comments:

Post a Comment