Saturday, March 22, 2008

Character array for creating a dummy dataset

-
data one;
input visit$ 1-10;
datalines;
DAY 8
DAY 15
DAY 29
DAY 57
DAY 85

;
run;
-
*creating a dummy dataset with 3 different xxorres for each visit present in dataset one;
data dummy;
set one;
length xxorres $ 20;
array x{*} _character_;
do i = 1 to dim(x);
if i gt 0 then do;
xxorres='COMPLETE CLEARANCE'; output;
xxorres='INCOMPLETE CLEARANCE'; output;
xxorres='UNABLE TO ASSESS'; output;
i=i+1;
end;
end;
run;
proc sort data=dummy; by i; run;
proc print; run;
-
More:
http://dist.stat.tamu.edu/flash/SAS/
-
*display of zero for missing values (using:character array);
data display;
set disp;
*new: is array name, l1 to l4: are character variable;
array new l1 - l4 _character_;
do i=1 to dim(new);
if new{i}='' then new{i}='0';
end;
run;
-

Tuesday, March 18, 2008

"When" for data analysis

  • "When the ORRES is populated, the STAT (& REASND) remains blank. When the STAT (& REASND) is populated, the ORRES remains blank".

  • "When both the raw dataset and the derived dataset names are same, the derived dataset repalce the raw dataset".

  • "When two dataset's are merged & one of them have few missing usubjid's:- if a and b; condition is preferred to avoid blanks.

  • "When TERM is missing, the derived dataset have blank values". Ex:- To drop observations without value:- if aeterm ne " "; condition is used.

  • "When the derived dataset have both TERM and STAT:- if aeterm ne " " or aestat ne " "; condition is used".

  • "When the result is not missing in raw dataset then ORRES=result, STRESC=ORRES and STRESN=STRESC(the numeric form of STRESC)". Usually ORRES & STRESC are displayed as "Y" / "N" instead of "Yes" / "No".

  • "When the STRESC is "Y" / "N then STRESN is displayed as "blank" (rather than 1 / 2) ".

  • "When END DATE is " " then ENRF="ONGOING"; Example: MH dataset".

  • "When a PARTIAL DATE is present in a raw dataset, then the same is displayed in the derived dataset".

  • "When a sequence number is assigned, it remains unique for all observations within a usubjid".

  • When working on display of DAYS, Programmer's check for DAY:- not displayed as 0".

  • When the variables in the SAS data set are not in a desired order then one of the following (ATTRIB, ARRAY, FORMAT, INFORMAT, LENGTH, RETAIN) is used before the SET statement to re-order it.
  • When a split character is used in the listing/table programming then the same split character must be used in the proc report with a FLOW option. This flow option hold back the direct display of split chatacter "*" in the output.
  • When the following warning is received in the log, the PROC PRINT option is removed from the program. "WARNING: Data too long for column "TEST"; truncated to 125 characters to fit."

  • When an order is missing in the final dataset of a table and if the same order is being used in the proc report: - the row for which the order is missing will not be displayed.

  • When different order numbers are given for the "result HEADER" and the "RESULT"(in listing/table) & if the same order variable is used in the break after statement (break after ord/ skip;) , there appears a blank line. To avoid this same order number is given for both the header and result.

Age (years)-> result header

blank line ------------
N->result

Saturday, March 15, 2008

10 Essential Proc's


List of 10 "Essential Proc's" from net, the general syntax for a PROC step is:

PROC name [DATA=dataset [dsoptions] ] [options];
[other PROC-specific statements;]
[BY varlist;]
RUN;


where:
name:------ identifies the procedure you want to use.
dataset:---- identifies the SAS data set to be used by the procedure; if omitted, the last data set to have been created during the session is used.
dsoptions:-- specifies the data set options to be used.
varlist:----- specifies the variables that define the groups to be processed separately. The data set must already be sorted by these same variables.
options:---- specifies the PROC-specific options to be used.

/***********************************************************/

1)
PROC COMPARE compares two SAS datasets with each other and detects observations / variables that do not match with each other. The BASE defines the data set that SAS will use as a basis for comparison.

PROC COMPARE BASE = mydataset
COMPARE = otherds;
RUN;

/***********************************************************/

2) SAS Language allows users to assign their own formats to values in their data. Usually, this involves users asking SAS to replace numbers in their data with some kind of labels. The most common example is the user request that: SAS take '1's and '2's in a variable called SEX and format those values as the words 'female' and 'male' when they are displayed in output.

PROC FORMAT;
VALUE sexfmt
1="Female"
2="Male";
RUN;


FORMAT sex sexfmt.;
O U
-----------------------
1 female
2 male
2 male
1 female
2 male

O: original data, U: output using user-defined format

The original data stay the same -- coded as numbers -- but whenever the format is requested, the output shows the assigned labels instead of the original numbers. These kinds of formats are called 'user-defined formats'. The formats already known to SAS are called SAS formats.
-
Ten Things You Should Know About PROC FORMAT:

/***********************************************************/

3)
Every PROC SQL query must have at least one SELECTstatement. The purpose of the SELECT statement is to name the columns that will appear on the report and the order in which theywill appear. The FROM clause names the data set from which the information will be extracted from.

PROC SQL;
SELECT STATE, SALES,
(SALES * .05) AS TAX
FROM USSALES;
QUIT;


It has the functionality of DATA and PROC steps into a single step. PROC SQL can sort, summarize, subset, join (merge), and concatenate datasets, create new variables, and print the results or create a new table or view all in one step.

/***********************************************************/

4)
PROC CONTENTS describes the structure of the data set rather than the data values. Displays valuable information at the...
  • Data set level
    Name
    Engine
    Creation date
    Number of observations
    Number of variables
    File size (bytes)
  • Variable level
    Name
    Type
    Length
    Formats
    Position
    Label

PROC CONTENTS DATA=temp (options);
RUN;

Few options:-
Position: output lists the variables by their position in the data set (default is alphabetical).
Short: output is just the variable names in a row by row format.
Out=filename: creates a data set where each observation is a variable from the original data set.

/***********************************************************/

5) Values of all the variables in the data set will be listed (by PROC PRINT) unless a variables (VAR) statement accompanies the procedure. The VAR statement can also be used to specify the order in which variables are to be printed.

PROC PRINT DATA=temp (options);
RUN;

Few options:-
N: The number of observations is printed after the listing.
UNIFORM: This option specifies that the values for each variable be printed in the same columns on every page of the output.
DOUBLE: This option forces SAS to double space the output.
ROUND: This option causes SAS to round variables being summed. NOOBS: This option causes suppression of the observation number in the printed output.
LABEL: Use information from LABEL statements (where defined) as the column headings rather than the variable names.

/***********************************************************/

6) The FREQ procedure produces one-way frequency tables and n-way cross tabulations. For example, to obtain frequency counts for all variables in the SURVEY data set, enter: PROC FREQ;

PROC FREQ DATA=CLASS;
TABLES AGE*HEIGHT;
RUN;

Observe that:

  • Tables can be produced for numeric and character variables.
  • More than one TABLES statement can be used.
  • When the TABLES statement is omitted, one-way frequency tables are printed for each variable in the data set.
  • There is no limit to the number of variables in a TABLES request.
  • For a table of two or more variables an * must be given between each pair of variable names.

The following example illustrates the generation of one-, two-, and three-way frequency tables:
PROC FREQ;
TABLES sex;
TABLES sex * r1;
TABLES sex * r1 * r2;
The first TABLES statement generates a one-way frequency table for the variable SEX. The second TABLES statement generates a two-way frequency table for the row variable SEX and the column variable R1. The third TABLES statement generates a three-way frequency table for the variables SEX, R1, and R2; R1 is the row variable and R2 is the column; a new table is printed for each different value of the variable SEX.

/***********************************************************/

7) The MEANS procedure is used to produce simple descriptive statistics on numeric variables in a SAS data set.

PROC MEANS DATA=CLASS;
VAR HEIGHT WEIGHT;
RUN;

/***********************************************************/

8) PROC TRANSPOSE helps to reshape the long data to wide one.

data long1;
input famid year faminc;
cards;
1111 96 40000
1111 97 40500
1111 98 41000
2222 96 45000
2222 97 45400
2222 98 45800
3333 96 75000
3333 97 76000
3333 98 77000 ;

run;
proc sort data=long1; by famid; run;

*incyr: income year, prefix is used for identification;
PROC TRANSPOSE DATA=long1 OUT=wide1 PREFIX=incyr;
*same sort order is used as in proc sort statement;
BY FAMID;
*this is the variable that would become as: identifier/header;
ID YEAR;
*this is the variable that would become as: observation;
VAR FAMINC;
RUN;

Result:

Obs famid _NAME_ incyr96 incyr97 incyr98

1--- 1111 --faminc- 40000 -40500 -41000
2--- 2222 --faminc- 45000 -45400 -45800
3--- 3333 --faminc- 75000 -76000 -77000

/***********************************************************/

9) PROC SORT: sort's the data file. The program below sorts the data file called "DRAFT" on the variable "USUBJID and VISIT" and saves the sorted file as "DRAFT1". The original file remains unchanged since we used out=DRAFT1 to specify that the sorted data should be placed in DRAFT1.

PROC SORT DATA=DRAFT OUT=DRAFT1;
BY USUBJID VISIT;
RUN ;

*new data;
DATA DRAFT2;
SET DRAFT1;
BY USUBJID VISIT;

*if first dot is used to pick out the first obs;
IF FIRST.VISIT;
RUN;

(NODUPKEY option: is used with proc sort to remove duplicates)

/***********************************************************/

10) PROC REPORT: Influence the way in which a data has to be presented and the compute block introduces lines of text into the report.

PROC REPORT .......;
column sex;
.....;
COMPUTE BEFORE sex;
line@1 'Gender';
line@1 sex $200.;
endcomp;
RUN;

Monday, March 10, 2008

TRANWRD, COMPRESS & INDEXW

-
TRANWRD Function: "Replace all ocurrences of a word".
name=tranwrd(name, "Miss", "Ms.");
put name;

Value: Miss. Joan Smith
Result: Ms. Joan Smith

More:
http://www.asu.edu/sas/sasdoc/sashtml/lgref/z0215027.htm

COMPRESS Function: "Removes specific characters from a character string".
a='AB C D ';
b=compress(a);
put b;

Value: 'AB C D ';
Result: ABCD

More:
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000212246.htm

INDEXW Function: "Returns the first position in the character-value that contains the find-string". If the find-string is not found, the function returns a 0.
indexw(STRING1,"the");

Value: STRING1 = "there is a the here"
Result: 12 (the word "the")

More:
http://support.sas.com/publishing/pubcat/chaps/59343.pdf
-

Wednesday, March 5, 2008

Formulas for creating Vital Signs Dataset

-
For temperature:
Conversion of Fahrenheit to Celsius:
Temperature_in_Celsius = (5/9)*(Tf-32); *where Tf is the "temperature in fahrenheit";
Conversion of Celsius to Fahrenheit :
Temperature_in_Fahrenheit = (9/5)*Tc+32; *where Tc is the "temperature in celsius";
height:
Conversion of Inches to Centimeters:
Centimeters = inches x 2.54;
Conversion of Centimeters to Inches:
Inches = centimeters x 0.3937;
weight:
Conversion of Pounds to Kilograms;
Kilograms = lbs / 2.2;
Conversion of Kilograms to Pounds:
Pounds = kg x 2.2;
& bmi:
*Weight in Pounds:
BMI = (Weight in Pounds / (Height in inches) x (Height in inches)) x 703
*Weight in Kilograms:
BMI = (Weight in Kilograms / (Height in Meters) x (Height in Meters))
-