CODES READYMADE: 2008

Saturday, December 20, 2008

Capturing the data into a macro

For the display of (N=xx) in a summary table, we have to capture the data into a macro. There are two different methods for capturing the data into a macro. One is using PROC SQL & the other one is using CALL SYMPUT.

***** GETTING "N" FOR EACH TRT GROUP *****;

1) PROC SQL:

data total;

set main(where=(saf=1));

trtgrp = input(treat,best.);

output;

trtgrp = 3;

output;

run;

proc sql noprint;

select count(distinct usubjid) into : trt1 through : trt3 from total group by trtgrp;

quit;

%put &trt1 &trt2 &trt3;

2) SYMPUT:

proc freq data = total noprint;

tables trtgrp / out = deno(drop = percent) list missing;

run;

data _null_;

set deno;

if trtgrp=1 then call symput('trt1',trim(left(put(count,3.))));

if trtgrp=2 then call symput('trt2',trim(left(put(count,3.))));

if trtgrp=3 then call symput('trt3',trim(left(put(count,3.))));

run;

Thursday, December 18, 2008

DUPOUT=option

-
The following data set contains duplicate observations for Patients (PT:- 01 & 02) :
-
PT NO
01 12
01 3
01 56
02 2
02 12
03 12
-
How does the dupout option work on this data?????
-
proc sort data = test out = dedup
-------------------------dupout = dups nodupkey;
--by pt;
run;

It works in a similar manner to nodupkey. Here the "duplicates are detected by the nodupkey option" & are directed into an output dataset with DUPOUT!

The dedup is an output data set that contains only the original observations.
PT NO
01 12
02 2
03 12
-
Where as the dups is an output data set that contains all duplicate observations.
PT NO
01 3
01 56
02 12
-

Tuesday, December 16, 2008

Use of SUBSTR with INDEX function

-
The syntax for substr function is:
SUBSTR(string, starting position, <length>)
-
It is easy to determine the starting position or the length of the string (if it contains DATE).
-
For example:
day=substr(date,1,2);

but the same substr function is not applicable if the string contains NAME. The reason is that the length of each name varies from one observation to the other. Hence the use of delimiters (like: "." ",") can be used to locate the position within the string.
-
Below is an example for the use of substr with index function:
-
*** INDEX IS USED TO FIND WHERE TO BEGIN & END A SUBSTRING ***;
-
data fullname;
length fname lname $20;
set nameds;
fname= substr(name, 1, index(name, ',') - 1);
lname = trim(left(substr(name, index(name, ',') + 1)));
run;
-
Output:
name----------------------- fname---- lname
SOMA, SUNDARAM---- SOMA---- SUNDARAM
SHIVA, SHEKARAN---- SHIVA---- SHEKARAN
-

Saturday, November 1, 2008

Y FREQ ??? use MEANS...

No dummy dataset... No data doubling...

“Making of Table” has become much simpler using PROC MEANS procedure! PROC MEANS when used in combination with MULTILABEL format and PRELOADFMT option gives all possible values of a formatted variable.

For example: If there are two different treatment groups (1 & 2) in a table and if the third column has to display a total count of both the treatment groups (1+2), then the MULTILABEL format is used.

proc format;

value $tmt (multilabel)

'1'='trt1'

'2'='trt2'

'1','2' = 'Total';

run;

This option allows the user to define:

overlapping ranges across labels and to
assign another label to the same value

This format is very efficient when used in conjunction with the means procedure TO PRODUCE THE TOTAL COUNT. Not only the multilabel format but also the use of PRELOADFMT with COMPLETETYPES enhances the efficiency of a program in CREATING ALL POSSIBLE COMBINATIONS OF A VARIABLE.

Now let’s consider three different categories (‘MILD’, ‘MODERATE’ & 'SEVERE') for a variable AEREL and if the dataset does not contain one of the category (‘MODERATE’) and if this same category has to be displayed in the table, then the PRELOADFMT option is used [as it gives instruction to load the FORMAT for the missing category ‘MODERATE’].

-
proc format;

value $rel

'MILD' = 'Mild'

‘MODERATE’ = 'Moderate'

'SEVERE' = 'Severe'

' ' = 'Missing';

run;

-
*USE OF PROC MEANS WITH MULTILABEL & PRELOADFMT;

-
proc means data=test completetypes nway chartype missing noprint;
by visit;
class treat aerel/preloadfmt missing mlf order=formatted;
format treat $tmt. aerel $rel.;
output out=cnt1 n=ccnt;
run;
-

More: http://www.lexjansen.com/pharmasug/2007/cc/cc10.pdf

Sunday, October 26, 2008

Display the "CLASSIFICATION", Not the "CODE"!

The Anatomical Therapeutic Chemical Classification System (ATCCS) is used for the classification of drugs. It classifies drugs at 5 different levels; based on the organ or system on which they act and/or their therapeutic and chemical characteristics.

For example, the ATCCS for drug SUMATRIPTAN is given below:

ATC1 TERM: NERVOUS SYSTEM, N
ATC1 TERM CODE: N

ATC2 TERM: ANALGESICS, N02
ATC2 TERM CODE: N02

ATC3 TERM: ANTIMIGRAINE PREPARATIONS, N02C
ATC3 TERM CODE:N02C

ATC4 TERM: SELECTIVE SEROTONIN (5HT1) AGONISTS, N02CC
ATC4 TERM CODE: N02CC

ATC5 TERM: SUMATRIPTAN, N02CC01
ATC5 TERM CODE: N02CC01

This "ATC LEVEL 2 TERM" may be captured in CONCOMITANT MEDICATION DATASET, which when captured should display only the CLASSIFICATION: "ANALGESICS" and not the CODE: "N02". Hence the code is trimed off from the chemical classification.

***** USE OF REVERSE FUNCTION TO TRIM OFF CODE *****
-
data cmed;
set med;
x=trim(left(reverse(atc2t)));
y=substr(x,6,150);
z=reverse(y);
atc2term=strip(z);
run;

Tuesday, October 21, 2008

A code to split numbers!

The dummy dataset code has numbers separated by a hash or space. The code that follows splits those numbers and output each number into a unique observation.

***** DUMMY DATASET WITH #, " " *****

data TEST;
length pt $20 txt $40;
pt ='1';
txt= '#12, #3, #56 34-98';
output;
pt ='2';
txt= '#2 #12';
output;
pt ='3';
txt= '#12';
output;
run;

-
***** SPLIT & OUTPUT NUMBERS *****
-
data TEST1;
set TEST;
stat:
do i = 1 to length(txt);
-
if substr(txt,i,1) = '#' then do;
-
if substr(txt,i+2,1) in (',','') then do;
cno = substr(txt,i+1,1); *3: for single digit;
txt = substr(txt,i+2);
end;
else if substr(txt,i+3,1) in (',','') then do;
cno = substr(txt,i+1,2); *12: for two digit;
txt = substr(txt,i+3);
end;
-
output;
if length(trim(left(txt))) > 2 then goto stat;
leave;
end;
-
end;
run;

***** SAME CODE FOR DATA WITHOUT # *****

-
data TEST1;
set TEST;
do i = 1 to length(txt);

-
if substr(txt,i+1,1) in (',','') then do;
cno = trim(left(substr(txt,i,1)));
i=i+2;
end;
else if substr(txt,i+2,1) in (',','') then do;
cno = trim(left(substr(txt,i,2)));
i=i+3;
end;

-
if cno ne '' then output;
end;
run;

Thursday, October 9, 2008

Retain for Repeated Visit

-
***** RETAIN THE VALUE FOR REPEATED VISIT *****
-
data test;
set vis;
by pt visitnum vstdt;
retain x;
if first.visitnum then x = visitnum;
else x=x+.1;*add .1 for repeated visit;
drop visitnum;
run;
-

Sunday, September 7, 2008

Using wildcard to reduce code

Using a wildcard (:) in the variable lists would reduce the code to a great extent.
-
Colon can be used as a wildcard in variable lists. In the example given below, (keep=v:) keeps all variable names that begins with letter v.
-
data x(keep=v:) ;
v1=1 ;
v2=10 ;
v3=100 ;
v4=1000 ;
v5=10000 ;
w1=1;
w2=10;
run;

More: http://support.sas.com/publishing/pubcat/chaps/55513.pdf
-

Thursday, September 4, 2008

Four different ways to find duplicates

We must have probably come across duplicate records in our dataset & tried removing those by using nodup and nodupkey options.

I

Use of nodup will remove exact duplicates as given below:

proc sort data = draft out = draft1 nodup;
by _all_ ;
run;

NAME AGE SEX

Rama _27_ M

-
II

-
Where as the use of nodupkey looks at the by variables in the sort procedure and keeps the first records of those by variables (it removes the rest of the records when encountered again!)

proc sort data = draft out = draft1 nodupkey;
by petestcd;
run;

PETESTCD
Neck
Abdominal
Abdominal
Abdominal *these are the two records that will be removed by nodupkey;
HEENT

-
III

-
The duplicate records in the dataset may result in the display of the Note => “MERGE statement has more than one data set with repeats of BY values” in the log. To determine this one can use "if first. ne last." method.

data check;

set dups;

by id;

if first.id ne last.id; *brings the duplicates;

run;

This was quite confusing to me as my little brain perceived it this way:

"Y Y first. ne last.?????
If an ID is repeated twice in the dataset then first dot is equal to last dot....

ID
001
001

but SAS THINKS this way

ID First. Last.
001 1---- 0
001 0---- 1
002 1---- 1
003 1---- 1

Records that are not duplicates will have FIRST.ID = 1 and LAST.ID = 1, so we select records where first dot is not equal to last dot.

-
IV

The last way is the use of freq:

proc freq data = draft noprint ;
table ID / out = draftdup (keep = ID Count where = (Count > 1)) ;
run;

More: http://www2.sas.com/proceedings/sugi31/164-31.pdf

Wednesday, August 27, 2008

Concatenate a set of Numbers

Some time a programmer has to concatenate a set of Numbers. For example an inclusion criteria listing may demand the display of “Number of inclusion criteria’s (1-7) present in the dataset” as column. In the dataset given below, “NO equals the subjects who has not satisfied inclusion criteria". To display this, the following code will concatenate the (inclusion criteria) numbers to bring the inclusion criteria not satisfied per subject as 1, 2, 3, 4, 5, and 6.

SUBJECT AGE IN1 IN2 IN3 -IN4 IN5 -IN6 IN7
00001111 --43- YES YES YES .----YES YES YES
00001222 --20- NO –NO -NO -NO NO -NO -.

(inclusion criteria not satisfied is stored as a Numeric value 2 with format NO)

data incl1;
length no $ 50;
set incl;
retain no;
array a[7] in1 - in7;
no = '';*no is the variable to HOLD THE VALUE;
do i=1 to 7;*loop is RUN TILL i=8;
if a[i]=2 then do;*if loop satisfies the condition "NO";
if no eq '' then no=trim(left(put(i,best.))); *then ASSIGN 1 to no;
else if no ne '' then no=trim(left(no))II', 'IItrim(left(put(i,best.)));
*now CONCATENATE no:1 with 2;
end;
end;
run;

Output for this subject 00001222 will be:

-
1, 2, 3, 4, 5, 6
-

Tuesday, August 26, 2008

Simple way of % Display!

***** DISPLAY OF PERCENTAGE IN PROC REPORT ******;

data new;
input cnt deno;
datalines;
53 54
9 -10
12 12
2 ---9
1 ---5
;
run;

data new1;
set new;

-
if cnt ne . and deno ne . then per = (cnt*100)/deno;

*cnt:COUNT, deno:DENOMINATOR, per:PERCENTAGE;

-
if (per ne 0 and per ne .) and per ne 100 then
cper = right(put(cnt,3.))IIright(put('('trim(left(put(per,4.1)))'%)',8.));
else if per eq 100 then
cper = right(put(cnt,3.))IIright(put('('trim(left(put(per,4.)))'%)',8.));

-
run;
proc print; run;
-

Together it takes a TOTAL of 11 spaces for displaying: xx (xx.x%)

3. -=> is for COUNT xxx
4.1 => is for PERCENTAGE xx.x / 4. => is for % (if 100) xxx
+3 => is for SPECIAL CHARACTERS (%)
+1 => is for a space in b/w COUNT & PERCENTAGE

Thursday, August 21, 2008

Display 'No Observations'

Code for Displaying 'No Observations':

***** FINAL IS THE DATASET WITH 0 OBSERVATION *****;

-
proc sql noprint;
select count (*) into : n from final;
run;

*if observation is not equal to 0 set final, else if the observation is equal to 0 then display no observation in the first column (usubjid);

-
data disp;
length usubjid $20;
if &n^ = 0 then set final;
else usubjid = 'No Observations';
run;

Thursday, August 14, 2008

Wanna underline a header?

proc report data= ds split='^' nowd headline headskip spacing=1 missing;
by descending treat;

-
column (rest1 rest2 ("_First Value_" (fval1 fval2)) ("_Second Value_" (sval1 sval2 sval3)) rest3 rest4);

-
define rest1 / order-- width = 22 left-- "rest of the values (1)" spacing = 0;
define rest2 / order-- width = 22 left-- "rest of the values (2)";
define fval1 / display width = 22 center "first:-value (1)";
define fval2 / display width = 22 center "first:-value (2)";
define sval1 / display width = 22 center "second value (1)";
define sval2 / display width = 22 center "second value (2)";
define sval3 / display width = 22 center "second value (3)";
define rest3 / display width = 22 center "rest of the values (3)";
define rest4 / display width = 22 center "rest of the values (4)";

-
break after rest1 / skip;
footnote1 "&under";
footnote2 "Required footnote for display.";

-
run;

The above example: shows the use of "_" to underline the header(First Value & Second Value). These are the variables (fval1 fval2, sval1 sval2 sval3) which will get displayed under the header.
-

Saturday, August 2, 2008

Generating missing values in a Table

USUBJID VISITNUMBER
10011001 1
10011001 7
10011002 1
10011003 1

In the above example, the usubjid (10011002 and 10011003) has visitnumber 7 missing. The proc transpose statement helps in generating these missing values.

** PROC TRANSPOSE FOR GENERATING MISSING VALUES **;

proc transpose data = d1 out = d2 prefix = v;
by usubjid treatnc ;
id vis;
*vis is a character variable which contains visitnumber (1,7);
var vala valb valc vald vale valf valg valh vali valj;
run;

When transposed the data becomes:
USUBJID V1 V7
10011001 -0 --0
10011002 -0 --''
10011003 -0 --''

** RE(cursive) TRANSPOSE: TO BRING THE DATA BACK TO ITS SHAPE **;

proc transpose data = d2 out = d3;
by usubjid treatnc ;
id _name_;
*all values vala - valj becomes identifier again;
var v1 v7;
run;

The data would now look like:
USUBJID _name_ vala
10011001 -v1-------0
10011001 -v7-------0
10011002 -v1-------0
10011002 -v7-------''
10011003 -v1-------0
10011003 -v7-------''

proc sort data = d3;
by usubjid treatnc _name_;
run;

** BRING THE VARIABLES ONE AFTER THE OTHER FOR FREQ **;

proc transpose data = d3(rename = (_name_ = name)) out = d4;
by usubjid treatnc name;
var vala valb valc vald vale valf valg valh vali valj;
run;

Friday, August 1, 2008

Why two proc reports?????

When proc report had more columns that do not fit in one page, I tried using two different proc reports. However the disadvantage of using two different proc report is, the report one: gets generated first…

To solve this problem, define a variable with ID. This ID variable appears at the left of every page of a report. Hence the use of ID ensures that you can identify each row of the report when the report contains more columns that will not fit in one page.

In the example given below, the variables (param & grade) act as an id variable. In the first page, these variables (param & grade) appear with other variables (pv1 pv2 pv3 pv4 pv5 pv6). Same way in the next page, it appears with the rest of the variables (pv7 - total).

options nocenter;

proc report data=final headline headskip nowd missing spacing=1 split ='*';

columns (param grade pv1 pv2 pv3 pv4 pv5 pv6 pv7 pv8 pv9 pv10 pv11 total);

define param- / order id left width = 30 "Parameter" spacing = 0 flow;

define grade-- / order id left width = 7- "Grade";

define pv1-----/---------. left width = 10 "Visit 1*(N=&vis1)";

define pv2-----/---------. left width = 10 "Visit 2*(N=&vis2)";

define pv3-----/---------. left width = 10 "Visit 3*(N=&vis3)";

define pv4-----/---------. left width = 10 "Visit 4*(N=&vis4)";

define pv5-----/---------. left width = 10 "Visit 5*(N=&vis5)";

define pv6-----/---------. left width = 10 "Visit 6*(N=&vis6)";

define pv7-----/---------. left width = 10 "Visit 7*(N=&vis7)";

define pv8-----/---------. left width = 10 "Visit 8*(N=&vis8)";

define pv9-----/---------. left width = 10 "Visit 9*(N=&vis9)";

define pv10----/--------.. left width = 10 "Visit 10*(N=&vis10)";

define pv11----/--------.. left width = 10 "Visit 11*(N=&vis11)";

define total-----/-------... left width = 10 "Overall*(N=&total)";

break after param / skip;

footnote1 "&under";

run;

Wednesday, June 18, 2008

SAS Functions

Mathematical and Statistical Functions

More: http://analytics.ncsu.edu/sesug/2005/CC04_05.PDF

Saturday, June 14, 2008

Verbatim Terms and Medical Dictionaries

"These are answers that are recorded word by word (from a subject)" and are often used as a backup of research findings in the final report. These verbatim terms entered in the CRF's are then matched with the "standardized medical dictionaries", which are essential for medical coding. A little more on use of Dictionary is given below:
-
Dictionary name and its Use

WHOART: World Health Organization Adverse Reaction Terminology:

is used for Adverse event (AE) coding

COSTART: (FDA’s) Coding Symbols for a Thesaurus of Adverse Reaction Terms:

is used for Adverse event (AE) coding

MedDRA: Medical Dictionary for Regulatory Activities Terminology:

is used for AE / Med. History / Terms coding

ICD9CM: International Classification of Diseases 9th Revision Clinical Modification:

is used for Medical History coding

WHODD: World Health Organization Drug Dictionary (B1, B2, C):

is used for Medication coding (concomitant)

The WHO Dictionaries include:

WHODD : WHO Drug Dictionary
WHOHD : WHO Herbal Dictionary
WHODDE : WHO Drug Dictionary enhanced.

*ATC: Anatomical Therapeutic Chemical classification is a classification used for coding the therapeutic use of drugs. This classification categorizes substances at five different levels according to the system or organ they act on and according to the chemical, therapeutic & pharmacological properties.

More: http://www.riteshmandal.com/medcoding.htm

Friday, June 13, 2008

"AETERM is Verbatim Name" and "AEDECOD is Preferred Term"

While doing AE Listings, there exists some confusion as these words (AETERM & Preferred TERM) sound similar. Every time I annotate AE Listing, I map “–TERM Variable for Preferred Term” and “–DECOD Variable for Verbatim Name” which is exclusively wrong. I have posted it here just to remind myself "AETERM (i.e., reported term) is Verbatim Name" and "AEDECOD is Preferred Term". Listed below is the CDISC Guidelines for Variables:

Define Variables and CDISC Guidelines:

System Organ Class (SOC) / --BODSYS:
It is the Body System or preferred system organ class that is involved in an event or measurement from the standard hierarchy.

Preferred Term (PT) / --DECOD:
It is the Dictionary derived text description of the topic variable. Equivalent to the Preferred Term (PT in MedDRA). The dictionary name and version should be provided in the comments column in the data definition file.

--TERM:
It is the Topic variable for an event observation, which is the Verbatim Name of the event. Use --MODIFY for modifying the verbatim term and --DECOD for coding --MODIFY or --TERM.

--MODIFY:
It is the Modified Text, if the topic variable is modified as part of a defined procedure.

An Eg. of MedDRA Hierarchy

System Organ Class (SOC) : Gastrointerstinal Disorders

High Level Group Term (HLGT): Dental and Gingival conditions

High Level Term (HLT): Dental pian and sensation disorders

Preferred Term (PT): Toothache <=SSC or SMQ

Lowest Level Term (LLT): Toothache

Must Read For SAS Clinical Programmers!

http://www.cdisc.org/pdf/V3CRTStandardV1_2.pdf

It is an informative pdf on "CDISC Submission Data Domain Models"

Saturday, May 10, 2008

Alt Codes for SAS Programming

Listed below, is the set of Alt Codes used for entering symbols in SAS programming.
Instructions for using Alt Codes:

Make sure that the "Number Lock" is switched on.
Hold down the "Alt" key and type the code on keypad.
Do not release the "Alt" key until the alt code is typed.
Special character are seen when the key is released.

Alt Codes:

Use of space for ‘alignment’ may not work at times. In case if it doesn't work, one can try using Alt 255, which will replace the space with an invisible character and make the alignment look perfect!

More: http://www.usefulshortcuts.com/alt-codes/

Sunday, May 4, 2008

SAS Professionals

2008 SAS Employee Video Contest Winner. A video that I enjoyed watching.......

Wednesday, April 2, 2008

For Partial Dates (UN-NOV-2006)

-
*pardt:- partial date;
if pardt ne '' then do;
date = substr(pardt,1,2);
month= substr(pardt,4,3);
year = substr(pardt,8,4);
if (month > '' and date >'' and year >'') then do;*;
if compress(date,'UN')='' then ndate='01';
if compress(ndate,'01')='' then do;**ndate;
if month > '' and upcase(month) in
('JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC') then do;*** month;
if compress(year,'0123456789')='' then do;****year;
endtc=input(compress(ndatemonthyear),date9.);
end;****;
format endtc yymmdd10.;
endtc1=put(endtc,yymmdd10.);
endtc1=trim(left(endtc1));
enddate=substr(endtc1,1,7);
end;***;end;**;
end;*;
end;
-
More: http://www.clinplus.com/vboard/showthread.php?t=8
-

The "n" means no separator

*new data;
data date;
input visitdt DATE9.;
datalines;
26MAY2006
;
run;
*conversion of date9.;
data date1;
set date;
format visitdt DATE9.;
*"n" means no separator;
visitdtc=put(visitdt, yymmddn8.);
run;
proc contents data=date1; run;

Output:
20060526

More: http://ftp.sas.com/techsup/download/sample/datastep/dateformat.html

How to insert a picture into a Power Point Slide

How to Add a Picture / Transparent background

Open a blank PowerPoint presentation, or a completed presentation to which you want to add a background. Note: If your completed presentation contains master slides, you might not want to apply the background to the master slides and risk unwanted changes to your presentation. The alternative is to add the background to one slide at a time.

To add a background to all the slides in a blank presentation, point to Master on the View menu, click Slide Master, and perform the following steps on the slide master. To add a background to a single slide, simply select a slide and follow the same steps.
Choose the logo or clip art you want to use as a slide background. On the Insert menu, point to Picture, and then click Clip Art to create a background from a new picture, or click From File to use a picture you have in your files. Keep in mind that not all clip art makes an appropriate background. You might have to experiment a little.
Select the logo or clip art you want and insert it in the presentation. The logo or clip art appears on the slide master, but is not yet transparent.

Right-click the logo or clip art on the slide, and then click Format Picture.

On the Picture tab, click the arrow next to the Color box, click Washout (Watermark in PowerPoint 2000), and then click OK.

With the logo or clip art still selected, right-click and then click Save as Picture. Save the image where you can find it. Delete the original image from the slide. You'll replace it with the transparent version you just prepared and saved.
On the Format menu, click Background. In the Background dialog box, click the arrow in the box under Background fill, click Fill Effects, and on the Picture tab, click Select Picture. Find and select the picture you formatted as a transparent background, click Insert, click OK, and then click Apply.
Your transparent image has been added to the master slide as a background, and the new background will appear on every new slide you add to the presentation. If you selected a single slide, the background has been added to the slide you selected.

More:http://office.microsoft.com/en-us/powerpoint/HA010797741033.aspx?pid=CL100626991033

Saturday, March 22, 2008

Character array for creating a dummy dataset

-
data one;
input visit$ 1-10;
datalines;
DAY 8
DAY 15
DAY 29
DAY 57
DAY 85
;
run;
-
*creating a dummy dataset with 3 different xxorres for each visit present in dataset one;
data dummy;
set one;
length xxorres $ 20;
array x{*} _character_;
do i = 1 to dim(x);
if i gt 0 then do;
xxorres='COMPLETE CLEARANCE'; output;
xxorres='INCOMPLETE CLEARANCE'; output;
xxorres='UNABLE TO ASSESS'; output;
i=i+1;
end;
end;
run;
proc sort data=dummy; by i; run;
proc print; run;
-
More: http://dist.stat.tamu.edu/flash/SAS/
-
*display of zero for missing values (using:character array);
data display;
set disp;
*new: is array name, l1 to l4: are character variable;
array new l1 - l4 _character_;
do i=1 to dim(new);
if new{i}='' then new{i}='0';
end;
run;
-

Tuesday, March 18, 2008

"When" for data analysis

"When the ORRES is populated, the STAT (& REASND) remains blank. When the STAT (& REASND) is populated, the ORRES remains blank".

"When both the raw dataset and the derived dataset names are same, the derived dataset repalce the raw dataset".

"When two dataset's are merged & one of them have few missing usubjid's:- if a and b; condition is preferred to avoid blanks.

"When TERM is missing, the derived dataset have blank values". Ex:- To drop observations without value:- if aeterm ne " "; condition is used.

"When the derived dataset have both TERM and STAT:- if aeterm ne " " or aestat ne " "; condition is used".

"When the result is not missing in raw dataset then ORRES=result, STRESC=ORRES and STRESN=STRESC(the numeric form of STRESC)". Usually ORRES & STRESC are displayed as "Y" / "N" instead of "Yes" / "No".

"When the STRESC is "Y" / "N then STRESN is displayed as "blank" (rather than 1 / 2) ".

"When END DATE is " " then ENRF="ONGOING"; Example: MH dataset".

"When a PARTIAL DATE is present in a raw dataset, then the same is displayed in the derived dataset".

"When a sequence number is assigned, it remains unique for all observations within a usubjid".

When working on display of DAYS, Programmer's check for DAY:- not displayed as 0".

When the variables in the SAS data set are not in a desired order then one of the following (ATTRIB, ARRAY, FORMAT, INFORMAT, LENGTH, RETAIN) is used before the SET statement to re-order it.

When a split character is used in the listing/table programming then the same split character must be used in the proc report with a FLOW option. This flow option hold back the direct display of split chatacter "*" in the output.

When the following warning is received in the log, the PROC PRINT option is removed from the program. "WARNING: Data too long for column "TEST"; truncated to 125 characters to fit."

When an order is missing in the final dataset of a table and if the same order is being used in the proc report: - the row for which the order is missing will not be displayed.

When different order numbers are given for the "result HEADER" and the "RESULT"(in listing/table) & if the same order variable is used in the break after statement (break after ord/ skip;) , there appears a blank line. To avoid this same order number is given for both the header and result.

Age (years)-> result header

blank line ------------
N->result

Saturday, March 15, 2008

10 Essential Proc's

List of 10 "Essential Proc's" from net, the general syntax for a PROC step is:

PROC name [DATA=dataset [dsoptions] ] [options];
[other PROC-specific statements;]
[BY varlist;]
RUN;

where:
name:------ identifies the procedure you want to use.
dataset:---- identifies the SAS data set to be used by the procedure; if omitted, the last data set to have been created during the session is used.
dsoptions:-- specifies the data set options to be used.
varlist:----- specifies the variables that define the groups to be processed separately. The data set must already be sorted by these same variables.
options:---- specifies the PROC-specific options to be used.

/***********************************************************/

1) PROC COMPARE compares two SAS datasets with each other and detects observations / variables that do not match with each other. The BASE defines the data set that SAS will use as a basis for comparison.

PROC COMPARE BASE = mydataset
COMPARE = otherds;
RUN;

/***********************************************************/

2) SAS Language allows users to assign their own formats to values in their data. Usually, this involves users asking SAS to replace numbers in their data with some kind of labels. The most common example is the user request that: SAS take '1's and '2's in a variable called SEX and format those values as the words 'female' and 'male' when they are displayed in output.

PROC FORMAT;
VALUE sexfmt
1="Female"
2="Male";
RUN;

FORMAT sex sexfmt.;
O U
-----------------------
1 female
2 male
2 male
1 female
2 male

O: original data, U: output using user-defined format

The original data stay the same -- coded as numbers -- but whenever the format is requested, the output shows the assigned labels instead of the original numbers. These kinds of formats are called 'user-defined formats'. The formats already known to SAS are called SAS formats.

Ten Things You Should Know About PROC FORMAT:

http://www.ats.ucla.edu/stat/sas/library/nesug00/bt3014.pdf

/***********************************************************/

3) Every PROC SQL query must have at least one SELECTstatement. The purpose of the SELECT statement is to name the columns that will appear on the report and the order in which theywill appear. The FROM clause names the data set from which the information will be extracted from.

PROC SQL;
SELECT STATE, SALES,
(SALES * .05) AS TAX
FROM USSALES;
QUIT;

It has the functionality of DATA and PROC steps into a single step. PROC SQL can sort, summarize, subset, join (merge), and concatenate datasets, create new variables, and print the results or create a new table or view all in one step.

/***********************************************************/

4) PROC CONTENTS describes the structure of the data set rather than the data values. Displays valuable information at the...

Data set level
Name
Engine
Creation date
Number of observations
Number of variables
File size (bytes)
Variable level
Name
Type
Length
Formats
Position
Label

PROC CONTENTS DATA=temp (options);
RUN;

Few options:-
Position: output lists the variables by their position in the data set (default is alphabetical).
Short: output is just the variable names in a row by row format.
Out=filename: creates a data set where each observation is a variable from the original data set.

/***********************************************************/

5) Values of all the variables in the data set will be listed (by PROC PRINT) unless a variables (VAR) statement accompanies the procedure. The VAR statement can also be used to specify the order in which variables are to be printed.

PROC PRINT DATA=temp (options);
RUN;

Few options:-
N: The number of observations is printed after the listing.
UNIFORM: This option specifies that the values for each variable be printed in the same columns on every page of the output.
DOUBLE: This option forces SAS to double space the output.
ROUND: This option causes SAS to round variables being summed. NOOBS: This option causes suppression of the observation number in the printed output.
LABEL: Use information from LABEL statements (where defined) as the column headings rather than the variable names.

/***********************************************************/

6) The FREQ procedure produces one-way frequency tables and n-way cross tabulations. For example, to obtain frequency counts for all variables in the SURVEY data set, enter: PROC FREQ;

PROC FREQ DATA=CLASS;
TABLES AGE*HEIGHT;
RUN;

Observe that:

Tables can be produced for numeric and character variables.
More than one TABLES statement can be used.
When the TABLES statement is omitted, one-way frequency tables are printed for each variable in the data set.
There is no limit to the number of variables in a TABLES request.
For a table of two or more variables an * must be given between each pair of variable names.

The following example illustrates the generation of one-, two-, and three-way frequency tables:
PROC FREQ;
TABLES sex;
TABLES sex * r1;
TABLES sex * r1 * r2;
The first TABLES statement generates a one-way frequency table for the variable SEX. The second TABLES statement generates a two-way frequency table for the row variable SEX and the column variable R1. The third TABLES statement generates a three-way frequency table for the variables SEX, R1, and R2; R1 is the row variable and R2 is the column; a new table is printed for each different value of the variable SEX.

/***********************************************************/

7) The MEANS procedure is used to produce simple descriptive statistics on numeric variables in a SAS data set.

PROC MEANS DATA=CLASS;
VAR HEIGHT WEIGHT;
RUN;

/***********************************************************/

8) PROC TRANSPOSE helps to reshape the long data to wide one.

data long1;
input famid year faminc;
cards;
1111 96 40000
1111 97 40500
1111 98 41000
2222 96 45000
2222 97 45400
2222 98 45800
3333 96 75000
3333 97 76000
3333 98 77000 ;
run;
proc sort data=long1; by famid; run;

*incyr: income year, prefix is used for identification;
PROC TRANSPOSE DATA=long1 OUT=wide1 PREFIX=incyr;
*same sort order is used as in proc sort statement;
BY FAMID;
*this is the variable that would become as: identifier/header;
ID YEAR;
*this is the variable that would become as: observation;
VAR FAMINC;
RUN;

Result:

Obs famid _NAME_ incyr96 incyr97 incyr98

1--- 1111 --faminc- 40000 -40500 -41000
2--- 2222 --faminc- 45000 -45400 -45800
3--- 3333 --faminc- 75000 -76000 -77000

/***********************************************************/

9) PROC SORT: sort's the data file. The program below sorts the data file called "DRAFT" on the variable "USUBJID and VISIT" and saves the sorted file as "DRAFT1". The original file remains unchanged since we used out=DRAFT1 to specify that the sorted data should be placed in DRAFT1.

PROC SORT DATA=DRAFT OUT=DRAFT1;
BY USUBJID VISIT;
RUN ;
*new data;
DATA DRAFT2;
SET DRAFT1;
BY USUBJID VISIT;
*if first dot is used to pick out the first obs;
IF FIRST.VISIT;
RUN;
(NODUPKEY option: is used with proc sort to remove duplicates)

/***********************************************************/

10) PROC REPORT: Influence the way in which a data has to be presented and the compute block introduces lines of text into the report.

PROC REPORT .......;
column sex;
.....;
COMPUTE BEFORE sex;
line@1 'Gender';
line@1 sex $200.;
endcomp;
RUN;

Monday, March 10, 2008

TRANWRD, COMPRESS & INDEXW

-
TRANWRD Function: "Replace all ocurrences of a word".
name=tranwrd(name, "Miss", "Ms.");
put name;

Value: Miss. Joan Smith
Result: Ms. Joan Smith

More: http://www.asu.edu/sas/sasdoc/sashtml/lgref/z0215027.htm

COMPRESS Function: "Removes specific characters from a character string".
a='AB C D ';
b=compress(a);
put b;

Value: 'AB C D ';
Result: ABCD

More: http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000212246.htm

INDEXW Function: "Returns the first position in the character-value that contains the find-string". If the find-string is not found, the function returns a 0.
indexw(STRING1,"the");

Value: STRING1 = "there is a the here"
Result: 12 (the word "the")

More: http://support.sas.com/publishing/pubcat/chaps/59343.pdf
-

Wednesday, March 5, 2008

Formulas for creating Vital Signs Dataset

-
For temperature:
Conversion of Fahrenheit to Celsius:
Temperature_in_Celsius = (5/9)*(Tf-32); *where Tf is the "temperature in fahrenheit";
Conversion of Celsius to Fahrenheit :
Temperature_in_Fahrenheit = (9/5)*Tc+32; *where Tc is the "temperature in celsius";
height:
Conversion of Inches to Centimeters:
Centimeters = inches x 2.54;
Conversion of Centimeters to Inches:
Inches = centimeters x 0.3937;
weight:
Conversion of Pounds to Kilograms;
Kilograms = lbs / 2.2;
Conversion of Kilograms to Pounds:
Pounds = kg x 2.2;
& bmi:
*Weight in Pounds:
BMI = (Weight in Pounds / (Height in inches) x (Height in inches)) x 703
*Weight in Kilograms:
BMI = (Weight in Kilograms / (Height in Meters) x (Height in Meters))
-

Saturday, February 23, 2008

Industry Reports

The Thomson CenterWatch 2007 Survey of Investigative Sites in the U.S. finds that: sites have rated Kendle, Covance, Omnicare as Top CROs to work with.