CODES READYMADE

Monday, April 27, 2009

A Fwd that I enjoyed reading...

A Priest, a Doctor and a CRO employee die and go to heaven,
they hear God's voice and it says:

“My dear sons I am really very sorry but the heaven is full and I can accommodate only one of you, so to choose the right person, one by one tell me what you have done in your lifetime.”

The Priest goes up first and says:

"Well God I'm a priest, I am your humble servant and have spent all my life working to spread your message."

The Doctor goes up next and says:

"Well I'm a doctor and I have helped thousands of people recover from their illnesses"

The CRO employee goes up says:

"Well I worked in ***** Clinical Research India Pvt Ltd and........", Before the CRO employee could say anything further, the heaven's gate opened and God came with tears in his eyes and said to the CRO employee… Say no more my son come with me, you have already been through HELL...

Wednesday, April 8, 2009

How to read variable names?

The system dataset SASHELP.VCOLUMN is a SAS view. This view is where the attributes (like: variable names, labels and formats) of a dataset are stored as the records of a dataset. This is like an output dataset from the CONTENTS procedure.

For example, in the code below: If the DM dataset is executed, the corresponding attributes of this dataset gets saved in a view called sashelp.vcolumn (which can later be set for analysis).

*** RUN DM DATASET ***;

data dm;
set dm;
run;

*** SET SASHELP.VCOLUMN TO READ VARIABLE NAMES ***;

data test;
set sashelp.vcolumn(where=(libname="WORK" and memname in ("DM"))) end=last;
if find (label, "Date"); *capture date variables;
run;

More: http://www2.sas.com/proceedings/sugi28/150-28.pdf

Saturday, April 4, 2009

Checklists for validation

1) Log Check:

Is the program log clean (with no errors, warnings, uninitialized values, character to numeric conversions and vice versa?)

2) Cosmetic Check:

Does the output match mock up?
Are all words spelled correctly?
Does the number [1] in header match footnote number?
Does each footnote end with a period except reference?
Does the footnote refer to corresponding listing? (For Tables)
Are all variables aligned: Numeric – Right & decimal aligned, Character – Left aligned?
Is there any truncation for character variables?
Are ordering / display of subtitle consistent across all listings?
Does 5 subjects in QC program (a random pick) match output?

3) Logic Check:

Are the TLFs subset for right population?
Does the number of records displayed match with QC program?
Does the output content in each column relate to the annotated specification?
Does the output make logical sense for the given data?
Is there any duplicates in the given data?

Use of #BYVAL(variable-name)

How to customize our title display??? How do we insert a text in the title???

This option #byval helps the user insert text at the position it is placed in the title statement.

For example in the code below:

The use of #byval option with the variable TRTGRP in the BY statement (within a proc step) makes its dynamic value get displayed in the title. As result of which we get different treatment groups displayed in the title part.

title1 "Listing of Patient Population";
title2 "Treatment Group=#byval(trtgrp)";
proc report data=dummy nowd headline headskip spacing=1 missing;
by trtgrp;
column(subj rand saft ittp eval);

define subj/ order_ width = 20 left spacing = 0 "Subject" ;
define rand/ order_ width = 20 center "Randomized";
define saft/ display width = 20 center "Safety" ;
define ittp/ display width = 20 center "ITT";
define eval/ display width = 20 center "Evaluable";

break after subj / skip;
run;

More: http://www.lexjansen.com/pharmasug/2006/technicaltechniques/tt01.pdf

Saturday, February 21, 2009

Labeling with proc transpose

data new;
input subject $ value $ id $ label $;
cards;
111 03 A One
112 03 A One
111 04 B Two
113 03 B Two
114 04 C Three
;
run;

proc sort data=new;
by subject;
run;

proc transpose data=new out=new1;
by subject;
id id;
var value;
idlabel label;
run;

The ID statement identifies a variable whose values will supply the SAS names for variables in the transposed data.

is "FLOW" not working?

Then try this code:

var = compress(temp,compress(temp,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890!@#$%^&*()_+?<>{}]["".,/= '));

This code compresses the invisible characters that are present in the variable named temp and it helps in perfect alignment of text with FLOW function.

Saturday, January 31, 2009

Assigning GRADES

data assign;
set draft;
*temperature in celsius;
temp=temperature;

*INCREASE IN TEMPERATURE;
if (temp^= . & temp>= 38) then do;
if 38 <= temp<= 39 then grade= 'GRADE1'; else if 39 < grade=" 'GRADE2';"> 40 then grade= 'GRADE3';
end;
*DECREASE IN TEMPERATURE;
else if (temp^= . & temp<= 35) then do;
if 35 >= temp> 32 then grade= 'GRADE2';
else if 32 >= temp> 28 then grade= 'GRADE3';
else if temp<= 28 then grade= 'GRADE4'; end;

run;

Visit in Window???

data window;
set draft;
by usubjid visitnum;

if visitday ne . & visitday lt 0 then day= 0;
else day= visitday;
*bringing the preceding day using lag function;
preday=lag(day);
*identifying observation through lag of usubjid;
lagusid = lag(usubjid);
*(day-preceding day) gives the day difference;
if usubjid = lagusid then do;
if day ne . & preday ne . then daydiff=(day-preday);
end;

*VISIT WINDOW FOR SCREENING;
if day ne . & visitnum in (1) then do;
if day le 0 then viwind = 'Yes';
else viwind = 'No';
end;
*VISIT WINDOW FOR VISIT 1 2 & 3;
if daydiff ne . & visitnum in (2,3,4) then do;
if 5 le daydiff le 7 then viwind = 'Yes';
else viwind = 'No';
end;
*VISIT WINDOW FOR END OF STUDY;
if daydiff ne . & visitnum in (5) then do;
if 0 le daydiff le 1 then viwind = 'Yes';
else viwind = 'No';
end;
*VISIT WINDOW FOR FOLLOW-UP;
if daydiff ne . & visitnum in (6) then do;
if daydiff le 25 then viwind = 'Yes';
else viwind = 'No';
end;

run;

Saturday, December 20, 2008

Capturing the data into a macro

For the display of (N=xx) in a summary table, we have to capture the data into a macro. There are two different methods for capturing the data into a macro. One is using PROC SQL & the other one is using CALL SYMPUT.

***** GETTING "N" FOR EACH TRT GROUP *****;

1) PROC SQL:

data total;

set main(where=(saf=1));

trtgrp = input(treat,best.);

output;

trtgrp = 3;

output;

run;

proc sql noprint;

select count(distinct usubjid) into : trt1 through : trt3 from total group by trtgrp;

quit;

%put &trt1 &trt2 &trt3;

2) SYMPUT:

proc freq data = total noprint;

tables trtgrp / out = deno(drop = percent) list missing;

run;

data _null_;

set deno;

if trtgrp=1 then call symput('trt1',trim(left(put(count,3.))));

if trtgrp=2 then call symput('trt2',trim(left(put(count,3.))));

if trtgrp=3 then call symput('trt3',trim(left(put(count,3.))));

run;

Thursday, December 18, 2008

DUPOUT=option

-
The following data set contains duplicate observations for Patients (PT:- 01 & 02) :
-
PT NO
01 12
01 3
01 56
02 2
02 12
03 12
-
How does the dupout option work on this data?????
-
proc sort data = test out = dedup
-------------------------dupout = dups nodupkey;
--by pt;
run;

It works in a similar manner to nodupkey. Here the "duplicates are detected by the nodupkey option" & are directed into an output dataset with DUPOUT!

The dedup is an output data set that contains only the original observations.
PT NO
01 12
02 2
03 12
-
Where as the dups is an output data set that contains all duplicate observations.
PT NO
01 3
01 56
02 12
-

Tuesday, December 16, 2008

Use of SUBSTR with INDEX function

-
The syntax for substr function is:
SUBSTR(string, starting position, <length>)
-
It is easy to determine the starting position or the length of the string (if it contains DATE).
-
For example:
day=substr(date,1,2);

but the same substr function is not applicable if the string contains NAME. The reason is that the length of each name varies from one observation to the other. Hence the use of delimiters (like: "." ",") can be used to locate the position within the string.
-
Below is an example for the use of substr with index function:
-
*** INDEX IS USED TO FIND WHERE TO BEGIN & END A SUBSTRING ***;
-
data fullname;
length fname lname $20;
set nameds;
fname= substr(name, 1, index(name, ',') - 1);
lname = trim(left(substr(name, index(name, ',') + 1)));
run;
-
Output:
name----------------------- fname---- lname
SOMA, SUNDARAM---- SOMA---- SUNDARAM
SHIVA, SHEKARAN---- SHIVA---- SHEKARAN
-

Saturday, November 1, 2008

Y FREQ ??? use MEANS...

No dummy dataset... No data doubling...

“Making of Table” has become much simpler using PROC MEANS procedure! PROC MEANS when used in combination with MULTILABEL format and PRELOADFMT option gives all possible values of a formatted variable.

For example: If there are two different treatment groups (1 & 2) in a table and if the third column has to display a total count of both the treatment groups (1+2), then the MULTILABEL format is used.

proc format;

value $tmt (multilabel)

'1'='trt1'

'2'='trt2'

'1','2' = 'Total';

run;

This option allows the user to define:

overlapping ranges across labels and to
assign another label to the same value

This format is very efficient when used in conjunction with the means procedure TO PRODUCE THE TOTAL COUNT. Not only the multilabel format but also the use of PRELOADFMT with COMPLETETYPES enhances the efficiency of a program in CREATING ALL POSSIBLE COMBINATIONS OF A VARIABLE.

Now let’s consider three different categories (‘MILD’, ‘MODERATE’ & 'SEVERE') for a variable AEREL and if the dataset does not contain one of the category (‘MODERATE’) and if this same category has to be displayed in the table, then the PRELOADFMT option is used [as it gives instruction to load the FORMAT for the missing category ‘MODERATE’].

-
proc format;

value $rel

'MILD' = 'Mild'

‘MODERATE’ = 'Moderate'

'SEVERE' = 'Severe'

' ' = 'Missing';

run;

-
*USE OF PROC MEANS WITH MULTILABEL & PRELOADFMT;

-
proc means data=test completetypes nway chartype missing noprint;
by visit;
class treat aerel/preloadfmt missing mlf order=formatted;
format treat $tmt. aerel $rel.;
output out=cnt1 n=ccnt;
run;
-

More: http://www.lexjansen.com/pharmasug/2007/cc/cc10.pdf

Sunday, October 26, 2008

Display the "CLASSIFICATION", Not the "CODE"!

The Anatomical Therapeutic Chemical Classification System (ATCCS) is used for the classification of drugs. It classifies drugs at 5 different levels; based on the organ or system on which they act and/or their therapeutic and chemical characteristics.

For example, the ATCCS for drug SUMATRIPTAN is given below:

ATC1 TERM: NERVOUS SYSTEM, N
ATC1 TERM CODE: N

ATC2 TERM: ANALGESICS, N02
ATC2 TERM CODE: N02

ATC3 TERM: ANTIMIGRAINE PREPARATIONS, N02C
ATC3 TERM CODE:N02C

ATC4 TERM: SELECTIVE SEROTONIN (5HT1) AGONISTS, N02CC
ATC4 TERM CODE: N02CC

ATC5 TERM: SUMATRIPTAN, N02CC01
ATC5 TERM CODE: N02CC01

This "ATC LEVEL 2 TERM" may be captured in CONCOMITANT MEDICATION DATASET, which when captured should display only the CLASSIFICATION: "ANALGESICS" and not the CODE: "N02". Hence the code is trimed off from the chemical classification.

***** USE OF REVERSE FUNCTION TO TRIM OFF CODE *****
-
data cmed;
set med;
x=trim(left(reverse(atc2t)));
y=substr(x,6,150);
z=reverse(y);
atc2term=strip(z);
run;

Tuesday, October 21, 2008

A code to split numbers!

The dummy dataset code has numbers separated by a hash or space. The code that follows splits those numbers and output each number into a unique observation.

***** DUMMY DATASET WITH #, " " *****

data TEST;
length pt $20 txt $40;
pt ='1';
txt= '#12, #3, #56 34-98';
output;
pt ='2';
txt= '#2 #12';
output;
pt ='3';
txt= '#12';
output;
run;

-
***** SPLIT & OUTPUT NUMBERS *****
-
data TEST1;
set TEST;
stat:
do i = 1 to length(txt);
-
if substr(txt,i,1) = '#' then do;
-
if substr(txt,i+2,1) in (',','') then do;
cno = substr(txt,i+1,1); *3: for single digit;
txt = substr(txt,i+2);
end;
else if substr(txt,i+3,1) in (',','') then do;
cno = substr(txt,i+1,2); *12: for two digit;
txt = substr(txt,i+3);
end;
-
output;
if length(trim(left(txt))) > 2 then goto stat;
leave;
end;
-
end;
run;

***** SAME CODE FOR DATA WITHOUT # *****

-
data TEST1;
set TEST;
do i = 1 to length(txt);

-
if substr(txt,i+1,1) in (',','') then do;
cno = trim(left(substr(txt,i,1)));
i=i+2;
end;
else if substr(txt,i+2,1) in (',','') then do;
cno = trim(left(substr(txt,i,2)));
i=i+3;
end;

-
if cno ne '' then output;
end;
run;

Thursday, October 9, 2008

Retain for Repeated Visit

-
***** RETAIN THE VALUE FOR REPEATED VISIT *****
-
data test;
set vis;
by pt visitnum vstdt;
retain x;
if first.visitnum then x = visitnum;
else x=x+.1;*add .1 for repeated visit;
drop visitnum;
run;
-

Sunday, September 7, 2008

Using wildcard to reduce code

Using a wildcard (:) in the variable lists would reduce the code to a great extent.
-
Colon can be used as a wildcard in variable lists. In the example given below, (keep=v:) keeps all variable names that begins with letter v.
-
data x(keep=v:) ;
v1=1 ;
v2=10 ;
v3=100 ;
v4=1000 ;
v5=10000 ;
w1=1;
w2=10;
run;

More: http://support.sas.com/publishing/pubcat/chaps/55513.pdf
-

Thursday, September 4, 2008

Four different ways to find duplicates

We must have probably come across duplicate records in our dataset & tried removing those by using nodup and nodupkey options.

I

Use of nodup will remove exact duplicates as given below:

proc sort data = draft out = draft1 nodup;
by _all_ ;
run;

NAME AGE SEX

Rama _27_ M

-
II

-
Where as the use of nodupkey looks at the by variables in the sort procedure and keeps the first records of those by variables (it removes the rest of the records when encountered again!)

proc sort data = draft out = draft1 nodupkey;
by petestcd;
run;

PETESTCD
Neck
Abdominal
Abdominal
Abdominal *these are the two records that will be removed by nodupkey;
HEENT

-
III

-
The duplicate records in the dataset may result in the display of the Note => “MERGE statement has more than one data set with repeats of BY values” in the log. To determine this one can use "if first. ne last." method.

data check;

set dups;

by id;

if first.id ne last.id; *brings the duplicates;

run;

This was quite confusing to me as my little brain perceived it this way:

"Y Y first. ne last.?????
If an ID is repeated twice in the dataset then first dot is equal to last dot....

ID
001
001

but SAS THINKS this way

ID First. Last.
001 1---- 0
001 0---- 1
002 1---- 1
003 1---- 1

Records that are not duplicates will have FIRST.ID = 1 and LAST.ID = 1, so we select records where first dot is not equal to last dot.

-
IV

The last way is the use of freq:

proc freq data = draft noprint ;
table ID / out = draftdup (keep = ID Count where = (Count > 1)) ;
run;

More: http://www2.sas.com/proceedings/sugi31/164-31.pdf

Wednesday, August 27, 2008

Concatenate a set of Numbers

Some time a programmer has to concatenate a set of Numbers. For example an inclusion criteria listing may demand the display of “Number of inclusion criteria’s (1-7) present in the dataset” as column. In the dataset given below, “NO equals the subjects who has not satisfied inclusion criteria". To display this, the following code will concatenate the (inclusion criteria) numbers to bring the inclusion criteria not satisfied per subject as 1, 2, 3, 4, 5, and 6.

SUBJECT AGE IN1 IN2 IN3 -IN4 IN5 -IN6 IN7
00001111 --43- YES YES YES .----YES YES YES
00001222 --20- NO –NO -NO -NO NO -NO -.

(inclusion criteria not satisfied is stored as a Numeric value 2 with format NO)

data incl1;
length no $ 50;
set incl;
retain no;
array a[7] in1 - in7;
no = '';*no is the variable to HOLD THE VALUE;
do i=1 to 7;*loop is RUN TILL i=8;
if a[i]=2 then do;*if loop satisfies the condition "NO";
if no eq '' then no=trim(left(put(i,best.))); *then ASSIGN 1 to no;
else if no ne '' then no=trim(left(no))II', 'IItrim(left(put(i,best.)));
*now CONCATENATE no:1 with 2;
end;
end;
run;

Output for this subject 00001222 will be:

-
1, 2, 3, 4, 5, 6
-

Tuesday, August 26, 2008

Simple way of % Display!

***** DISPLAY OF PERCENTAGE IN PROC REPORT ******;

data new;
input cnt deno;
datalines;
53 54
9 -10
12 12
2 ---9
1 ---5
;
run;

data new1;
set new;

-
if cnt ne . and deno ne . then per = (cnt*100)/deno;

*cnt:COUNT, deno:DENOMINATOR, per:PERCENTAGE;

-
if (per ne 0 and per ne .) and per ne 100 then
cper = right(put(cnt,3.))IIright(put('('trim(left(put(per,4.1)))'%)',8.));
else if per eq 100 then
cper = right(put(cnt,3.))IIright(put('('trim(left(put(per,4.)))'%)',8.));

-
run;
proc print; run;
-

Together it takes a TOTAL of 11 spaces for displaying: xx (xx.x%)

3. -=> is for COUNT xxx
4.1 => is for PERCENTAGE xx.x / 4. => is for % (if 100) xxx
+3 => is for SPECIAL CHARACTERS (%)
+1 => is for a space in b/w COUNT & PERCENTAGE

Thursday, August 21, 2008

Display 'No Observations'

Code for Displaying 'No Observations':

***** FINAL IS THE DATASET WITH 0 OBSERVATION *****;

-
proc sql noprint;
select count (*) into : n from final;
run;

*if observation is not equal to 0 set final, else if the observation is equal to 0 then display no observation in the first column (usubjid);

-
data disp;
length usubjid $20;
if &n^ = 0 then set final;
else usubjid = 'No Observations';
run;