CODES READYMADE

Saturday, December 20, 2008

Capturing the data into a macro

For the display of (N=xx) in a summary table, we have to capture the data into a macro. There are two different methods for capturing the data into a macro. One is using PROC SQL & the other one is using CALL SYMPUT.

***** GETTING "N" FOR EACH TRT GROUP *****;

1) PROC SQL:

data total;

set main(where=(saf=1));

trtgrp = input(treat,best.);

output;

trtgrp = 3;

output;

run;

proc sql noprint;

select count(distinct usubjid) into : trt1 through : trt3 from total group by trtgrp;

quit;

%put &trt1 &trt2 &trt3;

2) SYMPUT:

proc freq data = total noprint;

tables trtgrp / out = deno(drop = percent) list missing;

run;

data _null_;

set deno;

if trtgrp=1 then call symput('trt1',trim(left(put(count,3.))));

if trtgrp=2 then call symput('trt2',trim(left(put(count,3.))));

if trtgrp=3 then call symput('trt3',trim(left(put(count,3.))));

run;

Thursday, December 18, 2008

DUPOUT=option

-
The following data set contains duplicate observations for Patients (PT:- 01 & 02) :
-
PT NO
01 12
01 3
01 56
02 2
02 12
03 12
-
How does the dupout option work on this data?????
-
proc sort data = test out = dedup
-------------------------dupout = dups nodupkey;
--by pt;
run;

It works in a similar manner to nodupkey. Here the "duplicates are detected by the nodupkey option" & are directed into an output dataset with DUPOUT!

The dedup is an output data set that contains only the original observations.
PT NO
01 12
02 2
03 12
-
Where as the dups is an output data set that contains all duplicate observations.
PT NO
01 3
01 56
02 12
-

Tuesday, December 16, 2008

Use of SUBSTR with INDEX function

-
The syntax for substr function is:
SUBSTR(string, starting position, <length>)
-
It is easy to determine the starting position or the length of the string (if it contains DATE).
-
For example:
day=substr(date,1,2);

but the same substr function is not applicable if the string contains NAME. The reason is that the length of each name varies from one observation to the other. Hence the use of delimiters (like: "." ",") can be used to locate the position within the string.
-
Below is an example for the use of substr with index function:
-
*** INDEX IS USED TO FIND WHERE TO BEGIN & END A SUBSTRING ***;
-
data fullname;
length fname lname $20;
set nameds;
fname= substr(name, 1, index(name, ',') - 1);
lname = trim(left(substr(name, index(name, ',') + 1)));
run;
-
Output:
name----------------------- fname---- lname
SOMA, SUNDARAM---- SOMA---- SUNDARAM
SHIVA, SHEKARAN---- SHIVA---- SHEKARAN
-

Saturday, November 1, 2008

Y FREQ ??? use MEANS...

No dummy dataset... No data doubling...

“Making of Table” has become much simpler using PROC MEANS procedure! PROC MEANS when used in combination with MULTILABEL format and PRELOADFMT option gives all possible values of a formatted variable.

For example: If there are two different treatment groups (1 & 2) in a table and if the third column has to display a total count of both the treatment groups (1+2), then the MULTILABEL format is used.

proc format;

value $tmt (multilabel)

'1'='trt1'

'2'='trt2'

'1','2' = 'Total';

run;

This option allows the user to define:

overlapping ranges across labels and to
assign another label to the same value

This format is very efficient when used in conjunction with the means procedure TO PRODUCE THE TOTAL COUNT. Not only the multilabel format but also the use of PRELOADFMT with COMPLETETYPES enhances the efficiency of a program in CREATING ALL POSSIBLE COMBINATIONS OF A VARIABLE.

Now let’s consider three different categories (‘MILD’, ‘MODERATE’ & 'SEVERE') for a variable AEREL and if the dataset does not contain one of the category (‘MODERATE’) and if this same category has to be displayed in the table, then the PRELOADFMT option is used [as it gives instruction to load the FORMAT for the missing category ‘MODERATE’].

-
proc format;

value $rel

'MILD' = 'Mild'

‘MODERATE’ = 'Moderate'

'SEVERE' = 'Severe'

' ' = 'Missing';

run;

-
*USE OF PROC MEANS WITH MULTILABEL & PRELOADFMT;

-
proc means data=test completetypes nway chartype missing noprint;
by visit;
class treat aerel/preloadfmt missing mlf order=formatted;
format treat $tmt. aerel $rel.;
output out=cnt1 n=ccnt;
run;
-

More: http://www.lexjansen.com/pharmasug/2007/cc/cc10.pdf

Sunday, October 26, 2008

Display the "CLASSIFICATION", Not the "CODE"!

The Anatomical Therapeutic Chemical Classification System (ATCCS) is used for the classification of drugs. It classifies drugs at 5 different levels; based on the organ or system on which they act and/or their therapeutic and chemical characteristics.

For example, the ATCCS for drug SUMATRIPTAN is given below:

ATC1 TERM: NERVOUS SYSTEM, N
ATC1 TERM CODE: N

ATC2 TERM: ANALGESICS, N02
ATC2 TERM CODE: N02

ATC3 TERM: ANTIMIGRAINE PREPARATIONS, N02C
ATC3 TERM CODE:N02C

ATC4 TERM: SELECTIVE SEROTONIN (5HT1) AGONISTS, N02CC
ATC4 TERM CODE: N02CC

ATC5 TERM: SUMATRIPTAN, N02CC01
ATC5 TERM CODE: N02CC01

This "ATC LEVEL 2 TERM" may be captured in CONCOMITANT MEDICATION DATASET, which when captured should display only the CLASSIFICATION: "ANALGESICS" and not the CODE: "N02". Hence the code is trimed off from the chemical classification.

***** USE OF REVERSE FUNCTION TO TRIM OFF CODE *****
-
data cmed;
set med;
x=trim(left(reverse(atc2t)));
y=substr(x,6,150);
z=reverse(y);
atc2term=strip(z);
run;

Tuesday, October 21, 2008

A code to split numbers!

The dummy dataset code has numbers separated by a hash or space. The code that follows splits those numbers and output each number into a unique observation.

***** DUMMY DATASET WITH #, " " *****

data TEST;
length pt $20 txt $40;
pt ='1';
txt= '#12, #3, #56 34-98';
output;
pt ='2';
txt= '#2 #12';
output;
pt ='3';
txt= '#12';
output;
run;

-
***** SPLIT & OUTPUT NUMBERS *****
-
data TEST1;
set TEST;
stat:
do i = 1 to length(txt);
-
if substr(txt,i,1) = '#' then do;
-
if substr(txt,i+2,1) in (',','') then do;
cno = substr(txt,i+1,1); *3: for single digit;
txt = substr(txt,i+2);
end;
else if substr(txt,i+3,1) in (',','') then do;
cno = substr(txt,i+1,2); *12: for two digit;
txt = substr(txt,i+3);
end;
-
output;
if length(trim(left(txt))) > 2 then goto stat;
leave;
end;
-
end;
run;

***** SAME CODE FOR DATA WITHOUT # *****

-
data TEST1;
set TEST;
do i = 1 to length(txt);

-
if substr(txt,i+1,1) in (',','') then do;
cno = trim(left(substr(txt,i,1)));
i=i+2;
end;
else if substr(txt,i+2,1) in (',','') then do;
cno = trim(left(substr(txt,i,2)));
i=i+3;
end;

-
if cno ne '' then output;
end;
run;

Thursday, October 9, 2008

Retain for Repeated Visit

-
***** RETAIN THE VALUE FOR REPEATED VISIT *****
-
data test;
set vis;
by pt visitnum vstdt;
retain x;
if first.visitnum then x = visitnum;
else x=x+.1;*add .1 for repeated visit;
drop visitnum;
run;
-

Sunday, September 7, 2008

Using wildcard to reduce code

Using a wildcard (:) in the variable lists would reduce the code to a great extent.
-
Colon can be used as a wildcard in variable lists. In the example given below, (keep=v:) keeps all variable names that begins with letter v.
-
data x(keep=v:) ;
v1=1 ;
v2=10 ;
v3=100 ;
v4=1000 ;
v5=10000 ;
w1=1;
w2=10;
run;

More: http://support.sas.com/publishing/pubcat/chaps/55513.pdf
-

Thursday, September 4, 2008

Four different ways to find duplicates

We must have probably come across duplicate records in our dataset & tried removing those by using nodup and nodupkey options.

I

Use of nodup will remove exact duplicates as given below:

proc sort data = draft out = draft1 nodup;
by _all_ ;
run;

NAME AGE SEX

Rama _27_ M

-
II

-
Where as the use of nodupkey looks at the by variables in the sort procedure and keeps the first records of those by variables (it removes the rest of the records when encountered again!)

proc sort data = draft out = draft1 nodupkey;
by petestcd;
run;

PETESTCD
Neck
Abdominal
Abdominal
Abdominal *these are the two records that will be removed by nodupkey;
HEENT

-
III

-
The duplicate records in the dataset may result in the display of the Note => “MERGE statement has more than one data set with repeats of BY values” in the log. To determine this one can use "if first. ne last." method.

data check;

set dups;

by id;

if first.id ne last.id; *brings the duplicates;

run;

This was quite confusing to me as my little brain perceived it this way:

"Y Y first. ne last.?????
If an ID is repeated twice in the dataset then first dot is equal to last dot....

ID
001
001

but SAS THINKS this way

ID First. Last.
001 1---- 0
001 0---- 1
002 1---- 1
003 1---- 1

Records that are not duplicates will have FIRST.ID = 1 and LAST.ID = 1, so we select records where first dot is not equal to last dot.

-
IV

The last way is the use of freq:

proc freq data = draft noprint ;
table ID / out = draftdup (keep = ID Count where = (Count > 1)) ;
run;

More: http://www2.sas.com/proceedings/sugi31/164-31.pdf

Wednesday, August 27, 2008

Concatenate a set of Numbers

Some time a programmer has to concatenate a set of Numbers. For example an inclusion criteria listing may demand the display of “Number of inclusion criteria’s (1-7) present in the dataset” as column. In the dataset given below, “NO equals the subjects who has not satisfied inclusion criteria". To display this, the following code will concatenate the (inclusion criteria) numbers to bring the inclusion criteria not satisfied per subject as 1, 2, 3, 4, 5, and 6.

SUBJECT AGE IN1 IN2 IN3 -IN4 IN5 -IN6 IN7
00001111 --43- YES YES YES .----YES YES YES
00001222 --20- NO –NO -NO -NO NO -NO -.

(inclusion criteria not satisfied is stored as a Numeric value 2 with format NO)

data incl1;
length no $ 50;
set incl;
retain no;
array a[7] in1 - in7;
no = '';*no is the variable to HOLD THE VALUE;
do i=1 to 7;*loop is RUN TILL i=8;
if a[i]=2 then do;*if loop satisfies the condition "NO";
if no eq '' then no=trim(left(put(i,best.))); *then ASSIGN 1 to no;
else if no ne '' then no=trim(left(no))II', 'IItrim(left(put(i,best.)));
*now CONCATENATE no:1 with 2;
end;
end;
run;

Output for this subject 00001222 will be:

-
1, 2, 3, 4, 5, 6
-

Tuesday, August 26, 2008

Simple way of % Display!

***** DISPLAY OF PERCENTAGE IN PROC REPORT ******;

data new;
input cnt deno;
datalines;
53 54
9 -10
12 12
2 ---9
1 ---5
;
run;

data new1;
set new;

-
if cnt ne . and deno ne . then per = (cnt*100)/deno;

*cnt:COUNT, deno:DENOMINATOR, per:PERCENTAGE;

-
if (per ne 0 and per ne .) and per ne 100 then
cper = right(put(cnt,3.))IIright(put('('trim(left(put(per,4.1)))'%)',8.));
else if per eq 100 then
cper = right(put(cnt,3.))IIright(put('('trim(left(put(per,4.)))'%)',8.));

-
run;
proc print; run;
-

Together it takes a TOTAL of 11 spaces for displaying: xx (xx.x%)

3. -=> is for COUNT xxx
4.1 => is for PERCENTAGE xx.x / 4. => is for % (if 100) xxx
+3 => is for SPECIAL CHARACTERS (%)
+1 => is for a space in b/w COUNT & PERCENTAGE

Thursday, August 21, 2008

Display 'No Observations'

Code for Displaying 'No Observations':

***** FINAL IS THE DATASET WITH 0 OBSERVATION *****;

-
proc sql noprint;
select count (*) into : n from final;
run;

*if observation is not equal to 0 set final, else if the observation is equal to 0 then display no observation in the first column (usubjid);

-
data disp;
length usubjid $20;
if &n^ = 0 then set final;
else usubjid = 'No Observations';
run;

Thursday, August 14, 2008

Wanna underline a header?

proc report data= ds split='^' nowd headline headskip spacing=1 missing;
by descending treat;

-
column (rest1 rest2 ("_First Value_" (fval1 fval2)) ("_Second Value_" (sval1 sval2 sval3)) rest3 rest4);

-
define rest1 / order-- width = 22 left-- "rest of the values (1)" spacing = 0;
define rest2 / order-- width = 22 left-- "rest of the values (2)";
define fval1 / display width = 22 center "first:-value (1)";
define fval2 / display width = 22 center "first:-value (2)";
define sval1 / display width = 22 center "second value (1)";
define sval2 / display width = 22 center "second value (2)";
define sval3 / display width = 22 center "second value (3)";
define rest3 / display width = 22 center "rest of the values (3)";
define rest4 / display width = 22 center "rest of the values (4)";

-
break after rest1 / skip;
footnote1 "&under";
footnote2 "Required footnote for display.";

-
run;

The above example: shows the use of "_" to underline the header(First Value & Second Value). These are the variables (fval1 fval2, sval1 sval2 sval3) which will get displayed under the header.
-

Saturday, August 2, 2008

Generating missing values in a Table

USUBJID VISITNUMBER
10011001 1
10011001 7
10011002 1
10011003 1

In the above example, the usubjid (10011002 and 10011003) has visitnumber 7 missing. The proc transpose statement helps in generating these missing values.

** PROC TRANSPOSE FOR GENERATING MISSING VALUES **;

proc transpose data = d1 out = d2 prefix = v;
by usubjid treatnc ;
id vis;
*vis is a character variable which contains visitnumber (1,7);
var vala valb valc vald vale valf valg valh vali valj;
run;

When transposed the data becomes:
USUBJID V1 V7
10011001 -0 --0
10011002 -0 --''
10011003 -0 --''

** RE(cursive) TRANSPOSE: TO BRING THE DATA BACK TO ITS SHAPE **;

proc transpose data = d2 out = d3;
by usubjid treatnc ;
id _name_;
*all values vala - valj becomes identifier again;
var v1 v7;
run;

The data would now look like:
USUBJID _name_ vala
10011001 -v1-------0
10011001 -v7-------0
10011002 -v1-------0
10011002 -v7-------''
10011003 -v1-------0
10011003 -v7-------''

proc sort data = d3;
by usubjid treatnc _name_;
run;

** BRING THE VARIABLES ONE AFTER THE OTHER FOR FREQ **;

proc transpose data = d3(rename = (_name_ = name)) out = d4;
by usubjid treatnc name;
var vala valb valc vald vale valf valg valh vali valj;
run;

Friday, August 1, 2008

Why two proc reports?????

When proc report had more columns that do not fit in one page, I tried using two different proc reports. However the disadvantage of using two different proc report is, the report one: gets generated first…

To solve this problem, define a variable with ID. This ID variable appears at the left of every page of a report. Hence the use of ID ensures that you can identify each row of the report when the report contains more columns that will not fit in one page.

In the example given below, the variables (param & grade) act as an id variable. In the first page, these variables (param & grade) appear with other variables (pv1 pv2 pv3 pv4 pv5 pv6). Same way in the next page, it appears with the rest of the variables (pv7 - total).

options nocenter;

proc report data=final headline headskip nowd missing spacing=1 split ='*';

columns (param grade pv1 pv2 pv3 pv4 pv5 pv6 pv7 pv8 pv9 pv10 pv11 total);

define param- / order id left width = 30 "Parameter" spacing = 0 flow;

define grade-- / order id left width = 7- "Grade";

define pv1-----/---------. left width = 10 "Visit 1*(N=&vis1)";

define pv2-----/---------. left width = 10 "Visit 2*(N=&vis2)";

define pv3-----/---------. left width = 10 "Visit 3*(N=&vis3)";

define pv4-----/---------. left width = 10 "Visit 4*(N=&vis4)";

define pv5-----/---------. left width = 10 "Visit 5*(N=&vis5)";

define pv6-----/---------. left width = 10 "Visit 6*(N=&vis6)";

define pv7-----/---------. left width = 10 "Visit 7*(N=&vis7)";

define pv8-----/---------. left width = 10 "Visit 8*(N=&vis8)";

define pv9-----/---------. left width = 10 "Visit 9*(N=&vis9)";

define pv10----/--------.. left width = 10 "Visit 10*(N=&vis10)";

define pv11----/--------.. left width = 10 "Visit 11*(N=&vis11)";

define total-----/-------... left width = 10 "Overall*(N=&total)";

break after param / skip;

footnote1 "&under";

run;

Wednesday, June 18, 2008

SAS Functions

Mathematical and Statistical Functions

More: http://analytics.ncsu.edu/sesug/2005/CC04_05.PDF

Saturday, June 14, 2008

Verbatim Terms and Medical Dictionaries

"These are answers that are recorded word by word (from a subject)" and are often used as a backup of research findings in the final report. These verbatim terms entered in the CRF's are then matched with the "standardized medical dictionaries", which are essential for medical coding. A little more on use of Dictionary is given below:
-
Dictionary name and its Use

WHOART: World Health Organization Adverse Reaction Terminology:

is used for Adverse event (AE) coding

COSTART: (FDA’s) Coding Symbols for a Thesaurus of Adverse Reaction Terms:

is used for Adverse event (AE) coding

MedDRA: Medical Dictionary for Regulatory Activities Terminology:

is used for AE / Med. History / Terms coding

ICD9CM: International Classification of Diseases 9th Revision Clinical Modification:

is used for Medical History coding

WHODD: World Health Organization Drug Dictionary (B1, B2, C):

is used for Medication coding (concomitant)

The WHO Dictionaries include:

WHODD : WHO Drug Dictionary
WHOHD : WHO Herbal Dictionary
WHODDE : WHO Drug Dictionary enhanced.

*ATC: Anatomical Therapeutic Chemical classification is a classification used for coding the therapeutic use of drugs. This classification categorizes substances at five different levels according to the system or organ they act on and according to the chemical, therapeutic & pharmacological properties.

More: http://www.riteshmandal.com/medcoding.htm

Friday, June 13, 2008

"AETERM is Verbatim Name" and "AEDECOD is Preferred Term"

While doing AE Listings, there exists some confusion as these words (AETERM & Preferred TERM) sound similar. Every time I annotate AE Listing, I map “–TERM Variable for Preferred Term” and “–DECOD Variable for Verbatim Name” which is exclusively wrong. I have posted it here just to remind myself "AETERM (i.e., reported term) is Verbatim Name" and "AEDECOD is Preferred Term". Listed below is the CDISC Guidelines for Variables:

Define Variables and CDISC Guidelines:

System Organ Class (SOC) / --BODSYS:
It is the Body System or preferred system organ class that is involved in an event or measurement from the standard hierarchy.

Preferred Term (PT) / --DECOD:
It is the Dictionary derived text description of the topic variable. Equivalent to the Preferred Term (PT in MedDRA). The dictionary name and version should be provided in the comments column in the data definition file.

--TERM:
It is the Topic variable for an event observation, which is the Verbatim Name of the event. Use --MODIFY for modifying the verbatim term and --DECOD for coding --MODIFY or --TERM.

--MODIFY:
It is the Modified Text, if the topic variable is modified as part of a defined procedure.

An Eg. of MedDRA Hierarchy

System Organ Class (SOC) : Gastrointerstinal Disorders

High Level Group Term (HLGT): Dental and Gingival conditions

High Level Term (HLT): Dental pian and sensation disorders

Preferred Term (PT): Toothache <=SSC or SMQ

Lowest Level Term (LLT): Toothache

Must Read For SAS Clinical Programmers!

http://www.cdisc.org/pdf/V3CRTStandardV1_2.pdf

It is an informative pdf on "CDISC Submission Data Domain Models"

Saturday, May 10, 2008

Alt Codes for SAS Programming

Listed below, is the set of Alt Codes used for entering symbols in SAS programming.
Instructions for using Alt Codes:

Make sure that the "Number Lock" is switched on.
Hold down the "Alt" key and type the code on keypad.
Do not release the "Alt" key until the alt code is typed.
Special character are seen when the key is released.

Alt Codes:

Use of space for ‘alignment’ may not work at times. In case if it doesn't work, one can try using Alt 255, which will replace the space with an invisible character and make the alignment look perfect!

More: http://www.usefulshortcuts.com/alt-codes/