Saturday, December 20, 2008
Capturing the data into a macro
Thursday, December 18, 2008
DUPOUT=option
The following data set contains duplicate observations for Patients (PT:- 01 & 02) :
-
PT NO
01 12
01 3
01 56
02 2
02 12
03 12
-
How does the dupout option work on this data?????
-
proc sort data = test out = dedup
-------------------------dupout = dups nodupkey;
--by pt;
run;
It works in a similar manner to nodupkey. Here the "duplicates are detected by the nodupkey option" & are directed into an output dataset with DUPOUT!
The dedup is an output data set that contains only the original observations.
PT NO
01 12
02 2
03 12
-
Where as the dups is an output data set that contains all duplicate observations.
PT NO
01 3
01 56
02 12
-
Tuesday, December 16, 2008
Use of SUBSTR with INDEX function
The syntax for substr function is:
SUBSTR(string, starting position, <length>)
-
It is easy to determine the starting position or the length of the string (
-
For example:
day=substr(date,1,2);
but the same substr function is not applicable if the string contains NAME. The reason is that the length of each name varies from one observation to the other. Hence the use of delimiters (like: "." ",") can be used to locate the position within the string.
-
Below is an example for the use of substr with index function:
-
*** INDEX IS USED TO FIND WHERE TO BEGIN & END A SUBSTRING ***;
data fullname;
length fname lname $20;
set nameds;
fname= substr(name, 1, index(name, ',') - 1);
lname = trim(left(substr(name, index(name, ',') + 1)));
run;
-
Output:
name----------------------- fname---- lname
SOMA, SUNDARAM---- SOMA---- SUNDARAM
SHIVA, SHEKARAN---- SHIVA---- SHEKARAN
-
Saturday, November 1, 2008
Y FREQ ??? use MEANS...
“Making of Table” has become much simpler using PROC MEANS procedure! PROC MEANS when used in combination with MULTILABEL format and PRELOADFMT option gives all possible values of a formatted variable.
For example: If there are two different treatment groups (1 & 2) in a table and if the third column has to display a total count of both the treatment groups (1+2), then the MULTILABEL format is used.
- overlapping ranges across labels and to
- assign another label to the same value
This format is very efficient when used in conjunction with the means procedure TO PRODUCE THE TOTAL COUNT. Not only the multilabel format but also the use of PRELOADFMT with COMPLETETYPES enhances the efficiency of a program in CREATING ALL POSSIBLE COMBINATIONS OF A VARIABLE.
Now let’s consider three different categories (‘MILD’, ‘MODERATE’ & 'SEVERE') for a variable AEREL and if the dataset does not contain one of the category (‘MODERATE’) and if this same category has to be displayed in the table, then the PRELOADFMT option is used [as it gives instruction to load the FORMAT for the missing category ‘MODERATE’].
proc format;
*USE OF PROC MEANS WITH MULTILABEL & PRELOADFMT;
proc means data=test completetypes nway chartype missing noprint;
by visit;
class treat aerel/preloadfmt missing mlf order=formatted;
format treat $tmt. aerel $rel.;
output out=cnt1 n=ccnt;
run;
-
Sunday, October 26, 2008
Display the "CLASSIFICATION", Not the "CODE"!
For example, the ATCCS for drug SUMATRIPTAN is given below:
ATC1 TERM: NERVOUS SYSTEM, N
ATC1 TERM CODE: N
ATC2 TERM: ANALGESICS, N02
ATC2 TERM CODE: N02
ATC3 TERM: ANTIMIGRAINE PREPARATIONS, N02C
ATC3 TERM CODE:N02C
ATC4 TERM: SELECTIVE SEROTONIN (5HT1) AGONISTS, N02CC
ATC4 TERM CODE: N02CC
ATC5 TERM: SUMATRIPTAN, N02CC01
ATC5 TERM CODE: N02CC01
This "ATC LEVEL 2 TERM" may be captured in CONCOMITANT MEDICATION DATASET, which when captured should display only the CLASSIFICATION: "ANALGESICS" and not the CODE: "N02". Hence the code is trimed off from the chemical classification.
-
data cmed;
set med;
x=trim(left(reverse(atc2t)));
y=substr(x,6,150);
z=reverse(y);
atc2term=strip(z);
run;
Tuesday, October 21, 2008
A code to split numbers!
length pt $20 txt $40;
pt ='1';
txt= '#12, #3, #56 34-98';
output;
pt ='2';
txt= '#2 #12';
output;
pt ='3';
txt= '#12';
output;
run;
***** SPLIT & OUTPUT NUMBERS *****
-
data TEST1;
set TEST;
stat:
do i = 1 to length(txt);
-
if substr(txt,i,1) = '#' then do;
-
if substr(txt,i+2,1) in (',','') then do;
cno = substr(txt,i+1,1); *3: for single digit;
txt = substr(txt,i+2);
end;
else if substr(txt,i+3,1) in (',','') then do;
cno = substr(txt,i+1,2); *12: for two digit;
txt = substr(txt,i+3);
end;
-
output;
if length(trim(left(txt))) > 2 then goto stat;
leave;
end;
-
end;
run;
data TEST1;
set TEST;
do i = 1 to length(txt);
if substr(txt,i+1,1) in (',','') then do;
cno = trim(left(substr(txt,i,1)));
i=i+2;
end;
else if substr(txt,i+2,1) in (',','') then do;
cno = trim(left(substr(txt,i,2)));
i=i+3;
end;
if cno ne '' then output;
end;
run;
Thursday, October 9, 2008
Retain for Repeated Visit
***** RETAIN THE VALUE FOR REPEATED VISIT *****
-
data test;
set vis;
by pt visitnum vstdt;
retain x;
if first.visitnum then x = visitnum;
else x=x+.1;*add .1 for repeated visit;
drop visitnum;
run;
-
Sunday, September 7, 2008
Using wildcard to reduce code
-
Colon can be used as a wildcard in variable lists. In the example given below, (keep=v:) keeps all variable names that begins with letter v.
-
data x(keep=v:) ;
v1=1 ;
v2=10 ;
v3=100 ;
v4=1000 ;
v5=10000 ;
w1=1;
w2=10;
run;
More: http://support.sas.com/publishing/pubcat/chaps/55513.pdf
-
Thursday, September 4, 2008
Four different ways to find duplicates
I
proc sort data = draft out = draft1 nodup;
by _all_ ;
run;
II
Where as the use of nodupkey looks at the by variables in the sort procedure and keeps the first records of those by variables (it removes the rest of the records when encountered again!)
proc sort data = draft out = draft1 nodupkey;
by petestcd;
run;
PETESTCD
Neck
Abdominal
Abdominal
Abdominal *these are the two records that will be removed by nodupkey;
HEENT
III
The duplicate records in the dataset may result in the display of the Note => “MERGE statement has more than one data set with repeats of BY values” in the log. To determine this one can use "if first. ne last." method.
"Y Y first. ne last.?????
If an ID is repeated twice in the dataset then first dot is equal to last dot....
ID
001
001
but SAS THINKS this way
ID First. Last.
001 1---- 0
001 0---- 1
002 1---- 1
003 1---- 1
Records that are not duplicates will have FIRST.ID = 1 and LAST.ID = 1, so we select records where first dot is not equal to last dot.
IV
proc freq data = draft noprint ;
table ID / out = draftdup (keep = ID Count where = (Count > 1)) ;
run;
Wednesday, August 27, 2008
Concatenate a set of Numbers
SUBJECT AGE IN1 IN2 IN3 -IN4 IN5 -IN6 IN7
00001111 --43- YES YES YES .----YES YES YES
00001222 --20- NO –NO -NO -NO NO -NO -.
data incl1;
length no $ 50;
set incl;
retain no;
array a[7] in1 - in7;
no = '';*no is the variable to HOLD THE VALUE;
do i=1 to 7;*loop is RUN TILL i=8;
if a[i]=2 then do;*if loop satisfies the condition "NO";
if no eq '' then no=trim(left(put(i,best.))); *then ASSIGN 1 to no;
else if no ne '' then no=trim(left(no))II', 'IItrim(left(put(i,best.)));
*now CONCATENATE no:1 with 2;
end;
end;
run;
Output for this subject 00001222 will be:
1, 2, 3, 4, 5, 6
-
Tuesday, August 26, 2008
Simple way of % Display!
data new;
input cnt deno;
datalines;
53 54
9 -10
12 12
2 ---9
1 ---5
;
run;
data new1;
set new;
if cnt ne . and deno ne . then per = (cnt*100)/deno;
if (per ne 0 and per ne .) and per ne 100 then
cper = right(put(cnt,3.))IIright(put('('trim(left(put(per,4.1)))'%)',8.));
else if per eq 100 then
cper = right(put(cnt,3.))IIright(put('('trim(left(put(per,4.)))'%)',8.));
run;
proc print; run;
-
3. -=> is for COUNT xxx
4.1 => is for PERCENTAGE xx.x / 4. => is for % (if 100) xxx
+3 => is for SPECIAL CHARACTERS (%)
+1 => is for a space in b/w COUNT & PERCENTAGE
Thursday, August 21, 2008
Display 'No Observations'
proc sql noprint;
select count (*) into : n from final;
run;
data disp;
length usubjid $20;
if &n^ = 0 then set final;
else usubjid = 'No Observations';
run;
Thursday, August 14, 2008
Wanna underline a header?
by descending treat;
column (rest1 rest2 ("_First Value_" (fval1 fval2)) ("_Second Value_" (sval1 sval2 sval3)) rest3 rest4);
define rest1 / order-- width = 22 left-- "rest of the values (1)" spacing = 0;
define rest2 / order-- width = 22 left-- "rest of the values (2)";
define fval1 / display width = 22 center "first:-value (1)";
define fval2 / display width = 22 center "first:-value (2)";
define sval1 / display width = 22 center "second value (1)";
define sval2 / display width = 22 center "second value (2)";
define sval3 / display width = 22 center "second value (3)";
define rest3 / display width = 22 center "rest of the values (3)";
define rest4 / display width = 22 center "rest of the values (4)";
break after rest1 / skip;
footnote1 "&under";
footnote2 "Required footnote for display.";
run;
The above example: shows the use of "_" to underline the header(First Value & Second Value). These are the variables (fval1 fval2, sval1 sval2 sval3) which will get displayed under the header.
-
Saturday, August 2, 2008
Generating missing values in a Table
10011001 1
10011001 7
10011002 1
10011003 1
In the above example, the usubjid (10011002 and 10011003) has visitnumber 7 missing. The proc transpose statement helps in generating these missing values.
** PROC TRANSPOSE FOR GENERATING MISSING VALUES **;
proc transpose data = d1 out = d2 prefix = v;
by usubjid treatnc ;
id vis;
*vis is a character variable which contains visitnumber (1,7);
var vala valb valc vald vale valf valg valh vali valj;
run;
When transposed the data becomes:
USUBJID V1 V7
10011001 -0 --0
10011002 -0 --''
10011003 -0 --''
** RE(cursive) TRANSPOSE: TO BRING THE DATA BACK TO ITS SHAPE **;
proc transpose data = d2 out = d3;
by usubjid treatnc ;
id _name_;
*all values vala - valj becomes identifier again;
var v1 v7;
run;
The data would now look like:
USUBJID _name_ vala
10011001 -v1-------0
10011001 -v7-------0
10011002 -v1-------0
10011002 -v7-------''
10011003 -v1-------0
10011003 -v7-------''
proc sort data = d3;
by usubjid treatnc _name_;
run;
** BRING THE VARIABLES ONE AFTER THE OTHER FOR FREQ **;
proc transpose data = d3(rename = (_name_ = name)) out = d4;
by usubjid treatnc name;
var vala valb valc vald vale valf valg valh vali valj;
run;
Friday, August 1, 2008
Why two proc reports?????
Wednesday, June 18, 2008
Saturday, June 14, 2008
Verbatim Terms and Medical Dictionaries
"These are answers that are recorded word by word (from a subject)" and are often used as a backup of research findings in the final report. These verbatim terms entered in the CRF's are then matched with the "standardized medical dictionaries", which are essential for medical coding. A little more on use of Dictionary is given below:
-
Dictionary name and its Use
WHOART: World Health Organization Adverse Reaction Terminology:
is used for Adverse event (AE) coding
COSTART: (FDA’s) Coding Symbols for a Thesaurus of Adverse Reaction Terms:
is used for Adverse event (AE) coding
MedDRA: Medical Dictionary for Regulatory Activities Terminology:
is used for AE / Med. History / Terms coding
ICD9CM: International Classification of Diseases 9th Revision Clinical Modification:
is used for Medical History coding
WHODD: World Health Organization Drug Dictionary (B1, B2, C):
is used for Medication coding (concomitant)
The WHO Dictionaries include:
- WHODD : WHO Drug Dictionary
- WHOHD : WHO Herbal Dictionary
- WHODDE : WHO Drug Dictionary enhanced.
*ATC: Anatomical Therapeutic Chemical classification is a classification used for coding the therapeutic use of drugs. This classification categorizes substances at five different levels according to the system or organ they act on and according to the chemical, therapeutic & pharmacological properties.
More: http://www.riteshmandal.com/medcoding.htm
Friday, June 13, 2008
"AETERM is Verbatim Name" and "AEDECOD is Preferred Term"
While doing AE Listings, there exists some confusion as these words (AETERM & Preferred TERM) sound similar. Every time I annotate AE Listing, I map “–TERM Variable for Preferred Term” and “–DECOD Variable for Verbatim Name” which is exclusively wrong. I have posted it here just to remind myself "AETERM (i.e., reported term) is Verbatim Name" and "AEDECOD is Preferred Term". Listed below is the CDISC Guidelines for Variables:
Define Variables and CDISC Guidelines:
System Organ Class (SOC) / --BODSYS:
It is the Body System or preferred system organ class that is involved in an event or measurement from the standard hierarchy.
Preferred Term (PT) / --DECOD:
It is the Dictionary derived text description of the topic variable. Equivalent to the Preferred Term (PT in MedDRA). The dictionary name and version should be provided in the comments column in the data definition file.
--TERM:
It is the Topic variable for an event observation, which is the Verbatim Name of the event. Use --MODIFY for modifying the verbatim term and --DECOD for coding --MODIFY or --TERM.
--MODIFY:
It is the Modified Text, if the topic variable is modified as part of a defined procedure.
An Eg. of MedDRA Hierarchy
System Organ Class (SOC) : Gastrointerstinal Disorders
High Level Group Term (HLGT): Dental and Gingival conditions
High Level Term (HLT): Dental pian and sensation disorders
Preferred Term (PT): Toothache <=SSC or SMQ
Lowest Level Term (LLT): Toothache
Must Read For SAS Clinical Programmers!
It is an informative pdf on "CDISC Submission Data Domain Models"
Saturday, May 10, 2008
Alt Codes for SAS Programming
Listed below, is the set of Alt Codes used for entering symbols in SAS programming.
Instructions for using Alt Codes:
- Make sure that the "Number Lock" is switched on.
- Hold down the "Alt" key and type the code on keypad.
- Do not release the "Alt" key until the alt code is typed.
- Special character are seen when the key is released.

Use of space for ‘alignment’ may not work at times. In case if it doesn't work, one can try using Alt 255, which will replace the space with an invisible character and make the alignment look perfect!
