Saturday, December 20, 2008
Capturing the data into a macro
Thursday, December 18, 2008
DUPOUT=option
The following data set contains duplicate observations for Patients (PT:- 01 & 02) :
-
PT NO
01 12
01 3
01 56
02 2
02 12
03 12
-
How does the dupout option work on this data?????
-
proc sort data = test out = dedup
-------------------------dupout = dups nodupkey;
--by pt;
run;
It works in a similar manner to nodupkey. Here the "duplicates are detected by the nodupkey option" & are directed into an output dataset with DUPOUT!
The dedup is an output data set that contains only the original observations.
PT NO
01 12
02 2
03 12
-
Where as the dups is an output data set that contains all duplicate observations.
PT NO
01 3
01 56
02 12
-
Tuesday, December 16, 2008
Use of SUBSTR with INDEX function
The syntax for substr function is:
SUBSTR(string, starting position, <length>)
-
It is easy to determine the starting position or the length of the string (
-
For example:
day=substr(date,1,2);
but the same substr function is not applicable if the string contains NAME. The reason is that the length of each name varies from one observation to the other. Hence the use of delimiters (like: "." ",") can be used to locate the position within the string.
-
Below is an example for the use of substr with index function:
-
*** INDEX IS USED TO FIND WHERE TO BEGIN & END A SUBSTRING ***;
data fullname;
length fname lname $20;
set nameds;
fname= substr(name, 1, index(name, ',') - 1);
lname = trim(left(substr(name, index(name, ',') + 1)));
run;
-
Output:
name----------------------- fname---- lname
SOMA, SUNDARAM---- SOMA---- SUNDARAM
SHIVA, SHEKARAN---- SHIVA---- SHEKARAN
-
Saturday, November 1, 2008
Y FREQ ??? use MEANS...
“Making of Table” has become much simpler using PROC MEANS procedure! PROC MEANS when used in combination with MULTILABEL format and PRELOADFMT option gives all possible values of a formatted variable.
For example: If there are two different treatment groups (1 & 2) in a table and if the third column has to display a total count of both the treatment groups (1+2), then the MULTILABEL format is used.
- overlapping ranges across labels and to
- assign another label to the same value
This format is very efficient when used in conjunction with the means procedure TO PRODUCE THE TOTAL COUNT. Not only the multilabel format but also the use of PRELOADFMT with COMPLETETYPES enhances the efficiency of a program in CREATING ALL POSSIBLE COMBINATIONS OF A VARIABLE.
Now let’s consider three different categories (‘MILD’, ‘MODERATE’ & 'SEVERE') for a variable AEREL and if the dataset does not contain one of the category (‘MODERATE’) and if this same category has to be displayed in the table, then the PRELOADFMT option is used [as it gives instruction to load the FORMAT for the missing category ‘MODERATE’].
proc format;
*USE OF PROC MEANS WITH MULTILABEL & PRELOADFMT;
proc means data=test completetypes nway chartype missing noprint;
by visit;
class treat aerel/preloadfmt missing mlf order=formatted;
format treat $tmt. aerel $rel.;
output out=cnt1 n=ccnt;
run;
-
Sunday, October 26, 2008
Display the "CLASSIFICATION", Not the "CODE"!
For example, the ATCCS for drug SUMATRIPTAN is given below:
ATC1 TERM: NERVOUS SYSTEM, N
ATC1 TERM CODE: N
ATC2 TERM: ANALGESICS, N02
ATC2 TERM CODE: N02
ATC3 TERM: ANTIMIGRAINE PREPARATIONS, N02C
ATC3 TERM CODE:N02C
ATC4 TERM: SELECTIVE SEROTONIN (5HT1) AGONISTS, N02CC
ATC4 TERM CODE: N02CC
ATC5 TERM: SUMATRIPTAN, N02CC01
ATC5 TERM CODE: N02CC01
This "ATC LEVEL 2 TERM" may be captured in CONCOMITANT MEDICATION DATASET, which when captured should display only the CLASSIFICATION: "ANALGESICS" and not the CODE: "N02". Hence the code is trimed off from the chemical classification.
-
data cmed;
set med;
x=trim(left(reverse(atc2t)));
y=substr(x,6,150);
z=reverse(y);
atc2term=strip(z);
run;
Tuesday, October 21, 2008
A code to split numbers!
length pt $20 txt $40;
pt ='1';
txt= '#12, #3, #56 34-98';
output;
pt ='2';
txt= '#2 #12';
output;
pt ='3';
txt= '#12';
output;
run;
***** SPLIT & OUTPUT NUMBERS *****
-
data TEST1;
set TEST;
stat:
do i = 1 to length(txt);
-
if substr(txt,i,1) = '#' then do;
-
if substr(txt,i+2,1) in (',','') then do;
cno = substr(txt,i+1,1); *3: for single digit;
txt = substr(txt,i+2);
end;
else if substr(txt,i+3,1) in (',','') then do;
cno = substr(txt,i+1,2); *12: for two digit;
txt = substr(txt,i+3);
end;
-
output;
if length(trim(left(txt))) > 2 then goto stat;
leave;
end;
-
end;
run;
data TEST1;
set TEST;
do i = 1 to length(txt);
if substr(txt,i+1,1) in (',','') then do;
cno = trim(left(substr(txt,i,1)));
i=i+2;
end;
else if substr(txt,i+2,1) in (',','') then do;
cno = trim(left(substr(txt,i,2)));
i=i+3;
end;
if cno ne '' then output;
end;
run;
Thursday, October 9, 2008
Retain for Repeated Visit
***** RETAIN THE VALUE FOR REPEATED VISIT *****
-
data test;
set vis;
by pt visitnum vstdt;
retain x;
if first.visitnum then x = visitnum;
else x=x+.1;*add .1 for repeated visit;
drop visitnum;
run;
-
Sunday, September 7, 2008
Using wildcard to reduce code
-
Colon can be used as a wildcard in variable lists. In the example given below, (keep=v:) keeps all variable names that begins with letter v.
-
data x(keep=v:) ;
v1=1 ;
v2=10 ;
v3=100 ;
v4=1000 ;
v5=10000 ;
w1=1;
w2=10;
run;
More: http://support.sas.com/publishing/pubcat/chaps/55513.pdf
-
Thursday, September 4, 2008
Four different ways to find duplicates
I
proc sort data = draft out = draft1 nodup;
by _all_ ;
run;
II
Where as the use of nodupkey looks at the by variables in the sort procedure and keeps the first records of those by variables (it removes the rest of the records when encountered again!)
proc sort data = draft out = draft1 nodupkey;
by petestcd;
run;
PETESTCD
Neck
Abdominal
Abdominal
Abdominal *these are the two records that will be removed by nodupkey;
HEENT
III
The duplicate records in the dataset may result in the display of the Note => “MERGE statement has more than one data set with repeats of BY values” in the log. To determine this one can use "if first. ne last." method.
"Y Y first. ne last.?????
If an ID is repeated twice in the dataset then first dot is equal to last dot....
ID
001
001
but SAS THINKS this way
ID First. Last.
001 1---- 0
001 0---- 1
002 1---- 1
003 1---- 1
Records that are not duplicates will have FIRST.ID = 1 and LAST.ID = 1, so we select records where first dot is not equal to last dot.
IV
proc freq data = draft noprint ;
table ID / out = draftdup (keep = ID Count where = (Count > 1)) ;
run;
Wednesday, August 27, 2008
Concatenate a set of Numbers
SUBJECT AGE IN1 IN2 IN3 -IN4 IN5 -IN6 IN7
00001111 --43- YES YES YES .----YES YES YES
00001222 --20- NO –NO -NO -NO NO -NO -.
data incl1;
length no $ 50;
set incl;
retain no;
array a[7] in1 - in7;
no = '';*no is the variable to HOLD THE VALUE;
do i=1 to 7;*loop is RUN TILL i=8;
if a[i]=2 then do;*if loop satisfies the condition "NO";
if no eq '' then no=trim(left(put(i,best.))); *then ASSIGN 1 to no;
else if no ne '' then no=trim(left(no))II', 'IItrim(left(put(i,best.)));
*now CONCATENATE no:1 with 2;
end;
end;
run;
Output for this subject 00001222 will be:
1, 2, 3, 4, 5, 6
-
Tuesday, August 26, 2008
Simple way of % Display!
data new;
input cnt deno;
datalines;
53 54
9 -10
12 12
2 ---9
1 ---5
;
run;
data new1;
set new;
if cnt ne . and deno ne . then per = (cnt*100)/deno;
if (per ne 0 and per ne .) and per ne 100 then
cper = right(put(cnt,3.))IIright(put('('trim(left(put(per,4.1)))'%)',8.));
else if per eq 100 then
cper = right(put(cnt,3.))IIright(put('('trim(left(put(per,4.)))'%)',8.));
run;
proc print; run;
-
3. -=> is for COUNT xxx
4.1 => is for PERCENTAGE xx.x / 4. => is for % (if 100) xxx
+3 => is for SPECIAL CHARACTERS (%)
+1 => is for a space in b/w COUNT & PERCENTAGE
Thursday, August 21, 2008
Display 'No Observations'
proc sql noprint;
select count (*) into : n from final;
run;
data disp;
length usubjid $20;
if &n^ = 0 then set final;
else usubjid = 'No Observations';
run;
Thursday, August 14, 2008
Wanna underline a header?
by descending treat;
column (rest1 rest2 ("_First Value_" (fval1 fval2)) ("_Second Value_" (sval1 sval2 sval3)) rest3 rest4);
define rest1 / order-- width = 22 left-- "rest of the values (1)" spacing = 0;
define rest2 / order-- width = 22 left-- "rest of the values (2)";
define fval1 / display width = 22 center "first:-value (1)";
define fval2 / display width = 22 center "first:-value (2)";
define sval1 / display width = 22 center "second value (1)";
define sval2 / display width = 22 center "second value (2)";
define sval3 / display width = 22 center "second value (3)";
define rest3 / display width = 22 center "rest of the values (3)";
define rest4 / display width = 22 center "rest of the values (4)";
break after rest1 / skip;
footnote1 "&under";
footnote2 "Required footnote for display.";
run;
The above example: shows the use of "_" to underline the header(First Value & Second Value). These are the variables (fval1 fval2, sval1 sval2 sval3) which will get displayed under the header.
-
Saturday, August 2, 2008
Generating missing values in a Table
10011001 1
10011001 7
10011002 1
10011003 1
In the above example, the usubjid (10011002 and 10011003) has visitnumber 7 missing. The proc transpose statement helps in generating these missing values.
** PROC TRANSPOSE FOR GENERATING MISSING VALUES **;
proc transpose data = d1 out = d2 prefix = v;
by usubjid treatnc ;
id vis;
*vis is a character variable which contains visitnumber (1,7);
var vala valb valc vald vale valf valg valh vali valj;
run;
When transposed the data becomes:
USUBJID V1 V7
10011001 -0 --0
10011002 -0 --''
10011003 -0 --''
** RE(cursive) TRANSPOSE: TO BRING THE DATA BACK TO ITS SHAPE **;
proc transpose data = d2 out = d3;
by usubjid treatnc ;
id _name_;
*all values vala - valj becomes identifier again;
var v1 v7;
run;
The data would now look like:
USUBJID _name_ vala
10011001 -v1-------0
10011001 -v7-------0
10011002 -v1-------0
10011002 -v7-------''
10011003 -v1-------0
10011003 -v7-------''
proc sort data = d3;
by usubjid treatnc _name_;
run;
** BRING THE VARIABLES ONE AFTER THE OTHER FOR FREQ **;
proc transpose data = d3(rename = (_name_ = name)) out = d4;
by usubjid treatnc name;
var vala valb valc vald vale valf valg valh vali valj;
run;
Friday, August 1, 2008
Why two proc reports?????
Wednesday, June 18, 2008
Saturday, June 14, 2008
Verbatim Terms and Medical Dictionaries
"These are answers that are recorded word by word (from a subject)" and are often used as a backup of research findings in the final report. These verbatim terms entered in the CRF's are then matched with the "standardized medical dictionaries", which are essential for medical coding. A little more on use of Dictionary is given below:
-
Dictionary name and its Use
WHOART: World Health Organization Adverse Reaction Terminology:
is used for Adverse event (AE) coding
COSTART: (FDA’s) Coding Symbols for a Thesaurus of Adverse Reaction Terms:
is used for Adverse event (AE) coding
MedDRA: Medical Dictionary for Regulatory Activities Terminology:
is used for AE / Med. History / Terms coding
ICD9CM: International Classification of Diseases 9th Revision Clinical Modification:
is used for Medical History coding
WHODD: World Health Organization Drug Dictionary (B1, B2, C):
is used for Medication coding (concomitant)
The WHO Dictionaries include:
- WHODD : WHO Drug Dictionary
- WHOHD : WHO Herbal Dictionary
- WHODDE : WHO Drug Dictionary enhanced.
*ATC: Anatomical Therapeutic Chemical classification is a classification used for coding the therapeutic use of drugs. This classification categorizes substances at five different levels according to the system or organ they act on and according to the chemical, therapeutic & pharmacological properties.
More: http://www.riteshmandal.com/medcoding.htm
Friday, June 13, 2008
"AETERM is Verbatim Name" and "AEDECOD is Preferred Term"
While doing AE Listings, there exists some confusion as these words (AETERM & Preferred TERM) sound similar. Every time I annotate AE Listing, I map “–TERM Variable for Preferred Term” and “–DECOD Variable for Verbatim Name” which is exclusively wrong. I have posted it here just to remind myself "AETERM (i.e., reported term) is Verbatim Name" and "AEDECOD is Preferred Term". Listed below is the CDISC Guidelines for Variables:
Define Variables and CDISC Guidelines:
System Organ Class (SOC) / --BODSYS:
It is the Body System or preferred system organ class that is involved in an event or measurement from the standard hierarchy.
Preferred Term (PT) / --DECOD:
It is the Dictionary derived text description of the topic variable. Equivalent to the Preferred Term (PT in MedDRA). The dictionary name and version should be provided in the comments column in the data definition file.
--TERM:
It is the Topic variable for an event observation, which is the Verbatim Name of the event. Use --MODIFY for modifying the verbatim term and --DECOD for coding --MODIFY or --TERM.
--MODIFY:
It is the Modified Text, if the topic variable is modified as part of a defined procedure.
An Eg. of MedDRA Hierarchy
System Organ Class (SOC) : Gastrointerstinal Disorders
High Level Group Term (HLGT): Dental and Gingival conditions
High Level Term (HLT): Dental pian and sensation disorders
Preferred Term (PT): Toothache <=SSC or SMQ
Lowest Level Term (LLT): Toothache
Must Read For SAS Clinical Programmers!
It is an informative pdf on "CDISC Submission Data Domain Models"
Saturday, May 10, 2008
Alt Codes for SAS Programming
Listed below, is the set of Alt Codes used for entering symbols in SAS programming.
Instructions for using Alt Codes:
- Make sure that the "Number Lock" is switched on.
- Hold down the "Alt" key and type the code on keypad.
- Do not release the "Alt" key until the alt code is typed.
- Special character are seen when the key is released.
Use of space for ‘alignment’ may not work at times. In case if it doesn't work, one can try using Alt 255, which will replace the space with an invisible character and make the alignment look perfect!
Sunday, May 4, 2008
SAS Professionals
2008 SAS Employee Video Contest Winner. A video that I enjoyed watching.......
Wednesday, April 2, 2008
For Partial Dates (UN-NOV-2006)
*pardt:- partial date;
if pardt ne '' then do;
date = substr(pardt,1,2);
month= substr(pardt,4,3);
year = substr(pardt,8,4);
if (month > '' and date >'' and year >'') then do;*;
if compress(date,'UN')='' then ndate='01';
if compress(ndate,'01')='' then do;**ndate;
if month > '' and upcase(month) in
('JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC') then do;*** month;
if compress(year,'0123456789')='' then do;****year;
endtc=input(compress(ndatemonthyear),date9.);
end;****;
format endtc yymmdd10.;
endtc1=put(endtc,yymmdd10.);
endtc1=trim(left(endtc1));
enddate=substr(endtc1,1,7);
end;***;end;**;
end;*;
end;
-
More: http://www.clinplus.com/vboard/showthread.php?t=8
-
The "n" means no separator
*new data;
data date;
input visitdt DATE9.;
datalines;
26MAY2006
;
run;
*conversion of date9.;
data date1;
set date;
format visitdt DATE9.;
*"n" means no separator;
visitdtc=put(visitdt, yymmddn8.);
run;
proc contents data=date1; run;
Output:
20060526
More: http://ftp.sas.com/techsup/download/sample/datastep/dateformat.html
How to insert a picture into a Power Point Slide
How to Add a Picture / Transparent background
- Open a blank PowerPoint presentation, or a completed presentation to which you want to add a background. Note: If your completed presentation contains master slides, you might not want to apply the background to the master slides and risk unwanted changes to your presentation. The alternative is to add the background to one slide at a time.
- To add a background to all the slides in a blank presentation, point to Master on the View menu, click Slide Master, and perform the following steps on the slide master. To add a background to a single slide, simply select a slide and follow the same steps.
- Choose the logo or clip art you want to use as a slide background. On the Insert menu, point to Picture, and then click Clip Art to create a background from a new picture, or click From File to use a picture you have in your files. Keep in mind that not all clip art makes an appropriate background. You might have to experiment a little.
- Select the logo or clip art you want and insert it in the presentation. The logo or clip art appears on the slide master, but is not yet transparent.
- Right-click the logo or clip art on the slide, and then click Format Picture.
- On the Picture tab, click the arrow next to the Color box, click Washout (Watermark in PowerPoint 2000), and then click OK.
- With the logo or clip art still selected, right-click and then click Save as Picture. Save the image where you can find it. Delete the original image from the slide. You'll replace it with the transparent version you just prepared and saved.
- On the Format menu, click Background. In the Background dialog box, click the arrow in the box under Background fill, click Fill Effects, and on the Picture tab, click Select Picture. Find and select the picture you formatted as a transparent background, click Insert, click OK, and then click Apply.
- Your transparent image has been added to the master slide as a background, and the new background will appear on every new slide you add to the presentation. If you selected a single slide, the background has been added to the slide you selected.
More:http://office.microsoft.com/en-us/powerpoint/HA010797741033.aspx?pid=CL100626991033
Saturday, March 22, 2008
Character array for creating a dummy dataset
data one;
input visit$ 1-10;
datalines;
DAY 8
DAY 15
DAY 29
DAY 57
DAY 85
;
run;
-
*creating a dummy dataset with 3 different xxorres for each visit present in dataset one;
data dummy;
set one;
length xxorres $ 20;
array x{*} _character_;
do i = 1 to dim(x);
if i gt 0 then do;
xxorres='COMPLETE CLEARANCE'; output;
xxorres='INCOMPLETE CLEARANCE'; output;
xxorres='UNABLE TO ASSESS'; output;
i=i+1;
end;
end;
run;
proc sort data=dummy; by i; run;
proc print; run;
-
More: http://dist.stat.tamu.edu/flash/SAS/
-
*display of zero for missing values (using:character array);
data display;
set disp;
*new: is array name, l1 to l4: are character variable;
array new l1 - l4 _character_;
do i=1 to dim(new);
if new{i}='' then new{i}='0';
end;
run;
-
Tuesday, March 18, 2008
"When" for data analysis
- "When the ORRES is populated, the STAT (& REASND) remains blank. When the STAT (& REASND) is populated, the ORRES remains blank".
- "When both the raw dataset and the derived dataset names are same, the derived dataset repalce the raw dataset".
- "When two dataset's are merged & one of them have few missing usubjid's:- if a and b; condition is preferred to avoid blanks.
- "When TERM is missing, the derived dataset have blank values". Ex:- To drop observations without value:- if aeterm ne " "; condition is used.
- "When the derived dataset have both TERM and STAT:- if aeterm ne " " or aestat ne " "; condition is used".
- "When the result is not missing in raw dataset then ORRES=result, STRESC=ORRES and STRESN=STRESC(the numeric form of STRESC)". Usually ORRES & STRESC are displayed as "Y" / "N" instead of "Yes" / "No".
- "When the STRESC is "Y" / "N then STRESN is displayed as "blank" (rather than 1 / 2) ".
- "When END DATE is " " then ENRF="ONGOING"; Example: MH dataset".
- "When a PARTIAL DATE is present in a raw dataset, then the same is displayed in the derived dataset".
- "When a sequence number is assigned, it remains unique for all observations within a usubjid".
- When working on display of DAYS, Programmer's check for DAY:- not displayed as 0".
- When the variables in the SAS data set are not in a desired order then one of the following (ATTRIB, ARRAY, FORMAT, INFORMAT, LENGTH, RETAIN) is used before the SET statement to re-order it.
- When a split character is used in the listing/table programming then the same split character must be used in the proc report with a FLOW option. This flow option hold back the direct display of split chatacter "*" in the output.
- When the following warning is received in the log, the PROC PRINT option is removed from the program. "WARNING: Data too long for column "TEST"; truncated to 125 characters to fit."
- When an order is missing in the final dataset of a table and if the same order is being used in the proc report: - the row for which the order is missing will not be displayed.
- When different order numbers are given for the "result HEADER" and the "RESULT"(in listing/table) & if the same order variable is used in the break after statement (break after ord/ skip;) , there appears a blank line. To avoid this same order number is given for both the header and result.
Age (years)-> result header
blank line ------------
N->result
Saturday, March 15, 2008
10 Essential Proc's
PROC name [DATA=dataset [dsoptions] ] [options];
[other PROC-specific statements;]
[BY varlist;]
RUN;
where:
name:------ identifies the procedure you want to use.
dataset:---- identifies the SAS data set to be used by the procedure; if omitted, the last data set to have been created during the session is used.
dsoptions:-- specifies the data set options to be used.
varlist:----- specifies the variables that define the groups to be processed separately. The data set must already be sorted by these same variables.
options:---- specifies the PROC-specific options to be used.
/***********************************************************/
PROC COMPARE BASE = mydataset
COMPARE = otherds;
RUN;
/***********************************************************/
2) SAS Language allows users to assign their own formats to values in their data. Usually, this involves users asking SAS to replace numbers in their data with some kind of labels. The most common example is the user request that: SAS take '1's and '2's in a variable called SEX and format those values as the words 'female' and 'male' when they are displayed in output.
PROC FORMAT;
VALUE sexfmt
1="Female"
2="Male";
RUN;
FORMAT sex sexfmt.;
O U
-----------------------
1 female
2 male
2 male
1 female
2 male
O: original data, U: output using user-defined format
The original data stay the same -- coded as numbers -- but whenever the format is requested, the output shows the assigned labels instead of the original numbers. These kinds of formats are called 'user-defined formats'. The formats already known to SAS are called SAS formats.
/***********************************************************/
PROC SQL;
SELECT STATE, SALES,
(SALES * .05) AS TAX
FROM USSALES;
QUIT;
It has the functionality of DATA and PROC steps into a single step. PROC SQL can sort, summarize, subset, join (merge), and concatenate datasets, create new variables, and print the results or create a new table or view all in one step.
/***********************************************************/
- Data set level
Name
Engine
Creation date
Number of observations
Number of variables
File size (bytes) - Variable level
Name
Type
Length
Formats
Position
Label
PROC CONTENTS DATA=temp (options);
RUN;
Few options:-
Position: output lists the variables by their position in the data set (default is alphabetical).
Short: output is just the variable names in a row by row format.
Out=filename: creates a data set where each observation is a variable from the original data set.
/***********************************************************/
5) Values of all the variables in the data set will be listed (by PROC PRINT) unless a variables (VAR) statement accompanies the procedure. The VAR statement can also be used to specify the order in which variables are to be printed.
PROC PRINT DATA=temp (options);
RUN;
Few options:-
N: The number of observations is printed after the listing.
UNIFORM: This option specifies that the values for each variable be printed in the same columns on every page of the output.
DOUBLE: This option forces SAS to double space the output.
ROUND: This option causes SAS to round variables being summed. NOOBS: This option causes suppression of the observation number in the printed output.
LABEL: Use information from LABEL statements (where defined) as the column headings rather than the variable names.
/***********************************************************/
6) The FREQ procedure produces one-way frequency tables and n-way cross tabulations. For example, to obtain frequency counts for all variables in the SURVEY data set, enter: PROC FREQ;
PROC FREQ DATA=CLASS;
TABLES AGE*HEIGHT;
RUN;
Observe that:
- Tables can be produced for numeric and character variables.
- More than one TABLES statement can be used.
- When the TABLES statement is omitted, one-way frequency tables are printed for each variable in the data set.
- There is no limit to the number of variables in a TABLES request.
- For a table of two or more variables an * must be given between each pair of variable names.
The following example illustrates the generation of one-, two-, and three-way frequency tables:
PROC FREQ;
TABLES sex;
TABLES sex * r1;
TABLES sex * r1 * r2;
The first TABLES statement generates a one-way frequency table for the variable SEX. The second TABLES statement generates a two-way frequency table for the row variable SEX and the column variable R1. The third TABLES statement generates a three-way frequency table for the variables SEX, R1, and R2; R1 is the row variable and R2 is the column; a new table is printed for each different value of the variable SEX.
/***********************************************************/
7) The MEANS procedure is used to produce simple descriptive statistics on numeric variables in a SAS data set.
PROC MEANS DATA=CLASS;
VAR HEIGHT WEIGHT;
RUN;
/***********************************************************/
8) PROC TRANSPOSE helps to reshape the long data to wide one.
data long1;
input famid year faminc;
cards;
1111 96 40000
1111 97 40500
1111 98 41000
2222 96 45000
2222 97 45400
2222 98 45800
3333 96 75000
3333 97 76000
3333 98 77000 ;
run;
proc sort data=long1; by famid; run;
*incyr: income year, prefix is used for identification;
PROC TRANSPOSE DATA=long1 OUT=wide1 PREFIX=incyr;
*same sort order is used as in proc sort statement;
BY FAMID;
*this is the variable that would become as: identifier/header;
ID YEAR;
*this is the variable that would become as: observation;
VAR FAMINC;
RUN;
Result:
Obs famid _NAME_ incyr96 incyr97 incyr98
1--- 1111 --faminc- 40000 -40500 -41000
2--- 2222 --faminc- 45000 -45400 -45800
3--- 3333 --faminc- 75000 -76000 -77000
/***********************************************************/
9) PROC SORT: sort's the data file. The program below sorts the data file called "DRAFT" on the variable "USUBJID and VISIT" and saves the sorted file as "DRAFT1". The original file remains unchanged since we used out=DRAFT1 to specify that the sorted data should be placed in DRAFT1.
PROC SORT DATA=DRAFT OUT=DRAFT1;
BY USUBJID VISIT;
RUN ;
*new data;
DATA DRAFT2;
SET DRAFT1;
BY USUBJID VISIT;
*if first dot is used to pick out the first obs;
IF FIRST.VISIT;
RUN;
(NODUPKEY option: is used with proc sort to remove duplicates)
/***********************************************************/
10) PROC REPORT: Influence the way in which a data has to be presented and the compute block introduces lines of text into the report.
PROC REPORT .......;
column sex;
.....;
COMPUTE BEFORE sex;
line@1 'Gender';
line@1 sex $200.;
endcomp;
RUN;
Monday, March 10, 2008
TRANWRD, COMPRESS & INDEXW
TRANWRD Function: "Replace all ocurrences of a word".
name=tranwrd(name, "Miss", "Ms.");
put name;
Value: Miss. Joan Smith
Result: Ms. Joan Smith
More: http://www.asu.edu/sas/sasdoc/sashtml/lgref/z0215027.htm
COMPRESS Function: "Removes specific characters from a character string".
a='AB C D ';
b=compress(a);
put b;
Value: 'AB C D ';
Result: ABCD
More: http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000212246.htm
INDEXW Function: "Returns the first position in the character-value that contains the find-string". If the find-string is not found, the function returns a 0.
indexw(STRING1,"the");
Value: STRING1 = "there is a the here"
Result: 12 (the word "the")
More: http://support.sas.com/publishing/pubcat/chaps/59343.pdf
-
Wednesday, March 5, 2008
Formulas for creating Vital Signs Dataset
For temperature:
Conversion of Fahrenheit to Celsius:
Temperature_in_Celsius = (5/9)*(Tf-32); *where Tf is the "temperature in fahrenheit";
Conversion of Celsius to Fahrenheit :
Temperature_in_Fahrenheit = (9/5)*Tc+32; *where Tc is the "temperature in celsius";
height:
Conversion of Inches to Centimeters:
Centimeters = inches x 2.54;
Conversion of Centimeters to Inches:
Inches = centimeters x 0.3937;
weight:
Conversion of Pounds to Kilograms;
Kilograms = lbs / 2.2;
Conversion of Kilograms to Pounds:
Pounds = kg x 2.2;
& bmi:
*Weight in Pounds:
BMI = (Weight in Pounds / (Height in inches) x (Height in inches)) x 703
*Weight in Kilograms:
BMI = (Weight in Kilograms / (Height in Meters) x (Height in Meters))
-
Saturday, February 23, 2008
Industry Reports
The Thomson CenterWatch 2007 Survey of Investigative Sites in the U.S. finds that: sites have rated Kendle, Covance, Omnicare as Top CROs to work with.