-
We must have probably come across duplicate records in our dataset & tried removing those by using nodup and nodupkey options.
I
I
-
Use of nodup will remove exact duplicates as given below:
proc sort data = draft out = draft1 nodup;
by _all_ ;
run;
proc sort data = draft out = draft1 nodup;
by _all_ ;
run;
-
NAME AGE SEX
Rama _27_ M
Rama _27_ M
-
II
II
-
Where as the use of nodupkey looks at the by variables in the sort procedure and keeps the first records of those by variables (it removes the rest of the records when encountered again!)
proc sort data = draft out = draft1 nodupkey;
by petestcd;
run;
PETESTCD
Neck
Abdominal
Abdominal
Abdominal *these are the two records that will be removed by nodupkey;
HEENT
Where as the use of nodupkey looks at the by variables in the sort procedure and keeps the first records of those by variables (it removes the rest of the records when encountered again!)
proc sort data = draft out = draft1 nodupkey;
by petestcd;
run;
PETESTCD
Neck
Abdominal
Abdominal
Abdominal *these are the two records that will be removed by nodupkey;
HEENT
-
III
III
-
The duplicate records in the dataset may result in the display of the Note => “MERGE statement has more than one data set with repeats of BY values” in the log. To determine this one can use "if first. ne last." method.
The duplicate records in the dataset may result in the display of the Note => “MERGE statement has more than one data set with repeats of BY values” in the log. To determine this one can use "if first. ne last." method.
-
data check;
set dups;
by id;
if first.id ne last.id; *brings the duplicates;
run;
-
This was quite confusing to me as my little brain perceived it this way:
"Y Y first. ne last.?????
If an ID is repeated twice in the dataset then first dot is equal to last dot....
ID
001
001
but SAS THINKS this way
ID First. Last.
001 1---- 0
001 0---- 1
002 1---- 1
003 1---- 1
Records that are not duplicates will have FIRST.ID = 1 and LAST.ID = 1, so we select records where first dot is not equal to last dot.
"Y Y first. ne last.?????
If an ID is repeated twice in the dataset then first dot is equal to last dot....
ID
001
001
but SAS THINKS this way
ID First. Last.
001 1---- 0
001 0---- 1
002 1---- 1
003 1---- 1
Records that are not duplicates will have FIRST.ID = 1 and LAST.ID = 1, so we select records where first dot is not equal to last dot.
-
IV
IV
-
The last way is the use of freq:
proc freq data = draft noprint ;
table ID / out = draftdup (keep = ID Count where = (Count > 1)) ;
run;
-
-
No comments:
Post a Comment