SAS - Standardising dataset attributes across projects
Background:
I have multiple old projects that I need to standardise (prj01 - prj10). Each is stored under its own libname, and each has around 30 datasets (note: not all studies have the same 30 datasets).
The variable names have remained consistent across projects. However, over the years, the labels and formats that have been assigned to these variable names have changed in places - examples below:
Attribute inconsistencies between studies:
    data prj01.users(label='user identifiers');
      attrib userid label='username' format=$20.;
    run;

    data prj02.users(label='user identifiers');
      attrib userid label='name of user' format=$15.;
    run;
Attribute inconsistencies within studies:
    data prj02.users(label='user identifiers');
      attrib userid label='name of user' format=$15.;
    run;

    data prj02.orders(label='orders');
      attrib userid  label='name of user' format=$15.
             orderno label='order number' format=8.;
    run;
I have written a program to report the inconsistencies. However, I now need to generate 'tidy' copies of the projects, giving them a standardised structure. My current thinking is that I should create a dataset of standard variables as below, which I can add to and adjust until I have everything defined in there:
    data standards;
      attrib userid  label='username'     format=$20.
             orderno label='order number' format=8.;
    run;
Question:
From the standards dataset, what is the best way to apply the attributes wherever these variables exist?
I will write the output datasets to new libnames, e.g. prj01.users --> prjstd01.users, and I want to put errors to the log if there are variables that have been changed where the variable length is getting truncated.
Create a dictionary table containing your standards:
    name     label         format
    userid   username      $20.
    orderno  order number  8.
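Join this to the dictionary table containing the column names in your library:
    proc sql;
      create table standards2 as
      select d.memname, s.name, s.label, s.format
      from sashelp.vcolumn d
      inner join standards s
        /* compare case-insensitively, since name case can vary */
        on upcase(d.name) = upcase(s.name)
      /* librefs are stored in upper case in sashelp.vcolumn */
      where d.libname eq 'PRJ01'
      order by d.memname, s.name;
    quit;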
To get this (sorted by memname, then name):
    memname  name     label         format
    ORDERS   orderno  order number  8.
    ORDERS   userid   username      $20.
    USERS    userid   username      $20.
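Then read this data set in a data step, using put statements to create a proc datasets step that performs the modifications.
    filename gencode temp;

    data _null_;
      set standards2 end=eof;
      by memname;
      file gencode;
      /* open the proc datasets step once */
      if _n_ = 1 then put "proc datasets lib=prj01 nolist;";
      /* one modify statement per dataset */
      if first.memname then put "  modify " memname ";";
      /* +(-1) backs up over the blank that list output adds after the label */
      put "    label " name "='" label +(-1) "';";
      put "    format " name format ";";
      if eof then put "quit;";
    run;

    /* run the generated code, echoing it to the log */
    %include gencode / source2;
    filename gencode clear;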
(Stolen from this paper.)
You should be able to modify this to match the rest of your requirements (copying to new libraries, iterating over projects).
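As a rough sketch of those remaining requirements, the steps could be wrapped in a macro that copies each project to its standardised library and applies the attributes to the copy. The macro name %standardise and its parameters are hypothetical, and it assumes that librefs prj01-prj10 and prjstd01-prjstd10 are already assigned and that work.standards holds the character columns name, label and format shown earlier. The truncation check is only one possible interpretation: it flags character variables whose stored length exceeds the width of the standard format, and it assumes the width is the only number in the format name.

```sas
/* Hypothetical macro: copy PRJnn to PRJSTDnn, then standardise the copy */
%macro standardise(first=1, last=10);
  %local i src dst;
  %do i = &first %to &last;
    %let src = PRJ%sysfunc(putn(&i, z2.));
    %let dst = PRJSTD%sysfunc(putn(&i, z2.));

    /* copy the datasets so that the originals are never modified */
    proc copy in=&src out=&dst memtype=data;
    run;

    /* standard attributes for every matching column in the copy */
    proc sql;
      create table standards2 as
      select d.memname, d.type, d.length, s.name, s.label, s.format
      from sashelp.vcolumn d
      inner join standards s
        on upcase(d.name) = upcase(s.name)
      where d.libname eq "&dst" and d.memtype eq 'DATA'
      order by d.memname, s.name;
    quit;

    /* skip code generation when no column matched */
    %if &sqlobs > 0 %then %do;
      filename gencode temp;

      data _null_;
        set standards2 end=eof;
        by memname;
        file gencode;
        /* assumed check: stored length wider than the standard format */
        if type = 'char' then do;
          width = input(compress(format, , 'kd'), ?? 8.);
          if width > 0 and length > width then
            putlog "ERROR: possible truncation in &dst.." memname
                   name= length= format=;
        end;
        if _n_ = 1 then put "proc datasets lib=&dst nolist;";
        if first.memname then put "  modify " memname ";";
        put "    label " name "='" label +(-1) "';";
        put "    format " name format ";";
        if eof then put "quit;";
      run;

      %include gencode / source2;
      filename gencode clear;
    %end;
  %end;
%mend standardise;

%standardise(first=1, last=10)
```

Because the modifications run against the copy in the prjstd library, the original projects stay untouched and the macro can be re-run after the standards table is adjusted. Labels containing single quotes would need extra handling before being written into the generated code.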