Monday, 7 November 2011

Hash Code

Merging data sets in SAS is extremely important, however sometimes (depending on the size of the data sets you're merging) this can take a long time to run. Once again SuperBass SAS to the rescue. Using the hash code below can speed this process up massively!


data merged_account_information;
      set account1; /*first table you are merging*/

      length gender $6.;

      if _n_=1 then do;
      declare hash h1(dataset:"account_gender"); /*data memorising information from (the other table you are merging with)*/
       h1.defineKey("account_number"); /*define key to merge on*/
       h1.defineData("gender"); /*define variable to bring across*/
       h1.defineDone(); /*closes hash set*/
      end;

      rc = h1.find() ;
      if rc = 0; /*match returns 0*/

      drop rc;
run;

/*to keep files that don’t match type rc ne 0*/

The code above merges the data set account1 with the data set account_gender by account number, and brings across the gender of the owner of each account!

Generally it's a good idea to declare hash on the smaller of the two tables your merging, otherwise you could get system support giving you a call sometime after you've run it! Also don't forget to specify the length of the variable you're bringing across!

Enjoy! P.S....I own a dog!!

Wednesday, 19 October 2011

Great Transposing Tip

"You blocked me on facebook and now you are going to die"

Transposing in SAS is a great feature, but I always forget how to do it. My very talented colleague made this little illustration to remind him how it works and now you can use it too!



/*          |  id
        ----+-----
        by  |  var       */


proc transpose data= bernard out= mark (drop=_:) prefix= title_;
      var data;
      id column_headings;
      by row_groups;
run;



Monday, 10 October 2011

Interleaving merge - so cool!

"I heard Cameron is actually C3P0 covered in wafer thin ham"

Here's a Great tip for merging. Ever got annoyed when you had to merge sorted datasets together and then had to re-sort because the order got messed up? Yeah, well not anymore buddy, because Marky boy's comin to the rescue!

data interleaved_set;
       merge dataset_1
                 dataset_2;
       by variable; *key bit!;
run;

The data step above will read in rows from both input datasets in order of the By variable. You can think of it as weaving the output together, or interleaving as it's called.

This saves a lot of time & memory because sorting is the memory intensive thing you can do in SAS.

Stay safe Sas fans, and yes, i'm accepting groupies. (Sas professionals only)

Tuesday, 4 October 2011

Normalising / Standardising Data in SAS

"I am pork boy, the breakfast monkey."


So here's a good procedure I came across today. If you ever need to quickly standardize a dataset then you can use the amazing Proc Standard!



PROC STANDARD DATA=cars MEAN=0 STD=1 OUT=zcars;
  VAR weight price ;
RUN;



This will standardize a dataset based on the normal distribution.  Boy I wish I knew about this a few months ago, it might of saved my job!

Wednesday, 21 September 2011

SAS Efficiency: Prefiltering Input Datasets

"Last night I lay in bed looking up at the stars in the sky and I thought to myself, where the heck is the ceiling."


Here is a great SAS tip that requires a bit of background explanation, but it's worth it!


The tip is about how the data step works. When you read in a data set with the set statement like below, SAS processes every column and row of the input file. If the data set is long and wide it will take SAS a long time to finish!


Data Output;
      Set Input;
Run;


If you were only interested in a few columns,  you might you be tempted to do something like this:


Data Output (keep = Branch Line Date);
       Set Input;
Run;


Which will create an output data set with just the 3 columns, Branch Line and Date. However! using this method means that SAS still has to read in every column and row from the Input data set. Instead it's far better to do this:


Data Output;
       Set Input (keep = Branch Line Date);
Run;


Here we are telling SAS to only read in the three columns Branch Line and Date from the input data set. Every other column will be ignored so the process of reading in the Input data set will be far more efficient and quick!


But what if we only want a subset of Branches, Lines or Dates?


Say we just want to see the Lines in Branch 308, we could to the following:


data Output;
      set Input (keep = Branch Line Date);
      where Branch = 308;
run;


Once the Input data set has been processed,  every record of the Branch column would then be re-checked to see if the branch was equal to 308, as demanded by the Where statement.


But we know that the Set statement reads in every column and row of the input data set, so why not check if the Branch is equal to 308 at the same time to prevent double work?


data Output;
     set Input (keep = Branch Line Date where = (Branch = 308) );
run;


By using Where in the Set statement, you can drastically improve the efficiency of the data step.

Additionally, if you have multiple conditions on the data you want to bring in, make sure the first condition is most unlikely, the set statement will stop processing a record if a condition is not met  and move on to the next one. The earlier you can stop processing a record you don't want, the less work SAS will have to do!


data Output;
     set Input (keep = Branch Line Date where = ( Date = '24Dec2010'd and Branch = 308 ) );
run;



There are plenty of other things you can do in the Set statement. I'd recommend you do as much here as possible. Enjoy! 


data Output;
        set Input (Keep = Branch Line Date Rename=(Line = Product) where = (Branch = 308) );
run;







Evaluating calculations in open code

"I love salad"

If you want to calculate a value outside of a datastep then use the following code.

%let Mark = %SysEvalF(6 + 6);

%put &Mark;

Log should output 12!

Friday, 16 September 2011

Exponential Smoothing in SAS

"It's Not funny!"

A very nice Chinese lady once asked me "Can you save the Kennedy Space Center?" I replied with the only response possible for a true SAS Professional.. An explanation of how to forecast using exponential smoothing models within Base SAS! 


I guess this is why I still live with my parents.

/* How to do Smoothing in SAS*/

There are 5 steps:
      
   1. Sort your data so they are in line-date order
   2. Specify the type of smoothing model you want with proc hpfesmspec
   3. Select the smoothing model(s) you want to use with proc hpfselect
   4. Create a forecast with your selected smoothing model(s) using proc hpfengine
   5. Pray.


   1. Make sure your data is Sorted first or you’ll get an error.

proc sort data=frozen_sample out=frozenSampleSort;
      by customer linenum dow date;
run;

      2. Define the exact type of Smoothing model(s) you want to use.


There are many ways to Smooth. Some popular techniques are Simple, Double and Winters. The basic smoothing technique is called ‘Simple’ in SAS. proc hpfesmspec is where we define the type of smoothing we want to use.

This is also the place where we can tell SAS to automatically choose the best smoothing weights for us, or used a fixed weight.


proc hpfesmspec repository = work.model   *where the model will be saved;
        specname = Simple;          *the name of your model;
        esm method = Simple;        *the smoothing model;
run;

Add this code after the 'esm method = Simple' to use a fixed smoothing weight:

levelrest = (0,1) levelparm = "insert smoothing weight here" noest



  1. Select the model(s) you want to use.

Now that you’ve physically saved your model(s) the next step is to load them into memory (kind of like what happens with macro variables). You do this by creating a virtual folder, and putting all the models you want to use into it. In the next step you will point to this virtual folder as the location that contains all the models you want to use.

proc hpfselect repository = work.model  *where the model(s) were saved;
name = myselect;     *Name of the virtual folder;
      spec simple;         *The name of the model(s) you want to use;
run;



  1. Create a forecast with your selected model, Finally!

hpfengine is the step that actually generates the forecast. There are many options, a few important ones have been included.

proc hpfengine data = frozenSampleSort  *sorted dataset;
   outfor=WeekForecast     *save just the forecast;
   outest=WeekEst           *save just the smoothing weights;
   outstat=WeekStat         *save just the statistics;
   out=WeekData             *save everything;
               repository = work.model     *location of models;
               globalselection = myselect  *location of virtual folder;
               back = 0               *hides last x actual from forecast;
               lead = 4 ;             *forecasts y periods ahead;
               by customer linenum dow;   *by variable;
               id date interval=day;      *time between observations;
               forecast weekdemand;       *forecast variable;
run;

5. Hu Yu Hai Ding? Tai Ni Po Ni. 


Do let me know if you have any questions, until next time SAS fans!

Monday, 15 August 2011

Permissions

"There are no personal problems which cannot be solved through suitable application of high explosives."


Sometimes other people want to use the data sets you create. Sure they can read em' but what if they need to edit them too? The following code will allow you to change the permissions of your data sets within SAS:


%let PERMIS = 777;
%let extpath = /sasusers/mark/sasuser.v91;
%let dataset = edit_me;

filename chmod pipe "chmod &PERMIS %trim(&extpath)/&dataset..sas7bdat";

data _null_;
   file chmod;
run; 



note: the data set name specified must be lowercase or else it wont work. You could use the %lower statement to do this automatically.

Chmod changes the permissions of the data set in the library specified. How does it work?

777 is an octal. It refers to a permission configuration that allows absolutely anyone to read or write to the files specified in 'dataset'. The first number refers to the user, the second refers to your group and the third number refers to everyone else.

The values and their meanings can be seen here:
http://en.wikipedia.org/wiki/Chmod under 'numerical permissions'

This is great to use when SAS annoyingly reverts permission to read only whenever you update a file.  If only it was this easy with women!

Saturday, 6 August 2011

Week Beginning and End

So you've been asked to compare the week on week changes in data, no problem!!......Except the data set you have only has the data per individual day.

If only there was a way to convert the individual dates to their corresponding weekdate! As you've probably guessed by now with SAS their is always a solution, in this case it's the INTNX function.

(INTNX('week.2', date_variable, 0, 'beginning'))

Creating a new column and setting it equal to the above function converts a date variable to a weekdate. Each weekdate begins on a Monday however this can easily be changed by changing the integer following 'week.2' E.g. Changing this to 'week.1' will bring back weekdates starting from Sunday.

The second argument in the function is where your date variable goes, however what is even more interesting is what the third and fourth argument allow you to do. If the third argument is changed from 0 to 1 instead, SAS will offset the weekdate by one week, e.g. if the start of the week is 25JUL2011, SAS will shift this to 01AUG2011 instead. This can be extremely useful at times.

The fourth argument specifies whether you want your weekdate to correspond to the beginning of the week or the end of the week. This can be achieved by changing 'beginning' to 'end'.

Problem Solved!

Wednesday, 20 July 2011

Fixing Base SAS

"Before I speak, I have something important to say."


Base SAS is dumb. Every time it opens you have to maximize it. What's the point? Like I have time to click stupid buttons when there is important analysis to be done! Fortunately, there is a way to customize how Base SAS starts up. The way to do this is to create a file called AutoExec.sas and save it in the root directory of your local hard drive:


C:\AutoExec.SAS


In the file use the following code to start SAS maximized:


dm 'awsmaximize';


It's also good if you want to automatically run code or assign libraries. I use the include statement to run the scripts that logs me on to our remote server: 


%include "C:\Hi Doggy\SAS Code\Remote Sign On.sas";


What used to be a pointless task is now a dream, I love you again Base SAS.

Date Evaluation

Filtering the date in your data using the today() function in SAS is great, as it will provide you with all the data with today's date.

However filtering for a certain number of days from today's date (e.g. 50 days) using the today() function (e.g. today()-50) can take a while to run, which is not great if your manager needs the data on their desk by the hour and you want to keep your job!!

Using the %eval function can speed up the process:


%let date= %eval(%sysfunc(today())- 50);


The code above creates the macro 'date' using the %eval function to calculate the value of the system function today()-50.

Now all you need to do is filter the date in your data set greater than &date and you'll have all the data you need in no time!

If you don't believe me try using the time it function to test the difference! http://superbasssas.blogspot.com/2011/07/time-it.html

Tuesday, 19 July 2011

IT Help to the Rescue

DATA DOWN!! Probably the worst nightmare for any analyst, but not when you have the amazing IT Help Desk at hand, after initially not picking up the phone the following conversation took place:


ITHelp                           you rang ...
Mark                              hey
Mark                              yeah just to let you know we have no data
ITHelp                           that's nice
Mark                              lol.....not really
Mark                              might be useful to have
Mark                              well apparently it doesnt exist now
ITHelp                           some people want it all
Mark                              yeah, these analysts huh......who would think that they actually need ................data
ITHelp                           that's just what is wrong with society today
Mark                              haha
Mark                              do you know what happened to the table?
ITHelp                           black hole?
ITHelp                           whirl pool?
ITHelp                           gap in the space time continuum
ITHelp                           ok sorry, I will have a look
Mark                              LOL
ITHelp                           just getting my dry suit on now
Mark                              haha, cool, be careful.......those gaps in the space time continuum ................can be tricky
ITHelp                           it' ok I've got my light saber
Mark                              may the force be with you
ITHelp                           I think that the job that updates the data has gotten stuck
ITHelp                           I should have noticed it earlier but was too busy eating scones - my ................apologies
Mark                              what did you eat the scones with?........thanks for letting me know, do ................you know when it will be ready?
ITHelp                           strawberry jam
ITHelp                           home made too
ITHelp                           I am going to have to stop and restart the server in order to sort this ................out - this will kill everybodys eg projects that are running
ITHelp                           how does that sound?
Mark                              errmmmm great to me, dunno about them...but who cares

After kicking everyone of the server normality was restored, thanks once again to the IT Help Desk!

Time it!

"Time flies like an arrow; fruit flies like a banana"

Often when writing code it's important to know how long it will take to run. This can be very useful when creating loops that need to be repeated hundreds of times or when working out which bits of code are slowing everything else down.

The following code will work out how long something takes to run and output the time into the log:

%let tic = %sysfunc(time(),8.); 

*Insert code here*

%let toc = %sysfunc(time(),8.);

%put Total Time %sysfunc(putn(%eval(&toc-&tic),time8.));


The code works by storing the system times into the macro variables 'tic' and 'toc'. The difference between the two will tell you how long the code took to run.  Here the 'Total Time' taken is printed into the log, but it can just as easily be saved into a data set for later viewing.


Mark Meat.

Monday, 18 July 2011

Creating A Macro Using Data From An Existing Table!

This is a simple piece of code that I use in most of the projects that I create.


PROC SQL noprint;
select distinct age into :agelist separated by ',' from agedataset;
quit;

The code above takes all of the ages from the data set conveniently called agedataset, and places them into a macro (also conveniently) called agelist. This list of ages within the macro is then separated via commas (',').

Now if I want to filter another table for only the ages within the agedataset, all I need to do is filter for ages IN &agelist and SAS will do the rest for me!

Welcome to SuperBass SAS!!

SuperBass SAS has been created to share some of the useful code that we as SAS analysts use daily. This can be anything from a simple macro to using hash tables, anything that makes our lives as analysts simpler that we sometimes take for granted.

Please feel free to add any code that you yourself are using, also if you see anything on this blog that you could make more efficient, please share this with us, so that we can all learn and become better analysts!!

Enjoy;