Monday, 7 November 2011

Hash Code

Merging data sets in SAS is extremely important, however sometimes (depending on the size of the data sets you're merging) this can take a long time to run. Once again SuperBass SAS to the rescue. Using the hash code below can speed this process up massively!


data merged_account_information;
      set account1; /*first table you are merging*/

      length gender $6.;

      if _n_=1 then do;
      declare hash h1(dataset:"account_gender"); /*data memorising information from (the other table you are merging with)*/
       h1.defineKey("account_number"); /*define key to merge on*/
       h1.defineData("gender"); /*define variable to bring across*/
       h1.defineDone(); /*closes hash set*/
      end;

      rc = h1.find() ;
      if rc = 0; /*match returns 0*/

      drop rc;
run;

/*to keep files that don’t match type rc ne 0*/

The code above merges the data set account1 with the data set account_gender by account number, and brings across the gender of the owner of each account!

Generally it's a good idea to declare hash on the smaller of the two tables your merging, otherwise you could get system support giving you a call sometime after you've run it! Also don't forget to specify the length of the variable you're bringing across!

Enjoy! P.S....I own a dog!!

Wednesday, 19 October 2011

Great Transposing Tip

"You blocked me on facebook and now you are going to die"

Transposing in SAS is a great feature, but I always forget how to do it. My very talented colleague made this little illustration to remind him how it works and now you can use it too!



/*          |  id
        ----+-----
        by  |  var       */


proc transpose data= bernard out= mark (drop=_:) prefix= title_;
      var data;
      id column_headings;
      by row_groups;
run;



Monday, 10 October 2011

Interleaving merge - so cool!

"I heard Cameron is actually C3P0 covered in wafer thin ham"

Here's a Great tip for merging. Ever got annoyed when you had to merge sorted datasets together and then had to re-sort because the order got messed up? Yeah, well not anymore buddy, because Marky boy's comin to the rescue!

data interleaved_set;
       merge dataset_1
                 dataset_2;
       by variable; *key bit!;
run;

The data step above will read in rows from both input datasets in order of the By variable. You can think of it as weaving the output together, or interleaving as it's called.

This saves a lot of time & memory because sorting is the memory intensive thing you can do in SAS.

Stay safe Sas fans, and yes, i'm accepting groupies. (Sas professionals only)

Tuesday, 4 October 2011

Normalising / Standardising Data in SAS

"I am pork boy, the breakfast monkey."


So here's a good procedure I came across today. If you ever need to quickly standardize a dataset then you can use the amazing Proc Standard!



PROC STANDARD DATA=cars MEAN=0 STD=1 OUT=zcars;
  VAR weight price ;
RUN;



This will standardize a dataset based on the normal distribution.  Boy I wish I knew about this a few months ago, it might of saved my job!

Wednesday, 21 September 2011

SAS Efficiency: Prefiltering Input Datasets

"Last night I lay in bed looking up at the stars in the sky and I thought to myself, where the heck is the ceiling."


Here is a great SAS tip that requires a bit of background explanation, but it's worth it!


The tip is about how the data step works. When you read in a data set with the set statement like below, SAS processes every column and row of the input file. If the data set is long and wide it will take SAS a long time to finish!


Data Output;
      Set Input;
Run;


If you were only interested in a few columns,  you might you be tempted to do something like this:


Data Output (keep = Branch Line Date);
       Set Input;
Run;


Which will create an output data set with just the 3 columns, Branch Line and Date. However! using this method means that SAS still has to read in every column and row from the Input data set. Instead it's far better to do this:


Data Output;
       Set Input (keep = Branch Line Date);
Run;


Here we are telling SAS to only read in the three columns Branch Line and Date from the input data set. Every other column will be ignored so the process of reading in the Input data set will be far more efficient and quick!


But what if we only want a subset of Branches, Lines or Dates?


Say we just want to see the Lines in Branch 308, we could to the following:


data Output;
      set Input (keep = Branch Line Date);
      where Branch = 308;
run;


Once the Input data set has been processed,  every record of the Branch column would then be re-checked to see if the branch was equal to 308, as demanded by the Where statement.


But we know that the Set statement reads in every column and row of the input data set, so why not check if the Branch is equal to 308 at the same time to prevent double work?


data Output;
     set Input (keep = Branch Line Date where = (Branch = 308) );
run;


By using Where in the Set statement, you can drastically improve the efficiency of the data step.

Additionally, if you have multiple conditions on the data you want to bring in, make sure the first condition is most unlikely, the set statement will stop processing a record if a condition is not met  and move on to the next one. The earlier you can stop processing a record you don't want, the less work SAS will have to do!


data Output;
     set Input (keep = Branch Line Date where = ( Date = '24Dec2010'd and Branch = 308 ) );
run;



There are plenty of other things you can do in the Set statement. I'd recommend you do as much here as possible. Enjoy! 


data Output;
        set Input (Keep = Branch Line Date Rename=(Line = Product) where = (Branch = 308) );
run;







Evaluating calculations in open code

"I love salad"

If you want to calculate a value outside of a datastep then use the following code.

%let Mark = %SysEvalF(6 + 6);

%put &Mark;

Log should output 12!

Friday, 16 September 2011

Exponential Smoothing in SAS

"It's Not funny!"

A very nice Chinese lady once asked me "Can you save the Kennedy Space Center?" I replied with the only response possible for a true SAS Professional.. An explanation of how to forecast using exponential smoothing models within Base SAS! 


I guess this is why I still live with my parents.

/* How to do Smoothing in SAS*/

There are 5 steps:
      
   1. Sort your data so they are in line-date order
   2. Specify the type of smoothing model you want with proc hpfesmspec
   3. Select the smoothing model(s) you want to use with proc hpfselect
   4. Create a forecast with your selected smoothing model(s) using proc hpfengine
   5. Pray.


   1. Make sure your data is Sorted first or you’ll get an error.

proc sort data=frozen_sample out=frozenSampleSort;
      by customer linenum dow date;
run;

      2. Define the exact type of Smoothing model(s) you want to use.


There are many ways to Smooth. Some popular techniques are Simple, Double and Winters. The basic smoothing technique is called ‘Simple’ in SAS. proc hpfesmspec is where we define the type of smoothing we want to use.

This is also the place where we can tell SAS to automatically choose the best smoothing weights for us, or used a fixed weight.


proc hpfesmspec repository = work.model   *where the model will be saved;
        specname = Simple;          *the name of your model;
        esm method = Simple;        *the smoothing model;
run;

Add this code after the 'esm method = Simple' to use a fixed smoothing weight:

levelrest = (0,1) levelparm = "insert smoothing weight here" noest



  1. Select the model(s) you want to use.

Now that you’ve physically saved your model(s) the next step is to load them into memory (kind of like what happens with macro variables). You do this by creating a virtual folder, and putting all the models you want to use into it. In the next step you will point to this virtual folder as the location that contains all the models you want to use.

proc hpfselect repository = work.model  *where the model(s) were saved;
name = myselect;     *Name of the virtual folder;
      spec simple;         *The name of the model(s) you want to use;
run;



  1. Create a forecast with your selected model, Finally!

hpfengine is the step that actually generates the forecast. There are many options, a few important ones have been included.

proc hpfengine data = frozenSampleSort  *sorted dataset;
   outfor=WeekForecast     *save just the forecast;
   outest=WeekEst           *save just the smoothing weights;
   outstat=WeekStat         *save just the statistics;
   out=WeekData             *save everything;
               repository = work.model     *location of models;
               globalselection = myselect  *location of virtual folder;
               back = 0               *hides last x actual from forecast;
               lead = 4 ;             *forecasts y periods ahead;
               by customer linenum dow;   *by variable;
               id date interval=day;      *time between observations;
               forecast weekdemand;       *forecast variable;
run;

5. Hu Yu Hai Ding? Tai Ni Po Ni. 


Do let me know if you have any questions, until next time SAS fans!