SuperBass SAS

Monday, 10 October 2011

Interleaving merge - so cool!

"I heard Cameron is actually C3P0 covered in wafer thin ham"

Here's a Great tip for merging. Ever got annoyed when you had to merge sorted datasets together and then had to re-sort because the order got messed up? Yeah, well not anymore buddy, because Marky boy's comin to the rescue!

data interleaved_set;
merge dataset_1
dataset_2;
by variable; *key bit!;
run;

The data step above will read in rows from both input datasets in order of the By variable. You can think of it as weaving the output together, or interleaving as it's called.

This saves a lot of time & memory because sorting is the memory intensive thing you can do in SAS.

Stay safe Sas fans, and yes, i'm accepting groupies. (Sas professionals only)

Tuesday, 4 October 2011

Normalising / Standardising Data in SAS

"I am pork boy, the breakfast monkey."

So here's a good procedure I came across today. If you ever need to quickly standardize a dataset then you can use the amazing Proc Standard!

PROC STANDARD DATA=cars MEAN=0 STD=1 OUT=zcars;
  VAR weight price ;
RUN;

This will standardize a dataset based on the normal distribution. Boy I wish I knew about this a few months ago, it might of saved my job!

Wednesday, 21 September 2011

SAS Efficiency: Prefiltering Input Datasets

"Last night I lay in bed looking up at the stars in the sky and I thought to myself, where the heck is the ceiling."

Here is a great SAS tip that requires a bit of background explanation, but it's worth it!

The tip is about how the data step works. When you read in a data set with the set statement like below, SAS processes every column and row of the input file. If the data set is long and wide it will take SAS a long time to finish!

Data Output;
Set Input;
Run;

If you were only interested in a few columns, you might you be tempted to do something like this:

Data Output (keep = Branch Line Date);
Set Input;
Run;

Which will create an output data set with just the 3 columns, Branch Line and Date. However! using this method means that SAS still has to read in every column and row from the Input data set. Instead it's far better to do this:

Data Output;
Set Input (keep = Branch Line Date);
Run;

Here we are telling SAS to only read in the three columns Branch Line and Date from the input data set. Every other column will be ignored so the process of reading in the Input data set will be far more efficient and quick!

But what if we only want a subset of Branches, Lines or Dates?

Say we just want to see the Lines in Branch 308, we could to the following:

data Output;
set Input (keep = Branch Line Date);
where Branch = 308;
run;

Once the Input data set has been processed, every record of the Branch column would then be re-checked to see if the branch was equal to 308, as demanded by the Where statement.

But we know that the Set statement reads in every column and row of the input data set, so why not check if the Branch is equal to 308 at the same time to prevent double work?

data Output;
set Input (keep = Branch Line Date where = (Branch = 308) );
run;

By using Where in the Set statement, you can drastically improve the efficiency of the data step.

Additionally, if you have multiple conditions on the data you want to bring in, make sure the first condition is most unlikely, the set statement will stop processing a record if a condition is not met and move on to the next one. The earlier you can stop processing a record you don't want, the less work SAS will have to do!

data Output;
set Input (keep = Branch Line Date where = ( Date = '24Dec2010'd and Branch = 308 ) );
run;

There are plenty of other things you can do in the Set statement. I'd recommend you do as much here as possible. Enjoy!

data Output;
set Input (Keep = Branch Line Date Rename=(Line = Product) where = (Branch = 308) );
run;

Evaluating calculations in open code

"I love salad"

If you want to calculate a value outside of a datastep then use the following code.

%let Mark = %SysEvalF(6 + 6);

%put &Mark;

Log should output 12!

Friday, 16 September 2011

Exponential Smoothing in SAS

"It's Not funny!"

A very nice Chinese lady once asked me "Can you save the Kennedy Space Center?" I replied with the only response possible for a true SAS Professional.. An explanation of how to forecast using exponential smoothing models within Base SAS!

I guess this is why I still live with my parents.

/* How to do Smoothing in SAS*/

There are 5 steps:

1. Sort your data so they are in line-date order

2. Specify the type of smoothing model you want with proc hpfesmspec

3. Select the smoothing model(s) you want to use with proc hpfselect

4. Create a forecast with your selected smoothing model(s) using proc hpfengine

5. Pray.

1. Make sure your data is Sorted first or you’ll get an error.

proc sort data=frozen_sample out=frozenSampleSort;

by customer linenum dow date;

run;

2. Define the exact type of Smoothing model(s) you want to use.

There are many ways to Smooth. Some popular techniques are Simple, Double and Winters. The basic smoothing technique is called ‘Simple’ in SAS. proc hpfesmspec is where we define the type of smoothing we want to use.

This is also the place where we can tell SAS to automatically choose the best smoothing weights for us, or used a fixed weight.

proc hpfesmspec repository = work.model *where the model will be saved;

specname = Simple; *the name of your model;

esm method = Simple; *the smoothing model;

run;

Add this code after the 'esm method = Simple' to use a fixed smoothing weight:

levelrest = (0,1) levelparm = "insert smoothing weight here" noest

Select the model(s) you want to use.

Now that you’ve physically saved your model(s) the next step is to load them into memory (kind of like what happens with macro variables). You do this by creating a virtual folder, and putting all the models you want to use into it. In the next step you will point to this virtual folder as the location that contains all the models you want to use.

proc hpfselect repository = work.model *where the model(s) were saved;

name = myselect; *Name of the virtual folder;

spec simple; *The name of the model(s) you want to use;

run;

Create a forecast with your selected model, Finally!

hpfengine is the step that actually generates the forecast. There are many options, a few important ones have been included.

proc hpfengine data = frozenSampleSort *sorted dataset;

outfor=WeekForecast *save just the forecast;

outest=WeekEst *save just the smoothing weights;

outstat=WeekStat *save just the statistics;

out=WeekData *save everything;

repository = work.model *location of models;

globalselection = myselect *location of virtual folder;

back = 0 *hides last x actual from forecast;

lead = 4 ; *forecasts y periods ahead;

by customer linenum dow; *by variable;

id date interval=day; *time between observations;

forecast weekdemand; *forecast variable;

run;

5. Hu Yu Hai Ding? Tai Ni Po Ni.

Do let me know if you have any questions, until next time SAS fans!

Monday, 15 August 2011

Permissions

"There are no personal problems which cannot be solved through suitable application of high explosives."

Sometimes other people want to use the data sets you create. Sure they can read em' but what if they need to edit them too? The following code will allow you to change the permissions of your data sets within SAS:

%let PERMIS = 777;
%let extpath = /sasusers/mark/sasuser.v91;
%let dataset = edit_me;

filename chmod pipe "chmod &PERMIS %trim(&extpath)/&dataset..sas7bdat";

data _null_;
file chmod;
run;

note: the data set name specified must be lowercase or else it wont work. You could use the %lower statement to do this automatically.

Chmod changes the permissions of the data set in the library specified. How does it work?

777 is an octal. It refers to a permission configuration that allows absolutely anyone to read or write to the files specified in 'dataset'. The first number refers to the user, the second refers to your group and the third number refers to everyone else.

The values and their meanings can be seen here:
http://en.wikipedia.org/wiki/Chmod under 'numerical permissions'

This is great to use when SAS annoyingly reverts permission to read only whenever you update a file. If only it was this easy with women!

Saturday, 6 August 2011

Week Beginning and End

So you've been asked to compare the week on week changes in data, no problem!!......Except the data set you have only has the data per individual day.

If only there was a way to convert the individual dates to their corresponding weekdate! As you've probably guessed by now with SAS their is always a solution, in this case it's the INTNX function.

(INTNX('week.2', date_variable, 0, 'beginning'))

Creating a new column and setting it equal to the above function converts a date variable to a weekdate. Each weekdate begins on a Monday however this can easily be changed by changing the integer following 'week.2' E.g. Changing this to 'week.1' will bring back weekdates starting from Sunday.

The second argument in the function is where your date variable goes, however what is even more interesting is what the third and fourth argument allow you to do. If the third argument is changed from 0 to 1 instead, SAS will offset the weekdate by one week, e.g. if the start of the week is 25JUL2011, SAS will shift this to 01AUG2011 instead. This can be extremely useful at times.

The fourth argument specifies whether you want your weekdate to correspond to the beginning of the week or the end of the week. This can be achieved by changing 'beginning' to 'end'.

Problem Solved!