
# How to Simulate Multiple Samples From a Linear Regression Model

In this blog post, we will learn how to simulate multiple samples efficiently. So far, to keep the discussion simple, we have simulated only a single sample with n observations and p variables. But to use the Monte Carlo method to approximate the sampling distribution of statistics, one needs to simulate many samples from the same regression model.

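The SAS DATA step that simulates a single sample consists of four steps: specify the model coefficients, simulate the explanatory variables, specify the error distribution, and compute the response as the model plus the error. To simulate multiple samples, put a DO loop around the steps that generate the error term and the response variable for every observation in the model.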

The following program creates a single data set that contains NumSamples (=100) samples. Each sample is identified by an ordinal variable named SampleID.


```
/* Simulate many samples from a linear regression model */
%let N = 50;            /* N = sample size                    */
%let nCont = 10;        /* p = number of continuous variables */
%let NumSamples = 100;  /* number of samples                  */
data SimReg(keep= SampleID i Y x:);
call streaminit(54321);
array x[&nCont];        /* explanatory variables are named x1-x&nCont */

/* 1. Specify model coefficients. You can hard-code values such as
   array beta[0:&nCont] _temporary_ (-4 2 -1.33 1 -0.8 0.67 -0.57 0.5 -0.44 0.4 -0.36);
   or you can use a formula such as the following */
array beta[0:&nCont] _temporary_;
do j = 0 to &nCont;
   beta[j] = 4 * (-1)**(j+1) / (j+1);       /* formula for beta[j] */
end;

do i = 1 to &N;                 /* for each observation in the sample */
   do j = 1 to &nCont;
      x[j] = rand("Normal");    /* 2. Simulate explanatory variables  */
   end;

   eta = beta[0];                       /* model = intercept term  */
   do j = 1 to &nCont;
      eta = eta + beta[j] * x[j];       /*     + sum(beta[j]*x[j]) */
   end;

   /* 5. Simulate the response for each sample */
   do SampleID = 1 to &NumSamples;      /* <== LOOP OVER SAMPLES   */
      epsilon = rand("Normal", 0, 1.5); /* 3. Specify error distrib*/
      Y = eta + epsilon;                /* 4. Y = model + error    */
      output;
   end;
end;
run;
```
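For reference, the model that the DATA step simulates can be written out as follows. This is only a restatement of the program in math notation; the subscripts i (observation), j (variable) and s (sample) are labels introduced here for illustration, and 1.5 is read as the standard deviation of the error, which is how `rand("Normal", 0, 1.5)` interprets its last argument.

```latex
\begin{aligned}
\beta_j &= \frac{4\,(-1)^{j+1}}{j+1}, \qquad j = 0, 1, \dots, p \\
\eta_i  &= \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \qquad x_{ij} \sim N(0, 1) \\
Y_{is}  &= \eta_i + \varepsilon_{is}, \qquad \varepsilon_{is} \sim N(0, 1.5^2)
\end{aligned}
```

Note that the explanatory variables carry no sample subscript: the program generates them once per observation, so every sample shares the same x values and only the error term changes from sample to sample.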

The most efficient way to analyze the simulated samples in SAS is BY-group processing, which lets you analyze all samples with a single procedure call. The following statements sort the data by the SampleID variable and call PROC REG to analyze every sample. The NOPRINT option ensures that the procedure does not spew out thousands of tables and graphs. The OUTEST= option saves the parameter estimates for each sample to a SAS data set.

```proc sort data=SimReg;
by SampleID i;
run;

proc reg data=SimReg outest=PE NOPRINT;
by SampleID;
model y = x:;
quit;```

The PE data set contains NumSamples rows. Each row holds the parameter estimates (the intercept and the p slopes) for the analysis of one simulated sample. The distribution of these estimates is an approximation to the true (theoretical) sampling distribution of the statistics.
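As a rough check on that approximation, one could summarize a few columns of PE with PROC MEANS. This is a minimal sketch, not part of the original program; it assumes that PE was created by the PROC REG step above and that the OUTEST= data set stores the estimates in variables named Intercept and x1-x10. The choice of the first four coefficients and of the 5th and 95th percentiles is arbitrary.

```sas
/* Summarize the Monte Carlo distribution of selected estimates.   */
/* Assumes PE is the OUTEST= data set from the PROC REG step.      */
proc means data=PE N Mean Std P5 P95 maxdec=3;
   var Intercept x1-x3;   /* true values: -4, 2, -1.33, 1 */
run;
```

If the simulation is behaving as expected, the Mean column should be close to the population coefficients, and the Std column gives a Monte Carlo estimate of each estimator's standard error.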

The accompanying image shows the joint distribution of the estimates for four of the regression coefficients. As one can see, the estimates appear to be multivariate normal and centered at the values of the population parameters.
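A plot of this kind can be produced in several ways. The sketch below uses PROC SGSCATTER, which is one option and not necessarily the procedure used for the original image; the choice of Intercept and x1-x3 as the four coefficients is also an assumption.

```sas
/* Scatter plot matrix of four parameter estimates from PE.        */
/* Histograms with normal curves are drawn on the diagonal.        */
proc sgscatter data=PE;
   matrix Intercept x1-x3 / diagonal=(histogram normal);
run;
```

Each off-diagonal cell shows the joint distribution of a pair of estimates across the 100 samples, while the diagonal cells show the marginal distributions.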

With proper SAS predictive modeling training, one can learn to simulate data, analyze it and produce insightful graphs. These skills are in high demand, so consider taking up a SAS training course in Noida from industry leaders.

This post originally appeared on blogs.sas.com/content/iml/2017/02/01/simulate-samples-linear-regression.html