Ahh, the wonder of statistics; a great tool they can be
while remaining a complete mystery to many. Now take basic statistical methods
and couple them with the power of Excel and what do you get? An interesting
simulation! Okay, some might really question my motives for completing this;
however they are actually quite simple, I am seeking to improve my modeling and
simulation skills within Excel and what a better way than to practice.
I began the process by determine what I wanted to simulate,
in this case it was the expected return of five assets over a thirty year
holding period. Why five assets over thirty years? In all honesty, they were arbitrary;
essentially they served as a simple basis for the creation of my model. In
addition, the inputs that found in the download file have also been arbitrarily
chosen for illustrative purposes only.
After getting my basic idea down, I outlined how the
simulation would work. I wanted to use expected return and risk as the metrics
for each individual investment. Expected return would be measured as a percent
change on an annual basis. Risk would be measured as the standard deviation of
the expected return. In addition to these two important asset specific
measures, each asset would be assigned a weight based on their allocation in
the portfolio as a whole. Lastly, to help better represent the expected return
on a monetary basis, a starting portfolio balance would be necessary. While a
monetary measure was not essential, I felt it would allow a better “feel” for
the expected returns as opposed to solely a percentage change being displayed.
The next step would be to create a random expected return
for a single year, then repeat this over thirty years and compound the growth.
In Excel I used the “=NORM.INV” function with requires the following syntax (probability,mean,standard_dev).
The probability portion corresponds to the location on a normal distribution.
The mean in this case would be the represented by the expected annual return
for each respective asset. Standard_dev would be the assigned risk measure of
standard deviation of return for each respective investment. By using the “=RAND()”
functions, I was able to generate random numbers between 0 and 1 which
represent probabilities. Now would be a great time for a quick Statistics-101
review.
A normal distribution is a commonly occurring continuous probability
distribution in which data is equally distributed above and below the mean in a
bell curve pattern. The central limit theorem states that under mild conditions
the mean of randomly selected independent variables will be approximately
normally distributed. For this model we assume the returns of each asset will
be normally distributed. This essentially means that 99% of all returns will
fall within plus or minus three standard deviations from the mean or expected
return. This theory is widely used in practice with what is referred to as a
normal distribution table. This table shows the relationship between the number
of standard deviations above or below the mean and the cumulative probability
of this occurring. For example, there is a 1% chance of experiencing a return
that is three standard deviations below the mean. Since cumulative probability
is used, a 99% probability would represent three standard deviations above the mean;
this reflects that 99% of all occurrences fall below this level.
To help better explain this, let us walk through an example.
We will use “asset X” which has an expected return of 10% and a standard deviation
of returns of 3%. Following the central limits theorem, 99% of all returns for
this asset will fall between plus or minus three standard deviation or in this
case 9%. Thus, it can be said with 99% confidence that the expected return will
be between 1% and 19% (10-9=1, 10+9=19). Now if our random number generator in
Excel returns .24 we will look this number up on a cumulative normal
distribution table and find the corresponding standard deviations to be negative
0.0948. Next, we will take this number and multiply it by the standard deviation
of 3% to get negative 0.284%. Lastly, we will add negative 0.284% to the
expected return of 10% to get a random return of 9.7156%.
Clearly, this would be a dunging task if done with manual
calculations, luckily we have Excel. This simulation completed the aforementioned
process of determining a random return for each asset each year over thirty
years. For each year, the respective return on each individual asset is
weighted by its allocation within the portfolio to determine the total expected
annual return based on the random returns. Finally, the ending value of the
portfolio is determined based on the starting balance multiplied by one plus
the annual return. This is completed over thirty years to determine the ending
portfolio balance after thirty years of random returns calculated from the
expected return of each asset and the respective standard deviations of return.
This process of 150 random returns based on expected return
and risk for each respective asset yielded a possible outcome for the
portfolio. While this would give us a single possible estimate, I wanted to
allow for a deeper analysis of total possible outcomes. To do this, I used Excel’s
Data Table function to create 1,000 replications of this process. This causes
Excel to create the 150 random variables needed to calculate a possible outcome
for a single set of thirty years and record the ending value as one data point
then complete the process again to get a second data point, so on and so forth.
Clearly, this would not be feasible to complete this process 1,000 times
without the use of Excel; a total of 150,000 calculations were made.
After completing the simulation of 1,000 possible ending
balances for the portfolio, I then calculated the mean value of the possible ending
balanced of the portfolio, this served as the expected ending balance; the
standard deviation was also measured off this sample. The compound annual
growth rate (CAGR) was calculated by taking the expected ending portfolio
balance and dividing by the beginning balance to determine the delta then
taking the 30th root (for the number of years of returns being
calculated).
Lastly, I wanted to also include
a visual illustration of the return distributions using a histogram. I chose to
use 201 bins with the expected return being the 101st bin. By using
201 bins, I was able to have 100 bins above and below the expected return. To
determine width of each bin, I took the calculated standard deviation of the
returns and multiplied it by two then divided it by 100. This allows the
histogram to display return that fall within plus or minus two standard
deviations of the mean; according to the empirical rule of normal distribution,
two standard deviations will represent approximately 95% of all data points.
-----------------
While this serves as a simple explanation of the process
used to complete this simulation, I intend on creating a second post analyzing
the results. I have included a link to download the simulation below. Enjoy.