Instructor Micky Midha

- Apply the bootstrap historical simulation approach to estimate coherent risk measures.
- Describe historical simulation using non-parametric density estimation.
- Compare and contrast the age-weighted, the volatility-weighted, the correlation-weighted, and the filtered historical simulation approaches.
- Identify advantages and disadvantages of non-parametric estimation methods.

- List of chapters

- Introduction
- Compiling Historical Simulation Data
- Basic Historical Simulation
- HS Using Non-Parametric Density Estimation
- Equal-Weighted Historical Simulation – Issues
- Age-Weighted Historical Simulation
- Volatility-Weighted Historical Simulation
- Correlation-Weighted Historical Simulation
- Filtered Historical Simulation (FHS)
- Advantages of Non-Parametric Methods
- Disadvantages of Non-Parametric Methods

- The essence of non-parametric approaches is that we try to let the P/L data speak for themselves as much as possible, and use the recent empirical (or in some cases simulated) distribution of P/L – not some assumed theoretical distribution – to estimate our risk measures. All non-parametric approaches are based on the underlying assumption that the near future will be sufficiently like the recent past that we can use the data from the recent past to forecast risks over the near future – and this assumption may or may not be valid in any given context. In deciding whether to use any non-parametric approach, we must make a judgment about the extent to which data from the recent past are likely to give us a good guide about the risks we face over the horizon period we are concerned with.
- The most popular non-parametric approach is historical simulation (HS). Loosely speaking, HS is a histogram-based approach: it is conceptually simple, easy to implement, very widely used, and has a fairly good historical record. Refinements can be made to basic HS using bootstrap and kernel methods.

- The first task is to assemble a suitable P/L series for our portfolio, and this requires a set of historical P/L or return observations on the positions in our current portfolio. These P/Ls or returns will be measured over a particular frequency (e.g., a day), and we want a reasonably large set of historical P/L or return observations over the recent past.
- The fact that multiple positions collapse into one single HS P/L series – each period's P/L being the sum, over all positions i, of the amount invested Aᵢ times the historical return Rᵢ,ₜ on that position – implies that it is very easy for non-parametric methods to accommodate high dimensions, unlike the case for some parametric methods. With non-parametric methods, there are no problems dealing with variance-covariance matrices, curses of dimensionality, and the like. This means that non-parametric methods will often be the most natural choice for high-dimension problems.

- Having obtained our historical simulation P/L data, we can estimate VaR by plotting the P/L (or L/P) on a simple histogram and then reading off the VaR from the histogram. To illustrate, suppose we have 1000 observations in our HS P/L series and we plot the L/P histogram shown in this figure. If these were daily data, this sample size would be equivalent to four years’ daily data at 250 trading days to a year. If we take our confidence level to be 95%, our VaR is given by the x-value that cuts off the upper 5% of very high losses from the rest of the distribution. Given 1000 observations, we can take this value (i.e., our VaR) to be the 51st highest loss value. The ES is then the average of the 50 highest losses.
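The procedure just described can be sketched in a few lines of code. This is a minimal illustration on simulated loss data (an assumption; in practice the series would be the portfolio's historical L/P), following the text's convention that with 1000 observations the 95% VaR is the 51st highest loss and the ES is the average of the 50 highest:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical daily L/P series (positive = loss), 1000 observations,
# standing in for a real historical simulation data set.
losses = rng.normal(loc=0.0, scale=1.0, size=1000)

def hs_var_es(losses, confidence=0.95):
    """Basic HS estimates using the convention in the text: with n=1000
    and a 95% level, VaR is the 51st highest loss and ES is the average
    of the 50 losses that exceed it."""
    n = len(losses)
    sorted_losses = np.sort(losses)[::-1]      # highest loss first
    k = int(round(n * (1 - confidence)))       # 50 tail observations
    var = sorted_losses[k]                     # 51st highest loss
    es = sorted_losses[:k].mean()              # mean of the 50 highest losses
    return var, es

var, es = hs_var_es(losses, 0.95)
```

By construction the ES always exceeds the VaR, since it averages only the losses beyond the VaR cutoff.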
- One simple but powerful improvement over basic HS is to estimate VaR and ES from bootstrapped data. A bootstrap procedure involves resampling from our existing data set with replacement. The bootstrap is very intuitive and easy to apply. A bootstrapped estimate will often be more accurate than a ‘raw’ sample estimate, and bootstraps are also useful for gauging the precision of our estimates. To apply the bootstrap, we create a large number of new samples, each observation of which is obtained by drawing at random from our original sample and replacing the observation after it has been drawn. Each new ‘resampled’ sample gives us a new VaR estimate, and we can take our ‘best’ estimate to be the mean of these resample-based estimates. The same approach can also be used to produce resample-based ES estimates—each one of which would be the average of the losses in each resample exceeding the resample VaR—and our ‘best’ ES estimate would be the mean of these estimates.
- The observation that these bootstrapped estimates are much closer to the known true values than our earlier basic HS estimates suggests that bootstrap estimates might be more accurate.
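A minimal sketch of the bootstrap procedure, again on simulated data (hypothetical; any HS L/P series could be substituted):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical fat-tailed L/P data standing in for an HS P/L series
losses = rng.standard_t(df=5, size=1000)

def bootstrap_var_es(losses, confidence=0.95, n_resamples=1000, seed=1):
    """Resample with replacement, estimate VaR and ES on each resample,
    and take the mean of the resample-based estimates as the 'best' one."""
    rng = np.random.default_rng(seed)
    n = len(losses)
    vars_, ess = [], []
    for _ in range(n_resamples):
        sample = rng.choice(losses, size=n, replace=True)
        var = np.quantile(sample, confidence)     # tail cutoff on losses
        vars_.append(var)
        ess.append(sample[sample >= var].mean())  # mean loss beyond the VaR
    return np.mean(vars_), np.mean(ess)

var_b, es_b = bootstrap_var_es(losses)
```

The spread of the resample-based estimates (e.g., their standard deviation) can also be used to gauge the precision of the VaR and ES estimates, as the text notes.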
- HS does not make the best use of the information we have. It also has the practical drawback that it only allows us to estimate VaRs at discrete confidence intervals determined by the size of our data set. For example, if we have 100 HS P/L observations, basic HS allows us to estimate VaR at the 95% confidence level, but not the VaR at the 95.1% confidence level. The VaR at the 95% confidence level is given by the sixth largest loss, but the VaR at the 95.1% confidence level is a problem because there is no corresponding loss observation to go with it. We know that it should be greater than the sixth largest loss (or the 95% VaR), and smaller than the fifth largest loss (or the 96% VaR), but with only 100 observations there is no observation that corresponds to any VaR whose confidence level involves a fraction of 1%. With n observations, basic HS only allows us to estimate the VaRs associated with, at best, n different confidence levels.
- Non-parametric density estimation offers a potential solution to both these problems. The idea is to treat our data as if they were drawings from some unspecified or unknown empirical distribution function. This approach also encourages us to confront potentially important decisions about the width of bins and where bins should be centered, and these decisions can sometimes make a difference to our results. Besides using a histogram, we can also represent our data using naive estimators or kernels, which are generally superior.
- Non-parametric density estimation also allows us to estimate VaRs and ESs for any confidence levels we like and so avoid constraints imposed by the size of our data set. In effect, it enables us to draw lines through points on or near the edges of the ‘bars’ of a histogram. We can then treat the areas under these lines as a surrogate pdf, and so proceed to estimate VaRs for arbitrary confidence levels.
- A simple way to do this is to draw in straight lines connecting the mid-points at the top of each histogram bar, as illustrated in the figure. Once we draw these lines, we can forget about the histogram bars and treat the area under the lines as if it were a pdf. Treating the area under the lines as a pdf then enables us to estimate VaRs at any confidence level, regardless of the size of our data set. Each possible confidence level would correspond to its own tail and we can then use a suitable calculation method to estimate the VaR. Of course, drawing straight lines through the mid-points of the tops of histogram bars is not the best we can do: we could draw smooth curves that meet up nicely, and so on.
- Notice that by connecting the midpoints, the upper portion gains some area and the lower portion loses an equal amount of area. So overall, no area is lost, only displaced.
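The mid-point construction can be sketched as follows: place the cumulative probability of the empirical histogram at each bar's mid-point and interpolate linearly between mid-points, which lets us read off a VaR at a confidence level such as 95.1% that basic HS cannot supply. The bin count of 50 is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(7)
losses = rng.normal(size=1000)   # hypothetical L/P data

def interpolated_var(losses, confidence):
    """Sketch of the mid-point idea: treat the straight lines joining
    the mid-points of the histogram bars as a surrogate pdf, so the
    cumulative probability at each mid-point includes half of that
    bar's mass, then interpolate to any confidence level."""
    counts, edges = np.histogram(losses, bins=50)
    mids = 0.5 * (edges[:-1] + edges[1:])
    cdf = (np.cumsum(counts) - 0.5 * counts) / counts.sum()
    # linear interpolation through (mid-point, cumulative prob) pairs
    return np.interp(confidence, cdf, mids)

var_951 = interpolated_var(losses, 0.951)   # a level basic HS cannot give
```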
- Some empirical evidence by Butler and Schachter (1998) using real trading portfolios suggests that kernel-type methods produce VaR estimates that are a little different from those we would obtain under basic HS. Their work also suggests that the different types of kernel methods produce quite similar VaR estimates, although to the extent that there are differences among them, they found that the ‘best’ kernels were the adaptive Epanechnikov and adaptive Gaussian ones.
- Although kernel methods are better in theory, they do not necessarily produce much better estimates in practice. There are also practical reasons for preferring simpler non-parametric density estimation methods: crude methods like drawing straight-line ‘curves’ through the tops of histograms are more transparent and easier to check.
- One of the most important features of traditional HS is the way it weights past observations. In the basic HS approach, observations up to a specified cutoff period are used and all have equal weights. So if the total number of observations is n, our HS P/L series gives each observation that is less than n periods old the same weight (1/n), and a zero weight if it is older than that.
- This weighting structure has a number of problems. One problem is that it is hard to justify giving each observation in our sample period the same weight, regardless of age, market volatility, or anything else. A good example of the difficulties this can create is given by Shimko et al. (1998). It is well known that natural gas prices are usually more volatile in the winter than in the summer, so a raw HS approach that incorporates both summer and winter observations will tend to average the summer and winter observations together. As a result, treating all observations as having equal weight will tend to underestimate true risks in the winter, and overestimate them in the summer.
- The equal-weight approach can also make risk estimates unresponsive to major events. For instance, a stock market crash might have no effect on VaRs except at a very high confidence level, so we could have a situation where everyone might agree that risk had suddenly increased, and yet that increase in risk would be missed by most HS VaR estimates. The increase in risk would only show up later in VaR estimates if the stock market continued to fall in subsequent days. The increase in risk would show up in ES estimates just after the first shock occurred—which is, incidentally, a good example of how ES can be a more informative risk measure than the VaR.
- The equal-weight structure also presumes that each observation in the sample period is equally likely and independent of the others over time. However, this ‘iid’ assumption is unrealistic because it is well known that volatilities vary over time, and that periods of high and low volatility tend to be clustered together.
- It is also hard to justify why an observation should have a weight that suddenly goes to zero when it reaches age n. Why is it that an observation of age n – 1 is regarded as having a lot of value (and, indeed, the same value as any more recent observation), but an observation of age n is regarded as having no value at all? Even old observations usually have some information content, and giving them zero value tends to violate the old statistical adage that we should never throw information away.
- This weighting structure also creates the potential for ghost effects – we can have a VaR that is unduly high (or low) because of a small cluster of high loss observations, or even just a single high loss, and the measured VaR will continue to be high (or low) until n days or so have passed and the observation has fallen out of the sample period. At that point, the VaR will fall again, but the fall in VaR is only a ghost effect created by the weighting structure and the length of sample period used.
- To address these issues, we can use ‘weighted historical simulation’ approaches, which can be regarded as semi-parametric methods because they combine features of both parametric and non-parametric methods.
- This approach, suggested by Boudoukh, Richardson and Whitelaw (BRW; 1998), is to weight the relative importance of our observations by their age. Instead of treating each observation for asset i as having the same implied probability as any other (i.e., weight = 1/n), we could weight their probabilities to discount the older observations in favor of newer ones. Thus, if w(1) is the probability weight given to an observation 1 day old, then w(2), the probability given to an observation 2 days old, could be λw(1); w(3) could be λw(2) = λ^2 w(1); and so on. The λ term is between 0 and 1, and reflects the exponential rate of decay in the weight or value given to an observation as it ages: a λ close to 1 indicates a slow rate of decay, and a λ far away from 1 indicates a high rate of decay.
- This age-weighted approach has four major attractions.
- First, it provides a nice generalization of traditional HS, because we can regard traditional HS as a special case with zero decay, or λ → 1. If HS is like driving along a road looking only at the rear-view mirror, then traditional equal-weighted HS is only safe if the road is straight, and the age-weighted approach is safe if the road bends gently.
- Second, a suitable choice of λ can make the VaR (or ES) estimates more responsive to large loss observations: a large loss event will receive a higher weight than under traditional HS, and the resulting next-day VaR would be higher than it would otherwise have been. This not only means that age-weighted VaR estimates are more responsive to large losses, but also makes them better at handling clusters of large losses.
- Third, age-weighting helps to reduce distortions caused by events that are unlikely to recur, and helps to reduce ghost effects. As an observation ages, its probability weight falls and its influence diminishes gradually over time. Furthermore, when it finally falls out of the sample period, its weight will fall from λ^n w(1) to zero, instead of from 1/n to zero. Since λ^n w(1) is less than 1/n for any reasonable values of λ and n, the shock – the ghost effect – will be less than it would be under equal-weighted HS.
- Finally, we can also modify age-weighting in a way that makes our risk estimates more efficient and effectively eliminates any remaining ghost effects. Since age-weighting allows the impact of past extreme events to decline as past events recede in time, it gives us the option of letting our sample size grow over time. (Why can’t we do this under equal-weighted HS? Because we would be stuck with ancient observations whose information content was assumed never to date.) Age-weighting allows us to let our sample period grow with each new observation, so we never throw potentially valuable information away. This would improve efficiency and eliminate ghost effects, because there would no longer be any ‘jumps’ in our sample resulting from old observations being thrown away.
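The BRW weighting scheme can be sketched as follows. The weight on an observation i days old is λ^(i-1)(1-λ)/(1-λ^n), so that the weights sum to one, and the VaR is read off the weighted loss distribution. The value λ = 0.98 is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(3)
losses = rng.normal(size=250)   # losses[0] is the most recent observation

def age_weighted_var(losses, lam=0.98, confidence=0.95):
    """BRW age-weighted HS (a sketch): weights decay exponentially with
    age and sum to one; the VaR is the loss at which the cumulative tail
    weight first reaches 1 - confidence."""
    n = len(losses)
    ages = np.arange(1, n + 1)                 # 1 = most recent observation
    w = lam ** (ages - 1) * (1 - lam) / (1 - lam ** n)
    order = np.argsort(losses)[::-1]           # largest loss first
    cum = np.cumsum(w[order])
    idx = np.searchsorted(cum, 1 - confidence) # first loss reaching 5% weight
    return losses[order][idx]

var_aw = age_weighted_var(losses)
```

Note that setting λ very close to 1 makes the weights nearly equal, recovering something close to traditional HS, as the text's "special case" remark suggests.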
- We can also weight our data by volatility. The basic idea – suggested by Hull and White (HW; 1998b) – is to update return information to take account of recent changes in volatility. For example, if the current volatility in a market is 1.5% a day, and it was only 1% a day a month ago, then data a month old understate the changes we can expect to see tomorrow, and this suggests that historical returns would underestimate tomorrow’s risks; on the other hand, if last month’s volatility was 2% a day, month-old data will overstate the changes we can expect tomorrow, and historical returns would overestimate tomorrow’s risks. We therefore adjust the historical returns to reflect how volatility tomorrow is believed to have changed from its past values.
- Actual returns in any period t are therefore increased (or decreased), depending on whether the current forecast of volatility is greater (or less than) the estimated volatility for period t. The HS P/L is now calculated using the adjusted returns and then HS VaRs or ESs are estimated in the traditional way (i.e., with equal weights, etc.)
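A sketch of the HW adjustment, using a simple EWMA volatility estimate in place of the GARCH-type forecasts the text mentions (an assumption, for brevity). Each historical return is rescaled by the ratio of the current volatility forecast to the volatility estimated for its own day:

```python
import numpy as np

rng = np.random.default_rng(11)
returns = rng.normal(scale=0.01, size=500)   # hypothetical daily returns

def vol_weighted_returns(returns, lam=0.94):
    """Hull-White volatility weighting (sketch):
        r*_t = (sigma_now / sigma_t) * r_t
    with sigma_t estimated here by an EWMA recursion. HS VaR/ES is then
    run on the adjusted returns with equal weights."""
    var_t = np.empty_like(returns)
    var_t[0] = returns[:20].var()            # seed the EWMA recursion
    for t in range(1, len(returns)):
        var_t[t] = lam * var_t[t - 1] + (1 - lam) * returns[t - 1] ** 2
    sigma = np.sqrt(var_t)
    sigma_now = sigma[-1]                    # current volatility estimate
    return returns * (sigma_now / sigma)

adj = vol_weighted_returns(returns)
```

Returns from high-volatility days are scaled down and returns from quiet days scaled up, so the adjusted series reflects current market conditions.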
- This HW approach has a number of advantages relative to the traditional equal-weighted and/or the BRW age-weighted approaches:
- It takes account of volatility changes in a natural and direct way, whereas equal-weighted HS ignores volatility changes and the age-weighted approach treats volatility changes in a rather arbitrary and restrictive way.
- It produces risk estimates that are appropriately sensitive to current volatility estimates, and so enables us to incorporate information from GARCH forecasts into HS VaR and ES estimation.
- It allows us to obtain VaR and ES estimates that can exceed the maximum loss in our historical data set: in periods of high volatility, historical returns are scaled upwards, and the HS P/L series used in the HW procedure will have values that exceed actual historical losses. This is a major advantage over traditional HS, which prevents the VaR or ES from being any bigger than the losses in our historical data set.
- Empirical evidence presented by HW indicates that their approach produces superior VaR estimates to the BRW one.
- We can also adjust our historical returns to reflect changes between historical and current correlations. The historic returns are transformed using the historical and revised correlation matrices to yield updated correlation-adjusted returns. The returns adjusted in this way will then have the currently prevailing correlation matrix C and, more generally, the currently prevailing covariance matrix. This approach is a major generalization of the HW approach, because it gives us a weighting system that takes account of correlations as well as volatilities.
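One way to implement this idea, sketched below under stated assumptions, is to strip out the old correlation structure with the inverse Cholesky factor of the historical correlation matrix and impose the current structure with the Cholesky factor of the new one. The specific matrices here are hypothetical illustrations:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical 2-asset return history generated with correlation ~0.3
old_corr = np.array([[1.0, 0.3], [0.3, 1.0]])
returns = rng.multivariate_normal([0.0, 0.0], old_corr, size=1000)

def correlation_weighted(returns, old_corr, new_corr):
    """Correlation weighting (sketch): if A_old and A_new are the Cholesky
    factors of the old and new correlation matrices, the transformation
    A_new @ inv(A_old) maps returns with the old correlation structure
    into returns carrying the currently prevailing one."""
    a_old = np.linalg.cholesky(old_corr)
    a_new = np.linalg.cholesky(new_corr)
    transform = a_new @ np.linalg.inv(a_old)
    return returns @ transform.T

new_corr = np.array([[1.0, 0.8], [0.8, 1.0]])
adj = correlation_weighted(returns, old_corr, new_corr)
```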
- Another promising approach is filtered historical simulation (FHS). This is a form of semi-parametric bootstrap which aims to combine the benefits of HS with the power and flexibility of conditional volatility models such as GARCH. It does so by bootstrapping returns within a conditional volatility (e.g., GARCH) framework, where the bootstrap preserves the non-parametric nature of HS, and the volatility model gives us a sophisticated treatment of volatility.
- To see how FHS works, consider estimating the VaR of a single-asset portfolio over a 1-day holding period.
- The first step in FHS is to fit, say, a GARCH model to our portfolio- return data. We want a model that is rich enough to accommodate the key features of our data, and Barone-Adesi and colleagues recommend an asymmetric GARCH, or AGARCH, model. This not only accommodates conditionally changing volatility, volatility clustering, and so on, but also allows positive and negative returns to have differential impacts on volatility, a phenomenon known as the leverage effect.
- The second step is to use the model to forecast volatility for each of the days in the sample period. The realized returns are then divided by these volatility forecasts to produce a set of standardized returns. These standardized returns should be independently and identically distributed (i.i.d.), and therefore be suitable for HS.
- Assuming a 1-day VaR holding period, the third stage involves bootstrapping from our data set of standardized returns: we take a large number of drawings from this data set, which we now treat as a sample, replacing each one after it has been drawn, and multiply each random drawing by the AGARCH forecast of tomorrow’s volatility. If we take M drawings, we therefore get M simulated returns, each of which reflects current market conditions because it is scaled by today’s forecast of tomorrow’s volatility.
- Finally, each of these simulated returns gives us a possible end-of-tomorrow portfolio value, and a corresponding possible loss, and we take the VaR to be the loss corresponding to our chosen confidence level.
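The steps above can be sketched as follows, with a simple EWMA filter standing in for the AGARCH model (an assumption, to keep the example short): estimate conditional volatility, standardize the returns, bootstrap the standardized returns, scale each draw by the forecast of tomorrow's volatility, and read the VaR off the simulated losses:

```python
import numpy as np

rng = np.random.default_rng(9)
returns = rng.normal(scale=0.01, size=1000)   # hypothetical daily returns

def fhs_var(returns, confidence=0.95, n_draws=10000, lam=0.94, seed=1):
    """Filtered historical simulation (sketch with an EWMA filter)."""
    rng = np.random.default_rng(seed)
    # Step 1-2: conditional variance recursion and standardized returns
    var_t = np.empty_like(returns)
    var_t[0] = returns[:20].var()             # seed the recursion
    for t in range(1, len(returns)):
        var_t[t] = lam * var_t[t - 1] + (1 - lam) * returns[t - 1] ** 2
    sigma = np.sqrt(var_t)
    std_returns = returns / sigma             # ~i.i.d. if the filter fits
    # Step 3: bootstrap the standardized returns with replacement
    draws = rng.choice(std_returns, size=n_draws, replace=True)
    # Step 4: scale by the forecast of tomorrow's volatility
    sigma_tmrw = np.sqrt(lam * var_t[-1] + (1 - lam) * returns[-1] ** 2)
    simulated = draws * sigma_tmrw
    # Final step: VaR is the loss at the chosen confidence level
    return -np.quantile(simulated, 1 - confidence)

var_fhs = fhs_var(returns)
```

Because the standardized draws can land on large past shocks while the scaling reflects current volatility, the simulated losses can exceed the largest loss in the raw historical data, as the text notes.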
- The approach can be extended in two ways: to a multi-asset portfolio, by using a multivariate AGARCH model, and to a longer holding period, by simulating return paths over multiple days.
- FHS has a number of attractions:
- It enables us to combine the non-parametric attractions of HS with a sophisticated (e.g., GARCH) treatment of volatility, and so take account of changing market volatility conditions.
- It is fast, even for large portfolios.
- As with the earlier HW approach, FHS allows us to get VaR and ES estimates that can exceed the maximum historical loss in our data set.
- It maintains the correlation structure in our return data without relying on knowledge of the variance-covariance matrix or the conditional distribution of asset returns.
- It can be modified to take account of autocorrelation or past cross-correlations in asset returns.
- It can be modified to produce estimates of VaR or ES confidence intervals by combining it with an order statistics (OS) or bootstrap approach to confidence interval estimation.
- There is evidence that FHS works well.
- Non-parametric approaches are intuitive and conceptually simple.
- Since they do not depend on parametric assumptions about P/L, they can accommodate fat tails, skewness, and any other non-normal features that can cause problems for parametric approaches.
- They can in theory accommodate any type of position, including derivatives positions.
- There is a widespread perception among risk practitioners that HS works quite well empirically, although formal empirical evidence on this issue is inevitably mixed.
- They are (in varying degrees, fairly) easy to implement on a spreadsheet.
- Non-parametric methods are free of the operational problems to which parametric methods are subject when applied to high-dimensional problems: no need for covariance matrices, no curses of dimensionality, etc.
- They use data that are (often) readily available, either from public sources (e.g., Bloomberg) or from in-house data sets (e.g., collected as a by-product of marking positions to market).
- They provide results that are easy to report and communicate to senior managers and interested outsiders (e.g., bank supervisors or rating agencies).
- It is easy to produce confidence intervals for nonparametric VaR and ES.
- Non-parametric approaches are capable of considerable refinement and potential improvement if we combine them with parametric ‘add-ons’ to make them semi-parametric: such refinements include age-weighting (as in BRW), volatility-weighting (as in HW and FHS), and correlation-weighting.
- If our data period was unusually quiet, non-parametric methods will often produce VaR or ES estimates that are too low for the risks we are actually facing; and if our data period was unusually volatile, they will often produce VaR or ES estimates that are too high.
- Non-parametric approaches can have difficulty handling shifts that take place during our sample period. For example, if there is a permanent change in exchange rate risk, it will usually take time for the HS VaR or ES estimates to reflect the new exchange rate risk. Similarly, such approaches are sometimes slow to reflect major events, such as the increases in risk associated with sudden market turbulence.
- If our data set incorporates extreme losses that are unlikely to recur, these losses can dominate nonparametric risk estimates even though we don’t expect them to recur.
- Most (if not all) non-parametric methods are subject (to a greater or lesser extent) to the phenomenon of ghost or shadow effects.
- In general, non-parametric estimates of VaR or ES make no allowance for plausible events that might occur, but did not actually occur, in our sample period.
- Non-parametric estimates of VaR and ES are to a greater or lesser extent constrained by the largest loss in our historical data set. In the simpler versions of HS, we cannot extrapolate from the largest historical loss to anything larger that might conceivably occur in the future. More sophisticated versions of HS can relax this constraint, but even so, the fact remains that non-parametric estimates of VaR or ES are still constrained by the largest loss in a way that parametric estimates are not. This means that such methods are not well suited to handling extremes, particularly with small- or medium-sized samples.
- However, we can often deal with these problems by suitable refinements. For example, we can deal with volatility, market turbulence, correlation and other problems by semi-parametric adjustments, and we can deal with ghost effects by age-weighting our data and allowing our sample size to rise over time.