Real vs. Hypothetical

Elsewhere on this site, the generally excellent performance of the Traded Portfolio’s main program is described as “hypothetical.” That is the traditional label for results that were never realized with real money, and it is not an altogether bad choice of words. Its meaning may seem clear. It certainly means that the charts are not a history of any actual financial performance of the program currently offered here, since the program was created only recently whereas the charts go back much farther in time.

    But anyone who performs analyses of the Traded Portfolio kind, distributes the findings and must label them hypothetical has to worry that some readers will take that simple label to mean that the good outcomes shown owe a good deal to wishful thinking. Some may even suspect that the program’s specifications were fudged to maximize returns over just the history shown, so that it would be merely fortuitous if such happy outcomes were ever actually brought about by the programs in the real-world future. Otherwise, why would anything that sounds so much like a disclaimer be needed? Or so the reasoning may well go.

    Nothing in this note is meant to suggest that there is anything wrong with the old saw “past performance is no guarantee of future results”, especially where financial matters are concerned. But beyond that, two things about past performance are worth getting straight. One is whether past performance, in the particular way the analyst wields it, can at least be considered an unbiased estimate of future performance. The other is how often the program would have failed in the past. We might find, say, that the program would have finished 50 prior years with a splendid cumulative return… if only we had had the program back then (we didn’t). But if, in spite of that splendid finish, there were a couple of really bad decades, we would want to know about it. Even though we can’t assume that the program would perform badly in two of the next five decades, we are nonetheless informed about the risk. That, in a nutshell, is how and why testing a program on historical data can be highly beneficial if done properly, even though we realize throughout that future performance may deviate substantially from any unbiased estimate based on historical data.
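    To make the “bad decades” point concrete, here is a minimal Python sketch, assuming annual returns expressed as decimals (0.05 for +5%). The function names `compound` and `block_returns` and the 50-year return series are hypothetical, invented only for illustration; this is not the Traded Portfolio’s own code or data. It simply shows how a splendid cumulative finish can coexist with losing decades.

```python
from math import prod


def compound(returns):
    """Compound periodic returns (0.05 means +5%) into a single total return."""
    return prod(1.0 + r for r in returns) - 1.0


def block_returns(annual_returns, block_years=10):
    """Compound the returns of each consecutive block of years (e.g. each decade)."""
    return [
        compound(annual_returns[i:i + block_years])
        for i in range(0, len(annual_returns), block_years)
    ]


# Hypothetical 50-year history of annual returns, invented for illustration only.
annual = [0.15] * 10 + [-0.02] * 10 + [0.12] * 10 + [-0.01] * 10 + [0.14] * 10

decades = block_returns(annual)
losing_decades = sum(1 for d in decades if d < 0)

print(f"cumulative return over 50 years: {compound(annual):.0%}")
print(f"losing decades: {losing_decades} of {len(decades)}")
```

    With these made-up numbers the 50-year total is well over 3,000%, yet two of the five decades lose money, which is exactly the kind of risk information the cumulative figure alone hides.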

    The “hypothetical” label may also connote that “real-money” results are necessarily superior, and that investors would be better off throwing in their lot with a good real-money performer. But is it as simple as that? If you’re an investor or advisor, exactly how do you go about making your choices based on real-money results?

Stop!

Analyzing real-money performers is all that is ever done on this site: choices are made based on real-money results, albeit using original computer code and established principles of statistical inference. The only other difference is this: whereas an investor may try to choose a security or an investment advisor based on past real-money performance (if the advisor is prepared to part with such information), the market-risk-avoiding allocation scheme demonstrated in these pages instead chooses among ETFs or other funds based on their past performance. That is, of course, also real-money performance, and in some cases it results from the active management of people you could certainly call investment advisors.

The Shown Results Aren’t Maximized

Some of the tests to which the Traded Portfolio subjects its equity-allocation rules are discussed here. The asset allocation rule specification used on any given day is never determined using data from that day or from later days; it is always a specification determined from prior, trailing data, with respect to which the price changes of the given day are “out-of-sample.”

    Out-of-sample testing is the main Traded Portfolio method for supplementing academic research and further qualifying a prospective rule specification. David R. Aronson states some of the advantages of out-of-sample testing in his book Evidence-Based Technical Analysis.

Out-of-sample testing is based on the valid notion that the performance of a data-mined rule, in out-of-sample data, provides an unbiased estimate of the rule’s future performance. Of course, due to sampling variation, the rule may perform differently in the future than it does in its out-of-sample test, but there is no reason to presume it will do worse.

    To reiterate, the out-of-sample data are so called because they play no role in specifying the “data-mined” rule, to use hypothesis-testing parlance. The rule’s specification is in no way fudged to maximize the returns shown, all of which are out-of-sample returns.

It’s More Involved Than You May Have Thought

Let’s agree that the subject here is active managers, not passive ones. True, the charted examples on this site currently happen to involve funds that are essentially passively managed, with holdings rotated in and out only on the basis of factors such as value and size. But the Traded Portfolio project could just as well extend its scope to actively managed holdings, to anything managed in a Berkshire Hathaway-like or even CTA-like fashion, and its methods would be the same.

    So what is your real-money method for finding the best active manager? Suppose, just for the sake of argument, you think that somewhere out there is a seasoned pro with sound judgment who keeps a steady oar in the water at all times, even when the seas are rough, and who can be relied upon to triumph in the long run. Or… maybe not. Maybe it is better to believe that hot hands come and go, like basketball stars on scoring streaks that start and end suddenly, and so we should make sure we are able to find out who has the hot hands.

    The first thought might be to select whoever performed best in recent years. But “recent” could mean, say, the last three, five or ten years. So right away we have three different rules for selecting a performer or, if you prefer, one rule with three values of the lookback period, the number of trailing years considered. The choice of lookback period deserves attention because the longest one might find us the seasoned pro with the steady oar, if any such people really are reliably good performers, whereas the shortest might allow us to identify the best hot-hands prospect and then drop him quickly in favor of another when he conks out.

    And of course it matters which lookback period we pick, because each period can rank past performers radically differently, and the different rankings can’t all serve equally well to identify the best performer in future years, not if past performance matters at all. So how do we know which lookback period is best? Yes, it sounds insane, but it’s inescapable reality: we just wanted to find the best performer, and now we first have to find the best lookback period.
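    A tiny Python sketch illustrates how the lookback period alone can change the ranking. It assumes, hypothetically, that “best” means the highest compound return over the trailing years; the manager names, the `rank_managers` function and the return histories are all invented for illustration and do not describe any real fund or the Traded Portfolio’s own method.

```python
from math import prod


def compound(returns):
    """Compound periodic returns (0.05 means +5%) into a single total return."""
    return prod(1.0 + r for r in returns) - 1.0


def rank_managers(history, lookback):
    """Managers sorted best-first by compound return over their last `lookback` years."""
    return sorted(
        history,
        key=lambda name: compound(history[name][-lookback:]),
        reverse=True,
    )


# Hypothetical 10-year annual return histories (oldest year first), invented for illustration.
history = {
    "seasoned_pro": [0.08] * 10,
    "late_bloomer": [0.00] * 5 + [0.16] * 5,
    "hot_hand": [0.00] * 7 + [0.25] * 3,
}

for lookback in (3, 5, 10):
    print(lookback, rank_managers(history, lookback))
```

    With these made-up numbers, each of the three lookbacks crowns a different manager: the 3-year lookback favors `hot_hand`, the 5-year favors `late_bloomer` and the 10-year favors `seasoned_pro`.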

    We could form trailing data windows, use them to select the best real-money performer each year, and walk the windows forward through, say, the last 20 years. With a 10-year trailing window, for example, we would start with the window positioned over years 11–20 ago, find the best performer within it, and record that performer’s return for the following year, 10 years ago. Then we would move the 10-year window forward by a year, find the best performer again (which might well be a different one) and record its return for the year 9 years ago, and so on all the way up to the present, at which point we could total up the best-performer returns obtained with the 10-year window. We would do the same with the 5-year and 3-year windows, likewise recording the best-performer returns starting 10 years ago, just as with the 10-year window.
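    The procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Traded Portfolio’s code: years are list indices (0 = oldest), “best” is assumed to mean the highest compound return inside the trailing window, and the function names and manager histories are hypothetical.

```python
from math import prod


def compound(returns):
    """Compound periodic returns (0.05 means +5%) into a single total return."""
    return prod(1.0 + r for r in returns) - 1.0


def best_performer(history, end, lookback):
    """Manager with the highest compound return over years [end - lookback, end)."""
    return max(history, key=lambda name: compound(history[name][end - lookback:end]))


def walk_forward(history, first_year, last_year, lookback):
    """Each year, pick the trailing-window winner and record its return in that year.

    The window for `year` ends just before `year`, so the recorded return is
    out-of-sample with respect to the choice of manager.
    """
    if first_year < lookback:
        raise ValueError("not enough history before first_year for this lookback")
    picks = []
    for year in range(first_year, last_year + 1):
        winner = best_performer(history, year, lookback)
        picks.append((year, winner, history[winner][year]))
    return picks


# Hypothetical 20-year histories (oldest year first), invented for illustration only.
history = {
    "steady": [0.07] * 20,
    "fading": [0.20] * 10 + [-0.05] * 10,
    "streaky": [0.00, 0.25] * 10,
}

# Evaluate the last 10 years (indices 10..19) with each lookback, as in the text.
for lookback in (3, 5, 10):
    picks = walk_forward(history, 10, 19, lookback)
    total = compound([r for _, _, r in picks])
    print(f"{lookback:>2}-year lookback: total {total:.1%}")
```

    Comparing the three printed totals and keeping the largest is precisely the step whose validity is questioned in the paragraph that follows.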

    And so you might think: well… that was quite a bit of work, but it’s finished and we have what we need to know. It’s best to use a lookback period of 5 years (if the 5-year lookback period selected the performers that produced the highest returns). Sadly, that’s not enough! What has just been outlined isn’t truly out-of-sample testing, even though the year whose returns were recorded was always the year right after the end of the trailing window used to select the best performer. You could indeed say that it involved out-of-sample testing of each selected performer, but it is not out-of-sample testing of the lookback period that was then declared best. That lookback period was “cherry-picked” using the results of the whole test span, so what are the odds that it will perform as well in the future? There are reasonable fixes for that, but they are a bit complicated for this note.
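    Without going into those fixes in full, the idea behind one generic remedy, often called nested walk-forward selection, can at least be pictured in a simplified Python sketch. It is not necessarily the fix the author has in mind. The assumptions are the same hypothetical ones as before (list-index years, “best” meaning highest trailing compound return), plus a hypothetical `meta_window` parameter: each year, every candidate lookback is scored only on how its picks would have done in the few years just before that year, and the winning lookback is then used to pick that year’s manager.

```python
from math import prod


def compound(returns):
    """Compound periodic returns (0.05 means +5%) into a single total return."""
    return prod(1.0 + r for r in returns) - 1.0


def best_performer(history, end, lookback):
    """Manager with the highest compound return over years [end - lookback, end)."""
    return max(history, key=lambda name: compound(history[name][end - lookback:end]))


def pick_return(history, year, lookback):
    """Return earned in `year` by the manager selected with data before `year`."""
    return history[best_performer(history, year, lookback)][year]


def nested_walk_forward(history, first_year, last_year, lookbacks, meta_window):
    """Choose the lookback each year using only earlier years, then pick the manager.

    Each candidate lookback is scored by compounding its walk-forward picks over
    the `meta_window` years just before `year`. Every score uses data from before
    `year` only, so both the lookback and the manager are out-of-sample for `year`.
    """
    if first_year - meta_window - max(lookbacks) < 0:
        raise ValueError("not enough history for this meta_window and these lookbacks")
    results = []
    for year in range(first_year, last_year + 1):
        meta_years = range(year - meta_window, year)
        scores = {
            lb: compound([pick_return(history, y, lb) for y in meta_years])
            for lb in lookbacks
        }
        chosen = max(scores, key=scores.get)  # lookback chosen without seeing `year`
        results.append((year, chosen, pick_return(history, year, chosen)))
    return results


# Hypothetical 20-year histories (oldest year first), invented for illustration only.
history = {
    "steady": [0.07] * 20,
    "fading": [0.20] * 10 + [-0.05] * 10,
    "streaky": [0.00, 0.25] * 10,
}

for year, lookback, r in nested_walk_forward(history, 15, 19, (3, 5, 10), meta_window=5):
    print(year, lookback, f"{r:.0%}")
```

    The price of this extra layer is data: with a 10-year maximum lookback and a 5-year scoring window, the first truly out-of-sample year can only come 15 years into the history.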

Complications

    Hardly anyone keeps a job in finance for decades, and financial institutions of any kind rarely stay the same for lengthy periods either. Particularly if we are investigating an actively managed fund, we are likely to run into a shortage of relevant data, at least with a range of lookback periods extending to ten years, which is longer than many such funds have existed under their current management. With so little data, it would be difficult to confidently reject a null hypothesis such as the claim that the past records of real-money performers don’t matter when it comes to predicting their future performance.

    The data-insufficiency problem is greatly eased if far shorter lookback periods are considered. The Traded Portfolio’s current program is much more involved than the find-the-best-lookback-period example here, but it too uses lookback periods, ones reaching back up to about a year. That shorter time scale makes the management of asset allocations quite active and allows timely responses: tests show that the dot-com collapse, the Lehman Brothers/subprime crisis, the 2011 16% drop in the S&P 500 and the Covid-19 dip would largely have been avoided without any reduction in returns over the long haul.

Academic Efforts

A number of academics have taken up the question of whether it pays to cast your lot with the best past performers. Much of their work can currently be found by searching Google Scholar for the string “past performance repeat persist fund.”

    You’ll find variations among the studies. One says that there is persistence but that it is mainly due to managers who correctly select industry groups. Another cautions that what persistence there is comes merely from some funds charging high fees (yielding persistently worse performance). Others report persistence only in the negative sense: bad-performing funds alone continue to perform as they did in the past. And one paper says that hedge funds showed persistent performance only after bear-market periods. All in all, the picture is not simple, and you have your work cut out for you if you are to make effective use of real-money performance histories of any kind.