|
OR/MS Today - June 2008
Forecasting Software Survey
Forecasting at Steady State?
Perhaps, but ubiquitous software continues to play a crucial role in many aspects of life.
By Jack Yurkiewicz
Many professional and casual users do explanatory and time series forecasting in medicine, business and academia. The forecast "hits" and "misses," particularly the latter, sometimes make headlines. The large increase in college applications for the class of 2012, especially to prestigious universities, caught some provosts by surprise. Box office returns for "Juno" (on the high side) and "Leatherheads" and "Stop-Loss" (which underperformed) had repercussions in the movie industry. Doctors simultaneously promulgate and de-emphasize the charting of homocysteine levels to help predict the risk of coronary heart disease. Watching the approval numbers as a function of time for Clinton vs. Obama vs. McCain has become a mainstay of cable news networks. The trend curves for the price of oil, and finding explanatory models for said prices, are subjects of discussion in academia, on Wall Street and at many dinner tables.
In preparing this, the seventh biennial edition of OR/MS Today's forecasting product survey, an analogous example arose. Has the study of and the software for forecasting reached steady state? In trying to answer the first part of that question, a look at Amazon.com shows only two forecasting texts with copyright dates after 2006, when OR/MS Today published its last forecasting product survey. BarnesandNoble.com shows four. As for the latter part of the question, more statistical software programs have indeed released updates since then, but not many claim that the enhancements involved forecasting capabilities or features, and few new dedicated forecasting programs have been identified. This survey tries to answer the latter part of the posed question. That is, has the development of forecasting software reached the plateau stage of a Gompertz curve?
While the arguable answer seems affirmative, what is almost surely not debatable is that forecasting plays a crucial role in many aspects of life.
Practitioners use forecasting software emanating from two camps. The first source is the stand-alone, dedicated forecast product, such as Forecast Pro, Autobox and others, which just does a variety of forecasting procedures. Typically, these include regression, Box-Jenkins models and the exponential smoothing models of Brown, Holt and Winters. The second is the general statistical software product, such as SPSS, Minitab, SAS, Systat, Statgraphics, NCSS and others, which includes forecasting as one of the many statistical techniques available.
There are two main reasons why a practitioner may want to buy and use a dedicated forecast program over a general statistics product. First, some dedicated forecast programs may have specific techniques that the general statistics program may not. These include state space smoothing algorithms, econometric models, transfer function models and others. Second, some dedicated forecasting products offer a higher level of "automation," which translates into ease of use, than general statistics programs do. This degree of automation has benefits and caveats for users, and we will mention a few of these later.
As in our previous surveys, we can delineate forecasting software into three categories. We call the first automatic forecasting software. Automatic software analyzes the data, makes a recommendation (accompanied by a statistical reason for why the product made the recommendation) of a forecasting procedure or model, optimizes the parameters of the model, and gives forecasts, plots and various statistical summary measures. The user can accept the recommendation or reject it. If the latter, then the user chooses an alternative model or technique, and the software optimizes the parameters and gives the resulting forecasts, plots and statistical summary measures.
For example, Forecast Pro, a dedicated forecasting product, typically starts in its default "Expert Selection" mode. Figure 1 shows the program's analysis of an airline's enplanement data.
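The automatic workflow described above (fit several candidate models, optimize each one's parameters, then recommend one on a statistical criterion) can be illustrated with a minimal Python sketch. It is not how Forecast Pro or any other product actually makes its choice. The two candidate models, the coarse parameter grid, the start-up values, the Gaussian form of BIC and the function names are all assumptions made for illustration.

```python
import math

def ses_errors(y, alpha):
    """One-step-ahead errors for simple exponential smoothing."""
    level = y[0]  # assumed start-up value: the first observation
    errors = []
    for obs in y[1:]:
        errors.append(obs - level)
        level = alpha * obs + (1 - alpha) * level
    return errors

def holt_errors(y, alpha, beta):
    """One-step-ahead errors for Holt's linear-trend smoothing."""
    level, trend = y[0], y[1] - y[0]  # assumed start-up values
    errors = []
    for obs in y[2:]:
        forecast = level + trend
        errors.append(obs - forecast)
        new_level = alpha * obs + (1 - alpha) * forecast
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return errors

def bic(errors, n_params):
    """Assumed Gaussian form: n*ln(MSE) + k*ln(n). Products may differ."""
    n = len(errors)
    mse = max(sum(e * e for e in errors) / n, 1e-12)  # floor avoids log(0)
    return n * math.log(mse) + n_params * math.log(n)

def recommend(y):
    """Return (BIC, model name, parameters) for the lowest-BIC candidate."""
    grid = [k / 10 for k in range(1, 10)]
    candidates = []
    for a in grid:
        # Drop the first SES error so both models are scored on y[2:]
        candidates.append((bic(ses_errors(y, a)[1:], 1), "ses", (a,)))
        for b in grid:
            candidates.append((bic(holt_errors(y, a, b), 2), "holt", (a, b)))
    return min(candidates)

# Hypothetical trending series, not the enplanement data from the article
series = [100 + 3 * t + (t * 7 % 5 - 2) for t in range(36)]
score, model, params = recommend(series)
print(model, params, round(score, 2))
```

The user of such a routine sees only the recommended model and its parameters, which is the convenience, and the risk, of the automatic category.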
With this high ease-of-use level come potential pitfalls. The casual or untrained user may come to rely on the software as a forecasting black box. Thus, from Figure 1, if the inexperienced user does not know what the Box-Jenkins ARIMA(0,1,1)*(0,1,1) model is, what its assumptions are, etc., but naively takes the forecasts because the software recommended this procedure as appropriate, then he or she may be inviting criticism. Automatic forecasting is more likely found in the dedicated products. However, a few general statistical products (e.g., SPSS with its Trends add-on, Statgraphics) do forecasting in the automatic mode.
The second software category is called semiautomatic. Here the user must specify the model or technique, and the software will find the optimal parameters for that model and display the resulting forecasts and ancillary output. Some general statistics programs (e.g., NCSS) and most dedicated forecasting products are semiautomatic. Thus, a user must have knowledge of the various forecasting procedures if he or she is to use semiautomatic software. The calculations to find the parameters of the designated model (e.g., the three smoothing constants for Winters' method) that minimize some statistical measure, such as Schwarz's Bayesian Information Criterion (BIC), should be left to the software.
The last group is manual software. Here the user must specify both the model and the parameters for that model. Many general statistics programs fall into this category. Clearly, the major drawback in using manual software is determining the optimal parameters for the model chosen. For example, my students were working with a time series of monthly (Oct. 1997 through Sept. 2007) airline enplanement data from a particular carrier. Figure 2 shows the Excel time plot of this data.
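For readers who have not met the ARIMA(0,1,1)*(0,1,1) model that the software recommends in Figure 1, its usual textbook form is given here. It assumes a seasonal period of 12 because the data are monthly. B is the backshift operator (B y_t = y_{t-1}), theta and Theta are the nonseasonal and seasonal moving-average parameters, and epsilon_t is white noise. Sign conventions for the moving-average terms vary among texts and products, so this is one common convention, not necessarily the one a given program uses.

```latex
(1 - B)\,(1 - B^{12})\, y_t \;=\; (1 - \theta B)\,(1 - \Theta B^{12})\, \varepsilon_t
```

In words, the series is differenced once at lag 1 and once at lag 12, and what remains is modeled as a moving average with one nonseasonal and one seasonal term.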
The data exhibited monthly seasonality and a small upward trend, and my students recommended Winters' method to make the forecasts. With manual software, zeroing in on the appropriate smoothing constants while minimizing the Akaike Information Criterion (AIC) became a tedious trial-and-error process. Figure 3 shows Systat's dialog box for Winters' method, indicative of what is seen in manual software.
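To make the trial-and-error concrete, here is a minimal Python sketch of the additive form of Winters' method with a coarse grid search over the three smoothing constants. It is an illustration under stated assumptions, not the procedure of any product named here. The additive form, the start-up values, the use of RMSE in place of AIC as the target, the 0.1 grid step and the synthetic series are all choices made for the example.

```python
import math

def winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Winters' method; returns (one-step errors, forecasts).

    Needs at least two full seasons of data (len(y) >= 2 * m).
    """
    # Assumed start-up values: first-season mean, season-to-season slope,
    # and raw deviations from the first-season mean as seasonal terms
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]
    errors = []
    for t in range(m, len(y)):
        s_old = season[t % m]  # seasonal term from one season earlier
        errors.append(y[t] - (level + trend + s_old))
        new_level = alpha * (y[t] - s_old) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - new_level) + (1 - gamma) * s_old
        level = new_level
    n = len(y)
    forecasts = [level + h * trend + season[(n + h - 1) % m]
                 for h in range(1, horizon + 1)]
    return errors, forecasts

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def grid_search(y, m):
    """Try every (alpha, beta, gamma) on a 0.1 grid; keep the lowest RMSE."""
    grid = [k / 10 for k in range(1, 10)]
    best = None
    for a in grid:
        for b in grid:
            for g in grid:
                score = rmse(winters_additive(y, m, a, b, g, 0)[0])
                if best is None or score < best[0]:
                    best = (score, a, b, g)
    return best

# Hypothetical monthly series (5 years), not the article's enplanement data
pattern = [-20, -25, -5, 0, 5, 20, 30, 28, 2, -5, -18, -12]
series = [200 + 1.5 * t + pattern[t % 12] + (t * 7 % 5 - 2) for t in range(60)]

best_rmse, a, b, g = grid_search(series, 12)
_, next_year = winters_additive(series, 12, a, b, g, 12)
print(a, b, g, round(best_rmse, 3))
```

The grid here has 729 combinations, which is what a user of manual software would be exploring by hand. Semiautomatic products typically replace the grid with a numerical optimizer, much as Excel's Solver does.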
As we mentioned in previous surveys, the current versions of many products differ in their flexibility. Some products allow the user to withhold a portion of the data for the model fit and do a validation for the remainder. Experienced forecasting users may want the software to perform statistical tests on the within-sample errors and show various statistics for the out-of-sample errors. Certain products allow the user to specify which statistical measure (mean square error, AIC, BIC, etc.) should be minimized to find the optimal parameters of the model, some don't give the user the choice, and a few do not even explicitly tell the user (without resorting to a search in the documentation) which statistic is minimized.
Software also differs in its output. Ideally, the experienced user would like to know the various summary statistics of the model (MSE, RMSE, MPE, MAPE, AIC, BIC, Ljung-Box statistic, etc.). Some products give these and others, while some give many fewer. All products give time plots of the data, with or without the forecasts. However, if the data set is large, the resulting graph may have a confusing "squished" look to the data and the software may not give the user the flexibility of adjusting the view. Figure 4 shows Minitab's default output of the enplanement data, which can, with some effort, be modified to make the display easier to interpret.
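A minimal Python sketch of the holdout idea and of four of the summary statistics named above may help. The definitions follow common textbook forms, with errors taken as actual minus forecast and MPE and MAPE expressed in percent. Individual products may define or scale them differently. The seasonal naive forecaster and the synthetic series are stand-ins chosen only to keep the example self-contained.

```python
import math

def error_measures(actual, forecast):
    """MSE, RMSE, MPE and MAPE, with error = actual - forecast (assumed)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # Percentage measures require nonzero actual values
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

def seasonal_naive(history, m, horizon):
    """Stand-in model: repeat the last observed season."""
    start = len(history) - m
    return [history[start + (h % m)] for h in range(horizon)]

# Hypothetical monthly series (4 years), not the article's enplanement data
pattern = [-20, -25, -5, 0, 5, 20, 30, 28, 2, -5, -18, -12]
series = [200 + 1.5 * t + pattern[t % 12] for t in range(48)]

# Withhold the final 12 months for validation and fit on the rest
fit_part, holdout = series[:-12], series[-12:]
out_of_sample = error_measures(holdout, seasonal_naive(fit_part, 12, 12))
for name, value in out_of_sample.items():
    print(name, round(value, 3))
```

MPE and MAPE answer different questions. With actual values of 100 and 200 and forecasts of 110 and 180, the percentage errors cancel and MPE is zero, while MAPE is 10 percent. This is one reason an experienced user wants to see several measures.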
In informal testing, I have found that different programs routinely give different forecasts for the same data, even when using the same model and the same parameters. Using a sample of a few software programs I own, I specified Winters' method as the forecasting technique on the enplanement data. If a product was semiautomatic (Forecast Pro, NCSS, Statgraphics), I let it find the optimal smoothing constants, and if a program was manual (Minitab, Systat), I specified the smoothing constants. Where possible, I indicated that RMSE should be minimized. I also used an Excel template I wrote that does Winters' method and uses Solver to find the smoothing parameters to minimize RMSE.
Table 1 summarizes the monthly forecasts for the subsequent year from the various products. The Auto column shows the forecasts found when the software automatically found the three optimal smoothing parameters, while Manual gives the forecasts using the three parameters I specifically indicated (which came from my template and Solver's answer). The table also gives the RMSE for the different models.
While the forecast differences were, for most products, not substantial, they were indeed different. In addition, this was well-behaved data, following almost a "textbook" pattern. On messier data, the comparative results differed more dramatically. The reasons can vary, from which statistic the software is trying to minimize, to how the software gets the initial conditions for the recursive procedure (i.e., the initial estimates for the intercept and slope of the trend line and the twelve seasonal indices for Winters' method). Unfortunately, very few forecasting products address these initial conditions in their documentation, and the user has no idea what the assumptions are. I show this comparison not to denigrate or laud any product, but to point out that forecasts may be a function of the software, and the user should be aware of this.
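The effect of initial conditions is easy to reproduce. The following minimal Python sketch runs the additive form of Winters' method twice on the same series with the same three smoothing constants, changing only the start-up values. Both start-up schemes, their names ("naive" and "detrended") and the synthetic series are assumptions made for illustration. They are not taken from the documentation of any product mentioned here.

```python
def start_values(y, m, method):
    """Two hypothetical ways to initialize level, trend and seasonals."""
    mean1 = sum(y[:m]) / m
    mean2 = sum(y[m:2 * m]) / m
    if method == "naive":
        # First-season mean as level, zero slope, raw deviations as seasonals
        return mean1, 0.0, [y[i] - mean1 for i in range(m)]
    if method == "detrended":
        trend = (mean2 - mean1) / m
        mid = (m - 1) / 2
        # Remove the estimated trend line before taking seasonal deviations
        season = [y[i] - (mean1 + (i - mid) * trend) for i in range(m)]
        # Shift the level from mid-season to the last period of season 1
        return mean1 + mid * trend, trend, season
    raise ValueError(f"unknown method: {method}")

def winters_forecast(y, m, alpha, beta, gamma, horizon, method):
    """Additive Winters' method; only the start-up values depend on method."""
    level, trend, season = start_values(y, m, method)
    for t in range(m, len(y)):
        s_old = season[t % m]
        new_level = alpha * (y[t] - s_old) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - new_level) + (1 - gamma) * s_old
        level = new_level
    n = len(y)
    return [level + h * trend + season[(n + h - 1) % m]
            for h in range(1, horizon + 1)]

# Hypothetical monthly series (4 years): exact linear trend plus a fixed
# seasonal pattern that sums to zero; not the article's enplanement data
pattern = [-20, -25, -5, 0, 5, 20, 30, 28, 2, -5, -18, -12]
series = [200 + 1.5 * t + pattern[t % 12] for t in range(48)]

same_constants = (0.2, 0.1, 0.1)  # alpha, beta, gamma used in both runs
naive = winters_forecast(series, 12, *same_constants, 12, "naive")
detrended = winters_forecast(series, 12, *same_constants, 12, "detrended")
for month, (f1, f2) in enumerate(zip(naive, detrended), start=1):
    print(month, round(f1, 2), round(f2, 2))
```

On this artificial series the detrended start-up recovers the trend and seasonal pattern exactly, so its forecasts continue the pattern, while the naive start-up is still correcting its initial error after four years of data. Two programs that share a model and smoothing constants but differ in this one undocumented step will therefore print different numbers.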
If you are interested in getting a forecasting program, or want to try something different, I recommend that you first look at the techniques that the software can do. Next, determine the level of automation of the product. There are other issues, such as its flexibility, the quality and quantity of the output, the ease-of-learning and the ease-of-use of the software. These are much harder to judge. The best way is to try the software, but this may be difficult. Check if the vendor has a trial version to download; unfortunately, most do not. Sometimes vendors make a "student" version available, at a greatly reduced price, for academics to try. Finally, contact the vendor with your specific questions. Users tell us that vendors are helpful and want to satisfy them with their selection.
Jack Yurkiewicz (yurk@optonline.net) is a professor of management science in the MBA program at the Lubin School of Business, Pace University, New York. Besides management science, he teaches business statistics, operations management and forecasting. His current interests include developing distance-learning courses for these topics and assessing their effectiveness.
OR/MS Today copyright © 2008 by the Institute for Operations Research and the Management Sciences. All rights reserved. |