OR/MS Today - October 2001



2001 Statistical Analysis Software Survey


Looking for Meaning in an Uncertain World

2001 survey of statistical analysis software products

By James J. Swain


If money is the lifeblood of the economy, then raw data and information must be the nerve impulses and statistical decision-making a part of the central nervous system. The often-remarked combination of computers and communications not only makes the metaphorical comparison more apt, but also has contributed to the incredible growth in the magnitude and availability of data. While billions of dollars flow around the world daily, surely trillions of bits and bytes of data are being generated, transmitted and stored, available for interpretation and action. For instance, a large retailer such as Wal-Mart may have 20 million transactions per day, and AT&T will process several hundred million calls per day. Data now comes in such torrents that analysis is more of a bottleneck than data collection. The maintenance of data is also an important issue, giving rise to data warehousing products to manage and control data throughout the corporation.

The Internet is increasingly a source of data and not merely a conduit for it. Merchandisers not only sell to us through the Internet, but they can also observe our shopping and purchasing habits. Corporations can receive sales data instantaneously from all outlets and use it to purchase new items as well as to schedule deliveries, and the data is available for analysis for everything from forecasting to observing the response to promotions. In the last decade the very magnitude of data collections has made it possible to hunt for relationships among combinations of variables that would not have been anticipated, and this data mining is of interest in the medical, marketing and risk management fields, among others.

Statistical data collection and analysis is not limited to the corporate world. The federal government is an important source and user of statistics on everything from the economy, labor, the environment and health to agriculture, education and population. For all of their importance, these activities generally receive little media attention outside the scheduled announcements of sensitive economic indicators. The controversial, unadjusted census results are already figuring in the allocation of federal grants and are currently the basis for political reapportionment battles in many states. Here, too, there is an incredible amount of data that is available via computer download and CD-ROM.

Statistical data analysis is at the heart of a number of problems of particular interest to the ORMS professional because these problems often cut across the length and breadth of the organization, affecting both its operations and management. In light of the terror attacks of last month, strategic risk assessment of data networks, the safety and redundancy of data, and methods of improvement will surely occupy practitioners from our field.

Who else but the ORMS professional is so uniquely positioned, not only to appreciate the stochastic and uncertain nature of the data, but also to take a systems approach that transcends departmental boundaries or to lead in team-based solutions? Statistical analysis of the raw data provides the empirical grounding of our analytical models, such as the stochastic and the simulation models, as well as the grist for the scheduling and optimization procedures. Data analysis is also needed in the validation of these models as well as in the continuing feedback monitoring of plans and products once implemented. Finally, the data will be collected using designed experiments to test relations that have been empirically observed in order to provide confirmation of the relationships and to estimate magnitudes.

As noted in the previous statistical software survey [1] in OR/MS Today, the use of statistics by industry has been growing rapidly in the last two decades, influenced by Japanese industrial practices and by initiatives such as the Motorola "Six Sigma" program. Another survey article for statisticians [2] noted that the penetration of statistical methods at many corporations is both broad and sustained, resulting in a more data-driven, quantitative approach to problem solving. For instance, manufacturing and service operations have increasingly distributed the effects of product and process improvements to the operating teams themselves, whether for quality improvement, cost reduction, or enhancements such as tighter specifications, improved reliability or durability. Statistical design of experiments is now increasingly prevalent at all levels of the organization and rarely relegated to statistical or other service groups, except where complexity indicates that special design expertise is required.

Echoing these observations, MacDonald [3] highlights the role that statistical analysis plays within the corporation, particularly with the increased emphasis on shortening the design cycle. That is, once consumer research identifies that a need for a process or a product exists, the technical cycle (design to manufacturing) must be made as short as possible in order to forestall competitors and to secure market share. He identifies the importance of information technology in linking these stages together, and looks at the varied roles that statistics will play. He also identifies an increasing emphasis on visual analysis and interpretation, and on computing skills "linked to common business systems (e.g., spreadsheets) and database issues (e.g., organization and management)."

Changes in Statistical Software


Statistical software is widely used in the ORMS profession, and this survey of products is an update of the survey published in 1999. The biennial survey of statistical software products in this issue provides capsule information about more than 60 products selected from almost 100 product submissions (the complete list is available via the OR/MS Today Web site, www.orms-today.com). Standard statistical tools are available for the classical problem of small-sample inference, but tools are increasingly geared toward data management across the entire organization. As in the previous surveys, product information was solicited from product vendors and is summarized in the following tables to highlight general features and capabilities and to provide contact information. Many of the vendors have extensive Web sites for further, detailed information, and many provide demo programs that can be downloaded from these sites. Because of space limitations, no attempt is made to evaluate or rank the products, and the information provided comes from the vendors themselves. Vendors that were unable to make the publishing deadline will be added to the online survey.

In the last several years the number of statistical add-ins available for use with spreadsheets has continued to grow. For routine procedures and tests of hypotheses, basic graphics and even regression modeling, as well as for most introductory statistics courses, a statistical add-in product for a spreadsheet is likely adequate. The functionality of products for use with spreadsheets is growing, particularly for risk analysis and Monte Carlo sampling, with products such as Palisade's @Risk and Decisioneering's Crystal Ball.
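To make the idea of Monte Carlo sampling concrete, the following is a minimal sketch in Python rather than in any of the spreadsheet products named above. The profit model, the input distributions and all of the numbers are hypothetical, chosen only for illustration: uncertain inputs are sampled repeatedly, and the resulting outcomes are summarized.

```python
import random
import statistics

def simulate_profit(n_trials=10_000, seed=1):
    """Monte Carlo sketch: sample uncertain inputs and record the profit of each trial."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    price = 10.0                       # hypothetical selling price per unit
    fixed_cost = 2000.0                # hypothetical fixed cost
    profits = []
    for _ in range(n_trials):
        # Hypothetical inputs: triangular demand (low, high, mode), uniform unit cost.
        demand = rng.triangular(500, 1500, 1000)
        unit_cost = rng.uniform(6.0, 9.0)
        profits.append(demand * (price - unit_cost) - fixed_cost)
    return profits

profits = simulate_profit()
mean_profit = statistics.mean(profits)
prob_loss = sum(p < 0 for p in profits) / len(profits)
print(f"mean profit: {mean_profit:.0f}, estimated probability of a loss: {prob_loss:.2f}")
```

The point of the exercise is the distribution of outcomes rather than a single estimate: the fraction of trials with a negative profit is a direct, if rough, measure of risk.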

Spreadsheets lack the level of detail and flexibility of the best statistical programs and may have limited diagnostic options or inadequate Help information about what the procedures are or what they signify. For many specialized techniques such as forecasting, design of experiments and so forth, a statistical package would be preferred. Moreover, new procedures are likely to become available first in the statistical software and only later be added to the add-in software. In general, statistical software plays a distinct role on the analyst's desktop and, provided that data can be freely exchanged among applications, each part of an analysis can be made with the most appropriate (or convenient) software tool.

One of the strongest impressions from the latest releases is the growth of visualization tools for examining multivariate data and for visualizing data in the temporal or spatial senses. In addition, there appears to be a general trend from classical estimation and hypothesis testing to the tools of exploratory data analysis to aid in the search for relations, the investigation of anomalous cases and outliers, and the examination of these cases within the factor space. Almost all of the Web sites feature dramatic graphics that are available with the software. In 1999, I particularly enjoyed the dynamic linking feature that DataDesk demonstrated (www.datadesk.com), and this functionality is now available as ActivStats XL, an add-in to Excel. Several programs have methods of linking data between graphics, and I expect this feature to be even more pervasive in the future.

An important feature of statistical programs is the importation of data from as many sources as possible to eliminate the need for data entry when data is already available from another source. Most programs have the ability to read from spreadsheets and selected data storage formats. The program DBMS/COPY (www.conceptual.com) is still available for shifting data between a large number of file formats, and it includes improved data preview, editing and filtering capabilities in addition to import and export capabilities. Also highly visible in this survey is the growth of data warehousing and "data mining" capabilities, programs and training. Data mining tools attempt to integrate and analyze data from a variety of sources (and purposes) to look for relations that could not be found from the individual data sets. Specialized methodologies for these problems are already appearing in the statistical literature.

A large number of the vendors now provide families of products or modules rather than a single, omnibus statistical package, with many of the modules for specialized business functions or fields. The SAS and STATISTICA (StatSoft) programs are clearly aimed at support for the entire corporation. These include specialized needs for sample surveys, quality control or process capability, medical data and toxicity, marketing, time series, data warehousing and data mining tools. Another approach is evident from the Statpoint offerings, which provide statistics via the Internet and Java Statbeans for corporate developers. Within the survey we observe several specialized products that are more narrowly focused on distribution fitting than on general statistics, but are of particular use to developers of stochastic models and simulations.
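For readers unfamiliar with distribution fitting, the following minimal Python sketch shows the basic idea and does not reproduce the method of any surveyed product. It assumes, purely for illustration, that interarrival times are exponentially distributed. It estimates the rate by maximum likelihood and measures the fit with the Kolmogorov-Smirnov distance. The data are synthetic, and the function names are hypothetical.

```python
import math
import random

def fit_exponential(samples):
    """Maximum-likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(samples) / sum(samples)

def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Synthetic interarrival times standing in for observed data (assumed rate 2.0).
rng = random.Random(7)
data = [rng.expovariate(2.0) for _ in range(2000)]

rate_hat = fit_exponential(data)
d_stat = ks_statistic(data, lambda x: 1.0 - math.exp(-rate_hat * x))
print(f"fitted rate: {rate_hat:.3f}, KS distance: {d_stat:.4f}")
```

A small distance suggests that the fitted distribution is a plausible input model for a simulation. Dedicated packages typically compare many candidate families and report several goodness-of-fit measures.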

References

  1. Swain, J.J., 1999, "Desktop Statistics Software: Serious Tools for Decision-Making," OR/MS Today, Vol. 26, No. 5, pp. 50-61.
  2. Hahn, G.J., W.J. Hill, R.W. Hoerl, and S.A. Zinkgraf, 1999, "The Impact of Six Sigma Improvement: A Glimpse into the Future of Statistics," The American Statistician, Vol. 53, No. 3, pp. 208-215.
  3. Macdonald, G.C., 1999, "Shaping Statistics for Success in the 21st Century: The Needs of Industry," The American Statistician, Vol. 53, No. 3, pp. 203-207.
Be sure to read the survey online.



James J. Swain is professor and chair, Department of Industrial and Systems Engineering and Engineering Management, University of Alabama in Huntsville. He is a member of INFORMS, IIE and ASA.







    OR/MS Today copyright © 2001 by the Institute for Operations Research and the Management Sciences. All rights reserved.

