APICS - The Performance Advantage
April 1998 • Volume 8 • Number 4


Forecasting Transportation Expense:
A Shipper's Perspective


Regression analysis provides an understanding into the relationship between actions and outcomes, and how it all affects transportation costs.

By Jeff Zoroya

A major retailer is scheduling delivery to hundreds of stores. Should the deliveries be made in the morning, the afternoon or the evening? What is the cost associated with each course of action? A major consumer products manufacturer is locating a new distribution center to support its growing business in the Southeast. One alternative location is Jacksonville, another is Orlando. Although the costs appear equal, are they? What are the transportation costs associated with the two locations?

A common obstacle shippers face to controlling transportation expense is not fully understanding its underlying cost drivers. By setting service requirements and locating facilities, shippers have enormous influence over their transportation expense. However, they are often unable to quantify that influence. What follows is a way shippers can measure the cost drivers of their transportation expense — using their own data, an Excel spreadsheet and a basic understanding of regression analysis.


Why use regression?
Regression analysis helps explain relationships between actions and outcomes. Since shippers typically buy transportation in units of lanes, they can lose sight of the underlying cost drivers that influence the price of each lane. Regression puts a price tag on those cost drivers and helps develop a framework for measuring and managing transportation and distribution network activities. In our example we will consider a set of European truckload rates and define the geographic cost drivers that can help us manage these activities.


What should we examine?
Before we begin, we should think about what influences the price of a lane. Much of a lane rate is the price a carrier charges for the time he anticipates dedicating his assets to the shipper. Generally, this time can be broken into three components: 1) the time required to get a driver, tractor and trailer to the origin, 2) the time the shipper actually uses the driver, tractor and trailer, and 3) the time the carrier needs to reposition the driver, tractor and trailer to the next customer. So the factors that we will examine in our regressions are the setup cost, the cost of getting from the origin to the destination, and the cost — or likelihood — of the carrier finding follow-on business near our destination. Although we could test the influence of these factors in one analysis, for the purposes of our example, we will introduce one factor at a time so we more readily see the impact of each on our truckload rates.


Necessary Data and Origin-to-Destination Cost
The first basic rule of regression is that our data set should contain at least 30 representative sample observations. It's usually a good idea to start with more than 30 observations in case we find that there are some outliers we want to exclude from our analysis. The second basic rule is that each observation must represent a unique event. Duplicating observations in an effort to create

30 observations will not increase the accuracy of our analysis, but simply turn it into a fancy weighted average. Fortunately for us, our data set meets both of these criteria, we have 43 distinct lanes that represent truckload movement throughout Europe.

In our first pass, we want to determine what influence the distance from the origin to the destination has on our truckload rates. Following the steps outlined in the help section of Excel, and as shown in Figure 1, we: 1) set the Y range equal to the price of our lanes, 2) set the X range equal to our lane distances, 3) set our confidence level at 95 percent, 4) indicate that our model includes column labels (Cost and Km), 5) force the intercept (constant) to zero, and 6) ask that our residuals be displayed in the output. Remember, we are forcing the intercept to zero, which means the regression will estimate the cost of a lane based on its distance only — in effect, a cost per kilometer.

Before continuing with this example, it's important to consider what to look for in the output. First, the "Adjusted R Squared" measures the cost variability, by lane, that our regression can predict. The more variability we can predict, the better. Second, the "t statistic" measures how important each of our inputs (known as independent variables) are in predicting our lane costs. As long as an independent variable's "t statistic" falls outside a certain range of values, we know it contributes to the accuracy of our forecast. This range changes slightly with the size of the problem, but generally with 30 or more degrees of freedom, a "t statistic" of greater than 1.7 or less then -1.7 means that particular independent variable contributes to the accuracy of our model (degrees of freedom are the number of observations less the number of independent variables in the regression, and are indicated in the output as "Residual"). If an independent variable does not contribute to our regression's accuracy, we may want to exclude it.

Looking at our output (Figure 2), we notice several things. First, our model accounts for 59 percent of the variability in our lane rates. Second, our independent variable — lane distance — contributes to the accuracy of our model with a "t statistic" of 18.8. Finally, our model estimates our cost per kilometer to be $1.40.

Regression Statistics
Multiple R 79%
R Square 62%
Adjusted R Square 59%
Standard Error $ 559
Observations 43

ANOVA
  df SS MS F Significance F
Regression 1 21280755.78 21280755.78 68.13217227 2.97406E-10
Residual 42 13118497.66 312345.1824    
Total 43 34399253.44      

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
  0 # N/A # N/A # N/A # N/A # N/A
Cost per Km $ 1.40 0.074774064 18.75501618 5.00773E-22 1.251488567 1.553289005

Figure 2

Setup Cost and Effect of Origin-to-Destination Cost
Now that we have established that distance influences our lane rates, we want to measure the (setup) cost to the carrier of getting the driver, tractor and trailer to our origin location. We can begin to measure this by letting the regression determine the intercept point.

After running this second regression (Figure 3), we see that the accuracy of our model has improved. We can now account for 64 percent of the variability in our lane rates. Both of our independent variables are significant in influencing our model. The "t statistic" of the intercept (setup cost) is 1.8, and the "t statistic" of the cost per kilometer is 8.7. The cost of getting the carrier's equipment to our origin is $287 per move. Our cost per kilometer, from origin to destination, is $1.19. It's not unusual to see the values of the independent variables change as you increase the level of detail in your model. As you can see, the cost per kilometer has already decreased from $1.40 to $1.19 because part of the original cost includes the setup cost of $287.

Regression Statistics
Multiple R 80%
R Square 65%
Adjusted R Square 64%
Standard Error $ 544
Observations 43

ANOVA
  df SS MS F Significance F
Regression 1 22282287.95 22282287.95 75.39625379 7.8015E-11
Residual 41 12116965.5 295535.7438    
Total 42 34399253.44      

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
  $ 287 156.0573248 1.840888327 0.072882253 -27.88022684 602.448442
Cost per Km $ 1.19 0.136915295 8.683101623 7.8015E-11 0.912343221 1.465355621

Figure 3

In our next pass, we want to measure the cost of traversing water along our lanes. We do this by creating a dummy variable, in the form of a column (labeled water) in which we will enter a value of either "0" or "1" for each lane. If a lane contains any movement via water, we would indicate this by placing a "1" in the appropriate cell; if not, we would place a "0" in the cell. This water column will now be included in the "X" range of our regression analysis. Additionally, we will make our first exclusion by removing the Ferrania-to-Birmingham lane. This observation appears to be an outlier with some property that we are unable to identify.

After running this regression (Figure 4) we notice that the "t statistic" of our dummy variable "water" indicates that it is not significant. However, our regression can now account for 67 percent of the variability in our lane rates. Although we may end up discarding "water" later, we will retain it for now because it did improve our ability to forecast our lane rates. Now that we have isolated the additional cost of traversing water to $218, our cost per kilometer has decreased again to $1.14. Additionally, we begin to see a pattern in the lanes that are the more difficult to predict (Figure 5). Three of these lanes all have destinations in the former East Bloc. We appear to be under-forecasting the cost of shipping to these destinations.

Regression Statistics
Multiple R 83%
R Square 68%
Adjusted R Square 67%
Standard Error $ 526
Observations 42

ANOVA
  df SS MS F Significance F
Regression 2 23410784.65 11705392.32 42.29121181 1.70846E-10
Residual 39 10794448.33 276780.7264    
Total 41 34205232.98      

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Setup cost $ 289 151.1408548 1.90947735 0.063576709 -17.11089526 594.3109729
Water $ 218 198.2621265 1.100359543 0.277921142 -182.8629859 619.1822317
Cost per Km $ 1.14 0.150613244 7.576201485 3.51265E-09 0.836432542 1.445720025

Figure 4

Lane Predicted Actual Error
Breda Budapest $ 1,223 $ 3,143 39%
Breda Warsaw $ 1,683 $ 3,429 49%
Breda Bedford $ 879 $ 1,394 63%
Breda Basle $ 1,006 $ 1,286 78%
Breda Birmingham $ 1,209 $ 1,493 81%
Breda Moscow $ 3,068 $ 3,771 81%
Breda Malmo $ 1,573 $ 1,783 88%
Aycliff Ferrania $ 2,334 $ 2,640 88%
Breda Lyon $ 1,185 $ 1,314 90%
Breda Coppenhagen $ 1,441 $ 1,571 92%
Farrania Barcelona $ 1,160 $ 1,257 92%
Aycliff Barcelona $ 2,539 $ 2,714 94%
Aycliff Breda $ 1,412 $ 1,486 95%
Aycliff Madrid $ 2,830 $ 2,914 97%
Breda Gothenburg $ 1,733 $ 1,783 97%
Breda Linz $ 1,329 $ 1,314 101%
Breda Stockholm $ 2,104 $ 2,057 102%
Farrania Basle $ 840 $ 811 104%
Aycliff Vienna $ 2,561 $ 2,457 104%
Breda Athens $ 2,922 $ 2,800 104%
Breda Vienna $ 1,519 $ 1,429 106%
Breda Brussels $ 405 $ 377 107%
Breda Paris $ 735 $ 663 111%
Aycliff Paris $ 1,432 $ 1,257 114%
Farrania Madrid $ 1,788 $ 1,503 119%
Farrania Vienna $ 1,408 $ 1,171 120%
Breda Ferrania $ 1,610 $ 1,320 122%
Farrania Lyon $ 750 $ 600 125%
Farrania Paris $ 1,225 $ 943 130%
Farrania Marsailles $ 645 $ 491 131%
Breda Dusseldorf $ 496 $ 377 132%
Lyon Geneva $ 445 $ 320 139%
Madrid Paris $ 1,696 $ 1,200 141%
Farrania Athens $ 2,922 $ 2,057 142%
Lyon Madrid $ 1,595 $ 1,114 143%
Breda Barcelona $ 1,872 $ 1,257 149%
Breda Rotterdam $ 346 $ 229 151%
Lyon Brussels $ 1,081 $ 714 151%
Breda Madrid $ 2,133 $ 1,400 152%
Lyon Paris $ 796 $ 491 162%
Farrania Genoa $ 357 $ 183 195%
Farrania Turin $ 432 $ 206 210%

Figure 5

Next we will measure the impact of shipping to eastern Europe by again creating a dummy variable column. If the destination of a lane lies within eastern Europe, we will indicate this by placing a "1" in the appropriate cell. If not, we will place a "0" in the cell.

As you can see in the regression output (Figure 6), the accuracy of our model has increased. We can now account for 92 percent of the variability of the price of a lane. Additionally, the "t statistics" of our independent variables (setup cost, water, eastern Europe, and cost per kilometer) indicate that they all contribute to the accuracy of our model. Finally, we can now see that there is a cost penalty of $1,899 for shipping to eastern Europe. The reason for this is probably due to the number of days the carrier spends clearing the border and the inability of the carrier to find a back haul out of eastern Europe.

Regression Statistics
Multiple R 96%
R Square 93%
Adjusted R Square 92%
Standard Error $ 256
Observations 42

ANOVA
  df SS MS F Significance F
Regression 3 31707809.09 10569269.7 160.8186139 1.22812E-21
Residual 38 2497423.89 65721.68132    
Total 41 34205232.98      

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Setup costs $ 352 73.86469374 4.764697657 2.76118E-05 202.4116731 501.4741934
Water $ 622 103.0804391 6.034240662 5.10919E-07 413.3367307 830.6876241
Eastern Europe $ 1,899 169.0336231 11.23587379 1.21524E-13 1557.049763 2241.431147
Cost per Km $ 0.80 0.079362738 10.10239047 2.57031E-12 0.6410919 0.962414839

Figure 6

In our last regression we will look at the impact of an improved back haul opportunity on our lane cost. We can create an additional dummy variable ("Low Cost Zone") and include the areas where we appear to be over-forecasting the cost of a lane. An examination of our last output shows that we are over-forecasting cost when the destination is Paris, northern Italy, the Low Countries/Ruhr industrial region. In all three regions, carriers probably have an improved chance of finding follow-on business at the conclusion of our move. So we should reduce our lane rates to reflect those opportunities.

Once we run this regression (Figure 7), we can see that our accuracy has increased slightly to 93 percent, and that all the independent variables, including our newly created low cost zone, are still significant. Additionally, we can see that there is an incentive of $262 to ship into these three "Low Cost Zone" destinations. Now, using the coefficients from our model, we can make lane-to-lane cost comparisons by multiplying the same distance by our forecasted cost drivers.

Regression Statistics
Multiple R 97%
R Square 94%
Adjusted R Square 93%
Standard Error $ 240
Observations 42

ANOVA
  df SS MS F Significance F
Regression 4 32081513.3 8020378.326 139.7331305 8.59517E-22
Residual 37 2123719.673 57397.82899    
Total 41 34205232.98      

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Setup Costs $ 499 89.95916731 5.548470984 2.56728E-06 316.8614206 681.4102
Low Cost Zone $ (262) 102.6457502 -2.551621581 0.014987325 -469.8929541 53.93326885
Water $ 569 98.57739959 5.768357448 1.29314E-06 368.8930879 768.3662663
Eastern Europe $ 1,854 158.9709248 11.66079012 5.93296E-14 1531.621213 2175.831964
Cost per Km $ 0.73 0.078827291 9.306663964 3.13318E-11 0.573900001 0.893338215

Figure 7

Additional opportunities
Although our example is a simple one, the same technique can be used to gain insight into the cost drivers of your transportation activity. In addition to quantifying geographic and market cost drivers, we have used the same process to measure other cost drivers, such as the cost of a live unload versus a drop trailer, and the cost of a morning delivery versus an afternoon delivery. Finally, we have used this process to identify distribution cost drivers in support of a larger distribution network analysis. In each case we were able to quantify our underlying cost drivers and increase the accuracy and power of our decision support tools.


Jeff Zoroya is a product manager with InterTrans, an i2 company.

For more information about this article, input the number 3
in the appropriate place on the April Reader Service Form

Copyright © 2020 by APICS — The Educational Society for Resource Management. All rights reserved.

Web Site © Copyright 2020 by Lionheart Publishing, Inc.
All rights reserved.


Lionheart Publishing, Inc.
2555 Cumberland Parkway, Suite 299, Atlanta, GA 30339 USA
Phone: +44 23 8110 3411 |
E-mail:
Web: www.lionheartpub.com


Web Design by Premier Web Designs
E-mail: [email protected]