2cee: A Twenty-First Century Effort Estimation Methodology
Hihn, J.¹; Lum, K.¹; Baker, D.²; Menzies, T.²
¹JPL/Caltech; ²West Virginia University
There exists an extensive academic literature on software cost estimation that explores techniques such as bootstrapping, assorted analogy methods such as nearest neighbor, and even highly non-linear 'models' such as decision trees. However, industry "best practice" virtually ignores this literature and continues to rely on standard regression-based algorithms and, most often, local calibration. Local calibration tunes only the main intercept and slope of a log-linear regression. Over the past three years, our research has investigated the behavior and performance of these models and calibration/tuning techniques using machine learning methods. A summary of our preliminary findings was presented in 2006 at the 28th Annual Conference of the International Society of Parametric Analysts. Although all of the analysis has been performed on software project data, we expect the results to extend readily to systems and size estimation models.
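To make "tunes only the intercept and slope" concrete, here is a minimal sketch of local calibration, assuming a COCOMO-style model effort = a · KLOC^b · (product of effort multipliers). Taking logs turns this into a straight line in ln(KLOC), so ordinary least squares fits ln(a) and b while the effort multipliers stay fixed. The function names, units, and project data below are hypothetical and illustrative; this is not the 2cee implementation.

```python
import math
from typing import Sequence, Tuple


def local_calibrate(kloc: Sequence[float], effort: Sequence[float],
                    em_product: Sequence[float]) -> Tuple[float, float]:
    """Fit a and b in effort = a * kloc**b * em_product (assumed COCOMO-style form).

    Taking logs gives ln(effort / em_product) = ln(a) + b * ln(kloc), so
    ordinary least squares on the log-transformed data tunes only the
    intercept ln(a) and the slope b; the effort multipliers stay fixed.
    """
    xs = [math.log(k) for k in kloc]
    ys = [math.log(e / m) for e, m in zip(effort, em_product)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log space
    a = math.exp(mean_y - b * mean_x)    # intercept transformed back from ln(a)
    return a, b


def estimate(a: float, b: float, kloc: float, em_product: float) -> float:
    """Predict effort for a new project with the calibrated coefficients."""
    return a * kloc ** b * em_product


# Hypothetical projects: size in KLOC, product of effort multipliers, effort.
kloc = [10, 25, 50, 100, 200]
em = [1.0, 0.8, 1.2, 1.1, 0.9]
effort = [3.0 * k ** 1.2 * m for k, m in zip(kloc, em)]
a, b = local_calibrate(kloc, effort, em)
```

Because the synthetic efforts were generated exactly from a = 3.0 and b = 1.2, the fit recovers those values up to floating-point rounding. On real project data, the residual scatter is what the other tuning techniques try to reduce.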
Our work, referenced below, cautions that current approaches to model specification and calibration can often produce sub-optimal models, which are a significant contributor to the cost growth exhibited by most software projects. Our research indicates that building optimal models requires (a) exploring a wider range of models, (b) pruning irrelevant variables, and (c) selecting among models using rejection rules based on non-parametric statistical tests.
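This abstract does not name the statistical test used for rejection. The sketch below makes one assumption explicit: it applies the Mann-Whitney U test, a common non-parametric choice, to each model's magnitude of relative error (MRE). It uses the normal approximation without tie correction, and the 1.96 cut-off corresponds to roughly 5% two-sided significance. The function names and the cut-off are hypothetical choices for illustration.

```python
import math
from typing import List, Optional, Sequence


def mre(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Magnitude of relative error for each project: |actual - predicted| / actual."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


def mann_whitney_z(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Normal-approximation z score of the Mann-Whitney U statistic for xs vs ys.

    Tied values share their average rank; the variance is not tie-corrected.
    A positive z means values in xs tend to be larger than values in ys.
    """
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1   # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    n1, n2 = len(xs), len(ys)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u1 - mean_u) / sd_u


def model_to_reject(errors_a: Sequence[float], errors_b: Sequence[float],
                    z_crit: float = 1.96) -> Optional[str]:
    """Hypothetical rejection rule: drop the model whose errors rank significantly higher."""
    z = mann_whitney_z(errors_a, errors_b)
    if z > z_crit:
        return "A"
    if z < -z_crit:
        return "B"
    return None  # no significant difference: neither model is rejected
```

A rank-based test is used here rather than comparing mean MRE because effort-estimation errors are typically skewed by a few very large misses, and rank statistics are far less sensitive to those outliers.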
This paper will provide an overview of the 2cee methodology, the systemic cost estimation issues we have identified, and the best-performing tuning techniques. While we have found COCOMO to be a very robust model, our results also indicate that local calibration using bootstrapping over standard regression, combined with variable reduction (column pruning) and stratification (row pruning using nearest neighbor), is, in the vast majority of experiments, the most efficient and effective tuning method.
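The abstract does not spell out how 2cee performs stratification. The sketch below assumes that row pruning means selecting the k historical projects nearest to the new project in feature space and then applying log-linear local calibration to those rows only. The feature vectors, the Euclidean distance, the value of k, and all function names are hypothetical. Bootstrapping and column pruning are left out to keep the example short.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector, KLOC, effort, product of effort multipliers); layout is hypothetical.
Project = Tuple[Sequence[float], float, float, float]


def distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two (assumed normalized) feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_rows(target: Sequence[float], projects: List[Project], k: int) -> List[Project]:
    """Row pruning: keep only the k historical projects closest to the target."""
    return sorted(projects, key=lambda p: distance(p[0], target))[:k]


def local_calibrate(rows: List[Project]) -> Tuple[float, float]:
    """Least-squares fit of ln(effort / em) = ln(a) + b * ln(kloc) over the given rows."""
    xs = [math.log(kloc) for _, kloc, _, _ in rows]
    ys = [math.log(effort / em) for _, _, effort, em in rows]
    n = len(rows)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # must be > 0: rows need differing sizes
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return math.exp(mean_y - b * mean_x), b


def stratified_estimate(target: Sequence[float], kloc: float, em: float,
                        projects: List[Project], k: int = 5) -> float:
    """Calibrate on the target's nearest neighbors only, then estimate its effort."""
    a, b = local_calibrate(nearest_rows(target, projects, k))
    return a * kloc ** b * em


# Two hypothetical clusters of projects with different productivity.
cluster_a = [([0, 0], 10), ([0, 1], 20), ([1, 0], 40), ([1, 1], 80)]
cluster_b = [([10, 10], 10), ([10, 11], 20), ([11, 10], 40), ([11, 11], 80)]
projects = ([(f, s, 2.0 * s, 1.0) for f, s in cluster_a]
            + [(f, s, 5.0 * s ** 1.3, 1.0) for f, s in cluster_b])
est = stratified_estimate([0.5, 0.5], 30, 1.0, projects, k=4)
```

With k = 4, only the cluster A projects (effort = 2 × KLOC) are used, so the estimate for a 30 KLOC project is approximately 60. A single calibration over all eight projects would mix the two productivity regimes.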
2cee has been implemented in a Windows-based tool that can be used both to generate estimates and to let model developers calibrate and develop models using these techniques. It is available free of charge by registering with JPL.
Jalali, O., Menzies, T., Baker, D. and Hihn, J. Column Pruning Beats Stratification in Effort Estimation. Proceedings of the International Workshop on Predictor Models in Software Engineering (PROMISE 2007), Minneapolis, MN, 20 May 2007.
Menzies, T., Chen, Z., Hihn, J. and Lum, K. Best Practices in Software Effort Estimation. IEEE Transactions on Software Engineering, November 2007.
Menzies, T. and Hihn, J. Evidence-Based Cost Estimation for Better Quality Software. IEEE Software, July 2006.
Lum, K. and Hihn, J. Studies in Software Cost Model Behavior: Do We Really Understand Cost Model Performance? Proceedings of the 28th Annual Conference of the International Society of Parametric Analysts (ISPA), Seattle, WA, 24-26 May 2006.
Menzies, T., Port, D., Chen, Z. and Hihn, J. Specialization and Extrapolation of Software Cost Models. Automated Software Engineering, San Diego, October 2005.
Menzies, T., Port, D., Chen, Z., Hihn, J. and Stukes, S. Validation Methods for Calibrating Software Effort Models. Proceedings of the Twenty-Seventh International Conference on Software Engineering (ICSE '05), St. Louis, MO, June 2005.
Menzies, T., Port, D., Hihn, J. and Chen, Z. Simple Software Cost Analysis: Safe or Unsafe? Proceedings of the International Workshop on Predictor Models in Software Engineering (PROMISE 2005), St. Louis, MO, 14 May 2005.
Chen, Z., Menzies, T., Port, D. and Boehm, B. Finding the Right Data for Software Cost Modeling. IEEE Software, November 2005.