Motivation

(2–3 minute read)

The COCO framework has been motivated by (a) the highly repetitive nature of benchmarking and (b) the realization that a proper and robust benchmarking methodology is more intricate than we had hoped for.

When designing test suites, our motivation and objective were to:

While our test suites are motivated by real-world difficulties, we do not claim that the distribution of functions in these suites truly reflects any distribution found in the real world.[1] We do not even claim to know any such real-world distribution for any broader class of functions.

Generally, decent lightweight (but somewhat fragile) benchmarking can be done[2] without dealing with the most intricate aspects of COCO, which are:

Compared to a one-off benchmarking setup, the COCO framework provides methodological robustness as well as robustness against (inadvertent) exploitation of the test suite.

Additionally, COCO allows for direct comparison with a wide variety of previous results, which has become, to us, one of its most appealing features.

Footnotes

  1. The ultimately relevant question is not how well a benchmark function (or suite) approximates real-world problems, but how strongly the performance on a benchmark function (or suite) is (positively) correlated with the performance on the real-world problem(s) of interest. (In other words, the relevant distance measure is similarity in algorithm performance.) This correlation generally depends on the considered algorithms as well. Algorithm invariance properties and other measures against overfitting are likely to increase it. Possibly due to the systematic instance creation, we have seen surprisingly little overfitting to the COCO test suites over more than a decade.

  2. Given a suite of test functions, we can simply display convergence plots of single runs (and their median), as sketched below. However, avoiding all possible pitfalls when starting from scratch always remains a challenge: there are many small but potentially significant decisions to be made in the process.
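
For illustration, a minimal sketch of such lightweight benchmarking in Python with matplotlib. The function run_optimizer is a hypothetical placeholder (not part of COCO) that returns the best-so-far objective value after each evaluation; synthetic decaying data stands in for real optimizer runs:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)

    def run_optimizer(budget=200):
        """Hypothetical stand-in for one optimizer run on one test function:
        returns the best-so-far objective value after each evaluation.
        Here, faked as a noisy, exponentially decaying sequence."""
        f_values = np.exp(-0.02 * np.arange(budget)) * (1 + 0.5 * rng.random(budget))
        return np.minimum.accumulate(f_values)  # best-so-far is non-increasing

    # Several independent runs, e.g., on different instances of one function
    runs = np.array([run_optimizer() for _ in range(15)])
    evals = np.arange(1, runs.shape[1] + 1)

    # Single runs in gray, their median in black
    for run in runs:
        plt.loglog(evals, run, color="gray", alpha=0.4, lw=0.8)
    plt.loglog(evals, np.median(runs, axis=0), color="black", lw=2, label="median")
    plt.xlabel("function evaluations")
    plt.ylabel("best-so-far f-value")
    plt.legend()
    plt.show()

Even in this simple sketch, several of the small decisions mentioned above already appear: log-log versus linear axes, median versus mean aggregation, and plotting against evaluations rather than time.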