Is the No Free Lunch Theorem (NFL) relevant in practice?

The BBOBies, November 2025 (3-minute read, with footnotes)

The short answer is “no”!

The No Free Lunch Theorem (NFL) states that, aggregated over all functions, all search algorithms perform identically, provided they do not revisit solutions (Wolpert and Macready 1997). This is true if (and only if) the set of functions is closed under permutation (Schumacher et al. 2001).

Trying to differentiate algorithms by means of benchmarking can only be meaningful if NFL does not hold on the set (or weighted set)1 of functions of interest.

Let us consider the set of all (possible) functions R^n\to R discretized with two bytes per real number2 and call this set F_n. The No Free Lunch Theorem holds for this set of functions because the set is closed under permutation. The number of functions in F_n is3

|F_n| = \left(2^{16}\right)^{\left(2^{16}\right)^n} = 2^{16\times 2^{16n}} > 10^{4.8 \times 10^{4.8n}} > 10^{5\times 50000^{n}}
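As a sanity check, the chain of bounds above can be verified numerically by comparing base-10 exponents (the numbers themselves are far too large to materialize). This is an illustrative sketch, not part of the derivation:

```python
from math import log10

def log10_size(n):
    """log10 of |F_n| = (2**16) ** ((2**16)**n), i.e. 16 * 2**(16*n) * log10(2)."""
    return 16 * 2 ** (16 * n) * log10(2)

for n in (1, 2, 3):
    middle = 4.8 * 10 ** (4.8 * n)   # exponent of the bound 10**(4.8 * 10**(4.8*n))
    lower = 5 * 50000 ** n           # exponent of the bound 10**(5 * 50000**n)
    # the chain |F_n| > 10**middle > 10**lower from the displayed inequality
    assert log10_size(n) > middle > lower
    print(f"n={n}: log10|F_n| = {log10_size(n):.4g} > {middle:.4g} > {lower:.4g}")
```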

Let us put the size of F_n into perspective. The number of electrons in the universe multiplied by the number of Planck times passed since the beginning of time is about 10^{141}.4 This number is minuscule compared to |F_n|>10^{5\times 50000^{n}}. Hence, the set of real-world functions we could possibly ever encounter (and that could therefore be relevant to us) is a minuscule subclass of all functions, and NFL (in general) does not hold.5
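For concreteness, footnote 4's arithmetic and the comparison with |F_1| can be checked in a few lines; the physical figures are the order-of-magnitude estimates quoted in the footnote:

```python
from math import log10

# log10 of the rough physical quantities from footnote 4
electrons = 80                 # ~1e80 electrons in the universe
age_seconds = 18               # ~1e18 seconds since the beginning of time
planck_time = -43              # ~1e-43 seconds per Planck time
planck_ticks = age_seconds - planck_time   # 61
universe = electrons + planck_ticks        # 141

# Already for n = 1, log10 |F_1| = 16 * 2**16 * log10(2), roughly 3.2e5,
# which dwarfs 141 -- the gap is in the exponent, not the number itself.
log10_F1 = 16 * 2 ** 16 * log10(2)
assert universe == 141
assert log10_F1 > universe
print(f"10**{universe} vs |F_1| = 10**{log10_F1:.0f}")
```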

Could NFL hold for any practically relevant set of continuous (discretized) functions? We don’t know for certain,6 but it looks exceedingly unlikely to us.

References

Igel, C., and Toussaint, M. (2004), “A no-free-lunch theorem for non-uniform distributions of target functions,” Journal of Mathematical Modelling and Algorithms, Springer, 3, 313–322.
Schumacher, C., Vose, M. D., and Whitley, L. D. (2001), “The No Free Lunch and problem description length,” in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 565–570.
Wolpert, D. H., and Macready, W. G. (1997), “No free lunch theorems for optimization,” IEEE Transactions on Evolutionary Computation, IEEE, 1, 67–82.

Footnotes

  1. For the time being, we assume, somewhat unrealistically, equal frequency and/or equal importance weight for all functions.↩︎

  2. As an example, this gives roughly 5 digits of precision for representing numbers from the interval [0,1]. In some practical situations we may require more precision or a wider range. Hence, the following size derivation is on the conservative side.↩︎

  3. The number of functions equals the number of different function values to the power of the number of different solutions. The smallest nontrivial subsets of F_n on which NFL holds consist of the (2^{16})^n \approx 10^{4.8n} needle-in-the-flat functions (one per needle position). NFL never holds on any single function that has at least two different f-values.↩︎
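  The counting rule in this footnote (number of functions = number of values raised to the number of solutions) can be illustrated on a toy discretization, with small hypothetical numbers standing in for 2^{16} and (2^{16})^n:

```python
from itertools import product

V = 3   # toy number of representable function values (stands in for 2**16)
S = 2   # toy number of representable solutions (stands in for (2**16)**n)

# Enumerate every function as a tuple of S values, one per solution.
functions = list(product(range(V), repeat=S))
assert len(functions) == V ** S   # 3**2 = 9 functions in the toy set
print(len(functions))
```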

  4. The number of electrons (or atoms) in the universe is about 10^{80}. The age of the universe is about 10^{18} seconds. The Planck time is about 10^{-43} seconds. The number of all electrons in the universe multiplied by the number of Planck times since the beginning of the universe is hence about 10^{80} \times 10^{61} = 10^{141}.↩︎

  5. In general, any function subset is highly unlikely to obey NFL and any nontrivial neighbourhood is not invariant under permutation (Igel and Toussaint 2004). Furthermore, we would expect different functions to have different importance weights (probabilities).↩︎

  6. Given any observable set of nontrivial “continuous” real-world problems, it seems hard to conceive how we could determine that NFL holds for this set. However, the existence of algorithms that do perform differently seems easy to establish if we could sample (uniformly) from the problem class.↩︎