# Big-O Notation

## Remarks

**Definition**

The Big-O notation is at its heart a mathematical notation, used to compare the rate of convergence of functions. Let `n -> f(n)`

and `n -> g(n)`

be functions defined over the natural numbers. Then we say that `f = O(g)`

if and only if `f(n)/g(n)`

is bounded when n approaches infinity. In other words, `f = O(g)`

if and only if there exists a constant A, such that for all n, `f(n)/g(n) <= A`

.

Actually the scope of the Big-O notation is a bit wider in mathematics but for simplicity I have narrowed it to what is used in algorithm complexity analysis : functions defined on the naturals, that have non-zero values, and the case of n growing to infinity.

**What does it mean ?**

Let’s take the case of `f(n) = 100n^2 + 10n + 1`

and `g(n) = n^2`

. It is quite clear that both of these functions tend to infinity as n tends to infinity. But sometimes knowing the limit is not enough, and we also want to know the *speed* at which the functions approach their limit. Notions like Big-O help compare and classify functions by their speed of convergence.

Let’s find out if `f = O(g)`

by applying the definition. We have `f(n)/g(n) = 100 + 10/n + 1/n^2`

. Since `10/n`

is 10 when n is 1 and is decreasing, and since `1/n^2`

is 1 when n is 1 and is also decreasing, we have ̀`f(n)/g(n) <= 100 + 10 + 1 = 111`

. The definition is satisfied because we have found a bound of `f(n)/g(n)`

(111) and so `f = O(g)`

(we say that f is a Big-O of `n^2`

).

This means that f tends to infinity at approximately the same speed as g. Now this may seem like a strange thing to say, because what we have found is that f is at most 111 times bigger than g, or in other words when g grows by 1, f grows by at most 111. It may seem that growing 111 times faster is not “approximately the same speed”. And indeed the Big-O notation is not a very precise way to classify function convergence speed, which is why in mathematics we use the equivalence relationship when we want a precise estimation of speed. But for the purposes of separating algorithms in large speed classes, Big-O is enough. We don’t need to separate functions that grow a fixed number of times faster than each other, but only functions that grow *infinitely* faster than each other. For instance if we take `h(n) = n^2*log(n)`

, we see that `h(n)/g(n) = log(n)`

which tends to infinity with n so h is *not* O(n^2), because h grows *infinitely* faster than n^2.

Now I need to make a side note : you might have noticed that if `f = O(g)`

and `g = O(h)`

, then `f = O(h)`

. For instance in our case, we have `f = O(n^3)`

, and `f = O(n^4)`

… In algorithm complexity analysis, we frequently say `f = O(g)`

to mean that `f = O(g)`

*and* `g = O(f)`

, which can be understood as “g is the smallest Big-O for f”. In mathematics we say that such functions are Big-Thetas of each other.

**How is it used ?**

When comparing algorithm performance, we are interested in the number of operations that an algorithm performs. This is called *time complexity*. In this model, we consider that each basic operation (addition, multiplication, comparison, assignment, etc.) takes a fixed amount of time, and we count the number of such operations. We can usually express this number as a function of the size of the input, which we call n. And sadly, this number usually grows to infinity with n (if it doesn’t, we say that the algorithm is O(1)). We separate our algorithms in big speed classes defined by Big-O : when we speak about a “O(n^2) algorithm”, we mean that the number of operations it performs, expressed as a function of n, is a O(n^2). This says that our algorithm is approximately as fast as an algorithm that would do a number of operations equal to the square of the size of its input, *or faster*. The “or faster” part is there because I used Big-O instead of Big-Theta, but usually people will say Big-O to mean Big-Theta.

When counting operations, we usually consider the worst case: for instance if we have a loop that can run at most n times and that contains 5 operations, the number of operations we count is 5n. It is also possible to consider the average case complexity.

Quick note : a fast algorithm is one that performs few operations, so if the number of operations grows to infinity *faster*, then the algorithm is *slower*: O(n) is better than O(n^2).

We are also sometimes interested in the *space complexity* of our algorithm. For this we consider the number of bytes in memory occupied by the algorithm as a function of the size of the input, and use Big-O the same way.