Probability Definition (Kolmogorov)

The Trouble with the Definition of Probability

Well into the 20th century, every definition of probability had some flaw:

classical (Laplace’s) – logical error inside, works only for finite numbers of possible outcomes
geometric – essentially the same logical flaw, limits things to finiteness in a certain sense, various strange paradoxes
frequentist (already 1931, time flies…) – enormous problems in practical application

The correct and commonly used definition today was given in 1933 by the Soviet mathematician Andrey Kolmogorov, and this Lecture will be devoted to it.

Kolmogorov’s definition requires – unfortunately for students – a fair amount of concentration and cannot be explained “in one sentence.” Very few people here choose the path of UNDERSTANDING; most choose the easier route of MEMORIZING, or even worse – IGNORING (to put it mildly).

Which is a pity, because first of all, knowing WHAT you are actually doing is a genuinely great feeling, and secondly, this definition is a treasure and brings incredible benefits. It turns out that the world of probability connects with set theory, and even with mathematical analysis.

That means once you accept this definition, when solving probability problems (and as you can guess, there are plenty of them in the “real” world), you can use the full heavy artillery of other branches of mathematics – set operations, and even derivatives and integrals!

Kolmogorov’s Definition of Probability

You will find this definition in every probability textbook, on Wikipedia, and basically everywhere.

Probability is a function that assigns numbers to certain sets of elementary events. The function must satisfy certain conditions (the so-called axioms), and the sets must satisfy certain conditions of their own; both are spelled out below.

In short: probability is a function assigning numbers to sets of elementary events.

Now I will slowly explain all the elements used in this definition, starting completely from scratch. If you look closely, I actually have three elements to discuss:

what “elementary events” (Ω) are
what conditions the sets of these events ( $F$ ) must satisfy
what conditions the function (P) must satisfy

Elementary events form the set Ω. The function (which I denote by P) is defined on sets belonging to a family of subsets of Ω (I denote this family by $F$ ). Both the family of subsets and the function must satisfy certain conditions. The whole structure forms the so-called “probability triple” (Ω, $F$ ,P).

I’ll go step by step and please, don’t stress about it – it’s really simple 🙂

1. Elementary events Ω

Elementary events are the simplest (i.e., no longer decomposable) possible outcomes of a random experiment.

For example, if the experiment consists of tossing a coin, the elementary events may be: HEADS and TAILS (assuming it cannot land on its edge). The event HEADS cannot be further “decomposed” into simpler events – and that’s the point.

If the experiment consists of rolling a die, then an elementary event could be: TWO PIPS APPEAR. However, EVEN NUMBER OF PIPS APPEARS is not an elementary event – because it can be decomposed into several other events (TWO PIPS, FOUR PIPS, or SIX PIPS).

And that’s it.

Notice that this concept is somewhat “fluid” (someone might argue: “why couldn’t it land on its edge?”) and it is NOT a strict mathematical definition.

And that’s exactly how it should be, because the notions “elementary event” and “random event” are primitive notions in probability theory. Primitive notions in mathematics are objects that are not defined, because they are assumed to be obvious.

For example, in geometry a primitive notion is a point. A point has no definition. Don’t get me wrong – I’m not saying you cannot describe what a point is. But whatever we say about a point will only be a verbal description, not a strict mathematical definition.

Of course, treating something as a primitive notion is somewhat conventional and can lead to paradoxes. In that case it often becomes necessary to “refine” the notion and introduce a strict definition that avoids those paradoxes. However, that definition will itself rely on other primitive notions – because that’s simply how definitions work.

Why must it be that way? Well, that’s a topic for a completely different story. Let’s return to probability.

So I have elementary events – the possible, simplest and indivisible outcomes of a random experiment. All of them together form some set (for example {1 PIP, 2 PIPS, 3 PIPS, 4 PIPS, 5 PIPS, 6 PIPS}).

I will denote this set by the Greek letter Ω.

This set is the first element of Kolmogorov’s definition of probability.

The second element of the definition probably causes students the most trouble. But it is not difficult either.

2. A σ-algebra of subsets on Ω

A “family of subsets” of a set is simply a collection of some of its subsets.

Example

Set A = $\left\{ 1,2,3,4,5,6 \right\}$

An example family of subsets on this set is: $F=\left\{ \left\{ 1 \right\},\left\{ 2 \right\},\left\{ 1,2 \right\},\left\{ 3,4 \right\},\left\{ 1,2,3,4,5,6 \right\} \right\}$ (the subset consisting of element 1, the subset consisting of element 2, the subset consisting of elements 1 and 2, the subset consisting of elements 3 and 4, the subset equal to the whole set A).

We can see that a set may have quite a few different “families of subsets.”
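To get a feeling for how many families there are, here is a minimal Python sketch. The helper `power_set` and the variable names are introduced only for this illustration; subsets are modelled as `frozenset` objects so that they can themselves be members of a set.

```python
from itertools import chain, combinations

def power_set(base):
    """Return every subset of `base` as a frozenset (hypothetical helper)."""
    items = list(base)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

A = frozenset({1, 2, 3, 4, 5, 6})

# The example family from the text: five chosen subsets of A.
F = {
    frozenset({1}),
    frozenset({2}),
    frozenset({1, 2}),
    frozenset({3, 4}),
    A,
}

# Every member of a family must be a subset of A.
is_family = all(member <= A for member in F)

# A has 2**6 subsets; a family either contains each subset or not,
# so there are 2**(2**6) different families.
number_of_subsets = len(power_set(A))
number_of_families = 2 ** number_of_subsets
```

Even for a six-element set, the number of possible families is enormous, which is why the definition has to single out the useful ones.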

In the definition of probability, the set whose family of subsets we choose is the set of all elementary events Ω (see the first part of the definition). However, this family of subsets cannot be completely arbitrary.

It must satisfy certain conditions that are fulfilled by special families called σ-algebras.

For a set Ω, a family of its subsets $F$ is called a σ-algebra if:

$\varnothing \in F$ – the empty set belongs to it

$A\in F\Rightarrow {A}'\in F$ – the complement of every set in the family also belongs to the family (the complement of A consists of all elements of Ω that do not belong to A)

${{A}_{1}},{{A}_{2}},\ldots \in F\Rightarrow \bigcup\limits_{n=1}^{\infty }{{{A}_{n}}}\in F$ – countable unions of sets belonging to the family also belong to the family

Example

Let us take the set $\Omega =\left\{ a,b,c \right\}$

Consider the following family of its subsets: $F=\left\{ \varnothing ,\left\{ a \right\},\left\{ b \right\},\left\{ c \right\},\left\{ a,b \right\},\left\{ a,b,c \right\} \right\}$ . The empty set belongs to it (condition 1 satisfied). However, the complement of the set $\left\{ a \right\}$ , that is the set of all elements that do NOT belong to $\left\{ a \right\}$ and belong to Ω, is the set $\left\{ b,c \right\}$ . This set does not belong to the family of subsets $F$ .
Therefore, this family of subsets is not a σ-algebra (moreover, the union of the sets $\left\{ b \right\}$ and $\left\{ c \right\}$ does not belong to the family either, so the third condition is also not satisfied).

Now consider the family of its subsets $F=\left\{ \varnothing ,\left\{ a \right\},\left\{ b \right\},\left\{ c \right\},\left\{ a,b \right\},\left\{ b,c \right\},\left\{ a,c \right\},\left\{ a,b,c \right\} \right\}$ . The empty set belongs to it, as do the complements of all sets and the unions of all sets with one another. This family of subsets is a σ-algebra.
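The two checks above can also be done mechanically. The sketch below assumes a finite Ω: a family over a finite set contains only finitely many distinct sets, so closure under countable unions reduces to closure under pairwise unions. The function name `is_sigma_algebra` is chosen only for this illustration.

```python
def is_sigma_algebra(omega, family):
    """Check the three sigma-algebra conditions for a family over a FINITE omega."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in family}

    # Condition 1: the empty set belongs to the family.
    if frozenset() not in family:
        return False

    # Condition 2: the complement of every member belongs to the family.
    if any(omega - s not in family for s in family):
        return False

    # Condition 3 (finite case): the union of any two members belongs to the family.
    return all(s | t in family for s in family for t in family)


omega = {"a", "b", "c"}

first_family = [set(), {"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "b", "c"}]
second_family = [
    set(), {"a"}, {"b"}, {"c"},
    {"a", "b"}, {"b", "c"}, {"a", "c"},
    {"a", "b", "c"},
]

first_result = is_sigma_algebra(omega, first_family)    # complement of {a} is missing
second_result = is_sigma_algebra(omega, second_family)  # all subsets are present
```

For an infinite Ω such a brute-force check is not possible, and the conditions have to be verified by reasoning.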

The definition of a σ-algebra of subsets of the set Ω looks very heavy and mathematical.

However, if you think about it, it is easy to translate it into “everyday language.”

We are already very close to defining probability as a certain function P. This function will assign numbers — representing their “probabilities” — to subsets from this σ-algebra.

It would therefore be somewhat strange if we:

could not determine the probability of an impossible event (that is, the probability of the empty set – condition 1)
could determine the probability that something happens, but could not determine the probability that it does not happen (condition 2)
could determine the probabilities of several events, but could not determine the probability that at least one of them happens (condition 3)

All the conditions that the family of subsets of Ω must satisfy in the definition of probability are therefore truly necessary and well justified.

I will denote our family of subsets by $F$ .

Remark 1

To define the family $F$ , that is, the second element of the definition, I need the first element — the set of all elementary events Ω.

$F$ consists of various subsets of Ω.

Remark 2

$F$ consists of SETS, not elements. That means one of its elements is, for example, $\left\{ a \right\}$ , not $a$ .

This is actually a huge advantage, because from this point on it allows us to operate on objects that are very strictly defined mathematically — namely sets. Sets can be added, subtracted, intersected, and at this point mathematics “enters the game,” which would be impossible if we limited ourselves only to events, that is, elements of Ω (have you ever tried adding heads to tails? What result did you get? 🙂).
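As a small illustration of “mathematics entering the game,” here is a Python sketch for the die experiment. The event EVEN comes from the earlier die example; the event `at_least_four` and all variable names are assumptions made up for this sketch.

```python
omega = frozenset({1, 2, 3, 4, 5, 6})

even = frozenset({2, 4, 6})           # EVEN NUMBER OF PIPS APPEARS
at_least_four = frozenset({4, 5, 6})  # hypothetical second event

either = even | at_least_four          # union: at least one of the two happens
both = even & at_least_four            # intersection: both happen
even_but_small = even - at_least_four  # difference: even, but fewer than four pips
not_even = omega - even                # complement: an odd number of pips
```

None of these operations would make sense on the bare outcomes 2 or 4 themselves; they work because events are sets.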

With the third element of the “probability triple,” there are usually no major difficulties.

3. The Function $P:F\to R$

So here we simply have a function that assigns numbers to subsets from $F$ (the second element of the “probability triple”). These numbers (that is, the values of the function P) are commonly called “probabilities.”

Of course, this function must also satisfy certain conditions (have you ever seen the probability of something equal to −7?). These are also called axioms (because they are accepted without proof):

The function $P:F\to R$ is a probability function if:

$P\left( A \right)\ge 0$ for every $A\in F$

P(Ω)=1

$P\left( {{A}_{1}}\cup {{A}_{2}}\cup {{A}_{3}}\cup \ldots \right)=P\left( {{A}_{1}} \right)+P\left( {{A}_{2}} \right)+P\left( {{A}_{3}} \right)+\ldots$ , provided that the sets ${{A}_{1}},{{A}_{2}},{{A}_{3}},\ldots \in F$ are pairwise disjoint (that is, they have no common elements, ${{A}_{i}}\cap {{A}_{j}}=\varnothing$ for $i\ne j$ )

So the numbers assigned by the function must be non-negative (there are no negative probabilities), the probability that some event from all possible ones occurs equals 1 (a certain event), and probabilities can be added if the events/sets are disjoint.
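Two familiar facts follow from these three axioms alone. The short derivation below is only a sketch; the particular sequences of sets are chosen for this illustration.

First, take ${{A}_{1}}=\Omega$ and ${{A}_{n}}=\varnothing$ for $n\ge 2$ . These sets are pairwise disjoint and their union is Ω, so the third axiom gives:

$P\left( \Omega \right)=P\left( \Omega \right)+P\left( \varnothing \right)+P\left( \varnothing \right)+\ldots$

Since $P\left( \Omega \right)=1$ is finite and $P\left( \varnothing \right)\ge 0$ , this is possible only if $P\left( \varnothing \right)=0$ .

Second, for any $A\in F$ take ${{A}_{1}}=A$ , ${{A}_{2}}={A}'$ and ${{A}_{n}}=\varnothing$ for $n\ge 3$ . Again the sets are pairwise disjoint and their union is Ω, so:

$1=P\left( \Omega \right)=P\left( A \right)+P\left( {{A}'} \right)+0+0+\ldots$

and therefore $P\left( {{A}'} \right)=1-P\left( A \right)$ .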

That’s All About Probability

The values of the function P can be called probabilities. The function P assigns them to sets from the family $F$ . Sets from the family $F$ are subsets of Ω, that is, sets of elementary events.

Altogether, this forms the probability triple: (Ω, $F$ ,P). Defining probability while omitting any element of the triple is unfortunately impossible (although apparently people have tried).

You cannot describe what P is without saying what $F$ is, and that is impossible without defining Ω.

Some high school definitions that present P as a function $P:\Omega \to R$ are not correct, or refer to older definitions of probability (those with logical flaws).

Let us now see our definition in action, using a concrete example.

Example

Suppose our random experiment consists of rolling a six-sided die. By “1” I mean rolling a one, and so on.

Our sample space of elementary events is: $\Omega =\left\{ 1,2,3,4,5,6 \right\}$

Our σ-algebra $F$ is the family of all possible subsets of Ω

The function P defined on $F$ takes the value 0 for $\varnothing$ , the value $\frac{1}{6}$ for every one-element set, $\frac{2}{6}$ for every two-element set, … and finally $\frac{6}{6}=1$ for the six-element set (that is, the entire set Ω). In general, it takes the value $\frac{k}{6}$ for every $k$-element set.

Altogether, this indeed forms a probability triple, and all conditions and axioms are satisfied.
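If you would rather see this checked than take it on faith, here is a minimal Python sketch of the die triple. It assumes a fair die, so that P of a set is its number of elements divided by 6; the helper `power_set` and the variable names are introduced only for this illustration. Because Ω is finite, the third axiom is checked for pairs of disjoint sets, which is enough in this finite setting.

```python
from fractions import Fraction
from itertools import chain, combinations

def power_set(base):
    """Return every subset of `base` as a frozenset (hypothetical helper)."""
    items = list(base)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

# First element of the triple: the set of elementary events.
omega = frozenset(range(1, 7))

# Second element: the sigma-algebra of all subsets of omega.
F = set(power_set(omega))

# Third element: the function P, here k/6 for a k-element set.
def P(event):
    return Fraction(len(event), len(omega))

# Axiom 1: P(A) >= 0 for every A in F.
nonnegative = all(P(a) >= 0 for a in F)

# Axiom 2: P(omega) = 1.
normalized = P(omega) == 1

# Axiom 3 (finite case): P(A ∪ B) = P(A) + P(B) for disjoint A and B.
additive = all(
    P(a | b) == P(a) + P(b)
    for a in F
    for b in F
    if not (a & b)
)
```

Using `Fraction` instead of floating-point numbers keeps the equality checks exact.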

I hope that after carefully reading this Lecture you now understand Kolmogorov’s definition of probability — it’s not that difficult, right? 🙂

If you would like to ask about something or if anything is still unclear, I strongly encourage you to leave a comment under this Lecture — it will surely help others as well.
