What Is Probability? Definition Step by Step

 

The problem with defining probability

Until the 20th century, all definitions of probability had some shortcomings.

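  • classical (Laplace’s) – circular at its core (it relies on outcomes being “equally likely”, which already presupposes probability), works only for a finite number of possible outcomes
  • geometric – essentially the same logical error, handles infinitely many outcomes only in a restricted sense, and leads to various strange paradoxes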
  • frequentist (already 1931, time really flies…) – enormous problems in practical applications

The correct and commonly used definition was given in 1933 by the Soviet mathematician Andrey Kolmogorov, and this lecture is devoted precisely to it.

Kolmogorov’s definition requires – unfortunately for students – a great deal of concentration and cannot be explained “in one sentence”. Very few people choose the path of UNDERSTANDING here; most choose the easier path of MEMORIZING, or even worse – IGNORING it altogether (to put it mildly).

Which is a shame, because first of all, knowing WHAT you are actually doing is a really great feeling, and secondly, this definition is a huge treasure and brings incredible benefits. It turns out that the world of probability theory connects with set theory and even mathematical analysis.

That means that once you accept this definition, to solve problems in probability theory (and as you can guess, there are plenty of them in the “real” world), you can use the full heavy artillery from other areas of mathematics – set operations, and even derivatives and integrals!

Kolmogorov’s definition of probability

You will find this definition in every probability textbook, on Wikipedia, and basically everywhere.

Probability is a function that satisfies certain conditions (the so-called axioms – what those conditions are is explained below), which assigns numbers to certain sets (these sets must also satisfy certain conditions, explained below) of elementary events.

In short: probability is a function that assigns numbers to sets of elementary events.

Now I will slowly and carefully explain all the elements used in this definition, starting completely from the basics. If you look closely, there are three elements to discuss:

  • what “elementary events” are (Ω)
  • what conditions the sets of these events must satisfy (F)
  • what conditions the function itself must satisfy (P)

The elementary events form the set Ω. The function (denoted by P) is defined on sets belonging to a family of subsets of Ω (this family is denoted by F). Both the family of subsets and the function must satisfy certain conditions. Together, this forms the so-called “probability triple” (Ω, F, P).

I’ll go step by step, and please don’t tense up – it really is simple 🙂

1. Elementary events Ω

Elementary events are the simplest (i.e. no longer decomposable into other events) possible outcomes of a random experiment.

For example, if the experiment is tossing a coin, then the elementary events may be: HEADS and TAILS (assuming it cannot land on its edge). The event HEADS cannot be decomposed any further – and that’s the point.

If the experiment is, for example, rolling a die, then an elementary event could be ROLLING A TWO. An event such as ROLLING AN EVEN NUMBER is not an elementary event, because it can be decomposed into several other events (ROLLING A TWO, ROLLING A FOUR, or ROLLING A SIX).

And that’s it.

Notice that this concept is somewhat “fluid” (someone might insist: “but why couldn’t it land on its edge?”), and it is NOT a strict mathematical definition.

And that is exactly how it should be, because concepts like “elementary event” and “random event” are primitive notions in probability theory. Primitive notions in mathematics are objects or concepts that are not defined – because they are assumed to be self-evident.

For example, in geometry, a point is a primitive notion. A point has no definition. Don’t misunderstand me – I don’t mean that we cannot say what a point is. But whatever we say about a point, however we describe it, these will only be verbal descriptions, not strict mathematical definitions.

Of course, accepting something as a primitive notion is somewhat conventional and often leads to paradoxes. In such cases, it becomes necessary to “refine” the concept and introduce a strict definition that avoids these paradoxes. However, that definition will itself be based on other primitive notions, because that is simply how definitions work.

Why must it be that way? That’s a topic for a completely different story. Let’s get back to probability.

So we have elementary events – the possible, simplest, and indivisible outcomes of a random experiment. All of them together form a set (for example, the set {ROLLING 1, ROLLING 2, ROLLING 3, ROLLING 4, ROLLING 5, ROLLING 6}).

I will denote this set by the Greek letter Ω.

This set is the first element of Kolmogorov’s definition of probability.
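
As a small illustration, the sample space for the die can be written down as a Python set. This is only a sketch: the names `omega` and `even` are my own, and I assume the outcomes are represented by the integers 1 to 6.

```python
# Sample space for one roll of a six-sided die
# (assumption: outcomes are represented by the integers 1..6).
omega = frozenset({1, 2, 3, 4, 5, 6})

# "ROLLING AN EVEN NUMBER" is not elementary:
# it decomposes into three elementary outcomes.
even = frozenset({2, 4, 6})

print(len(omega))      # number of elementary events
print(even <= omega)   # True if every outcome in `even` belongs to omega
print(sorted(even))    # the elementary events that make up "even"
```

Each member of `omega` is an elementary event, while `even` is a collection of three of them.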

The second element of the definition usually causes students the most trouble. But it is not difficult either.

2. σ-algebra of subsets of Ω

A “family of subsets of a set” consists of some of its subsets.

Example

Let the set be A = \left\{ 1,2,3,4,5,6 \right\}

An example of a family of subsets of this set is: F=\left\{ \left\{ 1 \right\},\left\{ 2 \right\},\left\{ 1,2 \right\},\left\{ 3,4 \right\},\left\{ 1,2,3,4,5,6 \right\} \right\} (a subset consisting of element 1, a subset consisting of element 2, a subset consisting of elements 1 and 2, a subset consisting of elements 3 and 4, and the subset equal to the entire set A).

As you can see, a set can have quite a lot of its own “families of subsets”.
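
To get a feeling for how many “quite a lot” is, here is a rough Python sketch that lists all subsets of A and counts the possible families. The helper `power_set` is a name I made up for this illustration, not something from a library.

```python
from itertools import chain, combinations

def power_set(s):
    """Return all subsets of s as a list of frozensets (hypothetical helper)."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]

A = {1, 2, 3, 4, 5, 6}
subsets_of_A = power_set(A)

# The family F from the example above, written as a set of frozensets.
F = {frozenset({1}), frozenset({2}), frozenset({1, 2}),
     frozenset({3, 4}), frozenset(A)}

print(len(subsets_of_A))                   # 2**6 = 64 subsets of A
print(all(s in subsets_of_A for s in F))   # every member of F is a subset of A
print(2 ** len(subsets_of_A))              # 2**64 possible families of subsets
```

A family is just a choice of which subsets to include, so a six-element set with 64 subsets has 2**64 different families.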

In the definition of probability, the set whose family of subsets we choose is the set of all elementary events Ω (see the first part of the definition). However, this family of subsets cannot be completely arbitrary.

It must satisfy certain conditions, which are met by families called σ-algebras.

For a set Ω, a family of its subsets F is called a σ-algebra if:

  1. \varnothing \in F – the empty set belongs to it
  2. A\in F\Rightarrow {A}'\in F – the complement of every set in the family also belongs to the family (the complement of A consists of all elements of Ω that do not belong to A)
  3. {{A}_{1}},{{A}_{2}},\ldots \in F\Rightarrow \bigcup\limits_{n=1}^{\infty }{{{A}_{n}}}\in F – the union of any countable sequence of sets belonging to the family also belongs to the family

Example

Let the set be \Omega =\left\{ a,b,c \right\}
  • Consider the family of its subsets F=\left\{ \varnothing ,\left\{ a \right\},\left\{ b \right\},\left\{ c \right\},\left\{ a,b \right\},\left\{ a,b,c \right\} \right\}. It contains the empty set (condition 1 satisfied). However, the complement of the set \left\{ a \right\}, that is the set of all elements that do NOT belong to \left\{ a \right\} and belong to Ω, is the set \left\{ b,c \right\}. This set does not belong to the family of subsets F. Therefore, this family of subsets is not a σ-algebra (moreover, the union of the sets \left\{ b \right\} and \left\{ c \right\} does not belong to the family either – so the third condition is also not satisfied).
  • Now consider the family of its subsets F=\left\{ \varnothing ,\left\{ a \right\},\left\{ b \right\},\left\{ c \right\},\left\{ a,b \right\},\left\{ b,c \right\},\left\{ a,c \right\},\left\{ a,b,c \right\} \right\}. It contains the empty set, complements of all sets, and unions of all sets with one another. This family of subsets is a σ-algebra.
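
The two families from this example can also be checked mechanically. Below is a minimal Python sketch, assuming Ω is finite; the function name `is_sigma_algebra` is my own. For a finite family it is enough to test unions of pairs, because any countable union of its members is then a union of finitely many distinct sets.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the three conditions for a family of subsets of a FINITE set omega."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in family}
    if any(not a <= omega for a in family):            # members must be subsets of omega
        return False
    if frozenset() not in family:                      # condition 1: empty set
        return False
    if any(omega - a not in family for a in family):   # condition 2: complements
        return False
    # condition 3: for a finite family, closure under pairwise unions suffices
    return all(a | b in family for a, b in combinations(family, 2))

omega = {"a", "b", "c"}
first = [set(), {"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "b", "c"}]
second = first + [{"b", "c"}, {"a", "c"}]

print(is_sigma_algebra(omega, first))    # False: the complement of {a} is missing
print(is_sigma_algebra(omega, second))   # True: this is the family of all subsets
```

For an infinite Ω no such brute-force check is possible, and the conditions have to be verified by reasoning.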

The definition of a σ-algebra of subsets of Ω looks very heavy and very mathematical. However, if you think about it, it is easy to translate it into “everyday language”.

We are already very close to defining probability as a certain function P. This function will assign numbers to our subsets from this σ-algebra – numbers representing their “probabilities”. It would therefore be somewhat strange if:
  • we could not define the probability of an impossible event (that is, define the probability of the empty set – condition 1)
  • we could define the probability that something happens, but could not define the probability that it does not happen (condition 2)
  • we could define probabilities of several events, but could not define the probability that at least one of them occurs (condition 3)
All the conditions that the family of subsets of Ω must satisfy in the definition of probability are therefore truly necessary and well justified. I will denote our family of subsets by F.

Note 1

To define the family F, which is the second element of the definition, I need the first element – the set of all elementary events Ω. F consists of various subsets of Ω.

Note 2

F consists of SETS, not elements. That means that one of its elements is, for example, \left\{ a \right\}, and not a. This is really a huge bonus, because from this point on it allows us to operate on objects that are very precisely defined mathematically – namely sets. Sets can be added (united), subtracted, and intersected, and at that moment mathematics really “enters the game”. That would be impossible if we limited ourselves only to elementary events, that is, elements of Ω (have you ever tried to add heads to tails? What result did you get? 🙂 )

The third element of the “probability triple” usually causes no difficulties.

3. The function P:F\to R

So here we simply have a function that assigns numbers to subsets from F (the second element of the “probability triple”). These numbers (that is, the values of the function P) are commonly called “probabilities”. Of course, this function must also satisfy certain conditions (have you ever seen a probability equal to –7?). These are also called axioms (because they are accepted without proof):
The function P:F\to R is a probability function if:
  1. P\left( A \right)\ge 0 for any A\in F
  2. P(Ω)=1
  3. P\left( {{A}_{1}}\cup {{A}_{2}}\cup {{A}_{3}}\cup \ldots \right)=P\left( {{A}_{1}} \right)+P\left( {{A}_{2}} \right)+P\left( {{A}_{3}} \right)+\ldots – if the sets {{A}_{1}},{{A}_{2}},{{A}_{3}},\ldots are pairwise disjoint (that is, they have no elements in common, {{A}_{i}}\cap {{A}_{j}}=\varnothing for i\ne j)
That is, the numbers assigned by the function must be non-negative (there are no negative probabilities), the probability that some event from all possible events occurs is equal to 1 (a certain event), and probabilities can be added if the events/sets are disjoint.
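
To show how much already follows from these three axioms, here is a short worked derivation of two familiar facts: the impossible event has probability 0, and the probability of a complement is 1 minus the probability of the set. It is a sketch of the standard argument; I write A' for the complement of A, as in the σ-algebra conditions.

```latex
% Step 1: P(\varnothing) = 0.
% Apply axiom 3 to the pairwise disjoint sets
% A_1 = \Omega, A_2 = A_3 = \ldots = \varnothing.
P(\Omega) = P(\Omega \cup \varnothing \cup \varnothing \cup \ldots)
          = P(\Omega) + P(\varnothing) + P(\varnothing) + \ldots
% Since P(\Omega) = 1 is finite and P(\varnothing) \ge 0 (axiom 1), this forces
P(\varnothing) = 0

% Step 2: P(A') = 1 - P(A) for any A \in F.
% A and A' are disjoint and A \cup A' = \Omega;
% pad the sequence with empty sets and use axiom 3 together with Step 1.
1 = P(\Omega) = P(A \cup A' \cup \varnothing \cup \varnothing \cup \ldots)
  = P(A) + P(A') + 0 + 0 + \ldots
\Rightarrow P(A') = 1 - P(A)
```

The same padding with empty sets shows that additivity also holds for any finite number of pairwise disjoint sets.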

That’s all when it comes to probability

The values of the function P can be called probabilities. The function P assigns them to sets from the family F. Sets from the family F are subsets of the set Ω of elementary events. Together, this forms the probability triple: (Ω, F, P).

Defining probability while omitting any element of the triple is unfortunately impossible (and apparently, people have tried). You cannot explain what P is without explaining what F is, and that is impossible without defining Ω. Some high-school definitions presenting P as a function P:\Omega \to R are incorrect or refer to old definitions of probability (those with logical errors).

So let us see our definition in action with a concrete example.

Example

Let our experiment be rolling a six-sided die. By “1” I mean rolling a one, and so on.
  1. Our sample space of elementary events is: \Omega =\left\{ 1,2,3,4,5,6 \right\}
  2. Our σ-algebra F is the family of all possible subsets of Ω
  3. Let the function P defined on F take the value 0 for \varnothing , the value \frac{1}{6} for each single-element set, \frac{2}{6} for each two-element set, … and finally \frac{6}{6}=1 for the six-element set (that is, the entire set Ω)
Together, this indeed forms a probability triple; all conditions and axioms are satisfied.
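
If you would rather check this than take my word for it, the die example can be verified by brute force. The following Python sketch builds Ω, F and P as described above and tests the axioms; the variable names are my own, and the third axiom is tested only for pairs of disjoint sets, which is sufficient here because F is finite.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))

# F: the family of all subsets of omega.
F = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

def P(event):
    """Probability from the example: (number of outcomes in the event) / 6."""
    return Fraction(len(event), len(omega))

# Axiom 1: non-negativity. Axiom 2: P(omega) = 1.
non_negative = all(P(a) >= 0 for a in F)
total_is_one = P(omega) == 1

# Axiom 3 (finite case): additivity for every pair of disjoint sets in F.
additive = all(P(a | b) == P(a) + P(b)
               for a in F for b in F if not (a & b))

print(len(F))                        # 64 sets in the family
print(P(frozenset({2, 4, 6})))       # 1/2, the probability of an even number
print(non_negative, total_is_one, additive)
```

Using `Fraction` instead of floating-point numbers keeps the comparisons exact.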
I hope that after carefully reading my Lecture you have understood Kolmogorov’s definition of probability – it’s not that difficult, right? 🙂 If you would like to ask something, or if something is still not completely clear to you, I strongly encourage you to write a comment under this Lecture – it will surely help others as well.

