## Overview

Probability theory is concerned with *chance*. Whenever there is an event or an activity where the outcome is uncertain, then probabilities are involved. We can think of any activity or event with an uncertain outcome as an *experiment* whose outcome we observe. Probabilities are then measures of the likelihood of any of the possible outcomes of an event.

An *experiment* represents an activity whose output is subject to chance (or variation). The output of the experiment is referred to as the *outcome* of the experiment. The set of all possible outcomes is called the *sample space*.

## Some prerequisite set theory

To understand probability theory, we first want an understanding of the three main operation in set theory.

Definition — Set

A **set** is a collection of elements, such as numbers, points in space, shapes, variables, or other sets. A set does not contain duplicate elements.

The set with no elements is a special set called the **empty set**, denoted [latex]\emptyset[/latex].

Definition — Set union, intersection, and subtraction

The **set union** [latex]A \cup B[/latex] of two sets [latex]A[/latex] and [latex]B[/latex] is the set of all elements in that are in either set [latex]A[/latex] or in set [latex]B[/latex]. Formally: [latex]A \cup B = \{a \mid a \in A \lor a \in B\}[/latex].

The **set intersection** [latex]A \cap B[/latex] of two sets [latex]A[/latex] and [latex]B[/latex] is the set of all elements in that are in both set [latex]A[/latex] and set [latex]B[/latex]. Formally: [latex]A \cup B = \{a \mid a \in A \land a \in B\}[/latex].

The **set subtraction** [latex]A - B[/latex] of two sets [latex]A[/latex] and [latex]B[/latex] is the set of all elements in that are in set [latex]A[/latex] but not in set [latex]B[/latex]. Formally: [latex]A \cup B = \{a \mid a \in A \land a \notin B\}[/latex]

Like any set, [latex]A \cup B[/latex], [latex]A\cap B[/latex], and [latex]A - B[/latex] do not contain duplicate elements. If an element [latex]a[/latex] is in [latex]A[/latex] and in [latex]B[/latex], it is represented just once in [latex]A \cup B[/latex], in [latex]A \cap B[/latex], and in [latex]A - B[/latex].

## Sample spaces and outcomes

Definition – Sample space and outcomes

The *sample space* [latex]\Omega[/latex] is the set of all possible outcomes that might be observed for an experiment.

A set of outcomes [latex]A \subseteq \Omega[/latex] is called an *event*. An event is said to have *occurred* if any one of its elements is the outcome observed in an experiment.

Examples

- An experiment involving the flipping of a coin has two possible outcomes: heads or tails. The sample space is thus [latex]S = \{heads, tails\}[/latex].
- An experiment involving testing a software system and counting the number of failures experienced after T = 1 hour has many possible outcomes: we may experience no failures, 1 failure, 2 failures, 3 failures, …. The sample space is [latex]S = \{0, ~1, ~2, ~3, \ldots\}[/latex] (the set of positive integers).
- An experiment involving connecting to a web server may have several outcomes, such as successfully connecting, receiving a 404 error, etc. The sample space is the set of possible return values from the web sever.

Definition – Independence

Two events are said to be **independent** if the occurrence of one event does not depend on the other and vice versa. For example, if we have two dice, the probability of one falling on the outcome [latex]6[/latex] is independent of the outcome of the other dice. However, if we throw both dice, but one falls on the floor out of sight, while the other shows a [latex]3[/latex], then the probability of the sum of the two dice equaling [latex]7[/latex] is dependent on the probability of the second dice.

## Probability

Probabilities are usually assigned to events.

Definition – Probability

The probability [latex]P(A)[/latex] of an event [latex]A \subseteq \Omega[/latex] is a non-negative real number that relates to the number of times we observe an outcome in [latex]A[/latex]. This can be defined as the fraction of times that we observe an outcome in [latex]A[/latex] over the total number of possible outcomes:

[latex]\begin{split}P\{A\} = \lim\begin{array}[c]{c} m \\\hline n \end{array}\end{split}[/latex]

where [latex]m[/latex] is the number of outcomes in [latex]A[/latex] and [latex]n[/latex] is the total number of possible outcomes, that is, the number of elements in [latex]S[/latex].

We say that [latex]P(A)[/latex] is the *probability measure*, and this measures how likely it is that the actual outcome of the experiment will be a member of the set [latex]A[/latex].

The three axioms of probability theory

Probabilities must satisfy certain *axioms* to be meaningful measures of likelihood. The following probability laws hold for any event [latex]A[/latex] and any state space [latex]\Omega[/latex]:

- [latex]0\leq P(A) \leq 1[/latex];
- [latex]P(\Omega) = 1[/latex]; and
- [latex]P(A \cup B) = P(A) + P(B)[/latex] for disjoint events [latex]A[/latex] and [latex]B[/latex] (that is, [latex]A \cap B = \emptyset[/latex]).

That is: (1) the probability of an event occurring is between [latex]0[/latex] and [latex]1[/latex] inclusive; (2) the probability of the event being part of the sample space is 1, so no events outside the sample space are possible; and (3) the probability of the event [latex]A \cup B[/latex] is the sum of the probabilities of [latex]A[/latex] and of [latex]B[/latex], assuming that [latex]A[/latex] and [latex]B[/latex] are disjount.

There are some important consequences of these three axioms:

- [latex]P(A) = 1 - P(\Omega - A)[/latex].
- [latex]P(\emptyset) = 0[/latex], where [latex]\emptyset[/latex] is the empty set.
- If [latex]A \subseteq B[/latex] then [latex]P(A) \leq P(B)[/latex].
- [latex]P(A \cup B) = P(A) + P(B) - P(A \cap B)[/latex], even if [latex]A[/latex] and [latex]B[/latex] are not disjoint.

Definition – Conditional probability

Given two events [latex]A[/latex] and [latex]B[/latex], if [latex]P(B)[/latex] then the **conditional probability** of [latex]A[/latex] *given* [latex]B[/latex] is defined as:

[latex]P(A \mid B) = \frac{P(A \cap B)}{P(B)}[/latex]

This says that the probability of [latex]A[/latex] occuring, given that we have observed [latex]B[/latex], is the probability of observing both [latex]A[/latex] and [latex]B[/latex], divided by the probability of observing event [latex]B[/latex].

Conditional probability provides us with the tools to reason about partial information, so that we can estimate the probability of an event [latex]A[/latex] that depends on the outcome of event [latex]B[/latex],, event if we do not know the outcome of event [latex]B[/latex] yet.

We say that [latex]P(A)[/latex] is the *prior probability* of [latex]A[/latex] and that [latex]P(A \mid B)[/latex] is the *posterior probability[latex]of[/latex]A[latex]given[/latex]B$.

Definition – The Product rule

Given two events [latex]A[/latex] and [latex]B[/latex], if [latex]P(B)[/latex] then the probability of both events [latex]A[/latex] and [latex]B[/latex] occuring is defined by the **product rule**:

[latex]P(A \cap B) = P(A)P(B \mid A)[/latex]

The product rule can be useful, however, it is defined it terms of conditional probability, which itself uses the product rule. We can resolve this using Bayes’ theorem.

Definition – Bayes’ theorem

Given two events [latex]A[/latex] and [latex]B[/latex], the conditional probability of [latex]A[/latex] *given* [latex]B[/latex] can be calculated using:

[latex]P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}[/latex]

Bayes’ theorem allows us to translate causal knowledge about events into diagnostic knowledge. For example, if event [latex]A[/latex] represents a particular disease in a plant, and event [latex]B[/latex] is the event describing a positive outcome of a test that can diagnose that disease, then [latex]P(B \mid A)[/latex] define the casual relationship: if the plant has disease [latex]A[/latex], then the probability of observing a positive diagnosis from test [latex]B[/latex] is [latex]P(B \mid A)[/latex]. Once we have the test result, we can determine the probability of [latex]A[/latex] (the plant having the disease) provided we can estimate the prior probabilities [latex]P(A)[/latex] and [latex]P(B)[/latex].

Bayes’ theorem is named after Thomas Bayes, the English statistician and philosopher who first defined it.