Probability for Statistics and Data Science

Probability and statistics are core skills for data science and machine learning. We have already covered statistics for data science in our previous articles. In this article, we will learn about probability for data science, how probability is used in machine learning, and which topics in probability you need to know.

What is Probability?

Probability measures how likely an event is to occur. When all outcomes are equally likely, it is the ratio of the number of favorable outcomes to the total number of possible outcomes.

The value of probability always lies between 0 and 1, where 0 means the event cannot occur and 1 means it is certain to occur.
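As a rough illustration of the ratio definition, the sketch below computes the probability of rolling an even number with a fair six-sided die. The helper name `probability` is made up for this example, and the code assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Classical probability: favorable outcomes / total outcomes.

    Hypothetical helper; assumes all outcomes are equally likely.
    """
    sample_space = set(sample_space)
    favorable = set(event) & sample_space  # drop outcomes that cannot happen
    return Fraction(len(favorable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}

p_even = probability({2, 4, 6}, die)   # 3 favorable outcomes out of 6
p_seven = probability({7}, die)        # impossible event
p_any = probability(die, die)          # certain event

print(p_even, p_seven, p_any)  # 1/2 0 1
```

The impossible and certain events land exactly on the two ends of the 0 to 1 range.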

How is Probability Used in Machine Learning?

In machine learning and data analysis, we usually work with incomplete data and want to quantify the resulting uncertainty. Probability is the measure of that uncertainty, so it appears in many machine learning concepts.

Probability is the foundation for most statistical concepts, and statistics is the foundation for machine learning. A clear understanding of probability is therefore important if you want to learn machine learning algorithms.

You can learn about different probability distributions from here.

An event is an outcome, or a set of outcomes, of a random experiment.

Types of Probability

Probability problems can be categorized into two types: Additive Probability and Multiplicative Probability.

1. Additive Probability

If A and B are two events in a probability experiment, then the probability that at least one of the events will occur can be calculated using the addition rule in probability. The additive rule can be applied to two types of events: mutually exclusive events and non-mutually exclusive events.

Mutually Exclusive Events

Two events are called mutually exclusive if they cannot occur together. They are disjoint events.

P(A or B) = P(A) + P(B)
P(A ∪ B) = P(A) + P(B)

Real-life Examples of Mutually Exclusive Events

  • When tossing a coin, the events of getting a head and getting a tail are mutually exclusive because a head and a tail cannot occur together.
  • When throwing a die once, the events of getting a 3 and getting a 5 are mutually exclusive because we cannot get 3 and 5 at the same time.
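The die example can be checked numerically. This is a minimal sketch, assuming a fair six-sided die; the helper `prob` is a made-up name for this illustration.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}
A = {3}  # event: the throw shows 3
B = {5}  # event: the throw shows 5

def prob(event):
    # Hypothetical helper: favorable outcomes / total outcomes on a fair die
    return Fraction(len(event & die), len(die))

# Mutually exclusive events share no outcomes
exclusive = A.isdisjoint(B)

p_union = prob(A | B)        # P(A ∪ B) counted directly
p_sum = prob(A) + prob(B)    # P(A) + P(B) from the addition rule

print(exclusive, p_union, p_sum)  # True 1/3 1/3
```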

Non-Mutually Exclusive Events

Two events are called non-mutually exclusive if they can occur together.

P(A or B) = P(A) + P(B) - P(A and B)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Real-life Examples of Non-Mutually Exclusive Events

  • Driving a car and listening to music.
  • Getting a prime number and getting an even number when throwing a die, since 2 is both prime and even.
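To see why the overlap must be subtracted, here is a small sketch of the prime and even example, assuming a fair six-sided die. The helper `prob` is a made-up name for this illustration.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}
prime = {2, 3, 5}
even = {2, 4, 6}

def prob(event):
    # Hypothetical helper: favorable outcomes / total outcomes on a fair die
    return Fraction(len(event & die), len(die))

overlap = prime & even                     # {2}: the outcome counted twice
p_naive = prob(prime) + prob(even)         # 1, which overcounts the outcome 2
p_rule = p_naive - prob(overlap)           # addition rule with the correction
p_direct = prob(prime | even)              # count the union directly

print(p_naive, p_rule, p_direct)  # 1 5/6 5/6
```

Without the correction term the sum would claim the event is certain, even though rolling a 1 is neither prime nor even.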

2. Multiplicative Probability

If A and B are two events in a probability experiment, then the probability that both events will occur can be calculated using the multiplicative rule in probability. The multiplicative rule can be applied to two types of events: independent events and dependent events.

Independent Events

Two events are considered independent if the occurrence of one event does not change the probability of the other.

P(A ∩ B) = P(A) · P(B)

Real-life Examples of Independent Events

  • In flipping a coin, the head appears in the first flip and the tail appears in the second flip. The occurrence of a tail in the second flip does not depend on the occurrence of a head in the first flip.
  • In throwing a die, let us consider that 3 occurs in the first throw and 2 occurs in the second throw. The occurrence of 2 doesn’t depend on the occurrence of 3.
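The coin example can be verified by listing every outcome of two flips. This is a sketch under the assumption of a fair coin; `prob` is a made-up helper name.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two coin flips: HH, HT, TH, TT
flips = list(product("HT", repeat=2))

def prob(predicate):
    # Hypothetical helper: fraction of outcomes that satisfy the predicate
    hits = sum(1 for outcome in flips if predicate(outcome))
    return Fraction(hits, len(flips))

p_a = prob(lambda o: o[0] == "H")                  # head on the first flip
p_b = prob(lambda o: o[1] == "T")                  # tail on the second flip
p_ab = prob(lambda o: o[0] == "H" and o[1] == "T") # both together

print(p_a, p_b, p_ab, p_a * p_b)  # 1/2 1/2 1/4 1/4
```

The joint probability counted from the outcomes matches the product P(A) · P(B), which is what independence means here.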

Dependent Events

Two events are considered dependent if the occurrence of one event changes the probability of the other.

P(A ∩ B) = P(B) · P(A|B)
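As an illustration that is not part of the original text, consider drawing two cards from a standard 52-card deck without replacement. Let B be "the first card is an ace" and A be "the second card is an ace". The sketch below labels cards 0 to 51 and treats cards 0 to 3 as the aces; that labeling is an assumption made only for this example.

```python
from fractions import Fraction
from itertools import permutations

deck = range(52)
aces = {0, 1, 2, 3}  # assumed labeling: the first four card ids are the aces

# Multiplication rule for dependent events: P(A ∩ B) = P(B) · P(A|B)
p_b = Fraction(4, 52)          # first card is an ace
p_a_given_b = Fraction(3, 51)  # second is an ace, one ace already removed
p_rule = p_b * p_a_given_b

# Cross-check by counting all ordered two-card draws
draws = list(permutations(deck, 2))
both_aces = sum(1 for first, second in draws if first in aces and second in aces)
p_count = Fraction(both_aces, len(draws))

print(p_rule, p_count)  # 1/221 1/221
```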

Conditional Probability

Conditional probability is the probability that an event A occurs given that another event B has already occurred. It is written P(A|B), and it is the second factor in the formula for dependent events. Rearranging that formula for P(B) > 0 gives P(A|B) = P(A ∩ B) / P(B).
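A short sketch may help here. It uses the relation P(A|B) = P(A ∩ B) / P(B), which follows from the dependent-events formula, on a fair six-sided die with A = "the throw shows 2" and B = "the throw is even". The helper `prob` is a made-up name for this illustration.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}
A = {2}        # event: the throw shows 2
B = {2, 4, 6}  # event: the throw is even

def prob(event):
    # Hypothetical helper: favorable outcomes / total outcomes on a fair die
    return Fraction(len(event & die), len(die))

p_a = prob(A)
p_a_given_b = prob(A & B) / prob(B)  # P(A|B) = P(A ∩ B) / P(B)

print(p_a, p_a_given_b)  # 1/6 1/3
```

Knowing that the throw is even doubles the chance that it is a 2, so A and B are dependent events.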

Bayes’ Theorem

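From the equation of dependent events and conditional probability, the joint probability of A and B can be written in two ways:

P(A ∩ B) = P(B) · P(A|B) = P(A) · P(B|A)

Dividing both sides by P(B), with P(B) > 0, gives Bayes' theorem:

P(A|B) = P(B|A) · P(A) / P(B)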
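Bayes' theorem states that P(A|B) = P(B|A) · P(A) / P(B). The sketch below applies it to a toy spam filter. Every number in it is hypothetical and chosen only to keep the arithmetic simple, and P(B) is obtained with the law of total probability, an extra step not covered in this article.

```python
from fractions import Fraction

# Hypothetical inputs (not real data)
p_spam = Fraction(1, 5)             # P(A): an email is spam
p_word_given_spam = Fraction(1, 2)  # P(B|A): spam contains the word "offer"
p_word_given_ham = Fraction(1, 20)  # P(B|not A): non-spam contains the word

# Law of total probability: P(B) = P(B|A)·P(A) + P(B|not A)·P(not A)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(A|B) = P(B|A) · P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(p_word, p_spam_given_word)  # 7/50 5/7
```

Under these assumed numbers, seeing the word raises the probability of spam from 1/5 to 5/7.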

End Notes

In this article, we have covered the basics of probability required to learn statistics and machine learning. You can also read about Common Probability Distributions you need to know to learn machine learning.

Feel free to put your feedback in the comment box below.

Happy Learning!