Probability - Distributions
Fundamentals of Probability Distributions
Distribution: a function that lists all the possible values a variable can take and how likely (how frequently) each one is to occur in the sample space.
Notations:
Y → the actual outcome of an event (the random variable)
y → one of the possible outcomes
P(Y = y), or P(y) for short → the probability that Y takes the value y
Examples:
Y → the number of red marbles we draw out of a bag
y → 5 red marbles
P(Y = 5), or P(5) → the probability of drawing exactly 5 red marbles
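To make the notation concrete, here is a minimal Python sketch that stores a discrete distribution as a dictionary mapping each possible outcome y to P(Y = y). Because the contents of the marble bag are not specified, it assumes a fair six-sided die instead; the helper name `prob` is hypothetical.

```python
from fractions import Fraction

# Assumed example: Y = result of rolling a fair six-sided die.
# Each possible outcome y maps to its probability P(Y = y).
die = {y: Fraction(1, 6) for y in range(1, 7)}

def prob(dist, y):
    """Return P(Y = y); values that are not possible outcomes get probability 0."""
    return dist.get(y, Fraction(0))

# The probabilities of a valid distribution sum to 1.
assert sum(die.values()) == 1

print(prob(die, 5))  # → 1/6
print(prob(die, 7))  # → 0  (7 is not a possible outcome)
```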
Probability Frequency Distribution: measures how likely each possible outcome is.
Definitions:
Two characteristics: MEAN → $\mu$ and VARIANCE → $\sigma^2$
Mean: the average (expected) value, $\mu = E(Y)$
Variance: how spread out the data is around the mean
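For a discrete distribution, the mean is the probability-weighted average $\mu = \sum_y y\,P(Y = y)$, and the variance is the probability-weighted average squared distance from the mean, $\sigma^2 = \sum_y (y - \mu)^2\,P(Y = y)$. The sketch below applies both formulas to a small made-up distribution; the probabilities are illustrative only.

```python
# Made-up distribution for illustration: outcome -> probability (sums to 1).
dist = {0: 0.2, 1: 0.5, 2: 0.3}

def mean(dist):
    # mu = sum over outcomes of y * P(Y = y)
    return sum(y * p for y, p in dist.items())

def variance(dist):
    # sigma^2 = sum over outcomes of (y - mu)^2 * P(Y = y)
    mu = mean(dist)
    return sum((y - mu) ** 2 * p for y, p in dist.items())

print(round(mean(dist), 4))      # → 1.1
print(round(variance(dist), 4))  # → 0.49
```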
Population vs Sample
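Population data: the whole data → population parameters, e.g. standard deviation σ (mean μ)
Sample data: a part of the whole data → sample statistics, e.g. standard deviation s (mean x̄)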
Sample mean: x̄
Sample variance: $s^2$
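Variance is measured in squared units (e.g. cm² when the data are in cm), so its square root is often easier to interpret.
Standard deviation → square root of the variance: $\sqrt{\sigma^2} = \sigma$, in the same units as the data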
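Python's standard `statistics` module distinguishes the two cases: `pvariance` and `pstdev` treat the data as the whole population (dividing by N), while `variance` and `stdev` treat it as a sample and divide by n − 1 (Bessel's correction), which is why $s^2$ comes out slightly larger than $\sigma^2$ for the same numbers. The data values below are made up for illustration.

```python
import statistics

# Made-up data values for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sum of squared deviations from the mean (5) is 32.
print(statistics.mean(data))       # mean: 5 (mu for a population, x-bar for a sample)
print(statistics.pvariance(data))  # population variance sigma^2 = 32 / 8 = 4
print(statistics.pstdev(data))     # population standard deviation sigma = 2
print(statistics.variance(data))   # sample variance s^2 = 32 / 7, about 4.571
print(statistics.stdev(data))      # sample standard deviation s, about 2.138
```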
Relationship between mean and variance:
$\sigma^2 = E\big((Y - \mu)^2\big) = E(Y^2) - \mu^2$
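The shortcut $E(Y^2) - \mu^2$ gives the same value as the definition $E\big((Y - \mu)^2\big)$. A quick check with exact fractions, assuming a fair six-sided die as the example distribution (the helper name `expect` is hypothetical):

```python
from fractions import Fraction

# Assumed example: fair six-sided die, P(Y = y) = 1/6 for y = 1..6.
dist = {y: Fraction(1, 6) for y in range(1, 7)}

def expect(dist, g):
    """E(g(Y)) = sum over outcomes of g(y) * P(Y = y)."""
    return sum(g(y) * p for y, p in dist.items())

mu = expect(dist, lambda y: y)                         # E(Y) = 7/2
by_definition = expect(dist, lambda y: (y - mu) ** 2)  # E((Y - mu)^2)
shortcut = expect(dist, lambda y: y ** 2) - mu ** 2    # E(Y^2) - mu^2

print(by_definition, shortcut)  # → 35/12 35/12
```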
Types of Probability Distributions
Notation:
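$X \sim N(\mu, \sigma^2)$ → read as "X follows a Normal distribution with mean $\mu$ and variance $\sigma^2$". The letter names the type of distribution and the values in parentheses are its parameters.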
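A normal random variable $X \sim N(\mu, \sigma^2)$ can be represented in Python with `statistics.NormalDist`. One detail worth noting: its constructor takes the standard deviation σ, not the variance σ² that appears in the notation, so a variance must be square-rooted first. The parameter values here are arbitrary.

```python
import math
from statistics import NormalDist

# X ~ N(mu, sigma^2) with arbitrary example values mu = 10, sigma^2 = 4.
mu, sigma_squared = 10, 4

# NormalDist expects the standard deviation, so take the square root of the variance.
X = NormalDist(mu=mu, sigma=math.sqrt(sigma_squared))

print(X.mean, X.stdev, X.variance)  # mean 10, standard deviation 2, variance 4
print(X.cdf(10))                    # P(X <= mu) = 0.5 for any normal distribution
```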
Discrete Distributions:
Uniform Distribution: e.g. picking a card from a shuffled deck or flipping a fair coin → all outcomes are equally likely (equiprobable)
Bernoulli Distribution: a single trial with only two possible outcomes (e.g. True/False, success/failure)
Binomial Distribution: two possible outcomes per trial, repeated over several independent trials (carrying out the same experiment several times in a row). For example, flip a coin 3 times and calculate the probability of getting exactly two heads, P(Y = 2).
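Poisson Distribution: models how many times an event occurs in a fixed interval of time or space, given its average rate; used to test how unusual an observed event frequency is for that interval.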
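Each discrete distribution above has a probability mass function (PMF) giving P(Y = y). The sketch below writes the four PMFs directly from their standard formulas using only the `math` module; the function names are hypothetical, and the Poisson rate λ = 2 and the threshold of 6 events are made-up values. It also reproduces the coin example from the Binomial entry: the probability of exactly two heads in three fair flips.

```python
import math

# Uniform (discrete): each of n outcomes has probability 1/n.
def uniform_pmf(n):
    return 1 / n

# Bernoulli: one trial; success (k = 1) with probability p, failure (k = 0) with 1 - p.
def bernoulli_pmf(k, p):
    if k not in (0, 1):
        raise ValueError("a Bernoulli outcome must be 0 or 1")
    return p if k == 1 else 1 - p

# Binomial: probability of exactly k successes in n independent trials.
def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Poisson: probability of exactly k events in an interval with average rate lam.
def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

# Fair coin flipped 3 times: P(exactly 2 heads) = C(3, 2) * 0.5^2 * 0.5 = 3/8.
print(binomial_pmf(2, 3, 0.5))  # → 0.375

# How unusual are 6 or more events when the average is 2 per interval?
tail = 1 - sum(poisson_pmf(k, 2) for k in range(6))
print(round(tail, 4))  # about a 1.7% chance
```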
Continuous Distributions:
Normal Distribution: often observed in nature; symmetric and bell-shaped around its mean μ
Chi-Squared Distribution: asymmetric (skewed to the right) and takes only non-negative values; often used in hypothesis testing
Exponential Distribution: models events that change rapidly early on and then level off; its density is highest at zero and decays exponentially
Logistic Distribution: useful in forecast analysis, or for determining a cut-off point for a successful outcome
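Continuous distributions are described by probability density functions (PDFs) rather than PMFs. A minimal sketch of the four densities listed above, written from their standard formulas with the standard library (`math.gamma` for the chi-squared normalizing constant, `statistics.NormalDist` for the normal); the function names and parameter defaults are illustrative assumptions.

```python
import math
from statistics import NormalDist

# Normal: symmetric bell curve; the standard library provides the PDF directly.
def normal_pdf(x, mu=0.0, sigma=1.0):
    return NormalDist(mu, sigma).pdf(x)

# Chi-squared with k degrees of freedom, supported on x > 0.
# For simplicity this sketch returns 0.0 for every x <= 0.
def chi2_pdf(x, k):
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

# Exponential with rate lam: highest at x = 0, then decays quickly.
def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Logistic with location mu and scale s.
def logistic_pdf(x, mu=0.0, s=1.0):
    z = math.exp(-(x - mu) / s)
    return z / (s * (1 + z) ** 2)

# Logistic CDF: the S-shaped curve, equal to 0.5 at x = mu.
def logistic_cdf(x, mu=0.0, s=1.0):
    return 1 / (1 + math.exp(-(x - mu) / s))

print(round(normal_pdf(0), 4))                                  # → 0.3989
print([round(exponential_pdf(x, 1.0), 3) for x in (0, 1, 2)])  # → [1.0, 0.368, 0.135]
print(logistic_cdf(0))                                          # → 0.5
```

The falling exponential values match the "changes rapidly early on" description, and because the logistic CDF crosses 0.5 at its location μ, that point is one natural candidate for a cut-off.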