8.9. Chapter Summary#
8.9.1. Terminology Review#
Use the flashcards below to help you review the terminology introduced in this chapter.
8.9.2. Key Take-Aways#
Definition of Random Variables
A random variable \(X\) is defined on a probability space \((S, P, \mathcal{F})\) as a function from \(S\) to the real line. Formally, we write the random variable as \(X(s)\).
Random variables are written as uppercase letters (like \(X\) or \(Y\)), and values of random variables are written as the corresponding lowercase letters (like \(x\) and \(y\)).
Random variables defined on the same sample space can have different ranges and different distributions of probability.
For a random variable, every Borel set must correspond to an event in the event class.
Discrete Random Variables
A discrete random variable takes values on a finite or countable range.
The probability mass function (PMF) is a way to write the probabilities of a discrete random variable as a function that is defined for all real values.
We use stem plots to illustrate PMFs; a short example appears after this list.
A discrete random variable may have an infinite number of values with nonzero probabilities.
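As a quick illustration (not from the chapter), here is a minimal sketch that plots the PMF of a fair six-sided die as a stem plot; the die example and plot labels are arbitrary choices.

```python
# Minimal sketch (illustrative example): stem plot of the PMF of a fair die.
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 7)        # possible values of the random variable
pmf = np.full(6, 1 / 6)    # each face has probability 1/6

plt.stem(x, pmf)
plt.xlabel("$x$")
plt.ylabel("$p_X(x)$")
plt.title("PMF of a fair six-sided die")
plt.show()
```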
Cumulative Distribution Functions
The cumulative distribution function (CDF) of a random variable \(X\) has the form \begin{equation*} F_X(x) = \operatorname{Pr} \bigl( \left\{ s \,\vert\, X(s) \le x \right\} \bigr). \end{equation*}
We abbreviate the CDF as \(F_X(x) = P(X \le x)\).
CDFs are often piecewise functions. In Python, we can use `np.piecewise()` to define a piecewise function; a short example appears after this list.
The CDF for any discrete random variable is a staircase function, meaning that it is a piecewise-constant, nondecreasing function.
The height of the jump at a point in the CDF of a discrete random variable is the probability at that point.
CDFs are helpful because they can be used to calculate probabilities over intervals.
If \(F_X(x)\) is a CDF, then \(0 \le F_X(x) \le 1\) for all \(x\), because every value of the CDF is the probability of some event in the probability space on which the random variable is defined.
\(F_X(-\infty) = 0\), \(F_X(+\infty) = 1\), and \(F_X(x)\) is monotonically nondecreasing in between.
The survival function of a random variable is \(S_X(x) = 1-F_X(x)\).
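Here is a minimal sketch (an illustrative example, not from the chapter) of using `np.piecewise()` to define and plot a staircase CDF; the random variable that equals 0 or 1 with equal probability is an arbitrary choice.

```python
# Minimal sketch: np.piecewise() defining the staircase CDF of a random
# variable that equals 0 or 1, each with probability 1/2 (arbitrary example).
import numpy as np
import matplotlib.pyplot as plt

def F(x):
    x = np.asarray(x, dtype=float)
    return np.piecewise(
        x,
        [x < 0, (x >= 0) & (x < 1), x >= 1],  # conditions defining each piece
        [0.0, 0.5, 1.0],                      # constant CDF value on each piece
    )

x = np.linspace(-1, 2, 601)
plt.plot(x, F(x))
plt.xlabel("$x$")
plt.ylabel("$F_X(x)$")
plt.show()
```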
Important Discrete Random Variables
A Discrete Uniform random variable has equal probability at each of a finite set of values, which are usually consecutive integers.
A Bernoulli (\(p\)) random variable \(B\) takes on values of 0 or 1, where \(P(B=1) = p\).
A Binomial (\(N,p\)) random variable is the number of successes (i.e., 1 values) on \(N\) Bernoulli (\(p\)) trials.
A Geometric (\(p\)) random variable is the number of Bernoulli (\(p\)) trials until the first success. Unlike the RVs above, it has a countably infinite range.
A Poisson (\(\alpha\)) random variable models the number of occurrences in some observation interval with average number of occurrences \(\alpha\).
SciPy.stats has functions to create objects for all of these types of random variables and many more. The objects have methods to return the range, PMF values, CDF values, and other properties and functions related to the corresponding random variable; a short example appears after this list.
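For example, here is a minimal sketch (not from the chapter) using a SciPy.stats Binomial object; the parameters \(N=5\) and \(p=0.3\) are arbitrary choices for illustration.

```python
# Minimal sketch: a SciPy.stats object for a Binomial(5, 0.3) random variable
# (arbitrary parameters chosen for illustration).
import numpy as np
from scipy import stats

X = stats.binom(5, 0.3)

print(X.support())          # smallest and largest values in the range: (0, 5)
print(X.pmf(np.arange(6)))  # PMF values P(X = k) for k = 0, ..., 5
print(X.cdf(2))             # CDF value P(X <= 2)
print(X.rvs(size=10))       # random samples drawn from this distribution
```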
Continuous Random Variables
A continuous random variable has a CDF that is a continuous function.
Every individual value of a continuous RV has zero probability; instead, the distribution is described by a probability density. Zero probability does not mean that the values cannot occur, and the probability that the random variable takes a value in an interval is usually nonzero.
The probability density function (pdf) is \begin{equation*} f_X(x) = \frac{d}{dx} F_X(x). \end{equation*}
The CDF can be computed from the pdf by integrating: \begin{equation*} F_X(x) = \int_{-\infty}^{x} f_X(u) ~du. \end{equation*}
Be sure to use a dummy variable of integration (such as \(u\)) that is different from the limit \(x\), or you will not be able to evaluate the integral correctly.
SymPy is a Python library for performing symbolic mathematics. In particular, it has `diff()` and `integrate()` functions for performing differentiation and integration, respectively; a short example appears after this list.
Some important properties of pdfs are:
They are nonnegative, but unlike CDFs, they can be arbitrarily large. They are not probabilities; they are only densities of probabilities.
The area under the pdf for some interval (i.e., the integral over that interval) is the probability that the random variable takes a value in that interval. This can be extended to Borel sets.
The area under the entire pdf is 1; i.e., a pdf integrates to 1.
Almost any nonnegative function that integrates to 1 is a valid pdf for some random variable.
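The following minimal sketch (not from the chapter) uses SymPy's `integrate()` and `diff()` to go between a pdf and its CDF; the pdf \(f_X(x) = 2x\) on \([0, 1]\) is an arbitrary choice, and \(u\) is the dummy variable of integration.

```python
# Minimal sketch: SymPy integration/differentiation between a pdf and a CDF
# for the arbitrary example f_X(x) = 2x on [0, 1].
import sympy as sp

x, u = sp.symbols("x u", positive=True)

f = 2 * u                              # pdf written in the dummy variable u
F = sp.integrate(f, (u, 0, x))         # CDF: integrate the pdf from 0 up to x
print(F)                               # x**2 (valid for 0 <= x <= 1)

print(sp.diff(F, x))                   # differentiate the CDF: recovers 2*x
print(sp.integrate(2 * u, (u, 0, 1)))  # total area under the pdf is 1
```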
Important Continuous Random Variables
A Uniform random variable models complete lack of knowledge on a fixed interval; i.e., the density is uniformly distributed across the interval, so every point in the interval is equally likely.
Exponential random variables model the time between Poisson arrivals and the lifetime of devices. They are the only continuous RVs with the memoryless property.
Normal random variables model aggregate phenomena and show up often in statistics and data science. The pdf depends on the mean \(\mu\) and standard deviation \(\sigma\) and is \begin{equation*} f_X(x) = \frac{1}{\sigma \sqrt{2 \pi}} \exp \left[ - \frac 1 2 \left( \frac{x- \mu}{\sigma} \right)^2 \right]. \end{equation*}
A standard Normal random variable has \(\mu=0\) and \(\sigma=1\).
The CDF of a Normal random variable has no closed form and is usually written as an integral. For a standard Normal random variable, the CDF is called \(\Phi(x)\), and the survival function is called \(Q(x)\).
The probability of a tail of a Normal random variable can always be expressed as \(Q(d/\sigma)\), where \(d\) is the distance from the mean to the start of the tail. The \(Q()\) function can be evaluated using the `norm.sf()` function in SciPy.stats; a short example appears after this list.
The Central Limit Theorem says that the CDF of the average of almost any type of random variables converges to a Normal distribution as the number of random variables being averaged grows.
A chi-squared random variable can be created as the sum of squares of independent zero-mean Normal random variables. It is characterized by its degrees of freedom, which correspond to the number of squared Normal RVs added together. In the special case of 2 degrees of freedom, the random variable is Exponential.
Student’s \(t\) random variable is similar to the standard Normal, but its density has heavier tails, with more probability spread away from 0.
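As a small sketch (not from the chapter), the tail probability \(P(X > 13)\) for a Normal random variable with \(\mu = 10\) and \(\sigma = 2\) (arbitrary values) can be computed as \(Q(d/\sigma) = Q(3/2)\) using `norm.sf()`:

```python
# Minimal sketch: evaluating a Normal tail probability as Q(d / sigma)
# with SciPy.stats; mu = 10, sigma = 2, and the threshold 13 are arbitrary.
from scipy import stats

mu, sigma = 10, 2
d = 13 - mu                          # distance from the mean to the tail

print(stats.norm.sf(d / sigma))      # Q(1.5), using the standard Normal
print(stats.norm(mu, sigma).sf(13))  # same tail probability without standardizing
```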
Histograms of Continuous Densities and Kernel Density Estimation
Normalized histograms of data from a continuous random variable approximate the pdf when the bin size is small and the number of samples is large.
Kernel Density Estimation (KDE) uses a smooth kernel function to better approximate the pdf for random variables with continuous pdfs.
The quality of the KDE estimate depends on the shape of the kernel. The usual kernel is the Gaussian (Normal) kernel, but the value of \(\sigma\) (typically called the bandwidth) must still be chosen.
SciPy.stats has a `gaussian_kde()` function that performs KDE and automatically chooses the kernel bandwidth; a short example follows this list.
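Here is a minimal sketch (not from the chapter) comparing a normalized histogram, a KDE from `gaussian_kde()`, and the true pdf; the standard Normal data and the sample size of 500 are arbitrary choices.

```python
# Minimal sketch: KDE with scipy.stats.gaussian_kde versus a normalized
# histogram, using 500 standard Normal samples (arbitrary example).
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.standard_normal(500)

kde = stats.gaussian_kde(data)   # kernel bandwidth is chosen automatically

x = np.linspace(-4, 4, 200)
plt.hist(data, bins=30, density=True, alpha=0.4, label="normalized histogram")
plt.plot(x, kde(x), label="KDE")
plt.plot(x, stats.norm.pdf(x), "--", label="true pdf")
plt.legend()
plt.show()
```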
Conditioning with Random Variables
We can use the usual definition of conditional probability to create conditional CDFs when conditioning on events with nonzero probability; a worked example appears after this list.
Conditional pdfs are defined as the derivative of the corresponding conditional CDFs.
The Law of Total Probability and Bayes’ rule can be used with conditional CDFs.
If we want to condition on an observed value of a continuous random variable, then all our previous approaches to conditional probabilities break down.
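As a small worked example (not from the chapter), suppose we condition on the event \(B = \{X > a\}\), which has nonzero probability \(P(B) = 1 - F_X(a)\) (assuming \(F_X(a) < 1\)). The definition of conditional probability gives the conditional CDF, and differentiating gives the conditional pdf: \begin{align*} F_{X \mid B}(x) &= P\bigl(X \le x \mid X > a\bigr) = \frac{P\bigl(\{X \le x\} \cap \{X > a\}\bigr)}{P(X > a)} = \begin{cases} 0, & x \le a, \\[4pt] \dfrac{F_X(x) - F_X(a)}{1 - F_X(a)}, & x > a, \end{cases} \\ f_{X \mid B}(x) &= \frac{d}{dx} F_{X \mid B}(x) = \begin{cases} 0, & x \le a, \\[4pt] \dfrac{f_X(x)}{1 - F_X(a)}, & x > a. \end{cases} \end{align*}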