Notes on Math and Stats

Linear Algebra

Vector Space

Basis

Linear Independence

Rank

Column & Row Space

Linear Transformation

Probability

A, B, and C are distinct events;

X is a random variable that can take different values (Ex. X = 1 if an individual is female and 0 if male)

P(X=1): the probability that an individual randomly drawn from a population is female, which corresponds to the proportion of the population that is female

Marginal probability

0 ≤ P(A) ≤ 1

P(A) + P(Ā) = 1, where Ā denotes all other events that are not A

Joint Probability

P(A, B) = 0 if A and B are mutually exclusive (Ex. B = Ā)

Chain rule: \(P(A, B, C) = P(A) * P(B \mid A) * P(C \mid B,A)\)
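
A quick worked example (hypothetical: drawing three aces in a row from a standard 52-card deck without replacement):

\[P(A_1, A_2, A_3) = P(A_1) * P(A_2 \mid A_1) * P(A_3 \mid A_1, A_2) = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50} \approx 0.00018\]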

Conditional probability

\[P(B|A) = \frac{P(A,B)}{P(A)}\]

Statistical independence

If A and B are statistically independent events, then \(P(B \mid A) = P(B)\) and \(P(A,B) = P(A) * P(B)\)

Bayes’ Theorem

\[P(B|A)=\frac{P(A|B) * P(B)}{P(A)}\]
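
A standard illustration with hypothetical numbers: suppose a test detects a condition with \(P(A \mid B) = 0.9\), has false-positive rate \(P(A \mid \bar{B}) = 0.05\), and the condition has prevalence \(P(B) = 0.01\), where A = positive test and B = has the condition. Then

\[P(B \mid A) = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.05 \times 0.99} \approx 0.15\]

so when the condition is rare, most positives are false positives.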

Probability density for continuous random variables

For a discrete random variable Y, P(Y = y) is the probability that Y takes value y, and \(\sum_{y} P(Y=y)=1\)

For a continuous random variable Y, f(y) is the probability density of y, and \(\int_{y} f(y)\,dy=1\)
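
A quick sanity check in R (a sketch, using the standard normal density and a Binomial(10, 0.3) pmf as examples):

integrate(dnorm, lower = -Inf, upper = Inf)  # density integrates to ~1
sum(dbinom(0:10, size = 10, prob = 0.3))     # pmf sums to exactly 1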

Bayes’ Theorem for continuous random variables X and Y

\[f(y|x) = \frac{f(x,y)}{f(x)} = \frac{f(x|y)f(y)}{f(x)}\]

If x and y are independent, then f(x,y) = f(x)f(y)

Expectation

a and b are constants; X, Y, and Z are random variables

If X is a discrete random variable, \(E(X) = \sum_{x}^{} xP(X=x)\).

When X is a binary indicator (0 and 1), E(X) = P(X = 1)

  • E(a) = a
  • E(a+X) = a + E(X)
  • E(aX) = aE(X)
  • E(X+Y) = E(X) + E(Y)
  • E(aX+bY) = aE(X) + bE(Y)
\[E(\sum_{i=1}^{k} a_{i} X_{i}) = \sum_{i=1}^{k} a_{i} E(X_{i})\]

If X and Y are statistically independent, then E(XY) = E(X)E(Y)
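
These identities are easy to verify by simulation; a minimal R sketch with arbitrary constants and simulated independent X and Y:

set.seed(1)
a <- 2; b <- 3
x <- rnorm(1e6, mean = 1)   # E(X) = 1
y <- rexp(1e6, rate = 2)    # E(Y) = 0.5
mean(a * x + b * y)         # ~ a*E(X) + b*E(Y) = 3.5
c(mean(x * y), mean(x) * mean(y))  # ~ equal: E(XY) = E(X)E(Y) under independence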

Conditional expectation

If Y is a discrete random variable, then \(E(Y \mid X=x) = \sum_{y} y P(Y=y \mid X=x)\)

If Y is a continuous random variable, then \(E(Y \mid X=x) = \int_{y} y f(y \mid x)\,dy\)

Marginal expectation

If Y is a discrete random variable, then \(E(Y) = \sum_{x} E(Y \mid X=x)P(X=x) = \sum_{x}\sum_{y} y P(Y=y \mid X=x)P(X=x)\)

If Y is a continuous random variable, then \(E(Y) = \int\int y f(y \mid x) f(x)\,dy\,dx\)
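
A simulation sketch of this law of total expectation (hypothetical setup: binary X, with Y's mean depending on X):

set.seed(2)
x <- rbinom(1e6, 1, 0.3)                       # P(X=1) = 0.3
y <- rnorm(1e6, mean = ifelse(x == 1, 5, 1))   # E(Y|X=1) = 5, E(Y|X=0) = 1
mean(y)                                        # ~ 5*0.3 + 1*0.7 = 2.2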

Variance and Covariance

  1. \[Var(X) = E\{[X-E(X)]^{2}\} = E(X^{2})-[E(X)]^{2}\]
  2. \[Var(a) = 0\]
  3. \[Var(a+X) = Var(X)\]
  4. \[Var(aX) = a^{2}Var(X)\]
  5. If X and Y are statistically independent, then \(Var(X+Y) = Var(X)+Var(Y)\);

Otherwise, \(Var(X+Y) = Var(X)+Var(Y)+2Cov(X,Y)\);

  1. \[Cov(X,Y) = E\{[X-E(X)][Y-E(Y)]\} = E(XY)-E(X)E(Y)\]
  2. \[Cov(a,X) = 0\]
  3. \[Cov(aX,Y) = aCov(X,Y)\]
  4. \[Cov(aX, bY) = abCov(X,Y)\]
  5. \[Cov(a+bX,Y) = Cov(a,Y)+Cov(bX,Y)=bCov(X,Y)\]
  6. \[Cov(X+Y,Z) = Cov(X,Z)+Cov(Y,Z)\]
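
A few of these identities checked by simulation (a sketch with X and Y made correlated by construction):

set.seed(3)
x <- rnorm(1e6)
y <- 0.5 * x + rnorm(1e6)                        # correlated with x
c(var(x + y), var(x) + var(y) + 2 * cov(x, y))   # ~ equal
c(cov(2 * x, 3 * y), 6 * cov(x, y))              # ~ equal (identity 4)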

Distributions

  • Binomial

  • Poisson

  • Normal

Discrete time Markov chain

Poisson process

Continuous time Markov process

Renewal process

Brownian motion

Conditional probability and independence

Statistical Inference

Standard error; Standard deviation

Standard error measures how accurately a sample represents a population.

Standard Error is used to measure the statistical accuracy of an estimate.
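
For the sample mean, the standard error equals the population SD divided by \(\sqrt{n}\), i.e., the standard deviation of the sampling distribution of the mean. A simulation sketch in R:

set.seed(4)
n <- 50
sample_means <- replicate(1e4, mean(rnorm(n, mean = 0, sd = 2)))
c(sd(sample_means), 2 / sqrt(n))   # both ~ 0.28: the SE of the mean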

Difference

p value

When testing a hypothesis, the p-value is the probability of observing results at least as extreme as ours purely by random chance, if the null hypothesis were true.

  • A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so we say that the result is “statistically significant” and reject the null hypothesis.

(It would be relatively rare for our results to arise purely from random variation in the observations.)

  • A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so we fail to reject the null hypothesis.

  • p-values very close to the cutoff (0.05) are considered marginal (could go either way). Always report the p-value so readers can draw their own conclusions. Link
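
For example, a two-sample t-test in R returns such a p-value (simulated data with a true group difference of 0.5 SD):

set.seed(5)
treatment <- rnorm(100, mean = 0.5)
control <- rnorm(100, mean = 0)
t.test(treatment, control)$p.value   # small when the data are unlikely under the null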

Statistical Tests

Statistical Tests — When to use Which

Interactive Visualization of Statistical Power and Significance Testing

Confidence Interval

95% confidence interval: a range of values constructed so that, across repeated samples, 95% of such intervals would contain the population mean
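
This repeated-sampling interpretation can be checked by simulation (a sketch using t-based intervals from t.test, true mean = 10):

set.seed(6)
covered <- replicate(1e4, {
  ci <- t.test(rnorm(30, mean = 10))$conf.int
  ci[1] <= 10 & 10 <= ci[2]
})
mean(covered)   # ~ 0.95: about 95% of intervals contain the true mean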

Using Confidence Intervals to Compare Means

Effect size

Cohen’s d is an effect size used to indicate the standardised difference between two means.

Cohen’s d can be calculated as the difference between the means divided by the pooled SD: \(\frac{\text{mean difference}}{\text{pooled SD}}\)

Cohen’s d

[Computation of Effect Sizes](https://www.psychometrica.de/effect_size.html)

Ex. The effect size was d = 0.1; n1 is the number of units in the treatment group and n2 the number in the control group:

library(psych)
cohen.d.ci(d = .1, n1 = 100, n2 = 100)   # confidence interval around d = 0.1
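
And a sketch of computing d directly from two simulated samples (equal group sizes assumed, so the pooled SD simplifies):

x1 <- rnorm(100, mean = 0.6)   # treatment group
x2 <- rnorm(100, mean = 0)     # control group
pooled_sd <- sqrt((var(x1) + var(x2)) / 2)   # pooled SD for equal n
(mean(x1) - mean(x2)) / pooled_sd            # Cohen's d, ~ 0.6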

Mediation Analysis

Various package options for conducting mediation analysis

Posterior distributions

Confidence and credible intervals

Likelihood ratio tests

Multinomial distributions

Chi-square tests

Diagnostic plots

Bootstrapping

Comparison of Bayesian and frequentist inference

Conditioning

Law of Large Numbers

Wiki def: the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer to the expected value as more trials are performed.
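
A quick illustration in R: the running mean of fair coin flips drifts toward 0.5:

set.seed(7)
flips <- rbinom(1e5, 1, 0.5)
running_mean <- cumsum(flips) / seq_along(flips)
running_mean[c(10, 1000, 1e5)]   # tends toward the expected value 0.5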

Central Limit Theorem

Def: the probability distribution of the average of independent random variables converges to a normal distribution as the number of observations increases

Brilliant

Medium
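
A simulation sketch: sample means from a skewed exponential population look approximately normal:

set.seed(8)
means <- replicate(1e4, mean(rexp(50, rate = 1)))   # skewed population, mean 1
hist(means)                 # roughly bell-shaped around 1
c(mean(means), sd(means))   # ~ 1 and 1/sqrt(50) ~ 0.14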

Jensen’s Inequality

Note on Brilliant

Chebyshev’s inequality

Note on Brilliant

The exponential family

Generalized linear modeling

Study design/power analyses

Basic statistical tests (z-score, t-test, Chi-square test, ANOVA…)

Nonparametric statistical testing

Expected values and variance of a random variable

Likelihood ratios

Time series and longitudinal regression methods

Stochastic processes

Measurement models (SEMs, factor analysis…)

Statistics

Linear Regression

4 Assumptions

Remove multicollinearity

Beyond OLS

Logistic Regression

Maximum Likelihood Estimation

Generalized Linear Models

UChicago_STAT34700_Winter 2020

Discrete Choice Modeling

AB Test and Experiments

Power

Sample Size

Metrics

Time

Ex. Evaluate and prototype Bayesian alternatives for measuring the statistical significance of A/B tests

Online resources

A/B Testing, A Data Science Perspective

Simulation

Bayesian Statistics

Monte-Carlo Simulation

Brilliant

MCMC

Bayes’s Formula

Hierarchical Bayesian models

Econometrics & Causal Inference

Potential Outcome Framework

Simpson’s paradox: Quora answer

Instrumental Variables

Difference in Differences

Regression Discontinuity

Propensity Score Matching

Mediation

Moderation

Spillover Effects

Readings:

Causal Inference in A Nutshell

Introducing CausalNex

Quasi-experimental tools in the age of A/B tests

Identifying Cause & Effect With Causal Reasoning

Causal Inference using Difference in Differences, Causal Impact, and Synthetic Control

Non-parametric Methods

Kernel density estimators

Linear interpolation

Cubic splines

Histograms

Confidence sets

Orthogonal functions

Random processes

Numerical Methods

High performance computing/parallel programming (MPI and OpenMP)

Matrix decompositions (SVD, QR-LQ, Cholesky, eigenvector-eigenvalue)

Orthogonal polynomials

Numerical derivatives

Numerical integration

Equidistributed sequences for quasi-Monte Carlo simulation

Measure theory

Optimization algorithms

Linear programming

Simplex, Newton and quasi-Newton methods

Conjugate gradient methods

Duality theory

Optimality conditions

Intractability results

Unconstrained and constrained optimization

Psychometrics

Item response theory (IRT)

Factor analysis

Time Series

Multilevel Modeling

Multilevel regression with poststratification

Survival Analysis

  • Contingency table analysis

  • Kaplan-Meier survival analysis

  • Cox proportional-hazards survival analysis

Spatial Data Analysis

Survey Research

Longitudinal Data Analysis

Multiple Testing


