Notes on Math and Stats
Linear Algebra
Vector Space
Basis
Linear Independence
Rank
Column & Row Space
Linear Transformation
Probability
A, B, and C are distinct events;
X is a random variable that can take different values (Ex. X = 1 if an individual is female and 0 if male)
P(X=1): probability that an individual randomly drawn from a population is female, which corresponds to the proportion of the population that are female
Marginal probability
0 ≤ P(A) ≤ 1
P(A) + P(Ā) = 1, where Ā denotes all other events that are not A
Joint Probability
P(A, B) = 0 if A and B are mutually exclusive (e.g., B = Ā)
Chain rule: \(P(A, B, C) = P(A) * P(B \mid A) * P(C \mid B,A)\)
Conditional probability
\[P(B|A) = \frac{P(A,B)}{P(A)}\]
Statistical independence
If A and B are statistically independent events, then \(P(B \mid A) = P(B)\) and \(P(A,B) = P(A) * P(B)\)
Bayes’ Theorem
\[P(B|A)=\frac{P(A|B) * P(B)}{P(A)}\]
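A quick worked example (the numbers are hypothetical, purely for illustration): suppose P(B) = 0.01, P(A | B) = 0.9, and P(A) = 0.05; then
\[P(B|A) = \frac{0.9 \times 0.01}{0.05} = 0.18\]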
Probability density for continuous random variables
For a discrete random variable Y, P(Y = y) is the probability that Y takes value y, and \(\sum_{y}^{} P(Y=y)=1\)
For a continuous random variable Y, f(y) is the probability density of y, and \(\int_{y}^{} f(y)dy=1\)
Bayes’ Theorem for continuous random variables X and Y
\[f(y|x) = \frac{f(x,y)}{f(x)} = \frac{f(x|y)f(y)}{f(x)}\]
If x and y are independent, then f(x,y) = f(x)f(y)
Expectation
a and b are constants; X, Y, and Z are random variables
If X is a discrete random variable, \(E(X) = \sum_{x}^{} xP(X=x)\).
When X is a binary indicator (0 and 1), E(X) = P(X = 1)
- E(a) = a
- E(a+X) = a + E(X)
- E(aX) = aE(X)
- E(X+Y) = E(X) + E(Y)
- E(aX+bY) = aE(X) + bE(Y)
If X and Y are statistically independent, then E(XY) = E(X)E(Y)
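A minimal simulation sketch of these rules (variable names and parameters are made up for illustration), checking linearity of expectation and E(XY) = E(X)E(Y) under independence:

```r
set.seed(1)
n <- 1e6
a <- 2; b <- -3
X <- rnorm(n, mean = 1, sd = 2)   # X ~ N(1, 4), so E(X) = 1
Y <- rpois(n, lambda = 3)         # Y ~ Poisson(3), drawn independently of X, so E(Y) = 3
mean(a * X + b * Y)               # ~ a*E(X) + b*E(Y) = -7
a * mean(X) + b * mean(Y)         # same value, up to simulation error
mean(X * Y)                       # ~ E(X)E(Y) = 3, since X and Y are independent
mean(X) * mean(Y)
```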
Conditional expectation
If Y is a discrete random variable, then \(E(Y \mid X=x) = \sum_{y}^{} yP(Y=y \mid X=x)\)
If Y is a continuous random variable, then \(E(Y \mid X=x) = \int_{y}^{} yf(y \mid x)dy\)
Marginal expectation
If Y is a discrete random variable, then \(E(Y) = \sum_{x}^{} E(Y \mid X=x)P(X=x) = \sum_{x}^{}\sum_{y}^{} yP(Y=y \mid X=x)P(X=x)\)
If Y is a continuous random variable, then \(E(Y) = \int_{}^{} \int_{}^{} yf(y \mid x)f(x)dydx\)
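For intuition, a tiny worked case with a binary X (numbers hypothetical): if P(X=1) = 0.4, E(Y | X=1) = 10, and E(Y | X=0) = 5, then
\[E(Y) = 10 \times 0.4 + 5 \times 0.6 = 7\]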
Variance and Covariance
- \[Var(X) = E\{[X-E(X)]^{2}\} = E(X^{2})-[E(X)]^{2}\]
- \[Var(a) = 0\]
- \[Var(a+X) = Var(X)\]
- \[Var(aX) = a^{2}Var(X)\]
- If X and Y are statistically independent, then \(Var(X+Y) = Var(X)+Var(Y)\);
Otherwise, \(Var(X+Y) = Var(X)+Var(Y)+2Cov(X,Y)\);
- \[Cov(X,Y) = E\{[X-E(X)][Y-E(Y)]\} = E(XY)-E(X)E(Y)\]
- \[Cov(a,X) = 0\]
- \[Cov(aX,Y) = aCov(X,Y)\]
- \[Cov(aX, bY) = abCov(X,Y)\]
- \[Cov(a+bX,Y) = Cov(a,Y)+Cov(bX,Y)=bCov(X,Y)\]
- \[Cov(X+Y,Z) = Cov(X,Z)+Cov(Y,Z)\]
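A minimal simulation sketch (made-up variables) checking the Var(X+Y) decomposition when X and Y are correlated:

```r
set.seed(1)
n <- 1e6
Z <- rnorm(n)
X <- Z + rnorm(n)        # X and Y share the component Z, so Cov(X, Y) > 0
Y <- 2 * Z + rnorm(n)
var(X + Y)
var(X) + var(Y) + 2 * cov(X, Y)   # matches up to simulation error
```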
Distributions
- Binomial
- Poisson
- Normal
Discrete time Markov chain
Poisson process
Continuous time Markov process
Renewal process
Brownian motion
Conditional probability and independence
Statistical Inference
Standard error; Standard deviation
Standard error measures how accurately a sample statistic represents the population; it is used to quantify the statistical accuracy of an estimate (whereas the standard deviation measures the spread of the data themselves).
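A minimal sketch (simulated, hypothetical population) showing that sd(x)/sqrt(n), the usual standard error of the mean, approximates the standard deviation of the sampling distribution of the mean:

```r
set.seed(1)
n <- 50
x <- rnorm(n, mean = 100, sd = 15)   # one observed sample
sd(x) / sqrt(n)                      # estimated standard error of the mean
sample_means <- replicate(10000, mean(rnorm(n, mean = 100, sd = 15)))
sd(sample_means)                     # ~ 15 / sqrt(50), the true standard error
```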
p value
When testing a hypothesis, the p-value is the probability that we would observe results at least as extreme as our result purely by random chance if the null hypothesis were true.
- A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so we say that the result is “statistically significant” and reject the null hypothesis.
(it would be relatively rare for our results to arise purely from random variation in the observations.)
- A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so we fail to reject the null hypothesis.
- p-values very close to the cutoff (0.05) are considered marginal (could go either way). Always report the p-value so readers can draw their own conclusions.
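A minimal sketch with simulated (hypothetical) data, using a two-sample t-test; the reported p-value is the probability of a difference at least this extreme if the null of equal means were true:

```r
set.seed(1)
control   <- rnorm(100, mean = 10,   sd = 2)
treatment <- rnorm(100, mean = 10.6, sd = 2)   # true difference of 0.6
t.test(treatment, control)$p.value
```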
Statistical Tests
Statistical Tests — When to use Which
Interactive Visualization of Statistical Power and Significance Testing
Confidence Interval
95% confidence interval: a range of values computed so that, across repeated samples, 95% of such intervals would contain the population mean
Using Confidence Intervals to Compare Means
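A minimal sketch (hypothetical data) of a 95% confidence interval for a mean, via t.test() and by hand as mean ± t × SE:

```r
set.seed(1)
x <- rnorm(40, mean = 5, sd = 1.5)
t.test(x)$conf.int                   # 95% CI from t.test
mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
```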
Effect size
Cohen’s d is an effect size used to indicate the standardised difference between two means.
Cohen’s d can be calculated as the difference between the means divided by the pooled SD: \(d = \frac{\text{mean difference}}{\text{pooled SD}}\)
[Computation of Effect Sizes](https://www.psychometrica.de/effect_size.html)
Ex. The effect size was d = 0.1; n1 and n2 are the numbers of units in the treatment and control groups:
library(psych)
cohen.d.ci(d = .1, n1 = 100, n2 = 100)  # confidence interval around Cohen's d given the group sizes
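A short sketch (simulated, hypothetical data) computing Cohen's d directly as the mean difference over the pooled SD:

```r
set.seed(1)
treatment <- rnorm(100, mean = 10.5, sd = 2)
control   <- rnorm(100, mean = 10.0, sd = 2)
pooled_sd <- sqrt(((length(treatment) - 1) * var(treatment) +
                   (length(control)   - 1) * var(control)) /
                  (length(treatment) + length(control) - 2))
(mean(treatment) - mean(control)) / pooled_sd   # Cohen's d
```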
Mediation Analysis
Various package options for conducting mediation analysis
Posterior distributions
Confidence and credible intervals
Likelihood ratio tests
Multinomial distributions
Chi-square tests
Diagnostic plots
Bootstrapping
Comparison of Bayesian and frequentist inference
Conditioning
Law of Large Numbers
Wiki def: the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer to the expected value as more trials are performed.
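A minimal illustration in R (fair coin flips, expected value 0.5): the running mean drifts toward 0.5 as trials accumulate:

```r
set.seed(1)
flips <- rbinom(10000, size = 1, prob = 0.5)
running_mean <- cumsum(flips) / seq_along(flips)
running_mean[c(10, 100, 1000, 10000)]   # moves closer and closer to 0.5
```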
Central Limit Theorem
Def: the probability distribution of the (suitably standardized) average of independent, identically distributed random variables converges to a normal distribution as the number of observations increases
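A minimal illustration (exponential parent distribution, chosen only because it is skewed): the means of repeated samples look roughly normal even though the individual observations do not:

```r
set.seed(1)
sample_means <- replicate(10000, mean(rexp(50, rate = 1)))
hist(sample_means, breaks = 50)   # roughly bell-shaped around 1, despite the skewed parent
```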
Jensen’s Inequality
Chebyshev’s inequality
The exponential family
Generalized linear modeling
Study design/power analyses
Basic statistical tests (z-score, t-test, Chi-square test, ANOVA…)
Nonparametric statistical testing
Expected values and variance of a random variable
Likelihood ratios
Time series and longitudinal regression methods
Stochastic processes
Measurement models (SEMs, factor analysis…)
Statistics
Linear Regression
4 Assumptions (linearity, independence of errors, homoscedasticity, normality of errors)
Remove multicollinearity
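A minimal sketch (simulated data, made-up predictors) of detecting multicollinearity with a variance inflation factor, VIF = 1 / (1 - R^2) from regressing one predictor on the others:

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)        # x2 is nearly a copy of x1
y  <- 1 + 2 * x1 + rnorm(n)
r2 <- summary(lm(x1 ~ x2))$r.squared
1 / (1 - r2)                         # VIF >> 10 flags the x1/x2 collinearity
```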
Logistic Regression
Maximum Likelihood Estimation
Generalized Linear Models
UChicago_STAT34700_Winter 2020
Discrete Choice Modeling
AB Test and Experiments
Power
Sample Size
Metrics
Time
Ex. Evaluate and prototype Bayesian alternatives for measuring the statistical significance of A/B tests
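A minimal power/sample-size sketch with base R (the conversion rates are hypothetical): required sample size per group to detect a lift from 10% to 11% at 80% power and alpha = 0.05:

```r
power.prop.test(p1 = 0.10, p2 = 0.11, power = 0.80, sig.level = 0.05)$n   # n per group
```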
Online resources
A/B Testing, A Data Science Perspective
Simulation
Bayesian Statistics
Monte-Carlo Simulation
MCMC
Bayes’s Formula
Hierarchical Bayesian models
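A minimal Bayesian/Monte-Carlo sketch (hypothetical data): with a Beta(1, 1) prior and 30 successes in 100 trials, Bayes' formula gives a Beta(31, 71) posterior; Monte-Carlo draws from it approximate posterior summaries:

```r
set.seed(1)
draws <- rbeta(100000, shape1 = 1 + 30, shape2 = 1 + 70)
mean(draws)                        # posterior mean, ~ 31/102
quantile(draws, c(0.025, 0.975))   # 95% credible interval
```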
Econometrics & Causal Inference
Potential Outcome Framework
Simpson’s paradox: Quora answer
Instrumental Variables
Difference in Differences
Regression Discontinuity
Propensity Score Matching
Mediation
Moderation
Spillover Effects
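A minimal difference-in-differences sketch on simulated (hypothetical) data; the coefficient on treat:post is the DiD estimate of the treatment effect:

```r
set.seed(1)
n     <- 1000
treat <- rbinom(n, 1, 0.5)
post  <- rbinom(n, 1, 0.5)
y     <- 1 + 0.5 * treat + 0.3 * post + 2 * treat * post + rnorm(n)   # true effect = 2
coef(lm(y ~ treat * post))["treat:post"]
```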
Readings:
Causal Inference in A Nutshell
Quasi-experimental tools in the age of A/B tests
Identifying Cause & Effect With Causal Reasoning
Causal Inference using Difference in Differences, Causal Impact, and Synthetic Control
Non-parametric Methods
Kernel density estimators
Linear interpolation
Cubic splines
Histograms
Confidence sets
Orthogonal functions
Random processes
Numerical Methods
High performance computing/parallel programming (MPI and OpenMP)
Matrix decompositions (SVD, QR-LQ, Cholesky, eigendecomposition)
Orthogonal polynomials
Numerical derivatives
Numerical integration
Equidistributed sequences for quasi-Monte Carlo simulation
Measure theory
Optimization algorithms
Linear programming
Simplex, Newton and quasi-Newton methods
Conjugate gradient methods
Duality theory
Optimality conditions
Intractability results
Unconstrained and constrained optimization
Psychometrics
Item response theory (IRT)
Factor analysis
Time Series
Multilevel Modeling
Multilevel regression with poststratification
Survival Analysis
- Contingency table analysis
- Kaplan-Meier survival analysis
- Cox proportional-hazards survival analysis