Markov’s inequality

For any non-negative random variable $X$ and any $t > 0$, we have

$$\Pr[X \ge t] \le \frac{\mathbb{E}[X]}{t}.$$

Proof

Fix any $t > 0$. We can represent any real number $x \ge 0$ as

$$x = x\,\mathbf{1}_{\{x \ge t\}} + x\,\mathbf{1}_{\{x < t\}}.$$

Then we have

$$\mathbb{E}[X] = \mathbb{E}\big[X\,\mathbf{1}_{\{X \ge t\}}\big] + \mathbb{E}\big[X\,\mathbf{1}_{\{X < t\}}\big] \ge \mathbb{E}\big[t\,\mathbf{1}_{\{X \ge t\}}\big] = t\,\Pr[X \ge t],$$

and dividing both sides by $t$ gives the claim. $\blacksquare$
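
To see the bound in action, here is a minimal Python sketch that compares the empirical tail probability of a non-negative random variable with Markov’s bound; the exponential distribution, sample size, and values of $t$ are illustrative choices, not part of the statement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-negative random variable: X ~ Exponential(1), so E[X] = 1.
samples = rng.exponential(scale=1.0, size=1_000_000)
mean = samples.mean()

for t in [2.0, 4.0, 6.0]:
    empirical = (samples >= t).mean()  # Monte Carlo estimate of P[X >= t]
    bound = mean / t                   # Markov's bound: E[X] / t
    print(f"t={t}: P[X >= t] ~ {empirical:.4f} <= {bound:.4f}")
```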


Chebyshev’s inequality

Let $X$ be any random variable with mean $\mu = \mathbb{E}[X]$ and finite variance $\sigma^2 = \mathrm{Var}[X]$. Then, for every $t > 0$ we have

$$\Pr\big[|X - \mu| \ge t\big] \le \frac{\sigma^2}{t^2}.$$

Proof

By using Markov’s inequality on the non-negative random variable $(X - \mu)^2$ we have

$$\Pr\big[|X - \mu| \ge t\big] = \Pr\big[(X - \mu)^2 \ge t^2\big] \le \frac{\mathbb{E}\big[(X - \mu)^2\big]}{t^2} = \frac{\sigma^2}{t^2}. \qquad \blacksquare$$
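
A quick numerical check of the bound; the uniform distribution and the thresholds below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# X ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12.
samples = rng.uniform(0.0, 1.0, size=1_000_000)
mu, var = samples.mean(), samples.var()

for t in [0.3, 0.4, 0.45]:
    empirical = (np.abs(samples - mu) >= t).mean()  # P[|X - mu| >= t]
    bound = var / t**2                              # Chebyshev's bound
    print(f"t={t}: P[|X - mu| >= t] ~ {empirical:.4f} <= {bound:.4f}")
```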


Chernoff’s inequality

Let $X_1, \dots, X_N$ be independent Bernoulli random variables with parameters $p_1, \dots, p_N$. Let $S_N = \sum_{i=1}^N X_i$, with mean $\mu = \mathbb{E}[S_N] = \sum_{i=1}^N p_i$.

Then, for every value $t > \mu$, we have

$$\Pr[S_N \ge t] \le e^{-\mu} \left(\frac{e\mu}{t}\right)^t.$$

Proof

We use the same approach as for Chebyshev’s inequality, passing through Markov’s inequality: fix $s > 0$, exponentiate both sides, and apply Markov’s inequality to the non-negative random variable $e^{s S_N}$. By the independence of the $X_i$,

$$\Pr[S_N \ge t] = \Pr\big[e^{s S_N} \ge e^{st}\big] \le e^{-st}\,\mathbb{E}\big[e^{s S_N}\big] = e^{-st} \prod_{i=1}^N \mathbb{E}\big[e^{s X_i}\big] = e^{-st} \prod_{i=1}^N \big(1 + p_i(e^s - 1)\big) \le e^{-st} \prod_{i=1}^N e^{p_i (e^s - 1)},$$

where the last inequality holds since $1 + x \le e^x$ for every $x \in \mathbb{R}$.

Therefore

$$\Pr[S_N \ge t] \le e^{-st} \exp\Big(\sum_{i=1}^N p_i (e^s - 1)\Big) = \exp\big(-st + \mu(e^s - 1)\big).$$

Since this function of $s$ is convex, we can minimize the bound by finding the value of $s$ at which the first derivative becomes $0$. By monotonicity of the exponential, this holds when the first derivative of the exponent $-st + \mu(e^s - 1)$ is $0$, i.e., when

$$-t + \mu e^s = 0 \iff s = \ln\frac{t}{\mu},$$

which is a valid choice ($s > 0$) since $t > \mu$.

Finally

$$\Pr[S_N \ge t] \le \exp\Big(-t \ln\frac{t}{\mu} + \mu\Big(\frac{t}{\mu} - 1\Big)\Big) = e^{t - \mu} \Big(\frac{\mu}{t}\Big)^t = e^{-\mu} \Big(\frac{e\mu}{t}\Big)^t. \qquad \blacksquare$$

In a similar way, we can prove the lower tail bound:

$$\Pr[S_N \le t] \le e^{-\mu} \left(\frac{e\mu}{t}\right)^t \quad \text{for every } t < \mu.$$
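
The following Python sketch compares the upper tail bound with empirical frequencies; the parameters $p_i$, the number of variables, and the thresholds are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

N, runs = 100, 100_000
p = rng.uniform(0.0, 0.2, size=N)              # Bernoulli parameters p_i
mu = p.sum()

# Each row simulates S_N = X_1 + ... + X_N once.
S = (rng.random((runs, N)) < p).sum(axis=1)

for t in [mu + 5, mu + 10, mu + 15]:
    empirical = (S >= t).mean()
    bound = np.exp(-mu) * (np.e * mu / t) ** t  # e^{-mu} (e mu / t)^t
    print(f"t={t:.1f}: P[S_N >= t] ~ {empirical:.5f} <= {bound:.5f}")
```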

Chernoff’s inequality: small deviations

Let $X_1, \dots, X_N$ be independent Bernoulli random variables with parameters $p_1, \dots, p_N$. Let $S_N = \sum_{i=1}^N X_i$, with mean $\mu = \mathbb{E}[S_N] = \sum_{i=1}^N p_i$.

Then, for every $\delta \in (0, 1]$ we have

$$\Pr\big[|S_N - \mu| \ge \delta\mu\big] \le 2 e^{-c\mu\delta^2},$$

where $c > 0$ is an absolute constant.

Proof

We can re-write the probability as

$$\Pr\big[|S_N - \mu| \ge \delta\mu\big] = \Pr\big[S_N \ge (1+\delta)\mu\big] + \Pr\big[S_N \le (1-\delta)\mu\big].$$

We now bound the two terms separately using the upper and lower tails of Chernoff’s inequality; making each bound small is equivalent to making the corresponding exponent large. For the first term, taking $t = (1+\delta)\mu$,

$$\Pr\big[S_N \ge (1+\delta)\mu\big] \le e^{-\mu} \Big(\frac{e}{1+\delta}\Big)^{(1+\delta)\mu} = \exp\Big(-\mu\big[(1+\delta)\ln(1+\delta) - \delta\big]\Big).$$

And, taking $t = (1-\delta)\mu$ in the lower tail bound,

$$\Pr\big[S_N \le (1-\delta)\mu\big] \le e^{-\mu} \Big(\frac{e}{1-\delta}\Big)^{(1-\delta)\mu} = \exp\Big(-\mu\big[(1-\delta)\ln(1-\delta) + \delta\big]\Big).$$

Since $(1+\delta)\ln(1+\delta) - \delta \ge \frac{\delta^2}{3}$ for every $\delta \in (0, 1]$, then

$$\Pr\big[S_N \ge (1+\delta)\mu\big] \le e^{-\mu\delta^2/3}.$$

Since $(1-\delta)\ln(1-\delta) + \delta \ge \frac{\delta^2}{2}$ for every $\delta \in (0, 1]$, then

$$\Pr\big[S_N \le (1-\delta)\mu\big] \le e^{-\mu\delta^2/2}.$$

Finally

$$\Pr\big[|S_N - \mu| \ge \delta\mu\big] \le e^{-\mu\delta^2/3} + e^{-\mu\delta^2/2} \le 2 e^{-\mu\delta^2/3},$$

so the statement holds with the absolute constant $c = 1/3$. $\blacksquare$
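
A numerical sanity check with the constant $c = 1/3$ obtained in the proof; the common parameter $p$, the number of variables, and the values of $\delta$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

N, runs, p = 500, 200_000, 0.3
mu = N * p
S = rng.binomial(N, p, size=runs)  # S_N with equal parameters p_i = p

c = 1 / 3                          # the absolute constant from the proof
for delta in [0.1, 0.2, 0.3]:
    empirical = (np.abs(S - mu) >= delta * mu).mean()
    bound = 2 * np.exp(-c * mu * delta**2)
    print(f"delta={delta}: P[|S_N - mu| >= delta*mu] ~ {empirical:.4f} <= {bound:.4f}")
```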

Chernoff’s inequality: Additive form

Let $X_1, \dots, X_N$ be independent Bernoulli random variables with parameters $p_1, \dots, p_N$. Let $S_N = \sum_{i=1}^N X_i$, with mean $\mu = \mathbb{E}[S_N] = \sum_{i=1}^N p_i$.

Then, for every $t > 0$ we have

$$\Pr[S_N \ge \mu + t] \le e^{-2t^2/N} \quad \text{and} \quad \Pr[S_N \le \mu - t] \le e^{-2t^2/N}.$$

In the special case where $p_1 = \dots = p_N = p$, i.e. $S_N \sim \mathrm{Binomial}(N, p)$ and $\mu = Np$, taking $t = N\epsilon$ we have

$$\Pr\left[\left|\frac{S_N}{N} - p\right| \ge \epsilon\right] \le 2 e^{-2N\epsilon^2}.$$
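
The sketch below checks the additive upper tail against a simulated binomial; $N$, $p$, and the offsets $t$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

N, runs, p = 200, 200_000, 0.5
mu = N * p
S = rng.binomial(N, p, size=runs)  # S_N with p_i = p for every i

for t in [5, 10, 20]:
    empirical = (S >= mu + t).mean()
    bound = np.exp(-2 * t**2 / N)  # additive form: e^{-2 t^2 / N}
    print(f"t={t}: P[S_N >= mu + t] ~ {empirical:.4f} <= {bound:.4f}")
```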


Hoeffding’s inequality

Let $X_1, \dots, X_N$ be independent symmetric Bernoulli random variables, i.e., random variables that assume the values $-1$ and $+1$, each with probability $1/2$. Let $S_N = \sum_{i=1}^N X_i$. Then, for every $t \ge 0$, we have

$$\Pr[S_N \ge t] \le e^{-t^2/2N}.$$

Proof

Fix $s > 0$. We start by multiplying both sides of the inequality $S_N \ge t$ by $s$ and exponentiating them (as in the proof of Chernoff’s inequality), and then we apply Markov’s inequality:

$$\Pr[S_N \ge t] = \Pr\big[e^{s S_N} \ge e^{st}\big] \le e^{-st}\,\mathbb{E}\big[e^{s S_N}\big] = e^{-st} \prod_{i=1}^N \mathbb{E}\big[e^{s X_i}\big],$$

where the last equality holds by the independence of the random variables.

For every $i$, since $X_i$ takes the values $-1$ and $+1$ each with probability $1/2$, we have that

$$\mathbb{E}\big[e^{s X_i}\big] = \frac{e^{s} + e^{-s}}{2} = \cosh(s)$$

(see hyperbolic functions).

It can be proved (see this) that $\cosh(x) \le e^{x^2/2}$ for every $x \in \mathbb{R}$. Therefore,

$$\mathbb{E}\big[e^{s X_i}\big] \le e^{s^2/2}.$$

By combining the previous inequalities we have

$$\Pr[S_N \ge t] \le e^{-st} \prod_{i=1}^N e^{s^2/2} = \exp\Big(-st + \frac{N s^2}{2}\Big)$$

for every $s > 0$.

Since the function $s \mapsto \exp\big(-st + \frac{Ns^2}{2}\big)$ is convex, we can minimize it simply by finding the value of $s$ such that the derivative of the exponent is $0$ (thus providing a better upper bound). The derivative is $-t + Ns$, and it’s equal to $0$ when $s = t/N$. Therefore,

$$\Pr[S_N \ge t] \le \exp\Big(-\frac{t^2}{N} + \frac{t^2}{2N}\Big) = e^{-t^2/2N}. \qquad \blacksquare$$
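
A simulation of sums of symmetric Bernoulli variables against the bound $e^{-t^2/2N}$; $N$, the number of runs, and the thresholds are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

N, runs = 100, 200_000
# Symmetric Bernoulli variables: -1 or +1, each with probability 1/2.
X = rng.integers(0, 2, size=(runs, N), dtype=np.int8) * 2 - 1
S = X.sum(axis=1, dtype=np.int64)

for t in [10, 20, 30]:
    empirical = (S >= t).mean()
    bound = np.exp(-t**2 / (2 * N))  # Hoeffding: e^{-t^2 / 2N}
    print(f"t={t}: P[S_N >= t] ~ {empirical:.5f} <= {bound:.5f}")
```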

Hoeffding’s inequality, two-sided

Let $X_1, \dots, X_N$ be independent symmetric Bernoulli random variables, i.e., random variables that assume the values $-1$ and $+1$, each with probability $1/2$. Let $S_N = \sum_{i=1}^N X_i$. Then, for every $t \ge 0$, we have

$$\Pr\big[|S_N| \ge t\big] \le 2 e^{-t^2/2N}.$$

Proof

Let $Y_i = -X_i$. We can simply apply Hoeffding’s inequality to the variables $Y_1, \dots, Y_N$ instead of $X_1, \dots, X_N$ (the $Y_i$ are symmetric Bernoulli as well), and obtain the same bound for $\Pr[-S_N \ge t] = \Pr[S_N \le -t]$. Then

$$\Pr\big[|S_N| \ge t\big] = \Pr[S_N \ge t] + \Pr[S_N \le -t] \le 2 e^{-t^2/2N}. \qquad \blacksquare$$
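
Both tails can be checked at once with the same simulation as above; the parameters are again illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

N, runs = 100, 200_000
S = (rng.integers(0, 2, size=(runs, N), dtype=np.int8) * 2 - 1).sum(axis=1, dtype=np.int64)

for t in [10, 20, 30]:
    two_sided = (np.abs(S) >= t).mean()  # P[|S_N| >= t]
    bound = 2 * np.exp(-t**2 / (2 * N))  # two-sided Hoeffding bound
    print(f"t={t}: P[|S_N| >= t] ~ {two_sided:.5f} <= {bound:.5f}")
```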