Markov’s inequality

For any non-negative random variable $X$ and any $t > 0$, we have

$$\Pr[X \ge t] \le \frac{\mathbb{E}[X]}{t}.$$

Proof

Fix any $t > 0$. We can represent any real number $x \ge 0$ as

$$x = x\,\mathbf{1}_{\{x \ge t\}} + x\,\mathbf{1}_{\{x < t\}}.$$

Then we have

$$\mathbb{E}[X] = \mathbb{E}\big[X\,\mathbf{1}_{\{X \ge t\}}\big] + \mathbb{E}\big[X\,\mathbf{1}_{\{X < t\}}\big] \ge \mathbb{E}\big[t\,\mathbf{1}_{\{X \ge t\}}\big] = t\,\Pr[X \ge t],$$

and dividing both sides by $t$ gives the claim. $\blacksquare$
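
To see the bound in action, here is a minimal Python sketch that compares the empirical tail probability of a non-negative random variable with Markov’s bound; the exponential distribution, sample size, and values of $t$ are illustrative choices, not part of the statement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-negative random variable: X ~ Exponential(1), so E[X] = 1.
samples = rng.exponential(scale=1.0, size=1_000_000)
mean = samples.mean()

for t in [2.0, 4.0, 6.0]:
    empirical = (samples >= t).mean()  # Monte Carlo estimate of P[X >= t]
    bound = mean / t                   # Markov's bound: E[X] / t
    print(f"t={t}: P[X >= t] ~ {empirical:.4f} <= {bound:.4f}")
```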


Chebyshev’s inequality

Let $X$ be any random variable with mean $\mu = \mathbb{E}[X]$ and finite variance $\sigma^2 = \mathrm{Var}[X]$. Then, for every $t > 0$ we have

$$\Pr\big[|X - \mu| \ge t\big] \le \frac{\sigma^2}{t^2}.$$

Proof

By using Markov’s inequality on the non-negative random variable $(X - \mu)^2$ we have

$$\Pr\big[|X - \mu| \ge t\big] = \Pr\big[(X - \mu)^2 \ge t^2\big] \le \frac{\mathbb{E}\big[(X - \mu)^2\big]}{t^2} = \frac{\sigma^2}{t^2}. \qquad \blacksquare$$
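
A quick numerical check of the bound; the uniform distribution and the thresholds below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# X ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12.
samples = rng.uniform(0.0, 1.0, size=1_000_000)
mu, var = samples.mean(), samples.var()

for t in [0.3, 0.4, 0.45]:
    empirical = (np.abs(samples - mu) >= t).mean()  # P[|X - mu| >= t]
    bound = var / t**2                              # Chebyshev's bound
    print(f"t={t}: P[|X - mu| >= t] ~ {empirical:.4f} <= {bound:.4f}")
```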


Chernoff’s inequality

Let $X_1, \dots, X_N$ be independent Bernoulli random variables with parameters $p_1, \dots, p_N$. Let $S_N = \sum_{i=1}^N X_i$, with mean $\mu = \mathbb{E}[S_N] = \sum_{i=1}^N p_i$.

Then, for every value $t > \mu$, we have

$$\Pr[S_N \ge t] \le e^{-\mu} \left(\frac{e\mu}{t}\right)^t.$$

Proof

We use the same approach as for Chebyshev’s inequality, passing through Markov’s inequality: fix $s > 0$, exponentiate both sides, and apply Markov’s inequality to the non-negative random variable $e^{s S_N}$. By the independence of the $X_i$,

$$\Pr[S_N \ge t] = \Pr\big[e^{s S_N} \ge e^{st}\big] \le e^{-st}\,\mathbb{E}\big[e^{s S_N}\big] = e^{-st} \prod_{i=1}^N \mathbb{E}\big[e^{s X_i}\big] = e^{-st} \prod_{i=1}^N \big(1 + p_i(e^s - 1)\big) \le e^{-st} \prod_{i=1}^N e^{p_i (e^s - 1)},$$

where the last inequality holds since $1 + x \le e^x$ for every $x \in \mathbb{R}$.

Therefore

$$\Pr[S_N \ge t] \le e^{-st} \exp\Big(\sum_{i=1}^N p_i (e^s - 1)\Big) = \exp\big(-st + \mu(e^s - 1)\big).$$

Since this function of $s$ is convex, we can minimize the bound by finding the value of $s$ at which the first derivative becomes $0$. By monotonicity of the exponential, this holds when the first derivative of the exponent $-st + \mu(e^s - 1)$ is $0$, i.e., when

$$-t + \mu e^s = 0 \iff s = \ln\frac{t}{\mu},$$

which is a valid choice ($s > 0$) since $t > \mu$.

Finally

$$\Pr[S_N \ge t] \le \exp\Big(-t \ln\frac{t}{\mu} + \mu\Big(\frac{t}{\mu} - 1\Big)\Big) = e^{t - \mu} \Big(\frac{\mu}{t}\Big)^t = e^{-\mu} \Big(\frac{e\mu}{t}\Big)^t. \qquad \blacksquare$$

In a similar way, we can prove the lower tail bound:

$$\Pr[S_N \le t] \le e^{-\mu} \left(\frac{e\mu}{t}\right)^t \quad \text{for every } t < \mu.$$
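
The following Python sketch compares the upper tail bound with empirical frequencies; the parameters $p_i$, the number of variables, and the thresholds are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

N, runs = 100, 100_000
p = rng.uniform(0.0, 0.2, size=N)              # Bernoulli parameters p_i
mu = p.sum()

# Each row simulates S_N = X_1 + ... + X_N once.
S = (rng.random((runs, N)) < p).sum(axis=1)

for t in [mu + 5, mu + 10, mu + 15]:
    empirical = (S >= t).mean()
    bound = np.exp(-mu) * (np.e * mu / t) ** t  # e^{-mu} (e mu / t)^t
    print(f"t={t:.1f}: P[S_N >= t] ~ {empirical:.5f} <= {bound:.5f}")
```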

Chernoff’s inequality: small deviations

Let $X_1, \dots, X_N$ be independent Bernoulli random variables with parameters $p_1, \dots, p_N$. Let $S_N = \sum_{i=1}^N X_i$, with mean $\mu = \mathbb{E}[S_N] = \sum_{i=1}^N p_i$.

Then, for every $\delta \in (0, 1]$ we have

$$\Pr\big[|S_N - \mu| \ge \delta\mu\big] \le 2 e^{-c\mu\delta^2},$$

where $c > 0$ is an absolute constant.

Proof

We can re-write the probability as

$$\Pr\big[|S_N - \mu| \ge \delta\mu\big] = \Pr\big[S_N \ge (1+\delta)\mu\big] + \Pr\big[S_N \le (1-\delta)\mu\big].$$

We now bound the two terms separately using the upper and lower tails of Chernoff’s inequality; making each bound small is equivalent to making the corresponding exponent large. For the first term, taking $t = (1+\delta)\mu$,

$$\Pr\big[S_N \ge (1+\delta)\mu\big] \le e^{-\mu} \Big(\frac{e}{1+\delta}\Big)^{(1+\delta)\mu} = \exp\Big(-\mu\big[(1+\delta)\ln(1+\delta) - \delta\big]\Big).$$

And, taking $t = (1-\delta)\mu$ in the lower tail bound,

$$\Pr\big[S_N \le (1-\delta)\mu\big] \le e^{-\mu} \Big(\frac{e}{1-\delta}\Big)^{(1-\delta)\mu} = \exp\Big(-\mu\big[(1-\delta)\ln(1-\delta) + \delta\big]\Big).$$

Since $(1+\delta)\ln(1+\delta) - \delta \ge \frac{\delta^2}{3}$ for every $\delta \in (0, 1]$, then

$$\Pr\big[S_N \ge (1+\delta)\mu\big] \le e^{-\mu\delta^2/3}.$$

Since $(1-\delta)\ln(1-\delta) + \delta \ge \frac{\delta^2}{2}$ for every $\delta \in (0, 1]$, then

$$\Pr\big[S_N \le (1-\delta)\mu\big] \le e^{-\mu\delta^2/2}.$$

Finally

$$\Pr\big[|S_N - \mu| \ge \delta\mu\big] \le e^{-\mu\delta^2/3} + e^{-\mu\delta^2/2} \le 2 e^{-\mu\delta^2/3},$$

so the statement holds with the absolute constant $c = 1/3$. $\blacksquare$
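
A numerical sanity check with the constant $c = 1/3$ obtained in the proof; the common parameter $p$, the number of variables, and the values of $\delta$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

N, runs, p = 500, 200_000, 0.3
mu = N * p
S = rng.binomial(N, p, size=runs)  # S_N with equal parameters p_i = p

c = 1 / 3                          # the absolute constant from the proof
for delta in [0.1, 0.2, 0.3]:
    empirical = (np.abs(S - mu) >= delta * mu).mean()
    bound = 2 * np.exp(-c * mu * delta**2)
    print(f"delta={delta}: P[|S_N - mu| >= delta*mu] ~ {empirical:.4f} <= {bound:.4f}")
```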

Chernoff’s inequality: Additive form

Let $X_1, \dots, X_N$ be independent Bernoulli random variables with parameters $p_1, \dots, p_N$. Let $S_N = \sum_{i=1}^N X_i$, with mean $\mu = \mathbb{E}[S_N] = \sum_{i=1}^N p_i$.

Then, for every $t > 0$ we have

$$\Pr[S_N \ge \mu + t] \le e^{-2t^2/N} \quad \text{and} \quad \Pr[S_N \le \mu - t] \le e^{-2t^2/N}.$$

In the special case where $p_1 = \dots = p_N = p$, i.e. $S_N \sim \mathrm{Binomial}(N, p)$ and $\mu = Np$, taking $t = N\epsilon$ we have

$$\Pr\left[\left|\frac{S_N}{N} - p\right| \ge \epsilon\right] \le 2 e^{-2N\epsilon^2}.$$
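
The sketch below checks the additive upper tail against a simulated binomial; $N$, $p$, and the offsets $t$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

N, runs, p = 200, 200_000, 0.5
mu = N * p
S = rng.binomial(N, p, size=runs)  # S_N with p_i = p for every i

for t in [5, 10, 20]:
    empirical = (S >= mu + t).mean()
    bound = np.exp(-2 * t**2 / N)  # additive form: e^{-2 t^2 / N}
    print(f"t={t}: P[S_N >= mu + t] ~ {empirical:.4f} <= {bound:.4f}")
```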


Hoeffding’s inequality

Let $X_1, \dots, X_N$ be independent symmetric Bernoulli random variables, i.e., random variables that assume the values $-1$ and $+1$, each with probability $1/2$. Let $S_N = \sum_{i=1}^N X_i$. Then, for every $t \ge 0$, we have

$$\Pr[S_N \ge t] \le e^{-t^2/2N}.$$

Proof

Fix $s > 0$. We start by multiplying both sides of the inequality $S_N \ge t$ by $s$ and exponentiating them (as in the proof of Chernoff’s inequality), and then we apply Markov’s inequality:

$$\Pr[S_N \ge t] = \Pr\big[e^{s S_N} \ge e^{st}\big] \le e^{-st}\,\mathbb{E}\big[e^{s S_N}\big] = e^{-st} \prod_{i=1}^N \mathbb{E}\big[e^{s X_i}\big],$$

where the last equality holds by the independence of the random variables.

For every $i$, since $X_i$ takes the values $-1$ and $+1$ each with probability $1/2$, we have that

$$\mathbb{E}\big[e^{s X_i}\big] = \frac{e^{s} + e^{-s}}{2} = \cosh(s)$$

(see hyperbolic functions).

It can be proved (see this) that $\cosh(x) \le e^{x^2/2}$ for every $x \in \mathbb{R}$. Therefore,

$$\mathbb{E}\big[e^{s X_i}\big] \le e^{s^2/2}.$$

By combining the previous inequalities we have

$$\Pr[S_N \ge t] \le e^{-st} \prod_{i=1}^N e^{s^2/2} = \exp\Big(-st + \frac{N s^2}{2}\Big)$$

for every $s > 0$.

Since the function $s \mapsto \exp\big(-st + \frac{Ns^2}{2}\big)$ is convex, we can minimize it simply by finding the value of $s$ such that the derivative of the exponent is $0$ (thus providing a better upper bound). The derivative is $-t + Ns$, and it’s equal to $0$ when $s = t/N$. Therefore,

$$\Pr[S_N \ge t] \le \exp\Big(-\frac{t^2}{N} + \frac{t^2}{2N}\Big) = e^{-t^2/2N}. \qquad \blacksquare$$
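
A simulation of sums of symmetric Bernoulli variables against the bound $e^{-t^2/2N}$; $N$, the number of runs, and the thresholds are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

N, runs = 100, 200_000
# Symmetric Bernoulli variables: -1 or +1, each with probability 1/2.
X = rng.integers(0, 2, size=(runs, N), dtype=np.int8) * 2 - 1
S = X.sum(axis=1, dtype=np.int64)

for t in [10, 20, 30]:
    empirical = (S >= t).mean()
    bound = np.exp(-t**2 / (2 * N))  # Hoeffding: e^{-t^2 / 2N}
    print(f"t={t}: P[S_N >= t] ~ {empirical:.5f} <= {bound:.5f}")
```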

Hoeffding’s inequality, two-sided

Let $X_1, \dots, X_N$ be independent symmetric Bernoulli random variables, i.e., random variables that assume the values $-1$ and $+1$, each with probability $1/2$. Let $S_N = \sum_{i=1}^N X_i$. Then, for every $t \ge 0$, we have

$$\Pr\big[|S_N| \ge t\big] \le 2 e^{-t^2/2N}.$$

Proof

Let $Y_i = -X_i$. We can simply apply Hoeffding’s inequality to the variables $Y_1, \dots, Y_N$ instead of $X_1, \dots, X_N$ (the $Y_i$ are symmetric Bernoulli as well), and obtain the same bound for $\Pr[-S_N \ge t] = \Pr[S_N \le -t]$. Then

$$\Pr\big[|S_N| \ge t\big] = \Pr[S_N \ge t] + \Pr[S_N \le -t] \le 2 e^{-t^2/2N}. \qquad \blacksquare$$
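
Both tails can be checked at once with the same simulation as above; the parameters are again illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

N, runs = 100, 200_000
S = (rng.integers(0, 2, size=(runs, N), dtype=np.int8) * 2 - 1).sum(axis=1, dtype=np.int64)

for t in [10, 20, 30]:
    two_sided = (np.abs(S) >= t).mean()  # P[|S_N| >= t]
    bound = 2 * np.exp(-t**2 / (2 * N))  # two-sided Hoeffding bound
    print(f"t={t}: P[|S_N| >= t] ~ {two_sided:.5f} <= {bound:.5f}")
```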