# Why can we assume equal a priori probabilities?

Given a system of many particles, guessing how the particles will be distributed is again a question of how to be optimally ignorant. From a statistical reading of the second law of thermodynamics, the most probable macrostate is the one with the largest number of equal-energy microstates, that is, the state with the highest entropy. But is there any way to “derive” the second law of thermodynamics? Let’s see…

Let the probability of a state $r$ at time $t$ be $p_r(t)$, such that $\sum_r p_r(t) = 1$ for all $t$. Let $W_{r\to s}$ be the transition probability per unit time from state $r$ to state $s$. Interestingly, the reverse transition must have the same rate, $W_{r\to s} = W_{s\to r}$: in quantum mechanics the rate is, to lowest order, proportional to the squared magnitude of a matrix element of the Hamiltonian between the two states, and because the Hamiltonian is Hermitian that magnitude is the same in both directions. The change in the probability over time is then the probability gained through transitions into $r$ minus the probability lost through transitions out of $r$, i.e. the relationship

$p_r(t+\Delta t) = p_r(t) + \left(\sum_s p_s(t) W_{s \to r} - \sum_s p_r(t) W_{r \to s}\right)\Delta t$

or, subtracting $p_r(t)$ from both sides, dividing by $\Delta t$, and taking the limit $\Delta t\to 0$,

$\frac{\partial p_r}{\partial t} = \sum_s W_{s \to r}( p_s-p_r )$

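using the symmetry $W_{r\to s} = W_{s\to r}$ developed above. This equation is referred to as the master equation for the probability of being in state $r$. Now we define a quantity, $\mathcal{H}=\langle \ln p \rangle = \sum_r p_r \ln p_r$, the mean of the natural logarithm of the probability, or the information. It is the negative of the entropy up to a constant factor, $S = -k_B \mathcal{H}$, so a decrease in $\mathcal{H}$ corresponds to an increase in entropy. Computing the time derivative of $\mathcal{H}$, we have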

$\begin{aligned} \frac{\partial \mathcal{H}}{\partial t} &= \frac{\partial}{\partial t} \sum_r p_r \ln p_r \\ &= \sum_r \left(\frac{\partial p_r}{\partial t} \ln p_r + \frac{\partial p_r}{\partial t}\right) \\ &= \sum_r \frac{\partial p_r}{\partial t} (1+\ln p_r) \end{aligned}$

Now substituting in the master equation, and writing $W_{sr}$ as shorthand for $W_{s\to r}$, we find the rate of change of $\mathcal{H}$ to be

$\frac{\partial \mathcal{H}}{\partial t} = \sum_r \sum_s W_{sr} \,(p_s-p_r)(\ln p_r +1).$

Since $r$ and $s$ are dummy summation indices, we can exchange their labels and obtain a second, equal expression for the same derivative. Adding the two expressions gives

$\frac{\partial \mathcal{H}}{\partial t} + \frac{\partial \mathcal{H}}{\partial t} = \sum_r \sum_s W_{sr} \,(p_s-p_r)(\ln p_r +1) + \sum_r \sum_s W_{rs} \,(p_r-p_s)(\ln p_s +1)$

where the value of the double sum does not depend on how the indices are labeled. Using $W_{rs} = W_{sr}$, we can factor out the common rate:

$2\frac{\partial \mathcal{H}}{\partial t} = \sum_r \sum_s W_{sr} \, \left(p_s\ln p_r - p_r \ln p_r + p_s -p_r + p_r \ln p_s - p_s \ln p_s + p_r -p_s\right)$

where the bare probability terms $p_s - p_r$ and $p_r - p_s$ cancel, and we are left with

$2\frac{\partial \mathcal{H}}{\partial t} = \sum_r \sum_s W_{sr} \, \left(p_s\ln p_r - p_r \ln p_r + p_r \ln p_s - p_s \ln p_s\right)$

so, regrouping the term in parentheses, we have

$p_s(\ln p_r - \ln p_s) - p_r (-\ln p_s + \ln p_r) = (p_s-p_r)(\ln p_r - \ln p_s)$

or finally, the whole expression for the rate of change of $\mathcal{H}$ is

$\frac{\partial \mathcal{H}}{\partial t} = -\frac{1}{2}\sum_r \sum_s W_{sr} \,(p_r-p_s)(\ln p_r - \ln p_s).$
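
As a quick check of this expression, consider a hypothetical two-state system (an illustration chosen here, not part of the derivation above) with $p_1 = p$, $p_2 = 1-p$, and a single rate $W_{12} = W_{21} = w$. The diagonal terms of the double sum vanish and the two off-diagonal terms are equal, so

$\frac{\partial \mathcal{H}}{\partial t} = -w\,(2p-1)\,\ln\frac{p}{1-p}.$

Differentiating directly instead, the master equation gives $\partial p/\partial t = w(1-2p)$, and $\mathcal{H} = p\ln p + (1-p)\ln(1-p)$ gives $\partial \mathcal{H}/\partial p = \ln\frac{p}{1-p}$, so the chain rule reproduces the same result:

$\frac{\partial \mathcal{H}}{\partial t} = \frac{\partial \mathcal{H}}{\partial p}\,\frac{\partial p}{\partial t} = w\,(1-2p)\,\ln\frac{p}{1-p}.$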

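Because the logarithm is monotonic, if $p_r>p_s$ then $\ln p_r > \ln p_s$, so the product $(p_r-p_s)(\ln p_r - \ln p_s)$ is never negative, and the rates $W_{sr}$ are non-negative as well. Hence $\partial \mathcal{H}/\partial t \le 0$: whenever two connected states have unequal probabilities, $\mathcal{H}$ decreases, which means the entropy $S = -k_B \mathcal{H}$ increases. The derivative vanishes, and the entropy reaches its maximum, only when $p_r=p_s$ for every pair of states with $W_{sr} \neq 0$. This result is referred to as Boltzmann's H theorem, where $\mathcal{H}$ is, up to sign and a constant factor, the entropy.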
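
The behavior described above can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes a small system of five states, a randomly generated symmetric rate matrix, and a simple forward-Euler integration of the master equation. The function names (`symmetric_rates`, `h_function`, `euler_step`, `evolve`), the rate range, and the step size are all choices made for this illustration.

```python
import math
import random


def symmetric_rates(n, seed=0):
    """Hypothetical rate matrix: W[r][s] == W[s][r] >= 0, zero diagonal."""
    rng = random.Random(seed)
    W = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for s in range(r + 1, n):
            W[r][s] = W[s][r] = rng.uniform(0.1, 1.0)
    return W


def h_function(p):
    """H = sum_r p_r ln p_r, with 0 ln 0 taken as 0."""
    return sum(x * math.log(x) for x in p if x > 0.0)


def euler_step(p, W, dt):
    """One forward-Euler step of dp_r/dt = sum_s W[s][r] (p_s - p_r)."""
    n = len(p)
    return [
        p[r] + dt * sum(W[s][r] * (p[s] - p[r]) for s in range(n))
        for r in range(n)
    ]


def evolve(p0, W, dt=0.01, steps=2000):
    """Integrate the master equation and record H after every step."""
    p = list(p0)
    history = [h_function(p)]
    for _ in range(steps):
        p = euler_step(p, W, dt)
        history.append(h_function(p))
    return p, history


n_states = 5
W = symmetric_rates(n_states)
p0 = [0.9, 0.1, 0.0, 0.0, 0.0]  # assumed starting point, far from uniform
p_final, H_history = evolve(p0, W)

print("final p:", [round(x, 4) for x in p_final])
print("H start:", round(H_history[0], 4), " H end:", round(H_history[-1], 4))
print("-ln(N): ", round(-math.log(n_states), 4))
```

With these assumed parameters, the probabilities should relax toward $1/N$ for each state while $\mathcal{H}$ falls toward its minimum value $-\ln N$, never rising along the way. The step size is kept small relative to the total rate out of any state so that the Euler update cannot produce negative probabilities.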

So have we actually derived the idea that the maximum entropy state is the one in which the substates are equally probable? Well, no. The time-reversal symmetry is an approximation in that it assumes the probabilities are always uncorrelated, but as particles interact, they become hopelessly correlated. That is, a microstate has an underlying structure that goes beyond the white-noise approximation. The idea that the hopelessly correlated state comes to look like a randomized, uncorrelated state is the assumption of molecular chaos. Strictly speaking, then, our assumption of time-reversal symmetry is inexact, and this is only an approximate theorem that appears to work most of the time in practice.