Via Slashdot, this year marks the 60th anniversary of the venerable Hamming code. Arriving just two years after Shannon's seminal 1948 paper, the Hamming codes were the first nontrivial error-correcting codes, and Hamming's contributions to the theory of error-correcting codes (including the metrics now known as Hamming distance and Hamming weight) are still taught today.

Here (pdf) is Hamming's original paper (from BSTJ), and here is my Hamming code lecture from last spring's information theory course.

## Thursday, November 25, 2010

## Friday, November 12, 2010

### Ph.D. position available (Update: No longer available)

*Update August 16, 2011:* This particular position is no longer available.

## Wednesday, November 10, 2010

### Capacity in the presence of intractability

Shannon's channel capacity, C = max_p(x) I(X;Y), gives the highest rate at which information can be reliably transmitted over a noisy channel. But the simplicity of the equation is deceptive: there are very few channels for which capacity is known. Normally, the difficulty is in optimizing over all input distributions p(x), but what if I(X;Y) is itself intractable?
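For concreteness (a toy example of my own, not part of the original post), one channel where everything is tractable is the binary symmetric channel: a brute-force sweep over input distributions p(x) recovers the textbook answer C = 1 - H(p).

```python
import numpy as np

def mutual_information(px0, p):
    """I(X;Y) in bits for a BSC with crossover probability p and P(X=0) = px0."""
    px = np.array([px0, 1 - px0])
    # channel transition matrix f(y|x)
    W = np.array([[1 - p, p],
                  [p, 1 - p]])
    pxy = px[:, None] * W          # joint f(x,y)
    py = pxy.sum(axis=0)           # output marginal f(y)
    terms = pxy * np.log2(pxy / (px[:, None] * py[None, :]))
    return terms.sum()

# sweep over input distributions; capacity is the maximum
p = 0.1
grid = np.linspace(1e-6, 1 - 1e-6, 1001)
C = max(mutual_information(q, p) for q in grid)
h = -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # binary entropy H(p)
print(C, 1 - h)   # both ~ 0.531: C = 1 - H(p), achieved by the uniform input
```

The maximum lands at the uniform input, as symmetry predicts; for most channels no such closed form exists and the sweep over p(x) is the easy part.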


I(X;Y) can be hard to calculate if f(y) or f(y|x) is itself hard to compute. The former can happen in intersymbol interference channels with long memories, especially under a binary (or discrete) input constraint. The latter can happen in timing channels, when x is a vector of inputs, and the output is of the form y = sort(x+n), where n is the vector of timing noise. In this case, calculating f(y|x) requires the calculation of a matrix permanent, a classic intractable problem for vectors of practical length. (See also: the Bapat-Beg theorem.)
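To see where the permanent comes from (a sketch of my own, with a hypothetical exponential noise density, not anything from the post): for y = sort(x + n) with i.i.d. noise, f(y|x) is a sum over all ways of matching sorted outputs back to inputs, i.e. the permanent of a likelihood matrix. Brute force works for tiny n but the cost grows as n!.

```python
import math
from itertools import permutations

def permanent(M):
    """Brute-force matrix permanent: a sum over all n! permutations."""
    n = len(M)
    total = 0.0
    for perm in permutations(range(n)):
        prod = 1.0
        for i, j in enumerate(perm):
            prod *= M[i][j]
        total += prod
    return total

def sorted_channel_density(y, x, noise_pdf):
    """f(y|x) for y = sort(x + n) with i.i.d. noise density noise_pdf.
    Entry (i, j) is the likelihood that sorted output y[j] came from input x[i]."""
    M = [[noise_pdf(yj - xi) for yj in y] for xi in x]
    return permanent(M)

# toy choice: exponentially distributed timing noise with rate 1
pdf = lambda d: math.exp(-d) if d >= 0 else 0.0
x = [0.0, 1.0, 2.5]
y = sorted(xi + 0.3 * (i + 1) for i, xi in enumerate(x))  # one possible received vector
print(sorted_channel_density(y, x, pdf))  # fine for n = 3; hopeless for n ~ 30
```

Even Ryser's formula only improves this to O(2^n n), which is why f(y|x) is effectively out of reach at practical block lengths.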

However, there is an elegant trick you can use to find a lower bound on I(X;Y), as long as you can (tractably) generate random variables x and y according to their true distributions f(x) and f(y|x). Let g(y|x) be any valid conditional distribution of y given x. Further, let

g(y) = sum_x f(x) g(y|x).

Then

I(X;Y) >= E[ log g(y|x)/g(y) ],

where the expectation is always taken with respect to the *true* distribution f(x)f(y|x). That is, if you can pick a tractable approximate distribution g(y|x), and you can simulate both x and y, then you can perform a *Monte Carlo* expectation on log g(y|x)/g(y), and get a lower bound on I(X;Y). You can then maximize the bound over input distributions to obtain a lower bound on capacity.

The proof is straightforward. First let g(x|y) = g(y|x)f(x)/g(y). Then:

E[ log g(y|x)/g(y) ] = E[ log g(x|y)/f(x) ]
                     = E[ log f(x|y)/f(x) ] - E[ log f(x|y)/g(x|y) ]
                     = I(X;Y) - E[ D( f(x|y) || g(x|y) ) ]
                     <= I(X;Y),

since relative entropy is nonnegative.

So the accuracy of the bound is equal to the relative entropy between the true f(x|y) and the approximate g(x|y).
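As a sanity check on the whole recipe (my own toy construction, not from the paper below): take the true channel to be a BSC with crossover 0.1, deliberately mismatch the auxiliary channel g to a BSC with crossover 0.2, and Monte Carlo the expectation of log g(y|x)/g(y). The estimate lands below the true I(X;Y) = 1 - H(0.1), as the bound promises.

```python
import math
import random

random.seed(0)

p_true, p_aux = 0.1, 0.2     # true and (mismatched) auxiliary crossover probabilities
N = 200_000                  # Monte Carlo samples

def h2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

total = 0.0
for _ in range(N):
    x = random.random() < 0.5                  # uniform input, so f(x) = 1/2
    flip = random.random() < p_true            # simulate the TRUE channel f(y|x)
    y = x ^ flip
    g_y_given_x = (1 - p_aux) if y == x else p_aux
    g_y = 0.5                                  # g(y) = sum_x f(x) g(y|x) = 1/2 by symmetry
    total += math.log2(g_y_given_x / g_y)

bound = total / N
true_I = 1 - h2(p_true)
print(bound, true_I)   # the estimate sits below true_I; the gap is D(f(x|y) || g(x|y))
```

Making g closer to the true channel tightens the bound; with g equal to the true BSC(0.1) the gap vanishes (up to Monte Carlo noise).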

This result has been around for a while, but a nice recent application is here:

D. M. Arnold, H.-A. Loeliger, P. O. Vontobel, A. Kavcic, and W. Zeng, “Simulation-based computation of information rates for channels with memory,” *IEEE Trans. Inform. Theory*, vol. 52, pp. 3498–3508, Aug. 2006.

## Tuesday, November 2, 2010

### List of blogs in machine learning

Here is a thread with a huge list of good machine learning blogs.

The most interesting discovery from the list: Radford Neal, the co-rediscoverer of LDPC codes, has a blog. I took his graphical inference course when I was a Ph.D. student at U of T.

