Via Slashdot, this year marks the 60th anniversary of the venerable Hamming code. Arriving just two years after Shannon's seminal 1948 paper, the Hamming codes were the first nontrivial error-correcting codes, and Hamming's contributions to the theory of error-correcting codes (including the metrics now known as Hamming distance and Hamming weight) are still taught today.

Here (pdf) is Hamming's original paper (from BSTJ), and here is my Hamming code lecture from last spring's information theory course.

## Thursday, November 25, 2010

## Friday, November 12, 2010

### Ph.D. position available (Update: No longer available)

*Update August 16, 2011:* This particular position is no longer available.

## Wednesday, November 10, 2010

### Capacity in the presence of intractability

Shannon's channel capacity, C = max_p(x) I(X;Y), gives the highest rate at which information can be reliably transmitted over a noisy channel. But the simplicity of the equation is deceptive: there are very few channels for which capacity is known. Normally, the difficulty is in optimizing over all input distributions p(x), but what if I(X;Y) is itself intractable?
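For concreteness (a toy example of my own, not part of the original post), one channel where everything is tractable is the binary symmetric channel: a brute-force sweep over input distributions p(x) recovers the textbook answer C = 1 - H(p).

```python
import numpy as np

def mutual_information(px0, p):
    """I(X;Y) in bits for a BSC with crossover probability p and P(X=0) = px0."""
    px = np.array([px0, 1 - px0])
    # channel transition matrix f(y|x)
    W = np.array([[1 - p, p],
                  [p, 1 - p]])
    pxy = px[:, None] * W          # joint f(x,y)
    py = pxy.sum(axis=0)           # output marginal f(y)
    terms = pxy * np.log2(pxy / (px[:, None] * py[None, :]))
    return terms.sum()

# sweep over input distributions; capacity is the maximum
p = 0.1
grid = np.linspace(1e-6, 1 - 1e-6, 1001)
C = max(mutual_information(q, p) for q in grid)
h = -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # binary entropy H(p)
print(C, 1 - h)   # both ~ 0.531: C = 1 - H(p), achieved by the uniform input
```

The maximum lands at the uniform input, as symmetry predicts; for most channels no such closed form exists and the sweep over p(x) is the easy part.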


I(X;Y) can be hard to calculate if f(y) or f(y|x) is itself hard to compute. The former can happen in intersymbol interference channels with long memories, especially under a binary (or discrete) input constraint. The latter can happen in timing channels, when x is a vector of inputs, and the output is of the form y = sort(x+n), where n is the vector of timing noise. In this case, calculating f(y|x) requires the calculation of a matrix permanent, a classic intractable problem for vectors of practical length. (See also: the Bapat-Beg theorem.)
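To see where the permanent comes from (a sketch of my own, with a hypothetical exponential noise density, not anything from the post): for y = sort(x + n) with i.i.d. noise, f(y|x) is a sum over all ways of matching sorted outputs back to inputs, i.e. the permanent of a likelihood matrix. Brute force works for tiny n but the cost grows as n!.

```python
import math
from itertools import permutations

def permanent(M):
    """Brute-force matrix permanent: a sum over all n! permutations."""
    n = len(M)
    total = 0.0
    for perm in permutations(range(n)):
        prod = 1.0
        for i, j in enumerate(perm):
            prod *= M[i][j]
        total += prod
    return total

def sorted_channel_density(y, x, noise_pdf):
    """f(y|x) for y = sort(x + n) with i.i.d. noise density noise_pdf.
    Entry (i, j) is the likelihood that sorted output y[j] came from input x[i]."""
    M = [[noise_pdf(yj - xi) for yj in y] for xi in x]
    return permanent(M)

# toy choice: exponentially distributed timing noise with rate 1
pdf = lambda d: math.exp(-d) if d >= 0 else 0.0
x = [0.0, 1.0, 2.5]
y = sorted(xi + 0.3 * (i + 1) for i, xi in enumerate(x))  # one possible received vector
print(sorted_channel_density(y, x, pdf))  # fine for n = 3; hopeless for n ~ 30
```

Even Ryser's formula only improves this to O(2^n n), which is why f(y|x) is effectively out of reach at practical block lengths.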

However, there is an elegant trick you can use to find a lower bound on I(X;Y), as long as you can (tractably) generate random variables x and y according to their true distributions f(x) and f(y|x). Let g(y|x) be any valid conditional distribution of y given x. Further, let

g(y) = sum_x f(x) g(y|x).

Then

I(X;Y) >= E[ log g(y|x)/g(y) ],

where the expectation is always taken with respect to the *true* distribution f(x)f(y|x). That is, if you can pick a tractable approximate distribution g(y|x), and you can simulate both x and y, then you can perform a *Monte Carlo* expectation on log g(y|x)/g(y), and get a lower bound on I(X;Y). You can then maximize the bound over input distributions to obtain a lower bound on capacity.

The proof is straightforward. First let g(x|y) = g(y|x)f(x)/g(y). Then:

E[ log g(y|x)/g(y) ] = E[ log g(x|y)/f(x) ]
                     = E[ log f(x|y)/f(x) ] - E[ log f(x|y)/g(x|y) ]
                     = I(X;Y) - E[ D( f(x|y) || g(x|y) ) ]
                     <= I(X;Y),

since relative entropy is nonnegative.

So the accuracy of the bound is equal to the relative entropy between the true f(x|y) and the approximate g(x|y).
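As a sanity check on the whole recipe (my own toy construction, not from the paper below): take the true channel to be a BSC with crossover 0.1, deliberately mismatch the auxiliary channel g to a BSC with crossover 0.2, and Monte Carlo the expectation of log g(y|x)/g(y). The estimate lands below the true I(X;Y) = 1 - H(0.1), as the bound promises.

```python
import math
import random

random.seed(0)

p_true, p_aux = 0.1, 0.2     # true and (mismatched) auxiliary crossover probabilities
N = 200_000                  # Monte Carlo samples

def h2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

total = 0.0
for _ in range(N):
    x = random.random() < 0.5                  # uniform input, so f(x) = 1/2
    flip = random.random() < p_true            # simulate the TRUE channel f(y|x)
    y = x ^ flip
    g_y_given_x = (1 - p_aux) if y == x else p_aux
    g_y = 0.5                                  # g(y) = sum_x f(x) g(y|x) = 1/2 by symmetry
    total += math.log2(g_y_given_x / g_y)

bound = total / N
true_I = 1 - h2(p_true)
print(bound, true_I)   # the estimate sits below true_I; the gap is D(f(x|y) || g(x|y))
```

Making g closer to the true channel tightens the bound; with g equal to the true BSC(0.1) the gap vanishes (up to Monte Carlo noise).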

This result has been around for a while, but a nice recent application is here:

D. M. Arnold, H.-A. Loeliger, P. O. Vontobel, A. Kavcic, and W. Zeng, “Simulation-based computation of information rates for channels with memory,” *IEEE Trans. Inform. Theory*, vol. 52, pp. 3498–3508, Aug. 2006.

## Tuesday, November 2, 2010

### List of blogs in machine learning

Here is a thread with a huge list of good machine learning blogs.

The most interesting discovery from the list: Radford Neal, the co-rediscoverer of LDPC codes, has a blog. I took his graphical inference course when I was a Ph.D. student at U of T.

