Jack Keil Wolf, one of the giants of information theory, passed away last month. His obituary was carried in the New York Times.
Wolf's signature contribution was in distributed data compression. Say you're trying to retrieve two files, X and Y, from separate servers on a network. One strategy would be to compress each file individually to its entropy, H(X) and H(Y), and download them; this requires H(X)+H(Y) bits. That's the best you can do if X and Y are statistically independent. However, if they are dependent, compressing X and Y together can be done at the joint entropy H(X,Y), which is less than H(X)+H(Y). But what can be done by the two servers, since neither has access to both X and Y?
Along with David Slepian, Wolf proved the Slepian-Wolf theorem [1]: as long as certain conditions are met, files X and Y can be compressed to a total of H(X,Y) bits, even if the X server has no knowledge of file Y, and vice versa. One way to do this is to have the X server simply compress and transmit X at a rate of H(X), and have the Y server form a "difference signal" and transmit it at the conditional entropy rate H(Y|X), for a total rate of H(X)+H(Y|X)=H(X,Y); this is called a "corner point" solution.
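To make the rate bookkeeping concrete, here is a quick numerical check in Python for a toy joint distribution I've made up (two binary sources that agree 90% of the time); the distribution and variable names are purely illustrative.

```python
import math

# Assumed toy joint distribution p(x, y): binary X and Y agree 90% of the time.
p_xy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
p_x = {0: 0.5, 1: 0.5}   # marginal of X
p_y = {0: 0.5, 1: 0.5}   # marginal of Y

def H(dist):
    """Shannon entropy, in bits, of a probability dictionary."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

H_x, H_y, H_xy = H(p_x), H(p_y), H(p_xy)
print(f"H(X) + H(Y) = {H_x + H_y:.3f} bits")   # 2.000: compressing separately
print(f"H(X,Y)      = {H_xy:.3f} bits")        # ~1.469: compressing jointly
print(f"H(Y|X)      = {H_xy - H_x:.3f} bits")  # ~0.469: the Y server's corner-point rate
```

The last line is the rate the Y server needs at the corner point: by the chain rule, H(X)+H(Y|X) adds back up to H(X,Y).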
How does the Y server form a difference signal if it doesn't know X? The server groups all possible Y values into "bins", chosen so that if X is known, only one member of any bin can possibly be equal to Y. The server then sends a string or number identifying Y's bin. Assuming the strategy is agreed upon in advance, the receiver first decodes X from the X server's transmission, then uses the bin identifier from the Y server to pick Y as the unique member of that bin consistent with X.
For instance, suppose X and Y are temperature readings, and suppose we know from past experience that either Y=X, or Y=X+1 (because the sensors are not too far apart, say). Now, suppose X is even. Then if Y is even, it must be true that Y=X, and if Y is odd, it must be true that Y=X+1; and vice versa if X is odd. So if we know X, and we know whether Y is even or odd, then we know Y. Our strategy is then the following: server X encodes the temperature independently, and server Y sends one bit: 0 if Y is even, and 1 if Y is odd. Since a 2-digit temperature would normally require 7 or 8 bits to encode, this is a huge savings of data compared to encoding X and Y independently.
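Here is a minimal Python sketch of that parity-binning strategy; the function names are just illustrative, and the model is the assumed one above (Y is either X or X+1).

```python
def encode_x(x):
    """X server: send the temperature itself (standing in for an H(X)-rate code)."""
    return x

def encode_y(y):
    """Y server: send only Y's parity bit -- the identifier of Y's bin."""
    return y % 2

def decode_y(x, parity_y):
    """Receiver: given X, and the model Y in {X, X+1}, the parity bit
    picks out the unique member of the bin consistent with X."""
    return x if x % 2 == parity_y else x + 1

# Sanity check over all 2-digit temperatures under the assumed model.
for x in range(10, 100):
    for y in (x, x + 1):
        assert decode_y(encode_x(x), encode_y(y)) == y
```

One bit from the Y server replaces the 7 or so bits Y would otherwise need, which is exactly the savings described above.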
Years ago, I did a bit of work on Slepian-Wolf, when the big idea was to use the syndromes of a powerful error-correcting code (e.g., an LDPC code) to form the difference signal [2,3], or to achieve more general rate tradeoffs away from the "corner point" (which is what I did in [4]). I haven't worked in the field in years, but as far as I know, nonbinary Slepian-Wolf compression is still a major open problem, since LDPC codes have a hard time handling it.
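For a flavour of the syndrome trick in [2,3], here is a toy Python sketch that uses the little (7,4) Hamming code in place of an LDPC code, and assumes X and Y are 7-bit blocks differing in at most one position; the setup is illustrative, not how the papers actually do it.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: column j is the binary
# expansion of j+1, so a single-bit difference's syndrome spells out its position.
H = np.array([[(j + 1) >> b & 1 for j in range(7)] for b in range(3)])

def encode_y(y):
    """Y server: compress the 7-bit block y to its 3-bit syndrome (the bin index)."""
    return H @ y % 2

def decode_y(syndrome_y, x):
    """Receiver: recover y from its syndrome plus the side information x,
    assuming x and y differ in at most one bit."""
    diff = (syndrome_y + H @ x) % 2                         # equals H @ (x XOR y)
    if not diff.any():
        return x.copy()                                     # syndromes match, so y == x
    pos = int("".join(str(b) for b in diff[::-1]), 2) - 1   # read off the flipped position
    y = x.copy()
    y[pos] ^= 1
    return y

# Sanity check: y differs from x in one position, yet only 3 bits are sent for y.
x = np.array([1, 0, 1, 1, 0, 0, 1])
y = x.copy(); y[4] ^= 1
assert np.array_equal(decode_y(encode_y(y), x), y)
```

The syndrome plays the role of the bin identifier: all 7-bit strings with the same syndrome share a bin, and the side information X picks out the right member. The schemes in [2,3] do the same thing with long LDPC codes and iterative decoding.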
[1] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inform. Theory, vol. 19, pp. 471-480, 1973.
[2] S. S. Pradhan and K. Ramchandran, "Distributed source coding using syndromes (DISCUS): Design and construction," IEEE Trans. Inform. Theory, vol. 49, no. 3, pp. 626-643, Mar. 2003.
[3] D. Schonberg, K. Ramchandran, and S. S. Pradhan, "LDPC codes can approach the Slepian Wolf bound for general binary sources," Proc. 40th Allerton Conf. on Commun., Control, and Computing, Monticello, IL, 2002.
[4] A. W. Eckford and W. Yu, "Density evolution for the simultaneous decoding of LDPC-based Slepian-Wolf source codes," in Proc. IEEE International Symposium on Information Theory (ISIT), Adelaide, Australia, pp. 1401-1405, 2005.