I(X;Y) can be hard to calculate if f(y) or f(y|x) are hard. The former can happen in intersymbol interference channels with long memories, especially under a binary (or discrete) input constraint. The latter can happen in timing channels, when x is a vector of inputs, and the output is of the form y = sort(x+n), where n is the vector of timing noise. In this case, calculating f(y|x) requires the calculation of a matrix permanent, a classic intractable problem for vectors of practical length. (See also: the Bapat-Beg theorem.)
However, there is an elegant trick you can use to find a lower bound on I(X;Y), as long as you can (tractably) generate random variables x and y according to their true distributions f(x) and f(y|x). Let g(y|x) be any valid conditional distribution of y given x. Further, let
where expectation is always taken with respect to the true distribution. That is, if you can pick a tractable approximate distribution g(y|x), and you can simulate both x and y, then you can perform a Monte Carlo expectation on log g(y|x)/g(y), and get a lower bound on I(X;Y). You can then optimize with respect to the bound to find a lower bound on capacity.
The proof is straightforward. First let g(x|y) = g(y|x)f(x)/g(y). Then:
So the accuracy of the bound is equal to the relative entropy between the true f(x|y) and the approximate g(x|y).
This result has been around for a while, but a nice recent application is here:
D. M. Arnold, H.-A. Loeliger, P. O. Vontobel, A. Kavcic, and W. Zeng, “Simulation-based computation of information rates for channels with memory,” IEEE Trans. Inform. Theory, vol. 52, pp. 3498–3508, Aug. 2006.