The trace of a square matrix is the sum of the elements on its main diagonal. That is, for an n by n square matrix A with entries a_ij, the trace of A is

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$

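This might not seem too exciting at first. However, the trace operator has a neat quasi-commutative property: for matrices U and V, so long as the internal dimensions work out (U is m by n and V is n by m, so that both UV and VU are defined), it is true that

$$\operatorname{tr}(UV) = \operatorname{tr}(VU).$$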

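The proof isn't too hard so I'll skip it. If we had a third matrix W (again assuming the internal dimensions work out), then since matrix multiplication is associative, we can treat UV as a single matrix and apply the property again, so it is also true that

$$\operatorname{tr}(UVW) = \operatorname{tr}(WUV) = \operatorname{tr}(VWU).$$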

It's not truly commutative, since you can only do cyclic shifts of the arguments. So, e.g., tr(UVW) is not equal to tr(WVU) in general.
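A quick numerical check can make the cyclic property concrete. The following is a minimal sketch using NumPy; the shapes and the small matrices P, Q, R are illustrative choices of mine, not something from the derivation above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes chosen so that every cyclic product below is defined and square.
U = rng.standard_normal((2, 3))
V = rng.standard_normal((3, 4))
W = rng.standard_normal((4, 2))

# Cyclic shifts: UVW is 2x2, WUV is 4x4, VWU is 3x3, yet the traces agree.
t_uvw = np.trace(U @ V @ W)
t_wuv = np.trace(W @ U @ V)
t_vwu = np.trace(V @ W @ U)

# A non-cyclic reordering: hand-picked 2x2 matrices where the traces differ.
P = np.array([[0.0, 1.0], [0.0, 0.0]])
Q = np.array([[0.0, 0.0], [1.0, 0.0]])
R = np.array([[1.0, 0.0], [0.0, 0.0]])
t_pqr = np.trace(P @ Q @ R)  # 1.0
t_rqp = np.trace(R @ Q @ P)  # 0.0

print(np.isclose(t_uvw, t_wuv), np.isclose(t_uvw, t_vwu))
print(t_pqr, t_rqp)
```

Note that the three cyclic products have different sizes, so the matrices themselves are not equal; only their traces are.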

What can you do with this? For one thing, note that the trace of a scalar a is itself: tr(a) = a. So if you have a matrix multiplication that results in a scalar, you can use trace to rearrange the arguments.

For instance, let U be a 1 by n row vector, and let V be an n by n matrix. If U' is the transpose of U, then UVU' is a scalar. This kind of expression comes up pretty often in jointly Gaussian distributions.

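Now say U is a zero-mean random vector with covariance matrix E[U'U], V is a fixed (non-random) matrix, and I want to know E[UVU']. Using the trace trick, I can express this expectation in terms of E[U'U]: first, since UVU' is a scalar, we can write

$$UVU' = \operatorname{tr}(UVU') = \operatorname{tr}(VU'U),$$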

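and since expectation is linear, it can be moved inside the trace (which is just a sum), so for a fixed matrix V we have

$$\mathrm{E}[UVU'] = \mathrm{E}[\operatorname{tr}(VU'U)] = \operatorname{tr}(V\,\mathrm{E}[U'U]).$$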

As a result, if you know the covariance E[U'U], there's no need to recalculate any expectations.

## 5 comments:

thx, it helpful

thanks, i've been looking for these properties for a long time

thank you, very well explained, you just saved me a lot of pain.

Wonderful post. I have just encountered exactly this type of manipulation in the derivation of the Akaike Information Criterion. Thanks.

Thanks, I was looking for the expectation - trace thing...
