Monday, July 21, 2008

Golub and Van Loan wear the clock so you can know the time

Just thought I'd share the magic.


U =
[
0.3594 -0.5577 0.5229 -0.4077 0.3128 0.1494 ;
0.4158 -0.4082 0.0545 0.4089 -0.5689 -0.4082 ;
0.4449 -0.1494 -0.4741 0.4072 0.2794 0.5577 ;
0.4449 0.1494 -0.4729 -0.4093 0.2786 -0.5577 ;
0.4158 0.4083 0.0558 -0.4076 -0.5697 0.4082 ;
0.3594 0.5577 0.5216 0.4088 0.3135 -0.1494 ;
];

B =
[
-30.5509 -0.0006 -0.0000 -0.0000 -0.0000 0.0000 ;
-0.0000 -7.4641 0.0001 -0.0000 -0.0000 0.0000 ;
-0.0000 -0.0000 1.7912 0.0035 -0.0000 0.0000 ;
0.0000 -0.0000 0.0000 1.0000 0.0008 0.0000 ;
-0.0000 0.0000 0.0000 -0.0000 0.6579 0.0000 ;
-0.0000 0.0000 0.0000 -0.0000 0.0000 -0.5359 ;
];

V' =
[
-0.3594 -0.4158 -0.4449 -0.4449 -0.4158 -0.3594 ;
0.5577 0.4082 0.1494 -0.1494 -0.4083 -0.5577 ;
0.5234 0.0540 -0.4746 -0.4724 0.0563 0.5211 ;
-0.4072 0.4092 0.4065 -0.4100 -0.4073 0.4093 ;
0.3126 -0.5687 0.2796 0.2784 -0.5699 0.3137 ;
-0.1494 0.4082 -0.5577 0.5577 -0.4082 0.1494 ;
];


U B V' =
[
7.0000 6.0000 5.0000 4.0000 3.0000 2.0000 ;
6.0000 7.0000 6.0000 5.0000 4.0000 3.0000 ;
5.0000 6.0000 7.0000 6.0000 5.0000 4.0000 ;
4.0000 5.0000 6.0000 7.0000 6.0000 5.0000 ;
3.0000 4.0000 5.0000 6.0000 7.0000 6.0000 ;
2.0000 3.0000 4.0000 5.0000 6.0000 7.0000 ;
];
U and V' are unitary.
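
If you want to verify that numerically, it's enough to measure how far Q^T Q strays from the identity. A minimal C sketch of the check (plain row-major doubles; an illustration, not my actual VSIPL code):

#include <math.h>

/* Return max |(Q^T Q)_ij - I_ij| for a square matrix Q.
 * Near machine epsilon means Q is orthogonal to working precision. */
static double orthogonality_error(const double *Q, int n)
{
    double err = 0.0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double dot = 0.0;
            for (int k = 0; k < n; ++k)
                dot += Q[k * n + i] * Q[k * n + j];  /* column i . column j */
            double target = (i == j) ? 1.0 : 0.0;
            double d = fabs(dot - target);
            if (d > err)
                err = d;
        }
    }
    return err;
}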


I'm working on making the SVD converge as fast as possible while minimizing storage requirements. Right now it holds a whole dense copy of the tridiagonal matrix B. Storing that many elements known to be zero is a bit wasteful, if you believe that information theory voodoo.

Yes, I realize the output B is not *quite* diagonal, but that's after only four iterations of the diagonalization process (two Givens rotations per row) and with no explicit zeroing of elements.
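
For reference, each rotation is computed the standard Golub & Van Loan way (their Algorithm 5.1.3): choose c and s so that, with G = [c s; -s c], G^T applied to the pair (a, b) zeroes the second entry. A C sketch, illustrative rather than my exact routine:

#include <math.h>

/* Compute c = cos(theta), s = sin(theta) such that
 * [c -s; s c] [a; b] = [r; 0].  Guards against overflow by
 * dividing by the larger of |a|, |b| (Golub & Van Loan, Alg. 5.1.3). */
static void givens(double a, double b, double *c, double *s)
{
    if (b == 0.0) {
        *c = 1.0;
        *s = 0.0;
    } else if (fabs(b) > fabs(a)) {
        double t = -a / b;
        *s = 1.0 / sqrt(1.0 + t * t);
        *c = (*s) * t;
    } else {
        double t = -b / a;
        *c = 1.0 / sqrt(1.0 + t * t);
        *s = (*c) * t;
    }
}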

Problems to solve:
* minimize storage requirements for B (see the storage sketch after this list).
* figure out whether it is possible to store U and V in factored Householder forms and accumulate the Givens rotations some other way (as opposed to forming U and V explicitly then rotating them to diagonalize as I do now).
* complex Householder reflections.
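
For the first item, the obvious candidate is to keep only the two bands as packed vectors, plus a slot for the transient bulge a Givens sweep creates. Something like this hypothetical layout (invented names, not VSIPL types), which is O(n) storage instead of O(n^2):

typedef struct {
    int     n;         /* order of B                                   */
    double *d;         /* d[0..n-1]: main diagonal                     */
    double *e;         /* e[0..n-2]: superdiagonal                     */
    double  bulge;     /* the single transient out-of-band element     */
    int     bulge_pos; /* index of the bulge, or -1 when there is none */
} bidiag_t;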

2 comments:

sstc said...

heh.

sparse matrices are a crock.

You still have to store a list of indices saying where the nonzero elements are. I suppose it takes less space to store an int than a float.

You also have to spend time checking whether each element is zero or not.

Then again, I know nothing about the inner workings of the sparse magic. But in my use of it, any gains you got from using sparse matrices in the calculations were lost once you also factored in the cost of setting up the sparse matrix and the cost of making it unsparse on the other side.

In your case, you know it's diagonal, but don't you need to store the off-diagonal nonzero elements while you are trying to converge?

Question: are you talking about storage during the calculation, or after it? GPUs have teh 2 GB memories now. But then again, that 2 GB of memory is teh slow compared to the register file.

Spatchcock said...

The sequence of the SVD algorithm is:

* QR factorization for convenience
* bidiagonalization
* diagonalization process

The diagonalization process receives the bidiagonal matrix B and attempts to use orthogonal transformations to diagonalize it. U and V may or may not be required; if they are, the diagonalization must transform those also.

During this process, B is rotated by Givens transformations. It is never more than a banded matrix (tridiagonal, though only one element in the subdiagonal may be nonzero at any point in the process).
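
Each of those rotations only touches two rows or columns, so accumulating one into U or V is O(N) work, roughly like this (illustrative C over row-major doubles, using the same G = [c s; -s c] convention as above; not my actual code):

/* Apply a Givens rotation G from the right to columns j and j+1
 * of a dense n-by-n matrix A, i.e. A <- A * G. */
static void rotate_columns(double *A, int n, int j, double c, double s)
{
    for (int i = 0; i < n; ++i) {
        double t1 = A[i * n + j];
        double t2 = A[i * n + j + 1];
        A[i * n + j]     = c * t1 - s * t2;  /* new column j   */
        A[i * n + j + 1] = s * t1 + c * t2;  /* new column j+1 */
    }
}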

A banded B like this is a very special case of sparse matrix, and it arises only internally. The input matrix is assumed to be dense, and VSIPL includes no real support for sparse matrices; even where structure is assumed, the storage is still dense matrices or vectors.

On my machine:
sizeof(int) = 4
sizeof(float) = 4

What's to be gained by storing the banded representation explicitly as two vectors rather than as a large, mostly-zero dense matrix?

Surely memory is cheap, and any system performing something as complex as an SVD on a large matrix is sure to have plenty of RAM. The savings, I would imagine, is the ability to fit in the cache: adjacent elements on the diagonal of a large dense matrix almost never land in the same cache line, while adjacent elements of a packed vector do. I think this is what you were referring to.
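
To put rough numbers on that (assuming 4-byte floats, as above, and 64-byte cache lines, which is an assumption about the hardware, not a measurement):

#include <stdio.h>

/* In a dense row-major n-by-n float matrix, consecutive diagonal
 * entries sit 4*(n+1) bytes apart, so for any n >= 16 each one
 * occupies its own 64-byte cache line.  Packed into a vector,
 * 16 of them share a line. */
int main(void)
{
    const int n = 1024;
    printf("dense diagonal stride:  %d bytes\n", 4 * (n + 1)); /* 4100 */
    printf("packed diagonal stride: %d bytes\n", 4);
    return 0;
}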

Since I want to publish this to the VSIPL Forum, I figure I need to make this savings at the very least.

The biggest thing I was interested in, though, is the representation of U and V. VSIPL doesn't expose them directly, but given an SVD workspace and the input matrix, you may multiply by U or V. If I store them explicitly, the matrix product U*C is an O(N^3) operation (okay, perhaps you can do *slightly* better than that, but I'm not really interested).

If U and V are in a factored form (of Householder vectors), you can compute U*C with a cheaper O(N^2) procedure.
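
The building block there is cheap: applying one reflector H = I - beta*v*v^T to a length-n vector costs O(n), so you never form H. An illustrative sketch, not the VSIPL API:

/* Apply H = I - beta * v * v^T to x in place:
 * x <- x - beta * (v^T x) * v.  Two passes, O(n) work. */
static void apply_householder(const double *v, double beta,
                              double *x, int n)
{
    double dot = 0.0;
    for (int i = 0; i < n; ++i)
        dot += v[i] * x[i];          /* v^T x */
    for (int i = 0; i < n; ++i)
        x[i] -= beta * dot * v[i];
}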

That's all well and good for the bidiagonalization, but diagonalizing B requires a potentially unbounded sequence of Givens rotations. I don't see a robust way of storing those instead of the explicitly transformed U and V... nothing that is efficient and preferable to a straight-up matrix product.

That's just how things go.