Saturday, November 22, 2008

News from the GPU VSIPL Front

We're planning to begin licensing talks for GPU VSIPL. Presently, we distribute only the binary and insist that it not be redistributed. This lets early adopters test it out but limits the scope of what we're committed to supporting. When the implementation is stable and complete, we hope to make it available more broadly. Ideally, we would partner with someone and include it in a larger VSIPL distribution.

Additionally, I'm responsible for implementing a sample application to publish at CUDA Zone. I need a team of undergraduates to finish the Test Suite for me.

Friday, November 21, 2008

Wii on PS3

Someone should implement a Wii virtual machine for PS3.

This should include a hardware receiver (and drivers) for the Wii controllers. GameCube compatibility could be accomplished by translating the inputs from the PS3 controllers into what appear to be GameCube controller signals.

The Cell B.E. processor includes a 64-bit "Power Processing Unit," a PowerPC core. The Wii's CPU is also a PowerPC implementation. Both consoles use GPUs derived from the same technology that goes into PC GPUs, so I would expect the PS3 to support whatever primitive shader model the Wii offers. Moreover, I would expect the PS3's performance advantage to be sufficient to emulate any hardware-provided functionality of the Wii (use the idle SPUs for dynamic translation!).

Nintendo could continue to sell Wii games, and they'd penetrate formerly PS3-only households. I'd wager the technical obstacles to this idea aren't insurmountable.

Thursday, November 20, 2008

Cells and PS3

Guess the PlayStation3 was simply ahead of its time.

We're in the process of building a system that, if talks go well, will have several high-end (9800 GX2) GPUs, one Cell processor, and one quad-core x86. I have no plans to toss in a Blu-ray player, though that would be amusing.

Wednesday, November 19, 2008

GPU VSIPL

GPU VSIPL is getting some press:

http://www.marketwatch.com/news/story/Mercury-Computer-Systems-Unveils-Multi/story.aspx?guid={A3EA2E33-A633-4021-A6C5-CFB83FBAC70B}

Here is a link to the main GPU VSIPL page, where we claim speedups of two orders of magnitude for applications well-suited to GPUs, without the user ever writing any GPU code.

http://gpu-vsipl.gtri.gatech.edu

We just released a new version on Sunday with a number of enhancements from VSIPL Core, though we do not yet claim VSIPL Core compliance, so I won't enumerate them yet. That will come by Christmas.

Monday, November 17, 2008

Sunday, November 9, 2008

SSE3 and Manhattans

Now that the semester's coding for the PTX to SPU translator is complete, I spent the weekend researching some areas that I've been thinking about but haven't had much time to investigate. So, it was a weekend spent mostly* hacking.

1.)
Streaming SIMD Extensions (SSE) are a set of instructions added to the x86 instruction set beginning with the Pentium III. These instructions apply the same operator, typically floating-point {*, +, -, /}, to corresponding elements of 128-bit registers that each hold four single-precision values. Because the four operations happen in parallel, you can typically perform more operations in a given number of clock cycles than with scalar floating-point code.
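To make "the same operator applied to corresponding elements" concrete, here is a minimal C sketch using the standard SSE intrinsics from xmmintrin.h. It assumes an x86 compiler (GCC, Clang, or MSVC); vec_add is a hypothetical helper, not part of any library. The main loop issues one _mm_add_ps per four floats, and a scalar loop handles any leftover elements.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
   Four elements are added per SSE instruction; the tail is scalar. */
void vec_add(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);          /* load a[i..i+3] (unaligned OK) */
        __m128 vb = _mm_loadu_ps(b + i);          /* load b[i..i+3] */
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* four adds in one instruction */
    }
    for (; i < n; ++i)                            /* leftover 0-3 elements */
        out[i] = a[i] + b[i];
}
```

Unaligned loads keep the sketch simple; with 16-byte-aligned buffers, _mm_load_ps and _mm_store_ps are the usual choice.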

SSE2 and SSE3 are later revisions that added instructions as programmers demanded them. If you have a Pentium 4 Prescott or better, you have SSE3. If you have a 2.2 GHz P4 Northwood, your CPU supports only SSE2, and you miss out on the faster-better-cheaper possibilities concomitant with SSE3.
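Rather than guessing from the CPU model, a program can ask the processor directly. A minimal sketch, assuming GCC or Clang on x86: the compiler-provided cpuid.h header wraps the CPUID instruction, and SSE3 support is reported in bit 0 of ECX for leaf 1 (the header names that bit bit_SSE3). cpu_has_sse3 is a hypothetical helper name.

```c
#include <cpuid.h>  /* GCC/Clang wrapper around the x86 CPUID instruction */

/* Hypothetical helper: returns 1 if the CPU reports SSE3, else 0.
   CPUID leaf 1 advertises SSE3 in bit 0 of ECX (bit_SSE3). */
int cpu_has_sse3(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                    /* leaf 1 not available */
    return (ecx & bit_SSE3) != 0;
}
```

A program can call it once at startup and fall back to an SSE2 code path when it returns 0.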

I spent a few hours updating my hand-rolled matrix class with compiler intrinsics (functions with C call syntax that map directly to CPU instructions and are portable across compilers), and I achieved a 2.5x speedup for matrix multiply. SSE3 provides support for horizontal operations: operators apply to elements within the same 128-bit register. This permits the implementation of dot products and complex arithmetic with few or no shuffle instructions and makes the code a lot faster. If your CPU doesn't support SSE3, you should probably build a new system (and use the existing system as a dedicated build machine).
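The horizontal and mixed add/subtract instructions are what make dot products and complex multiplies cheap. The sketch below is illustrative, not the code from my matrix class: it assumes GCC or Clang on x86 and uses the target("sse3") function attribute so the file builds without a global -msse3 flag; dot_sse3 and cmul2_sse3 are hypothetical names. The dot product finishes with two _mm_hadd_ps calls and no shuffles. The complex multiply still needs one shuffle to swap each operand's real and imaginary parts, but _mm_addsub_ps does the subtract-in-even-lanes, add-in-odd-lanes step in a single instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <pmmintrin.h>  /* SSE3: _mm_hadd_ps, _mm_addsub_ps, _mm_moveldup_ps, _mm_movehdup_ps */

/* GCC/Clang: enable SSE3 code generation for these functions only. */
#define SSE3_FN __attribute__((target("sse3")))

/* Dot product of two float arrays; n must be a multiple of 4. */
SSE3_FN float dot_sse3(const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    acc = _mm_hadd_ps(acc, acc);  /* [x0+x1, x2+x3, x0+x1, x2+x3] */
    acc = _mm_hadd_ps(acc, acc);  /* every lane now holds x0+x1+x2+x3 */
    return _mm_cvtss_f32(acc);    /* extract lane 0 */
}

/* Multiplies two pairs of interleaved complex floats {re0, im0, re1, im1}:
   out = a * b elementwise, i.e. (ar*br - ai*bi, ai*br + ar*bi) per pair. */
SSE3_FN void cmul2_sse3(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 br = _mm_moveldup_ps(vb);  /* [br0, br0, br1, br1] */
    __m128 bi = _mm_movehdup_ps(vb);  /* [bi0, bi0, bi1, bi1] */
    __m128 a_swap = _mm_shuffle_ps(va, va, _MM_SHUFFLE(2, 3, 0, 1)); /* [ai0, ar0, ai1, ar1] */
    /* even lanes: ar*br - ai*bi (real); odd lanes: ai*br + ar*bi (imag) */
    _mm_storeu_ps(out, _mm_addsub_ps(_mm_mul_ps(va, br), _mm_mul_ps(a_swap, bi)));
}
```

For example, multiplying {1, 2, 3, 4} by {5, 6, 7, 8} computes (1+2i)(5+6i) and (3+4i)(7+8i), giving {-7, 16, -11, 52}.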

2.)
CUDA is interoperable with OpenGL and Direct3D 9. I spent a few hours tonight writing a quick DirectX application that renders a textured quad and then performs post-processing (separable 2D convolution) with a CUDA kernel. The immediate application of this would be to produce efficient visualizations for GPU-based simulations. Another idea is to perform 3D rendering with DirectX and image-space post-processing with CUDA, though Cg/HLSL is probably still the right way to implement that.
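The CUDA kernel isn't reproduced here, but the arithmetic a separable convolution performs can be sketched in plain C as a CPU reference. Assumptions (mine, not from the original application): a single-channel float image in row-major order, an odd-length kernel of 2r+1 taps applied along both axes, and clamp-to-edge boundaries. separable_convolve is a hypothetical name. Splitting the filter into a row pass and a column pass costs 2(2r+1) multiply-adds per pixel instead of (2r+1)^2.

```c
#include <stdlib.h>

/* Clamp v to [lo, hi]; used for clamp-to-edge boundary handling. */
static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Hypothetical helper: filters a width x height row-major image with the
   (2r+1)-tap kernel k along rows, then along columns. For a symmetric kernel
   (e.g. Gaussian) this equals convolution; otherwise it is correlation.
   Returns 0 on success, -1 if the temporary buffer can't be allocated. */
int separable_convolve(const float *in, float *out, int width, int height,
                       const float *k, int r)
{
    float *tmp = malloc((size_t)width * height * sizeof *tmp);
    if (!tmp) return -1;
    for (int y = 0; y < height; ++y)          /* horizontal (row) pass */
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int j = -r; j <= r; ++j)
                sum += k[j + r] * in[y * width + clampi(x + j, 0, width - 1)];
            tmp[y * width + x] = sum;
        }
    for (int y = 0; y < height; ++y)          /* vertical (column) pass */
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int j = -r; j <= r; ++j)
                sum += k[j + r] * tmp[clampi(y + j, 0, height - 1) * width + x];
            out[y * width + x] = sum;
        }
    free(tmp);
    return 0;
}
```

In a CUDA version, each pass maps naturally to one thread per output pixel, with the intermediate image kept in device memory between the two kernel launches.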

Also, the fragmentation of OpenGL distributions across versions and driver vendors made CUDA-OpenGL interoperability more of a debugging hassle to get working than CUDA-DirectX interoperability.

3.)
Identified the need for a new power supply. Apparently, a GeForce GTX 280 has been purchased for me. I'd like to use it along with the GeForce 9800 GX2 (itself a dual-GPU card), giving me a grand total of 3 GPUs and 2 GB of GDDR3 memory. I'm working on ways to leverage all three at once, so this isn't a fool's errand. Unfortunately, my power supply cannot source enough current on enough lines to power both cards. During Christmas break, I'll make the transition.

*
During a trip to Harry and Sons, I decided to modify my usual order of Chicken Larb (Thai salad). I still ordered it, but I augmented it with a Manhattan. For those of you who don't know, a Manhattan is a cocktail of whiskey and sweet vermouth. Typically, I avoid cocktails because (1) I'd only really had bad examples and (2) cola + {rum, whiskey} is difficult to beat. The stigma of cocktails being girly drinks may have originated during my freshman year's first experiences with alcohol; vodka, grenadine, and orange juice are simply not something I'll ever combine again.

A well-made Manhattan, on the other hand, is quite strong yet simple enough to order from a busy bartender. In terms of flavor, it is quite divine. The vermouth dulls all of the whiskey's edge, leaving only the wonderful caramel flavoring. Typically, it's made with 3-4 shots of the principal spirit, so one drink takes you a long way toward inebriation while looking classy the entire time. It's my new official drink.


CUDA, DirectX9, SSE3, and the Manhattan cocktail: all for the win!

Saturday, November 8, 2008

PTX to SPU Translator

Our semester project for Dynamic Compilation and Managed Runtimes is a code translator from NVIDIA's PTX virtual assembly language to IBM Cell SPU assembly. Last night, we demonstrated it by translating a kernel that computes the complex Givens rotation of a pair of values. The resulting Cell SPU assembly source was linked with a runtime platform we developed, and I executed it on the Cell processor of a PlayStation3 in PaSTEC, my favorite cluster.

We'll be writing a paper, due in December, that I'll post here. We'll also continue the project through next spring. Exciting.

Monday, November 3, 2008

Congratulations, Joe and Sara!

Joe and Sara were married in an exceedingly elegant wedding ceremony and reception.

First post.