Tuesday, January 18, 2011

Taste of AVX

Intel's Software Development Emulator (SDE) can execute programs that use ISA extensions the host CPU lacks, such as Advanced Vector Extensions (AVX). The tool, built on Pin, instruments binaries and emulates instructions not supported in hardware. It also produces detailed instruction opcode histograms, so you can, for instance, check whether some video game or library was compiled with SSE support.

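Here's an example x86-64 function implementing element-wise multiply-accumulate of two single-precision floating-point vectors. Each loop iteration handles eight floats (one 256-bit ymm register), so the vector length must be a multiple of eight. Despite the name, it uses separate vmulps and vaddps instructions rather than a fused multiply-add.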


; void vectorFMA(float *R, float *A, float *B, int N)
;
; R[i] += A[i] * B[i] for i = 0 .. N-1
;
; uses AVX to multiply two vectors, elementwise, and add
; results to a third vector
;
; rdi - R
; rsi - A
; rdx - B
; rcx - N
;
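vectorFMA:
; convert the element count N into a count of 8-float blocks;
; any remainder (N mod 8) is not processed
mov ecx, ecx ; zero-extend the 32-bit int N (assumed non-negative)
shr rcx, 3
jz .L5 ; fewer than 8 elements: nothing to do

.L1:
; load 8 floats each from R, A, and B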
vmovups ymm1, [rdi]
vmovups ymm2, [rsi]
vmovups ymm3, [rdx]

; *R += *A * *B
vmulps ymm4, ymm2, ymm3
vaddps ymm5, ymm1, ymm4
vmovups [rdi], ymm5

; R+=8, A+=8, B+=8
add rdi, 32
add rsi, 32
add rdx, 32

; if (--rcx == 0) break;
dec rcx
jnz .L1
.L5:
; clear the upper ymm halves to avoid AVX/SSE transition penalties
vzeroupper
ret
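
When running the routine under the emulator, a plain C reference is handy for checking its output. The sketch below is not from the original listing: the name vectorFMA_ref and the tail loop for leftover elements are assumptions added for illustration. It mirrors the assembly's structure by handling eight floats per outer iteration, as one ymm register would.

```c
#include <stddef.h>

/* Hypothetical scalar reference for vectorFMA: R[i] += A[i] * B[i]. */
void vectorFMA_ref(float *R, const float *A, const float *B, int N)
{
    int i = 0;

    /* Main loop: one pass per 8-float block, matching one ymm register. */
    for (; i + 8 <= N; i += 8) {
        for (int lane = 0; lane < 8; ++lane) {
            R[i + lane] += A[i + lane] * B[i + lane];
        }
    }

    /* Tail: leftover elements when N is not a multiple of 8.
       The assembly loop above has no such tail. */
    for (; i < N; ++i) {
        R[i] += A[i] * B[i];
    }
}
```

Calling both versions on copies of the same input and comparing R element by element should catch most mistakes, provided N is a multiple of eight so that the two cover the same range.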


When our Sandy Bridge machine arrives next week, I should have a handful of interesting microbenchmarks to run on it.
