
Happy Thanksgiving!
N = 1000
Average nodes visited: 4.68
log_3(N) = 6.28771 (worst)
log_6(N) = 3.85529 (best)
tree depth = 5
N = 25000
Average nodes visited: 7.68968
log_3(N) = 9.21766 (worst)
log_6(N) = 5.65178 (best)
tree depth = 8
.
.
{
p1 = p1 * a;
p2 = p2 * b;
p3 = p3 * c;
p4 = p4 * d;
}
.
.
.
.
%r1500 = fmul float %r1496, %r24 ; compute %r1500
%r1501 = fmul float %r1497, %r23
%r1502 = fmul float %r1498, %r22
%r1503 = fmul float %r1499, %r21
%r1504 = fmul float %r1500, %r24 ; first use of %r1500
%r1505 = fmul float %r1501, %r23
%r1506 = fmul float %r1502, %r22
%r1507 = fmul float %r1503, %r21
%r1508 = fmul float %r1504, %r24 ; first use of %r1504
.
.
p1 = p1 * a
p1 = p1 * a
.
.
p2 = p2 * b
p2 = p2 * b
.
.
p3 = p3 * c
p3 = p3 * c
.
.
mulss %xmm8, %xmm10
mulss %xmm8, %xmm10
.
. repeated 512 times
.
mulss %xmm7, %xmm9
mulss %xmm7, %xmm9
.
. repeated 512 times
.
mulss %xmm6, %xmm3
mulss %xmm6, %xmm3
.
. repeated 512 times
.
1 threads 0.648891 GFLOP/s
2 threads 1.489049 GFLOP/s
3 threads 2.209838 GFLOP/s
4 threads 2.940443 GFLOP/s
.
.
mulss %xmm8, %xmm10
mulss %xmm7, %xmm9
mulss %xmm6, %xmm3
mulss %xmm5, %xmm11
mulss %xmm8, %xmm10
mulss %xmm7, %xmm9
mulss %xmm6, %xmm3
mulss %xmm5, %xmm11
mulss %xmm8, %xmm10
mulss %xmm7, %xmm9
.
.
1 threads 2.067118 GFLOP/s
2 threads 5.569419 GFLOP/s
3 threads 8.285519 GFLOP/s
4 threads 10.81742 GFLOP/s
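The two instruction orderings shown above can be written at the source level roughly as follows. This is a sketch under assumptions: the function names, the array-based signature, and the `reps` parameter are illustrative and not taken from the original benchmark. An optimizing compiler may also reschedule or vectorize either loop, so the generated assembly has to be inspected before timing anything.

```c
#include <assert.h>
#include <stddef.h>

/* Back-to-back ordering: finish one dependency chain before starting
 * the next. Every multiply needs the result of the one just before it. */
void multiply_back_to_back(float p[4], const float m[4], size_t reps) {
    assert(p != NULL && m != NULL);
    for (size_t k = 0; k < 4; ++k) {
        for (size_t i = 0; i < reps; ++i) {
            p[k] = p[k] * m[k];
        }
    }
}

/* Interleaved ordering: rotate through the four chains, so each result
 * is next needed four multiplies later. */
void multiply_interleaved(float p[4], const float m[4], size_t reps) {
    assert(p != NULL && m != NULL);
    float p1 = p[0], p2 = p[1], p3 = p[2], p4 = p[3];
    for (size_t i = 0; i < reps; ++i) {
        p1 = p1 * m[0];
        p2 = p2 * m[1];
        p3 = p3 * m[2];
        p4 = p4 * m[3];
    }
    p[0] = p1;
    p[1] = p2;
    p[2] = p3;
    p[3] = p4;
}
```

Both functions perform 4 * reps multiplications and apply the same sequence of operations to each accumulator, so they produce the same values; only the distance between dependent multiplies differs.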
Vectorized - No instruction interleaving - back-to-back dependencies
1 threads 1.540621 GFLOP/s
2 threads 5.900833 GFLOP/s
3 threads 8.755953 GFLOP/s
4 threads 11.257122 GFLOP/s
Vectorized - Interleaved - stride-4 reuse distance
1 threads 3.157255 GFLOP/s
2 threads 22.104369 GFLOP/s
3 threads 32.300111 GFLOP/s
4 threads 39.112162 GFLOP/s
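As a small worked check on the two vectorized tables above, the sketch below divides the interleaved throughput by the back-to-back throughput at each thread count. The array and function names are hypothetical; the numbers are copied from the output above.

```c
#include <assert.h>
#include <stddef.h>

/* GFLOP/s for 1..4 threads, copied from the vectorized results. */
static const double back_to_back_gflops[4] = {
    1.540621, 5.900833, 8.755953, 11.257122
};
static const double interleaved_gflops[4] = {
    3.157255, 22.104369, 32.300111, 39.112162
};

/* Ratio of interleaved to back-to-back throughput for a thread count
 * between 1 and 4. */
double interleaving_gain(size_t threads) {
    assert(threads >= 1 && threads <= 4);
    return interleaved_gflops[threads - 1] / back_to_back_gflops[threads - 1];
}
```

With these figures the ratio is roughly 2.0 at one thread and roughly 3.5 at four threads.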