A real-life single-thread benchmark for SPARC versus Intel
First, special thanks to the whole Oracle team (Oracle Turkey,
Martin and Michele); we worked on this benchmark together.
Comments about Benchmarks
Benchmarks are good for giving a basic idea of the performance or
capabilities of your system or application. For example, if an application
suffers from a network bottleneck, you can first test the network with the
IPERF network benchmark. If performance is good there, you can then diagnose
your application; perhaps a parameter in your app prevents it from using the
network well. So benchmarks are useful, but reality is not a benchmark: your
reality is always your application.
Don't use GCC for compiling code on SPARC CPU
We observed a real-life example of this issue. We first used GCC to compile
the benchmark code. Oracle then showed us that GCC did not use the SPARC CPU's
hardware instruction for the modulo (%) operation and instead called its own
software routine, __umoddi3. Using the software routine instead of the
hardware instruction made the performance results much worse.
So, when compiling code on SPARC, use Developer Studio instead of GCC.
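To make the modulo issue concrete, here is a minimal sketch of the kind of C code that can trigger it. __umoddi3 is the libgcc helper for unsigned 64-bit modulo, and GCC calls it when the target mode has no native 64-bit divide. That the original benchmark hit this through a 64-bit `%` in a 32-bit build is an assumption; the function name mod_u64 is hypothetical and is not taken from the attached sources.

```c
#include <stdint.h>

/* Unsigned 64-bit modulo.
   Assumption: when the build targets a mode without a native 64-bit divide,
   GCC lowers this '%' to a call to the libgcc routine __umoddi3 instead of
   emitting a hardware divide instruction. */
uint64_t mod_u64(uint64_t a, uint64_t b)
{
    return a % b;
}
```

One way to check a given build is to compile this file with `gcc -S` and search the generated assembly for `__umoddi3`; if the symbol appears, the software routine is being used.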
Applied single-thread benchmark - Quick Sort
The source code is attached. We used Developer Studio for compiling.
You can examine the details in the attachment (single-thread-benchmark-QUICK-SORT.zip).
Only the results are shown below.
[Results table: RUN 1]
[Results table: RUN 2]
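For readers who do not have the attachment at hand, the following is a minimal sketch of what a single-thread quick sort timing harness can look like. It is not the attached benchmark code: the names elem_t, quick_sort and bench_quick_sort, the USE_FLOAT build switch, the pseudo-random input and the use of clock() are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Element type: build with -DUSE_FLOAT to sort floats, otherwise ints. */
#ifdef USE_FLOAT
typedef float elem_t;
#else
typedef int elem_t;
#endif

/* Recursive quick sort (middle pivot, Hoare-style partition) on v[lo..hi]. */
static void quick_sort(elem_t *v, long lo, long hi)
{
    if (lo >= hi) return;
    elem_t pivot = v[lo + (hi - lo) / 2];
    long i = lo, j = hi;
    while (i <= j) {
        while (v[i] < pivot) i++;
        while (v[j] > pivot) j--;
        if (i <= j) {
            elem_t t = v[i]; v[i] = v[j]; v[j] = t;
            i++; j--;
        }
    }
    quick_sort(v, lo, j);
    quick_sort(v, i, hi);
}

/* Returns 1 if v[0..n-1] is in non-decreasing order, 0 otherwise. */
static int is_sorted(const elem_t *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (v[i - 1] > v[i]) return 0;
    return 1;
}

/* Fills n pseudo-random elements, sorts them on one thread and returns the
   CPU time of the sort in seconds, or -1.0 if allocation or the sort failed. */
double bench_quick_sort(size_t n, unsigned seed)
{
    elem_t *v = malloc(n * sizeof *v);
    if (v == NULL) return -1.0;
    srand(seed);
    for (size_t i = 0; i < n; i++)
        v[i] = (elem_t)(rand() % 1000000);
    clock_t t0 = clock();
    if (n > 0) quick_sort(v, 0, (long)n - 1);
    clock_t t1 = clock();
    int ok = is_sorted(v, n);
    free(v);
    return ok ? (double)(t1 - t0) / CLOCKS_PER_SEC : -1.0;
}
```

Calling bench_quick_sort with the same size and seed on both platforms, once built with -DUSE_FLOAT and once without, gives a float run and an integer run to compare. Only the sort itself is timed, so input generation does not affect the measurement.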
Lessons Learned
- Don't use GCC for compiling code on SPARC; use Developer Studio instead.
- Results differ a lot between the 2 RUNs. So changing small parameters and
doing some tuning can make a big difference.
- One of the major differences between the 2 RUNs is that the code in RUN-1
uses FLOAT definitions, while the code in RUN-2 uses INTEGER definitions
instead. The SPARC CPU performs better with integer operations, as Oracle had
told us before. Please see the document "When and How to use SPARC CPU".
- Oracle had also told us before that a SPARC CPU can be up to 30% slower than
an Intel CPU in single-thread performance (due to pipeline design); our
results confirmed this. We ran this benchmark because we experienced a
single-thread application issue after a platform change, and we wanted to
separate the CPU effect from the application effect in the performance
results.
- It is already known, but confirmed again, that CPU clock speed is not the
essential parameter for performance; pipelines are more important. Oracle
explained that Intel was 2 times faster in RUN-1 because RUN-1 included
floating point, and Intel's floating-point pipeline is twice that of Oracle's
FP pipeline.
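The integer versus floating-point point in the lessons above can be checked in isolation with a small sketch like the one below. It is not part of the attached benchmark; the function names time_int_ops and time_float_ops are hypothetical. The volatile accumulators are an assumption made to keep the compiler from removing the loops, and they add a load and a store per iteration to both variants equally, so treat the numbers as a rough comparison only.

```c
#include <time.h>

/* Adds (i & 7) for i in [0, iters) using integer arithmetic.
   Stores the sum in *result and returns the CPU time in seconds. */
double time_int_ops(long iters, long *result)
{
    volatile long acc = 0;   /* volatile: keeps the loop from being optimized away */
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++)
        acc = acc + (i & 7);
    clock_t t1 = clock();
    *result = acc;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Same loop, but the accumulation is done in single-precision floating point. */
double time_float_ops(long iters, float *result)
{
    volatile float acc = 0.0f;
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++)
        acc = acc + (float)(i & 7);
    clock_t t1 = clock();
    *result = acc;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Running both functions with the same iteration count on each platform and comparing the float-to-integer time ratio shows how much of a gap comes from the arithmetic type alone, independent of the sorting code.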
Please feel free to contact me at bulent.yucesoy@gmail.com