VAST-F Parallel
- Full Loop Nest Analysis. Loops are analyzed in both simple and complex
loop nests, and the loops containing the most work are parallelized.
Loops do not have to be tightly nested.
- Extended Parallel Regions. VAST/Parallel extends parallel regions
to include multiple parallel loops and intervening scalar code. This
cuts down on parallel overhead.
- Threshold testing. All parallel systems have some overhead. When
VAST/Parallel finds a parallel region, if the amount of work in the
region is not clear at compile time, then VAST/Parallel creates a
run-time test. Through this run-time test, the parallel region will
only be executed if there is enough work; otherwise, the original
serial version is executed.
- Dependence Analysis. VAST/Parallel has very sophisticated data dependency
analysis capabilities that allow it to optimize complicated situations.
All loop nests are examined to see if they can be executed in parallel
safely. VAST/Parallel can resolve ambiguous subscripting by examining
variable assignments outside of loops, and restructure the use of
variables to avoid certain other dependencies.
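As an illustration of the threshold testing described above, the following is a minimal hand-written sketch in C of the kind of dispatch such a run-time test implies. It is not VAST/Parallel output. The function names, the saxpy-style loop, and the WORK_THRESHOLD value are all assumptions chosen for the example; a real cutoff depends on the machine and on the work per iteration.

```c
#include <stddef.h>

/* Hypothetical cutoff: with fewer iterations than this, the cost of
   starting a parallel region is assumed to outweigh the benefit. */
#define WORK_THRESHOLD 10000

/* Original serial loop: y = y + a * x */
static void saxpy_serial(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Parallel version of the same loop. The pragma is standard OpenMP;
   a compiler built without OpenMP support ignores it and the loop
   simply runs serially. */
static void saxpy_parallel(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        y[i] += a * x[i];
}

/* Run-time threshold test: take the parallel path only when there is
   enough work. Returns 1 if the parallel path was taken, 0 if the
   serial path was taken (the return value exists only to make the
   choice observable in this sketch). */
int saxpy(size_t n, float a, const float *x, float *y)
{
    if (n >= WORK_THRESHOLD) {
        saxpy_parallel(n, a, x, y);
        return 1;
    }
    saxpy_serial(n, a, x, y);
    return 0;
}
```

Both paths compute the same result; only the trip count, which is unknown at compile time, decides which one runs. OpenMP can express the same idea directly with the if clause on a parallel construct.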
VAST-F Vector (AltiVec)
- Optimization of entire loop nests, not just inner loops. Critical
optimizations include loop fusion (squeezing multiple loops into one
loop), outer loop unrolling (unrolling an outer loop inside an inner
loop), loop collapse (making one long loop from a multiple dimension
loop), and loop interchange (changing the order of the loops in a
loop nest to get more efficient memory access).
- Unrolled vector loops. Unrolling vectorized loops is very important
in making sure that the vector instructions are overlapped to the
maximum extent possible.
- Vectorization of reduction loops. Includes array summations, dot
products, minimum and maximum element of an array, product of array
elements, etc. These operations take a large fraction of the CPU time
for many programs.
- Vectorization of conditional loops. "if" statements and
conditional operators are vectorized.
- Non-aligned vectors can be vectorized efficiently. VAST introduces
"permute" operations to align vectors "on the fly"
prior to computation.
- 32-bit float and 8-, 16-, and 32-bit integer vectorization. Integers
can be signed or unsigned. VAST can also vectorize loops that contain
mixed data sizes.
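Two of the items above, reduction loops and conditional loops, can be pictured with a scalar emulation in plain C. This is a sketch, not AltiVec code and not VAST output. It assumes four 32-bit floats per 128-bit vector register, and the names sum_lanes and clamp_negative are hypothetical. The reduction keeps one partial sum per vector lane and combines them after the loop; the conditional loop replaces the branch with a mask and a select.

```c
#include <stddef.h>

/* Assumed lane count: four 32-bit floats in one 128-bit vector. */
#define LANES 4

/* Sum reduction. Each lane accumulates its own partial sum, so the
   inner additions are independent of one another. Note that this
   reorders the floating-point additions, so rounding can differ
   slightly from a strict left-to-right serial sum. */
float sum_lanes(const float *a, size_t n)
{
    float acc[LANES] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Main loop: one "vector" of LANES elements per iteration. */
    for (; i + LANES <= n; i += LANES)
        for (int l = 0; l < LANES; l++)
            acc[l] += a[i + l];

    /* Combine the partial sums once, after the loop. */
    float total = (acc[0] + acc[1]) + (acc[2] + acc[3]);

    /* Remainder loop for the elements that do not fill a vector. */
    for (; i < n; i++)
        total += a[i];

    return total;
}

/* Conditional loop. The source form would be
       if (a[i] < 0) out[i] = 0; else out[i] = a[i];
   Here the comparison produces a mask and the result is chosen by a
   select, which is the branch-free shape a vector unit can execute. */
void clamp_negative(const float *a, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int mask = a[i] < 0.0f;          /* 1 where the condition holds */
        out[i] = mask ? 0.0f : a[i];     /* select between both values  */
    }
}
```

On real vector hardware the inner lane loop in sum_lanes would be a single vector add, and the select in clamp_negative would operate on all lanes at once.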
Performance
Performance Gains on a Single CPU System:
VAST/Parallel's superscalar optimization technology can enhance the performance of certain types of code on standard, single CPU systems. If your programs spend large amounts of time in nested loops or operating on large arrays, a performance improvement of over 35% may be possible. On other types of code, VAST/Parallel may have little impact.
Performance Gains on a Dual CPU System:
VAST/Parallel can automatically parallelize your code and also provides full OpenMP support to enable user-directed parallelization. VAST/Parallel contains sophisticated data dependency analysis technology to detect when optimized execution will be safe, has very advanced in-lining capabilities, and uses interprocedural analysis to optimize across procedure boundaries.