Forum topic

How to speed up this code?

Hello everyone,

Many thanks to everyone who contributed to my previous question.

Authored by Alexander L. Last updated on 01/19/2017 - 02:12

Software Occlusion Culling

This article details an algorithm for software occlusion culling, along with associated sample code that is available for download. The technique divides scene objects into occluders and occludees, software-rasterizes the occluders into a depth buffer, and culls occludees based on a depth comparison against that buffer. The sample code also uses frustum culling and is optimized with Streaming SIMD Extensions (SSE)...
Authored by Kiefer Kuah (Intel) Last updated on 01/17/2017 - 11:59
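The occluder/occludee depth comparison described above can be illustrated with a minimal sketch. This is not the article's sample code. It assumes, as a hypothetical simplification, that every occluder and occludee has already been projected to a screen-space rectangle at one constant depth. It also uses plain Python loops instead of SSE and omits frustum culling. The names `ScreenRect`, `make_depth_buffer`, `rasterize_occluder`, and `is_occluded` are illustrative only.

```python
from dataclasses import dataclass
from typing import List

DepthBuffer = List[List[float]]


@dataclass
class ScreenRect:
    """Hypothetical simplification: a screen-space rectangle covering
    pixels [x0, x1) x [y0, y1) at one constant depth (smaller = closer)."""
    x0: int
    y0: int
    x1: int
    y1: int
    depth: float


def make_depth_buffer(width: int, height: int) -> DepthBuffer:
    # Clear every pixel to "infinitely far" so empty pixels occlude nothing.
    return [[float("inf")] * width for _ in range(height)]


def _covered_pixels(buf: DepthBuffer, r: ScreenRect):
    # Yield (row, x) for each pixel of r that lies inside the buffer.
    for y in range(max(r.y0, 0), min(r.y1, len(buf))):
        row = buf[y]
        for x in range(max(r.x0, 0), min(r.x1, len(row))):
            yield row, x


def rasterize_occluder(buf: DepthBuffer, occ: ScreenRect) -> None:
    # "Software rasterization" of the occluder: keep the nearest depth per pixel.
    for row, x in _covered_pixels(buf, occ):
        if occ.depth < row[x]:
            row[x] = occ.depth


def is_occluded(buf: DepthBuffer, occludee: ScreenRect) -> bool:
    # Conservative test: cull only if the occludee's nearest depth lies
    # behind the stored occluder depth at every pixel it covers.
    # An occludee entirely off-screen covers no pixels and is reported as
    # occluded; frustum culling would normally reject it before this point.
    return all(occludee.depth > row[x] for row, x in _covered_pixels(buf, occludee))
```

A typical use rasterizes all occluders first and then tests each occludee against the finished buffer. Real occluders would be rasterized triangle by triangle, and the per-pixel loops are the part the article's sample code reports optimizing with SSE.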

Getting Started with Intel® Software Optimization for Theano* and Intel® Distribution for Python*

Theano* is a Python* library developed at the LISA lab to define, optimize, and evaluate mathematical expressions, including those involving multi-dimensional arrays. Theano can be installed and used with several combinations of development tools and libraries on a variety of platforms. This tutorial provides one such recipe describing steps to build and install Intel-optimized Theano with Intel®...
Authored by Sunny G. (Intel) Last updated on 01/17/2017 - 09:57

Fast Computation of Huffman Codes

Huffman code generation is used in many applications, among them the DEFLATE compression algorithm. The classical way to compute these codes uses a heap data structure. This approach is fairly efficient, but traditional software implementations contain many data-dependent branches that are hard for general-purpose CPU hardware to predict. On modern processors with deep...
Authored by James Guilford (Intel) Last updated on 01/17/2017 - 09:53
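For reference, the classical heap-based construction that the summary contrasts against might look like the sketch below. It is not the article's method. It computes code lengths with Python's `heapq` module and then assigns canonical codes in the style DEFLATE uses: shorter codes first, and codes of equal length in symbol order. The function names `huffman_code_lengths` and `canonical_codes` are hypothetical.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length} using the classical heap-based method."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs a 1-bit code.
        return {next(iter(freqs)): 1}
    tiebreak = count()  # avoids comparing symbol lists when weights are equal
    heap = [(w, next(tiebreak), [s]) for s, w in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        # Pop the two lightest subtrees; these comparisons depend on the data.
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), syms1 + syms2))
    return lengths


def canonical_codes(lengths):
    """Assign canonical codes: ordered by (length, symbol), as in DEFLATE."""
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len  # extend the code when the length grows
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes
```

For frequencies `{'a': 5, 'b': 2, 'c': 1, 'd': 1}`, this yields lengths 1, 2, 3, 3 and the codes `0`, `10`, `110`, `111`. Every `heappop`/`heappush` makes comparisons whose outcomes depend on the input frequencies, which is the kind of hard-to-predict branching the summary describes.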
Forum topic

Mitigating permute costs in AVX 256?

Hello, I'm investigating the conversion of a number of compute kernels from AVX 128 to AVX 256 and would appreciate any available guidance on getting a small number of operations on port...

Authored by Todd West Last updated on 01/15/2017 - 09:21
Forum topic

_mm_prefetch usage

Authored by Ioan H. Last updated on 01/15/2017 - 06:01
Forum topic

Is xend treated as a full memory barrier?

I've started learning the RTM (Restricted Transactional Memory) extensions. The most common examples I can find online use them to implement a mutex or concurrent lock. Often they are similar to:

Authored by william laeder Last updated on 01/13/2017 - 08:05
Forum topic

Code scales poorly with AVX

This code scales poorly with AVX on my Sandy Bridge. How can I make it more vectorizer-friendly?

Authored by CommanderLake Last updated on 01/11/2017 - 18:32

Exploring MPI for Python* on Intel® Xeon Phi™ Processor

Learn how to write an MPI program in Python*, and take advantage of Intel® multicore architectures using OpenMP threads and Intel® AVX-512 instructions.
Authored by Nguyen, Loc Q Last updated on 01/09/2017 - 11:21
Forum topic

Parallelization + Vectorization using OpenMP on Sandy Bridge


I would like to ask a question about parallelization + vectorization:

Authored by Claudia W. Last updated on 01/09/2017 - 00:05
For more complete information about compiler optimizations, see our Optimization Notice.