CPI rate blows up


Created by Alexander L. Last updated on 20/01/2017 - 16:09
How to speed up this code?

Created by Alexander L. Last updated on 19/01/2017 - 02:12

Software Occlusion Culling

This article details an algorithm for software occlusion culling, with associated sample code available for download. The technique divides scene objects into occluders and occludees and culls occludees based on a depth comparison with the occluders that are software rasterized to the depth buffer. The sample code uses frustum culling and is optimized with Streaming SIMD Extensions (SSE)...
Created by Kiefer Kuah (Intel). Last updated on 17/01/2017 - 11:59
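
The depth comparison summarized above can be illustrated with a minimal sketch. It assumes axis-aligned screen-space rectangles in place of rasterized triangles and plain Python in place of SSE; the names `make_depth_buffer`, `rasterize_occluder`, and `is_occluded` are hypothetical and are not taken from the article's sample code.

```python
FAR = float("inf")  # depth of an empty pixel; smaller depth means closer


def make_depth_buffer(width, height):
    """Create a height x width depth buffer with every pixel at FAR."""
    return [[FAR] * width for _ in range(height)]


def rasterize_occluder(buf, rect, depth):
    """Write an occluder into the buffer, keeping the nearest depth per pixel.

    rect is (x0, y0, x1, y1) with x1 and y1 exclusive (an assumption of
    this sketch; a real rasterizer would handle triangles).
    """
    x0, y0, x1, y1 = rect
    for y in range(y0, y1):
        row = buf[y]
        for x in range(x0, x1):
            if depth < row[x]:
                row[x] = depth


def is_occluded(buf, rect, nearest_depth):
    """Return True if no pixel of rect could show the occludee.

    The occludee is visible as soon as its nearest depth is closer than
    the occluder depth stored at any pixel it covers.
    """
    x0, y0, x1, y1 = rect
    for y in range(y0, y1):
        for x in range(x0, x1):
            if nearest_depth < buf[y][x]:
                return False
    return True
```

In this sketch an occludee is culled only when every pixel it covers already holds an occluder depth that is at least as close, so an object that pokes out past the occluder on even one pixel is kept.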

Getting Started with Intel® Software Optimization for Theano* and Intel® Distribution for Python*

Theano* is a Python* library developed at the LISA lab to define, optimize, and evaluate mathematical expressions, including those involving multi-dimensional arrays. Theano can be installed and used with several combinations of development tools and libraries on a variety of platforms. This tutorial provides one such recipe describing steps to build and install Intel-optimized Theano with Intel®...
Created by Sunny G. (Intel). Last updated on 17/01/2017 - 09:57

Fast Computation of Huffman Codes

The generation of Huffman codes is used in many applications, among them the DEFLATE compression algorithm. The classical way to compute these codes uses a heap data structure. This approach is fairly efficient, but traditional software implementations contain lots of branches that are data-dependent and thus hard for general-purpose CPU hardware to predict. On modern processors with deep...
Created by James Guilford (Intel). Last updated on 17/01/2017 - 09:53
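
For reference, the classical heap-based approach mentioned in the summary above can be sketched as follows. This is a minimal illustration of the textbook method using Python's `heapq`, not the article's optimized implementation; the function name `huffman_code_lengths` is hypothetical, and it returns only code lengths rather than the bit patterns.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length} using the classical heap-based method."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be representable.
        return {next(iter(freqs)): 1}

    tie = count()  # tie-breaker so the heap never compares symbol lists
    heap = [(weight, next(tie), [sym]) for sym, weight in freqs.items()]
    heapq.heapify(heap)

    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol inside them
        # moves one level deeper in the tree.
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        merged = syms1 + syms2
        for sym in merged:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return lengths
```

The loop that repeatedly pops the two smallest weights is where the data-dependent branches mentioned in the summary arise, since the heap comparisons depend on the symbol frequencies.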
mitigating permute costs in AVX 256?

Hello, I'm investigating conversion of a number of compute kernels from AVX 128 to AVX 256 and would appreciate any guidance which might be available on getting a small number of operations on port

Created by Todd West. Last updated on 15/01/2017 - 09:21
_mm_prefetch usage



Created by Ioan H. Last updated on 15/01/2017 - 06:01
Is xend treated as a full memory barrier?

I've started attempting to learn RTM extensions. The most common examples I can find online are using them to implement a mutex or concurrent lock. Often they are similar to:

Created by william laeder. Last updated on 13/01/2017 - 08:05
Code scales poorly with AVX

This code scales poorly with AVX on my Sandy Bridge; how can I make it more vectorizer-friendly:

Created by CommanderLake. Last updated on 11/01/2017 - 18:32

Exploring MPI for Python* on Intel® Xeon Phi™ Processor

Learn how to write an MPI program in Python* and take advantage of Intel® multicore architectures using OpenMP threads and Intel® AVX-512 instructions.
Created by Nguyen, Loc Q. Last updated on 09/01/2017 - 11:21
