Open Source

Using L1/L2 cache as a scratchpad memory

Dear all,

Explicit cache control is one of the important features of Xeon Phi (MIC). How could I use the L1 or L2 cache as scratchpad memory, and also share that data between the cores?

In addition, is there any way to hack the MESI state of a cache line in the distributed tag directory (DTD)?

Thanks in advance.


DPDK Community Meetup

Silicon Valley DPDK Meetup

This is a group for engineers who enjoy developing high-performance networking applications. It's all about plumbing... but for fat pipes!

This is a casual setting to collaborate, discuss, and learn more about DPDK.
Let's meetup and have fun with the Silicon Valley DPDK community, every 2nd Thursday of the month at 6:00 pm.

See you there!

Performance comparison between Intel TBB task_list, openMP task and parallel for

I am planning to parallelize a hotspot in a project, and I would like your opinion on the relative performance of parallel for, omp single followed by task, and Intel TBB task_list. I want to compare them in two cases: under ideal conditions, where the number of threads equals the number of computation items, and when the number of computation items is much greater than the number of available threads, so that scheduling overhead becomes visible (to find the most efficient scheduler). I will also write some sample test programs to evaluate this myself, but I wanted to know whether anybody has already made these evaluations.

Thanks in advance.

Computing Delacorte Numbers with Julia

I came in 2nd in the Al Zimmermann Programming Contest "Delacorte Numbers", using a quad-core machine and the Julia programming language.  The attached PDF file is a personal report on using Julia for the contest and a detailed discussion of the program.  If you have not used Julia before, you may find it to be a useful introduction to the language.

  • Professors
  • Students
  • Julia
  • Al Zimmermann Programming Contest
  • Delacorte Numbers
  • Academic
  • Open Source

Further information about different barrier algorithms


I'm researching barrier algorithms that use SIMD instructions, and I'm trying to understand in depth the different versions included in the RTL.

I've noticed that a new barrier algorithm (hierarchical) has appeared since the last time I looked.

Where could I find a further description of them? Could someone from Intel provide me with further information?

Thank you in advance.

Kind regards.

Analyzing Intel® SDE's TSX-related log data for capacity aborts

Starting with version 7.12.0, Intel® SDE provides Intel® TSX-related instruction and memory access logging features, which can be useful for debugging Intel® TSX capacity aborts. With the log data from Intel SDE you can analyze cache set population to determine whether non-uniform cache set usage is causing capacity overflows. Refined log data can then be used to further diagnose the source of the aborts.

  • Developers
  • Partners
  • Students
  • Apple OS X*
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8
  • Unix*
  • Server
  • Python*
  • Advanced
  • Intel® Software Development Emulator
  • Intel® Transactional Synchronization Extensions
  • Intel Transactional Synchronization Extensions (Intel TSX)
  • Intel SDE
  • Restricted Transactional Memory (RTM)
  • Debugging
  • Development Tools
  • Intel® Core™ Processors
  • Open Source
  • Optimization
  • Parallel Computing
  • Threading