Thanks to Wen-Mei for a delightful chat. I found your site, http://courses.ece.illinois.edu/ece498/al/, with the curriculum for your Programming Massively Parallel Processors course. This curriculum helps students acquire practical experience of the kind typically learned by toiling for hours or days in the trenches with little sleep and less coffee. I plan to look over your mathematically prodigious mini-case studies to find ones more accessible to undergraduate students, and, not surprisingly, I seek your help with this effort.
I told myself that all my posts here would be crosstaggable to both academic and multi-core, but Henry Neeman has got me all riled up.
Half this blog will NOT be about multi-core, but half will; it is thus a semi-multi-core post.
Henry is Director of the OU Supercomputing Center for Education & Research (OSCER) at the University of Oklahoma; Henry has a highly regarded, effective multi-format lecture series called Supercomputing in Plain English; and Henry is usually always right.
by Rajiv Kapoor
Several instructions are available on the Intel® Pentium® 4 Processor for moving integer data between SIMD registers. However, it may be more beneficial to replace the straightforward register-to-register moves with other instructions to reduce the number of cycles the code takes to execute. Together, the organization of the code and the execution units required by the instructions determine the benefit of these replacement instructions.
This article briefly covers the background and use of the "flush-to-zero" (FTZ) or abrupt-underflow settings for Streaming SIMD Extensions (SSE/SSE2) instructions on IA-32 and floating-point instructions on Itanium®-based architecture (64-bit). An example is presented that shows how the mode may be set at run time and the effects on floating-point results.
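As a rough illustration of setting the mode at run time, here is a minimal sketch in C. It assumes an x86 compiler that provides `<xmmintrin.h>` and covers only the SSE case, not Itanium. The helper names `sse_mul` and `half_of_min_normal` are hypothetical and do not come from the article. Halving the smallest normalized float gives a denormal result, so the product should come back as zero with FTZ on and as a tiny nonzero value with FTZ off.

```c
#include <float.h>      /* FLT_MIN: smallest normalized float */
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE, _MM_GET_FLUSH_ZERO_MODE */

/* Hypothetical helper: multiply two floats with an SSE scalar multiply,
   so the result is governed by the flush-to-zero bit in MXCSR. */
static float sse_mul(float a, float b)
{
    volatile float va = a, vb = b;  /* discourage compile-time folding */
    return _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(va), _mm_set_ss(vb)));
}

/* Hypothetical helper: compute FLT_MIN * 0.5f with FTZ switched on or off,
   then restore whatever mode was active before the call. */
static float half_of_min_normal(int ftz_on)
{
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();

    _MM_SET_FLUSH_ZERO_MODE(ftz_on ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);

    /* volatile keeps the multiply between the two mode changes */
    volatile float result = sse_mul(FLT_MIN, 0.5f);

    _MM_SET_FLUSH_ZERO_MODE(saved);
    return result;
}
```

Calling `half_of_min_normal(1)` and `half_of_min_normal(0)` from a small `main` and printing both values with `%g` is one way to compare the two modes. The mode is saved and restored around the computation because MXCSR is per-thread state that other code in the same thread may rely on.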
This document details the difference between how assists are handled with x87 and Single Instruction Multiple Data (SIMD) instructions, and gives information on how to change their behavior when using Streaming SIMD Extensions (SSE) and SSE2.
Last week I promised to talk more about what our team in PRC is doing. So in this post I have a little bit of a challenge for you XML pioneers out there.
If you could define new CPU instructions to improve XML validation, what would they be?
Well, Yongnian Le from the XET team has ideas to share. Does “Parallel TRIE with Intel® SSE4.2/STTNI” sound more interesting than a morning coffee? For me, yes, but please don’t tell my wife; she already has enough “Ken’s so nerdy” ammo.