Intel® Pentium® Processors

How to Vectorize Assembly Code by Hand on 32-Bit Intel® Architecture


Challenge

Vectorize code by hand-coding in assembly. Programming directly in assembly language for a target platform may produce the required performance gain, but assembly code is not portable between processor architectures and is expensive to write and maintain.

Consider the following simple loop:

void add(float *a, float *b, float *c)
{
    int i;

    for (i = 0; i < 4; i++) {
        c[i] = a[i] + b[i];
    }
}
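As a rough illustration of what a vectorized version of this loop computes, here is a minimal sketch using SSE compiler intrinsics instead of hand-written assembly. This is a swapped-in technique, not the article's own listing: the name `add_sse` is hypothetical, and the sketch assumes an x86 target with SSE enabled and makes no assumption about the alignment of the arrays.

```c
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE enabled */

/* Hypothetical name. Adds four floats at once, mirroring the scalar add() loop.
   Unaligned loads and stores are used because the original signature
   gives no alignment guarantee. */
void add_sse(const float *a, const float *b, float *c)
{
    __m128 va = _mm_loadu_ps(a);     /* load a[0..3] */
    __m128 vb = _mm_loadu_ps(b);     /* load b[0..3] */
    __m128 vc = _mm_add_ps(va, vb);  /* four additions in one packed add */
    _mm_storeu_ps(c, vc);            /* store the sums to c[0..3] */
}
```

Each intrinsic corresponds closely to one packed SSE instruction, so the sketch shows the same data flow a hand-coded assembly version would follow while leaving register allocation to the compiler.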

 

  • How to Eliminate Unpredictable Conditional Branches on 32-Bit Intel® Architecture


    Challenge

    Eliminate unpredictable conditional branches in code. Eliminating these branches improves performance because it does the following:

    • reduces the possibility of mispredictions
    • reduces the number of required branch target buffer (BTB) entries; conditional branches that are never taken do not consume BTB resources
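To make the idea concrete, the following is a minimal sketch of one common way to remove a conditional branch: turning the comparison into a bit mask and selecting between two constants with logical operations. It is an illustration only, not the article's own example; the function names and the constants `c1` and `c2` are hypothetical.

```c
#include <stdint.h>

/* Branching form: the compiler may emit a conditional jump here. */
int32_t select_branch(int32_t a, int32_t b, int32_t c1, int32_t c2)
{
    if (a < b)
        return c1;
    return c2;
}

/* Branch-free form: the comparison yields 0 or 1, and negating it
   gives a mask of all zeros or all ones. */
int32_t select_branchless(int32_t a, int32_t b, int32_t c1, int32_t c2)
{
    int32_t mask = -(int32_t)(a < b);   /* -1 if a < b, otherwise 0 */
    return (c1 & mask) | (c2 & ~mask);  /* picks c1 or c2 without a jump */
}
```

Both functions return the same value for every input. Whether the branch-free form is actually faster depends on how predictable the condition is and on what the compiler generates, so it is worth measuring before adopting it.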

     

    Consider a line of C code that has a condition dependent upon one of the constants:

  • How to Determine the IA-32 Hardware Backward Compatibility of an Application


    Challenge

    Determine the level of IA-32 processor-architecture compatibility an application provides. Many applications today must support hardware for at least five years. This is forever in terms of hardware technology, when you consider that five years ago the most common business computer was based on the Intel® Pentium® processor. The first release of the Pentium processor preceded MMX™ technology. Today's processors are on their third generation of specialized multimedia instructions. 
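A related, hedged sketch: an application that must run on older hardware can check at run time which of the instruction-set generations mentioned above the processor reports, and choose a code path accordingly. The helper name `simd_level` is hypothetical, and the sketch assumes a GCC or Clang toolchain that provides `<cpuid.h>`. It reads the CPUID leaf 1 feature flags in EDX (bit 23 for MMX, bit 25 for SSE, bit 26 for SSE2).

```c
#include <cpuid.h>  /* GCC/Clang helper header; an assumption about the toolchain */

/* Hypothetical helper: returns 0 (no MMX), 1 (MMX), 2 (SSE) or 3 (SSE2),
   based on the feature flags the processor reports in CPUID leaf 1. */
int simd_level(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* CPUID leaf 1 not available */

    if (edx & (1u << 26)) return 3;    /* SSE2 */
    if (edx & (1u << 25)) return 2;    /* SSE  */
    if (edx & (1u << 23)) return 1;    /* MMX  */
    return 0;
}
```

This only reports what the processor supports. It does not by itself show which instructions an existing binary uses, which is a separate question.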

  • Fast SIMD Integer Move for the Intel® Pentium® 4 Processor

    Introduction

    Several instructions are available on the Intel® Pentium® 4 processor for moving integer data between SIMD registers. However, replacing a straightforward register-to-register move with another instruction can reduce the number of cycles the code takes to execute. The benefit of a replacement depends on how the code is organized and on which execution units the instructions require.
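The introduction above does not name the replacement instructions. As an assumption made purely for illustration, the sketch below copies a SIMD integer register with an identity shuffle rather than a plain move. The helper name `copy_via_shuffle` is hypothetical, and the code uses SSE2 intrinsics rather than assembly.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: copies src using an identity shuffle.
   The control byte 0xE4 is 11 10 01 00 in binary, which keeps
   doublewords 3, 2, 1 and 0 in their original positions. */
__m128i copy_via_shuffle(__m128i src)
{
    return _mm_shuffle_epi32(src, 0xE4);
}
```

A compiler is free to turn this back into an ordinary move, so precise control over instruction selection requires writing the assembly by hand. The sketch only demonstrates that the shuffle produces an exact copy.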

  • x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ)

    Introduction

    This document details how assists are handled differently by x87 and Single Instruction Multiple Data (SIMD) instructions, and explains how to change that behavior when using Streaming SIMD Extensions (SSE) and SSE2.
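As a minimal sketch of how the SSE behavior can be changed, the code below sets the FTZ and DAZ bits in the MXCSR control register using the `_mm_getcsr` and `_mm_setcsr` intrinsics. The macro and function names are hypothetical. The sketch assumes a processor that supports DAZ, and it affects only SSE and SSE2 arithmetic on the calling thread, not x87 code.

```c
#include <xmmintrin.h>  /* _mm_getcsr and _mm_setcsr */

#define MXCSR_FTZ 0x8000u  /* bit 15: flush denormal results to zero */
#define MXCSR_DAZ 0x0040u  /* bit 6: treat denormal inputs as zero   */

/* Hypothetical helper: turns on both modes, leaving the other MXCSR
   bits (rounding mode, exception masks) unchanged. */
void enable_ftz_daz(void)
{
    _mm_setcsr(_mm_getcsr() | MXCSR_FTZ | MXCSR_DAZ);
}
```

Enabling these modes trades strict IEEE 754 handling of denormal values for speed, so it suits code that can tolerate very small values being treated as zero.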

  • Desktop Performance and Optimization for Intel® Pentium® 4 Processor

    Introduction

    This paper describes the performance philosophy of the Intel® Pentium® 4 processor. It also describes software optimization techniques and tools to achieve leading-edge performance on current and future generations of the IA-32 high-performance processors. The information on performance results, tools and techniques for software optimization will enable managers, architects and engineers to deliver industry-leading software performance.

    Read entire article (PDF)
