Intel® Pentium® Processors

Optimized Matrix Library for use with the Intel® Pentium® 4 Processor's SSE2 Instructions


In January 2000, Intel published an optimized matrix library (4D single-precision matrix and vector classes) for use with the Pentium® III processor's Streaming SIMD (Single Instruction, Multiple Data) Extensions, or SSE, in an article in

  • Artificial Neural Nets and Hyper-Threading Technology

    by Chuck Desylva


    Different methods for optimizing AI algorithms to take advantage of an Intel® Pentium® 4 Processor with Hyper-Threading Technology

    The purpose of this paper is to highlight several key artificial intelligence (AI) software technologies and some simple changes that can be made to them to gain performance improvements on the Pentium® 4 and Intel® Xeon® processors.

  • How to Vectorize Code Using Intrinsics on 32-Bit Intel® Architecture


    Vectorize code by means of intrinsics. Intrinsics give access to ISA functionality through C/C++-style code instead of assembly language. Consider the following simple loop:

    void add(float *a, float *b, float *c)
    {
        int i;
        /* Add four floats element by element. */
        for (i = 0; i < 4; i++) {
            c[i] = a[i] + b[i];
        }
    }
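
As a rough illustration of what the loop above could look like when written with intrinsics, here is a minimal sketch using SSE intrinsics from `<xmmintrin.h>`. The function name `add_sse` is hypothetical and not taken from the article. The unaligned load and store forms are used because nothing in the original loop guarantees 16-byte alignment of the pointers.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical name; adds four floats from a and b and writes the sums to c. */
void add_sse(const float *a, const float *b, float *c)
{
    /* Load four packed floats from each input (no alignment assumed). */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);

    /* One packed add replaces the four scalar additions of the loop. */
    __m128 vc = _mm_add_ps(va, vb);

    /* Store the four results (again, no alignment assumed). */
    _mm_storeu_ps(c, vc);
}
```

If the caller can guarantee that all three pointers are 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` could be substituted for the unaligned variants.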


  • Reducing the Impact of Misaligned Memory Accesses


    Misaligned memory access is a problem commonly encountered when optimizing code with Streaming SIMD Extensions 2 (SSE2). An SSE2 algorithm often needs to load and store data 16 bytes at a time to match the size of the XMM registers. If alignment cannot be guaranteed, part of the performance gained by processing multiple data elements in parallel is lost, because the compiler or the assembly programmer must fall back on unaligned move instructions.
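
One common way to limit this cost, sketched here as an assumption and not as the method the article itself describes, is to process a few elements in scalar code until the destination pointer reaches a 16-byte boundary, and then use aligned SSE2 loads and stores for the bulk of the data. The helper name `add_i32` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] += src[i] for n 32-bit integers. */
void add_i32(int32_t *dst, const int32_t *src, size_t n)
{
    size_t i = 0;

    /* Scalar prologue: advance until dst + i sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(dst + i) & 15u) != 0) {
        dst[i] += src[i];
        i++;
    }

    /* Main loop: dst + i is now aligned, so the aligned load and store
       are safe for it. src may still be misaligned relative to dst, so
       it is read with the unaligned load. */
    for (; i + 4 <= n; i += 4) {
        __m128i d = _mm_load_si128((const __m128i *)(dst + i));
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_store_si128((__m128i *)(dst + i), _mm_add_epi32(d, s));
    }

    /* Scalar epilogue: fewer than four elements remain. */
    for (; i < n; i++) {
        dst[i] += src[i];
    }
}
```

When the buffers are under the program's control, allocating them on a 16-byte boundary in the first place (for example with C11 `aligned_alloc`) avoids the need for the prologue and the unaligned load.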

