Microsoft Windows* 8.x

Introducing Batch GEMM Operations

The general matrix-matrix multiplication (GEMM) is a fundamental operation in most scientific, engineering, and data applications. There is an everlasting desire to make this operation run faster. Optimized numerical libraries like Intel® Math Kernel Library (Intel® MKL) typically offer parallel high-performing GEMM implementations to leverage the concurrent threads supported by modern multi-core architectures. This strategy works well when multiplying large matrices because all cores are used efficiently.
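
For readers less familiar with the operation, here is a minimal sketch of what GEMM computes, namely C = alpha * A * B + beta * C. It assumes row-major, densely packed double-precision matrices. The function name `gemm_ref` is hypothetical and is not part of the Intel MKL API; an optimized library would block, vectorize, and thread these loops rather than run them as written.

```c
#include <stddef.h>

/* Reference (unoptimized) GEMM: C = alpha * A * B + beta * C.
 * A is m x k, B is k x n, C is m x n, all row-major and densely packed.
 * gemm_ref is a hypothetical name for illustration, not an Intel MKL routine. */
void gemm_ref(size_t m, size_t n, size_t k,
              double alpha, const double *A, const double *B,
              double beta, double *C)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Dot product of row i of A with column j of B. */
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            /* Scale the product and blend it with the existing C entry. */
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

The three nested loops perform m * n * k multiply-adds. For large matrices that is enough work to spread across many cores, which is the situation the paragraph above describes.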

  • Developers
  • Partners
  • Professors
  • Students
  • Apple OS X*
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 10
  • Microsoft Windows* 8.x
  • Unix*
  • Windows*
  • C/C++
  • Fortran
  • Advanced
  • Beginner
  • Intermediate
  • Intel® Math Kernel Library
  • Intel Math Kernel Library (Intel MKL)
  • Development Tools
  • Optimization
  • Parallel Computing

    max and min reduction and other omp 4 changes in 15.0 updates

    I've been looking for documentation of what has changed with this vector reduction.  The one note I find in ifort release notes is about the somewhat curious addition of these reduction clauses to the legacy !dir$ simd directive in 15.0.1.

    In the basic case, the reductions, using f77 code and directive, are equivalent to f90 maxval and minval, so the latter seem preferable, and I almost wonder why so much fuss has been made about directives which 15.0.3 seems to show aren't needed for the single thread case.

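    Advanced Computer Concepts for the (Not So) Common Chef: Introduction

    A while ago, while talking with a very smart colleague who is not a professional engineer, I found I needed to explain a little about threading and other components of the Intel® Xeon Phi™ x100 and x200 architectures. First up is hyper-threading and, more specifically, the coprocessor's version of hyper-threading. After much thought, I finally settled on a fitting analogy: a communal kitchen.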

    [Image: cook, oven, and appliances]

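    Advanced Computer Concepts for the (Not So) Common Chef: Terminology (Part 1)

    Before we start, I would like to explain some terminology over the next two blogs. If you already understand these concepts, feel free to skip to the next section. I recommend that all software readers look at the other blog, which introduces threads. There is a lot of confusion in this area, even among us software professionals.

    Let's first look at what a processor, a CPU, a core, and a package are. Popular media such as television tend to use these terms loosely. Then we will cover threads, especially the difference between hardware and software threads. People often confuse these different kinds of threads, and computer programmers are no exception.

    Core? CPU? Package? Chip? HUH?

    Look at the left side of the CPU figure below. In the Pentium® processor era, the component of a computer that executes program instructions (the computer's brain) was usually called the "CPU" or the "processor". There was almost no difference between the two. A "computer chip" is a chip with integrated circuits etched onto it, such as a CPU. The "package" is the plastic-and-metal casing that encloses and protects the chip and carries its many pins and connectors; it also serves an aesthetic purpose.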

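    Benefits of Using the Intel® Software Development Emulator

    Introduction

    New Intel processors introduce enhanced instruction set extensions that improve application performance or strengthen security. Extensions such as Intel AVX and AVX2 are aimed mainly at performance, while the Intel SHA instructions accelerate SHA and thereby strengthen application security.

    What if developers want to build applications with these new instructions, but their current hardware does not support them? How can a company justify buying new systems that support the new instructions while making sure its applications can take full advantage of them to improve performance?

    The Intel® Software Development Emulator (Intel® SDE) can run applications that contain these new instructions on systems that do not support them.

    This article discusses the benefits of using SDE to test code that uses new instructions.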

  • Developers
  • Professors
  • Students
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8.x
  • Server
  • Windows*
  • C/C++
  • Advanced
  • Beginner
  • Intermediate
  • software developer emulator
  • sde
  • server
  • windows
  • Linux
  • Intel Processor
  • Programming
  • Academic
  • Debugging
  • Development Tools
  • Intel® Core™ Processors
  • Open Source
  • Threading

    MAXVAL Stack overflow problem

    I have a large 3-dimensional array and I'm trying to do an element-by-element maximum on the first 2 dimensions using the MAXVAL function.  When I do, I get a stack overflow error.  Is there a size limit to the MAXVAL intrinsic function?  The code is abbreviated below with constants in the array declarations and allocations instead of variables just to show the size:

    program main
    real, allocatable :: arr2(:,:), arr3(:,:,:)
    allocate( arr3( 0:1000, 1:440, 1:6 ), source = 0.0 )
    allocate( arr2( 0:1000, 1:440 ), source = 0.0 )
    ...! assign values to arr3

    Parameterized derived types with PASS

    I'm trying to use parameterized derived types and have run into a problem which I have distilled into the following code:

    module t_mod
    
        implicit none
        
        type T(k)
            integer, kind   :: k = 4
            integer(kind=k) :: d
        contains
            procedure, public, pass(x) :: check_v
        end type T
    
        contains
      
        logical function check_v(k, x)
            integer  :: k
            class(T) :: x
            check_v = (k == x%d)
        end function check_v
    
    end module t_mod
    

    I get the following compilation error:

    Game Companies Speed Up Development with Intel® Sample Code

    Whether you are an indie game developer or a seasoned professional, you are likely to find an interesting code sample on Intel Developer Zone's game dev section. Read here to learn how Intel engineers worked with Blizzard and Codemasters to optimize our Adaptive Volumetric Shadow Maps (AVSM), Conservative Morphological Anti-Aliasing (CMAA), and Software Occlusion Culling code samples.
  • Developers
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 10
  • Microsoft Windows* 8.x
  • Game Development
  • Windows*
  • C/C++
  • Intel® C++ Compiler
  • Microsoft DirectX*
  • Adaptive Volumetric Shadow Maps (AVSM)
  • Conservative Morphological Anti-Aliasing (CMAA)
  • Software Occlusion Culling
  • Game Development
  • Graphics
  • Intel® Core™ Processors
  • Microsoft Windows* 8 Desktop
  • Optimization