Error message when running Intel® Optimized LINPACK Benchmark for Linux* OS on Intel Phi cards.



I am trying to run the Intel® Optimized LINPACK Benchmark for Linux* OS on a configuration with multiple Intel Xeon Phi cards.



My test environment:

  1. AIC Sandy Bridge EP-4S server system with 4 x Sandy Bridge EP-4S + 98 GB memory
  2. Intel Xeon Phi : 3 pcs of 3110 and 4 pcs of 3115
  3. OS: Redhat Enterprise Linux 6.2 x64
  4. Xeon Phi MPSS: KNC_gold_update_2-2.1.5889-16-rhel-6.2.tar
  5. Intel Composer XE : l_ccompxe_2013.3.163.tgz
  6. Intel MPI : l_mpi_p_4.1.0.024.tgz or l_mpi_p_4.1.0.030.tgz

After running the runme_xeon64_ao script, which enables acceleration by offloading computations to the Intel Xeon Phi coprocessors available on the system, I found that once I increase the HPL problem size (Ns) beyond a certain range, the Linpack process (xlinpack_xeon64) runs endlessly and never finishes, and related error messages appear in the host system log. For example, with 7 Phi cards I hit this problem when I set the HPL problem size (Ns) to 46000. The threshold appears to depend on the number of Phi cards: with 1 Phi card I can increase Ns to 100000 without any problem.
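For reference, here is a rough sketch of how much memory the HPL matrix alone needs at the two problem sizes mentioned above. It assumes the standard dense double-precision N x N matrix (8 bytes per element) and ignores workspace, buffers, and however the offload runtime splits the work across cards, so treat the numbers as a lower bound only. The function names are mine, not part of the benchmark.

```python
# Rough lower bound on HPL matrix memory for a given problem size N.
# Assumption: dense N x N matrix of 8-byte doubles; workspace and
# offload buffers are NOT included.

def hpl_matrix_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Bytes needed to store an n x n matrix of the given element size."""
    return n * n * bytes_per_element


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Problem sizes taken from the post: 46000 (hangs with 7 cards)
# and 100000 (works with 1 card).
for n in (46000, 100000):
    size = hpl_matrix_bytes(n)
    print(f"N={n}: {size} bytes ({to_gib(size):.1f} GiB)")
```

Both sizes fit in the 98 GB of host memory by this estimate, so the matrix size by itself does not look like the limit on the host side.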


The error messages from the host system log are below:


__scif_fence_wait 3041 err -16

dma_mark_wait 1080 TO chan 0x0

drain_dma_intr 1151 err -16

micscif_rma_destroy_temp_windows 2082 DMA channel 0 hung ep->state 2 window->dma_mark 0x1c0 channel_mark 0x1c2

------------[ cut here ]------------

WARNING: at /home/build/sandbox/mpss/MPSS_4982/k1om/rhel-6.2/mpss/.rpmbuild_4982/BUILD/intel-mic-kmod-2.1.4982/micscif_rma.c:2084 micscif_rma_destroy_temp_windows+0x314/0x540 [mic]() (Not tainted)

Hardware name: SB301-TO

Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 mic(U) microcode sg ixgbe dca mdio sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp e1000e i2c_i801 i2c_core ext4 mbcache jbd2 sr_mod cdrom usb_storage sd_mod crc_t10dif ahci isci libsas scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 2812, comm: SCIF_MISC Not tainted 2.6.32-220.el6.x86_64 #1

Call Trace:

 [<ffffffff81069b77>] ? warn_slowpath_common+0x87/0xc0

 [<ffffffff81069bca>] ? warn_slowpath_null+0x1a/0x20

 [<ffffffffa0235664>] ? micscif_rma_destroy_temp_windows+0x314/0x540 [mic]

 [<ffffffffa02321b5>] ? micscif_rma_handle_remote_fences+0x155/0x380 [mic]

 [<ffffffff814eca40>] ? thread_return+0x4e/0x77e

 [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20

 [<ffffffffa022a0f0>] ? micscif_misc_handler+0x0/0xc0 [mic]

 [<ffffffffa022a10a>] ? micscif_misc_handler+0x1a/0xc0 [mic]

 [<ffffffffa022a0f0>] ? micscif_misc_handler+0x0/0xc0 [mic]

 [<ffffffff8108b2b0>] ? worker_thread+0x170/0x2a0

 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40

 [<ffffffff8108b140>] ? worker_thread+0x0/0x2a0

 [<ffffffff81090886>] ? kthread+0x96/0xa0

 [<ffffffff8100c14a>] ? child_rip+0xa/0x20

 [<ffffffff810907f0>] ? kthread+0x0/0xa0

 [<ffffffff8100c140>] ? child_rip+0x0/0x20

---[ end trace e0d2c31584645743 ]---
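In case it helps anyone comparing logs across runs, here is a small sketch that pulls the fields out of the "DMA channel ... hung" line shown above. The regular expression is written only against the single line in this post, so the format is an assumption and may not match other MPSS versions; the function name is hypothetical, and I am not making any claim about what the two mark values mean beyond printing them.

```python
import re

# Pattern written against the one "hung" line in this post (assumed format).
DMA_HUNG = re.compile(
    r"DMA channel (?P<channel>\d+) hung ep->state (?P<state>\d+) "
    r"window->dma_mark (?P<window_mark>0x[0-9a-fA-F]+) "
    r"channel_mark (?P<channel_mark>0x[0-9a-fA-F]+)"
)


def parse_dma_hung(line: str):
    """Return the fields of a 'DMA channel N hung' line, or None if no match."""
    match = DMA_HUNG.search(line)
    if match is None:
        return None
    return {
        "channel": int(match.group("channel")),
        "ep_state": int(match.group("state")),
        "window_mark": int(match.group("window_mark"), 16),
        "channel_mark": int(match.group("channel_mark"), 16),
    }


# Line copied from the log in this post.
SAMPLE = (
    "micscif_rma_destroy_temp_windows 2082 DMA channel 0 hung ep->state 2 "
    "window->dma_mark 0x1c0 channel_mark 0x1c2"
)

print(parse_dma_hung(SAMPLE))
```

Feeding each line of the host log through `parse_dma_hung` and keeping the non-None results would show whether the same channel is reported every time the run hangs.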



Could you please indicate which binary you are running? There are multiple binaries for different sorts of configurations, all of which are LINPACK in one form or another.
