The attached code occasionally produces incorrect results, due to what I suspect is a data race in the OpenMP implementation. I believe the code is standards compliant and should produce deterministic output. The expected output is:
user@host $ ./a.out
 8. 1. 8. 1. 8. 1. 8. 1. 8. 1.
Occasionally, the program will produce erroneous output such as a blank line or numbers other than 8 and 1. I have observed the following:
user@host $ ./a.out

user@host $ ./a.out
Array "whole" contains zeros!
 8. 0. 0. 0. 0. 0. 0. 0. 0. 0.
It appears there is a race condition during the copy-in or copy-out of the non-contiguous array section pointed to by "slice", which is passed to the subroutine "sub". The C pointer "ptr" is made private by the OpenMP parallel directive and is used, through a call to malloc, to allocate memory that is private to each thread. This memory is associated with the pointer "whole", which is threadprivate through its declaration in "data_mod", and then becomes associated with the shared pointer variable "slice", which only one thread accesses because of the OpenMP single directive. Memory allocation via malloc on Mac OS X is documented to be thread safe, so that should not be the source of the race.
Compiler and host (Mac OS X 10.8) information:
user@host $ ifort --version
ifort (IFORT) 12.1.6 20120928
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.

user@host $ uname -a
Darwin host.local 12.3.0 Darwin Kernel Version 12.3.0: Sun Jan 6 22:37:10 PST 2013; root:xnu-2050.22.13~1/RELEASE_X86_64 x86_64