Composable Threading Coming to Julia*

By Henry A. Gabb, Published: 08/14/2019, Last Updated: 08/14/2019

Henry A Gabb, PhD (Editor, The Parallel Universe)

The July 2017 issue of The Parallel Universe ran an article on Julia: A High-Level Language for Supercomputing. My key takeaways from the article were that Julia has built-in primitives for multithreading and distributed computing and was capable of extreme parallelism (scaling to thousands of cores). Despite this, I still considered Julia something of a curiosity. As I noted in my editorial, Python* was my productivity language of choice:

I recoded some of my time-consuming data wrangling applications from Python to Julia, maintaining a line-for-line translation as much as possible. The performance gains were startling, especially because these were not numerically-intensive applications, where Julia is known to shine. They were string manipulation applications to prepare data sets for text mining. I’m not ready to forsake Python and its vast ecosystem just yet, but Julia definitely has my attention.

That was over two years ago, and Julia still has my attention because its ecosystem is growing rapidly and it has the potential to solve the “two-language problem” in data science, in which a high-level language like Python is used for data manipulation but low-level languages like Fortran or C/C++ are used when performance is required.

As much as I like Python, I’m not impressed with its ability to exploit parallelism, which brings me to the point of this brief commentary. Julia Computing announced last month that composable, general task parallelism is coming to Julia. (It is available now for testing in the v1.3.0-alpha release.) This is in addition to its already impressive parallel capabilities.

Julia’s task parallelism is similar in spirit to that of Threading Building Blocks. The programmer spawns tasks freely and lets the scheduler sort out when and where they run. The programmer doesn’t have to worry about available processors or threads because the scheduler makes sure that the system isn’t oversubscribed. As the authors note, “The model is nestable and composable: you can start parallel tasks that call library functions that themselves start parallel tasks, and everything works.” They illustrate this principle with a straightforward merge sort implemented with task parallelism:

import Base.Threads.@spawn

# sort the elements of ‘v’ in place, from indices ‘lo’ to ‘hi’ inclusive
function psort!(v, lo::Int=1, hi::Int=length(v))
   # some condition checking code (e.g., v is empty) deleted for brevity
   mid = (lo+hi)>>>1                  # find the midpoint
   half = @spawn psort!(v, lo, mid)   # task to sort the lower half; will run
   psort!(v, mid+1, hi)               # in parallel with the current call sorting
                                      # the upper half
   wait(half)                         # wait for the lower half to finish
   temp = v[lo:mid]                   # workspace for merging
   i, k, j = 1, lo, mid+1             # merge the two sorted sub-arrays
   @inbounds while k < j <= hi
      if v[j] < temp[i]
         v[k] = v[j]
         j += 1
         v[k] = temp[i]
         i += 1
      k += 1
   @inbounds while k < j
      v[k] = temp[i]
      k += 1
      i += 1
   return v

As you can see, Julia syntax is similar to other productivity languages, so the learning curve is low. The psort function above is a standard merge sort with recursive spawning of threads (via the @spawn construct). Depending on the length of v, there’s the potential to create many threads, but that’s the Julia scheduler’s problem. The programmer just specifies where to spawn threads and where to wait for them to finish.

I’ve only scratched the surface here, so have a look at their original blog post for more information about the new threading constructs and “under the hood” implementation details.


Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804