Parallel Implementation of preconditioned conjugate gradient and cholesky decomposition using Cilk Plus

Parallel Implementation of preconditioned conjugate gradient and cholesky decomposition using Cilk Plus

 Hi All,

I want the source code of parallel preconditioned conjugate gradient and cholesky decomposition using Cilk. So if anybody has these implementations source code kindly share them. I need them urgently.


Abdul Jabbar. 

8 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

anybody can suggest me any source from where I can find their parallel implemetation source code in any of the shared-memory programming model?

As far as Cholesky decomposition goes, are you looking for a sparse matrix implementation, or a dense matrix implementation?

A few years back, there was a paper entitled "Performance Evaluation of Concurrent Collections on High-Performance Multicore Computing Systems", by Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc, which had an implementation of an asynchronous parallel Cholesky decomposition, in CnC.   You might consider trying to contact one or more of the authors to see if they have the original code around.

In a previous life, I had also collaborated on a project on Nabbit, task-graph evaluation for Cilk-style languages, which had used some code derived from this CnC code to implement the asynchronous Cholesky algorithm in Cilk++.   I am not sure whatever happened to that code base, and whether it is still around anywhere.   But I can inquire whether that code base still exists and whether they would be willing to share it.   Unfortunately, as both of these codes were research projects, they may or may not be as industrial-strength an implementation as you might want, and you might not be able to get access as quickly as you might like.

I don't know as much about preconditioned conjugate gradient implementations.   I know that the NAS parallel benchmarks have a benchmark named "conjugate gradient" but I have no idea how close or far away that is to what you are looking for.



W.r.t. dense Cholesky: that puzzles me.  Why did you use that complication rather than simple SIMD directives?  Was it because they work together with blocking rather better?

I use Cholesky as an example/exercise in many of my courses, and have got stuck with CilkPlus due to reducers not working together with variable length arrays; while I could get it going by doing the 2-D indexing manually, that isn't my objective, so my source isn't publishable.  However, my OpenMP source is on the Web, in the form of a worked example (look at the practical exercise notes to see where) with test code, and shouldn't be hard to convert:

The code was derived from LAPACK (I was lazy) and just uses the obvious algorithm.  It is pretty solid, but does NOT use blocking methods, so is a lot slower than similar code in MKL etc.  Its real merit is that it is SIMPLE, and so people don't need to be experts to understand and modify it.

The book Structured Parallel Programming [I'm one of the authors] has a chapter on implementing a block-recursive dense Cholesky factorization in Cilk Plus.  The code example can be downloaded from .  

Arch D. Robison,

Sir can you tell me that how can I find the solution of Ax=b from that parallel cholesky implementation. It is using potrf() routines. It is just giving me changed A matrix after applying cholesky method. I don't how can I find solution x from it, would you kindly help me regarding this issue? Are there any operations or steps that need to be done after its execution to find solution?


Abdul Jabbar

you may search  web for example how to  use potrf for your purpose.

Leave a Comment

Please sign in to add a comment. Not a member? Join today