Concurrent Solution Design by Examples: Functional Constructs Scenario

This article is one of a series illustrating basic practices of concurrent and parallel solution design, practices which software developers in the 21st century are expected to have mastered.

As a common practice, programmers are advised to write their code in a modular way, which eases later maintenance and, more importantly, enables reuse. As it turns out, this useful construct is also a good facility for data-parallel applications. Let's start the story with a well-known search algorithm, Binary Search.

Binary Search (http://en.wikipedia.org/wiki/Binary_search) begins with a sorted set of data and examines the middle element of the set. If that element is greater than the one we are looking for, the lower half of the set becomes the new set to search; otherwise, the upper half becomes the new set. This process is repeated on each smaller set until either the element we are looking for is located or the set cannot be divided any further.
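For concreteness, here is a minimal iterative sketch of the algorithm in Java (the method and variable names are ours, purely for illustration):

static int binarySearch(int[] sorted, int key) {
    int low = 0, high = sorted.length - 1;
    while (low <= high) {                      // stop once the set can't be divided further
        int mid = low + (high - low) / 2;      // middle element; avoids (low + high) overflow
        if (sorted[mid] > key) {
            high = mid - 1;                    // the key must be in the lower half
        } else if (sorted[mid] < key) {
            low = mid + 1;                     // the key must be in the upper half
        } else {
            return mid;                        // located
        }
    }
    return -1;                                 // not found
}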

With the algorithm briefed, one may ask whether we are going to parallelize it, since the topic of this series is concurrent solution design. We can try the basic steps we've practiced: workload partitioning and dependency analysis. A first look at the algorithm shows that it is iterative, so there seems to be an opportunity to partition or divide the loop for parallelism. However, if we draw a control-flow chart of the algorithm, we find that its major portion is a series of conditional branches... branches? Uh, that well-known "troublemaker" for a compiler's optimization stage?

Yes, there is indeed a control dependency between iterations. The operations of the next iteration can't start until the previous one (the comparison) finishes, which generally means the individual iterations can't be executed in parallel. So does that mean parallelism fails in this case?

Let's think outside the box. By the very intent for which they are created, functions (or libraries) are invoked because they can process different data sets with the same algorithm, so they naturally help developers implement a data decomposition strategy. A search algorithm is normally not used for a single query; in reality, it serves applications with multiple or even massive numbers of search requests. One well-known example of such an application is the spell checker, broadly used in online and offline text editing. The working principle of a spell checker is simple: it repeatedly grabs a word from a block of text and searches for it in a prebuilt dictionary to see whether it can be found. That means that even though a single search operation may not be parallelizable, the data (the words to be searched) can be processed in parallel, since there is literally no dependency between them. Attached is a concurrent solution to the spell checker in Java; a condensed sketch of its structure also follows below. It divides the text to be checked into blocks according to its paragraphs and assigns the blocks to different threads to check simultaneously. Here, BlockingQueue, a concurrent collection from Java, is used to implement the Boss/Worker model. To make the application run, a dictionary (a file with the hard-coded name Dictionary.txt) needs to be created before the input file (another file with a specified name, ToBeChecked_m.txt) can be checked.
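The sketch below is our condensed reconstruction of that structure, not the attachment itself; the class name, word-splitting regex, and worker count are illustrative assumptions, while the file names (Dictionary.txt, ToBeChecked_m.txt) come from the description above.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class SpellCheckerSketch {
    // Unique sentinel object (a "poison pill") that tells a worker to stop.
    private static final String POISON = new String("<done>");

    public static void main(String[] args) throws Exception {
        // The dictionary is a plain word list, one word per line.
        Set<String> dictionary = new HashSet<>(Files.readAllLines(Paths.get("Dictionary.txt")));
        String text = new String(Files.readAllBytes(Paths.get("ToBeChecked_m.txt")));

        int workers = Runtime.getRuntime().availableProcessors();
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Workers: take paragraph blocks off the queue and check every word.
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                try {
                    for (String block = queue.take(); block != POISON; block = queue.take()) {
                        for (String word : block.split("\\W+")) {
                            if (!word.isEmpty() && !dictionary.contains(word.toLowerCase())) {
                                System.out.println("Not in dictionary: " + word);
                            }
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Boss: enqueue one block per paragraph, then one poison pill per worker.
        for (String paragraph : text.split("\\n\\s*\\n")) {
            queue.put(paragraph);
        }
        for (int w = 0; w < workers; w++) {
            queue.put(POISON);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}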

Is that the whole story we set out to tell? Wait a minute; the story is not finished yet. There is still one important note to make: special care needs to be taken when designing a function or library so that it can be invoked in parallel and still produce correct results. In concurrent-programming terms, this property is called thread safety. We can see why by revisiting the linear Least-Squares Estimation (l-LSqE) example used in a prior article.

Different from the previous case, this time l-LSqE is encapsulated as a function and called many times to produce estimates for massive numbers of data sets.

#include <math.h>

/* Function state kept in globals -- shared by every concurrent caller. */
double sumx, sumy, sumx2, sumxy;

void LeastSquareEstimation(double *y, int n, double yVar, double *b1, double *b0) {
    int i;

    sumx = 0.0;
    sumy = 0.0;
    sumx2 = 0.0;
    sumxy = 0.0;

    for (i = 0; i < n; i++) {
        sumx = sumx + (double)i;
        sumy = sumy + (y[i] + yVar);
        sumx2 = sumx2 + pow((double)i, 2.0);
        sumxy = sumxy + ((double)i * (y[i] + yVar));
    }

    /* Slope (b1) and intercept (b0) of the least-squares line. */
    *b1 = (sumxy - ((sumx * sumy) / (double)n)) / (sumx2 - (pow(sumx, 2.0) / (double)n));
    *b0 = (sumy - ((*b1) * sumx)) / (double)n;

    return;
}

#pragma omp parallel for
for (i = 0; i < iter; i++) {
    LeastSquareEstimation(y, arrLen(y), (double)i / iter, &b1[i], &b0[i]);
}

Intuitively, we can just add a single OpenMP pragma (as shown in the code segment above) to implement the parallelism. However, do we get a correct result?

Most likely not. A review of the code tells us that the shared/private resource issue arises again: the l-LSqE function keeps its state in global variables, and that state gets corrupted when the function is invoked concurrently. With the root cause identified, the solution is straightforward - make the function's state variables local to the function.

void LeastSquareEstimation(double *y, int n, double yVar, double *b1, double *b0) {
    int i;
    /* State is now local: each concurrent call works on its own copies. */
    double sumx, sumy, sumx2, sumxy;

    sumx = 0.0;
    sumy = 0.0;
    sumx2 = 0.0;
    sumxy = 0.0;

    for (i = 0; i < n; i++) {
        sumx = sumx + (double)i;
        sumy = sumy + (y[i] + yVar);
        sumx2 = sumx2 + pow((double)i, 2.0);
        sumxy = sumxy + ((double)i * (y[i] + yVar));
    }

    *b1 = (sumxy - ((sumx * sumy) / (double)n)) / (sumx2 - (pow(sumx, 2.0) / (double)n));
    *b0 = (sumy - ((*b1) * sumx)) / (double)n;

    return;
}

To conclude, one important implementation requirement for thread safety is to avoid keeping function or library state in global or otherwise shared resources.
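The same rule carries over to any language. As a purely hypothetical Java illustration (not from the article's attachment), compare a method that keeps its running state in a shared field with one that keeps it in a local variable:

// Hypothetical illustration: shared mutable state vs. local state.
class MeanExample {
    private static double sum;                  // shared by all threads: NOT thread-safe

    static double unsafeMean(double[] data) {
        sum = 0.0;                              // concurrent calls clobber each other here
        for (double d : data) sum += d;
        return sum / data.length;
    }

    static double safeMean(double[] data) {
        double sum = 0.0;                       // local: every call owns a private copy
        for (double d : data) sum += d;
        return sum / data.length;
    }
}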

Now we've examined the use of functions (or libraries) in concurrent solution design, together with an important notion, thread safety. For illustration purposes, some examples are intentionally made simplistic. But we are hoping to catch some whales with the minnows thrown out.
