## Overview

To help jumpstart your application development, we provide code samples that illustrate the use of Intel’s Ct Technology in a variety of workloads, including those often used in financial services, graphics, image processing, medical imaging and more. The sample applications provide the most direct way to determine:

- Whether the software is working on your system
- How you can use the different language constructs (for example, operators, functions and facilities)
- How to write code and create an application

## Installation

The code samples are contained in the installation package and are installed by default to the following directories:

- Windows*: **C:\Program Files\Intel\Ct\<version>\samples**
- Linux*: **/opt/intel/Ct/<version>/samples**

where <version> is the version of the product being installed.

## Building and Running the Samples

On Windows*, open one of the Microsoft* Visual Studio* solution (.sln) files located in the \samples folder. For example, if you use Visual Studio 2005, double-click the file **C:\Program Files\Intel\Ct\<version>\samples\samples-vs05.sln**. Likewise, if you use Visual Studio 2008, double-click the file **C:\Program Files\Intel\Ct\<version>\samples\samples-vs08.sln**.

- Open **samples-vs05.sln** or **samples-vs08.sln**.
- Select a configuration (the default is the **Debug - Win32** configuration).
- To build all of the sample applications at once, build the entire solution. To run a specific sample, set its project as the startup project (right-click the project in Solution Explorer and select **Set as StartUp Project**). A batch file in the \tools folder can also be used to run all of the samples.
- Run the sample (**Debug >> Start Without Debugging**).

On Linux*, use one of the following shell scripts, located in the folder **/opt/intel/Ct/<version>/tools**, to build and run the samples:

- **build_run-icc.sh** - automatically builds and runs the sample applications using the Intel® C++ Compiler
- **build_run-gcc.sh** - automatically builds and runs the sample applications using GCC*

## Known Issues

- Sample Applications Run out of Memory
- Debug – Win32 Samples fail to compile on Windows Vista* with Microsoft Visual C++* 2005
- Release Configuration Samples "failed to start" on Supported Windows* Operating Systems

## Detailed Description

| Vertical (Folder) | Workload (Subfolder) | Description | Algorithm | Implementations |
|---|---|---|---|---|
| finance | binomial-tree | Numerical lattice for pricing European options. | Option stream with arithmetic intensity (exp, sqrt). | Uses rmap to parallelize over options. A series of _for loops for each time step uses replace() on elements of both the temporary containers and the output containers of option prices. |
| finance | black-scholes | Analytical method for pricing European options. Optionally evaluates or approximates polynomials. | Data-parallel random number generation. Option stream with arithmetic intensity (ln, exp, sqrt). | (1) Uses the rcall operator to invoke a Ct function whose outer loop parallelizes over options. Illustrates the use of select to choose between two terms during polynomial evaluation. (2) Equivalent implementation using a map over options to perform element-wise arithmetic. |
| finance | monte-carlo_Ct | Stochastic method for pricing financial options using the Black-Scholes formula given randomly varying prices. Optionally generates MCG or MCGF sequences of random numbers. | Data-parallel random number generation. Option stream with arithmetic intensity (exp). 1D and 2D accumulation (reductions). | (1) Uses the rcall operator to invoke a Ct function whose outer loop parallelizes over options. Generates a normally distributed random sequence. Nests an _for loop over prices, performs 1D vector arithmetic, and uses addReduce and replace to accumulate a result. (2) Uses reshape2D and repeatCol to perform an equivalent 2D implementation. Illustrates the use of addReduce for accumulation. |
| finance | poisson-solver | Monte-Carlo method to solve Poisson equations (MCP solver). Uses an LCGF sequence of random numbers. | Data-parallel random number generation. Kernel with nested loops and arithmetic intensity (sin, cos). Minimum distance computation using a series of thresholds in the inner loop. Unbalanced load, where the number of iterations depends on random input. | (1) Uses rcall to generate a large vector of scalar random numbers, followed by a map over points illustrating nested _for and _while loops for a random walk. The inner loop is a series of _if statements that compute a minimum distance. (2) Equivalent implementation using map to perform element-wise arithmetic. |
| finance | random_test | Tests that demonstrate usage of Ct random number generation and assess the quality of the generated sequences. Compares Ct implementations with C and/or Intel MKL: (1) distributionTest (2) frequencyTest (3) goodOfFitTest (4) KStest (5) collisionTest (6) birthdayTest | Data-parallel random number generation. | Uses the rcall operator to generate a large vector of random numbers with all supported sequences, followed by an _for loop to generate multiple sequences and gather statistics on the output. Demonstrates the use of reshape2D and replaceRow. |
| finance | randomlib | Code that can be inlined to generate a normally distributed random sequence using the following algorithms: (1) Linear Congruential Generator (LCG) (2) Multiplicative Congruential Generator (MCG) (3) Combined multiple recursive generator with two components of order 3 (MRG) (4) Generalized feedback shift register generator (R250) (5) Mersenne twister (MT) | Data-parallel random number generation. Scan collectives, bitwise operations. | (MCG) Uses the ncall operator to invoke native code stubs from within a Ct function. The actual native implementation can be switched at link time. (General) Uses mulScan to generate indices. Illustrates the use of rotate and select on seeds. Illustrates the use of a bitwise & operation (a mask) to simulate a vector mod operation. |
| graphics | raytracing1 | A kernel used to create a realistic visualization of a scene by tracing rays from a camera through an image plane to a light source. For each pixel in a 2D array, the closest ray-triangle intersection is determined and the pixel shade is evaluated with a lighting calculation. | For each triangle in the scene, computes the intersection and the distance to the triangle. Computes the minimum distance and shades the triangle closest to the camera. The simplified lighting calculation is a proportional sum of diffuse, specular and ambient light. The ray tracing algorithm is parameterized over 1-component or 2-component inputs and outputs. | (1) rcall over a 2D pixel array. Within the rcall body, an _for loop over the height of the pixel array is performed. For each row, a map over 1D lines of pixels is performed. Illustrates the use of index to generate an arithmetic sequence and replaceRow to populate rows of the 2D outputs. (2) An equivalent map over a 2D pixel array. Illustrates the use of index2D to generate a 2D sequence. Note: The bulk of the implementation differs only in the use of scalars versus 2-tuples for positions and directions; therefore, ray tracing is parameterized over different data types. (3) A variation on (2) that uses 3-component tuples and large vectors of 3-component tuples instead of separate variables (for example, for RGB and XYZ). Illustrates the use of get and set for components of a tuple. (4) A variation on (3) that uses ::global vectors for the triangles and their normals. |
| graphics | raytracing2 | A variation on raytracing1 where ray-triangle intersection is limited to triangles in grid cells that intersect with rays. | A variation on raytracing1 where the triangles that intersect grid cells are pre-computed. For each cell in a 3D grid, an initial test is performed to determine whether the ray intersects the cell. Ray-triangle intersection is performed only when rays intersect grid cells. | (1-3) See (1), (2) and (3) for raytracing1. Note: Uses _break to perform an early exit from an _for loop or a _while loop. |
| img-processing | convolve | Convolution of a 2D image with a discrete Gaussian function. | A gather over a fixed neighbourhood around each pixel of a 2D image. 1D convolution along the X and Y axes with a pre-computed 2D stencil of coefficients. Clamps to 255 to prevent saturation of the 8-bit unsigned image data. Optionally runs with convolution stencils of 5x5 or 9x9 pixels. | (1) Separable convolution using large vector math. Calls shift to perform vector arithmetic on neighbours. Optionally performs an averaging filter. (2) A variation on (1) with manual unrolling rather than _for loops for separable convolution. Only works with a 5x5 pixel convolution stencil. (3) An element-wise map operation using nested _for loops to perform convolution. Calls numRows on the 2D stencil to operate on square kernels of any size. (4) This tuned version casts the unsigned image data to single-precision float. Next, an _for loop is performed over 64-pixel-wide strips of the image. For each strip, nested for loops are used to unroll the convolution. Uses section and replace to operate on a 64-pixel-wide strip of the image. This is followed by a similar loop that processes the portion of the image that does not fit into strips of 64 pixels. |
| img-processing | gauss-convolve | Convolution of a 2D image with a discrete Gaussian function. | Similar to convolve. Uses different stencil sizes and does not assume odd stencil sizes. | (1) Two _for loops perform separable convolution using large vector math. Uses shiftRow and shiftCol for the 1D convolution along the X and Y axes. (2) Equivalent implementation using map to perform element-wise arithmetic. Illustrates inlining of multiple C++ routines (one for each axis) into a single Ct function. |
| img-processing | harmonic-filter | Image filter that emphasizes high frequencies (high-pass) by subtracting (filtering) a proportion of the low-pass filter output. | A mix of large vector arithmetic and element-wise arithmetic. First, a box filter (stencil for averaging) is used to generate a low-frequency image. The harmonic filter then subtracts a fixed proportion of the low-pass image from the original image to isolate the high-frequency content. | Uses rcall to invoke a Ct function containing two nested _for loops. Uses shift to perform vector arithmetic on neighbours. Calls numRows and numCols on the 2D stencil to operate on rectangular kernels of any size. This is followed by an element-wise map operation that extracts the high-frequency content and prevents saturation. |
| img-processing | sobel | An edge detection filter for a 2D image that uses the gradient (rate of change) of image intensities. | A gather over a fixed neighbourhood around each pixel of a 2D image. Separately computes the gradient along the X and Y axes. This variation on a Sobel filter outputs the larger of the two gradients (with clamping to avoid saturation of 8-bpp image data). | (1) Uses rcall to invoke a Ct function that in turn inlines separate functions to compute the gradient in X and Y using large vector arithmetic. Calls shift to perform vector arithmetic on neighbours. (2) Equivalent implementation using map to perform element-wise arithmetic. |
| medical | 3D-dilate | A morphological operator for dilation applied to 3D grayscale images. | Loops over a neighbourhood defined by a 3D binary mask (structuring element). For each neighbour corresponding to a non-zero mask entry, the image is updated with the largest difference between a neighbour and a height field. | Three nested _for loops are used to iterate through the mask. A call to create makes a local buffer to store maximums. Calls shift to perform vector arithmetic on neighbours. Calls numCols, numRows and numPages to operate on a mask of arbitrary size. |
| medical | 3D-erode | A morphological operator for erosion applied to 3D grayscale images. | Similar to 3D-dilate, except that the minimum difference is output (min reduce). | Similar to 3D-dilate. |
| medical | 3D-gauss-convolve | Convolution of a 3D image with a discrete Gaussian function. | Similar to gauss-convolve, except that a 3D convolution stencil is applied to 3D image data. | Similar to gauss-convolve. Uses shiftPage in addition to shiftRow and shiftCol to handle the Z axis. |
| medical | back_projection | A technique for image reconstruction used with inputs from computed axial tomography (CAT) scans. | A spatially coherent gather along projections (rays) through each pixel of a 2D image. Applies the inverse Radon transform to reconstruct a 2D image given a set of projections through that image. Uses 1D interpolation to update the output image with the contribution from the nearest projections. Notes: A simple scan geometry is assumed (radially symmetric 1D orthographic projections rather than a helical scan). In addition, it is assumed that sharpening of the sets of input projections (sinograms) has already been performed. | (1) Uses the rcall operator to parallelize over pixels in the 2D output image. Uses an _for loop in the rcall body to iterate through projection angles. Uses a table lookup to compute the sin and cos of each projection angle. Calls floor and ceiling on large vectors prior to interpolation. Uses the += operator to integrate contributions. (2) A variation on (1) that uses reshape2D to create a 2D product of angles and projections rather than a packed 1D vector. Within the rcall body, the indexing is modified to perform a 2D gather using a two-component index. |
| misc | gemm | Calculates the new value of matrix C by adding the matrix product of matrices A and B. | With A, B and C as row-major matrices, multiplies each row of A by matrix B and adds the result to the corresponding row of C. Note: This is not a fully generalized matrix multiply, because the coefficients for AB and C are not parameters in this sample. | (1) Uses an _for loop in an rcall operator to parallelize over rows of matrix A. Specifically, repeatCol is applied to each row of matrix A to express an MxN intermediate matrix. This intermediate is multiplied with matrix B, and addReduce is called to reduce the 2D product to a single M-element array. The output of addReduce is the input for replaceRow, which populates each row of the output matrix C. (2) This variation on (1) uses a nested _for loop to traverse rows and columns of matrix A. The product of each element of matrix A and the corresponding row of matrix B is accumulated in a temporary 1D array of N elements. This 1D temporary is the input for replaceRow, which populates each row of the output matrix C. (3) Uses the rcall operator to parallelize over columns of matrix A. For each of K columns, repeatCol is called on a column of matrix A to express a KxN intermediate matrix. Similarly, repeatRow is called on the corresponding row of matrix B to create a KxN matrix. C accumulates the product of these two intermediate matrices. |
| misc | mandelbrot | Generates a fractal data set. | Iteratively applies a quadratic polynomial on complex numbers to compute a fractal set. | (1) Uses an _for loop in a map operator to iteratively refine the output. Uses the complex data type C32 to perform complex multiplication and addition. Calls abs to compute the complex magnitude, and uses _break to exit early when the hard-coded bounds are exceeded. (2) An equivalent implementation using an _for loop in an rcall operator. Creates a large vector that is local to the Ct function, complex and 2D. Performs a fixed number of iterations, but stops updating the output once the fractal bounds have been exceeded. |
| misc | spec-samples (collective) | Calling code detailing the behavior of many operations on dense and nested containers. These are divided into three categories: (1) Collectives used for reductions and scans. (2) Facilities for building and querying the structure of nested data containers. (3) Operations on dense and nested containers, such as sort and pack, that permute elements or nested segments of containers. | (1) Full and partial collective operations are performed. The partial collectives reduce the dimensionality of the input set rather than returning a single value. Collective operation is illustrated using dense and nested containers. (2) Illustrates the reshaping of dense containers as nested containers, flattening of nested containers, and split/unsplit/cat operations. Also shows how to extract the sizes of dense containers and nested segments. (...2) Illustrates the creation and initialization of large vectors and index sets. Also shows how to section large vectors and update sections of large vectors. (3) Permutes data using swizzle, pack, shift, rotate, sort and shuffle operations. Many operations have opposites, such as pack/unpack. | (1) The calling code, inputs and outputs are detailed for full/partial reductions (addReduce and addIReduce), as well as full/partial scans (addScan and addIScan). (2) Uses reshapeNestedLengths to generate nested vectors from dense vectors based on segment descriptors. Couples this operation with a type cast using reshapeAs. Calls split, unsplit and cat with inputs and/or outputs that are nested containers. (...2) Calls value, lengths, flags and offsets to extract information about nested containers. (...2) Calls create for large vectors and illustrates the construction of index sets. Uses section and replace to operate on pieces of large vectors. (3) Performs swizzle, mask, pack/unpack and scatter operations on large vectors, using large vectors to specify the output indices. (...3) Calls shift, shiftSticky and rotate with options to permute dense and nested containers both left and right. Note that full segments of nested containers can be permuted. (...3) Calls sort to perform direct and indirect sorts on dense containers. (...3) Calls shuffle/unshuffle to perform strided interleave/deinterleave of dense containers. (...3) Shows how to use repeat and repeatRow variants to replicate data in dense containers. |
| misc | svm | Computes the maximum-margin split point that can be used to classify new data given a stack of observations (support vector machine). | Computes the distance between a 1D input and each row of a 2D stack of data. Uses an exponential function on the distance to accumulate a split point (a hyperplane in the general case). Computes a decision value using heuristics. | Uses an rcall operator to call addReduce on the difference between a 1D input and a 2D data set. An exponential function is computed on this 1D reduction output. A second call to addReduce completes the reduction and generates a scalar that can be used to output a split point. |
| seismic | 3dstencil | Convolution used in reverse time migration (RTM). | Convolution using a 7x7x7 cross-shaped kernel. | (1) Uses the rmap operator to invoke an element-wise operation from outside of an rcall body. Uses relative indices to gather the values of neighbours. |
| seismic | convolution | 1D and 2D convolution for a seismic image. | Separable 2D convolution using a cross-shaped kernel. | (X) Uses the rcall operator to implement 1D convolution on the x-axis between a seismic trace and a large array of weights. Calls shift to access neighbours within an _for loop to perform convolution with an arbitrarily sized array of weights. Uses create to generate a large vector output of any specified size. (Y) An equivalent operation on the y-axis, performed on half of the input data set. (2D) Uses the rcall operator to perform a 2D convolution with a cross-shaped stencil of fixed size. Uses shiftSticky to perform vector arithmetic with neighbours, using a zero-flux assumption for out-of-bounds accesses (clamped to the nearest boundary value). Uses a stride of 2 on the x-axis when gathering neighbours. |
| seismic | kirchhoff | Generic Kirchhoff migration assuming a constant velocity of seismic waves through a sub-surface. | Accumulates the contributions of each seismic trace to a sub-surface reconstruction. Uses a constant velocity model, where the travel time from source to receiver is proportional to the distance between the source and the receiver. Uses the equation of a circle to determine the possible reflection points. Uses correlation between multiple source-receiver pairs to identify the location of the reflecting sub-surface. | (1) Uses the rcall operator to implement migration with large vector arithmetic. Uses create to allocate a large vector output. Constructs index<> sets with the user-specified resolution. Uses an _for loop to parallelize over circle centers. Uses a select statement to perform a boundary check. (2) A 2D variation on (1) where the output and index sets are 2D X-Z data sets. Uses repeatCol and repeatRow to generate the 2D index sets. Uses create to allocate and initialize a 2D large vector containing two-component tuples. This is used to index the trace data to determine the appropriate contribution to the output reconstruction. |
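As a point of reference for the binomial-tree workload, the following is a minimal serial C++ sketch of lattice pricing for one European option. It is plain C++, not Ct code (there is no rmap or _for here), and the Cox-Ross-Rubinstein up/down factors are an assumption of this sketch; the sample's exact lattice parameters are not stated in this document. The function name binomialEuropean is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Serial reference (hypothetical name): price a European call or put on a
// Cox-Ross-Rubinstein binomial lattice. CRR parameters are an assumption.
double binomialEuropean(double S, double K, double r, double sigma, double T,
                        int steps, bool isCall) {
    assert(steps > 0 && sigma > 0.0 && T > 0.0);
    const double dt = T / steps;
    const double u = std::exp(sigma * std::sqrt(dt));   // up factor
    const double d = 1.0 / u;                           // down factor
    const double disc = std::exp(-r * dt);              // one-step discount factor
    const double p = (std::exp(r * dt) - d) / (u - d);  // risk-neutral up probability

    // Payoffs at expiry; terminal node i has i up moves and steps - i down moves.
    std::vector<double> v(steps + 1);
    for (int i = 0; i <= steps; ++i) {
        const double ST = S * std::pow(u, i) * std::pow(d, steps - i);
        v[i] = isCall ? std::max(ST - K, 0.0) : std::max(K - ST, 0.0);
    }
    // Walk backwards one time step per pass. Updating i in ascending order is
    // safe because v[i + 1] still holds the value from the later time step.
    for (int t = steps - 1; t >= 0; --t)
        for (int i = 0; i <= t; ++i)
            v[i] = disc * (p * v[i + 1] + (1.0 - p) * v[i]);
    return v[0];
}
```

For S = K = 100, r = 0.05, sigma = 0.2 and T = 1 year, 1,000 steps should land within a few cents of the analytical call value of about 10.45.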
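The black-scholes workload is described as an analytical pricer. A minimal serial C++ sketch of the closed-form Black-Scholes formulas for a non-dividend-paying asset follows. It evaluates the normal CDF with std::erfc rather than the polynomial approximation the sample can optionally use, and the names blackScholes, normCdf and OptionPrices are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Standard normal CDF via the complementary error function:
// N(x) = 0.5 * erfc(-x / sqrt(2)).
double normCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

struct OptionPrices {
    double call;
    double put;
};

// Closed-form Black-Scholes prices for a European call and put.
// Serial reference only; not Ct code.
OptionPrices blackScholes(double S, double K, double r, double sigma, double T) {
    assert(S > 0.0 && K > 0.0 && sigma > 0.0 && T > 0.0);
    const double sqrtT = std::sqrt(T);
    const double d1 = (std::log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrtT);
    const double d2 = d1 - sigma * sqrtT;
    const double discK = K * std::exp(-r * T);  // strike discounted to today
    return {S * normCdf(d1) - discK * normCdf(d2),
            discK * normCdf(-d2) - S * normCdf(-d1)};
}
```

For S = K = 100, r = 0.05, sigma = 0.2 and T = 1, this gives a call of about 10.4506 and a put of about 5.5735; their difference satisfies put-call parity, C - P = S - K e^(-rT).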
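The monte-carlo_Ct workload prices options by simulation. The sketch below is a serial C++ illustration of that idea for a single European call under geometric Brownian motion. std::mt19937 and std::normal_distribution stand in for the sample's own MCG/MCGF generators (an assumption of this sketch), and monteCarloCall is a hypothetical name. In a data-parallel version, the per-path payoffs would form a vector and the running sum would become a reduction such as addReduce.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Monte Carlo estimate (hypothetical name) of a European call price under
// geometric Brownian motion. The standard <random> engine replaces the
// sample's own generators for this sketch.
double monteCarloCall(double S, double K, double r, double sigma, double T,
                      int paths, std::uint32_t seed) {
    assert(paths > 0 && sigma > 0.0 && T > 0.0);
    std::mt19937 gen(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    const double drift = (r - 0.5 * sigma * sigma) * T;
    const double vol = sigma * std::sqrt(T);
    double sum = 0.0;  // serial accumulation; a reduction in a parallel version
    for (int i = 0; i < paths; ++i) {
        const double ST = S * std::exp(drift + vol * normal(gen));  // terminal price
        sum += std::max(ST - K, 0.0);                               // call payoff
    }
    return std::exp(-r * T) * sum / paths;  // discounted mean payoff
}
```

The estimate converges at a rate of roughly 1/sqrt(paths), so 200,000 paths typically land within about 0.1 of the analytical price of about 10.45 for S = K = 100, r = 0.05, sigma = 0.2 and T = 1.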
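The randomlib workload mentions a Linear Congruential Generator and the use of a bitwise mask to simulate a mod operation. The following plain C++ sketch shows both ideas for a 32-bit LCG: when the modulus is a power of two, x mod 2^k equals x & (2^k - 1) for unsigned x. The multiplier and increment are the widely published Numerical Recipes constants, chosen here for illustration and not necessarily those used by the sample; Lcg32 and modPow2 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit LCG (hypothetical name): x' = (a * x + c) mod 2^32, where the mod
// is computed with a bitwise AND mask. Constants are illustrative.
struct Lcg32 {
    std::uint64_t state;
    static constexpr std::uint64_t kA = 1664525u;
    static constexpr std::uint64_t kC = 1013904223u;
    static constexpr std::uint64_t kMask = 0xFFFFFFFFu;  // 2^32 - 1

    explicit Lcg32(std::uint32_t seed) : state(seed) {}

    std::uint32_t next() {
        // kA * state < 2^53, so the 64-bit product cannot overflow.
        state = (kA * state + kC) & kMask;  // same result as % 4294967296
        return static_cast<std::uint32_t>(state);
    }

    // Uniform double in [0, 1).
    double nextUniform() { return next() / 4294967296.0; }
};

// Hypothetical helper: x mod 2^k computed as a mask, valid for unsigned x.
inline std::uint32_t modPow2(std::uint32_t x, unsigned k) {
    assert(k < 32);
    return x & ((1u << k) - 1u);
}
```

Starting from seed 0, the first two outputs are 1013904223 and 1196435762.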
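The randomlib workload also mentions mulScan and a Multiplicative Congruential Generator. One way a multiplicative scan can make an MCG data-parallel is sketched below in C++17: since x_k = a^k x_0 mod 2^32, an inclusive product scan over a vector filled with a yields every power of a at once, after which each output depends only on its own element. This illustrates the general technique only; it is an assumption, not a description of how randomlib actually uses mulScan. The names mulMod32 and mcgSequence are hypothetical, and 69069 in the test is a classic multiplier chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Product modulo 2^32 (hypothetical helper), computed in 64 bits so the
// wrap-around is explicit and independent of integer promotion rules.
inline std::uint32_t mulMod32(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(x) * y);
}

// First n outputs of the MCG x_{k+1} = a * x_k mod 2^32 (hypothetical name).
std::vector<std::uint32_t> mcgSequence(std::uint32_t seed, std::uint32_t a, std::size_t n) {
    std::vector<std::uint32_t> v(n, a);
    // Inclusive product scan: v[k] = a^(k+1) mod 2^32. This call runs
    // sequentially; a parallel scan computes the same prefix products.
    std::inclusive_scan(v.begin(), v.end(), v.begin(), mulMod32);
    // Element-wise step: v[k] = a^(k+1) * seed mod 2^32, the (k+1)-th output.
    for (auto& x : v) x = mulMod32(x, seed);
    return v;
}
```

The result matches a serial loop that repeatedly applies x = a * x mod 2^32, element for element.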
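The mandelbrot workload iterates a quadratic polynomial on complex numbers and exits early when the bounds are exceeded. A minimal serial C++ sketch of the per-point escape-time loop follows, using std::complex<float> in place of the Ct C32 type; the bound of 2 on the magnitude is the conventional choice and an assumption here, and mandelbrotIterations is a hypothetical name.

```cpp
#include <cassert>
#include <complex>

// Escape-time iteration z_{n+1} = z_n^2 + c for one point c (hypothetical
// name). Returns the number of iterations completed before |z| exceeds 2,
// or maxIter if it never does. The early return plays the role of _break.
int mandelbrotIterations(std::complex<float> c, int maxIter) {
    std::complex<float> z(0.0f, 0.0f);
    for (int i = 0; i < maxIter; ++i) {
        if (std::abs(z) > 2.0f) return i;  // magnitude check against the bound
        z = z * z + c;                     // complex multiply and add
    }
    return maxIter;
}
```

Points such as 0 and -1 never escape and return maxIter, while 2 + 2i escapes after one iteration. A full image applies this function to every pixel mapped onto the complex plane, which is what the sample's map and rcall variants parallelize.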
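The sobel workload computes the gradients along X and Y separately and outputs the larger of the two, clamped to the 8-bit range. The following serial C++ sketch shows that computation with the standard 3x3 Sobel kernels. Setting border pixels to 0 is a simplifying assumption of this sketch, and sobelMax is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sobel-style edge filter (hypothetical name) on a row-major 8-bit image.
// Outputs max(|Gx|, |Gy|) clamped to 255. Border pixels are left at 0.
std::vector<std::uint8_t> sobelMax(const std::vector<std::uint8_t>& img, int width, int height) {
    assert(static_cast<int>(img.size()) == width * height);
    std::vector<std::uint8_t> out(img.size(), 0);
    auto at = [&](int x, int y) { return static_cast<int>(img[y * width + x]); };
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            // Gradient along X: kernel [-1 0 1; -2 0 2; -1 0 1]
            const int gx = (at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1))
                         - (at(x - 1, y - 1) + 2 * at(x - 1, y) + at(x - 1, y + 1));
            // Gradient along Y: kernel [-1 -2 -1; 0 0 0; 1 2 1]
            const int gy = (at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1))
                         - (at(x - 1, y - 1) + 2 * at(x, y - 1) + at(x + 1, y - 1));
            const int g = std::max(std::abs(gx), std::abs(gy));
            out[y * width + x] = static_cast<std::uint8_t>(std::min(g, 255));  // clamp to 8 bits
        }
    }
    return out;
}
```

On an image with a vertical step from 0 to 255, the two columns adjacent to the step produce 255 and the flat regions produce 0.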
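Several workloads (convolve, gauss-convolve and the seismic convolution) rely on separable convolution: a 1D pass along X followed by a 1D pass along Y. A minimal serial C++17 sketch follows. Out-of-range neighbours are clamped to the nearest edge pixel, which resembles the boundary behaviour described for shiftSticky; that choice, the float image type and the names convolve1D and separableConvolve are assumptions of this sketch. Strictly speaking the loop computes a correlation, which coincides with convolution for symmetric kernels such as a Gaussian.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Image = std::vector<float>;  // row-major, width * height floats

// One 1D pass (hypothetical name) with an odd-length kernel along X or Y.
// Neighbour indices outside the image are clamped to the nearest edge.
Image convolve1D(const Image& in, int width, int height,
                 const std::vector<float>& kernel, bool alongX) {
    assert(kernel.size() % 2 == 1);
    assert(static_cast<int>(in.size()) == width * height);
    const int r = static_cast<int>(kernel.size()) / 2;
    Image out(in.size(), 0.0f);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int k = -r; k <= r; ++k) {
                const int sx = alongX ? std::clamp(x + k, 0, width - 1) : x;
                const int sy = alongX ? y : std::clamp(y + k, 0, height - 1);
                acc += kernel[k + r] * in[sy * width + sx];
            }
            out[y * width + x] = acc;
        }
    return out;
}

// Separable 2D convolution (hypothetical name): X pass, then Y pass.
Image separableConvolve(const Image& in, int width, int height,
                        const std::vector<float>& kernel) {
    return convolve1D(convolve1D(in, width, height, kernel, true),
                      width, height, kernel, false);
}
```

With the kernel {0.25, 0.5, 0.25}, a unit impulse spreads into the 3x3 pattern [1 2 1; 2 4 2; 1 2 1] / 16, and a constant image is left unchanged because the kernel sums to 1.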
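The gemm workload's second implementation accumulates, for each row of A, the products of A's elements with the corresponding rows of B into a temporary of N elements, then uses that temporary to update a row of C. A serial C++ sketch of that row-oriented formulation follows. As in the sample, there are no scaling coefficients for AB and C. The dimension names (A is M x K, B is K x N) and the function name gemmAccumulate are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C += A * B for row-major matrices (hypothetical name):
// A is M x K, B is K x N, C is M x N.
void gemmAccumulate(const std::vector<double>& A, const std::vector<double>& B,
                    std::vector<double>& C, int M, int K, int N) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);
    assert(C.size() == static_cast<std::size_t>(M) * N);
    std::vector<double> rowAcc(N);  // temporary 1D array of N elements
    for (int i = 0; i < M; ++i) {
        std::fill(rowAcc.begin(), rowAcc.end(), 0.0);
        for (int k = 0; k < K; ++k) {
            const double a = A[i * K + k];  // element A(i, k) scales row k of B
            for (int j = 0; j < N; ++j) rowAcc[j] += a * B[k * N + j];
        }
        // Update row i of C (the step the sample performs with replaceRow).
        for (int j = 0; j < N; ++j) C[i * N + j] += rowAcc[j];
    }
}
```

For A = [1 2; 3 4], B = [5 6; 7 8] and C initialized to all ones, C becomes [20 23; 44 51].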
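The svm workload computes the distance between a 1D input and each row of a 2D data set, applies an exponential to those distances, and reduces the result to a scalar. A common decision function with that shape is the RBF-kernel form f(x) = sum_i w_i exp(-gamma ||x - s_i||^2) + bias. The following serial C++ sketch assumes that form; the weights, gamma, bias and the name svmDecision are assumptions of this sketch, not parameters documented for the sample.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RBF-kernel SVM decision value (hypothetical name):
//   f(x) = sum_i weights[i] * exp(-gamma * ||x - s_i||^2) + bias,
// where s_i is row i of a row-major (weights.size() x x.size()) matrix.
double svmDecision(const std::vector<double>& x, const std::vector<double>& supportVectors,
                   const std::vector<double>& weights, double gamma, double bias) {
    const std::size_t dim = x.size();
    assert(dim > 0 && supportVectors.size() == weights.size() * dim);
    double sum = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        double dist2 = 0.0;  // first reduction: squared distance to row i
        for (std::size_t j = 0; j < dim; ++j) {
            const double d = x[j] - supportVectors[i * dim + j];
            dist2 += d * d;
        }
        sum += weights[i] * std::exp(-gamma * dist2);  // second reduction
    }
    return sum + bias;
}
```

The two loops correspond to the two reductions described for the sample (addReduce over each row's differences, then over the exponentiated values); the sign of the result is what would classify a new observation.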