Accelerating Polygonization of Skeletal Implicit Surfaces on Many-Core architectures

Published: 04/12/2010, Last Updated: 04/12/2010


Author: Pourya Shirazianpourya.jpg


Pourya Shirazian, University of Victoria, BC, Canada.  At UVIC, Pourya is a PHD student in Computer Science under the supervision of Dr.Brian Wyvill. His main research interests are implicit modeling and visualization of deformable models.
His current project is real-time rendering of skeletal implicit models which will benefit sketch-based modeling systems, enabling more complex models to be designed in real time.



Interactive design systems that use polygonization methods for rendering implicit surfaces have difficulty in responding interactively to user changes. To address this, we propose a novel approach for tessellating Blobtree skeletal implicit models using a divide and conquer strategy controlled by bounding volumes. Our algorithm leverages the parallel processing power available on Many-Core architectures.



Proposed Algorithm

The BlobTree creates an intuitive mechanism for organizing skeletal implicit primitives with CSG, blending, warping and affine transformations [1]. Models are visualized as an iso-surface in a scalar field; using a space partitioning algorithm, all that is needed is to find a field value at the vertices of a regular voxel grid. This is done by traversing the BlobTree for each point. Points with field values greater than the iso-value are considered to be “inside” the model. When polygonizing [3] a model using a cellular grid, this process has to be repeated on the order of times, where n is derived from the cell size parameter as follows:


Using conventional marching-cube style methods, the cost of field evaluations will be the bottleneck of the system. Our proposed algorithm is as follows:




1. Each skeletal primitive s_i contains a list of grid internal points φ_i. (φ_i has at least one element, which must be either on, or very close to, the primitive’s skeleton.)

2. We compute the bounding box of the entire model. (The bounding volume of each skeletal primitive is computed using the skeleton and radius of influence; tree nodes will modify the bounding boxes accordingly.)

3. Divide the bounding box into “thread volumes”, one thread volume for each physical thread available on the processor.

4. Each thread volume is then associated with a list of voxels. For each list of voxels, we must generate an associated list of triangles. We proceed as follows:

The first problem is to guarantee a surface seed point for each primitive. (Steps a to d won’t be a bottleneck in the system).

4a. For each threading volume tv, we consider S={s_1,s_2,…,s_q}, the set of all primitive nodes in the BlobTree whose bounding box intersects the thread volume.

4b. For each element s_i in S, all internal points in φ_i are checked against tv. Once a point is found which is inside tv, rays are fired starting from that point to each corner of tv. If no internal point inside tv is found, the rays are shot from the center of tv to all of its corners.

4c.  We use root finding methods to find the seed point on the surface inside tv.

4d.  At this point, some primitive nodes may not have seed points. For each s_i with no seed point, another ray is shot from its internal point φ_i to each skeleton point of s_i. We then repeat step (c) until each list of potential internal points is resolved to one seed point.

The second problem is to find all cubic voxels intersecting the iso-surface inside each thread volume tv.

4e.The seed point on the surface inside the current tv is used as the origin of a local coordinate system. A cubic voxel of the defined size is created at the origin and added to the current list of voxels V.

4f. Field values at all corners of each intersecting voxel in V are computed and, for each edge that intersects with the surface (i.e. each edge that has one inside and one outside endpoint), the three neighbouring cubes to that edge are added to V, provided that they are inside tv.

4g. For each voxel, the hash function in [3] is used to generate appropriate triangles from that configuration. All generated triangles are added to T.

5. Finally, the list of triangles produced for each thread volume is merged into a global list and then passed to the renderer.

Figure 1. Flow chart describing the algorithm

We avoid redundant space subdivisions that will slow down the surface search by using a continuation method per voxel. The only problem that might arise in some cases is stitching the mesh along boundaries of the tv; since we are going to render the mesh directly, this will not produce visual defects.

We implemented this algorithm using two different methods. Our initial implementation used Intel® Threading Building Blocks (Intel® TBB) to apply the Bounding Box subdivision on a multi-core CPU. Using TBB, we divided the algorithm into 2 major steps. First we applied a Parallel_For template function on a three-dimensional bounding box. Inside each voxel, we created a list of all cubes traversing the manifold at that voxel. This cube list is further triangulated using a Parallel_Reduce operation.

The second implementation used a Many-Core programming paradigm. A host application marshals tree structure into a data package and sends it as a message to a device application. On device side, the package is opened and converted to a light weight tree. We used separate threads to create the same two steps for division to voxels and for the triangulation of each voxel. The resulting mesh is finally directly rendered to the device context.

In terms of performance our CPU implementation shows a maximum speed-up of 10x when compared to serial marching cubes method. In addition, number of field value evaluations per triangle has been reduced at least 4 times due to the underlying continuation method used and caching of field values when processing each thread volume.

Figure 2. Comparison of performance for rendering each frame at different cell-sizes. blue, red and green bars represent MC, Continuation and Parsip respectively. Vertical axis shows logarithmic rendering time in hundreds of seconds.

Figure 3. Comparison of field value evaluations per triangle at different cell-sizes. blue, red and green bars represent MC, Continuation and Parsip respectively.

Figure 4. Number of threads vs. Polygonization time in milliseconds.



  1. Wyvill, B., Guy, A., & Galin, E. (1999). Extending the CSG Tree - Warping, Blending and Boolean Operations in an Implicit Surface Modeling System. In Computer Graphics Forum (Vol. 18, p. 149–158).
  2. Bloomenthal, J. (1994). An implicit surface polygonizer. Graphics gems IV, 1, 324–349. Citeseer.
  3. Wyvill, G., McPheeters, C., and Wyvill, B., (1986) Data Structure for Soft Objects, In The Visual Computer, (Vol. 2, Num. 4, p. 227-234).
  4. Complete paper on this work: /content/dam/develop/external/us/en/documents/parsip-extendedabstract-167043.pdf
  5. Poster Presentation from Grand 2010 (Ottowa,Canada): /content/dam/develop/external/us/en/documents/parsip-poster-167043.pdf

Advisor: Brian Wyvill

brian.jpgBrian Wyvill obtained his PhD in computer graphics at the University of Bradford and subsequently worked as a Post-Doc at the Royal College of Art in London. In the late 1970’s he worked on sequences for the movie, Alien. Brian spent 25 years at the University of Calgary, and now holds a Canada Research Chair at the University of Victoria, where he is doing research into implicit modeling and animation, and non-photo realistic rendering.





Further Resources:






Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804