Big Datasets from Small Experiments

By Andrey Vladimirov, Published: 07/06/2017, Last Updated: 07/06/2017

Prof. Dunham's experiment and Poincaré plot

Modern experiments produce lots of data. Abundant data is not exclusive to the "big gun" institutions, such as observatories and particle colliders. It is also the norm in modest-size labs working on anything from genomics to microscopy. Even outside of science, you don't need to go far to find lots of data. With the Internet of Things (IoT) at play, a modern smart home is a continuous source of big datasets! With data collection as easy as it is, how does one analyze the data efficiently?

The work of Prof. Jeffrey Dunham connects real-world phenomena, data collection, and computing in a very pure experiment. He has built a tabletop-scale chaotic pendulum equipped with a high-precision rotary encoder. The pendulum produces hundreds of gigabytes of data per day. This data reveals the strange attractor of the pendulum, which is a fractal. This manifestation of "order in chaos" is not only a thing of beauty. It has roots in chaos theory, which also applies to climate studies, biology, cryptography, and technology. However, the amazing fractal structure of the data emerges only with proper post-processing. "Proper" means that the experimenter must scan the parameter space of the Savitzky-Golay filter. For each point in that parameter space, the computationally expensive filter must be applied to the entire dataset. For good science in this experiment, computational performance is paramount.
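To make the cost of that parameter scan concrete, here is a minimal sketch in plain Python. It is not Prof. Dunham's code. It assumes the scanned parameters are the filter's half-window width and polynomial order, and every function name in it (solve_linear, savgol_weights, savgol_smooth, scan_parameters) is hypothetical. The point it illustrates is only that each parameter pair requires a full pass over the dataset.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def savgol_weights(half_width, order):
    """Smoothing weights of a Savitzky-Golay filter.

    A polynomial of the given order is least-squares fitted to the
    2*half_width + 1 samples of the window and evaluated at its center.
    Requires order < 2*half_width + 1.
    """
    offsets = range(-half_width, half_width + 1)
    # Normal-equation matrix A^T A, where A[i][j] = offset_i ** j.
    ata = [[sum(float(i) ** (j + k) for i in offsets)
            for k in range(order + 1)] for j in range(order + 1)]
    # Row 0 of (A^T A)^-1 A^T gives the fitted value at offset 0.
    c = solve_linear(ata, [1.0] + [0.0] * order)
    return [sum(c[j] * float(i) ** j for j in range(order + 1))
            for i in offsets]


def savgol_smooth(samples, half_width, order):
    """Filter the interior points; the window never runs off the edges."""
    w = savgol_weights(half_width, order)
    return [sum(w[k] * samples[i - half_width + k] for k in range(len(w)))
            for i in range(half_width, len(samples) - half_width)]


def scan_parameters(samples, half_widths, orders):
    """Apply the filter to the whole dataset once per parameter pair."""
    results = {}
    for hw in half_widths:
        for order in orders:
            if order >= 2 * hw + 1:
                continue  # not enough points in the window for this fit
            results[(hw, order)] = savgol_smooth(samples, hw, order)
    return results


if __name__ == "__main__":
    data = [float(t * t) for t in range(10)]  # stand-in for encoder data
    scan = scan_parameters(data, half_widths=[2, 3], orders=[2, 3])
    print(sorted(scan))
```

The work grows as (number of parameter pairs) x (dataset length) x (window size), and the parameter pairs are independent of one another, which is why a scan like this is a natural candidate for vectorization and parallel execution.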

In his upcoming presentation in the Modern Code Contributed talks ("MC² Series"), Prof. Dunham shares his experience with this computational challenge. He talks about the modern code practices that allowed him to shrink the data processing time from hours to fractions of a second. Two factors made that possible. The first is the use of an Intel® Xeon Phi™ processor (formerly Knights Landing). The second is a thoughtful approach to parallel programming. Prof. Dunham also talks about probing the peak performance of these processors, the roofline model, and the importance of vector arithmetic.
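For readers new to the roofline model, the idea fits in a few lines: attainable performance is capped either by the processor's peak arithmetic rate or by memory bandwidth multiplied by the code's arithmetic intensity (floating-point operations per byte moved), whichever is lower. The sketch below is an illustration only. The function names are hypothetical, and the peak and bandwidth figures are placeholders, not measurements of any Intel Xeon Phi processor.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Upper bound on performance in GFLOP/s under the roofline model.

    arithmetic_intensity: FLOPs performed per byte of memory traffic
    peak_gflops:          peak arithmetic throughput in GFLOP/s
    bandwidth_gbs:        sustained memory bandwidth in GB/s
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a code stops being bandwidth-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    PEAK = 3000.0      # placeholder value, GFLOP/s
    BANDWIDTH = 400.0  # placeholder value, GB/s
    for intensity in (0.5, 2.0, 7.5, 20.0):
        bound = roofline_gflops(intensity, PEAK, BANDWIDTH)
        print(intensity, bound)
```

A kernel whose arithmetic intensity sits left of the ridge point is limited by memory traffic, so restructuring it to reuse data pays off more than adding arithmetic throughput. To the right of the ridge point, vectorization and full use of the cores matter most.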

Tune in to the webinar on July 11, 2017, or watch a recording after this date at

