Storage developers can get an overview of how to generate cryptographic hashes quickly, which can dramatically improve the performance of deduplication software.
Hi. I'm Praveen from Intel. In this video, we're going to talk about the Intel® Intelligent Storage Acceleration Library (Intel® ISA-L) cryptographic hashing code sample. You can find this code sample on the Intel® Developer Zone through the links provided.
Intel® ISA-L gives storage developers building deduplication software the ability to generate cryptographic hashes extremely fast, which can radically improve deduplication performance. Because the throughput of the algorithms is so high, on the order of multiple gigabytes per second per core, keeping all the lanes full may pose a challenge from a threading perspective.
This example describes a sample application, including downloadable source code, that demonstrates how to use the Intel ISA-L cryptographic hash feature. Its threading model demonstrates a design pattern similar to the producer-consumer pattern, which can be used to keep the lanes full.
In computing, the producer-consumer problem is an example of multiprocess synchronization. The problem describes two processes, the producer and the consumer, which share a common fixed-size buffer used as a queue.
The producer's job is to generate data, put it into the buffer, sleep for some amount of time, and start again. At the same time, the consumer consumes the data, that is, removes it from the buffer one piece at a time.
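As a rough illustration of the pattern just described, here is a minimal sketch in Python. It is not taken from the sample application: the function name `run_bounded_buffer` and the `None` sentinel are assumptions made for illustration, and the producer's sleep is left out to keep it short.

```python
import queue
import threading


def run_bounded_buffer(num_items=5, capacity=2):
    """Move num_items integers from one producer to one consumer."""
    buffer = queue.Queue(maxsize=capacity)  # the shared fixed-size buffer
    consumed = []

    def producer():
        for item in range(num_items):
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(None)      # hypothetical sentinel: nothing more to produce

    def consumer():
        while True:
            item = buffer.get()  # blocks while the buffer is empty
            if item is None:
                break
            consumed.append(item)  # "consume" one piece at a time

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

With one producer and one consumer, the items come out in the order they went in, because the queue is first-in, first-out.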
The sample application produces output to help characterize the level of parallelism necessary to saturate the single core computing the hashes. A variable number of producer threads from one to sixteen will fill a single buffer with data chunks, while a single consumer thread will take data chunks from the buffer and calculate cryptographic hashes using the Intel ISA-L implementations.
You can choose the number of producer threads submitting data, which can be two, four, eight, or sixteen, and the type of hash: MD5, SHA-1, SHA-256, or SHA-512. The example will produce output that shows the utilization of the consumer thread and the overall wall-clock time.
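To make the choice of hash type concrete, here is a small sketch that maps the four names to hash functions. Note that it uses Python's standard `hashlib` as a stand-in, not the Intel ISA-L routines; the `HASHES` table and `hash_chunk` helper are hypothetical names.

```python
import hashlib

# Stand-ins from Python's standard library, NOT the Intel ISA-L routines.
HASHES = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
    "sha512": hashlib.sha512,
}


def hash_chunk(hash_name, chunk):
    """Return the hex digest of one data chunk using the chosen hash type."""
    return HASHES[hash_name](chunk).hexdigest()
```

For example, `hash_chunk("sha256", data)` returns a 64-character hex string, and `hash_chunk("sha512", data)` returns 128 characters.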
Each producer is assigned one chunk in which it will submit data. On each iteration, the producer waits until its chunk is ready to write, then fills it with data and sleeps for the appropriate amount of time to simulate the time it takes to generate the data.
The consumer will repeatedly wait for some chunks of data to be ready for read, submit each of them to be hashed, mark those chunks ready for write, wait for the jobs to be done, and unlock the mutex and notify all the waiting threads so the producers can start filling the chunks again.
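The producer and consumer loops described here can be sketched roughly as follows. This is an assumption-laden illustration rather than the sample's actual code: `run_chunk_exchange` is a hypothetical name, `hashlib.sha256` stands in for the Intel ISA-L hashing calls, a slot holding `None` represents "ready for write", and the producer's sleep is omitted.

```python
import hashlib
import threading


def run_chunk_exchange(num_producers=2, iterations=3, chunk_size=1024):
    cond = threading.Condition()      # one mutex plus one condition variable
    chunks = [None] * num_producers   # None means "ready for write"
    digests = []
    total_jobs = num_producers * iterations

    def producer(slot):
        for i in range(iterations):
            with cond:
                # Wait until this producer's own chunk is ready for write.
                cond.wait_for(lambda: chunks[slot] is None)
                # Fill it; the chunk is now ready for read.
                chunks[slot] = bytes([(slot + i) % 256]) * chunk_size
                cond.notify_all()
            # The sample sleeps here to simulate generation time (omitted).

    def consumer():
        done = 0
        while done < total_jobs:
            with cond:
                # Wait until at least one chunk is ready for read.
                cond.wait_for(lambda: any(c is not None for c in chunks))
                for slot, data in enumerate(chunks):
                    if data is not None:
                        # Stand-in for submitting a hash job to Intel ISA-L.
                        digests.append(hashlib.sha256(data).hexdigest())
                        chunks[slot] = None  # mark ready for write again
                        done += 1
                cond.notify_all()  # wake the waiting producers

    threads = [threading.Thread(target=producer, args=(s,))
               for s in range(num_producers)]
    threads.append(threading.Thread(target=consumer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return digests
```

Each producer only ever touches its own slot, so the single condition variable is enough to coordinate everyone: producers wake the consumer after filling a chunk, and the consumer wakes all producers after emptying the chunks it hashed.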
The program accepts multiple command-line arguments. One of them is the number of producers. That varies from two to sixteen. Another important one is the speed. The speed argument is used to choose how fast each producer is generating data.
If the speed is 100 MB per second, each producer thread would take one second to generate a 100-MB chunk. The faster the speed, the less time the consumer thread will have to hash the data before new chunks are available. This means the consumer thread usage will be higher.
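The arithmetic behind the speed argument can be sketched like this. The helper names are hypothetical, and treating 1 MB as 1,000,000 bytes is an assumption; the ratios are the same with any consistent unit.

```python
MB = 1000 * 1000  # assumption: decimal megabytes


def seconds_per_chunk(chunk_size_bytes, speed_bytes_per_sec):
    """Time one producer needs to generate a single chunk."""
    return chunk_size_bytes / speed_bytes_per_sec


def aggregate_rate(num_producers, speed_bytes_per_sec):
    """Combined rate at which all producers offer data to the consumer."""
    return num_producers * speed_bytes_per_sec
```

So `seconds_per_chunk(100 * MB, 100 * MB)` is one second, as in the example above, and doubling the speed halves that time. With sixteen producers at that speed, the single consumer is offered `aggregate_rate(16, 100 * MB)` bytes per second, which is sixteen times what one producer supplies.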
Another argument is the chunk size, which is the size of the data chunk filled by each producer on each iteration. The consumer then submits each data chunk to the hash function.
The last argument is the total size, which is the total amount of data to be generated and hashed. Knowing this and the other parameters, the program knows how many chunks will need to be generated in total and how many hash jobs will be submitted in total. When all the data has been hashed, we display the results, including the thread usage. This is computed by comparing the amount of time we waited for chunks to be ready and the amount of time we actually spent hashing the data.
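Working out the number of chunks from the total size and the chunk size might look like the sketch below. The function name is hypothetical, and rounding up when the total is not an exact multiple of the chunk size is an assumption; the sample may handle that case differently.

```python
def total_chunks(total_size_bytes, chunk_size_bytes):
    """Number of chunks (and therefore hash jobs) needed to cover the total size."""
    if chunk_size_bytes <= 0:
        raise ValueError("chunk size must be positive")
    # Integer ceiling division: a final partial chunk still counts as one job.
    return -(-total_size_bytes // chunk_size_bytes)
```

Since the consumer submits one hash job per chunk, this number is also the total number of hash jobs.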
Hash speed is the effective speed at which the Intel ISA-L crypto functions hash the data. The clock for this starts running as soon as at least one data chunk is available and stops when all the chunks have been hashed.
Finally, we compare how long we spent waiting for chunks of data to be available to how long the consumer thread has been running in total. Any usage value lower than 100 percent shows that the consumer thread was able to keep up with the producers and had to wait for new chunks of data. A value very close to 100 percent shows that the consumer thread was consistently busy and was not able to outrun the producers.
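The two reported figures can be sketched as follows. The exact formulas are assumptions based on the description: usage is taken here as the share of the consumer's total running time that was not spent waiting, and hash speed as bytes hashed divided by the elapsed time on the hashing clock. The function names are hypothetical.

```python
def thread_usage_percent(wait_seconds, total_seconds):
    """Share of the consumer's running time not spent waiting for chunks."""
    if total_seconds <= 0:
        raise ValueError("total running time must be positive")
    return 100.0 * (1.0 - wait_seconds / total_seconds)


def hash_speed(total_bytes_hashed, elapsed_seconds):
    """Effective hashing rate in bytes per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return total_bytes_hashed / elapsed_seconds
```

For instance, a consumer that ran for 8 seconds and waited for 2 of them has a usage of 75 percent under this definition, while a consumer that never waited reports 100 percent.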
Running this example pinned to cores 3 and 4 with the taskset command shows that, when the program runs as a single thread on cores 3 and 4, it spends 55 percent of its time waiting for the producer to submit data.
Similarly, in this run we set the number of producers to 16, with the same chunk size and total size, and run the program with the taskset command on cores 3 to 20 for the sixteen producer threads. Now the program runs as 16 threads on cores 3 to 20, and it spends only two percent of its time waiting for the producers to submit data.
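The video pins the program from the command line with taskset. A similar effect can be had from inside a Python process on Linux, sketched below. This is only an illustration: `pin_to_cores` is a hypothetical helper, `os.sched_setaffinity` is not available on every platform, and the sketch quietly skips any requested core the machine does not have.

```python
import os


def pin_to_cores(requested):
    """Restrict this process to the requested cores where possible.

    Returns the resulting set of allowed cores, or None if the platform
    does not support CPU affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. not Linux; nothing is changed
    available = os.sched_getaffinity(0)   # cores we may currently run on
    usable = set(requested) & available   # ignore cores that do not exist
    if not usable:
        return available                  # leave the affinity unchanged
    os.sched_setaffinity(0, usable)       # 0 means "the calling process"
    return os.sched_getaffinity(0)
```

Calling `pin_to_cores(range(3, 21))` asks for cores 3 to 20, matching the second run described above, on a machine that has them.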
As demonstrated in this example, you can change the parameters to suit your CPU configuration and use the Intel ISA-L cryptographic hashing feature to get the best performance for your storage applications.
Thanks for watching. To learn more about Intel ISA-L, check out the links below. And make sure to watch the rest of this video playlist on Intel ISA-L. Don't forget to like this video and subscribe.