I've been debugging our application to remove all memory leaks, because we would like to run certain things continuously in a loop (booth mode, so to speak). Under Valgrind on Linux, a single run of our program no longer produces any warnings about our own code, but it produces quite a lot of warnings from code inside Intel's SDK.
This is an example of what happens when we run one iteration of our loop:
==9600== HEAP SUMMARY:
==9600== in use at exit: 48,908,822 bytes in 350,902 blocks
==9600== total heap usage: 982,292 allocs, 631,390 frees, 278,624,761 bytes allocated
==9600== LEAK SUMMARY:
==9600== definitely lost: 23,849 bytes in 595 blocks
==9600== indirectly lost: 49,871 bytes in 43 blocks
==9600== possibly lost: 7,426,119 bytes in 43,171 blocks
==9600== still reachable: 41,408,983 bytes in 307,093 blocks
==9600== suppressed: 0 bytes in 0 blocks
==9600== Reachable blocks (those to which a pointer was found) are not shown.
==9600== To see them, rerun with: --leak-check=full --show-reachable=yes
==9600== ERROR SUMMARY: 6789900 errors from 1213 contexts (suppressed: 32 from 6)
We've debugged our CL API calls, and all memory objects, kernels, programs and so on are released (for each call that increases a reference count there is one call that decrements it), so I'm fairly confident that we are not leaving anything dangling. When we use our own OpenCL mock-up library (which does nothing except pretend to be an OpenCL implementation), there are no memory leaks. In addition, we use a custom memory manager that allocates a single large block, so our own allocations (apart from that large block) do not show up in the leak summary.
As an example of what Valgrind reports:
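To illustrate the kind of retain/release bookkeeping meant above, here is a minimal sketch. It is not our actual code: `RefCountAudit` and its methods are hypothetical names, and it deliberately does not call into OpenCL. It only records handles, so it would have to be wired in next to the real clCreate*/clRetain*/clRelease* calls.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of OpenCL): tracks the retain/release
// balance per handle so unbalanced objects can be listed after an iteration.
class RefCountAudit {
public:
    // Call wherever a clCreate*/clRetain* call has succeeded.
    void retained(const void* handle, const std::string& kind) {
        Entry& e = live_[handle];
        e.kind = kind;
        ++e.count;
    }

    // Call wherever the matching clRelease* call is made.
    // Returns false if the handle is not tracked (e.g. a double release).
    bool released(const void* handle) {
        auto it = live_.find(handle);
        if (it == live_.end()) return false;
        if (--it->second.count == 0) live_.erase(it);
        return true;
    }

    // Kinds of all objects that still hold at least one reference.
    std::vector<std::string> outstanding() const {
        std::vector<std::string> out;
        for (const auto& kv : live_) out.push_back(kv.second.kind);
        return out;
    }

    std::size_t outstanding_count() const { return live_.size(); }

private:
    struct Entry {
        std::string kind;
        int count = 0;
    };
    std::map<const void*, Entry> live_;
};
```

If `outstanding_count()` is zero at the end of an iteration, every tracked increment had a matching decrement on our side, which is the situation described above.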
==9600== 48 bytes in 1 blocks are definitely lost in loss record 742 of 1,781
==9600== at 0x4C28B42: operator new(unsigned long) (vg_replace_malloc.c:261)
==9600== by 0x1538F9E4: Intel::OpenCL::ClangFE::CompileTask::Execute() (in /usr/lib/OpenCL/vendors/intel/libclang_compiler.so)
==9600== by 0x1489C1B2: Intel::OpenCL::TaskExecutor::execute_command(Intel::OpenCL::TaskExecutor::ITaskBase*) (in /usr/lib/OpenCL/vendors/intel/libtask_executor.so)
==9600== by 0x1489F7B3: Intel::OpenCL::TaskExecutor::in_order_executor_task::execute() (in /usr/lib/OpenCL/vendors/intel/libtask_executor.so)
==9600== by 0x14CCF4E3: tbb::internal::custom_scheduler::local_wait_for_all(tbb::task&, tbb::task*) (in /usr/lib/OpenCL/vendors/intel/libtbb.so.2)
==9600== by 0x14CCD1C7: tbb::internal::arena::process(tbb::internal::generic_scheduler&) (in /usr/lib/OpenCL/vendors/intel/libtbb.so.2)
==9600== by 0x14CCC11A: tbb::internal::market::process(rml::job&) (in /usr/lib/OpenCL/vendors/intel/libtbb.so.2)
==9600== by 0x14CCA44B: tbb::internal::rml::private_worker::run() (in /usr/lib/OpenCL/vendors/intel/libtbb.so.2)
==9600== by 0x14CCA3C5: tbb::internal::rml::private_worker::thread_routine(void*) (in /usr/lib/OpenCL/vendors/intel/libtbb.so.2)
==9600== by 0x54F9D8B: start_thread (pthread_create.c:304)
==9600== by 0x7D1404C: clone (clone.S:112)
This also shows when we keep the program running: the memory usage (as reported by the OS) increases steadily, sometimes faster, sometimes slower. This makes it somewhat difficult to run anything continuously, not to mention applications that actually require continuous operation.
Do we have any options other than periodically restarting our program to free the leaked memory? Do you have plans to clean up the code?
When running on Windows with OpenGL interoperability, the memory leaks go through the roof. Within 10 minutes hundreds of megabytes are wasted, which is quite a lot compared to the AMD APP SDK (GPU), which leaks only 40 MB per hour of looping.
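For anyone who wants to reproduce the growth, one way to log it per iteration on Linux is to read the `VmRSS` line from `/proc/self/status`. This is only a sketch: the function names are made up for illustration, and it assumes the usual `VmRSS:   <number> kB` layout of that file.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Extracts the VmRSS value (in kB) from text in /proc/<pid>/status format.
// Returns -1 if no parsable VmRSS line is present.
long parse_vm_rss_kb(std::istream& status) {
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            if (!(fields >> kb)) return -1;
            return kb;
        }
    }
    return -1;
}

// Resident set size of the current process in kB (Linux only), or -1.
long current_rss_kb() {
    std::ifstream status("/proc/self/status");
    return status ? parse_vm_rss_kb(status) : -1;
}
```

Calling `current_rss_kb()` after each loop iteration and printing the difference from the previous value gives a rough per-iteration growth figure. It measures resident memory rather than leaked bytes, so it complements the Valgrind numbers instead of replacing them.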
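To be concrete about the restart workaround we would like to avoid: on Linux it could look roughly like the sketch below, where a batch of iterations runs in a forked child so that the OS reclaims everything when the child exits. `run_batch_in_child` and `one_iteration` are hypothetical names, and the sketch assumes the parent process has not initialized OpenCL (or started any threads) before forking.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs up to `iterations` calls of `one_iteration` in a forked child process,
// stopping early if an iteration returns non-zero.
// Returns the child's exit code, or -1 if fork/wait failed or the child
// terminated abnormally. All memory the child leaked is freed on its exit.
int run_batch_in_child(int (*one_iteration)(), int iterations) {
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        // Child: assumed to set up OpenCL inside one_iteration, not before fork.
        int rc = 0;
        for (int i = 0; i < iterations && rc == 0; ++i) rc = one_iteration();
        _exit(rc);  // exit without running the parent's atexit handlers
    }

    // Parent: wait for the child, retrying if interrupted by a signal.
    int status = 0;
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);
    if (waited != pid) return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent would call this in an endless loop with some batch size. It works around the symptom only, and it costs a full OpenCL start-up and kernel compilation per batch.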