<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Wed, 25 Nov 2009 11:53:53 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/feed" rel="self" type="application/rss+xml" />
    <title>Intel Software Network - <![CDATA[ Pipeline buffer between stages? ]]> feed</title>
    <link>http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Re: Pipeline buffer between stages?</title>
      <description><![CDATA[ <em>"sometimes the first and last pixel does not match"<br /></em>I would suggest holding a magnifying glass over your own code first. ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</link>
      <pubDate>Wed, 28 Oct 2009 23:40:28 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: Pipeline buffer between stages?</title>
      <description><![CDATA[ <div style="margin:0px;">
<div id="quote_reply" style="width: 100%; margin-top: 5px;">
<div style="margin-left:2px;margin-right:2px;">Quoting - <a href="/en-us/profile/412573">Neeley</a></div>
<div style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"><em>I have set up a pipeline that reads image data from a file -&gt; performs a filter -&gt; performs a filter -&gt; outputs the result to the screen. Just to be sure that I had things set up correctly (and that I understood what I was doing), I replaced the first and last pixel with a frame counter just after reading the data from the file in my input stage. Each stage takes precautions to preserve the original frame counter in these pixels. After checking these pixels at my output stage, I am finding that the counters get out of sync: sometimes the frame numbers are not sequential, and sometimes the first and last pixel does not match. I have even set each filter to serial_in_order and still get these odd results. This leads me to believe I should place buffers between my stages. So here are my questions:<br /><br />1) Should I place a buffer between each stage of the pipeline?<br />2) How large should these buffers be?<br />3) Should I use concurrent_queue to implement these buffers?<br />4) If I push and pop from within each stage's operator(), what would I return as a token?<br />5) Is there a better solution to this problem?<br /><br />I know that debugging a problem without seeing the actual code is hard, but any advice on this subject would be greatly appreciated.</em></div>
</div>
</div>
Hi,
<div><br /></div>
<div>What about using data breakpoints in a debugger to find out which thread, and which function, writes to the first and last pixel of the image? Since you say that you changed all of the stages to serial_in_order and the error did not go away, I'd assume that the problem is not with parallelism but with the image processing code.</div> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</link>
      <pubDate>Thu, 05 Nov 2009 05:30:26 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: Pipeline buffer between stages?</title>
      <description><![CDATA[ <div style="margin:0px;">
<div id="quote_reply" style="margin-top: 5px; width: 100%;">
<div style="margin-left:2px;margin-right:2px;">Quoting - <a href="/en-us/profile/334152">Anton Pegushin (Intel)</a></div>
<div style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"><em>Hi,
<div><br /></div>
<div>What about using data breakpoints in a debugger to find out which thread, and which function, writes to the first and last pixel of the image? Since you say that you changed all of the stages to serial_in_order and the error did not go away, I'd assume that the problem is not with parallelism but with the image processing code.</div>
</em></div>
</div>
</div>
<br />I have fixed some of the problems that I was having. This is what I am doing now. In the input stage, I replace the first and last pixel with the frame number. Then, in each of the pipeline stages that follow, the first operation is to copy the data from the incoming token into a local array and store the first and last pixel in local variables. When I leave the stage, I restore the first and last pixel from the stored values and copy the array into the token that I pass on. This helped a lot but did not completely fix the problem. I then switched all the stages except the first and last to parallel; now the two stages after the input act as expected, but the last one still has problems. I also started printing the test pixels to individual .txt files, one per stage, instead of printing them to the screen.<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</link>
      <pubDate>Thu, 05 Nov 2009 15:05:34 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: Pipeline buffer between stages?</title>
      <description><![CDATA[ Why copy to a "local array" (seems expensive and useless)? What happens if you initialise task_scheduler_init with argument 1? If this does not make the problem go away, what if you actually make the program serial? Are you using vector instructions or plain C++? Note that parallel filters are run right after any preceding filter on the same thread, so if there is a difference with running them serially that should provide a clue. ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</link>
      <pubDate>Thu, 05 Nov 2009 20:56:41 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: Pipeline buffer between stages?</title>
      <description><![CDATA[ <div style="margin:0px;">&gt;&gt; Why copy to a "local array" (seems expensive and useless)? <br /></div>
<br />Raf, how would you do this? <br /><br />I originally was not doing this, but then I realized that since we are passing a (void*) between filters, if you do not protect the data, filter 1 may be changing it while filter 2 is reading its input (filter 1's output). I am now thinking that I should probably wrap the memcpy in a CRITICAL_SECTION to make sure that I am not reading and writing at the same time. <br /><br />&gt;&gt; What happens if you initialise task_scheduler_init with argument 1? If this does not make the problem go away, what if you actually make the program serial? Are you using vector instructions or plain C++? Note that parallel filters are run right after any preceding filter on the same thread, so if there is a difference with running them serially that should provide a clue. <br /><br />I started with a serial implementation of this code, so yes, things work correctly when I run it serially. The only reason I started inserting the frame count into the first and last pixel was to convince myself that I did not have concurrency problems in the threaded implementation. ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</link>
      <pubDate>Fri, 06 Nov 2009 08:10:52 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: Pipeline buffer between stages?</title>
      <description><![CDATA[ <div style="margin:0px;">Why not allocate a memory buffer for each token (a portion of data processed by the filters at once), pass this buffer through all pipeline stages, and free it in the last one once the buffer is no longer needed? This way you should have no conflicts between filters. Sorry if I missed the reasons for the buffer being shared.</div> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</link>
      <pubDate>Fri, 06 Nov 2009 09:22:00 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: Pipeline buffer between stages?</title>
      <description><![CDATA[ <div style="margin:0px;">
<div id="quote_reply" style="margin-top: 5px; width: 100%;">
<div style="margin-left:2px;margin-right:2px;">Quoting - <a href="/en-us/profile/333976">Alexey Kukanov (Intel)</a></div>
<div style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"><em>
<div style="margin:0px;">Why not allocate a memory buffer for each token (a portion of data processed by the filters at once), pass this buffer through all pipeline stages, and free it in the last one once the buffer is no longer needed? This way you should have no conflicts between filters. Sorry if I missed the reasons for the buffer being shared.</div>
</em></div>
</div>
</div>
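<div>The per-token buffer approach quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from this thread. The <code>Frame</code> struct, the stage classes and <code>run_pipeline</code> are hypothetical names. The stages mimic the classic <code>tbb::filter</code> signature <code>void* operator()(void*)</code>, but a plain serial loop stands in for <code>tbb::pipeline</code> so that the sketch compiles without TBB. The input stage allocates one buffer per frame, the middle stages work on that buffer in place, and the output stage frees it.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical token payload: each frame owns a private pixel buffer.
struct Frame {
    std::uint32_t number;               // frame counter set by the input stage
    std::vector<std::uint32_t> pixels;  // buffer used by this token only
};

// Input stage: allocates a fresh Frame per token and stamps the counter into
// the first and last pixel. Returning nullptr ends the stream, like returning
// NULL from a classic tbb::filter input stage.
class InputStage {
public:
    InputStage(std::uint32_t frames, std::size_t pixels)
        : next_(0), frames_(frames), pixels_(pixels) {
        assert(pixels >= 1);  // need at least one pixel to hold the counter
    }
    void* operator()(void*) {
        if (next_ == frames_) return nullptr;
        Frame* f = new Frame{next_, std::vector<std::uint32_t>(pixels_, 7u)};
        f->pixels.front() = next_;
        f->pixels.back() = next_;
        ++next_;
        return f;
    }
private:
    std::uint32_t next_, frames_;
    std::size_t pixels_;
};

// Hypothetical middle stage: transforms the token's own buffer in place.
// No local copy and no lock, since no other stage holds this pointer now.
struct BrightenStage {
    void* operator()(void* item) {
        Frame* f = static_cast<Frame*>(item);
        for (std::size_t i = 1; i + 1 < f->pixels.size(); ++i)
            f->pixels[i] += 1;  // skip the first/last "counter" pixels
        return f;
    }
};

// Output stage: checks the counters, then frees the token's buffer.
struct OutputStage {
    std::uint32_t expected = 0;
    bool ok = true;
    void* operator()(void* item) {
        Frame* f = static_cast<Frame*>(item);
        ok = ok && f->number == expected &&
             f->pixels.front() == expected && f->pixels.back() == expected;
        ++expected;
        delete f;  // last stage owns the buffer and releases it
        return nullptr;
    }
};

// Serial stand-in for tbb::pipeline so the sketch runs without TBB.
bool run_pipeline(std::uint32_t frames, std::size_t pixels) {
    InputStage input(frames, pixels);
    BrightenStage filter1, filter2;
    OutputStage output;
    while (void* token = input(nullptr)) {
        token = filter1(token);
        token = filter2(token);
        output(token);
    }
    return output.ok && output.expected == frames;
}
```

<div>For example, <code>run_pipeline(100, 640 * 480)</code> pushes 100 frames through the stages and returns true when every counter pixel arrives intact and in order. In an actual TBB 2.x program, each stage would instead derive from <code>tbb::filter</code> (for example, the input and output stages as <code>serial_in_order</code> and the middle stages as <code>parallel</code>). The stages would be added with <code>pipeline::add_filter</code> and driven by <code>pipeline::run(max_number_of_live_tokens)</code>. Because every token carries its own buffer, parallel middle stages working on different frames never share memory, so neither a local copy nor a lock is needed.</div>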
<br />Thanks for the input. I think this answers my original question best so far. I am starting to understand that the pipeline does help with the threading, but memory concurrency is still up to the user. I should have realized this as soon as I saw that the pipeline passes pointers. I think I have found that, yes, I do need some kind of buffer between each stage of the pipeline, but those buffers only have to be large enough to ensure that no data races (concurrency issues) occur. ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</link>
      <pubDate>Fri, 06 Nov 2009 12:02:33 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: Pipeline buffer between stages?</title>
      <description><![CDATA[ <em>#5 "I originally was not doing this, but then I realized that since we are passing a (void*) between filters, if you do not protect the data while filter 2 is reading its input (filter1's output), filter 1 is changing the data."<br /></em>No, only one filter at a time is using that particular void*, and the referenced data is implicitly synchronised from one filter to the next if only plain C++ is being used. Don't handicap your program's performance with needless protection against imaginary races. The idea is actually that filters are visiting the data, not the other way around, which is important for cache locality. If you need to transform an image, then a void* value can point to two adjacent buffers, with each filter transforming the data directly from one buffer to the other (without an intermediate buffer!), although it would be better still to reuse one buffer if the transformation only needs local data (something like colour shift instead of image warp). ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</link>
      <pubDate>Fri, 06 Nov 2009 18:26:42 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: Pipeline buffer between stages?</title>
      <description><![CDATA[ <div style="margin:0px;">&gt;&gt;No, only one filter at a time is using that particular void*, and the referenced data is implicitly synchronised from one filter to the next if only plain C++ is being used. Don't handicap your program's performance with needless protection against imaginary races.<br /><br />This is exactly what I was hoping, but my test did not seem to show it. I must be doing something wrong. <br /><br />&gt;&gt;The idea is actually that filters are visiting the data, not the other way around, which is important for cache locality.<br /><br />I will have to get my head around this. Are these buffers created outside the filters, or are they members of the filters?<br /><br />&gt;&gt;If you need to transform an image, then a void* value can point to two adjacent buffers,<br /><br />Do you mean using a memory block twice the size of the image, where the first half is the input and the second half is the output? If not, I really do not understand how one pointer can point to two buffers.<br /><br />&gt;&gt; although it would be better still to reuse one buffer if the transformation only needs local data (something like colour shift instead of image warp).<br /><br />I have examples of both. I will also need to go through an image, compile a list, and work on that list in the next filter.<br /><br />Are there other examples for pipeline? I have been looking for some but cannot find anything but simple string manipulations.<br /><br />I thank everyone who has commented on this topic. I tried to figure this out on my own (I even bought and read Reinders' book) and only posted the questions here as a last resort. All of your comments are a big help.<br /></div> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</link>
      <pubDate>Fri, 06 Nov 2009 22:05:43 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: Pipeline buffer between stages?</title>
      <description><![CDATA[ <em>"I will have to get my head around this. Are these buffers created outside the filters, or are they members of the filters?"<br /></em>Dealing with buffers is your job; the pipeline only passes void* values from one filter to the next. The values can change from input to output, but the pipeline will try to apply successive filters on the same thread to improve locality. I don't know how important that is here, but to understand what you are seeing you should probably know that successive parallel filters are executed one after the other on the same thread.<br /><br /><em>"Do you mean using a memory block twice the size of the image, where the first half is the input and the second half is the output? If not, I really do not understand how one pointer can point to two buffers."<br /></em>Yes, just to avoid an intermediate copy. Or you keep them separate, and let each filter read from the input and write to a newly allocated output, which is then passed to the next filter after the input is discarded. Just avoid using an intermediate local buffer with a wasted copying action.<br /><br />Sorry I couldn't help with the actual problem, which remains a mystery. ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</link>
      <pubDate>Sat, 07 Nov 2009 01:10:58 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/69488/</guid>
      <category>ISN General</category>
    </item>
  </channel></rss>