<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Intel Software Network Blogs &#187; Asaf Shelly</title>
	<atom:link href="http://software.intel.com/en-us/blogs/author/asaf-shelly/feed/" rel="self" type="application/rss+xml" />
	<link>http://software.intel.com/en-us/blogs</link>
	<description></description>
	<lastBuildDate>Wed, 10 Feb 2010 13:05:06 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Future of Internet</title>
		<link>http://software.intel.com/en-us/blogs/2009/12/31/the-future-of-internet/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/12/31/the-future-of-internet/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 21:01:03 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Social Media & Virtual Worlds]]></category>
		<category><![CDATA[What If Software]]></category>
		<category><![CDATA[Guest Blog]]></category>
		<category><![CDATA[http://AsyncOp.com]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/12/31/the-future-of-internet/</guid>
		<description><![CDATA[Around Y2K a friend and I had an idea for a phone that works over the Internet by connecting to USB. The timing was not great because of the dot-com bubble. A few years later Skype did it.
After that I went into the area of media and Internet video and broadcasting. Part of this was [...]]]></description>
			<content:encoded><![CDATA[<p>Around Y2K a friend and I had an idea for a phone that works over the Internet by connecting to USB. The timing was not great because of the dot-com bubble. A few years later Skype did it.</p>
<p>After that I went into the area of media, Internet video, and broadcasting. Part of this was the experimental broadcast of Israeli Channel 10 on this site: <a href="http://ttvv.tv">http://ttvv.tv</a>. There were two applications that I could think of at the time. The first is a storage service that lets people share their video clips online: this is YouTube. The second application is a radio or TV channel with multiple sources. This means that everyone can bring their own content to the public, live. This has not really been implemented yet, and today we only have web blogs. Perhaps one day a shopping center will have a video room where kids can broadcast their content after school.</p>
<p>About four to five years back I had a vision of all the network routers in an organization becoming a cloud. You can read about this by searching the web for patents with my name on them. Today everybody considers the cloud to be the near future of computer systems.</p>
<p>What I really want to talk about is the next step after it: the future of the Internet two to four years from today.</p>
<p>First and foremost, anonymity will be gone. Gone, or at least cast aside. Today we log in to most websites, and most of them verify our email address. IP addresses can pinpoint our location to the area of a city and very often to a street, or to the company we work for, and this is widely available for every website to use. Facebook gave a face to the people we talk to, and we can choose not to chat with people we do not know. People who do not log in to Facebook can't interact with anyone. I am anonymous right now: you don't know where I am or where I work, and still you know more about me than anyone I meet at the shopping center near my house. You read my blog, you have a picture of my face, and you know what I think about many things. Anonymity allows people to feel free on the web, but the truth is that Google knows who you are when you are browsing the web. They need this to help you with your search results and with ads. Websites know who you are, that you came back, what other sites you visited, and who sent you. The Internet started with anonymity for porn, but in fact most porn sites will not only keep track of you but will also make their best effort to install spyware on your computer so they can keep better track of you.</p>
<p>We can still be anonymous on the web, but we can also be anonymous in a dark alley: nobody is going to recognize you there. The question is what kind of interaction you are willing to have with people who wish to stay completely anonymous...</p>
<p>The other side of innovation in the Internet and the web, which will soon link to the first, is Cloud. I am not talking about cloud computing but about cloud as a concept. We have a few clouds today. For example, eMule is a cloud to which everyone can donate their file resources: you contribute more, you get more. DNS servers on the Internet operate as a cloud. When we think of machines working together we like it. It feels right. This is because our community is a cloud. Family is a cloud, friends are a cloud; every group of people who agree to share, who agree to give in exchange for receiving, operates as a cloud, as a collective.</p>
<p>The next step for the Internet, then, is to let me identify the people around me so we can agree on cooperation. If I can locate the people near me then I get more out of the cloud. If I play online with the guy next door then he is more likely to give me a cup of sugar when I run out. You get more if your communities overlap. Friends are members of your community. Kids today have more friends online than face to face; this is because they are part of more online communities than offline ones. The more committed to and involved with a community you are, the better your privileges. Communities also have leaders.</p>
<p>If I have a nice song that I wrote I cannot share it with the rest of the world on my own. I need to find a website, a place where people with the same interests meet. A while back I had to go to a club to play chess with other people. Today I can do it online by using an application that searches for someone else with an interest in the same game. Gamers also have communities and you usually play online with your username. Today I have to go to a website to find out about new songs. Google cannot search for a song I wrote on my computer. I need to use a central server to publish that. Or do I?</p>
<p>Why can't I publish my work? I wrote a nice application and I want other programmers to benefit from this code. If I share information, I get better access to the other resources that the community (or cloud) has to offer. Why do I need a web server to share files? Why can't I borrow web-server services from the community in exchange for sharing my computer as a web server?</p>
<p>People say that the Internet has no owner and that anonymity is basic. Both arguments are false, because you cannot do anything without being identified, and everywhere you go online you are dependent on website owners. They might want to delete your post or comment, they might agree to publish your work... The real Internet begins when everyone has an equal share. Like a kibbutz: just like communism, only you are the owner of your resources and you give them only willingly, as in a democracy.</p>
<p>The time to start is now.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/12/31/the-future-of-internet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel &amp; Microsoft TechEd Europe 2009</title>
		<link>http://software.intel.com/en-us/blogs/2009/12/02/intel-amp-microsoft-teched-europe-2009/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/12/02/intel-amp-microsoft-teched-europe-2009/#comments</comments>
		<pubDate>Wed, 02 Dec 2009 17:11:16 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Intel® Atom™ Developer Program]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Berlin]]></category>
		<category><![CDATA[guest blogger]]></category>
		<category><![CDATA[TechEd]]></category>
		<category><![CDATA[TechEd2009]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/12/02/intel-amp-microsoft-teched-europe-2009/</guid>
		<description><![CDATA[Microsoft TechEd 2009 was a wonderful experience. So many T-Shirts, pens, and chances to win prizes...
There was also something technical there: a session or two, Windows 7 features, Silverlight for Windows CE C++, Parallel Computing, Visual Studio 2010, and much more...
Between sessions we visited the exhibition and had too much to eat.
It [...]]]></description>
			<content:encoded><![CDATA[<p>Microsoft TechEd 2009 was a wonderful experience. So many T-Shirts, pens, and chances to win prizes...<br />
There was also something technical there: a session or two, Windows 7 features, Silverlight for Windows CE C++, Parallel Computing, Visual Studio 2010, and much more...</p>
<p>Between sessions we visited the exhibition and had too much to eat.<br />
It took us a while to figure out the way to the event. Berlin has some very nice buildings and rivers, and also some confusing scenery. For example, a building that looks like a spaceship: very cool, but it will make you walk fast, and on the other side of the street :-)</p>
<p><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/_img_2053_.jpg" alt="Fly-by building" width="600" height="450" /></p>
<p>Special care for bicyclists with dedicated lanes on the sidewalk, but how do you figure this one out?</p>
<p><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/img_2052.jpg" alt="Stay home" width="600" height="450" /></p>
<p>Generally speaking the event was great. We even got to meet our friends once more and talk about the Intel community, and it looks like people were really interested in what Intel had to offer.</p>
<p><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/img_2084.jpg" alt="Stay home" width="600" height="450" /></p>
<p><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/img_2089.jpg" alt="Stay home" width="600" height="450" /></p>
<p><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/img_2087.jpg" alt="Stay home" width="600" height="450" /></p>
<p><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/img_2083.jpg" alt="Stay home" width="600" height="450" /></p>
<p>As always these were long and fruitful days. I learned a lot and even had a chance to teach (hopefully :) with a session called "parallel programming for embedded". You can watch the video here on the home page of my technical website: <a href="http://asyncop.com/">http://asyncop.com/</a></p>
<p>Bottom line: It is good to travel and it was a lot of fun, and there is nothing better than going <strong>back home</strong> after a long and exciting week.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/12/02/intel-amp-microsoft-teched-europe-2009/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TechEd 2009 Europe</title>
		<link>http://software.intel.com/en-us/blogs/2009/11/05/teched-2009-europe/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/11/05/teched-2009-europe/#comments</comments>
		<pubDate>Fri, 06 Nov 2009 01:05:03 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Intel® Software Network 2.0]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[guest blogger]]></category>
		<category><![CDATA[Parallel Prog. & Multi-Core]]></category>
		<category><![CDATA[TechEd]]></category>
		<category><![CDATA[www.AsyncOp.com]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/11/05/teched-2009-europe/</guid>
		<description><![CDATA[Hi All,
If you are going to the event next week in Berlin then let me know about it. Maybe we can meet face to face, and if there are enough of us perhaps even hold a group community meeting. This can be a good opportunity to meet the experts.
In any case, you are all welcome to [...]]]></description>
			<content:encoded><![CDATA[<p>Hi All,</p>
<p>If you are going to the event next week in Berlin then let me know about it. Maybe we can meet face to face, and if there are enough of us perhaps even hold a group community meeting. This can be a good opportunity to meet the experts.</p>
<p>In any case, you are all welcome to join my session titled "Parallel Programming for Embedded". I will be presenting on Friday 10:45 - 12:00.</p>
<p>At the basis of this presentation is the fact that the hardware has always been parallel. This also caused the kernel drivers to live in a parallel environment, so even though embedded devices were late to adopt Multi-Core CPUs, the people who are working with the lower levels have always been working in parallel environments.</p>
<p>The session covers parallel systems in general, side by side with embedded systems and infrastructure environments.</p>
<p>The goal of this session is to open people's eyes to the systems that have always been working in parallel and to name the principles used in these systems.</p>
<p>You can read my previous blogs to learn more about this approach. For example these:</p>
<p><a href="http://software.intel.com/en-us/blogs/2008/10/29/is-dos-the-ideal-parallel-environment-part-iv/">is dos the ideal parallel environment part iv</a></p>
<p><a href="http://software.intel.com/en-us/blogs/2009/07/27/stateful-programming-a-case-study/">stateful programming a case study</a></p>
<p>Here are a few slides from this presentation.</p>
<div id="attachment_11631" class="wp-caption alignnone" style="width: 310px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/parallel-programming-for-embedded-slide-2.jpg"><img class="size-medium wp-image-11631" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/parallel-programming-for-embedded-slide-2-300x225.jpg" alt="Parallel Programming for Embedded TechEd 2009" width="300" height="225" /></a><p class="wp-caption-text">Parallel Programming for Embedded TechEd 2009</p></div>
<div id="attachment_11632" class="wp-caption alignnone" style="width: 310px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/parallel-programming-for-embedded-slide-56.jpg"><img class="size-medium wp-image-11632" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/parallel-programming-for-embedded-slide-56-300x225.jpg" alt="USB Ping Pong" width="300" height="225" /></a><p class="wp-caption-text">USB Ping Pong</p></div>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/parallel-programming-for-embedded-slide-71.jpg"><img class="alignnone size-medium wp-image-11634" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/11/parallel-programming-for-embedded-slide-71-300x225.jpg" alt="" width="300" height="225" /></a></p>
<p>Hope to see you all there,<br />
Asaf</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/11/05/teched-2009-europe/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>State-Phase Programming</title>
		<link>http://software.intel.com/en-us/blogs/2009/09/02/state-phase-programming/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/09/02/state-phase-programming/#comments</comments>
		<pubDate>Wed, 02 Sep 2009 21:22:45 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[Guest Blog]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/09/02/state-phase-programming/</guid>
		<description><![CDATA[It has been relatively easy for us to follow the path of a serial application. Today we face the need to execute several processes in parallel and thus have several execution paths at the same time. This is harder for us to manage and keeping track of this flow is complex. The Stack-Trace is no longer the state of the application. The post speaks of using States and Phases to keep track of application operations.]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal" dir="ltr"><span style="'Times New Roman'"><span style="Times New Roman;">It has been relatively easy for us to follow the path of a serial application. The flow of the application is managed by the Stack. This makes the execution flow of a single process to be managed by CPU hardware. Today we face the need to execute several processes in parallel and thus have several execution paths at the same time. This is harder for us to manage and keeping track of this flow is complex. The hardware accelerated Stack is no longer sufficient for us and the Stack-Trace is no longer the state of the application. I have elaborated on this on a previous post called </span></span><a title="Stateful Programming - A Case Study" href="http://software.intel.com/en-us/blogs/2009/07/27/stateful-programming-a-case-study/"><span style="Times New Roman;">Stateful Programming - A Case Study</span></a><span style="'Times New Roman'"><span style="Times New Roman;">. This post analyzed the execution flow of a serial application in terms of Execution States. A system design that specifies states is beneficial for a serial application as explained in that post. When it comes to parallel applications this design step becomes a must.</span></span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The Application State is a cross-section of all states of running elements. In a serial application this can be the state of the Stack (or Stack-Trace). In a parallel application there are several processes working at the same time. Every thread is doing something or waiting for something. There are also suspended operations. Here are a few examples:</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">[1] OpenFile, ReadFile(A.txt), WriteFile(B.txt), CloseFile</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The states are: <em>(1) Initial State, (2) The File is open, (3) Data was Read, (4) Data Written, (5) File Closed</em>.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">[2] OpenFile, ReadFile(A.txt), CloseFile, UpdateUI</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The states are: <em>(1) Initial State, (2) The File is open, (3) Buffer was Read, (4) File Closed, (5) GUI Updated</em>.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">[3] LockBuffer, ReadBuffer, UnlockBuffer, UpdateUI</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The states are: <em>(1) Initial State, (2) Buffer Locked, (3) Buffer was Read, (4) Buffer Unlocked, (5) GUI Updated</em>.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The examples above are serial implementations but can also be converted into parallel implementations.</span></p>
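<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">As a rough illustration, the states of example [1] can be written down explicitly in code. This is only a sketch in Python: the names FileCopyState and run_example_1 are made up for this illustration, and it assumes that the file being closed at the end is A.txt.</span></p>

```python
import os
import tempfile
from enum import Enum


class FileCopyState(Enum):
    """States of example [1], numbered as in the text (hypothetical name)."""
    INITIAL = 1
    FILE_OPEN = 2
    DATA_READ = 3
    DATA_WRITTEN = 4
    FILE_CLOSED = 5


def run_example_1(source_path, target_path):
    """Read source_path, write its content to target_path, and return
    the ordered list of states the operation passed through."""
    visited = [FileCopyState.INITIAL]
    with open(source_path, "r") as source:
        visited.append(FileCopyState.FILE_OPEN)
        data = source.read()
        visited.append(FileCopyState.DATA_READ)
        with open(target_path, "w") as target:
            target.write(data)
        visited.append(FileCopyState.DATA_WRITTEN)
    # Leaving the outer "with" block closes the source file.
    visited.append(FileCopyState.FILE_CLOSED)
    return visited


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        a_txt = os.path.join(folder, "A.txt")
        b_txt = os.path.join(folder, "B.txt")
        with open(a_txt, "w") as seed:
            seed.write("hello")
        for state in run_example_1(a_txt, b_txt):
            print(state.value, state.name)
```

<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The returned list is the path of the operation through its states, which is information that a Stack-Trace taken at one moment does not hold.</span></p>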
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">Example [3] seems to be serial; however, it uses a lock, which means that the transition from State (1) to State (2) is dependent on the execution of another thread. For example, if there are two threads doing this operation then Thread B can only move from State (1) to State (2) if Thread A is not in State (2) or (3). If Thread A is currently in State (3) then Thread B will enter a wait state and the operation will be suspended.</span></p>
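<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The dependency between the two threads can be made visible with a small sketch. It uses Python's threading module; operation_3, run_demo and the three Event objects are hypothetical helpers that exist only to make the demonstration repeatable, and the extra "waiting" entry is not one of the five states listed above.</span></p>

```python
import threading

buffer_lock = threading.Lock()
shared_buffer = [10, 20, 30]
log = []  # (thread name, state) pairs; list.append is atomic in CPython


def operation_3(name, locked=None, hold_until=None, waiting=None):
    """Run example [3]; the optional events only make the demo repeatable."""
    log.append((name, "1 Initial State"))
    if not buffer_lock.acquire(blocking=False):
        # Another thread is in State (2) or (3): this operation is suspended.
        log.append((name, "waiting"))
        if waiting is not None:
            waiting.set()
        buffer_lock.acquire()
    log.append((name, "2 Buffer Locked"))
    if locked is not None:
        locked.set()
    if hold_until is not None:
        hold_until.wait(timeout=5)  # keep the lock until the other thread waits
    copy = list(shared_buffer)
    log.append((name, "3 Buffer was Read"))
    buffer_lock.release()
    log.append((name, "4 Buffer Unlocked"))
    log.append((name, "5 GUI Updated"))
    return copy


def run_demo():
    log.clear()
    a_locked = threading.Event()
    b_waiting = threading.Event()
    thread_a = threading.Thread(
        target=operation_3, args=("A", a_locked, b_waiting, None))
    thread_b = threading.Thread(
        target=operation_3, args=("B", None, None, b_waiting))
    thread_a.start()
    a_locked.wait(timeout=5)  # start B only once A is in State (2)
    thread_b.start()
    thread_a.join()
    thread_b.join()
    return list(log)


if __name__ == "__main__":
    for entry in run_demo():
        print(entry)
```

<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">In the resulting log Thread B sits between State (1) and State (2) while Thread A finishes reading, and none of this would appear in a design that treats the operation as purely serial.</span></p>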
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">Example [2] is serial but it is very likely that the GUI Window will be owned by another thread and only that thread is allowed to access the window. Modifying data on the window can be performed by posting a message to the queue of the owner thread. This means that Thread A, performing operation [2], will post a message to Thread B which is the owner of the GUI Window, and can then continue to handle another task. Thread A can in effect start another operation [2] and keep posting messages to Thread B. In this situation even though all threads are working, some of the operations can be suspended. How many system designs have you seen that show this?</span></p>
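<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">A minimal sketch of this hand-off, assuming a plain Python queue stands in for the message queue of the window's owner thread. The names gui_owner, operation_2 and run_demo are invented for the illustration, the file work is simulated, and the "window" is just a list.</span></p>

```python
import queue
import threading


def gui_owner(inbox, window):
    """Thread B: the only code allowed to modify the window."""
    while True:
        message = inbox.get()
        if message is None:      # sentinel: no more messages
            break
        window.append(message)   # State (5) GUI Updated happens here


def operation_2(inbox, file_name):
    """Thread A: file work is simulated; only the hand-off matters."""
    data = "contents of " + file_name  # states (2) to (4) would happen here
    inbox.put(data)                    # post the message and return at once


def run_demo():
    inbox = queue.Queue()
    window = []
    for name in ("A.txt", "B.txt", "C.txt"):
        operation_2(inbox, name)
    suspended = inbox.qsize()  # operations posted but not yet shown
    owner = threading.Thread(target=gui_owner, args=(inbox, window))
    owner.start()
    inbox.put(None)
    owner.join()
    return suspended, window


if __name__ == "__main__":
    print(run_demo())
```

<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">Before the owner thread runs, three operations are suspended in the queue even though the posting thread is never blocked. The queue length is part of the application state.</span></p>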
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">Example [2] is serial because the file operations are blocking. It is possible to use non-blocking file operations and, for example, have a Callback function executed when the ReadFile is complete. The thread will open the file, call the ReadFile API, and assume that the Callback will be called, so the thread can take on another operation. Once again, there can be suspended operations even though all threads are working.</span></p>
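<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The callback style can be sketched as follows. This is not the operating system's non-blocking ReadFile; as an assumption for the illustration, a worker from Python's concurrent.futures thread pool stands in for it, and begin_read and run_demo are hypothetical names.</span></p>

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    with open(path, "r") as handle:
        return handle.read()


def begin_read(executor, path, on_complete):
    """Start the read and return at once; on_complete(path, data) runs later."""
    pending = executor.submit(read_file, path)
    pending.add_done_callback(lambda done: on_complete(path, done.result()))
    return pending


def run_demo():
    events = []
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "A.txt")
        with open(path, "w") as seed:
            seed.write("payload")
        with ThreadPoolExecutor(max_workers=1) as executor:
            begin_read(executor, path,
                       lambda name, data: events.append(("read complete", data)))
            events.append(("caller continues", None))
        # Leaving the executor block waits for the worker and its callback.
    return events


if __name__ == "__main__":
    print(run_demo())
```

<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The order of the two events is not fixed. Between the call to begin_read and the callback, the read is an operation in progress that belongs to no particular Stack.</span></p>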
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">It seems that looking at the Stack-Trace as the state of the application is a huge mistake. There are many operations that have no thread associated with them and therefore have no Stack related to them. Other operations can cross thread boundaries or are affected by other threads, so looking at the Stack-Trace shows us only partial information.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">There are bugs that appear randomly. <em>Random Bugs</em> we call them. Well, actually these bugs are very accurate. The problem is that we just don't have a good view of the system. Imagine that you are driving a car with no windows. You can only see the inside of the car. Sometimes you turn the wheel and everything is fine and sometimes you turn the wheel in the same way and you "Randomly" hit something. The way to predict and locate bugs is to have a real view of the interactions in our application.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The two basic concepts that we need to add to our design are Phase and State. When we write Procedural C code the operation is inside the function, which is why it is called a Procedure. The operation has several Phases that it goes through, or several states of execution. In Procedural programming every branch is commonly an execution state, a Phase. Executing the code 'step-by-step' over the Procedure means going over the different Phases of the operation. We can see the different Phases by looking at the lines of code (see my previous post for more information). The operation Phase does not have to be associated with a thread. When we break this association we can start managing our application better.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">A primitive form of operation Phase management is performed using Locks, Events, etc.: we wait for an application-wide Phase. The second important element is object State. We don't have good tools to keep track of that. Actually the only States we know are: <em>Not-Initialized, Initialized, Working, Can-Dispose, Disposed</em>. These States are kept to manage resources and avoid resource leaks. Some of the object States are relevant to some of the operations. For example, I want to start scanning my picture only after the buffer is ready in memory and no one else is using the output file. This adds these new States to the picture object: <em>No-Data, Updating, Data-Ready</em>. We can add to these States that include version numbering, etc. Actually there are more States when I want to start scanning my picture only after the buffer is ready: <em>Scanning-Data, Data-Scanned</em>.</span></p>
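<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The picture example can be sketched with an object that records its own State and lets an operation wait for the State it needs. This is a simplified user-mode sketch in Python and not the engine from the sample code discussed in this post; TrackedObject, load_picture, scan_picture and run_demo are hypothetical names.</span></p>

```python
import threading


class TrackedObject:
    """An object that keeps its current State and a history of States."""

    def __init__(self, initial_state):
        self._state = initial_state
        self._changed = threading.Condition()
        self.history = [initial_state]

    def set_state(self, new_state):
        with self._changed:
            self._state = new_state
            self.history.append(new_state)
            self._changed.notify_all()

    def wait_for(self, wanted_state, timeout=None):
        """Block until the object is in wanted_state; False on timeout."""
        with self._changed:
            return self._changed.wait_for(
                lambda: self._state == wanted_state, timeout)


def load_picture(picture):
    picture.set_state("Updating")
    picture.set_state("Data-Ready")


def scan_picture(picture):
    if picture.wait_for("Data-Ready", timeout=5):
        picture.set_state("Scanning-Data")
        picture.set_state("Data-Scanned")


def run_demo():
    picture = TrackedObject("No-Data")
    scanner = threading.Thread(target=scan_picture, args=(picture,))
    loader = threading.Thread(target=load_picture, args=(picture,))
    scanner.start()  # started first, suspended until Data-Ready
    loader.start()
    loader.join()
    scanner.join()
    return picture.history


if __name__ == "__main__":
    print(run_demo())
```

<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">One limitation of this sketch: a waiter can miss a State that is replaced before the waiter wakes up. It works here only because Data-Ready stays in place until the scanner itself changes it.</span></p>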
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">When we snapshot all Phases and States in the system we know exactly what happened and when!</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">To better explain the use of States I wrote a small sample application that uses a pseudo engine. A real implementation requires the use of a kernel driver and would confuse more than explain. Phases can be managed using the same State manager.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The code is published here:</span></p>
<p class="MsoNormal" dir="ltr"><a href="http://www.asyncop.com/MTnPDirEnum.aspx?treeviewPath=%5bo%5d+Open-Source%5cPhase-State+Programming"><span style="Times New Roman;">http://www.asyncop.com/MTnPDirEnum.aspx?treeviewPath=%5bo%5d+Open-Source%5cPhase-State+Programming</span></a></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The code is waiting for the correct State but it is also possible to post a message to a thread or to have all States and Phases waiting in a queue of tasks that belongs to a thread-pool.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">In the log you see, <strong>Num</strong> is the serial number of the log message, <strong>MM:SS.millisec</strong> is the time of message in the format of <em>minutes : seconds . milliseconds</em>, and <strong>Application Log</strong> is the message sent from the application. The messages that start with "<strong>[ Main ]</strong>" are Phases on main() and the messages that start with "<strong>[ Communication Buffer ]</strong>" are States of the object as it reports modifications.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">A real implementation will allow you to be notified of changes, to prevent the shift to the next State until a task is completed, and to hook a State change before everyone is notified.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The code sample in <strong>PhaseStateTest00.cpp</strong> is the test application with main() and the other threads.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The code sample in <strong>State_Simulation.cpp</strong> and <strong>State_Simulation.h</strong> is the implementation and declarations of the pseudo State management engine.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">The code demonstrates how the system is still coherent even though an operation is crossing several threads and even State(3) of the object is used by both a thread and by main() which in terms of execution flow is similar to forking.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">Using this method of system modeling should greatly reduce execution-flow bugs, and can really help with system debugging. Many of the bugs are anticipated and prevented at the design stage.</span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" dir="ltr"><span style="Times New Roman;">It is very difficult to implement this using user-mode APIs; however, a kernel driver can do this very easily.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/09/02/state-phase-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stateful Programming - A Case Study</title>
		<link>http://software.intel.com/en-us/blogs/2009/07/27/stateful-programming-a-case-study/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/07/27/stateful-programming-a-case-study/#comments</comments>
		<pubDate>Mon, 27 Jul 2009 16:41:46 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[Guest Blog]]></category>
		<category><![CDATA[Multi-core development]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/07/27/stateful-programming-a-case-study/</guid>
		<description><![CDATA[A case study of Stateful Programming: a class of students taught Stateful Programming before Object Oriented Programming.]]></description>
			<content:encoded><![CDATA[<p>One of the basic tools that a programmer has in multi-core programming and parallel computing is state maintainability. A serial application has only one flow and therefore the Call-Stack (list of nested function calls) is in a way the application's state. In a parallel environment this is clearly not the case because there are many threads with many Call-Stacks at the same time. For example, taking a snapshot of a parallel system in order to reproduce a bug requires the states of all running threads, sometimes processes, and sometimes even the operating system. The operating system behaves differently when it has enough memory than when it is out of memory. We need a complete system snapshot in order to reproduce and solve bugs. You can read more about this in my previous post here: <a href="http://software.intel.com/en-us/blogs/2009/04/30/stateful-programming-a-key-element/">Stateful Programming - a key element</a>.</p>
<p>There are three basic types of Runtime States that we consider:<br />
1. First is the state of the operation I am working on right now: this thread, this object, this task, etc.<br />
2. There is the collection of states of all other operations currently running in the system: other threads, other objects, other processes, etc.<br />
3. There is also the state of the entire system: out of memory, flooded disk queue, with garbage collection in the background, etc.</p>
<p>When a program is running and I don't know its state I have no way of knowing what went wrong. The more stateful the application, the more I know about what is going on with it. Knowing the current state of operation can help me in two ways:<br />
1. My current state: knowing the last thing my application performed and how well it is performing, and<br />
2. Overall state: learning the state of the system around me from the way my code is operating. For example, if I try to allocate memory and get a response of 'out of memory' then it is probable that my other module crashed because one of the ActiveX controls it was operating crashed because the system is out of memory.</p>
<p>I am teaching a class of students these days. It is a three-month programming course in which they start with C, then go over C++ and then C#. Along the way they are taught many programming concepts such as parallel computing, embedded CE and VxWorks, system design, etc.</p>
<p>My part started last week with a course about software debugging and bug prevention. This Monday we start parallel computing. This comes before they know Object Oriented and C++ programming, and for a very good reason: we want to see whether it helps to understand the parallel system flow before the mind is sealed with objects and the methodology around them.</p>
<p>This time when I went over the debugging slides in class I deviated at times. The goal was to focus on debugging techniques that would also be helpful in the parallel world. This worked like magic. It seems that the techniques of the parallel world reduce debugging complexity for a serial application as well.</p>
<p>My focus here is on the stateful programming aspect of debugging. Far too often the only way to find a flow-bug in a parallel system is to look it up in the application's design. For this reason Stateful Programming has to do with design as much as it has to do with programming. We take a simple application that opens a file, appends "1234" to it, and closes the file. The file name is provided by the user. Here is the stateful analysis of the application:</p>
<p>Generally speaking we say that every time a decision is made in the code we change state. For example, if I try to open a file and the file is opened then the application moved from a state of "need to open file" to "file is ready for work". If file opening failed then the application moved from "need to open a file" to "the file cannot be opened", which means that the overall operation fails. This means that whenever there is an 'if' statement we change a state. We then add that whenever there is a function call that can return a value other than success, this function call is a state shifter. The API call to open a file reports either that the file was opened or that it was not, so this call is supposed to change the state of the application.</p>
<p>This is generally speaking. More practically speaking we say that every time there is an error it has to be made noticeable, either by breaking the application or by posting a message to a log, to the user, or by any other method. We also say that there are hierarchies of states, so unless I am trying to debug the user input loop (asking for a file name) I don't really care whether it failed and is asking the user for the input again. All I care about is the final output of this loop: either we got the input or we didn't.</p>
<p>We can of course specify macro flags to enable and disable messages for internal states (to be tested using #ifdef). The power of logging states is that when an application crashes or has some race condition we can determine what happened because we have a good snapshot of the application and what it was doing. It is possible for example to see in the log that one thread started working with a file while another thread was already working with it.<br />
Stateful Programming comes from the design and we need at least part of the Flow Diagram for our system. The best way to follow this through is to have a Flow Diagram followed by an Object Block Diagram, followed by a level 2 Flow Diagram for the internals, followed by an Object Block Diagram, and so on...</p>
<p>See the following diagram:</p>
<div id="attachment_8341" class="wp-caption alignnone" style="width: 310px"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/07/blog-2009-07-26-0-stateful-programming-a-case-study.jpg"><img class="size-medium wp-image-8341" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/07/blog-2009-07-26-0-stateful-programming-a-case-study-300x225.jpg" alt="Stateful Programming" width="300" height="225" /></a><p class="wp-caption-text">(Stateful Programming)</p></div>
<p>Here is a list of states according to this diagram:<br />
* Starting<br />
* Getting User Input<br />
* Exiting: User Request<br />
* Opening File "C:\my file"<br />
* Writing To File 4 bytes to "C:\my file"<br />
* Notifying User Of Error in File Write<br />
* Closing File<br />
* Process Completed Successfully</p>
<p>Here is a list of error states according to the diagram:<br />
* Error Getting User Input<br />
* Error Opening File "C:\my file"<br />
* Error Writing To File: 4 bytes to "C:\my file"</p>
<p>Some of the state notifications could be hidden unless activated. The user input loop can also have a few internal states activated on demand.</p>
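<p>The states listed above can be sketched in Python roughly as follows. This is only an illustration: the function name append_with_states and the idea of returning the states as a list are assumptions, not part of the original application. Every decision point records a state, and the user input loop is reduced to its final outcome.</p>

```python
def append_with_states(path, data=b"1234"):
    """Append data to path and return the list of states passed through.

    Hypothetical helper: every decision point records a state.
    """
    states = ["Starting"]
    if not path:
        # The user input loop is collapsed into its final outcome.
        states.append("Error Getting User Input")
        return states
    states.append(f'Opening File "{path}"')
    try:
        # Unbuffered, so a failed write is reported by write() itself.
        handle = open(path, "ab", buffering=0)
    except OSError:
        states.append(f'Error Opening File "{path}"')
        return states
    try:
        states.append(f'Writing To File {len(data)} bytes to "{path}"')
        handle.write(data)
    except OSError:
        states.append(f'Error Writing To File: {len(data)} bytes to "{path}"')
        states.append("Notifying User Of Error in File Write")
    finally:
        states.append("Closing File")
        handle.close()
    if any(state.startswith("Error") for state in states):
        return states
    states.append("Process Completed Successfully")
    return states
```

<p>Calling the function with a writable path returns a list that starts with "Starting" and ends with "Process Completed Successfully"; a path that cannot be opened ends the list with the matching error state instead.</p>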
<p>When we have several threads and processes running in parallel and each one reports its states, it is relatively simple to track down flow bugs in the system. At any given time the collection of states of all threads is the state of the application. Now flow control bugs (A.K.A. "Random Bugs") are very predictable.<br />
This is also compatible with sequence diagrams in Visual Studio 2010.</p>
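<p>Here is a small Python sketch of collecting the states of several threads into one application state. The names state_log, report, worker, snapshot and run_demo are all made up for this illustration, and a queue is used as the shared log simply because it is a ready-made parallel infrastructure.</p>

```python
import queue
import threading

state_log = queue.Queue()  # hypothetical shared log of (thread name, state)


def report(state):
    """Post the calling thread's new state to the shared log."""
    state_log.put((threading.current_thread().name, state))


def worker(file_name):
    report(f'Opening File "{file_name}"')
    report(f'Writing To File "{file_name}"')
    report("Closing File")


def snapshot(log):
    """Drain the log and return (latest state per thread, full history)."""
    latest, history = {}, []
    while True:
        try:
            name, state = log.get_nowait()
        except queue.Empty:
            break
        history.append((name, state))
        latest[name] = state
    return latest, history


def run_demo():
    threads = [
        threading.Thread(target=worker, args=("C:\\my file",), name=f"worker-{i}")
        for i in range(2)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return snapshot(state_log)
```

<p>The history shows how the states of the two workers interleaved, which is the kind of evidence needed to spot two threads working with the same file at the same time.</p>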
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/07/27/stateful-programming-a-case-study/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stateful Programming - a key element</title>
		<link>http://software.intel.com/en-us/blogs/2009/04/30/stateful-programming-a-key-element/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/04/30/stateful-programming-a-key-element/#comments</comments>
		<pubDate>Thu, 30 Apr 2009 16:24:32 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[Guest Blog]]></category>
		<category><![CDATA[multi-core]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/04/30/stateful-programming-a-key-element/</guid>
		<description><![CDATA[A key feature of Object Oriented Programming is code manageability and reusability; a key feature of Procedural Programming is flow manageability. A key element in flow manageability is Stateful Programming. This methodology is very common with Procedural programmers and is very uncommon with Object Oriented programmers, but it is easily applicable.
I have recently decided to publish [...]]]></description>
			<content:encoded><![CDATA[<p>A key feature of Object Oriented Programming is code manageability and reusability; a key feature of Procedural Programming is flow manageability. A key element in flow manageability is Stateful Programming. This methodology is very common with Procedural programmers and is very uncommon with Object Oriented programmers, but it is easily applicable.</p>
<p>I have recently decided to publish all of my code samples on my website: <a href="http://asyncop.com/Link.aspx?Open-Source">http://asyncop.com/Link.aspx?Open-Source</a>. This may take a long while and is an ongoing process, but every process starts with a first step. The first step here was to write a tool that could parse C / C++ / C# source code and convert it into readable HTML format. The website is about parallel computing, so I decided to kill two birds with one stone: I wrote the code parser in a way which proves a technological principle, and the first project I published with this parser was the source code for the parser itself.</p>
<p>The parser is written in C# which means that it is fully Object Oriented in programming and in design. I also applied the principles of Stateful Programming to the code and design so the code doesn't look like an academic OOP code.</p>
<p>The source code can be found here: <a href="http://asyncop.com/MTnPDirEnum.aspx?treeviewPath=%5bo%5d+Open-Source%5cProject+Publisher">http://asyncop.com/MTnPDirEnum.aspx?treeviewPath=%5bo%5d+Open-Source%5cProject+Publisher</a>.<br />
As you will immediately notice, this is not academic OOP because functions here are not limited to 3 - 4 lines of code. Instead you will find long functions and very small objects.</p>
<p>At the base of this parser is the need to go over a long string of text and parse it. Going over the characters one by one, some parts are dependent only on the selected character (such as ':'), others are dependent on previous characters (number after number / after letter), and some are character block definitions (such as #, ", /*, //, etc.). There is also a higher lexical analysis - identifying reserved words.</p>
<p>There is another important feature to the parser. It is higher level parsing such as locating classes and functions, finding class names and tagging them and so on (at the time this post is published the tagging feature is not yet published).</p>
<p>Procedural programmers use a term called state while parsing a long buffer. During the pass over the buffer the application's state-of-parsing changes. This brought on some of the coding constraints of the C language. One of the important features of this language was to make it simple for parsing when going forward on the buffer with a state-machine mechanism.</p>
<p>In the code you will see such states used widely. In fact most of the code is just the definitions of the different states, the detection of the states, and the dependencies between states. To allow simple and flexible parsing I had to give up on state-object inheritance, and thus all objects that define states are of the same type.</p>
<p>Actually there is more than one type of state: there is the state of the character parser and there is the state of the higher level parser, whose job is to detect class names and function names and tag them. These states are not of the same type because they are completely asynchronous to each other. A letter or a number can be part of a class, a function, or the global scope, and the same is true for '#', '{', '}', and so on.</p>
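<p>To illustrate the character-level state described above, here is a minimal Python sketch of a state machine that walks forward over C-like source text. It is not the published parser (which is written in C#); the function name scan and the four state names are assumptions, and it only handles comments and string literals.</p>

```python
def scan(text):
    """Split C-like source text into (state, chunk) runs.

    A sketch, not the published parser: state names are assumptions.
    """
    runs = []
    state, start, i = "code", 0, 0

    def switch(new_state, at):
        # Close the current run (if not empty) and enter the new state.
        nonlocal state, start
        if at != start:
            runs.append((state, text[start:at]))
        state, start = new_state, at

    while len(text) > i:
        two = text[i:i + 2]
        if state == "code":
            if two == "//":
                switch("line_comment", i)
                i += 2
            elif two == "/*":
                switch("block_comment", i)
                i += 2
            elif text[i] == '"':
                switch("string", i)
                i += 1
            else:
                i += 1
        elif state == "line_comment":
            if text[i] == "\n":
                switch("code", i)
            i += 1
        elif state == "block_comment":
            if two == "*/":
                i += 2
                switch("code", i)
            else:
                i += 1
        else:  # state == "string"
            if text[i] == "\\":
                i += 2  # skip the escaped character
            elif text[i] == '"':
                i += 1
                switch("code", i)
            else:
                i += 1
    switch("code", len(text))  # flush whatever run is still open
    return runs
```

<p>Note that the "//" inside a string literal does not start a comment, because the decision depends on the current state and not only on the current characters. A higher level parser would keep its own, separate state on top of these runs.</p>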
<p>This is how Procedural Programming worked for a very long time. Object Oriented Programming however managed states using the stack. When we detect a new state we go into a new function. If you want to know the state of your application then you need to follow the stack trace. This is very bad because it means that there is a single application-wide state. The state of the execution flow of the process using the stack is also the state of the business logic, and we have already determined that the business logic may have and maintain more than a single state at the same time. True as it may be, programmers do manage to parse protocols using academic OOP, but most of the time it is due to intuition and step-by-step debugging. This is by no means good programming that follows a robust design.</p>
<p>The parser I published works. This is the fifth version (currently it marks braces). Every addition to the parser was simple and straightforward without touching the original design, only adding to it.</p>
<p>Would you have used this type of object oriented programming?</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/04/30/stateful-programming-a-key-element/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>All Sorts of Sorts</title>
		<link>http://software.intel.com/en-us/blogs/2009/04/27/all-sorts-of-sorts/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/04/27/all-sorts-of-sorts/#comments</comments>
		<pubDate>Mon, 27 Apr 2009 16:28:37 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[Guest Blog]]></category>
		<category><![CDATA[multi-core]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/04/27/all-sorts-of-sorts/</guid>
		<description><![CDATA[Hi all,
As some of you may already know I am getting married May 11th. Yes, yes, a very happy occasion. Doesn't leave time for anything... Well, there is enough for a blog post but not enough for writing code and doing some QA.
I actually started with the Radix-Sort Challenge (Threading-Challenge-2009) but couldn't find the time [...]]]></description>
			<content:encoded><![CDATA[<p>Hi all,</p>
<p>As some of you may already know I am getting married May 11th. Yes, yes, a very happy occasion. Doesn't leave time for anything... Well, there is enough for a blog post but not enough for writing code and doing some QA.<br />
I actually started with the Radix-Sort Challenge (<a href="http://software.intel.com/en-us/contests/Threading-Challenge-2009/codecontest.php">Threading-Challenge-2009</a>) but couldn't find the time to have it completed, not even talking about tested… So I thought that I might as well share my thoughts about this problem. Philosophically that is, nothing tested, no responsibility, so I can allow myself to wander off and talk about things that may not even work ;-)</p>
<p>Going over Wikipedia (as briefly as possible) I learned that Radix sort is a type of sort that deals with constant size data types. Sorting algorithms deal massively with data comparisons. Since the input for this challenge was defined as 7 ASCII characters for each data item, it is very simple to convert it to a 64 bit integer. As a collection of 64 bit integers the CPU can compare two items in a single instruction and all new compilers have support for that. So basically the task is to sort a list of integers.<br />
I was asking myself how many cores I should use and whether or not I should use the built-in CPU acceleration such as MMX. The simple answer I found was that if the collection of numbers can fit in the fast cache then I should start doing some math… However the problem here deals with somewhere up to 2^31 items, or up to 2G of items. This is way over the size of any cache so the work is done by the CPU directly with memory.<br />
The memory is always slower than the CPU and the operation to perform takes a single CPU instruction. This means that the bottleneck should be the memory and therefore the strategy should start by reducing memory transactions to the minimum.</p>
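<p>As an untested-in-the-contest illustration of the conversion mentioned above, here is a short Python sketch that packs a 7-character ASCII item into a single integer that fits in 64 bits. The function names pack_key and unpack_key are made up for this example. Big-endian byte order is chosen so that comparing the integers gives the same order as comparing the original strings.</p>

```python
def pack_key(key):
    """Pack a 7-character ASCII key into one integer below 2**56."""
    raw = key.encode("ascii")
    if len(raw) != 7:
        raise ValueError("expected exactly 7 ASCII characters")
    # Big-endian: the first character becomes the most significant byte.
    return int.from_bytes(raw, "big")


def unpack_key(value):
    """Recover the 7-character key from its packed integer form."""
    return value.to_bytes(7, "big").decode("ascii")
```

<p>With this in place, sorting the items becomes sorting a list of integers, and each comparison is a single integer comparison.</p>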
<p>The simplest and fastest solution I could think of for sorting numbers was to merge sorted lists. For example when we want to sort a list of 8 items we first sort the data as two lists of 4 items and then join the two lists. This approach requires only a single pass over every item in the list for every merge. For example a list of 1,6,3,7,9,2,5,4 is split and sorted into 1,3,6,7 and 2,4,5,9. The sort algorithm only goes forward in the list and has only a single comparison to make. Working in merges it takes 31 passes over every data item to complete 2G of items.<br />
The problem is that reading an item and copying it to an output list handles 2K of memory for every 1K of data to sort. The goal is to reduce memory transaction to the minimum. The solution is to swap the items in the merged lists, so for two merging lists of A1,A2,A3,A4 and B1,B2,B3,B4 the output should be a multi-dimensional array of: A1,B1,A2,B2,A3,B3,A4,B4. The next merge will result in A1,B1,C1,D1,A2,B2,C2… and so on. This will keep memory transactions to the minimum amount and also ensure that the memory is still in CPU cache for as long as possible.</p>
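<p>The merge step and the pass count can be sketched in Python as below. This is a plain bottom-up merge sort with hypothetical function names; it copies items into new output lists, so it shows only the forward-only merging and the number of passes, not the interleaved in-memory layout proposed above.</p>

```python
def merge(left, right):
    """Merge two sorted lists with one forward pass over each."""
    out, i, j = [], 0, 0
    while len(left) > i and len(right) > j:
        # A single comparison decides which list advances.
        if right[j] >= left[i]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort_passes(items):
    """Bottom-up merge sort; returns (sorted list, number of merge passes)."""
    runs = [[item] for item in items]
    passes = 0
    while len(runs) > 1:
        # One pass: merge neighbouring runs pair by pair.
        runs = [
            merge(runs[k], runs[k + 1]) if len(runs) > k + 1 else runs[k]
            for k in range(0, len(runs), 2)
        ]
        passes += 1
    return (runs[0] if runs else []), passes
```

<p>For the 8-item example, merge([1, 3, 6, 7], [2, 4, 5, 9]) gives 1,2,3,4,5,6,7,9, and sorting the whole list from single items takes 3 passes, in the same way that 2^31 items take 31.</p>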
<p>Eventually the merges will be on lists that are too long for the cache to hold, for example a merge between two lists of 100MB. The only thing we can do to utilize the cache is to keep merging on the same list for as long as possible, instead of methodically merging all the 1KB lists in the system from start to end. Another optimization that might help just a bit is merging forwards and backwards alternately instead of going forwards all the time. When a forward merge is complete the end of the list is in cache so it is more efficient to start merging from that point downward to the beginning of the list. At the end of a reverse order merge the cache holds the beginning of the list and it is most efficient to merge forwards.</p>
<p>This is what I was going to try and test. Single threaded. That is of course if I had the time… but as you already know, I barely have the time to write about it in a blog post.</p>
<p>Good luck to everyone who did.</p>
<p>If you are reading this and you feel that my guesses are all bad – feel free to say it out loud. You can also reflect about it out loud, or just congratulate me on my imminent wedding… anything goes :-)</p>
<p>Best,<br />
Asaf</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/04/27/all-sorts-of-sorts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Value of Infrastructure</title>
		<link>http://software.intel.com/en-us/blogs/2009/04/06/the-value-of-infrastructure/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/04/06/the-value-of-infrastructure/#comments</comments>
		<pubDate>Mon, 06 Apr 2009 20:49:04 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[concept]]></category>
		<category><![CDATA[Guest Blog]]></category>
		<category><![CDATA[multi-core]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/04/06/the-value-of-infrastructure/</guid>
		<description><![CDATA[
Much has been said about parallel computing and parallel programming. There are many methods to approach this area, such as using low level API, many types of libraries, language extensions, and so on. The best approach is the one that out-lived time and stayed with us through the different generations of computer technologies. These are [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>Much has been said about <strong>parallel computing</strong> and <strong>parallel programming</strong>. There are many methods to approach this area, such as using low level API, many types of libraries, language extensions, and so on. The best approach is the one that out-lived time and stayed with us through the different generations of computer technologies. These are the Infrastructures. Not just any infrastructure but some specific types.</p>
<p>There are libraries and collections of API sets that are still with us such as STL, Create etc. These however have evolved and some API sets were modified beyond recognition. For example Fork, or even the way we use processes in .Net applications today. A worker item is not really a thread and then there were fibers...</p>
<p>The stable and good <strong>infrastructure</strong> that I am talking about is something that we all know and have been using for a very long time. These infrastructure technologies have been parallel since the beginning of computer time and today are the most trusted and tested parallel infrastructure available.</p>
<p>These begin with lower level APIs such as Atomic Operations, Locks and so on, continue over Network Queues, and File Systems. Yes, they were all there since the stone age of computer systems... but I am not looking at the lower level infrastructure technologies. My interest is in the higher level ones. Higher level infrastructure platforms rid the programmer of the need to understand parallel computing. When a programmer is using such a good parallel infrastructure the only concern is business logic and business value. Why do I need my digital image expert to also know how to create a thread? The situation today is very bad.</p>
<p>Why do I need my <strong>database</strong> expert to understand how many CPUs are on the database server?? Why do I need my web site programmer to understand how many CPU cores there are on the <strong>web server</strong> machine??</p>
<p>Well, the answer is obviously that I wouldn't want the database expert and the website developer to count CPU cores before they start coding. The truth is that they actually don't have to even today!</p>
<p>Databases and web servers are perfect parallel infrastructure platforms. The database expert does not care how the database does what it does, it just does... And still the database programmer can perform parallel work with the database. Website programmers write their code serially with simple languages such as PHP, ASP, and Asp.Net. The website programmer never even asked you how many cores there are on the CPU, and couldn't even tell the difference between a race-condition and a door knob, and is still able to serve 10,000 clients in parallel - at the same time!</p>
<p>The ideal solution, so it seems, is to have the infrastructure do all the parallel work, which would allow the business logic to remain serial. I would expect that in 10 to 15 years' time people will not have to know anything about parallel programming unless they are creating or modifying the parallel infrastructure. It is just the same as how programmers used to understand the hardware in the old DOS days, while today only kernel programmers need to understand the hardware in order to create the device drivers. Application developers have no need whatsoever to understand the hardware implementations.</p>
<p>The sooner we have these new infrastructures the better for all of us.</p>
<p>Asaf</p></div>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/04/06/the-value-of-infrastructure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FileSystem as a synchronization infrastructure</title>
		<link>http://software.intel.com/en-us/blogs/2009/03/25/filesystem-as-a-synchronization-infrastructure/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/03/25/filesystem-as-a-synchronization-infrastructure/#comments</comments>
		<pubDate>Wed, 25 Mar 2009 15:54:07 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[guest blogger]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/03/25/filesystem-as-a-synchronization-infrastructure/</guid>
		<description><![CDATA[FileSystem has long evolved into an Object Store which manages named objects for the Kernel. This makes it an excellent infrastructure for synchronous operations]]></description>
			<content:encoded><![CDATA[<p>A long time ago, ages in computer years, storage devices were introduced to computer systems. These devices provided non-volatile storage that keeps its data even when the computer is unplugged. The storage, a hard-disk, is very slow in comparison to CPU and RAM / ROM memory. This makes Random reads irrelevant and the storage device is read in batches of data called sectors. To better manage the data, every collection of sectors is given a name, the File Name. The engine that managed these devices was named the <strong>File System</strong>.</p>
<p>File Systems managed global resources: the disk, which is a global device in the system; folders, which are collections of files and a way to manage or tag files; and files, which are global resources representing a collection of data.</p>
<p>The File System had to manage named objects which could be accessed simultaneously by different applications, interrupt handlers, and software modules. Each file is identified using a globally unique name and each connection to a file has a process specific unique id (or Handle). File Systems also added cache management and support for buffered input and output. This is because a full sector must be read for every byte of data that it holds. The sector is read to memory and single-byte operations are made to a buffer in RAM acting as cache.</p>
<p>In time File Systems added support for communication devices such as COM ports and LPT parallel ports. Console output and input also worked through File System handles. Terminals connected to server hosts using console IO which means using the File System.</p>
<p>The File System had to provide support for working with objects, whether real files on disks or virtual such as console I/O (stdin, stderr, etc.):</p>
<ul>
<li>Named Objects: locate an object by its unique name and create a new object with a unique name</li>
<li>Lock an object so that only a single handle to the object can be used at a given time</li>
<li>Allow buffered operations and smart caches</li>
<li>Allow asynchronous reads, writes, and control operations. This also includes support for completion callbacks and completion events</li>
</ul>
<p>In effect operating systems today use the <strong>File System</strong> as an <strong>Object Store</strong>. The File System manages <strong>Named Objects</strong> (*<strong>atomic names</strong>), files, synchronization objects, buffered I/O, Shared Memory, etc, etc.</p>
<p>The File System is a very good asynchronous infrastructure. It has been this way for many years and works great, it is well tested and well debugged, works with the Kernel and therefore has always been fully parallel and is also very efficient. One of the best features is that locking a File System object is nothing like locking a <strong>MUTEX</strong>. A MUTEX is only a flag (<a href="http://software.intel.com/en-us/blogs/2009/03/24/locks-are-bad/" target="_blank">see this post</a>) whereas a File System lock really prevents anyone from overwriting the data thus really preventing a Race Condition.</p>
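<p>One small way to see the File System acting as a store of named objects is exclusive file creation. The Python sketch below uses os.open with the O_CREAT and O_EXCL flags, so that creating an object with a given unique name either succeeds or fails as one step. The function names try_acquire and release are hypothetical, and this shows only the named-object side of the argument, not OS-enforced locking of the file's data.</p>

```python
import os


def try_acquire(lock_path):
    """Try to create lock_path atomically; return a descriptor or None."""
    try:
        # O_EXCL makes the call fail if the name already exists.
        return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None


def release(descriptor, lock_path):
    """Close the descriptor and remove the named object."""
    os.close(descriptor)
    os.remove(lock_path)
```

<p>While one caller holds the name, every other call to try_acquire with the same path returns None; after release the name can be taken again.</p>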
<p>Note: watch out for sharing file handles / descriptors instead of duplicating them.</p>
<p>Asaf Shelly</p>
<p><a href="http://AsyncOp.com">http://AsyncOp.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/03/25/filesystem-as-a-synchronization-infrastructure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Locks Are Bad!!!</title>
		<link>http://software.intel.com/en-us/blogs/2009/03/24/locks-are-bad/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/03/24/locks-are-bad/#comments</comments>
		<pubDate>Tue, 24 Mar 2009 17:32:47 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[Guest Blog]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[theory]]></category>
		<category><![CDATA[www.AsyncOp.com]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/03/24/locks-are-bad/</guid>
		<description><![CDATA[Students are taught to use Locks such as MUTEXs and Critical Sections. We are also told that a MUTEX is a type of Semaphore.
Don't use locks unless you really really have to because locks are bad!
]]></description>
			<content:encoded><![CDATA[<p>Students are taught to use Locks such as <strong>MUTEX</strong>s and <strong>Critical Sections</strong>. We are also told that a MUTEX is a type of <strong>Semaphore</strong>. This is all very bad!</p>
<p>When we lock we practically try to make sure that the resource is locked to a single core. - We have a multicore CPU but we try to make sure that all work with a resource is done using only one core. Why?</p>
<p>Even if we ensure that our code is using locks correctly and we are blocking our application successfully, there is still the possibility for a Race-Condition. How? - simple: Locks are only flags. Nothing more. At any given time someone can freely access our 'locked' object.</p>
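<p>Here is a tiny Python sketch of what "only a flag" means. The names shared, polite_writer, rude_writer and demo are invented for this example. The main thread holds the lock, yet a thread that simply never asks for the lock changes the 'locked' data anyway.</p>

```python
import threading

shared = {"value": 0}
lock = threading.Lock()


def polite_writer():
    # Cooperates: waits for the lock before touching the data.
    with lock:
        shared["value"] = 1


def rude_writer():
    # Never touches the lock, so nothing stops this write.
    shared["value"] = 2


def demo():
    lock.acquire()  # the main thread now "owns" the resource
    try:
        rude = threading.Thread(target=rude_writer)
        rude.start()
        rude.join()  # completes even though the lock is held
        return shared["value"]
    finally:
        lock.release()
```

<p>The lock protects the data only from code that agrees to check it first.</p>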
<p>We are told that we are locking for a code section: "Critical Section" is the name, but we are not. We are locking for a resource. It is so simple to avoid Deadlocks when we understand that it is resource-ownership that we are looking for.</p>
<p>... and no, a MUTEX is not a Semaphore of one. The system can clean up a MUTEX if a thread terminates or unexpectedly exits. A MUTEX can also be re-entrant (system / library dependent).</p>
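<p>The re-entrancy difference can be shown with a short Python sketch. Here threading.RLock stands in for a re-entrant MUTEX and threading.Semaphore(1) for a Semaphore of one; the helper name second_acquire_succeeds is made up, and the clean-up of a MUTEX after a thread exits is system specific and not shown.</p>

```python
import threading


def second_acquire_succeeds(primitive):
    """Acquire twice from the same thread without blocking on the second try."""
    primitive.acquire()
    try:
        got_it = primitive.acquire(blocking=False)
        if got_it:
            primitive.release()  # undo the second, successful acquire
        return got_it
    finally:
        primitive.release()  # undo the first acquire
```

<p>The owner of the re-entrant lock can take it again, while the same thread asking a Semaphore of one for a second time is refused.</p>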
<p>If you find yourselves using locks when you are not a very low level infrastructure designer (such as designing a type of queue) then you have something very wrong going on.</p>
<p>If you find yourself using a lock that was not specifically part of the system design then something is really very very wrong with the system. Programmers have absolutely no reason for declaring new lock objects.</p>
<p>Don't use locks unless you really really have to because <strong>locks are bad</strong>!</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/03/24/locks-are-bad/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CPU Auxiliary Cores</title>
		<link>http://software.intel.com/en-us/blogs/2009/02/04/cpu-auxiliary-cores/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/02/04/cpu-auxiliary-cores/#comments</comments>
		<pubDate>Thu, 05 Feb 2009 01:11:39 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Academic]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[Guest Blog]]></category>
		<category><![CDATA[Overview]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[www.AsyncOp.com]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/02/04/cpu-auxiliary-cores/</guid>
		<description><![CDATA[Today I wish to share with you a model of a system that is a design pattern for parallel processing that you will probably not see in too many places. This model comes from a big and heavy organization that needs very fast responses, which for itself is a contradiction...]]></description>
			<content:encoded><![CDATA[<p>If you have ever attended my lectures then you already know that parallel computing is not an add-on or a library that we use. It goes much deeper into the system design and architecture. Many times when we go for lunch after a session about multicore programming people identify parallel methodologies in the every-day environment. When you do this you know that you are starting to really understand parallel computing. A simple example is lunch itself: Someone takes your order, before or after the orders of other people at your table. The order includes the main course and the salad. The main courses are different and have different preparation periods. The salad is handled by someone completely different. Nevertheless all dishes come out at the same time for every table. The system uses Queues to manage the inputs and Accumulation Chambers to make sure that items wait for all items in the same group before they are sent out of the system. Many people notice this and start calling the patterns by name.</p>
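<p>The lunch example can be sketched in a few lines of Python. The names prepare and serve_table and the preparation times are invented; the thread pool's internal work queue plays the part of the input Queue, and waiting for every result plays the part of the Accumulation Chamber.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor


def prepare(dish, seconds):
    time.sleep(seconds)  # stands in for the preparation period
    return dish


def serve_table(orders):
    """Prepare all dishes in parallel and release them as one group."""
    with ThreadPoolExecutor(max_workers=len(orders)) as kitchen:
        pending = [kitchen.submit(prepare, dish, secs) for dish, secs in orders]
        # The "accumulation chamber": wait for every dish of the group.
        return [job.result() for job in pending]
```

<p>The dishes finish at different times, but the table gets nothing until the slowest dish of its group is ready.</p>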
<p>Today I wish to share with you a model of a system that is a design pattern for parallel processing that you will probably not see in too many places. This model comes from a big and heavy organization that needs very fast responses, which in itself is a contradiction. The big organization from which I am taking this practice is the army. An army is a huge organization in general and there is a lot to learn from its internal structure because an army does not produce hardware or software and does not provide services. An army does one thing: manage large scale operations, with fault tolerance that comes at the expense of human lives, meaning zero tolerance for faults.</p>
<p>The army is divided into units and these units have war rooms. A small unit can be a Tank and can receive targets or detect targets and respond fast, and a large unit can have multiple targets flowing into a huge queue from multiple small units. These targets are collected as Tasks, sorted, and dispatched to the most appropriate unit. So far this makes a lot of sense. The advantage of a small unit is its ability to respond fast: they shoot at me – I shoot at them, and the advantage of the large unit is with its ability to collect multiple targets and prioritize. We employ these methodologies today in computerized systems. The first is Hardware Acceleration and the latter is a software application or service. As an example for the first: when you click the CD-ROM drive 'open' button the tray will open and all the CD-ROM device needs is a power cable connected (no need for an OS, or BIOS to communicate with it). As an example for the latter method: I press the 'play' key on a multimedia keyboard and a media player starts playing music from the CD-ROM device.</p>
<p>This concept already exists as you can see. The interesting concept that I found is taking the best of both worlds. On one hand there is a need to collect information and locate targets. On the other hand this has to be fast enough or the targets become irrelevant. The gap between the two time scales leaves no doubt. For example the time available to hit the target before it is too late is 20 seconds but the time it takes to filter out the target from the list and dispatch it is 3 to 5 minutes.</p>
<p>The solution that I found is an auxiliary unit. This unit is independent. It is 'initialized' with the filtering conditions (for example 'brown targets') and it has the ability to act immediately. Instead of boring you with the details I will provide the computerized pattern that I am looking for:</p>
<p>Here is an example of a problem: I have two processes. One is an application that is doing some work and it calls the second, which is a service, to do some operation. After the operation is complete the application needs to do some cleanup. Let's provide timing so it is clearer: The application works for 120 milliseconds, calls the service for a 5 millisecond operation and then completes with a 5 microsecond cleanup. The problem is that the transition from the application to the service requires a heavy Context-Switch operation, when the service is done there is another Context-Switch back to the application, and then there is another Context-Switch to the next application because our application has completed its operations. For the sake of the exercise let's suppose that a Context-Switch costs 3 microseconds. In this case the switch from the application to the service adds less than 0.1% to the processing time, however the switch from the service back to the application costs 3us for the Context-Switch against only 5us of actual work, which degrades overall application performance and system performance. This is especially true when multiple threads are used or a thread pool is used for short tasks. Sometimes this cannot be avoided; for example, we wait on an event that indicates that the file is fully written, and when the event is signaled we close the file and exit.</p>
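<p>The arithmetic behind these numbers can be checked with a short calculation. The constants are the ones given above (the 3 microsecond Context-Switch cost is the assumption made for the exercise); the function and variable names are only illustrative.</p>

```python
# All times in microseconds, taken from the example in the text.
APP_WORK_US = 120_000   # 120 ms of application work
SERVICE_US = 5_000      # 5 ms inside the service
CLEANUP_US = 5          # 5 us of cleanup after the service returns
SWITCH_US = 3           # assumed cost of a single Context-Switch


def overhead_percent(switch_us, work_us):
    """Cost of one Context-Switch relative to the work it precedes, in percent."""
    return 100.0 * switch_us / work_us


# Switching away after 120 ms of work is negligible (about 0.0025 percent).
to_service = overhead_percent(SWITCH_US, APP_WORK_US)

# Switching back just to run 5 us of cleanup costs 60 percent on top of the work.
back_to_app = overhead_percent(SWITCH_US, CLEANUP_US)

# Three switches in total: to the service, back, and on to the next application.
total_switch_us = 3 * SWITCH_US
```

<p>The total switching time (9us) is larger than the cleanup itself (5us), which is the case that the auxiliary unit is meant to address.</p>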
<p>The solution offered here is a small auxiliary CPU Core that can handle such small tasks. This CPU Core will not have an interrupt handling mechanism which means that it cannot support Page-Faults and thus all memory MUST be waiting for it in physical memory. It can also not handle Exceptions. CPU Exceptions are implemented as hardware Interrupts so an Exception on an Auxiliary Core must be handled by a real CPU Core.</p>
<p>The appropriate method would probably be 4 CPU Cores and 256 Auxiliary Cores. CPU Cores handle Context-Switches etc; however the implementation may very well allow multiple processes active at the same time. When a process enters a wait-state to wait for an event the process information required after the event is signaled will be prepared on a single memory page. When the event is signaled an Auxiliary core can execute the code immediately until the next wait-state. An example for this is the code fragment that handles a window's input loop. For the example above, the application will work for 120ms, prepare the appropriate information and wait for the service. This is a Context-Switch, however when the service is done it will not Context-Switch back to the process because the process is already active in memory and an Auxiliary Core will immediately start working. This pattern may be very effective in the case of two threads communicating with each other and blocking each other repeatedly.</p>
<p>This sounds like a very good methodology, however it means employing Core Dedication which is something that we are still trying to avoid.</p>
<p>I will soon add this as one of the design patterns in the collection at <a href="http://www.AsyncOp.com">http://www.AsyncOp.com</a>. If you have any further ideas that are not there then feel free to communicate with me.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/02/04/cpu-auxiliary-cores/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel&#039;s Parallel SDK - a positive approach</title>
		<link>http://software.intel.com/en-us/blogs/2008/12/10/intels-parallel-sdk-a-positive-approach/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/12/10/intels-parallel-sdk-a-positive-approach/#comments</comments>
		<pubDate>Wed, 10 Dec 2008 16:54:04 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Overview]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[www.AsyncOp.com]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/12/10/intels-parallel-sdk-a-positive-approach/</guid>
		<description><![CDATA[The world of computers started with machine language and Assembly. Then we got languages like C with a strong focus on Library Functions. The appearance of C Run-time Libraries (a.k.a. CRT or RTL) provided some layer of abstraction and there was no need to intimately know the inner implementations of system functionalities. Operating System APIs are [...]]]></description>
			<content:encoded><![CDATA[<p>The world of computers started with machine language and Assembly. Then we got languages like C with a strong focus on Library Functions. The appearance of C Run-time Libraries (a.k.a. CRT or RTL) provided some layer of abstraction and there was no need to intimately know the inner implementations of system functionalities. Operating System APIs are the same. Then we had C++ that encapsulated things in objects but the concept was basically the same. C++ also provided tools for creating more advanced libraries and this is how we got things like the Standard Template Library (STL), and now the majority of programmers never have to implement a List, a Stack, or a Hash Table.</p>
<p>Alongside this evolution, databases followed a parallel line of development. These started with simple encapsulated modules that provided an API, which could have been internally based on STL, and advanced to a point where a database is a very complex engine that has highly advanced parallel support. At a time when design patterns demonstrate how to avoid the problems of parallel work – databases took full advantage of the raw power in parallelizing operations.</p>
<p>Today few developers find themselves needing to write a database engine. The engine is fully parallel and thus multiple simultaneous operations are easily possible. This allows the application to be single threaded, relying on the database engine to do the parallel work. When tens of thousands of operations are performed there is little to no advantage in parallelizing the application because most of the work duration is inside the database engine. This came to a point where architects preferred using a database over traditional storage methodologies only because it is fully parallel, thread-safe, and very well managed.</p>
<p>Today we find all parallel work done in closed infrastructure modules such as Device Drivers, Database engines, Web-Server engines, Windows GUI system, and so on. It is very rare to find a fully parallel implementation of an application and its Business Logic.</p>
<p>A large number of publicly available libraries support both single-threaded and multi-threaded implementations. The first is faster and the latter is safer for multithreading. This is because threads were used for performing 'wait-operations' and not for true parallel work. Today we expect the multithreaded implementations to be several times faster than the single-threaded implementation. We don't have these libraries yet so we find ourselves designing and implementing parallel Lists, Stacks, Array Sort algorithms, and so on. In time we will have all these ready for us off the shelf, as we have been used to until now.</p>
<p>I have just learned that Intel is on the right track and what used to sound like prophecies which I made in class – is now becoming reality thanks to Intel.</p>
<p>For those of you that did not see it yet: Intel has started releasing libraries that do regular operations that we are all used to, only these libraries support fully parallel work and increase performance by several multiples. I am referring of course to the new SDK for Core i7 processor family:<br />
<a href="http://software.intel.com/en-us/articles/intel-next-generation-intel-core-i7-processor-family-sdk/">http://software.intel.com/en-us/articles/intel-next-generation-intel-core-i7-processor-family-sdk/</a><br />
Another example is helping games with parts that are not traditionally hardware accelerated such as AI. See this article:<br />
<a href="http://software.intel.com/en-us/articles/smoke-game-technology-demo/">http://software.intel.com/en-us/articles/smoke-game-technology-demo/</a></p>
<p>When <a href="http://software.intel.com/en-us/blogs/author/aaron-tersteeg/">Aaron Tersteeg</a> and I were discussing this interesting new innovation he pointed out another article that covers this approach:<br />
<a href="http://software.intel.com/en-us/articles/a-library-based-approach-to-threading-for-performance/">http://software.intel.com/en-us/articles/a-library-based-approach-to-threading-for-performance/</a></p>
<p>This opens up the door for many innovative approaches and new library implementations for modules that we used to write by hand not long ago.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/12/10/intels-parallel-sdk-a-positive-approach/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is DOS the ideal parallel environment - Part IV</title>
		<link>http://software.intel.com/en-us/blogs/2008/10/29/is-dos-the-ideal-parallel-environment-part-iv/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/10/29/is-dos-the-ideal-parallel-environment-part-iv/#comments</comments>
		<pubDate>Wed, 29 Oct 2008 20:21:53 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[Guest Blog]]></category>
		<category><![CDATA[http://AsyncOp.com]]></category>
		<category><![CDATA[Parallel Computing]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/10/29/is-dos-the-ideal-parallel-environment-part-iv/</guid>
		<description><![CDATA[This is the fourth and last part of this article. Previous parts surveyed Windows User Mode and UNIX. Part 3 of this article covered Windows Kernel.
The discussion is how parallel are these operating systems and we save the best for last: This last part of the article raises a surprising question 'Is DOS the ideal [...]]]></description>
			<content:encoded><![CDATA[<p>This is the fourth and last part of this article. Previous parts surveyed Windows User Mode and UNIX. <a href="http://software.intel.com/en-us/blogs/2008/09/15/is-dos-the-ideal-parallel-environment-part-iii/">Part 3</a> of this article covered Windows Kernel.<br />
The discussion is how parallel are these operating systems and we save the best for last: This last part of the article raises a surprising question 'Is DOS the ideal parallel system?'... or is this question really that surprising?</p>
<p><strong>DOS</strong></p>
<p>The old DOS operating system is rarely ever used but we can still find it today for simple slim applications such as Windows CE pre-boot loader menu and whenever we use Windows 98 / ME. The name DOS stands for Disk Operating System which indicated that it could work with disks at a time when personal computers could use only ROM loader and sometimes ROM based Cartridges (something like a bootable disk-on-key with 32KB of read-only memory).</p>
<p>DOS was designed specifically for Intel's X86 processors at a time when Floating Point operations came as an external chip and was considered an accessory. The year is 1981 and a PC that has a multicore CPU makes as much sense as an ice cream for a database. (If you are reading this in the far future, currently we are still not using ice creams as database engines).</p>
<p>DOS is an overly simplistic operating system that was designed very long ago. It does not support a multiprocessing API, does not have locking mechanisms, does not support dynamic linking and code reusability, and is not object oriented. In fact in a DOS application most C library functions are only wrappers for system Interrupt calls. Everything in DOS is done by use of an Interrupt. Reading a character from the keyboard, writing a string to the screen, opening a file, moving the cursor, setting system time, and anything else an application can perform is done by calling a CPU Interrupt.</p>
<p>CPU Interrupts are a special mechanism that was added to the CPU in order to service hardware peripherals. When the keyboard has a new key-press event it will Interrupt the CPU by signaling on a dedicated pin of the CPU Chip. The 8086 CPU was very advanced at the time and a signal on the INT pin will make the CPU stop and ask the hardware what is the ID of the device that generated the INT request. We can still see this today on Windows XP's device manager – some of the devices have a tab called "Resources" and in it you can find the value of the IRQ – Interrupt Request. Every Interrupt number has a special Interrupt handler which is a function. The 8086 processor supported 256 interrupt numbers (Vectors).</p>
<p>The computer's ROM for the 8086 is called BIOS. It was the first thing to load and initialize the CPU. The BIOS was also responsible for any hardware functionality because there were no installable drivers for the basic devices such as display adapter and motherboard. Higher level software (such as operating system and applications) communicated with the BIOS by using the same Interrupt mechanism. The software would issue a "Software Interrupt" which basically means asking the CPU to call the Interrupt handler function by the vector number. The CPU uses an array of 256 function pointers (the interrupt vector table in memory) and issuing an Interrupt means calling the function by its index in the array. For example you would initiate Interrupt 16 (10 hex), the BIOS video services, to query video card information. For extensive information about Interrupts see <a href="http://www.ctyme.com/intr/int.htm">Ralf Brown's interrupt list</a> – an amazingly valuable resource for working with interrupts.</p>
<p>There are three types of Interrupts: Hardware Interrupts are generated by hardware devices and issued by a signal to the processor's INT pin, Software Interrupts are issued by drivers by executing the INT instruction, and CPU Interrupts are generated internally by the CPU in reaction to extreme conditions. For example the CPU generates an Interrupt when an instruction divides by zero, or to interrupt an instruction that violates access rights. The first two types of Interrupts are only for the system kernel, and CPU generated interrupts are sent as Exceptions to the user application.</p>
<p>The DOS operating system uses the same methodology as BIOS did. When you want to call a system API in your DOS application you issue INT 33 (21 hex). The DOS operating system installs the handler for interrupt 33 and every call to INT 33 is handled by DOS. Parameters are sent in CPU Registers. For example INT 33 with parameter 44 will return the current system time.</p>
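<p>The dispatch mechanism described above can be modeled in a few lines. This is a toy model and not real DOS code: the class <code>VectorTable</code>, the handler <code>dos_services</code>, and the fixed clock values are hypothetical, and registers are represented as a plain dictionary. The vector number 33 (21 hex) and function number 44 (2C hex, passed in register AH) are the ones mentioned in the text; returning the time in CH, CL, DH and DL follows the documented DOS convention for that function.</p>

```python
VECTOR_COUNT = 256


def unhandled(vector, registers):
    """Default entry for vectors that nobody has installed a handler for."""
    raise RuntimeError(f"no handler installed for interrupt {vector}")


class VectorTable:
    """Toy model of the table of 256 handler pointers."""

    def __init__(self):
        self.handlers = [unhandled] * VECTOR_COUNT

    def install(self, vector, handler):
        """Install a handler and return the previous one so it can be chained."""
        previous = self.handlers[vector]
        self.handlers[vector] = handler
        return previous

    def software_interrupt(self, vector, registers):
        """Issuing INT n means calling entry n of the table."""
        return self.handlers[vector](vector, registers)


def dos_services(vector, registers):
    """Hypothetical stand-in for the DOS handler behind INT 33 (21 hex)."""
    if registers["AH"] == 0x2C:                       # function 44: get system time
        # Made-up fixed clock: 13:05:06.00
        registers.update(CH=13, CL=5, DH=6, DL=0)     # hour, minute, second, 1/100 s
    return registers


table = VectorTable()
table.install(0x21, dos_services)
time_registers = table.software_interrupt(0x21, {"AH": 0x2C})
```

<p>Replacing an entry with <code>install</code> is also how a resident program hooks an interrupt: it keeps the returned previous handler and calls it for requests it does not want to handle itself.</p>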
<p>A DOS application is never parallel. You cannot have more than a single task running in the system. This is logical because there is only one CPU core. Multitasking on one core is accomplished by stopping the running process and scheduling another one on the same core. The majority of operating systems, Windows included, schedule processes by process and thread priorities. Every process and thread has a priority value and the system decides which process should run according to its priority value. The problem with this model is that processes perform different tasks. Some of the tasks are high priority and some are not. The solution is dedicating a thread for every task, so for example there is a thread that reads from the network in high priority and another thread that writes to disk in low priority. Setting process and thread priorities is a major part of system design.</p>
<p>DOS does not support multiple tasks but there was only one core on the CPU so there was no performance reason in re-scheduling.<br />
However, DOS was written at the lowest levels and is closely coupled with the hardware by design. The hardware is and always has been extremely parallel. Hardware Interrupts are handled by the CPU in priority order: the lowest number comes first. When the CPU is handling Interrupt 15 and Interrupt 14 is signaled, the CPU will stop working on interrupt 15 and immediately start working on interrupt 14. This is why it is called an interrupt – because the CPU is interrupted based on priority.</p>
<p>It seems that DOS is based on very good parallel foundations. DOS also provides API that installs new interrupt handlers. You create an application and set it as a TSR – "Terminate Stay Resident", which means that the application runs from start to end and after it exits it stays in RAM waiting for Interrupts to occur.</p>
<p>Now let's turn DOS upside down:</p>
<p>We run a simple application. This application is of course single task, single thread. This application is equivalent to the Idle Process in parallel systems. Whenever nothing else runs this process is active. This process does not have to consume CPU and can sleep, wait or HALT (CPU instruction).<br />
Advanced operating systems use threads with different priorities to make sure that important tasks are run before the lesser ones. Here, instead of allocating threads for every operation and setting thread priority we set Interrupt Handlers to perform the operations, so that every task has a dedicated interrupt handler and therefore has a clear priority. Every operation that the application performs has a clear priority. For example the application's handler function for keyboard input has lower priority than the timer's handler function. These examples are hardware events but we can also implement software functionality with these handlers.</p>
<p>Write a simple console application with multiple functions, just like we are used to. Now, every function has a priority between 1 and 255. Every function has four parameters but you can pass pointers to structs that hold more data. Once you pass data to a function you can no longer touch this data, so the function becomes the only owner and therefore there is no need for locks. There can only be a single owner function for every resource, memory buffer, hardware device, disk file, etc. When you want to read from this resource or write to it you call its handler.</p>
<p>Since every function is only called by issuing an interrupt - there is true priority ordering between operations in this application. Every operation is stopped when a more important operation is pending and when this is complete, the original operation resumes. There is some problem with that: Operating Systems today have a mechanism for handling "Priority Inversion", the situation in which a low priority thread owns a resource that a high priority thread requires. The mechanism (priority inheritance or a temporary priority boost) raises the priority of the lower thread so it can keep running to the point when it releases the resource (such as a lock object) that the higher priority thread is waiting for. In our case we allow handlers to call other handlers so for example handler number 45 can issue interrupt level 23 or 66 without worrying about priority. Only hardware interrupts really stop execution. This way we start the operations in the right priority according to the issuing hardware but a keyboard interrupt can be anything from 'type a char' to 'immediately cancel the long operation'. This means that we need some method of making sure that the operations are kept in their logical priority, but DOS does not support this and we need to add some management. For example any hardware handler can verify the priority. If it is higher than the running task then it will start execution. If it is lower than the running task then it will push to the tasks queue (or manipulate the process's stack by pushing another return address).</p>
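<p>The management rule at the end of the previous paragraph can be sketched as follows. This is a simulation only, not DOS code: the class <code>Dispatcher</code> and the event names are hypothetical, and, following the convention used earlier, a lower number means a more important operation. An event that outranks the running operation starts immediately; anything else waits in a queue until the more important work is done.</p>

```python
import heapq


class Dispatcher:
    """Toy model: run an event now if it outranks the running one, else queue it."""

    def __init__(self):
        self.running = []    # priorities of the nested operations currently running
        self.pending = []    # heap of (priority, sequence, name, action)
        self.sequence = 0    # tie-breaker that keeps equal priorities in arrival order
        self.log = []

    def raise_event(self, priority, name, action=None):
        if not self.running or priority < self.running[-1]:
            self._run(priority, name, action)          # more important: preempt
        else:
            heapq.heappush(self.pending, (priority, self.sequence, name, action))
            self.sequence += 1                         # less important: wait

    def _run(self, priority, name, action):
        self.running.append(priority)
        self.log.append("start " + name)
        if action is not None:
            action(self)                               # events may arrive while it runs
        self.log.append("end " + name)
        self.running.pop()
        self._drain()

    def _drain(self):
        # Start queued work that now outranks whatever is still running.
        while self.pending and (not self.running or self.pending[0][0] < self.running[-1]):
            priority, _, name, action = heapq.heappop(self.pending)
            self._run(priority, name, action)


def long_disk_operation(dispatcher):
    dispatcher.raise_event(9, "keyboard")   # less important: queued
    dispatcher.raise_event(1, "timer")      # more important: runs immediately


dispatcher = Dispatcher()
dispatcher.raise_event(5, "disk", long_disk_operation)
```

<p>In this run the timer handler starts and ends inside the disk operation, while the keyboard handler runs only after the disk operation has finished.</p>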
<p>The software design of a parallel DOS system is very good but we are missing the Operation State management which can only be single task – stack based (anything else we need to manage manually). Another problem is that the priority based operations only work with hardware events and anything else has to be manually managed. So – we have some very good design but no real system support for it, both in hardware and software layers.</p>
<p>Is DOS the ideal parallel environment?<br />
DOS is really close to the hardware and the hardware has always been extremely parallel. The basic system design is extremely parallel, but the parallel engine is only implemented for the hardware and not for the software. So 'Yes' DOS supports a very advanced application design, but 'No' the CPU does not fully implement the functionality required for this design to really work.</p>
<p>The reason I wrote this very long article is so that we watch and learn from systems that are closely coupled with the computer's hardware, because the hardware has always been parallel. We are still learning how to write parallel applications, while hardware related code has been working in parallel since the first Intel 8086 based machine was manufactured.</p>
<p>This is the final part of a four-part article that covered existing operating systems and their parallel design, with advantages and faults.<br />
The concepts brought here are given as background for a new software design model called Operation View Model.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/10/29/is-dos-the-ideal-parallel-environment-part-iv/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is DOS the ideal parallel environment - Part III</title>
		<link>http://software.intel.com/en-us/blogs/2008/09/15/is-dos-the-ideal-parallel-environment-part-iii/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/09/15/is-dos-the-ideal-parallel-environment-part-iii/#comments</comments>
		<pubDate>Mon, 15 Sep 2008 20:19:55 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[Guest Blog]]></category>
		<category><![CDATA[http://AsyncOp.com]]></category>
		<category><![CDATA[Parallel Computing]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/09/15/is-dos-the-ideal-parallel-environment-part-iii/</guid>
		<description><![CDATA[Part 2 of this article talked about the parallel design of the UNIX operating system that was highly advanced at the time. This part talks about the design and implementation of the Windows NT Kernel that managed to maintain the original parallel design to a large extent. This article is presented as background information for a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://software.intel.com/en-us/blogs/2008/09/02/is-dos-the-ideal-parallel-environment-part-ii/">Part 2</a> of this article talked about the parallel design of the UNIX operating system that was highly advanced at the time. This part talks about the design and implementation of the Windows NT Kernel that managed to maintain the original parallel design to a large extent. This article is presented as background information for a new design model called Operation View Model.</p>
<p><strong>Windows Kernel</strong><br />
Windows Kernel is a completely different operating system than the Windows User Mode that we all know. Windows Kernel is an embedded real-time system in its core and essence. The system can handle massive amounts of operations at a time. When a higher priority event is received the lower priority operation is immediately paused so that the higher priority operation can begin, operations travel between objects and between layers, and the system uses a 'floating stack'. Windows NT Kernel is a real-time embedded system that is designed for the parallel world.</p>
<p>This operating system was designed to work in parallel. The system design enforces parallel design compatibility and therefore all system components support parallel work. Driver verification program (Driver Signing) makes sure that all drivers comply with this parallel design.</p>
<p>Device driver developers are commonly hardware engineers or must have deep understanding of the hardware (higher level network and multimedia put aside). This means that driver developers usually expect multiple events coming in asynchronously, and the code reflects that. Several events can come from any of: user applications, other kernel drivers, and the hardware.</p>
<p>Windows NT Kernel developers understand the parallel world and write good parallel code, and the system design enforces good parallel design methodologies. This makes this operating system very good for the parallel world. With that come the problems. The majority of developers find it very difficult to understand this parallel design and the OS environment. The tools are very hard to work with and are not even close to User Mode Windows (Visual Studio .Net, Borland C++ Builder, Eclipse, etc) and the compiler toolkit had only recently been improved. This is just like any other real-time embedded system. Debugging tools are also problematic for this OS and there are three basic methodologies for debugging a component in the system:<br />
1. Text Outputs: It is possible to use an equivalent to Win32 OutputDebugString API, and this can also be sent to an external port.<br />
2. Break Points: It is also possible to break the execution on an event such as reaching a certain point in the code. The problem is that for this to function properly the entire system has to be halted.<br />
3. Profiling: By using a profiler it is possible to see system events without greatly interrupting the execution flow. If you expect buffers to come from the network and you keep pausing the system over breakpoints then multiple network buffers will be lost and you might change the condition that created the bug in the first place.<br />
If it didn't pop out yet then allow me to help: When we try to debug parallel systems today these are the tools that we use. Text outputs are very favorable and when you start working with parallel code you will find yourself increasing the use of text outputs in place of using breakpoints and step-by-stepping. As for profiling see the tools provided by <a title="SysInternal" href="http://www.SysInternal.com" target="_blank">www.SysInternal.com</a>. These were created to help with kernel and system profiling and now we find ourselves using these tools when we debug our parallel user-mode code. The only concept that was not yet employed outside the kernel is pausing the entire system for a breakpoint. Imagine a service handling a network buffer and resetting an event. If you just pause your application as we do today the service can still do multiple things in the background. The kernel of course enjoys the assumption that all working elements are on the same machine.</p>
<p>Windows NT Kernel defines the system design. This starts with defining that each system component, a driver, must support a predefined functionality, even if it means simply reply that it does not. Here are the basic operations:<br />
Create – prepare to handle another user and allocate the required resources<br />
Close – deallocate these resources<br />
Write – Receive information<br />
Read – information required<br />
Control – handle some special control operation<br />
Cancel – Stop the ongoing operation<br />
A simple example is the Serial port: We create a file called "COM1", we can WriteFile to send output, ReadFile to read input, and we can I/O Control it to change the baud-rate and bit parity. When we are done we Close the Handle.</p>
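<p>The idea of a universal set of operations that every component must answer can be sketched like this. The code is a simplified model and not the actual kernel driver interface: the class names, the string return values, and the loopback behavior of the port are all assumptions made for illustration. The base class declines every operation, and a concrete driver overrides only what it supports.</p>

```python
NOT_SUPPORTED = "not supported"


class Driver:
    """Base driver: answers every basic operation, by default declining it."""

    def create(self):
        return NOT_SUPPORTED

    def close(self):
        return NOT_SUPPORTED

    def write(self, data):
        return NOT_SUPPORTED

    def read(self, count):
        return NOT_SUPPORTED

    def control(self, code, value):
        return NOT_SUPPORTED

    def cancel(self):
        return NOT_SUPPORTED


class SerialPort(Driver):
    """Hypothetical serial port that loops written bytes back to the reader."""

    def __init__(self):
        self.handles = 0
        self.baud_rate = 9600
        self.buffer = bytearray()

    def create(self):
        self.handles += 1            # prepare to serve another user
        return "ok"

    def close(self):
        self.handles -= 1            # release that user's resources
        return "ok"

    def write(self, data):
        self.buffer.extend(data)
        return len(data)

    def read(self, count):
        data = bytes(self.buffer[:count])
        del self.buffer[:count]
        return data

    def control(self, code, value):
        if code == "set_baud_rate":
            self.baud_rate = value
            return "ok"
        return NOT_SUPPORTED
```

<p>Because <code>SerialPort</code> does not override <code>cancel</code>, a caller still gets a well-defined answer for it, which is the point of requiring every driver to reply to every operation.</p>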
<p>NT drivers are organized in layers. Any driver that manages hardware is the device driver for that hardware. Commonly the Device Driver is supplied by the device manufacturer and it is the only element in the system that knows how to talk to the device. The device driver creates a Virtual Device that represents the physical device. For example a Write operation will be sent to the Virtual Device and the device driver will translate that to I/O operations for the physical device. Any driver that does not talk directly to hardware is on a higher layer. An example is a USB Printer device that is connected to the USB port (system root hub) which is connected to the CPU through the PCI BUS. When we write to this printer we send the document to the USB Printer driver that sends it to the USB BUS driver that sends it to the PCI bus driver.</p>
<p>When the system is layered and every system component supports a universal interface, it is very simple to add filters to the system. A driver can declare that it is a filter driver and attach itself to another driver's input from the upper layer or output to the lower layer. We say that it is an upper filter or a lower filter respectively. This is supported by system design. A firewall is this type of filter. With firewalls we also get sniffers that monitor network traffic. When everything can be filtered then everything can be monitored. See <a title="SysInternal" href="http://www.SysInternal.com" target="_blank">http://www.SysInternal.com</a> tools that use this feature extensively.</p>
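<p>Layering and filtering can be illustrated with a small model. This is a sketch under simplifying assumptions, not the real driver model: the classes <code>Layer</code> and <code>LoggingFilter</code> are hypothetical, each layer supports only a write operation, and the layer names follow the USB printer example above. Because the filter exposes the same interface as the driver it is attached to, the caller does not need to know that it is there.</p>

```python
class Layer:
    """A driver in the stack: records its name and passes the write downward."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower

    def write(self, data, trace):
        trace.append(self.name)
        if self.lower is not None:
            return self.lower.write(data, trace)
        return len(data)             # lowest layer: report how much was written


class LoggingFilter(Layer):
    """An upper filter: sees every write before the driver it is attached to."""

    def __init__(self, target):
        super().__init__("filter:" + target.name, lower=target)
        self.seen = []

    def write(self, data, trace):
        self.seen.append(data)       # monitor the traffic, then pass it on unchanged
        return super().write(data, trace)


# Stack from the example in the text: printer driver, USB bus driver, PCI bus driver.
pci = Layer("pci")
usb = Layer("usb", lower=pci)
printer = Layer("printer", lower=usb)
monitored_printer = LoggingFilter(printer)

trace = []
written = monitored_printer.write(b"document", trace)
```

<p>A firewall would follow the same shape, except that its <code>write</code> could decide not to pass the data on.</p>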
<p>The system is organized in layers of drivers and each driver is an object in the system. The File System as an Object Store has names for all these driver objects. In other words the Windows NT Kernel is a fully object oriented system. With that the developers must have clear execution flow, and inside a driver object oriented methodologies are used as minimally as possible (high level, single instance drivers might look different). Functions are pretty long and state is maintained within a function. We have an object oriented system that has a clear flow of execution, however this system only defines that operations travel one way and there is a clear understanding of who is servicing whom. This is fundamentally different from most OO systems that we know today where object relations and hierarchy only describe inheritance and not execution flows.</p>
<p>In the Windows NT Kernel there are no locks when accessing a resource. Instead every resource has a single owner. A device is owned by the Device Driver. Each memory address space and each I/O space must be owned before they can be accessed. When a driver owns a resource no other driver can receive ownership over that resource. (See windows device manager, device properties has a tab called resources). Every driver that wants to access an owned resource has to communicate with the resource owner, hence there are no locks between drivers over these global resources.</p>
<p>Windows NT Kernel does not use processes and threads as we know them. Instead kernel drivers can run under any user-mode thread that just happened to run when the event was set. Windows NT Kernel does not identify an operation using a stack of a thread. Instead the kernel drivers use a 'floating stack'. This stack is not managed by the CPU. Every thread receives a request object as an event. This object has all the information required to handle the event including the input and output buffer and the type of request. Events travel from top to bottom and the other way around. The request object (see <a title="IRP on MSDN" href="http://msdn.microsoft.com/en-us/library/ms806157.aspx" target="_blank">IRP on MSDN</a>) also stores the system managed stack. Each driver asks for its own stack location to work with. When a request travels down, new driver stack locations are added to the request object, and when the request returns each driver can continue to work with its own stack location for that request. This way there is no CPU context switch and also the system is simulating a new thread for each request and event in the system. For performance optimizations the system allows a special method called Fast I/O, however a driver does not have to support it and if for any reason Fast I/O cannot be performed (unsupported, memory paging required, etc) the system will automatically generate a request object (IRP) to replace the Fast I/O operation.</p>
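<p>The 'floating stack' can be illustrated with a simplified model of a request object. This is not the real IRP structure: the class <code>Request</code>, the function <code>send_down</code>, and the field names are hypothetical, and the model grows the stack as the request travels, whereas a real request object reserves room for its stack locations when it is allocated. What the sketch shows is that the per-driver state lives in the request and not on a thread's stack.</p>

```python
class Request:
    """Toy request object that carries its own stack, one location per driver."""

    def __init__(self, kind, data):
        self.kind = kind
        self.data = data
        self.stack = []


def send_down(request, drivers):
    """Pass a request down through the layers and complete it on the way back up.

    Returns the driver names in the order in which they saw the completion.
    """
    # Going down: every driver gets a private stack location inside the request.
    for name in drivers:
        request.stack.append({"driver": name, "state": "pending"})

    # Coming back up: every driver finds its own location again, lowest layer first.
    completed = []
    while request.stack:
        location = request.stack.pop()
        location["state"] = "completed"
        completed.append(location["driver"])
    return completed


request = Request("write", b"document")
completion_order = send_down(request, ["printer", "usb", "pci"])
```

<p>Since everything a driver needs in order to continue is stored in the request, any thread can pick the request up and carry on, which is why no context switch is required.</p>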
<p>The model defines objects in layers with events traveling between these objects according to the layers. This means private storage for each driver and a private storage for each event. The driver works with both when performing an operation.</p>
<p>The Windows NT Kernel has a very good design for the parallel world. Copy this design for your applications if you can. That said, there are a few minor problems with this system. For example, there are only a few system priority levels, which makes it hard to deterministically define event sequences for unrelated events: when the hard drive is performing a write operation, the USB device has to wait for it to complete and cannot interrupt. There are other problems with this system, but these are mostly related to performance and real-time behavior. As far as parallel work is concerned, the Windows NT Kernel is a very good parallel system.<br />
 <br />
Next is the last part of this article, which will describe DOS as a parallel environment.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/09/15/is-dos-the-ideal-parallel-environment-part-iii/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Is DOS the ideal parallel environment - Part II</title>
		<link>http://software.intel.com/en-us/blogs/2008/09/02/is-dos-the-ideal-parallel-environment-part-ii/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/09/02/is-dos-the-ideal-parallel-environment-part-ii/#comments</comments>
		<pubDate>Wed, 03 Sep 2008 00:47:56 +0000</pubDate>
		<dc:creator>Asaf Shelly</dc:creator>
				<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Add new tag]]></category>
		<category><![CDATA[Guest Blog]]></category>
		<category><![CDATA[http://AsyncOp.com]]></category>
		<category><![CDATA[Parallel Computing]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/09/02/is-dos-the-ideal-parallel-environment-part-ii/</guid>
		<description><![CDATA[Part 2 of a four-part article that investigates parallelism support in the design of today's common operating systems. This part of the article describes the evolution of UNIX systems with regard to parallel operations.]]></description>
			<content:encoded><![CDATA[<p><a href="http://software.intel.com/en-us/blogs/2008/08/25/is-dos-the-ideal-parallel-environment-part-i/">Part 1</a> of this article described how the original design of Windows User Mode was highly advanced and suitable for the parallel world, but in time the concepts and goals behind this design were lost and forgotten. This part of the article describes the evolution of UNIX systems with regard to parallel operations. This article provides background information for a collection of articles that will explain a new design model called Operation View Model.</p>
<p><strong>UNIX</strong><br />
The UNIX operating system had a great advantage when it comes to parallelism because the network is part of its foundations. This means that the OS was designed to support distributed applications.<br />
It is noticeable that a great deal of attention was given to parallelism and flow control in the basic design of this operating system. One of the most basic elements is the queue. UNIX attaches an input queue and an output queue to many of the basic system components, for example the console input and output, the serial port, the parallel port, etc. Furthermore, the operating system supports redirecting the output of one application as the input to another (a Pipe). Redirecting the output queue to the input queue of the next level is purely Task Oriented Design. The problem with this model is that it was conceived in the very early days of computers, and therefore the problems that it solves and its implementation are relatively simple and primitive by today's standards. Back then the keyboard had a 16-character buffer (queue). Today a simple USB device uses four-layer network communications with larger buffers. This design and implementation is not widely used, and when it is, it is often just for ease of use and not as part of a good parallel design.</p>
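<p>As a small illustration of redirecting one output queue into the next input queue, here is a minimal Python sketch that chains two processes through a pipe, much like <code>producer | consumer</code> in a shell. The two one-line child programs and the variable names are made up for the example.</p>

```python
import subprocess
import sys

# Stage 1 writes lines to its output queue (stdout).
producer = subprocess.Popen(
    [sys.executable, "-c", "print('alpha'); print('beta'); print('gamma')"],
    stdout=subprocess.PIPE,
)

# Stage 2 reads its input queue (stdin), which is stage 1's output.
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper(), end='')"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
)

producer.stdout.close()  # let the consumer own the read end of the pipe
output, _ = consumer.communicate()
producer.wait()
result = output.decode().split()
print(result)
```

<p>Neither stage knows about the other. Each one only reads its input queue and writes its output queue, and the operating system connects the two.</p>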
<p>UNIX also introduced wide support for network communications. In this model every application creates a network socket and listens on it. When you want an operation to be performed, you connect to that socket by its number (its port), send the message, and close the socket when done. This model is also very good for parallelism. Every service provider takes a socket with a predefined number. Every element in the system that has a request communicates with that socket. In other words, every element in the system can initiate an operation on a given service provider identified by its socket ID. In this model the system is a collection of event handlers called Services, and every application can initiate a Task with each Service by sending data to its socket queue. Services can of course receive service from other Services. This model allows applications to be distributed with little to no effort. Like the other parallel models, this model was long forgotten and was recently revived as SOA.</p>
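<p>The connect, send, close sequence can be sketched with Python's standard <code>socket</code> module. This is only an illustration under a few assumptions: the service and its client live in one file, a background thread stands in for what would normally be a separate service process, the port is chosen by the OS instead of being predefined, and the message text is invented.</p>

```python
import socket
import threading

# The service binds a listening socket; port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
port = server.getsockname()[1]


def serve_one_request():
    # Accept a single connection, treat the received bytes as the task, reply.
    connection, _ = server.accept()
    with connection:
        message = connection.recv(1024)
        connection.sendall(b"done: " + message)


worker = threading.Thread(target=serve_one_request)
worker.start()

# The client connects by number, sends the message, and closes when done.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"resize image")
    reply = client.recv(1024)

worker.join()
server.close()
print(reply.decode())
```

<p>Because the client only needs an address and a number, moving the service to another machine would change the address and nothing else in the client.</p>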
<p>Last but not least, an important technique provided by the UNIX OS is called Fork. Forking is an operation performed by a process in which it is completely duplicated, a sort of "copy and paste my process" if you like. The two copies are almost identical, and the only way to tell which is the original and which is the clone is by testing the value returned from the Fork command. If you are the original then you receive the clone's ID, and if you are the clone then you receive an empty ID (zero). This is a very advanced parallel computing model. In this model all global resources are 'thread-safe' because the process memory is not shared. This is in contrast to Windows, where you create threads and all threads in the same process share the same address space. The Fork mechanism also takes care of system-wide resources such as files. When the process is cloned, all Handles are duplicated. Duplicating a Handle instead of using the same handle means that the File-System manages all access rights and employs locks when required. These locks are real locks that can block access, and not a simple flag (like a MUTEX). Windows cannot employ this methodology because process creation and even thread creation is a really heavy procedure on Windows OS.<br />
For all the compliments that Fork gets, it was conceived in the really early days of the computer world. Soon enough you will have 128 cores and need to create, for example, 20 parallel execution elements (threads, processes, etc.). Performing a Fork operation to spawn 19 new clones is somewhat of a problem. Another problem is joining all the clones: waiting for all of them to complete and combining the data from all of them. Think about an image processing application. Think about video encoding, frame by frame.<br />
When we have 20 processes performing the same operation, it is Force Duplication. A different model, called a Pipeline, can be employed instead. In this second model you don't just have all work elements doing the same job: you have several workers collecting data, several doing the processing, and several saving data. It is very possible to have several processes in a single operation flow. For example: read image, remove noise, convert to black and white, clean edges, detect faces, compress, save image. Launching a single process that clones itself into all these components is very difficult. Data transfer is also problematic. The original UNIX solved this using a Pipe between several processes, but this method makes heavy use of memory and memory access, and might miss the whole point of using the CPU massively, because memory might become the bottleneck in this case.<br />
In time UNIX developers slowly forgot about these features and instead of improvements we got replacements such as threads and locks. These are the very things that Fork was designed to avoid (exposing resources to all running elements and using locks to serialize the access to these resources).</p>
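<p>The Fork behavior discussed above can be tried with a short Python sketch on a POSIX system (<code>os.fork</code> is not available on Windows). The <code>counter</code> variable and the pipe used to report back are assumptions made for the example. The sketch shows the two return values of fork and that the clone works on a private copy of the process memory.</p>

```python
import os

counter = 0  # "global" state that the clone receives as a private copy

read_end, write_end = os.pipe()
pid = os.fork()  # POSIX only

if pid == 0:
    # Clone: fork returned 0. Changes here do not reach the original.
    os.close(read_end)
    counter += 100
    os.write(write_end, str(counter).encode())
    os.close(write_end)
    os._exit(0)

# Original: fork returned the clone's process ID.
os.close(write_end)
child_counter = int(os.read(read_end, 32).decode())
os.close(read_end)
_, status = os.waitpid(pid, 0)
print(counter, child_counter)
```

<p>The original still sees its own <code>counter</code> unchanged, while the value computed by the clone has to travel back explicitly through the pipe. That explicit transfer is also the joining cost mentioned above when many clones have to combine their results.</p>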
<p>Bottom line: we got good design and concepts that were eventually misunderstood and forgotten, and in time replaced by alternatives that make it harder to get the best out of a multi-core CPU, because Task Oriented Design was not one of the design goals.</p>
<p>Next to follow, Part 3 will explore the Windows NT Kernel as an example of an excellent parallel system. Part 4 will talk about DOS.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/09/02/is-dos-the-ideal-parallel-environment-part-ii/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
