Dr. Rao Mikkilineni, Member, IEEE
In this paper, we focus on another lesson from the genes of living organisms: the precise replication and execution of encapsulated DNA sequences. We describe a recently proposed computing model that extends the stored program control (SPC) model to create self-configuring, self-monitoring, self-healing, self-protecting and self-optimizing (self-managing or self-*) distributed software systems. This approach uses a parallel distributed computing model to implement service virtualization and workflow execution as practiced in current business process implementations in IT, where a workflow is implemented as a set of tasks arranged in a directed acyclic graph (DAG). Two implementations of the new computing model have been realized.
In his article “The Trouble with Multi-core,” David Patterson observes that “the most optimistic outcome, of course, is that someone figures out how to make dependable parallel software that works efficiently as the number of cores increases. That will provide the much-needed foundation for building the microprocessor hardware of the next 30 years. Even if the routine doubling every year or two of the number of transistors per chip were to stop--the dreaded end of Moore's Law--innovative packaging might allow economical systems to be created from multiple chips, sustaining the performance gains that consumers have long enjoyed.
“Although I'm rooting for this outcome--and many colleagues and I are working hard to realize it--I have to admit that this third scenario is probably not the most likely one.”
Up to now, the upheaval in hardware brought about by the multi-core chips (often dubbed as an inflection point) is not matched by an equal innovation in software to take advantage of the abundance of computing, memory, network, and storage resources. While the new class of processors offers parallel processing and multi-thread architecture, the operating systems that have evolved over the past four decades are optimized to work with serial von Neumann stored program computers.
The term "von Neumann bottleneck" was coined by John Backus  in his 1977 ACM Turing award lecture to address the issues arising from the separation between the CPU and memory. According to Backus: "Surely there must be a less primitive way of making big changes in the store than by pushing vast numbers of words back and forth through the von Neumann bottleneck. Not only is this tube a literal bottleneck for the data traffic of a problem, but, more importantly, it is an intellectual bottleneck that has kept us tied to word-at-a-time thinking instead of encouraging us to think in terms of the larger conceptual units of the task at hand. Thus programming is basically planning and detailing the enormous traffic of words through the von Neumann bottleneck and much of that traffic concerns not significant data itself, but where to find it."
The limitations of the SPC computing architecture were clearly on his mind when von Neumann gave his lecture at the Hixon symposium in 1948 in Pasadena, California. "The basic principle of dealing with malfunctions in nature is to make their effect as unimportant as possible and to apply correctives, if they are necessary at all, at leisure. In our dealings with artificial automata, on the other hand, we require an immediate diagnosis. Therefore, we are trying to arrange the automata in such a manner that errors will become as conspicuous as possible, and intervention and correction follow immediately." Comparing computing machines with living organisms, he points out that computing machines are not as fault tolerant as living organisms. He goes on to say "It's very likely that on the basis of philosophy that every error has to be caught, explained, and corrected, a system of the complexity of the living organism would not run for a millisecond."
The resiliency of biological systems stems from their genetic computing model, which supports the genetic transactions of replication, repair, recombination and reconfiguration. Evolution of living organisms has taught us that the difference between survival and extinction is the information processing ability of the organism to:
- Discover and encapsulate the sequences of stable patterns that have lower entropy, which allow harmony with the environment providing the necessary resources for its survival,
- Replicate the sequences so that the information (in the form of best practices) can propagate from the survived to the successor,
- Execute with precision the sequences to reproduce itself,
- Monitor itself and its surroundings in real-time, and
- Utilize the genetic transactions of repair, recombination and rearrangement to sustain existing patterns that are useful.
Life’s purpose, it seems, is to transfer the stable patterns that have proven useful from the survived to the successor in the form of encapsulated executable best practices. A cellular organism’s genetic program (specifying the sequences of stable patterns that assist in establishing equilibrium between the organism and its surroundings that provide the necessary resources) is encoded in its DNA. Consequently, faithful replication of that sequence is essential to preserve the organism’s unique characteristics.
According to Singer and Berg, “One of the wonders of DNA is that it encodes the complete machinery and instructions for its own duplication: some genes code for enzymes that synthesize the nucleotide precursors of DNA and others specify proteins that assemble the activated nucleotides into polynucleotide chains. There are also genes for coordinating the replication process with other cellular events, and still others that encode the proteins that package DNA into chromatin. Another extraordinary property of DNA is that it functions as a template and directs the order in which the nucleotides are assembled into new DNA chains. Provided with precisely the same synthetic machinery, different DNA’s direct only the formation of replicas of themselves.”
In addition, the genetic program also specifies the enzymatic machinery that rectifies the errors that occur occasionally during DNA replication, as well as enzymes that repair damage to the bases or helical structures of DNA caused by various factors. Even more impressive is the fact that the genetic program also provides opportunities for genome variation and evolutionary change. “Certain genes encode proteins that promote strand exchanges between DNA molecules and thereby create new combinations of genetic information for the progeny. Other proteins cause genome rearrangements by catalyzing the translocations of small segments or even large regions within and among DNA molecules. Such recombination and translocations provide some of the substrates for evolution’s experiments, but some rearrangements cause disease. By contrast, the proper functioning of several genetic programs actually depends on specific DNA rearrangements.”
In essence, the genetic transactions of DNA (namely the mechanisms of replication, repair, recombination and rearrangement) provide a model for powerful abstractions that are essential for a computing model to create self-configuring, self-monitoring, self-protecting, self-healing, self-optimizing and self-propagating distributed systems. In this paper, based on the lessons from the von Neumann computing model and the genetic computing model, we propose a new approach that implements the description, replication, signaling-enabled control and execution of distributed tasks using the conventional SPC model. We call this the Distributed Intelligent Managed Element (DIME) network computing model. The new DIME network architecture (computing DNA) allows us to implement a workflow as a set of tasks, arranged or organized in a directed acyclic graph (DAG) and executed by a managed network of distributed computing elements (DIMEs). These tasks, depending on user requirements, are programmed and executed as loadable modules in each DIME.
In Section II, we describe the new parallel distributed computing model and illustrate its self-* properties. In Section III, we review two proofs of concept implementing the DIME network architecture and sketch a possible virtual services infrastructure. In Section IV, we present a comparison of DIME networks with current service architectures and conclusions for future direction.
II. THE DIME NETWORK ARCHITECTURE
The raison d’etre for the DIME computing model is to fully exploit the parallelism, distribution and massive scaling possible with multicore processor based servers, laptops and mobile devices that support hardware assisted virtualization, and to create a computing architecture in which the services and their real-time management are decoupled from the hardware infrastructure and its management.
The model lends itself to being implemented i) from scratch to exploit many-core servers and ii) in current generation servers by exploiting features available in current operating systems. In this section, we describe the DIME network architecture and both proof-of-concept implementations to demonstrate its feasibility. Figure 1 shows the transition from the current SPC computing model to the DIME network computing model.
In the current model, the services and their management are both part of the service executable packages implemented in a network of von Neumann SPC computing nodes. Following the genetic computing model, the DIME computing model separates service execution from its management and runs the two in parallel. Each DIME is a self-managed element with autonomy over its resources and is network aware through a signaling infrastructure. A signaling network overlay allows parallelism in resource configuration, monitoring, analysis and on-the-fly reconfiguration based on workload variations, business priorities and latency constraints of the distributed software components.
The DIME network architecture consists of four components:
- A DIME node, which encapsulates the von Neumann computing element with self-management of fault, configuration, accounting, performance and security (FCAPS),
- Signaling capability that allows intra-DIME and inter-DIME communication and control,
- An infrastructure that allows implementing distributed service workflows as a set of tasks, arranged or organized in a DAG and executed by a managed network of DIMEs, and
- An infrastructure that assures DIME network management using the signaling network overlay over the computing workflow.
The self-management and task execution (using the DIME component called MICE, the managed intelligent computing element) are performed in parallel using the stored program control computing devices. Figure 2 shows the anatomy of a DIME.
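The parallel arrangement described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the class name `Dime`, the queue-based channels and the command strings are hypothetical, and threads stand in for whatever parallel execution the SPC device provides. The point it shows is that the management loop listens on its own signaling channel while the MICE loop executes the task on a separate data path.

```python
import queue
import threading


class Dime:
    """Hypothetical DIME: a MICE thread runs the task while a manager
    thread handles signaling commands in parallel."""

    def __init__(self, name, task):
        self.name = name
        self.task = task                  # callable executed by the MICE
        self.inputs = queue.Queue()       # data path into the MICE
        self.outputs = queue.Queue()      # data path out of the MICE
        self.signals = queue.Queue()      # separate signaling (control) path
        self.events = []                  # commands seen by the manager
        self._halt = threading.Event()
        self._threads = []

    def _mice_loop(self):
        # Task execution: consume inputs until the manager asks us to halt.
        while not self._halt.is_set():
            try:
                item = self.inputs.get(timeout=0.05)
            except queue.Empty:
                continue
            self.outputs.put(self.task(item))

    def _manager_loop(self):
        # Self-management: react to signaling commands independently of the task.
        while True:
            command = self.signals.get()
            self.events.append(command)
            if command == "stop":
                self._halt.set()
                return

    def start(self):
        self._threads = [
            threading.Thread(target=self._mice_loop),
            threading.Thread(target=self._manager_loop),
        ]
        for thread in self._threads:
            thread.start()

    def stop(self):
        self.signals.put("stop")
        for thread in self._threads:
            thread.join()


dime = Dime("dime-1", task=lambda x: x * 2)
dime.start()
dime.inputs.put(21)
result = dime.outputs.get(timeout=2)   # produced by the MICE thread
dime.signals.put("heartbeat?")         # handled by the manager thread
dime.stop()
```

Keeping the signaling queue separate from the input and output queues is the design choice that matters in this sketch: a control command never has to wait behind task data.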
The DIME orchestration template provides the description for instantiating the DIME using an SPC computing device with appropriate resources required (CPU, memory, network bandwidth, storage capacity, throughput and IOPs). The description contains the resources required, the constraints and the addresses of executable modules for various components and various run time commands the DIME obeys. This description is called the regulatory gene and contains all the information required to instantiate the DIME with its FCAPS management components, the MICE and the signaling framework to communicate with external DIME infrastructure.
The service regulator provides the description for instantiating the DIME services using the MICE with appropriate resources required (CPU, memory, network bandwidth, storage capacity, throughput and IOPs). The description contains the resources required, the constraints and the addresses of executable modules for various components and various run time commands the service obeys. The configuration commands provide the ability for the MICE to be set up with appropriate resources and I/O communication network to be set up to communicate with other DIME components to become a node in a service delivery network implementing a workflow. Figure 3 shows the service implementation with a service regulator and the service execution package.
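One way to picture such a description is as a small declarative record that is checked against the target device before anything is instantiated. The sketch below assumes a particular shape for that record; the field names, units, the `file://` module address and the command names are all hypothetical and are not taken from either proof of concept.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Resources:
    """Resource figures named in the text; the units are assumptions."""
    cpu_cores: int
    memory_mb: int
    bandwidth_mbps: int
    storage_gb: int
    throughput_mbps: int
    iops: int

    def fits_within(self, available):
        # Every requested quantity must be covered by what the device offers.
        return all(
            getattr(self, f.name) <= getattr(available, f.name)
            for f in fields(self)
        )


@dataclass(frozen=True)
class Regulator:
    """Hypothetical description used to instantiate a DIME or a service."""
    name: str
    required: Resources
    modules: dict          # component name -> address of its executable module
    commands: frozenset    # run-time commands the instance obeys

    def can_instantiate_on(self, available):
        return self.required.fits_within(available)

    def obeys(self, command):
        return command in self.commands


regulator = Regulator(
    name="S1",
    required=Resources(2, 512, 100, 10, 50, 1000),
    modules={"mice": "file:///opt/dime/s1.bin"},   # illustrative address only
    commands=frozenset({"configure", "start", "stop"}),
)
small_host = Resources(1, 1024, 100, 20, 50, 2000)
large_host = Resources(4, 1024, 1000, 20, 100, 2000)
```

With these values, `regulator.can_instantiate_on(large_host)` holds while `small_host` is rejected because it offers fewer CPU cores than requested.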
Signaling allows groups of DIMEs to collaborate with each other and implement global policies. The signaling abstractions are:
- Addressing: For network based collaboration, each FCAPS aware DIME must have a globally unique address and any services platform using DIMEs must provide name service management.
- Alerting: Each DIME is capable of self-identification and heartbeat broadcast, and provides a published alerting interface that describes various alerting attributes and its own FCAPS management.
- Supervision: Each DIME is a member of a network with a purpose and role. The FCAPS interfaces are used to define and publish the purpose, role and various specialization services that the DIME provides as a network community member. Supervision allows contention resolution based on roles and purpose. Supervision also allows policy monitoring and control.
- Mediation: When the DIMEs are contending for resources to accomplish their specific mission, or require prioritization of their activities, the supervision hierarchy is assisted by a mediation object network that provides global policy enforcement.
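The four abstractions above can be illustrated with a toy supervisor. This is a minimal sketch assuming a single in-process registry and a logical clock passed in by the caller; the `Supervisor` class, the `dime://` address scheme and the numeric priorities are hypothetical names chosen for the example.

```python
class Supervisor:
    """Toy supervisor covering addressing, alerting, supervision, mediation."""

    def __init__(self, heartbeat_timeout):
        self.heartbeat_timeout = heartbeat_timeout
        self.roles = {}       # address -> (role, priority)
        self.last_seen = {}   # address -> time of last heartbeat

    def register(self, address, role, priority, now):
        # Addressing: each DIME needs a unique address in the name service.
        if address in self.roles:
            raise ValueError(f"address already in use: {address}")
        self.roles[address] = (role, priority)
        self.last_seen[address] = now

    def heartbeat(self, address, now):
        # Alerting: a DIME announces that it is alive.
        self.last_seen[address] = now

    def silent(self, now):
        # Supervision: list the DIMEs whose heartbeat is overdue.
        return sorted(
            address for address, seen in self.last_seen.items()
            if now - seen > self.heartbeat_timeout
        )

    def mediate(self, contenders):
        # Mediation: grant a contended resource to the highest priority role.
        return max(contenders, key=lambda address: self.roles[address][1])


supervisor = Supervisor(heartbeat_timeout=3)
supervisor.register("dime://a", "orchestrator", priority=10, now=0)
supervisor.register("dime://b", "worker", priority=1, now=0)
supervisor.heartbeat("dime://a", now=4)
overdue = supervisor.silent(now=5)
winner = supervisor.mediate(["dime://a", "dime://b"])
```

Here `dime://b` is reported as overdue because it has been silent for longer than the timeout, and the orchestrator wins the mediation because of its higher priority.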
The DIME network architecture supports the genetic transactions of replication, repair, recombination and rearrangement. Figure 4 shows a single node execution of a service in a DIME network.
A single DIME node, which can execute a workflow by itself or by instantiating a sub-network, provides a way to implement a managed DAG that executes a workflow. Replication is implemented by executing the same service, as shown in Figure 5.
By defining service S2 to execute itself, we replicate S2 DIME. Note that S2 is a service that can be programmed to terminate instantiating itself further when resources are not available. In addition, dynamic FCAPS (parallel service monitoring and control) management allows changing the behavior of any instance from outside (using the signaling infrastructure) to alter the service that is executed. Figure 6 shows dynamic service reconfiguration.
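The replication behavior described above, where S2 instantiates itself until resources run out, can be sketched as follows. The `ResourcePool` class and the slot-based accounting are assumptions made for the example; a real DIME would consult its FCAPS management rather than a simple counter.

```python
class ResourcePool:
    """Hypothetical stand-in for the resources available to new DIMEs."""

    def __init__(self, slots):
        self.free = slots

    def acquire(self):
        if self.free == 0:
            return False
        self.free -= 1
        return True


def run_s2(pool, generation=0, instances=None):
    """Hypothetical service S2: claim a slot, then instantiate itself again.
    Replication ends when the pool has no free slot left."""
    instances = [] if instances is None else instances
    if not pool.acquire():
        return instances                  # no resources: stop replicating
    instances.append(f"S2-{generation}")
    return run_s2(pool, generation + 1, instances)


pool = ResourcePool(slots=3)
instances = run_s2(pool)
```

With three free slots the service creates three instances and then terminates the chain by itself, without any external intervention.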
The ability to execute the control commands in parallel allows dynamic replacement of services during run time. For example, by stopping service S2 and loading and executing service S1, we dynamically change the service during run time. We can also redirect I/O dynamically during run time. Any DIME can also allow a subnetwork instantiation and control, as shown in Figure 7. The workflow orchestrator instantiates the worker nodes, monitors the heartbeat and performance of the workers, and implements fault tolerance, recovery, and performance management policies.
It can also implement accounting and security monitoring and management using the signaling channel. Redirection of I/O allows dynamic reconfiguration of worker input and output thus providing computational network control.
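The run-time service replacement and I/O redirection described above can be sketched with a node that accepts control commands separately from its data. The `ManagedNode` class and the command names (`stop`, `load`, `start`, `redirect`) are hypothetical; the example only shows the sequence from the text, in which S2 is stopped and S1 is loaded and executed in its place.

```python
class ManagedNode:
    """Toy node whose service and output can be changed by signaling."""

    def __init__(self, service, sink):
        self.service = service   # callable currently loaded in the node
        self.sink = sink         # list standing in for the output channel
        self.running = True

    def signal(self, command, payload=None):
        # Control path: commands arrive independently of the data path.
        if command == "stop":
            self.running = False
        elif command == "load":
            self.service = payload
        elif command == "start":
            self.running = True
        elif command == "redirect":
            self.sink = payload
        else:
            raise ValueError(f"unknown command: {command}")

    def feed(self, item):
        # Data path: apply the loaded service if the node is running.
        if self.running:
            self.sink.append(self.service(item))


def s1(text):
    return text.upper()


def s2(text):
    return text[::-1]


out_a, out_b = [], []
node = ManagedNode(service=s2, sink=out_a)
node.feed("abc")                 # handled by S2
node.signal("stop")
node.signal("load", s1)          # replace S2 with S1 at run time
node.signal("start")
node.feed("abc")                 # handled by S1
node.signal("redirect", out_b)   # redirect output to another channel
node.feed("xyz")
```

After this sequence `out_a` holds one result from each service and `out_b` holds the result produced after redirection.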
III. PROOF OF CONCEPT IMPLEMENTATIONS AND VIRTUAL SERVICES INFRASTRUCTURE
In summary, dynamic configuration at the DIME node level, together with the ability to implement a managed directed acyclic graph at each node using a DIME sub-network, provides a powerful paradigm for designing and deploying managed services. The DIME network computing model formalizes a distributed object network implementation (borrowing heavily from the computing models deployed by DNA and genomes) that can be programmed to self-configure, self-secure, self-monitor, self-heal and self-optimize based on business priorities, workload variations and latency constraints implemented as local and global policies. The infrastructure can be implemented using any of the standard operating systems that are available today. The key abstractions that are leveraged in this model are:
- Parallel implementation of self-management and computing element at the (DIME) node level and
- Parallel implementation of signaling based DIME network management and workflow implementation as a managed DAG
The parallelism and signaling allow the dynamism required to implement the genetic transactions which provide the self-* features that are the distinguishing characteristics of living organisms:
- Specialization: each computing entity is specialized to perform specific tasks. Locally embedded intelligence can be used to perform collection, computing and control functions with an analog interface to the real world.
- Separation of concerns: groups of computing entities combine their specializations through mediation to create value added services.
- Priority based mediation: the mediation is supervised to resolve contention for resources based on overall group objectives to optimize resource utilization.
- Fault tolerance, security and reliability: alerting, supervision and mediation are used to sequence the workflow, giving it a high degree of resilience.
The DIME computing model does not replace any of the computing models that are implemented using the SPC computing model today. It provides a self-* infrastructure to implement them with the dynamism and resiliency of living organisms. The DIME computing model focuses only on the reliable execution of stable patterns that are described as managed DAGs. It does not address how to discover more stable patterns from existing workflows (with lower entropy).
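To make the idea of a managed DAG concrete, the sketch below runs a three-task workflow in dependency order and retries a task that fails once, as a simple stand-in for self-repair. It assumes Python 3.9 or later for the standard `graphlib` module; the function `run_workflow`, the task names and the retry policy are hypothetical and do not describe either proof of concept.

```python
from graphlib import TopologicalSorter


def run_workflow(dag, tasks, max_retries=1):
    """Run tasks in dependency order; dag maps task -> list of predecessors."""
    results, log = {}, []
    for name in TopologicalSorter(dag).static_order():
        inputs = [results[p] for p in dag.get(name, [])]
        for _attempt in range(max_retries + 1):
            try:
                results[name] = tasks[name](*inputs)
                log.append((name, "ok"))
                break
            except Exception:
                log.append((name, "failed"))   # management path sees the fault
        else:
            raise RuntimeError(f"task {name} exhausted its retries")
    return results, log


calls = {"scale": 0}


def flaky_scale(values):
    # Fails on the first call to simulate a worker fault, then succeeds.
    calls["scale"] += 1
    if calls["scale"] == 1:
        raise IOError("simulated worker fault")
    return [v * 10 for v in values]


dag = {"read": [], "scale": ["read"], "report": ["scale"]}
tasks = {"read": lambda: [1, 2, 3], "scale": flaky_scale, "report": sum}
results, log = run_workflow(dag, tasks)
```

The log records the failed attempt of `scale` followed by its successful retry, while the workflow result is unaffected by the fault.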
The DIME network computing model was originally suggested by Rao Mikkilineni, and two proofs of concept were developed: one by Giovanni Morana using the Ubuntu* Linux operating system, and another by Ian Seyler using a native Parallax operating system running on Intel multi-core servers [7 and 8]. The Linux* implementation demonstrates scaling and self-repair of Linux* processes without the use of a hypervisor. The Parallax OS is implemented in assembly language with a C/C++ API for higher level programming and demonstrates scaling and self-repair across Intel multi-core servers.
The objective of operating systems and programming languages is to reduce the semantic gap between business workflow definitions and their execution in a von Neumann computing device. The important consequences of the current upheaval in hardware, with multi-CPU and multi-core architectures, for a monolithic OS that shares data structures across cores are well articulated by Baumann et al. They also argue for making the OS structure hardware-neutral. "The irony is that hardware is now changing faster than software, and the effort required to evolve such operating systems to perform well on new hardware is becoming prohibitive." They argue that single computers increasingly resemble networked systems, and should be programmed as such.
As the number of cores increases to hundreds and thousands in the next decade, current generation operating systems will cease to scale, and a full-scale networking architecture will have to be brought inside the server. The DIMEs enable the execution of distributed and managed workflows within a server or across multiple servers with a unifying network computing model. Exploiting this, the Parallax operating system leverages the chip-level hardware assistance provided to virtualize, manage, secure, and optimize computing at the core. While this work is in its infancy, the new OS also seems to exploit fully the parallelism and multithread execution capabilities offered in these computing elements to implement the managed DAGs with a parallel signaling control network. The service infrastructure we sketch here has three main components:
1. Service component development platform, which allows defining each service gene (regulatory and domain specific components),
2. Service workflow orchestrator, which composes workflows and prepares the images for runtime, and
3. Service assurance platform, which allows run-time policy implementations and dynamic services management.
Figure 8 shows a simple service domain in which a set of distributed nodes controls the environment by monitoring sensors and controlling a fan.
There are two management workflows shown in the picture:
1. The FCAPS management of the DIME infrastructure, implemented at the operating system level, and
2. The FCAPS management of the domain specific workflow, implemented at the application level, which in this case monitors and manages the service workflow implemented by the MICE network.
The global and local policies for the service domain are implemented in each node. The DIME infrastructure management handles instantiation and run-time service assurance. The framework provides a scalable architecture with dynamic reconfiguration of the service workflow, made possible by the genetic transactions of replication, repair, recombination and reconfiguration provided in the DNA. Implementing the service framework either on a native OS such as Parallax or on a current OS such as Linux* or Windows* fully decouples services management from the management of the hardware infrastructure that hosts the services. The resulting architectural resilience at the core of our IT infrastructure, comparable to that of cellular organisms, brings telecom grade trust to global communication, collaboration, and commerce at the speed of light.
This and other papers implementing proofs of concept [5, 6, 7 and 8] describe a first step in evaluating a new parallel, distributed and scalable computing model that extends the current von Neumann SPC computing model. In fact, it is perhaps closer to the self-replicating model with which von Neumann sought to duplicate the fault tolerance, self-healing and other such attributes observed in living organisms. Discussing the work of Francois Jacob and Jacques Monod on genetic switches and gene signaling, Mitchell Waldrop points out that "DNA residing in a cell's nucleus was not just a blueprint for the cell - a catalog of how to make this protein or that protein. DNA was actually the foreman in charge of construction. In effect, DNA was a kind of molecular-scale computer that directed how the cell was to build itself and repair itself and interact with the outside world."
The DIME network architecture provides a way to create a blueprint for the business workflow and a mechanism to execute it based on a service management workflow that implements local and global policies driven by business priorities, workload fluctuations and latency constraints. This approach is quite distinct from current approaches [9, 12, 13, 14, 15, 16, 17 and 18] that use the von Neumann computing model for service management. It offers many new directions of research toward the next level of scaling, telecom grade trust through end-to-end service FCAPS optimization, and reduced complexity in developing, deploying and managing distributed federated software systems that execute business workflows.
The beauty of this model is that it does not impact the current implementation of the service workflow using von-Neumann SPC nodes. But by introducing parallel control and management of the service workflow, the DIME network architecture provides the required scaling, agility and resilience both at the node level and at the network level. The signaling based network level control of a service workflow that spans across multiple nodes allows the end-to-end connection level quality of service management independent of the hardware infrastructure management systems that do not provide any meaningful visibility or control to the end-to-end service transaction implementation at run time. The only requirement for the DIME infrastructure provider is to assure that the node OS provides the required services for the service controller to load the Service Regulator and the Service Execution Packages to create and execute the DIME.
The network management of DIME services allows hierarchical scaling using the network composition of sub-networks. Each DIME with its autonomy on local resources through FCAPS management and its network awareness through signaling can keep its own history to provide negotiated services to other DIMEs thus enabling a collaborative workflow execution.
We identify various major areas of future research that may prove most effective:
- Implementing DNA in current operating systems, as the DIMEs in Linux* approach illustrates, provides an immediate path to enhance efficiency of communication between multiple images deployed in a many-core server without any disruption to existing applications. In addition, auto-scaling, performance optimization, end-to-end transaction security and self-repair attributes allow various applications currently running under Linux* or Windows* to migrate easily to more efficient operating platforms.
- Implementing a new OS such as Parallax [7, 8] allows designing a new class of scalable, self-* distributed systems that transcend physical, geographical and enterprise boundaries, with true decoupling between services and the infrastructure that they reside on. The service creation and workflow orchestration platforms can be implemented on current generation development environments, whereas the run-time services deployment and management can be orchestrated in many-core servers with DNA.
- Signaling and FCAPS management implemented in hardware to design a new class of storage will allow the design of next generation IT hardware infrastructure with Self-* properties.
- As hundreds of cores in a single chip enable thousands of cores in a server, the networking infrastructure and associated management software, including routing, switching and firewall management, will migrate from outside the server to inside it. The DIME network architecture, with its connection FCAPS management using signaling control, will eliminate the need to replicate the current network management infrastructure inside the server as well. The routing and switching abstractions will be incorporated in the intra-DIME and inter-DIME communication and signaling infrastructure.
Eventually, it is possible to conceive of signaling being incorporated in the many-core chip itself to leverage the DNA in hardware.
The author wishes to acknowledge many valuable discussions with Kumar Malavalli, Albert Comparini and Vijay Sarathy from Kawa Objects Inc.*, which have contributed to the DIME Network Architecture. The author also wishes to express his gratitude to Giovanni Morana from Catania University*, Italy and Ian Seyler from Return Infinity*, Canada for very quickly implementing the DIME networks to create the proof-of-concept demonstrations of service virtualization and real-time dynamic self-* capabilities.
- [1] David Patterson, “The trouble with multi-core”, IEEE Spectrum, July 2010, p. 28
- [2] Backus, J. “Can programming be liberated from the von Neumann style? A functional style and its algebra of programs”, Communications of the ACM 21, 8, (August 1978), 613-641
- [3] Neumann, J. v., "Papers of John von Neumann on Computing and Computer Theory", in Charles Babbage Institute Reprint Series for the History of Computing, edited by William Aspray and Arthur Burks, MIT Press, Cambridge, MA, 1987, p. 409, p. 474.
- [4] Maxine Singer and Paul Berg, “Genes & genomes: a changing perspective”, University Science Books, Mill Valley, CA, 1991, p. 73
- [5] Mikkilineni, R. “Is the Network-centric Computing Paradigm for Multicore, the Next Big Thing?” Retrieved July 22, 2010, from Convergence of Distributed Clouds, Grids and Their Management: http://computingclouds.wordpress.com
- [6] Giovanni Morana and Rao Mikkilineni, “Scaling and Self-repair of Linux* Based Applications Using a Novel Distributed Computing Model Exploiting Parallelism”, IEEE proceedings, WETICE2011, Paris, 2011
- [7] Rao Mikkilineni and Ian Seyler, "Parallax – A New Operating System for Scalable, Distributed, and Parallel Computing", The 7th International Workshop on Systems Management Techniques, Processes, and Services, Anchorage, Alaska, May 2011
- [8] Rao Mikkilineni and Ian Seyler, “Parallax – A New Operating System Prototype Demonstrating Service Scaling and Self-Repair in Multi-core Servers”, IEEE proceedings, WETICE2011, Paris, 2011
- [9] Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schupbach, and Akhilesh Singhania, "The Multikernel: A new OS architecture for scalable multicore systems", In Proceedings of the 22nd ACM Symposium on OS Principles, Big Sky, MT, USA, October 2009
- [10] John von Neumann, Papers of John von Neumann on Computing and Computing Theory, Hixon Symposium, September 20, 1948, Pasadena, CA, The MIT Press, 1987, p. 454, p. 457
- [11] Mitchell Waldrop, M., “Complexity: The Emerging Science at the Edge of Order and Chaos”, Simon and Schuster Paperback, New York, 1992, p. 31
- [12] Wentzlaff, D. and Agarwal, A. (2009). Factored operating systems (fos): the case for a scalable operating system for multicores. SIGOPS Oper. Syst. Rev., 43(2):76–85.
- [13] Liu, R., Klues, K., Bird, S., Hofmeyr, S., Asanović, K. and Kubiatowicz, J. (2009). Tessellation: Space-Time Partitioning in a Manycore Client OS, In HotPar09, Berkeley, CA, 03/2009.
- [14] Colmenares, J. A., Bird, S., Cook, H., Pearce, P., Zhu, D., Shalf, J., Hofmeyr, S., Asanović, K. and Kubiatowicz, J. (2010). Tessellation: Space-Time Partitioning in a Manycore Client OS, in Proc. 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar'10). Berkeley, CA, USA. June.
- [15] Wentzlaff, D. and Agarwal, A. (2009). Factored operating systems (fos): the case for a scalable operating system for multicores. SIGOPS Oper. Syst. Rev., 43(2):76–85.
- [16] Nightingale, E. B., Hodson, O., McIlroy, R., Hawblitzel, C. and Hunt, G. Helios: Heterogeneous Multiprocessing with Satellite Kernels, ACM, SOSP’09, October 11–14, 2009, Big Sky, Montana, USA
- [17] Mao, O., Kaashoek, F., Morris, R., Pesterev, A., Stein, L., Wu, M., Dai, Y., Zhang, Y. and Zhang, Z. Corey: an operating system for many cores, (2008). Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation OSDI '08, San Diego, California, December.
- [18] Buyya, R., Yeo, C. S., Venugopal, S., Broberg, J. and Brandic, I. (2009). Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems Volume 25, Issue 6, 599-616.
R. Mikkilineni received his PhD from the University of California, San Diego in 1972, working under the guidance of Prof. Walter Kohn. He later worked as a research associate at the University of Paris, Orsay; the Courant Institute of Mathematical Sciences, New York; and Columbia University, New York.
He is currently the Founder and CTO of Kawa Objects Inc., California, a Silicon Valley startup developing next generation computing infrastructure. His past experience includes working at AT&T Bell Labs, Bellcore, U S West, several startups and more recently at Hitachi Data Systems.
Dr. Mikkilineni co-chairs the 1st track on Convergence of Distributed Clouds, Grids and their Management in IEEE International WETICE2011 Conference.