How to use Software Development Tools Targeting Intelligent Systems and Embedded Devices

Abstract

Embedded platforms and integrated intelligent systems come in many shapes and sizes. Many are based on heterogeneous System-on-Chip designs using a multitude of hardware architectures. System software stack customization and application development for such varied configurations require a flexible approach and development tools that can be adapted to the needs of cross-development for a variety of software stacks. Software development tools should also provide ways to optimize throughput and dedicate compute power to high-priority processes.

Whether the intelligent system you develop for is a low-power Intel® Atom™ processor based design running a small, custom real-time embedded Linux* or an Intel® Xeon™ processor based multi-socket design with processor cores reserved for different tasks, the basic requirements of embedded software development apply in either case. It is desirable to have deep insight into the hardware platform and device drivers, flexible cross-build environment support for sysroot based solutions, configurability to adjust to unique requirements, and integration options for commonly used cross-build environments.

This article covers the dos and don'ts of defining a custom development environment for your specific needs when targeting Intel® architecture, and the available solutions for achieving a well-designed development setup.

Introduction

Over the past several years the traditional embedded market segment has experienced a transformation from fixed-function, isolated embedded systems to a new category of intelligent systems. This transformation is changing the way people engage with and experience computing. Devices are more capable and interact with each other, and these usage models demand greater performance, an increasingly capable cloud of services, and software technology and tools that support these new classes of devices. These devices are secure, connected, and managed. Several industry analyst firms, such as IDC*, have recognized this shift and created a new embedded sub-category called “Intelligent Systems”. The increased use of system-on-chip (SoC) designs in traditional embedded market segments, as well as in this new category of intelligent systems, requires developers and tools vendors alike to reassess the entire development process, from design, code generation, and debug to system performance analysis.

When developing the system software stack for embedded devices, you are frequently confronted first with the question of how to set up the most appropriate development environment. It helps to understand the options for approaching software development targeting embedded systems. In this article we will focus on SoC designs using an application processor based on x86, or Intel, architecture. The fact that the target application processor architecture is similar to the architecture of the system most likely used by the developer for development work has some unique advantages and widens the available choices.

These choices range from traditional native development of applications on the development host for later deployment on a target device, to full-blown traditional embedded cross-development using a virtual machine with a hypervisor, a remote debug connection to a physical target device, and a sysroot or chroot based build environment. In this article we will attempt to shed some light on each of these options.

One of the most commonly used operating systems for Intel architecture based embedded systems is a customized flavor of Linux*. These range from builds based on mainstream Linux* distributions to custom builds from scratch. Linux distributions specialized in embedded and mobile computing applications, like Wind River* Linux*, the Yocto Project*, or Android*, are playing an increasingly prominent role. Frequently, embedded designs have a real-time requirement for at least parts of the software stack. Isolating the components of the software stack that have this requirement, and ensuring that they have only minimal dependencies on the non-real-time part of the software stack, therefore becomes one key challenge of developing and defining embedded systems software.

Furthermore, when talking about SoCs, understanding the interaction between all the platform components becomes vitally important. You may be developing device drivers that take advantage of special GPU features or use micro-engines and other hardware accelerators found on the platform. You may be designing the software interface for the many wireless radios found in today’s small form factor devices. In either scenario, being able to analyze the message timing, making sure the variable values exchanged between the different platform components contain the right value at the right time, becomes important for platform software stack stability. Being able to access device configuration registers easily and comprehensively also simplifies developing device drivers and controlling platform component interaction timings.

When first starting on an embedded software development project, these requirements may seem confusing, and it is easy to get distracted by their complexity instead of focusing on the simple basics of defining your setup. This is the starting point of this article.

 

1.    Intel® Architecture in Embedded Devices and Intelligent Systems

The x86 architecture has played an important role in embedded computing almost since its inception. Over the past 30+ years, many industry players such as AMD*, STMicroelectronics*, and General Dynamics*, as well as Intel, have offered highly integrated designs targeted at the embedded marketplace. Recently, two accelerating developments have been changing the needs of software developers and driving the requirement for a more flexible approach to software development environments. The traditional boundaries between embedded and non-embedded are breaking down: highly integrated computational power is needed throughout the modern communication, M2M, and cloud infrastructure, integrating the embedded domain ever more closely with the cloud and server infrastructure.

Figure 1: From Embedded Devices to Intelligent Systems

 

Intel® Atom™ processors are used in designs ranging from print imaging, industrial control, digital signage, point of sale, medical tablets, in-vehicle infotainment, digital surveillance, and IPTV to connected services gateways and home energy management. Every embedded use case that requires compatibility with the wide range of x86 based infrastructure while also being power sensitive is a good fit for this processor generation.

The AMD* Opteron* series and the Intel® Core™ and Intel® Xeon™ processor families are used in a wide variety of server, industrial, networking, communication, and media infrastructure applications.

The embedded devices and intelligent systems operating inside the “Internet of Things” are based on the same architecture that powers many development machines and personal computers. They therefore benefit from the same software ecosystem, which enables original equipment manufacturers (OEMs) of those devices to develop and upgrade rapidly and at lower cost.


Figure 2: Categories of Intelligent Systems

 

However, the ever-increasing integration level and the increased use of heterogeneous multi-core systems in modern designs add some unique challenges to the established ecosystem of embedded software development solutions for Intel architecture. Here we take a look at these challenges and how they can be addressed.

2.    Cross-Development for Intelligent Systems

The first decision to be made when defining the build environment is the choice of a cross-build development tool set to base development on. If you are using a proprietary real-time operating system (RTOS), the basic development toolchain will come from the OS vendor. Examples of this are Integrity* from Green Hills* Software, VxWorks* from Wind River*, Nucleus* from Mentor Graphics*, and QNX*. All of these come with their own defined set of development tools, which can only be augmented by utilities from other vendors.

If you are using a Linux* based target OS, whether with or without a real-time scheduler, the choices get considerably more varied. You have the option to build your own cross-build toolchain, create a full target OS image inside a buildroot/chroot wrapper, or use one of a variety of pre-defined cross-build environments that come with specific Linux* flavors targeted at the embedded market.

Let us start with the roll-your-own approach to highlight what to consider when choosing a cross-development centric GNU* toolchain.

When talking about development tools usage in embedded, one must distinguish three different machines:

  • the build machine, on which the tool set is built
  • the host machine, on which the tool set is executed
  • the target machine, for which the tool set generates code

Very rarely do you actually need three systems. In most scenarios the build machine and the host machine will be identical, which reduces the setup to a development host machine and a development target machine.

 

Figure 3: Basic Cross-Development Setup

 

A complete tool suite is made up of the following components:

  • Compiler: This could be one of a range of different compilers. For Linux* targets the most common compilers are the GCC* and G++ GNU* Project compilers. These could be augmented with, for instance, the Intel® C++ Compiler, or in some cases it may even be useful to replace them with the Intel® C++ Compiler for an entire application build. The main question will be whether this compiler is a native GCC build or a compiler build that is part of a tool suite specifically targeted at cross-development.
  • Build Process Backend Components: Looking beyond the compiler itself, there are the build backend tools like assembler, linker, and output format conversion utilities. In the Linux* world this is covered by the GNU Binutils*.
  • Libraries:
    • The main set of library files required is part of the C library, which implements the traditional POSIX API that can be used to develop user-space applications. It interfaces with the kernel through system calls and provides higher-level services. GLIBC* (http://www.gnu.org/software/libc/) is the C library from the GNU project. Embedded GLIBC (http://eglibc.org) is a variant of the GNU C Library (GLIBC) optimized for embedded systems. Its goals include a reduced footprint and support for cross-compiling and cross-testing, while maintaining source and binary compatibility with GLIBC. uClibc (http://uclibc.org) is an alternate C library with a much smaller footprint. This library can be an interesting alternative if flash space and/or memory footprint is an issue. The C library has a special relationship with the C compiler, so the choice of the C library must be made when the custom tool suite is generated. Once the toolchain has been built, it is usually no longer possible to switch to another library.
    • Additionally, libraries for optimized and simplified implementation of signal processing, media processing, and security and encryption routines may be of interest. Examples are the Vector Signal Image Processing Library VSIPL (http://www.vsipl.org/), the Intel® Integrated Performance Primitives (Intel® IPP), and the Intel® Math Kernel Library (Intel® MKL).
  • Performance and Power Analysis Tools: With the typical embedded requirement of getting the maximum performance for a dedicated task out of a given highly integrated chipset, combined with reduced power consumption or battery usage, software tuning and analysis tools become an ever more important part of your cross-development tool set. These tools come in very varied packages and with a wide range of use cases. They will be discussed in a bit more detail later.
  • Debuggers: The tool whose importance is most commonly underestimated, by those not already used to the extreme reliability requirements of many embedded use cases, is the debugger. Debuggers fall into three categories: memory and code correctness checking tools, high-level language application debug tools, and system software debuggers. The latter two are focused on real-time debug of software stack components. A fourth type of debug tool, which has found new life because of the complex interactions in heterogeneous multi-core SoC chipsets, is the instrumentation based event tracing utility.

 

Of a tool set like this, typically only the build tools would be candidates for generating yourself.

 

The most straightforward approach, which is especially easy in a scenario where the development host architecture and OS are closely related to the target architecture and OS, is buildroot (http://www.buildroot.net). The concept behind it is that you have a complete customized embedded Linux build environment and can create a custom OS image. If you include the build tools in this OS image, you automatically get the build environment with it. If you then use chroot (http://www.linux.org/article/view/chroot-access-another-linux-from-your-current-linux-) to create a virtualized boot environment for your OS image, you can run your new embedded Linux* build environment in a protective virtual wrapper on top of the standard Linux* host. Since, for Intel® architecture targets, the development host and target are closely related, this is possible without the overhead of an actual virtual machine.
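
As a rough illustration, assuming a buildroot-generated root filesystem archive that already contains native build tools, the chroot wrapper can be set up along these lines (the paths and archive name are hypothetical):

    # extract a buildroot-generated root filesystem (path and archive name are examples)
    $ sudo mkdir -p /srv/target-rootfs
    $ sudo tar -xpf rootfs.tar -C /srv/target-rootfs
    # make kernel and device interfaces visible inside the wrapper
    $ sudo mount --bind /proc /srv/target-rootfs/proc
    $ sudo mount --bind /dev /srv/target-rootfs/dev
    # enter the protective wrapper; builds now use the target image's own tools
    $ sudo chroot /srv/target-rootfs /bin/sh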

 

The alternative approach, which is valid for other architecture targets as well as Intel® architecture, is the generation of a custom cross-build binutils and GCC tool set using the --sysroot option to tell the compiler which version of the C libraries, headers, start objects, and other shared objects to use and where they can be found. All common GNU based cross-build environments use some variation of this approach.
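
A minimal sketch of such an invocation, assuming a cross toolchain with the hypothetical prefix i586-custom-linux- and a staged sysroot under /opt/sysroots/i586-custom-linux, looks like this:

    # compile and link against the staged sysroot rather than the build host
    $ i586-custom-linux-gcc --sysroot=/opt/sysroots/i586-custom-linux -O2 -o app app.c
    # list the shared libraries the freshly built binary depends on
    $ i586-custom-linux-readelf -d app | grep NEEDED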

 

Creating your own cross-build GNU tool set is often tied to creating your own custom Linux* platform. OpenEmbedded* (http://www.openembedded.org), with its BitBake* utility, is one approach to rolling a custom Linux*. Poky Linux* (http://pokylinux.org) offers an approach to creating your own cross-build GNU toolchain.

 

Many complete Embedded Linux* frameworks like OpenEmbedded* or Yocto Project* rely on these as a foundation.

 

Instead of creating your own framework, it usually makes more sense to rely on frameworks that already exist. In addition to OpenEmbedded and the Yocto Project, there is also a variety of offerings from CodeSourcery*, Wind River*, MontaVista*, and Timesys* that may be worth a closer look.

 

Now that we have defined the base embedded Linux* platform to use for our Intel® architecture targeted development, let us tackle the question of whether virtualization is desired for your setup. We will then take the Yocto Project* as an example, in whose context we discuss the individual tools components for your embedded development environment in more detail.

 

3.    Virtualization

In the embedded space the key motivation for virtualization is to separate a critical workload from the rest of the software stack and assign dedicated resources to it. As with the toolchain definition, there is a wide range of choices. Since we are talking about Linux, you could create this protective virtualization wrapper based on a combination of the open-source QEMU*, KVM*, and Virtual Machine Manager utilities.

You can also have a closer look at solutions from the specialists in this field, like Oracle* VirtualBox* (https://www.virtualbox.org), VMware* (http://www.vmware.com), or Wind River Hypervisor* (http://www.windriver.com/products/product-notes/PN_Hypervisor_0610.pdf).

Virtualization alone will not do the trick; to assign dedicated hardware resources, a virtual machine manager or hypervisor is needed.
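
As a rough sketch of such a setup, the commands below launch a target image under the open-source QEMU*/KVM* combination and then use the standard taskset utility to pin the virtual machine to dedicated host cores. The image name, memory size, and core numbers are placeholders, and a production setup would typically manage this through a virtual machine manager or dedicated hypervisor instead.

    # start the target image under KVM with two virtual CPUs (values are examples)
    $ qemu-system-x86_64 -enable-kvm -m 512 -smp 2 \
          -hda embedded-target.img -nographic &
    # pin the freshly started virtual machine to host cores 2 and 3
    $ taskset -cp 2,3 $!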

4.    Build & Software Design

After having made our choice of an embedded Linux* OS and a baseline development tool set, it is now time to configure the build environment. For our purposes we assume a Yocto Project* based build framework with the Yocto Project’s Poky Linux based Application Development Toolkit (ADT).

Figure 4: Yocto Project* and ADT Build Flow

 

Below is a sample of some of the environment settings required to set up the build environment. These are predefined in an environment file that comes with ADT.

export PATH=/home/users/poky/1.3/sysroots/x86_64-pokysdk-linux/usr/bin:/home/users/poky/1.3/sysroots/x86_64-pokysdk-linux/usr/bin/i586-poky-linux:$PATH
export PKG_CONFIG_SYSROOT_DIR=/home/users/poky/1.3/sysroots/i586-poky-linux
export PKG_CONFIG_PATH=/home/users/poky/1.3/sysroots/i586-poky-linux/usr/lib/pkgconfig
export CONFIG_SITE=/home/users/poky/1.3/site-config-i586-poky-linux
export CC="i586-poky-linux-gcc  -m32   -march=i586 --sysroot=/home/users/poky/1.3/sysroots/i586-poky-linux"
export CXX="i586-poky-linux-g++  -m32   -march=i586 --sysroot=/home/users/poky/1.3/sysroots/i586-poky-linux"
export CPP="i586-poky-linux-gcc -E  -m32   -march=i586 --sysroot=/home/users/poky/1.3/sysroots/i586-poky-linux"
export AS="i586-poky-linux-as  "
export LD="i586-poky-linux-ld   --sysroot=/home/users/poky/1.3/sysroots/i586-poky-linux"
export GDB=i586-poky-linux-gdb
export STRIP=i586-poky-linux-strip
export RANLIB=i586-poky-linux-ranlib
export OBJCOPY=i586-poky-linux-objcopy
export OBJDUMP=i586-poky-linux-objdump
export AR=i586-poky-linux-ar
export NM=i586-poky-linux-nm
export M4=m4

This shows the challenge of forcing a cross-compiler to pick up the correct libraries and binutils rather than reverting to the build host defaults.
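
In practice, the ADT ships these settings in an environment setup script that is sourced before building; a typical autotools-based cross build then looks roughly like the following (the script name follows the ADT naming convention, and the installation path from the sample above is reused for illustration):

    # pull in CC, CXX, PKG_CONFIG_SYSROOT_DIR, and friends for the Poky sysroot
    $ source /home/users/poky/1.3/environment-setup-i586-poky-linux
    $ ./configure --host=i586-poky-linux --build=x86_64-linux
    $ make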

This becomes even more obvious when using the Intel® C++ Compiler in conjunction with the Yocto Project* ADT. The reason for doing this may be spot-optimization of particularly performance-sensitive routines or even whole-application optimization. The Intel® C++ Compiler can be called with the -platform=yl13 option. The integration magic is hidden in a compiler environment file that looks as follows:

*platform:
  yocto
*yocto_sdk_toolchain:
  %$(YOCTO_TOOLCHAIN)
*sysroot:
  %$(YOCTO_SYSROOT)
*target_root:
  %(sysroot)
*gcc_install:
  %(sysroot)/usr/lib/gcc/i586-poky-linux/4.6.4
*intel_include:
  %(intel_root)/../compiler/include
*intel_lib:
  %(intel_root)/../compiler/lib/ia32
*exec_path:
  %(yocto_sdk_toolchain)/i586-poky-linux
*exec_prefix:
  i586-poky-linux-
*gxx_include:
  %(sysroot)/usr/include/c++
*link_lib_path:
  %(intel_lib)%(path_separator)%(gcc_install)%(path_separator)%(sysroot)/lib%(path_separator)%(sysroot)/usr/lib%(path_separator)%(sysroot)/usr/lib/i586-poky-linux/4.6.4
*link_start_files:
  %{static?%{p?%(sysroot)/usr/lib/gcrt1.o;%(sysroot)/usr/lib/crt1.o};%{!shared?%(sysroot)/usr/lib/crt1.o}} %(sysroot)/usr/lib/crti.o %{static?%(sysroot)/usr/lib/i586-poky-linux/4.6.4/crtbeginT.o;%{shared?%(sysroot)/usr/lib/i586-poky-linux/4.6.4/crtbeginS.o;%(sysroot)/usr/lib/i586-poky-linux/4.6.4/crtbegin.o}}
*link_end_files:
  %{!static?%{shared?%(sysroot)/usr/lib/i586-poky-linux/4.6.4/crtendS.o;%(sysroot)/usr/lib/i586-poky-linux/4.6.4/crtend.o};%(sysroot)/usr/lib/i586-poky-linux/4.6.4/crtend.o} %(sysroot)/usr/lib/crtn.o
*link_default_libs:
  %{!static?%{i-dynamic|shared?-Bdynamic;-Bstatic}} -lsvml -limf \
  %{!static?-Bdynamic} -lm \
  %{!static?%{i-dynamic|shared?-Bdynamic;-Bstatic}} -lipgo -ldecimal \
  %{!static?%{!no-intel-extensions?--as-needed -Bdynamic -lcilkrts -lpthread -lstdc++ --no-as-needed}} \
  %{i_cxxlink? \
    %{cxxlib-gcc? \
      %{!static?%{i-static|static-libcxa?-Bstatic;-Bdynamic}} -lcxaguard}} \
  %{openmp|parallel?%{!static?%{i-static?-Bstatic;-Bdynamic}} -liomp5} \
  %{openmp-profile?%{!static?%{i-static?-Bstatic;-Bdynamic}} -liompprof5} \
  %{openmp-stubs?%{!static?%{i-static?-Bstatic;-Bdynamic}} -liompstubs5} \
  %{pthread|parallel|openmp|openmp-profile?%{!static?-Bdynamic} -lpthread} \
  %{!static?%{i-dynamic|shared?-Bdynamic;-Bstatic}} %{pic-libirc?-lirc_pic;-lirc} \
  %{!static?-Bdynamic} -lc \
  %{cxxlib-gcc? \
    %{!cxxlib-nostd?%{!static?-Bdynamic} -lstdc++;%{!static?-Bdynamic} -lsupc++} \
    %{static|static-libgcc? \
      %{!static?-Bstatic} -lgcc -lgcc_eh; \
        %{!shared?%{!static?%{static-libgcc?-Bstatic;-Bdynamic}} -lgcc -lgcc_s}} \
    %{!static?-Bdynamic} -ldl -lc}

 

The Intel® C++ Compiler provides a set of pre-configured environment files of this type that take care of telling it where to find the correct libraries, start objects, end objects, header files, and so on for all the different possible linkage models. This environment file architecture makes it flexible enough to fit with almost any GNU cross-build environment.
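
Assuming the toolchain and sysroot locations referenced by the environment file above are exported first, a cross build with the Intel® C++ Compiler can then look roughly as follows (the paths are illustrative and simply follow the sample setup shown earlier):

    # point the environment file at the Yocto Project* ADT toolchain and sysroot
    $ export YOCTO_TOOLCHAIN=/home/users/poky/1.3/sysroots/x86_64-pokysdk-linux/usr/bin
    $ export YOCTO_SYSROOT=/home/users/poky/1.3/sysroots/i586-poky-linux
    # cross-compile against the Poky sysroot using the yl13 platform definition
    $ icc -platform=yl13 -O2 -o app app.c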

To simplify things further, there is an environment file editor that is part of the Intel® C++ Compiler Eclipse* integration.


Figure 5: Intel® C++ Compiler Cross-Build Environment File Editor

Another important aspect of setting up an embedded cross-build development environment is the decision of which signal processing library to use. VSIPL* and its derivatives have become the quasi-standard in the open-source world for this purpose. Intel® IPP and Intel® MKL do, however, also offer some interesting alternatives, especially with their low-level targeted optimization for the latest generations of Intel® architecture.

5.    Power Tuning

Power tuning is historically one of the most neglected areas of embedded software development. Mostly it was seen as the realm of platform hardware design. With the need to squeeze ever more battery life out of our devices, as well as to meet challenging thermal requirements, software-level power tuning has, however, come into its own.

As in our previous discussions, let us start with the available open-source solutions. A good resource is the Less Watts* initiative (http://www.lesswatts.org). It provides best known methods for reducing power consumption at the OS as well as the application level. In addition, you can download a range of utilities like PowerTop*, the Battery Life Toolkit*, and Power QoS*. Have a look around and find the approach that is right for you.
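
As a brief illustration, a typical PowerTop* session on the target looks roughly like this; note that the one-shot HTML report option is only available in newer PowerTop* releases:

    # interactive view of wakeup sources and processor sleep-state residency
    $ sudo powertop
    # newer releases can also write a one-shot report for offline review
    $ sudo powertop --html=powertop-report.html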

Granola* from MiserWare* provides an integrated power management and power footprint solution that is primarily targeted at server type applications, but can be scaled for embedded use cases as well.

Intel recently added power analysis capabilities to its VTune™ Amplifier product as well, combining the basic principles of tools like PowerTop with its background in mapping events (in this case frequency changes and power mode changes) to application and system software source locations in memory (Figure 6).

 

Figure 6: Wake-Up Event and Sleep Mode Analysis

 

6.    Performance Tuning

In addition to direct power tuning, tuning for performance, so that critical, frequently used applications finish their work quickly and allow the OS to revert to a lower power state sooner, also has a considerable impact on the battery life of small form factor devices. Application and software stack performance is, however, also a desirable goal in its own right, whether to create a smoother user experience or, in the case of dedicated signal processing, to reduce the stress on the high-performance embedded server processors doing all the work.

In the Linux* world, OProfile* is probably the most widely used utility for advanced performance tuning. It comes with command-line sampling capabilities as well as Eclipse* Integrated Development Environment (IDE) integration.
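
For illustration, a minimal system-wide sampling session with OProfile*'s legacy opcontrol interface might look like the following (newer OProfile* releases replace opcontrol with operf; the application name is a placeholder):

    # system-wide sampling without kernel symbol resolution
    $ sudo opcontrol --no-vmlinux
    $ sudo opcontrol --start
    # ... run the workload under test ...
    $ sudo opcontrol --dump
    $ opreport -l ./app
    $ sudo opcontrol --shutdown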

Figure 7: OProfile* and Intel® VTune™ Amplifier Performance Data Collection

The Intel® VTune™ Amplifier provides similar baseline functionality with its own graphical user interface (GUI). The key feature for performance analysis as well as power analysis in embedded is the ability to support remote data collection with a small-footprint remote sampling collector. In addition, being able to do system-wide sampling across the kernel-space/user-space barrier is important, as many embedded applications have strong dependencies on device drivers that interact with dedicated micro-engines or other I/O devices. The solution to this is to have a processor Performance Monitoring Unit (PMU) driver implemented as a kernel module. This gives broader access to more performance-relevant platform events like cache misses, branch mispredictions, memory writes, and TLB lookups. In addition, it allows looking at these from a whole software stack perspective and not just from the viewpoint of a single application, as you would with dynamic binary instrumentation based solutions. It also limits the memory impact of the performance sampling, an important consideration for many throughput-optimized embedded use cases with memory-intensive application workloads.

Figure 8 shows how a remote command-line sampling collector interacting with a driver stub on the target allows for remote performance and power sampling without having to transfer result files manually.

Figure 8: Command-Line Remote Data Collection as Implemented with the Intel® VTune™ Amplifier

7.    Reliability and Debug

Outside the embedded space, the most neglected component of a complete tool suite is often the debugger. Embedded devices, with their longer deployment in the field, more difficult maintenance, and higher stress on platform components compared to standard consumer PCs, do however demand increased reliability of the software stack. This means that no development tool suite for embedded devices and intelligent systems is complete without a well thought-out set of debug tools.

As mentioned in the introductory cross-development overview, debuggers fall into three categories: application debuggers, system debuggers, and event tracing tools.

Application Debug

For application debug in embedded cross-development scenarios, let us focus on GDB* as an example. It comes with a remote debug agent called gdbserver. This debug agent can be installed on the debug target to launch a debuggee and attach to it remotely from the development host.

To do so, start your program using gdbserver on the target machine. gdbserver then automatically suspends the execution of your program at its entry point, waiting for a debugger to connect to it. The following command starts an application and tells gdbserver to wait for a connection from the debugger on localhost port 2000.

                $ gdbserver localhost:2000 program
     Process program created; pid = 5685
     Listening on port 2000

Once gdbserver has started listening, we can tell the debugger to establish a connection to it and then start the same debugging session as if the program were being debugged on the same host, directly under the control of GDB.

                $ gdb program
     (gdb) target remote targethost:2000
     Remote debugging using targethost:2000
     0x00007f29936d0af0 in ?? () from /lib64/ld-linux-x86-64.so.2
     (gdb) b foo.adb:3
     Breakpoint 1 at 0x401f0c: file foo.adb, line 3.
     (gdb) continue
     Continuing.
     
     Breakpoint 1, foo () at foo.adb:4
     4       end foo;

It is also possible to use gdbserver to attach to an already running program, in which case the execution of that program is simply suspended until the connection between the debugger and gdbserver is established. The syntax would be

$ gdbserver localhost:2000 --attach 5685

 

to tell gdbserver to wait for GDB* to attempt a debug connection to the running process with process ID 5685.

 

Using GDB* for remotely debugging an application running inside a virtual machine follows the same principle as remote debug using the gdbserver debug agent.

 

The only additional step is to ensure TCP/IP communication forwarding from inside the virtual machine and to make the IP address of the virtual machine, along with the port used for debug communication, visible to the network as a whole.

 

In the case of QEMU*, this requires that the bridge-utils package is installed and that the /etc/qemu-ifup script on the host is modified to include the correct settings for bridged networking and IP forwarding. Wikibooks is a good resource for the details of this setup (http://en.wikibooks.org/wiki/QEMU/Networking).
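
As a lighter-weight alternative to full bridged networking, QEMU*'s user-mode networking can simply forward the debug port from the host into the guest. The sketch below assumes gdbserver is listening on port 2000 inside the guest and that the image and network device names match your setup:

    # forward host port 2000 to guest port 2000 via user-mode networking
    $ qemu-system-i386 -hda target.img \
          -netdev user,id=net0,hostfwd=tcp::2000-:2000 \
          -device e1000,netdev=net0
    # on the development host, GDB* then connects through the forwarded port
    (gdb) target remote localhost:2000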

 

System Software Debug

For the system integrator and device manufacturer it may very well be necessary to work at the device driver and system software stack layer to ensure that their software stack is as reliable and stable as feasible.

For true firmware, OS-level system, and device driver debug, using a JTAG interface is the most common method in the embedded intelligent systems world. The Joint Test Action Group (JTAG) IEEE 1149.1 standard defines a “Standard Test Access Port and Boundary-Scan Architecture for test access ports used for testing printed circuit boards.” This standard is commonly referred to simply as the JTAG debug interface. From its beginnings as a standard for circuit board testing, it has developed into the de facto interface standard for OS-independent, system-level platform debug.

More background information on JTAG and its usage in modern system software stack debugging is available in the article “JTAG 101; IEEE 1149.x and Software Debug” by Randy Johnson and Stewart Christie (http://download.intel.com/design/intarch/papers/321095.pdf).

From the OEM’s perspective, and that of their partner application and driver developers, understanding the interaction between the driver and the software stack components running on the different parts of the system-on-chip (SoC) integrated intelligent system or smartphone form factor device is critical for determining platform stability. From a silicon validator’s perspective, the low-level software stack provides the test environment that exposes the kind of stress factors the platform will be exposed to in real-world use cases. In short, modern SoCs require understanding the complete package and its complex real-world interactions, not just positive unit test results for individual hardware components. This is the level of insight a JTAG-based system software debug approach can provide. It is achieved by merging the in-depth hardware awareness JTAG inherently provides with the ability to export state information of the OS running on the target.

Especially for device driver debug, it is important to understand both the exact state of the peripheral device on the chipset and the interaction of the device driver with the OS layer and the rest of the software stack.

Various JTAG vendors offer system debug solutions for embedded Intel® architecture including:

  • American Arium* (http://www.arium.com)
  • Wind River* (http://www.windriver.com/products/JTAG-debugging/)
  • Macraigor Systems* (http://www.macraigor.com)
  • Green Hills Software* (http://www.ghs.com/products/probe.html)
  • Lauterbach* (http://www.lauterbach.com)
  • Intel (/en-us/embedded-development-tools)

A system debugger, whether debug agent based or using a JTAG device interface, is a very useful tool to help satisfy several of the key objectives of OS development. The debugger can be used to validate the boot process and to analyze and correct stability issues like runtime errors, segmentation faults, or services not being started correctly during boot. It can also be used to identify and correct OS configuration issues by providing detailed access and representations of page tables, descriptor tables, and also instruction trace. The combination of instruction trace and memory table access can be a very powerful tool to identify the root causes for stack overflow, memory leak, or even data abort scenarios.

If you connect your JTAG debugger to the target platform early during the boot process, for example to find issues in OS bring-up, it is first and foremost important to configure your debugger to use only hardware breakpoints. Attempting to write software breakpoints into memory when target memory is not yet fully initialized can corrupt the entire boot process.

Analyzing the code after the compressed Linux* zImage kernel image has been unpacked into memory is possible by simply releasing run control in the debugger until start_kernel is reached. This implies, of course, that the vmlinux file containing the kernel symbol information has been loaded. At this point the use of software breakpoints can be re-enabled. The operating system has booted successfully once the idle loop mwait_idle is reached.
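
The same breakpoint strategy can be rehearsed without JTAG hardware by attaching GDB* to QEMU*'s built-in GDB stub. This is only an illustration of the flow described above, with placeholder image names, and does not replace a JTAG probe on physical silicon:

    # boot the kernel halted, with the GDB stub listening on TCP port 1234
    $ qemu-system-i386 -kernel bzImage -append "console=ttyS0" -nographic -s -S
    # on the host: load kernel symbols and use a hardware breakpoint early on
    $ gdb vmlinux
    (gdb) target remote localhost:1234
    (gdb) hbreak start_kernel
    (gdb) continue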

A good JTAG debugger solution for OS level debug should furthermore provide visibility of kernel threads and active kernel modules along with other information exported by the kernel. To allow for debugging dynamically loaded services and device drivers, a kernel patch or a kernel module that exports the memory location of a driver’s initialization method and destruction method may be used.

Especially for system configuration and device driver debugging, it is also important to be able to directly access and check the contents of device configuration registers. These registers and their contents can simply be listed with their hex values or visualized as bitfields, as shown in Figure 9. A bitwise visualization makes it easier to catch and understand changes to a device state during debug while the associated device driver is interacting with it.


Figure 9: Device Register Bitfield View

Additionally, if your debug solution provides access to Last Branch Record (LBR) based instruction trace, this capability can, in conjunction with all the regular run control features of a JTAG debugger, be used to force an execution stop at an exception and analyze the execution flow in reverse, identifying the root cause of runtime issues.

Last Branch Records can be used to trace code execution from target reset. Since discontinuities in code execution are stored in these MSRs, debuggers can reconstruct the executed code by reading the ‘To’ and ‘From’ addresses, accessing the memory between those locations, and disassembling the code. The disassembly is usually displayed in a trace GUI in the debugger interface. This can be useful for seeing what code was executed before a System Management Interrupt (SMI) or other exception, if a breakpoint is set on the interrupt.

Event Trace Debug

Dozens of software and hardware components interacting on an SoC increase the amount of time it takes to root-cause issues during debug. Interactions between the different software components are often timing sensitive. When trying to debug a code base with many interactions between components, single-stepping through one specific component is usually not a viable option. Traditional printf debugging is also not effective in this context, because the debugging changes can adversely affect timing behavior and cause even worse problems (also known as “Heisenbugs”).

There is a variety of static software instrumentation based data event tracing technologies that help address this issue. The common principle is that they use a small amount of DRAM buffer memory to capture event data as it is being created and then use some kind of logging mechanism to write the trace result into a log file. Data trace monitoring can be done in real time by interfacing directly with the trace logging API, or offline by using a variety of trace viewers for analyzing more complex software stack component interactions.

 

LTTng*, Ftrace*, and SVEN* are three of the most common such implementations.
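
As an example, a minimal kernel event tracing session with LTTng* from the command line might look like the following; the event names are common scheduler and interrupt tracepoints and the session name is a placeholder:

    $ lttng create trace-demo
    $ lttng enable-event --kernel sched_switch,irq_handler_entry
    $ lttng start
    # ... exercise the component interaction under investigation ...
    $ lttng stop
    $ lttng view | less
    $ lttng destroy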

8. Summary

Developing the Linux* system software stack for complex SoCs poses some unique challenges. These challenges can, however, be easily managed with the right set of software development tools and a well thought-out development setup. Embedded system software development targeting Intel® architecture based designs is as straightforward as, and frequently even provides more options than, development for other architectures. The usual development methodologies used for other embedded architectures also apply to Intel® architecture. A rich and established development tools ecosystem for the embedded space exists, especially for embedded Linux* targets. Open-source projects like the Yocto Project* and OpenEmbedded* provide a rich framework and set of utilities. This is augmented by commercial offerings from Mentor Graphics*, Wind River*, Green Hills* Software, and others. The architecture has been used and has proven itself in the embedded space over many decades. One advantage of having the same basic architecture on development host and target is that it reduces the need for cross-development.

Even when not strictly required, however, it is frequently still recommendable to employ cross-development to ensure a clean build and validation environment. Linux* provides a wide array of customizable build solutions targeting Intel® architecture for exactly that purpose.

The rich set of software development tools from the open source community as well as tools vendors and Intel itself makes development easier, especially for those moving from personal computer centric native development to developing for intelligent systems for the first time.

In this article we provided a broad overview of the challenges of developing for embedded Linux* and the types of development tools solutions that are available. Furthermore, we highlighted the importance of taking advantage of the analysis and debug capabilities available when developing for Intel® architecture.

The key to being successful when developing for Intel® architecture is to be aware of the rich ecosystem and to first define the build environment that meets your needs. Keep the build environment simple, but also ensure that target environment dependencies are not broken. Relying on printf debugging can cost valuable time when serious issues arise. Taking advantage of advanced cross-debuggers and performance analyzers will increase software stack stability and performance. Take a look at embedded Linux* frameworks like the Yocto Project* targeting Intel® architecture to get a good start on defining your custom Linux* software environment.

Intel is actively working on ensuring the availability of comprehensive all-in-one embedded and intelligent systems software development solutions.    

This is only a starting point, indicating the things to request and look for when setting up a project. Please check out the additional resources and references for more in-depth guidance.

Additional Resources

Software Development for Embedded Multi-core Systems, Max Domeika, Boston: Newnes, 2008, ISBN 978-0-7506-8539-9

Embedded Intel® Architecture Chipsets Web Site, Santa Clara, CA: Intel Corporation: http://www.intel.com/products/embedded/chipsets.htm

Intel® Atom™ Performance for DSP Applications, Tim Freeman and Dave Murray: Birkenhead, UK, N.A. Software Ltd. 2009: http://www.nasoftware.co.uk/home/attachments/019_Atom_benchmarks.pdf

Intel® Embedded Design Center http://edc.intel.com

JTAG 101; IEEE 1149.x and Software Debug, Randy Johnson and Stewart Christie, Santa Clara, CA: Intel Corporation 2009: http://download.intel.com/design/intarch/papers/321095.pdf

Intel® System Studio Forum

https://software.intel.com/en-us/forums/intel-system-studio

From the Debug Research Labs – Debugging Android  by Hagen Patzke: http://www.lauterbach.com/publications/debugging_android.pdf

Real-Time Debugging with SVEN and OMAR by Pat Brouillette and Jason Roberts http://www.eetimes.com/electrical-engineers/education-training/tech-papers/4373526/Real-Time-Debugging-with-SVEN-and-OMAR 

References

Break Away with Intel® Atom™ Processors: A Guide to Architecture Migration, Lori Matassa & Max Domeika, Intel Press 2010, ISBN 978-1-934053-37-9 

Intel® System Studio: http://www.intel.com/software/products/atomtools and /en-us/intel-system-studio

Yocto Project*: http://www.yoctoproject.org

IEEE Standard for Reduced-Pin and Enhanced-Functionality Test Access Port and Boundary-Scan Architecture : http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5412866
