Intel for Android* Developers Learning Series #2: The Intel Mobile Processors

 

1.    Inside Medfield

The Intel® Atom™ Processor Z2610, formerly known as Medfield and shown in Figure 1, is a platform targeted at smartphones designed with the Android operating system. Medfield is divided into two complexes: the North Complex and the South Complex. The North Complex consists of Saltwell, a single-core processor; a 32-bit dual-channel LPDDR2 memory controller; a 3D graphics core; video decode and encode engines; a 2D display controller that can drive up to three displays; and an image processor for camera input. The South Complex of the Intel Atom Processor Z2610 contains the I/O interfaces needed to complete a smartphone design, such as a security engine, a storage controller supporting SD cards and eMMC storage, a USB OTG controller, a 3G modem, Complementary Wireless Solution (CWS) interfaces, SPI, and UART.

 

Figure 1: Medfield Block Diagram

 

1.1.    Saltwell, the General Picture

The Saltwell architecture is fairly simple. The design goal is a processor that balances performance with power efficiency. Unlike most other processors on the market, it uses an in-order architecture. The processor has a 64-KB L1 cache and a 512-KB L2 cache. It supports Intel® Burst Performance Technology, which lets the processor dynamically raise its clock speed. Saltwell has three frequency modes: Low Frequency Mode (LFM) at 600 MHz, High Frequency Mode (HFM) at 900 MHz, and Burst Frequency Mode (BFM) at 1.6 GHz. Among its power optimization features, Saltwell has an ultra-low-power smart L2 cache that retains data while the CPU is in the C6 state, which lowers the latency of resuming from that state. In addition, Saltwell has separate power planes and clock inputs for the core and for the rest of the SoC, which makes power and clock gating easy to configure through Intel® Smart Idle Technology (Intel SIT). This technology allows the CPU to be switched off completely while the SoC remains in the ON state (S0).

2.    Architecture Differences between Saltwell and ARM (Cortex-A15)

As described in the book Break Away with Intel® Atom™ Processors: A guide to Architecture Migration, the Intel Atom architecture differs from the ARM architecture in many ways. Table 1 lists the high-level differences between Saltwell and the ARM Cortex-A15.

 

| Feature | Saltwell | ARM Cortex-A15 |
|---|---|---|
| Process technology | 32 nm | 28 nm |
| Architecture | In-order | Out-of-order |
| Integer pipeline stages | 16 | 15 |
| L1 cache | 64 KB | Configurable, up to 64 KB |
| L2 cache | 512 KB | Up to 4 MB |
| Instruction set | IA32, Intel® Streaming SIMD Extensions, Intel® Supplemental Streaming SIMD Extensions 3 | ARM, Thumb |
| Multi-core/thread support | Single core with Intel® Hyper-Threading Technology | Multi-core |
| Security technology | Intel® Smart & Secure Technology (Intel® S&ST) | TrustZone* Technology |

Table 1: High-Level Differences between Saltwell and ARM (Cortex-A15)

2.1.  Architecture

Like other processors in the Intel Atom series, Saltwell uses an in-order execution design. An in-order processor executes instructions in the order in which they are fetched, so an instruction that is waiting for its operands also holds up every instruction behind it. An out-of-order processor can execute later, independent instructions while earlier ones wait, and puts the results back into program order later in the pipeline. The ARM Cortex-A15 uses an out-of-order architecture, which hides much of this latency but makes the core design more complex. Eliminating the reordering logic is one of the power reduction initiatives of the Intel Atom processor.

2.2.  Integer Pipelines

There are six phases in the Intel Atom pipeline; Table 2 lists them with the number of stages in each.

| Phase | Pipeline Stages |
|---|---|
| Instruction fetch | 3 |
| Instruction decode | 3 |
| Instruction issue | 3 |
| Data access | 3 |
| Execute | 1 |
| Write back | 3 |

Table 2: Intel® Atom™ Instruction Phases and Pipeline Stages

These phases add up to a 16-stage integer pipeline (3 + 3 + 3 + 3 + 1 + 3) in the Intel Atom processor, and floating-point instructions require three additional stages. The ARM Cortex-A15 has a 15-stage integer pipeline. The long pipeline in the ARM processor trades energy efficiency for performance. Saltwell can decode up to two instructions per clock cycle, whereas the Cortex-A15 is a triple-issue superscalar architecture.

2.3.  Instruction Sets

In ARM state, ARM instructions are always 32 bits long and aligned on a four-byte boundary, whereas IA32 instructions vary in length and require no alignment. Another difference between ARM and IA32 instructions is how they are executed. Almost every ARM-state instruction can be executed conditionally, which reduces branch overhead and the cost of branch mispredictions. Each instruction carries a condition code that is checked against the condition flags; if the condition is not met, the instruction behaves as a NOP and its result is discarded. Intel architecture also has conditional instructions, the conditional move (CMOVcc) instructions, but most other IA32 instructions are not conditionally executed.

2.4.  Multi-Core/Thread Support

As mentioned previously, Saltwell supports Intel® Hyper-Threading Technology (Intel HT Technology), in which two threads complete their tasks using the shared resources of a single core. The technology is discussed further in the next section. In the ARM multi-core architecture, each core has its own dedicated resources to perform its tasks. Coherency between the cores is handled through an AMBA 4 AXI*-compatible slave interface that connects directly to the cores.

2.5.  Security Technology

Medfield includes a security subsystem, Intel® Smart & Secure Technology (Intel S&ST), which is a complete hardware and software security architecture. This subsystem is compliant with industry standards, supporting AES, DES, 3DES, RSA, ECC, SHA-1/2, and DRM. It also provides 1000 bits of one-time programmable (OTP) memory and enables Secure Boot. The ARM processor implements security differently: there is no separate security controller like the one Intel implemented. Instead, the ARM processor uses TrustZone Technology, which divides system resources such as the processor and memory into two worlds: the Normal World and the Secure World. This architecture has three motivations: (1) to provide a security framework that lets designers include only the functions their use cases need, (2) to save silicon area and power by removing the need for a dedicated processor for secure tasks, and (3) to provide a single debug component that can debug non-security-sensitive tasks in the Normal World without intruding on security-sensitive tasks in the Secure World.

3.    Intel® Hyper-Threading Technology

Intel® Hyper-Threading Technology (Intel HT Technology) makes a single physical processor package appear to software as multiple logical processors. Saltwell uses Intel HT Technology to boost its performance. Running a second thread on a single in-order core lets Saltwell execute instructions from both threads within a clock cycle, sharing the execution resources between them; as shown in Figure 2, this gives a 50-percent performance improvement compared to a single-threaded processor.

Figure 2: Benefits of Intel® Hyper-Threading Technology

With Intel HT Technology, the processor duplicates the architecture state, which consists of the general-purpose registers, the control registers, the advanced programmable interrupt controller (APIC) registers, and some machine state registers.[1] Duplicating the architecture state is what lets software see a single-core processor as two logical processors. Caches, execution units, branch predictors, control logic, and buses are shared between the two threads. This sharing raises a concern about resource contention and workload imbalance between the threads. However, most current runtimes, such as Dalvik and JavaScript, already support multithreaded environments, giving developers an easy way to build applications that take advantage of Intel HT Technology. Android application developers can also use the Intel® VTune Performance Tool to analyze the workload and tune resource usage in their applications.


[1] Developers can go to http://intel.com/software/android to find more information on porting apps to the Intel Atom platform.

[2] NDK documentation outlines compiler flags in the “ndk/docs/CPU-ARCH-ABIS.html” page.
