Beginners in the world of embedded systems often rush into coding and blinking LEDs without understanding what they are coding for. This primer explains some terms that get thrown around commonly but are perhaps not well understood by everyone. We do this with the help of a popular architecture from ARM called ARMv7.
But why ARM? Read on!
ARM has been around for 30 years now! It is fair to say that ARM has been at the forefront of the rapid technology boom of the last two decades. From cars to nuclear power plants, from home appliances to construction equipment, from voting machines that elect governments to the complex defence systems that protect them – ARM-based processors have been at the heart of some of the most defining products of recent decades.
ARMv7 – The Architecture
A CPU architecture is a specification document that describes the various features that a CPU can have. It does not describe a particular CPU; rather, it is a blueprint for implementing one. A CPU core is an implementation of the CPU architecture with certain configuration choices in place, for example the depth of the pipeline, the number of cache levels, the number of cores and the instruction set support. Changing these choices results in a new CPU altogether.
ARM has constantly improved upon their first ever architecture. One of the most significant architectures that ARM has released was the ARMv7.
ARM probably did not know it then, but the ARMv7 would go on to play a huge role in catapulting ARM to the top of the list of CPU providers. SOC vendors like NXP, Freescale and Atmel rushed to license the CPU cores that ARM designed using this architecture, such as the Cortex-A7, Cortex-A8, Cortex-M4 and Cortex-M7. Interestingly, giants like Qualcomm, Apple and Samsung have adopted an alternative strategy. Instead of building SOCs based on ARM's CPU designs, they have licensed the architecture itself and built their own in-house CPU designs on it.
Needless to say, the ARMv7 has left behind an immense legacy and it is not even done yet!
The ARMv7 architecture came with three profiles – A, R and M. This was a result of recognising that a single architecture could not address the incredibly diverse needs of the electronics and computing industry. The processing requirements of a smartphone or tablet are very different from those of a smart electricity meter, for instance.
The ARMv7-A is the Application processor profile. It is geared towards designing processors that offer high performance and powerful capabilities. Most of the time, processors based on the ARMv7-A profile run a rich OS such as Android or Linux. Some examples are the Cortex-A8 and Cortex-A9 cores.
These processors can support sophisticated features – some of these are discussed below. Some features are optional and may not be present inside all cores that make use of this profile.
- Virtual Memory Addressing – Operating systems like Android and Linux rely very heavily on virtual addressing for interoperability and ease of memory management when running multiple processes that often share memory spaces. An MMU (Memory Management Unit) makes it possible to perform extremely quick virtual to physical address mapping in order to prevent any degradation of performance.
- Multicore capability – Sophisticated applications such as smartphones, servers and industrial gateways require extremely high performance. Running at a higher CPU clock speed is one way to increase performance, but there is only so much to be gained by raising the clock. With multiple cores, it is possible to distribute the computing load and boost performance. Of course, this comes at the price of additional cost and complexity!
- TrustZone security – TrustZone is a standard extension to all Cortex-A processors. It creates two virtual processors on a single physical processor, a secure world and a normal world, with carefully controlled partitioning between the two. This allows highly secure systems to be implemented for applications such as DRM, e-payment and the storage of user identities and credentials.
- NEON (Advanced SIMD Extension) – NEON is an optional extension to ARMv7-A which provides an instruction set and register bank for high-performance SIMD multimedia programming. It provides acceleration for key algorithms in data compression, transcoding, image processing etc.
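To make the virtual-to-physical mapping from the first bullet a little more concrete, here is a minimal sketch in C. It is a toy model and not the real ARMv7-A page table format: it assumes 1 MB pages and a single flat table, and the names `toy_pte` and `toy_translate` are invented for this illustration. A real MMU does this lookup in hardware and caches recent results so that it stays fast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (assumption): 1 MB pages and one flat table indexed by the
 * top 12 bits of the virtual address. Not a real MMU descriptor format. */
#define PAGE_SHIFT 20u
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)
#define NUM_PAGES  4096u /* 4 GB address space / 1 MB per page */

typedef struct {
    bool     valid;     /* is this virtual page mapped? */
    uint32_t phys_base; /* 1 MB-aligned physical base address */
} toy_pte;

/* Translate va through the table. Returns false for an unmapped page;
 * real hardware would raise a fault at this point instead. */
static bool toy_translate(const toy_pte *table, uint32_t va, uint32_t *pa)
{
    assert(table != NULL && pa != NULL);

    const toy_pte *entry = &table[va >> PAGE_SHIFT]; /* index: 0..4095 */
    if (!entry->valid) {
        return false;
    }
    /* Keep the offset within the page, swap in the physical base. */
    *pa = entry->phys_base | (va & PAGE_MASK);
    return true;
}
```

With a table of `NUM_PAGES` entries in which entry `0x001` is valid and points at `0x80000000`, translating the virtual address `0x00101234` yields the physical address `0x80001234`. Two processes can use the same virtual address and still land in different physical memory simply by using different tables.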
The ARMv7-R is the Real-time profile. It is geared towards designing processors for high-performance real-time applications. Some examples are hard disk controllers, anti-lock braking (ABS) and engine control systems in cars, escalator systems and so on. As you can see, these are highly specialized applications with tight real-time requirements. Processors using this architecture profile are not very common and are often heavily customized for a specific application. Common examples are the Cortex-R4 and Cortex-R5.
Some key features of this architecture profile are as follows.
- Hardware Divide – Many older ARM processors have no hardware divide instruction. Division is then implemented in software, which is slow and can take a data-dependent number of cycles. Processors based on the ARMv7-R profile support a hardware divide instruction that makes division deterministic and, of course, fast.
- Tightly Coupled Memory (TCM) – To improve real-time behaviour, part of the SRAM can be tightly coupled to the CPU, avoiding the latency of fetching instructions from memory attached to the AHB bus. For example, if a life-critical routine absolutely must execute as soon as an interrupt occurs (say, a wheel-lock condition during hard braking), the exception handler can run from the TCM instead of from flash. The TCM supplies instructions at the CPU clock speed, saving a precious few cycles, which in the end may save lives.
- Safety and fault-tolerance features – The L1 memory system and buses incorporate ECC and parity error detection/correction, which is required for many safety-critical applications. The Cortex-R5 and Cortex-R7 can also be implemented in a Dual-Core Lock Step (DCLS) configuration, providing hardware redundancy. This is of utmost significance for automotive, aerospace and medical applications.
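To see why software division is costly, here is a sketch of the textbook shift-and-subtract method in C. This is a generic illustration, not the routine any particular ARM toolchain ships, and the name `soft_udiv32` is made up for this example. Library versions are usually more optimised, often with shortcuts that make the run time depend on the operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-subtract (restoring) unsigned division. Illustrative only:
 * soft_udiv32 is a hypothetical name, not a toolchain runtime function.
 * The loop runs once per result bit, i.e. 32 times for 32-bit operands. */
static uint32_t soft_udiv32(uint32_t dividend, uint32_t divisor,
                            uint32_t *remainder)
{
    assert(divisor != 0u); /* division by zero is not handled here */

    uint32_t quotient = 0u;
    uint64_t rem = 0u; /* 64-bit so the left shift below cannot overflow */

    for (int bit = 31; bit >= 0; --bit) {
        /* Bring down the next dividend bit, most significant first. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= (uint32_t)1u << bit;
        }
    }
    if (remainder != NULL) {
        *remainder = (uint32_t)rem;
    }
    return quotient;
}
```

Calling `soft_udiv32(100, 7, &r)` returns 14 and leaves 2 in `r`, after 32 trips around the loop with a shift, a compare and a possible subtract in each. On a core with hardware divide, the compiler can emit a single divide instruction (such as `UDIV`) for the same `a / b` expression.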
The ARMv7-M is the Microcontroller profile. Microcontrollers are by far the most common embedded computers and are present inside all kinds of applications around us. It is extremely rare to find a human being who has not been impacted at some point in their life by a microcontroller-based system. The Cortex-M series of microcontroller CPU designs from ARM holds a dominant position in the market. These CPUs are relatively simple to implement, and the profile allows very versatile products to be built.
For example, the Cortex-M4 core is an implementation of the ARMv7-M that offers a very good mix of energy efficiency, performance and ease of implementation. The Cortex-M7 core sits at the high-performance end of the spectrum while still optimising power consumption wherever possible.
Some key features of a typical processor that makes use of the ARMv7-M profile are as below.
- Energy efficiency – All the Cortex-M cores support a range of power-efficient architectural sleep and standby states. When coupled with multiple power domains, this allows very energy-efficient devices to be designed.
- High-density instruction set – Cortex-M processors only support the variable-length Thumb-2 instruction set. This allows for very dense and efficient code, while retaining full 32-bit processing capability.
- Simple programmer’s model – In the smallest configuration, Cortex-M processors support only two operating modes (with no concept of privilege), a single stack and a single, simple register set. The higher-end processors optionally support privileged operation and separate process and exception stacks. This covers everything from the simplest bare-metal application to more demanding designs that need a real-time operating system.
- Simple debug solutions – The debug architecture of Cortex-M processors is highly configurable. The usual JTAG port is optional and can be replaced with Serial Wire Debug in applications where pin count is important. The number of breakpoints and watchpoints can be configured at design time, as can the trace
In a nutshell…
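The relationship between the architecture (ARMv7), the CPU cores and the SOCs is simple: the architecture is the specification, a CPU core is one implementation of that specification, and an SOC is a chip that a vendor builds around one or more such cores.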
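The same relationship can be written down as a small lookup table in C. The cores and profiles listed are the ones named in this post; the type and function names (`armv7_profile`, `core_entry`, `profile_of`) are invented for this sketch and do not come from any ARM header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three ARMv7 profiles, plus a marker for cores not in the table. */
typedef enum { PROFILE_A, PROFILE_R, PROFILE_M, PROFILE_UNKNOWN } armv7_profile;

typedef struct {
    const char   *core;    /* CPU core: an implementation */
    armv7_profile profile; /* architecture profile it implements */
} core_entry;

/* Only the ARMv7 cores mentioned in this post; not an exhaustive list. */
static const core_entry armv7_cores[] = {
    { "Cortex-A7", PROFILE_A }, { "Cortex-A8", PROFILE_A },
    { "Cortex-A9", PROFILE_A },
    { "Cortex-R4", PROFILE_R }, { "Cortex-R5", PROFILE_R },
    { "Cortex-R7", PROFILE_R },
    { "Cortex-M4", PROFILE_M }, { "Cortex-M7", PROFILE_M },
};

/* Look up which profile a core implements (hypothetical helper). */
static armv7_profile profile_of(const char *core)
{
    assert(core != NULL);

    size_t count = sizeof armv7_cores / sizeof armv7_cores[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(armv7_cores[i].core, core) == 0) {
            return armv7_cores[i].profile;
        }
    }
    return PROFILE_UNKNOWN;
}
```

For instance, `profile_of("Cortex-M4")` gives `PROFILE_M`. An SOC would be one level further out: a vendor's chip that contains one or more of these cores along with memory and peripherals.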
In the next post, we will try to understand what an instruction set is and look at ARM’s instruction set at a high level.