Most embedded systems in the industrial, medical and aerospace industries need time-critical performance, meaning they must guarantee bounded-time execution of tasks and repeatable, consistent behavior across their entire operational life cycle. For example, closed-loop feedback and feedforward control systems are extremely time sensitive and cannot afford high variability in performance metrics such as rise time, overshoot, settling time and steady-state error. Yet general-purpose MPUs/MCUs designed for consumer applications rarely meet the required performance. Physical processes cannot wait for a delayed scheduler response or badly designed software. Much of the underperformance is inherent to the way these computing systems are built. Let us look into what this problem entails, with a glimpse into a futuristic project of Azle.
Even for the same processor under similar conditions, many factors impact the execution time of the same set of instructions! The effect is much more pronounced when branching instructions are involved. Even without factoring in a user program or an OS/RTOS scheduler, there are numerous CPU-level nuances that lead to variations and unpredictability in execution time. Electrical jitter is readily understood in a bus system, where it can cause communication errors. But jitter is inherent in all electronic circuits, including the digital circuits that are the building blocks of any microprocessor.
While the famous quote by Russell Ackoff pertains to organizations, the statement probably holds for any kind of system with multiple parts. Though the problems mentioned above are pronounced even in a single processing unit, redundant and complex systems use multiple processing units, and the timing performance of the whole system matters. A typical luxury car can have 80-120 microcontrollers, each running its own software; a reliable commercial drone may run far fewer microcontrollers but have tighter timing requirements. Complex coordination is the name of the game in complex systems, and analyzing failure modes with fault tree / fault cascade analysis is crucial. So we draw bounding boxes around functional subsystems and isolate their functionality from the rest of the system, so that each works as an independent workhorse unit without being at the mercy of a central system competing for resources. For example, a FADEC (Full Authority Digital Engine Control) system in an aircraft is an independent, bounded subsystem that takes care of the engines and their fueling, without imposing any computational or performance penalties on the main computer or autopilot. Yet even bounded subsystems sometimes have to time-coordinate with other subsystems. The physics model of an aircraft is a singular unit, irrespective of the number of subsystems we may design for convenience. Similarly, the physical model of a high-performance control system is still a singular unit, irrespective of the number of subsystems it may have. So a timing error in one subsystem creeps up and affects the entire system.
Electrical jitter: Electrical jitter is most common in communication buses, particularly high-speed ones carrying a few gigabits of data per second. The inductance of the conductor and the capacitance it forms with the surrounding shield offer inertia against rapidly rising and falling signal levels. There are also problems with signal reflections, crosstalk and so on. Much of the communication error originating from bus jitter is compensated by higher-layer protocols, but it degrades the overall bit error rate and makes timing performance unreliable. These problems are observed not only at gigabit rates but even at a few million bits per second if the signal voltage levels are higher. For example, ARINC 429 uses a 22 V swing, from -11 V for the low level to +11 V for the high level.
Scheduling based non-determinism
Processor architecture based non-determinism
Clock based non-determinism
Network based non-determinism
I/O based non-determinism
Even small embedded systems on an 8/16-bit microcontroller run a tiny operating system or scheduler to effectively manage resources and threads. Such a system may listen to a small matrix-keypad input while also trying to process the command. Similarly, for a program running on a modern computer, CPU scheduling is inherently non-deterministic. A program cannot predict when it will be granted CPU time, as it is just one of many tasks competing for resources in a multi-threaded environment. The large number of concurrent processes and the rapid, unpredictable changes in their execution order introduce a strong element of randomness. While the operating system might follow a set of deterministic rules to manage these tasks, a single program has no way of knowing when its turn will come. So the outcomes are always unpredictable in the timeline, and they carry an element of "OS jitter".
So OS jitter is the variation or inconsistency in the timing of a task's execution: it measures how much the time a task takes to run deviates from one iteration to the next. In real-time systems, where timing is critical, minimizing OS-contributed jitter becomes a primary goal. The OS jitter is caused by a variety of factors related to how an operating system manages hardware and OS-level resources (a minimal measurement sketch follows the list below).
Interrupt Handling: The OS may pause a running program to handle an interrupt from hardware, like an ADXB or CAN bus interface, or sensor hard-limit signals. The unpredictable timing of these interrupts introduces jitter.
Context Switching: When the OS switches the CPU's attention from one program to another, it has to save the state of the current program and load the state of the next. The duration of this process can vary.
Background Processes: Other programs running in the background, such as daemons or scheduled tasks, can unexpectedly compete for CPU time, affecting a program's timing.
Resource Contention: When multiple programs try to access a shared resource (like memory or a bus), they can cause delays for each other, which adds to overall jitter.
Caching and Pipelines: Modern CPU features like caches and instruction pipelines can also introduce timing variations. A cache miss, for example, can cause a small but unpredictable delay.
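Here is a minimal sketch that makes OS jitter visible on a POSIX system: a task asks to wake every 1 ms (an arbitrary period chosen for illustration) and records how late each wake-up actually is. On a loaded general-purpose OS the worst-case lateness can be substantial.

```c
/*
 * Minimal sketch: observing OS jitter of a "periodic" task on a POSIX
 * system. The 1 ms period and iteration count are arbitrary choices.
 */
#include <stdio.h>
#include <time.h>

#define PERIOD_NS  1000000L   /* requested period: 1 ms */
#define ITERATIONS 1000

int main(void) {
    struct timespec next, now;
    long max_jitter_ns = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < ITERATIONS; i++) {
        /* advance the ideal wake-up time by one period */
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec += 1;
        }
        /* sleep until the absolute deadline, then see how late we woke */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);
        long late_ns = (now.tv_sec - next.tv_sec) * 1000000000L
                     + (now.tv_nsec - next.tv_nsec);
        if (late_ns > max_jitter_ns)
            max_jitter_ns = late_ns;
    }
    printf("worst wake-up lateness: %ld ns\n", max_jitter_ns);
    return 0;
}
```

Running this side by side on an idle versus a busy machine shows the spread directly: the requested schedule is deterministic, the observed one is not.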
While some general-purpose ARM cores, like those in the Cortex-A series, prioritize high performance and power efficiency for applications like smartphones and desktops, they may not offer the strict determinism and predictability required for hard real-time systems. This is due to features like complex memory management units (MMUs), caches, and branch prediction, which can introduce variability in execution times. Branch predictors also drive speculative execution and control-flow speculation. However, ARM also offers specific processor series designed for real-time applications:
Cortex-R series: These processors are specifically engineered for embedded systems demanding high reliability, availability, fault tolerance, and deterministic real-time responses. They feature fast interrupt response times and predictable performance, making them suitable for applications like automotive control systems and industrial automation.
Cortex-M series: These microcontrollers are widely used in embedded systems where real-time performance is crucial, but resource constraints and low power consumption are also important. They offer a balance of performance and efficiency for a wide range of real-time applications, including IoT devices, wearables, and motor control.
If a Blighty prodigy like Sophie Mary Wilson, the inventor behind ARM, were to reimagine the craft of microarchitecture, the result might well be an even more power-efficient one. But another Blighty computer-architecture maverick, Michael David May, invented the famed XMOS instead. The XMOS architecture doesn't exactly focus on power efficiency, but on extreme determinism. It achieves determinism by fundamentally rethinking the traditional microcontroller design, moving away from interrupt-driven systems and instead using a multi-core, event-driven model. It's designed to make timing predictable, which minimizes jitter. Unlike traditional microcontrollers that rely on a single CPU core and an RTOS to manage tasks and interrupts, the XMOS architecture uses several key features to ensure predictable timing:
Multiple Logical Cores:
XMOS devices contain multiple 32-bit logical cores. Each core can run one or more tasks, and they can execute code independently and in parallel. This is a significant departure from single-core processors that must handle all events sequentially.
Hardware-Level Threading:
Each logical core supports multiple hardware threads. The processor can switch between these threads with zero overhead because each thread has its own set of registers. This eliminates the unpredictable delays associated with software context switching in a traditional OS.
Built-in Hardware Scheduler:
The xTIME™ scheduler is a key component. It's a hardware-based scheduler that manages events from I/O ports, timers, and other cores with guaranteed, predictable behavior. The hardware handles the scheduling and prioritization of tasks, so there's no need for an external RTOS.
Direct I/O Access:
The cores have a tight, direct connection to the I/O pins, which allows for extremely fast and predictable I/O response times (as fast as 10 ns). This is crucial for applications that require precise timing for external signals, as it bypasses the unpredictable delays of memory and bus access.
No Caches:
The XMOS architecture does not use caches. Caches are a common source of non-determinism in general-purpose CPUs because the time it takes to access data can vary depending on whether it's in the cache or main memory. By forgoing caches, XMOS ensures that the execution time of code is consistent and predictable.
Traditional microcontrollers are often interrupt-driven. When an external event occurs, the main program is paused, an interrupt service routine (ISR) is executed, and then the main program resumes. This process can introduce significant and unpredictable jitter, especially if multiple interrupts occur in quick succession.
In contrast, the XMOS architecture treats external events as signals that are fed directly to a core, where they trigger a thread to execute. Because of the multi-core design and hardware scheduler, one core can handle an event while the others continue their tasks, preventing the jitter-inducing interruptions seen in other systems. This allows complex, real-time tasks to be implemented in software with the timing predictability of a hardware solution, similar to something FPGA-based. British engineering is simply too bright and blinding! Bravo Mr. May!
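XMOS devices are actually programmed in XC, its C extension with hardware ports and channels; the sketch below is only a software analogy of the idea, not XMOS's API. It dedicates one POSIX thread per event source (pipes standing in for I/O), so servicing one event never interrupts the handler of another, much as one xcore logical core servicing a port leaves the others running.

```c
/* Software analogy of the XMOS event model -- NOT the XC/XMOS API.
   Each event source gets a dedicated thread blocked on that source alone. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *event_task(void *arg) {
    int fd = *(int *)arg;
    char byte;
    for (;;) {
        if (read(fd, &byte, 1) != 1)   /* block until *this* source fires */
            break;
        /* handle the event; other sources' threads keep running */
        printf("fd %d fired: %c\n", fd, byte);
    }
    return NULL;
}

int main(void) {
    int pipes[2][2];                   /* stand-ins for two I/O sources */
    pthread_t tid[2];
    for (int i = 0; i < 2; i++) {
        pipe(pipes[i]);
        pthread_create(&tid[i], NULL, event_task, &pipes[i][0]);
    }
    write(pipes[0][1], "a", 1);        /* simulate two independent events */
    write(pipes[1][1], "b", 1);
    sleep(1);                          /* let the handlers run, then exit */
    return 0;
}
```

The crucial difference is that OS threads still suffer scheduling jitter; XMOS makes this pattern deterministic by putting the thread scheduling in silicon.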
Most MPUs/MCUs have two different clock sourcing options, internal and external. The external clock source can be chosen based on speed and stability requirements. While a plain crystal oscillator with proper load capacitors is good enough for most applications, advanced applications in control systems and data acquisition can opt for a TCXO (Temperature Compensated Crystal Oscillator) or an OCXO (Oven Controlled Crystal Oscillator). The internal reference is typically an RC oscillator etched on the chip itself. Understandably, an RC oscillator is less stable considering variations in operating temperature, manufacturing variations and drift, power supply instabilities and circuit loading. So an internal RC oscillator is not suitable for ADC sampling, external event sampling, trigger generation, signal generation or asynchronous communications.
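To see why the RC oscillator falls short for asynchronous communications, consider a rough UART timing budget: the receiver resynchronizes on each start bit, so the two clocks only need to stay aligned for about ten bit times, and the accumulated error must stay under roughly half a bit. The ±1% RC and ±20 ppm crystal figures below are typical datasheet orders of magnitude, used here purely as assumptions.

```c
/*
 * Back-of-envelope sketch: how much clock error can async serial absorb?
 * Over a 10-bit frame the drift must stay under ~0.5 bit, split between
 * transmitter and receiver.
 */
#include <stdio.h>

int main(void) {
    const double frame_bits = 10.0;          /* start + 8 data + stop      */
    const double budget     = 0.5;           /* max drift, in bit times    */
    const double per_side   = budget / frame_bits / 2.0; /* Tx + Rx split  */

    const double rc_err      = 0.01;         /* assumed ±1% internal RC    */
    const double crystal_err = 20e-6;        /* assumed ±20 ppm crystal    */

    printf("per-side error budget : %.2f%%\n", per_side * 100.0);
    printf("RC oscillator uses    : %.0f%% of that budget\n",
           rc_err / per_side * 100.0);
    printf("crystal uses          : %.2f%% of that budget\n",
           crystal_err / per_side * 100.0);
    return 0;
}
```

A ±1% RC source alone eats around 40% of the per-side budget before temperature and supply drift are even counted, whereas a crystal is negligible; that headroom is exactly what asynchronous links and sampling subsystems depend on.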
While less aggressive than in desktop CPUs, many modern, higher-end microcontrollers use Dynamic Frequency Scaling (DFS) to manage power consumption and heat, and this is the chief cause of clock-related non-determinism. A numeric sketch follows the list below.
Variable Execution Time: DFS changes the microcontroller's clock speed (frequency) on the fly. An instruction that takes 10 clock cycles might complete in 10 ns at a high frequency but 100 ns at a low frequency. Since the time a task takes to finish (its Worst-Case Execution Time, or WCET) is no longer constant, the timing of the overall system becomes unpredictable.
External vs. Internal Triggers: The frequency change is often triggered by:
External Events: An operating system (if present) requesting a speed change to meet a performance goal or conserve battery.
Internal Hardware: A thermal monitor throttling the clock down to prevent overheating, which a program cannot predict or control.
Clock Switching Overhead: Switching the clock source (e.g., from an internal high-speed oscillator to a Phase-Locked Loop (PLL) or a lower-speed external crystal) is a process that takes a small but non-zero amount of time. This introduces a transient, non-deterministic delay in the execution pipeline.
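To make the first point concrete, here is a small sketch using the same figures as above: a fixed 10-cycle instruction sequence timed at two example frequencies. The cycle count stays constant; the wall-clock time does not.

```c
/*
 * Why DFS breaks constant execution time: the same 10-cycle sequence
 * takes 10x longer when the core is throttled from 1 GHz to 100 MHz.
 * Frequencies are arbitrary examples matching the figures in the text.
 */
#include <stdio.h>

static double exec_time_ns(unsigned cycles, double freq_hz) {
    return cycles / freq_hz * 1e9;   /* cycles are fixed; time is not */
}

int main(void) {
    unsigned cycles = 10;
    printf("at 1 GHz  : %.0f ns\n", exec_time_ns(cycles, 1e9));  /* 10 ns  */
    printf("at 100 MHz: %.0f ns\n", exec_time_ns(cycles, 1e8));  /* 100 ns */
    return 0;
}
```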
Network-based non-determinism is easy to see in traditional Ethernet and WiFi networks, where problems can arise anywhere from the PHY layer up to the higher-layer protocols. However, most real-time embedded systems use the CAN bus for its simplicity and ease of maintenance. Yet CAN can uniquely undermine a system's performance and safety.
CAN bus is deterministic under controlled load, allowing upper bounds to be calculated for message latencies if all node timing and priorities are known. Message priority is determined by identifier value; lower-value IDs have higher priority, and these messages are guaranteed to win arbitration when multiple messages contend for the bus. Real-time performance can be maintained if high-priority, time-critical messages dominate the bus and load stays within calculated limits.
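Arbitration happens bit by bit on the wire: a dominant bit (0) overrides a recessive bit (1), so the pending message with the numerically lowest identifier always wins. The sketch below models that outcome; the IDs are made up for illustration.

```c
/*
 * Sketch of CAN arbitration's outcome: the lowest pending identifier
 * wins the bus, because it asserts a dominant (0) bit earliest.
 */
#include <stdio.h>
#include <stdint.h>

static uint16_t arbitrate(const uint16_t *pending_ids, int n) {
    uint16_t winner = pending_ids[0];
    for (int i = 1; i < n; i++)
        if (pending_ids[i] < winner)   /* lower ID = higher priority */
            winner = pending_ids[i];
    return winner;
}

int main(void) {
    uint16_t pending[] = { 0x244, 0x0A1, 0x3FF };   /* three contenders */
    printf("bus grant: ID 0x%03X\n", arbitrate(pending, 3));  /* 0x0A1 */
    return 0;
}
```

This is also exactly where the starvation risk comes from: nothing in the protocol ever lets a high-value ID win while lower-value IDs keep contending.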
Lower-priority messages may be delayed indefinitely if higher-priority traffic persists, causing missed deadlines for those lower-priority signals.
If the bus is heavily loaded with messages, even higher-priority messages experience increased wait times, as CAN cannot preempt ongoing transmissions.
Clock drift, oscillator inaccuracies, and varying message intervals contribute to communication jitter, introducing temporal uncertainty in regular message delivery.
Each CAN node must keep its clock synchronized within strict tolerances (often better than 0.5% error) to avoid bit timing slip and communication errors, which, if not managed, can lead to lost or corrupted messages.
Asynchronous clocking and data-dependent operations (bit stuffing) add variability to frame lengths and can lengthen transmission times under certain data patterns.
Temperature changes can affect oscillator performance, causing increased message dropout or errors at higher or lower temperatures.
Theoretical hard real-time guarantees are possible on CAN only with strict traffic shaping, message priority assignment, and careful clock management.
Traffic shaping and message scheduling can reduce worst-case latency and jitter, but must be tuned for each application’s mix of message types and frequencies.
Good hardware design (use of crystals instead of RC oscillators for clock sources) and correct CAN bus timing parameters (synchronization jump width, time quanta) help reduce uncertainties.
In summary, the CAN bus provides bounded latency for high-priority messages but introduces uncertainties due to arbitration, clock drift, bus utilization, and environmental factors, which must be carefully managed for reliable real-time microcontroller performance. The sketch below puts rough numbers on the worst case.
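For a feel of the bounds involved, this sketch computes the worst-case length of a classic 11-bit-ID CAN frame using the stuff-bit formula from the standard schedulability-analysis literature (the Tindell / Davis et al. line of work), and the resulting blocking time at an assumed 500 kbit/s bit rate. Because CAN transmissions cannot be preempted, even the highest-priority message can wait for one such frame already on the wire.

```c
/*
 * Worst-case timing for a classic 11-bit-ID CAN frame: an n-byte frame
 * occupies at most 34 + 8n + 13 + floor((34 + 8n - 1)/4) bit times,
 * the last term being worst-case stuff bits. 500 kbit/s is an assumed
 * example rate.
 */
#include <stdio.h>

static unsigned worst_case_bits(unsigned data_bytes) {
    unsigned g = 34;                          /* stuffable fixed fields   */
    return g + 8 * data_bytes + 13            /* + non-stuffable tail     */
         + (g + 8 * data_bytes - 1) / 4;      /* + worst-case stuff bits  */
}

int main(void) {
    double bit_time_us = 1e6 / 500e3;         /* 2 us per bit @ 500 kbit/s */
    unsigned bits = worst_case_bits(8);       /* max-length data field     */
    printf("worst-case frame: %u bits = %.0f us\n",
           bits, bits * bit_time_us);         /* 135 bits = 270 us         */
    printf("blocking bound for top-priority msg: %.0f us\n",
           bits * bit_time_us);
    return 0;
}
```

A 270 µs blocking term sounds small until it is stacked against a 1 ms control loop; this is why traffic shaping and priority assignment must be tuned per application.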
IO peripherals such as serial ports, timers and communication interfaces require CPU attention, often managed through interrupts and polling. Interrupts allow peripherals to signal the CPU when an event occurs, but this can cause context-switching overhead, delay other tasks, and introduce latency, especially as more peripherals are added or when multiple interrupts occur concurrently. Managing high-frequency or high-data-rate IO through polling is usually inefficient and can further degrade real-time responsiveness due to CPU cycle consumption. The sketch below contrasts the two patterns.
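Here is a minimal sketch of the two patterns. The register addresses, bit names, and handler name are made up for illustration and do not correspond to any particular vendor's part.

```c
/* Hypothetical memory-mapped UART registers -- illustrative only,
   not taken from any real datasheet. */
#include <stdint.h>

#define UART_STATUS  (*(volatile uint32_t *)0x40001000u)
#define UART_DATA    (*(volatile uint32_t *)0x40001004u)
#define RX_READY     (1u << 0)

/* Polling: the CPU spins until a byte arrives. Simple, but every cycle
   spent here is stolen from other tasks, hurting responsiveness. */
uint8_t uart_read_polled(void) {
    while ((UART_STATUS & RX_READY) == 0) { /* busy-wait */ }
    return (uint8_t)UART_DATA;
}

/* Interrupt: the CPU is free until the peripheral signals, but each
   entry costs a context save/restore whose timing the main program
   cannot predict -- the jitter source described above. */
volatile uint8_t  rx_buf[64];
volatile uint32_t rx_head;

void UART_IRQHandler(void) {                 /* hypothetical vector name */
    rx_buf[rx_head++ & 63u] = (uint8_t)UART_DATA;  /* drain one byte */
}
```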
Some architectures mitigate these problems using features like DMA controllers or coprocessors to offload IO handling, reducing CPU burden and improving determinism. But variability in the timing of IO events due to clock inaccuracies, process variations, or asynchronous external signals can introduce errors into time-sensitive systems. Even if the problems are solved upstream in the hardware, the downstream problem at the RTOS level still remains. Multiple peripherals sharing buses or CPU cycles can lead to queuing and priority inversion, where low-priority IO can occasionally block critical real-time operations, causing unpredictable delays.