This article has been machine-translated from Chinese. The translation may contain inaccuracies or awkward phrasing. If in doubt, please refer to the original Chinese version.

Chapter 5: Central Processing Unit

5.1 CPU Functions and Composition

5.1.1 CPU Functions

The central processing unit (CPU) is the core component of a computer: it controls the computer so that instructions are fetched and executed automatically. Its functions are as follows:

Instruction Control: A program is an ordered sequence of instructions; the CPU ensures that the machine executes them in the specified order.
Operation Control: For each instruction fetched from memory, the CPU generates the required operation signals and sends them to the corresponding components, so that these components operate as the instruction requires.
Timing Control: The CPU controls the timing of every operation; the operation signals of each instruction are issued under strict timing control.
Data Processing: The CPU performs arithmetic and logical operations on data. Processing data is the CPU's fundamental task.

5.1.2 Basic Composition of CPU (Key Topic)

(1) CPU = Arithmetic Unit + Cache + Control Unit
(2) Arithmetic Unit

Arithmetic Unit Components
- Arithmetic Logic Unit (ALU)
- General-purpose Registers: R0~R3
- Data Buffer Register: DR
- Program Status Word Register: PSW
Function: Receives commands from the control unit and performs operations:
- Arithmetic operations and logical operations

(3) Control Unit

Control Unit Components
- Program Counter (PC)
- Instruction Register (IR)
- Address Register (AR)
- Instruction Decoder (ID)
- Timing Generator
- Operation Controller
Role: The control unit is the decision-making mechanism that coordinates and directs the operation of the entire computer system.
Main functions:
- Fetches an instruction from the instruction cache and indicates the location of the next instruction in the instruction cache.
- Decodes or tests the instruction and generates the corresponding operation control signals to initiate the specified actions.
- Directs and controls the flow of data among the CPU, data cache, and I/O devices.
- Instruction flow: information passing between memory and the control unit
- Data flow: information passing between memory and the ALU

5.1.3 Main Registers in CPU (Key Topic)

Data Buffer Register (DR)

Temporarily holds an ALU result, a data word read from data memory, or a data word received from an external interface.
Functions
- Acts as a temporary buffer when ALU results are transferred to the general-purpose registers
- Compensates for the speed difference between the CPU and memory or peripheral devices

Instruction Register (IR)

Used to hold the instruction currently being executed
Instruction Decoder (ID)
- The opcode field of the instruction held in the IR must be decoded before the CPU can identify which operation the instruction specifies.
- The decoder analyzes and interprets the instruction and generates the corresponding control signals.

Program Counter (PC)

The Program Counter (PC) holds the address of the instruction currently being executed or the address of the next instruction to be executed.
If the program executes sequentially, the PC is incremented each time an instruction is fetched (by 1 when every instruction occupies one memory word), i.e., PC <- PC + 1
If the program jumps (relative addressing): PC <- PC + offset
The PC therefore has both a storage function and a counting function

Data Address Register (AR)

Used to hold the address of the memory unit currently being accessed by the CPU. Due to the speed difference between memory and CPU, the address register must be used to hold address information until a read/write operation is completed.

General-Purpose Registers (R0~R3)

Used for transferring and temporarily storing data
Can also participate in arithmetic and logical operations and store computation results
Accumulator (AC)
- A general-purpose register
- Provides a workspace for the ALU
- Temporarily stores ALU computation results

Program Status Word Register (PSW)

Stores the condition codes produced by arithmetic and logical instructions or by test results
Examples: the carry flag (C) and the zero flag (Z); the PSW also stores interrupt and system operating status information
It is a register made up of various status condition flags
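To make the flags concrete, here is a minimal Python sketch that adds two 8-bit numbers and derives the condition codes a PSW would record. The carry (C) and zero (Z) flags come from the text; the sign (N) and overflow (V) flags, the 8-bit width, and the function name `add8_with_flags` are illustrative assumptions, not a description of any particular CPU.

```python
def add8_with_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Add two 8-bit values and derive the condition flags a PSW would record."""
    a &= 0xFF
    b &= 0xFF
    raw = a + b
    result = raw & 0xFF                     # keep the low 8 bits
    flags = {
        "C": 1 if raw > 0xFF else 0,        # carry out of bit 7
        "Z": 1 if result == 0 else 0,       # result is zero
        "N": (result >> 7) & 1,             # sign bit of the result
        # overflow: both operands have the same sign and the result's sign differs
        "V": 1 if (a ^ result) & (b ^ result) & 0x80 else 0,
    }
    return result, flags


print(add8_with_flags(0x7F, 0x01))  # (128, {'C': 0, 'Z': 0, 'N': 1, 'V': 1})
```

Adding 0xFF and 0x01 instead sets C and Z and clears V, since in two's complement that is -1 + 1 = 0.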

5.1.4 Operation Controller and Timing Generator

(1) Data Path: The path for transferring information between registers.
(2) Operation Controller: Issues the operation signals that establish data paths, selecting the correct path and loading the relevant data into registers, thereby controlling instruction fetch and execution.
Depending on the design method, operation controllers fall into two types:
- Hardwired Controller: implemented with sequential logic
- Microprogrammed Controller: implemented with stored logic
(3) Timing Generator: Provides the timing signals used to control when each operation signal is issued.
In addition, the CPU contains other functional components such as the interrupt system and the bus interface.

5.2 Instruction Cycle

5.2.1 Basic Concepts of Instruction Cycle (Key Topic)

Program Execution Process

The order of program execution in a von Neumann architecture computer:

Start from the program’s first address
Execute each instruction step by step, and form the address of the next instruction to be executed
Automatically execute instructions continuously until the last instruction of the program

Instruction Execution Process

Fetch the instruction
- Send the instruction address to the main memory address register
- Read the word from main memory and send it to the designated register
Analyze (decode) the instruction
Execute the instruction as it specifies
- Different instructions vary greatly in the number of operation steps and in what those steps do
Check for interrupt requests; if there are none, proceed to the next instruction
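The fetch, analyze, execute, and interrupt-check sequence can be sketched as a loop over a toy machine. Everything about the machine is hypothetical: the tuple-encoded instruction set (`MOVI`, `ADD`, `JNZ`, `HALT`), one instruction per memory word, and the convention that a jump offset is added to the PC after it has already been incremented (real machines differ on this). The interrupt check is only marked by a comment.

```python
# Hypothetical toy ISA: ("MOVI", rd, imm), ("ADD", rd, rs), ("JNZ", rs, offset), ("HALT",)
def run(memory: list[tuple], max_steps: int = 1000) -> tuple[list[int], int]:
    regs = [0, 0, 0, 0]              # general-purpose registers R0~R3
    pc = 0                           # address of the next instruction to fetch
    for _ in range(max_steps):
        # Fetch: PC -> AR, memory[AR] -> IR, then PC <- PC + 1
        ar = pc
        ir = memory[ar]
        pc += 1
        # Decode: the opcode field identifies the operation
        opcode, *operands = ir
        # Execute
        if opcode == "HALT":
            break
        if opcode == "MOVI":
            rd, imm = operands
            regs[rd] = imm
        elif opcode == "ADD":
            rd, rs = operands
            regs[rd] += regs[rs]
        elif opcode == "JNZ":
            rs, offset = operands
            if regs[rs] != 0:
                pc += offset         # relative jump: PC <- PC + offset
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        # Common operation: a real CPU would check for interrupt requests here
    return regs, pc


program = [
    ("MOVI", 0, 0),    # R0 = 0 (running sum)
    ("MOVI", 1, 3),    # R1 = 3 (loop counter)
    ("MOVI", 2, -1),   # R2 = -1 (decrement)
    ("MOVI", 3, 3),    # R3 = 3 (value to add)
    ("ADD", 0, 3),     # address 4: R0 += R3
    ("ADD", 1, 2),     # R1 -= 1
    ("JNZ", 1, -3),    # PC is already 7 here, so 7 + (-3) = 4
    ("HALT",),
]
regs, pc = run(program)
print(regs)  # [9, 0, -1, 3]
```

The loop body runs three times, so R0 ends at 3 + 3 + 3 = 9.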

Instruction Cycle

Each time the CPU fetches and executes an instruction, it must complete a series of operations. The time required for this series of operations is usually called an instruction cycle.

Machine Cycle

A machine cycle is also called a CPU cycle. The CPU cycle is usually defined by the minimum time to read an instruction word from memory. The instruction cycle is often expressed in terms of the number of CPU cycles.

Clock Cycle

A CPU cycle contains several clock cycles (usually called beat pulses or T cycles, which are the most basic unit of processing operations). The sum of these clock cycles defines the time width of one CPU cycle.

Single-cycle vs. multi-cycle: A single-cycle instruction completes both fetch and execution within one CPU cycle. Most instructions, however, need several CPU cycles to complete all the operations of their instruction cycle.
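Because the three time units nest, an instruction's duration is (number of CPU cycles) × (clock cycles per CPU cycle) × (clock period). A small sketch with assumed figures: a 100 MHz clock, 4 clock cycles per CPU cycle, and an instruction that needs 2 CPU cycles.

```python
def instruction_time_ns(cpu_cycles: int, clocks_per_cpu_cycle: int, clock_mhz: float) -> float:
    clock_period_ns = 1000.0 / clock_mhz                   # one clock (T) cycle
    cpu_cycle_ns = clocks_per_cpu_cycle * clock_period_ns  # one CPU (machine) cycle
    return cpu_cycles * cpu_cycle_ns                       # one instruction cycle


print(instruction_time_ns(2, 4, 100.0))  # 80.0
```

Here the clock period is 10 ns, a CPU cycle is 40 ns, and the two-CPU-cycle instruction takes 80 ns.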

5.2.2 Instruction Cycle of MOV Instruction

When designing a computer, a box-diagram (flowchart) notation can be used to represent the instruction cycle of an instruction.
Method:
- Box: one CPU cycle
- Box content: the data path operations or control operations performed in that cycle
- Diamond: a judgment or test (in timing terms it belongs to the CPU cycle of the box immediately before it and does not occupy a separate CPU cycle)
- ~: Common operation. After an instruction completes, the CPU performs certain operations, mainly to service peripheral requests such as interrupts. If there are no peripheral requests, the CPU proceeds to fetch the next instruction. Because instruction fetch is common to every instruction, it is also a common operation.

Summary

An instruction cycle consists of one instruction fetch cycle and one or more execution cycles
In each CPU cycle, the data path is clearly defined
The operation controller establishes and operates the data paths; which paths it sets up depends on the instruction being executed

5.3 Timing Generator and Control Methods

5.3.1 Functions and System of Timing Generator

Functions

The controller in the CPU uses the timing signals to direct the machine's work
The CPU uses timing (cycle) information to tell whether a word read from memory is an instruction (read during the fetch cycle) or data (read during the execute cycle)
Clock pulses within a CPU cycle impose strict constraints on CPU actions
The various signals issued by the operation controller are functions of time (timing signals) and space (component operation signals).

System

Because of the characteristics of the hardware devices that make up a computer, the most basic timing-signal system is the level-pulse system. The D flip-flop illustrates this:
D is the level input terminal, CP (Clock Pulse) is the pulse input terminal
S is the set terminal, R is the reset terminal
Its characteristic equation is Q(n+1) = D, evaluated at the rising edge of CP:
- When D=0, on the rising edge of CP, the D flip-flop state is set to 0
- When D=1, on the rising edge of CP, the D flip-flop state is set to 1
When implementing data transfer between registers, data is applied to the flip-flop’s level input terminal, while the control signal for loading data is applied to the flip-flop’s clock input terminal. The high or low level represents whether the data is 1 or 0, and the level signal must be stable before the control signal for loading data arrives.
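The behavior of the D flip-flop can be simulated in a few lines. This is a behavioral sketch, not a gate-level model: the class name `DFlipFlop` is hypothetical, and S and R are treated as active-high asynchronous inputs (many real parts use active-low inputs). The level on D is sampled only at the rising edge of CP, which is why it must be stable before the loading signal arrives.

```python
class DFlipFlop:
    """Rising-edge-triggered D flip-flop with asynchronous set (S) and reset (R)."""

    def __init__(self) -> None:
        self.q = 0          # current state Q
        self._prev_cp = 0   # last CP level, used to detect a rising edge

    def apply(self, d: int, cp: int, s: int = 0, r: int = 0) -> int:
        if s and r:
            raise ValueError("S and R must not be asserted at the same time")
        if s:
            self.q = 1                      # set forces Q = 1 regardless of CP
        elif r:
            self.q = 0                      # reset forces Q = 0 regardless of CP
        elif self._prev_cp == 0 and cp == 1:
            self.q = d                      # rising edge of CP: Q(n+1) = D
        self._prev_cp = cp
        return self.q


ff = DFlipFlop()
ff.apply(d=1, cp=0)          # data level settles while CP is low: Q stays 0
ff.apply(d=1, cp=1)          # rising edge loads D: Q becomes 1
ff.apply(d=0, cp=1)          # D changes while CP stays high: Q keeps 1
print(ff.q)                  # 1
```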
As described in 5.1.4, operation controllers are either hardwired or microprogrammed, and the two use different timing systems:
Hardwired Controller:
- Timing signals use a three-level system: major state cycle, beat level, beat pulse
- A major state cycle contains several beat levels and is the largest time unit
- One beat level spans one CPU cycle and represents a larger time unit
- One beat level contains several beat pulses, which represent the smallest time units
Microprogrammed Controller:
- Timing signals use a two-level system: beat level and beat pulse
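The nesting of the three-level system can be enumerated with a short generator. The sizes used below (2 major state cycles, 3 beat levels each, 4 beat pulses per beat level) are assumptions chosen for illustration; the microprogrammed controller's two-level system corresponds to dropping the outermost level.

```python
from itertools import product


def hardwired_timing(state_cycles: int, levels_per_state: int, pulses_per_level: int):
    """Yield (major state cycle, beat level, beat pulse) indices, 1-based, in time order."""
    yield from product(
        range(1, state_cycles + 1),
        range(1, levels_per_state + 1),
        range(1, pulses_per_level + 1),
    )


sequence = list(hardwired_timing(2, 3, 4))
print(len(sequence), sequence[0], sequence[-1])  # 24 (1, 1, 1) (2, 3, 4)
```

`itertools.product` varies its last argument fastest, which matches the hardware: beat pulses advance most often, major state cycles least often.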

5.3.2 Timing Signal Generator

Functions:
- Generates timing signals. Different types of computers have different timing circuits.
- Large and medium-sized computers have complex timing circuits, while microcomputers have simple timing circuits.
Composition:
- Clock source
- Ring pulse generator
- Beat pulse and read/write timing decode logic
- Start/stop control logic
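A ring pulse generator is commonly built as a ring (circular shift) counter in which a single 1 circulates, so exactly one output is active on each clock. The sketch below models only that behavior; the four-stage width is an assumption, and the decode logic listed above, which would turn these outputs into beat pulses and read/write timing, is not modeled.

```python
def ring_pulse_generator(stages: int, clocks: int) -> list[tuple[int, ...]]:
    """One-hot ring counter: a single 1 circulates, advancing one position per clock."""
    state = [1] + [0] * (stages - 1)
    outputs = []
    for _ in range(clocks):
        outputs.append(tuple(state))
        state = [state[-1]] + state[:-1]   # rotate by one position
    return outputs


for row in ring_pulse_generator(4, 5):
    print(row)   # (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (1, 0, 0, 0)
```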

5.3.3 Control Methods

The number of CPU cycles contained in a machine instruction reflects the complexity of the instruction. The number and sequence of operation signals in different CPU cycles also vary.
Control Methods: Methods for forming timing signals that control different operation sequences. Three basic control methods:
- Synchronous control method
- Asynchronous control method
- Combined control method

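Synchronous Control Method

Every instruction and every micro-operation executes in a predetermined amount of time, governed by timing signals derived from a single common time base. There are three implementation schemes:
- Execute all instructions with completely uniform machine cycles (every instruction uses the same fixed number of machine cycles and clock cycles)
- Use variable-length machine cycles
- Combine central control with local control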

Asynchronous Control Method

Each instruction takes only as much time as it actually needs; there is no uniform, fixed cycle
The “end” signal produced when the previous micro-operation completes serves as the “start” signal for the next micro-operation

Combined Control Method (Used by microprogrammed controllers)

Most instructions are completed within fixed cycles, while a few operations that are difficult to determine use asynchronous methods
The beat pulses within a machine cycle are fixed, but the number of machine cycles per instruction is not fixed
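The trade-off between synchronous and asynchronous control can be seen with a simplified timing model. The micro-operation durations are invented for illustration, the synchronous case models the completely uniform scheme (every step padded to the slowest step), and the asynchronous case ignores the overhead of the end/start handshake, so its advantage is overstated.

```python
def synchronous_time(op_times: list[float]) -> float:
    """Completely uniform cycles: every step gets a slot as long as the slowest step."""
    return len(op_times) * max(op_times)


def asynchronous_time(op_times: list[float]) -> float:
    """Each step starts as soon as the previous one signals completion."""
    return sum(op_times)


steps = [10.0, 25.0, 15.0, 10.0]    # assumed micro-operation durations in ns
print(synchronous_time(steps), asynchronous_time(steps))  # 100.0 60.0
```

The combined method sits between these: it keeps fixed beats within a machine cycle but lets the number of machine cycles vary per instruction.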

5.4 Microprogrammed Controller

Development
- The concept and principles of microprogramming were first proposed by Professor M.V. Wilkes of Cambridge University at the Manchester University Computer Conference in 1951, when there were no suitable storage elements for control memory to store microprograms.
- In 1964, IBM successfully adopted microprogramming technology in the IBM 360 series.
- Since the 1970s, the development of VLSI technology has promoted the development and application of microprogramming technology.
- Currently, microprogramming technology has been widely adopted from mainframes to minicomputers and microcomputers.
Basic Idea:
- Modeled on the way an ordinary program is written to solve a problem, the operation control signals are encoded as microinstructions and stored in control memory. During execution, microinstructions are read from control memory to produce the operation control signals each instruction needs, causing the corresponding components to perform the specified operations.
- From the above, it can be seen that microprogramming technology is a technique that uses software methods to design hardware.

5.4.1 Microprogramming Control Principles

1. Micro-commands and Micro-operations

Micro-command: Various control commands sent by the control unit to the execution unit through control lines.
- A micro-command is the smallest and most basic unit of control signals.
Micro-operation: The operation performed by the execution unit after receiving a micro-command.
Micro-commands and micro-operations have a one-to-one correspondence. A micro-command is the control signal for a micro-operation, and a micro-operation is the execution process of a micro-command. Micro-operations are the most basic operations in the execution unit.
Depending on the data path structure, micro-operations are classified as compatible or mutually exclusive:
- Mutually exclusive micro-operations: Micro-operations that cannot be performed simultaneously or in parallel within the same beat.
- Compatible micro-operations: Micro-operations that can be performed simultaneously or in parallel within the same beat.

2. Microinstructions and Microprograms

Microinstruction: A group of micro-commands that are issued in parallel within the same CPU cycle and stored together in control memory.
Microprogram: A program composed of several microinstructions, used to implement the function of an instruction.
Each machine instruction corresponds to a microprogram, which is interpreted and executed to complete the operations specified by the instruction.

3. Relationship Between Machine Instructions and Microprograms (Key Topic)

One machine instruction corresponds to one microprogram, and this microprogram consists of a sequence of several microinstructions.
Machine instructions are related to main memory, while microinstructions are related to control memory.

5.4.2 Microprogramming Design Techniques

1. Micro-command Encoding

Micro-command encoding is the representation method used for the operation control field in microinstructions. There are three encoding methods: direct representation / encoded representation / hybrid representation

Direct Representation

In the operation control field of a microinstruction, each bit represents one micro-command and drives its control signal directly, so no decoding is needed.
For example, each independent bit of the operation control field represents a micro-command: a “1” in that bit means the micro-command is active, and a “0” means it is inactive.
Characteristics:
- This method has a simple structure, strong parallelism, and fast operation, but the microinstruction word is very long: if there are N micro-commands in total, the operation control field needs N bits.
- Moreover, many of the N micro-commands are mutually exclusive and can never be issued together, so giving each one its own bit wastes space and lowers information utilization.
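A minimal sketch of direct representation: each bit of the control field maps straight to one micro-command, so the field width equals the number of micro-commands N. The micro-command names are hypothetical placeholders.

```python
# Hypothetical micro-command names: bit i of the control field drives MICRO_COMMANDS[i]
MICRO_COMMANDS = ["PC->AR", "M->DR", "DR->IR", "PC+1", "R1->ALU", "ALU->R0"]


def active_commands(control_field: int) -> list[str]:
    """Direct representation: each 1 bit enables its micro-command, no decoder needed."""
    return [name for i, name in enumerate(MICRO_COMMANDS) if (control_field >> i) & 1]


print(len(MICRO_COMMANDS))          # field width N = 6 bits
print(active_commands(0b001001))    # ['PC->AR', 'PC+1']
```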

Encoded Representation

A group of mutually exclusive micro-command signals is organized into a field, and each micro-command signal is decoded through a field decoder, with the decoded output serving as the operation control signal.
Characteristics of encoded representation:
- Avoids conflicts between mutually exclusive micro-commands and significantly shortens the microinstruction word
- But adds decoding circuitry, slowing down microprogram execution speed

Hybrid Representation

Combines the previous two methods, balancing the characteristics of both.
In addition, some codes in one field cannot define a micro-command on their own; they must be combined with codes from another field to define it jointly.
Important encoding notes: In the field encoding method, the operation control field cannot be divided arbitrarily; it must follow these principles:
1. Place mutually exclusive micro-commands in the same field and compatible micro-commands in different fields. This improves information utilization and shortens the microinstruction word, and it also lets the hardware's parallelism be fully used to speed up execution.
2. The division should match the data path structure.
3. No field should contain too many bits; otherwise the decoding circuitry becomes more complex and slower.
4. Each field should generally reserve one state meaning that the field issues no micro-command. Therefore a 3-bit field can represent at most seven mutually exclusive micro-commands, with 000 typically meaning no operation.
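The field-encoding rules can be checked numerically. The sketch below sizes each field to hold its mutually exclusive micro-commands plus the reserved no-command code 0, compares the total with direct representation, and decodes one field. The grouping and the command names (`A1`, `B2`, and so on) are hypothetical.

```python
from typing import Optional


def field_width(n_commands: int) -> int:
    """Bits for n mutually exclusive micro-commands plus the reserved 'no command' code 0."""
    return n_commands.bit_length()          # codes 0..n need bit_length(n) bits


def decode_field(code: int, group: list[str]) -> Optional[str]:
    """Field decoder: 0 issues nothing, k (1-based) selects group[k - 1]."""
    return None if code == 0 else group[code - 1]


# Hypothetical grouping: each list holds mutually exclusive micro-commands
groups = [
    ["A1", "A2", "A3", "A4", "A5", "A6", "A7"],
    ["B1", "B2", "B3"],
    ["C1"],
]
direct_bits = sum(len(g) for g in groups)                 # one bit per micro-command
encoded_bits = sum(field_width(len(g)) for g in groups)   # one field per group
print(direct_bits, encoded_bits)                          # 11 6
print(decode_field(0b010, groups[1]), decode_field(0, groups[0]))  # B2 None
```

A group of seven fits in 3 bits, but an eighth command would force a 4-bit field, matching the "at most seven" rule above.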

2. Methods of Generating Micro-addresses

Sequencing microinstruction execution essentially comes down to determining the address of the next microinstruction
Entry address: Each machine instruction corresponds to a microprogram. After the common instruction-fetch microprogram fetches a machine instruction from main memory, the entry address of the corresponding microprogram is found based on the machine instruction’s opcode field.
There are mainly two methods for generating the next micro-address:
- Counter method
- Multi-way branch method

Counter Method

Method:
- When executing microinstructions sequentially, the next micro-address is generated by adding an increment to the current micro-address;
- When executing microinstructions non-sequentially, a branch must be taken, and after the current microinstruction is executed, execution transfers to the next microinstruction at the specified next micro-address. In this method, the micro-address register is typically replaced by a counter.
Advantages: Short sequential control field in microinstructions, simple micro-address generation mechanism
Disadvantages: Weak multi-way parallel branching capability, slow speed, poor flexibility.
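The counter method can be traced with a toy control memory. All micro-addresses, micro-command strings, and the opcode-to-entry-address table below are hypothetical; the sketch only shows the sequencing rule: add 1 to the micro-address for sequential execution, jump to the opcode's entry address after the common fetch microprogram, and take an explicit branch when the microinstruction specifies one.

```python
from typing import Union

# Hypothetical opcode -> microprogram entry address mapping
ENTRY_ADDRESS = {"MOV": 10, "ADD": 20}

# Hypothetical control memory: micro-address -> (micro-commands, next-address control)
#   None    : sequential, next micro-address = current + 1 (the counter increments)
#   "ENTRY" : end of the common fetch microprogram, jump to the opcode's entry address
#   an int  : explicit branch to that micro-address (0 = back to the fetch microprogram)
CONTROL_MEMORY: dict[int, tuple[str, Union[None, str, int]]] = {
    0: ("PC->AR", None),
    1: ("M->DR, DR->IR, PC+1", "ENTRY"),
    10: ("R1->DR", None),
    11: ("DR->R0", 0),
    20: ("R0,R1->ALU", None),
    21: ("ALU->DR", None),
    22: ("DR->R0", 0),
}


def trace_instruction(opcode: str) -> list[int]:
    """Return the micro-addresses visited while fetching and executing one instruction."""
    upc = 0                                   # micro-address counter
    visited = []
    while True:
        visited.append(upc)
        _, control = CONTROL_MEMORY[upc]
        if control is None:
            upc += 1                          # sequential: add the increment
        elif control == "ENTRY":
            upc = ENTRY_ADDRESS[opcode]       # entry address derived from the opcode
        elif control == 0:
            return visited                    # instruction done; next fetch starts at 0
        else:
            upc = control                     # explicit branch to the given address


print(trace_instruction("MOV"))  # [0, 1, 10, 11]
print(trace_instruction("ADD"))  # [0, 1, 20, 21, 22]
```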

Multi-way Branch Method

Multi-way branching:
- A single microinstruction has the ability to branch to multiple destinations
- When the microprogram does not branch, the next micro-address is directly given by the sequential control field of the microinstruction
- When the microprogram branches, there are multiple “candidate” micro-addresses to choose from
- One micro-address is selected based on the “test condition” flag and “status condition” information in the sequential control field
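One common way to realize a multi-way branch (assumed here; the text does not prescribe a circuit) is to let the test/status bits replace the low-order bits of the next-address field, so k condition bits select among 2^k candidate micro-addresses. With k = 0 the field is used unchanged, which is the non-branching case.

```python
def next_micro_address(next_addr_field: int, test_bits: int, n_test_bits: int) -> int:
    """Let the status/test condition bits replace the low-order bits of the next-address field."""
    mask = (1 << n_test_bits) - 1
    return (next_addr_field & ~mask) | (test_bits & mask)


# Assumed example: next-address field 0b10000 (16) with a 2-bit test
print([next_micro_address(0b10000, s, 2) for s in range(4)])  # [16, 17, 18, 19]
```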

3. Microinstruction Format (Key Topic)

Divided into two types: horizontal microinstructions and vertical microinstructions

(1) Horizontal Microinstructions

A horizontal microinstruction is one that can define and execute multiple micro-commands in parallel at once. The format is as follows:

Control Field	Test Condition Field	Next Address Field
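Because a horizontal microinstruction is just these fields laid side by side, packing and unpacking it is bit concatenation. The field widths (16, 3, and 8 bits) are hypothetical; real control stores choose widths to fit their data path.

```python
# Hypothetical field widths, most significant field first
FIELDS = [("control", 16), ("test", 3), ("next_addr", 8)]


def pack(values: dict[str, int]) -> int:
    """Concatenate the fields into one horizontal microinstruction word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict[str, int]:
    """Split a microinstruction word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


mi = {"control": 0b1010, "test": 0b011, "next_addr": 0x2A}
word = pack(mi)
print(word, unpack(word) == mi)   # 21290 True
```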

(2) Vertical Microinstructions

A vertical microinstruction contains a micro-opcode field; the micro-opcode, encoded like a machine instruction's opcode, specifies the function of the microinstruction. Its structure therefore resembles that of a machine instruction.

Comparison of Horizontal and Vertical Microinstructions

Horizontal microinstructions have strong parallel operation capability, high efficiency, and strong flexibility, while vertical microinstructions are relatively weaker.
Horizontal microinstructions have shorter execution time per instruction, while vertical microinstructions take longer.
Microprograms interpreting instructions with horizontal microinstructions have the characteristic of longer microinstruction words but shorter microprograms. Vertical microinstructions are the opposite.
Horizontal microinstructions are difficult for users to master, while vertical microinstructions are similar to regular instructions and are relatively easier to understand.

5.5 Hardwired Controller (Omitted)

5.6 Pipelined CPU

5.6.1 Parallel Processing Technology

Concept of Parallelism

The property that operations or computations can be performed simultaneously in a problem
Narrow sense, for example: under the same delay conditions, an n-bit ALU performing n-bit parallel operations is nearly n times as fast as a 1-bit ALU performing the same n-bit operation serially.
Broad sense: whenever two or more tasks of the same or different nature are carried out at the same instant (simultaneity) or within the same time interval (concurrency), so that they overlap in time, they exhibit parallelism.
Three forms:
- Temporal parallelism (overlap): Multiple processing stages are staggered in time, taking turns using the same set of hardware components to speed up hardware turnover and gain speed. The implementation method is to use pipelined processing components.
- Spatial parallelism (resource replication): Gaining advantages through quantity
- It truly embodies simultaneity
- LSI (Large Scale Integration) and VLSI (Very Large Scale Integration) provide technical guarantees for this
- Temporal + Spatial parallelism: for example, the Pentium adopted superscalar pipeline technology

5.6.2 Structure of Pipelined CPU

System composition of a pipelined computer
- Memory hierarchy: Main memory uses interleaved multi-module memory; Cache
- Pipelined CPU: Instruction unit, instruction queue, execution unit
- Instruction pipeline
- Instruction queue: FIFO
- Execution unit: Can consist of multiple arithmetic-logic units constructed in a pipelined manner, with fixed-point and floating-point arithmetic units separated.
To implement pipelining, the input task (or process) is first divided into a series of subtasks; the subtasks of different tasks can then be executed concurrently in different stages of the pipeline.
When tasks continuously enter the pipeline, execution results are continuously produced at the output end of the pipeline, thus achieving task-level parallelism.
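Under the usual idealization (k stages of equal length Δt, no stalls), n tasks finish in (k + n - 1)Δt with pipelining versus nkΔt without, so the speedup approaches k as n grows. A small sketch of that arithmetic:

```python
def pipelined_time(k: int, n: int, dt: float) -> float:
    """k equal stages of length dt: the first result after k*dt, then one per dt."""
    return (k + n - 1) * dt


def non_pipelined_time(k: int, n: int, dt: float) -> float:
    """Each task passes through all k steps before the next one starts."""
    return k * n * dt


def speedup(k: int, n: int) -> float:
    return non_pipelined_time(k, n, 1.0) / pipelined_time(k, n, 1.0)


print(pipelined_time(4, 10, 1.0), non_pipelined_time(4, 10, 1.0))  # 13.0 40.0
print(round(speedup(4, 10), 2))                                    # 3.08
```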

5.6.3 Major Issues in Pipelines (Key Topic)

Bottleneck Problem

A slow stage in the pipeline limits the throughput of the whole pipeline
Solutions:
- Further divide into several sub-stages
- Use resource replication methods
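The bottleneck effect can be illustrated with assumed stage delays: the pipeline clock is set by the slowest stage, and dividing that stage into sub-stages (assuming its work splits evenly) brings the clock down. Resource replication, the other remedy, would instead run several copies of the slow stage in rotation and is not modeled here.

```python
def pipeline_clock(stage_times: list[float]) -> float:
    """The slowest stage sets the clock period of the whole pipeline."""
    return max(stage_times)


def subdivide(stage_times: list[float], index: int, parts: int) -> list[float]:
    """Split one stage into `parts` equal sub-stages (assumes the work divides evenly)."""
    piece = stage_times[index] / parts
    return stage_times[:index] + [piece] * parts + stage_times[index + 1:]


stages = [1.0, 1.0, 3.0, 1.0]          # assumed stage delays; stage 3 is the bottleneck
print(pipeline_clock(stages))          # 3.0
finer = subdivide(stages, 2, 3)
print(finer, pipeline_clock(finer))    # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] 1.0
```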

Stall Problems (Three Hazards)

Due to hazard conflicts
Resource hazards, data hazards, control hazards
Resource hazards: Multiple instructions entering the pipeline compete for the same functional unit in the same clock cycle.
- Solutions: Delay the following instruction by one beat before advancing; add an additional functional unit
Data hazards: If an instruction needs the result of an earlier instruction, it cannot proceed until that result is available; the two instructions have a data dependency.
- RAW (Read After Write) - A subsequent instruction uses data written by a previous instruction
- WAW (Write After Write) - A subsequent instruction writes to the same location as a previous instruction
- WAR (Write After Read) - A subsequent instruction overwrites a location read by a previous instruction
- Solutions: Delay the subsequent instruction’s read of the dependent location; set up direct forwarding paths (Forwarding)
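The three dependence types can be detected by comparing what two instructions read and write. This sketch uses a hypothetical `Instr` record with one destination and a tuple of sources; whether a detected dependence actually stalls the pipeline depends on its structure (for example, on which stage writes results).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]          # register written (None if nothing is written)
    srcs: tuple[str, ...] = ()   # registers read


def dependences(earlier: Instr, later: Instr) -> set[str]:
    """Classify the data dependences of `later` on `earlier` (program order)."""
    found = set()
    if earlier.dest is not None and earlier.dest in later.srcs:
        found.add("RAW")   # later reads what earlier writes
    if later.dest is not None and later.dest in earlier.srcs:
        found.add("WAR")   # later overwrites what earlier reads
    if earlier.dest is not None and earlier.dest == later.dest:
        found.add("WAW")   # both write the same location
    return found


add = Instr("R1", ("R2", "R3"))                  # R1 <- R2 + R3
print(dependences(add, Instr("R4", ("R1",))))    # {'RAW'}
print(dependences(add, Instr("R2", ("R5",))))    # {'WAR'}
print(dependences(add, Instr("R1", ("R6",))))    # {'WAW'}
```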
Control hazards
- Cause: When executing branch instructions, depending on the branch condition result, the processor may fetch the next sequential instruction or may branch to a new target address to fetch an instruction, causing the pipeline to stall.
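- Solution 1, delayed branch: the compiler reorders the instruction sequence so that useful, independent instructions follow the branch ("execute first, then branch"); these instructions complete regardless of the branch outcome, hiding the branch delay
- Solution 2, branch prediction: hardware predicts the branch outcome so that instructions on the predicted path can enter the pipeline early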
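As an illustration of hardware branch prediction, here is a 2-bit saturating counter, one widely used scheme. The text does not say which predictor it means, so treat this as a swapped-in example; the branch outcome sequence below is invented.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 1) -> None:
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # move one step toward the actual outcome, saturating at 0 and 3
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)


def count_mispredictions(outcomes: list[bool], predictor: TwoBitPredictor) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch taken 9 times then not taken once, with the loop run twice
loop_branch = ([True] * 9 + [False]) * 2
print(count_mispredictions(loop_branch, TwoBitPredictor()))  # 3
```

The two-bit hysteresis is the design point: the single not-taken outcome at each loop exit costs one misprediction but does not flip the prediction for the next run of the loop.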

Chapter Summary

The CPU is the central processing component of a computer, with basic functions including instruction control, operation control, timing control, and data processing. Early CPUs consisted of two main parts: the ALU and the control unit. With the development of high-density integrated circuit technology, modern CPU chips consist of three main parts (the ALU, cache, and control unit) and also integrate floating-point arithmetic units, memory management units, and so on. The CPU must have at least six types of registers: instruction register, program counter, address register, data buffer register, general-purpose registers, and status condition register. The time the CPU takes to fetch an instruction from memory and execute it is called the instruction cycle. In CISC, because different instructions perform different operations, their instruction cycles are not identical. Dividing instruction cycles is an important basis for designing the operation controller.
The timing signal generator provides the timing signals needed for CPU cycles (also called machine cycles), and the operation controller uses these signals to fetch and execute instructions in an orderly manner. Microprogramming is a technique for designing operation controllers using software methods; its regularity, flexibility, and maintainability have led to its wide use in computer design. However, with advances in ULSI technology and rising speed requirements, hardwired logic design has regained attention. The basic idea of a hardwired controller is that each micro-operation control signal is a logic function of the instruction opcode decoder outputs, timing signals, and status condition signals; this function is expressed in Boolean algebra and then implemented with gates, flip-flops, and other devices.
Parallel processing technology can be applied throughout various steps and stages of information processing. In summary, there are mainly three forms: 1. Temporal parallelism; 2. Spatial parallelism; 3. Temporal + Spatial parallelism. Pipelined CPUs are processors constructed based on the principle of temporal parallelism, representing a very economical and practical parallel technology. Currently, virtually all high-performance microprocessors use pipeline technology. The main issues in pipeline technology are resource hazards, data hazards, and control hazards, which require corresponding technical countermeasures to ensure smooth pipeline operation without stalls.

Computer Organization Review Summary (5): Central Processing Unit