SALIENT FEATURES OF 80386DX
• The 80386DX is a 32-bit processor that supports 8-bit, 16-bit and 32-bit data operands.
• The 80386 instruction set is upward compatible with all its predecessors.
• The 80386 can run 8086 applications under protected mode in its virtual 8086 mode of operation.
• With its 32-bit address bus, the 80386 can address up to 4 Gbytes of physical memory.
• The physical memory is organised in terms of segments of 4 Gbytes size at maximum.
• The 80386 CPU supports 16K (16,384) segments, and thus the total virtual memory space is 4 Gbytes x 16K = 64 terabytes.
• The memory management section of 80386 supports the virtual memory, paging and four levels of protection, maintaining full compatibility with 80286.
• The concept of paging, introduced in the 80386, enables it to organise the available physical memory into pages of 4 Kbytes each, under the segmented memory.
• The 80386 can be paired with the 80387 numeric coprocessor for mathematical data processing.
• It also offers a set of total eight debug registers DR0-DR7 for hardware debugging and control.
• The 80386 has an on-chip address translation cache.
• The 80386 is also available in another version, the 80386SX, which has the same architecture as the 80386DX except that it has only a 16-bit data bus and a 24-bit address bus. This low-cost, low-power version of the 80386 may be used in a number of applications.
• The 80386DX is available in a 132-pin grid array package, in 20 MHz and 33 MHz versions.
- List the points of differences between 386 DX and 386 SX processor.
i) 80386 DX has 32 address lines and 32 data lines.
ii) 80386 SX has 24 address lines and 16 data lines.
iii) Using its 24 address lines, the 80386 SX can access 16 MB of physical memory.
iv) Using its 32 address lines, the 80386 DX can access 4 GB of physical memory.
v) 80386 SX is low cost, low power version of 80386 microprocessor series.
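The memory sizes quoted above follow directly from powers of two. A minimal Python check, using only the figures stated in this section (24 and 32 address lines, 16K segments of at most 4 Gbytes each); the function name is hypothetical:

```python
# Capacity of an address space = 2 ** (number of address lines) bytes.
MB, GB, TB = 2**20, 2**30, 2**40


def physical_space(address_lines: int) -> int:
    """Bytes addressable with the given number of address lines."""
    return 2 ** address_lines


sx_bytes = physical_space(24)       # 80386SX: 24 address lines
dx_bytes = physical_space(32)       # 80386DX: 32 address lines

# Virtual space: 16K (16,384) segments of at most 4 Gbytes each.
SEGMENTS = 16 * 1024
virtual_bytes = SEGMENTS * physical_space(32)

print(f"80386SX physical memory: {sx_bytes // MB} MB")
print(f"80386DX physical memory: {dx_bytes // GB} GB")
print(f"80386 virtual memory:    {virtual_bytes // TB} TB")
```

Running it prints 16 MB, 4 GB and 64 TB, i.e. 2^24, 2^32 and 2^14 x 2^32 = 2^46 bytes.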
ARCHITECTURE AND SIGNAL DESCRIPTIONS OF 80386
• The architecture of 80386 is shown in Fig. along with all the internal details. The internal architecture of 80386 is divided into three sections, viz.
i) Central processing unit,
ii) Memory management unit and
iii) Bus interface unit.
The central processing unit: -
It is further divided into the execution unit and the instruction unit.
The execution unit has eight general purpose and eight special purpose registers, which are used either for handling data or for calculating offset addresses.
The instruction unit decodes the opcode bytes received from the 16-byte instruction code queue and places the decoded instructions in a 3-instruction decoded-instruction queue, from which they are passed to the control section for deriving the necessary control signals.
The barrel shifter increases the speed of all shift and rotate operations.
The multiply/divide logic implements the bit-shift-rotate algorithms to complete the operations in minimum time.
Even 32-bit multiplications can be executed within one microsecond by the multiply/divide logic.
The Memory Management Unit (MMU): -
It consists of a segmentation unit and a paging unit.
The segmentation unit allows the use of two address components, viz. segment and offset, for relocatability and sharing of code and data. The segmentation unit allows segments of up to 4 Gbytes in size.
The paging unit organizes the physical memory in terms of pages of 4 Kbytes size each. The paging unit works under the control of the segmentation unit, i.e. each segment is further divided into pages. The virtual memory is also organized in terms of segments and pages by the memory management unit.
The segmentation unit provides a four level protection mechanism for protecting and isolating the system's code and data from those of the application program. The paging unit converts linear addresses into physical addresses. The control and attribute PLA checks the privileges at the page level. The page directory and page tables maintain the paging information of each task. The limit and attribute PLA checks segment limits and attributes at segment level to avoid invalid accesses to code and data in the memory segments.
The bus control unit: -
It has a prioritizer to resolve the priority of the various bus requests and thus control access to the bus. The address driver drives the byte enable signals BE0#-BE3# and the address signals A2-A31. The pipeline and dynamic bus sizing units handle the related control signals. The data buffers interface the internal data bus with the system bus.
REGISTER ORGANIZATION OF 80386
The microprocessor 80386 contains: -
i) Eight 32-bit GPRs.
ii) Six Segment Registers.
iii) 32-bit flag register.
iv) Segment Descriptor Registers.
v) Control Registers.
vi) System Address Registers.
vii) Debug and Test Registers.
Signal Descriptions of 80386: -
CLK2: - This input pin provides the basic system clock timing for the operation of 80386. It is internally divided by two to generate the processor clock.
D0-D31: - These 32 lines act as bidirectional data bus during different access cycles.
A31 - A2: - These are the upper 30 bits of the 32-bit address bus.
BE0# to BE3#: - Because of its 32-bit data bus, the memory system of 80386 can be viewed as a 4-byte wide memory access mechanism organised as four byte banks. The four byte enable lines, BE0# to BE3#, are used to enable these four banks. Using these four enable lines, the CPU may transfer 1, 2, 3 or 4 bytes of data simultaneously.
W/R#: - The write/read output distinguishes write cycles from read cycles.
D/C#: - This data/control output pin distinguishes a data transfer cycle from a machine control cycle such as interrupt acknowledge.
LOCK#: - The LOCK# output pin enables the CPU to prevent the other bus masters (like coprocessors) from gaining the control of the system bus.
ADS#: - The address status output pin indicates that the address bus and the bus cycle definition pins (W/R#, D/C#, M/IO#, BE0#-BE3#) are carrying valid signals. The 80386 does not have an ALE signal, so this signal may be used for latching the address into external latches.
READY#: - The ready signal indicates to the CPU that the previous bus cycle has been terminated and the bus is ready for the next cycle. This signal is used to insert WAIT states in a bus cycle and is useful for interfacing of slow devices with the CPU.
BS16#: - The bus size-16 input pin allows the interfacing of 16-bit devices with the 32-bit wide 80386 data bus. Successive 16-bit bus cycles may be executed to read 32-bit data from such a peripheral.
HOLD: - The bus hold input pin enables the other bus masters to gain control of the system bus if it is asserted.
HLDA: - The bus hold acknowledge output indicates that a valid bus hold request has been received and the bus has been relinquished by the CPU.
BUSY#: - The busy input signal indicates to the CPU that the coprocessor is busy with the allotted task.
ERROR#: - The error input pin indicates to the CPU that the coprocessor has encountered an error while executing its instruction.
PEREQ: - The processor extension request input indicates to the CPU that the coprocessor has a data operand to be transferred by the CPU.
INTR: - INTR interrupt pin is a maskable interrupt input, that can be masked using the IF of the flag register.
NMI: - A valid request signal at the non-maskable interrupt request input pin internally generates a non-maskable interrupt of type 2.
RESET: - A high at this input pin suspends the current operation and restarts the execution from the starting location.
N/C: - No connection pins are expected to be left open while connecting the 80386 in the circuit.
Vcc: - These are system power supply lines.
Vss: - These are the return lines for the power supply.
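The BE0#-BE3# description above can be illustrated with a small model of how a byte address splits into the doubleword address driven on A31-A2 and the active byte enables. This is a sketch, assuming the transfer does not cross a doubleword boundary (a misaligned transfer would need a second bus cycle, which is not modelled); the function name is hypothetical:

```python
from typing import List, Tuple


def bus_cycle_signals(address: int, size: int) -> Tuple[int, List[str]]:
    """Return (value on A31-A2, active byte-enable lines) for a transfer of
    `size` bytes starting at `address`, within one aligned doubleword."""
    if size not in (1, 2, 3, 4):
        raise ValueError("size must be 1 to 4 bytes")
    lane = address & 0b11                          # low two address bits pick the first byte lane
    if lane + size > 4:
        raise ValueError("transfer crosses a doubleword boundary")
    dword_address = (address >> 2) & 0x3FFFFFFF    # the 30 bits driven on A31-A2
    enables = [f"BE{i}#" for i in range(lane, lane + size)]
    return dword_address, enables


# A 16-bit access at 00001002H uses the upper two byte lanes of doubleword 400H.
print(bus_cycle_signals(0x00001002, 2))   # (1024, ['BE2#', 'BE3#'])
```

Since A0 and A1 are not brought out as pins, the byte enables are the only way the external memory learns which bytes within the addressed doubleword take part in the cycle.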
i) General Purpose Registers (GPR): -
The 80386 has eight 32-bit general purpose registers, which may also be used as 8-bit or 16-bit registers. A 32-bit register, known as an extended register, is represented by the register name with the prefix E. For example, the 32-bit register corresponding to AX is EAX; similarly, the one corresponding to BX is EBX, and so on. AX now represents the lower 16 bits of the 32-bit register EAX, while AH and AL have the same meaning as in the 8086. Similarly, the registers BX, CX and DX have their 8-bit, 16-bit and 32-bit representations.
The 16-bit registers BP, SP, SI and DI of the 8086 architecture are now available in an extended 32-bit size and are named EBP, ESP, ESI and EDI. The names BP, SP, SI and DI represent the lower 16 bits of their 32-bit counterparts and can be used as independent 16-bit registers.
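The aliasing between EAX, AX, AH and AL described above can be modelled with bit masks. A minimal Python sketch (the function names are hypothetical). The same masks apply to EBX, ECX and EDX, and the 16-bit mask applies to ESI/SI, EDI/DI, EBP/BP and ESP/SP:

```python
MASK32 = 0xFFFFFFFF


def views_of_eax(eax: int) -> dict:
    """Return the 32-, 16- and 8-bit views of the accumulator."""
    eax &= MASK32
    return {
        "EAX": eax,
        "AX": eax & 0xFFFF,          # lower 16 bits of EAX
        "AH": (eax >> 8) & 0xFF,     # bits 8-15
        "AL": eax & 0xFF,            # bits 0-7
    }


def write_ax(eax: int, ax: int) -> int:
    """Model a 16-bit write to AX: only the lower half of EAX changes."""
    return (eax & 0xFFFF0000) | (ax & 0xFFFF)


regs = views_of_eax(0x12345678)
print({name: hex(value) for name, value in regs.items()})
print(hex(write_ax(0x12345678, 0xABCD)))   # 0x1234abcd
```

Writing AX leaves the upper 16 bits of EAX untouched, which is why 16-bit 8086 code can run unchanged on the extended registers.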
ii) Segment Registers: -
The six segment registers available in 80386 are CS, SS, DS, ES, FS and GS. The CS and SS are the code and the stack segment registers respectively, while DS, ES, FS and GS are the four data segment registers. A 16-bit Instruction Pointer IP, is available along with its 32-bit counterpart EIP, and both serve their conventional functions as per requirement. The 16-bit or lower size registers are used by 16-bit addressing, but the 32-bit addressing modes may use all the register widths, i.e. 8,16 or 32 bits.
iii) Flag Register: -
The flag register of 80386 is a 32-bit register. Out of the 32 bits, Intel has reserved bits D18 to D31, D15, D5 and D3, while D1 is always set to 1. The lower 15 bits (D0-D14) of this flag register are exactly the same as those of the 80286 flag register, in both position and function. Only two new flags, VM (D17) and RF (D16), are added to the 80286 flag register to derive the flag register of 80386.
a) VM-Virtual Mode Flag: - If this flag is set, the 80386 enters the virtual 8086 mode within the protected mode. It is to be set only when the 80386 is in protected mode. In this mode, if any privileged instruction is executed, exception 13 is generated. This bit can be set, only in protected mode, using the IRET instruction or a task switch operation.
b) RF-Resume Flag: - This flag is used with the debug register breakpoints. It is checked at the start of every instruction cycle, and if it is set, any debug fault is ignored during that instruction cycle. The RF is automatically reset after the successful execution of every instruction, except for the IRET and POPF instructions. Also, it is not cleared automatically after the successful execution of JMP, CALL and INT instructions that cause a task switch. These instructions set the RF to the value specified by the memory data available on the stack.
c) IOPL: - These two flag bits (D12 and D13) indicate the I/O privilege level, i.e. the privilege level required for executing I/O instructions.
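The bit positions listed above can be decoded from a 32-bit flag image with shifts and masks. This is a sketch that assumes Intel's 80386 layout (IF in D9, IOPL in D12-D13, NT in D14, RF in D16, VM in D17); the function name and the sample flag image are hypothetical:

```python
def decode_eflags(eflags: int) -> dict:
    """Extract selected 80386 flag-register fields from a 32-bit flag image."""
    return {
        "IF": (eflags >> 9) & 1,        # interrupt enable (D9)
        "IOPL": (eflags >> 12) & 0b11,  # two-bit I/O privilege level (D12-D13)
        "NT": (eflags >> 14) & 1,       # nested task (D14)
        "RF": (eflags >> 16) & 1,       # resume flag, new in 80386 (D16)
        "VM": (eflags >> 17) & 1,       # virtual 8086 mode flag, new in 80386 (D17)
        "D1": (eflags >> 1) & 1,        # always 1
    }


# Flag image with VM = 1, IOPL = 3, IF = 1 and the fixed D1 bit.
print(decode_eflags(0x00023202))
```

A flag image like this one, placed on the stack before an IRET at privilege level 0, is the kind of value that would switch the processor into virtual 8086 mode.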
iv) Segment Descriptor Registers: -
The segment descriptor registers of 80386 are not available to programmers; rather, they are used internally to store descriptor information such as the attributes, limit and base address of each segment. The six segment registers have six corresponding 73-bit descriptor registers. Each of them contains a 32-bit base address, a 32-bit segment limit and 9-bit attributes (32 + 32 + 9 = 73 bits). These are automatically loaded when the corresponding segment registers are loaded with selectors.
v) Control Registers: -
The 80386 has three 32-bit control registers, CR0, CR2 and CR3, to hold the global machine status. MOV instructions are available to load and store these registers. The control register CR1 is reserved for use in future Intel processors.
vi) System Address Registers: -
Four special registers are defined to refer to the descriptor tables supported by 80386. The 80386 supports four types of descriptor tables, viz. Global Descriptor Table (GDT), Interrupt Descriptor Table (IDT), Local Descriptor Table (LDT) and Task State Segment Descriptor (TSS). The system address registers and system segment registers hold the addresses of these descriptor tables and the corresponding segments. These registers are known as GDTR, IDTR, LDTR and TR respectively. The GDTR and IDTR are called system address registers, and the LDTR and TR are called system segment registers.
vii) Debug and Test Registers: -
Intel has provided a set of eight debug registers for hardware debugging. Out of these eight registers, DR0 to DR7, two registers, DR4 and DR5, are Intel reserved. The first four registers, DR0 to DR3, store four program-controllable breakpoint addresses, while DR6 and DR7 hold the breakpoint status and breakpoint control information respectively. Two more test registers, namely the test control and test status registers, are provided by 80386 for testing the page cache (TLB). The debug and test registers are shown in Fig.
The 80386 supports eleven addressing modes. The 80386 has all the addressing modes that were available with 80286. In all those modes, the 80386 can now use 32-bit immediate operands, 32-bit register operands or 32-bit displacements. Besides these, the 80386 has a family of scaled modes. In the scaled modes, the value of an index register can be multiplied by a valid scale factor to form part of the operand offset. The valid scale factors are 1, 2, 4 and 8.
Scaled Indexed Mode: -
Contents of an index register are multiplied by a scale factor, and the result may be added to a displacement to get the operand offset.
MOV EBX, LIST [ESI*2] ; LIST is the displacement
IMUL ECX, LIST [EBP*4]
Based Scaled Indexed Mode: -
Contents of an index register are multiplied by a scale factor and then added to a base register to obtain the offset.
MOV EBX, [EDX*4] [ECX]
MOV EAX, [EBX*2] [ECX]
Based Scaled Indexed Mode with Displacement: -
The contents of an index register are multiplied by a scaling factor and the result is added to a base register and a displacement to get the offset of an operand.
MOV EAX, LIST [ESI*2] [EBX + 0800]
IMUL EBX, LIST [EDI*8] [ECX + 0100]
The displacement may be any 8-bit or 32-bit immediate number. The base register may be any 32-bit general purpose register, and the index register may be any 32-bit general purpose register except ESP.
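The offset calculation shared by the three scaled modes above is base + index x scale + displacement. A minimal Python sketch, assuming hypothetical register contents, a hypothetical LIST displacement of 1000H, and 0800 read as hexadecimal; the function name is illustrative only:

```python
VALID_SCALES = (1, 2, 4, 8)


def effective_address(regs, base=None, index=None, scale=1, disp=0):
    """Offset = base + index * scale + displacement, truncated to 32 bits."""
    if scale not in VALID_SCALES:
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    if index == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    offset = disp
    if base is not None:
        offset += regs[base]
    if index is not None:
        offset += regs[index] * scale
    return offset & 0xFFFFFFFF


# Hypothetical register contents and LIST displacement.
regs = {"ESI": 3, "EBX": 0x200, "EDX": 5, "ECX": 0x100}
LIST = 0x1000

scaled_indexed = effective_address(regs, index="ESI", scale=2, disp=LIST)        # LIST [ESI*2]
based_scaled = effective_address(regs, base="ECX", index="EDX", scale=4)         # [EDX*4] [ECX]
based_scaled_disp = effective_address(regs, base="EBX", index="ESI", scale=2,
                                      disp=LIST + 0x800)                         # LIST [ESI*2] [EBX + 0800]
print(hex(scaled_indexed), hex(based_scaled), hex(based_scaled_disp))
```

With these values the three offsets are 1006H, 114H and 1A06H. A scale of 2, 4 or 8 lets the index register hold an element number, which is convenient for arrays of words, doublewords or quadwords.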
DATA TYPES OF 80386
The 80386 supports the following 17 data types, each of which is discussed here in brief. Some of them have already been discussed in the previous chapter.
1. Bit:- A single bit quantity.
2. Bit Field:- A group of at most 32 bits (4 bytes).
3. Bit String:- A string of contiguous bits of at most 4 Gbits in length.
4. Signed Byte:- Signed byte data.
5. Unsigned Byte:- Unsigned byte data.
6. Integer Word:- Signed 16-bit data.
7. Long Integer:- 32-bit signed data represented in 2's complement form.
8. Unsigned Integer Word:- Unsigned 16-bit data.
9. Unsigned Long Integer:- Unsigned 32-bit data.
10. Signed Quad Word:- Signed 64-bit (four-word) data.
11. Unsigned Quad Word:- Unsigned 64-bit data.
12. Offset:- A 16-bit or 32-bit displacement that references a memory location using any of the addressing modes.
13. Pointer:- A pair consisting of a 16-bit selector and a 16-bit or 32-bit offset.
14. Character:- An ASCII code for any of the alphanumeric or control characters.
15. Strings:- Sequences of bytes, words or double words. A string may contain from one byte up to 4 Gbytes.
16. BCD:- Decimal digits from 0 to 9 represented by unpacked bytes.
17. Packed BCD:- Two BCD digits packed into one byte, i.e. values from 00 to 99.
REAL ADDRESS MODE OF 80386
After reset, the 80386 starts executing from the memory location FFFFFFF0H in the real address mode. In real mode, the 80386 works as a fast 8086 with 32-bit registers and data types. The addressing techniques, memory size and interrupt handling in this mode of 80386 are similar to those of the real address mode of 80286. All the instructions of 80386 are available in this mode except those designed to work with or for the protected address mode. In real mode, the default operand size is 16 bits, but 32-bit operands and addressing modes may be used with the help of override prefixes. The segment size in real mode is 64 KB; hence the 32-bit effective addresses must not exceed 0000FFFFH. The real mode initializes the 80386 and prepares it for protected mode. Since the segment register contents are shifted left by four bits to form the segment base, the least significant nibble of a segment base address is always 0.
Memory Addressing in Real Mode
In real mode, the 80386 can address at most 1 Mbyte of physical memory using address lines A0-A19. The paging unit is disabled in the real address mode, and hence the real addresses are the same as the physical addresses. To form a physical memory address, the appropriate segment register contents (16 bits) are shifted left by four positions and then added to the 16-bit offset address formed using one of the addressing modes, in the same way as in the 8086. The segments in 80386 real mode can be read, written or executed, i.e. no protection is available. Any fetch or access past the end of the segment limit generates exception 13 in real address mode. The segments in 80386 real mode may be overlapping or non-overlapping. The interrupt vector table of 80386 has been allocated 1 Kbyte of space, from 00000H to 003FFH. Figure shows the physical address formation in real mode of 80386.
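The real-mode address formation and the 1 Kbyte interrupt vector table described above can be checked with a short Python sketch. The function names and the sample segment:offset pair are hypothetical:

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address = (segment << 4) + offset, as in the 8086."""
    if not 0 <= segment <= 0xFFFF:
        raise ValueError("segment register value is 16 bits")
    if not 0 <= offset <= 0xFFFF:
        # Access past the 64 KB segment limit (exception 13 per the text above).
        raise ValueError("offset beyond the 64 KB segment limit")
    return (segment << 4) + offset


def ivt_entry_address(vector_type: int) -> int:
    """Each interrupt vector occupies 4 bytes starting at 00000H."""
    if not 0 <= vector_type <= 255:
        raise ValueError("interrupt type must be 0-255")
    return vector_type * 4


print(hex(real_mode_address(0x1234, 0x0022)))  # 0x12362
print(hex(ivt_entry_address(2)))               # 0x8 (NMI, type 2)
print(hex(ivt_entry_address(255) + 3))         # 0x3ff, last byte of the table
```

For segment values close to FFFFH the sum can exceed FFFFFH. This sketch simply returns the sum and does not model how the bus handles that case.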
PROTECTED MODE OF 80386
All the capabilities of 80386 are available for utilization in its protected mode of operation. In this mode, the 80386 can address 4 Gigabytes of physical memory and 64 terabytes of virtual memory per task. The 80386 in the protected mode supports all software written for 80286 and 8086 to be executed under the control of the memory management and protection abilities of 80386. The protected mode allows the use of additional instructions, addressing modes and capabilities of 80386.
Addressing in Protected Mode
In this mode, the contents of segment registers are used as selectors to address descriptors which contain the segment limit, base address and access rights byte of the segment. The effective address (offset) is added with segment base address to calculate linear address. This linear address is further used as physical address, if the paging unit is disabled. Otherwise, the paging unit converts the linear address into physical address.
The paging unit is a memory management unit enabled only in the protected mode. The paging mechanism allows handling of large segments of memory in terms of pages of 4 KB size. The paging unit operates under the control of segmentation unit. The paging unit if enabled converts linear addresses into physical addresses, in protected mode. Figures (a) and (b) show addressing in protected mode without and with paging unit enabled respectively.
A lot has already been said about segmentation while dealing with 8086 and 80286. In short, the segmentation scheme is a way of offering protection to different types of data and code. The 80386 also utilizes the three types of segment descriptor tables as the 80286 does. However, there are slight differences between the 80386 and the 80286 descriptor structures. Again, associated with each descriptor table, there is a corresponding descriptor table register, which is manipulated by the operating system to ensure the correct operation of the processor, and hence the correct execution of the program.
The three types of the 80386 descriptor tables are listed as follows:
1. Global Descriptor Table (GDT)
2. Local Descriptor Table (LDT)
3. Interrupt Descriptor Table (IDT)
Their respective significances are also similar to those of the corresponding descriptor tables in 80286.
Unlike 80286 descriptors, the 80386 descriptors have a 20-bit segment limit and a 32-bit segment base address. The descriptors of 80386 are 8-byte quantities containing access rights or attribute bits along with the base and limit of the segments.
Descriptor Attribute Bits: -
The A (accessed) attribute bit indicates whether the segment has been accessed by the CPU or not. The TYPE field decides the descriptor type and hence the segment type. The S-bit decides whether it is a system descriptor (S = 0) or a code/data segment descriptor (S = 1). The DPL field specifies the descriptor privilege level. The D-bit specifies the code segment operation size: if D = 1, the segment is a 32-bit operand segment; otherwise, it is a 16-bit operand segment. The P-bit (Present) signifies whether the segment is present in physical memory or not; if P = 1, the segment is present. The G (granularity) bit indicates whether the segment limit is counted in bytes (G = 0) or in 4 Kbyte pages (G = 1). The zero bit must remain zero for compatibility with future processors. The AVL (Available) field specifies whether the descriptor is available to the user or to the operating system. Figure shows the general structure of a segment descriptor of 80386.
The five types of descriptors that the 80386 has are as follows:
1. Code or data Segment Descriptors
2. System Descriptors
3. Local descriptors
4. TSS (Task State segment) Descriptors
5. GATE Descriptors
All these descriptors have similar definitions as in the case of 80286. Their respective structures may be slightly different as compared to the general segment descriptor structure of 80286, but they have similar functions as in 80286. The 80386 provides a four level protection mechanism, exactly in the same way as the 80286 does.
BASE :- Base Address of the segment
LIMIT:- The length of the segment
P :- Present Bit - 1 = Present, 0 = Not Present
DPL:- Descriptor Privilege Level 0-3
S:- Segment Descriptor-0 = System Descriptor, 1 = Code or Data Segment Descriptor
TYPE:- Type of Segment
A:- Accessed Bit
G:- Granularity Bit - 1 = Segment length is page granular, 0 = Segment length is byte granular
D:- Default Operation Size (recognized in code segment descriptors only) - 1 = 32-bit segment, 0 = 16-bit segment
0:- Bit must be zero (0) for compatibility with future processors
AVL:- Available field for user or OS
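The fields listed above can be unpacked from the 8 descriptor bytes. This is a Python sketch, assuming the standard 80386 byte layout (limit 15-0, base 23-0, access byte, then G/D/0/AVL with limit 19-16, then base 31-24). It follows the text's legend in treating A separately from a 3-bit TYPE, whereas Intel's manuals often describe a 4-bit type field that includes A. It also forms the linear address as segment base + offset, as in protected-mode addressing. The function names and sample descriptor bytes are hypothetical:

```python
import struct


def decode_descriptor(raw: bytes) -> dict:
    """Split an 8-byte 80386 segment descriptor into its fields."""
    if len(raw) != 8:
        raise ValueError("a descriptor is 8 bytes long")
    # Little-endian: limit 15-0, base 15-0, base 23-16, access byte,
    # (G, D, 0, AVL, limit 19-16), base 31-24.
    limit_lo, base_lo, base_mid, access, flags, base_hi = struct.unpack("<HHBBBB", raw)
    g = (flags >> 7) & 1
    limit = limit_lo | ((flags & 0x0F) << 16)                  # 20-bit limit
    return {
        "BASE": base_lo | (base_mid << 16) | (base_hi << 24),  # 32-bit base
        "LIMIT": limit,
        "A": access & 1,
        "TYPE": (access >> 1) & 0b111,
        "S": (access >> 4) & 1,
        "DPL": (access >> 5) & 0b11,
        "P": (access >> 7) & 1,
        "AVL": (flags >> 4) & 1,
        "D": (flags >> 6) & 1,
        "G": g,
        # Segment size: the limit counts bytes (G = 0) or 4 Kbyte pages (G = 1).
        "SIZE": (limit + 1) * (4096 if g else 1),
    }


def linear_address(desc: dict, offset: int) -> int:
    """Linear address = segment base + offset, after a limit check."""
    if offset >= desc["SIZE"]:
        raise ValueError("offset exceeds the segment limit")
    return (desc["BASE"] + offset) & 0xFFFFFFFF


code = decode_descriptor(bytes.fromhex("ffff0000009acf00"))   # 4 Gbyte code segment
data = decode_descriptor(bytes.fromhex("ffff003412f20000"))   # 64 Kbyte data segment at 123400H
print(code["SIZE"] == 2**32, code["D"], code["DPL"])          # True 1 0
print(hex(linear_address(data, 0x10)))                        # 0x123410
```

The first sample shows how a 20-bit limit reaches the full 4 Gbytes only when G = 1.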
Paging Operation: -
Paging is one of the memory management techniques used for virtual memory multitasking operating systems. The segmentation scheme may divide the physical memory into variable size segments but the paging divides the memory into fixed size pages. The segments are supposed to be the logical segments of the program, but the pages do not have any logical relation with the program. The pages are just the fixed size portions of the program module or data. The advantage of the paging scheme is that the complete segment of a task need not be in the physical memory at any time. Only a few pages of the segments, which are required currently for the execution, need to be available in the physical memory. Thus the memory requirement of the task is substantially reduced, relinquishing the available memory for other tasks. Whenever the other pages of the task are required for execution, they may be fetched from the secondary storage. The previous pages which are executed need not be available in the memory, and hence the space occupied by them may be relinquished for other tasks. Thus the paging mechanism provides an effective technique to manage the physical memory for multitasking systems.
Paging Unit: -
The paging unit of 80386 uses a two level table mechanism to convert the linear addresses provided by segmentation unit into physical addresses. The paging unit converts the complete map of a task into pages, each of size 4K. The task is then handled in terms of its pages, rather than segments. The paging unit handles every task in terms of three components namely page directory, page tables and the page itself.
Page Descriptor Base Register: -
The control register CR2 is used to store the 32-bit linear address at which the previous page fault was detected. CR3 is used as the page directory physical base address register, to store the physical starting address of the page directory. The lower 12 bits of CR3 are always zero (page size 2^12 = 4 K) to ensure that the page directory is aligned on a page boundary. A move operation to CR3 automatically invalidates the page table entry cache, and a task switch operation loads CR3 suitably.
Page Directory: -
This is at most 4 Kbytes in size. Each directory entry is four bytes long, so a total of 1024 entries are allowed in a directory. Fig. (a) shows a typical directory entry. The upper 10 bits of the linear address (A22-A31) are used as an index to the corresponding page directory entry. The page directory entries point to the page tables.
Page Tables: -
Each page table is 4 Kbytes in size and may contain a maximum of 1024 entries. The page table entries contain the starting address of the page and status information about the page, as shown in Fig (b). The upper 20-bit page frame address is combined with the lower 12 bits of the linear address. The address bits A12-A21 are used to select one of the 1024 page table entries. The page tables can be shared between tasks.
The P-bit of the above entries indicates whether the entry can be used in address translation. If P = 1, the entry can be used in address translation; otherwise, it cannot. The P-bit of the currently executed page is always high. The accessed bit A is set by 80386 before any access to the page. If A = 1, the page has been accessed; otherwise, it has not. The D-bit (Dirty bit) is set before a write operation to the page is carried out. The D-bit is undefined for page directory entries. The OS reserved bits are defined by the operating system software.
The User/Supervisor (U/S) bit and Read/Write (R/W) bit are used to provide protection. These bits can be decoded as shown in Table 10.2 to provide protection under the four level protection model. The level 0 is supposed to have the highest privilege, while the level 3 is supposed to have the least privilege.
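The entry fields and the U/S and R/W protection rules above can be sketched in Python. This assumes the standard 80386 entry layout (P in bit 0, R/W in bit 1, U/S in bit 2, A in bit 5, D in bit 6, OS bits 9-11, frame address in bits 12-31). It also assumes 80386 behaviour in which levels 0-2 act as supervisor and may read and write any present page. The check covers a single entry, whereas the processor combines the directory and table entries. The function names are hypothetical:

```python
def decode_page_entry(entry: int) -> dict:
    """Fields of an 80386 page directory or page table entry."""
    return {
        "FRAME": entry & 0xFFFFF000,   # upper 20 bits: page frame (or page table) address
        "P": entry & 1,                # present
        "R/W": (entry >> 1) & 1,       # 1 = writable at user level
        "U/S": (entry >> 2) & 1,       # 1 = accessible at user level (CPL 3)
        "A": (entry >> 5) & 1,         # accessed
        "D": (entry >> 6) & 1,         # dirty (meaningful in page table entries)
        "OS": (entry >> 9) & 0b111,    # bits reserved for operating system use
    }


def page_access_allowed(entry: int, cpl: int, write: bool) -> bool:
    """Page-level check: levels 0-2 count as supervisor, level 3 as user."""
    fields = decode_page_entry(entry)
    if not fields["P"]:
        return False                   # not present: page fault
    if cpl < 3:
        return True                    # supervisor: any present page
    if not fields["U/S"]:
        return False                   # user code may not touch supervisor pages
    return bool(fields["R/W"]) or not write


print(decode_page_entry(0x00123067))
print(page_access_allowed(0x00123005, cpl=3, write=True))   # user write to a read-only page
```

The second call returns False. The page is a present user page with R/W = 0, so level 3 may read it but not write it.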
Conversion of Linear Address to Physical Address: -
The paging unit receives a 32-bit linear address from the segmentation unit. The upper 20 linear address bits (A12-A31) are compared with all 32 entries in the translation look-aside buffer to check whether they match any of the entries. If they match, the 32-bit physical address is calculated from the matching TLB entry and placed on the address bus.
If the conversion process used two-level paging for every linear-to-physical conversion, considerable time would be wasted. Hence, to optimize this, a 32-entry (32 x 4 bytes) page table cache is provided, which stores the 32 most recently accessed page table entries. Whenever a linear address is to be converted to a physical address, it is first checked to see whether it corresponds to any of the page table cache entries. This page table cache is also known as the Translation Look-aside Buffer (TLB).
If the page table entry is not in the TLB, the 80386 reads the appropriate page directory entry and checks its P-bit. If P = 1, the page table is in memory. The 80386 then sets the accessed bit A of the directory entry and reads the appropriate page table entry. If P = 1 in the page table entry, the page is available in memory; the processor then updates the A and D bits and accesses the page. The upper 20 bits of the linear address, together with the page frame address read from the page table, are stored in the TLB for possible future accesses. If P = 0, the processor generates a page fault, exception number 14. This exception is also generated if the page protection rules are violated. Every time a page fault exception is generated, CR2 is loaded with the page fault address. Figure 10.9 shows the overall paging operation with TLB.
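The conversion sequence above (TLB lookup, then a directory and page table walk, then a page fault with CR2 loaded) can be sketched in Python. This is a simplified model under stated assumptions: physical memory is a dict of 32-bit entries; the TLB is modelled as fully associative with LRU replacement, which simplifies the real hardware; protection checks and the D bit are left out. The class names and table addresses are hypothetical:

```python
from collections import OrderedDict


class PageFault(Exception):
    """Exception 14; `cr2` holds the faulting linear address, as CR2 would."""

    def __init__(self, linear: int):
        super().__init__(f"page fault (exception 14) at {linear:#010x}")
        self.cr2 = linear


class PagingUnit:
    """Two-level translation with a 32-entry TLB (simplified model)."""

    def __init__(self, memory: dict, cr3: int, tlb_entries: int = 32):
        self.memory = memory                 # physical address -> 32-bit entry
        self.cr3 = cr3 & 0xFFFFF000          # lower 12 bits of CR3 are zero
        self.tlb = OrderedDict()             # linear page number -> page frame
        self.tlb_entries = tlb_entries

    def translate(self, linear: int) -> int:
        page_number = linear >> 12           # A31-A12, compared against the TLB
        offset = linear & 0xFFF              # A11-A0, passed through unchanged
        if page_number in self.tlb:          # TLB hit: no table walk needed
            self.tlb.move_to_end(page_number)
            return self.tlb[page_number] | offset

        dir_index = (linear >> 22) & 0x3FF   # A31-A22 select a directory entry
        table_index = (linear >> 12) & 0x3FF # A21-A12 select a page table entry

        pde_addr = self.cr3 + dir_index * 4
        pde = self.memory.get(pde_addr, 0)
        if not pde & 1:                      # P = 0: page table not present
            raise PageFault(linear)
        self.memory[pde_addr] = pde | 0x20   # set accessed bit A

        pte_addr = (pde & 0xFFFFF000) + table_index * 4
        pte = self.memory.get(pte_addr, 0)
        if not pte & 1:                      # P = 0: page not present
            raise PageFault(linear)
        self.memory[pte_addr] = pte | 0x20   # set accessed bit A

        frame = pte & 0xFFFFF000
        if len(self.tlb) >= self.tlb_entries:
            self.tlb.popitem(last=False)     # evict the least recently used entry
        self.tlb[page_number] = frame
        return frame | offset


# Hypothetical tables: directory at 1000H, one page table at 2000H.
memory = {0x1004: 0x00002001, 0x200C: 0x00ABC001}
mmu = PagingUnit(memory, cr3=0x1000)
print(hex(mmu.translate(0x00403025)))   # 0xabc025 (table walk, then cached)
print(hex(mmu.translate(0x00403FFF)))   # 0xabcfff (TLB hit)
```

The second lookup falls in the same 4 Kbyte page, so only the 12-bit offset changes and no table walk is needed. This is the saving that the TLB provides.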
VIRTUAL 8086 MODE
In its protected mode of operation, the 80386DX provides a virtual 8086 operating environment to execute 8086 programs. The real mode can also be used to execute 8086 programs along with some capabilities of 80386, like 32-bit registers and a few additional instructions, but without protection. However, once the 80386 enters protected mode from real mode, it can return to real mode only if the operating system clears the PE bit of CR0 and gives up protected-mode operation (the 80286 could do this only through a reset). Thus, the virtual 8086 mode of operation of 80386 offers the advantage of executing 8086 programs while remaining in protected mode.
The address forming mechanism in virtual 8086 mode is exactly identical to that of 8086 real mode. In virtual mode, the 80386 can address 1 Mbyte of memory for an 8086 program, and with paging this memory may be anywhere in the 4 GB address space of the protected mode of 80386. As in 80386 real mode, the addresses in virtual 8086 mode lie within 1 Mbyte of memory. In virtual mode, the paging mechanism and protection capabilities are available at the service of the programmers (note that the 80386 supports multiprogramming, hence more than one programmer may use the CPU at a time). The paging unit need not be enabled in virtual mode, but it may be needed for memory management when the 8086 programs together require more than 1 Mbyte of memory.
The virtual 8086 mode executes all the programs at the privilege level 3. Any of the other programs may deny access to the virtual mode programs or data. However, the real mode programs are executed at the highest privilege level, i.e. level 0. Note that the instructions to prepare the processor for protected mode can only be executed at level 0.
The virtual mode may be entered using an IRET instruction at CPL = 0, or by a task switch at any privilege level to a task whose TSS holds a flag image with the VM flag set to 1. The IRET instruction may thus be used to set the VM flag and consequently enter the virtual mode. The PUSHF and POPF instructions cannot read or set the VM bit, as they do not access it. Even in virtual mode, all interrupts and exceptions are handled by the protected mode interrupt handlers. To return to protected mode from virtual mode, any interrupt or exception may be used. As a part of the interrupt service routine, the VM bit may be reset to zero to pull the 80386 back into protected mode.
ENHANCED INSTRUCTION SET OF 80386
The instruction set of 80386 contains all the instructions supported by 80286. The 80286 instructions are designed to operate with 8-bit or 16-bit data, while the same mnemonics in the 80386 instruction set may be executed on 32-bit operands, besides 8-bit and 16-bit operands. Moreover, because of the enhanced architecture of 80386 over 80286, with extended general purpose registers, additional segment registers and an extended flag register, a number of additional instructions were added to the instruction set of 80286 to form the instruction set of 80386. An additional addressing mode, viz. the scaled mode, also contributes considerably to the enhancement of the 80386 instruction set. These newly added instructions may be categorized into the following functional groups:
1. Bit scan instructions
2. Bit test instructions
3. Conditional Set byte instructions
4. Shift double instructions
5. Control transfer via gates instructions
Various instructions under these groups are explained briefly in the following text:
1. Bit Scan Instructions: -
The 80386 instruction set has two bit scan mnemonics, viz. BSF (Bit Scan Forward) and BSR (Bit Scan Reverse). Both of these instructions scan the source operand for a '1' bit without actually rotating it. The BSF instruction scans the operand from right to left, i.e. starting from the least significant bit. If a '1' is encountered during the scan, the zero flag is reset and the bit position of that '1' is stored in the destination operand. If no '1' is encountered (the source operand is zero), the zero flag is set and the destination operand is undefined. The BSR instruction performs the same function but scans the source operand from the leftmost (most significant) bit towards the right.
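The BSF and BSR behaviour described above can be modelled in Python. This is a sketch and not an emulator: it returns the zero flag and the bit index, and uses None to stand for the undefined destination when the source is zero. The function names are hypothetical:

```python
def bsf(source: int, width: int = 32):
    """Bit Scan Forward: return (ZF, index of the lowest set bit)."""
    source &= (1 << width) - 1
    if source == 0:
        return 1, None               # ZF = 1; destination undefined on the 80386
    lowest = source & -source        # isolates the lowest set bit
    return 0, lowest.bit_length() - 1


def bsr(source: int, width: int = 32):
    """Bit Scan Reverse: return (ZF, index of the highest set bit)."""
    source &= (1 << width) - 1
    if source == 0:
        return 1, None
    return 0, source.bit_length() - 1


print(bsf(0b10110000))   # (0, 4)
print(bsr(0b10110000))   # (0, 7)
print(bsf(0))            # (1, None)
```

A program would normally test ZF (for example with JZ) right after BSF or BSR, before using the destination.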
2. Bit Test Instructions: -
The 80386 has four bit test instructions, viz. BT (Test a Bit), BTC (Test a Bit and Complement), BTR (Test and Reset a Bit) and BTS (Test and Set a Bit). All these instructions test the bit position in the destination operand that is specified by the source operand.
The tested bit is copied into the carry flag; BTC then complements that bit in the destination, BTR resets it to 0 and BTS sets it to 1, while BT leaves the destination unchanged. For example, in the case of the BT instruction, if the bit position in the destination operand specified by the source operand is '1', the carry flag is set; otherwise, it is cleared.
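The four bit test variants can be compared side by side in a small Python model. It covers register destinations only, where the bit offset is taken modulo the operand size; bit strings in memory, which these instructions also support, are not modelled. The function name is hypothetical:

```python
def bit_test(mnemonic: str, dest: int, bit: int, width: int = 32):
    """Return (CF, new destination) for BT/BTC/BTR/BTS with a register operand."""
    mask_all = (1 << width) - 1
    bit %= width                 # register operands: offset taken modulo operand size
    cf = (dest >> bit) & 1       # every variant copies the selected bit into CF
    mask = 1 << bit
    if mnemonic == "BT":
        new = dest               # test only
    elif mnemonic == "BTC":
        new = dest ^ mask        # complement the bit
    elif mnemonic == "BTR":
        new = dest & ~mask       # reset the bit to 0
    elif mnemonic == "BTS":
        new = dest | mask        # set the bit to 1
    else:
        raise ValueError(f"unknown mnemonic {mnemonic}")
    return cf, new & mask_all


for op in ("BT", "BTC", "BTR", "BTS"):
    cf, value = bit_test(op, 0b1010, 1)
    print(op, cf, bin(value))
```

Because CF always receives the old value of the bit, BTS and BTR can serve as a test-and-modify step, for example on a bitmap of free resources.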
3. Conditional Set Byte Instruction: -
This instruction sets its byte operand to 01H if the condition specified by the mnemonic is true, and clears it to 00H otherwise. This instruction group has 16 mnemonics corresponding to 16 conditions, as shown in the Table below.
For example, SETO AL; this instruction sets AL to 01H if the overflow flag is set, and to 00H otherwise.
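The 16 conditions can be written as a lookup table in Python. This sketch uses one mnemonic per condition. The 80386 also accepts alias mnemonics (for example SETZ for SETE), and the table referred to in the text may list those names instead. The function and table names are hypothetical:

```python
CONDITIONS = {
    "SETO":  lambda f: f["OF"] == 1,
    "SETNO": lambda f: f["OF"] == 0,
    "SETB":  lambda f: f["CF"] == 1,
    "SETAE": lambda f: f["CF"] == 0,
    "SETE":  lambda f: f["ZF"] == 1,
    "SETNE": lambda f: f["ZF"] == 0,
    "SETBE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "SETA":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "SETS":  lambda f: f["SF"] == 1,
    "SETNS": lambda f: f["SF"] == 0,
    "SETP":  lambda f: f["PF"] == 1,
    "SETNP": lambda f: f["PF"] == 0,
    "SETL":  lambda f: f["SF"] != f["OF"],                   # signed less than
    "SETGE": lambda f: f["SF"] == f["OF"],
    "SETLE": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
    "SETG":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],  # signed greater than
}


def setcc(mnemonic: str, flags: dict) -> int:
    """Return the byte SETcc would store: 01H if the condition holds, else 00H."""
    return 0x01 if CONDITIONS[mnemonic](flags) else 0x00


flags = {"CF": 0, "ZF": 0, "SF": 1, "OF": 0, "PF": 1}
print(setcc("SETO", flags), setcc("SETL", flags), setcc("SETG", flags))  # 0 1 0
```

Because the result is always a byte, the destination of SETcc is an 8-bit register such as AL or a byte in memory.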
4. Shift Double Instructions: -
These instructions shift the specified number of bits from the source operand into the destination operand. The 80386 instruction set has two mnemonics under this category, viz. SHLD (Shift Left Double) and SHRD (Shift Right Double). The SHLD instruction shifts the number of bits specified in the instruction from the upper side, i.e. the MSB side, of the source operand into the lower side, i.e. the LSB side, of the destination operand. The SHRD instruction shifts the number of bits specified in the instruction from the lower side, i.e. the LSB side, of the source operand into the upper side, i.e. the MSB side, of the destination operand. The source operand itself is not changed.
1. SHLD EAX, ECX, 5
This instruction shifts EAX left by 5 bit positions and moves the 5 MSBs of ECX into the vacated 5 LSB positions of EAX; ECX itself is not altered.
2. SHRD EAX, ECX, 8
This instruction shifts EAX right by 8 bit positions and moves the 8 LSBs of ECX into the vacated 8 MSB positions of EAX; ECX itself is not altered.
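The two examples above can be checked with the following model of SHLD and SHRD on 32-bit operands. It is a sketch only: the register values are arbitrary sample data, the flags are not modelled, and the count is masked to 5 bits as the processor does for 32-bit operands.

```python
MASK32 = 0xFFFFFFFF

def shld(dest: int, src: int, count: int) -> int:
    """SHLD dest, src, count on 32-bit operands (flags not modelled)."""
    count &= 0x1F                  # count taken modulo 32
    if count == 0:
        return dest & MASK32       # zero count: destination unchanged
    # shift dest left; the top `count` bits of src fill the vacated LSBs
    return ((dest << count) | ((src & MASK32) >> (32 - count))) & MASK32

def shrd(dest: int, src: int, count: int) -> int:
    """SHRD dest, src, count on 32-bit operands (flags not modelled)."""
    count &= 0x1F
    if count == 0:
        return dest & MASK32
    # shift dest right; the low `count` bits of src fill the vacated MSBs
    return (((dest & MASK32) >> count) | (src << (32 - count))) & MASK32

# SHLD EAX, ECX, 5 with EAX = 00000001H and ECX = F8000000H
print(hex(shld(0x00000001, 0xF8000000, 5)))   # 0x3f
# SHRD EAX, ECX, 8 with EAX = 12345678H and ECX = 000000ABH
print(hex(shrd(0x12345678, 0x000000AB, 8)))   # 0xab123456
```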
5. Control Transfer Instructions: -
The 80386 instruction set does not have any additional instructions for the intrasegment jump. However, for intersegment jumps it has a set of new instructions, which are variations of the earlier CALL and JMP instructions and are executed only in the protected mode. These instructions are used by 80386 to transfer control either within the same privilege level or to a different privilege level. Also, different versions of control transfer instructions are available to switch to a different task through its TSS (Task State Segment). The corresponding return instructions (RET, and IRET for a nested task) are also available to switch back from a new task initiated via CALL or INT instructions to the parent task.
THE CPU WITH A NUMERIC COPROCESSOR—80486DX
The 80386-80387 couplet, when it was introduced, was seen as the most powerful processing unit, wherein the use of 80387 was optional. However, with the increasing demand for processing capability in advanced applications, the use of 80387 became more often compulsory than optional. The designers therefore thought of integrating the floating-point unit inside the CPU itself. The 32-bit CPU 80486 from Intel is the first x86 processor with an inbuilt floating-point unit. It retained the complex instruction set of 80386, but introduced more pipelining for speed enhancement.
The 80486 is packaged in a 168-pin grid array package. The 25 MHz, 33 MHz, 50 MHz and 100 MHz (DX-4) versions of 80486 are available in the market. The 80486 is also available as 80486SX, which does not have the numeric coprocessor integrated into it. The 80486DX is the most popular version of 80486, and all the discussions in this text therefore relate to 80486DX.
Salient Features of 80486
As mentioned in the introductory note, 80486DX is the first x86 CPU with an on-chip floating-point unit. For fast execution of complex instructions of the x86 family, the 80486 has introduced a five-stage pipeline. Two out of the five stages are used for decoding the complex instructions of the x86 architecture. This feature, which has been used widely in RISC architectures, results in very fast instruction execution, as explained later. The 80486 is also the first amongst the x86 processors to have an on-chip cache. This 8 Kbytes cache is a unified data and code cache and acts on physical addresses. The details of the cache and cache controller operations are discussed later in this chapter. Further, features like boundary scan test and on-line parity check were introduced in 80486 to make it more suitable for fault-tolerant architectures. The memory and I/O capabilities of 80486 are similar to those of 80386DX. There are certain signals and architectural features, not available in 80386, which enhance the overall performance of 80486.
Architecture of 80486
The 32-bit pipelined architecture of Intel’s 80486 is shown in Fig. 10.15. The internal architecture of 80486 can be broadly divided into three sections, namely bus interface unit, execution and control unit and floating-point unit.
The bus interface unit is mainly responsible for coordinating all the bus activities. The address driver interfaces the internal 32-bit address output of the cache unit with the system bus. The data bus transceivers interface the internal 32-bit data bus with the system bus. The 4X80 write data buffer is a queue of four 80-bit registers which hold the 80-bit data to be written to the memory (available in advance due to pipelined execution of write operation). The bus control and request sequencer handles signals such as ADS#, W/R#, D/C#, M/IO#, PCD, PWT, RDY#, LOCK#, PLOCK#, BOFF#, A20M#, BREQ, HOLD, HLDA, RESET, INTR, NMI, FERR# and IGNNE#, which basically control the bus access and operations.
The burst control signal BRDY# informs the processor that the burst is ready (i.e. it acts as ready signal in burst cycle). The BLAST# output indicates to the external system that the current transfer is the last one of the burst cycle. The bus size control signals BS16# and BS8# are used for dynamic bus sizing. The cache control signals KEN#, FLUSH#, AHOLD and EADS# control and maintain the cache in coordination with the cache control unit. The parity generation and control unit maintains the parity and carries out the related checks during the processor operation. The boundary scan control unit, which is built into the 50 MHz and later versions only, subjects the processor to boundary scan tests to ensure the correct operation of various components of the circuit on the motherboard, provided the TCK input is not tied high. The prefetcher unit fetches the codes from the memory ahead of execution time and arranges them in a 32-byte code queue.
The instruction decoder gets the code from the code queue and then decodes it sequentially. The output of the decoder drives the control unit to derive the control signals required for the execution of the decoded instructions. But prior to execution, the protection unit checks if there is any violation of protection norms. In case of violation, an appropriate exception is generated. The control ROM stores a microprogram for deriving control signals for execution of different instructions. The register bank and ALU are used for their conventional purposes. The barrel shifter helps in implementing the shift and rotate operations. The segmentation unit, descriptor registers, paging unit, translation look-aside buffer and limit and attribute PLA work together to manage the virtual memory of the system and provide adequate protection to the codes or data in the physical memory. The floating-point unit with its register bank communicates with the bus interface unit under the control of the memory management unit, via its 64-bit internal data bus. The floating-point unit is responsible for carrying out mathematical data processing at a higher speed as compared to the ALU, with its built-in floating-point algorithms.
Register Organization of 80486: -
The register set of 80486 is similar to that of the 80386. Only one flag, the alignment check (AC) flag, is added to the flag register of 80386 at bit position D18, as shown in Fig. 10.16. If the AC flag bit is set to ‘1’ (and the AM bit of control register CR0 is also set), a fault (exception) is generated whenever there is an access to a misaligned address. A misaligned address means a word access to an odd address, a double word access to an address that is not on a double word boundary, and so on. The alignment faults are generated only at privilege level 3.
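The conditions under which an alignment fault occurs can be written as a single predicate. The sketch below assumes that, besides the AC flag and privilege level 3, the AM bit of CR0 must be set, which is how the 80486 enables alignment checking; the function and parameter names are hypothetical.

```python
def alignment_fault(address: int, size: int, cpl: int, eflags_ac: int, cr0_am: int) -> bool:
    """Return True if an access of `size` bytes at `address` raises an alignment-check fault."""
    checking_enabled = cr0_am == 1 and eflags_ac == 1 and cpl == 3
    # word: odd address; double word: address not a multiple of 4; bytes never fault
    misaligned = address % size != 0
    return checking_enabled and misaligned

print(alignment_fault(0x1001, 2, cpl=3, eflags_ac=1, cr0_am=1))  # True
print(alignment_fault(0x1004, 4, cpl=3, eflags_ac=1, cr0_am=1))  # False
print(alignment_fault(0x1001, 2, cpl=0, eflags_ac=1, cr0_am=1))  # False
```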
General Features of 80486
Floating Point Unit One of the major limitations in the 80386-387 system is that the 80386 sends the instruction or data to 80387 using an I/O handshake technique. To perform this handshaking and to carry out the additional housekeeping tasks, the 80386 requires about 15 clock cycles or more. Thus it was felt that even if the coprocessor architecture were enhanced to achieve a higher speed, the major bottleneck of the communication overhead would remain. Hence designers concluded that having an on-chip floating-point unit was imperative and not optional. With this idea, Intel’s 80486 CPU integrated an on-chip floating-point unit. Due to space limitations, however, 80486 implements the FPU based on a partial multiplier array. The FPU contains a shift and add data path which is controlled by microcode. The FPU registers of 80486 are similar to those in 80387. The FPU tag word, control word and status word are also the same as those of 80387. The FPU can work either under the control of the Memory Management Unit MMU (protected mode) or without any control of MMU (real mode). The FPU supports all the data types supported by 80387. The floating-point unit instruction set of 80486 is upwardly compatible with that of 80387. A large number of instructions supporting floating-point arithmetic are supported by 80486. Some of the important ones include FSQRT (Floating-point Square Root), floating-point transcendental instructions like FSIN and floating-point arithmetic instructions like FMUL. The detailed discussion of the instruction set of 80486 is out of the scope of this book and hence avoided.
Addressing Modes: -
The addressing modes supported by 80486 are exactly the same as those of 80386. The physical address calculation methods of 80486 are also similar to those of 80386 in real as well as protected virtual address mode. The memory organisation, the addressing techniques and the memory and I/O addressing capability of 80486 are the same as those of 80386.
Interrupts of 80486: -
Like other 8086 family processors, the 80486 can handle 256 interrupt types (00H to FFH). For a hardware interrupt on its INTR pin, the interrupt type N (00H to FFH) is to be passed to the CPU by external hardware such as an interrupt controller. In the real mode, the structure of the Interrupt Vector Table (IVT) is exactly the same as that of 8086, with a 4-byte vector per type. However, in protected mode, the interrupt vectors are 8-byte quantities and are handled by an interrupt descriptor table containing 256 possible interrupt vectors (256 x 8 = 2 Kbytes). Out of the total of 256 interrupts, 32 are reserved by Intel while the remaining 224 are free for use by the users. The interrupt priorities and other details are the same as for the other 8086 family processors.
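The vector address arithmetic implied above can be sketched as follows. In real mode the sketch assumes the IVT at its default location starting at 00000H with 4-byte entries; in protected mode the IDT base value 00010000H is a hypothetical value that would normally come from the IDTR register.

```python
def real_mode_vector_address(n: int) -> int:
    """Address of the 4-byte IVT entry (IP, then CS) for type n, IVT assumed at 00000H."""
    assert 0 <= n <= 0xFF
    return n * 4

def idt_entry_address(idt_base: int, n: int) -> int:
    """Linear address of the 8-byte IDT gate descriptor for type n."""
    assert 0 <= n <= 0xFF
    return idt_base + n * 8

print(hex(real_mode_vector_address(0x21)))       # 0x84
print(hex(idt_entry_address(0x00010000, 0x21)))  # 0x10108 (hypothetical IDT base)
print(256 * 8)                                   # 2048 bytes = 2 Kbytes
```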
Data Types of 80486: -
The 80486 CPU supports a wide range of data types including the floating-point data types, as listed briefly below. Please note that the FPU does not support any unsigned data type.
(i) Signed/unsigned Data Type 8-bit, 16-bit, 32-bit signed and unsigned integers are supported by 80486 while the FPU supports 16-bit, 32-bit and 64-bit signed data.
(ii) Floating Point Data Types Single precision, double precision and extended precision real data are supported only by the FPU.
(iii) BCD Data Types Packed and unpacked BCD data types. The CPU supports 8-bit packed and unpacked data types. The FPU supports 80-bit packed BCD data types.
(iv) String Data Types Strings of bits, bytes, words and double words are supported by the CPU. Each of the strings may contain up to 4 Gbytes.
(v) ASCII Data Types The ASCII representation of characters is supported by 80486.
(vi) Pointer Data Types 48-bit pointers containing 32-bit offset at the least significant bits and 16-bit selector at the most significant bits are supported by the CPU. Also 32-bit pointers containing 32-bit offsets are supported by the CPU.
(vii) Little Endian and Big Endian Data Types Normally the 8086 family uses the Little Endian data format. This means for a data of size bigger than one byte, the least significant byte is stored at the lowest memory address while the most significant byte is stored at the highest memory address. The complete data is referred to by the lowest memory address, i.e. the address of the least significant byte.
The Big Endian format stores the data in exactly the opposite manner, i.e. the most significant byte is stored at the lowest memory address, while the least significant byte is stored at the highest memory address. The 80486 provides a special instruction, BSWAP, which reverses the byte order of a 32-bit register and thus converts data from Little to Big Endian format or vice versa. This instruction is not available in 80386.
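The difference between the two byte orders, and the effect of a byte swap such as BSWAP, can be illustrated with Python's standard struct module. The bswap32 helper is a hypothetical name that only mimics what the instruction does to a 32-bit register value.

```python
import struct

def bswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit value, as BSWAP does to a 32-bit register."""
    value &= 0xFFFFFFFF
    return (((value & 0x000000FF) << 24) |
            ((value & 0x0000FF00) << 8) |
            ((value & 0x00FF0000) >> 8) |
            ((value >> 24) & 0xFF))

x = 0x12345678
print(struct.pack("<I", x).hex())   # 78563412 (little endian: LSB at lowest address)
print(struct.pack(">I", x).hex())   # 12345678 (big endian: MSB at lowest address)
print(hex(bswap32(x)))              # 0x78563412
```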
Modes of Operation of 80486: -
After reset, the 80486, just like 80286 and 80386, starts execution in the real address mode. The real address mode operation of 80486 is exactly the same as that of 80386. While executing in real address mode, the 80486 initializes registers, peripherals and the IVT, sets up descriptor tables and prepares itself for the protected mode. The protected mode operation of 80486 is also similar to that of 80386, right from the address formation to descriptor types and structures. In the protected virtual address mode, the 80486 also supports a virtual 8086 mode for execution of 8086 applications. The protection schemes and privilege levels allowed by 80486 are similar to those of 80386. The other operations like task switching, paging and exception handling of 80486 are also similar to the corresponding operations in 80386.
On Chip Cache and Cache Control Unit
This is a unique feature of 80486 that is not available in 80386. The on-chip cache is used for storing the opcodes as well as data. To support this enhancement to the architecture of 80386, two bits, PWT (Page Write Through) and PCD (Page Cache Disable), are defined in the page directory entry and the page table entry of 80486, as shown in Figs 10.18(a) and (b).
The Page Write Through (PWT) bit controls the write policy for the current page. If PWT = 1, the current page is write-through; otherwise, it is write-back. The PCD bit controls the cacheability of the corresponding page. If PCD = 0, caching in the on-chip cache is enabled, subject to the favorable status of the KEN# (cache enable) input and of the CD (Cache Disable) and NW (No Write-Through) bits in control register 0 (CR0). If PCD = 1, caching is disabled, independent of the status of all other pins or bits. The 80486 maintains a write-through cache, hence the PWT bit is ignored internally, but it can still be used to control the write policy of the external second-level cache. The status of the PWT and PCD bits is output on the PWT and PCD pins of 80486 during a memory access.
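As a sketch of the rules above, the following Python code extracts the PWT and PCD bits from a page table entry (bits 3 and 4 respectively) and applies a simplified cacheability test. The entry value and helper names are hypothetical, and the test deliberately leaves the NW bit out, since NW governs the write policy rather than whether a cache line fill takes place.

```python
def page_cache_bits(entry: int):
    """Extract PWT (bit 3) and PCD (bit 4) from a page directory or page table entry."""
    pwt = (entry >> 3) & 1
    pcd = (entry >> 4) & 1
    return pwt, pcd

def onchip_line_fill_allowed(pcd: int, cr0_cd: int, ken_asserted: bool) -> bool:
    """Simplified rule: a line fill needs PCD = 0, CD = 0 and KEN# driven active (low)."""
    return pcd == 0 and cr0_cd == 0 and ken_asserted

pte = 0x00123017   # hypothetical PTE: frame 00123H, PCD = 1, U/S = R/W = P = 1
print(page_cache_bits(pte))                    # (0, 1)
print(onchip_line_fill_allowed(1, 0, True))    # False
```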
To accelerate the speed of operation, the 80486 is provided with an 8 Kbytes on-chip cache. Even with this added feature, the 80486 is fully software compatible with 80386. The physical organization of the cache is shown in Fig. 10.19. The 8 Kbytes of on-chip cache is divided into four 2 Kbytes associative memory blocks. Each 2 Kbytes memory block is arranged in 16 bytes (columns) and 128 rows, i.e. each of the 128 rows contains 16 bytes. Each of the 128 rows is associated with a 21-bit tag register. The cache is referred to by the row number (address) and block number. A 16-byte row is divided into 4-byte lines, and none of these lines can be accessed partially. If a write operation is attempted to an address whose contents are available in the cache (a write hit), the data is written to the cache as well as to the external memory; otherwise, it is written only to the external memory.
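The 21-bit tag follows from the cache geometry: 16 bytes per row need 4 byte-select bits and 128 rows need 7 row-select bits, leaving 32 - 7 - 4 = 21 bits of a 32-bit physical address for the tag. The sketch below assumes the conventional split in which address bits A4 to A10 select the row; the address used is arbitrary sample data.

```python
LINE_BYTES = 16      # bytes per row
SETS = 128           # rows per 2 Kbytes block
WAYS = 4             # four 2 Kbytes blocks

OFFSET_BITS = (LINE_BYTES - 1).bit_length()   # 4
SET_BITS = (SETS - 1).bit_length()            # 7
TAG_BITS = 32 - SET_BITS - OFFSET_BITS        # 21

def split_address(addr: int):
    """Split a 32-bit physical address into (tag, row, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    row = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, row, offset

assert WAYS * SETS * LINE_BYTES == 8 * 1024     # 8 Kbytes in total
print(TAG_BITS)                                    # 21
print([hex(f) for f in split_address(0x12345678)]) # ['0x2468a', '0x67', '0x8']
```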
Cache Maintenance: -
The on-chip cache is controlled using the Cache Disable (CD) and No Write-Through (NW) bits of control register CR0, as shown in Table 10.3. To completely disable the cache, the CD and NW bits must both be set to 1 and the cache must be flushed. Otherwise, cache hits to the previously cached contents will still be serviced from the cache internally. Any memory block can be defined as cacheable or non-cacheable by using external hardware or system software. The external hardware informs the microprocessor that the referenced area is non-cacheable by deactivating the KEN# pin.
SALIENT FEATURES OF 80586 (PENTIUM)
In the introductory note we have hinted that the designers of Pentium had basically two clear points in mind:
(a) To design a CPU with enhanced complex instruction sets, which should remain code compatible with earlier X86 CPUs—from 8086 to 486 and,
(b) To achieve performance matching that of third-generation RISC processors.
Both these objectives were, to a large extent, met while designing the Pentium CPU. Thus Pentium designers introduced a lot of RISC features while retaining the complex instruction sets supported by the earlier x86 CPUs.
A salient feature of Pentium is its superscalar, super-pipelined architecture. It has two integer pipelines, U and V, each of which is a 5-stage pipeline. This enhances the speed of integer arithmetic of Pentium to a large extent. Moreover, it has an on-chip pipelined floating-point unit, which has increased the floating-point performance several-fold compared to the floating-point performance of 80386/486 processors.
Another feature of Pentium is that it contains two separate caches, viz. a data cache and an instruction cache. One may recall that 80486 had a single unified data/instruction cache. All these features will be explained in detail later in this section.
Before presenting the Pentium architecture, a few advanced architectural concepts will be explained first. This will help the reader to understand the superscalar pipelined architectures of advanced CPUs like Pentium.
A FEW RELEVANT CONCEPTS OF COMPUTER ARCHITECTURE
One of the key issues in the design of modern computer architecture may be stated like this: ‘How to ensure maximum throughput from a system?’. There are various advanced architectural techniques which have been employed to achieve maximum throughput. We will discuss only a few of them.
So far, while discussing the Intel CPU architectures up to 80486, we have seen that only one instruction is issued to the execution unit per cycle. This obviously leads to a comparatively slow process of decoding and execution. To enhance processor performance beyond one instruction per cycle, computer architects employ the technique of Multiple Instruction Issue (MII). Thus a microprocessor which is capable of issuing more than one instruction per processor cycle is termed an MII microprocessor. Obviously, for executing more than one instruction in a cycle, the microprocessor must have more than one execution channel. Thus there are two problems, viz. (a) how to issue multiple instructions and (b) how to execute them concurrently. Keeping in view these two issues, MII architectures may be further divided into two classes: (i) Very Long Instruction Word (VLIW) architecture and (ii) Superscalar architecture.
In VLIW processors, the compiler reorders the sequential stream of code into fixed-size instruction groups (long instruction words) at compile time, and the hardware issues each group in parallel for execution. On the other hand, in a superscalar architecture the hardware decides at run time which instructions are to be issued concurrently.
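The run-time decision made by superscalar hardware can be illustrated with a toy dependency check between two adjacent instructions. This is a deliberately simplified sketch, not the actual Pentium pairing rules (which also restrict which instruction types may go to the V pipeline); the Instr record, the register sets and the function name are hypothetical, and flag dependencies are ignored.

```python
from typing import FrozenSet, NamedTuple

class Instr(NamedTuple):
    text: str
    reads: FrozenSet[str]    # registers the instruction reads
    writes: FrozenSet[str]   # registers the instruction writes

def can_issue_together(first: Instr, second: Instr) -> bool:
    """Toy check: issue two instructions in the same cycle only if they are independent."""
    raw = first.writes & second.reads    # second needs a result of first
    waw = first.writes & second.writes   # both write the same register
    return not raw and not waw

i1 = Instr("MOV EAX, [ESI]", frozenset({"ESI"}), frozenset({"EAX"}))
i2 = Instr("ADD EBX, ECX", frozenset({"EBX", "ECX"}), frozenset({"EBX"}))
i3 = Instr("ADD EDX, EAX", frozenset({"EDX", "EAX"}), frozenset({"EDX"}))
print(can_issue_together(i1, i2))   # True
print(can_issue_together(i1, i3))   # False (EAX dependency)
```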
The Pentium CPU is based on superscalar architecture. The hardware, in case of a superscalar architecture like Pentium, becomes enormously complex because in such a processor multiple instructions have to be issued in each cycle to the execution unit.
Another important concept involved here is that of pipelining. We have already explained the pipelined architecture for integer arithmetic in an 80486 CPU. As a matter of fact, pipelining has been implemented in all the processors from 8086 onwards in a limited sense, in that instructions are prefetched and stored in a queue. With these few remarks, we now present the architecture of Pentium.
A salient feature of Pentium is that it supports the superscalar architecture explained in the previous section. For execution of multiple instructions concurrently, the Pentium microprocessor issues two instructions in parallel to two independent integer pipelines, known as the U and V pipelines. Each of these two pipelines has 5 stages, as shown in Fig. 11.2. These pipeline stages are similar to those of the 80486 CPU. The functions of the stages are presented in brief below:
1. In the prefetch stage of the pipeline, the CPU fetches the instructions from the instruction cache, which stores the instructions to be executed. In this stage, the CPU also aligns the codes appropriately. This is required since the instructions are of variable length and the initial opcode bytes of each instruction should be appropriately aligned. After the prefetch stage, there are two decode stages D1 and D2.
2. In the D1 stage, the CPU decodes the instruction and generates a control word. For simple RISC-like instructions involving register data transfer or arithmetic and logical operations, a single control word may be sufficient to start the execution. However, the x86 architecture also supports complex CISC instructions, which require microcoded control sequencing.
3. Thus a second decode stage, D2, is required, where the control word from the D1 stage is decoded again for final execution. The CPU also generates addresses for data memory references in this stage.
4. In the execution stage, known as E stage, the CPU either accesses the data cache for data operands or executes the arithmetic/logic computations or floating-point operations in the execution unit.
5. In the final stage of the five-stage pipeline, which is the WB (writeback) stage, the CPU updates the registers’ contents or the status in the flag register depending upon the execution result.
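Assuming an ideal pipeline with no stalls, n instructions need k + n - 1 cycles in a k-stage pipeline, and with two pipelines that can always be paired the count drops to k + ceil(n/2) - 1. The short Python sketch below uses the five stage names listed above to print a timing chart and to compute these idealised counts; real programs fall short of them because of dependencies and pairing restrictions.

```python
import math

STAGES = ["PF", "D1", "D2", "EX", "WB"]

def cycles(n_instructions: int, pipes: int = 1, k: int = len(STAGES)) -> int:
    """Ideal cycle count: k cycles to fill the pipeline, then one issue group per cycle."""
    groups = math.ceil(n_instructions / pipes)
    return k + groups - 1 if groups else 0

def timing_chart(n: int) -> None:
    """Print the stage occupied by each instruction in one ideal 5-stage pipeline."""
    total = cycles(n)
    for i in range(n):
        row = ["  "] * total
        for s, name in enumerate(STAGES):
            row[i + s] = name          # instruction i enters stage s in cycle i + s
        print(f"I{i + 1}: " + " ".join(row))

timing_chart(3)
print(cycles(8))           # 12 cycles with one pipeline
print(cycles(8, pipes=2))  # 8 cycles if every pair issues together to U and V
```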
Although, as mentioned, the Pentium pipeline structure is similar to that of the 80486, Pentium achieves considerable speed-up by integrating additional hardware in each pipeline stage. Thus, while the 80486 may take two clock cycles to decode some instructions, Pentium takes only one.
Separate Code and Data Cache
Unlike the unified 8 Kbyte code/data cache of the 80486, Pentium has introduced two separate 8 Kbyte caches for code and data. From the fundamental principles of cache operation, one may observe that a unified cache, as in 80486, generally has a higher hit ratio than two separate caches of the same total size. Why, then, has Pentium gone in for separate caches? The answer probably lies in the fact that the superscalar organisation demands more bandwidth than a unified cache could provide, since instruction fetches and data accesses must proceed in the same cycle. Moreover, to execute branch prediction (explained later in the section) efficiently, separate caches are more meaningfully employed.
We have already mentioned in the introductory note in this chapter that, to reduce the communication overhead, there is a need to eliminate the separate coprocessor, which has actually been done in the 80486 CPU. The 80486 CPU contains a floating-point unit which is not pipelined. The FPU of Pentium has introduced extensive pipelining with an eight-stage pipeline. The first five stages of the pipeline are identical to those of the U and V integer pipelines discussed earlier. In the operand fetch stage, the FPU fetches the operands either from the floating-point register file or from the data cache. There are eight general purpose floating-point registers in the FPU. There are, however, two execution stages in Pentium, unlike in 80486, viz. the first execution stage (X1 stage) and the second execution stage (X2 stage). In these two stages, the floating-point unit reads the data from the data cache and executes the floating-point computation. In the write back stage of the pipeline, the FPU writes the results to the floating-point register file. There is an additional error reporting stage, where the FPU reports the internal status (including errors), which may necessitate additional processing for completion of the floating-point execution.
The block diagram of the floating-point unit is shown in Fig. 11.3. The unit broadly contains five segments, capable of performing five different floating-point computations. These are briefly explained as follows:
1. Floating-point Adder Segment (FADD) This segment is responsible for the addition of floating-point numbers and executes several floating-point instructions like addition, subtraction and comparison. This segment is active during the X1 and X2 stages of the pipeline and operates on single-precision, double-precision and extended-precision data.
2. Floating-point Multiplier Segment (FMUL) This segment performs floating-point multiplication in single-precision, double-precision and extended-precision modes.
3. Floating-point Divider Segment (FDD) This segment executes floating-point division and square root instructions. It calculates 2 bits of quotient every cycle and operates during both X1 and X2 pipeline stages.
4. Floating-point Exponent Segment (FEXP) This segment calculates the floating-point exponent. This is an important segment which interacts with all other floating-point segments for necessary adjustment of mantissa and exponent fields in the final stage of a floating-point computation.
5. Floating-point Rounder Segment (FRD) The results of floating-point addition or division may need to be rounded off before being written back to the floating-point registers. This segment performs the rounding operation before the write back stage.
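Producing 2 quotient bits per cycle corresponds to radix-4 division. The Pentium divider is generally described as a radix-4 SRT design; the Python sketch below instead uses the simpler restoring method on unsigned integers, purely to show how each step retires one base-4 quotient digit. The function name and the bits parameter are hypothetical, and the dividend is assumed to fit in the given number of bits.

```python
def radix4_divide(dividend: int, divisor: int, bits: int = 32):
    """Unsigned radix-4 restoring division: each step retires 2 quotient bits."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    if bits % 2:
        bits += 1                      # process the dividend in 2-bit digits
    quotient, remainder, steps = 0, 0, 0
    for shift in range(bits - 2, -1, -2):
        # bring down the next two dividend bits
        remainder = (remainder << 2) | ((dividend >> shift) & 0b11)
        # largest quotient digit q in {0, 1, 2, 3} with q * divisor <= remainder
        q = 3
        while q * divisor > remainder:
            q -= 1
        remainder -= q * divisor
        quotient = (quotient << 2) | q
        steps += 1
    return quotient, remainder, steps

print(radix4_divide(100, 7, bits=8))   # (14, 2, 4): 8 quotient bits in 4 steps
```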
11.3.4 Floating-point Exceptions
There are six possible floating-point exceptions in Pentium. These are: 1. Divide by zero 2. Overflow 3. Underflow 4. Denormal operand 5. Invalid operation and 6. Precision (inexact result). These exceptions carry their usual meanings. The divide by zero exception, invalid operation exception and denormal operand exception can easily be detected even before the actual floating-point calculation.
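The early detection mentioned above is possible because these three conditions depend only on the operands, whereas overflow, underflow and precision depend on the result. The sketch below shows this for a division on Python's double-precision floats; it is an illustration under that assumption, the function name is hypothetical, and NaN operands are not modelled.

```python
import math
import sys

def early_division_exceptions(a: float, b: float) -> set:
    """Exceptions of a / b that can be detected from the operands alone (NaNs not modelled)."""
    found = set()
    if (a == 0 and b == 0) or (math.isinf(a) and math.isinf(b)):
        found.add("invalid operation")                 # 0/0 or inf/inf
    elif b == 0 and math.isfinite(a):
        found.add("divide by zero")                   # finite non-zero dividend / 0
    for x in (a, b):
        if x != 0 and abs(x) < sys.float_info.min:    # subnormal (denormal) operand
            found.add("denormal operand")
    return found

print(sorted(early_division_exceptions(1.0, 0.0)))    # ['divide by zero']
print(sorted(early_division_exceptions(0.0, 0.0)))    # ['invalid operation']
```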
A mechanism known as Safe Instruction Recognition (SIR) is employed in Pentium. This mechanism determines whether a floating-point operation will be executed without creating any exception. If an instruction can safely be executed without any exception, it is allowed to proceed for final execution. If a floating-point instruction is not safe, the pipeline stalls the instruction for three cycles, after which the exception, if one actually occurs, is generated.
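The idea behind SIR can be sketched for multiplication: if both operands are normal numbers and the exponent of the product is guaranteed to stay inside the representable range, neither overflow nor underflow can occur. The Python sketch below applies such a conservative test to double-precision values; the one-step margin, the function names and the restriction to multiplication are assumptions for illustration, not Intel's actual safe-range criteria, and the three-cycle stall is not modelled.

```python
import math
import sys

# Exponent limits for doubles in math.frexp form (x = m * 2**e, 0.5 <= |m| < 1)
E_MAX = sys.float_info.max_exp      # 1024
E_MIN = sys.float_info.min_exp      # -1021

def is_normal(x: float) -> bool:
    """True for finite, non-zero, non-denormal values."""
    return math.isfinite(x) and abs(x) >= sys.float_info.min

def multiply_is_safe(a: float, b: float, margin: int = 1) -> bool:
    """Hypothetical SIR-style test: True if a * b cannot overflow or underflow."""
    if not (is_normal(a) and is_normal(b)):
        return False                 # special or denormal operand: not proven safe
    ea, eb = math.frexp(a)[1], math.frexp(b)[1]
    # the exponent of the product lies in [ea + eb - 1, ea + eb];
    # the margin also covers a rounding carry into the next exponent
    return ea + eb - 1 >= E_MIN + margin and ea + eb <= E_MAX - margin

print(multiply_is_safe(3.0, 4.0))       # True
print(multiply_is_safe(1e300, 1e300))   # False (product would overflow)
```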