R4300i MICROPROCESSOR

MIPS R4300i Microprocessor Technical Backgrounder - Preliminary

Chapter 1. R4300i Technical Summary

Performance:              SPECint92 60, SPECfp92 45
ISA compatibility:        MIPS-I, MIPS-II, and MIPS-III
Master clock frequency:   10 MHz min / 67 MHz max
Pipeline clock:           10 MHz min / 100 MHz max
                          (1x, 1.5x, 2x, or 3x of master clock)
System interface clock:   67 MHz max
Caches:                   16 KB I-cache and 8 KB D-cache
TLB:                      32 double entries; variable page size
                          (4 KB to 16 MB in 4x increments)
Power dissipation:        1.8 watts (typ.) at max. operating frequency
Supply voltage:           min 3.0 V, typ. 3.3 V, max 3.6 V
Packaging:                120-pin Plastic Quad Flat Pack (PQFP)
Die size:                 44 mm^2
Process technology:       0.35 micron, 3-level-metal CMOS

Chapter 2. Overview

This paper introduces the RISC R4300i microprocessor from MIPS Technologies, Inc. (MTI). The information presented in this paper discusses how the R4300i differs from previous microprocessors from MTI.

This chapter provides general information on the R4300i, including:

   * Introduction
   * The R4300i microprocessor
   * Packaging and design support
   * Future upgrades

Introduction

Reduced instruction-set computer (RISC) architectures differ from older complex instruction-set computer (CISC) architectures by optimizing performance for the available silicon area. The MIPS architecture, developed by MTI, is firmly established as the leading RISC architecture today.

The MIPS R4300i microprocessor extends the benefits of RISC's performance to consumer electronics. The R4300i microprocessor also delivers high performance to existing embedded and computing applications at a low cost. The low cost and high performance provided by the R4300i are needed for the latest consumer applications such as interactive television and games.
In the beginning, RISC microprocessors were typically used for high-performance applications. Lately, these processors have found their way into the embedded systems market as well. Today, MIPS RISC processors are used in network controllers, laser printers, and X-terminals, among other applications. The migration of MIPS RISC processors to these applications has been facilitated by lower costs as well as high integration of various functional blocks into a single die.

The R4300i can deliver up to 60 times the integer performance of a VAX 11/780 (60 SPECint92) at a cost approaching $1 per SPECint or lower. The R4300i is also designed for low power so that it can be offered in a low-cost plastic package. This makes the R4300i a strong candidate for consumer and embedded applications.

Between 1985 and 1994, three generations of the MIPS architecture were introduced and widely adopted. The first commercial MIPS processor, the R2000, ran at 8 MHz and used a 32-bit architecture. The R3000 family raised system speed to 40 MHz. The R4000 family uses a 64-bit architecture to boost instruction throughput and increase the available address space. It also adds multilevel cache and multiprocessor capabilities. The R4000 family (R4400PC, R4400SC, R4400MC, R4000PC and R4000SC) currently runs at pipeline speeds of up to 200 MHz. Recently MTI announced the MIPS R10000 microprocessor, which offers industry-leading performance for scientific and database applications.

MTI's semiconductor partners have successfully implemented MIPS standard processors in a variety of semiconductor processes and introduced numerous derivative products. MTI semiconductor partners include Integrated Device Technology, Inc., LSI Logic Corp., NEC Corporation, Performance Semiconductor, Inc., Siemens and Toshiba Corporation.
Users of the MIPS architecture include AT&T, Cisco, Control Data, NEC, Network Computing Devices, Pyramid Technology, QMS, Siemens-Nixdorf, Silicon Graphics, Sony, Texas Instruments and Tektronix.

R4300i Microprocessor

The R4300i uses a variety of techniques to provide high performance at low cost and low power consumption. These techniques include power-reduction features, power management features, cost-reduction features, and architectural optimizations.

Major R4300i characteristics include 64-bit processing, 100-MHz internal pipeline clock frequency, low-voltage operation, power-saving modes, plastic packaging, and a single data path for integer and floating-point operations. The R4300i implements the MIPS-III instruction set architecture and is fully software compatible with all existing MIPS processors.

Chapter 3 provides a complete description of architectural enhancements in the R4300i over previous MIPS microprocessor families and other R4000 family members.

R4300i Family

The R4300i is the first member of a family of microprocessors. The R4300i family uses high integration, power management and virtual memory implementation to bring high performance and low cost to the consumer market. Future R4300i family members are planned to increase the options available to systems developers.

The R4300i has been designed in well-defined basic blocks to simplify implementation of the R4300i core logic in new products. For example, by removing the caches, memory-management unit, and system interface, a high-performance RISC core is available for integration in a derivative processor or ASIC.

Packaging and Design Support

The R4300i will be made available in a single 120-pin plastic quad flat pack (PQFP). The PQFP is a low-cost package for surface-mount assembly, which further decreases processor cost to the benefit of embedded applications, and its low profile suits consumer applications.
Future Upgrades

It is anticipated that higher-frequency versions of the R4300i will become available in the future. These will provide an extra performance boost at the same low price points as the current R4300i.

Chapter 3. Implementation

The R4300i differs from the R4000 family in five main categories. These categories, discussed in the next sections, are:

   * Power reduction features
   * Power management features
   * Cost reduction features
   * Architectural optimization
   * System bus interface

These features combine to achieve typical power dissipation of 1.8 watts, a reduced-power mode dissipation of 0.4 watts, and a power-down mode where the processor is turned off. These features also allow a die size of less than 7 mm on a side, while maintaining full 64-bit operation.

Power Reduction Features

The R4300i is designed using low-power design techniques. These are techniques that reduce power dissipation while running standard tasks. Examples of low-power design techniques, discussed below, are:

   * 3.3-Volt operation
   * Dynamic logic design
   * Cache bank partitioning
   * Write-back data cache
   * Cache prefetching
   * Micro TLB

3.3-Volt operation

The R4300i was designed for operation at 3.3 V to reduce power consumption. CMOS power dissipation increases with the square of the potential difference between power (VDD) and ground (VSS). At lower voltage levels, however, the threshold voltage at which a logic signal switches between zero and one changes. Redesign of the gate's physical width-to-length ratio is necessary to maintain logic speed at lower voltages; that slightly increases total power dissipation. In sum, reducing the rail-to-rail voltage difference by 1.7 volts reduces overall comparative power dissipation to about 70%.

A lower rail-to-rail (VDD to VSS) voltage difference also increases the chip's sensitivity to electrical noise, as there is a smaller noise margin around the threshold voltage.
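As a rough illustration of the square-law relationship described above, the following Python sketch computes the ideal dynamic-power ratio for a drop in supply voltage. The 5.0 V baseline is an assumption inferred from the stated 1.7-volt reduction, the function name is hypothetical, and switched capacitance and clock frequency are assumed to stay constant. The sketch models only the quadratic term; the net figure quoted in the text also reflects the gate redesign, which is not modeled here.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Ideal CMOS dynamic-power ratio after a supply-voltage change.

    Assumes power is proportional to V^2, with switched capacitance and
    clock frequency held constant (a simplification, not a chip model).
    """
    if v_new <= 0 or v_old <= 0:
        raise ValueError("supply voltages must be positive")
    return (v_new / v_old) ** 2


# Assumption: a 5.0 V baseline, inferred from the 1.7 V reduction to 3.3 V.
ratio = dynamic_power_ratio(3.3, 5.0)
print(f"ideal dynamic-power ratio: {ratio:.2f}")
```

Any real comparison would also have to account for the transistor resizing needed to keep logic speed at the lower voltage.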
MTI designed the R4300i interface using low-voltage CMOS (LVCMOS) characterization to ensure noise immunity and reliable signal operation.

Dynamic logic design

The R4300i uses dynamic rather than static logic design to reduce transistor count and power dissipation. The R4300i's power management functions make the dynamic logic design approach suitable for use in consumer applications.

Cache bank partitioning

Most instruction and data cache accesses exhibit spatial and temporal locality of reference. In the R4300i, the instruction and data caches are each split into four banks; only one of the four banks in the instruction or data caches is powered up at any one time. This saves power on every cache access cycle. There is no performance degradation on cache bank misses; enabling of cache banks is entirely transparent to system operation.

Write-back cache

The R4300i uses a write-back policy for write operations. In a write-back policy, data in the cache is written to the main memory only when the cache line is replaced. Cache lines can be replaced whenever new data from a different address is to be loaded into the cache or if the cache line is invalidated. A write-back cache policy reduces store activity on the system bus, which improves system performance and simplifies memory subsystem design.

Cache prefetching

As instruction reads are usually sequential, the R4300i fetches two consecutive 32-bit words (instructions) every time it reads the instruction cache. Dual instruction access from the cache reduces the frequency of cache enabling for instruction access, which reduces power dissipation.

TLB and Micro Instruction-TLB

The memory management unit (MMU) translates virtual addresses to physical addresses by looking up address correspondences in a "page table". The processor maintains a complete page table in main memory, but accesses to main memory are slow.
The translation lookaside buffer (TLB) keeps copies of page table entries on-chip, which accelerates virtual-to-physical address translation. The TLB structure is large and consumes power when enabled. The R4300i therefore includes a "micro TLB" on chip, which contains page table entries for the two most recently referenced instruction pages.

Power Management Features

In addition to power reduction features, the R4300i has built-in power management features. Power management is used when peak performance is not required, and it allows the processor to operate in different modes. These modes require less power and therefore reduce average power consumption over a period of time. The power management modes, discussed below, are:

   * Standard operating mode
   * Sleep mode
   * Power-down mode

Standard operating mode

In this mode the processor operates at a maximum of 100 MHz pipeline speed and 50 MHz external interface speed. Power dissipation at maximum frequency in this mode is estimated at 1.8 watts.

Sleep mode

This feature allows the processor to change dynamically to one quarter of the normal speed. For example, if the pipeline operates normally at 100 MHz, it would operate at 25 MHz in sleep mode. Typically, chipset logic triggers this mode when it detects no user activity over some predetermined amount of time (such as between keystrokes or mouse movements). In this reduced-power mode, power dissipation falls to 0.4 watts, about one quarter of the normal figure.

Power-down mode

In the R4300i, all variable registers are both readable and writable. On power-down, the state of the processor can therefore be written to non-volatile RAM. On power-up, the registers can be restored to the same state. This "instant-on" capability allows the processor to be activated in milliseconds, instead of the many seconds normally required to boot an operating system. This reduces power consumption and gives consumers the convenience of an instant-on device.
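The effect of the modes described in this section on average power can be sketched in Python. The clock divisor and wattage figures are the ones quoted above; the class, constant, and function names are hypothetical, and the simple time-weighted average (which ignores any cost of switching modes) is an assumption. Power-down mode is left out because the text gives no dissipation figure for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerMode:
    name: str
    clock_divisor: int  # pipeline clock = full-speed clock / divisor
    watts: float        # estimated dissipation quoted in the text


STANDARD = PowerMode("standard", clock_divisor=1, watts=1.8)
SLEEP = PowerMode("sleep", clock_divisor=4, watts=0.4)


def pipeline_mhz(mode: PowerMode, full_speed_mhz: float = 100.0) -> float:
    """Pipeline clock in a given mode, from the full-speed pipeline clock."""
    return full_speed_mhz / mode.clock_divisor


def average_watts(sleep_fraction: float) -> float:
    """Time-weighted average power when a fraction of time is spent asleep.

    Assumption: mode transitions are free and instantaneous.
    """
    if not 0.0 <= sleep_fraction <= 1.0:
        raise ValueError("sleep_fraction must be between 0 and 1")
    return (sleep_fraction * SLEEP.watts
            + (1.0 - sleep_fraction) * STANDARD.watts)


print(pipeline_mhz(SLEEP))            # sleep-mode pipeline clock in MHz
print(round(average_watts(0.5), 2))   # half the time asleep
```

A system that idles between keystrokes most of the time would sit much closer to the sleep-mode figure than to the 1.8-watt peak.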
Cost Reduction Features

The R4300i is designed for low cost. The main areas contributing to microprocessor cost are packaging, test and assembly, and die costs.

Packaging cost reduction

For low-cost applications, the R4300i will be available in a 120-pin plastic quad flat package (PQFP). High-performance microprocessors normally require expensive ceramic or metal packages with enhanced thermal dissipation because of their higher power requirements. The reduction in the system bus width from 64 bits to 32 bits and the elimination of some control signals resulted in a low pin-count package offering. Lower power dissipation facilitated the use of plastic packages.

Test & assembly cost reduction

The R4300i implements column redundancy in both instruction and data caches. Column redundancy reduces the chip's sensitivity to defects in the caches. The test process first tests the integrity of each bit column in the cache; polysilicon fuses in the chip are then blown to swap redundant bit columns for defective bit columns. Column redundancy increases the die yield at the test stage, which in turn decreases testing costs for each good die.

Die cost reduction

The die area was reduced by the following techniques:

   * High-density 0.35 micron design rules
   * Use of 4-transistor RAM cells in the caches
   * Unified CPU/FPU data path
   * Reduced configurability
   * Optimized cache and TLB size

The last three cost reduction strategies fall in the category of architectural optimizations, discussed below.

Architectural Optimizations

The R4300i fully implements the current MIPS-III ISA standard. As in all the MIPS R3000 and R4000 processors, an on-chip CP0 coprocessor contains an MMU for virtual address translation and exception processing control.

The system address/data bus interface is different from the R4000 series processors. The R4300i has a 32-bit multiplexed address/data bus and a 5-bit command bus.
A flush buffer, which was added to the R4400 for increased graphics performance, is retained in the R4300i. The R4300i also includes on-chip clock frequency division circuitry to support internal 100-MHz operation from an external 50-MHz clock. The R4300i has the option of operating internally at 1, 1.5, 2 or 3 times the frequency of the external clock. In addition, the R4300i eliminates the RClock that existed in the R4000. The output clocks SyncOut and TClock can be turned off for power savings. This allows high microprocessor performance and also simplifies system design.

The differences from the R4000, discussed below, are:

   * Combined integer/floating-point data path
   * Optimized 5-stage pipeline
   * Optimized cache and TLB size
   * Reduced physical address space
   * Additional instruction trace support
   * Simplified processor initialization

Unified integer/floating-point data path

The R4300i's integer unit shares its data path with the FPU. CPU and floating-point instructions are both executed in the same 5-stage pipeline. This represents a considerable saving in die area and corresponding power dissipation.

Optimized pipeline

Pipelining allows multiple instructions to overlap during execution for greater throughput. Every instruction requires five basic operations: instruction fetch, instruction decode, instruction execution, data access, and writing of the results. These operations can be split up over a pipeline so that several instructions can be processed concurrently: while one instruction is being decoded, another can be fetched, and so on.

The R4300i uses a five-stage pipeline instead of the eight-stage pipeline found in the R4000 series processors. The eight-stage pipeline allows the R4000 series processors to reach higher speeds. However, the shorter pipeline achieves greater efficiency at a given speed than its longer counterpart.
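The overlap described in this section can be illustrated with a small Python model of an ideal pipeline. It assumes one clock cycle per stage and no stalls, branch penalties, or cache misses, so it is a textbook idealization rather than a model of the R4300i itself; the function names are hypothetical.

```python
def pipelined_cycles(n_instructions: int, stages: int = 5) -> int:
    """Cycles to complete n instructions on an ideal, stall-free pipeline.

    The first instruction takes `stages` cycles to fill the pipeline;
    each later instruction completes one cycle after the previous one.
    """
    if n_instructions < 0 or stages < 1:
        raise ValueError("need n_instructions >= 0 and stages >= 1")
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


def unpipelined_cycles(n_instructions: int, stages: int = 5) -> int:
    """Cycles when each instruction runs all stages before the next starts."""
    return stages * n_instructions


def ideal_speedup(n_instructions: int, stages: int = 5) -> float:
    """Speedup of the ideal pipeline over the unpipelined case."""
    return (unpipelined_cycles(n_instructions, stages)
            / pipelined_cycles(n_instructions, stages))


for n in (1, 10, 100, 10_000):
    print(n, pipelined_cycles(n), f"{ideal_speedup(n):.2f}")
```

As the instruction count grows, the ideal speedup approaches the number of stages. In practice stalls and taken branches reduce it, and those penalties tend to cost more cycles in a deeper pipeline.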
Loads, stores, jumps and branches are resolved in fewer cycles, and exception processing is simplified. Less control logic means reduced die area.

Optimized cache and TLB size

The R4300i uses separate instruction and data caches. Instruction cache size impacts performance to a greater extent owing to locality of instruction code. The combination of a 16-Kbyte instruction cache and an 8-Kbyte data cache gives optimum performance for a fixed total cache size. The on-chip TLB is also reduced from the 48 entry-pairs in the R4000 to 32 entry-pairs. These sizes have been selected after extensive simulation to give the R4300i the best trade-off between high performance and small die area.

TLBs are critical in implementing virtual memory systems. In consumer applications such as set-top boxes, the existence of a TLB allows for implementation of security as well as fast context switching times when running multiple processes. In addition, virtual page sizes of 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, and 16 MB are supported, just as in the R4000 series processors.

Reduced physical address space

The physical address space has been reduced from 36 bits to 32 bits. This still supports a physical address range of 4 Gbytes, more than enough for consumer applications.

Additional instruction trace support

The R4300i includes an additional debugging mode called instruction trace support. This mode lets the user find out the physical address to which the CPU has branched or jumped whenever a branch, jump, or exception is taken.

Simplified processor initialization

Processor initialization has been simplified, reflecting the lower degree of configurability. The R4300i has two hardware pins for configuration during the reset initialization sequence, and a configuration register in CP0 is used to set other options such as data rate or endianness.

Chapter 4.
Benefit Summary

The main benefits of the R4300i, discussed below, are:

   * Price/Performance
   * Low Power
   * Low Cost
   * Compatibility

Price/Performance

The R4300i dramatically improves price/performance for both system designers and end-users. The R4300i achieves price/performance over ten times better than existing microprocessors. The R4300i will be the first commercially available processor to come in below the $1/SPECint mark in 1995.

Low Power

The R4300i was specifically designed for high-performance consumer applications by including reduced-power and power management features.

   * Reduced-power features are those that perform traditional tasks in a way that reduces power consumption.
   * Power management features are architectural enhancements that further reduce internal and system-wide power consumption through switching modes.

The combination of reduced-power and power-management features allows the R4300i to fit in an inexpensive plastic package.

Low Cost

In practice, good price/performance usually means increased performance at a given price point. The R4300i, in contrast, brings existing high-performance levels to an unprecedented low price point. System manufacturers can decrease component costs, thereby increasing margins, or offer their products at a lower price, thereby stimulating demand.

Price-sensitive consumer applications such as games and interactive television benefit particularly from a low-cost, high-speed microprocessor. Factory automation and robotics are also likely applications for the R4300i.

Compatibility

Software compatibility is a fundamental requirement to preserve investments in software over time. All MIPS processors maintain software compatibility. A program compiled and linked to run on the R3000 processor will run on the R4300i.

Appendix A. Glossary

Cache. An on-chip temporary storage area containing a copy of main memory fragments. Cache access is much faster than main memory access.
CISC (Complex Instruction Set Computing). A design approach that attempts to achieve performance gains with complex instruction and data types and hardware-controlled memory management.

CPU (Central Processing Unit). The part of a microprocessor where the majority of the instructions are executed.

Die. The silicon chip after it has been cut from a wafer and before it has been packaged.

Flush Buffer (also called a write buffer). A temporary storage location for data that is being written from the pipeline or cache to main memory. The flush buffer allows the processor to continue executing instructions while data is being written to main memory.

FPU (Floating-Point Unit). Dedicated logic to accelerate calculations using floating-point numbers.

IU (Integer Unit). The part of a CPU that performs calculations using integer arithmetic.

LVCMOS (Low-voltage CMOS). A JEDEC standard for low-voltage logic design.

MMU (Memory Management Unit). The part of a microprocessor that implements virtual-to-physical address translation and the memory system hierarchy, including cache memory.

MTI (MIPS Technologies, Inc.). The developer of the MIPS RISC architecture, the leading RISC architecture worldwide.

Page Table. An area of main memory containing sets of virtual addresses with their corresponding physical addresses and protection data.

RISC (Reduced Instruction Set Computing). A design philosophy that avoids implementing complex functions in silicon but realizes large performance increases through executing simpler, standardized instructions at faster, more efficient rates.

Pipeline. A mechanism to allow multiple instructions to overlap during execution for greater throughput. A five-stage pipeline offers peak performance five times that of a non-pipelined processor.

PQFP (Plastic Quad Flat Pack). A plastic package with pins on the four edges, cheaper than a CPGA.

TLB (Translation Lookaside Buffer).
An on-chip "page table" cache containing copies of the page tables used by the MMU for virtual-to-physical address translation.

64-bit Processor. A processor in which all address and data paths are 64 bits wide. Leading-edge applications require 64-bit processors today. 32-bit capability in a 64-bit processor is important to manage the smooth transition from 32 to 64 bits. The MIPS R4000, R4400 and R4300i MPUs can run in either 32-bit or 64-bit mode.

Appendix B. Bibliography

The following related documents are available from MIPS Technologies, Inc.

   * MIPS R4400 Microprocessor, Technology Backgrounder.
   * MIPS Technologies, Inc., Corporate Backgrounder.
   * NEC Corporation, Corporate Backgrounder.
   * R4000 / R4400 User's Manual [Prentice Hall].
   * MIPS RISC Architecture, Kane & Hall [Prentice Hall].

Contact MIPS Technologies, Inc. for a more complete list of publications available on RISC technology and the MIPS architecture.

MIPS Technologies, Inc. Information Service:

   1-800-I GO MIPS or 1-800-446-6477 (inside the U.S.)
   415-688-4321 (outside the U.S.)
   World Wide Web: http://www.mips.com

MIPS, the MIPS Technologies logo, and R3000 are registered trademarks, and R2000, R4000, R4400, and R6000 are trademarks of MIPS Technologies, Inc. Windows NT is a trademark of Microsoft Corp.