August 5, 1997.
11/71 | Intel announces the 4004, the first microprocessor. It consists of 2300 transistors running at 108 kHz, giving a processing power of 0.06 MIPS. The manufacturing process is at 10 micron. |
---|---|
04/74 | Intel introduces the 8080, consisting of 6000 transistors running at 2 MHz, giving 0.64 MIPS. That increase in speed is allowed by a manufacturing process at 6 micron. The 8080 is at the core of the first personal computer, the Altair. |
06/78 | Introduction of the 8086, featuring 29,000 transistors. The first version runs at 4.77 MHz, giving 0.33 MIPS, but later revisions go up to 8 MHz (0.66 MIPS), then 10 MHz (0.75 MIPS). |
06/79 | Introduction of the 8088, similar to the 8086, except for an 8-bit external bus (instead of 16 bits for the 8086). This is the first popular CPU in the PC, then the PC/XT. |
02/82 | Intel introduces the 80286, turning the PC/XT into the PC/AT. Its 134,000 transistors at 1.5 micron run at 6 MHz (0.9 MIPS), then 10 MHz (1.5 MIPS), and finally 12 MHz (2.66 MIPS). The external bus goes back up to 16 bits. |
10/85 | Introduction of the 80386DX, running at 16 MHz (5 to 6 MIPS).
The 3rd generation chip includes 275,000 transistors at 1.5 micron. The bus width,
as well as the internal architecture, is now 32 bits. This is the first CPU that
allows programmers to create a series of virtual x86 machines, the first step towards
real multi-tasking.
In February 1987, it is followed by a 20 MHz version (6 to 7 MIPS), and by a 25 MHz version (8.5 MIPS) in April 1988. The last version, clocked at 33 MHz (11.4 MIPS), arrives in April 1989, on the same day Intel introduces the 486, the 4th generation x86 processor. |
1988 | Harris introduces the 80C286 running at 16 and 20 MHz. Its performance is comparable to that of an 80386 at the same frequency. |
06/88 | Introduction of the 80386SX, running at 16 MHz (2.5 MIPS). This chip is similar to the 386DX, except for a narrower 16-bit external bus. It is targeted at the low-end market. Higher frequency versions follow in January 1989, at 20 MHz (2.5 MIPS), then 25 MHz (2.7 MIPS). Finally, a 33 MHz (2.9 MIPS) revision is introduced in October 1992 to counter AMD chips. |
04/89 | Intel announces the 486DX, with 1.2 million transistors (1 micron). For the first time, the chip includes a math coprocessor
unit and an 8K internal L1 cache. It's also the first x86 chip to process the instructions in
several stages, in a pipeline. The performance at 25 MHz (20 MIPS) is roughly 50 times that of an
8088.
Faster versions run at 33 MHz (27 MIPS) in May 1990; then an improvement in the manufacturing process (down from 1 to 0.8 micron) allows for higher clock rates: 50 MHz (41 MIPS) in June 1991. |
10/90 | Intel introduces the 386SL, the first CPU specifically designed for portables. Running at 20 MHz (4.21 MIPS), it consists of 855,000 transistors. It is comparable to the 386SX, with a 32-bit internal architecture but a 16-bit external bus. It is highly integrated, including the cache as well as the bus and the memory controllers. Later, Intel introduces a 25 MHz (4.21 MIPS) chip in September 1991. |
03/91 | AMD introduces the Am386, challenging the 386DX from Intel. The chip runs at frequencies of up to 40 MHz, making it an appealing solution for the low-end market, although power users have already enjoyed the 486 for two years. The chip includes 200,000 transistors at 0.8 micron. Intel's 386 uses a 1 micron manufacturing process and stops at 33 MHz. |
04/91 | Introduction of the 486SX running at 16 MHz (13 MIPS). This is a copy of the 486DX chip, but without the math coprocessor. The bus width is 32 bits, as for the 486DX. It features 1,185,000 transistors at 1 micron. The manufacturing process is later improved to 0.8, then 0.6 micron. Like the 486DX, it uses an 8K internal L1 cache. Later versions run at 20 MHz (16.5 MIPS), then 25 MHz (20 MIPS) in September 1991, and 33 MHz in September 1992. |
03/92 | Intel announces the 486DX2, the first chip whose core frequency differs from its external frequency: 50 MHz in the CPU, 25 MHz on the bus. It consists of 1.2 million transistors at 0.8 micron. Its performance reaches 41 MIPS. It is quickly followed, in August 1992, by a version running at 66 MHz, with 33 MHz on the bus (54 MIPS). |
1992 | Cyrix, once a math coprocessor maker that was founded in 1988, releases its version of the 486. For the production facilities, Cyrix relies on a foundry that belongs to IBM. |
11/92 | Intel introduces the 486SL, a 486 targeted at portables. It first runs at 20 MHz (15.4 MIPS), then 25 MHz (19 MIPS), and finally 33 MHz (25 MIPS). The chip consists of 1.4 million transistors at 0.8 micron. |
03/93 | Intel introduces the Pentium 60 and 66 MHz, both on the same date
(100 and 112 MIPS resp.). This is the first superscalar x86 chip, meaning it can process
two instructions in parallel. The number of transistors now reaches 3.1 million, at 0.8 micron
(BiCMOS). The bus is now 64 bits for data, and 32 bits for addresses; the internal
architecture is 32 bits. External speed is at 60 or 66 MHz. The L1 cache is 8K for instructions,
8K for data. The Pentium is a significant step forward from the 486, providing twice the
performance for integer processing, and a huge fivefold improvement for floating point
calculations. It is also the first chip to require efficient cooling (the Pentium 66
has a mean power consumption of 13W, while peaks reach 16W). Although lower voltages later
alleviated the problem, heat concerns have never disappeared since that time.
Intel broke with the 286, 386,... naming scheme because it couldn't enforce trademarks on names that are simply numbers. The chip is therefore called Pentium, or P5 (its former code-name). Later, in March 1994, Intel introduces on the same day the Pentium 90 and 100 (resp. 149.8 and 166.3 MIPS), with 3.2 million transistors at 0.6 micron (BiCMOS). The power supply goes down from 5V to 3.3V. |
04/93 | One month after the release of the Pentium, AMD launches the Am486, using 1 million transistors at 0.7 micron on 3 layers. That process will be enhanced several times, first to 0.5 micron, then 0.35 micron, allowing for higher clock rates. The Am486 will go up to 120 MHz (with a 3 times multiplier), whereas Intel stopped the evolution of the 486DX4 at 100 MHz; this again makes the AMD chip an interesting solution for the low-end market. The L1 cache is 8K. |
03/94 | The 486DX4 uses the same technique as the DX2, but this time, the bus frequency is multiplied by 3 inside the microprocessor. The first version runs at 75 MHz (core frequency), giving 53 MIPS. It is followed by a 100 MHz version. The chip uses 1.6 million transistors at 0.6 micron. To limit the memory bottleneck at those frequencies, the L1 cache is doubled, to 16KB. |
10/94 | The Pentium 75 (126.5 MIPS) follows the previous Pentiums. It features 3.2 million transistors, and is manufactured with a 0.6 micron process. This is a low-cost and low-power alternative to higher-end Pentiums. The multiplier is at 1.5, meaning the external speed is at 50 MHz. |
03/95 | The Pentium 120 now reaches 203 MIPS. Later, the manufacturing process goes from 0.6 to 0.35 micron. This allows for higher frequencies and, in June 1995, Intel announces the Pentium 133 MHz (218.9 MIPS), including 3.3 million transistors. Both use a multiplier of 2 (resp. 60 and 66 MHz external frequency). In January 1996, the multiplier goes even higher, to 2.5, for two new versions running at 150 and 166 MHz core frequency. Five months later, the multiplier climbs to 3, giving a core frequency of 200 MHz. |
06/95 | Cyrix announces its 5x86, a 4th generation chip, first running at 100 MHz. A later version, released in October 1995, runs at 120 MHz and has performance comparable to that of a Pentium 90. The L1 cache is a unified 16K cache. |
10/95 | Cyrix introduces the 6x86, first running at 100 MHz.
To better define its performance, Cyrix, with AMD, IBM and SGS-Thomson, defines the so-called
P-rating, which expresses a chip's speed as the frequency of the Pentium that delivers the same
performance. The 100 MHz 6x86 is then called a P120+. The L1 cache is a unified 16K cache. In February 1996, the 6x86 range is completed by versions running at 110 MHz (P133+), 120 MHz (P150+) and 133 MHz (P166+). In June 1996 - the month when Intel releases its Pentium 200 - a P200+ (150 MHz, with a bus speed of 75 MHz) is added to that range. Later, that CPU range is enhanced by the 6x86L, a low-power version of the chip. |
11/95 | Intel introduces the Pentium Pro processor, its 6th generation chip.
This is the first x86 CPU to use a superscalar architecture with a RISC core (explanation
below).
For the first time, the chip includes an internal L2 cache, which is clocked at the same
frequency as the core of the CPU. It is available with an internal frequency of 150 MHz (256K L2 cache),
166 MHz (512K L2 cache), 180 MHz (256K L2 cache) or 200 MHz (256 or 512K L2 cache). The
chip is a dual cavity PGA, which uses 5.5 million transistors in the CPU cavity, and
another 15.5 million in the 256K L2 cache cavity or 31 million for 512K L2 cache versions.
The L1 cache is still 8K for instructions, and 8K for data.
The CPU itself is manufactured at 0.6 micron at 150 MHz, or 0.35 micron at higher frequencies;
the cache is manufactured at 0.6 micron for the 256K versions, and at 0.35 micron for the
512K versions. The bus is 64 bits, plus 64 bits to the L2 cache. The external bus runs at
60 or 66 MHz (multiplier 2.5 or 3).
The chip is mainly optimized for 32-bit applications, which Intel, at the time the PPro was designed, expected to become ubiquitous. For 16-bit applications, the PPro performs slightly slower than Pentium CPUs running at the same core frequency. |
12/95 | The Am5x86 is a redesign of the Am486 that keeps the same architecture. It is in fact a 4th generation design that delivers Pentium 75-like performance. The core frequency is 133 MHz, with a 4 times multiplier. The L1 cache is now a 16K unified cache. However, it comes much too late to attract the market's attention. Besides, the chip is slow to reach mass production. |
05/96 | AMD releases the K5, a full 5th generation design: a 4-issue superscalar chip with a RISC core,
featuring the same kind of enhancements that were introduced by Intel in its Pentium
Pro: out-of-order execution, speculative execution with branch prediction, register
renaming,... It features a 16K L1 cache for instructions, and 8K for data. The chip
uses 4.3 million transistors at 0.35 micron. The K5 includes versions with ratings of 75
and 90 MHz, then 100 (Q3'96), 120 and 133 (Q4'96), and finally 166 MHz (Q1'97). However, it's already late, and competitors like NexGen or Cyrix have had working alternatives for a long time. The chip is not really a success. |
01/97 | Intel adds 57 new instructions to the x86 instruction set, creating the Pentium CPU with MMX Technology. The purpose of that extension is to speed up calculations typically found in multimedia applications. The number of transistors climbs to 4.5 million (0.35 micron CMOS). The internal L1 cache is doubled from the Pentium, from 16K to 32K. The first versions of the processor include a 166 and a 200 MHz chip. Unfortunately, Intel could not introduce the chip - which is mainly targeted at the home market - in time for Christmas last year. A later revision, available in June 1997, runs at 233 MHz. |
02/97 | AMD introduces a 6th generation chip, the K6, which features MMX extensions and builds on the RISC experience AMD gained with the K5. The chip includes a whopping 8.8 million transistors at 0.35 micron. The L1 cache is 32K for instructions, and 32K for data. Its performance is comparable to that of the Pentium-II - which is not yet available - and the chip therefore looks set for success, helped by the pricing policy of AMD, which claimed it would sell it for 25% less than its Pentium-II counterpart. |
05/97 | Intel merges the Pentium with MMX Technology and the Pentium Pro to create the Pentium-II, a better Pentium Pro, with the MMX instruction set, optimized for both 16- and 32-bit applications. Three versions are announced at the same time, at 233, 266 and 300 MHz, though mass production of the 300 MHz version begins only during Q3'97. The chip features 7.5 million transistors at 0.35 micron, with a 512K L2 cache. The L1 cache is doubled from the Pentium Pro, from 16K to 32K (16K for instructions, 16K for data). The bus width is 64 bits (with ECC), plus another 64 bits to the L2 cache (optionally with ECC). For the first time, the CPU is packaged in the SEC, the Single Edge Contact cartridge, which Intel plans to use for future CPUs as well. |
05/97 | The same month, Cyrix introduces the 6x86MX, adding MMX extensions to the 6x86. It also quadruples its cache, with a 64K unified L1 cache. The manufacturing process is at 0.35 micron, with core frequencies at 150 (PR-166), 166 (PR-200) and 188 MHz (PR-233), with bus speeds at 60, 66 and 75 MHz resp. |
98 | In the future, higher frequency revisions of the Pentium-II may become available. Later, Intel will introduce the 7th generation x86 chip, which it is currently developing with HP. |
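The core and bus frequencies quoted in the timeline are tied together by one relation: core frequency = bus frequency x multiplier. The Python sketch below recomputes a few entries. The bus and multiplier values are taken from, or implied by, the timeline; the exact bus clocks behind the nominal "33" and "66" MHz figures and the truncation of the product to a whole number of MHz are assumptions made for this illustration.

```python
from fractions import Fraction

# Exact bus clocks in MHz for the nominal values quoted in the timeline.
# Assumption: "33" and "66" stand for 33.33 and 66.67 MHz.
BUS_MHZ = {
    25: Fraction(25),
    33: Fraction(100, 3),
    50: Fraction(50),
    60: Fraction(60),
    66: Fraction(200, 3),
    75: Fraction(75),
}


def core_mhz(nominal_bus, multiplier):
    """Return the core frequency, truncated to whole MHz (assumed naming rule)."""
    return int(BUS_MHZ[nominal_bus] * Fraction(multiplier))


# (chip, nominal bus in MHz, multiplier), illustrative selection.
EXAMPLES = [
    ("486DX2-50", 25, 2),
    ("486DX2-66", 33, 2),
    ("486DX4-75", 25, 3),
    ("Pentium 75", 50, 1.5),
    ("Pentium 133", 66, 2),
    ("Pentium 166", 66, 2.5),
    ("Pentium 200", 66, 3),
    ("Am5x86-133", 33, 4),
    ("6x86 P200+", 75, 2),
]

for chip, bus, mult in EXAMPLES:
    print(f"{chip:12s} {bus} MHz x {mult} = {core_mhz(bus, mult)} MHz")
```

Exact fractions are used rather than floating point so that a product such as 33.33 x 3 comes out as exactly 100 before truncation.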
Current situation
Intel | AMD | Cyrix |
---|---|---|
Pentium, Pentium MMX* | | 6x86 |
| AMD K5 | 6x86MX* |
Pentium Pro, Pentium II* | AMD K6* | |

Legend: unmarked CPU = standard set of instructions; CPU* = MMX enhanced CPU.
A good old veteran - the Pentium CPU
The Pentium is the first Intel chip to be superscalar, with two 5-stage pipelines; this means the CPU is able to execute two integer instructions or one floating point instruction simultaneously. The data bus is 64 bits wide. The processor contains a 2-way set associative 16K (8K for data, 8K for instructions) L1 cache. To reduce the pipeline stalls caused by branches, the Pentium design includes dynamic branch prediction. Instructions are loaded into two prefetch buffers (one buffer using the branch prediction algorithm and another buffer for instructions loaded in linear order). Introduced in March 1993 and due to retire at the end of this year, the Pentium CPU, once at the high end of the market, now looks like a good old veteran. What is more, with Intel's latest price cuts, which set the MMX CPUs at the same price as the non-MMX Pentium chips running at the same frequency, the chip seems even more old-fashioned. For the record, the chip exists in frequencies from 60 to 200 MHz (the 60 and 66 MHz versions run at 5V, while the 75, 90, 100, 120, 133, 150, 166 and 200 MHz versions run at a cooler 3.3V). |
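To make "2-way set associative" more concrete, here is a minimal Python sketch of such a cache, sized like one 8K half of the Pentium L1 cache. The 32-byte line size and the least-recently-used replacement policy are assumptions made for this illustration, and the class and variable names are hypothetical. Each address maps to exactly one set, and each set can hold two lines at a time.

```python
class SetAssociativeCache:
    """Toy model of an N-way set associative cache with LRU replacement."""

    def __init__(self, size_bytes=8 * 1024, ways=2, line_bytes=32):
        # Assumption: 32-byte lines; 8K / (2 ways * 32 bytes) gives 128 sets.
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)
        # Each set holds up to `ways` tags, least recently used first.
        self.sets = [[] for _ in range(self.num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.remove(tag)
            entries.append(tag)   # mark as most recently used
            return True
        if len(entries) == self.ways:
            entries.pop(0)        # evict the least recently used line
        entries.append(tag)
        return False


cache = SetAssociativeCache()
stride = cache.num_sets * cache.line_bytes   # addresses one stride apart share a set
a, b, c = 0, stride, 2 * stride
results = [cache.access(x) for x in (a, b, a, b, c, a)]
print(cache.num_sets, results)
```

Two addresses that compete for the same set (`a` and `b`) can coexist, which a direct-mapped cache would not allow; a third one (`c`) forces an eviction, so the final access to `a` misses again.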
The Cyrix 6x86 (formerly code-named M1)
The 6x86 is essentially Cyrix's counterpart of the Pentium. It is a two-way superscalar unit, using register renaming, out-of-order completion, data dependency removal, branch prediction and speculative execution (see the glossary below). Featuring a 16K unified L1 cache, it runs at frequencies from 100 MHz, introduced in October 95, up to 150 MHz (June 96). It is pin-compatible with the Pentium MMX processor, using a socket 7 architecture. Beware, though, that the clock multiplier is fixed at 2, so you need a motherboard that supports a 75 MHz bus to use a 150 MHz 6x86 (P200+). That family has recently been enhanced by the 6x86L, a low-power version of the CPU. Thanks to the use of advanced techniques, the 6x86 performs better than a Pentium running at the same frequency. In order to define its performance precisely, Cyrix, with AMD, IBM and SGS-Thomson, created the P-rating, a rating based on real-world applications that expresses the speed as the frequency of the Pentium that delivers the same performance. The table of the P-ratings as a function of the physical core frequency is as follows:

Core frequency | P-rating |
---|---|
100 MHz | P120+ |
110 MHz | P133+ |
120 MHz | P150+ |
133 MHz | P166+ |
150 MHz | P200+ |
The Pentium with MMX extensions
The "Pentium CPU with MMX extensions" - the official name for Intel's much touted new version of its 5th generation chip - was the first CPU to include MMX extensions, the largest evolution in the x86 family since the 386, according to Intel. Besides the MMX extensions (refer to the side bar for a more detailed explanation), the Pentium with MMX extensions also features a larger 4-way set associative L1 cache (32K - 16K for instructions, 16K for data), which improves its speed by about 10% when compared to a standard Pentium chip at the same frequency. With MMX-enabled applications, the speed improvement is typically around 60%.
From the point of view of the microarchitecture of the CPU, the Pentium MMX also adds an MMX processing
unit next to the two integer units and the floating point unit already found in the standard Pentium.
It is also superscalar of degree 2. Other changes include larger write buffers for better performance
of the memory, and a better branch prediction algorithm, with 4 prefetch buffers. The Pentium
MMX is a split voltage design (2.8V for the core, 3.3V for the I/Os) that fits in a socket 7 architecture.
|
What is MMX exactly?
The MMX name is usually read as MultiMedia eXtensions (Matrix Math eXtensions is another common reading); Intel itself treats it as a trademark rather than an acronym. These extensions consist of a set of 57 new instructions first introduced by Intel in January this year to enhance the performance of multimedia and graphics-intensive applications. One of the core technologies used in MMX is SIMD (single instruction, multiple data). The idea behind that name is to apply the same instruction in parallel to several data items, within the same execution unit. To provide large enough registers, the MMX instructions use the 80-bit floating point registers, of which they use 64 bits. To fill those 64 bits, MMX also provides four new 64-bit data types (corresponding to eight 8-bit values, four 16-bit values, two 32-bit values or one 64-bit value). Although MMX uses the floating point registers, its instructions are integer instructions, which means the CPU can execute as many MMX instructions in parallel as it can integer instructions (2 for the Pentium, 3 for the Pentium-II). For example, once four 16-bit values have been loaded into one of those registers, the CPU can multiply them by the four values held in another register and add the products together (the basic step of a convolution), all in a single instruction. From now on, all new designs from Intel will incorporate the MMX features. Other chips providing MMX extensions include the K6 and the 6x86MX. The main applications that take advantage of the MMX extensions so far are games, and some video-conferencing or image retouching packages. A list of such applications is available on Intel's site (link available in the side bar about related stories). |
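The multiply-and-accumulate example above can be emulated in a few lines of Python. This is only a sketch of the SIMD idea, not of how the hardware works: it models a 64-bit register as 8 bytes holding four signed 16-bit values, and the function is named after the MMX packed multiply-add instruction PMADDWD, which forms four products and adds them in adjacent pairs into two 32-bit results. The sample values and helper names are hypothetical.

```python
import struct


def pack_words(values):
    """Pack four signed 16-bit integers into one 64-bit register image."""
    return struct.pack("<4h", *values)


def pmaddwd(reg_a, reg_b):
    """Emulate a packed multiply-add on two 64-bit registers.

    Four 16-bit products are formed, and adjacent pairs are summed into
    two 32-bit results (wrapping to 32 bits, as a hardware register would).
    """
    a = struct.unpack("<4h", reg_a)
    b = struct.unpack("<4h", reg_b)
    products = [x * y for x, y in zip(a, b)]
    sums = [products[0] + products[1], products[2] + products[3]]
    wrapped = [((s + 2**31) % 2**32) - 2**31 for s in sums]
    return struct.pack("<2i", *wrapped)


samples = pack_words([100, -200, 300, 400])   # e.g. four audio samples
coeffs = pack_words([1, 2, 3, 4])             # e.g. four filter coefficients
low, high = struct.unpack("<2i", pmaddwd(samples, coeffs))
print(low, high, low + high)
```

One call replaces four separate multiplications and two additions; adding the two 32-bit halves gives the dot product of the four samples with the four coefficients.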
The Cyrix 6x86MX (formerly code-named M2)
Launched on May 30, 1997, the 6x86MX, like the K6, relies on the ubiquitous socket 7 architecture. It adds the 57 MMX instructions to the former generation, the 6x86 chip, thereby qualifying as an MMX-enhanced CPU. Compared to the 6x86, the 6x86MX is a split voltage design that also adds a more flexible multiplier, with settings at 2, 2.5, 3 and 3.5 (the only value used up to now is 2.5). It features a huge 64K unified L1 cache. Unlike other recent processors, the 6x86MX does not split the x86 instructions into small RISC-like instructions prior to processing. However, it does use advanced features like a superscalar architecture, with register renaming, data dependency removal, multi-branch prediction, speculative execution, superpipelining and out-of-order completion, delivering performance comparable (the so-called PR-rating) to that of Pentium-IIs running at 166, 200 or 233 MHz. A PR-266 is expected in Q4'97. Those chips rely on a 0.35 micron process, which Cyrix expects to replace with a 0.25 micron process at the beginning of next year, allowing for a PR-300 version in the same timeframe. However, with only 35,000 units sold in Q2'97, the chip has had only limited availability up to now. A PR-233 is expected to be priced around 17500 BEF (±13000 BEF for a PR-200) once it becomes readily available. The table of the PR-ratings is as follows:

Core frequency | Bus speed | PR-rating |
---|---|---|
150 MHz | 60 MHz | PR-166 |
166 MHz | 66 MHz | PR-200 |
188 MHz | 75 MHz | PR-233 |
Glossary
|
The Pentium Pro
Originally defined as a chip that should surpass the Pentium while using the same manufacturing process, the Pentium Pro is the first chip to implement a series of innovative features, like a 3-way superscalar RISC-like core architecture with dynamic execution. The purpose of dynamic execution is to make the third execution pipeline, added to the two already present in the Pentium, actually pay off by preventing pipeline stalls. Each of the three execution pipelines consists of 12 decoupled stages. At the microarchitectural level, a decoding unit translates up to 3 x86 instructions at a time into RISC-like instructions, the micro-operations, to be processed by an out-of-order core. The core of the CPU consists of 6 processing units (2 integer units, 1 floating point unit, 1 jump unit and 2 address generation units), which are controlled by a scheduler able to simultaneously dispatch up to 5 micro-ops from the Re-Order Buffer (ROB) to the processing units, and to retire up to 3 micro-ops. The scheduler chooses micro-ops that are ready to be processed (i.e. whose operands are available), taking into account the data dependencies and the idle processing units. Another key element of dynamic execution is speculative execution: when a cache miss occurs, the processor goes on executing the following instructions without waiting for the cache miss to be serviced. The temporary results of speculatively executed instructions are also stored in the Re-Order Buffer, where they are forwarded to other instructions in the instruction pool via the bus interface unit. If the branch prediction algorithm turns out to have been wrong, the temporary results in the ROB are discarded, and execution resumes at the correct address. The accuracy of the branch prediction algorithm, i.e. the fraction of predictions that prove correct, is more than 90%, according to Intel.
Next, the retirement unit commits the results of (at most 3) completed instructions residing in the ROB to memory, in program order. Considering the dramatic impact of memory accesses on overall performance, the Pentium Pro designers added a 256 or 512K L2 cache inside the CPU package, running at full core speed. However, this design makes the Pentium Pro an expensive chip (compare the price tag of a 512K L2 cache version with that of a 256K chip and you will understand how costly a fast cache can be). The cache is serviced by an independent bus (DIB architecture). Next to the internal L2 cache, the Pentium Pro also provides a more conventional 8+8K L1 cache. Well known for its relatively poor performance in 16-bit applications (when the Pentium Pro was designed, Intel couldn't know most of us would still live in a 16-bit world today, and hence decided to produce a chip mainly targeted at 32-bit applications), the Pentium Pro is still - until the Deschutes becomes available in Q2'98 - the best choice for server applications and 32-bit operating systems (one of the reasons being its support for up to 4 processors in parallel without glue logic). The Pentium Pro fits in a socket 8 motherboard and runs at frequencies ranging from 150 to 200 MHz, with 256, 512 or 1024K L2 cache (not all combinations are available). |
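The interplay of out-of-order dispatch and in-order retirement described above can be illustrated with a small Python simulation. It is a deliberately simplified sketch: the dispatch width of 5 and the retirement width of 3 come from the text, but the micro-op names, registers and latencies are hypothetical, and the model ignores the number of processing units, register renaming and branch prediction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MicroOp:
    name: str
    dest: str
    sources: Tuple[str, ...]
    latency: int                    # cycles between dispatch and result
    done_at: Optional[int] = None   # cycle at which the result is available


def simulate(rob, dispatch_width=5, retire_width=3):
    """Run micro-ops held in program order in `rob`.

    Returns (dispatch order, retirement order, number of cycles).
    """
    available = set()               # registers whose value has been produced
    dispatched, retired = [], []
    head, cycle = 0, 0
    while head < len(rob):
        cycle += 1
        # 1. Results finishing this cycle become visible to waiting micro-ops.
        for op in rob:
            if op.done_at is not None and op.done_at <= cycle:
                available.add(op.dest)
        # 2. Retire completed micro-ops strictly in program order.
        for _ in range(retire_width):
            if head == len(rob):
                break
            op = rob[head]
            if op.done_at is None or op.done_at > cycle:
                break
            retired.append(op.name)
            head += 1
        # 3. Dispatch micro-ops whose operands are ready, whatever their position.
        slots = dispatch_width
        for op in rob:
            if slots == 0:
                break
            if op.done_at is None and all(s in available for s in op.sources):
                op.done_at = cycle + op.latency
                dispatched.append(op.name)
                slots -= 1
    return dispatched, retired, cycle


program = [
    MicroOp("load", "r1", (), latency=3),       # slow load, e.g. a cache miss
    MicroOp("add1", "r2", ("r1",), latency=1),  # needs the loaded value
    MicroOp("mul", "r3", (), latency=1),        # independent of the load
    MicroOp("add2", "r4", ("r3",), latency=1),  # needs the result of mul
]
dispatched, retired, cycles = simulate(program)
print("dispatch order:", dispatched)
print("retire order:  ", retired)
print("cycles:", cycles)
```

In this run, `mul` and `add2` execute while `add1` is still waiting for the load, yet all four micro-ops retire in their original program order.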
The Pentium II (formerly code-named Klamath)
The ideas at the core of the Pentium-II design are threefold:
|
Glossary - Dynamic Execution
|
The AMD K6
After the disaster of its 5th generation chip, AMD bought NexGen - which had also released its own 5th generation design - and hired the former head of the Pentium project at Intel for the K6 design team. They produced an efficient 4-issue superscalar design with a RISC core, which AMD claims it will always sell for 25% less than the corresponding alternatives from Intel. The chip, equipped with a huge 64K L1 cache (32K for instructions, 32K for data), is manufactured in a 5-layer 0.35 micron process. It is compatible with the now venerable socket 7 architecture, the most efficient design that does not imply new development costs for motherboard manufacturers, according to AMD. Unfortunately, that design still relies on an external L2 cache, typically accessed via the system bus at 66 MHz. Don't be fooled by that limitation, however: its performance makes it definitely a (good) competitor to the Pentium II. Tests conducted in house showed its performance to be almost twice that of a standard Pentium CPU running at the same core frequency, and only slightly below that of the Pentium-II. Also, although the name "K6" says nothing about MMX - Intel sued AMD for using the MMX moniker, but the court finally stated that AMD should be allowed to use the "MMX" name, provided it is identified as a trademark of Intel - the K6 does feature MMX extensions. Like the K5 and the Pentium Pro, the K6 first splits the x86 instructions into a series of small RISC-like instructions - which AMD calls RISC86, the equivalent of the "micro-operations" in the Pentium Pro - prior to processing. This is done by a series of dedicated decoders, with the help of predecoding information (which includes, among other things, the length in bytes of the x86 instructions, allowing several of them to be decoded in parallel).
These RISC86 instructions are then processed by a RISC core with techniques like multi-level branch prediction, speculative execution, register renaming, out-of-order execution and data forwarding. This core consists of 7 processing units (load, store, 2 integer units, 1 FPU, 1 multimedia unit for MMX instructions and 1 branch unit), which are controlled by a scheduler able to issue up to 6 and retire up to 4 RISC86 instructions per clock cycle, making it a 4-issue superscalar chip. The branch prediction unit uses an 8,192-entry branch history table to predict jumps, yielding correct predictions over 95% of the time. The target addresses would consume too much memory to be stored, and are instead calculated on the fly by special decoding units. The instructions located at the target locations are stored in 16 16-byte buffers, which, for frequently accessed instructions, provide at least one complete x86 instruction to the decoding logic without waiting for an external memory access (either another cache or main memory). As in the Pentium Pro, instructions are always retired in order. Like other manufacturers, AMD is building a new silicon wafer fab to support a 0.25 micron process in the near future, allowing for higher clock rates. Support for a 100 MHz external bus speed is also expected in the near future. |
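As an illustration of what a branch history table does, the Python sketch below models one with 8,192 entries, the size quoted above. Everything else is an assumption chosen for the example: the 2-bit saturating counters, the indexing by the low bits of the branch address and the names used are a common textbook scheme, not a description of the actual K6 predictor.

```python
class BranchHistoryTable:
    """Table of 2-bit saturating counters indexed by branch address (assumed scheme)."""

    def __init__(self, entries=8192):
        self.entries = entries
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        self.counters = [1] * entries

    def _index(self, address):
        return address % self.entries

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(outcomes, address=0x4000):
    """Fraction of correct predictions for one branch with the given outcomes."""
    bht = BranchHistoryTable()
    correct = 0
    for taken in outcomes:
        if bht.predict(address) == taken:
            correct += 1
        bht.update(address, taken)
    return correct / len(outcomes)


# A loop-closing branch: taken 9 times, then not taken, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
print(f"{accuracy(loop_branch):.0%}")
```

With such counters, a single loop exit does not flip the prediction for the next run of the loop, which is why loop-heavy code is predicted well even by this simple scheme.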
The war in the CPU arena is more intense than ever. Sure, Intel has already been challenged many times
in the past by CPUs from AMD, Cyrix, or others, but for the very first time, AMD now has both a
competitive CPU at the high end of the market and a large production capacity to meet the demand.
Besides, AMD announced it will always sell that chip for 25% less than Intel chips of
equivalent performance. In this world of slumping prices, it might be able to shoot real bullets
this time. Analysts foresee that AMD could take a market share of up to 30%, if Intel decides not to
sacrifice part of its profit to defend its market share.
Cyrix also seems to have an appealing new offering, with its 6x86MX.
Be sure you seize that opportunity...