Low End Mac's Online Tech Journal

Macintosh CPUs

Part 2, PowerPC

601 / 603 / 604 / G3 / G3+ / G4 / G4+

The biggest change in the Apple product line was the transition to the PowerPC (hereafter PPC) family of CPUs. Designed by a consortium of Apple, IBM, and Motorola, the PowerPC quickly became the most popular RISC processor on the market with the introduction of the Power Macintosh line in March 1994. (The great advantage of a RISC processor is that it handles a very simple set of instructions but processes them very quickly. This is the direction Intel is going with their forthcoming Merced CPU.)

The 601

The first PPC chip was the 601, initially available in speeds from 60-80 MHz. This 64-bit CPU was used in the Power Mac 6100, 7100, 7200, and 8100. The 601 contains 2.8 million transistors, has a 32 KB unified level-1 (L1) cache, and can process up to 3 instructions per cycle. By the time the 601 was discontinued, it was available at speeds up to 120 MHz and could run at 2x or 3x bus speed.

The 601 contains three execution units: one for integers, one for branch processing, and one for floating point.

The 601 was specifically designed to power the IBM RS/6000, so there were many instructions present for that computer that were not included in future PPC designs. This is the only PPC with a unified instruction and data cache; future models contained separate instruction and data caches.

The 603 and 603e

The second generation split the line into entry level and power user chips. The 603 had only 1.6 million transistors, drew about half as much power as the 601, had two smaller caches (8 KB for instructions, 8 KB for data), and could process up to two instructions per cycle. Low power consumption was a key design factor, since Apple wanted to design PowerBooks around the 603. One way the 603 reduced power consumption was the smaller L1 cache, which so reduced performance that Apple refused to use it in a PowerBook. At that time, 680x0 emulation was crucial. With too small an internal cache, the 603 handled emulated code very poorly. However, it was used in the Performa 5200 and 6200, a couple of Road Apples that gave the 603 an undeservedly bad reputation. The 603 was available at speeds from 75 MHz to 160 MHz. (Cycle for cycle, performance was comparable to the 601, but at lower cost.)

The 603 has five execution units, up from three in the 601. These were an integer unit, floating point unit, branch processing unit, load/store unit, and system register unit. The 64-bit 603 could run on either a 32-bit or 64-bit data bus.

With a small redesign, the PPC 603e was introduced. Essentially a 603 with an improved cache (16 KB each for instructions and data), it offered significantly improved performance without draining batteries too quickly. The Performa 5260 and 6300 were designed around this improved chip, as was every PPC-based PowerBook. Speeds ranged from 100 MHz to 300 MHz.

The 603e came in three variants. The 100-133 MHz models used 0.5 micron traces and ran at 1.5x to 4x bus speed. The 166-200 MHz 603e used 0.35 micron technology and ran at 2x to 6x motherboard speed. Finally, the 200-300 MHz version used a 0.29 micron design; it also ran from 2x to 6x bus speed. This version of the 603e requires a minimum 50 MHz bus to achieve 300 MHz.
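
The bus requirement in that last sentence is simply the target clock speed divided by the largest multiplier the chip supports. A minimal Python sketch of the arithmetic, using the figures from the paragraph above (the function name min_bus_mhz is made up for illustration):

```python
def min_bus_mhz(target_cpu_mhz, max_multiplier):
    """Slowest bus that still reaches target_cpu_mhz at the top multiplier."""
    return target_cpu_mhz / max_multiplier


# 0.29 micron 603e: tops out at 6x bus speed, 300 MHz target
print(min_bus_mhz(300, 6))   # 50.0

# 0.5 micron 603e: tops out at 4x bus speed, 133 MHz target
print(min_bus_mhz(133, 4))   # 33.25
```

The same division explains the 50 MHz figure quoted for other chips whenever a top speed and a top multiplier are both given.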

The 604 and 604e

The power user second generation chip was the 604. Containing 3.6 million transistors, drawing twice the power of the 601, and with a dual L1 cache (16 KB for instructions, 16 KB for data), this workhorse could deal with four instructions per cycle. Beyond that, it was designed to work in tandem with other 604s, making it possible to design computers with two or more CPUs. (Daystar was the first to do this. Seeing their success, Apple licensed the multiprocessor technology for incorporation in the Mac OS.) The 604 started at 120 MHz and was first used in the Power Mac 7600, 8500, and 9500.

The 604 had six independent execution units: two single-cycle integer units, a multi-cycle integer unit, a floating point unit, a branch prediction unit, and a load/store unit. This made it an incredible number cruncher and a top choice for Photoshop users.

The 604 was tweaked for even more performance, resulting in the 604e. As with the 603e, the newer CPU doubled the size of the instruction and data caches, significantly improving performance. At the same time, the 604e could process up to six instructions per cycle. The last revision of the 604e (Mach 5) was available at speeds up to 350 MHz.

There were two types of 604e. The first, based on 0.35 micron technology, ran at 180-233 MHz at 2x to 4x bus speed. Switching to a 0.25 micron design, the 604e could run at 250-350 MHz and 3x to 7x bus speed (this meant a minimum 50 MHz bus to drive the 350 MHz 604e).

The 750 (a.k.a. G3)

Arthur, legendary King of England, became the code-name for the third generation PPC, eventually named the 740 and 750. The successor of the 603e, these chips were optimized to run real software, not some theoretical ideal. Early benchmarks show the 750 outperforming the 604e, making it look like the older chip will be reserved for multiprocessor designs or floating-point intensive work.

Like the 604 and 604e, the G3 incorporates six separate execution units. On the G3, there are two integer units, a floating point unit, a branch unit, a load/store unit, and a system register unit.

The 740 and 750 can work in a dual-processor configuration. The key difference between the 740 and 750 is support of the level-2 (L2) cache. The 740 has no built-in L2 cache support and is designed to use a cache on the motherboard. The 750 has built-in cache support and can use either an inline or backside cache, both of which run faster than a motherboard-based cache. Of the two, the backside cache provides the most performance (and, of course, requires more expensive memory). The inline or backside cache must run between full CPU speed and one-third that speed. The L2 cache may be 256 KB, 512 KB, or 1 MB.
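
The limits in the paragraph above can be written down as a small check. This is only a sketch in Python of the stated constraints (cache speed between one-third and full CPU speed; size of 256 KB, 512 KB, or 1 MB). The function and constant names are hypothetical, and real 750 hardware only offers certain fixed ratios within that range, which this sketch does not model.

```python
ALLOWED_L2_SIZES_KB = {256, 512, 1024}  # 256 KB, 512 KB, or 1 MB


def l2_config_ok(cpu_mhz, l2_mhz, l2_size_kb):
    """Check an inline/backside L2 setup against the limits stated above."""
    speed_ok = cpu_mhz / 3 <= l2_mhz <= cpu_mhz
    size_ok = l2_size_kb in ALLOWED_L2_SIZES_KB
    return speed_ok and size_ok


print(l2_config_ok(266, 133, 512))   # True: half CPU speed, 512 KB
print(l2_config_ok(266, 66, 1024))   # False: slower than one-third CPU speed
print(l2_config_ok(266, 266, 2048))  # False: 2 MB is not a listed size
```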

Both the 740 and the 750 used a 0.29 micron design initially and were available in speeds of 200, 233, and 266 MHz - followed by speeds up to 466 MHz. They can run at 3x to 8x bus speed, and up to 10x on the latest revision, which means a 500 MHz G3 on a 50 MHz bus is possible. Considering the number of Mac OS computers with 40 MHz motherboards, upgrades to 320 MHz would be possible, but probably not practical.

Beyond that, the Beige G3 with a 66 MHz bus could support a 666 MHz G3, while the B&W G3 could handle a 1000 MHz G3 on its 100 MHz bus! Of course, the higher the CPU-to-bus ratio, the more important it is to have a large L2 cache.
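
All of these upgrade ceilings come from multiplying bus speed by the top multiplier. Here is a rough Python sketch that reproduces the numbers quoted above. The helper name max_cpu_mhz is invented for this example, and it assumes the "66 MHz" bus is nominally 200/3 MHz (about 66.67 MHz), which is why the result lands on 666 rather than 660.

```python
import math


def max_cpu_mhz(bus_mhz, multiplier):
    """Highest CPU clock a bus can drive, rounded down to a whole MHz."""
    return math.floor(bus_mhz * multiplier)


# (label, bus speed in MHz, multiplier) taken from the text above.
# Assumption: the "66 MHz" bus is nominally 200/3 MHz (about 66.67).
examples = [
    ("40 MHz motherboard, 8x", 40, 8),
    ("50 MHz bus, 10x", 50, 10),
    ("Beige G3, 10x", 200 / 3, 10),
    ("B&W G3, 10x", 100, 10),
]

for label, bus, mult in examples:
    print(f"{label}: {max_cpu_mhz(bus, mult)} MHz")
```

These are ceilings set by the multiplier alone; whether a chip was ever sold or rated at that speed is a separate question.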

Like the 603 and 603e, the 740 and 750 are 64-bit chips that can function on either a 32-bit or 64-bit bus.

With speeds past 400 MHz, this was the workhorse CPU for the Mac OS until the G4 was unveiled.

Because of issues with cache coherency, it is unlikely we will ever see a dual PPC 750 computer using an inline or backside cache. The difficulty and time involved in checking motherboard RAM and the other cache would actually slow performance, as Be has noted. If dual G3 systems are ever made, they should be on the fastest motherboards with the largest conventional L2 cache possible.

The 750CX, 750CXe (a.k.a. G3+)

Code named SideWinder by IBM, the next generation G3 will be aimed at portables, although it will probably find itself inside the iMac, too. The 750CX will be available in speeds of 350 to 550 MHz, while the 750CXe will support speeds of 500 to 700 MHz, and possibly even higher in the future.

Both processors include an integrated 256 KB level 2 cache, which runs at full CPU speed. This eliminates the need for an external cache. IBM notes that the 750CX is approximately 5% faster than the old G3 processor at the same clock speed using an external 256 KB L2 cache. It is only 5% slower than the old G3 with an external 512 KB L2 cache.
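
Taken together, IBM's two comparisons also say something about the older G3 itself. The Python sketch below normalizes the G3 with a 256 KB cache to 1.00 and works out where the other two land. It assumes "5% faster" and "5% slower" can be read as simple ratios at the same clock speed, which is an interpretation and not something IBM's note spells out.

```python
# Normalize the old G3 with an external 256 KB L2 cache to 1.00.
g3_256k = 1.00

# Reading "5% faster" and "5% slower" as simple ratios (an assumption).
cx = g3_256k * 1.05          # 750CX vs. G3 with 256 KB L2
g3_512k = cx / 0.95          # 750CX is 0.95x the G3 with 512 KB L2

print(f"750CX:        {cx:.2f}")
print(f"G3 + 512 KB:  {g3_512k:.2f}")
```

Under that reading, doubling the external cache from 256 KB to 512 KB was worth roughly 10% on the old G3.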

Although IBM will be able to create fast G3s, don't expect Apple to ever release a G3 with a higher MHz rating than the fastest available Power Mac G4. Still, the integrated L2 cache running at full CPU speed should make the 750CX and 750CXe more than a match for the G4 unless a program is AltiVec enhanced. The forthcoming G4+ will also include an on-chip L2 cache.

Expect the upgrade market to embrace the next generation G3, which should support at least a 10x multiplier, if it can. (IBM has removed some 60x connections on the 750CX, so it may not be possible to run it on anything earlier than a Power Mac G3.) This could allow an even more efficient 333 MHz upgrade for the PowerBook 1400, 500 MHz daughter cards for the upgradable first generation PCI Power Macs, 666 MHz upgrades for the beige Power Mac G3, and up to 1 GHz upgrades for the blue G3 and its 100 MHz bus.

Apple adopted the 750CX with the second-generation iBook in September 2000. The 750CX was only available in 366, 400, and 466 MHz speeds; Apple has not used the 400 MHz version.

The 750CX is designed for a maximum bus speed of 100 MHz, and it can run at up to 8x bus speed. This would allow a 533 MHz iBook using the 66 MHz bus, but the chip was never specified for that speed.

Going beyond 466 MHz, the PowerPC 750CXe supports up to 133 MHz bus speed and 10x multipliers, which means 1.3 GHz is a theoretical possibility. At this point, IBM has only announced the following speeds:

  • 400, 600, 667 MHz, bus speed to 133 MHz
  • 500, 700 MHz, bus speed to 100 MHz

Apple is apparently using the 750CXe in the 500 MHz iMac (North American edition) and 600 MHz iMac. (The 400 MHz model and international 500 MHz model use the traditional PowerPC 750 CPU.)

G4 (a.k.a. Motorola 7400)

Now available in the Power Mac G4, the G4 processor is to the G3 as the 604 was to the 603 - and then some! Like the 604, and unlike the G3, G4 is designed for multiple processor operation. It also runs about 25% faster for basic floating point math calculations.

The G4 initially shipped at 350-450 MHz and peaked at 500 MHz. It is assumed the G4 has at least a 10x multiplier, but the initial run tops out at 9x. This means a computer with 100 MHz motherboard could support a 900 MHz G4, and also that the beige G3 with its 66 MHz design could take a 600 MHz G4, if it ever existed. Older Macs taking a CPU daughter card could support a 500 MHz G4 on a 50 MHz bus with the 10x multiplier.

And there's a very real possibility of 10x and higher multipliers on later versions of the G4. Until 500 MHz and faster versions are available, 9x will be adequate for even the old Macs with a 50 MHz system bus.

The Motorola 7400 uses a 0.15 micron die and a new copper process. It is probably the first mass produced chip made using a 0.15 micron die.

The G4 also has a new bus architecture that more than doubles memory bandwidth, although only the Sawtooth motherboard used in the G4/450 and G4/500 is designed around the new memory bus.

New features include the ability for one CPU to send data directly to another without using system memory, the ability to use a 2 MB level 2 cache (previous PowerPC designs were limited to 1 MB), 128-bit internal architecture (vs. 64-bit for other PowerPC chips), 64- or 128-bit access to the cache (two different versions of G4; 64-bit is for backward compatibility with older systems), and (for the Motorola version only) the AltiVec "Velocity Engine" multimedia extensions.

The G4 has a total of 7 execution units: two for integer work, plus one each for load/store, branch/system, floating point, AltiVec ALU, and AltiVec Permute. Preliminary SPECfp scores are about 30% higher than the G3 at the same clock speed.

AltiVec has the ability to increase performance of certain functions, especially those found in things like QuickTime and Photoshop, by up to 16x, although programs will have to be modified to take advantage of the new AltiVec instructions.
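
The "up to 16x" figure presumably comes from how many values fit in one 128-bit AltiVec register: sixteen 8-bit values, eight 16-bit values, or four 32-bit values. A quick Python sketch of that count follows. The names are illustrative, and it assumes the ideal case where every element in the vector does useful work.

```python
VECTOR_BITS = 128  # AltiVec register width


def lanes(element_bits):
    """How many elements of a given width fit in one 128-bit vector."""
    return VECTOR_BITS // element_bits


for bits in (8, 16, 32):
    print(f"{bits}-bit elements: {lanes(bits)} per instruction")
```

So the best case applies to byte-sized data such as 8-bit pixel channels; work on 32-bit values tops out at four at a time.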

Although we'll initially see the G4 in a version that uses the same 32-bit instructions as earlier PowerPC chips, there will also be a version designed around 64-bit instructions. This will require a 64-bit clean version of the Mac OS, which should be part of the Mac OS X design -- and maybe even OS 9. (Old timers will recall Apple's switch from 24-bit addressing to 32-bit addressing with System 7.0. A lot of programs had to be recompiled for 32-bit operation and some Macs - notably the 68000-based ones - simply could not run in 32-bit mode. We should expect some of the same kind of teething pains with pre-Carbon, pre-OS X software and hardware.)
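
To put the addressing comparison in numbers, each extra address bit doubles the memory a program can reach. The Python sketch below works out the totals for 24-bit and 32-bit addressing, plus a 64-bit figure on the assumption (not stated in the text) that a 64-bit version would widen addresses the same way.

```python
def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with this many bits."""
    return 2 ** address_bits


MB = 2 ** 20
GB = 2 ** 30

print(address_space_bytes(24) // MB, "MB with 24-bit addressing")   # 16
print(address_space_bytes(32) // GB, "GB with 32-bit addressing")   # 4
print(address_space_bytes(64) // GB, "GB with 64-bit addressing")
```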

The 64-bit bus version of the G4 will be pin compatible with the PowerPC 750; the 128-bit bus version will require additional pins and a different configuration.

G4, second generation

Apple began shipping models with the next generation G4 in January 2001. The Power Mac G4/466 and G4/533 use the PowerPC 7410 (identified as "CPU Type: PowerPC G4 (11.3)" by System Profiler), which is essentially a low power version of the 7400. The same chip is used in the PowerBook G4. The 7410 and 7450 both support bus speeds to 133 MHz.

The Power Mac G4/667 and G4/733 use the PowerPC 7450, which has an on-chip 256 KB level 2 cache running at full CPU speed. It also includes 4 integer math execution units (3 simple + 1 complex), which is twice as many as the 7400 and 7410. It also doubles the number of AltiVec units, adding simple and complex to floating and permute.

Exactly what this means is a bit vague, but where the old G4 was the most powerful CPU, MHz for MHz, on the market, the 7450 trumps it. At this time, Power Mac G4s with the new CPU are just starting to reach the market.

Summary, PowerPC family

CPU       speed*    instructions  L1 cache  notes
601     60-120 MHz   3 per cycle     32 KB
603     75-160 MHz   2 per cycle    2x8 KB
603e   100-300 MHz   2 per cycle   2x16 KB
604    120-180 MHz   4 per cycle   2x16 KB
604e   150-350 MHz   4 per cycle   2x32 KB
G3     200-450 MHz   3 per cycle   2x32 KB  8-10x bus multiplier
750CX  366-466 MHz   3 per cycle   2x32 KB+ 8x bus multiplier
750CXe 400-700 MHz   3 per cycle   2x32 KB+ 10x bus multiplier
G4     350-600 MHz  19 per cycle** 2x32 KB  supports 2 MB L2 cache
7410   466-533 MHz  20 per cycle** 2x32 KB  supports 1 MB L2 cache
7450   667-733 MHz       unknown   2x32 KB+ supports 2 MB L3 cache
* as used in Apple computers or Mac clones
** AltiVec can do up to 16 simultaneous calculations
+ integrated 256 KB level 2 cache
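
One way to read the table is to multiply top clock speed by instructions per cycle, which gives a theoretical ceiling in millions of instructions per second. The Python sketch below does that for the pre-AltiVec rows. The G4 and 7410 are left out because their per-cycle counts appear to include AltiVec's simultaneous calculations. The name peak_mips is made up here, and these are upper bounds, not measured performance.

```python
# (CPU, top clock in MHz, instructions per cycle) from the summary table.
# AltiVec rows are skipped: their counts appear to include vector lanes.
rows = [
    ("601", 120, 3),
    ("603", 160, 2),
    ("603e", 300, 2),
    ("604", 180, 4),
    ("604e", 350, 4),
    ("G3", 450, 3),
]


def peak_mips(clock_mhz, per_cycle):
    """Theoretical ceiling in millions of instructions per second."""
    return clock_mhz * per_cycle


for name, mhz, ipc in rows:
    print(f"{name:5} {peak_mips(mhz, ipc):5} million instructions/s peak")
```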

Other Resources