> Microsoft's Xbox 360 & Sony's PlayStation 3 - Examples of Poor CPU 
> Performance
> Date: June 29th, 2005
> Author: Anand Lal Shimpi
> "In our last article we had a fairly open-ended discussion about many of 
> the challenges facing both of the recently announced next-generation game 
> consoles.  We discussed misconceptions about the Cell processor and its 
> ability to accelerate physics calculations, as well as touched on the GPUs 
> of both platforms.  In the end, both the Xbox 360 and the PlayStation 3 
> are much closer competitors than you would think based on first 
> impressions.
> The Xbox 360's Xenon CPU features more general purpose cores than the 
> PlayStation 3 (3 vs. 1), however game developers will most likely only be 
> using one of those cores for the majority of their calculations, leveling 
> the playing field considerably.
> The Cell processor derives much of its power from its array of 7 SPEs 
> (Synergistic Processing Elements), however as we discovered in our last 
> article, their purpose is far more specialized than we had thought. 
> Epic Games' head developer, Tim Sweeney, gave us a much
> more balanced view of what sorts of tasks could take advantage of the
> Cell's SPE array.
> The GPUs of the next-generation platforms also proved to be quite 
> interesting.  In Part I we speculated as to the true nature of NVIDIA's 
> RSX in the PS3, concluding that it's quite likely little more than a 
> higher clocked G70 GPU.  We will expand on that discussion a bit more in 
> this article.  We also looked at Xenos, the Xbox 360's GPU and 
> characterized it as equivalent to a very flexible 24-pipe R420.  Despite 
> the inclusion of the 10MB of embedded DRAM, Xenos and RSX ended up being 
> quite similar in our expectations for performance; and that pretty much 
> summarized all of our findings - the two consoles, although implementing 
> very different architectures, ended up being so very similar.
> So we've concluded that the two platforms will probably end up performing 
> very similarly, but there was one very important element excluded from the 
> first article: a comparison to present-day PC architectures.  The reason a 
> comparison to PC architectures is important is because it provides an 
> evaluation point to gauge the expected performance of these 
> next-generation consoles.  We've heard countless times that these new 
> consoles would offer better gaming performance than anything we've had on 
> the PC, or anything we would have for a matter of years.  Now it's time to 
> actually put those claims to the test, and that's exactly what we did.
> Speaking under conditions of anonymity with real world game developers who 
> have had first hand experience writing code for both the Xbox 360 and 
> PlayStation 3 hardware (and dev kits where applicable), we asked them for 
> nothing more than their brutal honesty.  What did they think of these new 
> consoles?  Are they really outfitted with the PC-eclipsing performance 
> we've been led to believe they have?  The answer is actually quite
> frequently found in history; as with anything, you get what you pay for.
> Learning from Generation X
> The original Xbox console marked a very important step in the evolution of 
> gaming consoles - it was the first console that was little more than a 
> Windows PC.
> It featured a 733MHz Pentium III processor with a 128KB L2 cache, paired 
> up with a modified version of NVIDIA's nForce chipset (modified to support 
> Intel's Pentium III bus instead of the Athlon XP it was designed for). 
> The nForce chipset featured an integrated GPU, codenamed the NV2A, 
> offering performance very similar to that of a GeForce3.  The system had a 
> 5X PC DVD drive and an 8GB IDE hard drive, and all of the controllers 
> interfaced to the console using USB cables with a proprietary connector.
> For the most part, game developers were quite pleased with the original 
> Xbox.  It offered them a much more powerful CPU, GPU and overall platform 
> than anything had before.  But as time went on, there were definitely 
> limitations that developers ran into with the first Xbox.
> One of the biggest limitations ended up being the meager 64MB of memory 
> that the system shipped with.  Developers had asked for 128MB and the 
> motherboard even had positions silk screened for an additional 64MB, but 
> in an attempt to control costs the final console only shipped with 64MB of 
> memory.
> The next problem is that the NV2A GPU ended up not having the fill rate 
> and memory bandwidth necessary to drive high resolutions, which kept the 
> Xbox from being used as a HD console.
> Although Intel outfitted the original Xbox with a Pentium III/Celeron 
> hybrid in order to improve performance yet maintain its low cost, at 
> 733MHz that quickly became a performance bottleneck for more complex games 
> after the console's introduction.
> The combination of GPU and CPU limitations made 30 fps a frame rate target 
> for many games, while simpler titles were able to run at 60 fps.  Split 
> screen play on Halo would even stutter below 30 fps depending on what was 
> happening on screen, and that was just a first-generation title.  More 
> experience with the Xbox brought creative solutions to the limitations of 
> the console, but clearly most game developers had a wish list of things 
> they would have liked to have seen in the Xbox successor.  Similar 
> complaints were levied against the PlayStation 2, but in some cases they 
> were more extreme (e.g. its 4MB frame buffer).
> Given that consoles are generally evolutionary, taking lessons learned in 
> previous generations and delivering what the game developers want in order 
> to create the next-generation of titles, it isn't a surprise to see that a 
> number of these problems are fixed in the Xbox 360 and PlayStation 3.
> One of the most important changes with the new consoles is that system 
> memory has been bumped from 64MB on the original Xbox to a whopping 512MB 
> on both the Xbox 360 and the PlayStation 3.  For the Xbox, that's a factor 
> of 8 increase, and over 12x the total memory present on the PlayStation 2.
> The other important improvement with the next-generation of consoles is 
> that the GPUs have been improved tremendously.  With 6 - 12 month product 
> cycles, it's no surprise that in the past 4 years GPUs have become much 
> more powerful.  By far the biggest upgrade these new consoles will offer, 
> from a graphics standpoint, is the ability to support HD resolutions.
> There are obviously other, less performance-oriented improvements such as
> wireless controllers and more ubiquitous multi-channel sound support.  And 
> with Sony's PlayStation 3, disc capacity goes up thanks to their embracing 
> the Blu-ray standard.
> But then we come to the issue of the CPUs in these next-generation 
> consoles, and the level of improvement they offer.  Both the Xbox 360 and 
> the PlayStation 3 offer multi-core CPUs to supposedly usher in a new era 
> of improved game physics and reality.  Unfortunately, as we have found 
> out, the desire to bring multi-core CPUs to these consoles was made a 
> reality at the expense of performance in a very big way.
> Problems with the Architecture
> At the heart of both the Xenon and Cell processors is IBM's custom PowerPC 
> based core.  We've discussed this core in our previous articles, but it is 
> best characterized as being quite simple.  The core itself is a very 
> narrow 2-issue in-order execution core, featuring a 64KB L1 cache (32K 
> instruction/32K data) and either a 1MB or 512KB L2 cache (for Xenon or 
> Cell, respectively).  Supporting SMT, the core can execute two threads 
> simultaneously similar to a Hyper Threading enabled Pentium 4.  The Xenon 
> CPU is made up of three of these cores, while Cell features just one.
> Each individual core is extremely small, making the 3-core Xenon CPU in 
> the Xbox 360 smaller than a single core 90nm Pentium 4.  While we don't 
> have exact die sizes, we've heard that the number is around 1/2 the size 
> of the 90nm Prescott die.
> IBM's pitch to Microsoft was based on the peak theoretical floating point 
> performance-per-dollar that the Xenon CPU would offer, and given 
> Microsoft's focus on cost savings with the Xbox 360, they took the bait.
> While Microsoft and Sony have been childishly playing this flops-war, 
> comparing the 1 TFLOPs processing power of the Xenon CPU to the 2 TFLOPs 
> processing power of the Cell, the real-world performance war has already 
> been lost.
> Right now, from what we've heard, the real-world performance of the Xenon 
> CPU is about twice that of the 733MHz processor in the first Xbox. 
> Considering that this CPU is supposed to power the Xbox 360 for the next 
> 4 - 5 years, it's nothing short of disappointing.  To put it in 
> perspective, floating point multiplies are apparently 1/3 as fast on Xenon 
> as on a Pentium 4.
> The reason for the poor performance?  The very narrow 2-issue in-order 
> core also happens to be very deeply pipelined, apparently with a branch 
> predictor that's not the best in the business.  In the end, you get what 
> you pay for, and with such a small core, it's no surprise that performance 
> isn't anywhere near the Athlon 64 or Pentium 4 class.
> The Cell processor doesn't get off the hook just because it only uses a 
> single one of these horribly slow cores; the SPE array ends up being 
> fairly useless in the majority of situations, making it little more than a 
> waste of die space.
> We mentioned before that collision detection can be accelerated on
> the SPEs of Cell, despite being fairly branch heavy.  The lack of a branch 
> predictor in the SPEs apparently isn't that big of a deal, since most 
> collision detection branches are basically random and can't be predicted 
> even with the best branch predictor.  So not having a branch predictor
> doesn't hurt; what does hurt, however, is the very small amount of local
> memory available to each SPE.  In order to access main memory, the SPE
> places a DMA request on the bus (or the PPE can initiate the DMA request) 
> and waits for it to be fulfilled.  From those that have had experience 
> with the PS3 development kits, this access takes far too long to be used 
> in many real world scenarios.  It is the small amount of local memory that 
> each SPE has access to that limits the SPEs from being able to work on 
> more than a handful of tasks.  While physics acceleration is an important 
> one, there are many more tasks that can't be accelerated by the SPEs 
> because of the memory limitation.
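
The local-memory constraint described above can be made concrete with a rough sketch. It assumes the 256KB local store per SPE from IBM's Cell documentation (a figure not given in the article). The helper name `dma_chunks`, the reserved-space figure and the double-buffering scheme are hypothetical choices for illustration, not anything the developers described.

```python
import math

# Per-SPE local store size on Cell (from IBM's documentation, not the article).
LOCAL_STORE_BYTES = 256 * 1024


def dma_chunks(dataset_bytes: int, reserved_bytes: int, buffers: int = 2):
    """Return (chunk_size, transfer_count) for streaming a dataset through one SPE.

    reserved_bytes is a hypothetical allowance for code and stack in the
    local store; buffers=2 models simple double buffering.
    """
    usable = LOCAL_STORE_BYTES - reserved_bytes
    if usable <= 0 or buffers < 1:
        raise ValueError("no local store left for data buffers")
    chunk = usable // buffers
    return chunk, math.ceil(dataset_bytes / chunk)


# Hypothetical workload: 8 MiB of data, 64 KiB reserved for code and stack.
chunk, transfers = dma_chunks(8 * 1024 * 1024, 64 * 1024)
print(chunk, transfers)
```

Under these assumptions every pass over the data costs dozens of DMA round trips, which is why tasks with large or unpredictable working sets are a poor fit for the SPEs.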
> The other point that has been made is that even if you can offload some of 
> the physics calculations to the SPE array, the Cell's PPE ends up being a 
> pretty big bottleneck thanks to its overall lackluster performance.  It's 
> akin to having an extremely fast GPU but without a fast CPU to pair it up 
> with.
> What About Multithreading?
> We of course asked the obvious question: would game developers rather have 
> 3 slow general purpose cores, or one of those cores paired with an array 
> of specialized SPEs?  The response was unanimous: everyone we have spoken
> to would rather take the general purpose core approach.
> Citing everything from ease of programming to the limitations of the SPEs 
> we mentioned previously, the Xbox 360 appears to be the more 
> developer-friendly of the two platforms according to the cross-platform 
> developers we've spoken to.  Despite being more developer-friendly, the 
> Xenon CPU is still not what developers wanted.
> The most ironic bit of it all is that according to developers, if either 
> manufacturer had decided to use an Athlon 64 or a Pentium D in their 
> next-gen console, they would be significantly ahead of the competition in 
> terms of CPU performance.
> While the developers we've spoken to agree that heavily multithreaded game 
> engines are the future, that future won't really take form for another 3 - 
> 5 years.  Even Microsoft admitted to us that all developers are focusing 
> on having, at most, one or two threads of execution for the game engine 
> itself - not the four or six threads that the Xbox 360 was designed for.
> Even when games become more aggressive with their multithreading, 
> targeting 2 - 4 threads, most of the work will still be done in a single 
> thread.  It won't be until the next step in multithreaded architectures 
> where that single thread gets broken down even further, and by that time 
> we'll be talking about Xbox 720 and PlayStation 4.  In the end, the more 
> multithreaded nature of these new console CPUs doesn't help paint much of 
> a brighter performance picture - multithreaded or not, game developers are 
> not pleased with the performance of these CPUs.
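
The point that most of the work stays in a single thread is essentially Amdahl's law. Below is a minimal sketch of it; the 30% parallel share is a made-up figure for illustration, not a number from the article or from any developer.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup when only parallel_fraction of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Hypothetical split: 30% of frame time parallelizes, 70% stays on one thread.
for threads in (1, 2, 4, 6):
    print(threads, round(amdahl_speedup(0.30, threads), 2))
```

With that split, even six hardware threads give a speedup of only about 1.33x, so single-thread performance still dominates.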
> What about all those Flops?
> The one statement that we heard over and over again was that Microsoft was 
> sold on the peak theoretical performance of the Xenon CPU.  Ever since the 
> announcement of the Xbox 360 and PS3 hardware, people have been set on 
> comparing Microsoft's figure of 1 trillion floating point operations per 
> second to Sony's figure of 2 trillion floating point operations per second 
> (TFLOPs).  Any AnandTech reader should know for a fact that these numbers 
> are meaningless, but just in case you need some reasoning for why, let's 
> look at the facts.
> First and foremost, a floating point operation can be anything; it can be
> adding two floating point numbers together, or it can be performing a dot
> product on two floating point vectors, or it can even be just calculating
> the complement of an FP number.  Anything that is executed on an FPU is fair
> game to be called a floating point operation.
> Secondly, both floating point power numbers refer to the whole system, CPU 
> and GPU. Obviously a GPU's floating point processing power doesn't mean 
> anything if you're trying to run general purpose code on it and vice 
> versa. As we've seen from the graphics market, characterizing GPU 
> performance in terms of generic floating point operations per second is 
> far from the full performance story.
> Third, when a manufacturer is talking about peak floating point 
> performance there are a few things that they aren't taking into account. 
> Being able to process billions of operations per second depends on 
> actually being able to have that many floating point operations to work 
> on.  That means that you have to have enough bandwidth to keep the FPUs 
> fed, no mispredicted branches, no cache misses and the right structure of 
> code to make sure that all of the FPUs can be fed at all times so they can 
> execute at their peak rates.  We already know that's not the case as game 
> developers have already told us that the Xenon CPU isn't even in the same 
> realm of performance as the Pentium 4 or Athlon 64.  Not to mention that 
> the requirements for hitting peak theoretical performance are always 
> ridiculous; caches are only so big and thus there will come a time where a 
> request to main memory is needed, and you can expect that request to be 
> fulfilled in a few hundred clock cycles, where no floating point 
> operations will be happening at all.
> So while there may be some extreme cases where the Xenon CPU can hit its 
> peak performance, it sure isn't happening in any real world code.
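
A toy model shows how quickly memory stalls eat into peak throughput. The only input taken from the article is the "few hundred clock cycles" miss penalty; the operations per miss and per cycle are hypothetical, and the model ignores any overlap of stalls with useful work.

```python
def sustained_fraction(flops_per_miss: int, flops_per_cycle: int,
                       miss_penalty_cycles: int) -> float:
    """Fraction of peak FP throughput achieved if every flops_per_miss
    operations are followed by one stall of miss_penalty_cycles."""
    busy_cycles = flops_per_miss / flops_per_cycle
    return busy_cycles / (busy_cycles + miss_penalty_cycles)


# Hypothetical: 1000 FP operations between misses, 8 operations per cycle
# at peak, and a 300-cycle trip to main memory.
print(round(sustained_fraction(1000, 8, 300), 3))
```

In this sketch the FPUs sit idle for most cycles even though the code issues at full rate whenever data is available.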
> The Cell processor is no different; given that its PPE is identical to one 
> of the PowerPC cores in Xenon, it must derive its floating point 
> performance superiority from its array of SPEs.  So what's the issue with
> the 218 GFLOPs number (2 TFLOPs for the whole system)?  Well, from what we've
> heard, game developers are finding that they can't use the SPEs for a lot 
> of tasks.  So in the end, it doesn't matter what peak theoretical 
> performance of Cell's SPE array is, if those SPEs aren't being used all 
> the time.
> Another way to look at this comparison of flops is to look at peak integer
> add throughput on the Pentium 4 vs. the Athlon 64.  The Pentium 4 has two
> double pumped ALUs, each capable of performing two add operations per 
> clock, that's a total of 4 add operations per clock; so we could say that 
> a 3.8GHz Pentium 4 can perform 15.2 billion operations per second.  The 
> Athlon 64 has three ALUs each capable of executing an add every clock;  so 
> a 2.8GHz Athlon 64 can perform 8.4 billion operations per second.  By this 
> silly console marketing logic, the Pentium 4 would be almost twice as fast 
> as the Athlon 64, and a multi-core Pentium 4 would be faster than a 
> multi-core Athlon 64. Any AnandTech reader should know that's hardly the 
> case.  No code is composed entirely of add instructions, and even if it 
> were, eventually the Pentium 4 and Athlon 64 will have to go out to main 
> memory for data, and when they do, the Athlon 64 has a much lower latency 
> access to memory than the P4.  In the end, despite what these horribly 
> concocted numbers may lead you to believe, they say absolutely nothing 
> about performance.  The exact same situation exists with the CPUs of the 
> next-generation consoles; don't fall for it.
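
The peak-add arithmetic in the paragraph above can be reproduced in a few lines. Clock speeds and ALU counts come from the article; the function name is only illustrative. Results are in millions of adds per second.

```python
def peak_adds_per_second(clock_mhz: int, alus: int,
                         adds_per_alu_per_clock: int) -> int:
    """Peak integer adds per second, in millions (a marketing-style number)."""
    return clock_mhz * alus * adds_per_alu_per_clock


# 3.8GHz Pentium 4: two double-pumped ALUs, two adds each per clock.
pentium4 = peak_adds_per_second(3800, alus=2, adds_per_alu_per_clock=2)
# 2.8GHz Athlon 64: three ALUs, one add each per clock.
athlon64 = peak_adds_per_second(2800, alus=3, adds_per_alu_per_clock=1)
ratio = pentium4 / athlon64

print(pentium4, athlon64, round(ratio, 2))
```

The ratio works out to about 1.81, the "almost twice as fast" figure that says nothing about how real code behaves.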
> Why did Sony/MS do it?
> For Sony, it doesn't take much to see that the Cell processor is eerily 
> similar to the Emotion Engine in the PlayStation 2, at least conceptually. 
> Sony clearly has an idea of what direction they would like to go in, and 
> it doesn't happen to be one that's aligned with much of the rest of the 
> industry.  Sony's past successes have really come, not because of the 
> hardware, but because of the developers and their PSX/PS2 exclusive 
> titles. A single hot title can ship hundreds of millions of consoles, and 
> by our count, Sony has had many more of those than Microsoft had with the 
> first Xbox.
> Sony shipped around 4 times as many PlayStation 2 consoles as Microsoft
> did Xboxes.  Regardless of the hardware platform, a game developer won't
> turn down working with the PS2 - the install base is just that attractive.
> So for Sony, the Cell processor may be strange and even undesirable for 
> game developers, but the developers will come regardless.
> The real surprise was Microsoft; with the first Xbox, Microsoft listened 
> very closely to the wants and desires of game developers.  This time 
> around, despite what has been said publicly, the Xbox 360's CPU 
> architecture wasn't what game developers had asked for.
> They wanted a multi-core CPU, but not such a significant step back in 
> single threaded performance.  When AMD and Intel moved to multi-core 
> designs, they did so at the expense of a few hundred MHz in clock speed, 
> not by taking a step back in architecture.
> We suspect that a big part of Microsoft's decision to go with the Xenon 
> core was because of its extremely small size.  A smaller die means lower 
> system costs, and if Microsoft indeed launches the Xbox 360 at $299 the 
> Xenon CPU will be a big reason why that was made possible.
> Another contributing factor may be the fact that Microsoft wanted to own 
> the IP of the silicon that went into the Xbox 360.  We seriously doubt 
> that either AMD or Intel would be willing to grant them the right to make 
> Pentium 4 or Athlon 64 CPUs, so it may have been that IBM was the only 
> partner willing to work with Microsoft's terms and only with this one 
> specific core.
> Regardless of the reasoning, not a single developer we've spoken to thinks 
> that it was the right decision.
> The Saving Grace: The GPUs
> Although both manufacturers royally screwed up their CPUs, all developers 
> have agreed that they are quite pleased with the GPU power of the 
> next-generation consoles.
> First, let's talk about NVIDIA's RSX in the PlayStation 3.  We discussed 
> the possibility of RSX offloading vertex processing onto the Cell 
> processor, but more and more it seems that isn't the case.  It looks like 
> the RSX will basically be a 90nm G70 with Turbo Cache running at 550MHz, 
> and the performance will be quite good.
> One option we didn't discuss in the last article was that the G70 GPU may
> already feature a number of disabled shader pipes to improve yield.  The
> move to 90nm may allow those pipes to be enabled, allowing for
> another scenario where the RSX offers higher performance at the same 
> transistor count as the present-day G70.  Sony may be hesitant to reveal 
> the actual number of pixel and vertex pipes in the RSX because honestly 
> they won't know until a few months before mass production what their final 
> yields will be.
> Despite strong performance and support for 1080p, a large number of 
> developers are targeting 720p for their PS3 titles and won't support 
> 1080p. Those that are simply porting current-generation games over will 
> have no problems running at 1080p, but anyone working on a truly 
> next-generation title won't have the fill rate necessary to render at 
> 1080p.
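
The fill-rate gap between the two HD targets is easy to quantify. The sketch below counts pixels per frame using the standard 1280x720 and 1920x1080 frame sizes; it ignores overdraw, AA and everything else that affects real fill cost.

```python
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
}


def pixel_count(name: str) -> int:
    """Pixels in one frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def relative_fill_cost(target: str, baseline: str) -> float:
    """How many times more pixels the target needs per frame than the baseline."""
    return pixel_count(target) / pixel_count(baseline)


print(pixel_count("720p"), pixel_count("1080p"))
print(relative_fill_cost("1080p", "720p"))
```

Rendering at 1080p means shading 2.25 times as many pixels per frame as 720p, which is the budget a shader-heavy title would have to find.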
> Another interesting point is that despite its lack of "free 4X AA" like 
> the Xbox 360, in some cases it won't matter.  Titles that use longer pixel 
> shader programs end up being bound by pixel shader performance rather than 
> memory bandwidth, so the performance difference between no AA and 2X/4X AA 
> may end up being quite small.  Not all titles will push the RSX to its
> limits, however, and those titles will definitely see a performance drop
> with AA enabled.  In the end, whether the RSX's lack of embedded DRAM 
> matters will be entirely dependent on the game engine being developed for 
> the platform.  Games that make more extensive use of long pixel shaders 
> will see less of an impact with AA enabled than those that are more 
> texture bound. Game developers are all over the map on this one, so it 
> wouldn't be fair to characterize all of the games as falling into one 
> category or another.
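
A very simplified way to see why AA is nearly free for shader-bound titles is to assume shader work and memory traffic overlap perfectly, so frame time is whichever is larger. All millisecond figures and the 2x bandwidth scale for AA are hypothetical, not measurements of RSX.

```python
def frame_time_ms(shader_ms: float, bandwidth_ms: float) -> float:
    """Frame time under the assumption that the two costs overlap fully."""
    return max(shader_ms, bandwidth_ms)


def aa_slowdown(shader_ms: float, bandwidth_ms: float,
                aa_bandwidth_scale: float) -> float:
    """Ratio of frame time with AA to frame time without it."""
    base = frame_time_ms(shader_ms, bandwidth_ms)
    with_aa = frame_time_ms(shader_ms, bandwidth_ms * aa_bandwidth_scale)
    return with_aa / base


# Hypothetical shader-bound title: long pixel shaders hide the extra traffic.
print(aa_slowdown(shader_ms=14.0, bandwidth_ms=6.0, aa_bandwidth_scale=2.0))
# Hypothetical texture-bound title: AA cost shows up directly.
print(aa_slowdown(shader_ms=5.0, bandwidth_ms=10.0, aa_bandwidth_scale=2.0))
```

In this model the shader-bound case sees no slowdown while the texture-bound case doubles its frame time, matching the split the developers describe.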
> ATI's Xenos GPU is also looking pretty good and most are expecting 
> performance to be very similar to the RSX, but real world support for this 
> won't be ready for another couple of months.  Developers have just 
> recently received more final Xbox 360 hardware, and gauging performance of 
> the actual Xenos GPU compared to the R420 based solutions in the G5 
> development kits will take some time.  Since the original dev kits offered 
> significantly lower performance, developers will need a bit of time to 
> figure out what realistic limits the Xenos GPU will have.
> Final Words
> Just because these CPUs and GPUs are in a console doesn't mean that we 
> should throw away years of knowledge from the PC industry - performance 
> doesn't come out of thin air, and peak performance is almost never 
> achieved. Clever marketing, however, will always try to fool the consumer.
> And that's what we have here today, with the Xbox 360 and PlayStation 3. 
> Both consoles are marketed to be much more powerful than they actually 
> are, and from talking to numerous game developers it seems that the real 
> world performance of these platforms isn't anywhere near what it was 
> supposed to be.
> It looks like significant advancements in game physics won't happen on 
> consoles for another 4 or 5 years, although it may happen with PC games 
> much before that.
> It's not all bad news, however; the good news is that both GPUs are quite
> possibly the most promising part of the new consoles.  With the 
> performance that we have seen from NVIDIA's G70, we have very high 
> expectations for the 360 and PS3.  The ability to finally run at HD 
> resolutions in all games will bring a much needed element to console 
> gaming.
> And let's not forget all of the other improvements to these 
> next-generation game consoles.  The CPUs, despite being relatively 
> lackluster, will still be faster than their predecessors and increased 
> system memory will give developers more breathing room.  Then there are
> other improvements such as wireless controllers, better online play and 
> updated game engines that will contribute to an overall better gaming 
> experience.
> In the end, performance could be better; the consoles aren't what they
> could have been had the powers that be made some different decisions.  While
> they will bring better quality games to market and will be better than 
> their predecessors, it doesn't look like they will be the end of PC gaming 
> any more than the Xbox and PS2 were when they were launched.  The two 
> markets will continue to coexist, with consoles being much easier to deal 
> with, and PCs offering some performance-derived advantages.
> With much more powerful CPUs and, in the near future, more powerful GPUs, 
> the PC paired with the right developers should be able to bring about that 
> revolution in game physics and graphics we've been hoping for.  Consoles 
> will help accelerate the transition to multithreaded gaming, but it looks 
> like it will take PC developers to bring about real change in things like 
> game physics, AI and other non-visual elements of gaming."