ENGINEERING
Otellini Discusses Intel’s Computing Strategy
At this week's Intel Developer Forum, Paul Otellini, Executive Vice President and General Manager of the company's architecture group, discussed Intel's computing strategy in a keynote address. The following is a transcript of that address. Three particular areas covered were what Intel is doing in the Itanium architecture, what it's doing in low power, and what it's doing in driving increasing degrees of parallelism into all of its microprocessors.

ANNOUNCER: Ladies and gentlemen, please welcome Paul Otellini. (Applause.)

PAUL OTELLINI: Thank you, and good morning. We created that video for a reason. I think it's particularly interesting to think about how the industry goes through downturns. The PC industry is 20 years old this year, and in going through downturns or through business cycles, one can very often lose focus or lose sight of what customers really want. The video and the people we interviewed said to us very clearly that they want a better computing experience; they want faster computers; they want them easier to use. They want it because this is what the tools of their lives require. It's how they do their jobs; it's how they manage their lives.

At 20 years, the PC industry is going through a downturn. There's not a lot of advice out there on how to weather one of these cycles, but I think there is some illustrative advice from the Great Depression. In 1933, in the depths of the Great Depression, a fellow by the name of Charles Kettering gave a speech, and he opened it by saying, "I believe business will come back when we get some products that people want to buy." It seems pretty intuitive. But you have to understand who Kettering was. Kettering was one of the preeminent technologists of his era. He was the man who founded Delco Electronics. He invented the electric starter for the automobile. He sold his business to General Motors and retained a position as head of the research organization of GM until he retired in 1947. He described research as the organized process to make people dissatisfied with what they have. (Laughter.)

PAUL OTELLINI: I think we're trying to aim a little higher than that and make people aspire to what we can build. But Kettering gives us a template for what I think we need to do to weather this cycle. And there are further analogies between the automotive industry and the computer industry. The automobile industry started out segmenting itself by price and speed and horsepower. There was Henry Ford's famous quote that you could have any color you want as long as it's black. Well, the auto industry has evolved pretty dramatically. Price and speed and horsepower are still important, but safety, reliability, fuel economy -- the total driving experience -- are what buyers look at, demand, and see advertised now. The PC industry is very comparable. It was megahertz and price. Today it still is about gigahertz and price, but increasingly, it's about reliability, style, ease of use, security, and, in the case of portables, power conservation. In other words, the total computing experience.

Now, computing has always been a race between delivering MIPS and consuming them. And I think it's interesting to go back now, at the 20th anniversary, and see how computing evolved to consume the MIPS that were available over the last 20 years. The 286 was introduced in 1982. One of the early giants of the industry, Gordy Campbell, predicted a ten-year life and ten years of growth for that product.
Applications developed around the 286 quickly consumed its capability. In fact, they quickly consumed the capability of the 386, which followed by bringing VGA graphics, Windows, PageMaker, and similar kinds of applications to the computer. By the time we got to the 486, though, the first refrain came out from the press: "Do your users really need a 486 PC? No." But the software industry quickly consumed the CPU cycles that the 486 was able to deliver. And on it went into the Pentium(r) era, when a magazine no less illustrious than PC World actually wrote, "Pentium on a server, sure, but on the desktop, never." The industry consumed the Pentium MIPS. In hindsight, we all understand that. And it also consumed the compute power we threw in with the subsequent generations of the Pentium(r) II and Pentium(r) III as we moved into the Internet era. And now, as we launch the Pentium(r) 4 into very, very high volumes, we see the same refrain out there. "Does the market really need a 1.5 GHz processor?" wrote Piper Jaffray last year, and they answered no.

In this consistent race to deliver MIPS and consume them, there is a tremendous number of application types on the horizon that don't need a lot more development for us to imagine that we could all be consuming a lot more MIPS than we have today. So what will drive the need for MIPS consumption going forward? There are, as I said, a number of application types, each one of which consumes more MIPS than the fastest processor that we can build today. Simple things like moving speech to text take 1500 MIPS. Natural language processing and 802.11a in software would require two 4 GHz processors today. And you can imagine doing things faster: encoding a two-hour digital video into MPEG2 in two minutes instead of two hours takes more compute cycles than the industry could possibly throw at it in the foreseeable future. The Pentium 4 2 GHz machine, the fastest microprocessor in the world, introduced yesterday, delivers only that very small yellow line going across the bottom there on any one of those applications. You start running a few of those in parallel and you've tied up the machine.

So this is all a bit abstract, and I asked our demo team to construct a demo for you that would consume every MIP of the fastest processor that Intel could build. And to do that, I asked them to stretch a bit, to go to two and a half gigahertz, to give us 500 more megahertz than the fastest processors that we're shipping today. And to see how we're doing, let me bring out Victoria and Greg. And first of all, how fast is the machine?

VICTORIA: Well, Paul, what I'm going to show you today is a version of our Northwood 0.13-micron technology, and this is the first version of the processor anywhere outside the fab. So why don't we put the output up on the big screens -- we're running at 3.5 GHz. So we've not only met but exceeded your expectations and have delivered 3.5 GHz.

PAUL OTELLINI: By a gigahertz.

VICTORIA: Yeah. So I'd like to describe a scenario and usage model that will require every bit of performance that this sort of system can give. So let's start over here. What I want to call your attention to is the animation up on the middle screen, which is going to represent for the audience what's happening here on stage. It will help paint a clear picture of what's going on.
So as we all know, the typical American household has several televisions and PCs, and over the course of the next few years, that trend is only going to continue, with more PCs and other connected computing devices per household. We've put together one such future scenario here today, and we're going to demonstrate pieces of it. If we take a look at our household, in the den we have a Pentium 4 3.0 GHz desktop. In our family room, we've got a high-definition display. Up in the kids' room, we have an older Pentium III system. Up in the master bedroom, we have a pen-based computer with a TV, and then downstairs in the kitchen we have another pen-based computer. All of these devices are connected via 802.11.

So why don't we go ahead and start off in our den with the high-performance desktop, where our oldest daughter Suzie, home from college, is playing a game of Quake 3*. But this computer is actually providing more than just the processing for this game. It's also acting as the cornerstone of the family's personal entertainment network. Among other things, it's a smart PVR, or personal video recorder, so that each member of the family has control over what they want to watch and when they want to watch it. Let me give you an example of that. Let's say a member of the family wants to turn on the high-definition TV because he's planning the family vacation. So he's going to turn on his favorite travel show. He can select either prerecorded or live high-definition content. In this case, we happen to be using an iPAQ.

Next, if we go upstairs to the kids' bedroom, Bobby's getting ready for a baseball game, but he wants to record his favorite cartoon from The Cartoon Network*. He can go ahead and select his programming very easily using either his Pentium III system or another hand-held device. Now, I actually want to explain this GUI that you see. We've put it together to represent a four-video PVR. Basically, this interface controls playback, recording, and status for the multiple video inputs into your computer.

Next let's go downstairs to the kitchen and look at Dad. Dad is getting ready to prepare a feast for the family this evening, and he's got his home and garden show coming up. He's going to use his handheld device to both play back and record his cooking show so that he can archive his recipes. And then we're going to go upstairs to the master bedroom. Mom's getting ready for a business trip, but she also wants to record her favorite show from Tech TV.

So, as you can see, there are a lot of things going on here. We're actually recording three streams of NTSC video while we're recording high-definition TV. But we're also playing a game of Quake 3 simultaneously on the same system. This is the first public demonstration of this type of capability, and the performance of the processor makes it possible. In fact, Greg, why don't we go ahead and bring up our CPU monitor and see what kind of workload this is creating. You'll see we're using 100 percent of our Pentium 4 3 GHz processor to do this. Not only are we going to have more PCs and laptops, but we're also going to have MP3 players, PDAs, cell phones, wireless picture frames, and digital cameras. In addition, we're going to see completely new usage models for our home computers, such as unified messaging centers and gaming servers.

PAUL OTELLINI: Fantastic. Thank you very much.

VICTORIA: Thank you. (Applause.)
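The scenario Victoria walks through -- one fast desktop feeding recording and playback streams to several devices around the house -- is, at its core, an exercise in serving many concurrent I/O streams from a single machine. The sketch below is ours, not the demo's code: a minimal POSIX select() loop in C that accepts connections (stand-ins for the playback devices) and pushes data chunks to all of them from one process. The port number and chunk size are arbitrary.

    /* Minimal sketch (not the demo's code): one process serving data
     * chunks to many clients at once with select(), the way one PC
     * might feed several playback devices. POSIX; compile with cc. */
    #include <unistd.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void) {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(9000);               /* arbitrary demo port */
        bind(listener, (struct sockaddr *)&addr, sizeof addr);
        listen(listener, 8);

        int clients[FD_SETSIZE];
        int nclients = 0;

        for (;;) {
            fd_set rset, wset;
            FD_ZERO(&rset);
            FD_ZERO(&wset);
            FD_SET(listener, &rset);               /* watch for new devices */
            int maxfd = listener;
            for (int i = 0; i < nclients; i++) {   /* watch clients for writability */
                FD_SET(clients[i], &wset);
                if (clients[i] > maxfd)
                    maxfd = clients[i];
            }
            if (select(maxfd + 1, &rset, &wset, NULL, NULL) < 0)
                break;

            if (FD_ISSET(listener, &rset) && nclients < FD_SETSIZE - 1)
                clients[nclients++] = accept(listener, NULL, NULL);

            for (int i = 0; i < nclients; i++) {
                if (!FD_ISSET(clients[i], &wset))
                    continue;
                char chunk[1024] = {0};            /* stand-in for media data */
                if (send(clients[i], chunk, sizeof chunk, 0) <= 0) {
                    close(clients[i]);             /* drop a disconnected device */
                    clients[i--] = clients[--nclients];
                }
            }
        }
        return 0;
    }

A real PVR would pace each stream and move actual encoded video, of course; the point of the sketch is only that a single processor can multiplex many concurrent streams, which is what saturates the demo machine.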
PAUL OTELLINI: Just to recap, because I know that went pretty quickly: that was the world's first demonstration of our 0.13-micron version of the Pentium 4. It actually runs at three and a half gigahertz. It wasn't running quite that fast this morning; I was surprised. The home demo was done at 3 GHz. We've essentially taken yesterday's record machine of 2 GHz and upped it by a gigahertz and a half, and then, on a 3 GHz system, done things that were not that extreme -- not that extreme in terms of your imagination, things that you would do inside your home in a very, very rudimentary fashion -- and consumed every MIP that a 3 GHz processor could throw at it.

I think it's interesting to think about how these applications will evolve and how the processors will evolve. But if you go back to what I said earlier about delivering a total computing experience, making these machines intrinsically better to use, I think we need to do more. We launched 2 GHz yesterday. We demonstrated 3.5 GHz today. And we're convinced that the microarchitecture that the Pentium 4 embodies will be able to scale to 10 gigahertz over its lifetime. Gigahertz are necessary. They're necessary for the evolution and improvement of computing. But they're not sufficient. And what I would like to spend the rest of this discussion on is really moving beyond gigahertz: what other things we as a company and we as an industry need to think about doing to enable that better computing experience.

Of course, what we're talking about is based upon raw compute power. Over the life of our product line, we've been able to increase the CPU MIPS year after year, cycle after cycle, and we'll continue to do so. Over the history of our microprocessor architecture, we've embodied new architectural elements into the processor to enhance its overall capability, going back to the early days of the 386 and the 486, where we integrated cache and floating-point units to increase the utility of the microprocessor. And then in later eras, we put in multiple execution units, new instructions for multimedia, new instructions for video, and so forth. We'll continue to do those things moving forward, and that's much of what IDF is going to be about this year.

Another thing we do is drive platform capability beyond the processor. Again, our history here is very robust in terms of platform standards, like PCI, AGP, USB, and the IAPC that we've developed in partnership with companies like Microsoft. This year at IDF, you'll see a tremendous focus on incremental platform capabilities, new capabilities to move the platform forward. In particular, in the keynotes tomorrow and on Thursday, you'll hear more about USB 2, AGP 8X, InfiniBand™, Serial ATA, and 3GIO.

What I would like to focus on in the rest of my talk this morning is incremental processor capabilities beyond gigahertz. Three particular areas I wanted to cover: what we're doing in the Itanium architecture, what we're doing in low power, and what we're doing in driving increasing degrees of parallelism into all of our microprocessors. To make this happen, though, all of us in this room and in this industry need to change the pattern of our investments. For 20 years, we have been focused religiously on delivering and taking advantage of that next megahertz.
As we go forward and start focusing on the compute experience, we need to start thinking beyond that megahertz, thinking about how we can build substantially better computers to meet the goals of the people in the video. Intel's investment in this area is varied, and today I wanted to focus on three particular areas. The first one is a new technology that is code-named Banias. Banias is focused on the promise of very high-performance mobile computing at very, very low power, to be able to give you a much longer battery life.

Low power, though, is not only associated with mobile computing. Increasingly, it's a factor in every form factor of computer out there. In servers, the model is moving towards increasing compute density. If you look at the mapping of server types between now and 2005, by the time we get to 2005, two-thirds of all the server units built will be either rack-optimized or blade designs. Those form factors will require increasing compute density and increasingly aggressive low-power techniques. Similarly, on the desktop, if you look at the evolution of the form factor from the standard desktop to small-form-factor desktops, the expectation is that by 2005, 50 percent of the volume will be small form factor. This requires us to take a different kind of design approach than we have taken historically towards the entire computer. But it is in notebooks, where power matters the most and form factors matter the most, that this issue is most important. In fact, if you look at the growth of the mobile market, it's expected to grow much faster than the desktop market over the next five years, but virtually all of the growth is in the thin-and-light and mini form factors. And the thin-and-light form factor is itself an evolving form factor, becoming ever thinner, ever lighter, ever longer lasting.

So with these trends in mind, we started to think about what we could do as a company to advance the state of low power. For over a decade, Intel has introduced more than a dozen technologies that address the need to lower power for power conservation. They tend to fall into two areas. The first comprises technologies focused on lowering power to give you a smaller form factor -- the so-called thermal power, the absolute maximum power a device would consume in a given environment. The second vector has to do with delivering lower power for longer battery life -- the average power equation -- to get the maximum battery life with the maximum performance in a given device. We've introduced a number of technologies like voltage scaling, voltage reduction, and clock gating, to be able to turn circuits on and off when they're not used. As I said earlier, ACPI does a lot of the power management of notebooks. Going forward, we are at this conference starting to talk about a number of new technologies that will be embedded in this next-generation microprocessor for notebooks called Banias: micro-ops fusion; special sizing techniques for circuits that allow us to size them exactly for the power that will be needed; and much more aggressive clock gating, which we'll talk about in a minute.

The problem that we are trying to address with this next-generation mobile processor is really how to break through the so-called power wall. We know how to deliver increasing performance with increasing power.
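Clock gating is easier to see with numbers. The following toy simulation is our illustration, not Banias circuitry; the per-unit power figures and instruction mix are made up. It compares a design where every unit receives the clock every cycle against one where only the units an instruction actually needs are clocked:

    /* Toy model of clock gating: an ungated design clocks (and powers)
     * every unit every cycle; a gated design clocks only the units the
     * current instruction needs. All figures are invented. */
    #include <stdio.h>

    enum { FETCH, DECODE, INT_ALU, FPU, CACHE, NUM_UNITS };

    static const double unit_power[NUM_UNITS] = { 1.0, 1.5, 2.0, 4.0, 3.0 };

    /* Which units each instruction in a made-up mix touches. */
    static const int needs[3][NUM_UNITS] = {
        { 1, 1, 1, 0, 0 },   /* integer add: fetch, decode, ALU        */
        { 1, 1, 0, 0, 1 },   /* load: fetch, decode, cache             */
        { 1, 1, 0, 1, 0 },   /* floating-point op: fetch, decode, FPU  */
    };

    int main(void) {
        double gated = 0.0, ungated = 0.0;
        for (int cycle = 0; cycle < 3000; cycle++) {
            const int *n = needs[cycle % 3];
            for (int u = 0; u < NUM_UNITS; u++) {
                ungated += unit_power[u];       /* every unit clocks     */
                if (n[u])
                    gated += unit_power[u];     /* only needed units do  */
            }
        }
        printf("ungated %.0f, gated %.0f: %.0f%% dynamic power saved\n",
               ungated, gated, 100.0 * (1.0 - gated / ungated));
        return 0;
    }

With this invented mix, gating clocks less than half the hardware on an average cycle -- the "switch off the lights you aren't using" idea Mooly Eden describes in the video clip later in this section.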
The job in mobile computing for much of the last decade has been how to deliver increasing performance while constraining power, to deliver the compute experience that notebook users really wanted. And as we looked at Banias, we knew we could deliver substantially higher performance than we have today through a number of architectural enhancements, the classic microarchitectural approach that Intel has used for generation after generation. But that was insufficient. It would yield a power curve greater than what we thought notebooks needed in that time frame. So we started looking at other techniques -- in microarchitecture, logic, and circuits -- that would allow us to deliver that maximum performance, but at a substantially lower overall power. That product, as I said earlier, is called Banias. What I'd like to do now is run a short clip showing the design manager for the Banias project describing the product and its specifications in a little more detail.

(Video playing.)

MOOLY EDEN: Hi, my name is Mooly Eden. I'm Intel Israel's general manager, and I'm excited to have this opportunity to talk to you at IDF. Banias is a microprocessor architected specifically for the mobile segment -- namely, higher performance at lower power. Let's dive into the microprocessor and have a sneak peek at some of its attributes. Only the units needed to execute a specific instruction are operating; all the rest of the chip is sleeping. If we need the execution unit, it works. If we need the cache, it's working. All the rest are not consuming any power. To take an analogy from a house, we switch off all the lights that we do not need. The only difference between Banias and the house is that in Banias, it's done automatically.

Let's look at the circuits. While designing Banias, we looked at all the circuits, and each one found to draw more power than actually required we downsized to reduce power consumption, enabling much longer battery life. And in the architecture domain, to take an example, we introduced micro-ops fusion. Rather than executing all the micro-ops one at a time, we fuse them together at the front end so we are required to execute fewer of them. To make another analogy: if you are trying to get from Santa Clara to San Francisco, there are two ways to expedite the trip -- either drive the car faster or shorten the distance between Santa Clara and San Francisco. We took the second approach. We fuse the instructions together, so we need to execute fewer of them. For more information, stay tuned.

PAUL OTELLINI: Since I live in San Francisco, I can't wait for Mooly to shorten the distance between the two cities.

The next area I wanted to move to is Itanium, and in fact, to talk a little bit about the requirements for the enterprise and enterprise computing. From our perspective, there are three fundamental requirements driving enterprise computing today. The first, of course, is performance. This is one of the few segments out there with an insatiable demand for performance, to process the enormous amounts of data that corporations and businesses worldwide are collecting, whether it's databases or mining or transaction processing or supply-chain management. IT loads are a function of a business' overall data, and that data is growing almost exponentially, day in and day out. So we continue to need very, very high-performance machines here.
The second feature we need is scalability: the ability to scale out and scale up, to add overall capacity to meet a given business' needs. And I think it's interesting to reflect on what enterprises need, having lived through the boom-and-bust period of the dot-coms. One way to do that is to go back ten years, look at a typical enterprise configuration, and compare it with today's. Ten years ago, people focused on inventory and manufacturing data and finance; today you see the computing environment in a given enterprise encompassing much, much more: HR data, business intelligence data, supply-chain management data, online sales, and so forth. The trend towards e-Business is not dead. It's continuing at a very robust pace. All businesses will move to e-Business, in our view, because it is quite simply the most efficient way to do business in the long run. Scalability of your enterprise computing environment is critical to that movement.

The third thing that enterprise computers need is availability. If you look at the task of an IT manager, managing his servers and keeping them up 24 hours a day, 7 days a week, 52 weeks a year is job one. Even for business servers whose requirements are 5 to 10x below those of enterprise servers, they need to be up all the time.

Intel has been working on an architecture, embodied in a product called Itanium, that is meant to address the computing needs of the enterprise going forward. In terms of performance, we have a number of technologies that we'll talk to you about at this conference, including multithreading and multicore capabilities in the product line. But one of the things we did in the early architecture of Itanium was to design it to seamlessly add execution units as transistor budgets allowed over time. What do I mean by this? If you go back and look at a block diagram of the first Itanium processor, the processor that was code-named Merced, that processor had the level 3 cache sitting outside the microprocessor because we were constrained on transistor budget. It had nine -- I'm sorry, it had four integer units, and a number of instruction issue ports, nine of them, built into the processor, and it gave us relatively high performance. The second generation of the processor is code-named McKinley. For McKinley, we incorporated the level 3 cache onto the die. We took the issue ports up from nine to 11, and we took the integer units up from four to six, a 50 percent increase. The total transistor count of McKinley is 240 million transistors. But we looked forward, in terms of Moore's Law, to what capabilities we will have, and it's very clear to us that not only can we continue to increment various units on the microprocessor; we can also put multiple cores onto a given piece of silicon to get optimal enterprise-level performance.

The second thing we are focused on in this architecture is scalability. We've talked in the past about our I/O substructures, and you'll hear more about InfiniBand on Thursday. I think one of the best ways to measure the scalability of the Itanium architecture is to look at what people are doing with it. In the last couple of weeks, the National Science Foundation announced a $53 million contract with four U.S. research institutions to build the first tera-scale facility. This is a distributed facility for supercomputing that will deliver 13.6 teraflops, the world's largest and fastest computer.
This machine will be comprised of over 1,000 servers supplied by IBM Corporation, and those servers will embody over 3,300 McKinley processors delivering this performance. Scalability par excellence.

The third feature of the architecture is availability. We put quite a bit of time into the reliability aspects of this machine. We designed in new features, which we call the machine check architecture, that provide not just error correction and detection but logging as well, giving IT managers incremental knowledge as to where their systems are failing -- even though they never really quite fail -- and how to address overall reliability going forward. We're very excited about much of this technology, and I'd like to bring out Dave and Scott now to show you how this actually works in McKinley.

DAVE: Hi. Thanks. At the last IDF, you showed how, only a few weeks after first silicon, McKinley systems were able to boot three operating systems. Now, those were single-processor systems running basic applications. Here today, we have a four-way-capable McKinley system built around the 870 chipset. It's running the SAP server application on IBM's DB2 database, on Windows Advanced Server Limited Edition*. Clearly, we've come a long way in the last six months.

PAUL OTELLINI: Absolutely.

DAVE: Over here, we have a Pentium 4 processor client that's running the mySAP* GUI on Windows XP*. Taken together, these two systems can perform end-to-end SAP transactions. I'd like to direct your attention to the screen on the right. We're putting a massive load on this database. What you'll see when this begins is called a table reorg. This is something an IT manager or a system administrator might do once a month, and it's very system intensive. The chart on the bottom shows that once it's begun, it starts kicking the system's resources into high gear. Also, I'd like to direct your attention to a field called "logical reads." We're going to go ahead and reset this to zero. We'll come back to it a little later on to show just how much progress we've made on the database.

Okay. We're going to use this stack to demonstrate some of the advanced capabilities of the Itanium processor architecture. This is a capability that allows very tight cooperation between the processor, the system firmware, and the operating system to provide intelligent error management. We're going to throw some problems at the system to see how it reacts, and that will be a good test of our machine check architecture. We're going to do this using these ribbon cables, which are attached on one end to our test system that's injecting the errors, and on the other end to our server system.

I'd like to direct your attention back to the right-hand screen. This is a graphical depiction of the various parts of our server system that we're going to inject errors into: the processors, the system bus, the chipset, as well as some of the major interconnects. On the screen on your left, you will see the output of the server itself. It's going to stay on the server's output as we inject these errors, and it will show them being corrected just as the operating system and the system administrator would see them.

All right. Now the fun starts. Let's begin by throwing some memory errors at the system. On the graphical interface over there on the right-hand side, you can see what happens.
The error gets corrected before it hits the processors. And on the left-hand side of the screen, you can see that not only has the system handled this error, but it has actually shown it to the operating system and on the screen for the system administrator to see. As you can tell, it knows all the details of exactly where the error occurred. This was just a simple error that got detected by the hardware. If there were a more complex error, the operating system would actually be enlisted to help facilitate the recovery. So, proof positive that we can detect and correct, and the operating system can see it.

Now, we have several other errors we can show. We're going to inject a front-side bus error, an error in the processor cache, and an error in the scalability port -- all at once. We're going to bombard the system so you get more proof of what's happening. One thing I wanted to point out: on the scalability port, we have an additional mechanism called link-level retry. When it finds a bad packet on the scalability port, it will actually retry it to get clean data. This is very important as we scale our systems with multiple processors, because we have many, many exposed interconnects, and now we have a secondary level of detection and containment there.

Great. I think you get the idea. You've seen what we've done here. Let's try something a little different. Now, McKinley systems have redundant, hot-swappable fans, so it's very unlikely you're ever going to face this situation. But for the sake of argument, let's say multiple fans fail. How would the system react? We're going to use this device over here, which is actually connected to one processor, to show exactly that scenario. We've set a threshold of 50 degrees Celsius just to save a little bit of time, and as we raise the temperature on this processor, you'll see some activity on both screens. First, on this screen, you have an issue-rate monitor that reads 100 percent when we are getting the instruction throughput that we would expect, given the load we've put on the system.

Now let's go back and actually raise the temperature past 50 degrees. You can see that it's starting to rise -- that's the red numbers on the top. As they get closer to 50 degrees, you will see something on both screens. It's almost there. On the left-hand screen, you'll see that the system has detected an abnormal condition, in this case a thermal condition, and it has also reported that the processor has gone into a thermal throttle mode. Basically, the processor has shut off some of its pipelines to compensate. In most cases, without this capability, the system could fail. But McKinley has a feature called enhanced thermal management that notifies the system and takes action to keep it up and running. You can see that the throughput has gone down a little bit; it is no longer at 100 percent. So there's been some compensation there.

Now, let's go back to our client screen and see what's happening with the database while all of this is going on. First of all, you can see that the chart is still moving, so we're probably still processing database transactions. But let's verify that. Let's go to our logical reads, refresh, and see just how many have occurred. As you can tell, we have had hundreds of thousands of logical reads while all of this has been happening.
PAUL OTELLINI: Now, why don't we do that one more time, refresh it, just so you can see it's actually still running, still doing logical reads. And you can see we've had several thousand more in just the few moments since we last looked. So what does this show us? The client and the server are still attached, still able to talk to each other, still up and running, and still able to perform transactions. What this means is that we can continue running our business even as the IT department works to get the system back up to 100 percent efficiency.

DAVE: So, Paul, this is demonstrating strong compatibility between the Itanium processor and McKinley. And this is just a small set of the various data integrity capabilities built into the Itanium processor family that provide the framework for advanced performance, scalability, and reliability functions. All of this is built on an extensible machine check architecture that provides maximum customization capability for our customers.

PAUL OTELLINI: Fantastic. Thanks.

DAVE: Thanks, Paul. (Applause.)

PAUL OTELLINI: As you can see, we've come an awfully long way in the six months or so since we've had first silicon on McKinley. This architecture just keeps getting better. It's really starting to flesh out, to deliver on all of the promises -- reliability, availability, performance, scalability -- that we architected into it when we first set out.

The last area I'd like to talk about in terms of moving beyond gigahertz is a new technology from Intel called Hyper-Threading. What Hyper-Threading is really all about is our continuing quest to improve performance through increasing degrees of parallelism. Now, we've seen how this works in servers for a number of years. Many, many servers are multiprocessor, and most operating systems and most applications there are already threaded to take advantage of multiprocessor systems; the applications take advantage of the performance that can be thrown at them. We're also starting to see a very large number of workstations move towards a multiprocessing model; their applications, too, take advantage of the kind of compute power that multiprocessing can bring. The question is when parallelism and multiprocessing will move to the desktop, which has a very severe cost constraint embedded into it. Well, think back to the application usage that I described earlier. While few of the operating systems and virtually none of the applications on the desktop today are multithreaded, you can start imagining scenarios where you have compute-intensive tasks running in parallel that would require either a second processor to deliver the performance, or incremental performance out of the processor that's there. Think back to the video and that kid talking about running 20 things at once on his PC and then having it freeze up. How do we deliver him more performance?

Intel has been working on parallelism in our architecture for many, many years, and what we are really looking at now is the evolution of that technology. First-generation parallelism was embedded in the 486. It was instruction-level parallelism, and the 486 gave us one instruction per clock. As the technology moved on, we moved to superscalar machines with the Pentium, Pentium II, and Pentium III architectures, and we were able to go to two and three instructions per clock, still using instruction-level parallelism. But in that second generation, we also brought into the architecture the capability for thread-level parallelism, in the Pentium Pro.
It was the first time our processors were able to run in SMP, or symmetric multiprocessing, mode, where you had one thread per processor, still at three instructions per clock. Now, as we move into the third generation, instruction-level parallelism is still increasing. The Pentium 4 architecture gives us three instructions per clock, but with substantially more frequency headroom to get more performance out of every given cycle. With the Itanium family and the EPIC architecture, we introduced a different degree of parallelism. It's called explicitly parallel instruction computing, and what it focuses on is delivering compiler-optimized parallelism, to deliver up to six instructions per clock in an optimized environment.

So the question is: can we start bringing the advantages of this very high degree of parallelism down to lower price points in the marketplace? And in fact, that is exactly our plan. We have a product code-named Foster that will be introduced next year as a Xeon product. It is based on the Pentium 4 microarchitecture, but built into that microarchitecture are increasing degrees of thread-level parallelism. By that I mean that we create an environment where we can essentially deliver two threads per processor, so that software looking at a single microprocessor will see it as two separate processors -- two virtual processors -- and deliver parallel instructions to it, increasing the overall performance. We call this technology Hyper-Threading. In layman's language, Hyper-Threading is simply multiprocessing, or degrees of multiprocessing, on a chip. In the past, if you needed or wanted the performance of multiprocessing, you put two or more processors into a given system, and you had to put the dedicated resources around them to take advantage of those processors. With Hyper-Threading, you can now use this technology in different kinds of systems. For systems that require the maximum performance, you can have multiprocessing with Hyper-Threading. For systems that are focused on a cost point, you can deliver higher performance with a single microprocessor using shared resources. Multiprocessing on a chip.

How does this work? Well, at the end of the day, it is really all about resource utilization. Earlier I talked about the superscalar architecture embedded in the Pentium III. The Pentium III was a three-instruction-per-clock machine. So if you consider that this block on the left-hand side of the screen represents a row of execution units with three instructions running every clock, then every time there's a dark area, it means that something couldn't execute: either the data wasn't there because the cache wasn't filled, or the software wasn't optimized to capture every cycle that was available to it. In superscalar multiprocessing, we put two of those machines next to each other. We're able to get up to 2x the performance as the code is optimized, but you're spending twice the resources. With Hyper-Threading, you get much of the benefit of that multiprocessing environment -- being able, again, to address a single processor as if it were two machines and have multiple parallel instructions executed -- and you end up getting more net instructions per cycle than you would have ordinarily. And lastly, to complete the analogy, you can indeed create multiprocessing systems with Hyper-Threading and get the best of both worlds: absolutely higher performance in the overall environment.
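What "threading your application" means in practice is worth a concrete look. The sketch below is a generic POSIX-threads example of ours, not Intel code: it splits an independent workload across two threads, and on a Hyper-Threading-enabled processor the operating system, seeing two logical processors, can schedule one thread on each.

    /* Generic two-thread workload split (compile with: cc -pthread).
     * On a Hyper-Threading-enabled CPU the OS reports two logical
     * processors and can run these threads in parallel. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    static double data[N];

    struct slice { int begin, end; double sum; };

    static void *partial_sum(void *arg) {
        struct slice *s = arg;
        s->sum = 0.0;
        for (int i = s->begin; i < s->end; i++)
            s->sum += data[i];
        return NULL;
    }

    int main(void) {
        for (int i = 0; i < N; i++)
            data[i] = 1.0;                   /* dummy workload */

        struct slice halves[2] = { { 0, N / 2, 0.0 }, { N / 2, N, 0.0 } };
        pthread_t t[2];

        for (int i = 0; i < 2; i++)          /* one thread per half */
            pthread_create(&t[i], NULL, partial_sum, &halves[i]);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);

        printf("sum = %.0f\n", halves[0].sum + halves[1].sum);
        return 0;
    }

The same binary runs unchanged on a true dual-processor system, which is the point Otellini makes about Hyper-Threading presenting one physical chip to software as two virtual processors.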
Now, seeing is believing. In this case, we're actually able to show you the technology today, and to do that, let me bring Brad out. Good morning, Brad, how are you?

BRAD: Good. How are you?

PAUL OTELLINI: Good.

BRAD: What I'd like to show you this morning is an example of how Hyper-Threading technology will give us an improvement in the enterprise arena and will actually allow more users to be connected. To do this, we are going to use Hyper-Threading technology. As you mentioned just a few moments ago, Hyper-Threading technology enables a system to act as though there's more than one processor there. What that means is that this system right here will really be able to take full advantage of multithreaded applications.

If we look at this screen right here, we can see the two performance metrics that we're watching: processor utilization, and, on the left here, connection attempts per second. So while the audience is reeling from this stunning graphical display of server performance data, let me tell you what's going on. Here we can see that both processors are quite heavily utilized. But the important takeaway is that the connection attempts per second on the Hyper-Threading-enabled system are 30 percent above those of the non-Hyper-Threading-enabled system. That's a significant difference, especially if you apply it across a large organization.

Next, I'd like to show you an example of Hyper-Threading on a workstation. Here I have two identical Xeon processor-based systems. The machine on the left here is Hyper-Threading enabled. The machine over here has Hyper-Threading disabled. Looking at the CPU task meter, we can see this machine indicates that two processors are present, whereas over here, we can see that only one is present. Now, what I've loaded on both of these machines is a high-end digital content creation tool called Maya, by Alias Wavefront. This application is used by digital content creation artists for Hollywood movies, broadcast-quality videos, even high-resolution game development. Let's see how fast each of these machines can render a lifelike object in real time. Again, this is the Hyper-Threading-enabled machine, and we can already see it starting to pull ahead of the machine over here. In fact, here, too, we see about a 30 percent improvement in overall rendering capability. For graphic artists, that's a significant difference. And one last important point I'd like to make: this technology is fully compatible with existing multiprocessor-aware applications. Software applications that are already multithreaded will be able to take advantage of Hyper-Threading on Hyper-Threading-enabled systems.

PAUL OTELLINI: That's fantastic. This is a very exciting technology, and I think it represents one of the frontiers of microprocessor development. You'll see more and more use of it throughout the Intel product line, starting first in servers and workstations next year; ultimately, we'd love to move this technology onto the desktop for the very obvious benefits that I described earlier.

Now, where does all this leave us? Let's get back to Mr. Kettering's axiom. We've got to build great products that people want to buy. In many ways, this is exactly the essence of IDF, especially now, in a difficult economic environment for our industry. Growth for us as an industry is not a given. We have to earn every incremental unit that we sell into the marketplace.
Intel is going to continue to invest heavily in providing you, and bringing to the market, the templates for growth. Over the next few days, you'll see a microprocessor road map that is nothing less than stunning in terms of our ability to take these new technologies and deploy them into a variety of segments over the next couple of years; chipsets that take advantage of all of these processors and personalize them for the compute tasks at hand; platform initiatives to improve the overall state of the computing environment; and the tools and compilers and infrastructure that you as developers all need to build better products.

So I think in all of this there are significant opportunities for you as co-developers of great products with us. The first one is very simply to take advantage of the raw MIPS. We introduced 2 GHz yesterday. You saw three and a half gigahertz today. Four gigahertz is on the horizon. We will continue to throw more and more processing power at the users. It's also important for those of you in software -- and 60 percent of the classes at this forum have to do with software development -- to begin to innovate around these increasing degrees of parallelism in the multiple architectures I described. Thread your applications, your drivers, and your operating systems to take advantage of this relatively free performance. There are a number of new technologies coming out for mobile computing; we'd like to see you utilize them to deliver on the promise of anytime, anywhere, seamlessly connected wireless computing that lasts all day and beyond. Lastly, I'd like to ask you to help us address what I think is the single largest growth opportunity for our industry: the buildout of the enterprise for e-Business, particularly on the back of the Itanium processor family.

It's very clear to me that we have a tremendous opportunity in front of us and that together we can build great products and reignite the growth that the industry so badly needs. With that, I'd like to thank you this morning and bring out our key partner in helping build great products, a man who represents Microsoft Corporation: Jim Allchin. (Applause.)

----- Supercomputing Online thanks the Intel Corporation for allowing us to bring this transcript to our readers. Visit www.intel.com -----