Issue 4, 18th December 1995: RISC's New Foe, VLIW
So why is this, and what is VLIW?
In the older, CISC (Complex Instruction-Set Computer) style, the microprocessor provides complex instructions to support high-level language features as directly as possible, such as array indexing with bounds checking. Experience has shown that this has tended to make compilation of high-level languages more rather than less difficult, because it is not always clear when it is more efficient to use one complex instruction instead of several simpler ones, and so compiler writers often opted for the simpler ones. The problem is, the whole CPU is running slower than it otherwise might in order to be able to support the most complex instructions, so even if you don't use them, you are still paying for them. The most widely-used CISC family of general-purpose CPUs is almost certainly Intel's x86 range, now several generations on from the initial 16/8-bit 8088, itself based on the 8-bit 8080 and before that the 4-bit 4004. It is a credit to Intel's ingenuity, technical skill, and the huge amount of money it has spent on the development of this line, that given the CISC handicap, and complete backward binary compatibility of code to the 8086, the Pentium Pro is still up with the front runners in CPU technology. But Intel realises that it is on that slippery slope of diminishing returns. Keeping up with the leaders will cost progressively more with CISC than with other architectures, and ultimately that will drive customers away.
RISC (Reduced Instruction-Set Computer) architecture is nominally a particular style of CPU design involving register windowing and a reduced complexity of CPU compared to CISC designs; in practice it is the reduction in complexity that is important, because that allows all instructions to run somewhat faster, and thus you get more bangs for your silicon buck. The downside is that the compiler writers need to work harder, so the complexity moves into the software. But the compiler's choices are simpler, and so usually the compiler can do a better job of making use of the CPU's capabilities than with CISC, and, in general, RISC systems do seem to do better than equivalent-cost CISC systems in terms of performance. However, RISC CPUs such as Sun's SPARC are now having to resort to all sorts of tricks to run fast enough, such as running multiple instructions in parallel using `superscalar' techniques, and the latest SPARC family to be found in the new Sun ``Ultra'' contains some very CISC-like instructions for assisting with MPEG image decompression, for example. So even RISC is not RISCy enough, it seems.
So, step up to the microphone, VLIW. The concept of VLIW has been around for a while. There have even been (semi)-commercial applications such as the Multiflow Trace, so this is no headline-friendly vapourware.
VLIW essentially takes the main RISC concept even further---move more complexity from the hardware to the compilers that generate code for it, and allow more parallelism while you're at it. The basic principle is to provide even simpler and more primitive operations for the compiler to work with, which will be almost at the level of the microcode of some CISC designs. The compiler will instruct the CPU to open these connections and latch this result, rather than add two registers together and store the result in another register. In order to do this the compiler has to have an intimate understanding of how the target processor works, but in return can pull every internal lever and push every button simultaneously, if clever enough, to have as much of the CPU active at once as possible, thus delivering the maximum performance possible. To achieve this the compiler needs to push and pull an awful lot of levers at once, hence the long instruction word to accommodate all the `lever' control signals. But the argument is that it is ultimately better to do this in software where exploiting a new and better performance-boosting idea (or possibly, fixing a bug) requires only a code change, not a new release of the hardware. The bigger potential problem is that the intimate knowledge of the CPU is only good for one release of the CPU, and there will not be full binary compatibility between one generation and the next such as Intel have achieved with the x86 range and Sun with the SPARC range, for example. And users do not want to have to buy new copies of their software every time they buy a faster chip. This is one reason that Intel has customer loyalty, whereas Sun got into hot water when they moved from the Motorola-based Sun-3 family to SPARC.
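To make the `lever-pulling' idea a little more concrete, here is a minimal sketch in Python of the job the compiler takes on: packing independent primitive operations into fixed-width instruction words so that as many units as possible are busy in each cycle. Everything in it is an assumption made for illustration: the function name `pack_bundles`, the operation names, and the three-slot word width are hypothetical, and a real VLIW compiler would also have to model latencies and which functional unit can execute which operation.

```python
from typing import Dict, List, Set


def pack_bundles(ops: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedily pack operations into instruction words of at most `width` slots.

    `ops` maps each (hypothetical) operation name to the set of operations
    whose results it needs. An operation may only be issued in a word that
    comes after the words containing all of its dependencies.
    """
    done: Set[str] = set()
    remaining = list(ops)  # preserve the caller's order so output is deterministic
    bundles: List[List[str]] = []
    while remaining:
        # Operations whose inputs are all available can issue in this word.
        ready = [op for op in remaining if ops[op] <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in done]
    return bundles


# A toy program: compute (a + b) and (c * 2), then store a result using both.
program = {
    "load_a": set(),
    "load_b": set(),
    "load_c": set(),
    "add_ab": {"load_a", "load_b"},
    "mul_c2": {"load_c"},
    "store": {"add_ab", "mul_c2"},
}

WIDTH = 3  # assumed number of slots in one long instruction word
for cycle, bundle in enumerate(pack_bundles(program, WIDTH)):
    # Slots the compiler cannot fill are padded with explicit no-ops.
    print(cycle, bundle + ["nop"] * (WIDTH - len(bundle)))
```

With three slots the six operations fit into three words; with a single slot the same program needs six. Note also that the schedule is tied to the assumed width, which is exactly the binary-compatibility worry raised above.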
There are two ways that this problem can be tackled. Firstly, by providing a subset of operations guaranteed to remain compatible for a few chip generations. Portable code can use these instructions and, for example, a booting operating system can use these while figuring out what sort of CPU it is running on. The other possibility is the use of a portable interpreted code, possibly compiled on the fly, such as Java, Sun's new networkable, multithreading, C++ variant. Although visible primarily as a Web-programming language right now, the possibilities for use as a general portable language to be run on whatever your local machine happens to be look very good. Download your new version of Microsoft Word in Java form from the Net (paying for it in e-cash, naturally), and run it on your new VLIW machine alongside your new financial-modelling software, also in Java. This sort of application means that binary compatibility is almost irrelevant, providing your interpreter/compiler can generate the appropriate binary if top speed is required (and maybe it won't be for a word-processor).
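The portable-code route can be sketched with a toy stack-machine interpreter in Python. This is not Java bytecode, and the three-instruction set (`push`, `add`, `mul`) and the function name `run` are invented purely for illustration; the point is only that the program is shipped as data and any machine with an interpreter for it can run it, whatever the native instruction word looks like.

```python
from typing import List, Tuple

# A (hypothetical) portable instruction is an opcode plus optional operand.
Instr = Tuple


def run(program: List[Instr]) -> int:
    """Interpret a tiny stack-machine program and return the top of the stack."""
    stack: List[int] = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    return stack.pop()


# (2 + 3) * 4, expressed once in portable form.
portable_program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
print(run(portable_program))  # prints 20
```

A compile-on-the-fly system would translate the same portable program into native instructions for the local CPU instead of interpreting it step by step, trading start-up time for speed.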
There are other ways of implementing some or all of the VLIW concept. For example, you can glue a large number of existing processors together and have the compiler generate instructions to operate them all at once. This seems to be what HP has in mind. You use some sort of barrier synchronisation technique to get all your different processors in sync where the compiler cannot tell exactly how long a sequence of instructions will take on any one of the processors.
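As a rough software analogy for barrier synchronisation, the sketch below uses Python threads to stand in for the separate processors. Each worker does a different amount of work per phase, then waits at a barrier until every other worker has arrived before any of them starts the next phase. The function `run_phases` and the workload are assumptions made for illustration; `threading.Barrier` is the standard-library primitive, and real hardware would of course do this with something much cheaper than operating-system threads.

```python
import threading
from typing import List, Tuple


def run_phases(n_workers: int, n_phases: int) -> List[Tuple[int, int]]:
    """Run `n_workers` threads through `n_phases` lock-step phases.

    Returns a log of (phase, worker_id) pairs in the order the work finished.
    """
    barrier = threading.Barrier(n_workers)
    log: List[Tuple[int, int]] = []
    log_lock = threading.Lock()

    def worker(worker_id: int) -> None:
        for phase in range(n_phases):
            # Uneven work: the "compiler" cannot know how long each one takes.
            sum(range((worker_id + 1) * 10_000))
            with log_lock:
                log.append((phase, worker_id))
            # Nobody proceeds to the next phase until everyone has arrived.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_phases(n_workers=4, n_phases=3)
phases = [phase for phase, _ in log]
# Within a phase the order varies, but phases never interleave.
print(phases == sorted(phases))
```

The order of workers within a phase differs from run to run, but because of the barrier no entry for a later phase can appear in the log before every entry for the earlier one.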
Expect to see VLIW technology in top-end workstations and compute servers from all the major (UNIX) manufacturers over the next five years, and consider brushing up those Java skills to program them. I hope to take a look at Java in a couple of issues' time. In the next issue I will examine multi-threading.
See the glossary entry for some further VLIW links.
Damon Hart-Davis, Computing Editor
dhd@exnet.com.
15--16, London, UK. Enabling Networks for Internet Access. Tel: +44 171 610 4533.
17--18, London, UK. Accessing the Internet. Tel: +44 171 610 4533.
22--26, San Diego, CA, USA. USENIX 1996: Annual Technical Conference. Everything you wanted to know about UNIX from all your UNIX heroes!
22--23, Dublin, Ireland. Accessing the Internet. Tel: +44 171 610 4533.
24--26, Braga, Portugal. EUROMICRO: Fourth EUROMICRO Workshop on Parallel and Distributed Processing.
22--23, San Diego, CA, USA. NDSS '96: Network and Distributed System Security.
15--19, Honolulu, Hawaii. HiNet '96: Second International Workshop On High-speed Network Computing. IPPS '96: Tenth International Parallel Processing Symposium.
6--9, San Jose, CA, USA. ATM '96.
13--16, Budapest, Hungary. JENC7: 7th Joint European Networking Conference.
23--24, Antwerpen, Belgium. Third International Workshop On Community Networking.
27--28, Philadelphia, PA, USA. IOPADS: Fourth Annual Workshop on I/O in Parallel and Distributed Systems.
5--7, London, UK. UKCMG UK Independent IT User Forum, contact by mail for more info.
10--12, Boston, MA, USA. Second IEEE Real-Time Technology and Applications Symposium (email).
12--14, L'Aquila, Italy. Eighth Euromicro Workshop on Real-time Systems. (Mail for more info.)
17--22, Boston, MA, USA. ED-MEDIA '96, ED-TELECOM '96: World Conference on Educational Multimedia and Hypermedia and World Conference on Educational Telecommunications. (Mail AACE.)
10--13, Monterey, CA, USA. Fourth Tcl/Tk workshop.
29--2 Aug, Morgantown, WV, USA. Software Reuse Conference.
31--3 Aug, New Brunswick, NJ, USA. CAV '96: Computer-Aided Verification.
26--30, Poitiers, France. Eurographics '96: Graphics, Virtual Reality, Graphics Highways.
27--29, Lyon, France. Euro-Par'96 Workshop #5: Parallel Languages and Programming.
3--6, Boulder, CO, USA. VL '96: IEEE Symposium on Visual Languages.
16--20, Berlin, Germany. PARCELLA '96: Seventh International Workshop on Parallel Processing by Cellular Automata and Arrays.
25--27, Dijon, France. PDCS'96: Parallel and Distributed Computing Systems.
9--11, Bologna, Italy. WDAG-10: 10th International Workshop on Distributed Algorithms.
16--19, San Francisco, CA, USA. WebNet-96: World Conference of the Web Society.
15--19 April 1996, Honolulu, Hawaii. HiNet '96: Second International Workshop On High-speed Network Computing.
6--9 May 1996. ATM '96. Send proposals to the Technology Transfer Institute.
10--13 July 1996, Monterey, CA, USA. Fourth Tcl/Tk workshop.
2--4 September 1996, Connemara, Ireland. Seventh ACM SIGOPS European Workshop: Systems Support for Worldwide Applications.
16--20 September 1996, Berlin, Germany. PARCELLA '96: Seventh International Workshop on Parallel Processing by Cellular Automata and Arrays.
16--20 November 1996, Cambridge, MA, USA. CSCW '96: Cooperating Communities. (Mail Mark Klein.)