Issue 5, 15th January 1996: Multithreading
Firstly, multithreading can help a program doing lots of complex I/O activity. Where in the bad old days you would have used the select() system call (or the X event loop of a typical windowing application) and a lot of sleight-of-hand, you can now stop explicitly multiplexing all the I/O through that one point in the program and instead have a number of daemon threads, one to handle each stream of I/O activity. These threads run inside the same address space as your main program, and safely exchange data with the main program and one another using mutex locks and other such concurrency-control mechanisms. Because the threads share one address space (the same process from the UNIX point of view) the operating system makes no effort to protect them from one another. That saves an awful lot of effort: they were all written by the same person, are all `on the same side' as it were, and neither need nor want protecting from one another, so exchanging data between threads is efficient. If you want to run something like an HTTP daemon for the Web, you want the servicing of each request to be as cheap as possible, and starting a thread is generally much cheaper than starting a process (the CERN server, for example, fork()s off a new process to handle each request, and is consequently much slower than those that don't).
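As a rough illustration of the one-thread-per-stream idea, here is a minimal sketch using POSIX threads. The names drain_all, drain_stream and MAX_STREAMS are invented for this example, and the "work" done on each stream is simply counting bytes; a real server would parse and answer requests instead.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_STREAMS 16          /* hypothetical limit chosen for this sketch */

/* Shared state: total bytes seen on all streams, guarded by a mutex. */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
static size_t total_bytes = 0;

/* Thread body: drain one file descriptor until end-of-file. */
static void *drain_stream(void *arg)
{
    int fd = *(int *)arg;
    char buf[4096];
    ssize_t n;

    /* read() blocks only this thread; the other streams carry on. */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        pthread_mutex_lock(&total_lock);
        total_bytes += (size_t)n;
        pthread_mutex_unlock(&total_lock);
    }
    return NULL;
}

/* Start one thread per descriptor, wait for them all to finish, and
 * return the number of bytes read, or -1 if the threads could not
 * all be started. */
long drain_all(int *fds, int count)
{
    pthread_t tid[MAX_STREAMS];
    int started = 0, i;

    if (count < 0 || count > MAX_STREAMS)
        return -1;

    pthread_mutex_lock(&total_lock);
    total_bytes = 0;
    pthread_mutex_unlock(&total_lock);

    for (i = 0; i < count; i++) {
        if (pthread_create(&tid[i], NULL, drain_stream, &fds[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started == count ? (long)total_bytes : -1;
}
```

Note that there is no select() anywhere: each thread just blocks in read() on its own descriptor, and the mutex is held only for the moment it takes to update the shared total.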
And indeed, I/O-bound processes are often candidates for multithreading. Doing a recursive directory listing? All those stat() calls to discover whether a directory entry is a directory, a plain file or something else are blocking (they make your process wait even though your CPU is not being used), and if you are searching a network-mounted filesystem you will pay for network and fileserver latency as well as disc latency. Spawn a new thread for each subdirectory you search and you can keep the remote fileserver and the network flat out, and hide some of the disc latency, and you may well find that your search runs several times faster than the single-threaded version without paying a penny more for your client or your fileserver.
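The thread-per-subdirectory search might be sketched along the following lines. This is only an illustration under some assumptions: count_tree, count_files and struct dir_job are names made up here, the "search" merely counts plain files, and the code assumes (as is true of common modern C libraries, though not guaranteed by POSIX) that readdir() is safe when each thread uses its own directory stream. It also starts threads without limit, which production code would want to cap.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <limits.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* One unit of work: a directory to search and the count found there. */
struct dir_job {
    char path[PATH_MAX];
    long files;               /* plain files at or below path */
    pthread_t tid;            /* thread searching this directory */
    struct dir_job *next;     /* next sibling awaiting a join */
};

/* Thread body: count plain files below job->path, handing each
 * subdirectory to a thread of its own. */
static void *count_files(void *arg)
{
    struct dir_job *job = arg, *kids = NULL, *kid;
    struct dirent *ent;
    struct stat st;
    char child[PATH_MAX];
    int n;
    DIR *dir = opendir(job->path);

    job->files = 0;
    if (dir == NULL)
        return NULL;

    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        n = snprintf(child, sizeof child, "%s/%s", job->path, ent->d_name);
        if (n < 0 || n >= (int)sizeof child)
            continue;                     /* path too long: skip it */
        if (lstat(child, &st) != 0)       /* the blocking call in question */
            continue;
        if (S_ISREG(st.st_mode)) {
            job->files++;
        } else if (S_ISDIR(st.st_mode)) {
            kid = malloc(sizeof *kid);
            if (kid == NULL)
                continue;
            strcpy(kid->path, child);
            if (pthread_create(&kid->tid, NULL, count_files, kid) != 0) {
                count_files(kid);         /* no thread: search it ourselves */
                job->files += kid->files;
                free(kid);
                continue;
            }
            kid->next = kids;
            kids = kid;
        }
    }
    closedir(dir);

    /* Collect the results from every subdirectory thread. */
    while (kids != NULL) {
        kid = kids;
        kids = kid->next;
        pthread_join(kid->tid, NULL);
        job->files += kid->files;
        free(kid);
    }
    return NULL;
}

/* Count plain files in the tree rooted at root; -1 if root is too long. */
long count_tree(const char *root)
{
    struct dir_job job;

    if (strlen(root) >= sizeof job.path)
        return -1;
    strcpy(job.path, root);
    count_files(&job);
    return job.files;
}
```

Each thread writes only to its own dir_job, and the parent reads a child's count only after joining it, so no mutex is needed here.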
Multithreading can really pay off when you have multiple processors to hand. Since it is getting more difficult to increase the speed of individual CPUs, UNIX vendors and others are using multiple-processor configurations to get more overall system speed. A popular way of doing this is SMP (Symmetric Multi-Processing), where any part of the operating system or of a user process may run on any physical processor in the system. In particular, this means that the threads of a multithreaded program can use several CPUs at once. The user need not be aware of this, apart from seeing increased performance and processor utilisation figures in excess of 100%!
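For CPU-bound work the usual pattern is to cut the job into slices and give each slice to its own thread, which an SMP system is then free to run on separate processors. A minimal sketch, assuming four workers and with parallel_sum, sum_slice and NUM_WORKERS being names invented for the illustration:

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4           /* assumed worker count for this sketch */

/* The part of the array one worker is responsible for. */
struct slice {
    const long *data;
    size_t len;
    long sum;                   /* written only by the owning worker */
};

/* Thread body: add up one slice. */
static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    size_t i;

    for (i = 0; i < s->len; i++)
        s->sum += s->data[i];
    return NULL;
}

/* Sum len elements of data using up to NUM_WORKERS threads. */
long parallel_sum(const long *data, size_t len)
{
    pthread_t tid[NUM_WORKERS];
    struct slice part[NUM_WORKERS];
    int created[NUM_WORKERS];
    size_t chunk = len / NUM_WORKERS, start = 0;
    long total = 0;
    int i;

    for (i = 0; i < NUM_WORKERS; i++) {
        part[i].data = data + start;
        /* The last worker also takes whatever is left over. */
        part[i].len = (i == NUM_WORKERS - 1) ? len - start : chunk;
        part[i].sum = 0;
        start += part[i].len;
        created[i] =
            pthread_create(&tid[i], NULL, sum_slice, &part[i]) == 0;
        if (!created[i])
            sum_slice(&part[i]);    /* no thread: do the slice here */
    }
    for (i = 0; i < NUM_WORKERS; i++) {
        if (created[i])
            pthread_join(tid[i], NULL);
        total += part[i].sum;
    }
    return total;
}
```

Because each worker writes only to its own struct slice, no locking is needed; the partial sums are combined after the joins. Whether this actually runs faster depends on having more than one processor and enough work per slice to outweigh the cost of starting the threads.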
We can expect to see products such as spreadsheets, multi-media packages, and other CPU-intensive programs making use of this type of technology soon, and the tools are just becoming available in a stable form. Sun's own compilers just about produce reliable multithreaded code, and the debuggers are rocky but beginning to look professional and dependable too.
Java supports multithreading for the reasons given above, and because in practice it's the only mechanism that Java will have for introducing parallelism into its code (it won't have access to lower-level features like fork()); more on this in the upcoming Java issue.
This is all a few years behind the supercomputing community, where they've been working with these parallelisation and debugging problems for years. For example, ParaSoft have a suite of products for many of the big and expensive machines that you and I will never get to play with, but now also for Suns, HPs and the like.
So multithreading may be worthwhile in I/O- or CPU-intensive applications. ``But can I use it?'' you are now asking. Well, yes, if your system has lightweight process (LWP) or multithreading support. Solaris 2.4 and higher have reasonably mature thread/LWP support, for example, which you can make use of with anything from GNU C upwards.
And with the final arrival of the IEEE POSIX P1003.1c pthreads standard (it was known as P1003.4a) we can expect commercial tools for writing portable multithreaded code to start appearing at reasonable prices, and even in dependable shareware/PD form.
So what's the downside? Well, as with almost any performance-optimising technique, it turns out you have to be much more careful writing your code to make use of multithreading. Those global variables are really going to cause you trouble now, and even some standard library routines you've been using for years turn out no longer to be safe to use. But if you start by multithreading only those performance-critical parts of your code, the new constraints needn't bite too deep. All you need do now is buy a good book on multithreading!
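To make the library-routine point concrete: strtok() keeps its parse position in hidden static storage, so two threads tokenising at the same time can trample each other's state. POSIX provides strtok_r(), which has the caller hold that state instead. A small sketch (count_words is a name made up for this example):

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Count the whitespace-separated words in text. The buffer is modified
 * in place, as with strtok(), but the parse position lives in a local
 * variable, so several threads may call this at once provided each
 * works on its own buffer. */
int count_words(char *text)
{
    char *save;     /* per-call state that strtok() would keep globally */
    char *word;
    int n = 0;

    for (word = strtok_r(text, " \t\n", &save);
         word != NULL;
         word = strtok_r(NULL, " \t\n", &save))
        n++;
    return n;
}
```

The same cure applies to your own global variables: either move the state into something each thread owns, as here, or guard it with a mutex.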
Look in the glossary for more references.
Damon Hart-Davis, Computing Editor
dhd@exnet.com.
(Topics coming up in future issues: Java, systolic processing, biocomputing, ATM. Let me know of others you'd like covered.)
22--26, San Diego, CA, USA. USENIX 1996: Annual Technical Conference. Everything you wanted to know about UNIX from all your UNIX heroes!
22--23, Dublin, Ireland. Accessing the Internet. Tel: +44 171 610 4533.
24--26, Braga, Portugal. EUROMICRO: Fourth EUROMICRO Workshop on Parallel and Distributed Processing.
22--23, San Diego, CA, USA. NDSS '96: Network and Distributed System Security.
15--19, Honolulu, Hawaii. HiNet '96: Second International Workshop On High-speed Network Computing. IPPS '96: Tenth International Parallel Processing Symposium.
6--9, San Jose, CA, USA. ATM '96.
13--16, Budapest, Hungary. JENC7: 7th Joint European Networking Conference.
23--24, Antwerpen, Belgium. Third International Workshop On Community Networking.
27--28, Philadelphia, PA, USA. IOPADS: Fourth Annual Workshop on I/O in Parallel and Distributed Systems.
5--7, London, UK. UKCMG UK Independent IT User Forum, contact by mail for more info.
10--12, Boston, MA, USA. Second IEEE Real-Time Technology and Applications Symposium (email).
12--14, L'Aquila, Italy. Eighth Euromicro Workshop on Real-time Systems. (Mail for more info.)
17--22, Boston, MA, USA. ED-MEDIA '96, ED-TELECOM '96: World Conference on Educational Multimedia and Hypermedia and World Conference on Educational Telecommunications. (Mail AACE.)
10--13, Monterey, CA, USA. Fourth Tcl/Tk workshop.
29--2 Aug, Morgantown, WV, USA. Software Reuse Conference.
31--3 Aug, New Brunswick, NJ, USA. CAV '96: Computer-Aided Verification.
26--30, Poitiers, France. Eurographics '96: Graphics, Virtual Reality, Graphics Highways.
27--29, Lyon, France. Euro-Par'96 Workshop #5: Parallel Languages and Programming.
3--6, Boulder, CO, USA. VL '96: IEEE Symposium on Visual Languages.
16--20, Berlin, Germany. PARCELLA '96: Seventh International Workshop on Parallel Processing by Cellular Automata and Arrays.
25--27, Dijon, France. PDCS'96: Parallel and Distributed Computing Systems.
9--11, Bologna, Italy. WDAG-10: 10th International Workshop on Distributed Algorithms.
16--19, San Francisco, CA, USA. WebNet-96: World Conference of the Web Society.
15--19 April 1996, Honolulu, Hawaii. HiNet '96: Second International Workshop On High-speed Network Computing.
6--9 May 1996. ATM '96. Send proposals to the Technology Transfer Institute.
10--13 July 1996, Monterey, CA, USA. Fourth Tcl/Tk workshop.
2--4 September 1996, Connemara, Ireland. Seventh ACM SIGOPS European Workshop: Systems Support for Worldwide Applications.
16--20 September 1996, Berlin, Germany. PARCELLA '96: Seventh International Workshop on Parallel Processing by Cellular Automata and Arrays.
16--20 November 1996, Cambridge, MA, USA. CSCW '96: Cooperating Communities. (Mail Mark Klein.)