[PLUG] Multithreading

Kevin Theobald kevintheobald at vzavenue.net
Fri Nov 21 23:59:01 UTC 2003


Chris Jantzen writes:

 > For computer architectures, classically it meant: nothing. Symmetric
 > Multi Threading, SMT, or Intel's marketing "HT" changes this a bit.

HT is just Intel's implementation of what the academic literature
calls "simultaneous multithreading" (SMT).  The idea is to put two
complete CPU architecture states (registers, PC, status bits, etc.) in
one CPU, and let the CPU interleave instructions from the two
contexts, as finely as every cycle (in a true SMT design, even within
the same cycle).  The motivation is that a modern
CPU runs so fast compared to the memory that it often sits idle for
hundreds of clock cycles waiting for some datum that it's loading to
come in from main memory.  Why not be working on something else during
that time?  (The CPU equivalent of having the kernel switch to another
process while servicing a page fault.)

 > Multiprogramming with processes is simpler, safer, and much more
 > standardized.

And a lot slower.

The primary difference between using multiple processes and multiple
threads (here I'm thinking of something like POSIX threads) is that
with the former, each piece of your program runs as a separate,
full-fledged process.  So, from the point of
view of the kernel, and the multiprogramming support hardware in the
CPU, the different processes that make up your parallel program are
indistinguishable from a collection of processes launched by different
users.  If you want them to cooperate on a single problem (usually the
goal, if you're not merely exploiting "job parallelism" by running
many independent tasks in parallel), you need to find some way for
them to share data.

The standard way of passing data back and forth in a distributed
environment is to create sockets, but this is grossly inefficient if
your application requires frequent exchange of data between the
different processes.  So the usual procedure is to create an explicit
block of shared memory, which, at the CPU level, involves editing the
page tables of the different processes so they all point to the same
block of physical memory.  (Naturally, the kernel has to do this for
you.)  Any data that you want to share must be explicitly placed in
this region.
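
As a rough sketch of that procedure (assuming Linux or BSD, where
mmap accepts the MAP_ANONYMOUS extension; System V shmget/shmat is
the older, more portable route), a parent can map a shared page
before forking:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Ask the kernel for one shared, anonymous page.  fork()
           then gives the child page-table entries pointing at the
           same physical memory. */
        int *counter = mmap(NULL, sizeof *counter,
                            PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (counter == MAP_FAILED) { perror("mmap"); return 1; }
        *counter = 0;

        if (fork() == 0) {      /* child */
            *counter = 42;      /* write lands in the shared page */
            _exit(0);
        }
        wait(NULL);             /* parent: wait for the child */
        printf("counter = %d\n", *counter);  /* prints "counter = 42" */
        return 0;
    }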

With a threading package, such as POSIX Threads (or Pthreads), all the
threads live in the same address space.  The kernel is not strictly
necessary for threading to work (though some implementations map
threads onto kernel entities for greater speed).  In a user-level
implementation, the switching of tasks is done by the threading
library underneath the user code.  This is a big win if you happen to
have more threads than physical CPUs, because switching threads is
MUCH faster than switching processes.  (Switching processes may
require flushing the TLB, and on some architectures even the caches.)

In Pthreads, splitting off a thread is just a function call
(pthread_create).  The new thread gets its own stack (for its
automatic variables).  It shares globals with the other thread(s).
The heap is shared too, though each thread gets its own blocks when it
calls malloc or new (provided the memory allocation library is written
to be "thread-safe").  The tradeoff is that, as you say, it is easier
to write bad code which doesn't share data properly, leading to
hard-to-debug, non-deterministic behavior.  However, the libraries
provide mechanisms (mutexes, condition variables, etc.) for doing
things safely if you choose to use them rather than relying on careful
analysis of your code, and their overheads are still less than those
of interprocess communication.
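
To make that concrete, here is a minimal sketch: two threads, each
started with an ordinary function call, update a shared global under
a mutex (build with something like cc demo.c -lpthread):

    #include <pthread.h>
    #include <stdio.h>

    int shared = 0;              /* a global: visible to every thread */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *worker(void *arg)
    {
        pthread_mutex_lock(&lock);    /* serialize access to 'shared' */
        shared += *(int *)arg;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        int a = 1, b = 2;
        pthread_create(&t1, NULL, worker, &a);  /* "just a function call" */
        pthread_create(&t2, NULL, worker, &b);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("shared = %d\n", shared);   /* always 3, with the lock */
        return 0;
    }

Delete the lock/unlock pair and the program still compiles and
usually still prints 3, which is exactly the kind of non-deterministic
bug described above.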

Kevin



