ECE344 - Operating Systems
Assignment 3: Synchronization
Due Date: Sunday February 25, 2001, 11:59:59pm.
Objectives
The purpose of this lab is to get an introductory exposure to the available
synchronization primitives.
Preparation
Read and understand the man pages for mutex and semaphore.
Problems
-
Write a simple program as follows: Initialize a global integer variable
to 0. For further reference, I will call this variable x, but you
can, of course, call it anything you want. Call the function compute()
14 times. This function is declared as follows:
void compute( int * );
The object code for this function can be found here.
The argument to compute() should be a pointer to x. What
is the value of x after the 14 calls?
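For reference, a minimal single-threaded driver might look like the sketch
below; it assumes the provided compute.o is linked in, and the final printf
is only one possible way of reporting the value of x.

#include <stdio.h>

void compute( int * );   /* provided as object code */

int x = 0;               /* global, initialized to 0 */

int main( void )
{
    int i;

    for ( i = 0; i < 14; i++ )
        compute( &x );   /* pass a pointer to x on every call */

    printf( "x = %d\n", x );
    return 0;
}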
-
Change one of the programs you wrote for Lab 2 to create 2 threads
that each call compute() 7 times. Again, x, a globally
declared integer initialized to 0, should be passed as the argument. What
is the result of running this program, and how does it differ from the
first program?
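A possible two-thread version, assuming the pthreads library and the same
compute.o, is sketched below; the worker() function name is not prescribed
by the handout. Remember to link with -lpthread.

#include <pthread.h>
#include <stdio.h>

void compute( int * );   /* provided as object code */

int x = 0;               /* shared global */

static void *worker( void *arg )
{
    int i;

    for ( i = 0; i < 7; i++ )
        compute( &x );
    return NULL;
}

int main( void )
{
    pthread_t t1, t2;

    pthread_create( &t1, NULL, worker, NULL );
    pthread_create( &t2, NULL, worker, NULL );
    pthread_join( t1, NULL );
    pthread_join( t2, NULL );

    printf( "x = %d\n", x );
    return 0;
}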
-
Perhaps compute() should be run in a critical section. Fix your
program so that it runs correctly by using the mutex lock and unlock
synchronization operations provided by the pthreads library.
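One way to add the critical section, assuming the two-thread structure from
the previous problem, is to wrap each call to compute() in a lock/unlock
pair; only the changed parts are sketched here.

#include <pthread.h>

void compute( int * );

int x = 0;
pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker( void *arg )
{
    int i;

    for ( i = 0; i < 7; i++ ) {
        pthread_mutex_lock( &x_lock );    /* enter critical section */
        compute( &x );
        pthread_mutex_unlock( &x_lock );  /* leave critical section */
    }
    return NULL;
}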
-
One problem with synchronization is that it can add overhead to the running
time of a program. You are to measure how much overhead a lock/unlock pair
introduces if no waiting is required. Note that we are not interested in
the overhead of the compute() function, only the overhead of the
lock/unlock pair.
Write a program that measures the time for a lock/unlock pair using
gettimeofday() (see man 3c gettimeofday). Your program
should measure the time for 1 lock/unlock pair (do this twice) and the
average time for 10, 100, 1000, and 100000 lock/unlock pairs. To obtain
each of the average values, perform a single timing measurement of the
appropriate number of pairs, then divide by the number of pairs measured.
The output of your program must have the following format:
Time for 1 pair = YOUR VALUE
Time for 1 pair = YOUR VALUE
10 pairs, avg time = YOUR VALUE
100 pairs, avg time = YOUR VALUE
1000 pairs, avg time = YOUR VALUE
100000 pairs, avg time = YOUR VALUE
What is the time unit of these numbers? Each of these numbers measures
the same thing, so explain why you get different answers. Which number
do you expect to be closest to the actual value?
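One possible timing skeleton is sketched below; it assumes an uncontended
pthread mutex, and the helper time_pairs() (not part of the handout) returns
the elapsed time reported by gettimeofday() for n lock/unlock pairs.

#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Elapsed gettimeofday() time, in microseconds, for n lock/unlock pairs. */
static double time_pairs( int n )
{
    struct timeval start, end;
    int i;

    gettimeofday( &start, NULL );
    for ( i = 0; i < n; i++ ) {
        pthread_mutex_lock( &lock );
        pthread_mutex_unlock( &lock );
    }
    gettimeofday( &end, NULL );

    return ( end.tv_sec - start.tv_sec ) * 1e6 +
           ( end.tv_usec - start.tv_usec );
}

int main( void )
{
    printf( "Time for 1 pair = %f\n", time_pairs( 1 ) );
    printf( "Time for 1 pair = %f\n", time_pairs( 1 ) );
    printf( "10 pairs, avg time = %f\n", time_pairs( 10 ) / 10 );
    printf( "100 pairs, avg time = %f\n", time_pairs( 100 ) / 100 );
    printf( "1000 pairs, avg time = %f\n", time_pairs( 1000 ) / 1000 );
    printf( "100000 pairs, avg time = %f\n", time_pairs( 100000 ) / 100000 );
    return 0;
}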
-
What do you think compute() does? What other functions does compute()
appear to call?
-
a) Adapt your multiping program from Lab 2 so that it outputs
the hosts sorted by (latency) distance; i.e., the hosts that respond the
fastest should be listed first.
b) Make the following modifications to the multiping program above:
-
The new program must be called testping, and should accept one
command line parameter. The parameter will be the name of an input file
containing up to 10 host names or Internet addresses, one per line.
-
Instead of running indefinitely, testping should print out 5 reports
and then exit. As before, reports should be printed every 10 seconds or
so.
-
Modify the output so that each host line appears as secs:usecs: host, and
there is an empty line after each periodic report. Note: There should be
no calls to "curses" routines or printing of special control characters.
You do not need to clear the screen! (One way to produce this format,
together with the latency sort from part a), is sketched after the example
output below.)
For example, if the input file contains:
www.cs.toronto.edu
www.toronto.edu
www.eecg.toronto.edu
Then the output should look like:
0:030903: www.cs.toronto.edu
0:039859: www.toronto.edu
0:055107: www.eecg.toronto.edu
0:027880: www.cs.toronto.edu
0:056314: www.eecg.toronto.edu
0:945247: www.toronto.edu
0:037378: www.toronto.edu
0:052461: www.eecg.toronto.edu
0:054199: www.cs.toronto.edu
0:028432: www.cs.toronto.edu
0:045742: www.toronto.edu
0:064377: www.eecg.toronto.edu
0:027718: www.cs.toronto.edu
0:466636: www.toronto.edu
0:531626: www.eecg.toronto.edu
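The internal data structures of multiping are up to you; as one possibility,
the fragment below uses a hypothetical host_info record (its fields are not
prescribed by the handout), sorts the hosts by round-trip time with qsort(),
and prints each line in the secs:usecs: host format shown above.

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* Hypothetical per-host record. */
struct host_info {
    char           name[256];
    struct timeval rtt;            /* latest round-trip time */
};

/* qsort() comparator: fastest host (smallest RTT) first. */
static int by_rtt( const void *a, const void *b )
{
    const struct host_info *ha = a, *hb = b;

    if ( ha->rtt.tv_sec != hb->rtt.tv_sec )
        return ha->rtt.tv_sec < hb->rtt.tv_sec ? -1 : 1;
    if ( ha->rtt.tv_usec != hb->rtt.tv_usec )
        return ha->rtt.tv_usec < hb->rtt.tv_usec ? -1 : 1;
    return 0;
}

/* Print one report, sorted by latency, followed by an empty line. */
static void print_report( struct host_info *hosts, int nhosts )
{
    int i;

    qsort( hosts, nhosts, sizeof( hosts[0] ), by_rtt );
    for ( i = 0; i < nhosts; i++ )
        printf( "%ld:%06ld: %s\n",
                (long) hosts[i].rtt.tv_sec,
                (long) hosts[i].rtt.tv_usec,
                hosts[i].name );
    printf( "\n" );
}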
-
Consider the multiping program again. The time needed to ping
a host can vary with each ping. For some hosts, the variance of the time
needed can be quite small, but for others it can be quite large. To get
a better insight into the variance, for each host, construct a histogram
of ping times with four buckets. You are free to choose the time-spans
assigned to each bucket however you wish, as long as you do so intelligently!
After each ping, one of the buckets gets incremented by one. Adapt multiping
to output next to each host 4 numbers indicating the percentage of pings
to that host that fall into each bucket. In addition, keep a running count
of the total pings in each bucket across all hosts being pinged. At the
bottom of your output, display the total for each bucket and the overall
total. Note that the sum of the per-bucket totals should equal the overall
total.
Your program should be called multiping2.
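One way to keep the buckets, with hypothetical names and arbitrarily chosen
bucket boundaries (pick your own spans intelligently), is sketched below;
the per-host percentages are then host_bucket[h][b] * 100 divided by the
number of pings sent to host h.

#define NBUCKETS  4
#define MAXHOSTS  10

/* Hypothetical bucket boundaries, in microseconds. */
static const long bucket_limit_usec[NBUCKETS - 1] = { 50000, 200000, 1000000 };

static long host_bucket[MAXHOSTS][NBUCKETS];   /* per-host counts */
static long total_bucket[NBUCKETS];            /* running totals across hosts */

/* Classify one ping time (in microseconds) into a bucket. */
static void record_ping( int host, long usec )
{
    int b = 0;

    while ( b < NBUCKETS - 1 && usec > bucket_limit_usec[b] )
        b++;
    host_bucket[host][b]++;
    total_bucket[b]++;
}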
-
Remember Java, your favorite programming language? Write a Java program
that implements a circular buffer, using two concurrent producing
threads and two concurrent consuming threads. Although there is no
class "Monitor" in Java, each object has its own implicit monitor that
is automatically entered when one of the object's synchronized
methods is called. A tutorial on Java threads appears here.
Optional
-
On a multiprocessor, many read-modify-write instructions, such as Test&Set
and Exchange, have negative performance effects because of the memory and
bus bandwidth they consume. Consider the following realistic scenario.
A critical region is highly contended for, and many processes wish to enter
it. They all spin on a lock variable using, say, Test&Set. Each of
these processes will in fact access the lock variable many times before
it can enter the critical region. Because these memory accesses bypass
the cache, and because the memory and bus can service only a limited number
of accesses per second, each such access uses up a portion of that fixed
bandwidth.
With many processes spinning on the same lock variable, they can easily
consume over 50% of the memory and bus bandwidth. This has a negative effect
on the performance of the program, because it slows down the process currently
in the critical region, as its bus and memory accesses are now much slower.
This is exactly the opposite of what one should be aiming for, namely to
have the process in the critical region execute faster, allowing other
processes to enter.
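To make the scenario concrete, a naive spin lock of this kind might look
like the sketch below; GCC's __sync_lock_test_and_set() builtin is used here
only as a stand-in for the hardware Test&Set (atomically store 1 and return
the previous value). Every iteration of the while loop is another
read-modify-write that consumes memory and bus bandwidth.

volatile int lock_flag = 0;

void spin_lock( volatile int *flag )
{
    /* Each failed attempt is a full read-modify-write on the bus. */
    while ( __sync_lock_test_and_set( flag, 1 ) == 1 )
        ;   /* spin */
}

void spin_unlock( volatile int *flag )
{
    __sync_lock_release( flag );   /* store 0, releasing the lock */
}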
-
For this reason, a number of recent processors introduced new primitives
for synchronization, called load-linked and store-conditional.
Synchronization is performed using two instructions:
Load-Linked( *flag )
returns the value of the variable flag, and as a side effect,
sets a user-transparent bit called the load link bit (LLbit). This
LLbit forms a breakable link between the load-linked instruction
and the subsequent store-conditional instruction:
Store-Conditional( *flag, value )
Stores value into the variable flag if and only if the
value of the LLbit is still set. If the store was successful (i.e., the
LLbit was still set), SC returns 1; if the link was broken, it returns 0. A
link may be broken for a number of reasons, the most important one being
the case where the value of flag was changed since the last LL
instruction. With these LL/SC instructions, a process waiting on a locked
variable needs to access memory only when the value of flag changes,
rather than on every poll.
Your job is to write (on paper, not implement) the functions lock(flag)
and unlock(flag) so that they behave correctly using only the LL/SC
instructions, not Test&Set. (A usage sketch of LL/SC, separate from the
lock/unlock you are asked to write, appears below.)
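As a usage illustration only (not the lock()/unlock() you are asked to
write), the sketch below expresses an atomic increment with the two
instructions above, written as hypothetical C wrappers with the signatures
given in the problem statement.

/* Hypothetical C wrappers around the LL and SC instructions. */
int Load_Linked( volatile int *flag );                  /* returns *flag, sets the LLbit */
int Store_Conditional( volatile int *flag, int value ); /* 1 on success, 0 if the link broke */

/* Atomically increment *counter: retry whenever the link is broken,
 * i.e. whenever some other processor wrote the variable between the
 * LL and the SC. */
void atomic_increment( volatile int *counter )
{
    int old;

    do {
        old = Load_Linked( counter );
    } while ( Store_Conditional( counter, old + 1 ) == 0 );
}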
Submission Instructions
In your home directory, create a directory called ece344; create
a subdirectory called as3 and then in that directory create files
with your code. You should only submit source files. To submit the assignment
you should use the submission procedure described in the policies
section.
Name your programs for problems #1-4 progx.c where x is the
problem number. Your code for #6a should be named multiping.c,
for #6b testping.c, for #7 multiping2.c, and for #8
prog8.java.
You should have a Makefile that has targets for each of the programs
(prog1, prog2,
prog3, prog4, multiping,
testping, and multiping2).