Y-7 Parallel Systems / Spring 2005

Main Project:
  1. MPI implementations and collective communications for switch-based ethernet clusters (Κρεμμυδάς - Σκυβαλίδας)
  2. Parallelism in the Linux kernel (Καραγιάννης - Μελισσόβας)
  3. Simulator for multistage interconnection networks (Σαούγκος)
  4. Survey of sDSM libraries and experimentation (Μαργαρίτη)
Mini Project (due 31/5/2005):
    Write an extensive report (at least 20-30 pages) about OpenMP: what it is, the history of its development, a description of its directives and operation, examples of use, etc.
    It is supposed to include a complete tutorial on OpenMP.
    Try out (code, measure, and include in your report) OpenMP programs for pi, matrix-times-vector, and matrix-times-matrix, on atlantis (icc) and helios.cc
Assignment 6:
  1. Problems 6.2 and 6.7 (for 6.2, a general algorithm for arbitrary N is required)
  2. Prove Amdahl's law using Gustafson's law or vice versa
  3. Find the optimum time for performing total exchange in linear graphs under the multiport model.
  4. (bonus +1 in the final grade) Find, analyze and prove formally an optimal algorithm for the previous problem.
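For problem 2, one route (sketched here as a hint under the usual normalization, not as the only possible proof) is to show that Amdahl's and Gustafson's laws describe the same execution measured from different sides. Normalize the parallel execution time on p processors to 1 and let s be its serial fraction (Gustafson's viewpoint):

```latex
% Gustafson: scaled speedup with serial fraction s of the PARALLEL run
S_G = s + p(1 - s)
% The corresponding sequential time is s + p(1-s), so Amdahl's serial
% fraction (of the SEQUENTIAL run) is
f = \frac{s}{s + p(1 - s)}, \qquad 1 - f = \frac{p(1 - s)}{s + p(1 - s)}
% Substituting into Amdahl's law:
S_A = \frac{1}{f + \frac{1 - f}{p}}
    = \frac{1}{\frac{s}{s + p(1-s)} + \frac{1 - s}{s + p(1-s)}}
    = s + p(1 - s) = S_G
```

The reverse direction works the same way: start from Amdahl's f, compute the serial fraction of the resulting parallel run, and recover Gustafson's formula.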
Assignment 5 (due 24/5/2005):
  1. Measure the message setup and per-byte times on helios.cc and on our local net. Use a simple ping-pong benchmark and message sizes of 1 byte - 64 KBytes. Plot the results and use least-squares fitting to determine the setup and per-byte times (repeat the measurements many times and average the results).
  2. Write a C program for mapping arbitrary meshes/tori to hypercubes, using (partial) gray codes. The program should ask for the dimension of the hypercube, the number of dimensions of the mesh, and the number of nodes in each dimension of the mesh.
  3. Problem 4.1 (you may assume square mesh and torus MxM).
Assignment 4 (due 26/4/2005):
  1. Write, execute and measure parallel MPI programs for matrix multiplication using checkerboard and strip partitioning (where each process calculates a set of rows of the result).
    • Repeat your measurements (at least 4 times) and average the results
    • Use 1, 4, 16, 64 processes, 512x512 integer matrices
    • Your report should contain plots of the results and discussion.
    • All programs are to be run in helios.cc and our local net.
  2. Reading assignment (for 19/4/2005):
    • Hoard memory allocator: Μαργαρίτη, Σαούγκος
    • Algorithms for locks and barriers: Καραγιάννης, Μελισσόβας
    • HyperCup - forming hypercubes in P2P systems: Κρεμμυδάς, Σκυβαλίδας
Assignment 3 (due 12/4/2005):
  1. Prepare a detailed report on the MESI and MOESI cache coherency protocols.
  2. Write a detailed summary of paper [4] in the "Memory consistency" course notes.
  3. Prepare a detailed report on AMD-64 and Intel Itanium processor architectures, regarding their cache coherence mechanisms and their memory model.
  4. Problems: 3.2 - 3.6
Assignment 2 (due 29/3/2005):
  1. Learn how to measure time in Unix; find out about high-resolution timers; write a summary report (3-4 pages).
  2. Write, execute and measure parallel programs for matrix multiplication, using Posix Threads (pthreads), checkerboard partitioning and self-scheduling. Notes:
    • Repeat your measurements (at least 4 times) and average the results
    • Use 1, 4, 16, 64 threads, 512x512 integer matrices
    • Your report should contain plots of the results and discussion.
    • All programs are to be run in atlantis, titan and helios.cc.
    • Here is a serial program for matrix multiplication and two matrices A, B to play with.
  3. Design, implement and test a barrier primitive for pthreads (barrier_init(), barrier_wait(), barrier_destroy()). Since you can easily find code over the internet, a great deal of emphasis will be placed on the detailed explanation of how your code works and on the tests you have done.
  4. Problems (on paper only, don't submit programs): 8.1, 8.3-8.6, 8.8, 8.10-8.14
Assignment 1 (due 16/3/2005):
  1. Create your own home page for this course
  2. Survey all parallel UoI machines in Computer Center & Dept. of Computer Science (IP address/name, model type, # processors, O.S., total memory)
  3. Obtain accounts on helios.cc and atlantis.cs.
  4. Problems: 1.1, 1.2 & DBPP 1.1, 1.2, 1.3, 1.6, 1.7

Course notes/material:


Student / URL
Αναστάσιος Καραγιάννης ktasos
Νίκος Κρεμμυδάς nkremmid
Παναγιώτης Σκυβαλίδας
Σπυριδούλα Μαργαρίτη
Σπύρος Μελισσόβας
Δημήτρης Σαούγκος

Course outline:
Introduction: parallel systems and their programming
Shared-memory programming: processes, threads, apps, OpenMP
Shared-memory machines: architecture, networks, cache coherence, memory consistency
Message-passing programming: communication primitives, MPI, apps
Distributed-memory machines: architecture, interconnection networks, routing, collective communications, distributed shared memory
Performance and metrics: Amdahl, Gustafson, granularity, scalability
Topics on shared-memory systems and programming: TBA
Topics on distributed-memory systems and message passing: TBA


  • The instructor's
  • I. Foster, Designing and Building Parallel Programs, DBPP, available on-line
  • V. Kumar et al, Introduction to Parallel Computing: Design and Analysis of Algorithms
  • D. Culler et al, Parallel Computer Architecture: A Hardware/Software Approach

The library has many relevant technical journals & conference proceedings. Keep an eye on the following general-purpose magazines, too:
IEEE Computer, IEEE Micro, ACM Computing Surveys

A few relevant links:
Some relevant companies:
AMD, Compaq, Cray, DEC, Fujitsu, HP, IBM, Intel, Meiko, MIPS, Motorola, NEC, SGI, Sun