Four pthread exercises building from embarrassingly-parallel work up to a custom read-write lock around a sorted linked list. Each exercise ships a working implementation, a runtime-collection script, runtime data, and plotted graphs.
First homework of the Parallel Programming course at the University of Athens (Department of Informatics & Telecommunications). Solo project.
Part of a three-piece arc:
- parallel-systems-hw1 (you are here) — pthread
- parallel-systems-hw2 — OpenMP
- parallel-systems-hw3 — MPI
Classical embarrassingly-parallel problem: throw N random darts at the unit square and count how many land inside the unit circle. Each thread runs its own dart batch with a per-thread seed; the final count is summed in the main thread.
cd excersize1
make
./bin/montecarlo <number_of_darts> <number_of_threads>
./scripts/collect_data.sh # sweep darts × threads, dump CSV
python scripts/make_graphs.py # speedup-vs-threads plots into data/graphs/A thread-safe linear congruential generator (my_rand.c) replaces rand/drand48 because the libc PRNGs share state across threads.
Same target — increment a shared counter N times across k threads — implemented two ways and timed back-to-back inside a single program:
pthread_mutex_lock/unlockaround the increment.atomic_fetch_add(&shared_variable, 1)from<stdatomic.h>.
cd excersize2
make
./bin/mutex_vs_atomic <increases_per_thread> <number_of_threads>
./scripts/collect_data.sh
python scripts/make_graphs.pyThe data sweep shows the gap widening with thread count — the mutex version's lock contention dominates while the atomic version's hardware-level LOCK XADD keeps scaling.
Same workload (each thread increments its own slot) implemented twice:
array_increasing.c—long long shared_array[num_of_threads]written by threadiat indexi. Adjacent slots → same cache line → constant invalidation traffic.array_increasing_improved.c— wraps each counter instruct { long long var; char padding[64 - sizeof(long long)]; }, padding each slot to a 64-byte cache line so threads never share a line.
cd excersize3
make # builds both binaries
./bin/array_increasing <iters> <threads>
./bin/array_increasing_improved <iters> <threads>
./scripts/collect_data.sh
python scripts/make_graphs.pyThe runtime gap between the two versions at high thread counts is the false-sharing penalty made visible.
A multi-threaded sorted linked list of integers supporting insert, member, delete, with the access pattern controlled by user-supplied percentages (insert / search / delete split). Each thread runs total_ops operations of types drawn from those percentages.
Two custom read-write lock implementations ship alongside the main program:
my_rwlock_a.c— variant A of the RW lock.my_rwlock_b.c— variant B (different fairness / starvation behaviour).
cd excersize4
make
./bin/pth_ll_rwl_a <thread_count> # prompted interactively for the other params
./bin/pth_ll_rwl_b <thread_count>
./scripts/collect_data.sh
python scripts/make_graphs.pyThe scaffold of the main program follows Pacheco's Introduction to Parallel Programming §4.9.3. The interesting code is in the two my_rwlock_*.c files — that's where the assignment actually asked for original work.
- All four exercises share a
timer.hmacro (GET_TIME) for wall-clock measurement. - Data sweeps for each exercise live in
excersize{1..4}/data/as CSV; plots inexcersize{1..4}/data/graphs/. - Assignment briefs (PDFs) are preserved for exercises 1 and 2; the briefs for exercises 3 and 4 were not preserved.
MIT — applies to my own code in this repo. Assignment-distributed materials (e.g. the linked-list scaffold lifted from Pacheco's textbook in exercise 4) retain their original course / textbook copyright.