Wednesday, April 28, 2010

System V Semaphores versus POSIX Semaphores

POSIX named and unnamed semaphores

POSIX Semaphores
The potential learning curve of System V semaphores is much higher when compared to POSIX semaphores. This will be more understandable after you go through this section and compare it to what you learned in the previous section.

To start with, POSIX comes with simple semantics for creating, initializing, and performing operations on semaphores. They provide an efficient way to handle interprocess communication. POSIX comes with two kinds of semaphores: named and unnamed semaphores.

Named Semaphores
If you look in the man pages, you'll see that a named semaphore is identified by a name, like a System V semaphore, and, similarly, the semaphores have kernel persistence. This implies that these semaphores, like System V semaphores, are system-wide and that only a limited number can be active at any one time. The advantage of named semaphores is that they provide synchronization between unrelated processes, between related processes, and between threads.

A named semaphore is created by calling the following function:

sem_t *sem_open(const char *name, int oflag, mode_t mode, unsigned int value);
name: Name of the semaphore to be identified.
oflag: Is set to O_CREAT for creating a semaphore (or with O_EXCL if you want the call to fail if it already exists).
mode: Controls the permission setting for new semaphores.
value: Specifies the initial value of the semaphore.
A single call creates the semaphore, initializes it, and sets permissions on it, which is quite different from the way System V semaphores act. It is much cleaner and more atomic in nature. Another difference is that the System V semaphore identifies itself by means of type int (similar to a fd returned from open()), whereas the sem_open function returns a pointer of type sem_t *, which acts as an identifier for the POSIX semaphore.

From here on, operations will only be performed on semaphores. The semantics for locking semaphores is:

int sem_wait(sem_t *sem);
This call locks the semaphore if the semaphore count is greater than zero. After locking the semaphore, the count is reduced by 1. If the semaphore count is zero, the call blocks.

The semantics for unlocking a semaphore is:

int sem_post(sem_t *sem);
This call increases the semaphore count by 1 and then returns.

Once you're done using a semaphore, it is important to destroy it. To do this, make sure that all the references to the named semaphore are closed by calling the sem_close() function, then just before the exit or within the exit handler call sem_unlink() to remove the semaphore from the system. Note that sem_unlink() removes the name immediately, but the semaphore itself is destroyed only once every process that has it open has closed it.

Unnamed Semaphores
Again, according to the man pages, an unnamed semaphore is placed in a region of memory that is shared between multiple threads (a thread-shared semaphore) or processes (a process-shared semaphore). A thread-shared semaphore is placed in a region that only the threads of a process share, for example a global variable. A process-shared semaphore is placed in a region that different processes can share, for example a shared memory region. Unnamed semaphores provide synchronization between threads and between related processes; they are process-based semaphores.

The unnamed semaphore does not need to use the sem_open call. Instead this one call is replaced by the following two instructions:

sem_t semid;
int sem_init(sem_t *sem, int pshared, unsigned value);
pshared: This argument indicates whether this semaphore is to be shared between the threads of a process or between processes. If pshared has value 0, then the semaphore is shared between the threads of a process. If pshared is non-zero, then the semaphore is shared between processes.
value: The value with which the semaphore is to be initialized.
Once the semaphore is initialized, the programmer is ready to operate on the semaphore, which is of type sem_t. The operations to lock and unlock the semaphore remain as shown previously: sem_wait(sem_t *sem) and sem_post(sem_t *sem). To delete an unnamed semaphore, just call the sem_destroy function.

The last section of this article has a simple worker-consumer demo that has been developed by using a POSIX semaphore.

Differences between System V and POSIX semaphores

There are a number of differences between System V and POSIX semaphores.

One marked difference between the System V and POSIX semaphore implementations is that in System V you can control how much the semaphore count can be increased or decreased; whereas in POSIX, the semaphore count is increased and decreased by 1.
POSIX semaphores do not allow manipulation of semaphore permissions, whereas System V semaphores allow you to change the permissions of semaphores to a subset of the original permission.
Initialization and creation of semaphores is atomic (from the user's perspective) in POSIX semaphores.
From a usage perspective, System V semaphores are clumsy, while POSIX semaphores are straightforward.
The scalability of POSIX semaphores (using unnamed semaphores) is much higher than System V semaphores. In a user/client scenario, where each user creates her own instances of a server, it would be better to use POSIX semaphores.
Creating a System V semaphore object creates an array of semaphores, whereas creating a POSIX semaphore creates just one. Because of this feature, semaphore creation (memory footprint-wise) is costlier in System V semaphores when compared to POSIX semaphores.
It has been said that POSIX semaphore performance is better than System V-based semaphores.
Unnamed POSIX semaphores provide a mechanism for process-wide semaphores rather than system-wide semaphores. So, if a developer forgets to destroy the semaphore, it is cleaned up on process exit along with the memory that holds it. In simple terms, unnamed POSIX semaphores provide a mechanism for non-persistent semaphores.

Memory Leaks using Valgrind

How do I check my C programs under Linux operating systems for memory leaks? How do I debug and profile Linux executables?

You need to use a tool called Valgrind. It is a memory debugging, memory leak detection, and profiling tool for Linux and Mac OS X operating systems. From the official website:

The Valgrind distribution currently includes six production-quality tools: a memory error detector, two thread error detectors, a cache and branch-prediction profiler, a call-graph generating cache profiler, and a heap profiler. It also includes two experimental tools: a heap/stack/global array overrun detector, and a SimPoint basic block vector generator. It runs on the following platforms: X86/Linux, AMD64/Linux, PPC32/Linux, PPC64/Linux, and X86/Darwin (Mac OS X).

How Do I Install Valgrind?
Type the following command under CentOS / Redhat / RHEL Linux:

# yum install valgrind
Type the following command under Debian / Ubuntu Linux:

# apt-get install valgrind
How Do I use Valgrind?
If you normally run your program like this:

./a.out arg1 arg2

/path/to/myapp arg1 arg2
Use this command line to turn on the detailed memory leak detector:

valgrind --leak-check=yes ./a.out arg1 arg2
valgrind --leak-check=yes /path/to/myapp arg1 arg2
You can also set a log file:

valgrind --log-file=output.file --leak-check=yes --tool=memcheck ./a.out arg1 arg2
Most error messages look like the following:

cat output.file
Sample outputs:

==43284== Memcheck, a memory error detector
==43284== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==43284== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==43284== Command: ./a.out
==43284== Parent PID: 39695
==43284== Invalid write of size 4
==43284== at 0x4004B6: f (in /tmp/a.out)
==43284== by 0x4004C6: main (in /tmp/a.out)
==43284== Address 0x4c1c068 is 0 bytes after a block of size 40 alloc'd
==43284== at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
==43284== by 0x4004A9: f (in /tmp/a.out)
==43284== by 0x4004C6: main (in /tmp/a.out)
==43284== HEAP SUMMARY:
==43284== in use at exit: 40 bytes in 1 blocks
==43284== total heap usage: 1 allocs, 0 frees, 40 bytes allocated
==43284== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1
==43284== at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
==43284== by 0x4004A9: f (in /tmp/a.out)
==43284== by 0x4004C6: main (in /tmp/a.out)
==43284== LEAK SUMMARY:
==43284== definitely lost: 40 bytes in 1 blocks
==43284== indirectly lost: 0 bytes in 0 blocks
==43284== possibly lost: 0 bytes in 0 blocks
==43284== still reachable: 0 bytes in 0 blocks
==43284== suppressed: 0 bytes in 0 blocks
==43284== For counts of detected and suppressed errors, rerun with: -v
==43284== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4)
Sample C Program
Create test.c:


#include <stdlib.h>

void f(void)
{
int* x = malloc(10 * sizeof(int));
x[10] = 0; // problem 1: heap block overrun
} // problem 2: memory leak -- x not freed

int main(void)
{
f();
return 0;
}
You can compile and run it as follows to detect problems:

gcc test.c
valgrind --log-file=output.file --leak-check=yes --tool=memcheck ./a.out
vi output.file

Linux / UNIX: Displaying Today’s Files Only

How do I list all files created today only using shell command under UNIX or Linux operating systems?

You can use the find command as follows to list today's files in the current directory only (i.e. no subdirectories). Note that -mtime -1 matches files modified within the last 24 hours:
find -maxdepth 1 -type f -mtime -1
Sample outputs:

In this example, find today's directories only:
find -maxdepth 1 -type d -mtime -1

Another older ls command hack is as follows:
ls -al --time-style=+%D | grep $(date +%D)

Wednesday, April 14, 2010

Linux System Calls

System calls in alphabetical order

System calls are the calls made from user space to access kernel space.

_exit - like exit but with fewer actions (m+c)
accept - accept a connection on a socket (m+c!)
access - check user’s permissions for a file (m+c)
acct - not yet implemented (mc)
adjtimex - set/get kernel time variables (-c)
afs_syscall - reserved Andrew File System call (-)
alarm - send SIGALRM at a specified time (m+c)
bdflush - flush dirty buffers to disk (-c)
bind - name a socket for interprocess communication (m!c)
break - not yet implemented (-)
brk - change data segment size (mc)
chdir - change working directory (m+c)
chmod - change file attributes (m+c)
chown - change ownership of a file (m+c)
chroot - set a new root directory (mc)
clone - see fork (m-)
close - close a file by reference (m+c)
connect - link 2 sockets (m!c)
creat - create a file (m+c)
create_module - allocate space for a loadable kernel module (-)
delete_module - unload a kernel module (-)
dup - create a file descriptor duplicate (m+c)
dup2 - duplicate a file descriptor (m+c)
execl, execlp, execle, ... - see execve (m+!c)
execve - execute a file (m+c)
exit - terminate a program (m+c)
fchdir - change working directory by reference ()
fchmod - see chmod (mc)
fchown - change ownership of a file (mc)
fclose - close a file by reference (m+!c)
fcntl - file/filedescriptor control (m+c)
flock - change file locking (m!c)
fork - create a child process (m+c)
fpathconf - get info about a file by reference (m+!c)
fread - read array of binary data from stream (m+!c)
fstat - get file status (m+c)
fstatfs - get filesystem status by reference (mc)
fsync - write file cache to disk (mc)
ftime - get timezone+seconds since 1.1.1970 (m!c)
ftruncate - change file size (mc)
fwrite - write array of binary data to stream (m+!c)
get_kernel_syms - get kernel symbol table or its size (-)
getdomainname - get system’s domainname (m!c)
getdtablesize - get filedescriptor table size (m!c)
getegid - get effective group id (m+c)
geteuid - get effective user id (m+c)
getgid - get real group id (m+c)
getgroups - get supplemental groups (m+c)
gethostid - get unique host identifier (m!c)
gethostname - get system’s hostname (m!c)
getitimer - get value of interval timer (mc)
getpagesize - get size of a system page (m-!c)
getpeername - get address of a connected peer socket (m!c)
getpgid - get process group id of a process (+c)
getpgrp - get process group id of current process (m+c)
getpid - get process id of current process (m+c)
getppid - get process id of the parent process (m+c)
getpriority - get a process/group/user priority (mc)
getrlimit - get resource limits (mc)
getrusage - get usage of resources (m)
getsockname - get the address of a socket (m!c)
getsockopt - get option settings of a socket (m!c)
gettimeofday - get timezone+seconds since 1.1.1970 (mc)
getuid - get real uid (m+c)
gtty - not yet implemented ()
idle - make a process a candidate for swap (mc)
init_module - insert a loadable kernel module (-)
ioctl - manipulate a character device (mc)
ioperm - set some i/o port’s permissions (m-c)
iopl - set all i/o port’s permissions (m-c)
ipc - interprocess communication (-c)
kill - send a signal to a process (m+c)
killpg - send a signal to a process group (mc!)
klog - see syslog (-!)
link - create a hardlink for an existing file (m+c)
listen - listen for socket connections (m!c)
llseek - lseek for large files (-)
lock - not implemented yet ()
lseek - change the position ptr of a file descriptor (m+c)
lstat - get file status (mc)
mkdir - create a directory (m+c)
mknod - create a device (mc)
mmap - map a file into memory (mc)
modify_ldt - read or write local descriptor table (-)
mount - mount a filesystem (mc)
mprotect - read, write or execute protect memory (-)
mpx - not implemented yet ()
msgctl - ipc message control (m!c)
msgget - get an ipc message queue id (m!c)
msgrcv - receive an ipc message (m!c)
msgsnd - send an ipc message (m!c)
munmap - unmap a file from memory (mc)
nice - change process priority (mc)
oldfstat - no longer existing
oldlstat - no longer existing
oldolduname - no longer existing
oldstat - no longer existing
olduname - no longer existing
open - open a file (m+c)
pathconf - get information about a file (m+!c)
pause - sleep until signal (m+c)
personality - change current execution domain for ibcs (-)
phys - not implemented yet (m)
pipe - create a pipe (m+c)
prof - not yet implemented ()
profil - execution time profile (m!c)
ptrace - trace a child process (mc)
quotactl - not implemented yet ()
read - read data from a file (m+c)
readv - read datablocks from a file (m!c)
readdir - read a directory (m+c)
readlink - get content of a symbolic link (mc)
reboot - reboot or toggle vulcan death grip (-mc)
recv - receive a message from a connected socket (m!c)
recvfrom - receive a message from a socket (m!c)
rename - move/rename a file (m+c)
rmdir - delete an empty directory (m+c)
sbrk - see brk (mc!)
select - sleep until action on a filedescriptor (mc)
semctl - ipc semaphore control (m!c)
semget - ipc get a semaphore set identifier (m!c)
semop - ipc operation on semaphore set members (m!c)
send - send a message to a connected socket (m!c)
sendto - send a message to a socket (m!c)
setdomainname - set system’s domainname (mc)
setfsgid - set filesystem group id ()
setfsuid - set filesystem user id ()
setgid - set real group id (m+c)
setgroups - set supplemental groups (mc)
sethostid - set unique host identifier (mc)
sethostname - set the system’s hostname (mc)
setitimer - set interval timer (mc)
setpgid - set process group id (m+c)
setpgrp - has no effect (mc!)
setpriority - set a process/group/user priority (mc)
setregid - set real and effective group id (mc)
setreuid - set real and effective user id (mc)
setrlimit - set resource limit (mc)
setsid - create a session (+c)
setsockopt - change options of a socket (mc)
settimeofday - set timezone+seconds since 1.1.1970 (mc)
setuid - set real user id (m+c)
setup - initialize devices and mount root (-)
sgetmask - see siggetmask (m)
shmat - attach shared memory to data segment (m!c)
shmctl - ipc manipulate shared memory (m!c)
shmdt - detach shared memory from data segment (m!c)
shmget - get/create shared memory segment (m!c)
shutdown - shutdown a socket (m!c)
sigaction - set/get signal handler (m+c)
sigblock - block signals (m!c)
siggetmask - get signal blocking of current process (!c)
signal - setup a signal handler (mc)
sigpause - use a new signal mask until a signal (mc)
sigpending - get pending, but blocked signals (m+c)
sigprocmask - set/get signal blocking of current process (+c)
sigreturn - not yet used ()
sigsetmask - set signal blocking of current process (c!)
sigsuspend - replacement for sigpause (m+c)
sigvec - see sigaction (m!)
socket - create a socket communication endpoint (m!c)
socketcall - socket call multiplexer (-)
socketpair - create 2 connected sockets (m!c)
ssetmask - see sigsetmask (m)
stat - get file status (m+c)
statfs - get filesystem status (mc)
stime - set seconds since 1.1.1970 (mc)
stty - not yet implemented ()
swapoff - stop swapping to a file/device (m-c)
swapon - start swapping to a file/device (m-c)
symlink - create a symbolic link to a file (m+c)
sync - sync memory and disk buffers (mc)
syscall - execute a systemcall by number (-!c)
sysconf - get value of a system variable (m+!c)
sysfs - get information about configured filesystems ()
sysinfo - get Linux system information (m-)
syslog - manipulate system logging (m-c)
system - execute a shell command (m!c)
time - get seconds since 1.1.1970 (m+c)
times - get process times (m+c)
truncate - change file size (mc)
ulimit - get/set file limits (c!)
umask - set file creation mask (m+c)
umount - unmount a filesystem (mc)
uname - get system information (m+c)
unlink - remove a file when not busy (m+c)
uselib - use a shared library (m-c)
ustat - not yet implemented (c)
utime - modify inode time entries (m+c)
utimes - see utime (m!c)
vfork - see fork (m!c)
vhangup - virtually hang up current tty (m-c)
vm86 - enter virtual 8086 mode (m-c)
wait - wait for process termination (m+!c)
wait3 - bsd wait for a specified process (m!c)
wait4 - bsd wait for a specified process (mc)
waitpid - wait for a specified process (m+c)
write - write data to a file (m+c)
writev - write datablocks to a file (m!c)

(m) manual page exists.
(+) POSIX compliant.
(-) Linux specific.
(c) in libc.
(!) not a sole system call; uses a different system call.

Monday, April 5, 2010


1) How do you determine the endianness of the machine using C program?

2) Using a program, how do we know whether stack grows up OR down?


* The permission info for files/directories are stored in the octal form like 777

*The maximum number of threads that may be created by a process is implementation dependent.

>>Terminating Threads:

* There are several ways in which a Pthread may be terminated:
o The thread returns from its starting routine (the main routine for the initial thread).
o The thread makes a call to the pthread_exit subroutine (covered below).
o The thread is canceled by another thread via the pthread_cancel routine (not covered here).
o The entire process is terminated due to a call to either the exec or exit subroutines.

* pthread_exit is used to explicitly exit a thread. Typically, the pthread_exit() routine is called after a thread has completed its work and is no longer required to exist.

* If main() finishes before the threads it has created, and exits with pthread_exit(), the other threads will continue to execute. Otherwise, they will be automatically terminated when main() finishes.

* The programmer may optionally specify a termination status, which is stored as a void pointer for any thread that may join the calling thread.

* Cleanup: the pthread_exit() routine does not close files; any files opened inside the thread will remain open after the thread is terminated.

> All of the system calls that the given libc supports are declared in the unistd.h file.
If new calls appear that don’t have a stub in libc yet, you can use syscall().
As an example, you can close a file using syscall() like this (not advised):
#include <sys/syscall.h> /* SYS_close */
#include <unistd.h> /* syscall() */
int my_close(int filedescriptor)
{
return syscall(SYS_close, filedescriptor);
}

> In Linux versions before 2.6.11, the capacity of a pipe was the same as
the system page size (e.g., 4096 bytes on x86). Since Linux 2.6.11,
the pipe capacity is 65536 bytes.
> According to
POSIX.1, pipes only need to be unidirectional


The dup() system call uses
the lowest-numbered, unused descriptor for the new one.
int dup(int oldfd)
the old descriptor is not closed! Both may be used interchangeably

int dup2( int oldfd, int newfd );
With dup2(), newfd is closed first if it is already open; oldfd itself stays open.

ATOMIC operations are those which are NOT interrupted by any sources including the scheduler

Under Linux, #define PIPE_BUF 4096, and hence a write to a pipe is guaranteed to be atomic for sizes up to 4 KB. Above this size, the write might be split and hence would be NON-ATOMIC.

But under POSIX, we have
#define _POSIX_PIPE_BUF 512

#define MSGMAX 4056 /* <= 4056 */ /* max size of message (bytes) */
Messages can be no larger than 4,056 bytes in total size, including the mtype member,
which is a long (4 bytes in length on 32-bit systems).

Inside kernel, all IPC's are stored as structures. For a message queue, each message is stored as one structure and stored as a singly linked list


> Semaphores can best be described as counters used to control access to shared resources by multiple processes.
> Often regarded as the MOST DIFFICULT to GRASP amongst the 3 System V IPC mechanisms

Every ANSI C compiler is required to support at least:
• 31 parameters in a function definition
• 31 arguments in a function call
• 509 characters in a source line
• 32 levels of nested parentheses in an expression
• The maximum value of long int can't be any less than 2,147,483,647, (i.e., long integers
are at least 32 bits)


Process & Threads
Processes contain information about program resources and program execution state, including:

* Process ID, process group ID, user ID, and group ID
* Environment
* Working directory.
* Program instructions
* Registers
* Stack
* Heap
* File descriptors
* Signal actions
* Shared libraries
* Inter-process communication tools (such as message queues, pipes, semaphores, or shared memory).

Thread maintains its own:

* Stack pointer
* Registers
* Scheduling properties (such as policy or priority)
* Set of pending and blocked signals
* Thread specific data.

* Mutex is an abbreviation for "mutual exclusion". Mutex variables are one of the primary means of implementing thread synchronization and for protecting shared data when multiple writes occur.

* A mutex variable acts like a "lock" protecting access to a shared data resource. The basic concept of a mutex as used in Pthreads is that only one thread can lock (or own) a mutex variable at any given time. Thus, even if several threads try to lock a mutex only one thread will be successful. No other thread can own that mutex until the owning thread unlocks that mutex. Threads must "take turns" accessing protected data.

* Mutexes can be used to prevent "race" conditions.

* Very often the action performed by a thread owning a mutex is the updating of global variables. This is a safe way to ensure that when several threads update the same variable, the final value is the same as what it would be if only one thread performed the update. The variables being updated belong to a "critical section".

* A typical sequence in the use of a mutex is as follows:
o Create and initialize a mutex variable
o Several threads attempt to lock the mutex
o Only one succeeds and that thread owns the mutex
o The owner thread performs some set of actions
o The owner unlocks the mutex
o Another thread acquires the mutex and repeats the process
o Finally the mutex is destroyed

* When several threads compete for a mutex, the losers block at that call - a non-blocking call is available with "trylock" instead of the "lock" call.

* When protecting shared data, it is the programmer's responsibility to make sure every thread that needs to use a mutex does so. For example, if 4 threads are updating the same data, but only one uses a mutex, the data can still be corrupted.

Creating and Destroying Mutexes

pthread_mutex_init (mutex,attr)

pthread_mutex_destroy (mutex)

pthread_mutexattr_init (attr)

pthread_mutexattr_destroy (attr)


* Mutex variables must be declared with type pthread_mutex_t, and must be initialized before they can be used. There are two ways to initialize a mutex variable:

1. Statically, when it is declared. For example:
pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;

2. Dynamically, with the pthread_mutex_init() routine. This method permits setting mutex object attributes, attr.

The mutex is initially unlocked.

* The attr object is used to establish properties for the mutex object, and must be of type pthread_mutexattr_t if used (may be specified as NULL to accept defaults). The Pthreads standard defines three optional mutex attributes:
o Protocol: Specifies the protocol used to prevent priority inversions for a mutex.
o Prioceiling: Specifies the priority ceiling of a mutex.
o Process-shared: Specifies the process sharing of a mutex.

Note that not all implementations may provide the three optional mutex attributes.

* The pthread_mutexattr_init() and pthread_mutexattr_destroy() routines are used to create and destroy mutex attribute objects respectively.

* pthread_mutex_destroy() should be used to free a mutex object which is no longer needed.

Mutex Variables
Locking and Unlocking Mutexes

pthread_mutex_lock (mutex)

pthread_mutex_trylock (mutex)

pthread_mutex_unlock (mutex)


* The pthread_mutex_lock() routine is used by a thread to acquire a lock on the specified mutex variable. If the mutex is already locked by another thread, this call will block the calling thread until the mutex is unlocked.

* pthread_mutex_trylock() will attempt to lock a mutex. However, if the mutex is already locked, the routine will return immediately with a "busy" error code. This routine may be useful in preventing deadlock conditions, as in a priority-inversion situation.

* pthread_mutex_unlock() will unlock a mutex if called by the owning thread. Calling this routine is required after a thread has completed its use of protected data if other threads are to acquire the mutex for their work with the protected data. An error may be returned if:
o The mutex was already unlocked
o The mutex is owned by another thread

For more info,

One primary difference when using a POSIX mutex is that
program execution within the critical section (between lock and unlock) becomes sequential: only one thread can execute it at a time. Whereas if there is NO LOCKING, the execution proceeds concurrently. To test this, use a sleep(10) in between the lock and unlock calls and verify.