Wednesday, December 1, 2010

What is an address space

When a program is built (compilation, assembly, and linking), the
result is stored in a file with a special format. On Unix-like systems
one such format is ELF (Executable and Linking Format). Its ELF header
and program headers carry the information the program loader needs to
create the process image in memory at run time and to perform run-time
linking (for dynamically linked shared libraries).
When the program is built, storage for uninitialised data (static or
global) is not allocated in the file; only the total size is recorded in
the program header table of the ELF file. Later, when the program is
loaded, the loader reads this metadata and allocates one large,
page-aligned chunk of memory to house all such variables. This section is
called BSS (Block Started by Symbol).
The data section (storage for all initialized static and global variables)
and the STACK section should be familiar to you. The HEAP is created
dynamically when your program calls memory-allocation routines. In
addition there may be shared memory mappings, and if you process a file
through the mmap() system call, those regions are mapped into your program
as well. The sum total of all such memory is called the address space of
your program.
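
The regions described above can be observed from inside a running program. The following is only a rough sketch: the names bss_var, data_var and print_region_samples are made up for illustration, and the printed addresses depend on the platform, the loader and address randomisation, so no particular values should be expected.

```c
#include <stdio.h>
#include <stdlib.h>

static int bss_var;        /* uninitialised static: lands in BSS, zero-filled at load */
static int data_var = 42;  /* initialised static: stored in the data section */

/* Print one sample address from each region; returns 0 on success, -1 if malloc fails. */
int print_region_samples(void)
{
    int stack_var = 0;                         /* automatic variable: on the stack */
    int *heap_var = malloc(sizeof *heap_var);  /* dynamic allocation: on the heap */

    if (heap_var == NULL)
        return -1;

    printf("data  (data_var)  : %p\n", (void *)&data_var);
    printf("bss   (bss_var)   : %p\n", (void *)&bss_var);
    printf("heap  (heap_var)  : %p\n", (void *)heap_var);
    printf("stack (stack_var) : %p\n", (void *)&stack_var);

    free(heap_var);
    return 0;
}

/* BSS storage is zero-filled, so this reads 0 as long as nothing has written to it. */
int bss_initial_value(void)
{
    return bss_var;
}

/* Data-section storage carries its initial value from the executable file. */
int data_initial_value(void)
{
    return data_var;
}
```

Calling print_region_samples() from main() typically shows the data and BSS addresses close together, with the heap and stack far apart from them.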

Now what’s an address? Is it a virtual address, a physical address,
or a logical address?
The value you see when you apply the & operator to a C variable
is actually a program-relative logical address. On x86 it is first
processed by the segmentation unit (combined with the value in a
segment register, such as DS for data or CS for code) to create a
linear address, which is then processed by the paging unit to create
the physical address in RAM. These details are not under the control
of C; they are under the control of the operating system's memory
management subsystem.
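
The arithmetic behind the two translation steps can be sketched as follows. This is a simplified model and not how any particular kernel does it: it assumes 32-bit addresses and 4 KiB pages, and it takes the frame number as a parameter instead of walking real page tables. All function names are illustrative.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Segmentation step: linear address = segment base + logical offset. */
uint32_t linear_address(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}

/* Paging step, part 1: the virtual page the linear address falls in. */
uint32_t page_number(uint32_t linear)
{
    return linear >> PAGE_SHIFT;
}

/* Paging step, part 2: the byte offset inside the page, which paging leaves unchanged. */
uint32_t page_offset(uint32_t linear)
{
    return linear & (PAGE_SIZE - 1u);
}

/* Physical address once the page tables have supplied a frame number. */
uint32_t physical_address(uint32_t frame_number, uint32_t linear)
{
    return (frame_number << PAGE_SHIFT) | page_offset(linear);
}
```

With a flat segment (base 0) the linear address equals the logical offset, which is why the value printed for &variable usually looks the same as the virtual address.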

Tuesday, November 9, 2010

USB Enumeration Process

Typical USB 2.0 Sequence

The steps below are a typical sequence of events that occurs during enumeration of a USB 2.0 device under Windows. Device firmware shouldn’t assume that enumeration requests and events will occur in a particular order. To function successfully, a device must detect and respond to any control request or other bus event at any time.

1. The system has a new device. A user attaches a device to a USB port, or the system powers up with a device attached. The port may be on the root hub at the host or on a hub that connects downstream from the host. The hub provides power to the port, and the device is in the Powered state. The device can draw up to 100 mA from the bus.

2. The hub detects the device. The hub monitors the voltages on the signal lines (D+ and D-) at each of its ports. The hub has a pull-down resistor of 14.25–24.8 kΩ on each line. A device has a pull-up resistor of 900–1575 Ω on D+ for a full-speed device or D- for a low-speed device. High-speed-capable devices attach at full speed. On attaching to a port, the device’s pull-up brings its line high, enabling the hub to detect that a device is attached. On detecting a device, the hub continues to provide power but doesn’t yet transmit USB traffic to the device. Chapter 15 has more on how hubs detect devices.

3. The host learns of the new device. Each hub uses its interrupt endpoint to report events at the hub. The report indicates only whether the hub or a port (and if so, which port) has experienced an event. On learning of an event, the host sends the hub a Get Port Status request to find out more. Get Port Status and the other hub-class requests described are standard requests that all hubs support. The information returned tells the host when a device is newly attached.

4. The hub detects whether a device is low or full speed. Just before resetting the device, the hub determines whether the device is low or full speed by examining the voltages on the two signal lines. The hub detects the device’s speed by determining which line has a higher voltage when idle. The hub sends the information to the host in response to the next Get Port Status request. A USB 1.x hub may instead detect the device’s speed just after a bus reset. USB 2.0 requires speed detection before the reset so the hub knows whether to check for a high-speed-capable device during reset as described below.

5. The hub resets the device. When a host learns of a new device, the host sends the hub a Set Port Feature request that asks the hub to reset the port. The hub places the device’s USB data lines in the Reset condition for at least 10 ms. Reset is a special condition where both D+ and D- are logic low. (Normally, the lines have opposite logic states.) The hub sends the reset only to the new device. Other hubs and devices on the bus don’t see the reset.

6. The host learns if a full-speed device supports high speed. Detecting whether a device supports high speed uses two special signal states. In the Chirp J state, only the D+ line is driven and in the Chirp K state, only the D- line is driven.

During the reset, a device that supports high speed sends a Chirp K. A high-speed-capable hub detects the Chirp K and responds with a series of alternating Chirp K and Chirp J. On detecting the pattern KJKJKJ, the device removes its full-speed pull-up and performs all further communications at high speed. If the hub doesn’t respond to the device’s Chirp K, the device knows it must continue to communicate at full speed. All high-speed devices must be capable of responding to control requests at full speed.

7. The hub establishes a signal path between the device and the bus. The host verifies that the device has exited the reset state by sending a Get Port Status request. A bit in the returned data indicates whether the device is still in the reset state. If necessary, the host repeats the request until the device has exited the reset state.

When the hub removes the reset, the device is in the Default state. The device’s USB registers are in their reset states, and the device is ready to respond to control transfers at endpoint zero. The device communicates with the host using the default address of 00h.

8. The host sends a Get Descriptor request to learn the maximum packet size of the default pipe. The host sends the request to device address 00h, endpoint zero. Because the host enumerates only one device at a time, only one device will respond to communications addressed to device address 00h even if several devices attach at once.

The eighth byte of the device descriptor contains the maximum packet size supported by endpoint zero. A Windows host requests 64 bytes but after receiving just one packet (whether or not it has 64 bytes), the host begins the Status stage of the transfer. On completing the Status stage, Windows requests the hub to reset the device as in step 5 above. The USB 2.0 specification doesn’t require a reset here. The reset is a precaution that ensures that the device will be in a known state when the reset ends.
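
As a rough illustration of step 8, host-side code could pull the endpoint-zero packet size out of the first packet like this. The function name and the error convention are assumptions made for this sketch; the byte offsets follow the standard device descriptor layout (bLength, bDescriptorType, ..., bMaxPacketSize0 at offset 7).

```c
#include <stddef.h>
#include <stdint.h>

#define USB_DESC_TYPE_DEVICE      0x01u  /* bDescriptorType of a device descriptor */
#define USB_MAX_PACKET_SIZE0_OFF  7u     /* the eighth byte, counting from zero */

/*
 * Return bMaxPacketSize0 from the first bytes of a device descriptor,
 * or -1 if fewer than eight bytes arrived or the type byte is wrong.
 */
int ep0_max_packet_size(const uint8_t *buf, size_t len)
{
    if (buf == NULL || len < 8u)
        return -1;
    if (buf[1] != USB_DESC_TYPE_DEVICE)
        return -1;
    return buf[USB_MAX_PACKET_SIZE0_OFF];
}
```

Only the first eight bytes are needed, which is why the host can start the Status stage after a single packet even when the device uses the smallest packet size.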

9. The host assigns an address. When the reset is complete, the host controller assigns a unique address to the device by sending a Set Address request. The device completes the Status stage of the request using the default address and then implements the new address. The device is now in the Address state. All communications from this point on use the new address. The address is valid until the device is detached, a hub resets the port, or the system reboots. On the next enumeration, the host may assign a different address to the device.
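
The Set Address request in step 9 is an ordinary control transfer with no Data stage. A minimal sketch of the Setup packet contents, assuming the standard field names and the standard request code 0x05; the struct and function names are illustrative, and the struct holds values in host byte order (on the wire the 16-bit fields are little-endian).

```c
#include <stddef.h>
#include <stdint.h>

#define USB_REQ_SET_ADDRESS 0x05u   /* standard request code for Set Address */

/* The fields of the 8-byte Setup packet that starts every control transfer. */
struct usb_setup_packet {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
};

/*
 * Fill in a Set Address request. Returns 0 on success, or -1 if the
 * address is outside 1..127 (0 is the default address, so the host
 * does not hand it out during enumeration).
 */
int make_set_address(struct usb_setup_packet *pkt, uint8_t new_address)
{
    if (pkt == NULL || new_address == 0 || new_address > 127)
        return -1;

    pkt->bmRequestType = 0x00;               /* host-to-device, standard, recipient: device */
    pkt->bRequest      = USB_REQ_SET_ADDRESS;
    pkt->wValue        = new_address;        /* the address the device should adopt */
    pkt->wIndex        = 0;
    pkt->wLength       = 0;                  /* no Data stage */
    return 0;
}
```

The request itself is still sent to address 00h; the device switches to the new address only after the Status stage completes.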

10. The host learns about the device’s abilities. The host sends a Get Descriptor request to the new address to read the device descriptor. This time the host retrieves the entire descriptor. The descriptor contains the maximum packet size for endpoint zero, the number of configurations the device supports, and other basic information about the device.

The host continues to learn about the device by requesting the one or more configuration descriptors specified in the device descriptor. A request for a configuration descriptor is actually a request for the configuration descriptor followed by all of its subordinate descriptors up to the number of bytes requested. A Windows host begins by requesting just the configuration descriptor’s nine bytes. Included in these bytes is the total length of the configuration descriptor and its subordinate descriptors.

Windows then requests the configuration descriptor again, this time requesting the number of bytes in the retrieved total length. The device responds by sending the configuration descriptor followed by all of the configuration’s subordinate descriptors, including interface descriptor(s), with each interface descriptor followed by any endpoint descriptors for the interface. Some configurations also have class- or vendor-specific descriptors. This chapter has more on what the descriptors contain.
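
The two-pass read of the configuration descriptor hinges on the total-length field in the nine-byte header. A small sketch of extracting it, assuming the standard layout (wTotalLength is a little-endian 16-bit value at offsets 2 and 3); the function name and the use of 0 as an error value are choices made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define USB_DESC_TYPE_CONFIG  0x02u  /* bDescriptorType of a configuration descriptor */
#define USB_CONFIG_DESC_SIZE  9u     /* length of the configuration descriptor itself */

/*
 * Return wTotalLength (configuration descriptor plus all subordinate
 * descriptors) from the nine-byte header, or 0 if the header is invalid.
 */
uint16_t config_total_length(const uint8_t *buf, size_t len)
{
    if (buf == NULL || len < USB_CONFIG_DESC_SIZE)
        return 0;
    if (buf[0] != USB_CONFIG_DESC_SIZE || buf[1] != USB_DESC_TYPE_CONFIG)
        return 0;
    /* Multi-byte USB fields are little-endian: low byte first. */
    return (uint16_t)(buf[2] | ((uint16_t)buf[3] << 8));
}
```

The host would then issue a second Get Descriptor request asking for exactly that many bytes.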

11. The host assigns and loads a device driver (except for composite devices). After learning about a device from its descriptors, the host looks for the best match in a driver to manage communications with the device. Windows hosts use INF files to identify the best match. The INF file may be a system file for a USB class or a vendor-provided file that contains the device’s Vendor ID and Product ID. Chapter 9 has more about selecting a driver.

For devices that have been enumerated previously, Windows may use stored information instead of searching the INF files. After the operating system assigns and loads the driver, the driver may request the device to resend descriptors or send other class-specific descriptors.

An exception to this sequence is composite devices, which can have different drivers assigned to multiple interfaces in a configuration. The host can assign these drivers only after enabling the interfaces, so the host must first configure the device as described below.

12. The host’s device driver selects a configuration. After learning about a device from the descriptors, the device driver requests a configuration by sending a Set Configuration request with the desired configuration number. Many devices support only one configuration. If a device supports multiple configurations, the driver can decide which configuration to request based on information the driver has about how the device will be used, or the driver can ask the user what to do or just select the first configuration. (Many drivers only select the first configuration.) On receiving the request, the device implements the requested configuration. The device is now in the Configured state and the device’s interface(s) are enabled.

For composite devices, the host can now assign drivers. As with other devices, the host uses the information retrieved from the device to find a driver for each active interface in the configuration. The device is then ready for use.

Hubs are also USB devices, and the host enumerates a newly attached hub in the same way as other devices. If the hub has devices attached, the host enumerates these after the hub informs the host of their presence.

Attached state. If the hub isn’t providing power to a device’s VBUS line, the device is in the Attached state. The absence of power may occur if the hub has detected an over-current condition or if the host requests the hub to remove power from the port. With no power on VBUS, the host and device can’t communicate, so from their perspective, the situation is the same as when the device isn’t attached.

Suspend State. A device enters the Suspend state after detecting no bus activity, including SOF markers, for at least 3 ms. In the Suspend state, the device should limit its use of bus power. Both configured and unconfigured devices must support this state. Chapter 16 has more about the Suspend state.

Wednesday, October 20, 2010

Things to know of spinlocks

Spin locks can be used in interrupt handlers, whereas semaphores cannot, because semaphores sleep.
If a lock is used in an interrupt handler, you must also disable local interrupts (interrupt requests on the current processor) before obtaining the lock. Otherwise the handler could interrupt code that already holds the lock and then spin forever waiting for it.

So the pseudo code of a spin lock is:

disable_preempt()
if the lock is also used in an ISR:
    disable_local_interrupts()
spin_lock()
/* critical region */
spin_unlock()


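#include <linux/spinlock.h>

DEFINE_SPINLOCK(mr_lock);   /* statically initialised, unlocked */
unsigned long flags;

/* save the interrupt state, disable local interrupts, take the lock */
spin_lock_irqsave(&mr_lock, flags);
/* critical region ... */
/* release the lock and restore the saved interrupt state */
spin_unlock_irqrestore(&mr_lock, flags);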


The fact that a contended spin lock causes threads to spin (essentially wasting processor time) while waiting for the lock to become available is important. This behavior is the point of the spin lock: it is a lightweight single-holder lock that should be held only for short durations, so it is not wise to hold one for a long time. An alternative behavior when the lock is contended is to put the current thread to sleep and wake it up when the lock becomes available, so that the processor can go off and execute other code. This incurs a bit of overhead, most notably the two context switches required to switch out of and back into the blocking thread, which is certainly a lot more code than the handful of lines used to implement a spin lock. Therefore, it is wise to hold spin locks for less than the duration of two context switches. Because most of us have better things to do than measure context switches, just try to hold the lock for as little time as possible. Semaphores provide a lock that makes the waiting thread sleep, rather than spin, when contended.
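
To make the spinning behavior concrete, here is a user-space analogy built on C11 atomics. It is not the kernel implementation: it does nothing about preemption or interrupts, and the demo_ names are invented for this sketch. In this sketch the try-lock function returns true when the lock was taken.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A toy spin lock: one atomic flag that is set while the lock is held. */
typedef struct {
    atomic_flag locked;
} demo_spinlock_t;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; test-and-set returns the previous value, so false means we got the lock. */
bool demo_spin_trylock(demo_spinlock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire);
}

/* Busy-wait until the flag was clear at the moment we set it. */
void demo_spin_lock(demo_spinlock_t *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire))
        ;  /* spin: this loop is the processor time a contended lock wastes */
}

/* Clear the flag so that one spinning waiter can proceed. */
void demo_spin_unlock(demo_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

The empty loop in demo_spin_lock() is the cost discussed above: a waiter burns CPU for as long as the holder stays in its critical region, which is why the region should be short.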


Spin lock methods:

spin_lock(): Acquires given lock
spin_lock_irq(): Disables local interrupts and acquires given lock
spin_lock_irqsave(): Saves current state of local interrupts, disables local interrupts, and acquires given lock
spin_unlock(): Releases given lock
spin_unlock_irq(): Releases given lock and enables local interrupts
spin_unlock_irqrestore(): Releases given lock and restores local interrupts to given previous state
spin_lock_init(): Dynamically initializes given spinlock_t
spin_trylock(): Tries to acquire given lock; returns nonzero if the lock was acquired and zero if it was unavailable
spin_is_locked(): Returns nonzero if the given lock is currently acquired, otherwise it returns zero

Certain locking precautions must be taken when working with bottom halves. The function spin_lock_bh() obtains the given lock and disables all bottom halves. The function spin_unlock_bh() performs the inverse.

Because a bottom half may preempt process context code, if data is shared between a bottom half and process context, you must protect the data in process context with both a lock and the disabling of bottom halves. Likewise, because an interrupt handler may preempt a bottom half, if data is shared between an interrupt handler and a bottom half, you must both obtain the appropriate lock and disable interrupts.

Thursday, June 24, 2010

Kernel Directory Structure

Browse: (with cross reference)
Directory structure
include: public headers
kernel: core kernel components (e.g., scheduler)
arch: hardware-dependent code
fs: file systems
mm: memory management
ipc: inter-process communication
drivers: device drivers
usr: user-space code
lib: common libraries

Monday, June 21, 2010

Adding a new system call

Assume we’d like to add a system call called “mycall” that takes two integers as input and returns their sum.
1) Add “.long sys_mycall” at the end of the list in the file syscall_table.S.
---Full path for the file syscall_table.S is
NOTE: You can use the command sudo gedit syscall_table.S to edit the file
syscall_table.S. Beware that if you do not have the root privileges, system will not allow you to edit
any of the kernel files.
2) In file unistd.h
- Add #define __NR_mycall <Last_System_Call_Num + 1> at the end of the list
(e.g. if the number of the last system call, Last_System_Call_Num, is 319, then the
line you add should be #define __NR_mycall 320).
- Increment NR_syscalls by one (e.g. If before adding your system call total number of
system calls is 320, then this will be #define NR_syscalls 321.).
---Full path for the file unistd.h is /usr/src/linux/include/asm-i386/unistd.h
3) Add the following line at the end of the file syscalls.h:
asmlinkage long sys_mycall(int i, int j);
---Full path for the file syscalls.h is /usr/src/linux/include/linux/syscalls.h
4) Add mycall/ to core-y += in Makefile.
The line in the end shall look like:
core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ mycall/
---Full path for Makefile is /usr/src/linux/Makefile.
5) Create a new directory in /usr/src/linux and name it mycall.
6) Create a new file called mycall.c in /usr/src/linux/mycall. Contents of the file shall be
as follows:
/*----------Start of mycall.c----------*/
#include <linux/linkage.h>
/* Return the sum of the two arguments. */
asmlinkage long sys_mycall(int i, int j) {
    return i + j;
}
/*-----------End of mycall.c-----------*/
*asmlinkage tells the compiler to look for the arguments on the kernel stack.
7) Create Makefile in /usr/src/linux/mycall. Makefile shall be like:
########## Start of Makefile ##########
obj-y := mycall.o
########## End of Makefile ##########
8) Create the following userspace program to test your system call and name it testmycall.c. The
contents of this file shall be:
/*---------- Start of testmycall.c File ----------*/
#include <unistd.h>
#include <stdio.h>
/* Must match the number added to unistd.h (320 in the example above). */
#define __NR_mycall 320
long mycall(int i, int j) {
    return syscall(__NR_mycall, i, j);
}
int main(void) {
    printf("%ld\n", mycall(10, 20));
    return 0;
}
/*---------- End of testmycall.c File ----------*/
9) Compile testmycall.c to test whether your system call works.
gcc testmycall.c -o testmycall
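
Before rebuilding a kernel, the syscall() mechanism used by the test program can be tried against a system call that already exists. The sketch below assumes Linux with glibc; raw_getpid is a name chosen for this example, and SYS_getpid comes from <sys/syscall.h>.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* makes sure unistd.h declares syscall() */
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke getpid by its system call number, bypassing the usual libc wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

If raw_getpid() returns the same value as getpid(), the number-based path works, and the same pattern applies to a newly added call number.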

Tuesday, May 25, 2010

System call Internals

Try this link

System Call Internals

Linux Linkers and Loaders

Creating a package using autoconf and automake

Please refer to this directly

In case this link does NOT turn up, use the following info
A tutorial for porting to autoconf & automake

l.u. 21/11/2005

A first disclaimer is that I don't really like autoconf and automake. This is not the place for longer dissertations, so I won't spend more words on this. However, it is a matter of fact that many users just like to fetch your application and issue the usual ./configure && make && make install right away.
So, this is a synthetic tutorial for moving a Makefile-based program to an autohell- (this is a popular way to refer to { autoconf, automake, libtool } that I will encourage) enabled package.

A somewhat complete, likely example is given here. If your PRE is not exactly like the one proposed, you just probably won't need to perform the corresponding following steps.

Definition of the problem:

PRE: you have a tree with

* sources in src/
* documentation in doc/
* man pages in man/
* some scripts in scripts/ (in general, stuff to be installed but not compiled)
* examples in examples/

POST: you want to

* check for the availability of the needed headers/libraries
* possibly adjust some things (say some path in scripts, or in documentation) at compile-time
* install everything in its adequate place

So, this is what to do for moving with the very minimum effort:

1. Cleaning up
Move away every possible Makefile you have in the package (rename it for now)
2. Generating
Run autoscan:

$ autoscan

autoscan tries to produce a suitable configure.ac file (autoconf's driver) by performing simple analyses on the files in the package. This is enough for the moment (many people are happy to keep it permanently). Autoscan actually produces a configure.scan file, so give it the name autoconf will look for:

$ mv configure.scan configure.ac

-- note: configure.in was the name formerly used for autoconf files, now deprecated.
3. Adjusting things
Adjust the few things left to you by autoscan: open configure.ac with your favourite editor

$ vim configure.ac

look in the very first lines for the following:

AC_INIT(FULL-PACKAGE-NAME, VERSION, BUG-REPORT-ADDRESS)

and replace with your stuff, e.g.:

AC_INIT(pippo, 2.6,

4. Generating a first configure script
At this point, you're ready to make autoconf produce the configure script:

$ autoconf

This produces two files: autom4te.cache and configure. The first one is a directory used for speeding up the job of autohell tools, and may be removed when releasing the package. The latter is the shell script called by final users.
In this status, what the configure script does is just checking for requirements as suggested by autoscan, so nothing very conclusive yet.
5. Generating suitable Makefiles
We have the system checking part. We now want the building and installing part. This is given by a cooperation of automake and autoconf. Automake generates some "templates" that autoconf-generated scripts will translate into actual Makefiles. A first, "main" automake file, Makefile.am, is needed in the root of the package:

$ vim Makefile.am

set the automake mode and list the subdirectories where work is needed:

AUTOMAKE_OPTIONS = foreign
SUBDIRS = src doc examples man scripts

The first line sets the mode automake will behave in. "foreign" means not GNU, and is commonly used to avoid tedious messages about files organized differently from what GNU expects.
The second line lists the subdirectories to descend into for further work. The first one has stuff to compile, while the rest just need installing, but we don't care about that in this file. We now prepare the Makefile.am file for each of these directories. Automake will step into each of them and produce the corresponding Makefile.in file. Those .in files will be used by autoconf scripts to produce the final Makefiles.
Edit src/Makefile.am:

$ vim src/Makefile.am

and insert:

# what flags you want to pass to the C compiler & linker
CFLAGS = --pedantic -Wall -std=c99 -O2

# this lists the binaries to produce, the (non-PHONY, binary) targets in
# the previous manual Makefile
bin_PROGRAMS = targetbinary1 targetbinary2 [...] targetbinaryN
targetbinary1_SOURCES = targetbinary1.c myheader.h [...]
targetbinary2_SOURCES = targetbinary2.c
targetbinaryN_SOURCES = targetbinaryN.c

This was the most difficult one. In general, the uppercase suffix part like "_PROGRAMS" is called the primary and partially tells what to perform on the argument; the lowercase prefix (it's not given a name) tells the directory where to install.

bin_PROGRAMS

installs binaries in $(PREFIX)/bin , and

sbin_PROGRAMS

installs in $(PREFIX)/sbin . More primaries will appear in the following, and the automake manual has a complete list of primaries. Not all prefixes can be attached to such primaries (see later for how to work around this problem).

Let us now move to mans:

$ vim man/Makefile.am

insert the following in it:

man_MANS = firstman.1 secondman.8 thirdman.3 [...]

Yes, automake will deduce by itself what's needed for installing from this. Now edit the Makefile.am for scripts:

$ vim scripts/Makefile.am


bin_SCRIPTS = [...]

The primary "SCRIPTS" instructs the Makefiles to just install the arguments, without compiling of course.

So far so good. Two jobs remain to define: installing examples and installing plain docs. This is the nasty part, as automake doesn't provide a prefix for installing in the usual $(PREFIX)/share/doc/pippo . The workaround is to specify a further variable and use it as a prefix:

$ vim doc/Makefile.am

docdir = $(datadir)/doc/@PACKAGE@

If "abc" is wanted as a prefix, "abcdir" has to be specified. E.g. the code above expands to /usr/local/share/doc/pippo ("@PACKAGE@" will be expanded by autoconf when producing the final Makefile, see below). $(datadir) is known by all the configure scripts autoconf generates. You may look for the list of directory variables.

Similarly for examples, but we want to install in $(PREFIX)/share/examples/pippo , so:

$ vim examples/Makefile.am

exampledir = $(datarootdir)/examples/@PACKAGE@
example_DATA = sample1.dat sample2.dat [...]

All these files now exist, but autoconf has now to be told about them.
6. Integrating the checking (autoconf) part and the building (automake) part
We now insert some macros in configure.ac to tell autoconf that the final Makefiles have to be produced after ./configure :

$ vim configure.ac

right after AC_INIT(), initialize automake:

AM_INIT_AUTOMAKE(pippo, 2.6)

then, let autoconf generate a configure script that will output Makefiles for all of the above directories:

AC_OUTPUT(Makefile src/Makefile doc/Makefile examples/Makefile man/Makefile scripts/Makefile)

7. Making tools output the configure script and Makefile templates
We now have complete instructions for generating the famous configure script run by the users when installing, which both checks for building/running requirements and generates Makefiles for actually building and installing everything in place. Let us now actually make the tools generate such a script:

$ aclocal

This generates a file aclocal.m4 that contains macros for automake things, e.g. AM_INIT_AUTOMAKE.

$ automake --add-missing

Automake now reads configure.ac and the top-level Makefile.am, interprets them (e.g. sees that further work has to be done in some subdirectories) and, for each Makefile.am, produces a Makefile.in. The argument --add-missing tells automake to provide default scripts for reporting errors, installing etc., so it can be omitted in the next runs.
Finally, let autoconf build the configure script:

$ autoconf

This produces the final, full-featured configure shell script.
8. Further customizations
If you need to perform custom checks or actions in configure, just write the (shell) code somewhere in configure.ac (before the OUTPUT commands), then run autoconf again. For some checks, autoconf may already provide a macro: look in the list of autoconf macros before writing redundant code.

How do things work from now on
The user first runs:

$ ./configure

The shell script just generated will:

1. scan for dependencies on the basis of the AC_* macros given in configure.ac. If something is wrong or missing in the system, an appropriate error message will be printed.
2. for each Makefile requested in AC_OUTPUT(), translate the Makefile.in template to generate the final Makefile. The main Makefile will provide the most common targets like install, clean, distclean, uninstall et al.

if configure succeeds, all the Makefile files are available. The user then issues:

$ make

The target all from the main Makefile is run. This target expands into all the hidden targets needed to build what you requested. Then, by means of

# make install

everything is installed.

Monday, May 24, 2010


Unportable Code:
implementation-defined— The compiler-writer chooses what happens, and has to document it.
Example: whether the sign bit is propagated, when shifting an int right.
unspecified— The behavior for something correct, on which the standard does not impose any requirements.
Example: the order of argument evaluation.
Bad Code:
undefined— The behavior for something incorrect, on which the standard does not impose any requirements. Anything is allowed to happen, from nothing, to a warning message to program termination, to CPU meltdown, to launching nuclear missiles (assuming you have the correct hardware option installed).
Example: what happens when a signed integer overflows.
a constraint— This is a restriction or requirement that must be obeyed. If you don't, your program behavior becomes undefined in the sense above. Now here's an amazing thing: it's easy to tell if something is a constraint or not, because each topic in the standard has a subparagraph labelled "Constraints" that lists them all. Now here's an even more amazing thing: the standard specifies [5] that compilers only have to produce error messages for violations of syntax and constraints! This means that any semantic rule that's not in a constraints subsection can be broken, and since the behavior is undefined, the compiler is free to do anything and doesn't even have to warn you about it!

Example: the operands of the % operator must have integral type. So using a non-integral type with % must cause a diagnostic.
Example of a rule that is not a constraint: all identifiers declared in the C standard header files are reserved for the implementation, so you may not declare a function called malloc() because a standard header file already has a function of that name. But since this is not a constraint, the rule can be broken, and the compiler doesn't have to warn you.
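
Two of the examples above can be put into code. The sketch below uses function names invented for illustration: one function exposes the implementation-defined right shift, and the other shows one common way to stay clear of the undefined signed overflow by checking the operands before adding.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Implementation-defined: for a negative value the result depends on
 * whether the compiler propagates the sign bit. For non-negative values
 * the result is fully specified.
 */
int shift_right_one(int value)
{
    return value >> 1;
}

/* True if a + b would overflow an int; only performs well-defined arithmetic. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}

/* Store a + b in *sum and return true, or return false without adding if it would overflow. */
bool checked_add(int a, int b, int *sum)
{
    if (add_would_overflow(a, b))
        return false;
    *sum = a + b;
    return true;
}
```

The check is written so that the comparison itself can never overflow; testing after the addition would be too late, because the undefined behavior would already have happened.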

Portable Code:

strictly-conforming— A strictly-conforming program is one that:
• only uses specified features.
• doesn't exceed any implementation-defined limit.
• has no output that depends on implementation-defined, unspecified, or undefined features.

This was intended to describe maximally portable programs, which will always produce the identical output whatever they are run on. In fact, it is not a very interesting class because it is so small compared to the universe of conforming programs. For example, the following program is not strictly conforming:

#include <limits.h>
#include <stdio.h>
int main(void) { (void) printf("biggest int is %d", INT_MAX);
return 0;}
/* not strictly conforming: implementation-defined output! */

conforming— A conforming program can depend on the nonportable features of an implementation. So a program is conforming with respect to a specific implementation, and the same program may be nonconforming using a different compiler. It can have extensions, but not extensions that alter the
behavior of a strictly-conforming program. This rule is not a constraint, however, so don't expect the compiler to warn you about violations that render your program nonconforming!
The program example above is conforming.

Thursday, May 13, 2010

Grep Tips

Searching Files on UNIX
On MPE you can display files using the :Print command, Fcopy, Magnet, or Qedit (with pattern match searches). On HP-UX you can display files using cat and even better using more (and string search using the slash "/" command), and Qedit (including searches of $Include files, and so on), but if you really want to search for patterns of text like a UNIX guru, grep is the tool for you.

cat report.c {prints file on stdout, no pauses}
cat -v -e -t dump {show non-printing characters too}
cat >newfile {reads from stdin, writes to 'newfile'}
cat rpt1.c inp.c test.s >newfile {combine 3 files into 1}
more report.c {space for next page, q to quit}
ps -a | more {page through the full output of ps}
grep smug *.txt {search *.txt files for 'smug'}

MPE users will take a while to remember that more, like most UNIX tools, responds to a Return by printing the next line, not the next screen. Use the Spacebar to print the next page. Type "q" to quit. To scan ahead to find a string pattern, type "/" and enter a regular expression to match. For further help, type "h".

Searching Files Using UNIX grep
The grep program is a standard UNIX utility that searches through a set of files for an arbitrary text pattern, specified through a regular expression. Also check the man pages for egrep and fgrep. The MPE equivalents are MPEX and Magnet, both third-party products. By default, grep is case-sensitive (use -i to ignore case). By default, grep ignores the context of a string (use -w to match whole words only). By default, grep shows the lines that match (use -v to show those that don't match).

% grep BOB tmpfile {search 'tmpfile' for 'BOB' anywhere in a line}
% grep -i -w blkptr * {search files in CWD for word blkptr, any case}
% grep 'run[- ]time' *.txt {find 'run time' or 'run-time' in all txt files}
% who | grep root {pipe who to grep, look for root}

Understanding Regular Expressions
Regular Expressions are a feature of UNIX. They describe a pattern to match, a sequence of characters, not words, within a line of text. Here is a quick summary of the special characters used in the grep tool and their meaning:

^ (Caret) = match expression at the start of a line, as in ^A.
$ (Dollar) = match expression at the end of a line, as in A$.
\ (Back Slash) = turn off the special meaning of the next character, as in \^.
[ ] (Brackets) = match any one of the enclosed characters, as in [aeiou]. Use Hyphen "-" for a range, as in [0-9].
[^ ] = match any one character except those enclosed in [ ], as in [^0-9].
. (Period) = match a single character of any value, except end of line.
* (Asterisk) = match zero or more of the preceding character or expression.
\{x,y\} = match x to y occurrences of the preceding.
\{x\} = match exactly x occurrences of the preceding.
\{x,\} = match x or more occurrences of the preceding.

As an MPE user, you may find regular expressions difficult to use at first. Please persevere, because they are used in many UNIX tools, from more to perl. Unfortunately, some tools use simple regular expressions and others use extended regular expressions and some extended features have been merged into simple tools, so that it looks as if every tool has its own syntax. Not only that, regular expressions use the same characters as shell wildcarding, but they are not used in exactly the same way. What do you expect of an operating system built by graduate students?

Since you usually type regular expressions within shell commands, it is good practice to enclose the regular expression in single quotes (') to stop the shell from expanding it before passing the argument to your search tool. Here are some examples using grep:


grep smug files {search files for lines with 'smug'}
grep '^smug' files {'smug' at the start of a line}
grep 'smug$' files {'smug' at the end of a line}
grep '^smug$' files {lines containing only 'smug'}
grep '\^s' files {lines starting with '^s', "\" escapes the ^}
grep '[Ss]mug' files {search for 'Smug' or 'smug'}
grep 'B[oO][bB]' files {search for BOB, Bob, BOb or BoB }
grep '^$' files {search for blank lines}
grep '[0-9][0-9]' file {search for pairs of numeric digits}

Back Slash "\" is used to escape the next symbol, that is, to turn off the special meaning that it has. To look for a Caret "^" at the start of a line, the expression is ^\^. Period "." matches any single character. So b.b will match "bob", "bib", "b-b", etc. Asterisk "*" does not mean the same thing in regular expressions as in wildcarding; it is a modifier that applies to the preceding single character, or to an expression such as [0-9]. An asterisk matches zero or more of what precedes it. Thus [A-Z]* matches any number of upper-case letters, including none, while [A-Z][A-Z]* matches one or more upper-case letters.
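The same basic regular expressions can be tried from C through the POSIX regcomp()/regexec() interface, which uses grep-style basic syntax unless REG_EXTENDED is requested. The sketch below is only an illustration; the helper name bre_match is made up for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the basic regular expression `pattern` matches somewhere in
 * `line`, 0 if it does not, and -1 if the pattern fails to compile.
 * Leaving out REG_EXTENDED selects basic (grep-style) syntax. */
int bre_match(const char *pattern, const char *line)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, bre_match("b.b", "bob") returns 1, bre_match("[A-Z][A-Z]*", "abc") returns 0, and bre_match("[A-Z]*", "abc") returns 1 because zero upper-case letters still count as a match.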

The vi editor uses \< \> to match characters at the beginning and/or end of a word boundary. A word boundary is either the edge of the line or any character except a letter, digit or underscore "_". To look for if, but skip stiff, the expression is \<if\>. For the same logic in grep, invoke it with the -w option. And remember that regular expressions are case-sensitive. If you don't care about the case, the expression to match "if" would be [Ii][Ff], where the characters in square brackets define a character set from which the pattern must match one character. Alternatively, you could also invoke grep with the -i option to ignore case.

Here are a few more examples of grep to show you what can be done:


grep '^From: ' /usr/mail/$USER {list your mail}
grep '[a-zA-Z]' {any line with at least one letter}
grep '[^a-zA-Z0-9]' {anything not a letter or number}
grep '[0-9]\{3\}-[0-9]\{4\}' {999-9999, like phone numbers}
grep '^.$' {lines with exactly one character}
grep '"smug"' {'smug' within double quotes}
grep '"*smug"*' {'smug', with or without quotes}
grep '^\.' {any line that starts with a Period "."}
grep '^\.[a-z][a-z]' {line start with "." and 2 lc letters}

Wednesday, April 28, 2010

System V Semaphores versus POSIX Semaphores

POSIX named and unnamed semaphores

POSIX Semaphores
The learning curve of System V semaphores is much steeper than that of POSIX semaphores. This will be more understandable after you go through this section and compare it to what you learned in the previous section.

To start with, POSIX comes with simple semantics for creating, initializing, and performing operations on semaphores. They provide an efficient way to handle interprocess communication. POSIX comes with two kinds of semaphores: named and unnamed semaphores.

Named Semaphores
If you look in the man pages, you'll see that a named semaphore is identified by a name, like a System V semaphore, and, similarly, the semaphores have kernel persistence. This implies that these semaphores, like System V semaphores, are system-wide and limited in the number that can be active at any one time. The advantage of named semaphores is that they provide synchronization between unrelated processes and related processes as well as between threads.

A named semaphore is created by calling the following function:

sem_t *sem_open(const char *name, int oflag, mode_t mode, unsigned int value);
name - Name that identifies the semaphore.
oflag - Set to O_CREAT to create a semaphore (combine it with O_EXCL if you want the call to fail when the semaphore already exists).
mode - Controls the permission setting for new semaphores.
value - Specifies the initial value of the semaphore.
A single call creates the semaphore, initializes it, and sets permissions on it, which is quite different from the way System V semaphores act. It is much cleaner and more atomic in nature. Another difference is that a System V semaphore is identified by a value of type int (similar to an fd returned from open()), whereas the sem_open function returns a pointer of type sem_t *, which acts as the identifier for the POSIX semaphore.

From here on, operations will only be performed on semaphores. The semantics for locking semaphores is:

int sem_wait(sem_t *sem);
This call locks the semaphore if the semaphore count is greater than zero. After locking the semaphore, the count is reduced by 1. If the semaphore count is zero, the call blocks.

The semantics for unlocking a semaphore is:

int sem_post(sem_t *sem);
This call increases the semaphore count by 1 and then returns.

Once you're done using a semaphore, it is important to destroy it. To do this, make sure that all the references to the named semaphore are closed by calling the sem_close() function, then just before the exit or within the exit handler call sem_unlink() to remove the semaphore from the system. Note that sem_unlink() removes the name immediately, but the semaphore itself is destroyed only after every process that has it open has closed it.
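The whole life cycle of a named semaphore can be sketched in a few lines of C. This is a minimal illustration under assumptions, not a complete program: the function name named_sem_roundtrip and the permission value 0600 are choices made for this example, and the caller supplies the semaphore name.

```c
#include <fcntl.h>      /* O_CREAT, O_EXCL */
#include <semaphore.h>
#include <sys/stat.h>   /* mode_t */

/* Creates a named semaphore with an initial count of 1, locks and unlocks
 * it once, then closes and unlinks it. Returns 0 on success, -1 on error. */
int named_sem_roundtrip(const char *name)
{
    sem_t *sem = sem_open(name, O_CREAT | O_EXCL, 0600, 1);
    int rc = -1;

    if (sem == SEM_FAILED)
        return -1;

    if (sem_wait(sem) == 0) {        /* count 1 -> 0 */
        /* ... critical section would go here ... */
        if (sem_post(sem) == 0)      /* count 0 -> 1 */
            rc = 0;
    }

    sem_close(sem);                  /* drop this process's reference */
    sem_unlink(name);                /* remove the name from the system */
    return rc;
}
```

A call such as named_sem_roundtrip("/demo_sem") uses a made-up name; on older glibc versions the program may need to be linked with -pthread.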

Unnamed Semaphores
Again, according to the man pages, an unnamed semaphore is placed in a region of memory that is shared between multiple threads (a thread-shared semaphore) or processes (a process-shared semaphore). A thread-shared semaphore is placed in a region that only the threads of a process share, for example a global variable. A process-shared semaphore is placed in a region that different processes can share, for example a shared memory region. An unnamed semaphore provides synchronization between threads and between related processes; such semaphores are process-based.

The unnamed semaphore does not need the sem_open call. Instead, that one call is replaced by a declaration and a call to sem_init:

sem_t semid;
int sem_init(sem_t *sem, int pshared, unsigned int value);
pshared - Indicates whether this semaphore is to be shared between the threads of a process or between processes. If pshared has value 0, then the semaphore is shared between the threads of a process. If pshared is non-zero, then the semaphore is shared between processes.
value - The value with which the semaphore is to be initialized.
Once the semaphore is initialized, the programmer is ready to operate on the semaphore, which is of type sem_t. The operations to lock and unlock the semaphore remain as shown previously: sem_wait(sem_t *sem) and sem_post(sem_t *sem). To delete an unnamed semaphore, just call the sem_destroy function.
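As a rough sketch of the counting behaviour, the function below initializes a thread-shared semaphore, uses up all of its permits, and confirms that a further non-blocking attempt is refused. sem_trywait() is not covered in the text above; it is the standard POSIX non-blocking variant of sem_wait(). The function name sem_counts_slots is invented for this example.

```c
#include <errno.h>
#include <semaphore.h>

/* Initializes a thread-shared semaphore with `slots` permits, takes all of
 * them, and checks that one more non-blocking attempt is refused.
 * Returns 1 if the semaphore behaved as a counter of `slots`, else 0. */
int sem_counts_slots(unsigned int slots)
{
    sem_t sem;
    unsigned int i;
    int ok = 1;

    if (sem_init(&sem, 0, slots) != 0)   /* pshared = 0: threads only */
        return 0;

    for (i = 0; i < slots; i++)
        if (sem_wait(&sem) != 0)         /* each wait lowers the count by 1 */
            ok = 0;

    /* The count is now zero, so sem_trywait() must fail with EAGAIN
     * instead of blocking the way sem_wait() would. */
    if (sem_trywait(&sem) == 0 || errno != EAGAIN)
        ok = 0;

    sem_post(&sem);                      /* count goes from 0 to 1 */
    if (sem_trywait(&sem) != 0)          /* so this attempt succeeds */
        ok = 0;

    sem_destroy(&sem);
    return ok;
}
```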

The last section of this article has a simple worker-consumer demo that has been developed by using a POSIX semaphore.

Differences between System V and POSIX semaphores

There are a number of differences between System V and POSIX semaphores.

One marked difference between the System V and POSIX semaphore implementations is that in System V you can control how much the semaphore count can be increased or decreased; whereas in POSIX, the semaphore count is increased and decreased by 1.
POSIX semaphores do not allow manipulation of semaphore permissions, whereas System V semaphores allow you to change the permissions of semaphores to a subset of the original permission.
Initialization and creation of semaphores is atomic (from the user's perspective) in POSIX semaphores.
From a usage perspective, System V semaphores are clumsy, while POSIX semaphores are straightforward.
The scalability of POSIX semaphores (using unnamed semaphores) is much higher than System V semaphores. In a user/client scenario, where each user creates her own instances of a server, it would be better to use POSIX semaphores.
System V semaphores, when creating a semaphore object, create an array of semaphores, whereas POSIX semaphores create just one. Because of this, semaphore creation is costlier (in memory footprint) with System V semaphores than with POSIX semaphores.
It has been said that POSIX semaphore performance is better than System V-based semaphores.
Unnamed POSIX semaphores provide a mechanism for process-wide semaphores rather than system-wide semaphores. So, if a developer forgets to destroy the semaphore, it goes away with the memory that holds it when the process exits. In simple terms, unnamed POSIX semaphores provide a mechanism for non-persistent semaphores.

Memory Leaks using Valgrind

How do I check my C programs under Linux operating systems for memory leaks? How do I debug and profile Linux executables?

You need to use a tool called Valgrind. It is a memory debugging, memory leak detection, and profiling tool for Linux and Mac OS X operating systems. Valgrind is a flexible program for debugging and profiling Linux executables. From the official website:

The Valgrind distribution currently includes six production-quality tools: a memory error detector, two thread error detectors, a cache and branch-prediction profiler, a call-graph generating cache profiler, and a heap profiler. It also includes two experimental tools: a heap/stack/global array overrun detector, and a SimPoint basic block vector generator. It runs on the following platforms: X86/Linux, AMD64/Linux, PPC32/Linux, PPC64/Linux, and X86/Darwin (Mac OS X).

How Do I Install Valgrind?
Type the following command under CentOS / Redhat / RHEL Linux:

# yum install valgrind
Type the following command under Debian / Ubuntu Linux:

# apt-get install valgrind
How Do I use Valgrind?
If you normally run your program like this:

./a.out arg1 arg2

/path/to/myapp arg1 arg2
Use this command line to turn on the detailed memory leak detector:

valgrind --leak-check=yes ./a.out arg1 arg2
valgrind --leak-check=yes /path/to/myapp arg1 arg2
You can also set logfile:

valgrind --log-file=output.file --leak-check=yes --tool=memcheck ./a.out arg1 arg2
Most error messages look like the following:

cat output.file
Sample outputs:

==43284== Memcheck, a memory error detector
==43284== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==43284== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==43284== Command: ./a.out
==43284== Parent PID: 39695
==43284== Invalid write of size 4
==43284== at 0x4004B6: f (in /tmp/a.out)
==43284== by 0x4004C6: main (in /tmp/a.out)
==43284== Address 0x4c1c068 is 0 bytes after a block of size 40 alloc'd
==43284== at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
==43284== by 0x4004A9: f (in /tmp/a.out)
==43284== by 0x4004C6: main (in /tmp/a.out)
==43284== HEAP SUMMARY:
==43284== in use at exit: 40 bytes in 1 blocks
==43284== total heap usage: 1 allocs, 0 frees, 40 bytes allocated
==43284== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1
==43284== at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
==43284== by 0x4004A9: f (in /tmp/a.out)
==43284== by 0x4004C6: main (in /tmp/a.out)
==43284== LEAK SUMMARY:
==43284== definitely lost: 40 bytes in 1 blocks
==43284== indirectly lost: 0 bytes in 0 blocks
==43284== possibly lost: 0 bytes in 0 blocks
==43284== still reachable: 0 bytes in 0 blocks
==43284== suppressed: 0 bytes in 0 blocks
==43284== For counts of detected and suppressed errors, rerun with: -v
==43284== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4)
Sample C Program
Create test.c:


#include <stdlib.h>

void f(void)
{
    int* x = malloc(10 * sizeof(int));
    x[10] = 0; // problem 1: heap block overrun
} // problem 2: memory leak -- x not freed

int main(void)
{
    f();
    return 0;
}
You can compile and run it as follows to detect problems:

gcc test.c
valgrind --log-file=output.file --leak-check=yes --tool=memcheck ./a.out
vi output.file
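For comparison, here is one possible corrected version of f(), sketched under the assumption that the intent was to write the last element of the block. The name f_fixed and the error return value are additions for this example. Run under the same valgrind command, a program calling it should no longer show the invalid write or the lost block.

```c
#include <stdlib.h>

/* Same allocation as in test.c, but the write stays inside the block
 * and the block is released before returning.
 * Returns 0 on success, -1 if malloc() fails. */
int f_fixed(void)
{
    int *x = malloc(10 * sizeof(int));

    if (x == NULL)
        return -1;
    x[9] = 0;   /* last valid index of a 10-element block */
    free(x);    /* releasing the block removes the leak */
    return 0;
}
```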

Linux / UNIX: Displaying Today’s Files Only

How do I list all files created today only using shell command under UNIX or Linux operating systems?

You can use the find command as follows to list files modified within the last 24 hours in the current directory only (i.e. no subdirs):
find . -maxdepth 1 -type f -mtime -1

In this example, find today's directories only:
find -maxdepth 1 -type d -mtime -1

Another, older ls command hack is as follows:
ls -al --time-style=+%D | grep $(date +%D)

Wednesday, April 14, 2010

Linux System Calls

System calls in alphabetical order

System calls are the calls a user-space program makes to request services from the kernel.

_exit - like exit but with fewer actions (m+c)
accept - accept a connection on a socket (m+c!)
access - check user’s permissions for a file (m+c)
acct - not yet implemented (mc)
adjtimex - set/get kernel time variables (-c)
afs_syscall - reserved andrew filesystem call (-)
alarm - send SIGALRM at a specified time (m+c)
bdflush - flush dirty buffers to disk (-c)
bind - name a socket for interprocess communication (m!c)
break - not yet implemented (-)
brk - change data segment size (mc)
chdir - change working directory (m+c)
chmod - change file attributes (m+c)
chown - change ownership of a file (m+c)
chroot - set a new root directory (mc)
clone - see fork (m-)
close - close a file by reference (m+c)
connect - link 2 sockets (m!c)
creat - create a file (m+c)
create_module - allocate space for a loadable kernel module (-)
delete_module - unload a kernel module (-)
dup - create a file descriptor duplicate (m+c)
dup2 - duplicate a file descriptor (m+c)
execl, execlp, execle, ... - see execve (m+!c)
execve - execute a file (m+c)
exit - terminate a program (m+c)
fchdir - change working directory by reference ()
fchmod - see chmod (mc)
fchown - change ownership of a file (mc)
fclose - close a file by reference (m+!c)
fcntl - file/filedescriptor control (m+c)
flock - change file locking (m!c)
fork - create a child process (m+c)
fpathconf - get info about a file by reference (m+!c)
fread - read array of binary data from stream (m+!c)
fstat - get file status (m+c)
fstatfs - get filesystem status by reference (mc)
fsync - write file cache to disk (mc)
ftime - get timezone+seconds since 1.1.1970 (m!c)
ftruncate - change file size (mc)
fwrite - write array of binary data to stream (m+!c)
get_kernel_syms - get kernel symbol table or its size (-)
getdomainname - get system’s domainname (m!c)
getdtablesize - get filedescriptor table size (m!c)
getegid - get effective group id (m+c)
geteuid - get effective user id (m+c)
getgid - get real group id (m+c)
getgroups - get supplemental groups (m+c)
gethostid - get unique host identifier (m!c)
gethostname - get system’s hostname (m!c)
getitimer - get value of interval timer (mc)
getpagesize - get size of a system page (m-!c)
getpeername - get address of a connected peer socket (m!c)
getpgid - get process group id of a process (+c)
getpgrp - get process group id of current process (m+c)
getpid - get process id of current process (m+c)
getppid - get process id of the parent process (m+c)
getpriority - get a process/group/user priority (mc)
getrlimit - get resource limits (mc)
getrusage - get usage of resources (m)
getsockname - get the address of a socket (m!c)
getsockopt - get option settings of a socket (m!c)
gettimeofday - get timezone+seconds since 1.1.1970 (mc)
getuid - get real uid (m+c)
gtty - not yet implemented ()
idle - make a process a candidate for swap (mc)
init_module - insert a loadable kernel module (-)
ioctl - manipulate a character device (mc)
ioperm - set some i/o port’s permissions (m-c)
iopl - set all i/o port’s permissions (m-c)
ipc - interprocess communication (-c)
kill - send a signal to a process (m+c)
killpg - send a signal to a process group (mc!)
klog - see syslog (-!)
link - create a hardlink for an existing file (m+c)
listen - listen for socket connections (m!c)
llseek - lseek for large files (-)
lock - not implemented yet ()
lseek - change the position ptr of a file descriptor (m+c)
lstat - get file status (mc)
mkdir - create a directory (m+c)
mknod - create a device (mc)
mmap - map a file into memory (mc)
modify_ldt - read or write local descriptor table (-)
mount - mount a filesystem (mc)
mprotect - read, write or execute protect memory (-)
mpx - not implemented yet ()
msgctl - ipc message control (m!c)
msgget - get an ipc message queue id (m!c)
msgrcv - receive an ipc message (m!c)
msgsnd - send an ipc message (m!c)
munmap - unmap a file from memory (mc)
nice - change process priority (mc)
oldfstat - no longer existing
oldlstat - no longer existing
oldolduname - no longer existing
oldstat - no longer existing
olduname - no longer existing
open - open a file (m+c)
pathconf - get information about a file (m+!c)
pause - sleep until signal (m+c)
personality - change current execution domain for ibcs (-)
phys - not implemented yet (m)
pipe - create a pipe (m+c)
prof - not yet implemented ()
profil - execution time profile (m!c)
ptrace - trace a child process (mc)
quotactl - not implemented yet ()
read - read data from a file (m+c)
readv - read datablocks from a file (m!c)
readdir - read a directory (m+c)
readlink - get content of a symbolic link (mc)
reboot - reboot or toggle vulcan death grip (-mc)
recv - receive a message from a connected socket (m!c)
recvfrom - receive a message from a socket (m!c)
rename - move/rename a file (m+c)
rmdir - delete an empty directory (m+c)
sbrk - see brk (mc!)
select - sleep until action on a filedescriptor (mc)
semctl - ipc semaphore control (m!c)
semget - ipc get a semaphore set identifier (m!c)
semop - ipc operation on semaphore set members (m!c)
send - send a message to a connected socket (m!c)
sendto - send a message to a socket (m!c)
setdomainname - set system’s domainname (mc)
setfsgid - set filesystem group id ()
setfsuid - set filesystem user id ()
setgid - set real group id (m+c)
setgroups - set supplemental groups (mc)
sethostid - set unique host identifier (mc)
sethostname - set the system’s hostname (mc)
setitimer - set interval timer (mc)
setpgid - set process group id (m+c)
setpgrp - has no effect (mc!)
setpriority - set a process/group/user priority (mc)
setregid - set real and effective group id (mc)
setreuid - set real and effective user id (mc)
setrlimit - set resource limit (mc)
setsid - create a session (+c)
setsockopt - change options of a socket (mc)
settimeofday - set timezone+seconds since 1.1.1970 (mc)
setuid - set real user id (m+c)
setup - initialize devices and mount root (-)
sgetmask - see siggetmask (m)
shmat - attach shared memory to data segment (m!c)
shmctl - ipc manipulate shared memory (m!c)
shmdt - detach shared memory from data segment (m!c)
shmget - get/create shared memory segment (m!c)
shutdown - shutdown a socket (m!c)
sigaction - set/get signal handler (m+c)
sigblock - block signals (m!c)
siggetmask - get signal blocking of current process (!c)
signal - setup a signal handler (mc)
sigpause - use a new signal mask until a signal (mc)
sigpending - get pending, but blocked signals (m+c)
sigprocmask - set/get signal blocking of current process (+c)
sigreturn - not yet used ()
sigsetmask - set signal blocking of current process (c!)
sigsuspend - replacement for sigpause (m+c)
sigvec - see sigaction (m!)
socket - create a socket communication endpoint (m!c)
socketcall - socket call multiplexer (-)
socketpair - create 2 connected sockets (m!c)
ssetmask - see sigsetmask (m)
stat - get file status (m+c)
statfs - get filesystem status (mc)
stime - set seconds since 1.1.1970 (mc)
stty - not yet implemented ()
swapoff - stop swapping to a file/device (m-c)
swapon - start swapping to a file/device (m-c)
symlink - create a symbolic link to a file (m+c)
sync - sync memory and disk buffers (mc)
syscall - execute a systemcall by number (-!c)
sysconf - get value of a system variable (m+!c)
sysfs - get infos about configured filesystems ()
sysinfo - get Linux system infos (m-)
syslog - manipulate system logging (m-c)
system - execute a shell command (m!c)
time - get seconds since 1.1.1970 (m+c)
times - get process times (m+c)
truncate - change file size (mc)
ulimit - get/set file limits (c!)
umask - set file creation mask (m+c)
umount - unmount a filesystem (mc)
uname - get system information (m+c)
unlink - remove a file when not busy (m+c)
uselib - use a shared library (m-c)
ustat - not yet implemented (c)
utime - modify inode time entries (m+c)
utimes - see utime (m!c)
vfork - see fork (m!c)
vhangup - virtually hang up current tty (m-c)
vm86 - enter virtual 8086 mode (m-c)
wait - wait for process termination (m+!c)
wait3 - bsd wait for a specified process (m!c)
wait4 - bsd wait for a specified process (mc)
waitpid - wait for a specified process (m+c)
write - write data to a file (m+c)
writev - write datablocks to a file (m!c)

(m) manual page exists.
(+) POSIX compliant.
(-) Linux specific.
(c) in libc.
(!) not a sole system call; it uses a different system call.

Monday, April 5, 2010


1) How do you determine the endianness of the machine using C program?

2) Using a program, how do we know whether stack grows up OR down?
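Both questions can be approached with short C functions. The sketch below is one common way to do it, not the only one. The function names are made up for this example, the noinline attribute is a GCC/Clang extension, and the stack test is a heuristic: the C standard does not guarantee anything about how locals in different frames are laid out.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 on a big-endian one.
 * The first byte in memory of the 32-bit value 1 is 1 only when the
 * least significant byte is stored at the lowest address. */
int is_little_endian(void)
{
    uint32_t word = 1;
    unsigned char first;

    memcpy(&first, &word, 1);
    return first == 1;
}

/* Compares the address of a local in the caller with one in the callee.
 * noinline keeps the callee in its own stack frame. */
__attribute__((noinline))
static int callee_is_lower(volatile char *caller_local)
{
    volatile char callee_local = 0;

    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns 1 if the stack appears to grow toward lower addresses,
 * 0 if it appears to grow toward higher addresses. */
int stack_grows_down(void)
{
    volatile char caller_local = 0;

    return callee_is_lower(&caller_local);
}
```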


* The permission info for files/directories is stored in octal form, such as 777.

* The maximum number of threads that may be created by a process is implementation dependent.

>>Terminating Threads:

* There are several ways in which a Pthread may be terminated:
o The thread returns from its starting routine (the main routine for the initial thread).
o The thread makes a call to the pthread_exit subroutine (covered below).
o The thread is canceled by another thread via the pthread_cancel routine (not covered here).
o The entire process is terminated due to a call to either the exec or exit subroutines.

* pthread_exit is used to explicitly exit a thread. Typically, the pthread_exit() routine is called after a thread has completed its work and is no longer required to exist.

* If main() finishes before the threads it has created, and exits with pthread_exit(), the other threads will continue to execute. Otherwise, they will be automatically terminated when main() finishes.

* The programmer may optionally specify a termination status, which is stored as a void pointer for any thread that may join the calling thread.

* Cleanup: the pthread_exit() routine does not close files; any files opened inside the thread will remain open after the thread is terminated.
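The points about pthread_exit() and the termination status can be illustrated with a small sketch. The names worker and run_worker are invented for this example, and passing an integer through the void pointer is just one convenient way to show the status being handed to the joining thread.

```c
#include <pthread.h>
#include <stdint.h>

/* Thread start routine: ends the thread explicitly and hands a
 * termination status back to whoever joins it. */
static void *worker(void *arg)
{
    intptr_t input = (intptr_t)arg;

    pthread_exit((void *)(input * 2));   /* status travels as a void pointer */
}

/* Starts one worker, joins it, and returns the status it passed to
 * pthread_exit(), or -1 if the thread could not be created or joined. */
long run_worker(long input)
{
    pthread_t tid;
    void *status = NULL;

    if (pthread_create(&tid, NULL, worker, (void *)(intptr_t)input) != 0)
        return -1;
    if (pthread_join(tid, &status) != 0)
        return -1;
    return (long)(intptr_t)status;
}
```

Here run_worker(21) returns 42, the value the thread passed to pthread_exit().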

> The system calls that the given libc supports are declared in unistd.h and related headers.
System calls that are known to the kernel but do not have a stub in libc yet can be invoked using syscall().
As an example, you can close a file using syscall() like this (not advised):
#include <sys/syscall.h>
#include <unistd.h>
int my_close(int filedescriptor)
{
return syscall(SYS_close, filedescriptor);
}

> In Linux versions before 2.6.11, the capacity of a pipe was the same as
the system page size (e.g., 4096 bytes on x86). Since Linux 2.6.11,
the pipe capacity is 65536 bytes.
> According to POSIX.1, pipes only need to be unidirectional.


The dup() system call uses
the lowest-numbered, unused descriptor for the new one.
int dup(int oldfd);
The old descriptor is not closed! Both may be used interchangeably.

int dup2(int oldfd, int newfd);
With dup2(), the descriptor previously open as newfd (if any) is closed first! The old descriptor, oldfd, stays open.
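A small sketch can make the dup2() behaviour concrete: the duplicate and the original refer to the same pipe, so data written through one can be read back from the pipe's other end. The function name write_via_dup2 and its parameters are made up for this illustration.

```c
#include <string.h>
#include <unistd.h>

/* Writes `msg` through a duplicate of a pipe's write end, then reads it
 * back from the read end. Returns the number of bytes read, or -1 on error.
 * `slot` is the descriptor number dup2() should use for the duplicate. */
ssize_t write_via_dup2(int slot, const char *msg, char *out, size_t outlen)
{
    int fds[2];
    ssize_t n = -1;

    if (pipe(fds) != 0)
        return -1;

    /* dup2() closes `slot` first if it is open, then makes it refer to
     * the same pipe as fds[1]. fds[1] itself stays open. */
    if (dup2(fds[1], slot) == slot &&
        write(slot, msg, strlen(msg)) == (ssize_t)strlen(msg))
        n = read(fds[0], out, outlen);

    close(slot);
    close(fds[1]);
    close(fds[0]);
    return n;
}
```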

ATOMIC operations are those which are NOT interrupted by any source, including the scheduler.

Under Linux, #define PIPE_BUF 4096, and hence a write to a pipe is atomic when its size is less than or equal to 4 KB. Above this size, the write might be split and hence would be NON-ATOMIC.

But under POSIX, we have
#define _POSIX_PIPE_BUF 512

#define MSGMAX 4056 /* <= 4056 */ /* max size of message (bytes) */
Messages can be no larger than 4,056 bytes in total size, including the mtype member,
which is 4 bytes in length (long).

Inside the kernel, all IPC objects are stored as structures. For a message queue, each message is stored as one structure, and the messages are kept in a singly linked list.


> Semaphores can best be described as counters used to control access to shared resources by multiple processes.
> They are often regarded as the MOST DIFFICULT to GRASP among the three System V IPC mechanisms.

Every ANSI C compiler is required to support at least:
• 31 parameters in a function definition
• 31 arguments in a function call
• 509 characters in a source line
• 32 levels of nested parentheses in an expression
• The maximum value of long int can't be any less than 2,147,483,647, (i.e., long integers
are at least 32 bits)


Process & Threads
Processes contain information about program resources and program execution state, including:

* Process ID, process group ID, user ID, and group ID
* Environment
* Working directory.
* Program instructions
* Registers
* Stack
* Heap
* File descriptors
* Signal actions
* Shared libraries
* Inter-process communication tools (such as message queues, pipes, semaphores, or shared memory).

A thread maintains its own:

* Stack pointer
* Registers
* Scheduling properties (such as policy or priority)
* Set of pending and blocked signals
* Thread specific data.

* Mutex is an abbreviation for "mutual exclusion". Mutex variables are one of the primary means of implementing thread synchronization and for protecting shared data when multiple writes occur.

* A mutex variable acts like a "lock" protecting access to a shared data resource. The basic concept of a mutex as used in Pthreads is that only one thread can lock (or own) a mutex variable at any given time. Thus, even if several threads try to lock a mutex only one thread will be successful. No other thread can own that mutex until the owning thread unlocks that mutex. Threads must "take turns" accessing protected data.

* Mutexes can be used to prevent "race" conditions.

* Very often the action performed by a thread owning a mutex is the updating of global variables. This is a safe way to ensure that when several threads update the same variable, the final value is the same as what it would be if only one thread performed the update. The variables being updated belong to a "critical section".

* A typical sequence in the use of a mutex is as follows:
o Create and initialize a mutex variable
o Several threads attempt to lock the mutex
o Only one succeeds and that thread owns the mutex
o The owner thread performs some set of actions
o The owner unlocks the mutex
o Another thread acquires the mutex and repeats the process
o Finally the mutex is destroyed

* When several threads compete for a mutex, the losers block at that call. A non-blocking call is available with "trylock" instead of the "lock" call.

* When protecting shared data, it is the programmer's responsibility to make sure every thread that needs to use a mutex does so. For example, if 4 threads are updating the same data, but only one uses a mutex, the data can still be corrupted.

Creating and Destroying Mutexes

pthread_mutex_init (mutex,attr)

pthread_mutex_destroy (mutex)

pthread_mutexattr_init (attr)

pthread_mutexattr_destroy (attr)


* Mutex variables must be declared with type pthread_mutex_t, and must be initialized before they can be used. There are two ways to initialize a mutex variable:

1. Statically, when it is declared. For example:
pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;

2. Dynamically, with the pthread_mutex_init() routine. This method permits setting mutex object attributes, attr.

The mutex is initially unlocked.

* The attr object is used to establish properties for the mutex object, and must be of type pthread_mutexattr_t if used (may be specified as NULL to accept defaults). The Pthreads standard defines three optional mutex attributes:
o Protocol: Specifies the protocol used to prevent priority inversions for a mutex.
o Prioceiling: Specifies the priority ceiling of a mutex.
o Process-shared: Specifies the process sharing of a mutex.

Note that not all implementations may provide the three optional mutex attributes.

* The pthread_mutexattr_init() and pthread_mutexattr_destroy() routines are used to create and destroy mutex attribute objects respectively.

* pthread_mutex_destroy() should be used to free a mutex object which is no longer needed.

Mutex Variables
Locking and Unlocking Mutexes

pthread_mutex_lock (mutex)

pthread_mutex_trylock (mutex)

pthread_mutex_unlock (mutex)


* The pthread_mutex_lock() routine is used by a thread to acquire a lock on the specified mutex variable. If the mutex is already locked by another thread, this call will block the calling thread until the mutex is unlocked.

* pthread_mutex_trylock() will attempt to lock a mutex. However, if the mutex is already locked, the routine will return immediately with a "busy" error code. This routine may be useful in preventing deadlock conditions, as in a priority-inversion situation.

* pthread_mutex_unlock() will unlock a mutex if called by the owning thread. Calling this routine is required after a thread has completed its use of protected data if other threads are to acquire the mutex for their work with the protected data. An error will be returned if:
o The mutex was already unlocked
o The mutex is owned by another thread
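The routines above can be strung together in a short sketch: a mutex is created dynamically with a default attribute object, locked, probed with pthread_mutex_trylock(), unlocked, and destroyed. The function name probe_held_mutex is invented for this example, and the sketch assumes the default (non-recursive) mutex type.

```c
#include <errno.h>
#include <pthread.h>

/* Creates a mutex dynamically, locks it, and probes it with
 * pthread_mutex_trylock() while it is held. Returns the trylock result
 * (EBUSY for a held, non-recursive mutex) or -1 if setup fails. */
int probe_held_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int probe;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutex_init(&mutex, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);        /* not needed once the mutex exists */

    pthread_mutex_lock(&mutex);
    probe = pthread_mutex_trylock(&mutex);   /* returns at once instead of blocking */
    pthread_mutex_unlock(&mutex);

    pthread_mutex_destroy(&mutex);
    return probe;
}
```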


One primary difference when using a POSIX mutex is this:
Program execution within the critical section (between lock and unlock) becomes sequential: only one thread can execute it at a time. With NO LOCKING, execution proceeds concurrently. To test this, put a sleep(10) between the lock and unlock calls and verify.
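The effect of a critical section on a shared variable can be sketched as follows. Several threads increment one global counter, and the mutex ensures the final value equals the total number of increments. The names, the thread count, and the iteration count are arbitrary choices for this example.

```c
#include <pthread.h>

#define NTHREADS 4
#define NINCR    100000

static long counter;                                   /* shared data */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread adds NINCR to the shared counter, one locked step at a time. */
static void *add_to_counter(void *unused)
{
    int i;

    (void)unused;
    for (i = 0; i < NINCR; i++) {
        pthread_mutex_lock(&counter_lock);     /* enter critical section */
        counter++;
        pthread_mutex_unlock(&counter_lock);   /* leave critical section */
    }
    return NULL;
}

/* Runs NTHREADS workers and returns the final counter value,
 * or -1 if a thread could not be created. */
long run_counter_threads(void)
{
    pthread_t tids[NTHREADS];
    int i, started = 0;

    counter = 0;
    for (i = 0; i < NTHREADS; i++) {
        if (pthread_create(&tids[i], NULL, add_to_counter, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == NTHREADS ? counter : -1;
}
```

With the lock in place, run_counter_threads() returns 400000. Removing the lock and unlock calls makes the increments race, and the result may then come out lower.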

Sunday, March 21, 2010

Sample App to test whether high res timers are available


#include <stdio.h>
#include <time.h>

int check_timer(void);

int main(void)
{
    if (check_timer())
        fprintf(stderr, "NO: High resolution timers not available\n");
    else
        fprintf(stderr, "YES: High resolution timers are available\n");

    return 0;
}

/* Returns 0 if CLOCK_MONOTONIC reports a 1 ns resolution, non-zero otherwise. */
int check_timer(void)
{
    struct timespec ts;

    if (clock_getres(CLOCK_MONOTONIC, &ts))
        return 1;

    return (ts.tv_sec != 0 || ts.tv_nsec != 1);
}

Monday, March 8, 2010

RT Linux

1) Download the rt patch that is closest to the kernel version that you are using
2) Apply the patch. This can be painful, because some files, such as hrtimer.c, are heavily modified. The patch might fail for these files depending upon the gap between the kernel version and the rt patch version
3) Rebuild the kernel

Now that the kernel is RT, we should make the thread RT as well. This is done as follows:

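#include <pthread.h>
#include <sched.h>
void *func1(void *arg); /* thread entry point, defined elsewhere */
int create_thread(void)
{
    pthread_t thread;
    pthread_attr_t stThreadAttr;
    struct sched_param param;
    int ulRetval = 0;

    ulRetval = pthread_attr_init(&stThreadAttr);
    ulRetval = pthread_attr_setinheritsched(&stThreadAttr, PTHREAD_EXPLICIT_SCHED);
    ulRetval = pthread_attr_setschedpolicy(&stThreadAttr, SCHED_FIFO);
    param.sched_priority = 45; /* less than 50 */
    ulRetval = pthread_attr_setschedparam(&stThreadAttr, &param);
    ulRetval = pthread_create(&thread, &stThreadAttr, func1, NULL);
    return ulRetval;
}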

Linux Address Types

The following is a list of address types used in Linux
User virtual addresses

These are the regular addresses seen by user-space programs. User addresses are either 32 or 64 bits in length, depending on the underlying hardware architecture, and each process has its own virtual address space.

Physical addresses

The addresses used between the processor and the system's memory. Physical addresses are 32- or 64-bit quantities; even 32-bit systems can use 64-bit physical addresses in some situations.

Bus addresses

The addresses used between peripheral buses and memory. Often they are the same as the physical addresses used by the processor, but that is not necessarily the case. Bus addresses are highly architecture dependent, of course.
Kernel logical addresses

These make up the normal address space of the kernel. These addresses map most or all of main memory, and are often treated as if they were physical addresses. On most architectures, logical addresses and their associated physical addresses differ only by a constant offset. Logical addresses use the hardware's native pointer size, and thus may be unable to address all of physical memory on heavily equipped 32-bit systems. Logical addresses are usually stored in variables of type unsigned long or void *. Memory returned from kmalloc has a logical address.

Kernel virtual addresses

These differ from logical addresses in that they do not necessarily have a direct mapping to physical addresses. All logical addresses are kernel virtual addresses; memory allocated by vmalloc also has a virtual address (but no direct physical mapping). The function kmap, described later in this chapter, also returns virtual addresses. Virtual addresses are usually stored in pointer variables.

If you have a logical address, the macro __pa() (defined in ) will return its associated physical address. Physical addresses can be mapped back to logical addresses with __va(), but only for low-memory pages.
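
The constant-offset relationship can be modelled in ordinary user-space C. This is only a sketch: the names model_pa, model_va and model_is_low_memory are hypothetical, and MODEL_PAGE_OFFSET is an assumed value (0xC0000000, the classic split on 32-bit x86). Real kernels take the offset from their architecture configuration, and they also reserve part of the kernel range for vmalloc and other mappings, so the real low-memory limit is lower than this model suggests.

```c
#include <stdint.h>

/* Assumed constant offset, for this model only. */
#define MODEL_PAGE_OFFSET UINT32_C(0xC0000000)

/* Model of __pa(): logical address -> physical address. */
static uint32_t model_pa(uint32_t logical)
{
    return logical - MODEL_PAGE_OFFSET;
}

/* Model of __va(): physical address -> logical address. */
static uint32_t model_va(uint32_t physical)
{
    return physical + MODEL_PAGE_OFFSET;
}

/* In this model a physical address is "low memory" if its logical
 * address still fits in a 32-bit pointer. */
static int model_is_low_memory(uint64_t physical)
{
    return physical + MODEL_PAGE_OFFSET <= UINT32_MAX;
}
```

With these assumptions, a physical address at 256 MB has a logical address, while one at 1 GB does not and would count as high memory.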

Low memory

Memory for which logical addresses exist in kernel space. On almost every system you will likely encounter, all memory is low memory.

High memory

Memory for which logical addresses do not exist, because the system contains more physical memory than can be addressed with 32 bits.

MMAP with example

Mapping a device means associating a range of user-space addresses with device memory. Whenever the program reads or writes in the assigned address range, it is actually accessing the device. An X server, for example, uses mmap to get quick and easy access to the video card's memory. For a performance-critical application like this, direct access makes a large difference.
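
The same idea can be tried without any driver by mapping an ordinary file. The sketch below uses a regular file as a stand-in for device memory, which is an assumption made only for illustration; the helper name mmap_write_then_read is hypothetical. It stores a string through the mapping with a plain memory write and then reads it back through the file descriptor.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map 'path' (created if missing), store 'msg'
 * through the mapping, then read the file back with read() into 'out'.
 * Returns 0 on success, -1 on any failure. */
int mmap_write_then_read(const char *path, const char *msg,
                         char *out, size_t outlen)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len, msglen = strlen(msg) + 1;
    char *map;
    int fd, rc = -1;

    if (page <= 0)
        return -1;
    len = (size_t)page;
    if (msglen > len || msglen > outlen)
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    /* The file must be at least as large as the range we touch. */
    if (ftruncate(fd, (off_t)len) < 0)
        goto out_close;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    memcpy(map, msg, msglen);          /* a plain memory write ... */
    msync(map, len, MS_SYNC);          /* flush it to the file */
    if (munmap(map, len) < 0)
        goto out_close;

    /* ... that is visible when the file is read the usual way. */
    if (read(fd, out, msglen) == (ssize_t)msglen)
        rc = 0;

out_close:
    close(fd);
    return rc;
}
```

A device driver's mmap method plays the role that the file system plays here: it decides which memory stands behind the mapped range.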

#include <linux/version.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>
#ifdef MODVERSIONS
#  include <linux/modversions.h>
#endif
#include <asm/io.h>

/* character device structures */
static dev_t mmap_dev;
static struct cdev mmap_cdev;

/* methods of the character device */
static int mmap_open(struct inode *inode, struct file *filp);
static int mmap_release(struct inode *inode, struct file *filp);
static int mmap_mmap(struct file *filp, struct vm_area_struct *vma);

/* the file operations, i.e. all character device methods */
static struct file_operations mmap_fops = {
    .open = mmap_open,
    .release = mmap_release,
    .mmap = mmap_mmap,
    .owner = THIS_MODULE,
};

// internal data
// length of the memory area, in pages
#define NPAGES 16
// pointer to the vmalloc'd area - always page aligned
static char *vmalloc_area;

/* character device open method */
static int mmap_open(struct inode *inode, struct file *filp)
{
    return 0;
}

/* character device last close method */
static int mmap_release(struct inode *inode, struct file *filp)
{
    return 0;
}

// helper function, mmap's the vmalloc'd area which is not physically contiguous
static int mmap_vmem(struct file *filp, struct vm_area_struct *vma)
{
    int ret;
    long length = vma->vm_end - vma->vm_start;
    unsigned long start = vma->vm_start;
    char *vmalloc_area_ptr = (char *)vmalloc_area;
    unsigned long pfn;

    printk(KERN_INFO "mmap_vmem is invoked\n");
    /* check length - do not allow larger mappings than the number of
       pages allocated */
    if (length > NPAGES * PAGE_SIZE)
        return -EIO;

    /* loop over all pages and map each page individually, because
       vmalloc'd pages are not physically contiguous */
    while (length > 0) {
        pfn = vmalloc_to_pfn(vmalloc_area_ptr);
        if ((ret = remap_pfn_range(vma, start, pfn, PAGE_SIZE,
                                   PAGE_SHARED)) < 0) {
            return ret;
        }
        start += PAGE_SIZE;
        vmalloc_area_ptr += PAGE_SIZE;
        length -= PAGE_SIZE;
    }
    return 0;
}

/* character device mmap method */
static int mmap_mmap(struct file *filp, struct vm_area_struct *vma)
{
    printk(KERN_INFO "mmap_mmap is invoked\n");
    /* at offset 0 we map the vmalloc'd area */
    if (vma->vm_pgoff == 0) {
        return mmap_vmem(filp, vma);
    }
#if 0
    /* at offset NPAGES we map the kmalloc'd area */
    if (vma->vm_pgoff == NPAGES) {
        return mmap_kmem(filp, vma);
    }
#endif
    /* at any other offset we return an error */
    return -EIO;
}

/* module initialization - called at module load time */
static int __init mmap_init(void)
{
    int ret = 0;
    char *my_char_ptr, *my_char_ptr_2;
    int *my_int_ptr;

    /* allocate a memory area with vmalloc. */
    if ((vmalloc_area = (char *)vmalloc(NPAGES * PAGE_SIZE)) == NULL) {
        ret = -ENOMEM;
        goto out;
    }

    /* get the major number of the character device */
    if ((ret = alloc_chrdev_region(&mmap_dev, 0, 1, "mmap")) < 0) {
        printk(KERN_ERR "could not allocate major number for mmap\n");
        goto out_vfree;
    }

    /* initialize the device structure and register the device with the kernel */
    cdev_init(&mmap_cdev, &mmap_fops);
    if ((ret = cdev_add(&mmap_cdev, mmap_dev, 1)) < 0) {
        printk(KERN_ERR "could not allocate chrdev for mmap\n");
        goto out_unalloc_region;
    }

#if 0
    /* mark the pages reserved */
    {
        int i;

        for (i = 0; i < NPAGES * PAGE_SIZE; i += PAGE_SIZE) {
            SetPageReserved(vmalloc_to_page((void *)(((unsigned long)vmalloc_area) + i)));
        }
    }
#endif

    /* store a pattern in the memory - the test application will check for it */
    my_char_ptr = vmalloc_area;
    /* copy each string including its terminating NUL */
    strcpy(my_char_ptr, " ------This is from kernel space");
    /* int pointer arithmetic: 100 ints past the start of the area */
    my_int_ptr = (int *)vmalloc_area + 100;
    *my_int_ptr = 1000;
    my_char_ptr_2 = (char *)((int *)vmalloc_area + 100 + 4);
    strcpy(my_char_ptr_2, " ----This is second message from kernel space");

    return ret;

out_unalloc_region:
    unregister_chrdev_region(mmap_dev, 1);
out_vfree:
    vfree(vmalloc_area);
out:
    return ret;
}

/* module unload */
static void __exit mmap_exit(void)
{
    int i;

    /* remove the character device */
    cdev_del(&mmap_cdev);
    unregister_chrdev_region(mmap_dev, 1);

    /* unreserve the pages (harmless if they were never reserved) */
    for (i = 0; i < NPAGES * PAGE_SIZE; i += PAGE_SIZE) {
        ClearPageReserved(vmalloc_to_page((void *)(((unsigned long)vmalloc_area) + i)));
    }

    /* free the memory areas */
    vfree(vmalloc_area);
    // kfree(kmalloc_ptr);
}

module_init(mmap_init);
module_exit(mmap_exit);
MODULE_DESCRIPTION("mmap demo driver");
MODULE_AUTHOR("Martin Frey");

Build this as a module against the build tree of the kernel you are running, and insert the module with insmod.
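
The mmap method of the driver above dispatches on vma->vm_pgoff. That field is the byte offset passed as the last argument of mmap(), expressed in pages, so a user program selects a region by passing a multiple of the page size. The helpers below are hypothetical and only illustrate the arithmetic; they assume the user-space page size reported by sysconf() matches the kernel's PAGE_SIZE.

```c
#include <unistd.h>

#define NPAGES 16  /* same value as in the driver */

/* Page offset the kernel would store in vma->vm_pgoff for a given,
 * page-aligned mmap() byte offset. */
unsigned long pgoff_for_offset(unsigned long offset)
{
    return offset / (unsigned long)sysconf(_SC_PAGESIZE);
}

/* Byte offset a program would pass to mmap() to reach the region the
 * driver tests for with vm_pgoff == NPAGES. */
unsigned long offset_for_second_region(void)
{
    return NPAGES * (unsigned long)sysconf(_SC_PAGESIZE);
}
```

In the driver as listed, only offset 0 is served; the branch for vm_pgoff == NPAGES is compiled out, so any other offset returns -EIO.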

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

#define NPAGES 16

/* this is a test program that opens the mmap_drv.
   It maps the vmalloc() allocated area, prints the message
   the driver stored there and writes a reply into it.
   You need a device special file to access the driver.
   The device special file is called 'node' and searched
   in the current directory.
   To create it
   - load the driver
     'insmod mmap_mod.o'
   - find the major number assigned to the driver
     (it registers under the name "mmap")
     'grep mmap /proc/devices'
   - and create the special file (assuming major number 254)
     'mknod node c 254 0'
*/

int main(void)
{
    int fd;
    unsigned char *vadr;
    int len = NPAGES * getpagesize();

    if ((fd = open("node", O_RDWR | O_SYNC)) < 0) {
        perror("open");
        exit(-1);
    }

    vadr = mmap(0, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (vadr == MAP_FAILED) {
        perror("mmap");
        exit(-1);
    }

    printf("The contents of the virtual address space in user space is %s\n", (char *)vadr);
    /* copy the reply including its terminating NUL */
    strcpy((char *)vadr, "This is from user space");
    if (munmap(vadr, len) == -1)
        perror("munmap failed");

    close(fd);
    return 0;
}

Build this as a user-space application and run it.
Note: You might have to create the device node with mknod first. Create it as explained in the comment at the top of mmap_test.c and then run the application.
The application should print the message " ------This is from kernel space", which the driver stored at the start of the mapped area.
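
The driver stores more than the first string: it writes an int at (int *)vmalloc_area + 100 and a second string four ints later. Because the arithmetic is done on an int pointer, these land at byte offsets 100 * sizeof(int) and 104 * sizeof(int) from the start of the mapping. The sketch below shows how the test program could decode them; the struct and function names are hypothetical, and it assumes that int has the same size in the kernel and in the user program.

```c
#include <stddef.h>
#include <string.h>

/* Offsets derived from the pointer arithmetic in mmap_init(). */
#define INT_FIELD_OFFSET   (100 * sizeof(int))
#define SECOND_MSG_OFFSET  ((100 + 4) * sizeof(int))

/* Hypothetical view of what the driver placed in the first page. */
struct kernel_area_view {
    const char *first_msg;   /* NUL-terminated string at offset 0 */
    int         value;       /* int stored after the first string */
    const char *second_msg;  /* NUL-terminated string after the int */
};

/* Decode a mapped area; 'base' is the pointer returned by mmap(). */
struct kernel_area_view decode_kernel_area(const unsigned char *base)
{
    struct kernel_area_view v;

    v.first_msg = (const char *)base;
    /* memcpy avoids alignment and aliasing assumptions. */
    memcpy(&v.value, base + INT_FIELD_OFFSET, sizeof(v.value));
    v.second_msg = (const char *)(base + SECOND_MSG_OFFSET);
    return v;
}
```

In the test program this would be called as decode_kernel_area(vadr) after a successful mmap() and before the reply is written, since the reply overwrites the first string.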