File descriptors in shell

Many people don't know, or have forgotten, that they can open file descriptors directly from the shell.

Opening a file descriptor in the shell is useful for two things: working with several inputs and outputs at the same time, and performance.

How to create a file descriptor


exec 6</tmp/foo opens the file /tmp/foo for reading on file descriptor 6. This is equivalent to the system call open("/tmp/foo", O_RDONLY), with the resulting descriptor placed on number 6.

exec 7>/tmp/bar opens the file /tmp/bar for writing on file descriptor 7. If the file already exists, it is truncated to zero length; otherwise it is created. This is equivalent to the system call open("/tmp/bar", O_WRONLY|O_CREAT|O_TRUNC, 0666), where the mode is further reduced by the umask.

exec 7>>/tmp/bar opens the file /tmp/bar for appending on file descriptor 7. If the file does not exist, it is created. This is equivalent to the system call open("/tmp/bar", O_WRONLY|O_APPEND|O_CREAT, 0666).

exec 6<&- closes file descriptor 6 (exec 7>&- does the same for descriptor 7).
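Putting the four forms together, here is a minimal sketch that exercises each one. It assumes throw-away files in a temporary directory created with mktemp rather than the /tmp/foo and /tmp/bar paths used above.

```shell
#!/bin/bash
# Sketch of the four exec redirections, run against throw-away files
# in a temporary directory instead of /tmp/foo and /tmp/bar.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'first line\nsecond line\n' > "$dir/foo"

exec 6<"$dir/foo"        # open for reading on fd 6
IFS= read -r line1 <&6   # each read continues where the last one stopped,
IFS= read -r line2 <&6   # because fd 6 keeps its file offset between commands
exec 6<&-                # close fd 6

exec 7>"$dir/bar"        # open for writing on fd 7 (created or truncated)
echo "written via >" >&7
exec 7>&-                # close fd 7

exec 7>>"$dir/bar"       # reopen fd 7 in append mode
echo "appended via >>" >&7
exec 7>&-

contents=$(cat "$dir/bar")
echo "read from fd 6: $line1 / $line2"
echo "$contents"

exec 7>"$dir/bar"        # opening with > again empties the existing file
exec 7>&-
```

Running it prints "read from fd 6: first line / second line" followed by the two lines written through fd 7; after the last exec, the file is empty again.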

You can also do other file descriptor manipulations, such as duplicating file descriptors or redirecting standard input and output to files. To learn more, read the sh, ksh, or bash man page (in the bash man page, see the REDIRECTION section).
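As one example of duplication, a common pattern is to save a copy of standard output on a spare descriptor, send stdout to a file for a while, and then restore it. This is a sketch; the choice of fd 3 and the temporary log file are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: duplicate stdout onto fd 3, send stdout to a file for a while,
# then restore it from the saved copy.
log=$(mktemp)
trap 'rm -f "$log"' EXIT

exec 3>&1            # fd 3 becomes a copy of the current stdout
exec 1>"$log"        # stdout now goes to the log file
echo "this line goes to the log"
uname                # output of external commands is captured too
exec 1>&3 3>&-       # restore stdout from fd 3, then close fd 3

echo "stdout restored; the log file contains:"
cat "$log"
```

Redirections on one exec line are processed left to right, so 1>&3 restores stdout before 3>&- closes the saved copy.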

Performance


When your shell script evaluates a line like "echo $x >>/tmp/file", the file is opened, the content of the variable is written, and the file is closed again. If this happens a couple of times in your script, that's fine. But in a loop with several thousand writes, opening the file once on a file descriptor and writing to that descriptor can dramatically improve the performance of your script.
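A small sketch makes the difference concrete. The three loops below write the same lines to files in a temporary directory (n=20000 is an arbitrary size chosen to keep it quick); only the first one opens and closes the file on every iteration. The third form, redirecting the whole loop with done >file, is another standard way to get a single open.

```shell
#!/bin/bash
# Sketch: three loops that write identical lines. Only the first one
# opens and closes the output file on every iteration.
n=20000                  # arbitrary size for the example
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# 1) ">>" inside the loop: open, write, close for each line
time for ((i = 1; i <= n; i++)); do
    echo "line $i" >>"$dir/per_write.txt"
done

# 2) open once on fd 7 with exec, write through it, close at the end
exec 7>"$dir/one_fd.txt"
time for ((i = 1; i <= n; i++)); do
    echo "line $i" >&7
done
exec 7>&-

# 3) redirect the whole loop: the file is opened once for the entire loop
time for ((i = 1; i <= n; i++)); do
    echo "line $i"
done >"$dir/loop_redirect.txt"
```

The timings printed by time depend on the machine, so none are claimed here; what the sketch guarantees is that all three files come out byte-identical (cmp reports no difference), so the choice only affects how often the file is opened.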

Here is a quick test I ran with the following shell script:
#!/bin/bash
# naive version: ">>" opens, writes to, and closes /tmp/bar for every line
cat /tmp/foo | while read a
do
    echo $a >>/tmp/bar
done


The input file contains 2,000,000 lines, which means the script opens, writes to, and closes /tmp/bar 2 million times. Running it gave the following results:
$ time ./bar.sh 

real	18m26.416s
user	4m9.820s
sys	8m42.876s


Now here is the same shell script using file descriptors:
#!/bin/bash

# open the files
exec 6</tmp/foo
exec 7>/tmp/bar
# data "processing"
cat <&6 | while read a
do
    echo $a >&7
done
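A variant of the same loop, sketched under the assumption that you want lines copied verbatim: the loop reads fd 6 directly instead of piping through cat, IFS= and read -r keep the leading blanks and backslashes that a plain read a would strip, and printf replaces echo. It runs on a small sample file in a temporary directory rather than /tmp/foo. This variant has not been benchmarked against the timings in this article, so any speed difference is untested.

```shell
#!/bin/bash
# Sketch: copy loop that reads fd 6 directly (no cat process, no
# pipeline subshell) and copies every line verbatim.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'plain line' '  leading spaces' 'back\slash' > "$dir/foo"   # sample input

exec 6<"$dir/foo"
exec 7>"$dir/bar"

# IFS= keeps leading/trailing blanks, -r keeps backslashes,
# and printf avoids echo's special handling of arguments like -n or -e.
while IFS= read -r a; do
    printf '%s\n' "$a" >&7
done <&6

exec 6<&- 7>&-   # close both descriptors when done
```

Because the loop is not part of a pipeline, it also runs in the current shell, so variables set inside it are still visible afterwards.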


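I got the following results with file descriptors. Wall-clock time dropped from 18m26s to 4m30s, a speedup of roughly 4.1. Most of the gain is in system time, which fell from 8m43s to 1m18s, consistent with the script no longer opening and closing /tmp/bar two million times.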
$ time ./bar.sh 

real	4m30.501s
user	2m22.004s
sys	1m17.993s


Playing with several file descriptors


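Here is a small example of a shell script that uses several file descriptors. It reads a file whose columns are an IP address, inbound traffic, outbound traffic, and total traffic (in bytes), and writes two files: /tmp/trafic_in.dat with the inbound traffic per host and /tmp/trafic_out.dat with the outbound traffic per host.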
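#!/bin/bash

# open the two output files
exec 6>/tmp/trafic_in.dat
exec 7>/tmp/trafic_out.dat

# open the file containing the data for input
exec 8</tmp/all_trafic.dat

# data processing: skip comment lines, then split each line into $1 $2 $3 ...
grep -v '^#' <&8 | while read -r line
do
    set -- $line                      # word-split the line into positional parameters
    printf '%s\t%s\n' "$1" "$2" >&6   # host and inbound bytes
    printf '%s\t%s\n' "$1" "$3" >&7   # host and outbound bytes
done

# close the file descriptors
exec 6>&-
exec 7>&-
exec 8<&-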

Here is the script at work on a sample input file:
$ head -4 /tmp/all_trafic.dat 
# Host                             In (bytes)    Out (bytes)  Total (bytes)
172.16.1.1                         2728242803    17456323158    20184565961
172.16.1.2                        62238068877   146358768518   208596837395
172.16.1.3                       123056619706   150682371892   273738991598

$ ./split_trafic.sh
$ 
$ head -4 /tmp/trafic_in.dat 
172.16.1.1	2728242803
172.16.1.2	62238068877
172.16.1.3	123056619706
172.16.1.4	7684221078
$ head -4 /tmp/trafic_out.dat 
172.16.1.1	17456323158
172.16.1.2	146358768518
172.16.1.3	150682371892
172.16.1.4	1931647367
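To try the split without real traffic data, here is a self-contained sketch that rebuilds the three sample rows shown above in a temporary directory (instead of /tmp) and splits them using the same fd 6/7/8 layout. One deliberate difference from the script above: instead of grep plus set, it lets read split each line into named fields (the names host, in_bytes, out_bytes, and rest are illustrative, not from the original) and skips comment lines with a case pattern, so no extra grep process is started.

```shell
#!/bin/bash
# Sketch: rebuild a small input in the format shown above and split it
# with the same fd 6/7/8 layout, in a temporary directory instead of /tmp.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/all_trafic.dat" <<'EOF'
# Host                             In (bytes)    Out (bytes)  Total (bytes)
172.16.1.1                         2728242803    17456323158    20184565961
172.16.1.2                        62238068877   146358768518   208596837395
172.16.1.3                       123056619706   150682371892   273738991598
EOF

exec 6>"$dir/trafic_in.dat" 7>"$dir/trafic_out.dat" 8<"$dir/all_trafic.dat"

# read splits each line into fields (names are illustrative);
# "rest" absorbs the total column
while read -r host in_bytes out_bytes rest; do
    case $host in '#'*|'') continue ;; esac   # skip comments and blank lines
    printf '%s\t%s\n' "$host" "$in_bytes" >&6
    printf '%s\t%s\n' "$host" "$out_bytes" >&7
done <&8

exec 6>&- 7>&- 8<&-

cat "$dir/trafic_in.dat"
```

With the three sample rows, trafic_in.dat ends up with the first three lines of the output shown above, separated by a tab.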

