Sunday, August 17, 2014

Unix Prog: File I/O efficiency

1. When doing I/O with read and write, we have to choose a buffer size, and that choice can significantly affect I/O efficiency: the smaller the buffer, the more system calls are needed to move the same amount of data.

fileio.c:
 #include<unistd.h>  
 #include<fcntl.h>  
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<errno.h>  
   
 #define BUFFSIZE 4096
 // Setting BUFFSIZE to the file system block size (4096 bytes is
 // common on Linux) keeps the number of system calls low; much
 // larger buffers usually bring little further gain
   
 int main()  
 {  
  ssize_t n;  // read and write return ssize_t, not int
  char buff[BUFFSIZE];  
   
  // Read the information from standard input, and write same info  
  // to standard output  
   
  // Note: read returns -1 on error, 0 at end of file, and otherwise
  // the number of bytes read, so the loop stops on both EOF and
  // error; we tell them apart afterwards by checking n < 0

  // read can return fewer bytes than requested: near the end of a
  // regular file, or when reading from a terminal, pipe or socket
  // write to a regular file normally writes everything or fails, so
  // a short count is treated as an error here; on pipes, sockets or
  // after a signal a short write is possible and could be retried
  while((n = read(STDIN_FILENO, buff, BUFFSIZE)) > 0) {  
   if (write(STDOUT_FILENO, buff, n) != n) {  
    fprintf(stderr, "write error!\n");  // stdout carries the copied data
    exit(1);  
   }  
  }  
   
  if (n < 0) {  
   fprintf(stderr, "read error!\n");
   exit(2);  
  }  
   
  exit(0);  
 }  

shell:
 ubuntu@ip-172-31-23-227:~$ cat test.txt  
 Hello world!  
 HELLO WORLD!  
 ubuntu@ip-172-31-23-227:~$ ./io.out <test.txt  
 Hello world!  
 HELLO WORLD!  
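
To try different buffer sizes without recompiling, the copy loop can be factored into a function that takes the size as a parameter. The following is only a minimal sketch: `write_all` and `copy_fd` are hypothetical helper names introduced here, not part of any library, and the retry on short writes is an addition that the program above does not have.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write until all len bytes are written.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal, try again */
            return -1;
        }
        buf += w;                  /* skip the part already written */
        len -= (size_t)w;
    }
    return 0;
}

/* Hypothetical helper: copy in_fd to out_fd, reading bufsize bytes at a time.
 * Returns the number of bytes copied, or -1 on error. */
static long copy_fd(int in_fd, int out_fd, size_t bufsize)
{
    char *buf;
    long total = 0;
    ssize_t n;

    if (bufsize == 0 || (buf = malloc(bufsize)) == NULL)
        return -1;

    while ((n = read(in_fd, buf, bufsize)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;            /* read error */
            break;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0) {
            total = -1;            /* write error */
            break;
        }
        total += n;
    }

    free(buf);
    return total;
}
```

Calling `copy_fd(STDIN_FILENO, STDOUT_FILENO, size)` from main with sizes such as 1, 512 and 4096, and running each variant on a large file under the shell's `time`, should show the system time dropping as the buffer grows, because fewer read and write calls are made.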

2. sync, fsync, fdatasync
Delayed write: to improve write efficiency, when a program calls the "write" system call, Unix normally copies the data into one of its buffers and queues it to be written to disk at some later time.

Some applications, such as databases, need to be sure that the data they wrote has actually reached the disk.

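The following 3 system calls address this problem.
All of them are declared in /usr/include/unistd.h
1) fsync: for a specific file descriptor, waits for the disk writes to complete before returning
2) fdatasync: similar to fsync, but only affects the data part of the file; metadata such as timestamps is flushed only when it is needed to read the data back
3) sync: queues all modified block buffers for writing and returns. POSIX only requires that the writes be scheduled, while on Linux sync also waits for them to complete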
 ubuntu@ip-172-31-23-227:~$ less /usr/include/unistd.h  
 ......  
 /* Make all changes done to FD actually appear on disk.  
   
   This function is a cancellation point and therefore not marked with  
   __THROW. */  
 extern int fsync (int __fd);  
 ......  
 /* Make all changes done to all files actually appear on disk. */  
 extern void sync (void) __THROW;  
 ......  
 /* Synchronize at least the data part of a file with the underlying  
   media. */  
 extern int fdatasync (int __fildes);  
 ......  

fileio.c:
 #include<unistd.h>  
 #include<fcntl.h>  
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<errno.h>  
   
 int main()  
 {  
  char buff[] = "Hello world!";  
  int fd;  
   
  if((fd = open("test.txt", O_RDWR)) == -1) {  
   printf("open error!\n");  
   exit(1);  
  }  
   
  if(write(fd, buff, 12) != 12) {  
   printf("write error!\n");  
   exit(2);  
  }  
   
  // Flush all modified data and metadata of fd to disk
  if(fsync(fd) == -1) {  
   printf("fsync error!\n");  
   exit(3);  
  }  
   
  // Flush the modified data of fd to disk (shown for illustration;
  // the fsync above has already covered it)
  if(fdatasync(fd) == -1) {  
   printf("fdatasync error!\n");  
   exit(4);  
  }  
   
  // Ask the kernel to write out all modified buffers, for every file
  // sync returns no value, so we cannot use a return code to
  // detect whether it succeeded
  sync();  
   
  close(fd);  
  exit(0);  
 }  
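
A database-style use of these calls might look like the following minimal sketch, assuming a Linux system where fdatasync is available. `append_durable` is a hypothetical helper introduced here for illustration: it appends one record to a file and returns success only after fdatasync has reported that the data was flushed.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes to path, then wait until the
 * data has been flushed. Returns 0 on success, -1 on failure. */
static int append_durable(const char *path, const char *rec, size_t len)
{
    /* O_APPEND makes every write go to the current end of the file */
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    while (len > 0) {
        ssize_t w = write(fd, rec, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal, try again */
            close(fd);
            return -1;
        }
        rec += w;
        len -= (size_t)w;
    }

    /* Flush the file data; metadata that is not needed to read the
     * data back (for example timestamps) may be left for later */
    if (fdatasync(fd) == -1) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

fdatasync is used instead of fsync here because only the record itself has to survive a crash, which may save a metadata write on some file systems. Replacing it with fsync would be the more conservative choice.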
