Sunday, November 9, 2014

Unix Prog: Memory-Mapped I/O

1. Memory-Mapped I/O
Memory-mapped I/O lets us map a file on disk into a buffer in memory. When we fetch bytes from the buffer, the corresponding bytes of the file are read. When we store data in the buffer, the corresponding bytes are automatically written to the file.
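To make the idea concrete, here is a minimal sketch that reads a file purely through memory accesses, using the mmap call covered in the next section. The helper name count_lines_mmap is hypothetical, and treating an empty file as zero lines is an assumption of this sketch.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in a file by reading it through
   a mapping instead of calling read(). Returns -1 on error. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return -1;
    }
    if (sb.st_size == 0) {              /* mmap rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)sb.st_size;
    const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    long lines = 0;
    for (size_t i = 0; i < len; i++)    /* plain loads fetch the file's bytes */
        if (p[i] == '\n')
            lines++;

    munmap((void *)p, len);
    close(fd);
    return lines;
}
```

Calling count_lines_mmap on a file that contains three newline-terminated lines should return 3.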

2. mmap
System Definition:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/mman.h  
 ......  
 /* Map addresses starting near ADDR and extending for LEN bytes. from  
   OFFSET into the file FD describes according to PROT and FLAGS. If ADDR  
   is nonzero, it is the desired mapping address. If the MAP_FIXED bit is  
   set in FLAGS, the mapping will be at ADDR exactly (which must be  
   page-aligned); otherwise the system chooses a convenient nearby address.  
   The return value is the actual mapping address chosen or MAP_FAILED  
   for errors (in which case `errno' is set). A successful `mmap' call  
   deallocates any previous mapping for the affected region. */  
   
 #ifndef __USE_FILE_OFFSET64  
 extern void *mmap (void *__addr, size_t __len, int __prot,  
           int __flags, int __fd, __off_t __offset) __THROW;  
 ......  

mmap rules:
1) The first argument, addr, is the address where we want the mapped region to start. We normally set it to 0 to let the system choose the starting address. The return value is either the starting address of the mapped area or MAP_FAILED.
2) The second argument, len, is the number of bytes to map. fd is the file descriptor of the file to be mapped; we have to open the file before mapping it. offset is the starting offset in the file of the bytes to map.
3) The third argument, prot, specifies the protection of the mapped region:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/mman-linux.h  
 ......  
 #define PROT_READ    0x1       /* Page can be read. */  
 #define PROT_WRITE   0x2       /* Page can be written. */  
 #define PROT_EXEC    0x4       /* Page can be executed. */  
 #define PROT_NONE    0x0       /* Page can not be accessed. */  
 ......  
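The prot argument is either PROT_NONE or the bitwise OR of one or more of the other flags. It cannot grant more access than the open mode of the file descriptor allows; for example, we cannot specify PROT_WRITE for a MAP_SHARED mapping if the file was opened read-only.
4) The fourth argument, flags, affects various attributes of the mapped region: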
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/mman-linux.h  
 ......  
 /* Sharing types (must choose one and only one of these). */  
 #define MAP_SHARED   0x01      /* Share changes. */  
 #define MAP_PRIVATE   0x02      /* Changes are private. */  
 #ifdef __USE_MISC  
 # define MAP_TYPE    0x0f      /* Mask for type of mapping. */  
 #endif  
   
 /* Other flags. */  
 #define MAP_FIXED    0x10      /* Interpret addr exactly. */  
 ......  
MAP_FIXED: the return value must equal the first argument addr, which must be page-aligned. If this flag is not specified, addr is only a hint: mmap does not guarantee that the returned address equals addr, and the system chooses a convenient address nearby.

MAP_SHARED: store operations modify the mapped file. The changes are carried through to the file, although not necessarily at the moment of the store (see msync below).

MAP_PRIVATE: store operations cause a private copy of the mapped data to be created. All later references go to the copy, and the underlying file is not modified.

5) The value of offset is required to be a multiple of the system's virtual memory page size, and so is addr when MAP_FIXED is specified. The page size can be obtained by calling sysconf with the argument _SC_PAGESIZE.

6) A mapped region always starts at the beginning of a virtual memory page, so the system maps whole pages. If len is not a multiple of the page size, the OS still allocates enough whole pages to cover the entire length. When the file ends inside the last page, the remainder of that page is zero-filled, and modifications to it are not reflected in the file.

7) Accessing a mapped region can generate two signals:
SIGSEGV: we tried to access memory that is not available to us, or we tried to store into a region that was specified to mmap as read-only.
SIGBUS: we tried to access a part of the mapped region that does not make sense at the time of the access. For example, if the file is truncated to be empty after we map it, no portion of the mapped region makes sense any more, and accessing it raises SIGBUS.

8) A memory-mapped region is inherited by a child across a fork, but it is not inherited by the new program across an exec.
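Several of the rules above (a page-aligned offset, a len that is not a multiple of the page size, and MAP_SHARED versus MAP_PRIVATE) can be sketched together. The helper names make_two_page_file and store_via_mapping are hypothetical, and the sketch assumes a system where pread and ftruncate are available, as on Linux.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file exactly two pages long,
   filled with zero bytes. Returns an open descriptor, or -1 on error. */
int make_two_page_file(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (page <= 0 || ftruncate(fd, 2 * (off_t)page) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: store c at the start of the file's second page
   through a mapping made with the given sharing flag (MAP_SHARED or
   MAP_PRIVATE), then return the byte that pread() sees at that file
   offset. Returns -1 on error. */
int store_via_mapping(int fd, int sharing, char c)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* offset must be a multiple of the page size; len (1 here) need not be */
    char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, sharing, fd, (off_t)page);
    if (p == MAP_FAILED)
        return -1;

    p[0] = c;
    if (sharing == MAP_SHARED && msync(p, 1, MS_SYNC) < 0) {
        munmap(p, 1);
        return -1;
    }
    munmap(p, 1);

    char seen;
    if (pread(fd, &seen, 1, (off_t)page) != 1)
        return -1;
    return (unsigned char)seen;
}
```

On a fresh two-page file, storing 'P' through a MAP_PRIVATE mapping should leave the file byte at 0, while storing 'S' through a MAP_SHARED mapping should make pread return 'S'.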

3. mprotect:
We can change the permission on an existing mapping by calling mprotect:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/mman.h  
 .......  
 /* Change the memory protection of the region starting at ADDR and  
   extending LEN bytes to PROT. Returns 0 if successful, -1 for errors  
   (and sets errno). */  
 extern int mprotect (void *__addr, size_t __len, int __prot) __THROW;  
 ......  

As with mmap, addr must be a multiple of the virtual memory page size.

4. msync
If the pages in a shared mapping have been modified, we can call msync to flush the changes to the file that backs the mapping.
Note: If the mapping is private, the mapped file is not modified.
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/mman.h  
 ......  
 /* Synchronize the region starting at ADDR and extending LEN bytes with the  
   file it maps. Filesystem operations on a file being mapped are  
   unpredictable before this is done. Flags are from the MS_* set.  
   
   This function is a cancellation point and therefore not marked with  
   __THROW. */  
 extern int msync (void *__addr, size_t __len, int __flags);  
 ......  

As above, addr must be a multiple of the virtual memory page size.

The third argument, flags, gives us some control over how the memory is flushed.
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/mman-linux.h  
 ......  
 /* Flags to `msync'. */  
 #define MS_ASYNC    1        /* Sync memory asynchronously. */  
 #define MS_SYNC     4        /* Synchronous memory sync. */  
 #define MS_INVALIDATE  2        /* Invalidate the caches. */  
 ......  

MS_ASYNC simply schedules the pages to be written and then returns.
MS_SYNC makes msync wait for the writes to complete before returning. Exactly one of MS_ASYNC and MS_SYNC must be specified.
MS_INVALIDATE is optional; it tells the operating system to discard any pages that are out of sync with the underlying storage.

5. munmap
System Definition:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/mman.h  
 ......  
 /* Deallocate any mapping for the region starting at ADDR and extending LEN  
   bytes. Returns 0 if successful, -1 for errors (and sets errno). */  
 extern int munmap (void *__addr, size_t __len) __THROW;  
 ......  

As above, addr must be a multiple of the virtual memory page size.
A memory-mapped region is unmapped automatically when the process terminates, or explicitly by calling munmap. Closing the file descriptor does not unmap the region.
munmap does not affect the object that was mapped; that is, the call itself does not cause the contents of the mapped region to be written to the disk file. For a MAP_SHARED mapping, the kernel writes the changes back on its own schedule; for a MAP_PRIVATE mapping, the changes are discarded when the region is unmapped.
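The point about the file descriptor can be sketched as follows: the mapping is created, the descriptor is closed, and the mapped byte is still readable until munmap is called. The helper name peek_after_close is hypothetical, and the file is assumed to be non-empty.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return the first byte of a non-empty file, read
   through a mapping after its descriptor has been closed.
   Returns -1 on error. */
int peek_after_close(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) < 0 || sb.st_size < 1) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, 1, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                          /* closing fd does not unmap the region */
    if (p == MAP_FAILED)
        return -1;

    int first = p[0];                   /* the mapping is still usable here */

    if (munmap(p, 1) < 0)               /* after this, p must not be touched */
        return -1;
    return first;
}
```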

6. Example:
map.c:
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<unistd.h>  
 #include<fcntl.h>  
 #include<string.h>  
 #include<sys/mman.h>
 #include<sys/stat.h>
   
 int main(int argc, char* argv[])  
 {  
  int fdin, fdout;  
  void *src, *dst;  
  struct stat statbuf;  
   
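  // Check the arguments before using argv[1] and argv[2]
  if(argc != 3) {
   printf("usage: %s <source> <destination>\n", argv[0]);
   exit(1);
  }

  // Open the input file, create the output file
  if((fdin = open(argv[1], O_RDONLY)) < 0) {
   printf("can't open the file.\n");
   exit(1);
  }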
   
  // O_CREAT requires a third argument giving the permission bits of the new file
  if((fdout = open(argv[2], O_RDWR | O_CREAT | O_TRUNC, 0644)) < 0) {
   printf("can't create the file for writing.\n");  
   exit(2);  
  }  
   
  // Get the input file size  
  if(fstat(fdin, &statbuf) < 0) {  
   printf("fstat error!\n");  
   exit(3);  
  }  
   
  // Set the output file size to match the input file: seek to the last byte and
  // write one byte there, so that the whole mapped range lies inside the file
  if(lseek(fdout, statbuf.st_size - 1, SEEK_SET) == -1) {  
   printf("lseek error!\n");  
   exit(4);  
  }  
   
  if(write(fdout, "", 1) != 1) {
   printf("write error!\n");
   exit(4);
  }
   
  // Setup the memory mapping for input file  
  if((src = mmap(0, statbuf.st_size, PROT_READ, MAP_SHARED, fdin, 0)) == MAP_FAILED) {  
   printf("mmap error for input!\n");
   exit(5);  
  }  
   
  // Setup the memory mapping for output file  
  if((dst = mmap(0, statbuf.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fdout, 0)) == MAP_FAILED) {  
   printf("mmap error for output!\n");  
   exit(5);  
  }  
   
  // Do the memory copy. Note that fdout is mapped privately here, so no data is
  // written to the new file, even if msync is called.
  memcpy(dst, src, statbuf.st_size);  
   
  // Unmap the output mapped region  
  munmap(dst, statbuf.st_size);  
   
  // Re-establish the mapped region for output, this time shared (MAP_SHARED)
  if((dst = mmap(0, statbuf.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdout, 0)) == MAP_FAILED) {  
   printf("mmap error for output!\n");  
   exit(5);  
  }  
   
  // This time the memcpy modifies the output file; the kernel writes the changed
  // pages back to disk on its own schedule, and the regions are unmapped at exit
  memcpy(dst, src, statbuf.st_size);  
  exit(0);  
 }  

shell:
 ubuntu@ip-172-31-23-227:~$ ./map.out test1.txt test2.txt  
 ubuntu@ip-172-31-23-227:~$ cat test2.txt  
 Hello world!  
 ubuntu@ip-172-31-23-227:~$ cat test1.txt  
 Hello world!  

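Note:
Memory-mapped I/O can be more efficient than copying with read and write, because it does less work. With read and write, the kernel copies data from its buffer to the application's buffer (read from the input file), and then copies the data from the application's buffer back to a kernel buffer (write to the output file). With mmap and memcpy, the data is copied directly from one kernel buffer mapped into our address space to another. Which approach is faster in practice depends on the system and on the size of the file.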
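For comparison, here is a minimal sketch of the read/write version of the same copy, in which every chunk passes through an application buffer. The helper name copy_with_read_write and the 8192-byte buffer size are choices made for this sketch only.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: copy a file with read() and write().
   Returns 0 on success, -1 on error. */
int copy_with_read_write(const char *from, const char *to)
{
    int in = open(from, O_RDONLY);
    if (in < 0)
        return -1;

    int out = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    char buf[8192];                     /* the application buffer */
    ssize_t n;
    int rc = 0;

    /* read() copies from a kernel buffer into buf */
    while (rc == 0 && (n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {
            /* write() copies from buf back into a kernel buffer */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                rc = -1;
                break;
            }
            done += w;
        }
    }
    if (rc == 0 && n < 0)               /* read() itself failed */
        rc = -1;

    close(in);
    if (close(out) < 0)
        rc = -1;
    return rc;
}
```

Timing this function against map.c on the same large file is one way to check how the two approaches compare on a given system.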
