Saturday, September 13, 2014

Unix Prog: Clone Process -- fork

1. System Definition:

 ubuntu@ip-172-31-23-227:~$ less /usr/include/unistd.h  
 ......  
 /* Clone the calling process, creating an exact copy.  
   Return -1 for errors, 0 to the new process,  
   and the process ID of the new process to the old process. */  
 extern __pid_t fork (void) __THROWNL;  
 ......  

fork returns 0 in the new child: a child has exactly one parent, and it can always call getppid to obtain the parent's process ID.

fork returns the process ID of the new child in the parent: a process can create more than one child, and there is no system call that returns the process IDs of a process's children, so the parent has to record each ID when fork returns it.
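
As a minimal sketch of why the child's ID is handed back to the parent, the helper below creates several children, records each ID as fork returns it, and later waits for each child by that ID. The name spawn_and_reap, the MAX_CHILDREN limit, and the choice to have child i exit with status i are assumptions made for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16  /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: fork n children, remember every pid that fork
   returns in the parent, then wait for each child by pid.
   Child i exits with status i. Returns the number of children whose
   exit status matched, or -1 on a bad argument or fork failure. */
int spawn_and_reap(int n)
{
    pid_t pids[MAX_CHILDREN];
    int created = 0, matched = 0, failed = 0;

    if (n < 0 || n > MAX_CHILDREN)
        return -1;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {          /* fork failed: stop creating children */
            failed = 1;
            break;
        }
        if (pid == 0)           /* child: fork returned 0 */
            _exit(i);           /* _exit skips flushing inherited stdio buffers */
        pids[created++] = pid;  /* parent: fork returned the child's pid */
    }

    /* Reap every child that was actually created. */
    for (int i = 0; i < created; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == i)
            matched++;
    }
    return failed ? -1 : matched;
}
```

Calling spawn_and_reap(3) from main should return 3 when all three children are created and reaped. Without the saved IDs the parent could still wait for "any child", but it could not tell which child had finished.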

The child process is a copy of the parent process, including the data, heap, and stack segments. The two processes share the text segment (the program's instructions), which is typically read-only.

Copy-on-write: most modern implementations do not copy the data, heap, and stack at fork time. Parent and child initially share those pages, and when either process modifies a page, the kernel makes a separate copy of just that page. This saves both memory and the time needed to copy.

2. Example:

proc.c:
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<unistd.h>  
   
 int glob = 0;  
 char buf[] = "a write to std out without buffering!\n";  
   
 int main(int argc, char* argv[])  
 {  
  int val = 0;  
  pid_t pid;  
   
  // Following buffer will be output immediately, the reason to use "sizeof(buf)-1" is:
  // sizeof will count the ending "null", in order to remove the ending "null", we subtract one
  // from the entire length
  if(write(STDOUT_FILENO, buf, sizeof(buf)-1) != sizeof(buf) - 1) {  
   printf("write error!\n");  
   exit(1);  
  }  
   
  // This will go through the buffer. If standard output is terminal, it would  
  // be line buffered, if standard output is file, it would be fully buffered  
  printf("before fork: \n");  
   
  if((pid = fork()) < 0) {  
   printf("fork error!\n");  
   exit(2);  
  }  
  else if(pid == 0) {// child process execution  
   val++;  
   glob++;  
  }  
  else { // parent process execution  
   sleep(3);  
  }  
   
  // Since parent process slept 3 seconds, normally child process should  
  // execute the below statement firstly, then parent process  
  // But this is depending on what scheduling algorithm OS is using.  
  printf("Process ID: %d, Parent Process ID: %d val: %d, glob: %d\n",  
      getpid(), getppid(), val, glob);  
   
  exit(0);  
 }  

shell:
1) Standard output is the terminal in this case. "a write to std out without buffering!" appears immediately, because write does not go through the stdio buffer at all. printf is line buffered on a terminal, so "before fork: \n" is flushed as soon as the newline is written, before fork is called.
The parent then forks and sleeps for 3 seconds. The child prints first, and the parent prints after it wakes up. The output shows that the child's parent process ID is the ID of the original process.

2) Standard output is redirected to a file in this case, so it is fully buffered.
First, the original process outputs "a write to std out without buffering!" immediately, since write is not buffered by stdio.
Second, the original process puts the line "before fork: \n" into the stdio buffer, where it waits to be flushed.
Third, the original process forks. The child gets a copy of the buffer, which still holds the unflushed "before fork" line. The parent sleeps for 3 seconds, so the child runs first; when the child exits, its buffer is flushed, writing both the "before fork" line and the child's own output line.
Finally, the original process wakes up, adds its own output line to its buffer, and exits, which flushes its copy of the buffer, including "before fork" again.

That's why we can see 2 lines of "before fork" in the output file.
 ubuntu@ip-172-31-23-227:~$ ./proc.out  
 a write to std out without buffering!  
 before fork:  
 Process ID: 4218, Parent Process ID: 4217 val: 1, glob: 1  
 Process ID: 4217, Parent Process ID: 4069 val: 0, glob: 0  
 ubuntu@ip-172-31-23-227:~$ ./proc.out >output.txt  
 ubuntu@ip-172-31-23-227:~$ cat output.txt  
 a write to std out without buffering!  
 before fork:  
 Process ID: 4220, Parent Process ID: 4219 val: 1, glob: 1  
 before fork:  
 Process ID: 4219, Parent Process ID: 4069 val: 0, glob: 0  
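
The duplicated line comes from the unflushed stdio buffer being copied by fork, so flushing before fork is one way to avoid it. The sketch below reproduces the effect on a regular file and counts how often the line ends up in it. The helper name count_before_fork_lines and the flush_first flag are hypothetical, and the sketch assumes the stream returned by fopen is fully buffered, which is the usual default for regular files.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write one line through stdio to the file at
   `path`, optionally flush it, then fork. Both processes close the
   stream, which flushes whatever is still in their copy of the buffer.
   Returns how many "before fork" lines the file contains, or -1 on error. */
int count_before_fork_lines(const char *path, int flush_first)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    fprintf(fp, "before fork\n");   /* stays in the stdio buffer for now */
    if (flush_first)
        fflush(fp);                 /* buffer is empty when fork copies it */

    pid_t pid = fork();
    if (pid < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {                 /* child */
        fclose(fp);                 /* flushes the child's copy of the buffer */
        _exit(0);
    }

    if (waitpid(pid, NULL, 0) != pid) {
        fclose(fp);
        return -1;
    }
    fclose(fp);                     /* flushes the parent's copy of the buffer */

    /* Count the lines that actually reached the file. */
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char line[64];
    int count = 0;
    while (fgets(line, sizeof line, fp) != NULL)
        if (strcmp(line, "before fork\n") == 0)
            count++;
    fclose(fp);
    return count;
}
```

With flush_first set to 0 the line is written once by each process; with flush_first set to 1 it is written only once, before the fork. In the example program above, the equivalent change would be calling fflush(stdout) right before fork.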

3. File Sharing between processes

When a process forks, each file descriptor in the child refers to the same file table entry as the corresponding descriptor in the parent. This means that parent and child operate on the same file table entry fields, such as the file offset. When either process writes, the offset in the shared entry is updated, so the other process's next write lands right after that data.

proc.c:
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<unistd.h>  
   
 int main(int argc, char* argv[])  
 {  
  pid_t pid;  
  char buf_1[] = "child: a write to stdout!\n";  
  char buf_2[] = "parent: a write to stdout!\n";  
   
  if((pid = fork()) < 0) {  
   printf("fork error!\n");  
   exit(1);  
  }  
  else if(pid == 0) { // Child process execution  
   sleep(2);  
   if(write(STDOUT_FILENO, buf_1, sizeof(buf_1)-1) != sizeof(buf_1) - 1) {  
    printf("write error!\n");  
    exit(1);  
   }  
   exit(0);  
  }  
  else { // parent process execution  
   if(write(STDOUT_FILENO, buf_2, sizeof(buf_2)-1) != sizeof(buf_2) -1) {  
    printf("write error!\n");  
    exit(1);  
   }  
   close(STDOUT_FILENO);  
  }  
   
  exit(0);  
 }  

shell:
After the fork, the child sleeps for 2 seconds. During that time the parent writes its line, which updates the offset in the shared file table entry, and then closes its file descriptor. The file table entry is not released, because the child's descriptor still refers to it.
After 2 seconds, the child wakes up and writes its line right after the parent's.
 ubuntu@ip-172-31-23-227:~$ ./proc.out >output.txt  
 ubuntu@ip-172-31-23-227:~$ cat output.txt  
 parent: a write to stdout!  
 child: a write to stdout!  
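
The shared offset can also be observed directly. In the sketch below the parent writes 7 bytes, the child writes 6 more through the inherited descriptor, and the parent then asks for the current offset with lseek. The helper name shared_offset_after_child_write and the message strings are assumptions for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent writes "parent\n" (7 bytes), the child
   writes "child\n" (6 bytes) through the descriptor it inherited, and the
   parent then reads back the current offset.
   Returns that offset, or -1 on error. */
long shared_offset_after_child_write(const char *path)
{
    static const char parent_msg[] = "parent\n";
    static const char child_msg[] = "child\n";

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, parent_msg, sizeof(parent_msg) - 1) !=
        (ssize_t)(sizeof(parent_msg) - 1)) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: same file table entry, so this write starts at offset 7. */
        ssize_t n = write(fd, child_msg, sizeof(child_msg) - 1);
        _exit(n == (ssize_t)(sizeof(child_msg) - 1) ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* The parent wrote only 7 bytes itself, but the shared offset
       also reflects the child's 6 bytes. */
    off_t pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return (long)pos;
}
```

The returned offset is 13 rather than 7, because the child's write advanced the offset in the file table entry that both descriptors refer to.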

4. Two use cases for fork
1) Network servers: when a request arrives, the server forks a child process to serve it, and the parent goes back to waiting for the next request.
2) Shells: the shell forks a child process to run each command (the child typically replaces itself with the command's program by calling one of the exec functions), and the parent waits for the next command from the user.
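
The shell case in 2) can be sketched as fork followed by exec in the child and wait in the parent. This is a simplified illustration rather than how any particular shell is implemented; the helper name run_command and the choice of 127 as the exit status when exec fails are assumptions.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program named by argv[0] in a child
   process and return its exit status. Returns -1 if fork or waitpid
   fails or the child did not exit normally. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the requested program. */
        execvp(argv[0], argv);
        /* Only reached when execvp fails, e.g. command not found. */
        _exit(127);
    }

    /* Parent: wait for the command to finish, as a shell does for a
       foreground command, before going back to read the next one. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, with char *cmd[] = { "true", NULL }, run_command(cmd) should return 0 on a system where the true utility is on PATH.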
