Sunday, November 16, 2014

Unix Prog: Socket Addressing(2)

1. Address Lookup
The network configuration can be kept in a number of places, like static files(/etc/hosts, /etc/services etc.), or can be managed by a name service, such as DNS(Domain Name System) or NIS(Network Information Service).

1) host
 ubuntu@ip-172-31-23-227:~$ less /usr/include/netdb.h  
 ......  
 /* Description of data base entry for a single host. */  
 struct hostent  
 {  
  char *h_name;         /* Official name of host. */  
  char **h_aliases;       /* Alias list. */  
  int h_addrtype;        /* Host address type. */  
  int h_length;         /* Length of address. */  
  char **h_addr_list;      /* List of addresses from name server. */  
 #if defined __USE_MISC || defined __USE_GNU  
 # define    h_addr h_addr_list[0] /* Address, for backward compatibility.*/  
 #endif  
 };  
   
 /* Open host data base files and mark them as staying open even after  
   a later search if STAY_OPEN is non-zero.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern void sethostent (int __stay_open);  
   
 /* Close host data base files and clear `stay open' flag.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern void endhostent (void);  
   
 /* Get next entry from host data base file. Open data base if  
   necessary.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct hostent *gethostent (void);  
   
 /* Return entry from host data base which address match ADDR with  
   length LEN and type TYPE.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct hostent *gethostbyaddr (const void *__addr, __socklen_t __len,  
                    int __type);  
   
 /* Return entry from host data base for host with NAME.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct hostent *gethostbyname (const char *__name);  
 ......  

gethostent: return the next entry in the host database file.
sethostent: open the file or rewind it if it is already open
endhostent: close the file

2) networks

 ubuntu@ip-172-31-23-227:~$ less /usr/include/netdb.h  
 ......  
 /* Open network data base files and mark them as staying open even  
   after a later search if STAY_OPEN is non-zero.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern void setnetent (int __stay_open);  
   
 /* Close network data base files and clear `stay open' flag.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern void endnetent (void);  
   
 /* Get next entry from network data base file. Open data base if  
   necessary.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct netent *getnetent (void);  
   
 /* Return entry from network data base which address match NET and  
   type TYPE.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct netent *getnetbyaddr (uint32_t __net, int __type);  
   
 /* Return entry from network data base for network with NAME.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct netent *getnetbyname (const char *__name);  
 ......  
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/netdb.h  
 ......  
 /* Description of data base entry for a single network. NOTE: here a  
   poor assumption is made. The network number is expected to fit  
   into an unsigned long int variable. */  
 struct netent  
 {  
  char *n_name;         /* Official name of network. */  
  char **n_aliases;       /* Alias list. */  
  int n_addrtype;        /* Net address type. */  
  uint32_t n_net;        /* Network number. */  
 };  
 ......  

3) Protocols
 ubuntu@ip-172-31-23-227:~$ less /usr/include/netdb.h  
 ......  
 /* Description of data base entry for a single service. */  
 struct protoent  
 {  
  char *p_name;         /* Official protocol name. */  
  char **p_aliases;       /* Alias list. */  
  int p_proto;         /* Protocol number. */  
 };  
   
 /* Open protocol data base files and mark them as staying open even  
   after a later search if STAY_OPEN is non-zero.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern void setprotoent (int __stay_open);  
   
 /* Close protocol data base files and clear `stay open' flag.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern void endprotoent (void);  
   
 /* Get next entry from protocol data base file. Open data base if  
   necessary.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct protoent *getprotoent (void);  
   
 /* Return entry from protocol data base for network with NAME.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct protoent *getprotobyname (const char *__name);  
   
 /* Return entry from protocol data base which number is PROTO.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct protoent *getprotobynumber (int __proto);  
 ......  

4) services:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/netdb.h  
 ......  
 /* Description of data base entry for a single service. */  
 struct servent  
 {  
  char *s_name;         /* Official service name. */  
  char **s_aliases;       /* Alias list. */  
  int s_port;          /* Port number. */  
  char *s_proto;        /* Protocol to use. */  
 };  
   
 /* Open service data base files and mark them as staying open even  
   after a later search if STAY_OPEN is non-zero.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern void setservent (int __stay_open);  
   
 /* Close service data base files and clear `stay open' flag.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern void endservent (void);  
   
 /* Get next entry from service data base file. Open data base if  
   necessary.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct servent *getservent (void);  
   
 /* Return entry from network data base for network with NAME and  
   protocol PROTO.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct servent *getservbyname (const char *__name, const char *__proto);  
   
 /* Return entry from service data base which matches port PORT and  
   protocol PROTO.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern struct servent *getservbyport (int __port, const char *__proto);  
 ......  

5) Address Information
 ubuntu@ip-172-31-23-227:~$ less /usr/include/netdb.h  
 ......  
 /* Translate name of a service location and/or a service name to set of  
   socket addresses.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern int getaddrinfo (const char *__restrict __name,  
             const char *__restrict __service,  
             const struct addrinfo *__restrict __req,  
             struct addrinfo **__restrict __pai);  
   
 /* Free `addrinfo' structure AI including associated storage. */  
 extern void freeaddrinfo (struct addrinfo *__ai) __THROW;  
 ......
 /* Structure to contain information about address of a service provider.  */
 struct addrinfo
 {
   int ai_flags;                 /* Input flags.  */
   int ai_family;                /* Protocol family for socket.  */
   int ai_socktype;              /* Socket type.  */
   int ai_protocol;              /* Protocol for socket.  */
   socklen_t ai_addrlen;         /* Length of socket address.  */
   struct sockaddr *ai_addr;     /* Socket address for socket.  */
   char *ai_canonname;           /* Canonical name for service location.  */
   struct addrinfo *ai_next;     /* Pointer to next in list.  */
 };
 ......  

getaddrinfo: providing the host name, service name or both, to return one linked list of addrinfo strucutures.
freeaddrinfo: release each addrinfo structure in the linked list.

If getaddrinfo fails, we need to call gai_strerror to convert the error code returned into an error message.
 ubuntu@ip-172-31-23-227:~$ less /usr/include/netdb.h  
 ......  
 /* Convert error return from getaddrinfo() to a string. */  
 extern const char *gai_strerror (int __ecode) __THROW;  
 ......  

6) Converts an address into a host name and a service name
 ubuntu@ip-172-31-23-227:~$ less /usr/include/netdb.h  
 ......  
 /* Translate a socket address to a location and service name.  
   
   This function is a possible cancellation point and therefore not  
   marked with __THROW. */  
 extern int getnameinfo (const struct sockaddr *__restrict __sa,  
             socklen_t __salen, char *__restrict __host,  
             socklen_t __hostlen, char *__restrict __serv,  
             socklen_t __servlen, int __flags);  
 ......  

getnameinfo converts an unix networking address into a host name and a service name. flags give us some control about how the translation is done:

 ubuntu@ip-172-31-23-227:~$ less /usr/include/netdb.h  
 ......  
 # define NI_NUMERICHOST 1    /* Don't try to look up hostname. */  
 # define NI_NUMERICSERV 2    /* Don't convert port number to name. */  
 # define NI_NOFQDN   4    /* Only return nodename portion. */  
 # define NI_NAMEREQD  8    /* Don't return numeric addresses. */  
 # define NI_DGRAM    16   /* Look up UDP service rather than TCP. */  
 ......  

Unix Prog: Socket Addressing(1)

1. Basic Concept

The machine's network address helps us identify the computer on the network. And the service helps us identify the particular process on that computer

2. Byte Ordering
The byte order is a characteristic of the processor architecture, dictating how bytes are ordered within larger data types, such as integers.

big-endian: for value 0x04030201, 4 occupies lowest memory address, 1 occupies highest memory address

little-endian: for value 0x04030201, 4 occupies highest memory address, 1 occupies lowest memory address

Network protocols specify a byte ordering so that heterogeneous computer systems can exchange protocol information without confusing the byte ordering. So applications need to translate processor's byte order to the network byte order or in reverse.

System Calls:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/netinet/in.h  
 ......  
 /* Functions to convert between host and network byte order.  
   
   Please note that these functions normally take `unsigned long int' or  
   `unsigned short int' values as arguments and also return them. But  
   this was a short-sighted decision since on different systems the types  
   may have different representations but the values are always the same. */  
   
 extern uint32_t ntohl (uint32_t __netlong) __THROW __attribute__ ((__const__));  
 extern uint16_t ntohs (uint16_t __netshort)  
    __THROW __attribute__ ((__const__));  
 extern uint32_t htonl (uint32_t __hostlong)  
    __THROW __attribute__ ((__const__));  
 extern uint16_t htons (uint16_t __hostshort)  
    __THROW __attribute__ ((__const__));  
 ......  

n represents network, h represents host, l represents long, s represents short.
ntohl represents: Convert network byte order to host byte order, unsigned long
ntohs represents: Convert network byte order to host byte order, unsigned short
htonl represents: Convert host byte order to network byte order, unsigned long
htons represents: Convert host byte order to network byte order, unsigned short

3. Address Formats:
1) Generic sockaddr structure
An address identifies a socket endpoint in a particular communication domain. The  address format is specific to the particular domain.

In order to let different socket address structure to be passed to general socket functions, unix has one generic sockaddr address structure, which can be generated by casting domain specific address structure.

 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/socket.h  
 ......  
 /* Structure describing a generic socket address. */  
 struct sockaddr  
  {  
   __SOCKADDR_COMMON (sa_);  /* Common data: address family and length. */  
   char sa_data[14];      /* Address data. */  
  };  
 ......  

2) IPV4 Address Structure
IPV4 Address can be converted to generic sockaddr structure
 ubuntu@ip-172-31-23-227:~$ less /usr/include/netinet/in.h  
 ......  
 /* Internet address. */  
 typedef uint32_t in_addr_t;  
 struct in_addr  
  {  
   in_addr_t s_addr;  
  };  
 ......  
 /* Structure describing an Internet socket address. */  
 struct sockaddr_in  
  {  
   __SOCKADDR_COMMON (sin_);  
   in_port_t sin_port;         /* Port number. */  
   struct in_addr sin_addr;      /* Internet address. */  
   
   /* Pad to size of `struct sockaddr'. */  
   unsigned char sin_zero[sizeof (struct sockaddr) -  
               __SOCKADDR_COMMON_SIZE -  
               sizeof (in_port_t) -  
               sizeof (struct in_addr)];  
  };  
 ......  

3) IPV6 Address Structure
IPV6 address can also be converted to generic sockaddr structure
 ubuntu@ip-172-31-23-227:~$ less /usr/include/netinet/in.h  
 ......  
 /* IPv6 address */  
 struct in6_addr  
  {  
   union  
    {  
     uint8_t __u6_addr8[16];  
 #if defined __USE_MISC || defined __USE_GNU  
     uint16_t __u6_addr16[8];  
     uint32_t __u6_addr32[4];  
 #endif  
    } __in6_u;  
 #define s6_addr         __in6_u.__u6_addr8  
 #if defined __USE_MISC || defined __USE_GNU  
 # define s6_addr16       __in6_u.__u6_addr16  
 # define s6_addr32       __in6_u.__u6_addr32  
 #endif  
  };  
 ......  
 struct sockaddr_in6  
  {  
   __SOCKADDR_COMMON (sin6_);  
   in_port_t sin6_port;    /* Transport layer port # */  
   uint32_t sin6_flowinfo;   /* IPv6 flow information */  
   struct in6_addr sin6_addr; /* IPv6 address */  
   uint32_t sin6_scope_id;   /* IPv6 scope-id */  
  };  
 ......  

4) Conversion between the binary address format and a string dotted-decimal notation(for example, 127.0.0.1)

 ubuntu@ip-172-31-23-227:~$ less /usr/include/arpa/inet.h  
 ......  
 /* Convert from presentation format of an Internet number in buffer  
   starting at CP to the binary network format and store result for  
   interface type AF in buffer starting at BUF. */  
 extern int inet_pton (int __af, const char *__restrict __cp,  
            void *__restrict __buf) __THROW;  
   
 /* Convert a Internet address in binary network format for interface  
   type AF in buffer starting at CP to presentation form and place  
   result in buffer of length LEN astarting at BUF. */  
 extern const char *inet_ntop (int __af, const void *__restrict __cp,  
                char *__restrict __buf, socklen_t __len)  
    __THROW;  
 ......  

the first argument, af, represents the domain type, only following 2 domains are supported:
AF_INET, AF_INET6 (Only 2 domains are supported: AF_INET, AF_INET6)

Unix Prog: Socket Descriptors

1. Unix Socket
A socket is an abstraction of a communication endpoint.
Socket descriptors are implemented as file descriptors in the unix system.
Many of the functions that deal with file descriptors, such as read and write, will work with socket descriptor.

2. Create a socket descriptor.
System Call:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/socket.h  
 ......  
 /* Create a new socket of type TYPE in domain DOMAIN, using  
   protocol PROTOCOL. If PROTOCOL is zero, one is chosen automatically.  
   Returns a file descriptor for the new socket, or -1 for errors. */  
 extern int socket (int __domain, int __type, int __protocol) __THROW;  
 ......  

domain argument determines the nature of the communication, including the address format.
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/socket.h  
 ......  
 #define AF_UNSPEC    PF_UNSPEC  
 #define AF_LOCAL    PF_LOCAL  
 #define AF_UNIX     PF_UNIX  
 ......  
 #define AF_INET     PF_INET  
 ......  
 #define AF_INET6    PF_INET6  
 ......  

type argument determines the type of the socket, which further determines the communication characteristic.
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/socket_type.h  
 ......  
 /* Types of sockets. */  
 enum __socket_type  
 {  
  SOCK_STREAM = 1,       /* Sequenced, reliable, connection-based  
                   byte streams. */  
 #define SOCK_STREAM SOCK_STREAM  
  SOCK_DGRAM = 2,        /* Connectionless, unreliable datagrams  
                   of fixed maximum length. */  
 #define SOCK_DGRAM SOCK_DGRAM  
  SOCK_RAW = 3,         /* Raw protocol interface. */  
 #define SOCK_RAW SOCK_RAW  
  SOCK_RDM = 4,         /* Reliably-delivered messages. */  
 #define SOCK_RDM SOCK_RDM  
  SOCK_SEQPACKET = 5,      /* Sequenced, reliable, connection-based,  
                   datagrams of fixed maximum length. */  
 #define SOCK_SEQPACKET SOCK_SEQPACKET  
  SOCK_DCCP = 6,        /* Datagram Congestion Control Protocol. */  
 #define SOCK_DCCP SOCK_DCCP  
  SOCK_PACKET = 10,       /* Linux specific way of getting packets  
                   at the dev level. For writing rarp and  
                   other similar things on the user level. */  
 #define SOCK_PACKET SOCK_PACKET  
   
  /* Flags to be ORed into the type parameter of socket and socketpair and  
    used for the flags parameter of paccept. */  
   
  SOCK_CLOEXEC = 02000000,   /* Atomically set close-on-exec flag for the  
                   new descriptor(s). */  
 #define SOCK_CLOEXEC SOCK_CLOEXEC  
  SOCK_NONBLOCK = 00004000   /* Atomically mark descriptor(s) as  
                   non-blocking. */  
 #define SOCK_NONBLOCK SOCK_NONBLOCK  
 };  
 ......  

protocol argument is usually zero, to select the default protocol for the given domain and socket type.
Default protocol for SOCK_STREAM socket in the AF_INET communication domain is TCP.
Defautl protocol for SOCK_DGRAM socket in the AF_INET communication domain is UDP.

Note: calling "socket" system call is similar from calling "open". You get a file descriptor that can be used for I/O. When we are done, we call "close" to relinquish the access of it.

Note: although socket descriptor is one file descriptor, but it doesn't mean we apply all system calls using file descriptors on socket descriptors, for example, lseek(socket doesn't have concept of offset).

3. Socket Type
1) SOCK_DGRAM Interface: no logical connection needs to exist between peers for them to communicate. A datagram provides a connection-less service.

A datagram is a self-contained message, which includes the counter-party address. Sending a datagram is analogous to mailing someone a letter. You can mail many letters, but you can't guarantee the order of delivery, and some might get lost along the way.

2) SOCK_STREAM Interface: requires that before exchanging data, you setup a logical connection between your socket and the socket belonging to the peer you want to communicate with.

Message contain no addressing information, as a point-to-point virtual connection exists between both ends, and the connection itself implies a particular source and destination.

With a SOCK_STREAM socket, applications are unaware of message boundaries. When we read data from a socket, it might not return the same number of bytes written by the process sending data. We will eventually get everything sent to us, but it might take several function calls.

3) SOCK_SEQPACKET Interface: it is same as SOCK_STREAM socket except that we get a message-based service instead of a byte-stream service. The amount of data received from a SOCK_SEQPACKET socket is the same amount as was written.

OS just "pack up" the entire message as in same packet and send to us.

4. Shut down socket
Communication on a socket is bidirectional. We can disable the I/O on a socket with the shutdown function.

 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/socket.h  
 ......  
 /* Shut down all or part of the connection open on socket FD.  
   HOW determines what to shut down:  
    SHUT_RD  = No more receptions;  
    SHUT_WR  = No more transmissions;  
    SHUT_RDWR = No more receptions or transmissions.  
   Returns 0 on success, -1 for errors. */  
 extern int shutdown (int __fd, int __how) __THROW;  
 ......  

close is similar from shutdown, but it deallocate the entire socket(only if the socket descriptor is not duplicated with dup, if that is the case, the socket will not be deallocated until the last socket descriptor is closed).

Unix Prog: Client Server Model

1. Model A:


In this model, parent process is the client process. Client process "fork" one child process, acting as the server, processing the request and pass back to client process(parent).

1) The server can be a set-user-ID program, giving it special privileges.
2) The server can determine the real identity of the client by looking at the real user id
3) With the client identity we can do additional permission checking on server side, to determine whether server should give the file back to client based on its user id
4) Problem: child can pass back the file content back to parent, this works for regular file, but not working for device files. Also, child process can not pass the file descriptor back to the parent.

2. Model B:

In this model, parent has one well-know FIFO, which receives requests from clients. And for each client, there is one specific FIFO to receive the reply from server.

3. Model C:
Same as above, just replace FIFOs with message queues.

In that case, client send their requests through well known message queue, also, each client create their own message queue with key "IPC_PRIVATE". Message sent to server must contain the client specific message queue id. In the future(server already know the message queue id of specific clients), all requests and replies can go though client specific message queues.

Problems with message queues:
1) each client specific message usually just have one message on it: a request from client or a response from the server. And OS has upper limit of how many IPC structures can exist system wide
2) If server need to read from multiple message queues, there is no I/O multiplexing like select/poll working with message queues. But if using FIFO(the file in file system), we can use select/poll to read from multiple FIFOs.

4. Model D:

Message Queues:
1) A single queue can be used between the server and all the clients, using the type field of each message to indicate the message recipient. If the type is 1, we can assume the message is for server, if the type is child process id, we can let child process to pick up their own messages.

For Model B, C, D:
When using the message queue, it is very difficult for server to identify the client(know the client's effective user id). Based on msg_lspid of queue, server can know the client process's process id, but there is no way for server to identify the effective user id of client to do the permission checking.

When using FIFOs, the server calls either stat or fstat on the client-specific FIFO, to get the owner id of the FIFO file. Here we assume the client's effective user id is the owner id of the FIFO file.

When using other IPC Structure(message queue, semaphore, shared memory), we can let client create their own IPC structure, then server use ipc_perm structure associated with each IPC structure to identify the creator(cuid and cgid fields)

Unix Prog: IPC Shared Memory(2)

1. Example of memory layout
shm.c:
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<unistd.h>  
 #include<sys/shm.h>  
   
 #define ARRAY_SIZE 40000  
 #define MALLOC_SIZE 100000  
 #define SHM_SIZE 100000  
 #define SHM_MODE 0600 // user-read(S_IRUSR 0400), user-write(S_IWUSR 0200), BIT OR => 0600  
   
 char array[ARRAY_SIZE]; // Uninitialized data = bss  
   
 int main(int argc, char* argv[])  
 {  
  int shmid;  
  char *ptr, *shmptr;  
   
  // Print out address range of uninitialied array and local variable  
  printf("array[] from %lx to %lx\n", (unsigned long)&array[0], (unsigned long)&array[ARRAY_SIZE]);  
  printf("stack around %lx\n", (unsigned long)&shmid);  
   
  // Malloc the memory from heap, and print out the address range  
  if((ptr = malloc(MALLOC_SIZE)) == NULL) {  
   printf("malloc error");  
   exit(1);  
  }  
  printf("malloced from %lx to %lx\n", (unsigned long)ptr, (unsigned long)ptr + MALLOC_SIZE);  
   
  // Create the shared memory segment  
  if((shmid = shmget(IPC_PRIVATE, SHM_SIZE, SHM_MODE)) < 0) {  
   printf("shmget error!\n");  
   exit(2);  
  }  
   
  // Attach the local memory to shared memory  
  if((shmptr = shmat(shmid, 0, 0)) == (void*)-1) {  
   printf("shmat error!\n");  
   exit(3);  
  }  
  printf("shared memory attached from %lx to %lx\n", (unsigned long)shmptr, (unsigned long)shmptr+SHM_SIZE);  
   
  // Remove the shared memory segment  
  if(shmctl(shmid, IPC_RMID, 0) < 0) {  
   printf("shmctl error!\n");  
   exit(4);  
  }  
   
  exit(0);  
 }  

shell:
From the following input, we could see the places of different memory in process of current os.
Based on output:
uninitialized data(array) is in lowest address range, heap memory range is right after, and then stack memory, and finally shared memory is in highest memory range.
 ubuntu@ip-172-31-23-227:~$ ./shm.out  
 array[] from 6010a0 to 60ace0  
 stack around 7fffb5fc6b5c  
 malloced from 19b5010 to 19cd6b0  
 shared memory attached from 7f07cdc82000 to 7f07cdc9a6a0  

2. Example of Memory Mapping:
Besides using the shared memory segment, we can also use the mmap to simplify the procedure. mmap can used to share the memory region among multiple processes if they share the common ancestor and MAP_SHARED flag is specified.

mmap.c:
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<fcntl.h>  
 #include<sys/mman.h>  
   
 #define NLOOPS 1000  
 #define SIZE sizeof(long)  
 int pfd1[2], pfd2[2];  
   
 void TELL_WAIT(void)  
 {  
  // Create two pipes here, one for parent -> child  
  // one for child -> parent  
  if(pipe(pfd1) < 0 || pipe(pfd2) < 0) {  
   printf("pipe error!\n");  
   exit(1);  
  }  
 }  
   
 void TELL_PARENT(pid_t pid)  
 {  
  if(write(pfd2[1], "c", 1) != 1) {  
   printf("write error !\n");  
   exit(2);  
  }  
 }  
   
 void WAIT_PARENT(void)  
 {  
  char c;  
   
  if(read(pfd1[0], &c, 1) != 1) {  
   printf("read error!\n");  
   exit(3);  
  }  
   
  if(c != 'p') {  
   printf("wait parent: incorrect data!\n");  
   exit(4);  
  }  
 }  
   
 void TELL_CHILD(pid_t pid)  
 {  
  if(write(pfd1[1], "p", 1) != 1) {  
   printf("write error!\n");  
   exit(2);  
  }  
 }  
   
 void WAIT_CHILD(void)  
 {  
  char c;  
   
  if(read(pfd2[0], &c, 1) != 1) {  
   printf("read error!\n");  
   exit(3);  
  }  
   
  if(c != 'c') {  
   printf("wait child: incorrect data!\n");  
   exit(4);  
  }  
 }  
   
 int update(long* ptr)  
 {  
  return((*ptr)++); // return the value before increment  
 }  
   
 int main(int argc, char* argv[])  
 {  
  int fd, i, counter;  
  pid_t pid;  
  void *area;  
   
  // Open the file descriptor  
  if((fd = open("/dev/zero", O_RDWR)) < 0) {  
   printf("open error!\n");  
   exit(1);  
  }  
   
  // Map the memory to the file descriptor  
  if((area = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)) == MAP_FAILED) {  
   printf("mmap error!\n");  
  }  
  close(fd);  
   
  TELL_WAIT();  
  if((pid = fork()) < 0) {  
   printf("fork error!\n");  
   exit(5);  
  }  
  else if(pid > 0) { // parent process  
   for(i = 0; i < NLOOPS; i+=2) {  
    if((counter = update((long*)area))!= i) {  
     printf("parent: expected %d, got %d.\n", i, counter);  
     exit(6);  
    }  
   
    TELL_CHILD(pid);  
    WAIT_CHILD();  
   }  
  } else { // child process  
   for(i = 1; i < NLOOPS + 1; i+=2) {  
    WAIT_PARENT();  
    if((counter = update((long*)area) != i)) {  
     printf("child: expected %d, got %d\n", i, counter);  
     exit(6);  
    }  
   
    TELL_PARENT(getppid());  
   }  
  }  
   
  exit(0);  
 }  

shell:
Nothing is outputted.
So the parent and child process operate the mapped memory region pointed to by "area", and value of memory is incremented as expected while both processes touch it one by one.
Actually area in parent and child are different memory region, but they are mapped by using mmap to map it to the external /dev/zero file, which indicates that the external map could be empty.

Note: if we specify the MAP_PRIVATE when doing the mmap, it would not work. since this flag will make the memory to map to a private copy of file instead of shared one, which make another process can't see the change in memory since they are different and not mapped.

Note:
Modern unix system support anonymous memory mapping, which is almost same as above example, the only difference is:
We don't use /dev/zero to do the mapping, we let OS to select one anonymous region for shared mapping, we can change mmap statement to:

if((area = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED, -1, 0)) == MAP_FAILED)

We add one more flag: MAP_ANON to indicate the anonymous mapping, and fd as -1, since we don't need to specify the file path.

And the result is, OS will select one anonymous region for mapping.

Saturday, November 15, 2014

Unix Prog: IPC Shared Memory(1)

1. Shared Memory Overview
Shared memory allows two or more processes to share a given region of memory. This is fastest form of IPC, because the data doesn't need to be copied between the client and the server.

But accessing to a given shared region among multiple processes need synchronization. Often, semaphores are used to synchronize shared memory access.

The kernel maintains a structure with at least the following members for each shared memory segment:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/shm.h  
 ......  
 /* Data structure describing a shared memory segment. */  
 struct shmid_ds  
  {  
   struct ipc_perm shm_perm;      /* operation permission struct */  
   size_t shm_segsz;          /* size of segment in bytes */  
   __time_t shm_atime;         /* time of last shmat() */  
 #ifndef __x86_64__  
   unsigned long int __glibc_reserved1;  
 #endif  
   __time_t shm_dtime;         /* time of last shmdt() */  
 #ifndef __x86_64__  
   unsigned long int __glibc_reserved2;  
 #endif  
   __time_t shm_ctime;         /* time of last change by shmctl() */  
 #ifndef __x86_64__  
   unsigned long int __glibc_reserved3;  
 #endif  
   __pid_t shm_cpid;          /* pid of creator */  
   __pid_t shm_lpid;          /* pid of last shmop */  
   shmatt_t shm_nattch;        /* number of current attaches */  
   __syscall_ulong_t __glibc_reserved4;  
   __syscall_ulong_t __glibc_reserved5;  
  };  
 ......  

2. System Calls:
1) shmget
shmget is used to create a new shared memory or obtain an existing shared memory.
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/shm.h  
 ......  
 /* Get shared memory segment. */  
 extern int shmget (key_t __key, size_t __size, int __shmflg) __THROW;  
 ......  

When a new segment is created, the following members of the shmid_ds structure are initialized:
ipc_perm structure is initialized. shm_lpid, shm_nattach, shm_atime, and shm_dtime are all set to 0. shm_ctime is set to the current time. shm_segsz is set to the size requested.

The size parameter is the size of the shared memory segment in bytes. Implementation will usually round up the size to a multiple of the system's page size.But if size is not the multiple of page size, the remainder of the last page will be unavailable. If creating the new shared memory segment, we must specify the size. If we are referencing the existing segment, we can specify the size to 0.

2) shmctl
shmctl is the catchall for various shared memory operations.
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/shm.h  
 ......  
 /* Shared memory control operation. */  
 extern int shmctl (int __shmid, int __cmd, struct shmid_ds *__buf) __THROW;  
 ......  

Unix supports following commands:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/ipc.h  
 ......  
 /* Control commands for `msgctl', `semctl', and `shmctl'. */  
 #define IPC_RMID    0        /* Remove identifier. */  
 #define IPC_SET     1        /* Set `ipc_perm' options. */  
 #define IPC_STAT    2        /* Get `ipc_perm' options. */  
 ......  
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/shm.h  
 ......  
 /* Commands for `shmctl'. */  
 #define SHM_LOCK    11       /* lock segment (root only) */  
 #define SHM_UNLOCK   12       /* unlock segment (root only) */  
 ......  

IPC_STAT: Fetch the shmid_ds structure for this segment, storing it in the structure pointed to by buf.
IPC_SET: Set the following three fields from the structure pointed to by buf in the shmid_ds structure associated with this shared memory segment: shm_perm.uid, shm_perm.gid, and shm_perm.mode. It can only be executed by a process whose effective user ID equals shm_perm.cuid or shm_perm.uid or by a process with superuser privileges.
IPC_RMID: Remove the shared memory segment set from the system. The segment is not removed until the last process using the segment terminates or detaches it. It can only be executed by a process whose effective user ID equals shm_perm.cuid or shm_perm.uid or by a process with superuser privileges.
SHM_LOCK: Lock the shared memory segment in memory. only can be executed by super user.
SHM_UNLOCK: Unlock the shared memory segment. only can be executed by super user

3) shmat:
shmat is used to help process attach the shared memory segment to its own address space.

 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/shm.h  
 ......  
 /* Attach shared memory segment. */  
 extern void *shmat (int __shmid, const void *__shmaddr, int __shmflg)  
    __THROW;  
 ......  

a) If addr is 0, the segment is attached at the first available address selected by the kernel. This is the recommended technique.
b) If addr is nonzero and SHM_RND is not specified, the segment is attached at the address given by addr.
c) If addr is nonzero and SHM_RND is specified, the segment is attached at the address given by (addr - (addr % SHMLBA)). It round the address down to the next multiple of SHMLBA.

 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/shm.h  
 ......  
 /* Flags for `shmat'. */  
 #define SHM_RDONLY   010000     /* attach read-only else read-write */  
 #define SHM_RND     020000     /* round attach address to SHMLBA */  
 #define SHM_REMAP    040000     /* take-over region on attach */  
 #define SHM_EXEC    0100000     /* execution access */  
 ......  
 /* Segment low boundary address multiple. */  
 #define SHMLBA     (__getpagesize ())  
 extern int __getpagesize (void) __THROW __attribute__ ((__const__));  
 ......  

d) if SHM_RDONLY bit is set, the segment is attached read-only, otherwise, the segment is attached read-write.
e) shmat return the address at which the segment is attached, or -1 if an error occured. If shmat succeeds, the kernel will increment the shm_nattach counter in the shmid_ds structure associated with the shared memory segment.

4) shmdt:
shmdt is used to detach the memory segment from the local address.
Note: this doesn't remove the shared memory segment from the system. It still exists until some process specifically removes it by calling shmctl with a command of IPC_RMID.
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/shm.h  
 ......  
 /usr/include/x86_64-linux-gnu/sys/shm.h:extern int shmdt (const void *__shmaddr) __THROW;  
 ......  

The addr argument is the value that was returned by a previous call to shmat. If successful, shmdt will decrement the shm_nattch counter in the associated shmid_ds structure.

Unix Prog: IPC Semaphores

1. Semaphore Overview
A semaphore is a counter used to provide access to a shared data object for multiple processes.

To obtain a shared resource, a process needs to:
1) Test the semaphore that controls the resource
2) If the value of the semaphore is positive, the process can use the resource. In this case, the process decrements the semaphore value by 1.
3) If the value of the semaphore is 0, the process goes to sleep until the semaphore value is greater than 0

To implement semaphores correctly, the test of a semaphore's value and the decrementing of this value must be an atomic operation.. So, semaphores are normally implemented inside the kernel

2. XSI Semaphore

XSI Semaphore is more complicated.
1) A semaphore is not simply a single non-negative value. Instead, we have to define a semaphore as a set of one or more semaphore values.
2) The creation of a semaphore(semget) is independent of its initialization(semctl)
3) All forms of XSI IPC remain in existence even when no process is using them, we have to worry about a program that terminates without releasing the semaphores it has been allocated.

The kernel maintains a semid_ds structure for each semaphore set:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/sem.h  
 ......  
 /* Data structure describing a set of semaphores. */  
 struct semid_ds  
 {  
  struct ipc_perm sem_perm;       /* operation permission struct */  
  __time_t sem_otime;          /* last semop() time */  
  __syscall_ulong_t __glibc_reserved1;  
  __time_t sem_ctime;          /* last time changed by semctl() */  
  __syscall_ulong_t __glibc_reserved2;  
  __syscall_ulong_t sem_nsems;     /* number of semaphores in set */  
  __syscall_ulong_t __glibc_reserved3;  
  __syscall_ulong_t __glibc_reserved4;  
 };  
 ......  

For each semaphore in semaphore set:
semval: current semaphore value, always >= 0
sempid: pid for last operation
semncnt: number of processes awaiting semval > curval
semzcnt: number of processes awaiting semval == 0

3. semaphore system calls
1) semget
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/sem.h  
 ......  
 /* Get semaphore. */  
 extern int semget (key_t __key, int __nsems, int __semflg) __THROW;  
 ......  

semget can be used to create a new semaphore or obtain the semaphore id of existing one.

When a new semaphore is created, following members of semid_ds structure are initialized:
ipc_perm structure is initialized; set_otime is set to 0, sem_ctime is set to the current time, sem_nsems is set to nsems.

nsems represents the number of semaphores in set. If creating new semaphore set, we must specify nsems. If referring to existing semaphore set, nsems could be 0.

2) semctl
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/sem.h  
 ......  
 /* Semaphore control operation. */  
 extern int semctl (int __semid, int __semnum, int __cmd, ...) __THROW;  
 ......  
semctl function is the catch all for various semaphore operations. Fourth argument is optional, depending on the command requested, and if present, is of type semun.

 ubuntu@ip-172-31-23-227:~$ less /usr/include/linux/sem.h  
 ......  
 /* arg for semctl system calls. */  
 union semun {  
     int val;            /* value for SETVAL */  
     struct semid_ds *buf;  /* buffer for IPC_STAT & IPC_SET */  
     unsigned short *array; /* array for GETALL & SETALL */  
     struct seminfo *__buf; /* buffer for IPC_INFO */  
     void *__pad;  
 };  
   
 struct seminfo {  
     int semmap;  
     int semmni;  
     int semmns;  
     int semmnu;  
     int semmsl;  
     int semopm;  
     int semume;  
     int semusz;  
     int semvmx;  
     int semaem;  
 };  
 ......  

All commands supported:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/ipc.h  
 ......  
 /* Control commands for `msgctl', `semctl', and `shmctl'. */  
 #define IPC_RMID    0        /* Remove identifier. */  
 #define IPC_SET     1        /* Set `ipc_perm' options. */  
 #define IPC_STAT    2        /* Get `ipc_perm' options. */  
 ......  
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/sem.h  
 ......  
 /* Commands for `semctl'. */  
 #define GETPID     11       /* get sempid */  
 #define GETVAL     12       /* get semval */  
 #define GETALL     13       /* get all semval's */  
 #define GETNCNT     14       /* get semncnt */  
 #define GETZCNT     15       /* get semzcnt */  
 #define SETVAL     16       /* set semval */  
 #define SETALL     17       /* set all semval's */  
 ......  

IPC_STAT: Fetch the semid_ds structure for this set, storing it in the structure pointed to by semun->buf
IPC_SET: Set the sem_perm.uid, sem_perm.gid and sem_perm.mode fields from the structure pointed to by semun->buf in the semid_ds structure associated with this set.
IPC_RMID: Remove the semaphore set from the system. This removal is immediate.
GETVAL: Return the value of semval for the member semnum.
SETVAL: Set the value of semval for the member semnum.
GETPID: Return the value of sempid for the member semnum.
GETNCNT: Return the value of semncnt for the member semnum.
GETZCNT: Return the value of semzcnt for the member semnum
GETALL: Fetch all the semaphore values in the set. These values are stored in the array pointed to by semun-> array
SETALL: Set all the semaphore values in the set to the values pointed to by semun->array

3) semop:
semop atomically performs an array of operations on a semaphore set.
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/sem.h  
 ......  
 /* Operate on semaphore. */  
 extern int semop (int __semid, struct sembuf *__sops, size_t __nsops) __THROW;  
 ......  

One semaphore operation can be represented by "struct sembuf"
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/sem.h  
 ......  
 /* Structure used for argument to `semop' to describe operations. */  
 struct sembuf  
 {  
  unsigned short int sem_num;  /* semaphore number */  
  short int sem_op;       /* semaphore operation */  
  short int sem_flg;      /* operation flag */  
 };  
 ......  

sem_flg could be:
SEM_UNDO, IPC_NOWAIT

nsops represents the number "struct sembuf"

sem_op can be negative, 0, or positive.
a) If sem_op is positive: The value of sem_op is added to the specified semaphore's value. If undo flag is specified, sem_op is also subtracted from the semaphore's adjustment value for this process.
b) If sem_op is negative: we want to obtain resources that the semaphore controls. If the semaphore's value is greater than or equal to the absolute value of sem_op(the resources are available), the absolute value of sem_op is subtracted from the semaphore's value. If the undo flag is specfied, the absolute value of sem_op is also added to the semaphore's adjustment value for this process.

If the semaphore's value is less than the absolute value of sem_op(resources are not available), the following conditions apply:

If IPC_NOWAIT is specfied, semop returns with an error of EAGAIN

If IPC_NOWAIT is not specified,the semncnt value for this semaphore is incremented(one more process is going to sleep), and the calling process is suspended until:
the semaphore's value becomes greater than or equal to the absolute value of sem_op, or, the semaphore is removed from the system, or, a signal is caught by the process, and the signal handler returns, the value of semncnt for this semaphore is decremented(since the process is waken up), function returns with EINTR.

c) If sem_op is 0, this means that the calling process wants to wait until the semaphore's value becomes 0

If the semaphore's value is currently 0, the function returns immediately

If the semaphore's value is nonzero, the following conditions apply:
If IPC_NOWAIT is specified, return is made with an error of EAGAIN
If IPC_NOWAIT is not specified, the semzcnt value for this semaphore is incremented(one more process waiting for 0 semaphore value is going to sleep), and the calling process is suspended until:

the semaphore's value becomes 0, or the semaphore is removed from the system, or a signal is caught by the process and the signal handler returns.

Note: semop function operates atomically.

4. Semaphore Adjustment on exit:
Whenever we make semaphore option, we can specify SEM_UNDO flag to let semaphore "memorize" the adjustment value. So when the process terminates, the kernel checks whether the process has any adjustment value for the semaphore, if so, applies the adjustment to the corresponding semaphore.

For example, we have one semaphore value 1; then a process get the resource and decrease the semaphore value to 1 (sem_op is -1 in this case) while specifying the SEM_UNDO flag, then kernel memorizes the adjustment value of this process for this semaphore to be "+1". Then when process terminates, the adjustment will be applied to this semaphore making value from 0 to 1 again, allowing other processes to use the resource again.

If user use any SETVAL/SETALL command to manually setup the semaphore value, adjustment of this semaphore for all processes will become 0.

5. Time Comparison between record locking and semaphore

We use semaphore to control whether to allocate and release the resource.
We create the file using first file byte to control  whether to allocate and release the resource(lock the first byte or release the first byte)

Based on testing results:
semaphore is apparently faster than record locking, but if locking a single resource, and don't need all the fancy features of XSI semaphores, record locking is preferred.

Unix Prog: IPC Message Queue

1. Message Queue Overview
A message queue is a linked list of messages stored within the kernel and identified by a message queue identifier. 

Using msgget can create a new queue or refer to one existing queue.
Using msgsnd can add new messages to the end of a queue.
Every message has a positive long integer type field, a non-negative length, and the actual data bytes.
Using msgrcv can fetch a message from the a queue, and we don't have to fetch the message in a first-in, first-out order, instead, we can fetch messages based on their type field.

Each message queue has the following msqid_ds structure associated with it:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/msq.h  
 ......  
 struct msqid_ds  
 {  
  struct ipc_perm msg_perm;   /* structure describing operation permission */  
  __time_t msg_stime;      /* time of last msgsnd command */  
 #ifndef __x86_64__  
  unsigned long int __glibc_reserved1;  
 #endif  
  __time_t msg_rtime;      /* time of last msgrcv command */  
 #ifndef __x86_64__  
  unsigned long int __glibc_reserved2;  
 #endif  
  __time_t msg_ctime;      /* time of last change */  
 #ifndef __x86_64__  
  unsigned long int __glibc_reserved3;  
 #endif  
  __syscall_ulong_t __msg_cbytes; /* current number of bytes on queue */  
  msgqnum_t msg_qnum;      /* number of messages currently on queue */  
  msglen_t msg_qbytes;     /* max number of bytes allowed on queue */  
  __pid_t msg_lspid;      /* pid of last msgsnd() */  
  __pid_t msg_lrpid;      /* pid of last msgrcv() */  
  __syscall_ulong_t __glibc_reserved4;  
  __syscall_ulong_t __glibc_reserved5;  
 };  
 ......  
Above structure defined the current status of the queue

2. Message Queue System Calls
1) msgget
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/msg.h  
 ......  
 /* Get messages queue. */  
 extern int msgget (key_t __key, int __msgflg) __THROW;  
 ......  

msgget can be used to create a new queue or refer to one existing queue.
When a new queue is created, following members of the msqid_ds structure are initialized:
ipc_perm structure is initialized. msg_qnum, msg_lspid, msg_lrpid, msg_stime, and msg_rtime are all set to 0, msg_ctime is set to the current time, msg_qbytes is set to the system limit.

On success, msgget returns the non-negative queue ID.

2) msgctl

 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/msg.h  
 ......  
 /* Message queue control operation. */  
 extern int msgctl (int __msqid, int __cmd, struct msqid_ds *__buf) __THROW;  
 ......  

msgctl function performs various operations on a queue.
The cmd argument specifies the command to be performed:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/ipc.h  
 ......  
 /* Control commands for `msgctl', `semctl', and `shmctl'. */  
 #define IPC_RMID    0        /* Remove identifier. */  
 #define IPC_SET     1        /* Set `ipc_perm' options. */  
 #define IPC_STAT    2        /* Get `ipc_perm' options. */  
 #ifdef __USE_GNU  
 # define IPC_INFO    3        /* See ipcs. */  
 #endif  
 ......  

IPC_STAT: Fetch the msgid_ds structure for this queue, storing it in the structure pointed to by buf.
IPC_SET: Copy the following fields from the structure pointed to by buf to the msqid_ds structure associated with this queue: msg_perm.uid, msg_perm.gid, msg_perm.mode, and msg_qbytes. This command can only be executed by a process whose effective user id equals msg_perm.cuid or msg_perm.uid or by a process with superuser privilege.
IPC_RMID: rmeove the message queue from the system and any data still on the queue.The removal is immediate. And it can only be executed by a process whose effective user id equals msg_perm.cuid or msg_perm.uid or by a process with super user privilege.

3) msgsnd
msgsnd can be used to place a message into the message queue. And messages are always placed in the end of the queue.
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/msg.h  
 ......  
 /* Send message to message queue.  
   
   This function is a cancellation point and therefore not marked with  
   __THROW. */  
 extern int msgsnd (int __msqid, const void *__msgp, size_t __msgsz,  
           int __msgflg);  
 ......  

msgp argument points to the long integer that contains the positive integer message type, and it is immediately followed by the message data.
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/msg.h  
 ......  
 /* Template for struct to be used as argument for `msgsnd' and `msgrcv'. */  
 struct msgbuf  
  {  
   __syscall_slong_t mtype;  /* type of received/sent message */  
   char mtext[1];       /* text of the message */  
  };  
 ......  
msgp points to a msgbuf structure. The message type can be used by the receiver to fetch messages in an order other than first in , first out.

Note: length of mtext should be same as third argument msgsz in real world.

A flag value of IPC_NOWAIT can be specified. If the message queue is full, specifying the IPC_NOWAIT causes msgsnd to return immediately with an error of EAGAIN. If IPC_NOWAIT is not specified, we are blocked until there is room for the message, the queue is removed from the system, or a signal is caught and signal handler returns.

After msgsnd returns successfully, the msqid_ds structure associated with the message queue is updated to indicate the process ID that made the call

4) msgrcv
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/msg.h  
 ......  
 /* Receive message from message queue.  
   
   This function is a cancellation point and therefore not marked with  
   __THROW. */  
 extern ssize_t msgrcv (int __msqid, void *__msgp, size_t __msgsz,  
             long int __msgtyp, int __msgflg);  
 ......  

As with msgsnd, the ptr argument points to a long integer(where the message type of the returned message is stored) followed by a data buffer for the actual message data. msgp can point to the struct msgbuf as indicated above.

If the returned message buffer is larger than msgsz, and the MSG_NOERROR bit in flag is set, and the message is truncated, and message is totally removed from the queue. If MSG_NOERROR bit is not set in flag, an error of E2BIG is returned, and message still stays on the queue.

For fourth argument, if msgtyp == 0, the first message on the queue is returned. If msgtyp > 0, the first message on the queue whose message type equals type is returned. If msgtyp < 0, the first message on the queue whose message type is the lowest value less than or equal to the absolute value of type is returned.

In client server model, message from clients normally has the type which is the process id of client. And server can based on process id as type to retrieve message from specific client.

If IPC_NOWAIT is specified, msgrc will return -1 with errno set to ENOMSG if a message of the specified type is not available. If IPC_NOWAIT is not specified, the operation blocks until a message of the specified type is available, the queue is removed from the ssytem, or a signal is caught and the signal handler returns.

Unix Prog: XSI IPC Overview

1. XSI IPC
Three types of XSI IPC: message queue, semaphores, and shared memory.
All three objects share some similarities as following.

2. Identifiers and Keys
Each IPC structure(message queue, semaphores, and shared memory) is identified in the kernel by a non-negative integer. To send or fetch a message to or from a message queue, we need to know the identifier for the message queu.

Each IPC structure is associated with a key that acts as an external name, compared to identifier, which is internal name. Whenever an IPC structure is being created(msgget, semget, or shmget), a key must be specified.

Several ways for clients and server to rendezvous at the same IPC structure:
1) The server can create a new IPC structure by specifying a key of IPC_PRIVATE and store the returned identifier somewhere, like a file, for the client to obtain
2) The server create a new IPC structure using IPC_PRIVATE Key, then the resulting identifier is available to the child after the fork.
3) The client and the server can agree on a key by defining the key in a common header. The server then creates a new IPC structure specifying this key. If the key is already associated with one existing IPC structure, the server need to delete that structure and re-create it.
4) The client and server can agree on a pathname and project id(between 0 and 255), and call the function ftok to convert these two values into a key.

System Definition:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/ipc.h  
 ......  
 /* Generates key for System V style IPC. */  
 extern key_t ftok (const char *__pathname, int __proj_id) __THROW;  
 ......  

3. Create and Get the IPC Structure
For 3 get functions: msgget, semget, and shmget, all need to specify a key and an integer flag.

Create the IPC Structure with key IPC_PRIVATE:
1) After server create the IPC structure with key IPC_PRIVATE, then it is not possible to be conflict with any existing structure, since it would create a new structure.
2) When client want to refer to this structure, for example, sending a message to the message queue, the client must know the identifier of this structure.

Create the IPC Structure with a key who is not currently associated with an IPC structure of the particular type and the IPC_CREAT bit of flag is specified.
1) In this case, when client want to refer to this IPC structure, it need to specify the same key type without specifying IPC_CREAT bit flag.

Create the IPC Structure with both IPC_CREAT and IPC_EXCL bits set:
If the specified key refer to some existing IPC structure, , it will return an error of EEXIST.

4. Permission Structure
XSI IPC associates ipc_perm structure with each IPC structure:
System Definition:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/bits/ipc.h  
 ......  
 /* Data structure used to pass permission information to IPC operations. */  
 struct ipc_perm  
  {  
   __key_t __key;           /* Key. */  
   __uid_t uid;            /* Owner's user ID. */  
   __gid_t gid;            /* Owner's group ID. */  
   __uid_t cuid;            /* Creator's user ID. */  
   __gid_t cgid;            /* Creator's group ID. */  
   unsigned short int mode;      /* Read/write permission. */  
   unsigned short int __pad1;  
   unsigned short int __seq;      /* Sequence number. */  
   unsigned short int __pad2;  
   __syscall_ulong_t __glibc_reserved1;  
   __syscall_ulong_t __glibc_reserved2;  
  };  
 ......  

All above fields are initialized when IPC structure is created, at a later time, we can modify the uid, gid, and mode fields by calling msgctl, semctl, or shmctl.  And the caller must be either the creator of the IPC structure or the super user.

mode member represents the permission bits. In IPC permission structure, only 6 bits are specified:
1) user read 2) user write 3) group read 4) group write 5) other read 6) other write

5. Advantages and Disadvantages:
1) IPC structure is system wide and do not have one reference count. If we create a message queue, place some messages on the queue, and then terminate, the message queue and its contents are not deleted.
For pipe, when the last process to reference it terminates, the data in there is completely removed.
For FIFO, although the file stays in the file system. But any data left in a FIFO is removed when the last process to reference FIFO terminates.

2) IPC structure is not known by names in the file system, we can't access them and modify their properties with the standard I/O functions

3) These forms of IPC don't use file descriptors, we can't use the multiplexed I/O functions(select and poll) with them.

Unix Prog: IPC: FIFO

1. FIFO File
Pipes can be used only between related processes when a common ancestor has created the pipe.
With FIFO, unrelated processes can exchange data.

FIFO is a type of file, st_mode member of the stat structure indicates that a file is FIFO. http://dreamchaser1986unixdev.blogspot.com/2014/08/unix-prog-files-types_19.html

System Call used to create the FIFO file:
 ubuntu@ip-172-31-23-227:~$ less /usr/include/x86_64-linux-gnu/sys/stat.h  
 ......  
 /* Create a new FIFO named PATH, with permission bits MODE. */  
 extern int mkfifo (const char *__path, __mode_t __mode)  
    __THROW __nonnull ((1));  
 ......  

Once the FIFO file is created, we can open, read, write, close, unlink it like treating regular files.

If the O_NONBLOCK bit is not set in FIFO file descriptor:
1) An open for read will block until some other process open it for writing. And a write will block until some other process open it for reading.
Otherwise:
2) An open for read will return immediately. open for write-only returns -1 with errno set to ENXIO if no process has the FIFO for reading.

If there are multiple writers writing to the same FIFO, we need to make sure each write is atomic to avoid interleaving different writes. As long as the data amount get written doesn't exceed the PIPE_BUF, each write is atomic.

Usage of FIFO:
1) Used by shell commands to pass data from one shell pipelines to another without creating intermediate temp files
2) Used as rendezvous point in client-server applications to pass data between clients and server.

2. Example:
tr.c:
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<unistd.h>  
   
 int main(int argc, char* argv[])  
 {  
  char line[20];  
  fgets(line, 20, stdin);  
  fputs(line, stdout);  
   
  exit(0);  
 }  

tr2.c:
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<unistd.h>  
   
 int main(int argc, char* argv[])  
 {  
  char line[20];  
  fgets(line, 20, stdin);  
  fputs("tr2.out:", stdout);  
  fputs(line, stdout);  
   
  exit(0);  
 }  

shell:
mkfifo creates one FIFO file from the shell;
tr2.out reads from fifo , but right now no processes write to fifo, so tr2.out block there.
launch the tr.out to read from text file and then pipe to "tee", whose job is to pass text from standard input to standard output and the fifo file specified after "tee".

So the final output includes 2 lines:
first line is the text from tr.out, output to standard output
second line is the text from tr2.out, tr2.out get the text from fifo, which is then from tee.
 ubuntu@ip-172-31-23-227:~$ mkfifo fifo
 ubuntu@ip-172-31-23-227:~$ ./tr2.out < fifo &  
 [1] 1536  
 ubuntu@ip-172-31-23-227:~$ ./tr.out < text | tee fifo  
 Hello world!  
 tr2.out:Hello world!  

Following diagram can illustrate this:


2. Client Server Communication Example

In client-server communication model, server has one well-known FIFO, which is used to receive different requests sent from clients. If client specified the process id in the request, then after server process the request, server can build the client-specific FIFO to send reply back to client.

For example, if client's process id is 122, then server can create the FIFO called: client_122, then process 122 can use this FIFO specifically to receive the reply from server.

Problem:
If the client crashes, it will leave the client-specific FIFO in file system.
The server also need to catch the SIGPIPE, since it is possible that client send request and then terminate before receiving the reply.

Thursday, November 13, 2014

Unix Prog: IPC Coprocesses

1. Coprocesses:

A coprocess normally runs in the background from a shell, and its standard input and standard output are connected to another program using a pipe. Following diagram can illustrate this:

2. Coprocess Example
Here we implement a very simple coprocess add2:

add2.c:
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<unistd.h>  
 #include<string.h>  
   
 #define MAXLINE 20  
   
 int main(void)  
 {  
  int n, int1, int2;  
  char line[MAXLINE];  
   
  while((n = read(STDIN_FILENO, line, MAXLINE)) > 0) {  
   line[n] = 0;  
   if(sscanf(line, "%d%d", &int1, &int2) == 2) {  
    memset(line, 0, MAXLINE);  
    sprintf(line, "%d\n", int1 + int2);  
    n = strlen(line);  
    if(write(STDOUT_FILENO, line, n) != n) {  
     printf("write error!\n");  
     exit(1);  
    }  
   } else {  
    if(write(STDOUT_FILENO, "invalid args\n", 13) != 13) {  
     printf("write error!\n");  
     exit(2);  
    }  
   }  
  }  
   
  exit(0);  
 }  

main.c:
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<signal.h>  
 #include<unistd.h>  
 #include<string.h>  
   
 #define MAXLINE 20  
   
 void sig_pipe(int signo)  
 {  
  printf("SIGPIPE caught\n");  
  exit(1);  
 }  
   
 int main(int argc, char* argv[])  
 {  
  int n, fd1[2], fd2[2];  
  pid_t pid;  
  char line[MAXLINE];  
   
  // Setup the signal for SIGPIPE  
  if(signal(SIGPIPE, sig_pipe) == SIG_ERR) {  
   printf("signal error!\n");  
   exit(1);  
  }  
   
  // Setup the pipe file descriptors  
  if(pipe(fd1) < 0 || pipe(fd2) < 0) {  
   printf("pipe error!\n");  
   exit(2);  
  }  
   
  // Create the child process  
  if((pid = fork()) < 0) {  
   printf("fork error!\n");  
   exit(3);  
  } else if (pid > 0) { // Parent process  
   close(fd1[0]);  
   close(fd2[1]);  
   
   while(fgets(line, MAXLINE, stdin) != NULL) {  
    n = strlen(line);  
    if(write(fd1[1], line, n) != n) {  
     printf("write error to pipe\n");  
     exit(4);  
    }  
    if((n = read(fd2[0], line, MAXLINE)) < 0) {  
     printf("read error from pipe\n");  
     exit(5);  
    }  
    if(n == 0) {  
     printf("child closed pipe");  
     break;  
    }  
   
    // output the result  
    line[n] == 0;  
    if(fputs(line, stdout) == EOF) {  
     printf("fputs error!\n");  
     exit(6);  
    }  
   }  
   
   if(ferror(stdin)) {  
    printf("fgets error on stdin");  
    exit(6);  
   }  
   
   exit(0);  
  } else { // child process  
   close(fd1[1]);  
   close(fd2[0]);  
   if(fd1[0] != STDIN_FILENO) {  
    if(dup2(fd1[0], STDIN_FILENO) != STDIN_FILENO) {  
     printf("dup2 error to stdin\n");  
     exit(1);  
    }  
    close(fd1[0]);  
   }  
   
   if(fd2[1] != STDOUT_FILENO) {  
    if(dup2(fd2[1], STDOUT_FILENO) != STDOUT_FILENO) {  
     printf("dup2 error to stdout\n");  
     exit(1);  
    }  
    close(fd2[1]);  
   }  
   
   if(execl("./add2.out", "add2.out", (char*)0) < 0) {  
    printf("execl error!\n");  
    exit(2);  
   }  
  }  
   
  exit(0);  
 }  

shell:
The process just establish two pipes with child process, who execute the add2.out. Parent process input string to child process, child process read the input and add numbers and put result into output, which then go back to the parent process.

Note: we can't use standard I/O at child process, since if detecting that child process's output is a pipe, child process will make I/O full buffer, and then we can not see the output immediately.
But we can choose to setup the buffer option of standard I/O with setvbuf
 ubuntu@ip-172-31-23-227:~$ ./main.out  
 90 100  
 190  

Wednesday, November 12, 2014

Unix Prog: popen, pclose(3)

1. Filter Program

We'll build two process piped up together in following way:

2. Source code:
filter.c:
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<ctype.h>  
   
 int main(int argc, char *argv[])  
 {  
  int c;  
   
  while((c = getchar()) != EOF) {  
   if(isupper(c)) c = tolower(c);  
   if(putchar(c) == EOF) {  
    printf("output error!\n");  
    exit(1);  
   }  
   if(c == '\n')  
    fflush(stdout);  
  }  
   
  exit(0);  
 }  

main.c:
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<sys/wait.h>  
   
 int main(int argc, char* argv[])  
 {  
  char line[20];  
  FILE * fpin;  
   
  // Open the filter.out, fpin is used to read input from filter.out  
  if((fpin = popen("./filter.out", "r")) == NULL) {  
   printf("popen error!\n");  
   exit(1);  
  }  
   
  // Make a loop to read from the filter.out and output to standard output  
  for(;;) {  
   fputs("prompt>", stdout);  
   fflush(stdout);  
   if(fgets(line, 20, fpin) == NULL) break;  
   if(fputs(line, stdout) == EOF) {  
    printf("fputs error to pipe\n");  
    exit(2);  
   }  
  }  
   
  // close the stream  
  if(pclose(fpin) == -1) {  
   printf("pclose error!\n");  
   exit(3);  
  }  
   
  putchar('\n');  
  exit(0);  
 }  

shell:
After running the program, user provide input at terminal. And filter read its input, convert the string into lowercase and then transfer to main.out.
If user type Ctrl D, main.out will read the NULL from filter.out and program exits.
 ubuntu@ip-172-31-23-227:~$ ./main.out  
 prompt>HELLO WORLD!  
 hello world!  
 prompt>