Rserve

Development of Rserve

Any interested developers are welcome to improve Rserve since it is released under GPL. You can download the sources from the download section. As you can easily see from the directory structure, I hold the sources in a CVS repository; anyone who wants to contribute to the development should therefore send me an e-mail (Simon.Urbanek@r-project.org) in order to obtain access to the CVS.

Technical Documentation for Rserve

This document describes the protocols and structures used by Rserve (version 0.1-9). This information is helpful for implementing Rserve clients.

Rserve communication is performed over any reliable connection-oriented protocol (usually TCP/IP; Rserve 0.1-9 supports TCP/IP and local unix sockets). After the connection is established, the server sends 32 bytes representing the ID string that defines the capabilities of the server. Each attribute of the ID string is 4 bytes long and is meant to be user-readable (i.e. it uses no special characters), and it's a good idea to make "\r\n\r\n" the last attribute.

The ID string must be of the form:

   [0] "Rsrv" - R-server ID signature
   [4] "0100" - version of the R server protocol
   [8] "QAP1" - protocol used for communication (here Quad Attributes Packets v1)
   [12] any additional attributes follow. \r\n and '-' are ignored.

Optional attributes (in any order; it is legitimate to put dummy attributes, like "----" or " ", between attributes):

   "R151"  - version of R (here 1.5.1)
   "ARpt"  - authorization required (here "pt"=plain text, "uc"=unix crypt);
             the connection will be closed if the first packet is not
             CMD_login. If more than one AR.. method is specified, the
             client is free to use any one it supports (usually the
             most secure).
   "K***"  - key if encoded authentication is challenged (*** is the key);
             for unix crypt the first two letters of the key are the salt
             required by the server
   "TLS\n" - switching to TLS is supported

The protocol specified in the third attribute (here QAP1) is used immediately after the ID string has been transmitted.
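As an illustration of the ID string layout described above, here is a minimal Python sketch of how a client might split the 32 bytes into 4-byte attributes. The function name parse_id_string and the shape of the returned dictionary are hypothetical and not part of Rserve; unrecognized attributes are simply skipped.

```python
def parse_id_string(raw: bytes) -> dict:
    """Split a 32-byte Rserve ID string into its 4-byte attributes."""
    if len(raw) != 32:
        raise ValueError("ID string must be exactly 32 bytes")
    attrs = [raw[i:i + 4].decode("ascii") for i in range(0, 32, 4)]
    if attrs[0] != "Rsrv":
        raise ValueError("missing Rsrv signature")
    info = {
        "version": attrs[1],    # e.g. "0100"
        "protocol": attrs[2],   # e.g. "QAP1"
        "r_version": None,
        "auth": [],             # e.g. ["pt"], ["uc"]
        "key": None,
        "tls": False,
    }
    for a in attrs[3:]:
        if a == "TLS\n":
            info["tls"] = True
        elif a.startswith("AR"):
            info["auth"].append(a[2:])
        elif a.startswith("K"):
            info["key"] = a[1:]
        elif a.startswith("R") and a[1:].isdigit():
            info["r_version"] = a[1:]
        # anything else ("----", "\r\n\r\n", blanks) is a dummy attribute
    return info


# Hand-built sample ID string (not captured from a real server).
sample = b"Rsrv0103QAP1R151ARucKab1----\r\n\r\n"
info = parse_id_string(sample)
```

A client would typically read exactly 32 bytes from the socket first, run a check like this one, and only then start exchanging QAP1 messages.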

QAP1 message oriented protocol

QAP1 (quad attributes protocol v1) is a message oriented protocol, i.e. the initiating side (here the client) sends a message and awaits a response. The message contains both the action to be taken and any necessary data. The response contains a response code and any associated data. Every message consists of a header and data part (which can be empty). The header is structured as follows:

  [0]  (int) command
  [4]  (int) length of the message (bits 0-31)
  [8]  (int) offset of the data part
  [12] (int) length of the message (bits 32-63)

command specifies the request or response type.
length specifies the number of bytes belonging to this message (excluding the header).
offset specifies the offset of the data part, where 0 means directly after the header (which is normally the case)
length2 high bits of the length (must be 0 if the packet size is smaller than 4GB)

The header must always be transmitted en bloc. The data part can be split into packets of arbitrary size. Each message consists of 16 bytes (the header) plus data. Therefore a message consists of length+16 bytes (where length is the size of the data payload).
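The 16-byte header can be sketched in Python as follows. This assumes the four fields are little-endian 32-bit integers (as this document states for all int entries) and treats them as unsigned; the helper names pack_header and unpack_header are hypothetical, and the command value used in the example is only a placeholder.

```python
import struct

# command, length (bits 0-31), offset, length (bits 32-63)
HEADER = struct.Struct("<IIII")


def pack_header(command: int, length: int, offset: int = 0) -> bytes:
    """Build the 16-byte QAP1 header for a payload of `length` bytes."""
    return HEADER.pack(command, length & 0xFFFFFFFF, offset, length >> 32)


def unpack_header(raw: bytes):
    """Return (command, length, offset) from a 16-byte QAP1 header."""
    command, len_lo, offset, len_hi = HEADER.unpack(raw)
    return command, (len_hi << 32) | len_lo, offset


hdr = pack_header(3, 12)           # placeholder command, 12-byte payload
total_size = len(hdr) + 12         # a message is length+16 bytes
```

Because the header has a fixed size, a client can read exactly 16 bytes, decode the length, and then read the payload in as many chunks as the transport delivers.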

The data part contains any additional parameters that are sent along with the command. Each parameter consists of a 4-byte header:

  [0]  (byte) type
  [1]  (24-bit int) length

Types used by the current Rserve implementation (for a list of all supported types see Rsrv.h):

DT_INT (4 bytes) integer
DT_STRING (n bytes) null terminated string
DT_BYTESTREAM (n bytes) any binary data
DT_SEXP R's encoded SEXP, see below

All int and double entries throughout the transfer are encoded in Intel-endianness (little-endian) format:
int=0x12345678 -> char[4]=(0x78,0x56,0x34,0x12). Functions/macros for converting from native to protocol format are available in Rsrv.h.
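The byte order and the 4-byte parameter header can be illustrated with a short Python sketch. It assumes that the 24-bit length follows the same little-endian order as the other integers; the helper names are hypothetical, and the numeric type code is passed in by the caller because the actual DT_... values are defined in Rsrv.h and not listed here.

```python
import struct


def encode_int(value: int) -> bytes:
    """Encode a 32-bit int in the little-endian protocol format."""
    return struct.pack("<i", value)


def param_header(type_code: int, length: int) -> bytes:
    """Build the 4-byte parameter header: 1 type byte + 24-bit length."""
    if not 0 <= length < (1 << 24):
        raise ValueError("length does not fit into 24 bits")
    return bytes([type_code & 0xFF]) + length.to_bytes(3, "little")


def parse_param_header(raw: bytes):
    """Return (type_code, length) from a 4-byte parameter header."""
    return raw[0], int.from_bytes(raw[1:4], "little")


encoded = encode_int(0x12345678)
header = param_header(4, 8)   # 4 is a placeholder type code
```

Here encoded holds the bytes 0x78, 0x56, 0x34, 0x12, matching the example in the text.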

Commands supported by Rserve

Supported commands:

    command           parameters    | response data

    CMD_login         DT_STRING     | -
    CMD_voidEval      DT_STRING     | -
    CMD_eval          DT_STRING or  | DT_SEXP
                      DT_SEXP
    CMD_shutdown      [DT_STRING]   | -
    CMD_openFile      DT_STRING     | -
    CMD_createFile    DT_STRING     | -
    CMD_closeFile     -             | -
    CMD_readFile      [DT_INT]      | DT_BYTESTREAM
    CMD_writeFile     DT_BYTESTREAM | -
    CMD_removeFile    DT_STRING     | -
    CMD_setSEXP       DT_STRING,    | -
                      DT_SEXP
    CMD_assignSEXP    DT_STRING,    | -
                      DT_SEXP
    CMD_setBufferSize DT_INT        | -
    CMD_setEncoding   DT_STRING     | - (since 0.5-3)
since 0.6:
    CMD_ctrlEval      DT_STRING     | -
    CMD_ctrlSource    DT_STRING     | -
    CMD_ctrlShutdown  -             | -
since 1.7:
    CMD_switch        DT_STRING     | -
    CMD_keyReq        DT_STRING     | DT_BYTESTREAM
    CMD_secLogin      DT_BYTESTREAM | -
    CMD_OCcall        DT_SEXP       | DT_SEXP

(Parameters in brackets [] are optional)

Responses:
The CMD_RESP mask is set for all responses. Each response consists of the response command (RESP_OK or RESP_ERR, in the least significant 24 bits) and the status code (the most significant 8 bits). For a list of all currently supported status codes see ERR_... in Rsrv.h.
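A client could split the command field of a response as sketched below. The numeric values of CMD_RESP, RESP_OK and RESP_ERR are assumptions recalled from Rsrv.h and should be verified against the header shipped with your Rserve version; the function name split_response is hypothetical.

```python
# Assumed values; verify against Rsrv.h before relying on them.
CMD_RESP = 0x10000
RESP_OK = CMD_RESP | 0x0001
RESP_ERR = CMD_RESP | 0x0002


def split_response(command: int):
    """Return (response_command, status_code) from a response header field."""
    response = command & 0xFFFFFF       # least significant 24 bits
    status = (command >> 24) & 0xFF     # most significant 8 bits
    return response, status


# Arbitrary example: an error response carrying status code 0x41.
resp, status = split_response(0x41000000 | RESP_ERR)
is_response = bool(resp & CMD_RESP)
```

The status code is only meaningful for error responses; its interpretation comes from the ERR_... constants mentioned above.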

Note: Commands with four highest bits set (0xf0) are reserved as internal/special and their data payload does not follow the regular pattern. They are used in R-to-R communication with R serialization instead of parameters. They should not be used by clients.

Encoding of SEXP R expression

R SEXP values (DT_SEXP) are recursively encoded in a similar way to the parameter attributes. Each SEXP consists of a 4-byte header and the actual contents. The header is of the form:

  [0]  (byte) eXpression Type
  [1]  (24-bit int) length

The expression type consists of the actual type (least significant 6 bits) and attributes. The following expression types are supported:

  XT_NULL          data: - 
+ XT_S4            data: -
  XT_VECTOR        data: (n*?) SEXP
  XT_CLOS          data: SEXP formals, SEXP body[, SEXP env]
+ XT_SYMNAME       data: (?) char,char,..,NUL
+ XT_LIST_NOTAG    data: same as XT_VECTOR
+ XT_LIST_TAG      data: (2*n*?) SEXP value, SEXP tag, ...
+ XT_LANG_NOTAG    data: same as XT_LIST_NOTAG (LANGSXP)
+ XT_LANG_TAG      data: same as XT_LIST_TAG (LANGSXP)
+ XT_VECTOR_EXP    data: same as XT_VECTOR (EXPSXP)

  XT_ARRAY_INT     data: (n*4) int,int,.. 
  XT_ARRAY_DOUBLE  data: (n*8) double,double,.. 
  XT_ARRAY_STR     data: (?) string,NUL,string,NUL,.. 
                         [must be padded with non-NUL]
  XT_ARRAY_BOOL    data: same as XT_RAW
+ XT_RAW           data: (1) int n (n) byte,byte,..
+ XT_ARRAY_CPLX    data: (n*16) double(re),double(im),..

  XT_UNKNOWN       data: (4) int - SEXP type as defined in R

The following types have been removed in protocol 0103 (Rserve 0.5)

- XT_INT           data: (4) int 
- XT_DOUBLE        data: (8) double 
- XT_STR           data: same as XT_SYMNAME
- XT_LANG          data: same as XT_LIST
- XT_SYM           data: (n) char symbol name 
- XT_BOOL          data: (1) byte boolean
			      (1=TRUE, 0=FALSE, 2=NA)
- XT_LIST          data: SEXP head, SEXP vals, [SEXP tag]

- = removed in protocol 0103 (Rserve 0.5)
+ = new since protocol 0103 (Rserve 0.5)

Attributes:
XT_HAS_ATTR - if this flag is set then the SEXP has an attribute list which is stored before the actual expression. In this case the layout looks as follows:

  [0]   (4) header SEXP: len=4+m+n, XT_HAS_ATTR is set
  [4]   (4) header attribute SEXP: len=n
  [8]   (n) data attribute SEXP
  [8+n] (m) data SEXP
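The attribute layout above can be walked with a few lines of Python. This is a sketch only: the flag value 0x80 for XT_HAS_ATTR is an assumption (check Rsrv.h), the function names are hypothetical, and the large-data header variant introduced in protocol 0102 is deliberately not handled.

```python
XT_HAS_ATTR = 0x80  # assumed flag value; verify against Rsrv.h


def read_sexp_header(buf: bytes, pos: int):
    """Return (type_byte, length, position_after_header)."""
    length = int.from_bytes(buf[pos + 1:pos + 4], "little")
    return buf[pos], length, pos + 4


def split_sexp(buf: bytes, pos: int = 0):
    """Return (type, attribute_data_or_None, data) for the SEXP at pos."""
    xt, length, body = read_sexp_header(buf, pos)
    end = body + length
    attr = None
    if xt & XT_HAS_ATTR:
        # The attribute SEXP (own header plus n bytes) precedes the data.
        _, attr_len, attr_body = read_sexp_header(buf, body)
        attr = buf[attr_body:attr_body + attr_len]
        body = attr_body + attr_len
    return xt & 0x3F, attr, buf[body:end]


# Hand-built example: n=4 attribute bytes, m=8 data bytes, so len=4+4+8=16.
# The type codes 32 and 21 are placeholders.
sample = bytes([32 | XT_HAS_ATTR, 16, 0, 0]) + bytes([21, 4, 0, 0]) + b"ATTR" + b"12345678"
xtype, attr, data = split_sexp(sample)
```

In a real client the attribute bytes and, for container types, the data bytes would themselves be decoded recursively with the same routine.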

Additions in version 0.2
Since version 0.2-0 the ID string reports version 0101 because of a change that makes it partially incompatible with previous versions. The main change is that Rserve reporting version 0100 incorrectly omitted the DT_SEXP header from the response to CMD_eval commands. This means that clients should check the version reported by Rserve and provide a workaround (for 0100 you can assume that CMD_eval always returns the contents of a SEXP even if no DT_SEXP header is sent). Rserve reporting 0101 responds consistently, i.e. the proper DT_SEXP header is sent.
The second change is the requirement to pad strings with zeros so that the length of the parameter/content is divisible by 4. Depending on the platform used, the server may respond with ERR_inv_par if the parameters are not correctly aligned. Rserve reporting 0101 will itself pad strings in this manner when sending responses to the client.
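The padding rule can be sketched in Python as follows. The function name pad_string and the default encoding are illustrative assumptions; the only requirement taken from the text is that the string is null terminated and zero-padded to a length divisible by 4.

```python
def pad_string(text: str, encoding: str = "utf-8") -> bytes:
    """Null-terminate a string and zero-pad it to a multiple of 4 bytes."""
    raw = text.encode(encoding) + b"\x00"   # terminating NUL
    padding = -len(raw) % 4                 # 0..3 extra zero bytes
    return raw + b"\x00" * padding


short = pad_string("1+1")      # 3 chars + NUL = 4 bytes, no extra padding
longer = pad_string("hello")   # 5 chars + NUL = 6 bytes, padded to 8
```

The length written into the parameter header should then be the padded length, not the length of the original string.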

Update: 2003-09-18: The previous documentation incorrectly stated that the second entry of the 4-byte headers (response and attribute) was a 12-bit int, whereas it is in fact a 24-bit int. This has now been corrected.

Additions in version 0.3
Rserve version 0.3 reports ID string version 0102 because support for large data was added. Previous versions were limited by the 24-bit length of parameters and SEXPs. Version 0.3 enhances the protocol by adding the special flag DT_LARGE to parameter types and XT_LARGE to eXpression types. If this flag is set then the header is 8 bytes long (instead of the previous 4 bytes). The additional 4 bytes are used for the parameter/expression length, leading to a total of 56 bits for the maximum length of an expression or parameter (that is 65536TB, which should be sufficient). Any data smaller than 0x800000 (8MB) must still be coded in the original 4-byte header format. Current Rserve sends only data larger than 0xfffff0 (16MB-16) in the large data format. Clients are encouraged to use the same threshold, but it's not required by the protocol.
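The choice between the 4-byte and the 8-byte header can be sketched as below. Two things are assumptions here and should be checked against Rsrv.h: the flag value 0x40 for DT_LARGE/XT_LARGE, and that the additional 4 bytes carry the high-order bits of the length, so that the 7 length bytes form one little-endian 56-bit integer. The function name encode_header is hypothetical.

```python
LARGE_FLAG = 0x40  # assumed value of DT_LARGE / XT_LARGE; verify in Rsrv.h


def encode_header(type_code: int, length: int, threshold: int = 0xFFFFF0) -> bytes:
    """Return a 4-byte header, or an 8-byte one for data above threshold."""
    if not 0x7FFFFF <= threshold <= 0xFFFFFF:
        # Data below 0x800000 must use the short form, and the short form
        # cannot express lengths beyond 24 bits.
        raise ValueError("threshold outside the range the protocol allows")
    if length > threshold:
        if length >= 1 << 56:
            raise ValueError("length does not fit into 56 bits")
        return bytes([type_code | LARGE_FLAG]) + length.to_bytes(7, "little")
    return bytes([type_code]) + length.to_bytes(3, "little")


small = encode_header(10, 100)           # 10 is a placeholder type code
large = encode_header(10, 0x1000000)     # 16MB needs the large format
max_tb = (1 << 56) // (1 << 40)          # 56-bit limit expressed in TB
```

The default threshold mirrors the 0xfffff0 value that the text says current Rserve uses.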

Additions in version 0.4
Rserve version 0.4 stopped using scalar types (e.g. XT_INT, XT_DOUBLE etc.) in responses and uses the corresponding array types instead. This is consistent with the interpretation of types in R. Future versions of Rserve will likely remove those scalar types altogether. This should simplify the implementation of clients as they don't need to distinguish scalars (as R doesn't distinguish them, either). Rserve 0.4 adds support for sessions, but since this is a server-side add-on (new commands), it doesn't change the protocol format.

Additions in version 0.5
Rserve version 0.5 reports ID string version 0103 because several new types have been introduced and others have been removed. The above list of types was enhanced to indicate which types are new in 0.5. Note that new versions of clients are necessary to support the enhancements. The main changes are: added support for complex numbers, better support for attributes, and more efficient storage of dotted-pair lists. Clients will now also be able to assign more complex structures including attributes.

Since 0.5-3 a new command CMD_setEncoding has been added. Nonetheless for most installations it should be safe to set the encoding on the server side (e.g. for Java clients use encoding utf8 in the Rserve configuration file).

Note: during the development of the 0.5 version some types have been temporarily changed (e.g. XT_ARRAY_STR had a leading length integer at some point). Please make sure you have the latest version (0.5-0 release or later).

Additions in version 0.6
Rserve version 0.6 adds support for control commands. These commands allow a client to control the behavior of the server (when enabled). This functionality can be enabled using control enable in the configuration file. See NEWS for details on the configuration and access file changes. The new commands are CMD_ctrlEval, which evaluates a given string in the global environment of the server, CMD_ctrlSource, which sources a (server-side) file, and CMD_ctrlShutdown, which shuts down the server. The side-effects of the former two commands persist for all subsequent client connections, i.e., it is possible to update data in a running Rserve. Technically, the control commands are processed asynchronously (i.e., an OK result for those commands merely means that the control command was successfully queued up, not that it was executed) after serving clients, and for the duration of the control commands no new connections are served (they will be queued up but not served). Therefore control commands should preferably be short-lived.
(CMD_ctrlShutdown supersedes CMD_shutdown because it allows restricting shutdown access to admin users, and the latter may be re-mapped to the former if control commands are enabled, preventing regular users from performing a shutdown.)
Note: some clients name their methods server... instead of ctrl... because what it really means is that the command is executed inside the master server instance.

Additions in version 1.7
Rserve version 1.7 was a major re-write of many parts (hence the additional increase of the major version) and it has added other protocols (HTTP, text WebSockets and QAP WebSockets tunnel). All of them offer TLS/SSL variants. New commands include CMD_keyReq, CMD_secLogin for secure authentication (using RSA) over insecure channels, CMD_switch for switching to TLS transport (if this is allowed, "TLS\n" attribute will be sent in the ID string).

Starting with 1.7 Rserve can support (when enabled) out-of-band (OOB) commands, which are nested communication during an eval. OOB messages are initiated by evaluated R code (typically via self.oobSend or self.oobMessage). Each OOB message has the CMD_OOB bitmask set to distinguish it from a regular command/response. Two classes of messages exist: OOB_SEND is uni-directional (from server to client), OOB_MSG is bi-directional (request from server to client, expecting a response from client to server). If a client doesn't support OOB, it MUST either abort the connection or send ERR_unsupportedCmd when encountering OOB_MSG to avoid a deadlock (eval will not finish until OOB_MSG receives a response).

Another major change is the new, optional object capability mode in which all commands are disabled except for CMD_OCcall. In this mode the server does not send an ID string, but instead sends a regular QAP1 message with CMD_OCinit. This message is guaranteed to have at least 16 bytes of payload so it will satisfy the read for an ID string. The command has been chosen to correspond to "RsOC" (in little-endian) so as to identify this mode. The payload is a DT_SEXP which holds all initial capabilities that can be used in CMD_OCcall. Each CMD_OCcall is a DT_SEXP encoding a call (i.e., LANGSXP) with an OCref object in place of the closure. Rserve will de-reference it before calling eval. The main purpose of this mode is to create a basis for a secure interface where arbitrary evaluation is not possible. Only code exposed by capabilities can be executed.
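Because CMD_OCinit spells "RsOC" when written in little-endian byte order, a client can tell the two greetings apart from the first four bytes it reads. The sketch below derives the numeric command from that statement; the function name classify_greeting and its return labels are hypothetical.

```python
# The command value implied by "RsOC" read as a little-endian 32-bit int.
CMD_OCINIT = int.from_bytes(b"RsOC", "little")


def classify_greeting(first32: bytes) -> str:
    """Classify the first 32 bytes received after connecting."""
    if len(first32) < 32:
        raise ValueError("need the full 32 bytes of the greeting")
    if first32[:4] == b"Rsrv":
        return "id-string"            # regular mode: 32-byte ID string
    if first32[:4] == b"RsOC":
        # Object capability mode: bytes 0-15 are a QAP1 header and
        # bytes 16-31 are the start of its DT_SEXP payload.
        return "object-capability"
    return "unknown"


regular = classify_greeting(b"Rsrv0103QAP1" + b"-" * 16 + b"\r\n\r\n")
ocap = classify_greeting(b"RsOC" + bytes(28))   # zero-filled placeholder
```

In object capability mode the client must not discard the 32 bytes: they contain the message header and the beginning of the payload, so the rest of the payload has to be read according to the length in that header.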

Additions in version 1.8
Rserve version 1.8 introduces the ability to send messages between Rserve child instances connected to the same server. This allows re-attaching to sessions without the need for side-channels. The support for sessions (CMD_detachSession, CMD_detachedVoidEval, CMD_attachSession) is deprecated by this new feature.
Details will follow as we finalize 1.8.