This document describes the protocols and structures used by Rserve (version 0.1-9). This information is helpful for implementing Rserve clients.
Rserve communication is performed over any reliable connection-oriented
protocol (usually TCP/IP; Rserve 0.1-9 supports TCP/IP and local unix sockets).
After the connection is established, the server
sends 32 bytes representing the ID string defining the capabilities of the server.
Each attribute of the ID string is 4 bytes long and is meant to be user-readable
(i.e. it uses no special characters), and it's a good idea to make
"\r\n\r\n" the last attribute.
The ID string must be of the form:
[0] "Rsrv" - R-server ID signature
[4] "0100" - version of the R server protocol
[8] "QAP1" - protocol used for communication (here Quad Attributes Packets v1)
[12] any additional attributes follow. \r\n and '-' are ignored.
optional attributes
(in any order; it is legitimate to put dummy attributes, like "----" or
" " between attributes):
"R151" - version of R (here 1.5.1)
"ARpt" - authorization required (here "pt"=plain text, "uc"=unix crypt)
connection will be closed
if the first packet is not CMD_login.
if more AR.. methods are specified, then the client is free to
use the one it supports (usually the most secure)
"K***" - key if encoded authentication is challenged (*** is the key)
for unix crypt the first two letters of the key are the salt
required by the server
"TLS\n" - switching to TLS is supported
The protocol specified in the third attribute (here QAP1) is used immediately after the ID string has been transmitted.
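The ID string layout described above can be read with a few lines of code. The following is a minimal sketch, not part of Rserve: the function name parse_id_string and the keys of the returned dictionary are hypothetical, and any attribute that is not recognized is simply treated as filler.

```python
def parse_id_string(raw: bytes) -> dict:
    """Split a 32-byte Rserve ID string into its 4-byte attributes."""
    if len(raw) != 32:
        raise ValueError("ID string must be exactly 32 bytes")
    # Eight attributes of four bytes each.
    attrs = [raw[i:i + 4].decode("ascii") for i in range(0, 32, 4)]
    if attrs[0] != "Rsrv":
        raise ValueError("missing Rsrv signature")
    info = {
        "version": attrs[1],    # e.g. "0100"
        "protocol": attrs[2],   # e.g. "QAP1"
        "r_version": None,
        "auth": [],             # e.g. ["pt"] or ["uc"]
        "key": None,
        "tls": False,
    }
    # Optional attributes may come in any order.
    for attr in attrs[3:]:
        if attr == "TLS\n":
            info["tls"] = True
        elif attr.startswith("AR"):
            info["auth"].append(attr[2:])
        elif attr.startswith("K"):
            info["key"] = attr[1:]
        elif attr.startswith("R"):
            info["r_version"] = attr[1:]
        # Anything else ("----", "\r\n\r\n", blanks) is treated as filler.
    return info


example = b"Rsrv0100QAP1R151ARpt--------\r\n\r\n"
print(parse_id_string(example))
```

A client would typically reject the connection if the protocol attribute is not one it implements, and send CMD_login first whenever the "auth" list is not empty.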
QAP1 message oriented protocol
QAP1 (quad attributes protocol v1) is a message oriented protocol, i.e. the initiating side (here the client) sends a message and awaits a response. The message contains both the action to be taken and any necessary data. The response contains a response code and any associated data. Every message consists of a header and data part (which can be empty). The header is structured as follows:
[0] (int) command
[4] (int) length of the message (bits 0-31)
[8] (int) offset of the data part
[12] (int) length of the message (bits 32-63)
command specifies the request or response type.
length specifies the number of bytes belonging to this message (excluding the header).
offset specifies the offset of the data part, where 0 means directly after the header (which is normally the case)
length2 high bits of the length (must be 0 if the packet size is smaller than 4GB)
The header must always be transmitted en bloc. The data part can be split into packets of an arbitrary size. Each message consists of 16 bytes (the header) plus data. Therefore a message consists of length+16 bytes (where length is the size of the data payload).
The data part contains any additional parameters that are sent along with the command. Each parameter consists of a 4-byte header:
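The 16-byte header can be packed and unpacked as four little-endian 32-bit integers. This is a sketch only: the helper names pack_header and unpack_header are hypothetical, and the fields are treated as unsigned here so that lengths with the top bit set do not overflow.

```python
import struct

# command, length (bits 0-31), offset, length (bits 32-63)
HEADER = struct.Struct("<IIII")


def pack_header(command: int, length: int, offset: int = 0) -> bytes:
    """Build the 16-byte QAP1 header for a payload of `length` bytes."""
    return HEADER.pack(command, length & 0xFFFFFFFF, offset, length >> 32)


def unpack_header(raw: bytes):
    """Return (command, length, offset) from a 16-byte QAP1 header."""
    command, len_lo, offset, len_hi = HEADER.unpack(raw)
    return command, (len_hi << 32) | len_lo, offset


header = pack_header(3, 8)
print(header.hex())
print(unpack_header(header))
```

After reading the header, a client reads exactly `length` further bytes and skips `offset` bytes at the start of them to reach the data part.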
[0] (byte) type
[1] (24-bit int) length
Types used by the current Rserve implementation (for list of all supported types see Rsrv.h):
- DT_INT (4 bytes) integer
- DT_STRING (n bytes) null terminated string
- DT_BYTESTREAM (n bytes) any binary data
- DT_SEXP R's encoded SEXP, see below
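A parameter is therefore one type byte, a 24-bit little-endian length and the payload. The sketch below illustrates this; the helper names are hypothetical, and the numeric value used for DT_INT is an assumption that should be checked against Rsrv.h.

```python
import struct

DT_INT = 1  # assumed value, verify against Rsrv.h


def encode_param(dt_type: int, payload: bytes) -> bytes:
    """Prefix a payload with the 4-byte parameter header."""
    if len(payload) > 0xFFFFFF:
        raise ValueError("payload does not fit into a 24-bit length")
    return bytes([dt_type]) + len(payload).to_bytes(3, "little") + payload


def decode_param(buf: bytes, pos: int = 0):
    """Return (type, payload, position of the next parameter)."""
    dt_type = buf[pos]
    length = int.from_bytes(buf[pos + 1:pos + 4], "little")
    start = pos + 4
    return dt_type, buf[start:start + length], start + length


param = encode_param(DT_INT, struct.pack("<i", 1024))
print(param.hex())
print(decode_param(param))
```

Because decode_param returns the position of the next parameter, it can be called in a loop to walk through a data part that carries several parameters.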
All int and double entries throughout the transfer are encoded in Intel-endianness (little-endian) format:
int=0x12345678 -> char[4]=(0x78,0x56,0x34,0x12)
Functions/macros for converting from native to protocol format are available in Rsrv.h.
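For clients not written in C, the same conversion is usually available from the standard library. As an illustration (the function names are hypothetical), Python's struct module with the "<" prefix produces the byte order shown above; doubles are assumed to be 8-byte IEEE 754 values.

```python
import struct


def int_to_wire(value: int) -> bytes:
    """Encode a 32-bit signed int in protocol (little-endian) byte order."""
    return struct.pack("<i", value)


def int_from_wire(raw: bytes) -> int:
    """Decode a 32-bit signed int from protocol byte order."""
    return struct.unpack("<i", raw)[0]


def double_to_wire(value: float) -> bytes:
    """Encode a double as 8 little-endian bytes."""
    return struct.pack("<d", value)


print(int_to_wire(0x12345678).hex())  # 78563412
print(int_from_wire(bytes([0x78, 0x56, 0x34, 0x12])) == 0x12345678)
```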
Commands supported by Rserve
Supported commands:
command            parameters            | response data
CMD_login          DT_STRING             | -
CMD_voidEval       DT_STRING             | -
CMD_eval           DT_STRING or DT_SEXP  | DT_SEXP
CMD_shutdown       [DT_STRING]           | -
CMD_openFile       DT_STRING             | -
CMD_createFile     DT_STRING             | -
CMD_closeFile      -                     | -
CMD_readFile       [DT_INT]              | DT_BYTESTREAM
CMD_writeFile      DT_BYTESTREAM         | -
CMD_removeFile     DT_STRING             | -
CMD_setSEXP        DT_STRING, DT_SEXP    | -
CMD_assignSEXP     DT_STRING, DT_SEXP    | -
CMD_setBufferSize  DT_INT                | -
CMD_setEncoding    DT_STRING             | - (since 0.5-3)
since 0.6:
CMD_ctrlEval       DT_STRING             | -
CMD_ctrlSource     DT_STRING             | -
CMD_ctrlShutdown   -                     | -
since 1.7:
CMD_switch         DT_STRING             | -
CMD_keyReq         DT_STRING             | DT_BYTESTREAM
CMD_secLogin       DT_BYTESTREAM         | -
CMD_OCcall         DT_SEXP               | DT_SEXP
(Parameters in brackets [] are optional)
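To show how a command and its parameter fit together, here is a sketch that assembles a complete CMD_eval request carrying one DT_STRING. The numeric values of CMD_eval and DT_STRING are assumptions and must be verified against Rsrv.h; the function name is hypothetical. The string is NUL-terminated and padded with zeros to a multiple of 4 bytes, as required since protocol 0101.

```python
import struct

CMD_EVAL = 0x003  # assumed value, verify against Rsrv.h
DT_STRING = 4     # assumed value, verify against Rsrv.h


def build_eval_message(code: str) -> bytes:
    """Return header + DT_STRING parameter for a CMD_eval request."""
    text = code.encode("utf-8") + b"\x00"
    text += b"\x00" * (-len(text) % 4)  # pad to a multiple of 4 bytes
    param = bytes([DT_STRING]) + len(text).to_bytes(3, "little") + text
    # command, length (low 32 bits), offset, length (high 32 bits)
    header = struct.pack("<IIII", CMD_EVAL, len(param), 0, 0)
    return header + param


message = build_eval_message("1+1")
print(len(message), message.hex())
```

The resulting bytes can be written to the socket in one call; the server answers with a response header followed by a DT_SEXP parameter.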
Responses:
The CMD_RESP mask is set for all responses. Each response consists of the response command (RESP_OK or RESP_ERR - least significant 24 bits) and the status code (most significant 8 bits). For a list of all currently supported status codes see ERR_... in Rsrv.h.
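Splitting the command field of a response into these two parts is a matter of masking and shifting. In the sketch below the values of CMD_RESP, RESP_OK and RESP_ERR are assumptions to be checked against Rsrv.h, and the function names are hypothetical.

```python
CMD_RESP = 0x10000            # assumed value, verify against Rsrv.h
RESP_OK = CMD_RESP | 0x0001   # assumed value
RESP_ERR = CMD_RESP | 0x0002  # assumed value


def split_response(command: int):
    """Return (response command, status code) from a response header."""
    response = command & 0xFFFFFF   # least significant 24 bits
    status = (command >> 24) & 0xFF  # most significant 8 bits
    return response, status


def is_ok(command: int) -> bool:
    """True if the response command is RESP_OK."""
    return split_response(command)[0] == RESP_OK


response, status = split_response(0x41010002)
print(hex(response), hex(status), is_ok(0x41010002))
```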
Note: Commands with four highest bits set (0xf0) are reserved as internal/special and their data payload does not follow the regular pattern. They are used in R-to-R communication with R serialization instead of parameters. They should not be used by clients.
Encoding of SEXP R expression
R SEXP values (DT_SEXP) are recursively encoded in a similar way as the parameter attributes. Each SEXP consists of a 4-byte header and the actual contents. The header is of the form:
[0] (byte) eXpression Type
[1] (24-bit int) length
The expression type consists of the actual type (least significant 6 bits) and attributes. The following expression types are supported:
XT_NULL data: -
+ XT_S4 data: -
XT_VECTOR data: (n*?) SEXP
XT_CLOS data: SEXP formals, SEXP body[, SEXP env]
+ XT_SYMNAME data: (?) char,char,..,NUL
+ XT_LIST_NOTAG data: same as XT_VECTOR
+ XT_LIST_TAG data: (2*n*?) SEXP value, SEXP tag, ...
+ XT_LANG_NOTAG data: same as XT_LIST_NOTAG (LANGSXP)
+ XT_LANG_TAG data: same as XT_LIST_TAG (LANGSXP)
+ XT_VECTOR_EXP data: same as XT_VECTOR (EXPSXP)
XT_ARRAY_INT data: (n*4) int,int,..
XT_ARRAY_DOUBLE data: (n*8) double,double,..
XT_ARRAY_STR data: (?) string,NUL,string,NUL,..
[must be padded with non-NUL]
XT_ARRAY_BOOL data: same as XT_RAW
+ XT_RAW data: (1) int n (n) byte,byte,..
+ XT_ARRAY_CPLX data: (n*16) double(re),double(im),..
XT_UNKNOWN data: (4) int - SEXP type as defined in R
The following types have been removed in protocol 0103 (Rserve 0.5)
- XT_INT data: (4) int
- XT_DOUBLE data: (8) double
- XT_STR data: same as XT_SYMNAME
- XT_LANG data: same as XT_LIST
- XT_SYM data: (n) char symbol name
- XT_BOOL data: (1) byte boolean
(1=TRUE, 0=FALSE, 2=NA)
- XT_LIST data: SEXP head, SEXP vals, [SEXP tag]
- = removed in protocol 0103 (Rserve 0.5)
+ = new since protocol 0103 (Rserve 0.5)
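The array types are the ones most clients need to decode. The following sketch decodes the payloads of XT_ARRAY_INT, XT_ARRAY_DOUBLE and XT_ARRAY_STR as laid out in the list above. The function names are hypothetical, strings are assumed to be UTF-8, and the non-NUL padding of XT_ARRAY_STR is assumed to follow the final NUL terminator. R's NA values are not handled here.

```python
import struct


def decode_int_array(payload: bytes) -> list:
    """(n*4) int,int,.. as little-endian 32-bit signed integers."""
    n = len(payload) // 4
    return list(struct.unpack(f"<{n}i", payload[:n * 4]))


def decode_double_array(payload: bytes) -> list:
    """(n*8) double,double,.. as little-endian 8-byte doubles."""
    n = len(payload) // 8
    return list(struct.unpack(f"<{n}d", payload[:n * 8]))


def decode_str_array(payload: bytes) -> list:
    """string,NUL,string,NUL,.. followed by optional non-NUL padding."""
    end = payload.rfind(b"\x00")
    if end < 0:
        return []
    # Everything after the last NUL is padding and is dropped.
    return [s.decode("utf-8") for s in payload[:end].split(b"\x00")]


print(decode_int_array(struct.pack("<3i", 1, -2, 3)))
print(decode_str_array(b"ab\x00c\x00\x01\x01\x01"))
```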
Attributes:
XT_HAS_ATTR - if this flag is set then the SEXP has an attribute list which is stored before the actual expression. In this case the layout looks as follows:
[0] (4) header SEXP: len=4+m+n, XT_HAS_ATTR is set
[4] (4) header attribute SEXP: len=n
[8] (n) data attribute SEXP
[8+n] (m) data SEXP
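The layout above can be taken apart as follows. This is a sketch under stated assumptions: the flag value used for XT_HAS_ATTR should be checked against Rsrv.h, the function names are hypothetical, and large (8-byte) headers are not handled.

```python
XT_HAS_ATTR = 0x80  # assumed flag value, verify against Rsrv.h


def read_header(buf: bytes, pos: int):
    """Return (type byte, length, position of the contents)."""
    xt = buf[pos]
    length = int.from_bytes(buf[pos + 1:pos + 4], "little")
    return xt, length, pos + 4


def split_sexp(buf: bytes, pos: int = 0):
    """Return (type, attribute or None, data bytes) of one SEXP.

    The attribute, if present, is returned as (type, raw contents).
    """
    xt, length, body = read_header(buf, pos)
    end = body + length
    attr = None
    if xt & XT_HAS_ATTR:
        # The attribute SEXP comes first: 4-byte header plus n bytes.
        attr_xt, n, attr_body = read_header(buf, body)
        attr = (attr_xt & 0x3F, buf[attr_body:attr_body + n])
        body = attr_body + n
    # The remaining m bytes are the data of the SEXP itself.
    return xt & 0x3F, attr, buf[body:end]
```

The attribute contents are themselves an encoded SEXP, so a full decoder would call itself recursively on them instead of returning raw bytes.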
Additions in version 0.2
Since version 0.2-0 the ID string reports version 0101 because of a change that makes it partially incompatible with previous versions. The main change is that Rserve reporting version 0100 incorrectly omitted the DT_SEXP header from the response to CMD_eval commands. This means that clients should check the version reported by Rserve and apply a workaround (for 0100 you can assume that CMD_eval always returns the contents of a SEXP even if no DT_SEXP header is sent). Rserve reporting 0101 responds consistently, i.e. the proper DT_SEXP header is sent.
The second change is the requirement to pad strings with zeros so the length of the parameter/content is divisible by 4. Depending on the platform used the server may respond with ERR_inv_par if the parameters are not correctly aligned. Rserve reporting 0101 will itself pad strings in such manner when sending responses to the client.
Update: 2003-09-18: The previous documentation incorrectly stated that the second entry of the 4 byte headers (response and attribute) was a 12-bit int, whereas it is in fact a 24-bit int. This has now been corrected.
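The padding rule can be sketched as follows. The function names are hypothetical and UTF-8 is only an assumed default; the terminating NUL is counted before padding, so a string whose length is already a multiple of 4 grows by a full 4 bytes.

```python
def encode_string_payload(text: str, encoding: str = "utf-8") -> bytes:
    """NUL-terminate a string and pad it with zeros to a multiple of 4."""
    raw = text.encode(encoding) + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)


def decode_string_payload(raw: bytes, encoding: str = "utf-8") -> str:
    """Drop the terminator and any padding that follows it."""
    end = raw.find(b"\x00")
    return (raw if end < 0 else raw[:end]).decode(encoding)


for sample in ("abc", "abcd", ""):
    padded = encode_string_payload(sample)
    print(repr(sample), len(padded), padded)
```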
Additions in version 0.3
Rserve version 0.3 reports ID string version 0102 because support for large data was added. Previous versions were limited by the 24-bit length of parameters and SEXPs. The 0.3 version enhances the protocol by adding the special flag DT_LARGE to parameter types and XT_LARGE to eXpression types. If this flag is set then the header is 8 bytes long (instead of the previous 4 bytes). The additional 4 bytes are used for the parameter/expression length, leading to a total of 56-bit maximum length of an expression or parameter (that is 65536TB which should be sufficient). Any data smaller than 0x800000 (8MB) must still be coded in the original 4-byte header format. Current Rserve sends only data larger than 0xfffff0 (16MB-16) in the large data format. Clients are encouraged to use the same threshold, but it's not required by the protocol.
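A sketch of writing and reading both header formats follows. Two things are assumptions here and should be verified against Rsrv.h: the flag value 0x40 for DT_LARGE/XT_LARGE, and that the additional 4 bytes carry the high 32 bits of the length (so that the 7 length bytes form one little-endian number). The function names are hypothetical.

```python
LARGE_FLAG = 0x40           # assumed value of DT_LARGE / XT_LARGE
LARGE_THRESHOLD = 0xFFFFF0  # threshold used by current Rserve


def encode_header(type_code: int, length: int) -> bytes:
    """Return a 4-byte header, or an 8-byte one for large data."""
    if length > LARGE_THRESHOLD:
        if length >= 1 << 56:
            raise ValueError("length exceeds 56 bits")
        return bytes([type_code | LARGE_FLAG]) + length.to_bytes(7, "little")
    return bytes([type_code]) + length.to_bytes(3, "little")


def decode_header(buf: bytes, pos: int = 0):
    """Return (type without the large flag, length, position of contents)."""
    type_code = buf[pos]
    if type_code & LARGE_FLAG:
        length = int.from_bytes(buf[pos + 1:pos + 8], "little")
        return type_code & ~LARGE_FLAG & 0xFF, length, pos + 8
    length = int.from_bytes(buf[pos + 1:pos + 4], "little")
    return type_code, length, pos + 4


print(encode_header(5, 100).hex())
print(encode_header(5, 0x1000000).hex())
```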
Additions in version 0.4
Rserve version 0.4 stopped using scalar types (e.g. XT_INT, XT_DOUBLE etc.) in responses and uses the corresponding array types instead. This is consistent with the interpretation of types in R. Future versions of Rserve will likely remove those scalar types altogether. This should simplify the implementation of clients as they don't need to distinguish scalars (as R doesn't distinguish them, either). Rserve 0.4 adds support for sessions, but since this is a server-side add-on (new commands), it doesn't change the protocol format.
Additions in version 0.5
Rserve version 0.5 reports ID string version 0103 because several new types have been introduced and others have been removed. The above list of types was enhanced to indicate which types are new in 0.5. Note that new versions of clients are necessary to support the enhancements. The main changes are: added support for complex numbers, better support for attributes, and more efficient storage of dotted-pair lists. Clients will now also be able to assign more complex structures including attributes.
Since 0.5-3 a new command CMD_setEncoding has been added. Nonetheless for most installations it should be safe to set the encoding on the server side (e.g. for Java clients use encoding utf8 in the Rserve configuration file).
Note: during the development of the 0.5 version some types have been temporarily changed (e.g. XT_ARRAY_STR had a leading length integer at some point). Please make sure you have the latest version (0.5-0 release or later).
Additions in version 0.6
Rserve version 0.6 adds support for control commands. These commands allow a client to control the behavior of the server (when enabled). This functionality can be enabled using control enable in the configuration file. See NEWS for details on the configuration and access file changes. The new commands are CMD_ctrlEval which evaluates a given string in the global environment of the server, CMD_ctrlSource which sources a (server-side) file and CMD_ctrlShutdown which shuts down the server. The side-effect of the former two commands persists for all subsequent client connections, i.e., it is possible to update data in a running Rserve. Technically, the control commands are processed asynchronously (i.e., an OK result for those commands merely means that the control command was successfully queued up, not that it was executed) after serving clients and for the duration of the control commands no new connections are served (they will be queued up but not served). Therefore control commands should preferably be short-lived.
(CMD_ctrlShutdown supersedes CMD_shutdown because it allows restricting shutdown access to admin users and the latter may be re-mapped to the former if control commands are enabled, disallowing regular users to perform a shutdown.)
Note: some clients name their methods server... instead of ctrl because what it really means is that the command is executed inside the master server instance.
Additions in version 1.7
Rserve version 1.7 was a major re-write of many parts (hence the additional increase of the major version) and it has added other protocols (HTTP, text WebSockets and QAP WebSockets tunnel). All of them offer TLS/SSL variants. New commands include CMD_keyReq, CMD_secLogin for secure authentication (using RSA) over insecure channels, CMD_switch for switching to TLS transport (if this is allowed, "TLS\n" attribute will be sent in the ID string).
Starting with 1.7 Rserve can support (when enabled) out-of-band (OOB) commands, which is nested communication during an eval. OOB messages are initiated by evaluated R code (typically via self.oobSend or self.oobMessage). Each OOB message has the CMD_OOB bitmask set to distinguish it from a regular command/response. Two classes of messages exist: OOB_SEND is uni-directional (from server to client), OOB_MSG is bi-directional (request from server to client, expecting a response from client to server). If a client doesn't support OOB, it MUST either abort the connection or send ERR_unsupportedCmd when encountering OOB_MSG to avoid a deadlock (eval will not finish until OOB_MSG receives a response).
Another major change is the new, optional object capability mode in which all commands are disabled except for CMD_OCcall. In this mode the server does not send an ID string, but instead sends a regular QAP1 message with CMD_OCinit. This message is guaranteed to have at least 16 bytes of payload so it will satisfy the read for an ID string. The command has been chosen to correspond to "RsOC" (in little-endian) so as to identify this mode. The payload is a DT_SEXP which holds all initial capabilities that can be used in CMD_OCcall. Each CMD_OCcall carries a DT_SEXP encoding a call (i.e., LANGSXP) with an OCref object in place of the closure. Rserve will de-reference it before calling eval. The main purpose of this mode is to create a basis for a secure interface where arbitrary evaluation is not possible. Only code exposed by capabilities can be executed.
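Since the first 32 bytes sent by the server are either an ID string or the start of a CMD_OCinit message, a client can tell the two modes apart by looking at the first four bytes. The sketch below derives the command value from the "RsOC" text as described above; the function name and the returned labels are hypothetical.

```python
# "RsOC" read as a little-endian 32-bit integer.
CMD_OCINIT = int.from_bytes(b"RsOC", "little")


def detect_mode(first_bytes: bytes) -> str:
    """Classify the first bytes received after connecting."""
    if len(first_bytes) < 4:
        raise ValueError("need at least 4 bytes")
    tag = first_bytes[:4]
    if tag == b"Rsrv":
        return "id-string"          # regular mode, 32-byte ID string
    if tag == b"RsOC":
        return "object-capability"  # QAP1 message with CMD_OCinit
    return "unknown"


print(hex(CMD_OCINIT))
print(detect_mode(b"Rsrv0103QAP1" + b"-" * 20))
```

In object capability mode the 32 bytes already read contain the 16-byte QAP1 header and the first 16 bytes of the payload, so the client has to continue reading the rest of the message from there.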
Additions in version 1.8
Rserve version 1.8 introduces the ability to send messages between Rserve child instances connected to the same server. This allows re-attaching to sessions without the need for side-channels. The support for sessions (CMD_detachSession, CMD_detachedVoidEval, CMD_attachSession) is deprecated by this new feature.
Details will follow as we finalize 1.8