NAME

Rserve::Parser - Functions for parsing R data files

SYNOPSIS

use feature 'say';
use Rserve::ParserState;
use Rserve::Parser;

my $state = Rserve::ParserState->new(
    data => 'file.rds'
);
say $state->at;
say $state->next->at;

DESCRIPTION

You shouldn't create instances of this class; it exists mainly to handle deserialization of R data files.

FUNCTIONS

This library is inspired by monadic parser frameworks from the Haskell world, like Packrat or Parsec. What this means is that parsers are constructed by combining simpler parsers.

The library offers a selection of basic parsers and combinators. Each of these is a function (think of it as a factory) that returns another function (the actual parser). The parser receives the current parsing state (an Rserve::ParserState) as its argument and returns a two-element array reference (called "a pair" in the following text for brevity), with the parser's result in the first element and the new parser state in the second. If the parser fails, for example because the current state is at "a" where a number is expected, it returns undef to signal failure.

The descriptions of individual functions below use a shorthand because the above mechanism is implied. Thus, when any_char is described as "parses any character", it really means that calling any_char will return a function that when called with the current state will return "a pair of the character...", etc.
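The factory-and-pair mechanism described above can be sketched in a few lines. This is only an illustration under stated assumptions: make_state and any_char_sketch are hypothetical names, and the hash used for the state is a stand-in for Rserve::ParserState, not its real interface.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Rserve::ParserState: a hash holding the
# input characters and the current position.
sub make_state {
    my ($string, $pos) = @_;
    return { data => [ split //, $string ], pos => $pos // 0 };
}

# Factory: calling any_char_sketch() returns the actual parser.
sub any_char_sketch {
    return sub {
        my $state = shift;
        # Signal failure with undef at the end of the input.
        return undef if $state->{pos} >= @{ $state->{data} };
        my $char = $state->{data}[ $state->{pos} ];
        # Build a new state instead of modifying the old one.
        my $next = { %$state, pos => $state->{pos} + 1 };
        return [ $char, $next ];    # the "pair"
    };
}

my $parser = any_char_sketch();
my $pair   = $parser->( make_state('ab') );
my ($result, $new_state) = @$pair;
```

Because the parser returns a fresh state rather than mutating its argument, the caller can retry a different parser from the original state after a failure.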

CHARACTER PARSERS

any_char

Parses any character, returning a pair of the character at the current State's position and the new state, advanced by one from the starting state. If the state is at the end ($state->eof is true), returns undef to signal failure.

char

char($c)

Parses the given character $c, returning a pair of the character at the current State's position if it is equal to $c and the new state, advanced by one from the starting state. If the state is at the end ($state->eof is true) or the character at the current position is not $c, returns undef to signal failure.

string

string($s)

Parses the given string $s, returning a pair of the sequence of characters starting at the current State's position if it is equal to $s and the new state, advanced by length($s) from the starting state. If the state is at the end ($state->eof is true) or the string starting at the current position is not $s, returns undef to signal failure.
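A minimal sketch of how a string-matching parser of this kind might behave. The name string_sketch and the state layout (a two-element array reference of data string and position) are assumptions made for the example, not the module's own representation.

```perl
use strict;
use warnings;

# Hypothetical state: [ $data_string, $position ].
sub string_sketch {
    my ($s) = @_;
    return sub {
        my ($data, $pos) = @{ $_[0] };
        # Fail at the end of the input.
        return undef if $pos >= length $data;
        # Near the end substr returns fewer characters, so a short
        # read simply fails the comparison.
        return undef unless substr($data, $pos, length $s) eq $s;
        return [ $s, [ $data, $pos + length $s ] ];
    };
}

my $tag  = string_sketch('abc');
my $hit  = $tag->( [ 'abcdef', 0 ] );    # matches, state advances by 3
my $miss = $tag->( [ 'abxdef', 0 ] );    # mismatch, undef
```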

NUMBER PARSERS

endianness

endianness($end)

The $end argument is optional. If given, this function sets the byte order used by parsers in the module to little-endian if $end is "<" or big-endian if $end is ">". This changes the module's state, and the setting remains in effect until the next change.

When called with no arguments, endianness returns the current byte order in effect. The starting byte order is big-endian.
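The effect of the byte-order setting can be illustrated with Perl's built-in unpack, whose integer formats accept the same "<" and ">" modifiers. The names endianness_sketch and decode_uint32 are hypothetical, and the sketch does not claim to mirror how the module stores or applies its setting.

```perl
use strict;
use warnings;

my $byte_order = '>';    # module-level setting; big-endian to start

# Combined getter/setter: sets the order when given "<" or ">",
# and always returns the order currently in effect.
sub endianness_sketch {
    my ($end) = @_;
    $byte_order = $end if defined $end && $end =~ /^[<>]$/;
    return $byte_order;
}

# Decode four bytes as an unsigned 32-bit integer in the current order.
sub decode_uint32 {
    my ($bytes) = @_;
    return unpack 'L' . endianness_sketch(), $bytes;
}

my $bytes  = "\x00\x00\x01\x00";
my $big    = decode_uint32($bytes);    # 256 when read big-endian
endianness_sketch('<');
my $little = decode_uint32($bytes);    # 65536 when read little-endian
```

Because the setting is shared state, code that switches the byte order temporarily would normally restore the previous value afterwards.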

any_uint8

any_uint16

any_uint24

any_uint32

Parses an 8-, 16-, 24-, or 32-bit unsigned integer, returning a pair of the integer starting at the current State's position and the new state, advanced by 1, 2, 3, or 4 bytes from the starting state, depending on the parser. The integer value is determined by the current value of endianness. If there are not enough elements left in the data from the current position, returns undef to signal failure.
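Perl's pack/unpack has no 24-bit format, so a 24-bit value has to be assembled from individual bytes. The sketch below shows one way to do that; any_uint24_sketch is a hypothetical name, the state is a simple [ \@bytes, $position ] pair, and the byte order is passed as an argument rather than read from the module-level endianness setting.

```perl
use strict;
use warnings;

# Hypothetical state: [ \@bytes, $position ], bytes as numbers 0..255.
sub any_uint24_sketch {
    my ($order) = @_;    # '>' big-endian or '<' little-endian
    return sub {
        my ($data, $pos) = @{ $_[0] };
        return undef if $pos + 3 > @$data;    # not enough bytes left
        my @b = @{$data}[ $pos .. $pos + 2 ];
        @b = reverse @b if $order eq '<';     # most significant first
        my $value = ( $b[0] << 16 ) | ( $b[1] << 8 ) | $b[2];
        return [ $value, [ $data, $pos + 3 ] ];
    };
}

my $state  = [ [ 0x01, 0x02, 0x03, 0xff ], 0 ];
my $big    = any_uint24_sketch('>')->($state);    # value 0x010203
my $little = any_uint24_sketch('<')->($state);    # value 0x030201
```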

uint8

uint16

uint24

uint32

uint8($n)
uint16($n)
uint24($n)
uint32($n)

Parses the specified 8-, 16-, 24-, or 32-bit unsigned integer $n, returning a pair of the integer at the current State's position if it is equal to $n and the new state. The new state is advanced by 1, 2, 3, or 4 bytes from the starting state, depending on the parser. The integer value is determined by the current value of endianness. If there are not enough elements left in the data from the current position or the integer at the current position is not $n, returns undef to signal failure.

any_int8

any_int16

any_int24

any_int32

Parses an 8-, 16-, 24-, or 32-bit signed integer, returning a pair of the integer starting at the current State's position and the new state, advanced by 1, 2, 3, or 4 bytes from the starting state, depending on the parser. The integer value is determined by the current value of endianness. If there are not enough elements left in the data from the current position, returns undef to signal failure.
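A signed parser can be thought of as its unsigned counterpart followed by a two's-complement reinterpretation. The helper below, to_signed, is a hypothetical name used only to show that arithmetic; whether the module takes this route internally is not stated here.

```perl
use strict;
use warnings;

# Reinterpret an unsigned $bits-wide value as a two's-complement
# signed integer: values with the top bit set become negative.
sub to_signed {
    my ($value, $bits) = @_;
    return $value >= 2**( $bits - 1 ) ? $value - 2**$bits : $value;
}

my $minus_one = to_signed( 0xFF,       8 );     # -1
my $max_int8  = to_signed( 0x7F,       8 );     # 127
my $minus_two = to_signed( 0xFFFFFE,   24 );    # -2
my $int_min   = to_signed( 0x80000000, 32 );    # -2147483648
```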

int8

int16

int24

int32

int8($n)
int16($n)
int24($n)
int32($n)

Parses the specified 8-, 16-, 24-, or 32-bit signed integer $n, returning a pair of the integer at the current State's position if it is equal to $n and the new state. The new state is advanced by 1, 2, 3, or 4 bytes from the starting state, depending on the parser. The integer value is determined by the current value of endianness. If there are not enough elements left in the data from the current position or the integer at the current position is not $n, returns undef to signal failure.

any_real32

any_real64

Parses a 32- or 64-bit real number, returning a pair of the number starting at the current State's position and the new state, advanced by 4 or 8 bytes from the starting state, depending on the parser. The real value is determined by the current value of endianness. If there are not enough elements left in the data from the current position, returns undef to signal failure.

any_int32_na

any_real64_na

Parses a 32-bit signed integer or 64-bit real number, respectively, while recognizing R-style missing values (NAs): INT_MIN for integers and a special NaN bit pattern for reals. Returns a pair of the number value (undef if the value is an NA) and the new state, advanced by 4 or 8 bytes from the starting state, depending on the parser. If there are not enough elements left in the data from the current position, returns undef to signal failure.
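The distinction between "parsed an NA" and "failed to parse" is easy to miss: the first yields a pair whose value is undef, the second yields undef in place of the pair. The sketch below covers only the integer case and assumes big-endian data; any_int32_na_sketch and the [ $data_string, $position ] state are hypothetical.

```perl
use strict;
use warnings;

use constant INT_MIN => -2147483648;

# Hypothetical state: [ $data_string, $position ].
sub any_int32_na_sketch {
    return sub {
        my ($data, $pos) = @{ $_[0] };
        return undef if $pos + 4 > length $data;    # parse failure
        my $value = unpack 'l>', substr( $data, $pos, 4 );
        $value = undef if $value == INT_MIN;        # R's integer NA
        return [ $value, [ $data, $pos + 4 ] ];
    };
}

my $parser = any_int32_na_sketch();
my $data   = "\x80\x00\x00\x00\x00\x00\x00\x07";
my $na     = $parser->( [ $data, 0 ] );    # pair with an undef value
my $seven  = $parser->( $na->[1] );        # pair with value 7
my $failed = $parser->( $seven->[1] );     # undef: no bytes left
```

Callers therefore need to test whether the pair itself is defined before looking at its first element.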

SEQUENCING

seq

seq($p1, $p2, ...)

This combinator applies parsers $p1, $p2, ... in sequence, using the returned parse state of $p1 as the input parse state to $p2, etc. Returns a pair of the concatenation of all the parsers' results and the parsing state returned by the final parser. If any of the parsers returns undef, seq will return it immediately without attempting to apply any further parsers.
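The state threading that seq performs can be sketched as a loop. The sketch assumes that the "concatenation" of results is an array reference holding each parser's result in order; seq_sketch, any_char_sketch, and the [ $data_string, $position ] state are hypothetical names for illustration.

```perl
use strict;
use warnings;

# Hypothetical state: [ $data_string, $position ].
sub any_char_sketch {
    return sub {
        my ($data, $pos) = @{ $_[0] };
        return undef if $pos >= length $data;
        return [ substr( $data, $pos, 1 ), [ $data, $pos + 1 ] ];
    };
}

sub seq_sketch {
    my @parsers = @_;
    return sub {
        my $state = shift;
        my @results;
        for my $p (@parsers) {
            # Stop at the first failure; later parsers are not tried.
            my $pair = $p->($state) or return undef;
            push @results, $pair->[0];
            $state = $pair->[1];    # feed the new state to the next parser
        }
        return [ \@results, $state ];
    };
}

my $two  = seq_sketch( any_char_sketch(), any_char_sketch() );
my $ok   = $two->( [ 'xyz', 0 ] );    # results 'x' and 'y', position 2
my $fail = $two->( [ 'x',   0 ] );    # second parser hits the end
```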

many_till

many_till($p, $end)

This combinator applies a parser $p until parser $end succeeds. It does this by alternating applications of $end and $p; once $end succeeds, the function returns the concatenation of results of preceding applications of $p. (Thus, if $end succeeds immediately, the 'result' is an empty list.) Otherwise, $p is applied and must succeed, and the procedure repeats. Returns a pair of the concatenation of all the $p's results and the parsing state returned by the final parser. If any application of $p returns undef, many_till will return it immediately.
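The alternation between $end and $p can be sketched with an until loop. Two points in this sketch are assumptions rather than documented behaviour: results are collected into an array reference, and the returned state is the one at which $end succeeded (so the terminator itself is not consumed). All function names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical state: [ $data_string, $position ].
sub any_char_sketch {
    return sub {
        my ($data, $pos) = @{ $_[0] };
        return undef if $pos >= length $data;
        return [ substr( $data, $pos, 1 ), [ $data, $pos + 1 ] ];
    };
}

sub char_sketch {
    my ($c) = @_;
    return sub {
        my ($data, $pos) = @{ $_[0] };
        return undef if $pos >= length $data;
        return undef unless substr( $data, $pos, 1 ) eq $c;
        return [ $c, [ $data, $pos + 1 ] ];
    };
}

sub many_till_sketch {
    my ($p, $end) = @_;
    return sub {
        my $state = shift;
        my @results;
        # Try $end first; only when it fails is $p applied.
        until ( $end->($state) ) {
            my $pair = $p->($state) or return undef;
            push @results, $pair->[0];
            $state = $pair->[1];
        }
        return [ \@results, $state ];
    };
}

my $until_semi   = many_till_sketch( any_char_sketch(), char_sketch(';') );
my $ok           = $until_semi->( [ 'ab;c', 0 ] );    # results 'a', 'b'
my $empty        = $until_semi->( [ ';c',   0 ] );    # $end succeeds at once
my $unterminated = $until_semi->( [ 'ab',   0 ] );    # $p fails at the end
```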

count

count($n, $p)

This combinator applies the parser $p exactly $n times in sequence, threading the parse state through each call. Returns a pair of the concatenation of all the parsers' results and the parsing state returned by the final application. If any application of $p returns undef, count will return it immediately without attempting any more applications.

with_count

with_count($num_p, $p)
with_count($p)

This combinator first applies parser $num_p to get the number of times that $p should be applied in sequence. If only one argument is given, any_uint32 is used as the default value of $num_p. (So with_count works by getting a number $n by applying $num_p and then calling count $n, $p.) Returns a pair of the concatenation of all the parsers' results and the parsing state returned by the final application. If the initial application of $num_p or any application of $p returns undef, with_count will return it immediately without attempting any more applications.
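A length-prefixed run of bytes is the typical use of this pattern. The sketch below writes count and with_count directly as loops; the names ending in _sketch and the [ \@bytes, $position ] state are hypothetical, results are assumed to be collected into an array reference, and the one-argument form with its any_uint32 default is left out.

```perl
use strict;
use warnings;

# Hypothetical state: [ \@bytes, $position ], bytes as numbers 0..255.
sub any_uint8_sketch {
    return sub {
        my ($data, $pos) = @{ $_[0] };
        return undef if $pos >= @$data;
        return [ $data->[$pos], [ $data, $pos + 1 ] ];
    };
}

sub count_sketch {
    my ($n, $p) = @_;
    return sub {
        my $state = shift;
        my @results;
        for ( 1 .. $n ) {
            my $pair = $p->($state) or return undef;
            push @results, $pair->[0];
            $state = $pair->[1];
        }
        return [ \@results, $state ];
    };
}

sub with_count_sketch {
    my ($num_p, $p) = @_;
    return sub {
        my $state = shift;
        # First read the repetition count, then repeat $p that often.
        my $pair = $num_p->($state) or return undef;
        my ($n, $after_count) = @$pair;
        return count_sketch( $n, $p )->($after_count);
    };
}

# A length byte (3) followed by that many payload bytes.
my $parser = with_count_sketch( any_uint8_sketch(), any_uint8_sketch() );
my $ok     = $parser->( [ [ 3, 10, 20, 30, 99 ], 0 ] );
my $short  = $parser->( [ [ 3, 10 ], 0 ] );    # payload too short
```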

choose

choose($p1, $p2, ...)

This combinator applies parsers $p1, $p2, ... in sequence until one of them succeeds, at which point it immediately returns that parser's result. If all of the parsers fail, choose fails and returns undef.
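A sketch of the alternation logic follows. It assumes that every alternative is tried from the same starting state, which is the usual behaviour for this kind of combinator but is not spelled out above; choose_sketch and char_sketch are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical state: [ $data_string, $position ].
sub char_sketch {
    my ($c) = @_;
    return sub {
        my ($data, $pos) = @{ $_[0] };
        return undef if $pos >= length $data;
        return undef unless substr( $data, $pos, 1 ) eq $c;
        return [ $c, [ $data, $pos + 1 ] ];
    };
}

sub choose_sketch {
    my @parsers = @_;
    return sub {
        my $state = shift;
        for my $p (@parsers) {
            # Each alternative starts from the same state.
            my $pair = $p->($state);
            return $pair if $pair;
        }
        return undef;    # every alternative failed
    };
}

my $a_or_b = choose_sketch( char_sketch('a'), char_sketch('b') );
my $second = $a_or_b->( [ 'bcd', 0 ] );    # 'a' fails, 'b' succeeds
my $none   = $a_or_b->( [ 'cd',  0 ] );    # both fail
```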

COMBINATORS

bind

bind($p1, $f)

This combinator applies parser $p1 and, if it succeeds, calls function $f using the first element of $p1's result as the argument. The call to $f needs to return a parser, which bind applies to the parsing state after $p1's application.

The bind combinator is an essential building block for most combinators described so far. For instance, with_count can be written as:

bind($num_p,
     sub {
         my $n = shift;
         count $n, $p;
     })

mreturn

mreturn($value)

Returns a parser that when applied returns $value without changing the parsing state.
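How bind and mreturn fit together can be sketched as follows: bind runs a parser and passes its result to a function that builds the next parser, while mreturn wraps a plain value as a parser that consumes nothing. The names bind_sketch, mreturn_sketch, and any_char_sketch are hypothetical, as is the [ $data_string, $position ] state.

```perl
use strict;
use warnings;

# Hypothetical state: [ $data_string, $position ].
sub any_char_sketch {
    return sub {
        my ($data, $pos) = @{ $_[0] };
        return undef if $pos >= length $data;
        return [ substr( $data, $pos, 1 ), [ $data, $pos + 1 ] ];
    };
}

sub bind_sketch {
    my ($p, $f) = @_;
    return sub {
        my $state = shift;
        my $pair = $p->($state) or return undef;
        my ($value, $next_state) = @$pair;
        # $f returns a parser, applied to the state left by $p.
        return $f->($value)->($next_state);
    };
}

sub mreturn_sketch {
    my ($value) = @_;
    return sub {
        my $state = shift;
        return [ $value, $state ];    # state passes through unchanged
    };
}

# Parse one digit character and return its numeric value doubled.
my $doubled = bind_sketch( any_char_sketch(),
    sub { my $digit = shift; mreturn_sketch( $digit * 2 ) } );
my $pair = $doubled->( [ '7x', 0 ] );
```

Here the state in the returned pair is the one produced by the character parser, since the mreturn step adds no further movement.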

error

error($message)

Returns a parser that when applied croaks with the $message and the current parsing state.
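Unlike an ordinary failure, which returns undef and lets the caller try something else, an error parser aborts with an exception. A minimal sketch using the core Carp module is shown below; error_sketch, the message format, and the [ $data_string, $position ] state are assumptions for illustration.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical state: [ $data_string, $position ].
sub error_sketch {
    my ($message) = @_;
    return sub {
        my ($data, $pos) = @{ $_[0] };
        # Creating the parser is harmless; applying it croaks.
        croak "$message at position $pos";
    };
}

my $fail     = error_sketch('Unsupported type');
my $survived = eval { $fail->( [ 'abc', 2 ] ); 1 };
my $err      = $@;    # holds the croak message when the eval failed
```

One plausible use is as the last alternative passed to a choice combinator, so that unrecognized input produces a diagnostic instead of a silent undef.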