OGDL Specification

Revision 2015.9, September 2015

Introduction

OGDL is a textual format that represents trees or graphs of data, where the nodes are strings and the edges are space or indentation. Its main objectives are simplicity and readability.

The data model

The text format specified here represents a directed graph G=(N,E), where N is an ordered bag of nodes and E a relation NxN. Nodes are represented by strings. Each member of E is an arc (edge), and is represented by space between nodes.

OGDL does not support types in its basic form (the one described in this document), and nodes default to type string. Parsers that support OGDL Schema can preload an OGDL document that acts as a schema, and allow the nodes to have other types.

The support for ordering and duplicates precludes the use of maps as the in-memory representation of OGDL; named lists should be used instead. This fact, which can be seen as a drawback, is one of the reasons for OGDL to exist.
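
As an illustration of the point above, here is a minimal sketch of a named list in Python. The `Node` class and its `add` and `get_all` methods are hypothetical names chosen for this example, not part of OGDL or of any existing library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical in-memory node: a name plus an ordered list of children."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str) -> "Node":
        # Append without checking for duplicates: order and repeats are kept.
        child = Node(name)
        self.children.append(child)
        return child

    def get_all(self, name: str) -> List["Node"]:
        # Return every child with this name, in document order.
        return [c for c in self.children if c.name == name]


root = Node("config")
root.add("port").add("80")
root.add("host").add("localhost")
root.add("port").add("8080")  # a second 'port'; a dict keyed by name would overwrite the first

names = [c.name for c in root.children]
ports = [p.children[0].name for p in root.get_all("port")]
```

Both `port` entries survive and keep their position relative to `host`, which a plain map keyed by node name could not guarantee.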

The byte stream read by the parser may be encoded in UTF-8, but it does not have to be. The only requirement on the character encoding is that it be ASCII transparent (for example ASCII itself or UTF-8). Since OGDL can be used in embedded systems with limited memory, we do not want an extra dependency on Unicode.

Strings and white space

Strings and space are the basics of OGDL. If a string contains spaces then it has to be quoted. These two elements form a tree, where each string is a child of the nearest preceding string with lower indentation. The tree can be converted into a graph if necessary, by using special strings (see the level 2 grammar).

a
  b
  "string with spaces"

This first form of writing OGDL, where each node begins on a new line, is called the canonical form.

More than one node can be placed on the same line. The canonical form:

a
  b

can be written as:

a b

The nesting level of subnodes on a different line does not change if another subnode is placed on the same line:

a b
  c

is the same as:

a
  b
  c
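
The indentation and same-line rules described so far can be sketched in a few lines of Python. This is an illustrative reading of the prose, not a reference parser: the `parse` function is a hypothetical name, nodes are plain `(name, children)` tuples, indentation is assumed to use spaces only, and quoted strings are assumed to fit on one line (tokenizing is delegated to the standard `shlex.split`).

```python
import shlex


def parse(text):
    """Parse space/indentation OGDL into nested (name, children) tuples."""
    root = ("", [])
    # Stack of (indentation, node) for the first node of each open line.
    stack = [(-1, root)]
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        # Close every line that is not less indented than this one.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        for position, token in enumerate(shlex.split(line)):
            node = (token, [])
            parent[1].append(node)
            if position == 0:
                # Only the first node of a line defines a nesting level.
                stack.append((indent, node))
            # Further nodes on the same line nest under the previous one.
            parent = node
    return root[1]


compact = parse("a b\n  c")
canonical = parse("a\n  b\n  c")
```

Because only the first node of a line is pushed onto the indentation stack, `compact` and `canonical` come out as the same tree.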

Comma

The comma has the effect of resetting the level of indentation to that of the beginning of the line.

The comma rule is a deviation from the basic string/space approach. The reason for it is that it permits a more compact form of writing lists, which may be very useful in configuration files. Instead of writing for example:

ports
  80
  81
  84
  8080

you can write it this way:

ports
  80, 81, 84, 8080
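
Be aware that each comma returns to the level of the beginning of the line, so by the rule above:

a
  c d, e f

is the same as:

a
  c
    d
  e
    f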
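
The comma rule can be sketched as a function that rewrites one line into canonical form. This is one reading of the rule, with several assumptions: `expand_line` and its `indent_unit` parameter are hypothetical names, the line contains no quoted strings, and each nesting step is written with two spaces.

```python
def expand_line(line, indent_unit=2):
    """Rewrite one OGDL line (no quoted strings) as canonical lines."""
    base = len(line) - len(line.lstrip(" "))
    level = 0
    out = []
    for token in line.replace(",", " , ").split():
        if token == ",":
            # A comma resets the level to that of the beginning of the line.
            level = 0
            continue
        out.append(" " * (base + level * indent_unit) + token)
        # The next node on the same line nests one level deeper.
        level += 1
    return out


ports = expand_line("  80, 81, 84, 8080")
mixed = expand_line("  c d, e f")
```

Here `ports` holds four lines at the same indentation, while `mixed` alternates between the base level (`c`, `e`) and one level deeper (`d`, `f`).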

Text block

A text block is a string that may contain newlines. A standalone '\' character at the end of a line introduces a text block, as in the following example.

text_block \
  This is a multiline
  description

Indentation spaces are deleted from the text. When, within a block, a line is less indented than the previous one, this indentation becomes the new level of indentation. If it is more indented, the extra spaces are considered part of the text and not deleted.

This example is equivalent to the previous one:

text_block \
   This is a multiline
  description

Note that quoted strings can have newlines, and the example above can also be written as:

text_block
  "This is a multiline
   description"
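
The indentation rule for text blocks can be sketched as follows. The function name `strip_block` is hypothetical, the input is assumed to be the lines that follow the '\' marker, and blank lines are assumed not to affect the indentation level (the text above does not say how they are treated).

```python
def strip_block(lines):
    """Remove block indentation from the lines of a text block."""
    level = None
    out = []
    for line in lines:
        if not line.strip():
            # Assumption: blank lines are kept and do not change the level.
            out.append("")
            continue
        indent = len(line) - len(line.lstrip(" "))
        if level is None or indent < level:
            # A less indented line sets the new level of indentation.
            level = indent
        # Spaces beyond the current level stay in the text.
        out.append(line[level:])
    return "\n".join(out)


first = strip_block(["  This is a multiline", "  description"])
second = strip_block(["   This is a multiline", "  description"])
```

Both calls produce the same two-line string, matching the equivalence stated for the two examples.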

Comments

The '#' character introduces a comment. Comments are treated as white space, and thus ignored.

# this is a comment
#this not
content #not_a_comment
this#neither
[!](Check what TOML and YAML do)

To be considered the start of a comment, the '#' character must be surrounded by whitespace (the start of a line counts as whitespace).
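
The whitespace rule for comments can be sketched as below. The name `strip_comment` is hypothetical, and the sketch makes two assumptions that the text does not settle: the end of a line counts as whitespace after '#', and '#' inside quoted strings is not treated specially.

```python
def strip_comment(line):
    """Return the line without its trailing comment, if it has one."""
    for i, ch in enumerate(line):
        if ch != "#":
            continue
        before_ok = i == 0 or line[i - 1] in " \t"
        after_ok = i + 1 == len(line) or line[i + 1] in " \t"
        if before_ok and after_ok:
            # Comments are treated as white space, so drop them entirely.
            return line[:i].rstrip()
    return line


examples = [
    "# this is a comment",
    "#this not",
    "content #not_a_comment",
    "this#neither",
]
stripped = [strip_comment(e) for e in examples]
```

Only the first example is reduced to an empty string; the other three are returned unchanged.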

End of stream

Any character that is not a space, break or word character will end the OGDL stream. That means that the parser will exit when it finds such a character. Most characters below ASCII 32 (space), that is, all except tab (9), line feed (10) and carriage return (13), will end the current stream.

This mechanism may be used, for example, in log files, where many OGDL fragments can be concatenated in one file; pointing the parser to the start of any of them will return that fragment only.
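
A minimal sketch of this use, assuming the fragments in the file are separated by a NUL byte (any other end character would do) and that the caller already knows the byte offset of a fragment. The names `is_end` and `read_fragment` are hypothetical.

```python
def is_end(byte):
    """True for bytes below 32 other than tab, line feed and carriage return."""
    return byte < 32 and byte not in (9, 10, 13)


def read_fragment(data, start=0):
    """Return the OGDL fragment that begins at offset `start`."""
    end = start
    while end < len(data) and not is_end(data[end]):
        end += 1
    return data[start:end]


log = b"a\n  b\n\x00c d\n\x00"
first = read_fragment(log)
second = read_fragment(log, 7)
```

Each call stops at the first end character it meets, so `first` and `second` hold one fragment each and nothing from the rest of the file.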

Cycles

config
  ip
    192.168.1.1
  alt_ip
    #=ip

The '#=' token introduces an arc to a node defined by a path. The path is relative to the parent of the arc's source node, which in this case is 'config'. The path syntax follows the OGDL Path specification.

Grammar

OGDL is specified as two layers: a tree grammar (level 1) and a graph grammar (level 2), which adds cycles.

Tools are not required to comply with both layers, since the inclusion of cycles complicates both the parser and the emitter. Which layers to support depends on the field of application.

The following grammar rules or productions are written in a simplified EBNF format similar to the one used in the XML specification (see http://www.w3.org/TR/xml11/#sec-notation), except that single quotes enclose single characters.

Note: Since OGDL is layout sensitive, a context-free notation such as EBNF cannot fully express it. Additional comments cover the missing features.

Tree grammar

[1]  char_text  ::= integer > 32
[3]  char_space ::= 32 | 9
[4]  char_break ::= 13 | 10
[5]  char_end   ::= integer < 32 -( 9, 10, 13 )

These productions use the integer as the base type for representing a character. Any character that is not char_text, char_space or char_break is considered the end of the OGDL stream, and makes the parser stop and return, without signaling an error.

[6]  word     ::= ( char_text - '#' - '\'' - '"')+ (char_word - '\'' - '"')*
[7]  string   ::= (char_text | char_space)+
[8]  break    ::= 10 | 13 | (13 10)
[9]  comment  ::= (SOL|space) '#' string? break
[10] quoted   ::= '\''  string '\'' | '"' string '"'
[11] space    ::= char_space+
[12] space(n) ::= char_space*n ; where n is the equivalent number of spaces.
[13] block(n) ::= '\' break (space(>n) string? break)+

[12] is the indentation production. It corresponds to the equivalent number of spaces between the start of a line and the beginning of the first scalar node. For any two consecutive scalars preceded by indentation, the second is a child of the first if it is more indented. Mixing spaces and tabs is NOT allowed: either tabs or spaces should be used for indentation within a document.

A quote character that has to be included in the string should be preceded by '\'. If the string contains line breaks, leading spaces on each new line are stripped off. The initial indentation is defined by the first line after a break. The indentation is decreased if word characters appear at a lower indentation, but it is never increased. Escape sequences that are recognized are \", \' and \\. The character '\' should be considered literal in all other cases.
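
The escape rule for quoted strings can be sketched as follows. The function name `unescape` is hypothetical; its input is assumed to be the text between the quotes, with indentation already stripped.

```python
def unescape(body):
    """Resolve the escape sequences backslash-quote and backslash-backslash."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in "\"'\\":
            # Recognized escape: keep only the escaped character.
            out.append(body[i + 1])
            i += 2
        else:
            # Any other backslash is literal.
            out.append(ch)
            i += 1
    return "".join(out)


quoted = unescape(r'say \"hi\"')
path = unescape(r'C:\temp')
```

In `quoted` the backslashes before the quotes are removed, while the backslash in `path` is kept because `\t` is not a recognized sequence.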

A block is a scalar leaf node, i.e., it cannot be the parent of other nodes. It is used for holding literal text. The only transformation that it undergoes is leading space stripping, according to the indentation rules. A block is a child of the scalar that precedes it.

[14] element  ::= (word | quoted | group)
[17] line(n)  ::= .........TBD....... break  
[18] graph    ::= line* char_end

Graph grammar

[19] arc ::= "#=" relative_path space? break
[20] relative_path ::= '.'* ogdl_path

This production represents a directed arc to a node specified by relative_path. This path is relative to the node above the source node of the arc. By inserting one or more dots before the path, the relative position can be moved upwards.
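
How such a path might be resolved can be sketched as below. This is an illustration under stated assumptions, not the OGDL Path algorithm: the `Node` class and `resolve` function are hypothetical, each leading dot is assumed to move the starting point up by exactly one level, and the rest of the path is assumed to be a list of child names separated by dots.

```python
class Node:
    """Hypothetical tree node that remembers its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def resolve(source, relative_path):
    """Find the target of an arc whose source node is `source`."""
    # The path starts at the node above the source node of the arc.
    base = source.parent
    stripped = relative_path.lstrip(".")
    # Assumption: each leading dot moves the start one level upwards.
    for _ in range(len(relative_path) - len(stripped)):
        base = base.parent
    node = base
    # Assumption: the remaining path is child names separated by dots.
    for name in stripped.split("."):
        node = next(c for c in node.children if c.name == name)
    return node


config = Node("config")
ip = Node("ip", config)
Node("192.168.1.1", ip)
alt_ip = Node("alt_ip", config)
nested = Node("nested", alt_ip)

target = resolve(alt_ip, "ip")
target_from_nested = resolve(nested, ".ip")
```

Both lookups return the existing `ip` node rather than a copy, which is what turns the tree into a graph.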

Character encoding

OGDL streams must parse well without explicit encoding information for all ASCII transparent encodings. All special characters used in OGDL that define structure and delimit tokens are part of the US-ASCII (ISO646-US) set. This guarantees that tools that support only single byte streams will work on any 8-bit fixed or variable length encoded stream, particularly UTF-8 and ISO8859 variants. Since the conversion from bytes to characters and back is outside the scope of OGDL, it is up to the application to decide how to treat non-printable characters which are outside the ASCII space.
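
A short sketch of why ASCII transparency is enough: a tool can split on ASCII space bytes without knowing the encoding, and the application decodes the resulting words afterwards. The function name `split_words` and the sample text are made up for this example.

```python
def split_words(line):
    """Split a byte string on ASCII whitespace; other bytes are opaque."""
    return line.split()


text = "ciudad Málaga"

# The same byte-level splitting works for a variable length encoding...
utf8_words = [w.decode("utf-8") for w in split_words(text.encode("utf-8"))]
# ...and for a single byte encoding.
latin1_words = [w.decode("latin-1") for w in split_words(text.encode("latin-1"))]

# In UTF-8, every byte of a multi-byte sequence is 0x80 or above,
# so it can never be mistaken for an ASCII structural character.
high_bytes_only = all(b >= 0x80 for b in "á".encode("utf-8"))
```

The splitting step never looks at bytes above 127, so it gives the same words in both encodings.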



A. Changes to this document

See the Change list