OGDL Specification

Revision 2018.2, February 2018

Introduction

OGDL is a textual format that represents trees or graphs of data, where the nodes are strings and the edges are space or indentation. Its main objectives are simplicity and readability.

The data model

The text format specified here represents a directed graph G=(N,E), where N is an ordered bag of nodes and E is a relation on N x N. Nodes are represented by strings. Each member of E is an arc (edge), and is represented by space between nodes.

OGDL does not support types in its basic form (the one described in this document), and nodes default to type string. Parsers that support OGDL Schema can preload an OGDL document that acts as a schema, and allow the nodes to have other types.

The support for ordering and duplicates precludes the use of maps as the in-memory representation of OGDL.
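
As a rough illustration of why a map does not fit, the sketch below keeps the children of each node in a list, which preserves both order and duplicates. The Node class and its field names are hypothetical and not taken from any OGDL implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node is a string plus an ordered list of outgoing arcs (hypothetical)."""
    text: str
    children: list = field(default_factory=list)

    def add(self, text):
        child = Node(text)
        self.children.append(child)  # a list keeps order and duplicates
        return child


# Two children with the same string stay distinct and keep their order.
hosts = Node("hosts")
hosts.add("ip").add("10.0.0.1")
hosts.add("ip").add("10.0.0.2")

# A dict keyed by the node string cannot hold both "ip" entries.
as_map = {child.text: child for child in hosts.children}
```

The list-based node holds two "ip" children, while the dict built from them ends up with a single key.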

The byte stream read by the parser can have any ASCII transparent character encoding, such as Unicode UTF-8.

Strings and white space

Strings and space are the basics of OGDL. If a string contains spaces, it must be quoted. These two elements form a tree, in which each string is a child of the nearest preceding string with lower indentation. The tree can be converted into a graph if necessary, by using a special syntax (see the Graph grammar section).

a
  b
  "string with spaces"

This first form of writing OGDL, where each node begins on a new line, is called the canonical form.

Several nodes can be placed on the same line. The canonical form:

a
  b

can be written as:

a b

The nesting level of subnodes on a different line does not change if another subnode is placed on the same line:

a b
  c

is the same as:

a
  b
  c
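
The equivalence between the canonical and the inline form can be checked with a small parser. This is a minimal sketch under several assumptions: indentation uses spaces only, quoted strings do not span lines, comments, commas, blocks and references are ignored, and successive nodes on one line are read as a chain (each one a child of the previous), since space stands for an arc. The Node class and the parse function are hypothetical names, and Python's shlex module is used only as a convenient tokenizer for words and quoted strings.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    children: list = field(default_factory=list)


def parse(text):
    """Parse a small subset of OGDL into a tree under an unnamed root node."""
    root = Node("")
    # Stack of (indentation, first node of that line); the root sits below all.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        # The parent is the nearest preceding line with lower indentation.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        first = None
        for token in shlex.split(raw):
            node = Node(token)
            parent.children.append(node)
            parent = node  # the next node on this line hangs from this one
            if first is None:
                first = node
        stack.append((indent, first))
    return root
```

Under these assumptions, parse("a b\n  c") and parse("a\n  b\n  c") build the same tree, because the indentation of a line is compared against the first node of the preceding lines only.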

Text block

A text block is a string that may contain newlines. A standalone '\' character at the end of a line introduces a text block, as in the following example.

text_block \
  This is a multiline
  description

Indentation spaces are deleted from the text. When, within a block, a line is less indented than the previous one, this indentation becomes the new level of indentation. If it is more indented, the extra spaces are considered part of the text and not deleted.

This example is equivalent to the previous one:

text_block \
   This is a multiline
  description

Note that quoted strings can have newlines, and the example above can also be written as:

text_block
  "This is a multiline
   description"
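
The indentation rule for text blocks can be sketched as a single pass over the lines of the block. The function name strip_block is hypothetical, and the sketch assumes space indentation and no blank lines inside the block.

```python
def strip_block(lines):
    """Remove block indentation from raw lines and return the resulting text."""
    level = None
    out = []
    for line in lines:
        indent = len(line) - len(line.lstrip(" "))
        if level is None or indent < level:
            level = indent  # a less indented line sets the new level
        out.append(line[level:])  # extra indentation stays in the text
    return "\n".join(out)
```

With this reading, both text_block examples above yield the same two-line string, and a line that is more indented than the current level keeps its extra spaces.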

Comments

The string "# " introduces a comment if it is preceded by white space (newline, space, tab). Comments are treated as white space, and thus ignored.

# this is a comment
#this not
content #not_a_comment
this#neither
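
The comment rule can be expressed as a regular expression applied line by line. This is a sketch only: comment_start and strip_comment are hypothetical helpers, and they do not account for a "# " sequence that appears inside a quoted string.

```python
import re

# "# " starts a comment only at the beginning of a line
# or directly after a space or tab.
_COMMENT = re.compile(r"(?:^|(?<=[ \t]))# ")


def comment_start(line):
    """Return the index where a comment starts on this line, or -1."""
    match = _COMMENT.search(line)
    return match.start() if match else -1


def strip_comment(line):
    """Return the line without its trailing comment, if it has one."""
    index = comment_start(line)
    return line if index < 0 else line[:index].rstrip()
```

Applied to the four example lines above, only the first one is recognized as a comment.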

End of stream

Any character that is not a space, break or word character will end the OGDL stream. That means that the parser will exit when it finds such a character. Most characters below ASCII 32 (space) will end the current stream.

This mechanism may be used, for example, in log files, where many OGDL fragments can be concatenated in one file; pointing the parser to the start of any of them will return that fragment only.
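
A minimal sketch of the log-file use case follows. The function read_fragment is a hypothetical name; it assumes the stream has already been decoded to text and that fragments are separated by some control character other than tab, line feed or carriage return (a NUL character in the usage below).

```python
def read_fragment(data, start=0):
    """Return the text from start up to the first stream-ending character."""
    end = start
    while end < len(data):
        code = ord(data[end])
        # Control characters other than tab, LF and CR end the stream.
        if code < 32 and code not in (9, 10, 13):
            break
        end += 1
    return data[start:end]


log = "a\n  b\n\x00c\n  d\n"
first = read_fragment(log, 0)   # stops at the NUL character
second = read_fragment(log, 7)  # starts right after the NUL character
```

A real parser would stop in the same place while tokenizing, rather than cutting the text first.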

Cycles

config
  ip
    192.168.1.1
  alt_ip
    :ip  

A colon followed by a path represents a reference or arc to another node in the stream. The path syntax follows the OGDL Path specification. In the example, 'config.alt_ip' will have the same content as 'config.ip'. The path is relative, and will resolve at any level higher than the one where the reference is placed. For example:

config
  ip
    192.168.1.1
  alt_ips
    ip1 
      :ip  

Here config.alt_ips.ip1 will be equivalent to config.ip.
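
The upward search for the referenced node can be sketched as follows. The Node class and the resolve function are hypothetical, and the path is assumed to be a plain dot-separated list of node names, which is only a small subset of what OGDL Path may allow.

```python
class Node:
    def __init__(self, text, parent=None):
        self.text = text
        self.parent = parent
        self.children = []

    def add(self, text):
        child = Node(text, self)
        self.children.append(child)
        return child

    def get(self, name):
        """Return the first child with the given string, or None."""
        for child in self.children:
            if child.text == name:
                return child
        return None


def resolve(holder, path):
    """Resolve a relative path for a reference placed as a child of holder."""
    names = path.split(".")
    level = holder.parent  # start one level above the reference
    while level is not None:
        node = level
        for name in names:
            node = node.get(name)
            if node is None:
                break
        else:
            return node  # closer levels win because they are tried first
        level = level.parent
    return None


# The second example: config > alt_ips > ip1 > :ip
config = Node("config")
ip = config.add("ip")
ip.add("192.168.1.1")
ip1 = config.add("alt_ips").add("ip1")
target = resolve(ip1, "ip")
```

Here the search finds no 'ip' among the children of alt_ips, moves up to config, and returns the node config.ip.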

Grammar

OGDL is specified as two layers: the tree grammar and the graph grammar, which adds cycles.

It is not required that tools comply with both layers, since the inclusion of cycles complicates both the parser and the emitter. Which layers to support depends on the field of application.

The following grammar rules or productions are written in a simplified EBNF format similar to the one used in the XML specification (see http://www.w3.org/TR/xml11/#sec-notation), except that single quotes enclose single characters.

Note: Since OGDL is layout sensitive, a context-free grammar written in EBNF cannot fully express it. Additional comments cover the missing features.

Tree grammar

[1]  char_text  ::= integer > 32
[3]  char_space ::= 32 | 9
[4]  char_break ::= 13 | 10
[5]  char_end   ::= integer < 32 - (9, 10, 13)

These productions use the integer as the base type for representing a character. Any character that is not char_text, char_space or char_break is considered the end of the OGDL stream, and makes the parser stop and return, without signaling an error.
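
The four character classes can be written as a small classification function. This is a sketch: classify is a hypothetical name, and it takes the integer code of a character, as the productions do.

```python
def classify(code):
    """Map an integer character code to its class in the tree grammar."""
    if code > 32:
        return "text"   # char_text
    if code in (32, 9):
        return "space"  # char_space: space or tab
    if code in (10, 13):
        return "break"  # char_break: LF or CR
    return "end"        # char_end: any other code below 32
```

A parser built on this function would stop quietly as soon as classify returns "end".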

[6]  word     ::= ( char_text - '#' - '\'' - '"')+ (char_word - '\'' - '"')*
[7]  string   ::= (char_text | char_space)+
[8]  break    ::= 10 | 13 | (13 10)
[9]  comment  ::= ^char_text '#' string? break
[10] quoted   ::= '\''  string '\'' | '"' string '"'
[11] space    ::= char_space+
[12] space(n) ::= char_space*n ; where n is the equivalent number of spaces (can be 0)
[13] block(n) ::= '\' (comment|break) (space(>n) string break)+

[12] is the indentation production. It corresponds to the equivalent number of spaces between the start of a line and the beginning of the first scalar node. For any two consecutive scalars preceded by indentation, the second is a child of the first one if it is more indented. Intermixing of spaces and tabs is NOT allowed: either tabs or spaces should be used for indentation within a document.
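
A tool can enforce the rule against mixing tabs and spaces before parsing. The following is a minimal sketch with a hypothetical function name; it inspects the leading white space of every line, including lines that belong to text blocks.

```python
def indentation_style(text):
    """Return ' ' or a tab (None if nothing is indented); raise on mixing."""
    style = None
    for number, line in enumerate(text.splitlines(), start=1):
        lead = line[: len(line) - len(line.lstrip(" \t"))]
        for ch in set(lead):
            if style is None:
                style = ch
            elif ch != style:
                raise ValueError(f"line {number}: mixed tabs and spaces")
    return style
```

The first indented line fixes the style for the whole document, and any later line that uses the other character is rejected.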

In a quoted string, a quote character that has to be included in the string should be preceded by '\'. If the string contains line breaks, leading spaces on each new line are stripped off. The initial indentation is defined by the first line after a break. The indentation is decreased if word characters appear at a lower indentation, but it is never increased. The recognized escape sequences are \", \' and \\. The character '\' should be considered literal in all other cases.
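
The escape rule can be sketched as a single scan over the content of a quoted string. The function name unescape is hypothetical; the input is the text between the enclosing quotes.

```python
def unescape(content):
    """Apply the three recognized escapes; leave every other backslash as is."""
    out = []
    i = 0
    while i < len(content):
        if content[i] == "\\" and i + 1 < len(content) and content[i + 1] in "\"'\\":
            out.append(content[i + 1])  # drop the backslash, keep the character
            i += 2
        else:
            out.append(content[i])      # literal character, including a lone '\'
            i += 1
    return "".join(out)
```

Note that a sequence such as \n is not an escape under this rule, so it stays as the two characters '\' and 'n'.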

A block is a scalar leaf node, i.e., it cannot be the parent of other nodes. It is used for holding literal text. The only transformation that it undergoes is leading space stripping, according to the indentation rules. A block is a child of the scalar that precedes it.

[14] element  ::= (word | quoted)
[15] list     ::= element (space? ','? space? element)*
[16] line(n)  ::= space(n) list? space? ((element space? block)|(comment? break))
[17] graph    ::= line* char_end

Graph grammar

[18] arc ::= ":" ogdl_path space? break

This production represents a directed arc to a node specified by a path (as per OGDL Path). More precisely, it represents a set of arcs to the subnodes of the node specified by a path. The path is relative and can resolve at any level higher than the current one. Levels that are closer to the one where the arc is placed take precedence.

Character encoding

OGDL streams must parse well without explicit encoding information for all ASCII transparent encodings. All special characters used in OGDL that define structure and delimit tokens are part of the US-ASCII (ISO646-US) set. This guarantees that tools that support only single byte streams will work on any 8-bit fixed or variable length encoded stream, particularly UTF-8 and ISO8859 variants. Since the conversion from bytes to characters and back is outside the scope of OGDL, it is up to the application to decide how to treat non-printable characters which are outside the ASCII space.
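
The following sketch illustrates the point with a byte-level scan that never decodes the stream. The function indent_and_payload is a hypothetical name; it looks only at ASCII line breaks and spaces, so it gives the same structure for a UTF-8 and an ISO 8859-1 encoding of the same text.

```python
def indent_and_payload(stream):
    """Split a byte stream into (indentation, payload) pairs, line by line."""
    rows = []
    for raw in stream.splitlines():  # bytes are split on LF, CR and CR LF
        body = raw.lstrip(b" ")
        rows.append((len(raw) - len(body), body))
    return rows


text = "ciudad\n  Málaga\n"
utf8_rows = indent_and_payload(text.encode("utf-8"))
latin1_rows = indent_and_payload(text.encode("latin-1"))
```

Both results report an indentation of 2 for the second line; only the payload bytes differ, and decoding them is left to the application.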



Changes to this document

See the Change list