Revision 2012.3, 21 Mar 2012 Copyright (c) 2002-2012, Rolf Veen This is an open standard. See the license.
OGDL is a textual format that represents trees or graphs of data, where the nodes are strings and the edges are space or indentation. Its main objectives are simplicity and readability.
The text format specified here represents a directed graph G=(N,E), where N is an ordered bag of nodes and E a relation NxN. Nodes are represented by strings. Each member of E is an arc (edge), and is represented by space between nodes.
Strings and space are the basics of OGDL (a string that contains spaces has to be quoted). These two elements form a tree, where each string is a child of the closest preceding string with lower indentation. The tree can be converted into a graph if necessary, by using special strings (see the level 2 grammar).
The characters that delimit strings are normally white space characters, but there are some exceptions: the comma and the parentheses, which permit an inline or compact form (see 3.2 and 3.3).
a b "string with spaces"
This first form of writing OGDL, without using commas and parentheses, is called the canonical form.
The comma has the effect of resetting the level of indentation to that of the beginning of the line. The example above can be rewritten:
a b, "string with spaces"
The same example, but using parentheses:
a ( b, "string with spaces" )
It could also have been written without the extra spaces:
a(b,"string with spaces")
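The way the comma and the parentheses act as delimiters, with or without surrounding spaces, can be illustrated with a small tokenizer. This is only a sketch in Python: the function name `tokenize` is hypothetical and not part of this specification, escape handling is reduced to skipping the character after a backslash, and quoted content is returned without any mark that it was quoted.

```python
def tokenize(text):
    """Split one line of inline OGDL into words, quoted strings and delimiters."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in " \t":
            i += 1                      # space only separates tokens
        elif ch in "(),":
            tokens.append(ch)           # structural delimiter
            i += 1
        elif ch in "\"'":
            j = i + 1
            while j < len(text) and text[j] != ch:
                j += 2 if text[j] == "\\" else 1   # skip an escaped character
            tokens.append(text[i + 1:j])           # content without the quotes
            i = j + 1
        else:
            j = i
            while j < len(text) and text[j] not in " \t(),":
                j += 1
            tokens.append(text[i:j])    # unquoted word
            i = j
    return tokens
```

Both spellings of the parenthesized example produce the same token list, which is why the extra spaces are optional.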
This specification doesn't support nodes after a group (a group is a sequence of nodes surrounded by parentheses).
A text block is a string with possibly newlines in it. A standalone '\' character at the end of a line introduces a text block, as in the following example.
text_block \ This is a multiline description
Indentation spaces are deleted from the text. When, within a block, a line is less indented than the previous one, this indentation becomes the new level of indentation. If it is more indented, the extra spaces are considered part of the text and not deleted.
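One literal reading of this rule is a running indentation level that can only go down. The Python sketch below follows that reading; the name `strip_block_indent` is hypothetical, indentation is assumed to consist of spaces only, and blank lines inside a block are not given any special treatment.

```python
def strip_block_indent(lines):
    """Remove indentation from the lines of a text block."""
    out = []
    level = None
    for line in lines:
        indent = len(line) - len(line.lstrip(" "))
        if level is None or indent < level:
            level = indent              # a less indented line lowers the level
        out.append(line[level:])        # spaces beyond the level stay in the text
    return "\n".join(out)
```

With this reading, a line that is more indented than the current level keeps its extra spaces, while a less indented line resets the level for the lines that follow it.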
This example is equivalent to the previous:
text_block \ This is a multiline description
The '#' character introduces a comment. Comments are treated as white space, and thus ignored.
# this is a comment
#this also
this#not
To be considered as a comment, a string must start with '#'. The same character in the middle or the end of a string is not considered as the start of a comment.
The special combination "#?" is reserved for an optional metadata block, which is also expressed in OGDL. This is explained later.
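The comment rule can be sketched as follows in Python. The function name `strip_comment` is hypothetical, and the sketch assumes that the line contains no quoted strings. The combinations "#?" and "#{" are left untouched because they are not ordinary comments.

```python
def strip_comment(line):
    """Return the line without its trailing comment, if it has one."""
    i = 0
    while i < len(line):
        at_token_start = i == 0 or line[i - 1] in " \t"
        # '#' opens a comment only at the start of a string.
        if line[i] == "#" and at_token_start and line[i + 1:i + 2] not in ("?", "{"):
            return line[:i].rstrip()
        i += 1
    return line
```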
Any character that is not a space, break or word character will end the OGDL stream. That means that the parser will exit when it finds such a character. Most characters below ASCII 32 (space) will end the current stream.
This mechanism may be used, for example, in log files, where many OGDL fragments can be concatenated in one file; pointing the parser to the start of any of them will return that fragment only.
a
b
c
#{2
In this example node 'c' has an arc pointing to node 'b'. The number 2 points to the node 2 lines before the current node, in the canonical form of OGDL, where each node begins on a new line.
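The line-counting rule can be illustrated with a short Python sketch that lists the arcs created by references. The name `resolve_references` is hypothetical, the input is assumed to be in canonical form with space indentation, and the indentation used in the usage example is an assumption, chosen so that the reference is a child of 'c'.

```python
def resolve_references(lines):
    """Return (source, target) arcs for every '#{N' line in canonical OGDL."""
    nodes = []                                   # (indentation, text), one per line
    for line in lines:
        nodes.append((len(line) - len(line.lstrip(" ")), line.strip()))
    arcs = []
    for i, (indent, text) in enumerate(nodes):
        if not text.startswith("#{"):
            continue
        n = int(text[2:].split()[0])             # the number after '#{'
        target = nodes[i - n][1]                 # the node N lines above
        # The source is the closest preceding line with lower indentation.
        source = next(t for ind, t in reversed(nodes[:i]) if ind < indent)
        arcs.append((source, target))
    return arcs
```

Called as `resolve_references(["a", "  b", "  c", "    #{2"])`, the sketch reports a single arc from 'c' to 'b'. It does not check that the number stays within the stream.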
| Character | ASCII decimal value | Combinations | Notes |
|---|---|---|---|
| ( | 40 | | |
| ) | 41 | | |
| , | 44 | | |
| # | 35 | #? #{ | |
| " | 34 | | |
| ' | 39 | | |
| \ | 92 | \' \" \\ | EOL in quoted and block; escape char in quoted |
| CR | 13 | | |
| NL | 10 | | |
| SP | 32 | | |
| TAB | 9 | | |
OGDL is specified as a series of layers:
It is not required that tools comply with both layers. Someone may want to implement only layer 1, since the inclusion of cycles complicates both the parser and the emitter; it depends on the field of application. Presenting tools or libraries as 'OGDL 1.0 level 1' compliant is correct.
[1] char_text ::= integer > 32
[2] char_word ::= char_text - ',' - '(' - ')'
[3] char_space ::= 32 | 9
[4] char_break ::= 13 | 10
[5] char_end ::= integer - char_text - char_space - char_break
These productions use the integer as the base type for representing a character, and then only positive values. Any character that is not char_text, char_space or char_break is considered the end of the OGDL stream, and makes the parser stop and return, without signaling an error.
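Productions [1] to [5] translate almost directly into code. The Python sketch below classifies a character code and uses the result to cut one fragment out of a longer stream, as in the log file case. The names `char_class` and `read_fragment` are hypothetical, and the input is assumed to be an already decoded string.

```python
def char_class(code):
    """Classify a character code following productions [1] to [5]."""
    if code > 32:
        return "text"
    if code in (32, 9):
        return "space"
    if code in (13, 10):
        return "break"
    return "end"                        # anything else ends the stream


def read_fragment(data, start=0):
    """Return the text from `start` up to the first end character."""
    i = start
    while i < len(data) and char_class(ord(data[i])) != "end":
        i += 1
    return data[start:i]
```

In the usage below, a NUL character separates two fragments; this separator is an assumption, since any end character would have the same effect.

```python
log = "a\n  b\n\x00c\n  d\n"
first = read_fragment(log)              # stops silently at the NUL character
second = read_fragment(log, 7)          # starts at the second fragment
```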
[6] word ::= ( char_word - '#' - '\'' - '"')+ char_word*
[7] comment ::= '#' (char_word | char_space)* break
[8] break ::= 10 | 13 | (13 10)
[9] end ::= char_end
[10] space ::= char_space+
[11] space(n) ::= char_space*n ; where n is the equivalent number of spaces.
[12] quoted ::= ('\''|'"') (char_word | char_space | break)* ('\''|'"') ; where starting and ending character are the same
[13] block(n) ::= '\' space? break (space(>n) (char_word | char_space)* break)+
[11] is the indentation production. It corresponds to the equivalent number of spaces between the start of a line and the beginning of the first scalar node. For any two consecutive scalars preceded by indentation, the second is a child of the first if it is more indented. Mixing spaces and tabs is NOT allowed: either tabs or spaces should be used for indentation within a document.
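The indentation rule can be sketched with a stack of open ancestors. The Python code below is a minimal illustration, not a complete level 1 parser: the name `parse_canonical` is hypothetical, each line is taken as a single node, quoting, comments and blocks are ignored, and a tab is counted as one unit of indentation.

```python
def parse_canonical(text):
    """Build nested [name, children] lists from one-node-per-line OGDL."""
    root = ["", []]
    stack = [(-1, root)]                 # (indentation, node) of open ancestors
    indent_char = None
    for line in text.splitlines():
        body = line.lstrip(" \t")
        if not body:
            continue                     # skip empty lines
        prefix = line[:len(line) - len(body)]
        if prefix:
            if len(set(prefix)) > 1 or (indent_char and prefix[0] != indent_char):
                raise ValueError("tabs and spaces are mixed in indentation")
            indent_char = prefix[0]
        indent = len(prefix)
        while stack[-1][0] >= indent:    # close nodes that are not ancestors
            stack.pop()
        node = [body.rstrip(), []]
        stack[-1][1][1].append(node)     # attach to the nearest less indented node
        stack.append((indent, node))
    return root[1]
```

A node becomes a child of the closest preceding node that is less indented, and a document that mixes tabs and spaces in its indentation is rejected.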
A quote character that has to be included in the string should be preceded by '\'. If the string contains line breaks, leading spaces on each new line are stripped off. The initial indentation is defined by the first line after a break. The indentation is decreased if word characters appear at a lower indentation, but it is never increased. Lines ending with '\' are concatenated. The recognized escape sequences are \", \' and \\. In all other cases the character '\' should be considered literal.
A block is a scalar leaf node, i.e., it cannot be the parent of other nodes. It is to be used for holding a block of literal text. The only transformation that it undergoes is leading space stripping, according to the indentation rules. A block is a child of the scalar that starts it.
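The escape rule for quoted strings can be sketched as below in Python. The name `unescape` is hypothetical, the argument is the content between the quotes, and neither line concatenation nor indentation stripping is covered.

```python
def unescape(raw):
    # Only a backslash followed by ", ' or another backslash is an escape;
    # in any other position the backslash is kept as a literal character.
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw) and raw[i + 1] in "\"'\\":
            out.append(raw[i + 1])
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)
```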
[15] scalar ::= (word | single_quoted | double_quoted )
[16] sequence ::= (scalar|group) ( (space? ',')? space? (scalar|group) )*
[17] group ::= '(' space? sequence? space? ')'
[18] line(n) ::=
[19] graph ::= line* end
[19] reference ::= '#' '{' number (space (char_word | char_space)* )? break
This production represents an arc to the node that is 'number' nodes above the reference in the canonical form of OGDL.
OGDL streams must parse well without explicit encoding information for all ASCII transparent encodings. Even if OGDL doesn't mandate the use of Unicode, it does encourage its use.
All special characters used in OGDL that define structure and delimit tokens are part of the US-ASCII (ISO646-US) set. This guarantees that tools that support only single byte streams will work on any 8-bit fixed or variable length encoded stream, particularly UTF-8 and ISO8859 variants. Since the conversion from bytes to characters and back is outside the scope of OGDL, it is up to the application to decide how to treat non-printable characters which are outside the ASCII space.
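The following Python sketch illustrates why a byte-oriented tool works on such streams: it splits on the ASCII delimiters only and never interprets bytes of 128 and above. The names `DELIMITERS` and `split_bytes` are hypothetical, and quoting is ignored to keep the example short.

```python
DELIMITERS = b" \t\r\n(),"


def split_bytes(stream):
    """Split an encoded byte stream on OGDL's ASCII delimiters only."""
    tokens, current = [], bytearray()
    for byte in stream:
        if byte in DELIMITERS:
            if current:
                tokens.append(bytes(current))
                current = bytearray()
        else:
            current.append(byte)        # bytes >= 128 pass through untouched
    if current:
        tokens.append(bytes(current))
    return tokens
```

The same function handles UTF-8 and ISO 8859-1 input, because in both encodings every byte below 128 stands for the ASCII character itself. Decoding the resulting tokens is left to the application. An encoding such as UTF-16, which is not ASCII transparent, would not work with this approach.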
The '#?' character combination used as a top level node (not necessarily the first one) is reserved for communication between the OGDL stream and the parser. It is not mandatory and allows for future enhancements of the standard, if any. For example, some optional behavior could be switched on. Normally meta-information will not be part of the in-memory graph. Meta-information is written in OGDL, as can be seen in the following examples.
#? ogdl 1.0
#? ( ogdl 1.0, encoding iso-8859-1 )
The meta-information keys that are currently reserved are: ogdl, encoding and schema.
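A reader for the two example lines above could look like the Python sketch below. The name `parse_meta` is hypothetical, and the sketch handles only flat key and value pairs without quoting, which is enough for the examples but is not a general OGDL parser.

```python
def parse_meta(line):
    """Read a '#?' line into a dict of key/value pairs; None if not meta."""
    text = line.strip()
    if not text.startswith("#?"):
        return None
    text = text[2:].strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]                # drop the surrounding group
    meta = {}
    for item in text.split(","):         # the comma separates sibling keys
        parts = item.split()
        if parts:
            meta[parts[0]] = parts[1] if len(parts) > 1 else None
    return meta
```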
OGDL streams are guaranteed to round-trip in the presence of a capable parser and emitter, while maintaining a simple in-memory structure of nested nodes. Depending on the precision of the parser-emitter chain, the resulting stream may or may not differ from the original in format. Comments are normally not preserved.
20110920 Level 2 simplified (#{N)
Deleted '--' as break.
Production 1 simplified (assume valid chars)
(short form for tables not included because
it adds confusion in case of cycles).
20051220 Comments are thrown away. The tentative part
(tables) is left out for this version.
Nodes after groups not allowed.
20051215 Space after '#' not needed in comments.
Other small corrections.
20050403 Some descriptive text added.
Defined a new EOS sequence consisting of two dashes.
Make optional the spaces around '(', ')' and ','
20040614 Tabs and spaces can not be intermixed in indentation.
20040305 Comments, meta-information added.
Semicolons deleted, comma changes meaning.
Round-tripping chapter added.
Some productions commented.
20031117 Renamed to Version 1.0
New cycle productions (were &{} and *{}).
Unicode BOM mandates Unicode stream.
Implementor decides whether he/she needs level 2.
20030902 Initial release