Internet-Draft | CBOR Extended Diagnostic Notation (EDN) | November 2024 |
Bormann | Expires 7 May 2025 |
The Concise Binary Object Representation (CBOR) (STD 94, RFC 8949) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation.¶
In addition to the binary interchange format, CBOR from the outset (RFC 7049) defined a text-based "diagnostic notation" in order to be able to converse about CBOR data items without having to resort to binary data. RFC 8610 extended this into what is known as Extended Diagnostic Notation (EDN).¶
This document consolidates the definition of EDN, sets forth a further step of its evolution, and is intended to serve as a single reference target in specifications that use EDN.¶
It specifies an extension point for adding application-oriented extensions to the diagnostic notation. It then defines two such extensions that enhance EDN with text representations of epoch-based date/times and of IP addresses and prefixes (RFC 9164).¶
A few further additions close some gaps in usability. The document modifies one extension originally specified in Appendix G.4 of RFC 8610 to further increase usability. To facilitate tool interoperation, this document specifies a formal ABNF grammar, and it adds media types.¶
(This "cref" paragraph will be removed by the RFC editor:)
The present revision -13 reflects the branches "roll-up" and "roll-up-2" in the repository, an attempt to contain the entire specification of EDN in this document, instead of describing updates to the existing documents RFC 8949 and RFC 8610. Editorial work on the branch "roll-up-2" might continue. The exact reflection of this document being a replacement for both Section 8 of RFC 8949 and Appendix G of RFC 8610 needs to be recorded in the metadata and in abstract and introduction.¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://cbor-wg.github.io/edn-literal/. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-ietf-cbor-edn-literals/.¶
Discussion of this document takes place on the cbor Working Group mailing list (mailto:[email protected]), which is archived at https://mailarchive.ietf.org/arch/browse/cbor/. Subscribe at https://www.ietf.org/mailman/listinfo/cbor/.¶
Source for this draft and an issue tracker can be found at https://github.com/cbor-wg/edn-literal.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 7 May 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The Concise Binary Object Representation (CBOR) (STD 94, RFC 8949) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation.¶
In addition to the binary interchange format, CBOR from the outset (Section 6 of [RFC7049], now Section 8 of RFC 8949 [STD94]) defined a text-based "diagnostic notation" in order to be able to converse about CBOR data items without having to resort to binary data. Appendix G of [RFC8610] extended this into what is known as Extended Diagnostic Notation (EDN).¶
Diagnostic notation syntax is based on JSON, with extensions for representing CBOR constructs such as binary data and tags. (Standardizing this together with the actual interchange format does not serve to create another interchange format, but enables the use of a shared diagnostic notation in tools for and in documents about CBOR.)¶
This document consolidates the definition of EDN, sets forth a further step of its evolution, and is intended to serve as a single reference target in specifications that use EDN.¶
It specifies an extension point for adding application-oriented extensions to the diagnostic notation. It then defines two such extensions that enhance EDN with text representations of epoch-based date/times and of IP addresses and prefixes [RFC9164].¶
A few further additions close some gaps in usability. The document modifies one extension originally specified in Appendix G.4 of [RFC8610] to further increase usability. To facilitate tool interoperation, this document specifies a formal ABNF grammar. (See Section 5.1 for the overall ABNF grammar and Section 5.2 for the ABNF grammars of both the byte string presentations predefined in [STD94] and the application-extensions defined here.)¶
In addition, this document finally registers a media type identifier and a content-format for CBOR diagnostic notation. This does not elevate its status as an interchange format, but recognizes that interaction between tools is often smoother if media types can be used.¶
Note that EDN is not meant to be the only text-based representation of CBOR data items. For instance, [YAML] [RFC9512] is able to represent most CBOR data items, possibly requiring use of YAML's extension points. YAML does not provide certain features that can be useful with tools and documents needing text-based representations of CBOR data items (such as embedded CBOR or encoding indicators), but it does provide a host of other features that EDN does not provide such as anchor/alias data sharing, at a cost of higher implementation and learning complexity.¶
Section 2 of this document has been built from Section 8 of RFC 8949 [STD94] and Appendix G of [RFC8610]. The latter provided a number of useful extensions to the diagnostic notation originally defined in Section 6 of [RFC7049]. Section 8 of RFC 8949 [STD94] and Appendix G of [RFC8610] have collectively been called "Extended Diagnostic Notation" (EDN), giving the present document its name.¶
After introductory material, Section 3 introduces the concept of application-oriented extension literals and defines the "dt" and "ip" extensions. Section 4 defines mechanisms for dealing with unknown application-oriented literals and deliberately elided information. Section 5 gives the formal syntax of EDN in ABNF, with explanations for some features of and additions to this syntax, as an overall grammar (Section 5.1) and specific grammars for the content of app-string and byte-string literals (Section 5.2). This is followed by the conventional sections for IANA Considerations (6), Security considerations (7), and References (8.1, 8.2). An informational comparison of EDN with CDDL follows in Appendix A.¶
Section 8 of RFC 8949 [STD94] defines the original CBOR diagnostic notation, and Appendix G of [RFC8610] supplies a number of extensions to the diagnostic notation that result in the Extended Diagnostic Notation (EDN). The diagnostic notation extensions include popular features such as embedded CBOR (encoded CBOR data items in byte strings) and comments. A simple diagnostic notation extension that enables representing CBOR sequences was added in Section 4.2 of [RFC8742]. As diagnostic notation is not used in the kind of interchange situations where backward compatibility would pose a significant obstacle, there is little point in not using these extensions.¶
Therefore, when we refer to "diagnostic notation", we mean to include the original notation from Section 8 of RFC 8949 [STD94] as well as the extensions from Appendix G of [RFC8610], Section 4.2 of [RFC8742], and the present document. However, we stick to the abbreviation "EDN" as it has become quite popular and is more sharply distinguishable from other meanings than "DN" would be.¶
In a similar vein, the term "ABNF" in this document refers to the language defined in [STD68] as extended in [RFC7405], where the "characters" of Section 2.3 of RFC 5234 [STD68] are Unicode scalar values.¶
The term "CDDL" (Concise Data Definition Language) refers to the data definition language defined in [RFC8610] and its registered extensions (such as those in [RFC9165]), as well as [I-D.ietf-cbor-update-8610-grammar]. Additional information about the relationship between the two languages EDN and CDDL is captured in Appendix A.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [BCP14] (RFC2119) (RFC8174) when, and only when, they appear in all capitals, as shown here.¶
Section 8 of RFC 8949 [STD94] states the objective of defining a common human-readable diagnostic notation with CBOR. In particular, it states:¶
All actual interchange always happens in the binary format.¶
One important application of EDN is the notation of CBOR data for humans: in specifications, on whiteboards, and for entering test data. A number of features, such as comments inside prefixed string literals, are mainly useful for people-to-people communication via EDN. Programs also often output EDN for diagnostic purposes, such as in error messages or to enable comparison (including generation of diffs via tools) with test data.¶
For comparison with test data, it is often useful if different implementations generate the same (or similar) output for the same CBOR data items. This is comparable to the objectives of deterministic serialization for CBOR data items themselves (Section 4.2 of RFC 8949 [STD94]). However, there are even more representation variants in EDN than in binary CBOR, and there is little point in specifically endorsing a single variant as "deterministic" when other variants may be more useful for human understanding, e.g., the << >> notation as opposed to h''; an EDN generator may have quite a few options that control what presentation variant is most desirable for the application that it is being used for.¶
Because of this, a deterministic representation is not defined for EDN, and there is no expectation for "roundtripping" from EDN to CBOR and back, i.e., for an ability to convert EDN to binary CBOR and back to EDN while achieving exactly the same result as the original input EDN — the original EDN possibly was created by humans or by a different EDN generator.¶
However, there is a certain expectation that EDN generators can be configured to some basic output format, which:¶
looks like JSON where that is possible;¶
inserts encoding indicators only where the binary form differs from preferred encoding;¶
uses hexadecimal representation (h'') for byte strings, not b64'' or embedded CBOR (<<>>);¶
does not generate elaborate blank space (newlines, indentation) for pretty-printing, but does use common blank spaces such as after "," and ":".¶
Additional features such as ensuring deterministic map ordering (Section 4.2 of RFC 8949 [STD94]) on output, or even deviating from the basic configuration in some systematic way, can further assist in comparing test data. Information obtained from a CDDL model can help in choosing application-oriented literals or specific string representations such as embedded CBOR or b64'' in the appropriate places.¶
CBOR is a binary interchange format. To facilitate documentation and debugging, and in particular to facilitate communication between entities cooperating in debugging, this document defines a simple human-readable diagnostic notation. All actual interchange always happens in the binary format.¶
Note that diagnostic notation truly was designed as a diagnostic format; it originally was not meant to be parsed. Therefore, no formal definition (as in ABNF) was given in the original documents. Recognizing that formal grammars can aid interoperation of tools and usability of documents that employ EDN, Section 5 now provides ABNF definitions.¶
EDN is a true superset of JSON as it is defined in [STD90] in conjunction with [RFC7493] (that is, any interoperable [RFC7493] JSON text also is an EDN text), extending it both to cover the greater expressiveness of CBOR and to increase its usability.¶
EDN borrows the JSON syntax for numbers (integer and floating-point, Section 2.3), certain simple values (Section 2.7), UTF-8 [STD63] text strings, arrays, and maps (maps are called objects in JSON; the diagnostic notation extends JSON here by allowing any data item in the map key position).¶
As EDN is used for truly diagnostic purposes, its implementations MAY support generation and possibly ingestion of EDN for CBOR data items that are well-formed but not valid. It is RECOMMENDED that an implementation enables such usage only explicitly by an API flag. Validity of CBOR data items is discussed in Section 5.3 of RFC 8949 [STD94], with basic validity discussed in Section 5.3.1 of RFC 8949 [STD94], and tag validity discussed in Section 5.3.2 of RFC 8949 [STD94]. Tag validity is more likely a subject for individual application-oriented extensions, while the two cases of basic validity (for text strings and for maps) are addressed in Sections 2.4.5 and 2.5.2 under the heading of validity.¶
The rest of this section provides an overview of specific features of EDN, starting with certain common syntactical features and then going through kinds of CBOR data items roughly in the order of CBOR major types. Any additional detailed syntax discussion needed has been deferred to Section 5.1.¶
Sometimes it is useful to indicate in the diagnostic notation which of several alternative representations was actually used; for example, a data item written >1.5< by a diagnostic decoder might have been encoded as a half-, single-, or double-precision float.¶
The convention for encoding indicators is that anything starting with an underscore and all immediately following characters that are alphanumeric or underscore is an encoding indicator, and can be ignored by anyone not interested in this information. For example, _ or _3. Encoding indicators are always optional.¶
(In the following, an abbreviation of the form ai=nn gives nn as the numeric value of the field additional information, the low-order 5 bits of the initial byte: see Section 3 of RFC 8949 [STD94].)¶
An underscore followed by a decimal digit n indicates that the preceding item (or, for arrays and maps, the item starting with the preceding bracket or brace) was encoded with an additional information value of ai=24+n. For example, 1.5_1 is a half-precision floating-point number, while 1.5_3 is encoded as double precision.¶
The encoding indicator _ is an abbreviation of what would in full form be _7, which is not used. Therefore, an underscore _ on its own stands for indefinite length encoding (ai=31). (Note that this encoding indicator is only available behind the opening brace/bracket for map and array (Section 2.5.1): strings have a special syntax streamstring for indefinite length encoding except for the special cases ''_ and ""_ (Section 2.4.2).)¶
The encoding indicators _0 to _3 can be used to indicate ai=24 to ai=27, respectively.¶
Surprisingly, Section 8.1 of RFC 8949 [STD94] does not address ai=0 to ai=23; the assumption seems to have been that preferred serialization (Section 4.1 of RFC 8949 [STD94]) will be used when converting CBOR diagnostic notation to an encoded CBOR data item, so leaving out the encoding indicator for a data item with a preferred serialization will implicitly use ai=0 to ai=23 if that is possible. The present specification allows making this explicit:¶
_i ("immediate") stands for encoding with ai=0 to ai=23.¶
While no pressing use for further values for encoding indicators comes to mind, this is an extension point for EDN; Section 6.2 defines a registry for additional values.¶
Encoding Indicators are discussed in further detail in Section 2.4.2 for indefinite length strings and in Section 2.5.1 for arrays and maps.¶
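The mapping from encoding indicators to additional information values described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names indicator_to_ai and encode_float are hypothetical and not part of this specification, and only the floating-point case (major type 7) is encoded.

```python
import struct

def indicator_to_ai(indicator: str) -> int:
    """Map an EDN encoding indicator to its additional information (ai) value."""
    if indicator == "_":
        # "_" abbreviates the unused full form "_7": 24 + 7 = 31.
        return 31
    if indicator in ("_0", "_1", "_2", "_3"):
        return 24 + int(indicator[1])
    # "_i" is left out of this sketch: it denotes the range ai=0..23,
    # not a single value.
    raise ValueError(f"unsupported encoding indicator: {indicator!r}")

# struct format codes for half, single, and double precision, keyed by ai.
_FLOAT_FORMATS = {25: ">e", 26: ">f", 27: ">d"}

def encode_float(value: float, indicator: str) -> bytes:
    """Encode a float (major type 7) with the width the indicator selects."""
    ai = indicator_to_ai(indicator)
    if ai not in _FLOAT_FORMATS:
        raise ValueError(f"{indicator!r} does not select a float width")
    return bytes([(7 << 5) | ai]) + struct.pack(_FLOAT_FORMATS[ai], value)

print(encode_float(1.5, "_1").hex())  # f93e00
print(encode_float(1.5, "_3").hex())  # fb3ff8000000000000
```

The two printed values correspond to the literals 1.5_1 and 1.5_3 used as examples above.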
In addition to JSON's decimal number literals, EDN provides hexadecimal, octal, and binary number literals in the usual C-language notation (octal with 0o prefix present only).¶
The following are equivalent:¶
4711 0x1267 0o11147 0b1001001100111¶
As are:¶
1.5 0x1.8p0 0x18p-4¶
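These equivalences can be checked with a short Python sketch. It relies on the fact that Python's own prefix syntax happens to coincide with EDN for these particular literals; it is not a general EDN number parser (for example, int(text, 0) rejects leading zeros as in 001, which EDN accepts).

```python
# Integer literals from the example above, in decimal, hex, octal, and binary.
integer_literals = ["4711", "0x1267", "0o11147", "0b1001001100111"]
# Base 0 tells int() to honor the 0x / 0o / 0b prefixes.
integer_values = [int(text, 0) for text in integer_literals]

# Floating-point literals: one decimal, two hexadecimal with a binary exponent.
float_values = [float("1.5"), float.fromhex("0x1.8p0"), float.fromhex("0x18p-4")]

print(integer_values)  # [4711, 4711, 4711, 4711]
print(float_values)    # [1.5, 1.5, 1.5]
```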
Numbers composed only of digits (of the respective base) are interpreted as CBOR integers (major type 0/1, or where the number cannot be represented in this way, major type 6 with tag 2/3). A leading "+" sign is a no-op, and a leading "-" sign inverts the sign of the number. So 0, 000, +0 all represent the same integer zero, as does -0; 1, 001, +1 and +0001 all stand for the same integer one, and -1 and -0001 both designate the same integer minus one.¶
Using a decimal point (.) and/or an exponent (e for decimal, p for hexadecimal) turns the number into a floating point number (major type 7) instead, irrespective of whether it is an integral number mathematically. Note that, in floating point numbers, 0.0 is not the same number as -0.0, even if they are mathematically equal.¶
The non-finite floating-point numbers Infinity, -Infinity, and NaN are written exactly as in this sentence (this is also a way they can be written in JavaScript, although JSON does not allow them).¶
See Section 5.1, Paragraph 7, Item 3 for additional details of the EDN number syntax.¶
(Note that literals for further number formats, e.g., for representing rational numbers as fractions, or for NaNs with non-zero payloads, can be added as application-oriented literals. Background information beyond that in [STD94] about the representation of numbers in CBOR can be found in the informational document [I-D.bormann-cbor-numbers].)¶
CBOR distinguishes two kinds of strings: text strings (the bytes in the string constitute UTF-8 [STD63] text, major type 3), and byte strings (CBOR does not further characterize the bytes that constitute the string, major type 2).¶
EDN notates text strings in a form compatible with that of notating text strings in JSON (i.e., as a double-quoted string literal), with a number of usability extensions. In JSON, no control characters are allowed to occur directly in text string literals; if needed, they can be specified using escapes such as \t or \r. In EDN, string literals additionally can contain newlines (LINEFEED U+000A), which are copied into the resulting string like other characters in the string literal. To deal with variability in platform presentation of newlines, any carriage return characters (U+000D) that may be present in the EDN string literal are not copied into the resulting string (see Section 5.1, Paragraph 7, Item 2). No other control characters can occur directly in a string literal, and the handling of escaped characters (\r etc.) is as in JSON.¶
JSON's escape scheme for characters that are not on Unicode's basic multilingual plane (BMP) is cumbersome. EDN keeps it, but also adds the syntax \u{NNN} where NNN is the Unicode scalar value as a hexadecimal number. This means the following are equivalent (the first o is escaped as \u{6f} for no particular reason):¶
"D\u{6f}mino's \u{1F073} + \u{2318}" # \u{}-escape 3 chars
"Domino's \uD83C\uDC73 + \u2318"     # escape JSON-like
"Domino's 🁳 + ⌘"                     # unescaped¶
EDN adds a number of ways to notate byte strings, some of which provide detailed access to the bits within those bytes (see Section 2.4.3). However, quite often, byte strings carry bytes that can be meaningfully notated as UTF-8 text. Analogously to text string literals delimited by double quotes, EDN allows the use of single quotes (without a prefix) to express byte string literals with UTF-8 text; for instance, the following are equivalent:¶
'hello world' h'68656c6c6f20776f726c64'¶
The escaping rules of JSON strings are applied equivalently for text-based byte string literals, e.g., \\ stands for a single backslash and \' stands for a single quote. (See Section 5.1, Paragraph 7, Item 7 for details.)¶
Single-quoted string literals can be prefixed by a sequence of ASCII letters and digits, starting with a letter, and using either lower case or upper case throughout. >false<, >true<, >null<, and >undefined< cannot be used as such prefixes. This means that the text string value (the "content") of the single-quoted string literal is not used directly as a byte string, but is further processed in a way that is defined by the meaning given to the prefix. Depending on the prefix, the result of that processing can, but need not be, a byte string value.¶
Prefixed string literals (which are always single-quoted after the prefix) are used both for base-encoded byte string literals (see Section 2.4.3) and for application-oriented extension literals (called app-string, see Section 3). (Additional base-encoded string literals can be defined as application-oriented extension literals by registering their prefixes; there is no fundamental difference between the two predefined base-encoded string literal prefixes (h, b64) and any such potential future extension literal prefixes.)¶
The detailed chunk structure of byte and text strings encoded with indefinite length can be notated in the form (_ h'0123', h'4567') and (_ "foo", "bar"). However, for an indefinite-length string with no chunks inside, (_ ) would be ambiguous as to whether a byte string (encoded 0x5fff) or a text string (encoded 0x7fff) is meant and is therefore not used. The basic forms ''_ and ""_ can be used instead and are reserved for the case of no chunks only, not as short forms for the (permitted, but not really useful) encodings with only empty chunks, which need to be notated as (_ ''), (_ ""), etc., to preserve the chunk structure.¶
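The difference between "no chunks" and "only empty chunks" becomes visible in the encoded bytes. The following Python sketch, with the hypothetical helper encode_indefinite_string, encodes a list of chunks as an indefinite-length string; for brevity it assumes every chunk is shorter than 24 bytes so that the chunk length fits into the initial byte.

```python
def encode_indefinite_string(chunks: list, major_type: int) -> bytes:
    """Encode chunks as an indefinite-length byte (2) or text (3) string."""
    if major_type not in (2, 3):
        raise ValueError("major type must be 2 (bytes) or 3 (text)")
    out = bytearray([(major_type << 5) | 31])       # 0x5f or 0x7f
    for chunk in chunks:
        if len(chunk) >= 24:
            raise ValueError("this sketch only handles chunks under 24 bytes")
        out.append((major_type << 5) | len(chunk))  # definite-length chunk head
        out += chunk
    out.append(0xFF)                                # "break" stop code
    return bytes(out)

# (_ h'0123', h'4567')
print(encode_indefinite_string([bytes.fromhex("0123"), bytes.fromhex("4567")], 2).hex())
# ''_ (no chunks) versus (_ '') (one empty chunk)
print(encode_indefinite_string([], 2).hex())     # 5fff
print(encode_indefinite_string([b""], 2).hex())  # 5f40ff
```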
Besides the unprefixed byte string literals that are analogous to JSON text string literals, EDN provides base-encoded byte string literals. These are notated as prefixed string literals that carry one of the base encodings [RFC4648], without padding, i.e., the base encoding is enclosed in a single-quoted string literal, prefixed by >h< for base16 or >b64< for base64 or base64url (the actual encodings of the latter do not overlap, so the string remains unambiguous). For example, the byte string consisting of the four bytes 12 34 56 78 (given in hexadecimal here) could be written h'12345678' or b64'EjRWeA'.¶
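A decoder for the content of a b64 literal has to accept both alphabets and restore the padding that EDN omits. The following is a minimal Python sketch under those assumptions; the name decode_b64_literal is hypothetical, comments inside the literal are not handled, and Python's decoder is more lenient about stray characters than a strict implementation would be.

```python
import base64

def decode_b64_literal(content: str) -> bytes:
    """Decode the content of a b64'...' literal (base64 or base64url)."""
    compact = "".join(content.split())                 # drop blank space
    # Map the classic alphabet onto the URL-safe one so both are accepted.
    urlsafe = compact.translate(str.maketrans("+/", "-_"))
    padded = urlsafe + "=" * (-len(urlsafe) % 4)       # restore omitted padding
    return base64.urlsafe_b64decode(padded)

print(decode_b64_literal("EjRWeA").hex())  # 12345678
```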
(Note that Section 8 of RFC 8949 [STD94] also mentions >b32< for base32 and >h32< for base32hex. This has not been implemented widely and therefore is not directly included in this specification. These and further byte string formats now can easily be added back as application-oriented extension literals.)¶
Examples often benefit from some blank space (spaces, line breaks) in byte strings. In EDN, blank space is ignored in prefixed byte strings; for instance, the following are equivalent:¶
h'48656c6c6f20776f726c64'
h'48 65 6c 6c 6f 20 77 6f 72 6c 64'
h'4 86 56c 6c6f 20776 f726c64'¶
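Because blank space may fall anywhere, even between the two digits of a byte, a decoder needs to remove it before pairing up the hexadecimal digits. A small Python sketch, with the hypothetical name decode_hex_literal and without support for comments inside the literal:

```python
def decode_hex_literal(content: str) -> bytes:
    """Decode the content of an h'...' literal, ignoring blank space."""
    # bytes.fromhex() alone only tolerates blank space between whole bytes,
    # so all blank space is removed first.
    return bytes.fromhex("".join(content.split()))

for content in ("48656c6c6f20776f726c64",
                "48 65 6c 6c 6f 20 77 6f 72 6c 64",
                "4 86 56c 6c6f 20776 f726c64"):
    print(decode_hex_literal(content))  # b'Hello world' each time
```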
Note that the internal syntax of prefixed single-quote literals such as h'' and b64'' can allow comments as blank space (see Section 2.1). Since slash characters are part of the base64 alphabet, slash-delimited comments cannot be used in b64''; only end-of-line comments (starting with #) are available in b64 string literals.¶
h'68656c6c6f20776f726c64'
h'68 65 6c /doubled l!/ 6c 6f # hello
  20 /space/
  77 6f 72 6c 64' /world/¶
Where a byte string is to carry an embedded CBOR-encoded item, or more generally a sequence of zero or more such items, the diagnostic notation for these zero or more CBOR data items, separated by commas, can be enclosed in << and >> to notate the byte string resulting from encoding the data items and concatenating the result. For instance, the two columns in each of the following rows are equivalent:¶
<<1>>              h'01'
<<1, 2>>           h'0102'
<<"hello", null>>  h'65 68656c6c6f f6'
<<>>               h''¶
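The construction of an embedded CBOR byte string (encode each item, then concatenate) can be sketched in Python as follows. The encoder is a deliberately tiny, hypothetical helper that covers only integers, text strings, byte strings, booleans, and null; it is meant to reproduce the examples above, not to serve as a CBOR implementation.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode initial byte plus argument using the shortest form."""
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major_type << 5) | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit into 64 bits")

def encode(item) -> bytes:
    """Encode a small subset of Python values as CBOR data items."""
    if item is None:
        return b"\xf6"                                   # null
    if isinstance(item, bool):                           # check before int
        return b"\xf5" if item else b"\xf4"
    if isinstance(item, int):
        return encode_head(0, item) if item >= 0 else encode_head(1, -1 - item)
    if isinstance(item, str):
        data = item.encode("utf-8")
        return encode_head(3, len(data)) + data
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item
    raise TypeError(f"unsupported item type: {type(item).__name__}")

def embedded(*items) -> bytes:
    """Content of the byte string notated as <<item, item, ...>>."""
    return b"".join(encode(item) for item in items)

print(embedded(1, 2).hex())          # 0102
print(embedded("hello", None).hex()) # 6568656c6c6ff6
```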
To be valid CBOR, Section 5.3.1 of RFC 8949 [STD94] requires that text strings are byte sequences in UTF-8 [STD63] form. EDN provides several ways to construct such byte strings (see Section 5.1, Paragraph 7, Item 7 for details). These mechanisms might operate on subsequences that do not themselves constitute UTF-8, e.g., by building larger sequences out of concatenating the subsequences; for validity of a text string resulting from these mechanisms it is only of importance that the result is UTF-8. Both double-quoted and single-quoted string literals have been defined such that they lead to byte sequences that are UTF-8: the source language of EDN is UTF-8, and all escaping mechanisms lead only to adding further UTF-8 characters. Only prefixed string literals can generate non-UTF-8 byte sequences.¶
EDN borrows the JSON syntax for arrays and maps. (Maps are called objects in JSON.)¶
For maps, EDN extends the JSON syntax by allowing any data item in the map key position (before the colon).¶
JSON requires the use of a comma as a separator character between the elements of an array as well as between the members (key/value pairs) of a map. (These commas also were required in the original diagnostic notation defined in [STD94] and [RFC8610].) The separator commas are now optional in the places where EDN syntax allows commas. (Stylistically, leaving out the commas is more idiomatic when they occur at line breaks.)¶
In addition, EDN also allows, but does not require, a trailing comma before the closing bracket/brace, enabling an easier to maintain "terminator" style of their use.¶
In summary, the following eight examples are all equivalent:¶
[1, 2, 3]
[1, 2, 3,]
[1 2 3]
[1 2 3,]
[1 2, 3]
[1 2, 3,]
[1, 2 3]
[1, 2 3,]¶
as are¶
{1: "n", "x": "a"}
{1: "n", "x": "a",}
{1: "n" "x": "a"}
# etc.¶
A single underscore can be written after the opening brace of a map or the opening bracket of an array to indicate that the data item was represented in indefinite-length format. For example, [_ 1, 2] contains an indicator that an indefinite-length representation was used to represent the data item [1, 2].¶
EDN uses JSON syntax for the simple values True (>true<), False (>false<), and Null (>null<). Undefined is written >undefined< as in JavaScript.¶
Other simple values are given as "simple()" with the appropriate integer in the parentheses. For example, >simple(42)< indicates major type 7, value 42.¶
This document extends the syntax used in diagnostic notation for byte string literals to also be available for application-oriented extensions.¶
As per Section 8 of RFC 8949 [STD94], the diagnostic notation can notate byte strings in a number of [RFC4648] base encodings, where the encoded text is enclosed in single quotes, prefixed by an identifier (»h« for base16, »b32« for base32, »h32« for base32hex, »b64« for base64 or base64url).¶
This syntax can be thought to establish a name space, with the names "h", "b32", "h32", and "b64" taken, but other names being unallocated. The present specification defines additional names for this namespace, which we call application-extension identifiers. For the quoted string, the same rules apply as for byte strings. In particular, the escaping rules that were adapted from JSON strings are applied equivalently for application-oriented extensions, e.g., within the quoted string \\ stands for a single backslash and \' stands for a single quote.¶
An application-extension identifier is a name consisting of a lower-case ASCII letter (a-z) and zero or more additional ASCII characters that are either lower-case letters or digits (a-z0-9).¶
Application-extension identifiers are registered in a registry (Section 6.1).¶
Prefixing a single-quoted string, an application-extension identifier is used to build an application-oriented extension literal, which stands for a CBOR data item the value of which is derived from the text given in the single-quoted string using a procedure defined in the specification for an application-extension identifier.¶
An application-extension (such as dt) MAY also define the meaning of a variant prefix built out of the application-extension identifier by replacing each lower-case character by its upper-case counterpart (such as DT), for building an application-oriented extension literal using that all-uppercase variant as the prefix of a single-quoted string.¶
As a convention for such definitions, using the all-uppercase variant implies making use of a tag appropriate for this application-oriented extension (such as tag number 1 for DT).¶
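The rules for application-extension identifiers and their all-uppercase variants can be expressed as a short check. The Python sketch below uses the hypothetical name classify_prefix; it returns the lower-case identifier together with a flag saying whether the all-uppercase variant was used, and it rejects the four names that cannot serve as prefixes of single-quoted strings.

```python
import re

# A lower-case ASCII letter followed by lower-case letters or digits.
_IDENTIFIER = re.compile(r"[a-z][a-z0-9]*")
# Names that cannot be used as prefixes of single-quoted string literals.
_RESERVED = {"false", "true", "null", "undefined"}

def classify_prefix(prefix: str) -> tuple:
    """Return (identifier, uses_uppercase_variant) or raise ValueError."""
    lowered = prefix.lower()
    if not _IDENTIFIER.fullmatch(lowered) or lowered in _RESERVED:
        raise ValueError(f"not a usable prefix: {prefix!r}")
    if prefix == lowered:
        return lowered, False
    if prefix == lowered.upper():
        return lowered, True          # e.g. DT, conventionally implying a tag
    raise ValueError(f"mixed-case prefix: {prefix!r}")

print(classify_prefix("dt"))   # ('dt', False)
print(classify_prefix("DT"))   # ('dt', True)
print(classify_prefix("b64"))  # ('b64', False)
```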
Examples for application-oriented extensions to CBOR diagnostic notation can be found in the following sections.¶
The application-extension identifier "dt" is used to notate a date/time literal that can be used as an Epoch-Based Date/Time as per Section 3.4.2 of RFC 8949 [STD94].¶
The text of the literal is a Standard Date/Time String as per Section 3.4.1 of RFC 8949 [STD94].¶
The value of the literal is a number representing the result of a conversion of the given Standard Date/Time String to an Epoch-Based Date/Time. If fractional seconds are given in the text (production time-secfrac in Figure 4), the value is a floating-point number; the value is an integer number otherwise. In the all-upper-case variant of the app-prefix, the value is enclosed in tag number 1.¶
As an example, the CBOR diagnostic notation¶
dt'1969-07-21T02:56:16Z', dt'1969-07-21T02:56:16.5Z', DT'1969-07-21T02:56:16Z'¶
is equivalent to¶
-14159024, -14159023.5, 1(-14159024)¶
See Section 5.2.3 for an ABNF definition for the content of dt literals.¶
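The conversion performed by the dt extension can be sketched in Python as follows. The function name dt_value and the regular expression are assumptions made for this illustration; the expression only approximates the grammar referenced above, and leap seconds (a seconds field of 60) are not handled.

```python
import re
from datetime import datetime, timedelta, timezone

_DATE_TIME = re.compile(
    r"(\d{4})-(\d\d)-(\d\d)[Tt](\d\d):(\d\d):(\d\d)(\.\d+)?([Zz]|[+-]\d\d:\d\d)")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def dt_value(text: str):
    """Convert a date/time string to an epoch-based int or float."""
    match = _DATE_TIME.fullmatch(text)
    if match is None:
        raise ValueError(f"not a date/time string: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.group(1, 2, 3, 4, 5, 6))
    fraction, zone = match.group(7), match.group(8)
    if zone in ("Z", "z"):
        offset = timedelta(0)
    else:
        sign = -1 if zone[0] == "-" else 1
        offset = sign * timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
    utc = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc) - offset
    whole_seconds = (utc - _EPOCH) // timedelta(seconds=1)   # exact integer
    # A float results only if fractional seconds were written in the text.
    return whole_seconds + float(fraction) if fraction else whole_seconds

print(dt_value("1969-07-21T02:56:16Z"))    # -14159024
print(dt_value("1969-07-21T02:56:16.5Z"))  # -14159023.5
```

For the upper-case variant DT, the resulting number would additionally be wrapped in tag number 1, as in 1(-14159024).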
The application-extension identifier "ip" is used to notate an IP address literal that can be used as an IP address as per Section 3 of [RFC9164].¶
The text of the literal is an IPv4address or IPv6address as per Section 3.2.2 of [RFC3986].¶
With the lower-case app-string prefix ip, the value of the literal is a byte string representing the binary IP address. With the upper-case app-string prefix IP, the literal is such a byte string tagged with tag number 54, if an IPv6address is used, or tag number 52, if an IPv4address is used.¶
As an additional case, the upper-case app-string prefix IP'' can be used with an IP address prefix such as 2001:db8::/56 or 192.0.2.0/24, with the equivalent tag as its value. (Note that [RFC9164] representations of address prefixes need to implement the truncation of the address byte string as described in Section 4.2 of [RFC9164]; see example below.) For completeness, the lower-case variant ip'2001:db8::/56' or ip'192.0.2.0/24' stands for an unwrapped [56,h'20010db8'] or [24,h'c00002']; however, in this case the information on whether an address is IPv4 or IPv6 often needs to come from the context.¶
Note that there is no direct representation of the "Interface format" defined in Section 3.1.3 of [RFC9164], an address combined with an optional prefix length and an optional zone identifier. This can be represented as in 52([ip'192.0.2.42',24]), if needed.¶
Examples: the CBOR diagnostic notation¶
ip'192.0.2.42', IP'192.0.2.42', IP'192.0.2.0/24', ip'2001:db8::42', IP'2001:db8::42', IP'2001:db8::/64'¶
is equivalent to¶
h'c000022a', 52(h'c000022a'), 52([24,h'c00002']), h'20010db8000000000000000000000042', 54(h'20010db8000000000000000000000042'), 54([64,h'20010db8'])¶
See Section 5.2.4 for an ABNF definition for the content of ip literals.¶
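The behavior of the ip extension, including the truncation of trailing zero bytes for prefixes, can be approximated with Python's standard ipaddress module. In this sketch the function name ip_value is hypothetical, a tagged value is modeled as a (tag number, content) tuple, and the text forms accepted by the ipaddress module are assumed to be close enough to the IPv4address and IPv6address productions for the examples shown.

```python
import ipaddress

# Tag numbers used with the upper-case IP prefix, keyed by IP version.
_TAG_NUMBERS = {4: 52, 6: 54}

def ip_value(text: str, tagged: bool = False):
    """Value of ip'text' (tagged=False) or IP'text' (tagged=True)."""
    if "/" in text:
        # Strict parsing rejects prefixes with bits set beyond the prefix length.
        network = ipaddress.ip_network(text)
        address_bytes = network.network_address.packed.rstrip(b"\x00")
        value = [network.prefixlen, address_bytes]
        version = network.version
    else:
        address = ipaddress.ip_address(text)
        value = address.packed
        version = address.version
    return (_TAG_NUMBERS[version], value) if tagged else value

print(ip_value("192.0.2.42").hex())                # c000022a
print(ip_value("192.0.2.0/24", tagged=True))       # (52, [24, b'\xc0\x00\x02'])
print(ip_value("2001:db8::/64", tagged=True)[0])   # 54
```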
In some cases, an EDN consumer cannot construct actual CBOR items that represent the CBOR data intended for eventual interchange. This document defines stand-in representations for two such cases:¶
The EDN consumer does not know (or does not implement) an application-extension identifier used in the EDN document (Section 4.1) but wants to preserve the information for a later processor.¶
The generator of some EDN intended for human consumption (such as in a specification document) may not want to include parts of the final data item, destructively replacing complete subtrees or possibly just parts of a lengthy string by elisions (Section 4.2).¶
Implementation note: Typically, the ultimate applications will fail if they encounter tags unknown to them, which the ones defined in this section likely are. Where chains of tools are involved in processing EDN, it may be useful to fail earlier than at the ultimate receiver in the chain unless specific processing options (e.g., command line flags) are given that indicate which of these stand-ins are expected at this stage in the chain.¶
When ingesting CBOR diagnostic notation, any application-oriented extension literals are usually decoded and transformed into the corresponding data item during ingestion. If an application-extension is not known or not implemented by the ingesting process, this is usually an error and processing has to stop.¶
However, in certain cases, it can be desirable to exceptionally carry an uninterpreted application-oriented extension literal in an ingested data item, so that its decoding can be postponed to a specific later stage of ingestion.¶
This specification defines a CBOR Tag for this purpose: The Diagnostic Notation Unresolved Application-Extension Tag, tag number CPA999 (Section 6.5). The content of this tag is an array of two text strings: the application-extension identifier, and the (escape-processed) content of the single-quoted string. For example, dt'1969-07-21T02:56:16Z' can be provisionally represented as /CPA/ 999(["dt", "1969-07-21T02:56:16Z"]).¶
If a stage of ingestion is not prepared to handle the Unresolved Application-Extension Tag, this is an error and processing has to stop, as if this stage had been ingesting an unknown or unimplemented application-extension literal itself.¶
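The deferral described above might look as follows in a two-stage tool chain. This is a sketch under stated assumptions: the names `Tag`, `defer`, `resolve`, and `RESOLVERS` are hypothetical, the tag number 999 is the suggested (not yet assigned) value, and the `dt` resolver handles only the simple form ending in "Z" without fractional seconds.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict

UNRESOLVED_TAG = 999  # "CPA999" in the text; the final value is up to IANA


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a CBOR tag (tag number plus content)."""
    number: int
    content: Any


def defer(identifier: str, content: str) -> Tag:
    """Early stage: carry the literal uninterpreted inside the stand-in tag."""
    return Tag(UNRESOLVED_TAG, [identifier, content])


def resolve_dt(content: str) -> int:
    """Assumed minimal dt resolver: whole seconds, UTC ("Z") only."""
    parsed = datetime.strptime(content, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


RESOLVERS: Dict[str, Callable[[str], Any]] = {"dt": resolve_dt}


def resolve(item: Any) -> Any:
    """Later stage: decode a stand-in, or stop if the extension is unknown."""
    if isinstance(item, Tag) and item.number == UNRESOLVED_TAG:
        identifier, content = item.content
        if identifier not in RESOLVERS:
            raise ValueError(f"unknown application-extension {identifier!r}")
        return RESOLVERS[identifier](content)
    return item
```

A stage that does not know `dt` would call `defer("dt", "1969-07-21T02:56:16Z")`; a later stage calling `resolve` on the result obtains the epoch-based value for that date/time.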
RFC-Editor: This document uses the CPA (code point allocation) convention described in [I-D.bormann-cbor-draft-numbers]. For each usage of the term "CPA", please remove the prefix "CPA" from the indicated value and replace the residue with the value assigned by IANA; perform an analogous substitution for all other occurrences of the prefix "CPA" in the document. Finally, please remove this note.¶
When using EDN for exposition in a document or on a whiteboard, it is often useful to be able to leave out parts of an EDN document that are not of interest at that point of the exposition.¶
To facilitate this, this specification supports the use of an ellipsis (notated as three or more dots in a row, as in ...) to indicate parts of an EDN document that have been elided (and therefore cannot be reconstructed).¶
Upon ingesting EDN as a representation of a CBOR data item for further processing, the occurrence of an ellipsis usually is an error and processing has to stop.¶
However, it is useful to be able to process EDN documents with ellipses in the automation scripts for the documents using them. This specification defines a CBOR Tag that can be used in the ingestion for this purpose: The Diagnostic Notation Ellipsis Tag, tag number CPA888 (Section 6.5). The content of this tag either is¶
null (indicating a data item entirely replaced by an ellipsis), or it is¶
an array, the elements of which are alternating between fragments of a string and the actual elisions, represented as ellipses carrying a null as content.¶
Elisions can stand in for entire subtrees, e.g. in:¶
[1, 2, ..., 3]
{ "a": 1, "b": ..., ...: ... }¶
A single ellipsis (or key/value pair of ellipses) can imply eliding multiple elements in an array (members in a map); if more detailed control is required, a data definition language such as CDDL can be employed. (Note that the stand-in form defined here does not allow multiple key/value pairs with an ellipsis as a key: the CBOR data item would not be valid.)¶
Subtree elisions can be represented in a CBOR data item by using /CPA/888(null) as the stand-in:¶
[1, 2, 888(null), 3]
{ "a": 1, "b": 888(null), 888(null): 888(null) }¶
Elisions also can be used as part of a (text or byte) string:¶
{ "contract": "Herewith I buy" + ... + "gned: Alice & Bob",
  "signature": h'4711...0815',
}¶
The example "contract" combines string concatenation via the + operator (Section 5.1) with ellipses; while the example "signature" uses special syntax that allows the use of ellipses between the bytes notated inside h'' literals.¶
String elisions can be represented in a CBOR data item by a stand-in that wraps an array of string fragments alternating with ellipsis indicators:¶
{ "contract": /CPA/888(["Herewith I buy", 888(null), "gned: Alice & Bob"]),
  "signature": 888([h'4711', 888(null), h'0815']),
}¶
Note that the use of elisions is different from "commenting out" EDN text, e.g.:¶
{
  "signature": h'4711/.../0815',
  # ...: ...
}¶
The consumer of this EDN will ignore the comments and therefore will have no idea after ingestion that some information has been elided; validation steps may then simply fail instead of being informed about the elisions.¶
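A tool in a processing chain that wants to fail early on stand-ins it does not expect could scan an ingested data item along the following lines. This is an illustrative sketch only: the `Tag` class, the function names, and the `allowed` parameter (modelling a command line flag) are assumptions, and the tag numbers are the suggested values 888 and 999.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Iterator, Tuple

ELLIPSIS_TAG = 888    # "CPA888" in the text
UNRESOLVED_TAG = 999  # "CPA999" in the text


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a CBOR tag (tag number plus content)."""
    number: int
    content: Any


def find_stand_ins(item: Any, path: str = "$") -> Iterator[Tuple[str, int]]:
    """Yield (path, tag number) for every stand-in tag inside item."""
    if isinstance(item, Tag):
        if item.number in (ELLIPSIS_TAG, UNRESOLVED_TAG):
            yield path, item.number
            return  # the inside of a stand-in is not inspected further
        yield from find_stand_ins(item.content, path + "()")
    elif isinstance(item, list):
        for index, element in enumerate(item):
            yield from find_stand_ins(element, f"{path}[{index}]")
    elif isinstance(item, dict):
        for key, value in item.items():
            yield from find_stand_ins(key, f"{path}{{key}}")
            yield from find_stand_ins(value, f"{path}[{key!r}]")


def check_stand_ins(item: Any, allowed: FrozenSet[int] = frozenset()) -> None:
    """Raise unless every stand-in found is explicitly allowed at this stage."""
    unexpected = [(p, n) for p, n in find_stand_ins(item) if n not in allowed]
    if unexpected:
        raise ValueError(f"unexpected stand-in tags: {unexpected}")
```

Unlike a commented-out portion, an elision carried this way remains visible after ingestion, so a validation step can report it instead of failing for an unrelated reason.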
This section collects grammars in ABNF form ([STD68] as extended in [RFC7405]) that serve to define the syntax of EDN and some application-oriented literals.¶
Implementation note: The ABNF definitions in this section are intended to be useful in a Parsing Expression Grammar (PEG) parser interpretation (see Appendix A of [RFC8610] for an introduction into PEG).¶
This subsection provides an overall ABNF definition for the syntax of CBOR extended diagnostic notation.¶
For simplicity, the internal parsing for the built-in EDN prefixes is specified in the same way. ABNF definitions for h'' and b64'' are provided in Section 5.2.1 and Section 5.2.2. However, the prefixes b32'' and h32'' are not in wide use and an ABNF definition in this document could therefore not be based on implementation experience.¶
While an ABNF grammar defines the set of character strings that are considered to be valid EDN by this ABNF, the mapping of these character strings into the generic data model of CBOR is not always obvious.¶
The following additional items should help in the interpretation:¶
As mentioned in the terminology (Section 1.2), the ABNF terminal values in this document define Unicode scalar values (characters) rather than their UTF-8 encoding. For example, the Unicode PLACE OF INTEREST SIGN (U+2318) would be defined in ABNF as %x2318.¶
Unicode CARRIAGE RETURN characters (U+000D, often seen escaped as "\r" in many programming languages) that exist in the input (unescaped) are ignored as if they were not in the input wherever they appear. This is most important when they are found in (text or byte) string contexts (see the "unescaped" ABNF rule). On some platforms, a carriage return is always added in front of a LINE FEED (U+000A, also often seen escaped as "\n" in many programming languages), but on other platforms, carriage returns are not used at line breaks. The intent behind ignoring unescaped carriage returns is to ensure that input generated or processed on either of these kinds of platforms will generate the same bytes in the CBOR data items created from that input. (Platforms that use just a CARRIAGE RETURN to signify an end of line are no longer relevant and the files they produce are out of scope for this document.) If a carriage return is needed in the CBOR data item, it can be added explicitly using the escaped form \r.¶
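One simple way to obtain this behavior, sketched here in Python, is to drop every U+000D from the input before parsing. The function name `drop_carriage_returns` is hypothetical, and a real implementation might instead handle this inside the grammar.

```python
def drop_carriage_returns(edn_text: str) -> str:
    """Remove unescaped CARRIAGE RETURN characters from EDN input.

    The escaped form (a backslash followed by the letter r) consists of two
    other characters and is therefore left alone for the parser to process.
    """
    return edn_text.replace("\r", "")
```

After this step, input with CRLF line ends and input with bare LF line ends are the same character string, so they lead to the same bytes in the resulting CBOR data item.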
decnumber stands for an integer in the usual decimal notation, unless at least one of the optional parts starting with "." and "e" is present, in which case it stands for a floating point value in the usual decimal notation. Note that the grammar now allows 3. for 3.0 and .3 for 0.3 (also for hexadecimal floating point below); implementers are advised that some platform numeric parsers accept only a subset of the floating point syntax in this document and may require some preprocessing to use here.¶
hexint, octint, and binint stand for an integer in the usual base 16/hexadecimal ("0x"), base 8/octal ("0o"), or base 2/binary ("0b") notation. hexfloat stands for a floating point number in the usual hexadecimal notation (which uses a mantissa in hexadecimal and an exponent in decimal notation, see Section 5.12.3 of [IEEE754], Section 6.4.4.2 of [C], or Section 5.13.4 of [Cplusplus]; floating-suffix/floating-point-suffix from the latter two is not used here).¶
For hexint, octint, binint, and when decnumber stands for an integer, the corresponding CBOR data item is represented using major type 0 or 1 if possible, or using tag 2 or 3 if not. In the latter case, this specification does not define any encoding indicators that apply. If fine control over encoding is desired, this can be expressed by being explicit about the representation as a tag: E.g., 987654321098765432310, which is equivalent to 2(h'35 8a 75 04 38 f3 80 f5 f6') in its preferred serialization, might be written as 2_3(h'00 00 00 35 8a 75 04 38 f3 80 f5 f6'_1) if leading zeros need to be added during serialization to obtain specific sizes for tag head, byte string head, and the overall byte string.¶
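The choice between major type 0/1 and tag 2/3 can be sketched in Python as follows. This is a minimal illustration of preferred serialization for integers only; the function names `encode_head` and `encode_integer` are hypothetical, and encoding indicators are not handled.

```python
def encode_head(major: int, argument: int) -> bytes:
    """Shortest head for a major type and an argument below 2**64."""
    if argument < 24:
        return bytes([major << 5 | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit into 64 bits")


def encode_integer(value: int) -> bytes:
    """Major type 0 or 1 if the value fits, otherwise a tag 2 or 3 bignum."""
    if 0 <= value < 1 << 64:
        return encode_head(0, value)
    if -(1 << 64) <= value < 0:
        return encode_head(1, -1 - value)
    # Bignum: tag 2 carries the value, tag 3 carries -1 - value
    tag, magnitude = (2, value) if value >= 0 else (3, -1 - value)
    payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    return encode_head(6, tag) + encode_head(2, len(payload)) + payload
```

For the value 987654321098765432310 used above, `encode_integer` produces a tag 2 head, a nine-byte byte string head, and the bytes 35 8a 75 04 38 f3 80 f5 f6.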
When decnumber stands for a floating point value, and for hexfloat and nonfin, a floating point data item with major type 7 is used in preferred serialization (unless modified by an encoding indicator, which then needs to be _1, _2, or _3). For this, the number range needs to fit into an [IEEE754] binary64 (or the size corresponding to the encoding indicator), and the precision will be adjusted to binary64 before further applying preferred serialization (or to the size corresponding to the encoding indicator). Tag 4/5 representations are not generated in these cases. Future app-prefixes could be defined to allow more control for obtaining a tag 4/5 representation directly from a hex or decimal floating point literal.¶
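Preferred serialization of a floating point value (without an encoding indicator) might be implemented as in the following Python sketch, which tries binary16 and binary32 before falling back to binary64. The function name `encode_float` is hypothetical, and the handling of NaN as the single quiet-NaN pattern f97e00 is a simplifying assumption.

```python
import math
import struct


def encode_float(value: float) -> bytes:
    """Shortest of binary16/32/64 that represents the binary64 value exactly."""
    if math.isnan(value):
        return bytes.fromhex("f97e00")  # assumption: NaN payloads are ignored
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this size
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)
```

Python's own parsers happen to accept the forms discussed here: `float("3.")`, `float(".3")`, and `float.fromhex("0x1.8p0")` all succeed, so `encode_float(float.fromhex("0x1.8p0"))` gives the same bytes as `encode_float(1.5)`. Other platforms may need the preprocessing mentioned above.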
spec stands for an encoding indicator. See Section 2.2 for details.¶
Extended diagnostic notation allows a (text or byte) string to be built up from multiple (text or byte) string literals, separated by a + operator; these are then concatenated into a single string.¶
string, string1e, string1, and ellipsis realize: (1) the representation of strings in this form split up into multiple chunks, and (2) the use of ellipses to represent elisions (Section 4.2).¶
Note that the syntax defined here for concatenation of components uses an explicit + operator between the components to be concatenated (Appendix G.4 of [RFC8610] used simple juxtaposition, which was not widely implemented and got in the way of making the use of commas optional in other places via the rule OC).¶
Text strings and byte strings do not mix within such a concatenation, except that byte string literal notation can be used inside a sequence of concatenated text string notation literals, to encode characters that may be better represented in an encoded way. The following four text string values (adapted from Appendix G.4 of [RFC8610] by updating to explicit + operators) are equivalent:¶
"Hello world"
"Hello " + "world"
"Hello" + h'20' + "world"
"" + h'48656c6c6f20776f726c64' + ""¶
Similarly, the following byte string values are equivalent:¶
'Hello world'
'Hello ' + 'world'
'Hello ' + h'776f726c64'
'Hello' + h'20' + 'world'
'' + h'48656c6c6f20776f726c64' + '' + b64''
h'4 86 56c 6c6f' + h' 20776 f726c64'¶
The semantic processing of these constructs is governed by the following rules:¶
A single ... is a general ellipsis, which by itself can stand for any data item. Multiple adjacent concatenated ellipses are equivalent to a single ellipsis.¶
An ellipsis can be concatenated (on one or both sides) with string chunks (string1); the result is a CBOR tag number CPA888 that contains an array with joined together spans of such chunks plus the ellipses represented by 888(null).¶
If there is no ellipsis in the concatenated list, the result of processing the list will always be a single item.¶
The bytes in the concatenated sequence of string chunks are simply joined together, proceeding from left to right. If the left hand side of a concatenation is a text string, the joining operation results in a text string, and that result needs to be valid UTF-8. If the left hand side is a byte string, the right hand side also needs to be a byte string.¶
Some of the strings may be app-strings. If the result type of the app-string is an actual (text or byte) string, joining of those string chunks occurs as with chunks directly notated as string literals; otherwise the occurrence of more than one app-string or an app-string together with a directly notated string cannot be processed.¶
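The rules above can be sketched in Python as follows, with text chunks as `str`, byte string chunks as `bytes`, and ellipses as a marker object. The names `Tag`, `Elision`, `ELLIPSIS`, `join_span`, and `concatenate` are hypothetical. Treating each span between ellipses as typed by its own leftmost chunk is an assumption of this sketch, and app-strings with non-string results are not covered.

```python
from dataclasses import dataclass
from typing import Any, List, Union

ELLIPSIS_TAG = 888  # "CPA888" in the text


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a CBOR tag (tag number plus content)."""
    number: int
    content: Any


class Elision:
    """Marker for an ellipsis ("...") in a concatenation."""


ELLIPSIS = Elision()
Chunk = Union[str, bytes, Elision]


def join_span(span: List[Union[str, bytes]]) -> Union[str, bytes]:
    """Join chunks left to right; the leftmost chunk decides the result type."""
    if isinstance(span[0], bytes):
        if not all(isinstance(chunk, bytes) for chunk in span):
            raise TypeError("text string chunk after a byte string chunk")
        return b"".join(span)
    raw = b"".join(c.encode("utf-8") if isinstance(c, str) else c for c in span)
    return raw.decode("utf-8")  # fails if the joined bytes are not valid UTF-8


def concatenate(chunks: List[Chunk]) -> Any:
    """Process the operands of a chain of + operators."""
    if not chunks:
        raise ValueError("nothing to concatenate")
    items: List[Any] = []
    span: List[Union[str, bytes]] = []
    for chunk in chunks:
        if isinstance(chunk, Elision):
            if span:
                items.append(join_span(span))
                span = []
            if not (items and isinstance(items[-1], Tag)):
                items.append(Tag(ELLIPSIS_TAG, None))  # adjacent ellipses merge
        else:
            span.append(chunk)
    if span:
        items.append(join_span(span))
    if len(items) == 1:
        return items[0]  # a plain string, or a lone general ellipsis
    return Tag(ELLIPSIS_TAG, items)
```

With these definitions, `concatenate(["Herewith I buy", ELLIPSIS, "gned: Alice & Bob"])` returns the equivalent of the "contract" stand-in shown in Section 4.2.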
This subsection provides ABNF definitions for the content of application-oriented extension literals defined in [STD94] and in this specification. These grammars describe the decoded content of the sqstr components that combine with the application-extension identifiers used as prefixes to form application-oriented extension literals. Each of these may integrate ABNF rules defined in Figure 1, which are not always repeated here.¶
The syntax of the content of byte strings represented in hex, such as h'', h'0815', or h'/head/ 63 /contents/ 66 6f 6f' (another representation of << "foo" >>), is described by the ABNF in Figure 2. This syntax accommodates both lower case and upper case hex digits, as well as blank space (including comments) around each hex digit.¶
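A decoder for a subset of this syntax can be sketched in Python as follows. The function name `decode_hex_content` is hypothetical, and the sketch covers only hex digits, blank space, and slash-delimited comments; end-of-line comments and ellipses inside h'' are not handled, and Python's `\s` is used as an approximation of the blank space the ABNF allows.

```python
import re


def decode_hex_content(content: str) -> bytes:
    """Decode the content of an h'' literal (subset: digits, blanks, /.../)."""
    without_comments = re.sub(r"/[^/]*/", "", content)
    digits = re.sub(r"\s+", "", without_comments)
    if len(digits) % 2:
        raise ValueError("odd number of hex digits")
    return bytes.fromhex(digits)  # accepts upper and lower case digits
```

Applied to the content `/head/ 63 /contents/ 66 6f 6f`, this returns the four bytes 63 66 6f 6f, that is, the encoded CBOR text string "foo".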
The syntax of the content of byte strings represented in base64 is described by the ABNF in Figure 3.¶
This syntax allows both the classic (Section 4 of [RFC4648]) and the URL-safe (Section 5 of [RFC4648]) alphabet to be used. It accommodates, but does not require base64 padding. Note that inclusion of classic base64 makes it impossible to have in-line comments in b64, as "/" is valid base64-classic.¶
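Accepting both alphabets and optional padding might be done as in the following Python sketch. The function name `decode_b64_content` is hypothetical; skipping blank space and accepting a mix of both alphabets in one literal are assumptions of this sketch rather than statements about the ABNF.

```python
import base64
import re


def decode_b64_content(content: str) -> bytes:
    """Decode the content of a b64'' literal, classic or URL-safe alphabet."""
    compact = re.sub(r"\s+", "", content).rstrip("=")
    classic = compact.translate(str.maketrans("-_", "+/"))
    padded = classic + "=" * (-len(classic) % 4)  # restore optional padding
    return base64.b64decode(padded, validate=True)
```

Because "/" is simply mapped to a base64 digit here, nothing in the content can be treated as an inline comment, which matches the note above.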
The syntax of the content of dt literals can be described by the ABNF for date-time from [RFC3339] as summarized in Section 3 of [RFC9165]:¶
The syntax of the content of ip literals can be described by the ABNF for IPv4address and IPv6address in Section 3.2.2 of [RFC3986], as included in slightly updated form in Figure 5.¶
RFC Editor: please replace RFC-XXXX with the RFC number of this RFC, [IANA.cbor-diagnostic-notation] with a reference to the new registry group, and remove this note.¶
IANA is requested to create an "Application-Extension Identifiers" registry in a new "CBOR Diagnostic Notation" registry group [IANA.cbor-diagnostic-notation], with the policy "expert review" (Section 4.5 of RFC 8126 [BCP26]).¶
The experts are instructed to be frugal in the allocation of application-extension identifiers that are suggestive of generally applicable semantics, keeping them in reserve for application-extensions that are likely to enjoy wide use and can make good use of their conciseness. The expert is also instructed to direct the registrant to provide a specification (Section 4.6 of RFC 8126 [BCP26]), but can make exceptions, for instance when a specification is not available at the time of registration but is likely forthcoming. If the expert becomes aware of application-extension identifiers that are deployed and in use, they may also initiate a registration on their own if they deem such a registration can avert potential future collisions.¶
Each entry in the registry must include:¶
a lower case ASCII [STD80] string that starts with a letter and can contain letters and digits after that ([a-z][a-z0-9]*). No other entry in the registry can have the same application-extension identifier.¶
a brief description¶
a reference document that provides a description of the application-extension identifier¶
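The syntactic constraint on registered identifiers can be checked mechanically, for instance as in this Python sketch. The name `is_registrable_identifier` is hypothetical; the check covers only the pattern given above, not uniqueness within the registry.

```python
import re

APP_EXTENSION_ID = re.compile(r"[a-z][a-z0-9]*")


def is_registrable_identifier(name: str) -> bool:
    """True if name matches the pattern required for registry entries."""
    return APP_EXTENSION_ID.fullmatch(name) is not None
```

Note that the registry holds only lower case identifiers, so an upper-case prefix such as IP does not match this pattern by itself.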
The initial content of the registry is shown in Table 1; all initial entries have the Change Controller "IETF".¶
Application-extension Identifier | Description | Reference |
---|---|---|
h | Reserved | RFC8949 |
b32 | Reserved | RFC8949 |
h32 | Reserved | RFC8949 |
b64 | Reserved | RFC8949 |
false | Reserved | RFC-XXXX |
true | Reserved | RFC-XXXX |
null | Reserved | RFC-XXXX |
undefined | Reserved | RFC-XXXX |
dt | Date/Time | RFC-XXXX |
ip | IP Address/Prefix | RFC-XXXX |
IANA is requested to create an "Encoding Indicators" registry in the newly created "CBOR Diagnostic Notation" registry group [IANA.cbor-diagnostic-notation], with the policy "specification required" (Section 4.6 of RFC 8126 [BCP26]).¶
The experts are instructed to be frugal in the allocation of encoding indicators that are suggestive of generally applicable semantics, keeping them in reserve for encoding indicator registrations that are likely to enjoy wide use and can make good use of their conciseness. If the expert becomes aware of encoding indicators that are deployed and in use, they may also solicit a specification and initiate a registration on their own if they deem such a registration can avert potential future collisions.¶
Each entry in the registry must include:¶
an ASCII [STD80] string that starts with an underscore character and can contain zero or more underscores, letters and digits after that (_[_A-Za-z0-9]*). No other entry in the registry can have the same Encoding Indicator.¶
a brief description. This description may employ an abbreviation of the form ai=nn, where nn is the numeric value of the field additional information, the low-order 5 bits of the initial byte (see Section 3 of RFC 8949 [STD94]).¶
a reference document that provides a description of the encoding indicator¶
The initial content of the registry is shown in Table 2; all initial entries have the Change Controller "IETF".¶
Encoding Indicator | Description | Reference |
---|---|---|
_ | Indefinite Length Encoding (ai=31) | RFC8949, RFC-XXXX |
_i | ai=0 to ai=23 | RFC-XXXX |
_0 | ai=24 | RFC8949, RFC-XXXX |
_1 | ai=25 | RFC8949, RFC-XXXX |
_2 | ai=26 | RFC8949, RFC-XXXX |
_3 | ai=27 | RFC8949, RFC-XXXX |
IANA is requested to add the following Media-Type to the "Media Types" registry [IANA.media-types].¶
Name | Template | Reference |
---|---|---|
cbor-diagnostic | application/cbor-diagnostic | RFC-XXXX, Section 6.3 |
Type name: application¶
Subtype name: cbor-diagnostic¶
Required parameters: N/A¶
Optional parameters: N/A¶
Encoding considerations: binary (UTF-8)¶
Interoperability considerations: none¶
Published specification: Section 6.3 of RFC XXXX¶
Applications that use this media type: Tools interchanging a human-readable form of CBOR¶
Fragment identifier considerations: The syntax and semantics of fragment identifiers is as specified for "application/cbor". (At publication of RFC XXXX, there is no fragment identification syntax defined for "application/cbor".)¶
Person & email address to contact for further information: CBOR WG mailing list ([email protected]), or IETF Applications and Real-Time Area ([email protected])¶
Intended usage: LIMITED USE¶
Restrictions on usage: CBOR diagnostic notation represents CBOR data items, which are the format intended for actual interchange. The media type application/cbor-diagnostic is intended to be used within documents about CBOR data items, in diagnostics for human consumption, and in other representations of CBOR data items that are necessarily text-based such as in configuration files or other data edited by humans, often under source-code control.¶
Author/Change controller: IETF¶
Provisional registration: no¶
IANA is requested to register a Content-Format number in the "CoAP Content-Formats" sub-registry, within the "Constrained RESTful Environments (CoRE) Parameters" Registry [IANA.core-parameters], as follows:¶
Content-Type | Content Coding | ID | Reference |
---|---|---|---|
application/cbor-diagnostic | - | TBD1 | RFC-XXXX |
TBD1 is to be assigned from the space 256..9999, according to the procedure "IETF Review or IESG Approval", preferably a number less than 1000.¶
RFC-Editor: This document uses the CPA (code point allocation) convention described in [I-D.bormann-cbor-draft-numbers]. For each usage of the term "CPA", please remove the prefix "CPA" from the indicated value and replace the residue with the value assigned by IANA; perform an analogous substitution for all other occurrences of the prefix "CPA" in the document. Finally, please remove this note.¶
In the "CBOR Tags" registry [IANA.cbor-tags], IANA is requested to assign the tags in Table 5 from the "specification required" space (suggested assignments: 888 and 999), with the present document as the specification reference.¶
Tag | Data Item | Semantics | Reference |
---|---|---|---|
CPA888 | null or array | Diagnostic Notation Ellipsis | RFC-XXXX |
CPA999 | array | Diagnostic Notation Unresolved Application-Extension | RFC-XXXX |
The security considerations of [STD94] and [RFC8610] apply.¶
The EDN specification provides two explicit extension points, application-extension identifiers (Section 6.1) and encoding indicators (Section 6.2). Extensions introduced this way can have their own security considerations (see, e.g., Section 5 of [I-D.ietf-cbor-edn-e-ref]). When implementing tools that support the use of EDN extensions, the implementer needs to be careful not to inadvertently introduce a vector for an attacker to invoke extensions not planned for by the tool operator, who might not have considered security considerations of specific extensions such as those posed by their use of dereferenceable identifiers (Section 6 of [I-D.bormann-t2trg-deref-id]). For instance, tools might require explicitly enabling the use of each extension that is not on an allowlist. This task can possibly be made less onerous by combining it with a mechanism for supplying any parameters controlling such an extension.¶
This appendix is for information.¶
EDN was designed as a language to provide a human-readable representation of an instance, i.e., a single CBOR data item or CBOR sequence. CDDL was designed as a language to describe an (often large) set of such instances (which itself constitutes a language), in the form of a data definition or grammar (or sometimes called schema).¶
The two languages share some similarities, not the least because they have mutually inspired each other. But they have very different roots:¶
EDN syntax is an extension to JSON syntax [STD90]. (Any (interoperable) JSON text is also valid EDN.)¶
For engineers that are using both EDN and CDDL, it is easy to write "CDDLisms" or "EDNisms" into their drafts that are meant to be in the other language. (This is one more of the many motivations to always validate formal language instances with tools.)¶
Important differences include:¶
Comment syntax. CDDL inherits ABNF's semicolon-delimited end-of-line comments, while EDN finds nothing in JSON that could be inherited here. Inspired by JavaScript, EDN simplifies JavaScript's copy of the original C comment syntax to be delimited by single slashes (where line breaks are not of interest); it also adds end-of-line comments starting with #.¶
Syntax for tags. CDDL's tag syntax is part of the system for referring to CBOR's fundamentals (the major type 6, in this case) and (with [I-D.ietf-cbor-update-8610-grammar]) allows specifying the actual tag number separately, while EDN's tag syntax is a simple decimal number and a pair of parentheses.¶
Embedded CBOR. EDN has a special syntax to describe the content of byte strings that are encoded CBOR data items. CDDL can specify these with a control operator, which looks very different.¶
The concept of application-oriented extensions to diagnostic notation, as well as the definition for the "dt" extension, were inspired by the CoRAL work by Klaus Hartke.¶
(TBD)¶
2.1. Comments
For presentation to humans, EDN text may benefit from comments. JSON famously does not provide for comments, and the original diagnostic notation in Section 6 of [RFC7049] inherited this property.¶
EDN now provides two comment syntaxes, which can be used where the syntax allows blank space (outside of constructs such as numbers, string literals, etc.):¶
inline comments, delimited by slashes ("/"):¶
In a position that allows blank space, any text within and including a pair of slashes is considered blank space (and thus effectively a comment).¶
end-of-line comments, delimited by "#" and an end of line (LINE FEED, U+000A):¶
In a position that allows blank space, any text within and including a pair of a "#" and the end of the line is considered blank space (and thus effectively a comment).¶
Comments can be used to annotate a CBOR structure as in:¶
or, combining the use of inline and end-of-line comments:¶