Internet-Draft | Packed CBOR: Table permutations | July 2024 |
Amsüss | Expires 30 January 2025 | [Page] |
Packed CBOR is a compression mechanism for Concise Binary Object Representation (CBOR) that can be used without a decompression step. This document introduces a means for altering configured tables to optimize the use of efficient encoding points by shuffling around entries in the packing tables.¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://chrysn.codeberg.page/packed-shuffle/draft-amsuess-cbor-packed-by-reference.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-amsuess-cbor-packed-shuffle/.¶
Discussion of this document takes place on the CBOR Working Group mailing list (mailto:[email protected]), which is archived at https://mailarchive.ietf.org/arch/browse/cbor/. Subscribe at https://www.ietf.org/mailman/listinfo/cbor/.¶
Source for this draft and an issue tracker can be found at https://codeberg.org/chrysn/packed-shuffle.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 30 January 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
[ See abstract. ]¶
This document uses the "CPA" convention of [I-D.bormann-cbor-draft-numbers]. If it reaches the RFC editing stage, as IANA processes the IANA Considerations, this note is to be removed, and all occurrences of "CPA" will have been replaced with allocated numbers.¶
CBOR tag CPA115 describes a permutation of the CBOR packing tables around a rump. Unlike tags CPA113 and most other table setup tags, it does not add items; instead, it modifies the table such that items with very short encoded lengths (most efficient are the first 16 items of the shared table, and the first item of the argument items in straight use, and the first 8 items of the inverted items) can be used even when they were not originally assigned to those positions.¶
Without excluding others, there are two groups of use cases envisioned:¶
Rearranging preconfigured tables.¶
If a media type comes with a pre-configured table, and/or when one ore more external tables are referenced (e.g. by using [I-D.amsuess-cbor-packed-by-reference]), a single tag CPA115 can be used around the main rump of the document to pull the most frequently used entries into the prominent positions.¶
Local optimization.¶
If some table entries are frequent in a sub-tree of the document, using tag CPA115 around that subtree can be beneficiall to overall compactness even when the table was set up locally (using tag CPA113) with an ordering suitable for the whole document.¶
Packed-Shuffle = #6.<cpa115>([ shuffle-shared, ?shuffle-argument, rump]) rump = any shuffle-shared = shuffle shuffle-argument = shuffle shuffle = [+(offset, ?length)] offset = uint length = nint cpa115 = 115 ; preliminary value, see IANA considerations¶
Inside the rump, each table has a permutation applied compared to the outer table,
described in the suffle-shared
value for the shared table,
and in the shuffle-argument
value for the argument table (which defaults to the identical permutation).¶
The applied permutations are described in inverse
(so that a table index used in the rump gets mapped to a table index outside this tag):
The item shuffle
consecutively lists the lowest items of the inner table
by indicating their postions in the outer table.
A negative number indicates that a group of in total (1 - length) items are included
consecutively starting with the item before the negative number.
This creates a kind of run-length encoding.
At the end of the shuffle
array,
all items that were not referenced
are appended in their original sequence
from the outer table.¶
An inner index can be calculated in a way suitable to constrained systems by applying the algorighm in Appendix A.¶
[ I don't think there's anything to add? ]¶
The security considerations of [I-D.ietf-cbor-packed] apply.¶
The following algorithm illustrates how table lookup can be implemented without the need for variable size storage per compression table.¶
An implementation in fixed memory is expected to accommodate up to a fixed size of nested table setup tags.
When parsing the shuffle item,
if may calculate early_item_count
and store it for later use,
along with a reference to the position of the shuffle table,
through which it iterates again for every item to be unpacked.¶
def early_item_count(shuffle): count = 0 for (offset, length) in shuffle: length = 1 if length is None else 1 - length count += length return count def lookup(index, shuffle, outertable): shuffle_early_items = early_item_count(shuffle) if index < shuffle_early_items: for (offset, length) in shuffle: length = 1 if length is None else 1 - length if index < length: return outertable[offset + index] index -= length else: outer_index = index - shuffle_early_items for (offset, length) in shuffle: length = 1 if length is None else 1 - length if offset <= outer_index: outer_index += length return outertable[outer_index]¶
For this to progress it will need some real use cases, not just synthetic examples.¶
An unoptimized version of this can be achieved with tag CPA113 from [I-D.ietf-cbor-packed], without the numeric encoding of items and the special encoding for consecutive items. Using CPA113 this way is relatively easy for shared items; for argument items, it requires referencing them from the shared table or performing a no-op concatenation with an empty item.¶
Is skipping of used values worth it? If we accepted that this tag produced not a permutation but an extended copy of the outer table, we could do without the double iteration / pre-computation of the number of items. This would be a simplification, and more similar to using CPA113. The advantage of the permutation construction is that the non-optimized entries are not pushed back more than necessary, saving a bit (in some cases, literally) in the encoded size of those points.¶
Is there maybe a different expression of the permutation that has the same nice properties added complexity? (Stating the number of items in front of the shuffle item saves the double processing, but adds redundant information and does not help with the branching in the implementation).¶
Whether this document should progress on from the initial version should depend on whether good use cases are identified and whether it outperforms alternative solutions by a sufficient margin.¶