On Encodings

  1. Encodings, In Detail
    1. Compact Integer Encodings
      1. Data Model Encodings
        1. path
        2. Capabilities
          1. Private Encodings
            1. Path Extends Path
              1. Area in Area
                1. Communal Capability

                Status: Proposal (as of 17.01.2024)

                Those encodings referenced from the Meadowcap specification have status Candidate.

                alj: TODO: improve the TOC rendering (styling, top, Home?)alj: TODO: fix preview iframe heightA perhaps curious feature of the Willow data model is that its specification does not talk about encodings. We11Let’s be honest: Aljoscha strongly believe that specifications should concern themselves with purely logical data types as long as possible, treating encodings as a minor and ultimately interchangeable detail. When specifications define concepts in terms of their encodings, results usually end up miserably underspecified (see JSON) or full of incidental complexity (see XML).

                Nevertheless, protocols that deal with persistent storage and network transmissions eventually have to serialise data. In this document, we give both some generic definitions around arbitrary encodings, and some specific encodings that recur throughout the Willow family of specifications.

                Encodings, In Detail

                The Willow protocols are generic over user-supplied parameters. Invariably, some values of the user-supplied types need to be encoded, so there also have to be user-supplied definitions of how to encode things. Hence, we need to precisely specify some properties and terminology around encodings.

                Let be a set of values we want to encode. Then an encoding relationIn mathy terms: an encoding relation is a left-unique, left-total binary relation on which defines a prefix code. assigns to each value in at least one (but possibly many) finite bytestrings, called the codes of that value. The assignment must satisfy the following properties:

                • each bytestring is the code of at most one value in , and
                • no code is a prefix of any other code.

                An encoding relation in which every value has exactly one corresponding codeThe one-to-one mapping between values and codes lets us deterministically hash or sign values. is called an encoding function.

                Quite often, we define an encoding relation for some set of values, and then specify a subset of its codes to obtain an encoding function. We call such a specialisation to a function a canonic subset or encoding.

                A nice technique for achieving small codes is to not encode a value itself, but to encode merely how a value differs from some reference value. This requires both the party doing the encoding and the party doing the decoding to know that reference value, of course. We formally capture this notion in the concepts of relative encoding relationsSince the definitions require rather careful parsing, they are followed by an example. and relative encoding functions:

                alj: TODO: fix font selection in Math post punctuationLet be a set of values we want to encode relative to some set of values . For each in let denote the set of values relative to which can be encoded. Then a relative encoding relation assigns to each pair of an in and an in at least one (but possibly many) finite bytestrings, called the -relative codes of . For every fixed in the assignment must satisfy the following properties:

                If there is exactly one -relative code for every in and every in we call the relative encoding relation a relative encoding function.

                A toy example: we can encode unsigned bytes (natural numbers between and inclusive) relative to other unsigned bytes by encoding the difference between them as a bytestring of length one (the difference as a single unsigned byte). For simplicity, we allow this only when the difference is non-negative. So the -relative code of would be the byte 0x05 (because ). It would not be possible to encode relative to however, since is a negative number.

                In this example, both and are the set of unsigned bytes. For any unsigned byte is the set of unsigned bytes greater than or equal to For any fixed unsigned byte all -relative code are unique (which also implies prefix-freeness, since all codes have the same length), so we have indeed defined a valid relative encoding relation. Note that codes might be duplicated across different — the -relative code of and the -relative code of are both 0x01, for example — but this is perfectly ok.

                Compact Integer Encodings

                In various places, we need to encode U64 values. When we expect small values to be significantly more common than large values, we can save space by using a variable-length encoding that encodes smaller numbers in fewer bytes than larger numbers. In this section, we define some building blocks for doing just that.

                The basic idea is to use a tag — a small bitstring — to encode how many bytes will be used for encoding the actual number. We allow for actual encodings of either one, two, four, or eight bytes. To indicate those four options, we need tags of at least two bits. We can also allot more bits to a tag. When we do, the four greatest numbers that can be represented in the tag indicate the four possible encoding widths in bytes. Using any smaller number as the tag indicates that the tag itself is the U64.

                Some examples before we give a formal definition: when using a four-bit tag, there are five different ways of encoding the number

                • the tag can be ( in binary), indicating an encoding in eight additional bytes, or
                • the tag can be ( in binary), indicating an encoding in four additional bytes, or
                • the tag can be ( in binary), indicating an encoding in two additional bytes, or
                • the tag can be ( in binary), indicating an encoding in one additional byte, or
                • the tag can be ( in binary), indicating an encoding in zero additional bytes.

                When using a two-bit tag for the number , there are only three options:

                • the tag can be ( in binary), indicating an encoding in eight additional bytes, or
                • the tag can be ( in binary), indicating an encoding in four additional bytes, or
                • the tag can be ( in binary), indicating an encoding in two additional bytes.

                Now for the formal definitions:

                alj: Fix colors and fonts in defs, refs, and math mode thereof.Let n be a U64, and let tag_width be a natural number between two and eight inclusive. Then the natural number t is a (valid) tag for n of width tag_width in all of the following cases:

                Note that there might be multiple tags to choose from for any given combination of a U64 and a width, but once the tag is selected, the corresponding compact U64 encoding is unique.For any U64 n and any tag t for n of width tag_width, the corresponding compact U64 encoding is:

                • the unsigned little-endian 8-byte encoding of n if
                • the unsigned little-endian 4-byte encoding of n if
                • the unsigned little-endian 2-byte encoding of n if
                • the unsigned little-endian 1-byte encoding of n if and
                • the empty string otherwise.

                Examples: the minimal tag for of width four is and the minimal tag for of width two is We call a tag minimal for some given U64 and width if the corresponding compact U64 encoding is shorter than that for any other valid tag for the same U64 and width. This notion coincides with being the numerically least tag for a given U64 and width.

                Data Model Encodings

                We now list some useful encodings for the types of the Willow data model. We assume availability of encoding functions for various parameters of Willow:

                Path Encoding

                We define an encoding relation EncodePath for Path. The codes in EncodePath for any Path val are the bytestrings that are concatenations of the following form:

                BitsBig-Endian Bitfield
                0 – 3A tag of width for the sum of the lengths of the Components of val.
                4 – 7A tag of width for the number of Components of val.
                The corresponding compact U64 encoding for bits 0 – 3.
                The corresponding compact U64 encoding for bits 4 – 7.
                For every Component comp of val but the final one, a concatenation of the following form:
                BitsBig-Endian Bitfield
                0 – 7A tag of width for the length of comp.
                The corresponding compact U64 encoding for bits 0 – 7.
                The raw bytes of comp.
                The length of the final Component can be reconstructed from the total length and the lengths of all prior Components.The raw bytes of the final Component of val, or the empty string if val has zero components.

                We define the encoding function encode_path as the canonic subset of EncodePath obtained by using only minimal tags.

                alj: Either make this visually more pleasing or remove it again.An example: encoding the Path blogideasfun with encode_path yields

                0C 03 00 04 62 6C 6F 67 00 05 69 64 65 61 73 66 75 6E, because


                We define a relative encoding relation EncodePathRelativePath for any Path relative to any Path. Let val be any Path, and let rel be any Path.

                Let prefix_count be a natural number such that the first prefix_count Components of val are equal to the first prefix_count Components of rel.

                Then the codes in EncodePathRelativePath for val relative to rel are the bytestrings that are concatenations of the following form:

                BitsBig-Endian Bitfield
                0 – 7A tag of width for prefix_count.
                The corresponding compact U64 encoding for bits 0 – 7.
                Any code in EncodePath for the difference from rel to val.

                We define the relative encoding function path_rel_path as the canonic subset of EncodePathRelativePath obtained by


                We define a relative encoding relation EncodePathExtendsPath for any Path relative to any Path which is a prefix of val. The codes in EncodePathExtendsPath for any Path val relative to any Path which is a prefix of val rel are the bytestrings that are concatenations of the following form:

                Any code in EncodePath for the difference from rel to val.

                We define the relative encoding function path_extends_path as the canonic subset of EncodePathExtendsPath obtained by using only minimal tags, and using only the canonic subsets of all sub-encodings.

                Capabilities

                Encodings for Meadowcap and McSubspaceCapabilities.alj: fix Meadowcap link coloralj: TODO add hsections to make for a useful TOC

                alj: TODO mc subspace cap absolute


                alj: TODO mc cap absolute


                We define a relative encoding relation EncodeMcSubspaceCapabilityRelativePrivateInterest for any McSubspaceCapability relative to any PersonalPrivateInterest with rel.private_interest.subspace_id == any and rel.private_interest.namespace_id == val.namespace_key. The codes in EncodeMcSubspaceCapabilityRelativePrivateInterest for any McSubspaceCapability val relative to any fitting PersonalPrivateInterest rel are the bytestrings that are concatenations of the following form:

                BitsBig-Endian Bitfield
                0 – 7A tag of width for the number of pairs in val.delegations.
                The corresponding compact U64 encoding for bits 0 – 7.
                Any code in encode_namespace_sig for val.initial_authorisation.
                For every pair (pk, sig) of val.delegations but the final one, a concatenation of the following form:alj: TODO fix styling of nested encoding without bitfields
                Any code in encode_user_pk for pk.
                Any code in encode_user_sig for sig.

                Private Encodings

                We now define some relative encodings which take care to not reveal certain parts of the values being encoded. We use these in the Private Interest Intersection parts of the invalid-reference.

                Private Path Encoding

                We start with a building block for more complex private encodings: the ability to encode a Path relative to one of its prefixes, while keeping secret all Components that coincide with a third Path.

                The context necessary to privately encode Paths.
                 
                The Path whose Components are to be kept private.
                 
                The prefix relative to which we encode.
                 
                rel: Path,
                }

                We define a relative encoding relation EncodePrivatePathExtendsPath for any Path relative to any PrivatePathContext such that rel.rel is a prefix of val and rel.private is related to val. Let val be any Path, and let rel be any fitting PrivatePathContext.

                Let val_count be the number of components in val. Let rel_count be the number of components in rel.rel. Let private_count be the number of components in rel.private.

                Then the codes in EncodePrivatePathExtendsPath for val relative to rel are the bytestrings that are concatenations of the following form:

                Private Area Encoding

                Next, we build up to private Area encoding: we encode an Area that is included in another Area, while keeping secret a PrivateInterest which includes both.

                The context necessary to privately encode Areas.
                 
                The PrivateInterest to be kept private.
                 
                The containing Area relative to which we encode.
                 
                rel: Area,
                }

                We define a relative encoding relation EncodePrivateAreaInArea for any Area relative to any PrivateAreaContext such that rel.rel includes val and rel.private includes to rel. Let val be any Area, and let rel be any fitting PrivateAreaContext.

                Let start_from_start and end_from_start be arbitrary Bools. If rel.rel.times.end is open, they must both be true.

                Then the codes in EncodePrivateAreaInArea for val relative to rel are the bytestrings that are concatenations of the following form:

                BitsBig-Endian Bitfield
                01 iff val.subspace_id != rel.rel.subspace_id
                11 iff val.subspace_id == open
                21 iff start_from_start
                31 iff end_from_start
                4, 5A tag of width for either val.times.start- rel.rel.times.start (if start_from_start), or rel.rel.times.end - val.times.start (otherwise).
                6, 7A tag of width for either val.times.end- rel.rel.times.start (if end_from_start), or rel.rel.times.end - val.times.end (otherwise). If val.times.end == open, these two bits can be set arbitrarily instead.
                The corresponding compact U64 encoding for bits 4, 5.
                The corresponding compact U64 encoding for bits 6, 7, or the empty string if val.times.end == open.
                Any relative code in EncodePrivatePathExtendsPath for val.path, relative to the PrivatePathContext with private := rel.private.path and rel := rel.rel.path.

                Communal Capability Encoding

                We define a relative encoding relation EncodeCommunalCapabilityRelativePrivateInterest for any CommunalCapability with val.access_mode == read relative to any Note that CommunalCapabilities are always restricted to a single SubspaceId, so PersonalPrivateInterests with subspace_id == any are never needed.PersonalPrivateInterest with rel.private_interest.subspace_id == val.user_key ,rel.private_interest.namespace_id == val.namespace_key, and such that the path of the granted area of val is a prefix of rel.private_interest.path, and such that rel.user_key is equal to the final UserPublicKey in val.delegations (if there is at least one delegation), or equal to val.user_key otherwise. Let val be any such CommunalCapability, and let rel be any fitting PersonalPrivateInterest.

                To efficiently and privately encode the Areas in the delegations of val, we define a sequence of PrivateAreaContexts. is the PrivateAreaContext whose private is rel and whose rel is the -th Area in val.delegations. Further, we define as the PrivateAreaContext whose private is rel and whose rel is the subspace area of SubspaceId rel.private.private_interest.subspace_id.

                Then the codes in EncodeCommunalCapabilityRelativePrivateInterest for val relative to rel are the bytestrings that are concatenations of the following form:

                For every -th triplet (area, pk, sig) of val.delegations but the final one, a concatenation of the following form:
                Any relative code in EncodePrivateAreaInArea for area, relative to .
                Any code in encode_user_pk for pk.
                Any code in encode_user_sig for sig.