Erasure Encoding of Files in NFSv4.2

Internet-Draft	erasure encoding	November 2024
Haynes	Expires 9 May 2025

Abstract

Parallel NFS (pNFS) allows a separation between the metadata (onto a metadata server) and data (onto a storage device) for a file. The Flexible File Version 2 Layout Type is defined in this document as an extension to pNFS that allows the use of storage devices that require only a limited degree of interaction with the metadata server and use already-existing protocols. Data replication is also added to provide integrity.¶

1. Introduction

In Parallel NFS (pNFS) (see Section 12 of [RFC8881]), the metadata server returns layout type structures that describe where file data is located. There are different layout types for different storage systems and methods of arranging data on storage devices. [RFC8435] defined the Flexible File Version 1 Layout Type used with file-based data servers that are accessed using the NFS protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], NFSv4.1 [RFC8881], and NFSv4.2 [RFC7862].¶

The Client Side Mirroring (see Section 8 of [RFC8435]), introduced with the first version of the Flexible File Layout Type, provides for replication of data but does not provide for integrity of data. In the event of an error, a user would be able to repair the file by resilvering the mirror contents, i.e., by picking one of the mirror instances and replicating it to the other instance locations.

However, lacking integrity checks, silent corruptions cannot be detected, and the choice of what constitutes the good copy is difficult. This document updates the Flexible File Layout Type to version 2 by providing data integrity for erasure encoding. It introduces new variants of COMMIT4 (see Section 18.3 of [RFC8881]), READ4 (see Section 18.22 of [RFC8881]), and WRITE4 (see Section 18.32 of [RFC8881]) to allow for the transmission of integrity checking information.

Using the process detailed in [RFC8178], the revisions in this document become an extension of NFSv4.2 [RFC7862]. They are built on top of the external data representation (XDR) [RFC4506] generated from [RFC7863].¶

1.1. Definitions

block:: One of the resulting blocks to be exchanged with a data server after a transformation has been applied to a data block. Note that the resulting block may be a different size than the data block.¶
Client Side Mirroring:: A file based replication method where copies are maintained in parallel.¶
data block:: A block of data in the client's cache for a file.¶
Erasure Encoding:: A data protection scheme where a block of data is split into fragments and additional redundant fragments are added to achieve parity. The new blocks are stored in different locations.
Client Side Erasure Encoding:: A file based integrity method where copies are maintained in parallel.¶
consistency of payload:: A payload is consistent when all contained blocks have the same owner, i.e., they share the same writing client and transaction id.¶
integrity of data:: Data integrity refers to the accuracy, consistency, and reliability of data throughout its life cycle.¶
payload:: The set of metadata header and transformed blocks generated per data block by the erasure encoding type. Note that the resulting blocks might be of type active, parity, spare, or repair.
replication of data:: Data replication is making and storing multiple copies of data in different locations.¶
write hole:: A write hole is a data corruption scenario where either two clients are trying to write to the same block or one client is overwriting an existing block of data.¶

1.2. Requirements Language

The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'NOT RECOMMENDED', 'MAY', and 'OPTIONAL' in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶

2. Flexible File Version 2 Layout Type

In order to introduce erasure encoding to pNFS, a new layout type of LAYOUT4_FLEX_FILES_V2 needs to be defined. While we could define a new layout type per erasure encoding type, there exist use cases where multiple erasure encoding types exist in the same layout.¶

The original layouttype4 introduced in [RFC8881] is modified to be as shown in Figure 1.

       enum layouttype4 {
           LAYOUT4_NFSV4_1_FILES   = 1,
           LAYOUT4_OSD2_OBJECTS    = 2,
           LAYOUT4_BLOCK_VOLUME    = 3,
           LAYOUT4_FLEX_FILES      = 4,
           LAYOUT4_FLEX_FILES_V2   = 5
       };

       struct layout_content4 {
           layouttype4             loc_type;
           opaque                  loc_body<>;
       };

       struct layout4 {
           offset4                 lo_offset;
           length4                 lo_length;
           layoutiomode4           lo_iomode;
           layout_content4         lo_content;
       };

Figure 1

This document defines structures associated with the layouttype4 value LAYOUT4_FLEX_FILES_V2. [RFC8881] specifies the loc_body structure as an XDR type 'opaque'. The opaque layout is uninterpreted by the generic pNFS client layers but is interpreted by the Flexible File Version 2 Layout Type implementation. This section defines the structure of this otherwise opaque value, ffv2_layout4.¶

2.1. ffv2_encoding_type

   /// enum ffv2_encoding_type {
   ///     FFV2_ENCODING_MIRRORED       = 0x1
   /// };

Figure 2

The ffv2_encoding_type (see Figure 2) encompasses a new IANA registry for 'Flex Files V2 Erasure Encoding Type Registry' (see Section 9.3). I.e., instead of defining a new Layout Type for each Erasure Encoding, we define a new Erasure Encoding Type. Except for FFV2_ENCODING_MIRRORED, each of the types is expected to employ the new operations in this document.¶

FFV2_ENCODING_MIRRORED offers replication of data and not integrity of data. As such, it does not need operations like WRITE_BLOCK4 (see Section 6.5).¶

2.2. ff_flags4

   const FF_FLAGS_NO_LAYOUTCOMMIT   = 0x00000001;
   const FF_FLAGS_NO_IO_THRU_MDS    = 0x00000002;
   const FF_FLAGS_NO_READ_IO        = 0x00000004;
   const FF_FLAGS_WRITE_ONE_MIRROR  = 0x00000008;
   typedef uint32_t            ff_flags4;

Figure 3

ff_flags4 is defined as in Section 5.1 of [RFC8435] and is shown in Figure 3 for reference.¶

2.3. ffv2_file_info4

   /// struct ffv2_file_info4 {
   ///     stateid4                fffi_stateid;
   ///     nfs_fh4                 fffi_fh_vers;
   /// };

Figure 4

The ffv2_file_info4 is a new structure to help with the stateid issue discussed in Section 5.1 of [RFC8435]. In version 1 of the Flexible File Layout Type, a single ffds_stateid was combined with the ffds_fh_vers array, so all file handles shared one stateid. In Figure 4, each NFSv4 file handle has a one-to-one correspondence to a stateid, i.e., each NFSv4 version has its own stateid.

2.4. ffv2_ds_flags4

   /// const FFV2_DS_FLAGS_ACTIVE        = 0x00000001;
   /// const FFV2_DS_FLAGS_SPARE         = 0x00000002;
   /// const FFV2_DS_FLAGS_PARITY        = 0x00000004;
   /// const FFV2_DS_FLAGS_REPAIR        = 0x00000008;
   /// typedef uint32_t            ffv2_ds_flags4;

Figure 5

The ffv2_ds_flags4 (in Figure 5) flags detail the state of the data servers. With Erasure Encoding algorithms, there are both Systematic and Non-Systematic approaches. In the Systematic approach, the bits for integrity are placed amongst the resulting transformed block. Such an implementation would typically see FFV2_DS_FLAGS_ACTIVE and FFV2_DS_FLAGS_SPARE data servers. The FFV2_DS_FLAGS_SPARE ones allow the client to repair a payload without engaging the metadata server. I.e., if one of the FFV2_DS_FLAGS_ACTIVE data servers did not respond to a WRITE_BLOCK4, the client could fail the block over to a FFV2_DS_FLAGS_SPARE data server.

With the Non-Systematic approach, the data and integrity live on different data servers. Such an implementation would typically see FFV2_DS_FLAGS_ACTIVE and FFV2_DS_FLAGS_PARITY data servers. If the implementation wanted to allow for local repair, it would also use FFV2_DS_FLAGS_SPARE. Note that with a Non-Systematic approach, it is possible to update parts of the blocks, see Section 6.5.3.2.¶

See [Plank97] for further reference to storage layouts for encoding.¶
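
As a rough illustration of the failover described above, the following Python sketch picks a FFV2_DS_FLAGS_SPARE data server for a block whose FFV2_DS_FLAGS_ACTIVE data server did not respond. The flag values are taken from Figure 5; the function name pick_spare and the in_use bookkeeping are assumptions made for this example and are not defined by this document.

```python
# Flag values as given in Figure 5.
FFV2_DS_FLAGS_ACTIVE = 0x00000001
FFV2_DS_FLAGS_SPARE = 0x00000002
FFV2_DS_FLAGS_PARITY = 0x00000004
FFV2_DS_FLAGS_REPAIR = 0x00000008


def pick_spare(ds_flags, in_use=frozenset()):
    """Return the index of the first spare data server not yet in use.

    ds_flags: list of ffds_flags values, one per ffv2_data_server4.
    in_use:   indexes of spares already consumed (hypothetical bookkeeping).
    Returns None when no spare is available.
    """
    for index, flags in enumerate(ds_flags):
        if flags & FFV2_DS_FLAGS_SPARE and index not in in_use:
            return index
    return None
```

A client with no spare left (pick_spare returns None) would have to fall back to the metadata server for repair.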

2.5. ffv2_data_server4

   /// struct ffv2_data_server4 {
   ///     deviceid4               ffds_deviceid;
   ///     uint32_t                ffds_efficiency;
   ///     ffv2_file_info4         ffds_file_info<>;
   ///     fattr4_owner            ffds_user;
   ///     fattr4_owner_group      ffds_group;
   ///     ffv2_ds_flags4          ffds_flags;
   /// };

Figure 6

The ffv2_data_server4 (in Figure 6) describes a data file and how to access it via the different NFS protocols.¶

2.6. ffv2_encoding_type_data

   /// union ffv2_encoding_type_data switch
   ///         (ffv2_encoding_type fetd_encoding) {
   ///     case FFV2_ENCODING_MIRRORED:
   ///         void;
   /// };

Figure 7

The ffv2_encoding_type_data (in Figure 7) describes erasure encoding type specific fields. I.e., this is how the encoding type can communicate the need for counts of active, spare, parity, and repair types of blocks.¶

2.7. ffv2_mirror4

   /// struct ffv2_mirror4 {
   ///     ffv2_data_server4       ffm_data_servers<>;
   ///     ffv2_encoding_type_data ffm_encoding_type_data;
   /// };

Figure 8

The ffv2_mirror4 (in Figure 8) describes the Flexible File Layout Version 2 specific fields.¶

2.8. ffv2_layout4

   /// struct ffv2_layout4 {
   ///     length4                 ffl_stripe_unit;
   ///     ffv2_mirror4            ffl_mirrors<>;
   ///     ff_flags4               ffl_flags;
   ///     uint32_t                ffl_stats_collect_hint;
   /// };

Figure 9

The ffv2_layout4 (in Figure 9) describes the Flexible Files Layout Version 2.¶

2.9. ffv2_layouthint4

/// union ffv2_mirrors_hint switch (ffv2_encoding_type ffmh_type) {
///     case FFV2_ENCODING_MIRRORED:
///         void;
/// };
///
/// struct ffv2_layouthint4 {
///     ffv2_encoding_type fflh_supported_types<>;
///     ffv2_mirrors_hint fflh_mirrors_hint;
/// };

Figure 10

The ffv2_layouthint4 (in Figure 10) describes the layout_hint (see Section 5.12.4 of [RFC8881]) that the client can provide to the metadata server.¶

2.10. Mixing of Encoding Types

Note that effectively, multiple encoding types can be present in a Flexible Files Version 2 Layout Type layout. The ffv2_layout4 has an array of ffv2_mirror4, each of which has a ffv2_encoding_type. The main reason to allow for this is to provide for either the assimilation of a non-erasure encoded file to an erasure encoded file or the exporting of an erasure encoded file to a non-erasure encoded file.¶

Assume there is an additional ffv2_encoding_type of FFV2_ENCODING_REED_SOLOMON and it needs 4 active blocks, 2 parity blocks, and 2 spare blocks. The user wants to actively assimilate a regular file. As such, a layout might be as represented in Figure 11. As this is an assimilation, most of the data reads will be satisfied by READ4 (see Section 18.22 of [RFC8881]) calls to index 0. However, as this is also an active file, there could also be READ_BLOCK4 (see Section 6.3) calls to the other indexes.¶

         +---------------------------------------------------+
         | ffv2_layout4:                                     |
         +---------------------------------------------------+
         |     ffl_mirrors[0]:                               |
         |         ffm_data_servers:                         |
         |             ffv2_data_server4[0]                  |
         |                 ffds_flags: 0                     |
         |         ffm_encoding: FFV2_ENCODING_MIRRORED      |
         +---------------------------------------------------+
         |     ffl_mirrors[1]:                               |
         |         ffm_data_servers:                         |
         |             ffv2_data_server4[0]                  |
         |                 ffds_flags: FFV2_DS_FLAGS_ACTIVE  |
         |             ffv2_data_server4[1]                  |
         |                 ffds_flags: FFV2_DS_FLAGS_ACTIVE  |
         |             ffv2_data_server4[2]                  |
         |                 ffds_flags: FFV2_DS_FLAGS_ACTIVE  |
         |             ffv2_data_server4[3]                  |
         |                 ffds_flags: FFV2_DS_FLAGS_ACTIVE  |
         |             ffv2_data_server4[4]                  |
         |                 ffds_flags: FFV2_DS_FLAGS_PARITY  |
         |             ffv2_data_server4[5]                  |
         |                 ffds_flags: FFV2_DS_FLAGS_PARITY  |
         |             ffv2_data_server4[6]                  |
         |                 ffds_flags: FFV2_DS_FLAGS_SPARE   |
         |             ffv2_data_server4[7]                  |
         |                 ffds_flags: FFV2_DS_FLAGS_SPARE   |
         |         ffm_encoding: FFV2_ENCODING_REED_SOLOMON  |
         +---------------------------------------------------+

Figure 11
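
A client or metadata server may want to sanity check that a mirror entry offers the block roles that its encoding type needs. The Python sketch below counts the roles in a list of ffds_flags values and compares them with a requirement such as the 4 active, 2 parity, and 2 spare blocks assumed for FFV2_ENCODING_REED_SOLOMON above. The helper names count_roles and mirror_satisfies, and the dictionary used to express the requirement, are hypothetical.

```python
from collections import Counter

# Flag values as given in Figure 5.
FFV2_DS_FLAGS_ACTIVE = 0x00000001
FFV2_DS_FLAGS_SPARE = 0x00000002
FFV2_DS_FLAGS_PARITY = 0x00000004
FFV2_DS_FLAGS_REPAIR = 0x00000008

ROLE_NAMES = {
    FFV2_DS_FLAGS_ACTIVE: "active",
    FFV2_DS_FLAGS_SPARE: "spare",
    FFV2_DS_FLAGS_PARITY: "parity",
    FFV2_DS_FLAGS_REPAIR: "repair",
}


def count_roles(ds_flags):
    """Count how many data servers carry each role bit."""
    counts = Counter()
    for flags in ds_flags:
        for bit, name in ROLE_NAMES.items():
            if flags & bit:
                counts[name] += 1
    return counts


def mirror_satisfies(ds_flags, required):
    """True when every required role count is met by the mirror entry."""
    counts = count_roles(ds_flags)
    return all(counts[name] >= need for name, need in required.items())
```

Applied to ffl_mirrors[1] of Figure 11, count_roles reports 4 active, 2 parity, and 2 spare data servers.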

When performing I/O via a FFV2_ENCODING_MIRRORED encoding type, the non-transformed data will be used, whereas with other encoding types, a metadata header and transformed block will be sent. Further, when reading data from the instance files, the client MUST be prepared to have one of the encoding types supply data and the other type not supply data. I.e., the READ_BLOCK4 call might return rbr_eof set to true (see Figure 37), which indicates that there is no data, whereas the READ4 call might return eof set to false, which indicates that there is data. The client MUST determine that there is in fact data.

An example use case is the active assimilation of a file to ensure integrity. As the client is helping to translate the file to the new encoding scheme, it is actively modifying the file. As such, it might be sequentially reading the file in order to translate. The READ4 call would be returning data and the READ_BLOCK4 would not be returning data. As the client overwrites the file, the WRITE4 call and the WRITE_BLOCK4 call would both have data sent. Finally, if the client read back a section which had been modified earlier, both the READ4 and READ_BLOCK4 calls would return data.

3. Erasure Encoding

Erasure Encoding takes a data block and transforms it to a payload to send to the data servers (see Figure 12). It generates a metadata header and transformed block per data server. The header is metadata information for the transformed block. From now on, the metadata is simply referred to as the header and the transformed block as the block. The payload of a data block is the set of generated headers and blocks for that data block.

The change_id is a unique identifier generated by the client to describe the current write transaction. The client_id is a unique identifier assigned by the metadata server to describe which client is making the current write transaction. The seq_id describes the index across the payload. The eff_len is the length of the data within the block. Finally, the crc32 is the 32-bit CRC calculated over the header (with the crc32 field being 0) and the block. Because the calculation covers both parts of the payload, integrity is ensured for both parts.

While the data block might have a length of 4kB, that does not necessarily mean that the length of the block is 4kB. That length is determined by the erasure encoding type algorithm. For example, Reed Solomon might have 4kB blocks with the data integrity being provided by parity blocks. Another example would be the Mojette Transformation, which might have 1kB block lengths.

The payload contains redundancy which will allow the erasure encoding type algorithm to repair blocks in the payload as it is transformed back to a data block (see Figure 17). A payload is consistent when all of the contained headers share the same change_id and client_id. It has integrity when it is consistent and the blocks all pass the crc32 checks.¶
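
The consistency rule above can be sketched in a few lines of Python. The Header tuple mirrors the fields shown in the figures of this section; the type and the function name payload_is_consistent are assumptions for illustration only. Integrity additionally requires that every block pass its crc32 check, which this sketch does not perform.

```python
from collections import namedtuple

# Field names follow the HEADER boxes in this section's figures.
Header = namedtuple("Header", "change_id client_id seq_id eff_len crc32")


def payload_is_consistent(headers):
    """True when all headers share the same change_id and client_id.

    An empty payload is treated as not consistent.
    """
    owners = {(h.change_id, h.client_id) for h in headers}
    return len(owners) == 1
```

A payload in which one header carries a different client_id, as can happen when two clients race, is reported as inconsistent.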

3.1. Encoding a Data Block

                      +-----------------+
                      |  data block     |
                      +-----------------+
                      |                 |
                      | 3kB data        |
                      |                 |
                      +-----------------+
                      | 1kB empty       |
                      +-------+---------+
                              |
                              |
       +----------------------+-----------------------+
       |      Erasure Encoding (Transform Forward)    |
       +----+-------------------------------------+---+
            |                                     |
            |                                     |
        +---+----------------+         +----------+---------+
        | HEADER             |         | HEADER             |
        +--------------------+         +--------------------+
        | change_id: 3       |         | change_id: 3       |
        | client_id: 6       |         | client_id: 6       |
        | seq_id   : 0       |         | seq_id   : 5       |
        | eff_len  : 3kB     |  ...    | eff_len  : 3kB     |
        | crc32    :         |         | crc32    :         |
        +--------------------+         +--------------------+
        | BLOCK              |         | BLOCK              |
        +--------------------+         +--------------------+
        | data: ....         |         | data: ....         |
        +--------------------+         +--------------------+
             Data Server 1                 Data Server 6

Figure 12

Each data block of the file resident in the client's cache of the file will be encoded into N different payloads to be sent to the data servers as shown in Figure 12. As WRITE_BLOCK4 (see Section 6.5) can encode multiple write_block4 into a single transaction, a more accurate description of a WRITE_BLOCK4 might be as in Figure 13.¶

        +------------------------------------+
        | WRITE_BLOCK4args                   |
        +------------------------------------+
        | wba_stateid: 0                     |
        | wba_offset: 1                      |
        | wba_stable: FILE_SYNC4             |
        | wba_seq_id: 0                      |
        | wba_owner:                         |
        |            bo_change_id: 3         |
        |            bo_client_id: 6         |
        | wba_block[0]:                      |
        |            wb_crc    :  0x32ef89   |
        |            wb_effective_len  : 4kB |
        |            wb_block  :  ......     |
        | wba_block[1]:                      |
        |            wb_crc    :  0x56fa89   |
        |            wb_effective_len  : 4kB |
        |            wb_block  :  ......     |
        | wba_block[2]:                      |
        |            wb_crc    :  0x7693af   |
        |            wb_effective_len  : 3kB |
        |            wb_block  :  ......     |
        +------------------------------------+

Figure 13

Note: pay attention to the 128-bit alignment of wb_block_val.

This describes a 3 block write of data from an offset of 1 block in the file. As each block shares the wba_owner, it is only presented once. I.e., the data server will be able to construct the header for each wba_block from the wba_seq_id, wba_owner, wb_effective_len, and wb_crc.¶
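
To make the header reconstruction concrete, the Python sketch below shows one way a data server might derive a header for each wba_block from the shared fields of a WRITE_BLOCK4args. The dictionary representation of the arguments and the function name headers_from_write_args are hypothetical; deriving the block position as wba_offset plus the array index is an assumption that matches the bo_block_id values of Figure 14.

```python
def headers_from_write_args(args):
    """Return a list of (block_id, header) pairs for one WRITE_BLOCK4args.

    args is a dict keyed by the field names shown in Figure 13.
    """
    owner = args["wba_owner"]
    result = []
    for index, block in enumerate(args["wba_block"]):
        header = {
            "change_id": owner["bo_change_id"],  # shared by all blocks
            "client_id": owner["bo_client_id"],  # shared by all blocks
            "seq_id": args["wba_seq_id"],        # shared by all blocks
            "eff_len": block["wb_effective_len"],
            "crc32": block["wb_crc"],
        }
        # Assumed: consecutive blocks start at wba_offset.
        result.append((args["wba_offset"] + index, header))
    return result
```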

Assuming that there were no issues, Figure 14 illustrates the results. The payload sequence id is implicit in the WRITE_BLOCK4args.¶

        +-------------------------------+
        | WRITE_BLOCK4resok             |
        +-------------------------------+
        | wbr_count: 3                  |
        | wbr_committed: FILE_SYNC4     |
        | wbr_writeverf: 0xf1234abc     |
        | wbr_owners[0]:                |
        |            bo_block_id: 1     |
        |            bo_change_id: 3    |
        |            bo_client_id: 6    |
        |            bo_activated: true |
        | wbr_owners[1]:                |
        |            bo_block_id: 2     |
        |            bo_change_id: 3    |
        |            bo_client_id: 6    |
        |            bo_activated: true |
        | wbr_owners[2]:                |
        |            bo_block_id: 3     |
        |            bo_change_id: 3    |
        |            bo_client_id: 6    |
        |            bo_activated: true |
        +-------------------------------+

Figure 14

3.1.1. Calculating the CRC32

        +---+----------------+
        | HEADER             |
        +--------------------+
        | change_id: 7       |
        | client_id: 6       |
        | seq_id   : 0       |
        | eff_len  : 3kB     |
        | crc32    : 0       |
        +--------------------+
        | BLOCK              |
        +--------------------+
        | data:  ....        |
        +--------------------+
             Data Server 1

Figure 15

Assuming the header and payload as in Figure 15, the crc32 needs to be calculated in order to fill in the wb_crc field. In this case, the crc32 is calculated over the 5 fields as shown in the header and the data of the block. In this example, it is calculated to be 0x21de8. The resulting WRITE_BLOCK4 is shown in Figure 16.¶

        +------------------------------------+
        | WRITE_BLOCK4args                   |
        +------------------------------------+
        | wba_stateid: 0                     |
        | wba_offset: 1                      |
        | wba_stable: FILE_SYNC4             |
        | wba_seq_id: 0                      |
        | wba_owner:                         |
        |            bo_change_id: 7         |
        |            bo_client_id: 6         |
        | wba_block[0]:                      |
        |            wb_crc    :  0x21de8    |
        |            wb_effective_len  : 3kB |
        |            wb_block  :  ......     |
        +------------------------------------+

Figure 16
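
The calculation can be sketched in Python as follows. This section does not fix the on-the-wire layout of the header or the CRC polynomial, so the sketch makes two loudly labeled assumptions: the header is packed as five big-endian unsigned 32-bit fields, and the CRC is the one implemented by zlib.crc32. The function names are hypothetical, and the value produced is not expected to match the 0x21de8 used in the example figures.

```python
import struct
import zlib


def header_bytes(change_id, client_id, seq_id, eff_len, crc32=0):
    """Pack the five header fields (assumed layout: big-endian uint32 each)."""
    return struct.pack(">IIIII", change_id, client_id, seq_id, eff_len, crc32)


def payload_crc32(change_id, client_id, seq_id, eff_len, block):
    """CRC over the header, with its crc32 field set to 0, and the block."""
    data = header_bytes(change_id, client_id, seq_id, eff_len, 0) + block
    return zlib.crc32(data) & 0xFFFFFFFF


# Example with the header values of Figure 15 and placeholder block data.
wb_crc = payload_crc32(7, 6, 0, 3072, bytes(3072))
```

Because the header is part of the input, two blocks with identical data but different owners yield different wb_crc values.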

3.2. Decoding a Data Block

             Data Server 1                 Data Server 6
        +--------------------+         +--------------------+
        | HEADER             |         | HEADER             |
        +--------------------+         +--------------------+
        | change_id: 1       |         | change_id: 1       |
        | client_id: 6       |         | client_id: 6       |
        | seq_id   : 0       |         | seq_id   : 5       |
        | eff_len  : 3kB     |  ...    | eff_len  : 3kB     |
        | crc32    :         |         | crc32    :         |
        +--------------------+         +--------------------+
        | BLOCK              |         | BLOCK              |
        +--------------------+         +--------------------+
        | data:  ....        |         | data:  ....        |
        +---+----------------+         +----------+---------+
            |                                     |
            |                                     |
       +----+-------------------------------------+---+
       |      Erasure Decoding (Transform Reverse)    |
       +----------------------+-----------------------+
                              |
                              |
                      +-------+---------+
                      |  data block     |
                      +-----------------+
                      |                 |
                      | 3kB data        |
                      |                 |
                      +-----------------+
                      | 1kB empty       |
                      +-----------------+

Figure 17

When reading blocks via a READ_BLOCK4 operation, the client will decode the headers and payload into data blocks as shown in Figure 17. If the resulting data block is to be sized less than a data block, i.e., the rb_effective_len is less than the data block size, then the inverse transformation MUST fill the remainder of the data block with 0s. It must appear as a freshly written data block which was not completely filled.¶
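
The zero-fill requirement can be illustrated with a short Python sketch. The function name finish_data_block and the default data block size of 4096 bytes are assumptions for this example; the decoded bytes stand in for whatever the inverse transformation of the erasure encoding type produced.

```python
def finish_data_block(decoded, effective_len, block_size=4096):
    """Keep effective_len bytes of decoded data and zero-fill the rest."""
    if effective_len > block_size:
        raise ValueError("effective length exceeds the data block size")
    if len(decoded) < effective_len:
        raise ValueError("decoded data is shorter than the effective length")
    # bytes(n) yields n zero bytes, which pads the block to its full size.
    return decoded[:effective_len] + bytes(block_size - effective_len)
```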

Note that at this time, the client could detect issues in the integrity of the data. The handling and repair are out of the scope of this document and MUST be addressed in the document describing each erasure encoding type.¶

3.2.1. Checking the CRC32

        +------------------------------------+
        | READ_BLOCK4resok                   |
        +------------------------------------+
        | rbr_eof: false                     |
        | rbr_blocks[0]:                     |
        |            rb_crc: 0x21de8         |
        |            rb_effective_len  : 3kB |
        |            rb_owner:               |
        |                 bo_block_id: 1     |
        |                 bo_change_id: 7    |
        |                 bo_client_id: 6    |
        |                 bo_activated: true |
        |            rb_block  :  ......     |
        +------------------------------------+

Figure 18

Assuming the READ_BLOCK4 results as in Figure 18, the crc32 needs to be checked in order to ensure data integrity. Conceptually, a header and payload can be built as shown in Figure 19. The crc32 is calculated over the 5 fields as shown in the header and the 3kB of data block. In this example, it is calculated to be 0x21de8. Thus this payload for the data server has data integrity.¶

        +---+----------------+
        | HEADER             |
        +--------------------+
        | change_id: 7       |
        | client_id: 6       |
        | seq_id   : 0       |
        | eff_len  : 3kB     |
        | crc32    : 0       |
        +--------------------+
        | BLOCK              |
        +--------------------+
        | data:  ....        |
        +--------------------+
             Data Server 1

Figure 19
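
The check can be sketched in Python as follows. As assumptions for illustration only, the header is packed as five big-endian unsigned 32-bit fields and zlib.crc32 stands in for the CRC; the dictionary mirrors one rbr_blocks entry of Figure 18, and the function name block_has_integrity is hypothetical. The seq_id is passed in separately because it is not part of the entry shown in Figure 18.

```python
import struct
import zlib


def block_has_integrity(rb, seq_id):
    """Rebuild the header with crc32 set to 0 and compare against rb_crc."""
    owner = rb["rb_owner"]
    header = struct.pack(
        ">IIIII",
        owner["bo_change_id"],
        owner["bo_client_id"],
        seq_id,
        rb["rb_effective_len"],
        0,  # the crc32 field is zero while the CRC is being calculated
    )
    return (zlib.crc32(header + rb["rb_block"]) & 0xFFFFFFFF) == rb["rb_crc"]
```

A block whose data or owner fields were altered after the CRC was computed fails this check.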

4. Blocks and Activating

Unlike the regular NFSv4.2 I/O operations, the base unit of I/O in this document is the block. The raw data stream is encoded/decoded into blocks as described in Section 3. Each block has the concept of whether it is activated or pending activation. This is crucial in detecting write holes. A write hole occurs either when two different clients write to the same block concurrently or when a client overwrites existing data. In the first scenario, the order of writes is not deterministic and can result in a mixture of blocks in the payload. In the second scenario, network partitions or client restarts can result in partial writes. In both cases, the blocks have to be repaired, either by abandoning the new I/O or by sorting out the winner. Note that unlike the case of the encoding type detecting data integrity issues (see Section 3.2), the case of write holes is in the scope of this document.

What is out of scope of this document is the manner in which the data servers implement the semantics of the new operations. I.e., the data servers might be able to leverage the native file system to achieve the semantics or they might completely implement a multi-file approach to stage WRITE_BLOCK4 results and then shuffle blocks when the ACTIVATE_BLOCK4 or ROLLBACK_BLOCK4 operations activate the data.

4.1. Dead or Partitioned Client

Consider a client that crashes while in the middle of sending WRITE_BLOCK4 operations to a set of data servers. Regardless of whether it comes back online or not, the metadata server can detect that the client had restarted when it had an outstanding LAYOUTIOMODE4_RW layout on the file. The metadata server can assign the file to a repair program, which would scan the entire file with READ_BLOCK_STATUS4. When it determines that it does not have enough payload blocks to rebuild a data block, it can conclude that the I/O for that data block was not complete and throw away the blocks.

Note that the repair process can throw away the blocks by using the ROLLBACK_BLOCK4 operation to unstage the pending written blocks.¶
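
The decision made by such a repair program can be sketched in Python. The inputs are assumptions for illustration: pending_owners holds, for one data block, the (change_id, client_id) of the pending block reported by each data server (or None when there is none), and min_blocks is the number of payload blocks the erasure encoding type needs to rebuild the data block. The function name should_rollback is hypothetical.

```python
def should_rollback(pending_owners, owner, min_blocks):
    """True when too few pending blocks from owner exist to rebuild the data.

    pending_owners: one entry per data server, either None or a
                    (change_id, client_id) tuple for the pending block.
    owner:          the (change_id, client_id) of the interrupted write.
    min_blocks:     encoding type dependent threshold (assumed input).
    """
    have = sum(1 for entry in pending_owners if entry == owner)
    return have < min_blocks
```

When should_rollback returns True, the repair program would issue ROLLBACK_BLOCK4 to the data servers that hold a pending block for that write.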

4.2. Client Overwrite

Consider a client which gets back conflicting information in the WRITE_BLOCK4 results. Assume that we had written to 6 data servers with WRITE_BLOCK4s as in Figure 20 and got the results as in Figure 21.

        +------------------------------------+
        | WRITE_BLOCK4args                   |
        +------------------------------------+
        | wba_stateid: 0                     |
        | wba_offset: 1                      |
        | wba_stable: FILE_SYNC4             |
        | wba_seq_id: 0                      |
        | wba_owner:                         |
        |            bo_change_id: 3         |
        |            bo_client_id: 6         |
        | wba_block[0]:                      |
        |            wb_crc    :  0x32ef89   |
        |            wb_effective_len  : 4kB |
        |            wb_block  :  ......     |
        | wba_block[1]:                      |
        |            wb_crc    :  0x56fa89   |
        |            wb_effective_len  : 4kB |
        |            wb_block  :  ......     |
        +------------------------------------+

Figure 20

Figure 21 shows that the first block was an overwrite and an activation has to be done in order for the newly written block to be returned in a READ_BLOCK4. Assume that the next four data servers had the same type of response.¶

                Data Server 1
        +--------------------------------+
        | WRITE_BLOCK4resok              |
        +--------------------------------+
        | wbr_count: 2                   |
        | wbr_committed: FILE_SYNC4      |
        | wbr_writeverf: 0xf1234abc      |
        | wbr_owners[0]:                 |
        |            bo_block_id: 1      |
        |            bo_change_id: 2     |
        |            bo_client_id: 6     |
        |            bo_activated: true  |
        | wbr_owners[1]:                 |
        |            bo_block_id: 1      |
        |            bo_change_id: 3     |
        |            bo_client_id: 6     |
        |            bo_activated: false |
        | wbr_owners[2]:                 |
        |            bo_block_id: 2      |
        |            bo_change_id: 3     |
        |            bo_client_id: 6     |
        |            bo_activated: true  |
        +--------------------------------+

Figure 21

But assume that data server 4 does not respond to the WRITE_BLOCK4 operation. While the client can detect this and send the WRITE_BLOCK4 to any data server marked as FFV2_DS_FLAGS_SPARE, it might decide to see if the data server did in fact do the transaction. It might also be the case that there are no data servers marked as FFV2_DS_FLAGS_SPARE. The client issues a READ_BLOCK_STATUS4 (see Figure 22) and gets the results in Figure 23. This indicates that data server 4 did not get the WRITE_BLOCK4 request.¶

In general, the client can either resend the WRITE_BLOCK4 request, determine by the erasure encoding type that there are sufficient payload blocks present to decode the data block, or ROLLBACK_BLOCK4 the existing blocks to back out the change.

                Data Server 4
        +--------------------------------+
        | READ_BLOCK_STATUS4args         |
        +--------------------------------+
        | rbsa_stateid: 0                |
        | rbsa_offset: 1                 |
        | rbsa_count: 3                  |
        +--------------------------------+

Figure 22

                Data Server 4
        +--------------------------------+
        | READ_BLOCK_STATUS4resok        |
        +--------------------------------+
        | rbsr_eof: true                 |
        | rbsr_blocks[0]:                |
        |            bo_block_id: 1      |
        |            bo_change_id: 2     |
        |            bo_client_id: 6     |
        |            bo_activated: true  |
        +--------------------------------+

Figure 23

4.3. Racing Clients

Assume that the client has written to 6 data servers with WRITE_BLOCK4s as in Figure 20. But now it gets back the conflicting results in Figure 24 and Figure 25. From this, it can detect that there was a race with another client. Note, even though both clients present the same bo_change_id, nothing can be inferred as to the ordering of the two transactions. In some cases, bo_client_id 10 won the race and in some cases, bo_client_id 6 won the race.¶

As a subsequent READ_BLOCK4 will produce garbage, the clients need to agree on how to fix this issue without any communication. A simplistic approach is for each client to retry the WRITE_BLOCK4 until such time as the payload is consistent. Note that this does not mean that both clients win; it means only that one of them wins.¶

Another option is for the clients to report a LAYOUTERROR4 (see Section 15.6 of [RFC7862]) to the metadata server with an error of NFS4ERR_ERASURE_ENCODING_NOT_CONSISTENT. That would then allow the metadata server to assign the repairing of the file.¶

                Data Server 1
        +--------------------------------+
        | WRITE_BLOCK4resok              |
        +--------------------------------+
        | wbr_count: 2                   |
        | wbr_committed: FILE_SYNC4      |
        | wbr_writeverf: 0xf1234abc      |
        | wbr_owners[0]:                 |
        |            bo_block_id: 1      |
        |            bo_change_id: 3     |
        |            bo_client_id: 10    |
        |            bo_activated: true  |
        | wbr_owners[1]:                 |
        |            bo_block_id: 1      |
        |            bo_change_id: 3     |
        |            bo_client_id: 6     |
        |            bo_activated: false |
        | wbr_owners[2]:                 |
        |            bo_block_id: 2      |
        |            bo_change_id: 3     |
        |            bo_client_id: 6     |
        |            bo_activated: true  |
        +--------------------------------+

Figure 24

                Data Server 2
        +--------------------------------+
        | WRITE_BLOCK4resok              |
        +--------------------------------+
        | wbr_count: 2                   |
        | wbr_committed: FILE_SYNC4      |
        | wbr_writeverf: 0xf1234abc      |
        | wbr_owners[0]:                 |
        |            bo_block_id: 1      |
        |            bo_change_id: 3     |
        |            bo_client_id: 6     |
        |            bo_activated: true  |
        | wbr_owners[1]:                 |
        |            bo_block_id: 1      |
        |            bo_change_id: 3     |
        |            bo_client_id: 10    |
        |            bo_activated: false |
        | wbr_owners[2]:                 |
        |            bo_block_id: 2      |
        |            bo_change_id: 3     |
        |            bo_client_id: 6     |
        |            bo_activated: true  |
        +--------------------------------+

Figure 25
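
The race in Figure 24 and Figure 25 can be detected by comparing which writer holds the activated entry for a block on each data server. The following is a minimal sketch, assuming the client has collected the wbr_owners arrays from every data server; the helper names and the BlockOwner tuple are illustrative only.

```python
from collections import namedtuple

BlockOwner = namedtuple(
    "BlockOwner", "bo_block_id bo_change_id bo_client_id bo_activated")


def activated_owner(owners, block_id):
    """Return (client_id, change_id) of the activated entry, or None."""
    for o in owners:
        if o.bo_block_id == block_id and o.bo_activated:
            return (o.bo_client_id, o.bo_change_id)
    return None


def payload_is_consistent(per_server_owners, block_id):
    """True if every data server reports the same activated writer."""
    winners = {activated_owner(owners, block_id)
               for owners in per_server_owners}
    return len(winners) == 1 and None not in winners


# wbr_owners as shown in Figure 24 (data server 1) and Figure 25
# (data server 2).
ds1 = [BlockOwner(1, 3, 10, True), BlockOwner(1, 3, 6, False),
       BlockOwner(2, 3, 6, True)]
ds2 = [BlockOwner(1, 3, 6, True), BlockOwner(1, 3, 10, False),
       BlockOwner(2, 3, 6, True)]

print(payload_is_consistent([ds1, ds2], 1))  # block 1 was raced
print(payload_is_consistent([ds1, ds2], 2))  # block 2 has one writer
```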

4.3.1. Multiple Writers

Note that nothing prevents pending blocks from accumulating or more than 2 writers from trying to write the same payload. An example of such a WRITE_BLOCK4resok in response to the example of Figure 20 is shown in Figure 26. Not only has client 6 tried to update block 1 a second time, but clients 7 and 10 are also attempting to update it.¶

                Data Server 2
        +--------------------------------+
        | WRITE_BLOCK4resok              |
        +--------------------------------+
        | wbr_count: 2                   |
        | wbr_committed: FILE_SYNC4      |
        | wbr_writeverf: 0xf1234abc      |
        | wbr_owners[0]:                 |
        |            bo_block_id: 1      |
        |            bo_change_id: 3     |
        |            bo_client_id: 6     |
        |            bo_activated: true  |
        | wbr_owners[1]:                 |
        |            bo_block_id: 1      |
        |            bo_change_id: 4     |
        |            bo_client_id: 6     |
        |            bo_activated: false |
        | wbr_owners[2]:                 |
        |            bo_block_id: 1      |
        |            bo_change_id: 20    |
        |            bo_client_id: 7     |
        |            bo_activated: false |
        | wbr_owners[3]:                 |
        |            bo_block_id: 1      |
        |            bo_change_id: 3     |
        |            bo_client_id: 10    |
        |            bo_activated: false |
        | wbr_owners[4]:                 |
        |            bo_block_id: 2      |
        |            bo_change_id: 3     |
        |            bo_client_id: 6     |
        |            bo_activated: true  |
        +--------------------------------+

Figure 26

4.4. Reader and Writer Racing

In addition to the above write hole scenarios, a further complication is a racing reader and writer. If the client reads a block and determines that the payload is not consistent (i.e., not all of the payload blocks share the same client_id and change_id), then it can assume that it has encountered a race with another client writing to the file. It SHOULD retry the READ_BLOCK4 operation until payload consistency is achieved. It may instead decide to send a LAYOUTERROR4 to the metadata server with an error of NFS4ERR_ERASURE_ENCODING_NOT_CONSISTENT. And should it hang forever? Perhaps a new layout error that the client can send the MDS? Or should it probe with READ_BLOCK_STATUS4 to try to repair? TH Perhaps a LAYOUTERROR_BLOCK4 to send an encoding type specific location? TH¶
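
One possible shape for the reader's retry loop is sketched below. The bound on the number of attempts is purely an assumption made so that the example terminates; this document does not currently define when a client gives up. The read_payload callable is hypothetical and stands in for a set of READ_BLOCK4 operations across the data servers.

```python
def read_until_consistent(read_payload, max_attempts=5):
    """Retry a payload read until all blocks share one writer.

    read_payload: hypothetical callable returning a list of
        (bo_client_id, bo_change_id) tuples, one per payload block
    max_attempts: assumed retry bound, not specified by this document

    Returns the list of owners on success. Returns None when the
    payload stays inconsistent, in which case the caller could report
    NFS4ERR_ERASURE_ENCODING_NOT_CONSISTENT via LAYOUTERROR4.
    """
    for _ in range(max_attempts):
        owners = read_payload()
        if owners and len(set(owners)) == 1:
            return owners
    return None
```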

5. New Infrastructure

5.1. Errors

5.1.1. Error 10097 - NFS4ERR_ERASURE_ENCODING_NOT_CONSISTENT

The client encountered a payload in which the blocks were inconsistent and remain inconsistent. As the client cannot tell if another client is actively writing, it informs the metadata server of this error via LAYOUTERROR4. The metadata server can then arrange for repair of the file.¶

Note that due to the opaqueness of the clientid4, the client can not differentiate between boot instances of the metadata server or client, but the metadata server can do that differentiation. I.e., it can tell if the inconsistency is from the same client, whether that client is active and actively writing to the file (i.e., does the client have the file open and with a LAYOUTIOMODE4_RW layout?).¶

5.1.2. Error 10098 - NFS4ERR_ERASURE_ENCODING_NOT_SUPPORTED

The client requested an ffv2_encoding_type that the metadata server does not support. I.e., if the client sends a layout_hint requesting an erasure encoding type that the metadata server does not support, this error code can be returned. The client might have to send the layout_hint several times to determine the overlapping set of supported erasure encoding types.¶

5.1.3. Error 10099 - NFS4ERR_ERASURE_ENCODING_BLOCK_MISMATCH

The client requested that the data server update the header only, and the data server cannot find a matching block at that offset.¶

5.2. EXCHGID4_FLAG_USE_ERASURE_DS

/// const EXCHGID4_FLAG_USE_ERASURE_DS      = 0x00100000;

Figure 27

When a data server connects to a metadata server, it can state its pNFS role via EXCHANGE_ID (see Section 18.35 of [RFC8881]). The data server can use EXCHGID4_FLAG_USE_ERASURE_DS (see Figure 27) to indicate that it supports the new NFSv4.2 operations introduced in this document. Section 13.1 of [RFC8881] describes the interaction of the various pNFS roles masked by EXCHGID4_FLAG_MASK_PNFS. However, that mask does not cover EXCHGID4_FLAG_USE_ERASURE_DS. I.e., EXCHGID4_FLAG_USE_ERASURE_DS can be used in combination with all of the pNFS flags.¶

If the data server sets EXCHGID4_FLAG_USE_ERASURE_DS during the EXCHANGE_ID operation, then it MUST support: ACTIVATE_BLOCK4, READ_BLOCK_STATUS4, READ_BLOCK4, ROLLBACK_BLOCK4, and WRITE_BLOCK4. Further, note that this support is orthogonal to the Erasure Encoding Type selected. The data server is unaware of which type is driving the I/O. It is also unaware of the payload layout or what type of block it is serving.¶
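
The relationship between the new flag and the existing pNFS role mask can be checked with a few lines of bit arithmetic. The pNFS flag values below are those of Section 18.35 of [RFC8881] and the erasure flag value is taken from Figure 27; the two helper functions are illustrative names only.

```python
# Values from Section 18.35 of RFC 8881.
EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000
EXCHGID4_FLAG_MASK_PNFS = 0x00070000
# Value from Figure 27 of this document.
EXCHGID4_FLAG_USE_ERASURE_DS = 0x00100000


def pnfs_roles(flags):
    """Return only the pNFS role bits of an EXCHANGE_ID flags word."""
    return flags & EXCHGID4_FLAG_MASK_PNFS


def supports_erasure_ops(flags):
    """True if the server advertised the erasure encoding operations."""
    return bool(flags & EXCHGID4_FLAG_USE_ERASURE_DS)


# A data server advertising both its pNFS role and erasure support.
flags = EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_ERASURE_DS
print(hex(pnfs_roles(flags)), supports_erasure_ops(flags))
```

Because the erasure bit lies outside EXCHGID4_FLAG_MASK_PNFS, masking for the pNFS role leaves it untouched.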

5.3. Block Owner

/// struct block_owner4 {
///     uint32_t    bo_block_id;
///     changeid4   bo_change_id;
///     clientid4   bo_client_id;
///     bool        bo_activated;
/// };

Figure 28

The block_owner4 (see Figure 28) is used to determine when and by whom a block was written. The bo_block_id is used to identify the block and MUST be the index of the block within the file. I.e., it is the offset of the start of the block divided by the block length. The bo_client_id MUST be the client id handed out by the metadata server to the client as the eir_clientid during the EXCHANGE_ID results (see Section 18.35 of [RFC8881]) and MUST NOT be the client id supplied by the data server to the client. I.e., across all data files, the bo_client_id uniquely describes one and only one client.¶
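
The computation of bo_block_id can be written out as follows. This sketch assumes a single fixed block length for the data file, which may not hold for erasure encoding types that use different block sizes per block type; the function name is illustrative.

```python
def block_id_for_offset(byte_offset, block_len):
    """Return the block index for a block starting at byte_offset.

    Assumes every block in the data file has the same length.
    """
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    if byte_offset % block_len != 0:
        raise ValueError("byte_offset is not the start of a block")
    return byte_offset // block_len


# With 4kB blocks, the block starting at byte 8192 is block 2.
print(block_id_for_offset(8192, 4096))
```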

The bo_change_id is like the change attribute (see Section 5.8.1.4 of [RFC8881]) in that each block write by a given client has to have a unique bo_change_id. I.e., it can be determined which transaction across all data files a block corresponds to.¶

The bo_activated is used by the data server to indicate whether the block I/O was activated or pending activation. The first WRITE_BLOCK4 to a location is automatically activated if the WRITE_BLOCK_FLAGS_ACTIVATE_IF_EMPTY is set. Subsequent WRITE_BLOCK4 modifications to that block location are not automatically activated. The client has to ACTIVATE_BLOCK4 the block in order to get it activated.¶

The concept of automatically activating is dependent on the wba_stable field of the WRITE_BLOCK4args.¶

6. New NFSv4.2 Operations

6.1. Operation 77: ACTIVATE_BLOCK4 - Activate Cached Block Data

6.1.1. ARGUMENTS

/// struct ACTIVATE_BLOCK4args {
///     /* CURRENT_FH: file */
///     offset4         aba_offset;
///     count4          aba_count;
///     block_owner4    aba_blocks<>;
/// };

Figure 29

6.1.2. RESULTS

/// struct ACTIVATE_BLOCK4resok {
///     verifier4       abr_writeverf;
/// };

Figure 30

/// union ACTIVATE_BLOCK4res switch (nfsstat4 abr_status) {
///     case NFS4_OK:
///         ACTIVATE_BLOCK4resok   abr_resok4;
///     default:
///         void;
/// };

Figure 31

6.1.3. DESCRIPTION

ACTIVATE_BLOCK4 is COMMIT4 (see Section 18.3 of [RFC8881]) with additional semantics over the block_owner activating the blocks. As such, all of the normal semantics of COMMIT4 directly apply.¶

The main difference between the two operations is that ACTIVATE_BLOCK4 works on blocks and not a raw data stream. As such aba_offset is the starting block offset in the file and not the byte offset in the file. Some erasure encoding types can have different block sizes depending on the block type. Further, aba_count is a count of blocks to activate and not bytes to activate.¶

Further, while it may appear that the combination of aba_offset and aba_count is redundant to aba_blocks, the purpose of aba_blocks is to allow the data server to differentiate between potentially multiple pending blocks.¶

6.2. Operation 78: READ_BLOCK_STATUS4 - Read Block Commit Status from File

6.2.1. ARGUMENTS

/// struct READ_BLOCK_STATUS4args {
///     /* CURRENT_FH: file */
///     stateid4    rbsa_stateid;
///     offset4     rbsa_offset;
///     count4      rbsa_count;
/// };

Figure 32

6.2.2. RESULTS

/// struct READ_BLOCK_STATUS4resok {
///     bool            rbsr_eof;
///     block_owner4    rbsr_blocks<>;
/// };

Figure 33

/// union READ_BLOCK_STATUS4res switch (nfsstat4 rbsr_status) {
///     case NFS4_OK:
///         READ_BLOCK_STATUS4resok     rbsr_resok4;
///     default:
///         void;
/// };

Figure 34

6.2.3. DESCRIPTION

READ_BLOCK_STATUS4 differs from READ_BLOCK4 in that it only reads active and pending headers in the desired data range.¶

6.3. Operation 79: READ_BLOCK4 - Read Blocks from File

6.3.1. ARGUMENTS

/// struct READ_BLOCK4args {
///     /* CURRENT_FH: file */
///     stateid4    rba_stateid;
///     offset4     rba_offset;
///     count4      rba_count;
/// };

Figure 35

6.3.2. RESULTS

/// struct read_block4 {
///     uint32_t        rb_crc;
///     uint32_t        rb_effective_len;
///     block_owner4    rb_owner;
///     uint32_t        rb_seq_id;
///     opaque          rb_block<>;
/// };

Figure 36

/// struct READ_BLOCK4resok {
///     bool        rbr_eof;
///     read_block4 rbr_blocks<>;
/// };

Figure 37

/// union READ_BLOCK4res switch (nfsstat4 rbr_status) {
///     case NFS4_OK:
///          READ_BLOCK4resok     rbr_resok4;
///     default:
///          void;
/// };

Figure 38

6.3.3. DESCRIPTION

READ_BLOCK4 is READ4 (see Section 18.22 of [RFC8881]) with additional semantics over the block_owner and the activation of blocks. As such, all of the normal semantics of READ4 directly apply.¶

The main difference between the two operations is that READ_BLOCK4 works on blocks and not a raw data stream. As such rba_offset is the starting block offset in the file and not the byte offset in the file. Some erasure encoding types can have different block sizes depending on the block type. Further, rba_count is a count of blocks to read and not bytes to read.¶

READ_BLOCK4 also only returns the activated block at the location. I.e., if a client overwrites a block at offset 10 and then tries to read the block without activating it, the original block is returned.¶
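
The rule that a read only sees the activated block can be modeled with a small in-memory structure. This is a sketch of one possible data server bookkeeping scheme, not a required implementation; the BlockSlot class and its method names are hypothetical.

```python
class BlockSlot:
    """One block location: a single active copy plus pending copies."""

    def __init__(self):
        self.active = None   # ((client_id, change_id), data) or None
        self.pending = {}    # (client_id, change_id) -> data

    def write(self, client_id, change_id, data, activate_if_empty=False):
        """Model WRITE_BLOCK4; return the resulting bo_activated."""
        key = (client_id, change_id)
        if self.active is None and activate_if_empty:
            self.active = (key, data)
            return True
        self.pending[key] = data
        return False

    def read(self):
        """Model READ_BLOCK4: only the activated data is visible."""
        return None if self.active is None else self.active[1]

    def activate(self, client_id, change_id):
        """Model ACTIVATE_BLOCK4 for one pending block."""
        key = (client_id, change_id)
        self.active = (key, self.pending.pop(key))


slot = BlockSlot()
slot.write(6, 1, b"original", activate_if_empty=True)
slot.write(6, 2, b"overwrite")   # stays pending
before = slot.read()             # still the original block
slot.activate(6, 2)
after = slot.read()              # now the overwrite
```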

When reading a set of blocks across the data servers, it can be the case that some data servers do not have any data at that location. In that case, the server either returns rbr_eof if the rba_offset exceeds the number of blocks that the data server is aware of, or it returns an empty block for that block.¶

For example, in Figure 39, the client asks for 4 blocks starting with the 3rd block in the file. The second data server responds as in Figure 40. The client would read this as there is valid data for blocks 2 and 4, there is a hole at block 3, and there is no data for block 5. Note that the data server MUST calculate a valid rb_crc for block 3 based on the generated fields.¶

                Data Server 2
        +--------------------------------+
        | READ_BLOCK4args                |
        +--------------------------------+
        | rba_stateid: 0                 |
        | rba_offset: 2                  |
        | rba_count: 4                   |
        +----------+---------------------+

Figure 39

                Data Server 2
        +--------------------------------+
        | READ_BLOCK4resok               |
        +--------------------------------+
        | rbr_eof: true                  |
        | rbr_blocks[0]:                 |
        |     rb_crc: 0x3faddace         |
        |     rb_effective_len: 4kB      |
        |     rb_owner:                  |
        |            bo_block_id: 2      |
        |            bo_change_id: 3     |
        |            bo_client_id: 6     |
        |            bo_activated: true  |
        |     rb_seq_id: 1               |
        |     rb_block: ....             |
        | rbr_blocks[1]:                 |
        |     rb_crc: 0xdeade4e5         |
        |     rb_effective_len: 4kB      |
        |     rb_owner:                  |
        |            bo_block_id: 3      |
        |            bo_change_id: 0     |
        |            bo_client_id: 0     |
        |            bo_activated: false |
        |     rb_seq_id: 1               |
        |     rb_block: 0000...00000     |
        | rbr_blocks[2]:                 |
        |     rb_crc: 0x7778abcd         |
        |     rb_effective_len: 2kB      |
        |     rb_owner:                  |
        |            bo_block_id: 4      |
        |            bo_change_id: 3     |
        |            bo_client_id: 6     |
        |            bo_activated: true  |
        |     rb_seq_id: 1               |
        |     rb_block: ....             |
        +--------------------------------+

Figure 40
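
The client's interpretation of Figure 40 can be sketched as a classification of each requested block. The sketch assumes that a hole is recognizable by a returned block whose bo_activated is false, as in the example; this document does not state that rule explicitly, and the function name and labels are illustrative.

```python
def classify_blocks(rba_offset, rba_count, returned, rbr_eof):
    """Classify each requested block as data, hole, or no data.

    returned: list of (bo_block_id, bo_activated) pairs taken from
        the rbr_blocks array of the response
    """
    seen = dict(returned)
    result = {}
    for block_id in range(rba_offset, rba_offset + rba_count):
        if block_id not in seen:
            # Past the last block the data server knows about.
            result[block_id] = "no data" if rbr_eof else "missing"
        elif seen[block_id]:
            result[block_id] = "data"
        else:
            result[block_id] = "hole"
    return result


# Request of Figure 39 and response of Figure 40.
print(classify_blocks(2, 4, [(2, True), (3, False), (4, True)], True))
```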

6.4. Operation 80: ROLLBACK_BLOCK4 - Rollback Cached Block Data

6.4.1. ARGUMENTS

/// struct ROLLBACK_BLOCK4args {
///     /* CURRENT_FH: file */
///     offset4         rba_offset;
///     count4          rba_count;
///     block_owner4    rba_blocks<>;
/// };

Figure 41

6.4.2. RESULTS

/// struct ROLLBACK_BLOCK4resok {
///     verifier4       rbr_writeverf;
/// };

Figure 42

/// union ROLLBACK_BLOCK4res switch (nfsstat4 rbr_status) {
///     case NFS4_OK:
///         ROLLBACK_BLOCK4resok   rbr_resok4;
///     default:
///         void;
/// };

Figure 43

6.4.3. DESCRIPTION

ROLLBACK_BLOCK4 is a new operation similar to COMMIT4 (see Section 18.3 of [RFC8881]) with additional semantics over the block_owner for rolling back the writing of blocks. As such, all of the normal semantics of COMMIT4 directly apply.¶

The main difference between the two operations is that ROLLBACK_BLOCK4 works on blocks and not a raw data stream. As such rba_offset is the starting block offset in the file and not the byte offset in the file. Some erasure encoding types can have different block sizes depending on the block type. Further, rba_count is a count of blocks to roll back and not bytes to roll back.¶

Further, while it may appear that the combination of rba_offset and rba_count is redundant to rba_blocks, the purpose of rba_blocks is to allow the data server to differentiate between potentially multiple pending blocks.¶

ROLLBACK_BLOCK4 deletes prior WRITE_BLOCK4 transactions. In case of write holes, it allows the client to undo transactions to repair the file.¶
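
The role of rba_blocks in selecting which pending transactions to delete can be illustrated with the pending entries of Figure 26. This is a minimal sketch that assumes the data server keeps pending blocks as a list of (bo_block_id, bo_change_id, bo_client_id) tuples; the function name is hypothetical.

```python
def rollback_pending(pending, rba_blocks):
    """Drop the pending entries named in rba_blocks; keep the rest.

    pending: list of (bo_block_id, bo_change_id, bo_client_id)
    rba_blocks: entries the client wants rolled back, same shape
    """
    doomed = set(rba_blocks)
    return [p for p in pending if p not in doomed]


# Pending entries for block 1 from Figure 26.
pending = [(1, 4, 6), (1, 20, 7), (1, 3, 10)]

# Client 6 rolls back only its own pending write.
remaining = rollback_pending(pending, [(1, 4, 6)])
print(remaining)
```

Without rba_blocks, an offset and count alone could not distinguish client 6's pending block from those of the other writers at the same location.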

6.5. Operation 81: WRITE_BLOCK4 - Write Blocks to File

6.5.1. ARGUMENTS

/// const WRITE_BLOCK_FLAGS_UPDATE_HEADER_ONLY   = 0x00000001;
/// const WRITE_BLOCK_FLAGS_ACTIVATE_IF_EMPTY    = 0x00000002;

Figure 44

/// struct write_block4 {
///     uint32_t        wb_crc;
///     uint32_t        wb_effective_len;
///     uint32_t        wb_flags;
///     opaque          wb_block<>;
/// };

Figure 45

/// struct guard_block_owner4 {
///     changeid4   gbo_change_id;
///     clientid4   gbo_client_id;
/// };

Figure 46

/// union write_block_guard4 switch (bool wbg_check) {
///     case TRUE:
///         guard_block_owner4   wbg_block_owner;
///     case FALSE:
///         void;
/// };

Figure 47

/// struct WRITE_BLOCK4args {
///     /* CURRENT_FH: file */
///     stateid4           wba_stateid;
///     offset4            wba_offset;
///     stable_how4        wba_stable;
///     block_owner4       wba_owner;
///     uint32_t           wba_seq_id;
///     write_block_guard4 wba_guard;
///     write_block4       wba_data<>;
/// };

Figure 48

6.5.2. RESULTS

/// struct WRITE_BLOCK4resok {
///     count4          wbr_count;
///     stable_how4     wbr_committed;
///     verifier4       wbr_writeverf;
///     block_owner4    wbr_owners<>;
/// };

Figure 49

/// union WRITE_BLOCK4res switch (nfsstat4 wbr_status) {
///     case NFS4_OK:
///         WRITE_BLOCK4resok    wbr_resok4;
///     default:
///         void;
/// };

Figure 50

6.5.3. DESCRIPTION

WRITE_BLOCK4 is WRITE4 (see Section 18.32 of [RFC8881]) with additional semantics over the block_owner and the activation of blocks. As such, all of the normal semantics of WRITE4 directly apply.¶

The main difference between the two operations is that WRITE_BLOCK4 works on blocks and not a raw data stream. As such wba_offset is the starting block offset in the file and not the byte offset in the file. Some erasure encoding types can have different block sizes depending on the block type. Further, wbr_count is a count of written blocks and not written bytes.¶

If wba_stable is FILE_SYNC4, the data server MUST commit the written header and block data plus all file system metadata to stable storage before returning results. This corresponds to the NFSv2 protocol semantics. Any other behavior constitutes a protocol violation. If wba_stable is DATA_SYNC4, then the data server MUST commit all of the header and block data to stable storage and enough of the metadata to retrieve the data before returning. The data server implementer is free to implement DATA_SYNC4 in the same fashion as FILE_SYNC4, but with a possible performance drop. If wba_stable is UNSTABLE4, the data server is free to commit any part of the header and block data and the metadata to stable storage, including all or none, before returning a reply to the client. There is no guarantee whether or when any uncommitted data will subsequently be committed to stable storage. The only guarantees made by the data server are that it will not destroy any data without changing the value of writeverf and that it will not commit the data and metadata at a level less than that requested by the client.¶

The activation of header and block data interacts with the bo_activated for each of the written blocks. If the data is not committed to stable storage, then the bo_activated field MUST NOT be set to true. Once the data is committed to stable storage, the data server can set the block's bo_activated if one of these conditions applies:¶

it is the first write to that block and the WRITE_BLOCK_FLAGS_ACTIVATE_IF_EMPTY flag is set, or¶
an ACTIVATE_BLOCK4 is issued later for that block.¶
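
The value of bo_activated returned directly from a WRITE_BLOCK4 can be expressed as a small predicate covering the stable storage requirement and the first condition. The function and parameter names are assumptions for illustration; the flag value comes from Figure 44.

```python
WRITE_BLOCK_FLAGS_ACTIVATE_IF_EMPTY = 0x00000002  # from Figure 44


def activated_after_write(committed_to_stable, first_write, wb_flags):
    """Return the bo_activated value for a block just written.

    committed_to_stable: the header and block data reached stable storage
    first_write: no block previously existed at this location
    wb_flags: the wb_flags field of the write_block4
    """
    if not committed_to_stable:
        return False  # MUST NOT be set before the data is stable
    wants_auto = bool(wb_flags & WRITE_BLOCK_FLAGS_ACTIVATE_IF_EMPTY)
    return first_write and wants_auto
```

Any block for which this returns False stays pending until a later ACTIVATE_BLOCK4 names it.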

There are subtle interactions with write holes caused by racing clients. One client could win the race in each case, but because it used a wba_stable of UNSTABLE4, the subsequent writes from the second client with a wba_stable of FILE_SYNC4 can be awarded the bo_activated being set to true for each of the blocks in the payload.¶

Finally, the interaction of wba_stable can cause a client to mistakenly believe that by the time it gets the response of bo_activated of false, that the blocks are not activated. A subsequent READ_BLOCK4 or READ_BLOCK_STATUS4 might show that the bo_activated is true without any interaction by the client via ACTIVATE_BLOCK4. Automatic setting of bo_activated to true if it is the first write should be a performance boost. But it can lead to the client having incorrect information (as above) and trying to ACTIVATE_BLOCK4 a payload that has lost the race. But is that bad? If you have racing clients, there is no guarantee at all as to the contents of the file. TH¶

6.5.3.1. Guarding the Write

A guarded WRITE_BLOCK4 is when the writing of a block MUST fail if wba_guard.wbg_check is set and the target block does not have both the same change_id as the gbo_change_id and the same client_id as the gbo_client_id. This is useful in read-update-write scenarios. The client reads a block, updates it, and is prepared to write it back. It guards the write such that if another writer has modified the block, the data server will reject the modification.¶

Note that as the guard_block_owner4 (see Figure 46) does not have a block_id and the WRITE_BLOCK4 applies to all blocks in the range of wba_offset to the length of wba_data, each of the target blocks MUST have the same change_id and client_id. The client SHOULD present the smallest possible set of blocks to meet this requirement.¶
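
The guard check can be sketched as a comparison of the guard against every target block. The sketch assumes the comparison is made against the currently activated owner of each target block and that all blocks are vetted before any is written; both points remain open in this document. The function name is illustrative.

```python
def guard_allows_write(wbg_check, gbo_change_id, gbo_client_id, targets):
    """Decide whether a guarded WRITE_BLOCK4 may proceed.

    targets: list of (change_id, client_id), one per target block in
        the range covered by wba_offset and wba_data
    """
    if not wbg_check:
        return True  # unguarded write, nothing to compare
    expected = (gbo_change_id, gbo_client_id)
    return all(t == expected for t in targets)


# Read-update-write: the client read blocks written by client 6 with
# change_id 3 and guards its write-back with those values.
print(guard_allows_write(True, 3, 6, [(3, 6), (3, 6)]))   # unmodified
print(guard_allows_write(True, 3, 6, [(3, 6), (3, 10)]))  # raced
```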

And the complexity goes up here. Does the DS reject only based on active blocks? Or can inactive ones also cause rejection? TH¶

Is the DS supposed to vet all blocks first or proceed to the first error? Or do all blocks and return an array of errors? (This last one is a no-go.) Also, if we do the vet first, what happens if a WRITE_BLOCK4 comes in after the vetting? Are we to lock the file during this process? Even if we do that, we still have the issue of multiple DSes. TH¶

6.5.3.2. Updating the Header Only

Some erasure encoding types keep their blocks in plain text and have parity blocks in order to provide integrity. A common configuration for Reed-Solomon is 4 active blocks, 2 parity blocks, and 2 spares. Assuming 4kB data blocks, each payload delivers 16kB of data and 8kB of parity. If the application modifies the first data block, then all that needs to change is the first active block and the two parity blocks in the payload.¶

In any other approach, only 12kB of the total 24kB has to be written to storage. If that is attempted in the Flexible Files Version 2 Layout Type, then the payload will be deemed as inconsistent. The reason for this is that the change_id for the unmodified blocks will not match those of the modified blocks.¶
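
The 12kB and 24kB figures follow directly from the block counts. The short calculation below reproduces them; the function name and parameters are illustrative only.

```python
def payload_write_cost(data_blocks, parity_blocks, block_kb, modified):
    """Return (full_kb, partial_kb) for one payload.

    full_kb: size of every data and parity block in the payload
    partial_kb: size of only the modified data blocks plus all parity
    """
    full_kb = (data_blocks + parity_blocks) * block_kb
    partial_kb = (modified + parity_blocks) * block_kb
    return full_kb, partial_kb


# 4 data blocks, 2 parity blocks, 4kB each, 1 data block modified.
full_kb, partial_kb = payload_write_cost(4, 2, 4, 1)
print(full_kb, partial_kb)
```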

The WRITE_BLOCK_FLAGS_UPDATE_HEADER_ONLY flag in wb_flags can be used to save the transmission of the blocks. If it is set, then the wb_block is ignored. It MUST be empty. Note that the client MUST only modify both the wb_crc and the wba_owner.bo_change_id fields in this case. The wb_crc MUST change as the wba_owner.bo_change_id has been modified (see Section 3.1.1).¶

For the purpose of computing the activation state of the block, the data server MUST treat this as an overwrite. Thus, in the response, bo_activated MUST be false.¶

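+--------------------------------+-------+----------+-----+----------------+
| Recallable Object Type Name    | Value | RFC      | How | Minor Versions |
+--------------------------------+-------+----------+-----+----------------+
| RCA4_TYPE_MASK_FFV2_LAYOUT_MIN | 20    | RFCTBD10 | L   | 1              |
| RCA4_TYPE_MASK_FFV2_LAYOUT_MAX | 21    | RFCTBD10 | L   | 1              |
+--------------------------------+-------+----------+-----+----------------+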