I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please wait for direction from your document shepherd or AD before posting a new version of the draft. For more information, please see the FAQ at . Document: draft-ietf-nfsv4-rfc3530-migration-update-07.tx Reviewer: Elwyn Davies Review Date: 2016/01/16 IETF LC End Date: 2016/01/18 IESG Telechat date: 2016/01/18 Summary: Almost ready.  There are a few minor issues and a considerable number of nits that need attention.  The text is extremely complex and in some places the logic is not entirely clear.  I would encourage the authors to revisit the various usages of RFC 2119 language;  a number of the instances, especially several of the 'SHOULDs', appear to refer to operational or administrative decisions rather than matters that affect the protocol - and are therefore not actionable by the receiving implementation.  If appropriate the language should be modified. Major issues: ------------------ None Minor issues: ------------------ Bits possibly not covered from RFC 7530: --------------------------------------------------------- With respect to s4: In s9.1.1, para 1 of RFC 7530 there is: Breaking the lease state amounts to the server removing all lock, share reservation, and, where the server is not supporting the CLAIM_DELEGATE_PREV claim type, all delegation state associated with the same client with the same identity. For a discussion of delegation state recovery, see Section 10.2.1 . There is nothing in the update about servers that don't support the CLAIM_DELEGATE_PREV claim type.  Should something be said about this? The general description of the distinction between open owners and lock owners in para 2, 3 and 4 of s9.1.1 is lost if the new text is taken as a true replacement of s9.1.1.  This is undesirable.  ------------------------------------------ Items in the text of this draft: ----------------------------------------- s5.1.3: When a file system is migrated, the source server is queried about the new location of the file system using GETATTR(fs_locations).  Since the file system involved has migrated, the server no longer has a connection with the relevant file system but still needs to respond to the GETATTR appropriately.  Does there need to be some discussion of how long the server needs to hang onto information that will allow it to respond to these requests, particularly since the suggested means of discovery uses an arbitrary file from the file system.  It is possible that I have missed something or my understanding/memory of how fs_locations works is faulty. s7.4.5.1: It would be helpful to indicate the if-then structure of the bullets by making current bullets 3 and 4 sub-bullets of bullet 2.   Indeed, on second thoughts, it is not clear which 'if' (either that in bullet 2 or bullet 3) the 'otherwise' in bullet 4 applies to.  Please make the logic clearer. s7.5, bullet 1: Surely the decision as to whether privacy is needed is an administrative/operational deployment decision?  Transfer integrity is vital and provision should be made for transfer privacy to be possible if the installation requires it, but I can't see that privacy needs to be ensured in all possible circumstances. ======================= Nits/editorial comments: ----------------------------------- General: s/e.g. /e.g., /g (5 instances); s/i.e. /i.e., /g  (5 instances). General (in s5): s/openowner/open-owner/g; s/lockowner/lock-owner/g General: s/filesystem/file system/g (In line with usage in RFC 7530). General: Lack of terminology section. There are a number of terms that might be advantageously incorporated into a terminology section - lock types, client id, stateid, incarnation verifier (and its relation to reboots/'boot verifier'), server (address) trunking, open owners, lock owners spring to mind.  s3 , para 6/s4.1, para 7: The terms 'client boot instance' and (subsequently) 'boot verifier' are not used in RFC 7530.  s9.1.1 uses the term 'client incarnation verifier' (and this term is actually used in s4.1 , para 10 (second bullet after nfs_client_id4 data structure definition).  I personally think is a more descriptive term.  I would be inclined to replace 'boot verifier' with 'incarnation verifier' throughout, and provide a definition in  a terminology section (see above).  This could also cover the case where the incarnation verifier is changed because the structure is destroyed rather then the client host rebooting (as noted in s4.6) although the intention is that it should only change on a reboot. s3, 2nd set of bullets, bullets 2 and 3; Also s4.9, para 1: s/client id/client id string/ s3, last bullet: It would be worth pointing out that there is a complete revision of the definition of the SETCLIENTID operation.  This is relevant because there is a pointer to the definition of SETCLIENTID in s4. s4: Correct section names from RFC 7530.. OLD: The replaced sections are named "client ID" and "Server Release of Clientid." NEW: The replaced sections are named "Client ID" and "Server Release of Client ID." END s4.1, bullet 1: s/client IP address/the client IP address/ s4.1,bullet 2: I would replace 'save' by 'maintain a non-volatile record across reboots of' to make it clear what is intended. s4.1, last para:  The concept of 'server trunking' needs to be defined.  NFSv4.0 per RFC 7530 does not have this concept - it is introduced in NFSv4.1, thus somebody looking from the PoV of a 4.0 implementer need not know anything about trunking.   s4.2: Section references to relevant parts of RFC 7530 that provide definitions of structures and operations would be helpful here. s4.2, bullet 1 of second set of bullets (under struct nfs_client_id): The id field is a variable-length string which uniquely identifies a specific client. Although, we describe it as a string and it is often referred to as a "client string," it should be understood that the protocol defines this as opaque data. Later on s5.1 recommends that opaque data such as this should be encoded in network byte order.  It would be helpful to repeat or introduce this recommendation (make it a MUST???) here to ensure that the id will be identical on servers of whatever endianess.  Also the term 'network byte order' is primarily relevant to data for which the internal representation is a (binary) number.  A better way of saying this might be 'the encoding and decoding processes (e.g., using network byte order for the external representation) for the opaque data result in the same internal representation whatever the endianness of the originating and receiving machines' even if it is somewhat more long winded. s4.2, para 20 (across the boundary between p7 and p8): This shorthand client identifier (a client ID) is assigned by the server and should be chosen so that it will not conflict with a client ID previously assigned by same server. This applies across server restarts or reboots. Would there be any advantage to recommending that servers should try to generate clientid4's that are unique to the server as far as possible as well as different across reboots, etc.?  It would appear to minimise the risk of clashes during migrations but this may be unnecessarily complicated and the client has to cope with the remote possibility anyway. I note that there is a statement in s4.8, para 4 on page 19 Given that it is already highly unlikely that the clientid XC is duplicated by distinct servers, the probability that SCn is duplicated as well has to be considered vanishingly small. Adding the recommendation would support this statement. s4.2, para 7 on page 8: s/of administrative action,/of administrative action./ (swap comma for period/full stop) s4.2, last bullet: o Merger of state under the associated lease with another lease under a different clientid causes the clientid4 serving as the source of the merge to cease being recognized on its server. (Always returns NFS4ERR_STALE_CLIENTID) Should 'clientid causes the clientid4' be 'client ID causes the clientid4'?  There are 10 other instances of clientid (rather than clientid4) where the same question applies. s4.2, para after last bullet on p9: In cases of server or client error resulting in this error, use of SETCLIENTID to establish a new lease is desirable as well. Which error is meant?  There isn't one mentioned in this para - I guess _STALE_CLIENTID but could be 'any of the above errors'.   s4.2, page 9. (See the section entitled "Server Failure and Recovery") and (see the section entitled "lock-owner" for details)   I take these are intended to be sections in RFC 7530 (respectively 9.6.2 and 9.1.5) -  adding these section numbers and the RFC ref would be helpful. s4.2, para last but two: In the last two cases, different recovery procedures are required. See Section 5.1.1 for details. It isn't clear which cases you are referring to here... the last two bullets two paras before or the last two cases referred to in the previous para out of In the event of a server reboot, loss of lease state due to lease expiration, or administrative revocation of a clientid4, the client Please make this more explicit. s4.2, last para: See the detailed descriptions of SETCLIENTID and SETCLIENTID_CONFIRM for a complete specification of these operations. Section references would help, plus a note that the definition of SETCLIENTID is now in this document (s7.4) rather than Section 16.33 of [RFC7530].  SETCLIENTID_CONFIRM is at Section 16.34 of [RFC7530].   s4.3, last para: In that event, when the server gets a SETCLIENTID specifying a client id string for which the server has a clientid4 Which of the three events in the previous para is being referred to?  Presumably the deliberate change of the primcipal since the other two are not really recognizable in a controlled way. ;-) s4.4, 2nd bullet on page 11 (para 7): the client to be aware when two server IP addresses are connected to the same server (they return the same server name in responding to an EXCHANGE_ID). The comment about 'return[ing] the same server name' in NFSv4.1 is not the real reason the client is sure they are the same server.  Would suggest changing this to '(Section 2.10.5.1 of [RFC5661] explains how the client is able to assure itself that the connections are to the same logical server.)'  s4.4, 3rd bullet on page 11 (para 9):  Suggest adding a reference to the fs_locations discussions in RFC 7530 - Perhaps '(see Sections 8.1 and 8.42 of [RFC7530])'. s4.4. 4th bullet on page 11 (para 10): in that the two different client id strings sent to different IP addresses may wind up on the same IP address, adding confusion. The phrase 'same IP address' doesn't really express what is going on IMO.  Perhaps 'same (logical) server'. s4.5, para 3: s/per server network addresses./per server network address./ s4.5, last para: Therefore, client implementations that support migration with transparent state migration SHOULD NOT use the non-uniform client id string approach, except where it is necessary for compatibility with existing server implementations Arguably, this is an operational decision rather than something that affects the protocol... possibly s/SHOULD NOT/should not/. s4.6, para 10: s/typically the reboot time./typically at reboot time./ s4.7, para 6: s/the spec/this specification/ s4.7, para 8:   Spurious double quote (") at end of para: enable transparent state migration." s4.7, para 9: Missing period/full stop at end of para. s4.8, para 3: s/blind-sided/surprised/ (blind-sided is rather too idiomatic). s4.8, para 7: s/Servers MUST NOT do that./Servers MUST NOT behave in this way./ s4.8, last para on page 17: In this algorithm, when SETCLIENTID is done it will use the common nfs_client_id4 and specify the current target IP address as part of the callback parameters. I am unclear why (and, indeed, how) the target IP address needs to be incorporated in the callback parameters.  AFAICS, the process described in this section does not examine a callback (by this I mean the results of one of the CB_xxx operations) as part of the process of trunking determination.   The choice of callback_ident (which appears to be the only relevant parameter) seems more likely to be appropriately selected in line with the statement later in the process (on page 19): The specific callback parameters chosen, in terms of cb_client4 and callback_ident, are up to the client and should reflect its preferences as to callback handling for the common clientid, in the event that X and IPn are trunked together. but chosen so that they would be appropriate if the server doesn't end up trunked with any other existing client ID.  Clearly the IP address would be a convenient semi-random value which would work to identify the callback uniquely eventually but the selected value doesn't seem to have any effect on the algorithm.   s4.8, para 1 on page 18: Note that when the client has done previous SETCLIENTID's, to any IP addresses, with more than one principal or authentication flavor, we have the possibility of receiving NFS4ERR_CLID_INUSE, since we do not yet know which of our connections with existing IP addresses might be trunked with our current one. In the event that the SETCLIENTID fails with NFS4ERR_CLID_INUSE, one must try all other combinations of principals and authentication flavors currently in use and eventually one will be correct and not return NFS4ERR_CLID_INUSE. I am not clear how this interacts with the variation of the definition of NFS4ERR_CLID_INUSE given in s7.2.  Would I expect the server to accept authentication flavors other than the one already used for another address leading to the same server?  s7.2 says that the server MAY accept a different flavor if it thinks the request is bona fide.  I don't understand what might or might not lead the server to think an alternative flavor was bona fide - and would it be likely to apply in this case? s4.9, last set of bullets: Should the draft also advise applying a one way function to the MAC address for privacy reasons? s4.8, page 18: o If one or more matching clientid4's is found, none of which is marked unresolved, the new IP address X is entered and marked unresolved. A SETCLIENTID_CONFIRM is done to X using XC and XV. After applying the steps below to each of the lead IP addresses with a matching clientid4, the address will have been resolved: It may have been determined to be part of an already known server as a new IP address to be added to an existing set of IP addresses for that server. Otherwise, it will be recognized as a new server. At the point at which this determination is made, the unresolved indication is cleared and any suspended SETCLIENTID processing is restarted I suspect that the second para in the section quoted here is not intended to be part of the bullet, and should be unindented. ----------------- s5.1, para 2: It would be desirable to mention that the change from big endian in [RFC7530] to network byte order is purely terminological. s5.1.1, para 1:  In the case of migration, the servers involved in the migration of a filesystem SHOULD transfer all server state associated with the migrating filesystem from source to the destination server. Arguably this SHOULD is an operational decision.  It does not affect the protocol or the format of the bits on the wire.  Accordingly I suggest s/SHOULD/should/.  This corresponds with the discussion in [RFC7530].  On the other hand: This must be done in a way that is transparent to the client. is probably a MUST: s/This must be done/If state is transferred it MUST be done/ s5.1.1, para 7: s/NFS version 4.0 protocol/NFS version 4.0 protocol (either in [RFC7530] or this update)/ s5.1.1, para 12: When a server implements migration and it does not transfer state information, it SHOULD provide a filesystem-specific grace period, to allow clients to reclaim locks associated with files in the migrated filesystem. I believe this is a MUST rather than a SHOULD.  The alternative of not doing it doesn't look good!  If so the last sentence of s5..1.1 needs adapting to match. s5.1.1.1: The {v, X, c} notation needs to be defined (presumably as used in s7.4 and Section 16.33 of [RFC7530]).  Section references in this doc or  [RFC7530] ought to be provided for SETCLIENTID (**s7.4 in this doc**) and SETCLIENTID_CONFIRM (s16.34 of [RFC7530]). s5.1.1.2, para 5: How does the server know that the client has been implemented to isolate owners to a particular filesystem?  Looking at the next para, I suspect that the server doesn't know - it merely observes that owners don't sprerad across multiple file systems at the time of migration but there is no guarantee - and it doesn't matter - that the client might not have owners that spread across filesystems some time. s5.1.1.2, para 7 on page 29: Such support only needs to be provided for requests issued before the migration event whose status as the last by sequence is invalidated by the migration event. I can't parse the last line of this sentence. s5.1.1.2, 1st and 2nd bullets on page 30: s/the last request for the owner in effect/the sequence number for the last request for the owner in effect/;  s/longer one less the next sequence to be received./longer one less than the next sequence to be received./ ------------------------ s6.2, para 4: o Some operations which modify locking state are not allowed to return NFS4ERR_DELAY. I think it be worth mentioning which operations these are in this para.  Inspection of Table 7 in RFC7530 indicates to me that the relevant operations are OPEN_CONFIRM and RELEASE_LOCKOWNER plus possibly RENEW.  GETFH, RESTOREFH, SAVEFH and ILLEGAL don't return NFS4ERR_DELAY but don't appear to affect the locking state.  I observe that all of these are mentioned explicitly except for OPEN_CONFIRM:  Does anything need to be said about this extra case? s6.2, para 6 on page 37: s/cannot change target filesystem locking state,/cannot change the target file system locking state,/ s6.2, para 7 on page 37: o Keeping track of the earliest request started which is still in execution (for example, by keeping a list of active requests ordered by request start time). The server can then define T' to be the first time at which the earliest-started active request started after time T. I don't understand what this criterion is driving at.  The second sentence appears to be aimed at identifying the time associated with some action that is associated with the earliest-started request but doesn't actually say what it is AFAICS. s7.1: It would be helpful to identify the sections of RFC 7530 that are affected by the changes summarized here, identifying replacements and modifications.  It would also be worth reordering the summary to match with the order of subsections of s7.1. s7.1, bullets 1 and 2: s/CLID_INUSE/NFS4ERR_CLID_INUSE/ s7.2: See the comment on s4.8, para 1 on page 18 above. s7.3, para 7: s/Thus, a client that is prepared to receive NFS4ERR_MOVED/Thus, if a client is prepared to receive NFS4ERR_MOVED s7.4.5.1: A pointer to Section 9.1.7 of [RFC7530] where the DRC is introduced would be helpful.  BTW I am not sure that a server is *allowed* to be without a DRC - s9.1.7 says it is critical to have something that looks like a DRC. s7.6, para 1: Please reference Section 19 of [RFC7530].  The affected para is the penultimate rather than the ultimate para of Section 19 of [RFC7530] judging from the initial words. s7.6, para 2: See the section "Client Identity Definition" for further discussion. Please add a section reference (Section 4 of this document) s8: Suggest OLD: Is to be modified as specified in Section 7.6. NEW: The security considerations of [RFC7530] remain appropriate with the exception of the modification to the penultimate  paragraph specified in Section 7.6 of this document and the addition of the material in Section 7.5. END