Internet-Draft | moq-mi | October 2024 |
Cenzano-Ferret & Frindell | Expires 24 April 2025 | [Page] |
This protocol can be used to send and receive video and audio over Media over QUIC Transport [MOQT].¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://afrind.github.io/draft-cenzano-media-interop/draft-cenzano-moq-media-interop.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-cenzano-moq-media-interop/.¶
Discussion of this document takes place on the Media Over QUIC Working Group mailing list (mailto:[email protected]), which is archived at https://mailarchive.ietf.org/arch/browse/moq/. Subscribe at https://www.ietf.org/mailman/listinfo/moq/.¶
Source for this draft and an issue tracker can be found at https://github.com/afrind/draft-cenzano-media-interop.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 24 April 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
This protocol specifies a simple mechanism for sending media (video and audio) over MOQT for both live-streaming and video conferencing (VC) use cases. The protocol is deliberately flexible so that it can support this range of use cases.¶
Media parameters (e.g., frame rate, resolution, codec) can be updated in the middle of a track.¶
The protocol defines a low overhead packaging format optimized for WebCodecs called WCP that is extensible to other formats such as FMP4. This is not LoC [LOC], but will eventually be merged with that specification.¶
The publisher selects a namespace of its choosing and sends an ANNOUNCE message for this namespace. Since MOQT tracks are immutable, each new broadcast MUST have a unique namespace. It is RECOMMENDED that the last element of the track namespace tuple contain a broadcast timestamp to ensure uniqueness.¶
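As an illustration of the uniqueness recommendation, the following minimal sketch builds a namespace tuple whose last element is a broadcast timestamp. The helper name `make_namespace`, the prefix values, and the choice of milliseconds since the epoch rendered as a decimal string are all assumptions made for this example; the text above does not prescribe a timestamp format.

```python
import time


def make_namespace(prefix, now_ms=None):
    """Return a namespace tuple ending in a broadcast timestamp.

    `prefix` is any iterable of strings chosen by the publisher
    (hypothetical values in the example below). The timestamp format
    (epoch milliseconds as a decimal string) is an assumption.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return tuple(prefix) + (str(int(now_ms)),)


# Fixed timestamp so the result is reproducible.
namespace = make_namespace(("example.org", "live"), now_ms=1729000000000)
```

Two broadcasts started at different times then get different namespaces even when they share the same prefix.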
Within the namespace, the publisher offers media tracks named videoX and audioX, where X is an integer starting at 0 and increasing by 1 for each additional track of a given type.¶
For example, if the publisher issues 2 audio tracks and 1 video track, the track names available will be video0, audio0, and audio1.¶
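The naming rule can be sketched as follows. The function name `track_names` and its parameters are hypothetical; only the `videoX` / `audioX` pattern comes from the text.

```python
def track_names(num_video, num_audio):
    """List track names following the videoX / audioX convention.

    X starts at 0 and increases by 1 for each additional track of
    the same type.
    """
    video = [f"video{i}" for i in range(num_video)]
    audio = [f"audio{i}" for i in range(num_audio)]
    return video + audio


names = track_names(num_video=1, num_audio=2)
```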
The subscriber treats all tracks in the same namespace as part of the same synchronization group (their timestamps are aligned to the same timeline).¶
For the video track, the publisher begins a new group at the start of each IDR (so object 0 will always be an IDR keyframe), and each group contains a single subgroup. Each object has the format described in Section 2.4.¶
For the audio track, the publisher begins a new group with each audio object, and each group contains a single subgroup. Each object has the format described in Section 2.4.¶
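The grouping rules for video and audio can be sketched as a mapping from frames to (group, object) pairs. This is an illustration only: the function names are hypothetical, and starting group numbering at 0 is an assumption, since the text only fixes where group boundaries fall.

```python
def assign_video_ids(is_idr_flags):
    """Map each video frame to (group_id, object_id).

    A new group starts at every IDR frame, so object 0 of each group
    is always an IDR. Group numbering from 0 is an assumption.
    """
    ids = []
    group_id = -1
    object_id = 0
    for is_idr in is_idr_flags:
        if is_idr:
            group_id += 1
            object_id = 0
        elif group_id < 0:
            raise ValueError("stream must start with an IDR frame")
        ids.append((group_id, object_id))
        object_id += 1
    return ids


def assign_audio_ids(num_frames):
    """Each audio object starts its own group, so object_id is always 0."""
    return [(group_id, 0) for group_id in range(num_frames)]


video_ids = assign_video_ids([True, False, False, True, False])
audio_ids = assign_audio_ids(3)
```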
TODO: The datagram forwarding preference could be used, but it is problematic if an audio frame does not fit in a single UDP payload.¶
To avoid fractional numbers and the rounding errors they introduce, timestamps are expressed with two integers: a timestamp numerator and a timebase (the denominator).¶
To convert a timestamp into seconds, divide the numerator by the timebase: timestamp(s) = timestamp numerator / timebase¶
Example:¶
PTS = 11, timebase = 30, so PTS(s) = 11/30 = 0.366666...¶
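The conversion can be sketched with exact rational arithmetic, which keeps the benefit of the integer representation until a floating-point value is actually needed. The helper names `to_seconds` and `rescale` are hypothetical, and rescaling to another timebase is an extra illustration that the text does not require.

```python
from fractions import Fraction


def to_seconds(numerator, timebase):
    """Return the timestamp in seconds as an exact fraction."""
    if timebase <= 0:
        raise ValueError("timebase must be a positive integer")
    return Fraction(numerator, timebase)


def rescale(numerator, src_timebase, dst_timebase):
    """Convert a timestamp numerator to another timebase, rounding once."""
    return round(Fraction(numerator * dst_timebase, src_timebase))


pts_seconds = to_seconds(11, 30)      # exactly 11/30
pts_ms = rescale(11, 30, 1000)        # same instant in milliseconds
```

Rounding happens only once, at the point of conversion, so repeated timestamp arithmetic in the original timebase does not accumulate error.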
All objects in this protocol have the following format.¶
Media Type: Indicates what kind of media payload will follow.¶
Code | Value |
---|---|
0x0 | Video H264 in AVCC with WCP |
0x1 | Audio Opus bitstream |
Media payload: Media type specific payload¶
Seq ID: Monotonically increasing counter for this media track.¶
PTS Timestamp: Presentation timestamp in timebase.¶
TODO: Varints cannot easily represent negative values, so negative timestamps at the start of a stream (priming) could be challenging to encode.¶
DTS Timestamp: Decode timestamp in timebase. If B frames are not used, the encoder SHOULD set this to the same value as PTS.¶
TODO: Varints cannot easily represent negative values, so negative timestamps at the start of a stream (priming) could be challenging to encode.¶
Timebase: Denominator used to calculate PTS, DTS, and Duration.¶
Duration: Duration of Payload in timebase. It will be 0 if not set.¶
Wall Clock: Epoch time in ms when this frame started being captured. It will be 0 if not set.¶
Metadata Size: Size in bytes of the metadata section. It will be 0 when no metadata is present.¶
Metadata: Extra data needed to decode this stream. This will be an AVCDecoderConfigurationRecord as described in [ISO14496] section 5.3.3.1, with field lengthSizeMinusOne = 3 (so each NAL unit length field is 4 bytes). If any other length size is indicated (in the AVCDecoderConfigurationRecord), the receiver SHOULD close the session with a Protocol Violation error. The publisher MUST send a new AVCDecoderConfigurationRecord whenever the encoding parameters change.¶
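A receiver-side check of the length size could look like the sketch below. It relies on the AVCDecoderConfigurationRecord layout in [ISO14496], where the fifth byte carries six reserved bits followed by the two-bit lengthSizeMinusOne field. The function names are hypothetical, and how the session is actually closed is left out.

```python
def avcc_nal_length_size(avcc):
    """Return the NAL unit length size (in bytes) signalled by the record.

    Byte 4 of AVCDecoderConfigurationRecord holds 6 reserved bits and
    then the 2-bit lengthSizeMinusOne field.
    """
    if len(avcc) < 5:
        raise ValueError("AVCDecoderConfigurationRecord is too short")
    return (avcc[4] & 0x03) + 1


def is_acceptable_avcc(avcc):
    """True when lengthSizeMinusOne == 3, i.e. 4-byte NAL length fields."""
    return avcc_nal_length_size(avcc) == 4


# First five bytes of two illustrative records (remaining fields omitted).
good = bytes([0x01, 0x64, 0x00, 0x1F, 0xFF])  # lengthSizeMinusOne = 3
bad = bytes([0x01, 0x64, 0x00, 0x1F, 0xFD])   # lengthSizeMinusOne = 1
```

A receiver that gets `False` from `is_acceptable_avcc` would then apply the Protocol Violation handling described above.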
Payload: H264 bitstream in AVC1 format as described in [ISO14496] section 5.3, using a 4-byte length field for each NAL unit.¶
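The following sketch shows how a publisher might serialize a video object from the fields listed above. It rests on assumptions that this text does not confirm: that every integer field is a QUIC variable-length integer (RFC 9000, Section 16), that fields appear on the wire in the order they are described, and that the payload runs to the end of the object. The names `encode_varint` and `pack_video_object` are hypothetical.

```python
MEDIA_TYPE_VIDEO_H264_AVCC = 0x0  # from the Media Type table


def encode_varint(value):
    """Encode a non-negative integer as a QUIC varint (RFC 9000, Section 16)."""
    if value < 0:
        raise ValueError("varints cannot encode negative values")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x80000000).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("value too large for a varint")


def pack_video_object(seq_id, pts, dts, timebase, duration,
                      wall_clock, metadata, payload):
    """Serialize one video object (assumed field order and varint types)."""
    fields = (
        MEDIA_TYPE_VIDEO_H264_AVCC,
        seq_id,
        pts,
        dts,
        timebase,
        duration,
        wall_clock,
        len(metadata),  # Metadata Size, 0 when no metadata is present
    )
    header = b"".join(encode_varint(f) for f in fields)
    return header + metadata + payload


# One NAL unit with a 4-byte length prefix (length 1, illustrative byte).
nal = b"\x00\x00\x00\x01\x65"
obj = pack_video_object(seq_id=1, pts=11, dts=11, timebase=30, duration=1,
                        wall_clock=0, metadata=b"", payload=nal)
```

Note that `encode_varint` rejects negative values, which is the limitation the TODO notes on PTS and DTS refer to.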
Seq ID: Monotonically increasing counter for this media track.¶
PTS Timestamp: Indicates PTS in timebase.¶
TODO: Varints cannot easily represent negative values, so negative timestamps at the start of a stream (priming) could be challenging to encode.¶
Timebase: Denominator used to calculate PTS and Duration.¶
Sample Freq: Sample frequency used in the original signal (before encoding).¶
Num Channels: Number of channels in the original signal (before encoding).¶
Duration: Duration of Payload in timebase. It will be 0 if not set.¶
Wall Clock: Epoch time in ms when this frame started being captured. It will be 0 if not set.¶
Payload: Opus packets, as described in Section 3 of [RFC6716].¶
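On the receiving side, an audio object could be parsed roughly as sketched below. As with any wire-level example here, it assumes that integer fields are QUIC varints (RFC 9000, Section 16), that they appear in the order listed above after the Media Type, and that the Opus payload runs to the end of the object. The names `decode_varint` and `parse_audio_object` are hypothetical.

```python
MEDIA_TYPE_AUDIO_OPUS = 0x1  # from the Media Type table

AUDIO_FIELDS = ("seq_id", "pts", "timebase", "sample_freq",
                "num_channels", "duration", "wall_clock")


def decode_varint(buf, offset=0):
    """Decode a QUIC varint at `offset`; return (value, next_offset)."""
    if offset >= len(buf):
        raise ValueError("buffer too short for a varint")
    first = buf[offset]
    length = 1 << (first >> 6)  # top two bits select 1, 2, 4 or 8 bytes
    if offset + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F
    for byte in buf[offset + 1:offset + length]:
        value = (value << 8) | byte
    return value, offset + length


def parse_audio_object(data):
    """Parse one audio object (assumed field order and varint types)."""
    media_type, offset = decode_varint(data, 0)
    if media_type != MEDIA_TYPE_AUDIO_OPUS:
        raise ValueError(f"unexpected media type {media_type:#x}")
    parsed = {}
    for name in AUDIO_FIELDS:
        parsed[name], offset = decode_varint(data, offset)
    parsed["payload"] = data[offset:]
    return parsed


# Illustrative object: 48 kHz stereo, 960-sample (20 ms) frame.
sample = bytes.fromhex("010543c08000bb808000bb800243c000") + b"\xfc\xff\xfe"
audio = parse_audio_object(sample)
```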
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
TODO Security¶
This document has no IANA actions.¶
TODO acknowledge.¶