Building a Custom Redundant Audio Data (RED) Encoder to Prevent WebRTC Audio Quality Degradation
This article is for Day 15 of the NTT Communications Advent Calendar 2021.
I am @shinyoshiaki, a member of the SkyWay development team who joined in 2020.
In this article, I will write about RED, a technology for preventing audio quality degradation due to packet loss in WebRTC.
Note: As of December 14, 2021, the content of this article does not work on browsers other than Chrome.
Introduction
Starting with Chrome M96, RED as specified in RFC 2198 (RTP Payload for Redundant Audio Data) has been officially enabled.
In RED, media packet redundancy is achieved by packing not only the latest media packet but also the previous N media packets into the RTP Payload.
This variable N is called the "Distance". A larger Distance increases redundancy and improves audio quality when packet loss occurs. In exchange, audio traffic grows to roughly (Distance + 1) times what it is without RED.
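As a rough illustration of this trade-off, the sketch below estimates the RED payload size for a given Distance. The function name `estimateRedPayloadBytes` and the fixed 80-byte frame size are assumptions made for illustration, since real Opus frames vary in size. The header sizes (4 bytes per redundant block, 1 byte for the primary block) follow the RFC 2198 header format described later in this article.

```typescript
// Hypothetical helper: estimate the RED payload size in bytes.
// Assumes every audio frame has the same size, which is a simplification.
function estimateRedPayloadBytes(frameBytes: number, distance: number): number {
  const redundantHeaders = 4 * distance; // one 4-byte header block per redundant frame
  const primaryHeader = 1; // the last header block is only 1 byte
  const data = frameBytes * (distance + 1); // redundant frames plus the primary frame
  return redundantHeaders + primaryHeader + data;
}

const assumedFrameBytes = 80; // assumed size: about 32 kbps with 20 ms frames
for (const distance of [0, 1, 2, 3]) {
  const bytes = estimateRedPayloadBytes(assumedFrameBytes, distance);
  console.log(`distance=${distance}: ${bytes} bytes per packet`);
}
```

Under these assumptions, a Distance of 2 yields 249 bytes per packet compared with 81 bytes for a Distance of 0, so the header overhead is small next to the duplicated audio data.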
As explained in this Public Service Announcement (PSA), the Distance for RED in Chrome is currently fixed at 1.
As mentioned in the webrtc hacks article, in communication environments with high packet loss rates (e.g., 60% in the webrtc hacks example), a Distance value of 2 provides far better audio quality than 1.
Although a larger Distance increases traffic accordingly, audio traffic is still small compared to video traffic. In use cases that prioritize audio quality, there is therefore good reason to set a Distance greater than 1.
While the Distance for RED in Chrome is currently fixed at 1, the aforementioned PSA states that by using the browser's encoded insertable streams API to create a custom encoder that wraps Opus frames in the RFC 2198 format, the Distance value can be set to any desired value.
"It is possible to use the encoded insertable streams API to write a custom encoder that wraps opus frames in the RFC 2198 format for applications that require more flexibility with respect to the amount of redundancy."
So, in this article, I will try to create a custom RED encoder that can set an arbitrary Distance value.
An article applying the contents of this post to SkyWay's js-sdk is scheduled to be published later on SkyWay's Note site.
Sample Code
The sample code is written in TypeScript and runs in the browser. (It should also be possible to write it in another language and run it via WebAssembly.)
The code upon which the samples in this article are based is available on GitHub.
- https://github.com/shinyoshiaki/werift-webrtc/tree/develop/packages/rtp/src/rtp/red
- https://github.com/shinyoshiaki/werift-webrtc/tree/develop/packages/rtp/examples/browser/customEncoder/main.ts
RED Packet
The first thing needed to create a custom encoder is the ability to serialize and deserialize RED packets.
Let's look at the RED packet specifications.
The diagram below is an example of a RED packet from Section 7 of RFC2198.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC=0 |M| PT | sequence number of primary |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp of primary encoding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| synchronization source (SSRC) identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| block PT=7 | timestamp offset | block length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| block PT=5 | |
+-+-+-+-+-+-+-+-+ +
| |
+ LPC encoded redundant data (PT=7) +
| (14 bytes) |
+ +---------------+
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| |
+ +
| |
+ +
| |
+ +
| DVI4 encoded primary data (PT=5) |
+ (84 bytes, not to scale) +
/ /
+ +
| |
+ +
| |
+ +---------------+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Within this diagram, the following part corresponds to the RED packet.
The RED packet is contained within the RTP Payload area.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| block PT=7 | timestamp offset | block length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| block PT=5 | |
+-+-+-+-+-+-+-+-+ +
| |
+ LPC encoded redundant data (PT=7) +
| (14 bytes) |
+ +---------------+
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| |
+ +
| |
+ +
| |
+ +
| DVI4 encoded primary data (PT=5) |
+ (84 bytes, not to scale) +
/ /
+ +
| |
+ +
| |
+ +---------------+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Within the diagram above, the following part corresponds to the RED packet header.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| block PT=7 | timestamp offset | block length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| block PT=5 |
+-+-+-+-+-+-+-+-+
First, let's look at the RED packet header.
RED Packet Header
A RED packet header consists of multiple header blocks, as shown in the following diagram.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F| block PT | timestamp offset | block length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The fields of a header block are defined as follows:
- F
  - 1 bit
  - Indicates whether another header block follows. A value of 1 means another header block follows; 0 means this is the last header block.
- block PT
  - 7 bits
  - The RTP payload type of this block.
  - Specifically, this matches the payloadType found in the ${payloadType}/${payloadType} part of the a=fmtp line immediately following the RED a=rtpmap line in the SDP. For example:
    a=rtpmap:63 red/48000/2
    a=fmtp:63 111/111
- timestamp offset
  - 14 bits
  - The unsigned difference between this block's timestamp and the timestamp in the RTP header.
  - The timestamp of a redundant packet must always be older than the timestamp in the RTP header.
  - Omitted if the F bit is 0.
- block length
  - 10 bits
  - The length in bytes of the data block corresponding to this header block, excluding the header block part.
  - Omitted if the F bit is 0.
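To make the relationship between the SDP and the block PT concrete, here is a minimal sketch that looks up the RED payload type and the payload types it wraps. The helper name `findRedMapping` is hypothetical, and the sketch assumes the RED a=fmtp line uses the simple "pt/pt" form shown in the list above.

```typescript
// Hypothetical helper: find the RED payload type and the payload types it wraps.
// Assumes the a=fmtp line for RED has the form "a=fmtp:<redPT> <pt>/<pt>/...".
function findRedMapping(
  sdp: string
): { redPT: number; blockPTs: number[] } | undefined {
  const lines = sdp.split(/\r?\n/);
  for (const line of lines) {
    const rtpmap = line.match(/^a=rtpmap:(\d+) red\//i);
    if (!rtpmap) continue;
    const redPT = Number(rtpmap[1]);
    const prefix = `a=fmtp:${redPT} `;
    const fmtp = lines.find((l) => l.startsWith(prefix));
    if (!fmtp) return { redPT, blockPTs: [] };
    const blockPTs = fmtp
      .slice(prefix.length)
      .trim()
      .split("/")
      .map((pt) => Number(pt));
    return { redPT, blockPTs };
  }
  return undefined;
}

const exampleSdp = [
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:63 red/48000/2",
  "a=fmtp:63 111/111",
].join("\r\n");

// Here the RED payload type is 63 and each block carries payload type 111 (Opus).
console.log(findRedMapping(exampleSdp));
```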
An F bit of 0 marks the last header block, which belongs to the primary (latest) packet rather than a redundant one. In this case, the timestamp offset and block length are omitted, so the header block is only 8 bits (1 byte), as shown below:
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|0| Block PT |
+-+-+-+-+-+-+-+-+
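Before moving on, the 4-byte layout of a redundant header block can be checked by hand with plain bit operations. The following is only a sketch under the field widths listed above (1 + 7 + 14 + 10 bits); the function names `packRedundantHeader` and `unpackRedundantHeader` are hypothetical.

```typescript
// Pack a 4-byte redundant header block:
// F (1 bit) | block PT (7 bits) | timestamp offset (14 bits) | block length (10 bits)
function packRedundantHeader(
  blockPT: number,
  timestampOffset: number,
  blockLength: number
): Uint8Array {
  if (
    blockPT < 0 || blockPT > 0x7f ||
    timestampOffset < 0 || timestampOffset > 0x3fff ||
    blockLength < 0 || blockLength > 0x3ff
  ) {
    throw new RangeError("field does not fit in its bit width");
  }
  const out = new Uint8Array(4);
  out[0] = 0x80 | blockPT; // F=1 followed by the 7-bit payload type
  out[1] = timestampOffset >> 6; // upper 8 bits of the offset
  out[2] = ((timestampOffset & 0x3f) << 2) | (blockLength >> 8); // lower 6 bits of offset, upper 2 bits of length
  out[3] = blockLength & 0xff; // lower 8 bits of the length
  return out;
}

// Reverse operation: read the four fields back from the 4 bytes.
function unpackRedundantHeader(bytes: Uint8Array) {
  return {
    fBit: bytes[0] >> 7,
    blockPT: bytes[0] & 0x7f,
    timestampOffset: (bytes[1] << 6) | (bytes[2] >> 2),
    blockLength: ((bytes[2] & 0x03) << 8) | bytes[3],
  };
}

// Example values (assumed): payload type 111, offset of 960 ticks, 80-byte block.
const packedHeader = packRedundantHeader(111, 960, 80);
console.log(Array.from(packedHeader, (b) => b.toString(16).padStart(2, "0")).join(" "));
console.log(unpackRedundantHeader(packedHeader));
```

For these example values the four bytes are ef 0f 00 50, and unpacking them returns the original fields.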
Based on these specifications, a program to deserialize/serialize RED header blocks would look like this:
// BitStream is a bit-level read/write helper that is not shown here
// (see the repository linked above).
interface RedHeaderField {
  fBit: number;
  blockPT: number;
  /** 14 bits */
  timestampOffset?: number;
  /** 10 bits */
  blockLength?: number;
}

export class RedHeader {
  fields: RedHeaderField[] = [];

  static deSerialize(buf: Buffer) {
    let offset = 0;
    const header = new RedHeader();
    for (;;) {
      const field: RedHeaderField = {} as any;
      header.fields.push(field);
      const bitStream = new BitStream(buf.slice(offset));
      field.fBit = bitStream.readBits(1);
      field.blockPT = bitStream.readBits(7);
      offset++;
      // The fBit of the last header block (latest packet) is 0
      if (field.fBit === 0) {
        break;
      }
      field.timestampOffset = bitStream.readBits(14);
      field.blockLength = bitStream.readBits(10);
      offset += 3;
    }
    return [header, offset] as const;
  }

  serialize() {
    let buf = Buffer.alloc(0);
    for (const field of this.fields) {
      // Redundant packet blocks have timestampOffset and blockLength.
      // Compare against undefined so that a value of 0 is still treated as present.
      if (
        field.timestampOffset !== undefined &&
        field.blockLength !== undefined
      ) {
        const bitStream = new BitStream(Buffer.alloc(4))
          .writeBits(1, field.fBit)
          .writeBits(7, field.blockPT)
          .writeBits(14, field.timestampOffset)
          .writeBits(10, field.blockLength);
        buf = Buffer.concat([buf, bitStream.uint8Array]);
      }
      // Latest packet
      else {
        // 1-byte header block
        const bitStream = new BitStream(Buffer.alloc(1))
          .writeBits(1, 0)
          .writeBits(7, field.blockPT);
        buf = Buffer.concat([buf, bitStream.uint8Array]);
      }
    }
    return buf;
  }
}
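The code above relies on a BitStream helper that is not shown in this article. To give a rough idea of what such a helper does, here is a minimal sketch. The class name `SimpleBitStream` and its implementation are assumptions for illustration and are not the helper from the repository; it only mirrors the readBits, writeBits, and uint8Array members used above.

```typescript
// Hypothetical stand-in for a bit-level reader/writer (MSB first).
// Writing assumes the underlying buffer starts out zero-filled.
class SimpleBitStream {
  private position = 0; // current position, counted in bits

  constructor(public uint8Array: Uint8Array) {}

  // Read `bits` bits and return them as an unsigned integer.
  readBits(bits: number): number {
    let value = 0;
    for (let i = 0; i < bits; i++) {
      const byte = this.uint8Array[this.position >> 3];
      const bit = (byte >> (7 - (this.position & 7))) & 1;
      value = value * 2 + bit;
      this.position++;
    }
    return value;
  }

  // Write the lowest `bits` bits of `value`. Returns `this` so calls can be chained.
  writeBits(bits: number, value: number): this {
    for (let i = bits - 1; i >= 0; i--) {
      if ((value >> i) & 1) {
        this.uint8Array[this.position >> 3] |= 0x80 >> (this.position & 7);
      }
      this.position++;
    }
    return this;
  }
}

// Write one redundant header block (example values are assumed), then read it back.
const written = new SimpleBitStream(new Uint8Array(4))
  .writeBits(1, 1) // F
  .writeBits(7, 111) // block PT
  .writeBits(14, 960) // timestamp offset
  .writeBits(10, 80); // block length

const reader = new SimpleBitStream(written.uint8Array);
console.log(
  reader.readBits(1),
  reader.readBits(7),
  reader.readBits(14),
  reader.readBits(10)
);
```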
RED Packet Data Block
Immediately following the last header block, the data blocks are stored in the same order as the headers.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| block PT=7 | timestamp offset | block length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| block PT=5 | |
+-+-+-+-+-+-+-+-+ +
| |
+ LPC encoded redundant data (PT=7) +
| (14 bytes) |
+ +---------------+
| | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| |
+ +
| |
+ +
| |
+ +
| DVI4 encoded primary data (PT=5) |
+ (84 bytes, not to scale) +
/ /
+ +
| |
+ +
| |
+ +---------------+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The length of each redundant packet's data block matches the blockLength field in its header block.
The length of the primary packet's data block is the remainder of the RED packet length after subtracting the header blocks and the redundant packet data blocks.
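These two rules can be sketched as a small calculation of where each data block starts and ends. The helper name `dataBlockRanges` and the `HeaderBlock` type are hypothetical; the example reuses the block sizes from the RFC 2198 diagram (14 bytes redundant, 84 bytes primary), while the timestamp offset value is arbitrary.

```typescript
interface HeaderBlock {
  fBit: number;
  blockPT: number;
  timestampOffset?: number;
  blockLength?: number;
}

// Hypothetical helper: compute the [start, end) byte range of each data block
// inside a RED payload, given its parsed header blocks and total length.
function dataBlockRanges(
  headers: HeaderBlock[],
  payloadLength: number
): { start: number; end: number }[] {
  // Header area: 4 bytes per redundant block (F=1), 1 byte for the primary block (F=0)
  let offset = headers.reduce((sum, h) => sum + (h.fBit === 1 ? 4 : 1), 0);
  return headers.map((h) => {
    const start = offset;
    // Redundant blocks use blockLength; the primary block takes everything that remains
    const end = h.fBit === 1 ? start + (h.blockLength ?? 0) : payloadLength;
    offset = end;
    return { start, end };
  });
}

const exampleHeaders: HeaderBlock[] = [
  { fBit: 1, blockPT: 7, timestampOffset: 160, blockLength: 14 }, // offset value is arbitrary
  { fBit: 0, blockPT: 5 },
];
// 4 + 1 header bytes, 14 redundant bytes, 84 primary bytes
const examplePayloadLength = 4 + 1 + 14 + 84;
console.log(dataBlockRanges(exampleHeaders, examplePayloadLength));
```

In this example the redundant block occupies bytes 5 to 19 and the primary block occupies bytes 19 to 103 (end exclusive).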
Now that we have all the rules for deserializing/serializing RED packets, we can complete the program to do so.
export class Red {
  header = new RedHeader();
  blocks: {
    block: Buffer;
    blockPT: number;
    /** 14 bits */
    timestampOffset?: number;
  }[] = [];

  static deSerialize(buf: Buffer) {
    const red = new Red();
    let offset = 0;
    [red.header, offset] = RedHeader.deSerialize(buf);
    red.header.fields.forEach(({ blockLength, timestampOffset, blockPT }) => {
      // Compare against undefined so that a zero-length redundant block
      // is not mistaken for the primary block
      if (blockLength !== undefined && timestampOffset !== undefined) {
        // Redundant packet length is blockLength
        const block = buf.slice(offset, offset + blockLength);
        red.blocks.push({ block, blockPT, timestampOffset });
        offset += blockLength;
      } else {
        // Primary packet length is the entire remaining area
        const block = buf.slice(offset);
        red.blocks.push({ block, blockPT });
      }
    });
    return red;
  }

  serialize() {
    this.header = new RedHeader();
    for (const { timestampOffset, blockPT, block } of this.blocks) {
      // Redundant packet
      if (timestampOffset !== undefined) {
        this.header.fields.push({
          fBit: 1,
          blockPT,
          blockLength: block.length,
          timestampOffset,
        });
      }
      // Primary packet
      else {
        this.header.fields.push({ fBit: 0, blockPT });
      }
    }
    let buf = this.header.serialize();
    // Pack data blocks
    for (const { block } of this.blocks) {
      buf = Buffer.concat([buf, block]);
    }
    return buf;
  }
}
RED Custom Encoder
Now that we can read and write RED packets, the next step is to create a RED encoder that packs the specified number (Distance) of past packets into a RED packet for redundancy.
// uint32Add is a helper that adds two numbers with 32-bit wraparound;
// it is not shown here (see the repository linked above).
export class RedEncoder {
  cache: { block: Buffer; timestamp: number; blockPT: number }[] = [];
  // Maximum number of packets to hold. This size will be the maximum distance
  cacheSize = 10;

  // Default distance is set to 1
  constructor(public distance = 1) {}

  // Store the latest packet in the cache
  push(payload: { block: Buffer; timestamp: number; blockPT: number }) {
    this.cache.push(payload);
    // Discard old packets
    if (this.cache.length > this.cacheSize) {
      this.cache.shift();
    }
  }

  // Create a RED packet
  build() {
    const red = new Red();
    const redundantPayloads = this.cache.slice(-(this.distance + 1));
    const presentPayload = redundantPayloads.pop();
    // Nothing to build until at least one packet has been pushed
    if (!presentPayload) {
      throw new Error("push() must be called before build()");
    }
    // Pack redundant packets
    redundantPayloads.forEach((redundant) => {
      // Perform calculation considering that the RTP Header timestamp is 32-bit
      const timestampOffset = uint32Add(
        presentPayload.timestamp,
        -redundant.timestamp
      );
      // Skip blocks whose offset does not fit in 14 bits
      // https://bugs.chromium.org/p/webrtc/issues/detail?id=13182
      // or whose length does not fit in the 10-bit block length field
      if (
        timestampOffset >= 0x01 << 14 ||
        redundant.block.length >= 0x01 << 10
      ) {
        return;
      }
      red.blocks.push({
        block: redundant.block,
        blockPT: redundant.blockPT,
        timestampOffset,
      });
    });
    // Pack the latest packet
    red.blocks.push({
      block: presentPayload.block,
      blockPT: presentPayload.blockPT,
    });
    return red;
  }
}
The encoder works as follows: the push method stores each received RTP Payload together with its RTP Header timestamp in the encoder's cache, and the build method generates a RED packet with the configured distance.
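The timestamp offset calculation deserves a closer look, because the RTP timestamp wraps around at 32 bits and the offset field holds only 14 bits. The sketch below shows one way to compute a wraparound-safe difference and derives how many past frames fit into 14 bits. The names `uint32AddSketch` and `maxDistance` are hypothetical (this is not the uint32Add helper from the repository), and the 20 ms frame duration is an assumption; 48000 is the RTP clock rate used for Opus.

```typescript
// Hypothetical stand-in for a 32-bit wraparound add.
// `>>> 0` converts the result to an unsigned 32-bit integer (modulo 2^32).
function uint32AddSketch(a: number, b: number): number {
  return (a + b) >>> 0;
}

// The present timestamp has wrapped past 2^32, the previous one has not (assumed values).
const previousTimestamp = 2 ** 32 - 460;
const presentTimestamp = 500;
console.log(uint32AddSketch(presentTimestamp, -previousTimestamp)); // 960

// Largest value that fits in the 14-bit timestamp offset field
const MAX_TIMESTAMP_OFFSET = (1 << 14) - 1;

// How many past frames can be referenced before the offset overflows 14 bits
function maxDistance(clockRate: number, frameMs: number): number {
  const ticksPerFrame = (clockRate * frameMs) / 1000;
  return Math.floor(MAX_TIMESTAMP_OFFSET / ticksPerFrame);
}

console.log(maxDistance(48000, 20)); // 17
```

With 20 ms frames at 48 kHz, each frame advances the timestamp by 960 ticks, so offsets up to 17 frames back still fit in the 14-bit field.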
Using the Custom Encoder with Encoded Insertable Streams
Finally, we get to the main topic. We will run the custom encoder in the browser by combining the encoder we just created with encoded insertable streams.
I have prepared sample code that incorporates the custom encoder into a simple use case where a sending Peer transmits audio in one direction to a receiving Peer.
import { buffer2ArrayBuffer, Red, RedEncoder } from "werift-rtp";

(async () => {
  // Set the custom encoder's distance to 3
  const redEncoder = new RedEncoder(3);

  // Enable encodedInsertableStreams
  const sender = new RTCPeerConnection({
    encodedInsertableStreams: true,
  } as any);
  const receiver = new RTCPeerConnection({
    encodedInsertableStreams: true,
  } as any);

  const [track] = (
    await navigator.mediaDevices.getUserMedia({ audio: true })
  ).getTracks();
  const rtpSender = sender.addTrack(track);

  // Configure the sender side for insertableStreams
  const senderTransform = (sender: RTCRtpSender) => {
    //@ts-ignore
    const senderStreams = sender.createEncodedStreams();
    const readableStream = senderStreams.readable;
    const writableStream = senderStreams.writable;
    const transformStream = new TransformStream({
      transform: (encodedFrame, controller) => {
        if (encodedFrame.data.byteLength > 0) {
          // Deserialize RTP Payload (RED packet)
          const packet = Red.deSerialize(encodedFrame.data);
          // Extract the latest packet (non-redundant packet) and pass it to the custom encoder
          const latest = packet.blocks.at(-1);
          redEncoder.push({
            block: latest.block,
            blockPT: latest.blockPT,
            timestamp: encodedFrame.timestamp,
          });
          // Have the custom encoder create a RED packet
          const red = redEncoder.build();
          // Replace the RTP Payload with the RED packet created by the custom encoder
          encodedFrame.data = buffer2ArrayBuffer(red.serialize());
        }
        controller.enqueue(encodedFrame);
      },
    });
    readableStream.pipeThrough(transformStream).pipeTo(writableStream);
  };
  senderTransform(rtpSender);

  const [transceiver] = sender.getTransceivers() as any;
  const { codecs } = RTCRtpSender.getCapabilities("audio");
  // Declare the usage of RED
  transceiver.setCodecPreferences([
    codecs.find((c) => c.mimeType.includes("red")),
    ...codecs,
  ]);

  await sender.setLocalDescription(await sender.createOffer());
  await new Promise<void>((r) => {
    sender.onicecandidate = ({ candidate }) => {
      if (!candidate) r();
    };
  });

  // Configure the receiver side for insertableStreams
  const receiverTransform = (receiver: RTCRtpReceiver) => {
    //@ts-ignore
    const receiverStreams = receiver.createEncodedStreams();
    const readableStream = receiverStreams.readable;
    const writableStream = receiverStreams.writable;
    const transformStream = new TransformStream({
      transform: (encodedFrame, controller) => {
        if (encodedFrame.data.byteLength > 0) {
          // Deserialize RTP Payload (RED packet)
          const red = Red.deSerialize(encodedFrame.data);
          // Display the distance value
          console.log("distance", red.blocks.length - 1);
        }
        controller.enqueue(encodedFrame);
      },
    });
    readableStream.pipeThrough(transformStream).pipeTo(writableStream);
  };
  receiver.ontrack = (e) => {
    receiverTransform(e.receiver);
  };

  await receiver.setRemoteDescription(sender.localDescription);
  await receiver.setLocalDescription(await receiver.createAnswer());
  await sender.setRemoteDescription(receiver.localDescription);
})();
The distance of the RED packets received by the receiver is displayed in the browser console.
This confirms that an arbitrary RED distance can be set in the browser.
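To relate the received blocks back to the original audio frames, the RTP timestamp of each block can be reconstructed by subtracting its timestamp offset from the RTP Header timestamp. The following is a small sketch of that calculation; the helper name `blockTimestamps` and the `ReceivedBlock` type are hypothetical, and the example values assume 960 ticks per frame.

```typescript
interface ReceivedBlock {
  timestampOffset?: number; // undefined for the primary block
}

// Hypothetical helper: reconstruct the RTP timestamp of every block in a RED packet.
// `>>> 0` keeps the result within the unsigned 32-bit range of RTP timestamps.
function blockTimestamps(
  rtpTimestamp: number,
  blocks: ReceivedBlock[]
): number[] {
  return blocks.map((b) => (rtpTimestamp - (b.timestampOffset ?? 0)) >>> 0);
}

// A packet with distance 2: two redundant blocks followed by the primary block
const receivedBlocks: ReceivedBlock[] = [
  { timestampOffset: 1920 },
  { timestampOffset: 960 },
  {},
];
console.log(blockTimestamps(2880, receivedBlocks)); // [960, 1920, 2880]
```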
Conclusion
Before the advent of the encoded insertable streams API, achieving the kind of functionality discussed in this article required modifying the libwebrtc source code. It was a valuable experience to see firsthand how the arrival of the encoded insertable streams API has made it possible to handle these relatively low-layer processes flexibly on the browser side.
RED is a robust solution for enhancing audio quality, and I hope it contributes to better communication experiences for users in environments where audio quality is currently degraded due to packet loss.
While this article demonstrated a browser-side custom RED encoder to modify the Distance value, similar results can be achieved on the SFU side by de-encapsulating RED packets and performing logic similar to what we did in our custom encoder. For P2P use cases that do not involve an SFU, however, the approach described in this article is currently the only viable option.