
How to Integrate Video/Audio Call Features into My App?

Published on 2022/06/30

Whether you need to stream media or exchange data between users, WebRTC can handle it.
There are several libraries that wrap the browser's WebRTC implementation, and they exist simply to make building WebRTC-based apps easier.

This tutorial walks you through the ins and outs of WebRTC and the process of building your own voice/video calling app.

What is WebRTC?

WebRTC (Web Real-Time Communication) is a specification that enables web browsers, native clients, and mobile devices to exchange audio, video, and arbitrary data through APIs. It typically communicates over direct, peer-to-peer connections.

WebRTC is implemented as an open web standard and is exposed through JavaScript APIs in most major browsers. In simple terms, it establishes the connection between peers using signaling, a process in which both devices connect to a server and exchange the data needed to set up the session.

Common Terminology you must know

  1. Signaling
  2. SDP - Session Description Protocol
  3. ICE Candidates
  4. STUN & TURN
  5. Peer Connection

1. Signaling

Signaling is the process that uses a signaling server to exchange the information peers need before they can connect: the communication protocol, media codecs and formats, channels, methods of data transfer, and routing information. Its main role is to establish the connection while minimizing the exposure of potentially private information.

How do you create a signaling server, and how does the process actually work?

Before we proceed, it's important to know that WebRTC does not specify any particular transport mechanism for signaling information. You can use anything you prefer, from WebSocket or XMLHttpRequest to carrier pigeons, to transfer the signaling data between the peers.

The server does not need to understand the content of the signaling data. The content passes through the signaling server as a black box.

What matters is that when the ICE subsystem tells you to send signaling data to the other peer, you do so, and the other peer knows how to receive that information and deliver it to its own ICE subsystem. All you have to do is channel the information back and forth.
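The relay role described above can be sketched without any network code. What follows is a minimal in-memory model, assuming every message carries a `target` field that names the recipient (as the messages later in this post do). `SignalingRelay`, `register`, and `relay` are hypothetical names chosen for illustration and are not part of WebRTC.

```javascript
// Minimal in-memory model of a signaling relay (hypothetical helper, not a WebRTC API).
// It reads only the `target` field and forwards the message text untouched.
class SignalingRelay {
  constructor() {
    this.clients = new Map(); // username -> function that delivers text to that user
  }

  // Register a user together with the callback that delivers messages to them.
  register(username, deliver) {
    this.clients.set(username, deliver);
  }

  // Forward a JSON string to its target; returns false if the target is unknown.
  relay(rawMessage) {
    const { target } = JSON.parse(rawMessage);
    const deliver = this.clients.get(target);
    if (!deliver) return false;
    deliver(rawMessage); // the payload is passed through as an opaque string
    return true;
  }
}

// Usage: "alice" sends an offer-like message to "bob" (placeholder usernames).
const demoRelay = new SignalingRelay();
const bobInbox = [];
demoRelay.register("bob", (text) => bobInbox.push(text));

const outgoing = JSON.stringify({
  name: "alice",
  target: "bob",
  type: "video-offer",
  sdp: "<session description goes here>",
});
const delivered = demoRelay.relay(outgoing);
console.log(delivered, bobInbox.length);
```

In a real deployment the delivery callback would typically write to a WebSocket, but the routing logic would stay about this small because the server never needs to interpret the SDP or the candidates.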

2. Session Description Protocol (SDP)

Session Description Protocol (SDP) is a format for describing multimedia communication sessions for the purposes of announcement and invitation. Building the signaling mechanism itself, with whatever method and protocol you choose, is simple compared to the rest of WebRTC. To set up a session, however, WebRTC requires the peers to exchange media metadata in the form of offers and answers.

These offers and answers are expressed in SDP and travel in both directions. An SDP description looks like the following:

v=0
o=- 7614219274584779017 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS
m=audio 1 RTP/SAVPF 111 103 104 0 8 107 106 105 13 126
c=IN IP4 0.0.0.0

As the sample above suggests, you do not write this by hand: WebRTC generates the session description automatically, based on the audio and video devices present on your laptop or PC.
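To make the structure of the sample easier to see, here is a small sketch that splits SDP text into session-level lines and media sections. It assumes only the basic SDP layout (each line is `<type>=<value>`, and an `m=` line starts a new media section). `parseSdp` is a hypothetical helper written for this post, not a browser API, and it is far from a complete SDP parser.

```javascript
// Split SDP text into session-level lines and per-media sections.
// Each SDP line has the form "<type>=<value>"; an "m=" line opens a media section.
function parseSdp(sdpText) {
  const session = [];
  const media = [];
  let current = session;

  for (const rawLine of sdpText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    const type = line[0];
    const value = line.slice(2); // skip the type character and the "="
    if (type === "m") {
      const section = { description: value, lines: [] };
      media.push(section);
      current = section.lines; // following lines belong to this media section
    } else {
      current.push({ type, value });
    }
  }
  return { session, media };
}

// The sample SDP shown above, joined with CRLF line endings.
const sampleSdp = [
  "v=0",
  "o=- 7614219274584779017 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "a=group:BUNDLE audio video",
  "a=msid-semantic: WMS",
  "m=audio 1 RTP/SAVPF 111 103 104 0 8 107 106 105 13 126",
  "c=IN IP4 0.0.0.0",
].join("\r\n");

const parsedSdp = parseSdp(sampleSdp);
console.log(parsedSdp.session.length);                       // 6 session-level lines
console.log(parsedSdp.media[0].description.split(" ")[0]);   // "audio"
```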

3. ICE Candidates

ICE stands for Interactive Connectivity Establishment, a technique for traversing NATs (network address translators) to establish communication for VoIP, peer-to-peer, instant messaging, and other interactive media.

An ICE candidate provides the IP address and port through which data can be exchanged. Candidates are delivered to the other peer through the signaling channel, and each peer proposes its best candidates first, giving them a higher priority than the rest.
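As an illustration of what a candidate carries, the sketch below parses the text form of a candidate into its main fields and picks the one with the highest priority. The two candidate strings are made-up values using documentation-only IP ranges, and `parseCandidate` is a hypothetical helper, not part of the WebRTC API. It also ignores optional extension attributes beyond the candidate type.

```javascript
// Parse the text of an ICE candidate into its main fields.
// Layout: candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseCandidate(candidateLine) {
  const parts = candidateLine.trim().split(/\s+/);
  if (parts.length < 8 || !parts[0].startsWith("candidate:") || parts[6] !== "typ") {
    throw new Error("Not a recognizable ICE candidate line");
  }
  return {
    foundation: parts[0].slice("candidate:".length),
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7], // "host", "srflx" (learned via STUN), "prflx", or "relay" (via TURN)
  };
}

// Made-up example values; 192.0.2.0/24 and 203.0.113.0/24 are reserved for documentation.
const demoCandidates = [
  "candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host",
  "candidate:2 1 udp 1686052607 203.0.113.7 46154 typ srflx raddr 192.0.2.10 rport 54321",
].map(parseCandidate);

// A higher priority value marks a more preferred candidate.
const preferred = demoCandidates.reduce((best, c) => (c.priority > best.priority ? c : best));
console.log(preferred.type, preferred.address, preferred.port); // host 192.0.2.10 54321
```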

4. STUN & TURN

STUN is an IETF protocol whose name stands for Session Traversal Utilities for NAT (Network Address Translator). It is used for all kinds of real-time communication, including voice, video, and messaging.

A client behind a NAT or firewall has only a private address within its local network (LAN). STUN lets such a client discover its public IP address and port so that it can communicate with a user on the other side of the NAT.

Once this information has been exchanged and the connection between the peers is established, the STUN server plays no further part; the rest of the conversation flows directly between the peers.

This makes STUN the cost-effective option for WebRTC communication, although the connection may fail to establish for some users depending on the type of NAT device they are behind.

TURN (Traversal Using Relays around NAT) is a protocol that extends STUN to support traversal of network address translators (NAT) for WebRTC. It lets a user send and receive data through an intermediary server.

Sometimes the client endpoints are stuck behind NATs, such as symmetric NATs, that do not allow a direct connection. In that case the media is sent through a relay server, which is called the TURN server.

5. Peer connection

The RTCPeerConnection interface represents a WebRTC connection between the local computer and a remote peer. It provides methods to connect to a remote peer, maintain and monitor the connection, and close it once it is no longer needed.

EventTarget  ← RTCPeerConnection
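The last capability mentioned above, closing a connection once it is no longer needed, might look like the following sketch. `closeCall` is a hypothetical helper name. In a browser you would pass it a real RTCPeerConnection and MediaStream; the stand-in objects below exist only so the snippet also runs outside a browser.

```javascript
// Hang-up sketch: stop local capture, detach handlers, then close the connection.
// `pc` is expected to be an RTCPeerConnection and `localStream` a MediaStream.
function closeCall(pc, localStream) {
  if (localStream) {
    // Stopping each track releases the camera and microphone.
    localStream.getTracks().forEach((track) => track.stop());
  }
  if (pc) {
    // Detach handlers so no further events are processed for this call.
    pc.onicecandidate = null;
    pc.ontrack = null;
    pc.close();
  }
}

// Stand-ins (not real WebRTC objects) so the sketch can be run anywhere.
const demoTrack = { stopped: false, stop() { this.stopped = true; } };
const demoStream = { getTracks: () => [demoTrack] };
const demoPc = {
  closed: false,
  onicecandidate: () => {},
  ontrack: () => {},
  close() { this.closed = true; },
};

closeCall(demoPc, demoStream);
console.log(demoTrack.stopped, demoPc.closed); // true true
```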

Workflow of Integration - A Detailed Pathway

Step 1 - Exchange session descriptions

Signaling revolves around two basic terms: offers (making a call) and answers (responding to a call). The process starts with an offer, which the caller creates when initiating the call. The offer contains a session description in SDP format and must be delivered to the receiving user, who is known as the callee. The callee responds to the offer with an answer message that also contains an SDP description.

For instance, in a video call the signaling server can use WebSocket to transmit offer messages with the type "video-offer" and answer messages with the type "video-answer".

Step 2 - Make actual connections by exchanging ICE candidates

For two peers to connect in real time, they need to exchange ICE candidates. Every ICE candidate describes a method that the sending peer can use to communicate. Each peer sends candidates in the order they are discovered and keeps sending them until it runs out of suggestions, even if media has already started streaming.

Once pc.setLocalDescription(offer) has finished adding the local description, icecandidate events are sent to the RTCPeerConnection. As soon as the two peers agree on a mutually compatible candidate, each peer uses that candidate's SDP to construct and open a connection, and media starts to flow.

If they later agree on a better, usually higher-performance, candidate, the stream may change formats as needed. In principle, a candidate received after media is already flowing could also be used to downgrade to a lower-bandwidth connection if required, although this is not currently supported.

Each ICE candidate is sent to the other peer as a JSON message of type "new-ice-candidate", which travels over the signaling server to the remote peer.

Make a Call "A detailed process"

To make a call, the invite() function is invoked when the user clicks on a name in the list of users they can connect with.

var mediaConstraints = {
  audio: true, // We want an audio track
  video: true // ...and we want a video track
};

function invite(evt) {
  if (myPeerConnection) {
    alert("You are already in another call.");
  } else {
    var clickedUsername = evt.target.textContent;

    targetUsername = clickedUsername;
    createPeerConnection();

    navigator.mediaDevices.getUserMedia(mediaConstraints)
    .then(function(localStream) {
      document.getElementById("local_video").srcObject = localStream;
      localStream.getTracks().forEach(track => myPeerConnection.addTrack(track, localStream));
    })
    .catch(handleGetUserMediaError);
  }
}

Create a peer connection “caller and callee”

Both the caller and the callee use the createPeerConnection() function to construct their RTCPeerConnection objects, which represent their respective ends of the WebRTC connection. It is invoked by invite() when the caller starts a call, and by handleVideoOfferMsg() when the callee receives an offer message from the caller.

function createPeerConnection() {
  myPeerConnection = new RTCPeerConnection({
      iceServers: [     // Information about ICE servers - Use your own!
        {
          urls: "stun:stun.stunprotocol.org"
        }
      ]
  });

  myPeerConnection.onicecandidate = handleICECandidateEvent;
  myPeerConnection.ontrack = handleTrackEvent;
  myPeerConnection.onnegotiationneeded = handleNegotiationNeededEvent;
  myPeerConnection.onremovetrack = handleRemoveTrackEvent;
  myPeerConnection.oniceconnectionstatechange = handleICEConnectionStateChangeEvent;
  myPeerConnection.onicegatheringstatechange = handleICEGatheringStateChangeEvent;
  myPeerConnection.onsignalingstatechange = handleSignalingStateChangeEvent;
}

When using the RTCPeerConnection() constructor, we pass an object that provides the configuration parameters for the connection. This example uses only one of them, iceServers: an array of objects describing the STUN and/or TURN servers that the ICE layer uses when it attempts to establish a route between the caller and the callee. These servers are used to determine the best route and protocols for communication between the peers, even when they are behind NAT.
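Since the example above lists only a STUN server, here is a sketch of how a TURN entry could sit alongside it in the same iceServers array. `buildRtcConfig` is a hypothetical helper, and the TURN URL, username, and credential are placeholders, not a real service. The property names `urls`, `username`, and `credential` are the ones the iceServers entries use.

```javascript
// Build an RTCPeerConnection configuration with one STUN and, optionally, one TURN server.
// The TURN details passed in below are placeholders, not a real server.
function buildRtcConfig({ stunUrl, turn }) {
  const iceServers = [{ urls: stunUrl }];
  if (turn) {
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

const rtcConfig = buildRtcConfig({
  stunUrl: "stun:stun.stunprotocol.org", // same STUN server as in the example above
  turn: {
    url: "turn:turn.example.com:3478",   // hypothetical TURN relay
    username: "demo-user",               // placeholder
    credential: "demo-pass",             // placeholder
  },
});

// In a browser you would then write: myPeerConnection = new RTCPeerConnection(rtcConfig);
console.log(JSON.stringify(rtcConfig, null, 2));
```

Listing both kinds of server lets the ICE layer try a direct route first and fall back to the relay only when the NAT situation requires it.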

Start with negotiation "create connections"

Once the caller has created its RTCPeerConnection, created a media stream, and added the stream's tracks to the connection, the browser delivers a negotiationneeded event to the RTCPeerConnection to indicate that it is ready to begin negotiation with the other peer. The code that handles this event is as follows:

function handleNegotiationNeededEvent() {
  myPeerConnection.createOffer().then(function(offer) {
    return myPeerConnection.setLocalDescription(offer);
  })
  .then(function() {
    sendToServer({
      name: myUsername,
      target: targetUsername,
      type: "video-offer",
      sdp: myPeerConnection.localDescription
    });
  })
  .catch(reportError);
}

To start the negotiation process, we need to create and send an SDP offer to the peer we want to connect to. The offer includes a list of supported configurations for the connection, including information about the media stream we added to the connection locally and any ICE candidates already gathered by the ICE layer.
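The handler above relies on a sendToServer() function that this post does not show. A minimal sketch, assuming the signaling channel is any object with a send(text) method (a browser WebSocket fits that shape), could look like this. `createSendToServer` and the stand-in channel are hypothetical, and the usernames are placeholders.

```javascript
// Sketch of sendToServer(): serialize a message and hand it to the signaling channel.
// `connection` stands for any object with a send(text) method, e.g. a browser WebSocket.
function createSendToServer(connection) {
  return function sendToServer(msg) {
    connection.send(JSON.stringify(msg));
  };
}

// Stand-in channel that records what would go over the wire.
const sentFrames = [];
const demoConnection = { send: (text) => sentFrames.push(text) };
const demoSendToServer = createSendToServer(demoConnection);

demoSendToServer({
  name: "alice",   // placeholder for myUsername
  target: "bob",   // placeholder for targetUsername
  type: "video-offer",
  sdp: { type: "offer", sdp: "v=0\r\n" }, // shape of a serialized session description
});

console.log(JSON.parse(sentFrames[0]).type); // "video-offer"
```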

Session negotiation "transmits the offer"

Now that we have started negotiation by transmitting an offer to the other peer, let's see what happens on the callee's side of the connection. The callee receives the offer and calls the handleVideoOfferMsg() function to process it.

Receiving incoming call "the messages via offer"

When the offer arrives, the callee's handleVideoOfferMsg() function is called with the received "video-offer" message. The function needs to do two things:

  • Create its own RTCPeerConnection and add the audio/video tracks from the microphone and webcam
  • Process the received offer, then construct and send an answer

function handleVideoOfferMsg(msg) {
  var localStream = null;

  targetUsername = msg.name;
  createPeerConnection();

  var desc = new RTCSessionDescription(msg.sdp);

  myPeerConnection.setRemoteDescription(desc).then(function () {
    return navigator.mediaDevices.getUserMedia(mediaConstraints);
  })
  .then(function(stream) {
    localStream = stream;
    document.getElementById("local_video").srcObject = localStream;

    localStream.getTracks().forEach(track => myPeerConnection.addTrack(track, localStream));
  })
  .then(function() {
    return myPeerConnection.createAnswer();
  })
  .then(function(answer) {
    return myPeerConnection.setLocalDescription(answer);
  })
  .then(function() {
    var msg = {
      name: myUsername,
      target: targetUsername,
      type: "video-answer",
      sdp: myPeerConnection.localDescription
    };

    sendToServer(msg);
  })
  .catch(handleGetUserMediaError);
}

The above code is quite similar to what we did in the invite() function when starting a call. It begins by creating and configuring an RTCPeerConnection using our createPeerConnection() function. It then takes the SDP offer from the received "video-offer" message and uses it to create a new RTCSessionDescription object representing the caller's session description.
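The code above ends by sending a "video-answer" message back, but this post does not show what the caller does with it. One plausible sketch is below: the caller applies the answer as its remote description. `handleVideoAnswerMsg` is a hypothetical name, the peer connection is passed in as a parameter rather than read from a global, and the stand-in object only mimics the one method used so the snippet can run outside a browser.

```javascript
// Caller side: apply the callee's answer as the remote description.
// `pc` is expected to be the caller's RTCPeerConnection; `msg.sdp` is the
// { type: "answer", sdp: "..." } object carried by the "video-answer" message.
async function handleVideoAnswerMsg(pc, msg) {
  if (msg.type !== "video-answer") {
    throw new Error("Unexpected message type: " + msg.type);
  }
  await pc.setRemoteDescription(msg.sdp);
}

// Stand-in for the caller's connection (not a real RTCPeerConnection).
const demoCallerPc = {
  remoteDescription: null,
  async setRemoteDescription(desc) { this.remoteDescription = desc; },
};

const answerApplied = handleVideoAnswerMsg(demoCallerPc, {
  name: "bob",     // placeholder usernames
  target: "alice",
  type: "video-answer",
  sdp: { type: "answer", sdp: "v=0\r\n" },
});
answerApplied.then(() => console.log(demoCallerPc.remoteDescription.type)); // "answer"
```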

ICE candidate exchange with sending and receiving

Sending ICE Candidates with negotiation process

ICE negotiation involves each peer sending candidates to the other, repeatedly, until it runs out of potential ways to support the RTCPeerConnection's media transport needs. Since ICE is not aware of your signaling server, your code handles the transmission of each candidate in its handler for the icecandidate event.

The onicecandidate handler receives an event whose candidate property is the SDP describing the candidate (or null, which indicates that the ICE layer has run out of configurations to suggest). The content of candidate is what you need to transmit through the signaling server. Here is an example:

function handleICECandidateEvent(event) {
  if (event.candidate) {
    sendToServer({
      type: "new-ice-candidate",
      target: targetUsername,
      candidate: event.candidate
    });
  }
}

The above code builds an object containing the candidate and sends it to the other peer using the same sendToServer() function that was used to send the offer and the answer.

Receiving ICE Candidates "a destination peer"

The signaling server delivers each ICE candidate to the destination peer using whatever method it chooses. In our example this is a JSON object with a type property containing the string "new-ice-candidate". The handleNewICECandidateMsg() function is called by the main WebSocket incoming message code to handle these messages.

function handleNewICECandidateMsg(msg) {
  var candidate = new RTCIceCandidate(msg.candidate);

  myPeerConnection.addIceCandidate(candidate)
    .catch(reportError);
}

This function constructs an RTCIceCandidate object by passing the received SDP into its constructor, then delivers the candidate to the ICE layer by passing it into myPeerConnection.addIceCandidate(). This hands the fresh ICE candidate to the local ICE layer, and with that our part in handling this candidate is complete.
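The "main WebSocket incoming message code" mentioned above is not shown in this post. A rough sketch of such a dispatcher, routing each message to a handler by its type property, might look like the following. `createDispatcher` is a hypothetical helper, and the handlers here are stand-ins that only record what they received. In the app they would be handleVideoOfferMsg(), an answer handler, and handleNewICECandidateMsg().

```javascript
// Route an incoming signaling message to the handler registered for its `type`.
function createDispatcher(handlers) {
  return function onSignalingMessage(rawText) {
    const msg = JSON.parse(rawText);
    // Only accept types that were explicitly registered.
    if (!Object.prototype.hasOwnProperty.call(handlers, msg.type)) {
      console.warn("Unknown signaling message type:", msg.type);
      return false;
    }
    handlers[msg.type](msg);
    return true;
  };
}

// Stand-in handlers that record calls instead of touching a peer connection.
const seenMessages = [];
const dispatch = createDispatcher({
  "video-offer": (msg) => seenMessages.push("offer from " + msg.name),
  "video-answer": (msg) => seenMessages.push("answer from " + msg.name),
  "new-ice-candidate": (msg) => seenMessages.push("candidate for " + msg.target),
});

dispatch(JSON.stringify({ type: "video-offer", name: "alice", target: "bob", sdp: {} }));
dispatch(JSON.stringify({
  type: "new-ice-candidate",
  target: "bob",
  candidate: { candidate: "candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host", sdpMid: "0", sdpMLineIndex: 0 },
}));

console.log(seenMessages); // [ 'offer from alice', 'candidate for bob' ]
```

With a browser WebSocket, the dispatcher could be attached with something like connection.onmessage = (evt) => dispatch(evt.data), where connection is whatever variable holds your socket.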

Conclusion

I hope this post has given you detailed insight into building your own video and voice chat application with WebRTC.

If you'd like more guidance on related technologies and concepts, stay tuned as we continue this exciting journey through trending technology. Catch you soon!

Have a Great Time!
