Slack announced video support: how did they do it?
A quick look at your choices for a WebRTC SFU
Chris Koehncke
Slack today announced they were adding video to all of their messaging clients. Similar to the voice service (that has been in production for several months), free users get 1-to-1 calling and paid users get multiparty capabilities (up to 15 parties). The new client is slowly being rolled out (and no, I haven’t tried it yet).
The bigger question is “How did Slack implement video?” or, better, what technology is Slack using for the video/audio service? Yes, it’s WebRTC, but how did they do it?
Word on the street was that Slack initially used FreeSWITCH hosted at AWS to support voice calling (this information may be dated; my testing today shows voice calls now run on Janus). All the media was relayed through TURN servers at AWS, and none of my own tests showed any P2P traffic. Voice quality was quite good. However, call set-up times have been consistently long: in some cases, 20-25 seconds to ring and connect to the other end (a typical mobile phone call connects in 6 seconds). I know the Google Duo team was hell-bent on connecting video in under 6 seconds; we have no more patience!
With the introduction of video, and particularly support for multiparty WebRTC video, Slack needed an SFU, or selective forwarding unit (telecom people always have great names for their stuff).
An SFU is basically a video router that manages all the participants’ video sessions and typically exposes an API to control how each video channel is routed. A WebRTC SFU has to sling a lot of video because of the meshed nature of WebRTC calls. For example, with 5 people on a video call, the SFU handles 20 video sessions, because each person receives the independent video streams of the other four (5 × 4 = 20). Slack supports up to 15 parties per call, which means 210 simultaneous video streams to manage (15 × 14) if fully meshed.
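The stream arithmetic above is just n × (n - 1): each of n participants receives the other n - 1 streams. A minimal Python sketch makes this concrete, assuming every participant receives every other participant's video (no simulcast, no "last N" speaker pruning). The function names `downstream_streams` and `forwarding_table` are hypothetical illustrations, not part of Janus or any other SFU's API.

```python
def downstream_streams(participants: int) -> int:
    """Streams the SFU forwards when every participant receives every
    other participant's video: n * (n - 1). Uplinks (one per sender)
    are not counted here."""
    return participants * (participants - 1)


def forwarding_table(participants: list[str]) -> dict[str, list[str]]:
    """Hypothetical routing table: map each sender to the receivers its
    stream is forwarded to. A real SFU would let the application prune
    these lists; here every other participant receives every stream."""
    return {
        sender: [r for r in participants if r != sender]
        for sender in participants
    }


if __name__ == "__main__":
    print(downstream_streams(5))   # 20
    print(downstream_streams(15))  # 210
    table = forwarding_table(["alice", "bob", "carol"])
    print(table["alice"])          # ['bob', 'carol']
    # The table's total fan-out matches the formula: 3 * 2 = 6
    print(sum(len(v) for v in table.values()))  # 6
```

The quadratic growth is why the "selective" part matters: an SFU that only forwards, say, the active speakers to each receiver keeps the per-receiver load flat as the call grows.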
Word on the street says Slack is using Meetecho’s Janus WebRTC server as their SFU. Janus is an open source project. If you’re in the market for an SFU, your choices are limited. Jitsi, which has SFU capabilities, was acquired by Atlassian. Kurento, another WebRTC media server, was acquired by Twilio. Both Kurento and Jitsi still maintain their open source projects. A new entrant in the open source SFU market is mediasoup.org (it’s new, leans on Node.js, and I don’t have any history with it yet).
Word is that Slack has spent heavily to customize Janus to meet their needs, so don’t be surprised if Janus is the next acquisition in the WebRTC space.
If you don’t wish to mess about with all the intricacies of setting up a WebRTC SFU (it’s similar to having your fingernails pulled off with pliers), you can look at commercial WebRTC cloud services like Tokbox, Temasys, Twilio or Agora.io. A cloud SFU gives you more time to focus on your underlying application rather than figuring out how to make video work. Disclaimer: I work at Tokbox, so you can guess who I think is better! 🙂