Build vs Buy Your WebRTC services
Should you build your entire WebRTC infrastructure from scratch or try out a newfangled service provider? I’ll try to help.
Chris Koehncke
With the advent of fast-growing live video mobile apps, many developers are fast-tracking their own video capabilities in new app development. If you’re building any kind of multi-party live video app, then you’re likely to end up using WebRTC.
But the question is: should you build everything yourself or use a WebRTC platform provider? Platform providers include Tokbox, Twilio, Agora.io, Xirsys, Temasys and Kandy. Since I work for Tokbox (full disclosure, and also see my great disclaimer page), my job-preserving response would be YES, absolutely use a platform provider, and most certainly Tokbox.
However, that wouldn’t make for an interesting blog article, although it would resemble the many “pitch” articles that seem to be the norm. So I am going to try to make this real.
A simple 1:1 (two video talking heads) “FaceTime” video call can easily be implemented using WebRTC’s native peer-to-peer (P2P) capabilities. Frankly, you should be able to build this yourself. Plus, with the new CocoaPod for iOS, slapping WebRTC into your mobile app just got easier (though the current one is not Google-sanctioned per se).
But you still need signaling to connect the video parties, and that means you need a signaling server. That’s a decision and deployment you will have to own. You’ll also need to factor in a TURN service for the tougher connections (which may be 15% of the total), from the nice folks at Xirsys or someone else, and you’re in business. If you want any sort of reporting on usage and problems, you can subscribe to Callstats.io. Pop adapter.js into your client-side code, hope it works on all browsers all the time, and you’re a rock star.
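WebRTC deliberately leaves signaling to you: it defines the offer/answer session descriptions and ICE candidates, but not how they travel between peers. To show what a signaling server actually does, here is a minimal in-memory sketch of the relay logic in Python. `SignalingRoom` and its methods are hypothetical names for illustration, not any product’s API; a real server would carry these messages over a network transport such as WebSockets and would also need authentication and room cleanup.

```python
from collections import defaultdict


class SignalingRoom:
    """Hypothetical in-memory signaling relay for 1:1 WebRTC calls.

    It only routes offer/answer/candidate messages between the two members
    of a room; it never sees or handles media.
    """

    MAX_PEERS = 2  # a simple 1:1 call
    MESSAGE_TYPES = {"offer", "answer", "candidate"}

    def __init__(self) -> None:
        self.rooms: dict[str, list[str]] = defaultdict(list)
        self.inboxes: dict[str, list[dict]] = defaultdict(list)

    def join(self, room_id: str, peer_id: str) -> None:
        members = self.rooms[room_id]
        if peer_id in members:
            return  # joining twice is a no-op
        if len(members) >= self.MAX_PEERS:
            raise RuntimeError(f"room {room_id!r} is full")
        members.append(peer_id)

    def send(self, room_id: str, sender: str, message: dict) -> None:
        """Forward a signaling message to every other member of the room."""
        if message.get("type") not in self.MESSAGE_TYPES:
            raise ValueError(f"unexpected message type: {message.get('type')!r}")
        members = self.rooms.get(room_id, [])
        if sender not in members:
            raise RuntimeError(f"{sender!r} has not joined room {room_id!r}")
        for peer in members:
            if peer != sender:
                self.inboxes[peer].append({"from": sender, **message})

    def receive(self, peer_id: str) -> list[dict]:
        """Return and clear all pending messages for a peer."""
        return self.inboxes.pop(peer_id, [])


# Usage: Alice and Bob join a room, and Alice's offer is relayed to Bob.
room = SignalingRoom()
room.join("call-42", "alice")
room.join("call-42", "bob")
room.send("call-42", "alice", {"type": "offer", "sdp": "<offer SDP placeholder>"})
print(room.receive("bob"))
# [{'from': 'alice', 'type': 'offer', 'sdp': '<offer SDP placeholder>'}]
```

Because the relay only forwards opaque messages, the media never touches it. That is also why signaling alone does not help when two peers cannot reach each other directly; relaying the media in that case is the job TURN fills.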
However, there are compelling reasons why you might consider a platform provider. If any of the above sounded complicated, then a WebRTC platform provider may be in your future. The platform provider’s job is to make all of this easy with good documentation, plus provide production-ready mobile SDKs for building native iOS and Android apps.
You should expect a platform provider to have infrastructure that works all the time and scales globally to the massive size of your future business (you are a big deal after all). Add in the need for support when you are out of your depth (or at least wanna blame someone other than yourself).
Finally, don’t forget to check out their devops tools. What real-time information can you extract? Post-activity logs? Trouble reports? It may be their platform, but it’s your app, and your customers will complain to you first. Sexy it is not; useful it is.
Multi-party video: lots of talking heads, and perhaps lots of people viewing. Suddenly it gets harder. WebRTC’s mesh connections are beautiful to watch in action, but they require your PC or mobile device to manage a separate connection to every other participant (sending its video over each one), and your app code to do all the work. This pretty much limits a device to 3, maybe 4, participants due to network and device constraints.
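To make the mesh constraint concrete, the sketch below counts one device’s connections and bandwidth in a full mesh. The per-stream bitrate and the uplink figure in the usage lines are illustrative assumptions, not measurements, and the model ignores audio, simulcast and CPU cost.

```python
def mesh_device_load(participants: int, stream_kbps: float) -> tuple[int, float, float]:
    """Return (peer connections, upload kbps, download kbps) for one device in a full mesh."""
    others = participants - 1
    # One peer connection per other participant. Your video is sent separately
    # over each connection, so upload grows linearly with the size of the call.
    return others, others * stream_kbps, others * stream_kbps


def max_mesh_participants(uplink_kbps: float, stream_kbps: float) -> int:
    """Largest call size whose mesh upload still fits in the device's uplink."""
    return int(uplink_kbps // stream_kbps) + 1


# Illustrative assumptions only: ~1,000 kbps per video stream, 3,000 kbps uplink.
for n in (2, 4, 6):
    conns, up, down = mesh_device_load(n, 1000)
    print(f"{n} participants: connections={conns}, up={up} kbps, down={down} kbps")
print(max_mesh_participants(3000, 1000))  # 4
```

Upload grows linearly with the size of the call because every peer connection carries its own copy of your video, so a modest uplink runs out after only a handful of participants.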
Need multiparty video to work reliably, like Google Hangouts? You’re going to want either a Selective Forwarding Unit (SFU) or a Multipoint Control Unit (MCU) for video management. If this sounds like a super secret government project, then I’d go directly to GO and collect a WebRTC platform provider. For now I’ll focus on the SFU, which is a WebRTC specialty.
The SFU is a complex beast; basically, it’s a video router for WebRTC, usually in the cloud. In a 5-party “Hangouts” call, the SFU receives 5 inbound video streams but sends 20 downstream to the participants (each person receives the other 4). For those slow at counting, that’s a total of 25 streams.
This adds up quickly. If your platform has 100 concurrent 5-party calls, your SFU is routing 2,500 video streams at the same time. That $5-a-month DigitalOcean server of yours might need an upgrade to keep up.
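The counting above generalizes: with n participants, the SFU takes in n streams and sends out n(n - 1), so n² in total per call, and the load grows quadratically with call size. A small Python sketch, assuming one video stream per participant and ignoring audio and simulcast layers:

```python
def sfu_streams_per_call(participants: int) -> tuple[int, int]:
    """Return (inbound, outbound) video streams at the SFU for one call.

    Each participant publishes one stream to the SFU and receives everyone
    else's: n streams in, n * (n - 1) streams out, n ** 2 in total.
    """
    inbound = participants
    outbound = participants * (participants - 1)
    return inbound, outbound


def sfu_total_streams(concurrent_calls: int, participants: int) -> int:
    """Streams the SFU routes across identical concurrent calls."""
    inbound, outbound = sfu_streams_per_call(participants)
    return concurrent_calls * (inbound + outbound)


print(sfu_streams_per_call(5))    # (5, 20)
print(sfu_total_streams(100, 5))  # 2500
print(sfu_total_streams(1, 10))   # 100
```

A single 10-party call under the same assumptions is 100 streams, four times the 5-party figure, which is why the SFU is the piece that needs real capacity planning.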
To my knowledge, there aren’t any commercial SFUs you can buy today. Wowza and Red5 Pro are trying; they’re adapting from the RTMP world, but they’re not there yet. The first SFU was AddLive, but it was acquired by Snapchat several years ago.
In the open source world, Jitsi has a long following, but it was acquired by Atlassian. The source lives on, though. Kurento was another sort of SFU/MCU framework, and, oh, that was acquired by Twilio. Are you seeing a pattern here? Meetecho, with its Janus WebRTC gateway, is still in the market. Mediasoup, based on Node.js, is new to the scene.
Using an open source SFU is not for the meek. At any scale, an SFU can easily test the limits of your network interface cards (NICs) and connectivity. None of these open source SFUs is a simple git clone, npm install sort of project. Setting up an SFU is on par with deploying your own mail server. You are running your own mail server, right?
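A rough throughput check shows how an SFU runs into NIC limits. This sketch takes the 100 concurrent 5-party calls from earlier and splits the streams by direction, since NICs are full duplex. The ~1 Mbps per stream is an assumed, illustrative bitrate, and packet overhead and audio are ignored.

```python
# Per-direction stream counts for 100 concurrent 5-party calls, from the text:
# 5 inbound and 20 outbound streams per call.
INBOUND_STREAMS = 100 * 5     # 500
OUTBOUND_STREAMS = 100 * 20   # 2,000
STREAM_MBPS = 1.0  # assumption: ~1 Mbps per video stream, illustrative only


def utilization(streams: int, stream_mbps: float, nic_gbps: float) -> float:
    """Fraction of one direction of a full-duplex NIC used by `streams` flows."""
    return streams * stream_mbps / (nic_gbps * 1000)


for nic_gbps in (1, 10):
    up = utilization(OUTBOUND_STREAMS, STREAM_MBPS, nic_gbps)
    down = utilization(INBOUND_STREAMS, STREAM_MBPS, nic_gbps)
    print(f"{nic_gbps} GbE: outbound {up:.0%}, inbound {down:.0%}")
# 1 GbE: outbound 200%, inbound 50%
# 10 GbE: outbound 20%, inbound 5%
```

Under these assumptions the outbound fan-out needs about 2 Gbit/s, double what a 1 GbE interface can send, while inbound uses only half of it. The fan-out side is where the SFU hits the wall first.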
In the platform area, this capability, ready to go, is exactly what you should expect from your WebRTC platform provider. Tokbox has deployed its own distributed cloud SFU technology, Twilio is probably going to deploy Kurento on a cloud basis (I’m waiting for the SMS to tell me), Vidyo recently launched its Vidyo.io service, and then there are Temasys, Kandy.io and perhaps a few others (check whether they have an SFU, and whether they built it themselves or are using one of the open source projects).
It’s the API, stupid.
With any platform provider, take a serious look at their API offering: how current are their GitHub repos, sample code and SDKs? (I pay close attention to the stars, watches and download numbers.) It’s easy to post stuff; it’s hellish to keep it up to date. You should definitely build a sample app. If you can’t build something basic yourself in an hour (I’m not talking about cloning something on Heroku), then that platform provider may not be for you. If their APIs look screwball, the platform probably is as well. Questions about their infrastructure and support are fair game.
Everybody is great until they’re not.
Let me put a number out there (I have some economics below): if you’re not going to be consuming at least 1 million minutes of usage a month, you should immediately focus on a WebRTC platform provider. I watch way too many startups proudly pour all their energy into a backend when they should have focused on the application. Pay the vendor, and move on to the real meat of your application.
If it’s all about mobile, then, as if you weren’t busy enough on the back end, you’ve also got to work on a mobile SDK. There are various open source modules, but here again, for WebRTC you’re gonna have to roll this yourself and keep it up to date. This is where a platform provider wins out. Let this be their nightmare, not yours.
Testing is where most of the nightmares happen with mobile. Android, with hundreds of hardware builds, can keep you up late at night with problems. No emulator can cover all the edge cases; I’m talking stranger things and upside-downs. So unless you’re prepared to keep an inventory of Android phones in your lab, a platform provider wins out. iOS is clearly better since there are fewer choices (which is why developers often start with iOS first). Nonetheless, dealing with and maintaining all the permutations isn’t something I’d be eager to tackle.
Finally, there are video cloud providers in the market who are NOT using WebRTC. Does this matter? You know my vote on this. I think a single vendor trying to maintain its own proprietary platform, with all the complexity, will struggle to keep up. Yes, they’ll try to explain how they have some magic elixir that others haven’t discovered. But the many dollars and voices behind the WebRTC standard, argumentative as they may sometimes be, combine into a more powerful package over the long haul, I believe.
If you can deploy an SFU, or even a simple 1:1 WebRTC app, keep it running 24 x 7 on worldwide infrastructure, pay for the bandwidth, and keep all the SDKs up to date with whatever Google decides to inject into Chrome version xx.yyz at 12:01 a.m., all for less than a platform provider will charge you, then by all means, build it.
Do global capabilities matter for your application? Social media applications often rocket off in faraway countries for reasons beyond your control. A business app, usually not. How much infrastructure will you need, and when? Going global multiplies your support burden if you take this on yourself. But similarly, don’t be taken in by a platform provider with a generic global image on their website. I’d ask where they’re set up and what the infrastructure looks like. An indirect answer is an answer here.
If you don’t put a $$$ value on your time, then you’ve already sunk at the dock. At Tokbox, I’ve sadly had customers spending $90 a month complain about how expensive Tokbox is. Expensive compared to what? A Blue Bottle latte?
The fear is always: I’m going to get really big and then be locked into XYZ platform provider. The reality is that if you have a track record of spending money, any vendor is willing to work with you to keep your business. Recognize that they have costs themselves, which, while probably lower than what you would spend, are costs nonetheless. Even with some profit layered in for them, you’re still likely spending less than you would spend all-in.
However, it’s pointless to try to negotiate on a hypothetical “I’m going to be big.” Your biggest concern should be not going out of business before you hit any success mark. Spend your energy on the application and your user experience.
Some hard numbers
Let’s say your venture is going to use 100,000 minutes a month. I know this doesn’t sound like much, but trust me, if you’re starting from zero, it always takes longer than you think. I’ll use Tokbox as an example (again, please check out all the platform providers and run your own numbers).
100k minutes will get you a Tokbox bill of ~$480. That sounds expensive. $480 for what? Highway robbery!
I’m going to go out to Amazon, install Kurento on my own AWS instances and screw Tokbox and their fancy API. Well go ahead, make my day.
Two m4.xlarge instances at AWS, at $0.215 per hour each, run about $310 a month; then you’ll need 842 GB of bandwidth for those 100k minutes, which adds $76. So your hard costs are $386. However, I’m guessing you’ll spend 2 hours a week maintaining your infrastructure, and here is where costs increase. 8 hours a month at $50 an hour for your time (use amateurs) is $400, so bang, you’re at $786 a month on your “do it yourself” project.
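The do-it-yourself arithmetic can be written as a small cost model. This Python sketch reproduces the numbers above; the 720-hour month and the $0.09/GB egress rate are assumptions (the rate is simply what $76 for 842 GB implies), and `diy_monthly_cost` is a hypothetical helper name.

```python
HOURS_PER_MONTH = 720       # assumption: 30 days x 24 hours
EGRESS_USD_PER_GB = 0.09    # assumption: implied by $76 for 842 GB


def diy_monthly_cost(instances: int, usd_per_instance_hour: float,
                     egress_gb: float, labor_hours: float,
                     usd_per_labor_hour: float) -> dict[str, float]:
    """Monthly cost of running your own media servers, split by line item."""
    compute = instances * usd_per_instance_hour * HOURS_PER_MONTH
    bandwidth = egress_gb * EGRESS_USD_PER_GB
    labor = labor_hours * usd_per_labor_hour
    return {
        "compute": compute,
        "bandwidth": bandwidth,
        "hard_costs": compute + bandwidth,
        "labor": labor,
        "total": compute + bandwidth + labor,
    }


# The 100k-minute example: 2 instances, 842 GB, 8 hours a month at $50/hour.
costs = diy_monthly_cost(instances=2, usd_per_instance_hour=0.215,
                         egress_gb=842, labor_hours=8, usd_per_labor_hour=50)
for item, usd in costs.items():
    print(f"{item:>10}: ${usd:,.2f}")
```

It prints a total of about $785, within a dollar of the $786 above, since the text rounds each line item. Swap in your own instance count, egress volume and hourly rate to run your own numbers.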
Wait just a second, Spanky
One notion is that as your volume increases, your support hours don’t increase linearly. Fair enough. But let’s assume you reach the stellar apex of 500k minutes of stream usage, and let’s say you fiddle around for only 24 hours in the month trying to keep your Rube Goldberg SFU running. Your hard costs are ~$1,155; add in 24 hours of labor and you’re in for $2,355. The Tokbox price is $2,271. Call me, I’ll get you a discount.
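Another way to read both examples is to ask how many hours of upkeep a month you can afford before building costs more than buying. This sketch uses only the figures quoted in the text (the $386 and ~$1,155 hard costs, the ~$480 and $2,271 Tokbox bills, and $50 an hour); `breakeven_labor_hours` is a hypothetical helper name.

```python
def breakeven_labor_hours(provider_bill: float, diy_hard_costs: float,
                          usd_per_labor_hour: float) -> float:
    """Monthly hours of DIY upkeep at which building costs as much as buying.

    A negative result means DIY loses even with zero hours of labor.
    """
    return (provider_bill - diy_hard_costs) / usd_per_labor_hour


# (provider bill, DIY hard costs) in USD per month, as quoted in the text.
scenarios = {
    "100k minutes": (480, 386),
    "500k minutes": (2271, 1155),
}
for label, (bill, hard) in scenarios.items():
    hours = breakeven_labor_hours(bill, hard, usd_per_labor_hour=50)
    print(f"{label}: DIY is cheaper only below {hours:.1f} hours of upkeep a month")
```

With these figures, DIY only wins below about 1.9 hours a month at 100k minutes and about 22.3 hours at 500k minutes, so the 24 hours in the example already tips the balance toward the provider.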
While the core components of WebRTC are free, the application and operation are not. If a WebRTC platform provider has the features and functionality you need, that’s one headache you don’t have to have. Cost optimization is something you worry about once you get big. I’d always opt to focus on a winning application.
I went down the same path with this very blog, hosting my own WordPress instance. That was until I realized I was wasting cycles and brain cells doing stuff that wasn’t important. Now I’m happy to pay WP Engine, because it’s what they do, and they do a much better job.