288: TZ Discussion – Colby Speaks

Justin, Jason and Colby discuss Colby’s strength training program, his view on the accelerated math class, and why he designed his own football helmet and defensive playbook; the latest on the Math Academy; Justin’s 7-minute workout and his reinvention of Weight Watchers; Jason’s progress on Operation Superhero and his 3-week sprint concept; and the latest on Spoke and Givetronic.

  1. Hi,

    Nice to hear from Coby :).

    On transcription quality, one issue you might see, even if you achieved a 100% perfect result, is that the English people speak is ungrammatical garbage. We all cut each other a lot of slack when communicating verbally, but seeing the circumlocutions and repetitions and basic lack of grammar typed out looks bad. It might also be embarrassing for a person to see their verbal diarrhoea preserved forever in textual form. For notes, what you really want is not a perfect transcription, but a smart editor, fixing up the English and cutting out waffle while preserving the intent of the speakers.

    I guess these concerns matter less for some use cases than others. For instance, for a human doing manual note-taking or transcription, an accurate automatic transcription could be a useful input.

    So Justin, did the cohost with the podcasting business on the other show agree to buy your product?


  2. Sorry, great to hear from Colby … Doh, if only I had dictated my comment into a tool which was smart enough to write what I meant, not what I said.


    Hey Justin,

    Obviously transcription is a hard nut to crack for several reasons, one of which is audio quality. You probably already know how to do some cleanup, but there are a few automated tools for spoken word audio. First there is the Levelator, which is no longer supported, but it still works pretty well.


    A more automated tool with an API is Auphonic.


    Also, there are some semi-human-powered transcription services that do a pretty good job.

    Rev (with api) – https://www.rev.com/api
    TranscribeMe (with api) – http://transcribeme.com/developer-api/
    Speechpad – https://www.speechpad.com/speech_to_text
    CastingWords (with api) – https://castingwords.com/support/transcription-api.html
    Accentance – http://accentance.com/

  4. Justin says:

    Thanks Glenn, some great links there 🙂

  5. Franz See says:

    Hi guys,

    First of all, it was so fun listening to Colby in the show. He really is a mini-Jason 😀 My wife and I were laughing so hard 😀 And the way he just jumps into the discussion – so funny 😀

    On Spoke though, it’s very interesting that Justin thinks it’s not worth the effort to validate the tech. Most of the time, I’d agree that you need to validate the business first. However, that’s mainly if the technical pieces have already been solved before. I think Justin has the assumption that global speech-to-text has already been solved (at least to a respectable degree). But honestly, I don’t think it has been. If it had, then Siri would be usable for the majority of us non-Americans, right?

    So personally, if you can solve this, then the current business model is far too small. If you can solve this, then why not just sell it to Apple? If you can’t, then simplify the problem. This would require Justin to do validation on both the technical side and the business side. That is, figure out the target market, and just make speech-to-text work for them. Don’t try to target British accents, or Boston accents, or whatever. Just figure out what the accent of those people is and work on that.

    However, if Justin really is serious about this, then I highly recommend that Justin prepares to solve this himself. It would be great if he could just hook into some API, but I doubt he’ll find anything out there that matches what he needs (not unless you scope it down again so that the transcription need not be accurate, just contain certain keywords). What would most likely happen is that you’d get the audio files of your target market, have them transcribed (either manually or outsourced), and then use that as a training set in a supervised learning model. It is doable, as long as you have enough data (and you know what you’re doing, of course 😀 ).
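    The data-preparation step described above could be sketched roughly as follows. Everything here is a hypothetical illustration, not from the episode: the file layout, the function name, and the idea of matching clips to transcripts by basename are all assumptions.

    ```python
    # Hypothetical sketch: pair each market-specific audio clip with its
    # outsourced transcript to form (audio, text) examples for a
    # supervised speech-to-text training set.

    def build_training_set(audio_files, transcripts):
        """Match audio clips to transcripts by shared basename.

        audio_files: list of paths like "calls/0001.wav"
        transcripts: dict mapping basename ("0001") to transcript text
        Returns a list of (audio_path, transcript) pairs, skipping any
        clip that was never transcribed.
        """
        pairs = []
        for path in audio_files:
            stem = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
            if stem in transcripts:
                pairs.append((path, transcripts[stem]))
        return pairs
    ```

    The point of the sketch is just that the training set is the pairing itself; clips without a ground-truth transcript contribute nothing and get dropped.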

    Having said that, the path of least resistance would probably be just doing a productized service and outsourcing the transcription work to the Philippines (full disclosure: I’m a Filipino) 🙂 And once you have enough data, then you can play around with machine learning to optimize the operations.


  6. Jason says:

    @Franz See – I took the liberty of editing your comment and replacing Justin with Jason and vice-versa, where appropriate. I just don’t want people getting even more confused than they already are about who’s who. Think J-A-son (American) and J-U-stin (United Kingdom). Yes, I know, it’s incredibly clever and all I have to say is that I thought of it all by myself. 😉

  7. Nice to hear from Colby. For some reason I stupidly assumed he was still 5.

    Interesting to hear about Spoke, Justin. For the record it’s searchcode.com, not codesearch.com (confusing, I know).

  8. Jason says:

    @Ben Boyter – Yeah, it’s hard to get used to kids growing up. I’d freeze them at their current ages if I could, or at least for maybe a few more years. 😉 It’s going way too fast for me. But it’s a good reminder to take advantage of the time you have with them.

  9. Abe says:

    Justin, serious question: are you committed to the task of proving product demand? If so, then it’s a simple case of “where there’s a will there’s a way,” right? If there’s enough demand then you find a way. If an automated API that fully meets your needs doesn’t exist, then get creative and find a solution.

    So, with all that preface, there is an elegant solution: Mechanical Turk. Have humans do the transcription. Bonus that you could provide multi-language support. To solve any privacy concerns – cut up each recording into 1–5 minute snippets. Randomize those snippets and get them transcribed. Stitch the transcriptions back together and voilà! Another bonus is you can have each snippet transcribed 3x, which could feed an engine for error correction.
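    The snippet pipeline described above could be sketched roughly like this. The function names, the list-of-samples stand-in for real audio, and the simple majority vote are all assumptions for illustration, not anything from the comment:

    ```python
    import random
    from collections import Counter

    def split_into_snippets(recording, snippet_len):
        """Cut a recording (here just a list of samples) into fixed-length
        snippets, remembering each snippet's original position."""
        return [(pos, recording[pos:pos + snippet_len])
                for pos in range(0, len(recording), snippet_len)]

    def transcribe_with_redundancy(snippets, transcribe, copies=3, seed=0):
        """Send each snippet out `copies` times in randomized order (so no
        single worker sees a whole conversation), majority-vote the
        results per snippet, then stitch them back in original order."""
        jobs = [(pos, clip) for pos, clip in snippets for _ in range(copies)]
        random.Random(seed).shuffle(jobs)   # randomize for privacy
        votes = {}
        for pos, clip in jobs:
            votes.setdefault(pos, []).append(transcribe(clip))
        stitched = []
        for pos in sorted(votes):
            best, _ = Counter(votes[pos]).most_common(1)[0]
            stitched.append(best)
        return " ".join(stitched)
    ```

    In a real system `transcribe` would submit a Turk HIT and wait for the result; with three copies per snippet, a single bad transcription gets outvoted by the other two.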

    I hope you can prove the demand. It should be a fun project!


  10. Hi Abe,

    That makes a lot of sense if the costs work out. Turk might be cheap, but transcribing each snippet multiple times eats into that.

    I got an email from the launch list saying the project was canned so I guess Justin decided to fail fast this time. Bravo Justin, brave move. Go on to the next idea.

    Looking forward to hearing what led to the decision to move on in the next episode.


  11. Go Colby go! 🙂

    Such an awesome guy… It’s an honor to hear your side of the story. Please come back again!

  12. Matt S says:

    Hey Justin, I thought about your problem with Spoke when listening to a podcast that was interviewing someone who does transcription (in real-time!) for technical conference talks: http://giantrobots.fm/164

    I wonder if that would be an option — record the calls and then have a person transcribe them instead of using an API/service. Maybe the cost would be prohibitive, but you could give it a go on a small scale without having to build out much tech.