Startup Idea: Podcast App with ML-Powered Recommendations

Jul 6, 2020

A podcast app that uses machine learning to show you interesting things to listen to, rather than making you choose podcasts explicitly, which is a primitive approach.

You can give the algorithm hints by telling it your favourite podcasts and topics. You can also link it to Twitter, so that it can figure out what the people you follow like. In fact, Twitter will be the only way to sign in to this app.

The UI will be a queue of episodes to listen to, ranked by what’s most interesting to you. You can delete episodes from the list if they’re not interesting.

The app will have a search box. It will match podcasts, episodes and tags, based on your preferences.
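A minimal sketch of that search, assuming a hypothetical `SearchItem` model in which every podcast, episode, or tag already carries a personal score for the current user (for example, the output of the ranking described later). Matching here is a plain case-insensitive substring test, which is an assumption; the post doesn't specify the matching rule.

```swift
import Foundation

// Hypothetical searchable item: a podcast, an episode, or a tag.
struct SearchItem {
    enum Kind { case podcast, episode, tag }
    let kind: Kind
    let title: String
    let personalScore: Double  // assumed precomputed per user, e.g. from rank(episode)
}

// Case-insensitive substring match, with the best matches for this user first.
func search(_ query: String, in items: [SearchItem]) -> [SearchItem] {
    let q = query.lowercased()
    return items
        .filter { $0.title.lowercased().contains(q) }
        .sorted { $0.personalScore > $1.personalScore }
}
```

Sorting by the personal score rather than by text relevance is what makes the results depend on your preferences: two users typing the same query can see different orderings.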

Opinionated

We’re not focused on giving you new episodes as soon as possible, but on giving you interesting ones after a short delay.

You can’t sort podcasts or episodes the way you want them.

Maybe we’ll show a list of episodes without regard to which podcast they’re from. Again, no user control over the UI.

Episodes will be downloaded automatically whenever the app thinks it’s a good time, including over cellular. You can’t change this.

You don’t manually download episodes, or delete downloaded files. The app manages storage, not you.

When the current item is done, the app automatically starts playing the next one.

No volume boost, playback speed control, or silence skipping. A better way to save time is to listen to better episodes, which the app suggests.

No playlists.

Only one UI, as opposed to a choice between a list and a grid view, for example.

No chapters. Instead, make episodes shorter and more interesting, so that people want to listen to the whole thing.

The queue of episodes changes from time to time. Something that looks interesting may disappear later. You can’t save something for later.

Desired Features of the Algorithm

We don’t go by raw popularity. It doesn’t matter if a million people listened to something if they’re not like you.

If we don’t have enough data points to confidently score a particular episode, we’ll fall back on the score for the podcast, which is just the average score over all its episodes.
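A minimal sketch of this fallback. The `EpisodeStats` record and the `minDataPoints` threshold of 20 are hypothetical; the post doesn't say how many data points count as "enough".

```swift
// Hypothetical threshold: below this many data points, an episode's own score isn't trusted.
let minDataPoints = 20

struct EpisodeStats {
    let score: Double    // the episode's own score for this user
    let dataPoints: Int  // how many signals (plays, shares, skips and so on) fed into it
}

// A podcast's score: the plain average over all of its episodes.
func podcastScore(_ episodes: [EpisodeStats]) -> Double {
    guard !episodes.isEmpty else { return 0 }
    return episodes.map { $0.score }.reduce(0, +) / Double(episodes.count)
}

// Use the episode's own score when there's enough data; otherwise fall back on its podcast's.
func effectiveScore(of episode: EpisodeStats, inPodcast episodes: [EpisodeStats]) -> Double {
    episode.dataPoints >= minDataPoints ? episode.score : podcastScore(episodes)
}
```

For a podcast whose episodes score 3, 1 and 8, a brand-new episode with only a couple of data points would be treated as scoring 4, the podcast average.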

The algo should work if there are just three users in the system but lots of data points for each, or tons of users with just a few data points each.

Sorting episodes into categories (JavaScript, baking, politics) doesn’t help, because why someone likes something can’t be boiled down to a number or a vector. A JavaScript podcast may be good for beginners, while advanced users find nothing useful in it. An independent developer may want a podcast like Under the Radar, which covers both the business and technical sides, while a developer at a big company may care only about the technical side. A Brazilian may not understand English. Or take accents: an Indian listener may not follow an American English accent spoken quickly on one podcast, but can follow Indian or British English, or an American accent spoken slowly and clearly on another. The point is that there are countless reasons why someone likes something, and you can’t hope to quantify them all.

Detailed Algorithm

First, we have a likeness score that tells us how much a given user liked a given episode:

If she listened to it at least twice, the score is 30.

If she shared it, the score is 10.

If she completes it, skipping less than 20% of it, the score is 1.

If she didn’t come across it, the score is 0. This is the default.

If the episode falls off the end of the list without the user listening to it, the score is -1.

If the user marks an episode as played or removes it from the app without listening to it, the score is -2.

If she started it but stopped listening before reaching the end, or skipped over 20% or more of it, the score is -3.

When the user finishes an episode, we’ll give her a dislike button. She can press it to indicate that her time wasn’t well spent. That scores -5.

(We can tweak these in the future.)

Assume we have a function likeness(user, episode) that gives this value.
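One way to back `likeness(user, episode)` is a lookup table of recorded interactions. This sketch assumes each (user, episode) pair keeps a single interaction, with the most recent one winning; the rules above don't say how to combine several, such as a share followed by a dislike. `Interaction` and `LikenessStore` are hypothetical names, and users and episodes are plain string IDs.

```swift
// One recorded outcome for a (user, episode) pair, scored as in the list above.
enum Interaction {
    case listenedTwice, shared, completed, fellOffList, dismissed, abandoned, disliked

    var score: Int {
        switch self {
        case .listenedTwice: return 30
        case .shared: return 10
        case .completed: return 1    // finished, skipping less than 20%
        case .fellOffList: return -1 // dropped off the end of the queue unheard
        case .dismissed: return -2   // marked as played or removed without listening
        case .abandoned: return -3   // stopped early, or skipped 20% or more
        case .disliked: return -5
        }
    }
}

// Hypothetical in-memory store: user ID -> episode ID -> latest interaction.
struct LikenessStore {
    private var interactions: [String: [String: Interaction]] = [:]

    // Assumption: a newer interaction replaces the older one for the same pair.
    mutating func record(_ interaction: Interaction, user: String, episode: String) {
        interactions[user, default: [:]][episode] = interaction
    }

    // likeness(user, episode); 0 is the default when she never came across it.
    func likeness(_ user: String, _ episode: String) -> Int {
        interactions[user]?[episode]?.score ?? 0
    }
}
```

Keeping the scores in one `switch` makes them easy to tweak later, as the post anticipates.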

Second, we have a function correlation(currentUser, baseUser) that tells us how closely the currentUser’s tastes match the baseUser’s. To compute it, we take all the episodes the baseUser listened to as the universe, and compute the correlation between the likeness scores the two users gave those episodes. Because the universe comes from the baseUser alone, the function is not symmetric: correlation(a, b) need not equal correlation(b, a).
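A sketch of `correlation(currentUser, baseUser)` using Pearson correlation, which is one reasonable reading of the correlation described above, not necessarily the one the post intends. It assumes likeness scores live in a nested dictionary and that a hypothetical `listened` map records which episodes each user listened to. It returns 0 when there is too little data or when either user's scores are constant, since Pearson correlation is undefined there.

```swift
// Pearson correlation of the two users' likeness scores over baseUser's episodes.
// scores: user -> episode -> likeness (missing means 0, "didn't come across it").
// listened: user -> episodes that user listened to (hypothetical input).
func correlation(_ currentUser: String, _ baseUser: String,
                 scores: [String: [String: Double]],
                 listened: [String: [String]]) -> Double {
    let universe = listened[baseUser] ?? []
    guard universe.count >= 2 else { return 0 }  // too little data to correlate
    let xs = universe.map { scores[currentUser]?[$0] ?? 0 }
    let ys = universe.map { scores[baseUser]?[$0] ?? 0 }
    let n = Double(universe.count)
    let meanX = xs.reduce(0, +) / n
    let meanY = ys.reduce(0, +) / n
    // Unnormalized sums; the 1/n factors cancel in the final ratio.
    var covariance = 0.0, varianceX = 0.0, varianceY = 0.0
    for (x, y) in zip(xs, ys) {
        covariance += (x - meanX) * (y - meanY)
        varianceX += (x - meanX) * (x - meanX)
        varianceY += (y - meanY) * (y - meanY)
    }
    guard varianceX > 0, varianceY > 0 else { return 0 }  // constant scores: undefined
    return covariance / (varianceX * varianceY).squareRoot()
}
```

Swapping the arguments swaps the universe, so two users who agree on everything one of them heard can still disagree on what the other heard, and the function returns different values for each order.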

Third, when someone opens the app and we have to recommend episodes for them to listen to, we rank every episode in the database by:

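// Assumes allUsers, currentUser, the Episode type, and
// likeness(_:_:) / correlation(_:_:) are already in scope.
func rank(_ episode: Episode) -> Double {
  // Which users liked this episode, and how much are they like currentUser?
  var score = 0.0
  for user in allUsers {
    score += Double(likeness(user, episode)) * correlation(currentUser, user)
  }
  return score
}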

Then show the user the 10 highest-ranked episodes.
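Putting `rank` and the top-10 cut together, here is a minimal self-contained sketch. It passes `likeness` and `correlation` in as closures and uses string IDs for users and episodes; both choices are assumptions made so the block stands alone, not part of the design above. `recommendations(for:episodes:allUsers:count:likeness:correlation:)` is a hypothetical name.

```swift
// rank(episode): each user's likeness for the episode, weighted by
// correlation(currentUser, user), summed over all users.
func rank(_ episode: String, for currentUser: String, allUsers: [String],
          likeness: (String, String) -> Double,
          correlation: (String, String) -> Double) -> Double {
    allUsers.reduce(0.0) { score, user in
        score + likeness(user, episode) * correlation(currentUser, user)
    }
}

// Rank every episode, then keep the `count` highest (10 by default), best first.
func recommendations(for currentUser: String, episodes: [String], allUsers: [String],
                     count: Int = 10,
                     likeness: (String, String) -> Double,
                     correlation: (String, String) -> Double) -> [String] {
    let ranked = episodes.map { episode in
        (id: episode, score: rank(episode, for: currentUser, allUsers: allUsers,
                                  likeness: likeness, correlation: correlation))
    }
    return ranked.sorted { $0.score > $1.score }.prefix(count).map { $0.id }
}
```

Note that ranking every episode against every user is O(episodes × users) per app launch; that is fine for the "three users" case the post mentions, but a larger system would likely need to precompute or cache scores.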

TODO: Read about the Netflix algo

Data Collection

Log playback data back to the server so that we can see how well our recommendations matched reality. We can then use that data to reverse-engineer better values for the likeness scores.
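A minimal sketch of what could be logged, assuming a hypothetical `PlayEvent` payload sent to the server as JSON. `completionRate(ofTop:in:)` is one illustrative way to compare recommendations with reality: the share of plays from the top k queue slots that the user completed, using the 20% skip rule from the likeness scores.

```swift
import Foundation

// Hypothetical event the app could send to the server after each play.
struct PlayEvent: Codable, Equatable {
    let userID: String
    let episodeID: String
    let queuePosition: Int       // where the episode sat in the queue (0 = top)
    let reachedEnd: Bool         // did playback reach the end of the episode?
    let fractionSkipped: Double  // share of the episode skipped over, 0.0 to 1.0
}

// Share of plays from the top `k` queue slots that count as "completed":
// reached the end while skipping less than 20%.
func completionRate(ofTop k: Int, in events: [PlayEvent]) -> Double {
    let top = events.filter { $0.queuePosition < k }
    guard !top.isEmpty else { return 0 }
    let completed = top.filter { $0.reachedEnd && $0.fractionSkipped < 0.2 }
    return Double(completed.count) / Double(top.count)
}

// Example: encode one event as the JSON body of an upload.
let event = PlayEvent(userID: "u1", episodeID: "e42", queuePosition: 0,
                      reachedEnd: true, fractionSkipped: 0.05)
let body = try! JSONEncoder().encode(event)
print(String(data: body, encoding: .utf8)!)
```

Logging the queue position alongside the outcome is what lets the server tell whether highly ranked episodes actually get finished more often than lower ones.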

Business Model

Since this requires scale to be successful, it should be free and ad-supported. We can’t exclude any user from using the app.

We can offer an in-app purchase (IAP) to turn off ads, but the ads themselves can’t be so intrusive that they risk driving users away.

Technical Notes

Modify an existing open-source app, rather than building from scratch.

We need to be on both iOS and Android for scale.

Don’t care about tablets, web or desktop.

Use AntennaPod as the starting point?

We don’t need to run on multiple servers.

Devil’s Advocate

We shouldn’t do this, since Google Podcasts is out. It has more machine learning expertise behind it than I do, and more people will hear of it because it’s from Google. So Google will have more data, which will make its recommendations better, which will attract more users, and so on.