Advance Caching of Video Content Using ExoPlayer (Media3) to Reduce Video Start Time in OTT Apps

09 / Sep / 2025 by Manish Negi

Introduction

If you’ve ever worked on an OTT application, you have probably heard statements like “The VST must be as low as possible” or “The product team is observing spikes in user abandonment when the VST is 2-3 seconds or more”. So, what is VST (Video Start Time), and why should it be as low as possible?

In this blog we’ll be exploring:

  • What VST is, the factors affecting it, and why it should be as low as possible.
  • A way to reduce it by implementing advance caching strategies using Media3.
  • How caching helps in reducing the VST and increasing user retention.

What is VST?

VST = time when the first frame is rendered – time when the play button was pressed.

In other words, it is the time the video takes to start after the user presses play.
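To make the definition concrete, here is a minimal sketch of how VST could be measured on the client. `VstTracker` and `onPlayPressed()` are hypothetical names introduced for illustration; the sketch assumes the first-frame moment is taken from Media3's `Player.Listener.onRenderedFirstFrame()` callback.

```kotlin
import android.os.SystemClock
import android.util.Log
import androidx.media3.common.Player

/** Hypothetical helper: VST = (first frame rendered) - (play pressed). */
class VstTracker : Player.Listener {
    private var playPressedAtMs = UNSET

    /** Call this from the play button's click handler, before starting playback. */
    fun onPlayPressed() {
        playPressedAtMs = SystemClock.elapsedRealtime()
    }

    override fun onRenderedFirstFrame() {
        // Ignore first-frame callbacks that were not preceded by a play press.
        if (playPressedAtMs == UNSET) return
        val vstMs = SystemClock.elapsedRealtime() - playPressedAtMs
        playPressedAtMs = UNSET
        Log.d("VstTracker", "VST = $vstMs ms")
    }

    private companion object {
        const val UNSET = -1L
    }
}
```

The tracker would be registered with `player.addListener(tracker)`, and the logged value could instead be sent to whatever analytics pipeline the app already uses.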

Why does it matter?

  • For a user: A slow start feels like lag, and nowadays people are not very patient.
  • As a business: High VST usually leads to shorter sessions, abandonment, and revenue loss because less content is watched.
  • Engagement: Once VST exceeds 2-3 seconds, abandonment rates tend to climb sharply.
  • Competitive edge: In a world where patience levels are dwindling, making the user wait might mean losing them to another platform.

Before we dive into caching, let’s look at what usually contributes to VST:

  • Manifest fetching – Time taken to download the manifest or playlist (MPD for DASH, M3U8 for HLS).
  • License acquisition (for DRM-enabled content) – DRM license requests can add up to a second.
  • Initial segment downloads – The initial video/audio chunks must be fetched before playback can start, which takes some time.
  • Buffering strategy – The player usually likes to have 1-3 seconds’ worth of content fetched before it starts playback.
  • Decoder initialization – Setting up hardware decoders adds overhead.

A way to reduce the dreaded “VST” – Setting up advance caching with ExoPlayer (Media3)

Advance caching means making the relevant requests ahead of time, for example pre-downloading the initial segments or even pre-fetching the content, so that the cached data can be reused when the content is actually played and the VST stays as low as possible.

1. Creating an app-wide cache singleton:

SimpleCache – Cache used by ExoPlayer to store media data on disk.

LeastRecentlyUsedCacheEvictor – Cache eviction policy used by ExoPlayer, which automatically frees up space by removing the least recently used data when the size limit is reached.

StandaloneDatabaseProvider – Database used internally by ExoPlayer to persist and manage cache metadata in a robust way.

App Wide Cache Creation
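A minimal sketch of such a singleton, assuming the three classes described above. The holder name `VideoCache`, the `media_cache` directory name, and the 200 MB limit are illustrative assumptions, not values from the original post.

```kotlin
import android.content.Context
import androidx.annotation.OptIn
import androidx.media3.common.util.UnstableApi
import androidx.media3.database.StandaloneDatabaseProvider
import androidx.media3.datasource.cache.LeastRecentlyUsedCacheEvictor
import androidx.media3.datasource.cache.SimpleCache
import java.io.File

/** Hypothetical app-wide holder for a single SimpleCache instance. */
@OptIn(UnstableApi::class)
object VideoCache {
    // Assumed limit; tune it to the storage budget of the app.
    private const val MAX_CACHE_BYTES = 200L * 1024 * 1024

    @Volatile
    private var instance: SimpleCache? = null

    fun get(context: Context): SimpleCache =
        instance ?: synchronized(this) {
            instance ?: SimpleCache(
                File(context.applicationContext.cacheDir, "media_cache"),
                LeastRecentlyUsedCacheEvictor(MAX_CACHE_BYTES),
                StandaloneDatabaseProvider(context.applicationContext)
            ).also { instance = it }
        }
}
```

A singleton matters here because only one SimpleCache instance may use a given directory at a time; creating a second instance for the same folder fails at runtime.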


2. Creating a cache aware data source:

DefaultHttpDataSource – Used as the network (upstream) source to fetch content that is not present in the cache (as a fallback, or to fetch content beyond the cached amount).

CacheDataSource – Data source capable of reading from the cache and falling back to the network (using the DefaultHttpDataSource) if and when required.

FLAG_IGNORE_CACHE_ON_ERROR – Lets playback fall back to the network if the cache cannot be accessed.

Preparation of CacheDataSource
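One way this wiring might look is sketched below. The function name `buildCacheDataSourceFactory` is hypothetical, and enabling cross-protocol redirects is an assumption that may not suit every CDN setup.

```kotlin
import androidx.annotation.OptIn
import androidx.media3.common.util.UnstableApi
import androidx.media3.datasource.DefaultHttpDataSource
import androidx.media3.datasource.cache.Cache
import androidx.media3.datasource.cache.CacheDataSource

/** Hypothetical helper: reads from [cache] first and falls back to HTTP. */
@OptIn(UnstableApi::class)
fun buildCacheDataSourceFactory(cache: Cache): CacheDataSource.Factory {
    // Upstream (network) source used on cache misses.
    val upstreamFactory = DefaultHttpDataSource.Factory()
        .setAllowCrossProtocolRedirects(true) // assumption, adjust as needed

    return CacheDataSource.Factory()
        .setCache(cache)
        .setUpstreamDataSourceFactory(upstreamFactory)
        // Bypass the cache and continue from the network after a cache error.
        .setFlags(CacheDataSource.FLAG_IGNORE_CACHE_ON_ERROR)
}
```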


3. Configure Media Item with required URLs:

MediaItem – Represents the media to be played by encapsulating the essential details about the content, such as the URI, metadata, license URL (if DRM protected), subtitle info, etc.

Creation of a mediaItem
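As an illustration, a MediaItem for a DASH stream with an optional Widevine license URL could be built roughly as follows. The function and parameter names are hypothetical, and DASH plus Widevine are assumptions; other formats and DRM schemes would use a different MIME type and UUID.

```kotlin
import androidx.media3.common.C
import androidx.media3.common.MediaItem
import androidx.media3.common.MimeTypes

/** Hypothetical helper: builds a DASH MediaItem, with Widevine DRM if a license URL is given. */
fun buildMediaItem(manifestUrl: String, licenseUrl: String?): MediaItem {
    val builder = MediaItem.Builder()
        .setUri(manifestUrl)
        .setMimeType(MimeTypes.APPLICATION_MPD) // assumes a DASH manifest

    if (licenseUrl != null) {
        builder.setDrmConfiguration(
            MediaItem.DrmConfiguration.Builder(C.WIDEVINE_UUID)
                .setLicenseUri(licenseUrl)
                .build()
        )
    }
    return builder.build()
}
```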


4. Build and prepare an instance of ExoPlayer:

ExoPlayer.Builder – Used to create player instances.

DefaultMediaSourceFactory – Creates the media source using the cache-aware data source.

Exoplayer initialization and preparing playback
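A sketch of this step, under the assumption that a cache-aware `DataSource.Factory` and a `MediaItem` have already been created. The function name is hypothetical.

```kotlin
import android.content.Context
import androidx.annotation.OptIn
import androidx.media3.common.MediaItem
import androidx.media3.common.util.UnstableApi
import androidx.media3.datasource.DataSource
import androidx.media3.exoplayer.ExoPlayer
import androidx.media3.exoplayer.source.DefaultMediaSourceFactory

/** Hypothetical helper: builds a player that loads media through [dataSourceFactory]. */
@OptIn(UnstableApi::class)
fun buildAndPreparePlayer(
    context: Context,
    dataSourceFactory: DataSource.Factory,
    mediaItem: MediaItem
): ExoPlayer {
    // Route all media requests through the cache-aware data source.
    val mediaSourceFactory = DefaultMediaSourceFactory(context)
        .setDataSourceFactory(dataSourceFactory)

    return ExoPlayer.Builder(context)
        .setMediaSourceFactory(mediaSourceFactory)
        .build()
        .apply {
            setMediaItem(mediaItem)
            prepare()            // starts loading the manifest and first segments
            playWhenReady = true // start playback as soon as enough data is buffered
        }
}
```

For DASH or HLS content, the corresponding Media3 module (for example `media3-exoplayer-dash`) needs to be on the classpath so that DefaultMediaSourceFactory can create the right media source.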


5. How caching happens automatically:

When the playback is initiated with a configured CacheDataSource:

  • The player requests the next chunks of video and audio segments progressively through the CacheDataSource.
  • The CacheDataSource checks the SimpleCache on disk to see whether the requested segments are present.
  • If a segment exists, it is read from disk and delivered without the usual network overhead, which in turn greatly reduces startup time.
  • If the segment is not cached, the CacheDataSource fetches it from the upstream network source (the DefaultHttpDataSource) via an HTTP request.
  • While downloading the chunks, the CacheDataSource simultaneously writes the streamed bytes into the SimpleCache on disk, caching the data for later use.
  • Future playbacks of the same content will then use the stored segments, leading to faster loads and reduced network usage.
  • After the setup, the developer does not need to manage the cached data manually; the configured evictor frees up space when the limit is reached.
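To verify during development that the flow above actually populates the cache, the cache can be queried directly. The sketch below assumes the default cache key, which is the request URI as a string when no custom key or CacheKeyFactory is configured; `cachedBytesFor` is a hypothetical helper name.

```kotlin
import androidx.annotation.OptIn
import androidx.media3.common.C
import androidx.media3.common.util.UnstableApi
import androidx.media3.datasource.cache.Cache

/**
 * Hypothetical debug helper: returns how many bytes of [url] are currently cached.
 * Assumes the default cache key (the URI string) is in use.
 */
@OptIn(UnstableApi::class)
fun cachedBytesFor(cache: Cache, url: String): Long =
    cache.getCachedBytes(url, /* position = */ 0, /* length = */ C.LENGTH_UNSET.toLong())
```

For segmented streams, each segment URL is its own cache entry, so this check is most useful with a specific segment or a progressive file URL.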

6. Pre-fetch DRM License:

Offline caching of DRM license keys
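A rough sketch of how a license might be pre-fetched with Media3's OfflineLicenseHelper. It assumes a DASH stream protected with Widevine and a license server that issues persistable (offline) licenses; the function name and both URL parameters are hypothetical. The calls are blocking and should run on a background thread.

```kotlin
import android.net.Uri
import androidx.annotation.OptIn
import androidx.media3.common.util.UnstableApi
import androidx.media3.datasource.DefaultHttpDataSource
import androidx.media3.exoplayer.dash.DashUtil
import androidx.media3.exoplayer.drm.DrmSessionEventListener
import androidx.media3.exoplayer.drm.OfflineLicenseHelper

/**
 * Hypothetical helper: downloads an offline license and returns its keySetId,
 * or null if the stream carries no DRM init data. Blocking; call off the main thread.
 */
@OptIn(UnstableApi::class)
fun downloadOfflineLicense(manifestUrl: String, licenseUrl: String): ByteArray? {
    val httpFactory = DefaultHttpDataSource.Factory()
    val dataSource = httpFactory.createDataSource()

    // Load the manifest and find a format that carries the DRM init data.
    val manifest = DashUtil.loadManifest(dataSource, Uri.parse(manifestUrl))
    val format = DashUtil.loadFormatWithDrmInitData(dataSource, manifest.getPeriod(0))
    if (format?.drmInitData == null) return null

    val helper = OfflineLicenseHelper.newWidevineInstance(
        licenseUrl,
        httpFactory,
        DrmSessionEventListener.EventDispatcher()
    )
    return try {
        helper.downloadLicense(format) // the returned bytes are the keySetId
    } finally {
        helper.release()
    }
}
```

The returned keySetId would typically be persisted (for example alongside the content ID) so that it can be attached to later playback requests.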


When requesting playback, attach the cached license via its keySetId so that the player does not have to wait for a fresh license acquisition when the cached content is played again.
The keySetId identifies the stored decryption keys for the cached encrypted video/audio segments, making advance caching feasible for DRM-protected streams.
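Attaching a previously stored keySetId could look like the sketch below. Widevine is assumed as the DRM scheme, and the function and parameter names are hypothetical.

```kotlin
import androidx.media3.common.C
import androidx.media3.common.MediaItem

/** Hypothetical helper: builds a MediaItem that reuses a stored offline license. */
fun buildDrmMediaItem(
    manifestUrl: String,
    licenseUrl: String,
    keySetId: ByteArray
): MediaItem =
    MediaItem.Builder()
        .setUri(manifestUrl)
        .setDrmConfiguration(
            MediaItem.DrmConfiguration.Builder(C.WIDEVINE_UUID)
                .setLicenseUri(licenseUrl)
                // Restore the stored keys instead of requesting a new license.
                .setKeySetId(keySetId)
                .build()
        )
        .build()
```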

7. Possible use cases of advance caching in the application:

  • Advance caching the content present in the user’s Continue Watching section.
  • Caching the first episode or the movie that the user is currently checking out.
  • Caching the content to be played next (the next episode or the next movie that will auto-play after the current one).
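For use cases like these, the beginning of a stream can be written into the cache before the user presses play. The sketch below uses Media3's CacheWriter and assumes a URL that can be cached directly, such as a progressive file or a known segment URL that is at least `bytesToCache` long; for full DASH or HLS streams, Media3's segment-aware downloaders (for example DashDownloader) are likely a better fit. The function name and parameters are hypothetical.

```kotlin
import android.net.Uri
import androidx.annotation.OptIn
import androidx.media3.common.util.UnstableApi
import androidx.media3.datasource.DataSpec
import androidx.media3.datasource.cache.CacheDataSource
import androidx.media3.datasource.cache.CacheWriter

/**
 * Hypothetical helper: caches the first [bytesToCache] bytes of [url].
 * Blocking and may throw IOException; call it from a background thread.
 */
@OptIn(UnstableApi::class)
fun preCacheStart(factory: CacheDataSource.Factory, url: String, bytesToCache: Long) {
    val dataSpec = DataSpec.Builder()
        .setUri(Uri.parse(url))
        .setPosition(0)
        .setLength(bytesToCache)
        .build()

    CacheWriter(
        factory.createDataSource(),
        dataSpec,
        /* temporaryBuffer = */ null,
        /* progressListener = */ null
    ).cache()
}
```

Because the player later reads through a CacheDataSource backed by the same cache, the pre-cached bytes are served from disk and only the remainder is fetched from the network.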

8. How is it different from locally downloading the content?

  • Downloaded content – The entire content is available offline and can be watched completely even if the device is in airplane mode, i.e., no network connection is available.
  • Stream caching – Improves VST while streaming content, without requiring a full download.

9. Things to watch out for during practical implementation of advance caching:

  • Buffer size configuration should be done carefully, as choosing too large a buffer duration can increase app memory usage and delay startup, and settings that are too small can cause frequent re-buffering and playback interruptions.
  • By carefully optimizing the buffer-related parameters according to content type and device profile, a smoother playback experience can be achieved.
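Buffer durations are configured through DefaultLoadControl. The values in this sketch are purely illustrative assumptions, not recommendations, and the function name is hypothetical.

```kotlin
import android.content.Context
import androidx.annotation.OptIn
import androidx.media3.common.util.UnstableApi
import androidx.media3.exoplayer.DefaultLoadControl
import androidx.media3.exoplayer.ExoPlayer

/** Hypothetical helper: builds a player with custom buffer durations. */
@OptIn(UnstableApi::class)
fun buildPlayerWithTunedBuffers(context: Context): ExoPlayer {
    val loadControl = DefaultLoadControl.Builder()
        .setBufferDurationsMs(
            /* minBufferMs = */ 15_000,
            /* maxBufferMs = */ 30_000,
            // Media that must be buffered before playback starts; affects VST directly.
            /* bufferForPlaybackMs = */ 1_000,
            // Media that must be buffered before resuming after a rebuffer.
            /* bufferForPlaybackAfterRebufferMs = */ 2_000
        )
        .build()

    return ExoPlayer.Builder(context)
        .setLoadControl(loadControl)
        .build()
}
```

Both playback thresholds must not exceed `minBufferMs`, and `minBufferMs` must not exceed `maxBufferMs`; otherwise the builder rejects the configuration.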

Conclusion

By combining stream caching with DRM license pre-fetching, ExoPlayer makes it possible to significantly shorten the VST, from several seconds to a near-instant startup when the required data is already cached. For end users or subscribers, this leads to smoother playback sessions with minimal delay. For businesses, it leads to stronger user retention, higher engagement, and higher revenue. And for developers, once caching is set up, there is little overhead in managing it manually, since the cache largely takes care of itself.
