You went looking for a YouTube transcript API and found a mess. The official YouTube Data API will list captions but won't let you download them. The most popular open-source library breaks every few months when YouTube ships a tweak. The hosted services hide their pricing behind a signup form. So you're back here, trying to figure out which option won't blow up the week you ship.
This is a developer's guide to pulling YouTube transcripts in 2026. Four real approaches, working code for each, what each one costs at 1,000 transcripts, and the failure modes that show up in production but never in the README. We'll show our SDK in the relevant section. The comparison is honest. If the open-source library fits your case, use it.
YouTube Transcript API GitHub Status 2026: Why This Is Harder Than It Should Be
YouTube's official Data API has a captions.download method, but it only works on videos you own. Captions on other people's videos can be listed, not downloaded. That single restriction is the reason every transcript tool you've ever used is some flavor of scraper.
YouTube serves caption tracks to the player as timedtext XML or VTT. The unauthenticated endpoints that return them are not officially supported, and YouTube has tightened them several times in the last two years. The result is a small ecosystem of libraries and hosted APIs doing roughly the same job: hit those endpoints, parse the response, return JSON.
Your job is picking which one breaks least often for your traffic profile and your budget. The right answer at 100 transcripts a month is the wrong answer at 100,000.
Option 1: The open-source youtube-transcript-api Python library
The community-maintained jdepoix/youtube-transcript-api Python library is the default starting point for most developers because it fetches YouTube transcripts without using the official YouTube API. Install with pip, pass a video ID, get back a list of {text, start, duration} segments. The maintainer is Jonas Depoix; the GitHub repo is the most-starred transcript library on the platform.

    from youtube_transcript_api import YouTubeTranscriptApi

    # Create a client and fetch the transcript for one video ID
    ytt = YouTubeTranscriptApi()
    transcript = ytt.fetch("dQw4w9WgXcQ")

    # Each segment exposes text, start, and duration
    for segment in transcript:
        print(f"[{segment.start:.1f}s] {segment.text}")

That's the whole API for a single transcript. It supports auto-generated and manually uploaded captions, multiple languages, and translation between languages YouTube exposes.
The pitch is that it's free. No API key, no signup, no per-call cost. It's a good fit for personal projects pulling a few hundred transcripts a week from your laptop.
The problem is that "free" is a lie when you ship it. The library hits YouTube's unauthenticated timedtext endpoint, which means it shares your IP's rate limit with every other person doing the same thing. Pull from a residential connection and you're fine. Pull from AWS, Google Cloud, or Azure at scale and you aren't: YouTube aggressively blocks those IP ranges, so you'll start seeing IP-blocked errors within a few hundred requests. The maintainer's recommended fix is configuring proxy backends, which is now your problem to operate.
The other failure mode is upstream changes. YouTube ships small tweaks to the player config every few months. The library is well-maintained, but there's typically a 1-3 day window after each change where transcripts return empty or malformed data and you're staring at a Slack thread asking why production broke. We've watched this play out three times in the last 18 months. The repo's GitHub issues and active forks are increasingly focused on performance overhauls and recurring 429 Too Many Requests errors. Older cookie-based auth methods have also broken after changes to YouTube's architecture and its Innertube API, which makes extraction from age-restricted or region-locked videos less reliable.
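Before you reach for proxies, it helps to separate transient throttling from a hard block. Here's a minimal sketch of retry-with-backoff. It is not the library's own mechanism: `fetch_with_backoff` and the `fetch` callable are hypothetical names, and backoff will not rescue an IP range that YouTube has blocked outright.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying with exponential backoff when it raises.

    `fetch` is any zero-argument callable (hypothetical here); wrap your
    real transcript call in a lambda. `sleep` is injectable for testing.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            # Assumption: real code should catch only block/rate-limit errors.
            if attempt == retries:
                raise  # out of retries, surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

In practice you'd call it as `fetch_with_backoff(lambda: ytt.fetch(video_id))`. If every retry fails from a cloud IP, treat that as a block and stop retrying.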
For a notebook, that's fine. For a feature you ship to customers, it's a maintenance tax you'll pay forever, which is why many teams move to paid managed services.
Option 2: The official YouTube Data API
The Captions endpoints in the official YouTube Data API v3 are youtube.captions.list and youtube.captions.download. Listing tracks costs 50 quota units per call. Downloading costs 200 and only works on videos uploaded by your authenticated account. Each caption resource includes a snippet, and its status field shows whether a track is serving, syncing, or failed.
That second restriction is the catch. Search for "captions.download forbidden 403" and you'll find a decade of developers running into this wall. Google has explicitly chosen not to expose third-party caption downloads through the Data API. They aren't going to change their mind. captions.list will tell you a track exists, auto-generated ones included (their trackKind is ASR), but for videos you don't own that's where it stops, which is why dedicated transcript APIs are often used instead.
The Data API is the right tool when you control the videos: pulling transcripts for a brand's own channel, processing user uploads on your platform, or running internal QA on content you produce. For everything else, it's a non-starter.
The quota math. Default Data API quota is 10,000 units per day. Listing captions costs 50 units, so that's 200 list calls per day before you hit the wall. Add the 200-unit download and each video costs 250 units, which caps you at 40 fully processed videos a day. Quota increases require an audit and several weeks of review. Don't plan around getting one.
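Here's that arithmetic as a few lines of Python, using the 50-unit list and 200-unit download costs quoted earlier. It's a back-of-envelope sketch that assumes one list and one download per video and no other Data API calls on the same project.

```python
DAILY_QUOTA = 10_000   # default Data API units per day
LIST_COST = 50         # captions.list
DOWNLOAD_COST = 200    # captions.download


def list_calls_per_day(quota: int = DAILY_QUOTA) -> int:
    """How many list-only calls fit in the daily quota."""
    return quota // LIST_COST


def videos_per_day(quota: int = DAILY_QUOTA) -> int:
    """How many videos you can fully process (one list + one download each)."""
    return quota // (LIST_COST + DOWNLOAD_COST)


print(list_calls_per_day())  # 200
print(videos_per_day())      # 40
```

The download is the expensive half, so the realistic ceiling is the second number, not the first.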
Option 3: The Influship SDK (or any hosted transcript API)
Hosted APIs do what the Python library does, except on someone else's servers, with someone else's IP rotation, with a paid SLA when something breaks. The current options worth comparing are Supadata, TranscriptAPI, youtube-transcript.io, and our own raw scrapers.
The pitch is the same across all of them: send a video ID, get a transcript back. Some hosted APIs also return video metadata with the transcript data to reduce extra API calls. The differences are in price, batch capability, output format, and what else they bundle. For 10,000 transcripts a month:
| Service | Per-transcript price | Notes |
|---|---|---|
| Supadata | ~$0.005-0.01 | Plan-based, transcript credits expire |
| TranscriptAPI | ~$0.005 | One credit = one transcript, search and channels included |
| youtube-transcript.io | ~$0.003-0.008 | Token-based, plan-tied |
| Influship raw scrapers | $0.005 flat | Same credit applies to channel data, search, profiles |
The Influship SDK ships in Python and TypeScript and exposes the raw scrapers as first-class methods. Setup is minimal: install, set the env var, call the method.
Python:

    import os

    from influship import Influship

    # Read the API key from the environment instead of hard-coding it
    client = Influship(api_key=os.environ["INFLUSHIP_API_KEY"])
    transcript = client.raw.youtube.get_transcript(video_id="dQw4w9WgXcQ")

    for segment in transcript.data.segments:
        print(f"[{segment.start:.1f}s] {segment.text}")

TypeScript:

    import { Influship } from "influship";

    // Example: fetch the full transcript for a YouTube video
    const client = new Influship({ apiKey: process.env.INFLUSHIP_API_KEY! });
    const transcript = await client.raw.youtube.getTranscript({
      videoId: "dQw4w9WgXcQ",
    });

    for (const segment of transcript.data.segments) {
      const line: string = `[${segment.start.toFixed(1)}s] ${segment.text}`;
      console.log(line); // switch to plain-text output here if your app doesn't need JSON
    }

Prefer hitting the REST endpoint directly? It's the same request over plain HTTP, so it works from any language, not just the two the SDK ships in:

    curl -H "Authorization: Bearer $INFLUSHIP_API_KEY" \
      https://api.influship.com/v1/raw/youtube/transcript/dQw4w9WgXcQ

Your app layer can accept either a video URL or an ID: resolve the URL once and send the normalized value in the request.
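A minimal sketch of that normalization step, using only the standard library. `to_video_id` is a hypothetical helper, and the 11-character ID pattern is an assumption based on how IDs look today, not a documented guarantee. It covers the common watch, youtu.be, shorts, embed, and live URL shapes and rejects anything else.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from this alphabet.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
_YOUTUBE_HOSTS = ("youtube.com", "m.youtube.com", "music.youtube.com")


def to_video_id(value: str) -> str:
    """Return the video ID from a YouTube URL or a bare ID."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value  # already a bare ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = ""
    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
                candidate = parts[1]

    if not _ID_RE.fullmatch(candidate):
        raise ValueError(f"not a recognizable YouTube video URL or ID: {value!r}")
    return candidate
```

Doing this once at the edge means your cache keys, logs, and API calls all agree on the same identifier.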
When a hosted API is the right call. You're running this in a server-side context where IP blocks would page someone. You're processing more than a few thousand transcripts a month. You need predictable per-call pricing for a customer-facing feature. You don't want to be the person who has to babysit proxy rotation. A free tier can help with initial setup, and some vendors start with no credit card required.
When it isn't. You're pulling fewer than 1,000 transcripts a month from a residential IP, and you're comfortable patching the open-source library when YouTube changes things. In that case, the time you'd spend signing up and integrating an API costs more than the API would.
The Influship-specific reason to pick us over Supadata or TranscriptAPI: if you're already running creator search, profile lookups, or channel data through Influship, the transcript endpoint shares the same credit pool. One vendor, one bill, one auth header. If you only need transcripts and nothing else, TranscriptAPI is a fine choice and we'll happily lose that comparison.
Option 4: yt-dlp plus Whisper
If you don't trust the unofficial caption endpoints at all, you can sidestep them: download the audio with yt-dlp and run it through OpenAI Whisper or a hosted transcription service like Deepgram or AssemblyAI.
This is the most expensive option per transcript ($0.006-0.024 with Whisper depending on length, more with hosted alternatives) and by far the slowest (2-10 minutes per video versus 2-5 seconds for the others). What you get is a transcript that doesn't depend on YouTube's caption availability. Live videos, age-gated videos, videos in languages without auto-captions, all transcribable.
The other reason teams pick this path is quality. Auto-generated YouTube captions are passable for English on clear audio. They're rough on accents, technical jargon, and music. Whisper-large is materially better, especially on the proper nouns and product names that named-entity recognition depends on.
For most projects, that quality bump isn't worth a 100x latency hit. If your application's value depends on transcript quality (legal review, medical content, podcast indexing), it's the right tool. If you're trying to extract product mentions from a creator's videos, the captions are fine.
A worked example: extracting product mentions across a creator's channel
Pretend you're building a creator-research feature. A brand drops in a YouTube channel handle, and you want to surface every product the creator has mentioned in the last 50 videos.
You need three steps in sequence:
- List the creator's videos
- Fetch transcripts for each video
- Run NER (named-entity recognition) on the combined text
Doing this with the open-source library takes a few dozen lines and works fine for one channel from your laptop. Doing it for 200 channels in a customer-facing dashboard is the moment you want batching.
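The three steps above can be sketched independently of which fetcher you pick. This is an illustration, not a real NER pipeline: `product_mentions` and `fetch_text` are hypothetical names, and case-insensitive substring matching against a known product list stands in for an actual entity model.

```python
from collections import Counter
from typing import Callable, Iterable


def product_mentions(
    video_ids: Iterable[str],
    fetch_text: Callable[[str], str],
    products: Iterable[str],
) -> Counter:
    """Count how many videos mention each product name at least once.

    `fetch_text` takes a video ID and returns the transcript as one string.
    Substring matching is a stand-in for a real NER model.
    """
    names = list(products)
    counts: Counter = Counter()
    for video_id in video_ids:
        text = fetch_text(video_id).lower()
        for name in names:
            if name.lower() in text:
                counts[name] += 1
    return counts
```

Because the fetcher is passed in, you can start with the open-source library on your laptop and swap in a hosted API later without touching the counting logic.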
The Influship SDK exposes a single call that pulls up to 20 transcripts per channel.

Python:

    result = client.raw.youtube.get_channel_transcripts(
        handle="mkbhd",
        sort="popularity",
        limit=20,
    )
    for video in result.data.transcripts:
        print(video.title, len(video.segments))

TypeScript:

    const result = await client.raw.youtube.getChannelTranscripts('mkbhd', {
      sort: 'popularity',
      limit: 20,
    });
    for (const video of result.data.transcripts) {
      console.log(video.title, video.segments.length);
    }

At $0.005 per transcript, processing 50 videos for a channel costs $0.25. Two hundred channels: $50. Cache by videoId after the first pull, and subsequent runs only pay for new uploads. The same flow on the open-source library is two API calls per video (list + fetch), no batching, and IP-block risk on every iteration. That's the difference between "shippable feature" and "demo on my laptop."
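To make the caching point concrete, here's a small sketch of what a refresh run costs once you skip video IDs you've already stored. `run_cost` is a hypothetical helper, and the price constant is the flat per-transcript rate quoted in this post.

```python
PRICE_PER_TRANSCRIPT = 0.005  # USD, flat rate quoted in this post


def run_cost(channel_video_ids: set[str], cached_ids: set[str]) -> float:
    """Cost of one refresh run: only videos not already cached are billed."""
    new_ids = channel_video_ids - cached_ids
    return round(len(new_ids) * PRICE_PER_TRANSCRIPT, 4)
```

With an empty cache and 50 video IDs, the first run is the full $0.25. If the creator uploads three more videos before the next run, that run bills three transcripts, not 53.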
If you also need to find which channels to pull from, the same influencer API has creator search exposed, and the MCP server post walks through wiring it into Claude or ChatGPT for natural-language queries.
Costs per 1,000 transcripts
Round numbers, US data:
- Open-source library on residential IP: $0 in API costs, plus your time when YouTube changes things and you have to update.
- Open-source library on rotating proxies: $50-150 per 1k transcripts depending on proxy provider.
- Hosted transcript APIs (Supadata, TranscriptAPI, Influship, etc.): $3-10 per 1k transcripts, fixed.
- yt-dlp + Whisper-large via OpenAI: $6-24 per 1k transcripts depending on average video length, plus compute.
- yt-dlp + Deepgram or AssemblyAI: $20-80 per 1k transcripts depending on tier.
The hosted APIs are usually cheaper than DIY-with-proxies once you factor in the maintenance time. Whisper-via-OpenAI is the budget option if quality matters; hosted transcription services are the premium option.
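If you want to plug in your own volume, the ranges above scale linearly. This sketch copies the per-1k figures from the list and leaves out compute and maintenance time, which is exactly the part that tends to decide the comparison. `monthly_range` and the option labels are illustrative names.

```python
# (low, high) USD per 1,000 transcripts, copied from the list above
COST_PER_1K = {
    "open-source + rotating proxies": (50, 150),
    "hosted transcript API": (3, 10),
    "yt-dlp + Whisper-large via OpenAI": (6, 24),
    "yt-dlp + Deepgram/AssemblyAI": (20, 80),
}


def monthly_range(option: str, transcripts: int) -> tuple[float, float]:
    """Scale an option's per-1k cost range to a monthly transcript count."""
    low, high = COST_PER_1K[option]
    return (low * transcripts / 1000, high * transcripts / 1000)


for name in COST_PER_1K:
    low, high = monthly_range(name, 10_000)
    print(f"{name}: ${low:,.0f}-${high:,.0f} per month")
```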
Failure modes that bite you in production
Five things go wrong often enough to plan for.
No captions exist. Roughly 5-10% of videos either have no captions or have auto-generated captions disabled. Your code path needs to handle the empty response. The open-source library raises TranscriptsDisabled or NoTranscriptFound; hosted APIs typically return a 404 or empty segments array.
Captions exist but are useless. Music videos, ASMR, anything with minimal speech. Your transcript will technically come back, but it'll be a list of [Music] markers. If your downstream task assumes meaningful text, gate on segment count or character count.
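One way to write that gate, covering both the empty case and the all-markers case. It's a sketch: `is_useful_transcript` is a hypothetical helper, and the 200-character threshold is an arbitrary starting point to tune against your own data.

```python
import re

_MARKER_RE = re.compile(r"\[[^\]]*\]")  # bracketed cues such as [Music]


def is_useful_transcript(segments: list[str], min_chars: int = 200) -> bool:
    """True if the transcript has enough spoken text to be worth processing.

    `segments` is a list of segment text strings. Bracketed cue markers
    are stripped before counting characters.
    """
    spoken = " ".join(_MARKER_RE.sub("", text) for text in segments)
    spoken = " ".join(spoken.split())  # collapse leftover whitespace
    return len(spoken) >= min_chars
```

Run it right after the fetch and skip the downstream NER or summarization call when it returns False.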
Language mismatch. A creator's channel may have English captions on some videos and Spanish auto-captions on others. The open-source library lets you specify languages and falls back through them; hosted APIs default to the video's primary language and require a parameter to translate. Handle the case where the fallback itself fails: when none of the requested languages is available, the open-source library raises NoTranscriptFound, and your code should decide whether to skip the video or retry with a broader language list.
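If your fetcher doesn't do the fallback for you, the selection logic is short. A sketch, assuming you can get the list of available language codes for a video from whichever tool you use; `pick_language` is a hypothetical helper.

```python
from typing import Optional, Sequence


def pick_language(available: Sequence[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first preferred language code that the video offers.

    Returns None when nothing matches, so the caller decides whether to
    skip the video, request a translation, or fall back to audio.
    """
    offered = {code.lower() for code in available}
    for code in preferred:
        if code.lower() in offered:
            return code
    return None
```

Returning None instead of raising keeps the decision in one place: the caller that knows what the product should do with an untranscribable video.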
Rate limits, even on paid APIs. Hosted services have per-key rate limits, usually 10-50 requests per second. If you're processing a backlog, queue and throttle. If you're still using the Python library with third-party proxies, the library's ProxyConfig.prevent_keeping_connections_alive setting can help smooth IP rotation and reduce connection-reuse issues.
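A throttle for backlog processing can be very small. This sketch paces calls at a fixed interval and is not tied to any vendor's limits: `Throttle` is a hypothetical class, and the clock and sleep functions are injectable so you can test it without waiting.

```python
import time
from typing import Callable


class Throttle:
    """Space calls at least 1/rate seconds apart (a simple pacing limiter)."""

    def __init__(
        self,
        rate_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve that slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call `throttle.wait()` before each request. Setting the rate below your key's published limit leaves headroom for retries.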
Caption updates. Creators occasionally re-upload captions. If your pipeline caches transcripts by videoId and never refreshes, you'll have stale text on a small number of videos. Check the caption track's lastUpdated if your hosted API exposes it; the open-source library doesn't.
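The refresh check itself is a comparison of two timestamps. A sketch, assuming your API returns lastUpdated as an ISO 8601 string in a consistent format and timezone, so plain string comparison orders the values correctly; `needs_refresh` is a hypothetical helper.

```python
from typing import Optional


def needs_refresh(
    cached_last_updated: Optional[str],
    remote_last_updated: Optional[str],
) -> bool:
    """Decide whether a cached transcript should be fetched again.

    Both arguments are ISO 8601 timestamps in the same format and
    timezone (an assumption), or None when unknown.
    """
    if cached_last_updated is None:
        return True  # nothing cached yet
    if remote_last_updated is None:
        return False  # no timestamp exposed; keep the cached copy
    return remote_last_updated > cached_last_updated
```

When no timestamp is available at all, a time-based expiry on the cache entry is the usual substitute.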
What we'd actually pick
A short opinionated decision tree:
- Notebook, side project, fewer than 1,000 transcripts a month, residential IP: open-source Python library. Free is free. When it breaks, patch it.
- Your own channel only, fewer than 40 videos a day: official YouTube Data API. At 250 quota units per video (list plus download), the default 10,000-unit quota covers that, and you don't need a third party.
- Anything you'll ship to customers: the Influship SDK. We're priced at $0.005 per transcript, the same as the dedicated transcript-only competitors, and we throw in batch channel transcripts (up to 20 per call), YouTube search, channel data, and Instagram profile lookups on the same key. The managed-service choice is mainly about reducing maintenance and handling edge cases in production more reliably. Buying a transcripts-only vendor and then bolting on three more vendors when your roadmap grows is the more expensive choice; we've watched teams do that math wrong twice this year.
- Court-ready accuracy (legal review, medical transcription): nobody fetching YouTube's captions will help you here, including us. Use yt-dlp plus Whisper-large or a paid transcription service. Pay the cost. We'd rather you find out from this post than from a deposition.
If you only ever need transcripts and you're certain you'll never need creator search, channel metadata, or profile data, TranscriptAPI and Supadata are fine focused tools at the same price point. We don't think most production use cases stay that narrow, but if yours does, pick the cheapest option that does the one thing.
If you're building a creator-research feature where the transcript pull is one step in a wider workflow (finding influencers, verifying audience quality, running outreach), consolidate. One API, one credit pool, one thing to monitor.
Where to go next
The transcript endpoint is one piece of a developer-facing surface area that includes channel data, YouTube search, profile lookups, and AI agent integrations. If you're scoping a creator-research product, the Instagram Influencer Search API guide covers the equivalent for Instagram, and the Influencer Marketing MCP Server post shows how to wire all of it into Claude, ChatGPT, or Cursor with two lines of config.
For the broader vendor map, the Influencer Marketing APIs guide covers the eight categories of provider and which ones to evaluate against each other.

