Transcribe your video with Speech-To-Text

Summary

Turn your video’s spoken audio into an accurate text transcript with alugha’s Speech-To-Text. This is Step 1 of the alugha AI pipeline (Speech-To-Text → Automated Translation → Text-To-Speech) and supports 100+ languages.

Prerequisites

Before you begin:

A video uploaded to alugha with at least one language track (see Upload a video to alugha)
Clear audio in the language you want to transcribe — poor audio quality reduces accuracy
Enough credits in your wallet (Speech-To-Text shows the exact cost before you submit)

Step-by-Step Instructions

1. Open the dubbr on your source-language track

From your avatar menu, open My archive and click EDIT on the video you want to transcribe. The dubbr opens on the PROJECT tab.

Click the language tab for the source language — the language actually spoken in the video (for example DEU for a video recorded in German). Speech-To-Text always runs on the currently selected track.

2. Open Automation → Speech To Text

In the dubbr toolbar, click Automation (the AI chip icon). The dropdown lists the three steps of the AI pipeline, plus a separate Subtitle Accessibility option:

Speech To Text — transcribes your audio (this step)
Automated Translation — translates the transcript into other languages
Text To Speech — generates an AI voiceover from the translated text
Subtitle Accessibility — accessibility enhancements for subtitles

Click Speech To Text to open the dialog.

[Image: alugha dubbr Automation menu with Speech To Text, Automated Translation, and Text To Speech options]

3. Pick your language

In the SPEECH-TO-TEXT dialog, open Pick your Language and select the language that is actually spoken in the video (for example German (DEU)). alugha supports 100+ languages for transcription.

The language you pick here must match the spoken audio — not the target language you want to translate into later. Picking the wrong language is the most common reason for inaccurate transcripts.

4. Choose Transcript or Subtitles

The Subtitles ↔ Transcript toggle decides how the output is stored:

Transcript (default, recommended) — Groups the spoken words into longer segments. This is the right choice if you plan to translate the video or generate AI voiceovers next.
Subtitles — Groups the spoken words into shorter, subtitle-sized lines. Use this only if the end result should be subtitles for the source language and you do not need a clean transcript.

If you are running the full pipeline (Speech-To-Text → Translation → Text-To-Speech), keep this on Transcript. You can always generate subtitles from the transcript later with Copy Dub to Subs.

5. Review credits and click SUBMIT

Before you submit, the dialog shows three numbers so you know exactly what this action costs:

Credits left in your wallet: your current balance.
Required credits for this action: what Speech-To-Text will cost for a video of this length.
Credits remaining afterwards: what you will have left once the job runs.

If the numbers look right, click SUBMIT. Click CANCEL to close the dialog without running the transcription.
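The relationship between the three numbers is simple subtraction. The following is a minimal sketch of that arithmetic, not alugha's actual code: the function names and the credit values are hypothetical and chosen only for illustration. It also assumes that a balance of exactly zero afterwards is still acceptable, since the article only warns about a negative result.

```python
def credits_after(balance: int, required: int) -> int:
    """Credits remaining afterwards = credits left in wallet - required credits."""
    return balance - required


def can_submit(balance: int, required: int) -> bool:
    """Assumption: the job is affordable as long as the remainder is not negative."""
    return credits_after(balance, required) >= 0


if __name__ == "__main__":
    # Hypothetical values; the real cost is shown in the dialog before you submit.
    balance = 500
    required = 120
    print("Credits remaining afterwards:", credits_after(balance, required))
    print("Enough credits to submit:", can_submit(balance, required))
```

If the remainder would be negative, top up your credits before you click SUBMIT.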

[Image: alugha Speech-To-Text dialog with Pick your Language dropdown, Transcript toggle, and credit cost]

What happens next

Transcription runs in the background — you can close the tab and come back. While Speech-To-Text is running, you cannot edit or move segments on the selected track. Other languages on the project remain editable.

When the job finishes, it appears under Active jobs → Completed on the project page, labeled with the source-language code (for example DEU). Click Reload now to refresh the editor and see the transcript on your track.

[Image: alugha Active jobs panel showing Speech-To-Text completed for the DEU track, with a Reload now button]

Once the transcript is ready, continue with Step 2: Translate your video with Add A New Language.

Good to know

Speech-To-Text supports 100+ languages. Translation supports even more (200+) — do not confuse the two numbers.
Every AI action in the dubbr costs credits. The Speech-To-Text dialog always shows the exact cost before you submit.
The 3-step AI pipeline (Speech-To-Text → Translation → Text-To-Speech) must be run in order — you cannot translate without a transcript, and you cannot generate speech without a translation.
Processing runs in the background. You can leave the page and come back — the transcript is saved on the project.
Audio quality matters. Clean studio audio transcribes well; phone recordings with background noise are noticeably less accurate.
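The ordering rule for the pipeline can be pictured as a simple dependency check. The sketch below is illustrative only and does not reflect how alugha implements it: the step identifiers and function names are hypothetical, and it ignores the fact that translation and voiceover are done per target language.

```python
from typing import List, Optional, Set

# Hypothetical identifiers for the three pipeline steps, in the required order.
PIPELINE: List[str] = ["speech_to_text", "automated_translation", "text_to_speech"]


def can_run(step: str, completed: Set[str]) -> bool:
    """A step may run only when every earlier step has already completed."""
    position = PIPELINE.index(step)
    return all(earlier in completed for earlier in PIPELINE[:position])


def next_step(completed: Set[str]) -> Optional[str]:
    """Return the first step that has not run yet, or None when all are done."""
    for step in PIPELINE:
        if step not in completed:
            return step
    return None


if __name__ == "__main__":
    done: Set[str] = {"speech_to_text"}
    print(can_run("automated_translation", done))  # transcript exists
    print(can_run("text_to_speech", done))         # translation is still missing
    print(next_step(done))
```

In this model, Text-To-Speech stays blocked until both a transcript and a translation exist, which matches the order described above.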

Troubleshooting

The Speech To Text menu item is greyed out or missing:

Check that the video has finished uploading and processing. Freshly uploaded videos need a few minutes before Automation is available.
Make sure a language tab is selected. Speech-To-Text runs on the selected language track, not on the PROJECT tab.

My language is not in the dropdown:

alugha supports 100+ languages. If yours is missing, it is probably not yet available for automated transcription.
Pick the closest supported language and edit the transcript manually after processing.

Not enough credits:

The dialog shows Credits remaining afterwards — if it goes negative, top up your credits before submitting.
Shorter clips cost fewer credits — split long videos into chapters if you need to stagger spending.

The transcript looks inaccurate:

Double-check that you picked the correct source language. This is the #1 cause of poor transcripts.
Check the audio — background noise, overlapping speakers, strong accents, or low volume all reduce accuracy.
You can edit the transcript directly in the dubbr after Speech-To-Text finishes.
