Automating Serverless Audio Transcription with AWS

Apr 22, 2025

4 min read

Introduction

In modern cloud applications, automatically transcribing audio files is a powerful way to extract text from voice recordings. This article walks through an AWS Lambda function that uses Amazon Transcribe to convert audio files uploaded to an S3 bucket into text. We will explore how the function works step by step and the AWS services that make it possible.

AWS Services Used

This solution primarily relies on the following AWS services:

  • AWS Lambda: Executes the transcription workflow automatically when a new audio file is uploaded.

  • Amazon S3: Serves as storage for both the input audio files and the output transcriptions.

  • Amazon Transcribe: Converts speech in the audio files into text.

  • Amazon API Gateway: Exposes the finished transcriptions so they can be fetched dynamically as video subtitles.

The complete Lambda function is shown below; the sections that follow break it down step by step.

import os
import json
import boto3
import time

transcribe = boto3.client('transcribe')
s3 = boto3.client('s3')

def lambda_handler(event, context):
    print(event)
    output_bucket = 'funble-output'
    
    # Loop through the records in the event (there might be multiple records)
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        object_key = record['s3']['object']['key']
        print(f"Processing file: {object_key} from bucket: {bucket_name}")

        try:
            # Determine the language based on the folder structure
            if "vo-tr/" in object_key:
                language_code = 'tr-TR'
            elif "vo-en/" in object_key:
                language_code = 'en-US'
            else:
                raise Exception("File was not uploaded under vo-en/ or vo-tr/, so a language code cannot be selected.")

            print(f"Selected language code: {language_code}")
            
            transcription_name = os.path.splitext(os.path.basename(object_key))[0]
            timestamp = time.strftime("%Y%m%d%H%M%S")
            job_name = f"{transcription_name}_{timestamp}"
            job_uri = f"s3://{bucket_name}/{object_key}"
            print(f"Starting transcription job: {job_name}")

            # Start the transcription job with Amazon Transcribe
            transcribe.start_transcription_job(
                TranscriptionJobName=job_name,
                Media={'MediaFileUri': job_uri},
                MediaFormat='mp3',
                LanguageCode=language_code,
                OutputBucketName=output_bucket
            )
            
            # Poll for job completion, sleeping between checks so the
            # loop does not hammer the Transcribe API
            while True:
                status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
                if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
                    break
                time.sleep(5)
            
            # Check if transcription was successful
            if status['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':
                transcript_file_obj = s3.get_object(Bucket=output_bucket, Key=job_name + '.json')
                transcript_file_content = transcript_file_obj["Body"].read().decode('utf-8')
                data = json.loads(transcript_file_content)
                text = data['results']['transcripts'][0]['transcript']
                print("Transcription completed successfully.")
                
                # Save the transcript to S3
                dest_folder_path = os.path.dirname(object_key)
                dest_file_path = os.path.join(dest_folder_path, f"{transcription_name}.json")
                s3.put_object(Bucket=output_bucket, Key=dest_file_path, Body=transcript_file_content)
                s3.delete_object(Bucket=output_bucket, Key=job_name + '.json')
                
                return {
                    'statusCode': 200,
                    'body': json.dumps(text)
                }
            else:
                print("Transcription failed.")
                return {
                    'statusCode': 500,
                    'body': json.dumps("Transcription failed.")
                }
        except Exception as e:
            print(f"Error occurred: {str(e)}")
            return {
                'statusCode': 500,
                'body': json.dumps(str(e))
            }

Step-by-Step Breakdown of the Lambda Function

1. Handling S3 Event Triggers

The function is triggered when a new MP3 file is uploaded to a specific folder in an S3 bucket. The event data contains details about the uploaded file, including the bucket name and object key. The Lambda function extracts this information to determine the language of the file:

for record in event['Records']:
    bucket_name = record['s3']['bucket']['name']
    object_key = record['s3']['object']['key']
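For reference, the extraction above runs against a payload shaped like this trimmed s3:ObjectCreated event. The bucket and key names here are made up for illustration; they are not the article's real bucket names:

```python
# A trimmed S3 put-event payload; bucket and key names are illustrative.
event = {
    'Records': [{
        's3': {
            'bucket': {'name': 'funble-input'},
            'object': {'key': 'vo-en/episode01.mp3'},
        }
    }]
}

# Same extraction logic as in the Lambda handler.
for record in event['Records']:
    bucket_name = record['s3']['bucket']['name']
    object_key = record['s3']['object']['key']

print(bucket_name, object_key)
```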

Depending on whether the file is in vo-en/ (English) or vo-tr/ (Turkish), the function sets the appropriate language code:

if "vo-tr/" in object_key:
    language_code = 'tr-TR'
elif "vo-en/" in object_key:
    language_code = 'en-US'
else:
    raise Exception("The mp3 file was not placed in either vo-en/ or vo-tr/")
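One caveat worth noting here: S3 event notifications deliver object keys URL-encoded (a space in the file name arrives as `+`), so keys should be decoded before they are reused in later S3 calls. The folder check above still matches, but a `get_object` with the raw key would fail for such names. A small sketch using only the standard library:

```python
from urllib.parse import unquote_plus

def decoded_key(record: dict) -> str:
    """Return the S3 object key from an event record with URL encoding undone.

    S3 event notifications URL-encode keys, so "my voice note.mp3"
    arrives as "my+voice+note.mp3"; decoding first keeps later
    get_object/put_object calls valid.
    """
    return unquote_plus(record['s3']['object']['key'])

record = {'s3': {'object': {'key': 'vo-en/my+voice+note.mp3'}}}
print(decoded_key(record))  # vo-en/my voice note.mp3
```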

2. Preparing the Transcription Job

The function extracts the file name and creates a unique transcription job name:

transcription_name = os.path.splitext(os.path.basename(object_key))[0]
timestamp = time.strftime("%Y%m%d%H%M%S")
job_name = f"{transcription_name}_{timestamp}"
job_uri = f"s3://{bucket_name}/{object_key}"
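Worth noting: Amazon Transcribe only accepts job names made of letters, digits, periods, underscores, and hyphens (up to 200 characters), so a source file name containing spaces or other characters would make `start_transcription_job` reject the request. A defensive sketch (the helper name is ours, not part of the original function):

```python
import os
import re
import time

def make_job_name(object_key: str) -> str:
    """Build a Transcribe-safe, unique job name from an S3 object key.

    Transcribe job names must match [0-9a-zA-Z._-] and stay under 200
    characters, so any other character in the file name is replaced
    with an underscore before the timestamp is appended.
    """
    base = os.path.splitext(os.path.basename(object_key))[0]
    safe = re.sub(r'[^0-9a-zA-Z._-]', '_', base)
    return f"{safe}_{time.strftime('%Y%m%d%H%M%S')}"[:200]

print(make_job_name('vo-en/my voice note!.mp3'))
```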

It then starts an Amazon Transcribe job, specifying the media file location, format, and language:

transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': job_uri},
    MediaFormat='mp3',
    LanguageCode=language_code,
    OutputBucketName=output_bucket
)

3. Polling for Transcription Completion

The function polls the job status until it reaches a terminal state, sleeping briefly between checks:

while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    time.sleep(5)

If the job fails, an error message is logged. If successful, the transcription result is retrieved from the output S3 bucket.
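Because the Lambda is billed for every second it waits, in practice it also helps to back off between checks and give up after a deadline rather than loop until the function times out. A generic sketch, where the `get_status` callable and the timeout values are illustrative assumptions (in the function above, `get_status` would wrap `transcribe.get_transcription_job`):

```python
import time

def wait_for_job(get_status, timeout=600, initial_delay=5, max_delay=30):
    """Poll get_status() until it reports COMPLETED or FAILED.

    get_status is any zero-argument callable returning the job status
    string. Delays grow geometrically up to max_delay so the loop does
    not hammer the API, and a TimeoutError bounds the total wait.
    """
    delay, waited = initial_delay, 0.0
    while waited < timeout:
        status = get_status()
        if status in ('COMPLETED', 'FAILED'):
            return status
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)
    raise TimeoutError("transcription job did not finish in time")
```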

4. Storing and Cleaning Up Transcription Files

Once the transcription is complete, the function retrieves the JSON output and extracts the transcript text:

transcript_file_obj = s3.get_object(Bucket=output_bucket, Key=job_name + '.json')
transcript_file_content = transcript_file_obj["Body"].read().decode('utf-8')
data = json.loads(transcript_file_content)
text = data['results']['transcripts'][0]['transcript']

The transcript is then saved back to the output bucket in a structured format:

dest_folder_path = os.path.dirname(object_key)
dest_file_path = os.path.join(dest_folder_path, f"{transcription_name}.json")
s3.put_object(Bucket=output_bucket, Key=dest_file_path, Body=transcript_file_content)

To keep the storage clean, the temporary transcription file is deleted:

s3.delete_object(Bucket=output_bucket, Key=job_name + '.json')

5. Using API Gateway for Subtitle Integration

The transcribed text can now be accessed via API Gateway, enabling dynamic subtitle overlays on videos. This approach allows video players to fetch subtitles in real time, improving accessibility and user experience.
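The article does not show the subtitle endpoint itself, but as an illustration of this last step, here is a minimal sketch that turns Transcribe's word-level output (`results.items`, where each pronounced word carries `start_time` and `end_time`) into SRT text of the kind such an endpoint could return. The cue size is an arbitrary choice:

```python
def to_srt(transcribe_json: dict, words_per_cue: int = 8) -> str:
    """Convert Amazon Transcribe word-level JSON into SRT subtitle text.

    Assumes the standard Transcribe result shape: results.items is a
    list of entries of type 'pronunciation' (timed words) or
    'punctuation' (untimed, attached to the previous word).
    """
    def stamp(seconds: float) -> str:
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    words = []
    for item in transcribe_json['results']['items']:
        content = item['alternatives'][0]['content']
        if item['type'] == 'punctuation' and words:
            # Glue punctuation onto the preceding word.
            start, end, text = words[-1]
            words[-1] = (start, end, text + content)
        elif item['type'] == 'pronunciation':
            words.append((float(item['start_time']), float(item['end_time']), content))

    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        text = ' '.join(w[2] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{stamp(chunk[0][0])} --> {stamp(chunk[-1][1])}\n{text}\n")
    return '\n'.join(cues)
```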

Conclusion

This AWS Lambda-based solution automates the entire process of transcribing audio files using Amazon Transcribe. It ensures a structured workflow by leveraging S3 event triggers, processing files based on their language, and cleaning up unnecessary files. The output text can then be integrated into applications such as subtitle generation via API Gateway.

By using a fully serverless approach, this architecture enables scalable, cost-efficient, and automated transcription workflows without requiring manual intervention.

Check out our medium page: Clerion Medium




Start Your Cloud Journey Today

Contact us now and take the first step toward innovation and scalability with Clerion’s expert cloud solutions.
