Introduction
In modern cloud applications, automating the transcription of audio files is a powerful way to extract text from voice recordings. This article walks through an AWS Lambda function that uses Amazon Transcribe to convert audio files uploaded to an S3 bucket into text. We will explore how this Lambda function works step by step and the AWS services that make it possible.
AWS Services Used
This solution primarily relies on the following AWS services:
AWS Lambda: Executes the transcription workflow automatically when a new audio file is uploaded.
Amazon S3: Serves as storage for both the input audio files and the output transcriptions.
Amazon Transcribe: Converts speech in the audio files into text.
Amazon API Gateway: Serves the resulting transcriptions as dynamic subtitles for videos.
The complete Lambda function is shown below; the rest of the article breaks it down step by step.

import os
import json
import time

import boto3

transcribe = boto3.client('transcribe')
s3 = boto3.client('s3')

def lambda_handler(event, context):
    print(event)
    output_bucket = 'funble-output'

    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        object_key = record['s3']['object']['key']
        print(f"Processing file: {object_key} from bucket: {bucket_name}")

        try:
            # Pick the language from the folder the file was uploaded to.
            if "vo-tr/" in object_key:
                language_code = 'tr-TR'
            elif "vo-en/" in object_key:
                language_code = 'en-US'
            else:
                raise Exception("File is not uploaded in the correct folder or is in an unsupported format.")
            print(f"Selected language code: {language_code}")

            # Build a unique job name from the file name and a timestamp.
            transcription_name = os.path.splitext(os.path.basename(object_key))[0]
            timestamp = time.strftime("%Y%m%d%H%M%S")
            job_name = f"{transcription_name}_{timestamp}"
            job_uri = f"s3://{bucket_name}/{object_key}"

            print(f"Starting transcription job: {job_name}")
            transcribe.start_transcription_job(
                TranscriptionJobName=job_name,
                Media={'MediaFileUri': job_uri},
                MediaFormat='mp3',
                LanguageCode=language_code,
                OutputBucketName=output_bucket
            )

            # Poll until the job reaches a terminal state.
            while True:
                status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
                if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
                    break
                time.sleep(5)  # pause between checks to avoid hammering the API

            if status['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':
                # Read the raw transcript JSON that Transcribe wrote to the output bucket.
                transcript_file_obj = s3.get_object(Bucket=output_bucket, Key=job_name + '.json')
                transcript_file_content = transcript_file_obj["Body"].read().decode('utf-8')
                data = json.loads(transcript_file_content)
                text = data['results']['transcripts'][0]['transcript']
                print("Transcription completed successfully.")

                # Store the transcript under the same folder structure as the input file,
                # then remove the temporary job output.
                dest_folder_path = os.path.dirname(object_key)
                dest_file_path = os.path.join(dest_folder_path, f"{transcription_name}.json")
                s3.put_object(Bucket=output_bucket, Key=dest_file_path, Body=transcript_file_content)
                s3.delete_object(Bucket=output_bucket, Key=job_name + '.json')

                return {
                    'statusCode': 200,
                    'body': json.dumps(text)
                }
            else:
                print("Transcription failed.")
                return {
                    'statusCode': 500,
                    'body': json.dumps("Transcription failed.")
                }
        except Exception as e:
            print(f"Error occurred: {str(e)}")
            return {
                'statusCode': 500,
                'body': json.dumps(str(e))
            }
Step-by-Step Breakdown of the Lambda Function
1. Handling S3 Event Triggers
The function is triggered when a new MP3 file is uploaded to a specific folder in an S3 bucket. The event data contains details about the uploaded file, including the bucket name and object key. The Lambda function extracts this information to determine the language of the file:
for record in event['Records']:
    bucket_name = record['s3']['bucket']['name']
    object_key = record['s3']['object']['key']
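For reference, a minimal S3 put event for this handler could look roughly like the following; the bucket and file names here are illustrative, not taken from the original setup. Note that S3 URL-encodes object keys in event notifications, so keys containing spaces or special characters would need to be decoded (for example with urllib.parse.unquote_plus) before use.

# Hypothetical event payload for a file uploaded to the vo-en/ folder
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "funble-input"},      # illustrative input bucket name
                "object": {"key": "vo-en/welcome.mp3"}   # illustrative object key
            }
        }
    ]
}
# lambda_handler(sample_event, None)  # useful for a quick local test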
Depending on whether the file is in vo-en/ (English) or vo-tr/ (Turkish), the function sets the appropriate language code:

if "vo-tr/" in object_key:
    language_code = 'tr-TR'
elif "vo-en/" in object_key:
    language_code = 'en-US'
else:
    raise Exception("File is not uploaded in the correct folder or is in an unsupported format.")
2. Preparing the Transcription Job
The function extracts the file name and creates a unique transcription job name:
transcription_name = os.path.splitext(os.path.basename(object_key))[0]
timestamp = time.strftime("%Y%m%d%H%M%S")
job_name = f"{transcription_name}_{timestamp}"
job_uri = f"s3://{bucket_name}/{object_key}"
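To make the naming concrete, here is roughly what these values would look like for a hypothetical file vo-en/welcome.mp3 uploaded to a bucket called funble-input (both names are illustrative):

# Illustrative values only
transcription_name = "welcome"                      # file name without extension
job_name = "welcome_20240101120000"                 # file name + upload timestamp
job_uri = "s3://funble-input/vo-en/welcome.mp3"     # S3 URI passed to Transcribe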
It then starts an Amazon Transcribe job, specifying the media file location, format, and language:
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': job_uri},
    MediaFormat='mp3',
    LanguageCode=language_code,
    OutputBucketName=output_bucket
)
3. Polling for Transcription Completion
The function then polls Amazon Transcribe until the job reaches a terminal state, pausing briefly between checks:

while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    time.sleep(5)  # pause between checks
If the job fails, an error message is logged. If successful, the transcription result is retrieved from the output S3 bucket.
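Polling inside the handler is simple, but it ties up the Lambda for the full duration of the job. A variant that also caps the total wait might look like the sketch below; the 5-second interval and 14-minute ceiling are illustrative choices, not part of the original function.

def wait_for_job(transcribe, job_name, poll_seconds=5, timeout_seconds=840):
    # Poll get_transcription_job until it reaches a terminal state or we give up.
    waited = 0
    while waited < timeout_seconds:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        state = status['TranscriptionJob']['TranscriptionJobStatus']
        if state in ('COMPLETED', 'FAILED'):
            return state
        time.sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"Transcription job {job_name} did not finish in time")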
4. Storing and Cleaning Up Transcription Files
Once the transcription is complete, the function retrieves the JSON output and extracts the transcript text:
transcript_file_obj = s3.get_object(Bucket=output_bucket, Key=job_name + '.json')
transcript_file_content = transcript_file_obj["Body"].read().decode('utf-8')
data = json.loads(transcript_file_content)
text = data['results']['transcripts'][0]['transcript']
The transcript is then saved back to the output bucket in a structured format:
dest_folder_path = os.path.dirname(object_key)
dest_file_path = os.path.join(dest_folder_path, f"{transcription_name}.json")
s3.put_object(Bucket=output_bucket, Key=dest_file_path, Body=transcript_file_content)
To keep the storage clean, the temporary transcription file is deleted:
s3.delete_object(Bucket=output_bucket, Key=job_name + '.json')
5. Using API Gateway for Subtitle Integration
The transcribed text can now be accessed via API Gateway, enabling dynamic subtitle overlays on videos. This approach allows video players to fetch subtitles in real time, improving accessibility and user experience.
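The article does not show the API side, but a minimal sketch of a handler that API Gateway could invoke (via a Lambda proxy integration) to return a stored transcript might look like this; the query-string parameter name and request shape are assumptions for illustration, and the bucket is the same output bucket used above.

import json
import boto3

s3 = boto3.client('s3')
OUTPUT_BUCKET = 'funble-output'  # same output bucket the transcription function writes to

def lambda_handler(event, context):
    # Hypothetical request shape: GET /transcripts?key=vo-en/welcome.json
    key = (event.get('queryStringParameters') or {}).get('key')
    if not key:
        return {'statusCode': 400, 'body': json.dumps('Missing "key" query parameter')}
    try:
        obj = s3.get_object(Bucket=OUTPUT_BUCKET, Key=key)
        data = json.loads(obj['Body'].read().decode('utf-8'))
        text = data['results']['transcripts'][0]['transcript']
        return {
            'statusCode': 200,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({'transcript': text})
        }
    except s3.exceptions.NoSuchKey:
        return {'statusCode': 404, 'body': json.dumps('Transcript not found')}

A video player could call such an endpoint when a video loads and render the returned text as captions.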
Conclusion
This AWS Lambda-based solution automates the entire process of transcribing audio files using Amazon Transcribe. It ensures a structured workflow by leveraging S3 event triggers, processing files based on their language, and cleaning up unnecessary files. The output text can then be integrated into applications such as subtitle generation via API Gateway.
By using a fully serverless approach, this architecture enables scalable, cost-efficient, and automated transcription workflows without requiring manual intervention.
Check out our Medium page: Clerion Medium