Introduction
In modern cloud applications, automating the transcription of audio files is a powerful way to extract text from voice recordings. This article walks through an AWS Lambda function that uses Amazon Transcribe to convert audio files uploaded to an S3 bucket into text. We will explore how this Lambda function works step by step and the AWS services that make it possible.
AWS Services Used
This solution primarily relies on the following AWS services:
AWS Lambda: Executes the transcription workflow automatically when a new audio file is uploaded.
Amazon S3: Serves as storage for both the input audio files and the output transcriptions.
Amazon Transcribe: Converts speech in the audio files into text.
Amazon API Gateway: Serves the resulting transcriptions as dynamic subtitles for videos.
The complete Lambda function is shown below; the rest of the article breaks it down step by step.

import os
import json
import time

import boto3

transcribe = boto3.client('transcribe')
s3 = boto3.client('s3')

def lambda_handler(event, context):
    print(event)
    output_bucket = 'funble-output'

    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        object_key = record['s3']['object']['key']
        print(f"Processing file: {object_key} from bucket: {bucket_name}")

        try:
            # Pick the language from the folder the file was uploaded to.
            if "vo-tr/" in object_key:
                language_code = 'tr-TR'
            elif "vo-en/" in object_key:
                language_code = 'en-US'
            else:
                raise Exception("File is not uploaded in the correct folder or is in an unsupported format.")
            print(f"Selected language code: {language_code}")

            # Build a unique job name from the file name and a timestamp.
            transcription_name = os.path.splitext(os.path.basename(object_key))[0]
            timestamp = time.strftime("%Y%m%d%H%M%S")
            job_name = f"{transcription_name}_{timestamp}"
            job_uri = f"s3://{bucket_name}/{object_key}"

            print(f"Starting transcription job: {job_name}")
            transcribe.start_transcription_job(
                TranscriptionJobName=job_name,
                Media={'MediaFileUri': job_uri},
                MediaFormat='mp3',
                LanguageCode=language_code,
                OutputBucketName=output_bucket
            )

            # Poll until the job reaches a terminal state.
            while True:
                status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
                if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
                    break
                time.sleep(5)  # pause between checks to avoid hammering the API

            if status['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':
                # Read the raw transcript JSON that Transcribe wrote to the output bucket.
                transcript_file_obj = s3.get_object(Bucket=output_bucket, Key=job_name + '.json')
                transcript_file_content = transcript_file_obj["Body"].read().decode('utf-8')
                data = json.loads(transcript_file_content)
                text = data['results']['transcripts'][0]['transcript']
                print("Transcription completed successfully.")

                # Store the transcript under the same folder structure as the input file,
                # then remove the temporary job output.
                dest_folder_path = os.path.dirname(object_key)
                dest_file_path = os.path.join(dest_folder_path, f"{transcription_name}.json")
                s3.put_object(Bucket=output_bucket, Key=dest_file_path, Body=transcript_file_content)
                s3.delete_object(Bucket=output_bucket, Key=job_name + '.json')

                return {
                    'statusCode': 200,
                    'body': json.dumps(text)
                }
            else:
                print("Transcription failed.")
                return {
                    'statusCode': 500,
                    'body': json.dumps("Transcription failed.")
                }
        except Exception as e:
            print(f"Error occurred: {str(e)}")
            return {
                'statusCode': 500,
                'body': json.dumps(str(e))
            }
Step-by-Step Breakdown of the Lambda Function
1. Handling S3 Event Triggers
The function is triggered when a new MP3 file is uploaded to a specific folder in an S3 bucket. The event data contains details about the uploaded file, including the bucket name and object key. The Lambda function extracts this information to determine the language of the file:
for record in event['Records']:
    bucket_name = record['s3']['bucket']['name']
    object_key = record['s3']['object']['key']
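For reference, a minimal S3 put event for this handler could look roughly like the following; the bucket and file names here are illustrative, not taken from the original setup. Note that S3 URL-encodes object keys in event notifications, so keys containing spaces or special characters would need to be decoded (for example with urllib.parse.unquote_plus) before use.

# Hypothetical event payload for a file uploaded to the vo-en/ folder
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "funble-input"},      # illustrative input bucket name
                "object": {"key": "vo-en/welcome.mp3"}   # illustrative object key
            }
        }
    ]
}
# lambda_handler(sample_event, None)  # useful for a quick local test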
Depending on whether the file is in vo-en/ (English) or vo-tr/ (Turkish), the function sets the appropriate language code:

if "vo-tr/" in object_key:
    language_code = 'tr-TR'
elif "vo-en/" in object_key:
    language_code = 'en-US'
else:
    raise Exception("File is not uploaded in the correct folder or is in an unsupported format.")
2. Preparing the Transcription Job
The function extracts the file name and creates a unique transcription job name:
transcription_name = os.path.splitext(os.path.basename(object_key))[0]
timestamp = time.strftime("%Y%m%d%H%M%S")
job_name = f"{transcription_name}_{timestamp}"
job_uri = f"s3://{bucket_name}/{object_key}"
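To make the naming concrete, here is roughly what these values would look like for a hypothetical file vo-en/welcome.mp3 uploaded to a bucket called funble-input (both names are illustrative):

# Illustrative values only
transcription_name = "welcome"                      # file name without extension
job_name = "welcome_20240101120000"                 # file name + upload timestamp
job_uri = "s3://funble-input/vo-en/welcome.mp3"     # S3 URI passed to Transcribe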
It then starts an Amazon Transcribe job, specifying the media file location, format, and language:
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': job_uri},
    MediaFormat='mp3',
    LanguageCode=language_code,
    OutputBucketName=output_bucket
)
3. Polling for Transcription Completion
The function then polls Amazon Transcribe until the job reaches a terminal state, pausing briefly between checks:

while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    time.sleep(5)  # pause between checks
If the job fails, an error message is logged. If successful, the transcription result is retrieved from the output S3 bucket.
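Polling inside the handler is simple, but it ties up the Lambda for the full duration of the job. A variant that also caps the total wait might look like the sketch below; the 5-second interval and 14-minute ceiling are illustrative choices, not part of the original function.

def wait_for_job(transcribe, job_name, poll_seconds=5, timeout_seconds=840):
    # Poll get_transcription_job until it reaches a terminal state or we give up.
    waited = 0
    while waited < timeout_seconds:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        state = status['TranscriptionJob']['TranscriptionJobStatus']
        if state in ('COMPLETED', 'FAILED'):
            return state
        time.sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"Transcription job {job_name} did not finish in time")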
4. Storing and Cleaning Up Transcription Files
Once the transcription is complete, the function retrieves the JSON output and extracts the transcript text:
transcript_file_obj = s3.get_object(Bucket=output_bucket, Key=job_name + '.json')
transcript_file_content = transcript_file_obj["Body"].read().decode('utf-8')
data = json.loads(transcript_file_content)
text = data['results']['transcripts'][0]['transcript']
The transcript is then saved back to the output bucket in a structured format:
dest_folder_path = os.path.dirname(object_key)
dest_file_path = os.path.join(dest_folder_path, f"{transcription_name}.json")
s3.put_object(Bucket=output_bucket, Key=dest_file_path, Body=transcript_file_content)
To keep the storage clean, the temporary transcription file is deleted:
s3.delete_object(Bucket=output_bucket, Key=job_name + '.json')
5. Using API Gateway for Subtitle Integration
The transcribed text can now be accessed via API Gateway, enabling dynamic subtitle overlays on videos. This approach allows video players to fetch subtitles in real time, improving accessibility and user experience.
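The article does not show the API side, but a minimal sketch of a handler that API Gateway could invoke (via a Lambda proxy integration) to return a stored transcript might look like this; the query-string parameter name and request shape are assumptions for illustration, and the bucket is the same output bucket used above.

import json
import boto3

s3 = boto3.client('s3')
OUTPUT_BUCKET = 'funble-output'  # same output bucket the transcription function writes to

def lambda_handler(event, context):
    # Hypothetical request shape: GET /transcripts?key=vo-en/welcome.json
    key = (event.get('queryStringParameters') or {}).get('key')
    if not key:
        return {'statusCode': 400, 'body': json.dumps('Missing "key" query parameter')}
    try:
        obj = s3.get_object(Bucket=OUTPUT_BUCKET, Key=key)
        data = json.loads(obj['Body'].read().decode('utf-8'))
        text = data['results']['transcripts'][0]['transcript']
        return {
            'statusCode': 200,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({'transcript': text})
        }
    except s3.exceptions.NoSuchKey:
        return {'statusCode': 404, 'body': json.dumps('Transcript not found')}

A video player could call such an endpoint when a video loads and render the returned text as captions.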
Conclusion
This AWS Lambda-based solution automates the entire process of transcribing audio files using Amazon Transcribe. It ensures a structured workflow by leveraging S3 event triggers, processing files based on their language, and cleaning up unnecessary files. The output text can then be integrated into applications such as subtitle generation via API Gateway.
By using a fully serverless approach, this architecture enables scalable, cost-efficient, and automated transcription workflows without requiring manual intervention.
Check out our Medium page: Clerion Medium