Game Backend Observability and Monitoring on AWS

Introduction
In this post, we’ll walk through how to build a scalable, cost-effective game backend observability and log monitoring solution using AWS EKS, Amazon S3, Loki, Fluent Bit and Grafana. We’ll also cover an optional optimization: automatically moving older logs to lower-cost S3 storage tiers with lifecycle policies.
Why Game Backend Observability Matters
Observability is critical for maintaining the performance, reliability, and scalability of game backend services. Proper log monitoring allows studios to quickly detect issues, optimize server performance, and deliver a seamless gaming experience.
Architecture Overview

Here's a high-level overview of the observability setup we'll implement:
Game Backend Application Logs are generated on AWS EKS (Elastic Kubernetes Service).
Logs are exported and stored into an Amazon S3 bucket.
Loki is deployed on EKS and configured to receive logs from Fluent Bit, index them, and store the log chunks in S3.
Grafana is also deployed on EKS to visualize, search, and analyze the logs.
Amazon S3 Lifecycle Policies are configured to automatically transition older logs to cheaper storage classes like S3 Standard-IA (Infrequent Access) or S3 Glacier for cost optimization.
Note: In this architecture we deploy Grafana on an EKS cluster for a fully containerized solution, but it's also possible to run Grafana as a managed service with Amazon Managed Grafana or to deploy it on an EC2 instance.
Prerequisites for Loki
Helm 3 or above. Refer to Installing Helm. This should be installed on your local machine.
A running Kubernetes cluster on AWS. A simple way to get started is by using eksctl. Refer to Getting started with EKSctl.
Kubectl installed on your local machine. Refer to Install and Set Up kubectl.
(Optional) AWS CLI installed on your local machine. Refer to Installing the AWS CLI. This is required if you plan to use eksctl to create the EKS cluster and modify the IAM roles and policies locally.
EKS Minimum Requirements for Loki
The minimum requirements for deploying Loki on EKS are:
Kubernetes version 1.30 or above.
3 nodes for the EKS cluster.
The following plugins must also be installed within the EKS cluster:
Amazon EBS CSI Driver: Enables Kubernetes to dynamically provision and manage EBS volumes as persistent storage for applications. We use this to provision the node volumes for Loki.
CoreDNS: Provides internal DNS service for Kubernetes clusters, ensuring that services and pods can communicate with each other using DNS names.
kube-proxy: Maintains network rules on nodes, enabling communication between pods and services within the cluster.
You must also associate an OIDC (OpenID Connect) provider with the EKS cluster. This is required for the IAM roles and policies to work correctly. If you are using eksctl, you can associate the OIDC provider using the following command:
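Assuming your cluster was created with eksctl, the association can be done in one command (the cluster name and region below are placeholders):

```shell
# Associate an IAM OIDC provider with the existing EKS cluster.
eksctl utils associate-iam-oidc-provider \
  --cluster <your-cluster-name> \
  --region <region> \
  --approve
```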
The output should confirm that an IAM OIDC provider has been associated with the cluster.
Step 1: Exporting Backend Application Logs to a Log Agent
First, configure your EKS workloads to export backend application logs to a log agent.
You can achieve this using tools like:
Fluent Bit or Fluentd
Promtail (Grafana Ecosystem)
Logstash (Elastic Ecosystem)
OpenTelemetry Collector
etc.
In this guide, we use Fluent Bit as the log agent. After deploying Fluent Bit to your EKS cluster, you need to define the OUTPUT configuration in its ConfigMap. You can find a template example below.
Example Fluent Bit output configuration:
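A minimal sketch using Fluent Bit's built-in loki output plugin, assuming Loki is reachable through the in-cluster loki-gateway service and protected with the basic-auth credentials created later in this guide (the match pattern, labels, and password are illustrative placeholders):

```ini
[OUTPUT]
    Name                   loki
    Match                  kube.*
    # In-cluster address of the Loki gateway service.
    Host                   loki-gateway.loki.svc.cluster.local
    Port                   80
    # Basic-auth credentials configured on the Loki gateway.
    Http_User              loki
    Http_Passwd            <your-password>
    # Tenant ID sent as the X-Scope-OrgID header.
    Tenant_ID              1
    Labels                 job=fluent-bit, app=game-backend
    # Attach Kubernetes metadata (namespace, pod, container) as labels.
    Auto_Kubernetes_Labels On
```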
For all configuration parameters and details, you can visit the Fluent Bit docs.
Step 2: Defining IAM roles and policies
The recommended method for connecting Loki to AWS S3 is to use an IAM role. This method is more secure than storing access keys and secret keys directly in the Loki configuration. The role and policy can be created using the AWS CLI or the AWS Management Console. The steps below show how to create the role and policy using the AWS CLI.
Create a new directory and navigate to it. Make sure to create the files in this directory. All commands in this guide assume you are in this directory.
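If you haven't already created the two S3 buckets (one for chunks, one for the ruler) that this guide refers to, you can create them with the AWS CLI. The bucket names and region below are placeholders:

```shell
# Create the chunk and ruler buckets.
# Note: for us-east-1, omit the --create-bucket-configuration flag.
aws s3api create-bucket --bucket <your-chunk-bucket> --region <region> \
  --create-bucket-configuration LocationConstraint=<region>
aws s3api create-bucket --bucket <your-ruler-bucket> --region <region> \
  --create-bucket-configuration LocationConstraint=<region>
```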
Create a loki-s3-policy.json file with the following content. Make sure to replace the placeholders with the names of the buckets you created earlier.
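A sketch of the policy document, granting Loki the S3 permissions it needs on both buckets (bucket names are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LokiStorage",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<your-chunk-bucket>",
        "arn:aws:s3:::<your-chunk-bucket>/*",
        "arn:aws:s3:::<your-ruler-bucket>",
        "arn:aws:s3:::<your-ruler-bucket>/*"
      ]
    }
  ]
}
```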
Create the IAM policy using the AWS CLI:
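For example (the policy name LokiS3AccessPolicy is a placeholder you can change):

```shell
# Create the IAM policy from the JSON document above.
aws iam create-policy \
  --policy-name LokiS3AccessPolicy \
  --policy-document file://loki-s3-policy.json
```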
Create a trust policy document named trust-policy.json with the following content. Make sure to replace the placeholders with your AWS account ID, region, and the OIDC ID (you can find this in the EKS cluster configuration).
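A sketch of the trust policy, which allows only the loki service account in the loki namespace to assume the role via the cluster's OIDC provider (account ID, region, and OIDC ID are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc-id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<region>.amazonaws.com/id/<oidc-id>:sub": "system:serviceaccount:loki:loki"
        }
      }
    }
  ]
}
```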
You can find your OIDC ID in the AWS Management Console, on the Overview tab of your EKS cluster.
Create the IAM role using the AWS CLI:
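For example (the role name LokiServiceAccountRole is a placeholder used throughout this guide):

```shell
# Create the IAM role with the trust policy defined above.
aws iam create-role \
  --role-name LokiServiceAccountRole \
  --assume-role-policy-document file://trust-policy.json
```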
Attach the policy to the role:
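Assuming the role and policy names used above:

```shell
# Attach the S3 access policy to the Loki service account role.
aws iam attach-role-policy \
  --role-name LokiServiceAccountRole \
  --policy-arn arn:aws:iam::<account-id>:policy/LokiS3AccessPolicy
```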
Make sure to replace the placeholder with your AWS account ID.
You can verify the resulting IAM role and policy configuration in the IAM section of the AWS Management Console.
Step 4: Deploying Loki
Before we can deploy the Loki Helm chart, we need to add the Grafana chart repository to Helm. This repository contains the Loki Helm chart.
Add the Grafana chart repository to Helm:
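```shell
helm repo add grafana https://grafana.github.io/helm-charts
```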
Update the chart repository:
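```shell
helm repo update
```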
Create a new namespace for Loki:
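```shell
kubectl create namespace loki
```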
Loki Basic Authentication
Loki by default does not come with any authentication. Since we will be deploying Loki to AWS and exposing the gateway to the internet, we recommend adding at least basic authentication. In this guide we will give Loki a username and password:
To start, we will need to create a .htpasswd file with the username and password. You can use the htpasswd command to create the file. If you don't have the htpasswd command installed, you can install it using brew, apt-get, or yum depending on your OS. This will create a .htpasswd file with the username loki, and you will be prompted to enter a password.
Next, create a Kubernetes secret from the .htpasswd file. This will create a secret called loki-basic-auth in the loki namespace. We will reference this secret in the Loki Helm chart configuration.
Finally, create a canary-basic-auth secret for the canary. This is a literal secret with the username and password that the Loki canary uses to authenticate with the Loki gateway. Make sure to replace the placeholders with your desired username and password.
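A sketch of the three commands for the steps above (the username loki and the placeholder password are examples to replace):

```shell
# Create the .htpasswd file; you will be prompted for a password.
htpasswd -c .htpasswd loki

# Create the basic-auth secret from the file, in the loki namespace.
kubectl create secret generic loki-basic-auth \
  --from-file=.htpasswd \
  --namespace loki

# Create the literal secret used by the Loki canary.
kubectl create secret generic canary-basic-auth \
  --from-literal=username=<username> \
  --from-literal=password=<password> \
  --namespace loki
```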
Loki Helm chart configuration
Create a values.yaml file, choosing the configuration options that best suit your requirements. Below is an example values.yaml file for the Loki Helm chart in microservices mode.
Make sure to replace the placeholders with your actual values.
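A condensed sketch of such a values.yaml for microservices (Distributed) mode. The bucket names, region, account ID, and role name are placeholders matching the resources created earlier in this guide; a production deployment would tune replica counts and limits further:

```yaml
loki:
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  # Where Loki stores chunks: the S3 bucket region and name.
  storage_config:
    aws:
      region: <region>
      bucketnames: <your-chunk-bucket>
      s3forcepathstyle: false
  # Where Loki stores alert and recording rules.
  ruler:
    enable_api: true
    storage:
      type: s3
      s3:
        region: <region>
        bucketnames: <your-ruler-bucket>
        s3forcepathstyle: false

deploymentMode: Distributed

# Where the Helm chart stores data; must match the buckets created earlier.
storage:
  type: s3
  bucketNames:
    chunks: <your-chunk-bucket>
    ruler: <your-ruler-bucket>
  s3:
    region: <region>

# Link the IAM role created in Step 2 to the Loki service account.
serviceAccount:
  create: true
  annotations:
    "eks.amazonaws.com/role-arn": "arn:aws:iam::<account-id>:role/LokiServiceAccountRole"

# Expose the gateway via a LoadBalancer and protect it with basic auth.
gateway:
  service:
    type: LoadBalancer
  basicAuth:
    enabled: true
    existingSecret: loki-basic-auth

# Let the Loki canary authenticate through the gateway.
lokiCanary:
  extraArgs:
    - -pass=$(LOKI_PASS)
    - -user=$(LOKI_USER)
  extraEnv:
    - name: LOKI_PASS
      valueFrom:
        secretKeyRef:
          name: canary-basic-auth
          key: password
    - name: LOKI_USER
      valueFrom:
        secretKeyRef:
          name: canary-basic-auth
          key: username
```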
It is critical to define a valid values.yaml file for the Loki deployment. To reduce the risk of misconfiguration, let's break down the configuration options to keep in mind when deploying to AWS:
Loki Config vs. Values Config:
The values.yaml file contains a section called loki, which is a direct representation of the Loki configuration file. This section defines the Loki configuration, including the schema, storage, and querier configuration.
The key configuration to focus on for chunks is the storage_config section, where you define the S3 bucket region and name. This tells Loki where to store the chunks.
The ruler section defines the configuration for the ruler, including the S3 bucket region and name. This tells Loki where to store the alert and recording rules.
For the full Loki configuration, refer to the Loki Configuration documentation.
Storage:
Defines where the Helm chart stores data.
Set the type to s3 since we are using Amazon S3. Configure the bucket names for the chunks and ruler to match the buckets created earlier. The s3 section specifies the region of the bucket.
Service Account:
The serviceAccount section is used to define the IAM role for the Loki service account. This is where the IAM role created earlier is linked.
Gateway:
Defines how the Loki gateway will be exposed.
We are using a LoadBalancer service type in this configuration.
Now that you have created the values.yaml file, you can deploy Loki using the Helm chart.
Deploy using the newly created values.yaml file. It is important to deploy into a namespace called loki, as our trust policy only allows the IAM role to be used by the loki service account in the loki namespace. This is configurable, but make sure to update your service account accordingly.
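Assuming the namespace created earlier and the release name loki:

```shell
# Install the Loki Helm chart with the custom values file.
helm install loki grafana/loki \
  --namespace loki \
  --values values.yaml
```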
Verify the deployment:
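```shell
kubectl get pods -n loki
```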
You should see the pods are running.
Find the Loki Gateway Service
The Loki Gateway service is a LoadBalancer service that exposes the Loki gateway to the internet. This is where you will write logs to and query logs from. By default NGINX is used as the gateway.
To find the Loki Gateway service, run the following command:
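Assuming the release name loki, the gateway service is named loki-gateway:

```shell
kubectl get svc -n loki loki-gateway
```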
You should see the Loki Gateway service with an external IP address. This is the address you will use to write to and query Loki.
💡 Tip:
If Grafana is running inside the same Kubernetes cluster as Loki, you can configure the data source using the following URL:
http://loki-gateway.loki.svc.cluster.local/
Step 5: Deploying Grafana and Adding Data Source
If you prefer to deploy Grafana inside your EKS cluster using Helm Charts, here is a quick guide:
First, add the Grafana Helm repository and update:
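The same Grafana chart repository added for Loki also hosts the Grafana chart; repeating the add is harmless:

```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```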
Then install Grafana:
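A minimal install exposed via a LoadBalancer, in its own namespace (the release and namespace names are placeholders):

```shell
helm install grafana grafana/grafana \
  --namespace grafana \
  --create-namespace \
  --set service.type=LoadBalancer
```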
After deployment, you can access Grafana using the LoadBalancer endpoint and start configuring it by adding Loki as a data source.
After deployment:
Access Grafana UI.
Configure Loki as a data source with Basic Authentication, using the credentials provided above.
Start querying and visualizing your backend application logs! ✌🏻
⚠️ Important Note:
Loki defaults to running in multi-tenant mode. Multi-tenant mode is set in the configuration with auth_enabled: true. When configured with auth_enabled: false, Loki uses a single tenant, the X-Scope-OrgID header is not required in Loki API requests, and the single tenant ID will be the string fake.

So, we need to add the X-Scope-OrgID header under the "Custom HTTP Headers" section of the Loki data source in Grafana, or add it to the Grafana values.yaml file:
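A sketch of how this could look in the Grafana Helm chart's values.yaml using data source provisioning, assuming a tenant ID of 1, the in-cluster gateway URL, and the basic-auth credentials from earlier (password is a placeholder):

```yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki-gateway.loki.svc.cluster.local/
        basicAuth: true
        basicAuthUser: loki
        jsonData:
          # Custom HTTP header carrying the tenant ID.
          httpHeaderName1: "X-Scope-OrgID"
        secureJsonData:
          basicAuthPassword: <your-password>
          httpHeaderValue1: "1"
```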
Step 6 (Bonus): Optimize Log Storage with S3 Lifecycle Policies
As logs accumulate, storage costs can grow. AWS S3 Lifecycle Policies help optimize storage costs automatically.
For example:
Move logs older than 30 days to S3 Standard-IA.
Move logs older than 90 days to S3 Glacier.
Delete logs after 365 days if needed.
Example JSON policy:
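A sketch of a lifecycle configuration implementing the three rules above (the rule ID is a placeholder):

```json
{
  "Rules": [
    {
      "ID": "tier-and-expire-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

You could apply it with aws s3api put-bucket-lifecycle-configuration --bucket <your-chunk-bucket> --lifecycle-configuration file://lifecycle.json. Keep in mind that chunks transitioned to Glacier are not directly readable, so Loki cannot query those older logs until the objects are restored.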
You can set this up easily through the AWS Management Console or via IaC tools like Terraform.
Conclusion
By combining EKS, S3, Loki, Fluent Bit and Grafana, you can build a powerful and scalable game backend observability and log monitoring solution — without worrying about running your own heavy log management infrastructure. Additionally, with smart lifecycle policies, you ensure long-term sustainability and cost efficiency.
This approach lets game studios focus on building reliable, high-performance games while having deep insights into their backend systems.
References:
Check out our medium page: Clerion Medium