Game Backend Observability and Monitoring on AWS

Apr 26, 2025

18 min read

Introduction

In this post, we’ll walk through how to build a scalable, cost-effective game backend observability and log monitoring solution using Amazon EKS, Amazon S3, Loki, Fluent Bit, and Grafana. We’ll also cover an optional optimization: automatically moving older logs to lower-cost S3 storage tiers with lifecycle policies.

Why Game Backend Observability Matters

Observability is critical for maintaining the performance, reliability, and scalability of game backend services. Proper log monitoring allows studios to quickly detect issues, optimize server performance, and deliver a seamless gaming experience.

Architecture Overview

Here's a high-level overview of the observability setup we'll implement:

  1. Game backend application logs are generated by workloads running on Amazon EKS (Elastic Kubernetes Service).

  2. A log agent (Fluent Bit) collects these logs and forwards them to Loki.

  3. Loki is deployed on EKS, indexes the logs it receives from Fluent Bit, and stores them as chunks in an Amazon S3 bucket.

  4. Grafana is also deployed on EKS to visualize, search, and analyze the logs.

  5. Amazon S3 Lifecycle Policies are configured to automatically transition older logs to cheaper storage classes like S3 Standard-IA (Infrequent Access) or S3 Glacier for cost optimization.

Note: In this architecture we deploy Grafana on the EKS cluster for a fully containerized solution, but it's also possible to use Amazon Managed Grafana or to run Grafana on an EC2 instance.

Prerequisites for Loki

  • Helm 3 or above. Refer to Installing Helm. This should be installed on your local machine.

  • A running Kubernetes cluster on AWS. A simple way to get started is by using eksctl. Refer to Getting started with eksctl.

  • Kubectl installed on your local machine. Refer to Install and Set Up kubectl.

  • (Optional) AWS CLI installed on your local machine. Refer to Installing the AWS CLI. This is required if you plan to use eksctl to create the EKS cluster, or to create the IAM roles and policies from the command line as shown in Step 2.

EKS Minimum Requirements for Loki

The minimum requirements for deploying Loki on EKS are:

  • Kubernetes version 1.30 or above.

  • 3 nodes for the EKS cluster.

The following add-ons must also be installed within the EKS cluster:
  • Amazon EBS CSI Driver: Enables Kubernetes to dynamically provision and manage EBS volumes as persistent storage for applications. We use this to provision persistent volumes for Loki.

  • CoreDNS: Provides internal DNS service for Kubernetes clusters, ensuring that services and pods can communicate with each other using DNS names.

  • kube-proxy: Maintains network rules on nodes, enabling communication between pods and services within the cluster.

You must also associate an IAM OIDC (OpenID Connect) provider with the EKS cluster. This is required for the IAM roles and policies to work correctly. If you are using eksctl, you can do this with the following command:

eksctl utils associate-iam-oidc-provider --cluster [cluster-name] --approve

Step 1: Exporting Backend Application Logs to a Log Agent

First, deploy a log agent that collects the backend application logs from your EKS workloads and forwards them to Loki.

You can achieve this using tools such as:

  • Fluent Bit or Fluentd

  • Promtail (Grafana Ecosystem)

  • Logstash (Elastic Ecosystem)

  • OpenTelemetry Collector

In this post, we use Fluent Bit as the log agent. After deploying Fluent Bit to your EKS cluster, define the [OUTPUT] section in its ConfigMap. You can find a template example below.

Example Fluent Bit output configuration:

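[OUTPUT]
    # Fluent Bit's built-in Loki output plugin
    Name        loki
    Match       *
    # External hostname of the loki-gateway LoadBalancer service
    Host        123456789012341235-797527364.us-east-1.elb.amazonaws.com
    # The gateway listens on plain HTTP port 80 in this setup
    Port        80
    tls         off
    tls.verify  off
    # Must match a user in the gateway's .htpasswd file
    http_user   USERNAME
    http_passwd PASSWORD
    # Sent to Loki as the X-Scope-OrgID (tenant) header
    tenant_id   foo
    line_format json
    # Labels are key=value pairs
    labels      job=fluent-bit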

For all configuration parameters and details, see the Loki output plugin page in the Fluent Bit documentation.
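Before relying on Fluent Bit, it can help to confirm that the gateway accepts writes at all. The following Python sketch builds a request for Loki's HTTP push endpoint (/loki/api/v1/push) using only the standard library. The gateway hostname, credentials, and the smoke-test label are placeholders, and the tenant header assumes Loki's default multi-tenant mode with the foo tenant used in the Fluent Bit output above.

```python
import base64
import json
import time
import urllib.request

# Placeholders (assumptions): replace with your gateway hostname and credentials.
GATEWAY_URL = "http://<LOKI GATEWAY HOSTNAME>"
USERNAME = "loki"
PASSWORD = "change-me"
TENANT_ID = "foo"  # same tenant as tenant_id in the Fluent Bit output


def build_push_payload(labels, lines, now_ns=None):
    """Build a push API body: one stream with one [timestamp_ns, line] pair per line."""
    start = now_ns if now_ns is not None else time.time_ns()
    # Loki expects nanosecond Unix timestamps encoded as strings.
    values = [[str(start + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def push(payload):
    """POST the payload to the gateway and return the HTTP status code."""
    request = urllib.request.Request(
        f"{GATEWAY_URL}/loki/api/v1/push",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(USERNAME, PASSWORD),
            "X-Scope-OrgID": TENANT_ID,
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status  # a successful push normally returns 204


if __name__ == "__main__":
    body = build_push_payload({"job": "smoke-test"}, ["hello from the push API"])
    print(json.dumps(body))
    # print(push(body))  # uncomment after setting GATEWAY_URL and the credentials
```

If the push succeeds, the line should be queryable in Grafana with the selector {job="smoke-test"}.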

Step 2: Defining IAM roles and policies

The recommended method for connecting Loki to AWS S3 is to use an IAM role. This method is more secure than using access keys and secret keys which are directly stored in the Loki configuration. The role and policy can be created using the AWS CLI or the AWS Management Console. The below steps show how to create the role and policy using the AWS CLI.

  1. Create a new directory and navigate to it. Make sure to create the files in this directory. All commands in this guide assume you are in this directory.

  2. Create a loki-s3-policy.json file with the following content:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LokiStorage",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject"
                ],
                "Resource": [
                    "arn:aws:s3:::< CHUNK BUCKET NAME >",
                    "arn:aws:s3:::< CHUNK BUCKET NAME >/*",
                    "arn:aws:s3:::< RULER BUCKET NAME >",
                    "arn:aws:s3:::< RULER BUCKET NAME >/*"
                ]
            }
        ]
    }

    Make sure to replace the placeholders with the names of your chunk and ruler S3 buckets. If you have not created these two buckets yet, create them before continuing.


  3. Create the IAM policy using the AWS CLI:

    aws iam create-policy --policy-name LokiS3AccessPolicy --policy-document file://loki-s3-policy.json


  4. Create a trust policy document named trust-policy.json with the following content:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": "arn:aws:iam::< ACCOUNT ID >:oidc-provider/oidc.eks.<INSERT REGION>.amazonaws.com/id/< OIDC ID >"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        "oidc.eks.<INSERT REGION>.amazonaws.com/id/< OIDC ID >:sub": "system:serviceaccount:loki:loki",
                        "oidc.eks.<INSERT REGION>.amazonaws.com/id/< OIDC ID >:aud": "sts.amazonaws.com"
                    }
                }
            }
        ]
    }

    Make sure to replace the placeholders with your AWS account ID, region, and the OIDC ID (you can find this in the EKS cluster configuration).

    You can find your OIDC ID in the AWS Management Console on the EKS cluster's Overview tab: it is the last segment of the OpenID Connect provider URL.


  5. Create the IAM role using the AWS CLI:

    aws iam create-role --role-name LokiServiceAccountRole --assume-role-policy-document file://trust-policy.json


  6. Attach the policy to the role:

    aws iam attach-role-policy --role-name LokiServiceAccountRole --policy-arn arn:aws:iam::<Account ID>:policy/LokiS3AccessPolicy

    Make sure to replace the placeholder with your AWS account ID.

    When you are done, the LokiServiceAccountRole role should use the trust policy above and have the LokiS3AccessPolicy policy attached.
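If you prefer to generate the two policy documents instead of editing placeholders by hand, a small Python sketch like the one below can write them. It assumes you pass the cluster's OIDC issuer URL (for example the value printed by aws eks describe-cluster --name <cluster-name> --query "cluster.identity.oidc.issuer" --output text); the account ID and bucket names shown are illustrative placeholders.

```python
import json

# Placeholders (assumptions): replace with your own values.
ACCOUNT_ID = "123456789012"
OIDC_ISSUER = "https://oidc.eks.eu-west-2.amazonaws.com/id/EXAMPLE0123456789"
CHUNK_BUCKET = "loki-aws-dev-chunks"
RULER_BUCKET = "loki-aws-dev-ruler"
NAMESPACE = "loki"        # must match the namespace Loki is installed into
SERVICE_ACCOUNT = "loki"  # must match the service account used by the Loki pods


def s3_policy(buckets):
    """Permissions policy granting Loki object access to each bucket."""
    resources = []
    for bucket in buckets:
        # ListBucket applies to the bucket ARN, object actions to bucket/*.
        resources += [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "LokiStorage",
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": resources,
        }],
    }


def trust_policy(account_id, oidc_issuer, namespace, service_account):
    """Trust policy letting one Kubernetes service account assume the role."""
    # e.g. oidc.eks.<region>.amazonaws.com/id/<OIDC ID>
    provider = oidc_issuer.removeprefix("https://")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                f"{provider}:aud": "sts.amazonaws.com",
            }},
        }],
    }


if __name__ == "__main__":
    with open("loki-s3-policy.json", "w") as f:
        json.dump(s3_policy([CHUNK_BUCKET, RULER_BUCKET]), f, indent=4)
    with open("trust-policy.json", "w") as f:
        json.dump(trust_policy(ACCOUNT_ID, OIDC_ISSUER, NAMESPACE, SERVICE_ACCOUNT), f, indent=4)
```

Run it in the working directory from step 1 (Python 3.9 or later, for str.removeprefix), then use the same aws iam create-policy, create-role, and attach-role-policy commands.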


Step 4: Deploying Loki

Before we can deploy the Loki Helm chart, we need to add the Grafana chart repository to Helm. This repository contains the Loki Helm chart.

  1. Add the Grafana chart repository to Helm:

    helm repo add grafana https://grafana.github.io/helm-charts


  2. Update the chart repository:

    helm repo update


  3. Create a new namespace for Loki:

    kubectl create namespace loki

Loki Basic Authentication

Loki by default does not come with any authentication. Since we will be deploying Loki to AWS and exposing the gateway to the internet, we recommend adding at least basic authentication. In this guide we will give Loki a username and password:

  1. To start, create a .htpasswd file with the username and password using the htpasswd command:

    If you don’t have the htpasswd command installed, you can install it using brew, apt-get, or yum depending on your OS (on Debian/Ubuntu it is part of the apache2-utils package, and on RHEL-based systems part of httpd-tools).

    htpasswd -c .htpasswd loki

    This creates a file called .htpasswd containing the user loki. You will be prompted to enter a password.


  2. Create a Kubernetes secret with the .htpasswd file:

    kubectl create secret generic loki-basic-auth --from-file=.htpasswd -n loki


    This will create a secret called loki-basic-auth in the loki namespace. We will reference this secret in the Loki Helm chart configuration.


  3. Create a canary-basic-auth secret for the canary:

    kubectl create secret generic canary-basic-auth \
      --from-literal=username=<USERNAME> \
      --from-literal=password=<PASSWORD> \
      -n loki
    
    


    This creates a secret from literal values so that the Loki canary can authenticate with the Loki gateway. Replace the placeholders with the same username and password you stored in the .htpasswd file; otherwise the gateway will reject the canary's requests.
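If htpasswd is not available, the Python sketch below is one possible alternative. It produces an entry using the {SSHA} (salted SHA-1) scheme, which nginx's auth_basic_user_file format supports; this assumes the gateway is the default NGINX-based one. The function names are hypothetical, and htpasswd remains the simpler route when you can install it.

```python
import base64
import hashlib
import hmac
import os


def ssha_htpasswd_line(username, password, salt=None):
    """Return 'username:{SSHA}<base64(sha1(password + salt) + salt)>'."""
    salt = salt if salt is not None else os.urandom(8)
    digest = hashlib.sha1(password.encode() + salt).digest()  # 20-byte SHA-1 digest
    return f"{username}:{{SSHA}}" + base64.b64encode(digest + salt).decode()


def verify_ssha_line(line, password):
    """Check a password against an {SSHA} entry; the salt follows the 20-byte digest."""
    _, encoded = line.split(":", 1)
    raw = base64.b64decode(encoded.removeprefix("{SSHA}"))
    digest, salt = raw[:20], raw[20:]
    expected = hashlib.sha1(password.encode() + salt).digest()
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    # Hypothetical credentials: use the same username and password as the canary secret.
    print(ssha_htpasswd_line("loki", "change-me"))
```

Redirect the printed line into a file named .htpasswd and create the loki-basic-auth secret from it exactly as above.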

Loki Helm chart configuration

Create a values.yaml file choosing the configuration options that best suit your requirements. Below is an example values.yaml for the Loki Helm chart in microservices (Distributed) mode.

loki:
   schemaConfig:
     configs:
       - from: "2024-04-01"
         store: tsdb
         object_store: s3
         schema: v13
         index:
           prefix: loki_index_
           period: 24h
   storage_config:
     aws:
       region: <S3 BUCKET REGION> # for example, eu-west-2  
       bucketnames: <CHUNK BUCKET NAME> # Your actual S3 bucket name, for example, loki-aws-dev-chunks
       s3forcepathstyle: false
   ingester:
       chunk_encoding: snappy
   pattern_ingester:
       enabled: true
   limits_config:
     allow_structured_metadata: true
     volume_enabled: true
     retention_period: 672h # 28 days retention
   compactor:
     retention_enabled: true 
     delete_request_store: s3
   ruler:
    enable_api: true
    storage:
      type: s3
      s3:
        region: <S3 BUCKET REGION> # for example, eu-west-2
        bucketnames: <RULER BUCKET NAME> # Your actual S3 bucket name, for example, loki-aws-dev-ruler
        s3forcepathstyle: false
    alertmanager_url: http://prom:9093 # The URL of the Alertmanager to send alerts (Prometheus, Mimir, etc.)

   querier:
      max_concurrent: 4

   storage:
      type: s3
      bucketNames:
        chunks: "<CHUNK BUCKET NAME>" # Your actual S3 bucket name (loki-aws-dev-chunks)
        ruler: "<RULER BUCKET NAME>" # Your actual S3 bucket name (loki-aws-dev-ruler)
        # admin: "<Insert s3 bucket name>" # Your actual S3 bucket name (loki-aws-dev-admin) - GEL customers only
      s3:
        region: <S3 BUCKET REGION> # eu-west-2
        #insecure: false
      # s3forcepathstyle: false

serviceAccount:
 create: true
 annotations:
   "eks.amazonaws.com/role-arn": "arn:aws:iam::<Account ID>:role/LokiServiceAccountRole" # The service role you created

deploymentMode: Distributed

ingester:
 replicas: 3
 zoneAwareReplication:
  enabled: false

querier:
 replicas: 3
 maxUnavailable: 2

queryFrontend:
 replicas: 2
 maxUnavailable: 1

queryScheduler:
 replicas: 2

distributor:
 replicas: 3
 maxUnavailable: 2
compactor:
 replicas: 1

indexGateway:
 replicas: 2
 maxUnavailable: 1

ruler:
 replicas: 1
 maxUnavailable: 1


# This exposes the Loki gateway so it can be written to and queried externally
gateway:
 service:
   type: LoadBalancer
 basicAuth: 
     enabled: true
     existingSecret: loki-basic-auth

# Since we are using basic auth, we need to pass the username and password to the canary
lokiCanary:
  extraArgs:
    - -pass=$(LOKI_PASS)
    - -user=$(LOKI_USER)
  extraEnv:
    - name: LOKI_PASS
      valueFrom:
        secretKeyRef:
          name: canary-basic-auth
          key: password
    - name: LOKI_USER
      valueFrom:
        secretKeyRef:
          name: canary-basic-auth
          key: username

# Disable MinIO, since Amazon S3 is used for storage
minio:
 enabled: false

backend:
 replicas: 0
read:
 replicas: 0
write:
 replicas: 0

singleBinary:
 replicas: 0

Make sure to replace the placeholders with your actual values.

It is critical to define a valid values.yaml file for the Loki deployment. To reduce the risk of misconfiguration, let’s break down the configuration options to keep in mind when deploying to AWS:

Loki Config vs. Values Config:
  • The values.yaml file contains a section called loki, which contains a direct representation of the Loki configuration file.

  • This section defines the Loki configuration, including the schema, storage, and querier configuration.

  • The key configuration to focus on for chunks is the storage_config section, where you define the S3 bucket region and name. This tells Loki where to store the chunks.

  • The ruler section defines the configuration for the ruler, including the S3 bucket region and name. This tells Loki where to store the alert and recording rules.

  • For the full Loki configuration, refer to the Loki Configuration documentation.

Storage:
  • Defines where the Helm chart stores data.

  • Set the type to s3 since we are using Amazon S3.

  • Configure the bucket names for the chunks and ruler to match your chunk and ruler S3 buckets.

  • The s3 section specifies the region of the bucket.

Service Account:
  • The serviceAccount section is used to define the IAM role for the Loki service account.

  • This is where the IAM role created earlier is linked.

Gateway:
  • Defines how the Loki gateway will be exposed.

  • We are using a LoadBalancer service type in this configuration.

Now that you have created the values.yaml file, you can deploy Loki using the Helm chart.

  • Deploy using the newly created values.yaml file:

    helm install --values values.yaml loki grafana/loki -n loki --create-namespace

    It is important to create a namespace called loki, because the trust policy only allows the IAM role to be assumed by the loki service account in the loki namespace. Both names are configurable, but if you change them, update the sub condition in the trust policy and the service account settings to match.


  • Verify the deployment:

    kubectl get pods -n loki


    You should see that all pods are running:

    NAME                                    READY   STATUS    RESTARTS   AGE
    loki-canary-crqpg                       1/1     Running   0          10m
    loki-canary-hm26p                       1/1     Running   0          10m
    loki-canary-v9wv9                       1/1     Running   0          10m
    loki-chunks-cache-0                     2/2     Running   0          10m
    loki-compactor-0                        1/1     Running   0          10m
    loki-distributor-78ccdcc9b4-9wlhl       1/1     Running   0          10m
    loki-distributor-78ccdcc9b4-km6j2       1/1     Running   0          10m
    loki-distributor-78ccdcc9b4-ptwrb       1/1     Running   0          10m
    loki-gateway-5f97f78755-hm6mx           1/1     Running   0          10m
    loki-index-gateway-0                    1/1     Running   0          10m
    loki-index-gateway-1                    1/1     Running   0          10m
    loki-ingester-0                         1/1     Running   0          10m
    loki-ingester-1                         1/1     Running   0          10m
    loki-ingester-2                         1/1     Running   0          10m
    loki-querier-89d4ff448-4vr9b            1/1     Running   0          10m
    loki-querier-89d4ff448-7nvrf            1/1     Running   0          10m
    loki-querier-89d4ff448-q89kh            1/1     Running   0          10m
    loki-query-frontend-678899db5-n5wc4     1/1     Running   0          10m
    loki-query-frontend-678899db5-tf69b     1/1     Running   0          10m
    loki-query-scheduler-7d666bf759-9xqb5   1/1     Running   0          10m
    loki-query-scheduler-7d666bf759-kpb5q   1/1     Running   0          10m
    loki-results-cache-0                    2/2     Running   0          10m
    loki-ruler-0                            1/1     Running   0          10m
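With this many pods, scanning the table by eye is error-prone. As a sketch, the following Python helper parses the JSON form of kubectl get pods and lists pods that are not Running or have containers that are not ready. The helper names are hypothetical, and fetch_pods assumes kubectl is on your PATH and pointed at the EKS cluster.

```python
import json
import subprocess


def not_ready_pods(pod_list):
    """Return names of pods that are not Running or have a container that is not ready."""
    problems = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        ready = (
            status.get("phase") == "Running"
            and bool(containers)
            and all(c.get("ready", False) for c in containers)
        )
        if not ready:
            problems.append(pod["metadata"]["name"])
    return problems


def fetch_pods(namespace="loki"):
    """Run `kubectl get pods -n <namespace> -o json` and parse the result."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, not_ready_pods(fetch_pods("loki")) returns an empty list once every Loki pod is ready.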

Find the Loki Gateway Service

The Loki Gateway service is a LoadBalancer service that exposes the Loki gateway to the internet. This is where you will write logs to and query logs from. By default NGINX is used as the gateway.

To find the Loki Gateway service, run the following command:

kubectl get svc -n loki

You should see the Loki Gateway service with an external hostname (the DNS name of the AWS load balancer). This is the address you will use to write to and query Loki.

NAME                             TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)              AGE
loki-gateway                     LoadBalancer   10.100.201.74    1111111111111111112222-xxxxxxxxxxxxxxxxxxxxxx.eu-west-2.elb.amazonaws.com   80:30707/TCP         46m

💡 Tip:

If Grafana is running inside the same Kubernetes cluster as Loki, you can configure the data source using the following URL:

http://loki-gateway.loki.svc.cluster.local/
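To check that logs are actually arriving, you can also query Loki's HTTP API directly through the gateway. The Python sketch below calls /loki/api/v1/query_range for the last hour. The job="fluent-bit" selector in the usage example assumes your logs carry that label, and the credentials and tenant (foo, as in the Fluent Bit output) are placeholders.

```python
import base64
import json
import time
import urllib.parse
import urllib.request


def query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a /loki/api/v1/query_range URL for a LogQL query over [start, end]."""
    params = urllib.parse.urlencode({
        "query": logql,
        "start": start_ns,        # Unix epoch in nanoseconds
        "end": end_ns,
        "limit": limit,
        "direction": "backward",  # newest entries first
    })
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


def query_last_hour(base_url, logql, user, password, tenant_id="foo"):
    """Run the query for the last hour through the gateway and return the decoded JSON."""
    end = time.time_ns()
    start = end - 3600 * 10**9
    request = urllib.request.Request(query_range_url(base_url, logql, start, end))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("X-Scope-OrgID", tenant_id)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())
```

From a pod inside the cluster, for example, query_last_hour("http://loki-gateway.loki.svc.cluster.local", '{job="fluent-bit"}', "loki", "change-me") returns a response whose data.result field holds the matching streams.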


Step 5: Deploying Grafana and Adding Data Source

If you prefer to deploy Grafana inside your EKS cluster using Helm Charts, here is a quick guide:

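First, add the Grafana Helm repository and update it (skip this if you already added the repository when deploying Loki):

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
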
Then install Grafana:

helm upgrade --install grafana grafana/grafana \
  --set adminPassword='YourStrongPassword' \
  --set service.type=LoadBalancer \
  --namespace monitoring \
  --create-namespace

After deployment:

  • Find the external hostname of the Grafana LoadBalancer service with kubectl get svc -n monitoring and open it in your browser. Log in as admin with the password you set above.

  • Add Loki as a data source with Basic Authentication enabled, using the username and password from your .htpasswd file. If Grafana runs in the same cluster as Loki, use the in-cluster gateway URL from the tip above.

  • Start querying and visualizing your backend application logs! ✌🏻
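The data source can also be created through Grafana's HTTP API (POST /api/datasources) instead of the UI, which is handy for scripting. The Python sketch below assumes the default admin user of the Grafana Helm chart and uses Grafana's numbered custom-header fields (httpHeaderName1 in jsonData, httpHeaderValue1 in secureJsonData) to send an optional X-Scope-OrgID tenant header. The URLs, credentials, and tenant are placeholders.

```python
import base64
import json
import urllib.request


def loki_datasource(url, user, password, tenant_id=None):
    """Build a Grafana data source definition for Loki behind basic auth."""
    body = {
        "name": "Loki",
        "type": "loki",
        "url": url,
        "access": "proxy",  # Grafana's backend makes the requests to Loki
        "basicAuth": True,
        "basicAuthUser": user,
        "jsonData": {},
        "secureJsonData": {"basicAuthPassword": password},
    }
    if tenant_id is not None:
        # Custom header: the name goes in jsonData, the (secret) value in secureJsonData.
        body["jsonData"]["httpHeaderName1"] = "X-Scope-OrgID"
        body["secureJsonData"]["httpHeaderValue1"] = tenant_id
    return body


def create_datasource(grafana_url, admin_password, body):
    """POST the definition to Grafana, authenticating as the admin user."""
    token = base64.b64encode(f"admin:{admin_password}".encode()).decode()
    request = urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())
```

For example, create_datasource("http://<GRAFANA HOSTNAME>", "YourStrongPassword", loki_datasource("http://loki-gateway.loki.svc.cluster.local", "loki", "change-me", tenant_id="foo")).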


⚠️ Important Note:

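Loki runs in multi-tenant mode by default, which is set in the configuration with auth_enabled: true. In this mode, every request must include an X-Scope-OrgID header that identifies the tenant. When configured with auth_enabled: false, Loki uses a single tenant: the X-Scope-OrgID header is not required in Loki API requests, and the single tenant ID is the string fake.

Because the values.yaml above keeps the multi-tenant default, either add an X-Scope-OrgID header with your tenant ID (for example foo, matching tenant_id in the Fluent Bit output) under the data source's "Custom HTTP Headers" section, or switch Loki to single-tenant mode by adding auth_enabled: false under the existing loki section of values.yaml and upgrading the release with helm upgrade: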

loki:
  auth_enabled: false


Step 6 (Bonus): Optimize Log Storage with S3 Lifecycle Policies

As logs accumulate, storage costs can grow. AWS S3 Lifecycle Policies help optimize storage costs automatically.

For example:

  • Move logs older than 30 days to S3 Standard-IA.

  • Move logs older than 90 days to S3 Glacier.

  • Delete logs after 365 days if needed.

Example JSON policy:

{
  "Rules": [
    {
      "ID": "MoveToIA",
      "Filter": {
        "Prefix": "backend-logs/"
      },
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}

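You can apply this policy through the AWS Management Console, with the AWS CLI (aws s3api put-bucket-lifecycle-configuration), or with IaC tools like Terraform. Adjust the prefix to the objects you want to tier, and keep the rules consistent with Loki's own retention: objects in the GLACIER storage class must be restored before they can be read, so Loki cannot query chunks that have moved there, and with retention_period: 672h (28 days) in the values.yaml above, the compactor deletes chunks before the 30-day transition would apply.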
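To make the tiering rules concrete, the following Python sketch computes which storage class an object of a given age would be in under the rules listed above (Standard-IA after 30 days, Glacier after 90 days, deletion after 365 days). It is a simplified model that assumes objects are written in S3 Standard; in practice S3 applies lifecycle actions asynchronously, so real transitions can happen somewhat after an object becomes eligible.

```python
# Simplified model of the rules above: (minimum age in days, storage class).
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER")]
EXPIRATION_DAYS = 365


def storage_class_for_age(age_days, transitions=TRANSITIONS, expiration_days=EXPIRATION_DAYS):
    """Return the storage class for an object of the given age, or None once it has expired."""
    if expiration_days is not None and age_days >= expiration_days:
        return None  # the Expiration action deletes the object
    current = "STANDARD"  # assumption: objects are uploaded as S3 Standard
    for min_age, storage_class in sorted(transitions):
        if age_days >= min_age:
            current = storage_class
    return current


if __name__ == "__main__":
    for age in (10, 45, 120, 400):
        print(age, storage_class_for_age(age))
```

Changing TRANSITIONS or EXPIRATION_DAYS lets you sanity-check a rule set before applying it to the bucket.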

Conclusion

By combining Amazon EKS, Amazon S3, Loki, Fluent Bit, and Grafana, you can build a powerful and scalable game backend observability and log monitoring solution without running a heavyweight log management platform: Loki indexes only labels and keeps the log data itself in S3. Additionally, with smart lifecycle policies, you ensure long-term sustainability and cost efficiency.

This approach lets game studios focus on building reliable, high-performance games while having deep insights into their backend systems.


Start Your Cloud Journey Today

Contact us now and take the first step toward innovation and scalability with Clerion’s expert cloud solutions.
