From Legacy Constraints to Scalable Growth: Data Transformation with Iceberg on AWS

Services provided

  • Data & Analytics
  • Cloud Modernisation
  • Data Engineering
  • Serverless Architecture

Platforms used

  • AWS (S3 Tables (Iceberg), Redshift, Glue, Lambda, EventBridge, MWAA, Lake Formation, Transfer Family)

Engagement length

  • 14 weeks (Feb 2025 – May 2026 initial rollout)

Other stats

  • Metadata-driven ingestion from 5 disparate sources with different data formats
  • Supported 40+ venues in first phase, scaling to 150+ venues in the future
  • ~20M events processed per month via serverless pipelines
  • Support for 3 data processing patterns: batch, micro-batch and event-driven

Background on the client

Our customer is a rapidly growing payments and venue technology provider supporting hospitality and gaming venues across Australia. With operations spanning 80+ venues and plans to scale beyond 150 within the next year, the need for a modern, scalable data platform became critical to support growth, improve reporting reliability, and unlock future analytics and AI capabilities.

Challenge

The customer inherited a legacy, on-premises OLAP data cube and data warehouse managed by a third-party vendor to deliver ad-hoc reporting and insights across gaming, POS and customer analytics. This architecture was not designed for modern data workloads and created several constraints:

  • Limited scalability to add new venues as the business expanded
  • Data coverage limited to a single gaming software provider
  • Incomplete coverage of the datasets required for holistic insights
  • Heavy reliance on manual data handling
  • Inconsistent and unreliable reporting, with poor response times
  • Inability to support AI or future data initiatives

With an urgent need to replace the legacy platform within a fixed timeframe, the organisation faced both technical and delivery risk.

Stack Highlights

  • AWS Transfer Family
  • AWS Lambda
  • Amazon EventBridge
  • Amazon DynamoDB
  • Amazon MWAA (Airflow)
  • AWS Glue
  • Amazon S3 (general purpose buckets)
  • Amazon S3 Tables (Iceberg)
  • Amazon Redshift
  • AWS Glue Data Catalog
  • AWS Lake Formation

The Approach

Codex designed and delivered a modern AWS-native data lakehouse platform, built on Apache Iceberg, to replace the legacy OLAP environment and establish a scalable, governed data foundation.

Our approach included:

1. AI-enabled reverse engineering of business logic in 100k+ lines of legacy MS SQL Server and SSIS code

2. Conducting detailed data discovery to identify gaps, dependencies, and prioritised datasets

3. Designing a lakehouse architecture with layered data zones (Landing, Bronze, Silver and Gold)

4. Building data-contract driven, automated data ingestion pipelines using AWS Transfer Family, APIs, and Lambda to support scale

5. Implementing event-driven orchestration with EventBridge and MWAA

6. Developing data pipelines using AWS Glue for data transformation and enrichment

7. Establishing data governance with Lake Formation and Glue Data Catalog, with an observability layer

8. Delivering a serverless-first architecture to minimise operational overhead and cost

9. Integrating reporting via Amazon Redshift with downstream tools like Excel

This approach enabled a flexible, extensible platform capable of onboarding new data sources without redesign.
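As an illustration of how metadata-driven, data-contract-based ingestion lets new sources be onboarded without redesign, here is a minimal Python sketch. The contract registry (`CONTRACTS`), the source names and the field names are hypothetical, and the rule that valid records move from the Landing zone to Bronze while invalid ones are quarantined is an assumption made for the example; it is not the production pipeline. In a real deployment the contracts might live in a table or catalogue and the routing might run inside Lambda or Glue.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type_: type
    required: bool = True


# Hypothetical contract registry: onboarding a new source means adding an
# entry here rather than writing a new pipeline.
CONTRACTS: dict[str, list[FieldSpec]] = {
    "gaming_provider_a": [
        FieldSpec("venue_id", str),
        FieldSpec("machine_id", str),
        FieldSpec("turnover_cents", int),
    ],
    "pos_provider_b": [
        FieldSpec("venue_id", str),
        FieldSpec("txn_id", str),
        FieldSpec("amount_cents", int),
        FieldSpec("tip_cents", int, required=False),
    ],
}


def violations(record: dict[str, Any], contract: list[FieldSpec]) -> list[str]:
    """Return the contract violations for one record (empty list if valid)."""
    problems = []
    for spec in contract:
        if spec.name not in record or record[spec.name] is None:
            if spec.required:
                problems.append(f"missing {spec.name}")
            continue
        if not isinstance(record[spec.name], spec.type_):
            problems.append(f"{spec.name} is not {spec.type_.__name__}")
    return problems


def route(source: str, records: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Split Landing-zone records into Bronze (valid) and quarantine (invalid)."""
    contract = CONTRACTS[source]  # KeyError means the source is not onboarded yet
    zones: dict[str, list[dict[str, Any]]] = {"bronze": [], "quarantine": []}
    for record in records:
        zone = "quarantine" if violations(record, contract) else "bronze"
        zones[zone].append(record)
    return zones


batch = [
    {"venue_id": "V001", "txn_id": "T1", "amount_cents": 1250},
    {"venue_id": "V001", "txn_id": "T2", "amount_cents": "12.50"},  # wrong type
]
zones = route("pos_provider_b", batch)
print(len(zones["bronze"]), len(zones["quarantine"]))  # 1 1
```

Onboarding another provider in this sketch only means adding a contract entry; the validation and routing code stays unchanged.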

Technical outputs

Codex delivered a production-ready, scalable data platform that transformed our customer’s reporting and analytics capabilities.

Key outputs:

  • Replaced the legacy OLAP cube with a modern, AWS-native lakehouse platform, removing dependency on third-party infrastructure
  • Scaled data ingestion to support 5 data sources and increasing data volumes
  • Used Amazon S3 Tables with Apache Iceberg to enable schema evolution, snapshot-based time travel, and efficient date-based partitioning
  • Expanded reporting datasets for the gaming and POS functional areas, achieving 100% data completeness and eliminating critical data gaps
  • Implemented event-driven pipelines processing ~20M events per month, enabling automated, scalable data workflows
  • Built fully automated ingestion and transformation pipelines based on open-standard data contracts, significantly reducing manual data handling and future ingestion effort
  • Improved data reliability, consistency, and accuracy through standardised data pipelines and governed data layers
  • Established a layered lakehouse architecture (S3 Iceberg + Redshift) supporting high-performance analytics and reporting
  • Enabled secure, governed access to data using AWS Lake Formation and Glue Data Catalog
  • Delivered a production-ready platform within the fixed delivery window, meeting a hard rollout deadline
  • Created a scalable and extensible data foundation capable of onboarding new data sources without redesign
  • Established a foundation for advanced analytics and AI use cases
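Iceberg's date-based partitioning is defined by partition transforms applied to a column, rather than by a separate physical partition column. The Python sketch below mimics the `day` transform from the Iceberg table specification (days since 1970-01-01) and shows how a date-range filter lets a reader skip data files whose partition value falls outside the range (partition pruning). The `DataFile` type, bucket name and file paths are hypothetical; real query engines read partition values from Iceberg manifest files.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_transform(value: date | datetime) -> int:
    """Days since 1970-01-01, mirroring Iceberg's `day` partition transform."""
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            # Timezone-aware timestamps are normalised to UTC first.
            value = value.astimezone(timezone.utc)
        value = value.date()
    return (value - EPOCH).days


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data-file entry in an Iceberg manifest."""
    path: str
    day_partition: int  # day(event_ts) shared by every row in the file


def prune(files: list[DataFile], start: date, end: date) -> list[DataFile]:
    """Keep only files whose day partition falls within [start, end]."""
    lo, hi = day_transform(start), day_transform(end)
    return [f for f in files if lo <= f.day_partition <= hi]


files = [
    DataFile("s3://example-bucket/data/00001.parquet", day_transform(date(2025, 3, 1))),
    DataFile("s3://example-bucket/data/00002.parquet", day_transform(date(2025, 3, 2))),
    DataFile("s3://example-bucket/data/00003.parquet", day_transform(date(2025, 3, 9))),
]
kept = prune(files, date(2025, 3, 1), date(2025, 3, 7))
print([f.path for f in kept])  # the first two files; 00003 is skipped
```

Because the partition value is derived from the column by a transform, queries can filter on the event date itself without referencing a separate partition column.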

The platform now enables our customer to generate timely, reliable insights while supporting ongoing expansion across venues.

Data Platform Highlights

  • Modern AWS Lakehouse Architecture: A scalable lakehouse design using S3 Iceberg tables and Redshift enables centralised storage, layered data processing, and high-performance analytics.
  • Automated Data Pipelines: Data contract-driven ingestion and transformation pipelines ensure consistent, scalable and reliable data processing without manual intervention.
  • Governed and Secure Data Access: Lake Formation and Glue Data Catalog provide role-based access control and strong data governance across the platform.
  • Serverless and Cost-Optimised Design: Serverless services such as Lambda, Glue, and S3 align cost with usage while eliminating infrastructure management overhead.
  • Scalable Foundation for Growth: The platform supports expansion from 80+ to 150+ venues and can easily onboard new data sources and use cases.

Business and commercial outcomes

The modern data platform delivered measurable commercial value for our customer, improving efficiency, scalability and decision-making across the organisation.
  1. Delivered continuity of the OLAP cube experience to 40+ venues, with capacity to scale to 150+ venues.
  2. Modernised data foundations on AWS, replacing a legacy on-premises, MS SQL Server-based solution and significantly reducing opex.
  3. Enabled metadata-driven data processing from 5 different sources to serve the cube and future analytics and AI use cases.
  4. Built reusable assets to optimise future costs and support scaling to additional gaming and POS software providers, as well as new venues.
  5. Used direct, secure connectivity from Excel to Redshift, saving US$100K in additional software costs.
  6. Streamlined venue onboarding by leveraging existing processes to deliver an integrated experience.
  7. Improved reporting performance for end users by leveraging Redshift capabilities.

Talk to Us

We would love the opportunity to connect and understand more about the problems you are trying to solve.

Adrian Campbell
Associate Partner, AI

Martin Campbell
Managing Partner

Get in touch to coordinate a meeting with one of our technical experts.
Australia: +61 7 3132 3002.