A Customer Journey

Replicating Mainframe Data to AWS S3


The Challenge

Enable ongoing replication from on-premises IBM DB2 z/OS mainframe databases to AWS S3.

Our Solution

The challenge with replicating mainframe data is that binary database log files need to be read. When data is replicated from one mainframe system to another, both systems use big-endian byte storage, so both can read and interpret the binary logs correctly. When mainframe data is replicated to AWS, binary logs written in big-endian order need to be read and translated into little-endian order to suit the x86 architecture used on AWS.
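To make the byte-order issue concrete, here is a minimal Python sketch. It is not how IBM IIDR works internally; the four-byte integer is a made-up value used only to show what happens when big-endian bytes are read with the wrong byte order.

```python
import struct

# Hypothetical 4-byte signed integer as a big-endian system would store it.
big_endian_bytes = bytes([0x00, 0x00, 0x01, 0x2C])

# ">" forces big-endian interpretation, whatever the host CPU is.
value = struct.unpack(">i", big_endian_bytes)[0]

# "<" reads the same bytes in little-endian (x86) order and gets a wrong number.
misread = struct.unpack("<i", big_endian_bytes)[0]

# Re-encode the correct value in little-endian order for an x86 consumer.
little_endian_bytes = struct.pack("<i", value)

print(value)                      # 300
print(value == misread)           # False
print(little_endian_bytes.hex())  # 2c010000
```

The same bytes produce two different numbers depending on the assumed byte order, which is why a translation layer is needed between the mainframe logs and an x86 target.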

To assist with this conversion, we used the IBM IIDR replication tool to translate the mainframe binary database logs and replicate the data into an AWS Aurora PostgreSQL instance. This tool was already being used to replicate data between on-premises systems, which made it a strong candidate for our solution: it required no additional software licensing, and most of the setup was already complete.

The IBM IIDR version we were using was an older release that did not include the feature to replicate directly to AWS S3. Because of this limitation, we needed a second replication step from AWS Aurora PostgreSQL to AWS S3 and selected AWS Database Migration Service (DMS) to accomplish it.

A planned future enhancement to this solution was to upgrade IBM IIDR to the latest version, which included a feature to extract data directly to AWS S3. We would then remove the AWS Aurora PostgreSQL instance and the AWS DMS components from the solution, reducing cost and complexity.

This solution required two replication steps. The first step replicated data from the mainframe IBM DB2 z/OS systems to AWS Aurora PostgreSQL, and the second step replicated data from AWS Aurora PostgreSQL to AWS S3.

AWS Aurora PostgreSQL Configuration and Deployment

The IBM IIDR replication target is an AWS Aurora PostgreSQL database, which was created using the AWS CDK as Infrastructure as Code. High availability (Multi-AZ) was enabled for this instance. Auto-scaling was not enabled because resource consumption was low and steady. The CDK application created all database user IDs and passwords in AWS Secrets Manager.

IBM IIDR Replication Configuration and Deployment

The IBM IIDR replication solution requires the installation of two pieces of software: the IIDR Replication Agent and the IIDR Access Server. The IIDR Replication Agent is the process responsible for the actual conversion and transmission of data. The IIDR Access Server coordinates tasks to and from the replication agents.

The IIDR Replication Agent and the IIDR Access Server were each installed on their own EC2 instance. The deployment of the EC2 instances and the installation and configuration of each IIDR software component were automated through a CDK Infrastructure as Code application. Bootstrap scripts were created for each EC2 instance, and all passwords were retrieved from AWS Secrets Manager.

IBM DB2 z/OS DDL Conversion

Once the AWS Aurora PostgreSQL database and the IIDR EC2 instances were up and running, it was time to configure the replication of data from on premises to AWS. Our first step was to convert the mainframe IBM DB2 z/OS table DDLs to AWS Aurora PostgreSQL. Because there are no open-source tools for converting IBM DB2 z/OS DDLs, and because our team has done this numerous times, the table DDLs were converted manually. Once the converted DDLs had been created on the AWS Aurora PostgreSQL instance, IBM IIDR was configured to replicate data between the on-premises systems and AWS Aurora PostgreSQL.
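The conversion on this project was done by hand, so the following is only an illustration of the kind of data type decisions involved. It is a minimal Python sketch with an assumed, non-exhaustive mapping; the function name and the mapping table are hypothetical, and anything the sketch does not recognize is rejected so that a person can decide how to convert it.

```python
import re

# Illustrative DB2 z/OS -> PostgreSQL type mapping (assumed, not exhaustive).
TYPE_MAP = {
    "SMALLINT": "SMALLINT",
    "INTEGER": "INTEGER",
    "BIGINT": "BIGINT",
    "DECIMAL": "NUMERIC",
    "CHAR": "CHAR",
    "VARCHAR": "VARCHAR",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP",
    "DOUBLE": "DOUBLE PRECISION",
    "CLOB": "TEXT",
    "BLOB": "BYTEA",
}

# PostgreSQL types in this table that take no length or precision argument.
NO_ARGS = {"TEXT", "BYTEA", "DOUBLE PRECISION"}


def convert_type(db2_type: str) -> str:
    """Map one DB2 column type, e.g. 'DECIMAL(11,2)', to a PostgreSQL type."""
    match = re.fullmatch(r"\s*([A-Za-z ]+?)\s*(\(.*\))?\s*", db2_type)
    if match is None:
        raise ValueError(f"needs manual review: {db2_type!r}")
    base = match.group(1).upper()
    args = match.group(2) or ""
    if base not in TYPE_MAP:
        raise ValueError(f"needs manual review: {db2_type!r}")
    target = TYPE_MAP[base]
    # Drop the size argument where the PostgreSQL type does not accept one.
    return target if target in NO_ARGS else target + args


for db2_type in ("DECIMAL(11,2)", "VARCHAR(40)", "CLOB(1M)"):
    print(db2_type, "->", convert_type(db2_type))
```

Raising an error for unknown or qualified types (for example `CHAR(10) FOR BIT DATA`) is a deliberate choice in this sketch: silent guesses in a DDL conversion tend to surface later as replication failures.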

AWS Aurora PostgreSQL DDL Updates

Another large component of this solution was an automated method for applying updates and changes to DDL and other objects in the database. We created another CDK Infrastructure as Code application to meet this requirement. The CDK application created an AWS Lambda function that executed commands on a Docker container running Flyway, a free database versioning tool. With this CDK application, database object updates were integrated into the CI/CD pipelines, which mitigated many of the risks associated with database object changes.
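As a rough sketch of what a Flyway run inside such a container might look like, the Python below assembles a `flyway migrate` call configured through Flyway's documented environment variables (`FLYWAY_URL`, `FLYWAY_USER`, `FLYWAY_PASSWORD`, `FLYWAY_LOCATIONS`). The host name, database name, credentials, and the `/flyway/sql` directory are assumptions made for illustration, and the function names are hypothetical rather than taken from the project.

```python
import os
import subprocess


def build_flyway_invocation(host, port, database, user, password,
                            sql_dir="/flyway/sql"):
    """Return the (command, environment) pair for a `flyway migrate` run."""
    env = {
        "FLYWAY_URL": f"jdbc:postgresql://{host}:{port}/{database}",
        "FLYWAY_USER": user,
        "FLYWAY_PASSWORD": password,
        # Directory holding versioned scripts such as V1__create_tables.sql.
        "FLYWAY_LOCATIONS": f"filesystem:{sql_dir}",
    }
    return ["flyway", "migrate"], env


def run_migrations(command, flyway_env):
    """Run Flyway. Needs the flyway binary on PATH; not called in this sketch."""
    subprocess.run(command, env={**os.environ, **flyway_env}, check=True)


command, flyway_env = build_flyway_invocation(
    host="aurora.example.internal",  # hypothetical endpoint
    port=5432,
    database="replica",              # hypothetical database name
    user="flyway_user",              # in practice, read from AWS Secrets Manager
    password="not-a-real-password",
)
print(command, flyway_env["FLYWAY_URL"])
```

Passing credentials through environment variables keeps them out of the process argument list, and in a real deployment they would come from AWS Secrets Manager rather than literals.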

AWS DMS Instance and Task Creation

Our final task was to set up AWS Database Migration Service (DMS) to replicate the AWS Aurora PostgreSQL data to AWS S3. We created an AWS DMS replication instance, a source endpoint for AWS Aurora PostgreSQL, and a target endpoint for the AWS S3 bucket and folder. Finally, an AWS DMS task was created using the full-load and CDC replication option.
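The shape of such a task can be sketched in Python as the parameters of the DMS CreateReplicationTask API. This is an assumed example, not the project's actual configuration: the task identifier, the `public` schema, and the ARN placeholders are hypothetical, and the selection rule simply includes every table in one schema.

```python
import json


def build_table_mappings(schema: str, table_pattern: str = "%") -> str:
    """Return a DMS table-mapping document that includes the matching tables."""
    rules = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": f"include-{schema}",
                "object-locator": {
                    "schema-name": schema,
                    "table-name": table_pattern,  # "%" matches every table
                },
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(rules)


# Parameters as accepted by CreateReplicationTask; all values are placeholders.
task_request = {
    "ReplicationTaskIdentifier": "aurora-to-s3",     # hypothetical name
    "SourceEndpointArn": "<aurora-postgresql-endpoint-arn>",
    "TargetEndpointArn": "<s3-endpoint-arn>",
    "ReplicationInstanceArn": "<replication-instance-arn>",
    "MigrationType": "full-load-and-cdc",            # initial copy, then changes
    "TableMappings": build_table_mappings("public"),  # assumed schema name
}

print(json.dumps(task_request, indent=2))
```

With `full-load-and-cdc`, DMS first copies the existing rows and then keeps applying ongoing changes, which matches the continuous replication goal described above.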

The Result

The solution was a success, and mainframe IBM DB2 z/OS data was continuously replicated to AWS S3. The project took a full year to implement, primarily because of the number of teams involved and the formal processes required at each stage of the project.

Lessons Learned

Working with the Government requires a significant amount of effort for formal processes and project stage clearance. Try to line up early whom you need to work with, when you need to engage with them, which documents are required, and which approvals are required. Streamline the process as much as possible to minimize potential delays.

autoverse

Services

Cloud Consulting Services
Advisory Services
Project Management

Technologies

AWS EC2
AWS Lambda
IBM DB2
Flyway
AWS CDK
AWS CodeCommit
AWS CodeBuild
AWS CodeDeploy
AWS DMS
AWS SCT
AWS ECR

Dates

01/2021 - 02/2022