A Customer Journey

Refactoring IBM DB2 to AWS Redshift

In 2019, our client engaged Autoverse to modernize their IBM DB2 data warehouse by migrating it to AWS Redshift. The migration was doubly heterogeneous, and therefore more complex: we were moving from one database system to another, and from a transactional processing system (OLTP) to an analytical processing system (OLAP). On top of that, the solution needed a near real-time process to ingest data from AWS EMR.

Six months later, the solution was complete. Read on to find out how we did it!

Database Assessment

Our initial step was to assess all the objects in the database and determine how difficult each would be to migrate to AWS Redshift. Objects such as tables, indexes, triggers, and sequences are generally easy to convert to the new platform. Others, such as packages, procedures, and functions, require manual conversion and testing. DB2 database features had to be checked for AWS Redshift equivalents, and we needed to determine which components of the application would have to be modified to support the new database platform.
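
To make the assessment concrete, here is a minimal sketch of the kind of object inventory we start from. It assumes a DB2 LUW catalog, the ibm_db Python driver, and a placeholder schema called DW; the connection details are hypothetical.

```python
# Illustrative only: count DB2 objects by type to gauge conversion effort.
# Connection string, schema, and credentials are placeholders.
import ibm_db

conn = ibm_db.connect(
    "DATABASE=dwprod;HOSTNAME=db2.example.com;PORT=50000;"
    "PROTOCOL=TCPIP;UID=assessor;PWD=secret;", "", "")

INVENTORY_SQL = """
SELECT 'TABLE' AS objtype, COUNT(*) AS cnt
  FROM SYSCAT.TABLES WHERE TABSCHEMA = 'DW' AND TYPE = 'T'
UNION ALL
SELECT 'INDEX',     COUNT(*) FROM SYSCAT.INDEXES   WHERE TABSCHEMA = 'DW'
UNION ALL
SELECT 'TRIGGER',   COUNT(*) FROM SYSCAT.TRIGGERS  WHERE TABSCHEMA = 'DW'
UNION ALL
SELECT 'SEQUENCE',  COUNT(*) FROM SYSCAT.SEQUENCES WHERE SEQSCHEMA = 'DW'
UNION ALL
SELECT 'PROCEDURE', COUNT(*) FROM SYSCAT.ROUTINES  WHERE ROUTINESCHEMA = 'DW' AND ROUTINETYPE = 'P'
UNION ALL
SELECT 'FUNCTION',  COUNT(*) FROM SYSCAT.ROUTINES  WHERE ROUTINESCHEMA = 'DW' AND ROUTINETYPE = 'F'
"""

stmt = ibm_db.exec_immediate(conn, INVENTORY_SQL)
row = ibm_db.fetch_assoc(stmt)
while row:
    # Tables, indexes, triggers and sequences convert easily;
    # procedures and functions drive the manual-effort estimate.
    print(row["OBJTYPE"], row["CNT"])
    row = ibm_db.fetch_assoc(stmt)
```

The split between plain storage objects and stored routines gives an early read on how much manual conversion effort to budget for.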

Database Object Conversion

Once all potential issues and possible delays have been identified, the database objects are converted and created in the target system. We build this baseline using a number of tools that save us considerable time. Difficult features such as packages, procedures, and functions are not included in this step.

Partitioning and De-Normalization of Tables

Our next crucial step was to review the existing normalized data structure and determine the best partitioning (distribution) keys to use in Redshift. Generally, the largest dimension table joined to each fact table determined the partitioning key. Tables with only a few records were copied to every node in Redshift. Proper partitioning led to greater co-location of data, reduced inter-node traffic, and an overall more performant solution.
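
As an illustration of the resulting table design (cluster, schema, table, and column names are hypothetical), a large fact table is distributed on its join column to the biggest dimension, while a small reference table uses DISTSTYLE ALL so a full copy sits on every node. Here the DDL is issued through the Redshift Data API.

```python
# Illustrative DDL only; cluster, schema, table, and column names are placeholders.
import boto3

rsd = boto3.client("redshift-data")

FACT_DDL = """
CREATE TABLE dw.fact_sales (
    sale_id     BIGINT,
    customer_id BIGINT,          -- join key to the largest dimension
    sale_ts     TIMESTAMP,
    amount      DECIMAL(18,2)
)
DISTSTYLE KEY
DISTKEY (customer_id)            -- co-locates fact rows with dim_customer rows
SORTKEY (sale_ts);               -- speeds up date-range restricted queries
"""

SMALL_DIM_DDL = """
CREATE TABLE dw.dim_region (
    region_id   INTEGER,
    region_name VARCHAR(64)
)
DISTSTYLE ALL;                   -- small table replicated to every node
"""

for ddl in (FACT_DDL, SMALL_DIM_DDL):
    rsd.execute_statement(
        ClusterIdentifier="dw-redshift", Database="dw", DbUser="etl", Sql=ddl)
```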

Conversion of Packages, Procedures and Functions

When all baseline objects are created, manual work begins on stored code: procedures, packages, and functions. From our extensive experience with this task, we have built our own list of best practices to ensure a successful code migration. Full testing of these routines is done with data that we supply. To avoid burdening our customers, we take the extra time to examine the table data and fully understand the code so that we can create proper test cases.
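
The sketch below shows the flavor of this work with a made-up routine: a DB2 procedure that refreshes a daily summary is rewritten as a Redshift stored procedure in PL/pgSQL, created through the Redshift Data API, and then exercised with a test call so its output can be compared against the DB2 original.

```python
# Hypothetical converted routine; names, logic, and cluster details are illustrative.
import boto3

rsd = boto3.client("redshift-data")

CONVERTED_PROC = """
CREATE OR REPLACE PROCEDURE dw.refresh_daily_totals(p_day DATE)
AS $$
BEGIN
    -- replace the summary row for the requested day
    DELETE FROM dw.daily_totals WHERE sale_day = p_day;

    INSERT INTO dw.daily_totals (sale_day, total_amount)
    SELECT p_day, SUM(amount)
    FROM dw.fact_sales
    WHERE sale_ts::DATE = p_day;
END;
$$ LANGUAGE plpgsql;
"""

rsd.execute_statement(
    ClusterIdentifier="dw-redshift", Database="dw", DbUser="etl",
    Sql=CONVERTED_PROC)

# A test case then calls the procedure and the results are compared
# against the output of the original DB2 routine for the same day.
rsd.execute_statement(
    ClusterIdentifier="dw-redshift", Database="dw", DbUser="etl",
    Sql="CALL dw.refresh_daily_totals('2019-06-01'::DATE);")
```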

Load Data Using the Partitioning Scheme and Test

Next, large datasets were loaded into Redshift using the tables and partitioning scheme we had created. Numerous queries were collected and tested in AWS Redshift for performance and response time. Queries that processed faster than on the source DB2 system were left alone; those with equal or worse response times were tuned.
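
A simplified version of the bulk load is sketched below, assuming gzipped CSV extracts already staged on S3; the bucket, IAM role, cluster, and table names are placeholders.

```python
# Illustrative bulk load via the Redshift COPY command; all names are placeholders.
import boto3

rsd = boto3.client("redshift-data")

COPY_SQL = """
COPY dw.fact_sales
FROM 's3://dw-migration-extracts/fact_sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
FORMAT AS CSV
GZIP
TIMEFORMAT 'auto'
COMPUPDATE ON;                  -- let Redshift choose column encodings on first load
"""

rsd.execute_statement(
    ClusterIdentifier="dw-redshift", Database="dw", DbUser="etl", Sql=COPY_SQL)

# The collected workload queries are then re-run and timed against DB2 baselines;
# anything slower gets tuned (sort/dist keys, ANALYZE, VACUUM) before sign-off.
```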

Develop Near Real-Time Processing of Data

Once the partitioning scheme and query response times met all SLAs, it was time to work on near real-time data ingestion. Within Redshift, temporary staging tables were created to ingest data without affecting the historical tables. AWS EMR ran a MapReduce process that produced data files landing on S3. Each new file on S3 triggered an event that invoked a Lambda function. The Lambda function initiated the load into Redshift and ran further processing on the Redshift tables: windowing queries that merged data from the temporary ingest tables into the historical tables. Once the near real-time streaming process could ingest data and run this processing every minute, it met the SLAs, and we were able to harden the process and prepare to move it to production.
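
Below is a minimal sketch of that ingestion path, assuming S3 "object created" notifications and the Redshift Data API; the cluster, table, bucket, and role names are hypothetical and the merge logic is intentionally simplified.

```python
# Sketch of the ingestion Lambda: load each new EMR output file into a staging
# table, merge the newest version of each row into history, then clear the stage.
# All names are placeholders.
import boto3

rsd = boto3.client("redshift-data")

DEDUPE_DELETE = """
DELETE FROM dw.fact_sales
USING dw.stage_sales s
WHERE dw.fact_sales.sale_id = s.sale_id;
"""

INSERT_LATEST = """
INSERT INTO dw.fact_sales
SELECT sale_id, customer_id, sale_ts, amount
FROM (SELECT *, ROW_NUMBER() OVER
          (PARTITION BY sale_id ORDER BY sale_ts DESC) AS rn
      FROM dw.stage_sales) t
WHERE rn = 1;
"""

def handler(event, context):
    """Invoked by the S3 event fired for each new EMR output file."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        copy_sql = f"""
            COPY dw.stage_sales
            FROM 's3://{bucket}/{key}'
            IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
            FORMAT AS CSV GZIP;
        """

        # BatchExecuteStatement runs the statements in order in one transaction,
        # so readers of the historical table never see a half-merged batch.
        rsd.batch_execute_statement(
            ClusterIdentifier="dw-redshift",
            Database="dw",
            DbUser="etl",
            Sqls=[copy_sql, DEDUPE_DELETE, INSERT_LATEST,
                  "DELETE FROM dw.stage_sales;"],
        )
```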

Determine Which Data Is Better Suited for Redshift Spectrum

Next, we needed to determine which data was better kept out of Redshift, on S3, and accessed through Redshift Spectrum. We found that Redshift Spectrum was significantly slower than native Redshift storage, so all data that was older than three years and infrequently used was extracted to S3 and made available through Redshift Spectrum. An automated process was also created to keep enforcing this three-year cut-off.
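
A sketch of that archival job is shown below; the external schema, bucket, IAM role, and cut-off column are placeholders. Rows older than three years are unloaded to S3 as Parquet and then removed from the local table.

```python
# Illustrative archival to Redshift Spectrum; all names are placeholders.
import boto3

rsd = boto3.client("redshift-data")

ARCHIVE_SQLS = [
    # one-time setup: external schema backed by the AWS Glue Data Catalog
    """CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_dw
       FROM DATA CATALOG DATABASE 'spectrum_dw'
       IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum'
       CREATE EXTERNAL DATABASE IF NOT EXISTS;""",
    # push rows older than three years out to S3 as Parquet
    """UNLOAD ('SELECT * FROM dw.fact_sales
                WHERE sale_ts < DATEADD(year, -3, GETDATE())')
       TO 's3://dw-archive/fact_sales/'
       IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum'
       FORMAT AS PARQUET;""",
    # then drop them from the local (hot) table
    """DELETE FROM dw.fact_sales
       WHERE sale_ts < DATEADD(year, -3, GETDATE());""",
]

for sql in ARCHIVE_SQLS:
    rsd.execute_statement(
        ClusterIdentifier="dw-redshift", Database="dw", DbUser="etl", Sql=sql)
```

The unloaded Parquet files are then registered as an external table in the spectrum_dw schema (for example via CREATE EXTERNAL TABLE or a Glue crawler) so the cold data remains queryable from Redshift, and the job is scheduled to keep enforcing the three-year cut-off.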

Application Conversion Effort

After the data was loaded and the near real-time ingestion solution was hardened and approved, we assisted the application team in converting the DB2 SQL statements found in application code to AWS Redshift SQL. This is a largely manual effort and takes time working alongside the internal application teams.
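
A hypothetical before/after pair illustrates the kind of rewrite involved; the query itself is made up, but the differences it shows (DB2 labeled durations and FETCH FIRST versus Redshift's DATEADD and LIMIT) are typical of what the application team works through.

```python
# Illustrative only: the same report query in DB2 dialect and in Redshift dialect.

DB2_SQL = """
SELECT customer_id, SUM(amount) AS total
FROM dw.fact_sales
WHERE sale_ts > CURRENT TIMESTAMP - 30 DAYS   -- DB2 labeled duration
GROUP BY customer_id
ORDER BY total DESC
FETCH FIRST 10 ROWS ONLY                      -- DB2 row limiting
"""

REDSHIFT_SQL = """
SELECT customer_id, SUM(amount) AS total
FROM dw.fact_sales
WHERE sale_ts > DATEADD(day, -30, GETDATE())  -- Redshift date arithmetic
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10                                      -- Redshift row limiting
"""
```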

Cut-Over, Rollback Plans and Hypercare

At this point we are ready to cut over our first system, which is usually DEV. Cut-over plans and rollback options are discussed and agreed upon, and the same plan is executed again for QA and PROD. Production is the third pass through the cut-over process, and it generally goes seamlessly.

Services

Cloud Consulting Services
Advisory Services
Project Management

Technologies

AWS EC2
AWS Lambda
IBM DB2
Terraform
Ansible
Liquibase
Direct Connect
AWS VPN
AWS DMS
AWS SCT