Refactoring IBM DB2 to AWS Redshift
In 2019, our client asked Autoverse to modernize their DB2 data warehouse by migrating it to AWS Redshift. This migration carried extra complexity because we were moving from one database system to another, and from a transactional processing (OLTP) system to an analytical processing (OLAP) system. On top of this doubly heterogeneous migration, the solution needed a near real-time process to ingest data from EMR. Six months later the solution was complete. Read on to find out how we did it!
Our initial step was to assess all the objects in the database to determine how difficult each would be to migrate to AWS Redshift. Objects such as tables, indexes, triggers and sequences are generally easy to convert to the new platform. Other objects such as packages, procedures and functions require manual conversion and testing. DB2 database features need to be checked for AWS Redshift equivalents, and we needed to determine which components of the application would have to be modified in order to support the new database platform.
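To make that assessment concrete, the object inventory can be pulled straight from the DB2 catalog views. The snippet below is a minimal sketch assuming a DB2 LUW source and the ibm_db Python driver; the connection string and the 'DW' schema name are placeholders.

```python
import ibm_db

# Placeholder connection string -- replace host, database and credentials.
conn = ibm_db.connect(
    "DATABASE=DWPROD;HOSTNAME=db2.example.com;PORT=50000;"
    "PROTOCOL=TCPIP;UID=assessor;PWD=secret;", "", "")

# Count objects per category so the conversion effort can be sized:
# tables, triggers and sequences convert easily; routines need manual work.
inventory_sql = """
    SELECT 'TABLE' AS obj_type, COUNT(*) AS cnt
      FROM SYSCAT.TABLES WHERE TABSCHEMA = 'DW'
    UNION ALL
    SELECT 'TRIGGER', COUNT(*)
      FROM SYSCAT.TRIGGERS WHERE TABSCHEMA = 'DW'
    UNION ALL
    SELECT 'SEQUENCE', COUNT(*)
      FROM SYSCAT.SEQUENCES WHERE SEQSCHEMA = 'DW'
    UNION ALL
    SELECT ROUTINETYPE, COUNT(*)
      FROM SYSCAT.ROUTINES WHERE ROUTINESCHEMA = 'DW'
     GROUP BY ROUTINETYPE
"""
stmt = ibm_db.exec_immediate(conn, inventory_sql)
row = ibm_db.fetch_assoc(stmt)
while row:
    print(row["OBJ_TYPE"], row["CNT"])
    row = ibm_db.fetch_assoc(stmt)
```

An inventory like this shows up front which schemas are dominated by procedural code and will therefore drive most of the manual effort.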
Database Object Conversion
Once all potential issues and possible delays have been identified, database objects are converted and created in the target system. We create a baseline of objects using a number of tools that save us time. Difficult features such as packages, procedures and functions are not included in this step.
Partitioning and De-Normalization of tables
Our next crucial step was to look at the current normalized data structure and determine the best partitioning (distribution) keys to use in Redshift. Generally, the largest dimension table associated with the fact table was used to determine the partitioning key. Tables with only a few records were copied to each partition in Redshift. Proper partitioning led to greater co-location of data, decreased inter-node traffic and an overall more performant solution.
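In Redshift terms this partitioning maps onto distribution styles: the key shared between a fact table and its largest dimension becomes the DISTKEY, while small reference tables use DISTSTYLE ALL so a full copy sits on every node. The DDL below is a hypothetical sketch (table and column names are invented), executed here through psycopg2 against the cluster endpoint.

```python
import psycopg2

# Placeholder cluster endpoint and credentials.
conn = psycopg2.connect(
    host="dw-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dw", user="admin", password="secret")

ddl_statements = [
    # Fact and its largest dimension share customer_id, so joining rows
    # land on the same slice and inter-node traffic is minimized.
    """CREATE TABLE dim_customer (
           customer_id   BIGINT NOT NULL,
           customer_name VARCHAR(256)
       ) DISTSTYLE KEY DISTKEY (customer_id) SORTKEY (customer_id);""",
    """CREATE TABLE fact_sales (
           sale_id     BIGINT NOT NULL,
           customer_id BIGINT NOT NULL,
           sale_date   DATE,
           amount      DECIMAL(18,2)
       ) DISTSTYLE KEY DISTKEY (customer_id) SORTKEY (sale_date);""",
    # Small lookup table: replicate to every node instead of distributing.
    """CREATE TABLE dim_currency (
           currency_code CHAR(3),
           description   VARCHAR(64)
       ) DISTSTYLE ALL;""",
]

with conn, conn.cursor() as cur:
    for ddl in ddl_statements:
        cur.execute(ddl)
```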
Conversion of Packages, Procedures and Functions
When all baseline objects have been created, manual work begins on any stored code: procedures, packages and functions. From our extensive experience with this task, we have created our own list of best practices to ensure a successful migration of code. Full testing of these functions is done with data that we supply. To minimize demands on our customers, we take the extra time to review table data and fully understand the code in order to create proper test cases.
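As an illustration of how such testing can be automated, the sketch below runs the same supplied test data through the original DB2 routine and its Redshift rewrite and compares the results. Connection details, the test query and the fiscal_quarter function are all hypothetical.

```python
import ibm_db
import psycopg2

# Placeholder connections -- replace endpoints and credentials.
db2_conn = ibm_db.connect(
    "DATABASE=DWPROD;HOSTNAME=db2.example.com;PORT=50000;"
    "PROTOCOL=TCPIP;UID=tester;PWD=secret;", "", "")
rs_conn = psycopg2.connect(
    host="dw-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dw", user="tester", password="secret")

def db2_rows(sql):
    """Fetch every row DB2 returns for the original query."""
    stmt = ibm_db.exec_immediate(db2_conn, sql)
    rows, row = [], ibm_db.fetch_tuple(stmt)
    while row:
        rows.append(tuple(row))
        row = ibm_db.fetch_tuple(stmt)
    return rows

def redshift_rows(sql):
    """Fetch every row Redshift returns for the converted query."""
    with rs_conn.cursor() as cur:
        cur.execute(sql)
        return [tuple(r) for r in cur.fetchall()]

# Same supplied test data on both sides; the converted function must agree.
test_sql = "SELECT order_id, fiscal_quarter(order_date) FROM test.orders"
assert sorted(db2_rows(test_sql)) == sorted(redshift_rows(test_sql))
```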
Load data using partitioning scheme and test
Next, large datasets were loaded into Redshift using the table and partitioning scheme we created. Numerous queries were collected and tested for performance and query response time in AWS Redshift. If query processing and response time were better than on the source DB2 system, we did not tune those queries. If response time was equal to or worse than on the source DB2 system, those queries were tuned.
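For the loads themselves we relied on Redshift's COPY command reading from S3. The snippet below is a rough sketch of a load-and-benchmark pass; the bucket, IAM role, table names and benchmark queries are placeholders standing in for the real workload.

```python
import time
import psycopg2

conn = psycopg2.connect(
    host="dw-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dw", user="loader", password="secret")

# Placeholder bulk load from S3 into the fact table created earlier.
copy_sql = """
    COPY fact_sales
    FROM 's3://example-dw-migration/extracts/fact_sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

# Placeholder benchmark queries captured from the DB2 workload.
benchmark_queries = [
    "SELECT sale_date, SUM(amount) FROM fact_sales GROUP BY sale_date",
    "SELECT c.customer_name, SUM(f.amount) "
    "FROM fact_sales f JOIN dim_customer c USING (customer_id) "
    "GROUP BY c.customer_name ORDER BY 2 DESC LIMIT 100",
]

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
    for sql in benchmark_queries:
        start = time.monotonic()
        cur.execute(sql)
        cur.fetchall()
        print(f"{time.monotonic() - start:.2f}s  {sql[:60]}")
```

Queries whose timings here came in worse than the DB2 baseline were the ones flagged for tuning.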
Develop Near Real-Time processing of Data
Once the partitioning scheme and query response times met all SLAs, it was time to work on the near real-time data ingestion. Within Redshift, temporary staging tables were created to ingest data without affecting the historical tables. AWS EMR would run a MapReduce process that created data files, which landed on S3. Each new file created on S3 would trigger an event that called a Lambda function. The Lambda function would initiate the load into Redshift and run further processing on the Redshift tables. This further processing consisted of windowing queries that merged data from the temporary ingest tables into the historical tables. Once the near real-time streaming process was able to ingest data and run processing every minute, it met SLAs, and we were able to productionize the process and prepare to move it to production.
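A simplified sketch of that trigger path is shown below, assuming the Lambda function is subscribed to the bucket's object-created events and talks to the cluster through the boto3 Redshift Data API. The cluster identifier, secret, table names and merge query are illustrative, not the client's actual code.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Placeholder identifiers -- replace with your cluster, database and secret.
CLUSTER_ID = "dw-cluster"
DATABASE = "dw"
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-loader"
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy-role"

def handler(event, context):
    """Fired by S3 object-created events for files written by the EMR job."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        redshift_data.batch_execute_statement(
            ClusterIdentifier=CLUSTER_ID,
            Database=DATABASE,
            SecretArn=SECRET_ARN,
            Sqls=[
                # 1. Land the new file in the temporary ingest table.
                f"COPY staging_sales FROM 's3://{bucket}/{key}' "
                f"IAM_ROLE '{IAM_ROLE}' FORMAT AS PARQUET;",
                # 2. Merge the latest version of each row into the
                #    historical table using a window function.
                """INSERT INTO fact_sales
                   SELECT sale_id, customer_id, sale_date, amount
                   FROM (
                       SELECT s.*, ROW_NUMBER() OVER (
                           PARTITION BY sale_id ORDER BY load_ts DESC) AS rn
                       FROM staging_sales s
                   ) dedup
                   WHERE rn = 1;""",
                # 3. Clear the ingest table for the next micro-batch.
                "TRUNCATE staging_sales;",
            ],
        )
```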
Determine which data is better suited for Redshift Spectrum
Next, we needed to determine which data was better suited to be kept out of Redshift and on S3, accessed through Redshift Spectrum. We determined that AWS Redshift Spectrum was significantly slower than AWS Redshift itself. For this reason, all data that was older than three years and infrequently used was extracted to S3 and made available through Redshift Spectrum. An automated process to keep up with this three-year timeline was also created.
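In practice that offloading can be scripted as an UNLOAD of the aged rows to S3, an external (Spectrum) schema over the archive location, and a delete from the cluster-resident table. The sketch below is a simplified, hypothetical version of such a job; the bucket, IAM role and table names are placeholders, and the external table definitions over the archive prefix are assumed to be maintained separately (for example through the Glue Data Catalog).

```python
import psycopg2

conn = psycopg2.connect(
    host="dw-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dw", user="archiver", password="secret")
conn.autocommit = True  # external DDL and UNLOAD run outside a transaction

IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-spectrum-role"

statements = [
    # One-time setup: external schema backed by the Glue Data Catalog.
    f"""CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
        FROM DATA CATALOG DATABASE 'dw_archive'
        IAM_ROLE '{IAM_ROLE}'
        CREATE EXTERNAL DATABASE IF NOT EXISTS;""",
    # Recurring step: push rows older than three years out to S3...
    f"""UNLOAD ('SELECT * FROM fact_sales
                 WHERE sale_date < DATEADD(year, -3, CURRENT_DATE)')
        TO 's3://example-dw-archive/fact_sales/'
        IAM_ROLE '{IAM_ROLE}'
        FORMAT AS PARQUET;""",
    # ...then drop them from the local table so the cluster stays lean.
    "DELETE FROM fact_sales WHERE sale_date < DATEADD(year, -3, CURRENT_DATE);",
]

with conn.cursor() as cur:
    for sql in statements:
        cur.execute(sql)
```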
Application Conversion Effort
After the data was loaded and the near real-time ingestion solution was hardened and approved, we assisted the application team in converting the DB2 SQL statements found in application code to AWS Redshift SQL. This is generally a manual effort and can take time working with internal application teams.
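Much of that conversion is mechanical dialect translation, for instance DB2's FETCH FIRST n ROWS ONLY, CURRENT DATE, and SYSIBM.SYSDUMMY1 idioms. The sketch below shows the kind of helper one might use for a first automated pass over application SQL; the rules are illustrative and every rewritten statement still needs review by hand.

```python
import re

# Illustrative DB2-to-Redshift rewrites; these only catch the most
# mechanical differences between the two SQL dialects.
REWRITE_RULES = [
    (re.compile(r"FETCH\s+FIRST\s+(\d+)\s+ROWS?\s+ONLY", re.I), r"LIMIT \1"),
    (re.compile(r"CURRENT\s+DATE\b", re.I), "CURRENT_DATE"),
    (re.compile(r"CURRENT\s+TIMESTAMP\b", re.I), "GETDATE()"),
    (re.compile(r"\bFROM\s+SYSIBM\.SYSDUMMY1\b", re.I), ""),
]

def rewrite(sql: str) -> str:
    """Apply each rewrite rule in turn to a single DB2 SQL statement."""
    for pattern, replacement in REWRITE_RULES:
        sql = pattern.sub(replacement, sql)
    return sql.strip()

print(rewrite("SELECT CURRENT DATE FROM SYSIBM.SYSDUMMY1"))
# SELECT CURRENT_DATE
print(rewrite("SELECT * FROM orders ORDER BY order_ts DESC FETCH FIRST 5 ROWS ONLY"))
# SELECT * FROM orders ORDER BY order_ts DESC LIMIT 5
```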
Cut-Over, Rollback Plans and Hypercare
At this point we are ready to cut over our first system, which is usually DEV. Cut-over plans and rollback options are discussed and agreed on. The same cut-over plan is executed again as we do QA and PROD. Production is the third time going through the cut-over process and generally tends to go seamlessly.