Migrating Teradata to BigQuery

Nilesh Khandalkar
3 min readNov 16, 2023

This blog will give a quick summary for the process of transferring both schema and data from a Teradata data warehouse to BigQuery.

There are two methods of migrating a Teradata data warehouse to Google Cloud BigQuery, they are the TPT — Teradata Parallel Transporter and DTS — BigQuery Data Transfer Service (Check my earlier blog to learn more on DTS https://medium.com/codex/gcp-transfer-options-11ea963dfb79)

Before you begin, you typically have to look at the Teradata data warehouse architecture and then contrast it with the post-migration BigQuery architecture to create a reference architecture diagram to show that both data warehouses offer similar functionality and paths both can ingest.

Teradata parallel transporter or TPT export utility, you can use this tool to manually migrate your Teradata schema and data to BigQuery, however Google Cloud provides Cloud data transfer service allowing you to simplify and accelerate the migration process.

TPT export utility gives you complete control over the migration process but in doing so the migration becomes a high touch manual process, the TPT export utility stores the exported data on your computer’s local storage, you must then copy the exported data to a cloud storage bucket (GCS) within your Google Cloud project. More on GCS: https://medium.com/@nileshk611/google-cloud-storage-life-cycle-management-26a680cb6df4

Alternatively Data transfer service is a managed service provided by Google Cloud that allows you to automate and schedule data transfers from external sources into BigQuery. To use Data transfer service to migrate Teradata, you must first export the data using the TPT export utility then Data transfer service automatically stages the data for you in a cloud storage bucket of your choice. Using the user-friendly graphical interface you can easily set up recurring transfers of data to BigQuery using a variety of connectors and data sources. Overall data transfer service simplifies the process of getting data into BigQuery providing an automated and scalable way to transfer data from various sources. Regardless of which method you choose when the migration is complete the result should be the same.

Migrating using TPT first you will use the TPT export utility to export the data to your local storage, then you must upload the exported data to a cloud storage bucket (GCS). Then from GCS, there are several ways to load the data to BigQuery using Dataflow, Data Fusion, BQ Load or using python scripts which can be scheduled via Airflow. Depending on volume of data, you may consider having one or more TPT scripts, for copying the Teradata export files to GCS, you may install Google Cloud SDK and use gsutil commands to upload to GCS.

Migrating using DTS from Teradata to BigQuery first download the migration agent and next configure a transfer job in the BigQuery data transfer service, then run the transfer job to copy table schema and data from your Teradata data warehouse to BigQuery and finally monitor the transfer jobs in Google Cloud console to ensure a successful Teradata data warehouse migration. The agent executes a TPT extraction script producing a set of pipe delimited files and then it uploads it to the GCS. Agent then notifies BigQuery DTS through Pub Sub which triggers the DTS job to create tables and load the data into the BigQuery from GCS.

The BigQuery Data Transfer Service provides automatic schema detection and data type mapping during a data transfer from Teradata to BigQuery. Optionally, you can specify a custom schema file instead.

Finally to validate the data transfer between Teradata and in BigQuery the most direct means of validating the success of your migration is to run a set of queries on the source and destination systems, you then compare the results between the systems.

--

--

Nilesh Khandalkar

Passionate about Data and Cloud, working as Data Engineering Manager at Capgemini UK. GCP Professional Data Engineering Certified Airflow Fundamentals Certified