Migrating different database tables into Databricks

This blog demonstrates how to migrate data from different databases to Databricks.

Migrating a Snowflake database table into Databricks: 

The steps below describe how to migrate data from a Snowflake table to a Databricks table.

Step 1:

Create a new Snowflake account or use an existing one.

Step 2:

Create a table in Snowflake.

				
    CREATE OR REPLACE TABLE MAIN.CITY (
        CITY_ID BIGINT,
        CITYNAME STRING,
        PINCODE INT,
        COUNTRY STRING
    );

The query below is used to insert values into the table:

				
    INSERT INTO MAIN.CITY VALUES
        (10020,'NEW DELHI',110001,'INDIA'),
        (10021,'MUMBAI',400001,'INDIA'),
        (10022,'CHENNAI',600028,'INDIA'),
        (10023,'LONDON',400706,'UNITED KINGDOM'),
        (10024,'PARIS',403101,'FRANCE'),
        (10025,'DUBAI',600017,'UNITED ARAB EMIRATES'),
        (10026,'SINGAPORE',395004,'SOUTHEAST ASIA'),
        (10027,'NEW YORK',360005,'UNITED STATES OF AMERICA'),
        (10028,'TOKYO',533101,'JAPAN');

To view the table, use the select query below in Snowflake:

				
    SELECT * FROM MAIN.CITY;

Source Table (Snowflake)

Step 3:

Provide the username, URL (hostname), password, warehouse name, and database name to connect Snowflake to Databricks.

				
					url_1="hostname"
user="snowflake_username"
password="snowflake_password"
database="database_name"
sfWarehouse="warehouse_name"

sfOptions = {
"sfUrl": url_1,
"sfUser": user,
"sfPassword": password,
"sfDatabase": database,
"sfWarehouse": sfWarehouse
}

				
			

Note:

sfUrl, sfUser, sfPassword, and sfDatabase must be provided; sfWarehouse is optional.
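Rather than hard-coding credentials in the notebook, they can be pulled from a Databricks secret scope. A minimal sketch is shown below; the scope name snowflake_creds and the keys sf_user and sf_password are assumed placeholders, not values from this walkthrough.

    # Minimal sketch: pull Snowflake credentials from a Databricks secret scope
    # instead of hard-coding them. "snowflake_creds", "sf_user" and "sf_password"
    # are placeholder names - use the scope and keys configured in your workspace.
    # The returned values slot into sfOptions exactly as above.
    user = dbutils.secrets.get(scope="snowflake_creds", key="sf_user")
    password = dbutils.secrets.get(scope="snowflake_creds", key="sf_password")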

Step 4:

As shown below, the select query is saved in a string variable. The query is then run against Snowflake and the result is stored in a DataFrame using Spark's Snowflake connector.

				
    query = "SELECT * FROM MAIN.CITY"
    df = (spark.read.format("net.snowflake.spark.snowflake")
          .options(**sfOptions)
          .option("query", query)
          .load())

show() is used to display the contents of the DataFrame.

				
    df.show()
Data Stored In a Dataframe
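Before the DataFrame is written out, the column types that Spark inferred from the Snowflake result can be checked with printSchema():

    # Print each column name and the type Spark inferred for it.
    df.printSchema()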

saveAsTable() is used below to save the contents of the DataFrame as a Databricks table:

				
    df.write.saveAsTable("main.city")

If no schema is provided, the table is stored in the default schema; if a schema is provided, the table is stored in that schema.
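Note that saveAsTable() raises an error if the target table already exists; a write mode can be set to control this behaviour, for example:

    # Overwrite the Databricks table if it already exists; use "append" to add rows instead.
    df.write.mode("overwrite").saveAsTable("main.city")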

To view the schema of the table, the command below can be used in Databricks SQL.

				
    DESC TABLE MAIN.CITY;

The schema of the table is shown below.

To query the table, use the command below in Databricks SQL.

				
    SELECT * FROM MAIN.CITY;
Databricks Table

Migrating an Oracle database table into Databricks: 

The steps below describe how to migrate data from an Oracle table to a Databricks table.

Step 1:

Create a table in the Oracle database.

				
    CREATE TABLE MAIN.LAPTOPS (
        LAPTOP_ID INT,
        LAPTOP_NAME VARCHAR(40),
        SPECIFICATION VARCHAR(70),
        OS VARCHAR(30),
        PRICE_IN_RUPEES INT
    );

The query below is used to insert values into that table:

				
    INSERT ALL
        INTO MAIN.LAPTOPS VALUES(1,'Apple 16-Inch MacBook Pro M1 Max','10-core | 3.1 GHz Processor','Mac Monterey OS',309490)
        INTO MAIN.LAPTOPS VALUES(2,'Lenovo Yoga 9i','11th Gen Intel Core i7-1185G7 | 3.00 GHz processor','Windows 10 Home OS',164686)
        INTO MAIN.LAPTOPS VALUES(3,'Dell New XPS 13 Plus','12th Gen Intel EVO Core i7-1260P | 3.40 GHz Processor','Windows 11 OS',215000)
        INTO MAIN.LAPTOPS VALUES(4,'Apple MacBook Air M2','Apple M2 | NA Processor','iOS OS',139390)
        INTO MAIN.LAPTOPS VALUES(5,'Acer Nitro 5','12th Gen Intel Core i7-12700H | 4.70 GHz processor','Windows 11 Home OS',101000)
        INTO MAIN.LAPTOPS VALUES(6,'Lenovo IdeaPad Gaming 3i','12th Gen Intel Core i5-12450H | 2.0 GHz processor','Windows 11 Home OS',79900)
        INTO MAIN.LAPTOPS VALUES(7,'Acer Swift X','AMD Ryzen 7-5800U | 1.9 GHz processor','Windows 11 Home OS',99999)
    SELECT * FROM DUAL;

To view the records in the table, use the select query below in the Oracle database:

				
    SELECT * FROM MAIN.LAPTOPS;
Source Table (Oracle)

Step 2:

To install the oracledb module, run the pip command below.

				
    pip install oracledb
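In a Databricks notebook, the module can also be installed for the current session with the %pip magic command:

    %pip install oracledb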

Step 3:

Import the module and provide the Oracle database credentials as shown below:

				
    import oracledb

    user = 'oracle_username'
    password = 'oracle_password'
    port = 'port_no'            # default: 1521
    service_name = 'ORCL'
    host_name = 'host_name of oracle database'

    # Easy Connect string in the form host:port/service_name
    conn_string = "{0}:{1}/{2}".format(host_name, port, service_name)
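As an alternative to formatting the string by hand, python-oracledb also exposes a makedsn() helper that builds a connect descriptor for the same host, port, and service name; a minimal sketch using the placeholder values defined above:

    # Build a connect descriptor with the oracledb helper instead of string formatting.
    # host_name, port and service_name are the placeholder values from the step above.
    conn_string = oracledb.makedsn(host_name, port, service_name=service_name)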
				
			

Step 4:

Establish a connection to the database using oracledb.connect(), create a cursor with cursor(), and execute a SQL query with execute().

				
    connection = oracledb.connect(user=user, password=password, dsn=conn_string)
    cursor = connection.cursor()
    query = cursor.execute("SELECT * FROM MAIN.LAPTOPS")
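cursor.execute() returns the cursor itself for a SELECT, so it can be iterated directly. For large tables the rows can instead be pulled in batches with fetchmany(); a small sketch follows, where the batch size of 1000 is only an example. If rows are collected this way, pass rows rather than the cursor to createDataFrame in Step 6.

    # Optional: fetch the result set in batches instead of iterating the cursor directly.
    rows = []
    while True:
        batch = cursor.fetchmany(1000)  # example batch size
        if not batch:
            break
        rows.extend(batch)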
				
			

Step 5:

To extract the column names, use cursor.description as shown below.

				
    column_names = [i[0] for i in cursor.description]
    print(column_names)

The column_names variable contains the table's column names as a list.

Column Names Stored In a List
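For the LAPTOPS table created above, the printed list looks like this:

    ['LAPTOP_ID', 'LAPTOP_NAME', 'SPECIFICATION', 'OS', 'PRICE_IN_RUPEES']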

Step 6:

Create a DataFrame from the query results and the column_names variable.

				
    table_df = spark.createDataFrame(query, column_names)
    table_df.show()
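With only column names supplied, Spark infers each column's type from the fetched rows. To pin the types down explicitly, a StructType schema can be passed instead; a minimal sketch follows, in which the chosen Spark types are assumptions and may need adjusting to match how the driver returns Oracle NUMBER columns.

    from pyspark.sql.types import StructType, StructField, LongType, StringType

    # Explicit schema instead of letting Spark infer types from the rows.
    # LongType for the numeric columns is an assumption; adjust it if the driver
    # returns these values as Decimal or float.
    laptops_schema = StructType([
        StructField("LAPTOP_ID", LongType()),
        StructField("LAPTOP_NAME", StringType()),
        StructField("SPECIFICATION", StringType()),
        StructField("OS", StringType()),
        StructField("PRICE_IN_RUPEES", LongType()),
    ])

    table_df = spark.createDataFrame(query, laptops_schema)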
				
			

Step 7:

Create a table with saveAsTable(). If no schema is provided, the table is stored in the default schema.

				
    table_df.write.saveAsTable("laptops")

If a schema is provided, the table is stored in that schema.

				
    table_df.write.saveAsTable("main.laptops")

The schema of the table and the Databricks table are shown below.
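They can be checked with the same commands used in the Snowflake section, run here through spark.sql() from the notebook:

    # Inspect the schema of the new Databricks table and preview its rows.
    spark.sql("DESC TABLE main.laptops").show()
    spark.sql("SELECT * FROM main.laptops").show()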

Schema Of The Databricks Table
Databricks Table

Conclusion:

By following these steps, tables from the Snowflake and Oracle databases are migrated into Databricks tables.
