Cloud Computing - Sqoop


Lesson Description

Lesson #791 - Sqoop Import

This section describes how to import data from a MySQL database into Hadoop HDFS. The 'import' tool imports individual tables from an RDBMS into HDFS. Each row in a table is treated as a record in HDFS. All records are stored as text data in text files, or as binary data in Avro and Sequence files.


Below is the syntax for the Sqoop import command.
$ sqoop import (generic-args) (import-args)
$ sqoop-import (generic-args) (import-args)


Let us take the example of three tables named emp, emp_add, and emp_contact, which are in a database called userdb on a MySQL database server. Sample data from these tables appears in the verification output below.




Importing a Table

The Sqoop 'import' tool is used to import table data from the table into the Hadoop file system as a text file or a binary file. The following command is used to import the emp table from the MySQL database server to HDFS.
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp --m 1
To verify the imported data in HDFS, use the following command.
$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*
It shows you the emp table data, with fields separated by a comma (,).
 102,laxmi,python dev,60000,TP     
 103,roopa,.NET dev,40000,TP    
 104,deepa,.NET dev,30000,AC    
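The default text format simply joins each row's column values with a comma delimiter, one row per line in the part file. A minimal Python sketch of that serialization, using the emp rows shown above (the tuple layout is an assumption inferred from the output):

```python
# Sketch of Sqoop's default text output: each table row becomes one
# comma-delimited line in an HDFS part-m-* file.
rows = [
    (102, "laxmi", "python dev", 60000, "TP"),
    (103, "roopa", ".NET dev", 40000, "TP"),
    (104, "deepa", ".NET dev", 30000, "AC"),
]

def to_text_record(row):
    # Join every column value with the default ',' field delimiter.
    return ",".join(str(col) for col in row)

for r in rows:
    print(to_text_record(r))
```

Delimiters are configurable in Sqoop (e.g. via `--fields-terminated-by`); the comma shown here is only the default.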

Importing into Target Directory

We can specify the target directory while importing table data into HDFS using the Sqoop import tool. The following is the syntax to specify the target directory as an option to the Sqoop import command.
--target-dir <new or exist directory in HDFS>
The following command is used to import emp_add table data into the '/queryresult' directory.
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp_add \
--m 1 \
--target-dir /queryresult
The following command is used to verify the imported data in the /queryresult directory, from the emp_add table.
$ $HADOOP_HOME/bin/hadoop fs -cat /queryresult/part-m-*
101, 28A, vgiri,   jublee
102, 18I, aoc,     sec-bad
103, 14Z, pgutta,  hyd
104, 78C,  oldcity, sec-bad
105, 72x, hitech,  sec-bad

Import Subset of Table Data

We can import a subset of a table using the 'where' clause in the Sqoop import tool. It executes the corresponding SQL query on the respective database server and stores the result in a target directory in HDFS. The syntax for the where clause is as follows.
--where <condition>
The following command is used to import a subset of emp_add table data. The subset query retrieves the employee id and address of employees who live in Secunderabad city.
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp_add \
--m 1 \
--where "city ='sec-bad'" \
--target-dir /wherequery
The following command is used to verify the imported data in the /wherequery directory from the emp_add table.
$ $HADOOP_HOME/bin/hadoop fs -cat /wherequery/part-m-*
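Conceptually, the --where option appends the given condition to the generated SELECT statement, so only matching rows are written to HDFS. A minimal Python sketch of that filtering over the emp_add rows shown earlier (the column positions are assumptions based on that output):

```python
# emp_add rows from the example above: (id, hno, street, city)
emp_add = [
    (101, "28A", "vgiri",   "jublee"),
    (102, "18I", "aoc",     "sec-bad"),
    (103, "14Z", "pgutta",  "hyd"),
    (104, "78C", "oldcity", "sec-bad"),
    (105, "72x", "hitech",  "sec-bad"),
]

def where_city(rows, city):
    # Equivalent of: SELECT * FROM emp_add WHERE city = '<city>'
    return [r for r in rows if r[3] == city]

subset = where_city(emp_add, "sec-bad")
print([r[0] for r in subset])  # ids of employees in Secunderabad
```

With the command above, only the three sec-bad rows (ids 102, 104, 105) would appear in the /wherequery part files.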

Incremental Import

Incremental import is a technique that imports only the newly added rows in a table. It is required to add the 'incremental', 'check-column', and 'last-value' options to perform the incremental import. The following syntax is used for the incremental option in the Sqoop import command.
--incremental <mode>
--check-column <column name>
--last-value <last check column value>
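In append mode, Sqoop selects only the rows whose check-column value is greater than the supplied last-value. A minimal Python sketch of that selection logic; the newly added row here is hypothetical:

```python
# Sketch of incremental append: import only rows where the check
# column (here, the id) is greater than the --last-value argument.
emp = [
    (102, "laxmi"),
    (103, "roopa"),
    (104, "deepa"),
    (105, "newhire"),  # hypothetical row added since the last import
]

def incremental_append(rows, last_value):
    # Equivalent of the generated condition: WHERE id > <last-value>
    return [r for r in rows if r[0] > last_value]

new_rows = incremental_append(emp, 104)
# The maximum check-column value seen becomes the last-value to
# supply on the next incremental run.
next_last_value = max(r[0] for r in new_rows)
```

Running with last-value 104 picks up only the hypothetical row 105; the next run would then pass 105 as the new last-value.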