challa.net | Tech weblog | Lyrics
Sqoop uses JDBC API
Split column:
Sqoop will first try to identify a split column (typically primarykey column).
Sqoop uses split column to divide the recordset with number of mappers available. Typically primarykey column is the best candidate for this purpose.
Sqoop will apply min & max functions on the primary key column > using this info , it will try to generated separate mappers based on a range.
Bounding Values Query:
Select min(orderid), max(orderid) from orders;
Say, Min = 1 & Max = 1000
By default it uses 4 mappers, for each mapper:
Select * from orders where orderid between 1 and 250;
Select * from orders where orderid between 251 and 500;
Select * from orders where orderid between 501 and 750;
Select * from orders where orderid between 751 and 1000;
$ sqoop import --connect jdbc:mysql://servername:####/databasename \
> --username sqoop \
> --password sqoop \ [or] -P \
> --table employee \
> --target-dir /user/shiva/mydata/employee \
default is user's home hdfs:/user/username/
Absolute path /user/shiva/mydata/employee
Relative path mydata/employee - in relation to the running user
> --hive-targe
> -m <integer>
# map tasks to be used to import data.
Ex: -m 1.
Default is 4 map tasks
>
Using Sqoop to import data from SQL server
sqoop list-databases --connect jdbc:sqlserver://<IP Address>:<Port Number> --username <Username> --password <Password>
Sqoop in script, especially when password has special chars thats not allowing to run it via bash
#!/bin/bash
## in this case, ! char in password will not let this run via bash
export sqlServerConnectString="jdbc:sqlserver://sqlservername:1433;database=DBName;user=username;password=pass!word;"
sqoop export -m 1 --connect "${sqlServerConnectString}" --table metric_save --export-dir /app/shiva/testsqoop_inputrecords.tsv
exit 0
sqoop list-tables
$ sqoop list-tables --connect jdbc:mysql://database.example.com/corp
employees
payroll_checks
job_descriptions
office_supplies