HBase
======
What is HBase?
HBase is a non-relational, column-oriented, key-value NoSQL distributed datastore (or database). It is Apache's implementation of Google's Bigtable and is part of the Hadoop ecosystem.
It achieves redundancy and fault-tolerance by storing the data in HDFS (Hadoop Distributed File System).
NOTE:
» All table names have to be enclosed in quotes (single quotes are used here)
» Names and commands are case sensitive
» No semicolon is needed at the end of a command
Starting the shell
$ hbase shell [enter]
hbase(main):001:0>
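The shell can also run commands from a file instead of the interactive prompt, which helps with scheduled jobs. A minimal sketch, assuming the hbase launcher is on the PATH; the temporary file is only an illustrative location. The trailing exit makes sure the shell quits when the script finishes, since some versions otherwise stay at the prompt.

```shell
#!/usr/bin/env bash
# Write HBase shell commands to a script file (hypothetical temp location).
script="$(mktemp)"
printf '%s\n' "list" "exit" > "$script"

# Run the script only when the hbase launcher is available.
if command -v hbase >/dev/null 2>&1; then
  hbase shell "$script"
fi
```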
Get a list of tables
hbase(main):001:0> list [enter]
TABLE
TABLENAME1
TABLENAME2
TABLENAME3
Create a table
syntax: create 'table1', 'columnfamily1'
create 'TABLENAME1', 'CF1'
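A table can also be created with more than one column family and with per-family options. The sketch below only prints the shell commands; the function name create_cmds, the second family CF2, and VERSIONS => 3 are illustrative assumptions, not values from these notes.

```shell
#!/usr/bin/env bash
# Print HBase shell commands that create a table with two column families
# and then show its schema. CF2 and VERSIONS => 3 are example values.
create_cmds() {
  local table="$1"
  cat <<EOF
create '${table}', {NAME => 'CF1', VERSIONS => 3}, {NAME => 'CF2'}
describe '${table}'
EOF
}

create_cmds TABLENAME2
```

To run the commands, pipe the output into the shell, for example: create_cmds TABLENAME2 | hbase shell -n (the -n flag is the non-interactive mode found in recent HBase releases).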
Disable a table:
A table needs to be disabled before you can do any DDL operations on it.
disable 'TABLENAME1'
Enable a table:
enable 'TABLENAME1'
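Disable and enable usually bracket a schema change. A hedged sketch of that sequence, which prints the commands rather than running them; alter_cmds is a hypothetical helper name, and the family and version count passed to it are example values.

```shell
#!/usr/bin/env bash
# Print the disable / alter / enable sequence for one column family.
alter_cmds() {
  local table="$1" family="$2" versions="$3"
  cat <<EOF
disable '${table}'
alter '${table}', NAME => '${family}', VERSIONS => ${versions}
enable '${table}'
EOF
}

alter_cmds TABLENAME1 CF1 5
```

Piping the output into hbase shell -n would apply the change; review the printed commands first.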
Scan table:
scan 'TABLENAME1'
scan 'TABLENAME1', LIMIT => 1
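A scan on an empty table returns nothing, so it helps to write a couple of cells first. The sketch below prints commands that put two rows, read one back with get, and then scan with a column filter, a start row, and a limit. The row keys, the qualifier col1, and the values are made up for illustration.

```shell
#!/usr/bin/env bash
# Print HBase shell commands that insert two example rows and scan them.
# Row keys, qualifier, and values are illustrative only.
scan_demo_cmds() {
  local table="$1"
  cat <<EOF
put '${table}', 'row1', 'CF1:col1', 'value1'
put '${table}', 'row2', 'CF1:col1', 'value2'
get '${table}', 'row1'
scan '${table}', {COLUMNS => ['CF1:col1'], STARTROW => 'row2', LIMIT => 1}
EOF
}

scan_demo_cmds TABLENAME1
```

With STARTROW => 'row2' and LIMIT => 1, the scan should return only the second row.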
Take and restore a snapshot:
disable 'TABLENAME1'
snapshot 'TABLENAME1', 'TABLENAME1_SNAPSHOT'
clone_snapshot 'TABLENAME1_SNAPSHOT', 'TABLENAME1_CLONETABLE'
delete_snapshot 'TABLENAME1_SNAPSHOT'
enable 'TABLENAME1'
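clone_snapshot creates a new table from the snapshot. To roll the original table back in place, the shell also offers restore_snapshot, which expects the table to be disabled and has to run before delete_snapshot removes the snapshot. A sketch that prints that sequence; restore_cmds is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print commands that roll a table back to an existing snapshot.
restore_cmds() {
  local table="$1" snap="$2"
  cat <<EOF
list_snapshots
disable '${table}'
restore_snapshot '${snap}'
enable '${table}'
EOF
}

restore_cmds TABLENAME1 TABLENAME1_SNAPSHOT
```

Note that restoring replaces the table's current contents with the snapshot's, so writes made after the snapshot was taken are lost.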
Getting record counts for an HBase table:
hbase> count 'TABLENAME1' # prints the running count every 1,000 rows (the default interval)
hbase> count 'TABLENAME1', INTERVAL => 100000 # prints the running count every 100,000 rows
hbase> count 'TABLENAME1', CACHE => 1000 # default cache size is 10 rows; if rows are small, you can increase this optional value
hbase> count 'TABLENAME1', INTERVAL => 10, CACHE => 1000
Counts using Map/Reduce:
echo "[$(date +%Y%m%d_%H%M%S)][ count ] TABLENAME1"; hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'TABLENAME1'
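The same timestamped pattern extends to several tables with a loop. A minimal sketch, assuming the hbase launcher is on the PATH; count_tables is a hypothetical helper, and when hbase is missing it only prints the log lines.

```shell
#!/usr/bin/env bash
# Log a timestamped line and run RowCounter for each table name given.
count_tables() {
  local table
  for table in "$@"; do
    echo "[$(date +%Y%m%d_%H%M%S)][ count ] ${table}"
    # Skip the MapReduce job when the hbase launcher is not installed.
    if command -v hbase >/dev/null 2>&1; then
      hbase org.apache.hadoop.hbase.mapreduce.RowCounter "${table}"
    fi
  done
}

count_tables TABLENAME1 TABLENAME2
```

Each RowCounter run is a separate MapReduce job, so the tables are counted one after another.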