Shiva's Weblog

Various things I am putting up for my reference. Hope it's useful to you as well.


Hadoop-HBase

By Shiva Challa

Friday, January 09, 2015

HBase
======
What is HBase?
HBase is a non-relational, column-oriented, key-value NoSQL distributed datastore (or database). It is Apache's implementation of Google's Bigtable and is part of the Hadoop ecosystem.
It achieves redundancy and fault tolerance by storing its data in HDFS (Hadoop Distributed File System).

NOTE:
» All table names have to be enclosed in single quotes
» Table names, column names and commands are case sensitive
» No semicolon is needed at the end of a command

Starting the shell
$ hbase shell [enter]
hbase(main):001:0>

Get a list of tables
hbase(main):001:0> list [enter]
TABLE

TABLENAME1
TABLENAME2
TABLENAME3

Create a table

syntax: create 'table1', 'columnfamily1'
create 'TABLENAME1', 'CF1'

 

Disable a table:
A table needs to be disabled before you can do any DDL operations on it.
disable 'TABLENAME1'

Enable a table:
enable 'TABLENAME1'
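
As an illustration of the disable / DDL / enable cycle described above, here is a minimal sketch in bash. The helper name alter_versions_cmds is hypothetical (not part of HBase); it only prints HBase shell commands, and the choice of changing VERSIONS on a column family is just an example of a DDL operation.

```shell
#!/usr/bin/env bash
# Sketch: print the HBase shell commands for a disable -> alter -> enable cycle.
# alter_versions_cmds is a hypothetical helper; it prints text and does not
# talk to HBase itself.
alter_versions_cmds() {
  local table="$1" family="$2" versions="$3"
  printf "disable '%s'\n" "$table"
  # Example DDL step: change how many cell versions the column family keeps.
  printf "alter '%s', NAME => '%s', VERSIONS => %s\n" "$table" "$family" "$versions"
  printf "enable '%s'\n" "$table"
}
```

One possible way to run it, assuming hbase is on your PATH, is to pipe the output into the shell: alter_versions_cmds TABLENAME1 CF1 3 | hbase shell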

Scan table:
scan 'TABLENAME1'
scan 'TABLENAME1', LIMIT => 1
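
A scan can also be restricted to one column family while still limiting the number of rows. The bash sketch below only builds such a command; scan_cmd is a hypothetical helper name, and the CF1 family is the one assumed in the create example above.

```shell
#!/usr/bin/env bash
# Sketch: build an HBase shell scan command limited to one column family.
# scan_cmd is a hypothetical helper; it prints the command text only.
scan_cmd() {
  local table="$1" family="$2" limit="$3"
  printf "scan '%s', {COLUMNS => ['%s'], LIMIT => %s}\n" "$table" "$family" "$limit"
}
```

For example, scan_cmd TABLENAME1 CF1 5 prints a command that can be pasted into the HBase shell or piped to it.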

Take and restore a snapshot:
disable 'TABLENAME1'
snapshot 'TABLENAME1', 'TABLENAME1_SNAPSHOT'
clone_snapshot 'TABLENAME1_SNAPSHOT', 'TABLENAME1_CLONETABLE'
delete_snapshot 'TABLENAME1_SNAPSHOT'
enable 'TABLENAME1'
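
The commands above clone the snapshot into a new table. If you would rather roll the original table back in place, the shell also has restore_snapshot, which expects the table to be disabled first. A minimal bash sketch that prints that sequence follows; restore_cmds is a hypothetical helper name, and it assumes the snapshot has not been deleted yet.

```shell
#!/usr/bin/env bash
# Sketch: print the HBase shell commands for an in-place restore.
# restore_cmds is a hypothetical helper; it prints text only.
restore_cmds() {
  local table="$1" snapshot="$2"
  printf "disable '%s'\n" "$table"
  # restore_snapshot rolls the snapshot's table back to the snapshot state.
  printf "restore_snapshot '%s'\n" "$snapshot"
  printf "enable '%s'\n" "$table"
}
```

Example: restore_cmds TABLENAME1 TABLENAME1_SNAPSHOT | hbase shell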

Getting record counts for HBase tables:
hbase> count 'TABLENAME1'   # This will print the running record count every 1,000 rows.
hbase> count 'TABLENAME1', INTERVAL => 100000   # This will print the running record count every 100,000 rows.
hbase> count 'TABLENAME1', CACHE => 1000   # Default cache size is 10 rows. If the rows are small, you can increase this optional value.
hbase> count 'TABLENAME1', INTERVAL => 10, CACHE => 1000
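
To avoid retyping the options, the count command can be generated by a small bash function. This is only a sketch: count_cmd is a hypothetical helper, and its fallback values (1000 for INTERVAL, 10 for CACHE) simply mirror the defaults mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: build an HBase shell count command with optional INTERVAL and CACHE.
# count_cmd is a hypothetical helper; it prints the command text only.
count_cmd() {
  local table="$1"
  local interval="${2:-1000}"  # progress is reported every <interval> rows
  local cache="${3:-10}"       # rows fetched per round trip to the server
  printf "count '%s', INTERVAL => %s, CACHE => %s\n" "$table" "$interval" "$cache"
}
```

Example: count_cmd TABLENAME1 100000 1000 | hbase shell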

Counts using Map/Reduce:
echo "[$(date +%Y%m%d_%H%M%S)][ count ] TABLENAME1"; hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'TABLENAME1'
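
RowCounter prints a lot of job output, so it can help to pull out just the row count. The bash sketch below assumes the job counters in the output include a line containing ROWS=<number>; check the output of your HBase version before relying on it. extract_row_count is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read RowCounter output on stdin and print the number that follows
# the first "ROWS=" (assumed to be the row counter in the job summary).
extract_row_count() {
  sed -n 's/.*ROWS=\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Example (the job log usually goes to stderr, hence the 2>&1): hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'TABLENAME1' 2>&1 | extract_row_count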
