Shiva's Weblog

Various things I am putting up for my reference. Hope it's useful to you as well.


By Shiva Challa

Friday, January 09, 2015

Apache Pig

1. to start the Pig interactive shell (Grunt)
  $ pig ↵
  grunt> quit; ↵   <-- to exit

2. to run a Pig script
  $ pig scriptname.pig ↵

3. define an <alias> by loading data
   grunt> <alias> = LOAD '/hdfs/data/file/path'; ↵

4. get the schema of an existing <alias>
   grunt> DESCRIBE <alias>; ↵

5. print the contents of an <alias>
   grunt> DUMP <alias>; ↵

6. limit the size of a dataset
   grunt> <alias> = LIMIT <alias> <number>; ↵

7. store data in HDFS
   grunt> STORE <alias> INTO '<hdfs/data/file/path>' USING PigStorage('<delimiter>'); ↵

8. filter a dataset using an expression
   grunt> <alias> = FILTER <alias> BY <expression>; ↵

9. group data by one or more columns
   grunt> <alias> = GROUP <alias> BY <columnname>; ↵
   grunt> <alias> = GROUP <alias> BY (<columnname1>, <columnname2>, ..., <columnnameN>); ↵

10. apply an expression to each record (project or transform columns)
   grunt> <alias> = FOREACH <alias> GENERATE <expression>, ...; ↵
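Putting several of the commands above together, a small script might look like the sketch below. The input path `/data/employees.csv`, the output path `/data/dept_summary`, the comma-separated layout and the field names `name`, `dept` and `salary` are all hypothetical, chosen only for illustration; swap in your own data.

```pig
-- Hypothetical input: comma-separated lines such as  alice,sales,52000
-- Declaring a schema with AS lets later statements refer to fields by name.
emp = LOAD '/data/employees.csv' USING PigStorage(',')
      AS (name:chararray, dept:chararray, salary:int);

-- Keep only rows with a known department and a positive salary.
valid = FILTER emp BY dept IS NOT NULL AND salary > 0;

-- One output tuple per department: (group, bag of matching 'valid' tuples).
by_dept = GROUP valid BY dept;

-- Aggregate each bag; 'group' holds the grouping key.
summary = FOREACH by_dept GENERATE
              group AS dept,
              COUNT(valid) AS headcount,
              AVG(valid.salary) AS avg_salary;

-- Write tab-separated output; the target is a directory and must not already exist.
STORE summary INTO '/data/dept_summary' USING PigStorage('\t');
```

Saved as, say, `dept_summary.pig`, it runs with `pig dept_summary.pig`; adding `-x local` runs it against the local filesystem instead of HDFS, which is handy for testing on a small sample. Note that COUNT skips tuples whose first field is null, while COUNT_STAR counts every tuple.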
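In Grunt, DESCRIBE, LIMIT and DUMP are often combined to check a relation before running a full job. The sketch below assumes a hypothetical comma-separated file `/data/employees.csv` with fields `name`, `dept` and `salary`.

```pig
-- Hypothetical comma-separated input with three fields.
emp = LOAD '/data/employees.csv' USING PigStorage(',')
      AS (name:chararray, dept:chararray, salary:int);

-- Prints the declared schema; no data is read.
DESCRIBE emp;

-- LIMIT on its own returns an arbitrary subset; sort first for a predictable one.
sorted = ORDER emp BY salary DESC;
top5 = LIMIT sorted 5;

-- Triggers execution and prints one tuple per line, e.g. (name,dept,salary).
DUMP top5;
```

Without the AS clause, Pig has no schema for the relation and fields can only be referenced by position ($0, $1, ...).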
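Grouping by several columns, as in the second GROUP form of item 9, makes `group` a tuple rather than a single value. A sketch, assuming a hypothetical comma-separated file `/data/employees.csv` with fields `dept`, `city` and `salary`:

```pig
-- Hypothetical input: dept,city,salary
emp = LOAD '/data/employees.csv' USING PigStorage(',')
      AS (dept:chararray, city:chararray, salary:int);

-- Composite key: 'group' is the tuple (dept, city).
by_dept_city = GROUP emp BY (dept, city);

-- FLATTEN(group) turns the key tuple back into two separate columns.
totals = FOREACH by_dept_city GENERATE
             FLATTEN(group) AS (dept, city),
             SUM(emp.salary) AS total_salary;

-- GROUP ... ALL puts every tuple into a single group, useful for global totals.
everything = GROUP emp ALL;
grand_total = FOREACH everything GENERATE SUM(emp.salary) AS total_salary;

DUMP totals;
DUMP grand_total;
```

Accessing `group.dept` and `group.city` individually would work as well; FLATTEN is simply the shorter way to get both key columns back.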

