Apache Pig
1. to start pig interactive shell
$ pig ↵
grunt> quit; ↵ <-- to exit
2. to run a pig script
$ pig scriptname.pig ↵
3. define an <alias>
grunt> <alias> = LOAD '/hdfs/data/file/path'; ↵
4. get detail of an existing <alias>
grunt> DESCRIBE <alias>; ↵
5. print the contents of an alias
grunt> DUMP <alias>;
6. limit the size of a dataset
grunt> <alias> = LIMIT <alias> <number>;
7. store data in HDFS
grunt> STORE <alias> INTO '<hdfs/data/file/path>' USING PigStorage('<delimiter>');
8. filter a dataset using an expression
grunt> <alias> = FILTER <alias> BY <expression>;
9. Group data by column
grunt> <alias> = GROUP <alias> BY <columnname>;
grunt> <alias> = GROUP <alias> BY (<columnname1>, <columnname2>, ... <columnname n>);
10. apply an expression to each record
grunt> <alias> = FOREACH <alias> GENERATE <expression>;
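
The commands above can be combined into one script. What follows is a minimal sketch, not a definitive recipe: it assumes a hypothetical tab-delimited input with name, dept and salary fields, and the alias names, schema, salary threshold and output path are all made up for illustration.

```pig
-- Hypothetical input: tab-delimited lines of (name, dept, salary).
-- The schema, alias names and output path are illustrative assumptions.
emp = LOAD '/hdfs/data/file/path' USING PigStorage('\t')
      AS (name:chararray, dept:chararray, salary:int);

DESCRIBE emp;                              -- print the schema of emp

high    = FILTER emp BY salary > 50000;    -- keep rows matching the expression
by_dept = GROUP high BY dept;              -- one record per dept: (group, {bag of rows})

-- For each group, emit the key and two aggregates over its bag.
stats = FOREACH by_dept GENERATE
            group            AS dept,
            COUNT(high)      AS headcount,
            AVG(high.salary) AS avg_salary;

top5 = LIMIT stats 5;                      -- keep at most 5 records
DUMP top5;                                 -- print them to the console

-- Write the full result as comma-delimited text.
STORE stats INTO '/hdfs/data/output/path' USING PigStorage(',');
```

Saved as, say, example.pig, this runs with `pig example.pig` as in step 2. Note that STORE fails if the output directory already exists, so remove it or pick a new path before re-running.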