Big Data - Apache Pig

Lesson - #470 Apache Pig GROUP Operator


The Apache Pig GROUP operator groups the data in one or more relations. It collects all tuples that share the same group key. If the group key has more than one field, the key is treated as a tuple; otherwise it has the same type as the single key field. Each tuple in the result has two fields: the key, named group, and a bag holding the tuples that match that key.

The syntax of the GROUP operator is shown below:


 
 grunt> Group_data = GROUP Relation_name BY column_name;
 


Example

Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.

1 aaa 74385738 delhi
2 bbb 76349948 mumbai
3 ddd 87493589 pune
4 ggg 74824727 goa
5 hhh 74843847 pune
6 uuu 76347242 delhi
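
The GROUP statements in this lesson refer to a relation named student_details, so the file has to be loaded first. A minimal sketch of that step, assuming the fields are separated by single spaces as displayed above. The field names id and phone and all of the types are assumptions made for illustration; only name and city are used later in the lesson.

```pig
-- Load the file into a relation named student_details.
-- PigStorage(' ') assumes a single-space delimiter (an assumption based on the listing above).
-- id and phone are hypothetical field names; name and city are the ones used by GROUP below.
student_details = LOAD '/pig_data/student_details.txt'
    USING PigStorage(' ')
    AS (id:int, name:chararray, phone:chararray, city:chararray);
```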


Writing the GROUP Operator


 grunt> group_data = GROUP student_details BY city;



Output



(delhi,{(1,aaa,74385738,delhi),(6,uuu,76347242,delhi)})
(mumbai,{(2,bbb,76349948,mumbai)})
(pune,{(3,ddd,87493589,pune),(5,hhh,74843847,pune)})
(goa,{(4,ggg,74824727,goa)})
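
Each output tuple pairs a city with a bag of the matching student tuples. A short sketch of how that structure might be inspected and summarized. The LOAD schema (field names id and phone, the types, and the space delimiter) and the alias city_count are assumptions for illustration, not part of the lesson.

```pig
-- Assumed schema: id and phone are hypothetical names; the delimiter is assumed to be a space.
student_details = LOAD '/pig_data/student_details.txt'
    USING PigStorage(' ')
    AS (id:int, name:chararray, phone:chararray, city:chararray);

group_data = GROUP student_details BY city;

-- Print the schema: a field named group (the key) and a bag named after the input relation.
DESCRIBE group_data;

-- Count the tuples in each city's bag; the bag is referenced by the input relation's name.
city_count = FOREACH group_data GENERATE group AS city, COUNT(student_details) AS total;
DUMP city_count;
```

With the six sample rows, this should report two students each for delhi and pune and one each for mumbai and goa. The order of the output tuples is not guaranteed.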


Grouping by multiple columns



 grunt> group_data = GROUP student_details BY (name, city);


Output



((aaa,delhi),{(1,aaa,74385738,delhi)})
((bbb,mumbai),{(2,bbb,76349948,mumbai)})
((ddd,pune),{(3,ddd,87493589,pune)})
((ggg,goa),{(4,ggg,74824727,goa)})
((hhh,pune),{(5,hhh,74843847,pune)})
((uuu,delhi),{(6,uuu,76347242,delhi)})
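
When the key has more than one field, the group field is itself a tuple, so later statements usually need to unpack it. A minimal sketch of one way to do that with FLATTEN. The LOAD schema (field names id and phone, the types, and the space delimiter) and the aliases group_multi and flat_keys are assumptions for illustration.

```pig
-- Assumed schema: id and phone are hypothetical names; the delimiter is assumed to be a space.
student_details = LOAD '/pig_data/student_details.txt'
    USING PigStorage(' ')
    AS (id:int, name:chararray, phone:chararray, city:chararray);

-- Group on two fields; the group key becomes a (name, city) tuple.
group_multi = GROUP student_details BY (name, city);

-- FLATTEN turns the key tuple back into two top-level fields.
flat_keys = FOREACH group_multi
    GENERATE FLATTEN(group) AS (name, city), COUNT(student_details) AS total;
DUMP flat_keys;
```

The individual key fields can also be read without flattening, as group.name and group.city.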