java - Hadoop's Hive/Pig, HDFS and MapReduce relationship


My understanding of Apache Hive is that it is a SQL-like tooling layer for querying Hadoop clusters. My understanding of Apache Pig is that it is a procedural language for querying Hadoop clusters. So, if my understanding is correct, Hive and Pig seem to be two different ways of solving the same problem.

My problem, however, is that I don't understand the problem they are both solving in the first place!

Say we have a DB (relational, NoSQL, it doesn't matter) that feeds data into HDFS so that a particular MapReduce job can run against that input data:

[Diagram: a DB feeding data into HDFS, which a MapReduce job reads as input]

I'm confused about which system Hive/Pig are querying! Are they querying the database? Are they querying the raw input data stored in the DataNodes on HDFS? Are they running little ad hoc, on-the-fly MR jobs and reporting their results/outputs?

What is the relationship between these query tools, the MR job input data stored on HDFS, and the MR job itself?

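Apache Pig and Apache Hive load data from HDFS unless you run them locally, in which case they load it from the local file system. How do they get the data from a DB? They do not. You need another framework, such as Sqoop, to move the data from your traditional DB into HDFS.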

Once you have the data in HDFS, you can start working with Pig and Hive. They never query a DB. In Apache Pig, for example, you could load your data using a Pig loader:

A = LOAD 'path/in/your/hdfs' USING PigStorage('\t');

As for Hive, you need to create a table and then load the data into that table:

LOAD DATA INPATH 'path/in/your/hdfs/your.csv' INTO TABLE t1;

Again, the data must be in HDFS.

As to how it works, it depends. Traditionally, it has always worked with MapReduce as the execution engine. Both Hive and Pig parse the statements you write in Pig Latin or HiveQL and translate them into an execution plan consisting of a certain number of MapReduce jobs, depending on the plan. However, they can now also translate them into Tez, a newer execution engine that is perhaps too new to work correctly.
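To make the translation step a bit more concrete, here is a minimal sketch of the three phases that a simple group-and-count statement would roughly turn into. It is plain in-memory Java, not Hadoop API code, and it is not what Hive or Pig actually generate; the class and method names (GroupCountSketch, mapPhase, shufflePhase, reducePhase) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration only: no Hadoop classes are used, all names are hypothetical.
class GroupCountSketch {

    // Map phase: turn each input record into a (key, 1) pair.
    static List<Map.Entry<String, Integer>> mapPhase(List<String> records) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String record : records) {
            pairs.add(Map.entry(record, 1));
        }
        return pairs;
    }

    // Shuffle phase: gather all values that share a key (TreeMap keeps keys sorted).
    static Map<String, List<Integer>> shufflePhase(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for each key.
    static Map<String, Integer> reducePhase(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    // Chain the three phases, as a single MapReduce job would.
    static Map<String, Integer> run(List<String> records) {
        return reducePhase(shufflePhase(mapPhase(records)));
    }

    public static void main(String[] args) {
        // prints {hive=1, pig=2}
        System.out.println(run(List.of("pig", "hive", "pig")));
    }
}
```

On a real cluster the map and reduce phases run in parallel on many nodes and the shuffle moves data between them over the network; a more complex statement would be planned as several such jobs chained together.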

Why the need for Pig or Hive? Well, you really don't need these frameworks. Everything they can do, you can do as well by writing your own MapReduce or Tez jobs. However, writing, for instance, a join operation in MapReduce might take hundreds or thousands of lines of code (really), while it is only one single line of code in Pig or Hive.
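As a rough illustration of that gap, the sketch below shows only the core logic of a reduce-side inner join, simulated in memory with plain Java. It is not real Hadoop code: an actual job would also need Mapper and Reducer classes, Writable types, job configuration and input/output formats, which is where most of the extra lines come from. The class name, the "U"/"O" source tags and the tab-separated "key, value" input layout are all assumptions made for the example. In Pig the same thing would be roughly C = JOIN A BY id, B BY id;.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of a reduce-side join; no Hadoop classes, all names hypothetical.
class ReduceSideJoinSketch {

    // A value tagged with the dataset it came from ("U" = users, "O" = orders).
    static final class Tagged {
        final String source;
        final String value;

        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    // Map phase: parse well-formed "key<TAB>value" lines and emit (key, tagged value).
    static void mapPhase(String source, List<String> lines, Map<String, List<Tagged>> shuffle) {
        for (String line : lines) {
            String[] fields = line.split("\t", 2);
            shuffle.computeIfAbsent(fields[0], k -> new ArrayList<>())
                   .add(new Tagged(source, fields[1]));
        }
    }

    // Reduce phase: for each key, pair every users value with every orders value.
    static List<String> reducePhase(Map<String, List<Tagged>> shuffle) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> entry : shuffle.entrySet()) {
            List<String> users = new ArrayList<>();
            List<String> orders = new ArrayList<>();
            for (Tagged tagged : entry.getValue()) {
                (tagged.source.equals("U") ? users : orders).add(tagged.value);
            }
            for (String user : users) {
                for (String order : orders) {
                    joined.add(entry.getKey() + "\t" + user + "\t" + order);
                }
            }
        }
        return joined;
    }

    // Inner join of two tab-separated datasets on their first field.
    static List<String> join(List<String> userLines, List<String> orderLines) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        mapPhase("U", userLines, shuffle);
        mapPhase("O", orderLines, shuffle);
        return reducePhase(shuffle);
    }

    public static void main(String[] args) {
        List<String> users = List.of("1\talice", "2\tbob");
        List<String> orders = List.of("1\tbook", "1\tpen", "3\tlamp");
        for (String row : join(users, orders)) {
            System.out.println(row);
        }
    }
}
```

Keys that appear in only one of the two inputs (2 and 3 in the sample data) produce no output, which is the inner-join behaviour.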

