java - Class Main$MapClass not found on EMR
I'm trying to run a map-reduce job on EMR (Amazon). It works on my local computer, but on EMR I'm getting this error:
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class Main$MapClass not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:733)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class Main$MapClass not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
    ... 8 more
This main function defines the job configuration:
public static void main(String[] args) throws Exception {
    String inputLocation;
    String outputLocation;
    String includeStopWords;

    if (args.length > 2) {
        inputLocation = args[0];
        outputLocation = args[1];
        includeStopWords = args[2];
    } else {
        for (int i = 0; i < args.length; i++) { // fixed: loop condition was missing "i"
            System.out.println("missing args!!\n"
                    + "number of args: " + args.length
                    + "\n args[" + i + "]: " + args[i]);
        }
        throw new IllegalArgumentException();
    }

    // First job - count 2-gram words per decade
    Configuration conf = new Configuration();
    conf.set("includeStopWords", includeStopWords);

    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "words count");

    System.out.println("before set classes:");
    job.setJarByClass(Main.class);
    job.setMapperClass(MapClass.class);
    job.setReducerClass(ReduceClass.class);
    System.out.println("after setting classes.");

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);

    // job.setInputFormatClass(SequenceFileInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(inputLocation));
    FileOutputFormat.setOutputPath(job, new Path(outputLocation));

    System.out.println("before wait completion");
    System.exit(job.waitForCompletion(true) ? 0 : 1);
    System.out.println("after wait completion"); // note: unreachable, System.exit ends the JVM
}
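A note on the error message: the $ in Main$MapClass means Hadoop is trying to load a class nested inside Main. For Hadoop to instantiate such a class by reflection on the task nodes, it must be declared public static, and the jar shipped to the cluster must actually contain a Main$MapClass.class entry. A minimal sketch of that layout (the generic types and map/reduce bodies here are hypothetical placeholders, not the asker's actual logic):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class Main {

    // Must be public static: Hadoop loads it by the name "Main$MapClass"
    // and reflective instantiation fails for inner (non-static) classes.
    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // hypothetical: emit the whole line under a constant key
            context.write(new Text("line"), value);
        }
    }

    public static class ReduceClass extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // hypothetical: pass values through unchanged
            for (Text value : values) {
                context.write(key, value);
            }
        }
    }

    // main(...) as shown above
}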
The code of the runner on EMR is:
public class Runner {
    public static Logger logger = LogManager.getRootLogger();

    public static void main(String[] args) throws IOException {
        String minPmi;
        String relMinPmi;
        String language;
        String includeStopWords;

        if (args.length > 3) {
            minPmi = args[0];
            relMinPmi = args[1];
            language = args[2];
            includeStopWords = args[3];
        } else {
            System.out.println("missing arguments!");
            throw new IllegalArgumentException();
        }

        // Jobs' output locations
        String firstOutput = "s3n://dsp152ass2/outputs/first";
        String secondOutput = "s3n://dsp152ass2/outputs/second";
        String thirdOutput = "s3n://dsp152ass2/outputs/third";

        // Jobs' jar locations
        String firstJobJar = "s3n://dsp152ass2/jars/firstJob.jar";
        String secondJobJar = "s3n://dsp152ass2/jars/secondJob.jar";
        String thirdJobJar = "s3n://dsp152ass2/jars/thirdJob.jar";

        // Select input corpus by language argument
        String corpus = "s3n://dsp152/output/eng-us-all-100k-2gram"; // TODO: change to real input
        if (language.equalsIgnoreCase("heb")) {
            corpus = "s3n://dsp152/output/heb-all-100k-2gram";
        }

        // Create EMR client
        AWSCredentials credentials = new PropertiesCredentials(
                new FileInputStream(new File("credentials.properties")));
        AmazonElasticMapReduce mapReduce = new AmazonElasticMapReduceClient(credentials);

        // Define Hadoop step configs
        HadoopJarStepConfig firstJobConfig = new HadoopJarStepConfig()
                .withJar(firstJobJar)
                //.withMainClass("firstMR.Main") // sec runner
                .withArgs(corpus, firstOutput, includeStopWords);

        HadoopJarStepConfig secondJobConfig = new HadoopJarStepConfig()
                .withJar(secondJobJar)
                //.withMainClass("Main")
                .withArgs(firstOutput + "/part-r-00000", secondOutput);

        HadoopJarStepConfig thirdJobConfig = new HadoopJarStepConfig()
                .withJar(thirdJobJar)
                //.withMainClass("Main")
                .withArgs(secondOutput + "/part-r-00000", thirdOutput, minPmi, relMinPmi);

        // Define step configs
        StepConfig firstJobStep = new StepConfig()
                .withName("firstJobStep")
                .withHadoopJarStep(firstJobConfig)
                .withActionOnFailure("TERMINATE_JOB_FLOW");

        StepConfig secondJobStep = new StepConfig()
                .withName("secondJobStep")
                .withHadoopJarStep(secondJobConfig)
                .withActionOnFailure("TERMINATE_JOB_FLOW");

        StepConfig thirdJobStep = new StepConfig()
                .withName("thirdJobStep")
                .withHadoopJarStep(thirdJobConfig)
                .withActionOnFailure("TERMINATE_JOB_FLOW");

        // Define job flow instances
        JobFlowInstancesConfig instances = new JobFlowInstancesConfig()
                .withInstanceCount(1) // TODO: change to 2-10
                .withMasterInstanceType(InstanceType.M1Large.toString())
                .withSlaveInstanceType(InstanceType.M1Large.toString())
                .withHadoopVersion("2.2.0")
                .withEc2KeyName("dsp152ass2")
                .withKeepJobFlowAliveWhenNoSteps(false)
                .withPlacement(new PlacementType("us-east-1b"));

        // Define run flow request
        RunJobFlowRequest runFlowRequest = new RunJobFlowRequest()
                .withName("dspExtractCollections")
                .withInstances(instances)
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withServiceRole("EMR_DefaultRole")
                .withSteps(firstJobStep, secondJobStep, thirdJobStep)
                .withLogUri("s3n://dsp152ass2/logs/");

        // Run jobs
        RunJobFlowResult runJobFlowResult = mapReduce.runJobFlow(runFlowRequest);
        String jobFlowId = runJobFlowResult.getJobFlowId();
        System.out.println("### workflow added: \n" + "\t" + jobFlowId);
    }
}
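Note that the .withMainClass(...) calls above are commented out, so each jar must declare its main class in its manifest. If it doesn't, the step needs the fully qualified main class spelled out. A hedged sketch of the first step, assuming the classes get moved into a package named foo as the answer below suggests:

// Hypothetical variant of the first step config, assuming the job
// classes were moved out of the default package into "foo":
HadoopJarStepConfig firstJobConfig = new HadoopJarStepConfig()
        .withJar(firstJobJar)
        .withMainClass("foo.Main") // fully qualified; bare "Main" only resolves in the default package
        .withArgs(corpus, firstOutput, includeStopWords);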
My project structure: [screenshot omitted]
Thanks in advance for any help.
The $ in Main$MapClass means Hadoop is looking for a class nested inside Main, so it must be packaged and resolvable by that exact name on the cluster. These steps can solve the resolution problem:
- Create separate Mapper and Reducer classes.
- Create a named package for your classes and use it instead of the default package (foo.Main, foo.MapClass, ...).
- When you're in Eclipse, try the "Extract required libraries into generated JAR" export option instead of "Package required libraries into generated JAR". This may solve classes not being found (make sure the class files Eclipse generates are actually copied into the jar; see the check below the list).
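One way to verify that last point is to list the generated jar's entries and confirm the mapper's .class file is really inside. A minimal sketch, assuming a jar named firstJob.jar in the working directory (the jar name and entry names are hypothetical; adjust them to your build):

import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Quick sanity check that the built jar really contains the mapper class.
public class JarCheck {
    public static void main(String[] args) throws Exception {
        try (JarFile jar = new JarFile("firstJob.jar")) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // After the package refactor this should print foo/MapClass.class;
                // with a nested mapper you would look for Main$MapClass.class instead.
                if (name.endsWith("MapClass.class")) {
                    System.out.println("Found: " + name);
                }
            }
        }
    }
}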
Tip: EMR has specific settings that differ from a local (and pseudo-distributed) deployment. Make sure your configuration is correct by following AWS's guide.