java - How to train an Italian language model in OpenNLP on Hadoop?
I am implementing a natural language processing algorithm for the Italian language on Hadoop.
I have 2 questions:
- How can I find a stemming algorithm for Italian?
- How do I integrate it in Hadoop?
Here is my code:
    // This appears to use the older (1.5-era) OpenNLP training API. The classes come from
    // java.io, opennlp.tools.postag, opennlp.tools.chunker, opennlp.tools.util
    // and opennlp.tools.util.model.
    String pathSent = ...tagged sentences...;
    String pathChunk = ....chunked train path....;
    File fileSent = new File(pathSent);
    File fileChunk = new File(pathChunk);
    InputStream inSent = new FileInputStream(fileSent);
    InputStream inChunk = new FileInputStream(fileChunk);

    // Train the POS tagger for language code "it" (cutoff 3, 3 iterations).
    POSModel posModel = POSTaggerME.train("it",
            new WordTagSampleStream(new InputStreamReader(inSent)),
            ModelType.MAXENT, null, null, 3, 3);

    // Train the chunker on the chunked sentences (cutoff 1, 1 iteration).
    ObjectStream<String> stringStream = new PlainTextByLineStream(new InputStreamReader(inChunk));
    ObjectStream<ChunkSample> chunkStream = new ChunkSampleStream(stringStream);
    ChunkerModel chunkModel = ChunkerME.train("it", chunkStream, 1, 1);

    this.tagger = new POSTaggerME(posModel);
    this.chunker = new ChunkerME(chunkModel);

    inSent.close();
    inChunk.close();
You need a grammatical analysis of each sentence, for example:
"io voglio andare a casa"
- io, sostantivo
- volere, verbo
- andare, verbo
- a, preposizione semplice
- casa, oggetto
Once you have tagged sentences, you can train OpenNLP on them.
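For illustration: OpenNLP's POS trainer reads one sentence per line, with each token written as word_tag and tokens separated by spaces. The sketch below builds such a line from the example sentence. The class name TaggedLine, the helper toTrainingLine, and the tag labels are assumptions made for this example, not a fixed Italian tagset.

```java
import java.util.ArrayList;
import java.util.List;

class TaggedLine {

    // Joins parallel token/tag arrays into "word_tag word_tag ..." form.
    // Whitespace inside a tag is replaced with '-' so each pair stays one token.
    static String toTrainingLine(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i].trim().replaceAll("\\s+", "-");
            pairs.add(tokens[i] + "_" + tag);
        }
        return String.join(" ", pairs);
    }

    public static void main(String[] args) {
        String[] tokens = {"io", "voglio", "andare", "a", "casa"};
        // Hypothetical tag labels, borrowed from the example sentence above.
        String[] tags = {"sostantivo", "verbo", "verbo", "preposizione semplice", "oggetto"};
        // prints: io_sostantivo voglio_verbo andare_verbo a_preposizione-semplice casa_oggetto
        System.out.println(toTrainingLine(tokens, tags));
    }
}
```

Writing one such line per sentence to a file gives input in the shape that WordTagSampleStream expects. In practice you would likely use a standard tagset rather than free-form labels.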
On Hadoop, create a custom Mapper:
    public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // your code here
        }
    }
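The map body is left open. Given the Text/IntWritable output types and the IntWritable constant initialised to 1, a word-count-style body is one plausible reading: emit each token with a count of 1. Below is a minimal sketch of that logic in plain Java, without Hadoop types so that it runs standalone. MapLogic and mapLine are hypothetical names. Inside the real map method each pair would be passed to context.write instead of being collected in a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

class MapLogic {

    // Splits one input line on whitespace and emits a (token, 1) pair per token.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Print each emitted pair as "token<TAB>count".
        for (Map.Entry<String, Integer> pair : mapLine("io voglio andare a casa")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

If the goal is to tag text rather than count words, the same loop is where a tagger call would go, with the emitted key changed accordingly.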
On Hadoop, create a custom Reducer:
    public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // reduce here
        }
    }
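The reduce body is likewise left open. If the mapper emits a count of 1 per key, summing the values for each key is the usual counterpart. Below is a minimal sketch in plain Java, again without Hadoop types. ReduceLogic and sum are hypothetical names. In the real reduce method the values would be IntWritable objects read with get(), and the total would be written with context.write(key, new IntWritable(total)).

```java
import java.util.Arrays;
import java.util.List;

class ReduceLogic {

    // Adds up all counts that were grouped under one key.
    static int sum(Iterable<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three occurrences of the same key, each emitted with a count of 1.
        List<Integer> countsForKey = Arrays.asList(1, 1, 1);
        System.out.println("casa\t" + sum(countsForKey));
    }
}
```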
Configure both:
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // new Job(conf, name) is deprecated in Hadoop 2 and later,
        // where Job.getInstance(conf, name) replaces it.
        Job job = new Job(conf, "opennlp");
        job.setJarByClass(CustomOpenNLP.class);

        // Output types must match what the Mapper and Reducer emit.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] is the input path, args[1] the output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }