java - How to train an Italian language model in OpenNLP on Hadoop? -

- April 15, 2012

I am implementing a natural language processing algorithm on Hadoop for the Italian language.

I have two questions:

How can I find a stemming algorithm for Italian?
How do I integrate it into Hadoop?

Here is my code:

    String pathSent = ...tagged sentences...;
    String pathChunk = ....chunked train path....;
    File fileSent = new File(pathSent);
    File fileChunk = new File(pathChunk);
    InputStream inSent = null;
    InputStream inChunk = null;

    inSent = new FileInputStream(fileSent);
    inChunk = new FileInputStream(fileChunk);

    // Train the POS tagger on the tagged sentences (cutoff 3, 3 iterations).
    POSModel posModel = POSTaggerME.train("it",
            new WordTagSampleStream(new InputStreamReader(inSent)),
            ModelType.MAXENT, null, null, 3, 3);

    // Train the chunker on the chunked training data (cutoff 1, 1 iteration).
    ObjectStream<String> stringStream = new PlainTextByLineStream(new InputStreamReader(inChunk));
    ObjectStream<ChunkSample> chunkStream = new ChunkSampleStream(stringStream);
    ChunkerModel chunkModel = ChunkerME.train("it", chunkStream, 1, 1);

    this.tagger = new POSTaggerME(posModel);
    this.chunker = new ChunkerME(chunkModel);

    inSent.close();
    inChunk.close();

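You need a grammatical sentence engine, one that tags each word of a sentence:

"io voglio andare a casa"

io, sostantivo (noun)
volere, verbo (verb)
andare, verbo (verb)
a, preposizione semplice (simple preposition)
casa, oggetto (object)

Once you have tagged sentences, you can train OpenNLP on them.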
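OpenNLP's POS training data is plain text with one sentence per line and each token written as word_tag. As a rough sketch of how the example sentence could be turned into such a line, here is a small plain-Java helper. The class name TaggedSentenceFormatter and its method are hypothetical, the tag labels are just the Italian terms used above rather than an established tagset, and "preposizione semplice" is written with a hyphen on the assumption that a tag should not contain spaces.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds one training line in the word_tag format. */
final class TaggedSentenceFormatter {

    private TaggedSentenceFormatter() {}

    /**
     * Joins parallel lists of words and tags into "word_tag word_tag ...".
     * Whitespace or underscores inside a word or tag are rejected here as a
     * conservative choice, because they would make the line ambiguous.
     */
    static String toTrainingLine(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            String tag = tags.get(i);
            if (!isSafe(word) || !isSafe(tag)) {
                throw new IllegalArgumentException("unsafe token: " + word + " / " + tag);
            }
            if (i > 0) {
                line.append(' ');
            }
            line.append(word).append('_').append(tag);
        }
        return line.toString();
    }

    /** True if the string is non-empty and has no whitespace or underscore. */
    private static boolean isSafe(String s) {
        return !s.isEmpty()
                && s.chars().noneMatch(c -> Character.isWhitespace(c) || c == '_');
    }

    public static void main(String[] args) {
        // Prints: io_sostantivo voglio_verbo andare_verbo a_preposizione-semplice casa_oggetto
        System.out.println(toTrainingLine(
                Arrays.asList("io", "voglio", "andare", "a", "casa"),
                Arrays.asList("sostantivo", "verbo", "verbo", "preposizione-semplice", "oggetto")));
    }
}
```

Note that the training line uses the word as it appears in the sentence ("voglio"), not its base form ("volere").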

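On Hadoop, create a custom map class:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // your code here
        }
    }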
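What replaces the placeholder comment in map depends on the job. As one possible illustration, assume each input line is an already tagged sentence in word_tag form and the goal is to count how often each tag occurs; the map step would then emit a (tag, 1) pair per token. The sketch below shows only that logic, in plain Java without Hadoop types so it can be tried locally. TagPairEmitter is a hypothetical name, and inside a real mapper each pair would be passed to context.write instead of being collected in a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the body of a mapper that counts POS tags. */
final class TagPairEmitter {

    private TagPairEmitter() {}

    /** Emits one (tag, 1) pair for every well-formed word_tag token in the line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            // Skip tokens with no separator, an empty word, or an empty tag.
            if (sep <= 0 || sep == token.length() - 1) {
                continue;
            }
            String tag = token.substring(sep + 1);
            pairs.add(new SimpleEntry<String, Integer>(tag, 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : map("io_sostantivo voglio_verbo andare_verbo")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```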

On Hadoop, create a custom reduce class:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // reduce here
        }
    }
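For a counting job, the reduce step typically sums the values that arrive for one key. The following plain-Java sketch assumes the mapper emitted a (tag, 1) pair per token; TagCountReducer is a hypothetical name, and the run method is only a local stand-in for the grouping that Hadoop performs between map and reduce. In the real reducer the sum would be written out with context.write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the body of a reducer that sums counts per key. */
final class TagCountReducer {

    private TagCountReducer() {}

    /** Sums all counts received for one key. */
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Local stand-in for the shuffle: groups a 1 per emitted key, then reduces each group. */
    static Map<String, Integer> run(List<String> emittedKeys) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String key : emittedKeys) {
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            totals.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("sostantivo", "verbo", "verbo")));
    }
}
```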

Configure both:

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = new Job(conf, "opennlp");
        job.setJarByClass(CustomOpenNLP.class);

        // Key and value types written by the reducer.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] is the input path, args[1] the output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
