edu.stanford.nlp.parser.lexparser (Stanford JavaNLP API)

Bookmark History

Saved by 1 person (0 private), first by an anonymous user on 2009-06-08


Public Sticky notes

For Chinese, the package includes two simple word segmenters: a lexicon-based maximum-match segmenter, and one that uses the parser to do Hidden Markov Model-based word segmentation. These methods are adequate, but for high-quality segmentation of Chinese text you will need to segment the text yourself as a preprocessing step. The supplied grammars assume that Chinese input has already been word-segmented according to Penn Chinese Treebank conventions. Selecting Chinese with -tLPP edu.stanford.nlp.parser.lexparser.ChineseTreebankParserParams makes space-separated words the default tokenization. To do word segmentation within the parser instead, pass either -segmentMarkov or -segmentMaxMatch.

Highlighted by harryli
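
To make the "lexicon-based maximum match" idea in the note concrete, here is a minimal sketch of greedy forward maximum matching in Java. It is not the parser's own segmenter: the class name MaxMatchSketch, the segment method, the maxWordLen parameter, and the four-word toy lexicon are all hypothetical and chosen only for illustration. The strings are written as Unicode escapes so the file compiles under any source encoding; they spell 北京, 大学, 北京大学, and 学生.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not the Stanford parser's implementation.
class MaxMatchSketch {

    /**
     * Greedy forward maximum matching: at each position, take the longest
     * substring (up to maxWordLen UTF-16 code units) found in the lexicon.
     * If nothing matches, fall back to emitting a single code unit.
     */
    static List<String> segment(String text, Set<String> lexicon, int maxWordLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            // Shrink the candidate until it is a lexicon word or one code unit long.
            while (end > i + 1 && !lexicon.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Toy lexicon (assumed for this sketch).
        Set<String> lexicon = new HashSet<>(Arrays.asList(
                "\u5317\u4eac",              // Beijing
                "\u5927\u5b66",              // university
                "\u5317\u4eac\u5927\u5b66",  // Peking University
                "\u5b66\u751f"));            // student
        List<String> words =
                segment("\u5317\u4eac\u5927\u5b66\u5b66\u751f", lexicon, 4);
        // Space-separated tokens are the form the note says the Chinese grammars expect.
        System.out.println(String.join(" ", words));
    }
}
```

With this toy lexicon, the six-character input 北京大学学生 is split into two tokens, 北京大学 and 学生, because the longest match is preferred at each position. Joining tokens with spaces gives the kind of pre-segmented input the note describes; whether the segmentation follows Penn Chinese Treebank conventions depends entirely on the lexicon used.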