edu.stanford.nlp.parser.lexparser (Stanford JavaNLP API)
For Chinese, the package includes two simple word segmenters. One is a
lexicon-based maximum match segmenter; the other uses the parser to
do Hidden Markov Model-based word segmentation. These segmentation
methods are serviceable, but for high-quality segmentation of Chinese
text you will need to segment the text yourself as a preprocessing
step. The supplied grammars assume that Chinese input has already been
word-segmented according to Penn Chinese Treebank conventions.
Choosing Chinese with
-tLPP edu.stanford.nlp.parser.lexparser.ChineseTreebankParserParams
makes space-separated words the default tokenization.
To do word segmentation within the parser instead, give one of the
options -segmentMarkov or -segmentMaxMatch.
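
To make the idea of lexicon-based maximum matching concrete, here is a minimal sketch of a greedy forward maximum match segmenter. It is an illustration only and is not the parser's own implementation: the class name MaxMatchSketch, the segment method, the maxWordLen parameter, and the toy lexicon are all hypothetical. It also assumes the text consists of characters in the Basic Multilingual Plane, so that one Java char corresponds to one Chinese character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of greedy forward maximum matching.
// Not the segmenter shipped with the parser.
class MaxMatchSketch {

    // At each position, take the longest lexicon entry that starts there
    // (up to maxWordLen characters); if none matches, emit one character.
    static List<String> segment(String text, Set<String> lexicon, int maxWordLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            // Shrink the candidate until it is in the lexicon
            // or only a single character remains.
            while (end > i + 1 && !lexicon.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Toy lexicon, assumed for this example only.
        Set<String> lexicon = new HashSet<>(Arrays.asList(
                "我", "喜欢", "自然", "语言", "自然语言", "处理"));
        List<String> words = segment("我喜欢自然语言处理", lexicon, 4);
        // Join with spaces, the form the parser treats as pre-tokenized input.
        System.out.println(String.join(" ", words));
    }
}
```

With this toy lexicon the longest entry wins at each position, so 自然语言 is kept as one word rather than split into 自然 and 语言. Greedy matching of this kind cannot recover from an early wrong choice, which is one reason a dedicated segmenter run as a preprocessing step tends to give better results.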