Class CompletePreprocessingPipeline

java.lang.Object
org.carrot2.attrs.AttrComposite
org.carrot2.text.preprocessing.CompletePreprocessingPipeline
All Implemented Interfaces:
AcceptingVisitor, ContextPreprocessor

public class CompletePreprocessingPipeline extends AttrComposite implements ContextPreprocessor
Performs complete preprocessing on the provided documents. The preprocessing consists of the following steps:
  1. InputTokenizer
  2. CaseNormalizer
  3. LanguageModelStemmer
  4. StopListMarker
  5. PhraseExtractor
  6. LabelFilterProcessor
  7. DocumentAssigner
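The numbered steps above can be pictured as a fixed chain of stages, each consuming the previous stage's output. The sketch below is a simplified, hypothetical model and not Carrot2's API. The class `PipelineSketch`, its `run` method, the whitespace tokenizer, the toy "strip a trailing s" stemmer and the three-word stop list are all assumptions. Only the stage names and their order come from the list above. The last three stages are pass-through placeholders because phrase extraction, label filtering and document assignment need richer data structures than a token list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Hypothetical model of an ordered preprocessing chain; not Carrot2's API. */
public class PipelineSketch {
    // Toy stop list (assumption); Carrot2 takes stop words from its language resources.
    static final Set<String> STOP_WORDS = Set.of("the", "of", "and");

    // Stages in the documented order; LinkedHashMap preserves insertion order.
    static Map<String, UnaryOperator<List<String>>> stages() {
        Map<String, UnaryOperator<List<String>>> s = new LinkedHashMap<>();
        // 1. Crude whitespace tokenization of every input document.
        s.put("InputTokenizer", in -> in.stream()
                .flatMap(doc -> Arrays.stream(doc.trim().split("\\s+")))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList()));
        // 2. Normalize case so "Trees" and "trees" become the same token.
        s.put("CaseNormalizer", in -> map(in, t -> t.toLowerCase(Locale.ROOT)));
        // 3. Toy stemmer (assumption): strip a trailing "s" from longer tokens.
        s.put("LanguageModelStemmer", in -> map(in,
                t -> t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t));
        // 4. Flag stop words with '#' instead of deleting them ("marker", not "remover").
        s.put("StopListMarker", in -> map(in, t -> STOP_WORDS.contains(t) ? "#" + t : t));
        // 5-7. Placeholders only: this sketch does not model these stages.
        for (String name : List.of("PhraseExtractor", "LabelFilterProcessor", "DocumentAssigner")) {
            s.put(name, UnaryOperator.identity());
        }
        return s;
    }

    static List<String> map(List<String> in, UnaryOperator<String> f) {
        return in.stream().map(f).collect(Collectors.toList());
    }

    /** Runs every stage in order, recording each stage name in trace. */
    public static List<String> run(List<String> documents, List<String> trace) {
        List<String> data = documents;
        for (Map.Entry<String, UnaryOperator<List<String>>> e : stages().entrySet()) {
            trace.add(e.getKey());
            data = e.getValue().apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        System.out.println(run(List.of("The Trees of Java", "Clustering trees"), trace));
        System.out.println(trace);
    }
}
```

Running `main` prints `[#the, tree, #of, java, clustering, tree]` followed by the seven stage names in order. Because case normalization runs before stemming, "Trees" and "trees" collapse to the same stem.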
  • Field Details

    • wordDfThreshold

      public final AttrInteger wordDfThreshold
      Word document frequency (DF) cut-off threshold. Words appearing in fewer than wordDfThreshold documents will be ignored.
    • phraseDfThreshold

      public final AttrInteger phraseDfThreshold
      Phrase document frequency (DF) cut-off threshold. Phrases appearing in fewer than phraseDfThreshold documents will be ignored.
    • labelFilters

      public LabelFilterProcessor labelFilters
      Label filtering is a composite of individual filters.
    • documentAssigner

      public DocumentAssigner documentAssigner
      Document assigner used by the algorithm; contains modifiable parameters.
    • caseNormalizer

      protected final org.carrot2.text.preprocessing.CaseNormalizer caseNormalizer
      Case normalizer used by the algorithm.
    • stemming

      protected final org.carrot2.text.preprocessing.LanguageModelStemmer stemming
      Stemmer used by the algorithm.
    • stopListMarker

      protected final org.carrot2.text.preprocessing.StopListMarker stopListMarker
      Stop list marker used by the algorithm; contains modifiable parameters.
    • tokenizer

      protected final org.carrot2.text.preprocessing.InputTokenizer tokenizer
      Tokenizer used by the algorithm.
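The wordDfThreshold and phraseDfThreshold fields describe a document-frequency cut-off: what counts is the number of distinct documents containing a word, not its total number of occurrences. The sketch below illustrates that rule for words only. It is a hypothetical helper, not Carrot2's implementation. The class `DfThresholdSketch`, the method `frequentWords`, and the lowercase-and-split-on-whitespace tokenization are assumptions. The same rule would apply to phrases with phraseDfThreshold.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of a document-frequency cut-off; not Carrot2's API. */
public class DfThresholdSketch {
    /** Returns the words that appear in at least dfThreshold distinct documents. */
    public static Set<String> frequentWords(List<String> documents, int dfThreshold) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : documents) {
            // Count each word at most once per document: DF counts documents, not occurrences.
            Set<String> seen = new HashSet<>();
            for (String t : doc.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!t.isEmpty() && seen.add(t)) {
                    df.merge(t, 1, Integer::sum);
                }
            }
        }
        // Words appearing in fewer than dfThreshold documents are ignored.
        Set<String> kept = new TreeSet<>();
        df.forEach((word, count) -> {
            if (count >= dfThreshold) {
                kept.add(word);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("data mining tools", "text mining", "mining mining data");
        System.out.println(frequentWords(docs, 2));
    }
}
```

With a threshold of 2, `main` prints `[data, mining]`. The words `tools` and `text` occur in only one document each and are dropped. The word `mining` has a DF of 3, even though it occurs four times in total, because the third document counts only once.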
  • Constructor Details

    • CompletePreprocessingPipeline

      public CompletePreprocessingPipeline()
  • Method Details