Video

  • 05:35 Lucene Architecture
    • 05:50 Indexing
    • 06:15 Analysis: identifying terms and tokens
    • 06:38 Storage
  • 08:45 Details of Lucene indexing

Terminology

(From Lucene: The Good Parts)

  • document: a record; the unit of search; the thing returned as search results (“not a row”)
  • field: a typed slot in a document for storing and indexing values (“not a column”)
  • index: a collection of documents, typically with the same schema (“not a table”)
  • corpus: the entire set of documents in an index
  • inverted index: internal data structure that maps terms to documents by ID
  • term: value extracted from source document, used for building the inverted index
  • vocabulary: the full set of distinct terms in a corpus
  • uninverted index: aka “field data”, array of all field values per field, in document order
  • doc values: alternative way of storing the uninverted index on-disk (Lucene-specific)
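
To make the difference between the inverted and uninverted index concrete, here is a minimal Python sketch over a toy corpus. It is not Lucene code: the sample documents, the whitespace analyzer, and the helper names (`analyze`, `build_inverted_index`, `build_uninverted_index`) are all assumptions made up for illustration, and real Lucene analysis and on-disk storage are far more involved.

```python
from collections import defaultdict

# Hypothetical toy corpus: each document is a dict of field name -> value.
# The document ID is simply the position in this list.
docs = [
    {"title": "Lucene in Action", "year": 2010},
    {"title": "Search in Action", "year": 2014},
    {"title": "Lucene Basics", "year": 2010},
]


def analyze(text):
    """Toy analyzer (an assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs, field):
    """Map each term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for term in analyze(str(doc[field])):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_uninverted_index(docs, field):
    """Field values as an array in document order (index = document ID)."""
    return [doc[field] for doc in docs]


inverted = build_inverted_index(docs, "title")
vocabulary = sorted(inverted)            # distinct terms in the corpus
years = build_uninverted_index(docs, "year")

print(inverted["lucene"])  # [0, 2]
print(vocabulary)          # ['action', 'basics', 'in', 'lucene', 'search']
print(years)               # [2010, 2014, 2010]
```

The inverted index answers "which documents contain this term?", which is what search needs. The uninverted array answers "what is this document's value for the field?", which is the access pattern that sorting and aggregating by field rely on.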

Article links