Video
- 05:35 Lucene Architecture
- 05:50 Indexing
- 06:15 Analysis: extracting terms and tokens
- 06:38 Storage
- 08:45 Details of Lucene indexing
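The analysis step turns raw field text into the tokens whose text becomes the terms of the inverted index. Here is a minimal Python sketch, assuming a toy analyzer where the tokenizer splits on runs of word characters and a single filter lowercases each token. Real Lucene analyzers are Java classes that chain a `Tokenizer` with one or more `TokenFilter`s (for example `StandardAnalyzer` and `LowerCaseFilter`). The `analyze` function and `Token` class below are hypothetical illustrations, not Lucene's API.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    term: str      # normalized text that would go into the inverted index
    position: int  # ordinal position of the token within the field
    start: int     # character offset where the token starts in the source text
    end: int       # character offset just past the end of the token


def analyze(text: str) -> list[Token]:
    """Toy analyzer (hypothetical): tokenize on runs of word characters, then lowercase."""
    tokens = []
    for pos, match in enumerate(re.finditer(r"\w+", text)):
        # Tokenizer step: the regex match gives the raw token and its offsets.
        # Filter step: lowercase the raw token to produce the indexed term.
        tokens.append(Token(match.group().lower(), pos, match.start(), match.end()))
    return tokens


if __name__ == "__main__":
    for tok in analyze("Lucene is a Search Library"):
        print(tok.term, tok.position, tok.start, tok.end)
```

Keeping the position and character offsets alongside each term mirrors what Lucene records per token. Positions support phrase queries, and offsets support highlighting.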
Terminology
(from Lucene: The Good Parts)
- document: a record; the unit of search; the thing returned as search results (“not a row”)
- field: a typed slot in a document for storing and indexing values (“not a column”)
- index: a collection of documents, typically with the same schema (“not a table”)
- corpus: the entire set of documents in an index
- inverted index: internal data structure that maps terms to documents by ID
- term: value extracted from source document, used for building the inverted index
- vocabulary: the full set of distinct terms in a corpus
- uninverted index: aka “field data”, array of all field values per field, in document order
- doc values: alternative way of storing the uninverted index on-disk (Lucene-specific)
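The sketch below shows how several of these terms relate: the corpus, the inverted index (term to doc IDs), the vocabulary, and the uninverted index (one value per document, in doc-ID order). It is a Python sketch under simplifying assumptions. The corpus is hypothetical, analysis is reduced to lowercase-and-split-on-whitespace, and in-memory dicts and lists stand in for Lucene's compressed, per-segment on-disk structures. The function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical corpus: each document maps field names to values; list index = doc ID.
docs = [
    {"title": "Lucene in Action", "year": 2010},
    {"title": "Elasticsearch in Action", "year": 2015},
    {"title": "Lucene the Good Parts", "year": 2013},
]


def terms(text: str) -> list[str]:
    # Stand-in for analysis (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_inverted(docs: list[dict], field: str) -> dict[str, list[int]]:
    """Inverted index: term -> sorted list of doc IDs containing that term."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for term in terms(doc[field]):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def build_uninverted(docs: list[dict], field: str) -> list:
    """Uninverted index: one value per document, stored in doc-ID order."""
    return [doc[field] for doc in docs]


inverted = build_inverted(docs, "title")
vocabulary = sorted(inverted)  # the distinct terms of the corpus
years = build_uninverted(docs, "year")

# Searching uses the inverted index: which documents contain "lucene"?
hits = inverted.get("lucene", [])
# Sorting uses the uninverted index: look up each hit's year by doc ID, newest first.
hits_by_year = sorted(hits, key=lambda doc_id: years[doc_id], reverse=True)
print(hits, hits_by_year, vocabulary)
```

The two structures answer opposite questions. The inverted index answers "which documents contain term t?", which is what search needs. The uninverted index answers "what is field f's value for document d?", which is what sorting and aggregations need. Doc values store that second structure column-wise on disk rather than rebuilding it in memory.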
Article links