Practical Apache Lucene 8 : uncover the search capabilities of your application

Bibliographic Details
Author / Creator: Sharma, Atri, author.
Imprint: [Berkeley, CA] : Apress, [2020]
Description: 1 online resource
Language: English
Format: E-Resource Book
URL for this record: http://pi.lib.uchicago.edu/1001/cat/bib/12608482
ISBN: 9781484263457
1484263456
1484263448
9781484263440
Digital file characteristics: text file PDF
Notes: Includes index.
Description based on online resource; title from digital title page (viewed on January 11, 2021).
Print version record.
Summary: Gain a thorough knowledge of Lucene's capabilities and use it to develop your own search applications. This book explores the Java-based, high-performance text search engine library used to build search capabilities into your applications. Starting with the basics of Lucene and searching, you will learn about the types of queries it supports and take a look at scoring models. Applying this basic knowledge, you will develop a "Hello World" app using basic Lucene queries and explore functions like scoring and document-level boosting. Along the way you will also uncover the concepts of partial searching and matching in Lucene, and then learn how to integrate geographical information (geospatial data) in Lucene using spatial queries and n-dimensional indexing. This will prepare you to build a location-aware search engine with a representative data set that allows location constraints to be specified during a search. You'll also develop a text classifier using Lucene and Apache Mahout, a popular machine learning framework. After a detailed review of performance benchmarking and the common issues associated with it, you'll learn some of the best practices for tuning the performance of your application. By the end of the book you'll be able to build your first Lucene patch: you will not only write the patch, but also test it and ensure it adheres to community coding standards. You will: master the basics of Apache Lucene; utilize different query types in Apache Lucene; explore scoring and document-level boosting; and integrate geospatial data into your application.
Other form: Print version: Sharma, Atri. Practical Apache Lucene 8. [Berkeley, CA] : Apress, [2020] 9781484263457
Standard no.: 10.1007/978-1-4842-6345-7
Table of Contents:
  • Intro
  • Table of Contents
  • About the Author
  • About the Technical Reviewer
  • Acknowledgments
  • Introduction
  • Chapter 1: Hola, Lucene!
  • Key Features of Lucene
  • Information Retrieval Basics
  • Linear Scan
  • Stop List
  • Stemming
  • Term
  • Term-Document Incidence Matrix
  • Serving Queries Using a Term-Document Incidence Matrix
  • Basic Terminology
  • Heart of Lucene's Data Representation
  • Lucene's Inverted Index Structure
  • On-Disk Representation of a Lucene Index
  • Terms Dictionary
  • Frequencies File
  • Positions File
  • Queries on Lucene
  • Structure of a Lucene Query
  • Fields
  • Types of Queries in Lucene
  • Lucene vs. Relational Databases
  • Chapter 2: Hello World: The Lucene Way
  • Indexing Data in Lucene
  • Document
  • Analyzers
  • StandardAnalyzer
  • StopAnalyzer
  • SimpleAnalyzer
  • IndexWriter
  • Directory
  • Create Documents
  • Create Index and Write Documents
  • Adding Data to the Index
  • Bringing It All Together
  • TestClass
  • Document Search
  • QueryParser
  • TopDocs
  • IndexSearcher
  • IndexReader
  • Searching
  • Boolean Model
  • What Is Relevance?
  • Scoring Algorithms
  • TF/IDF
  • Vector Space Model
  • Scoring Example
  • Lucene's Scoring Model
  • Fields
  • Similarity
  • Boosting
  • Collectors
  • Chapter 3: Core Search Fundamentals
  • Codecs
  • DocValues
  • Phrase Queries
  • Term Vectors
  • BooleanQuery
  • MultiTermQuery
  • QueryCache
  • Scorer as Part of the Search Process
  • Chapter 4: Spatial Indexing
  • Spatial Module
  • What Are Geohashes?
  • Quad Trees
  • K-D Trees
  • BKD Trees
  • Using Spatial Indexing
  • Chapter 5: Location-Aware Search Engines
  • Why Use a Search Engine for Geographic Searches?
  • Range Queries
  • Function Queries
  • Geospatial Basics
  • Representing Spatial Data
  • Tiered Design for Storage
  • Geohashes
  • Spatial Data with Text Search
  • Distance Calculations
  • Bounding Box Filter
  • A Point on Distance Calculation
  • Chapter 6: Introducing Machine Learning with Apache Mahout
  • Origin of Apache Mahout
  • Why Apache Mahout?
  • Introduction to Machine Learning
  • Learning
  • Collaborative Filtering
  • Clustering
  • Categorization
  • Converting from Lucene Components to Mahout Components
  • Integrating Lucene with Mahout
  • lucene.vector
  • Lucene2seq
  • Java Version of Lucene2seq
  • Putting It All Together
  • Chapter 7: Improving Lucene's Performance
  • Increase Indexing Speed
  • Reuse Field Instances
  • The Curious Case of Large Commits
  • Reuse Tokens in Analyzers
  • Tuning Flush Intervals
  • Increase mergeFactor
  • Choosing the Correct Analyzers
  • Use Multiple Threads with One IndexWriter
  • Index into Separate Indexes and Then Merge
  • Improve Search Performance
  • Use the Latest Version of Lucene
  • Use IndexReader with the readOnly Attribute Equal to True
  • Use MMapDirectory/NIOFSDirectory
  • Decrease mergeFactor
  • Ignore First Query's Performance
  • Avoid Reopening IndexSearcher Instances
  • Share IndexSearcher Instances