Apache Solr Beginner's Guide.
Author / Creator: | Serafini, Alfredo. |
---|---|
Imprint: | Birmingham : Packt Publishing, 2013. |
Description: | 1 online resource (324 pages) |
Language: | English |
Series: | Baker & Taylor Books (Firm). Axis 360. |
Subject: | Open source software. Search engines -- Programming. Web search engines. LANGUAGE ARTS & DISCIPLINES -- Library & Information Science -- General. Electronic books. |
Format: | E-Resource Book |
URL for this record: | http://pi.lib.uchicago.edu/1001/cat/bib/11215973 |
Table of Contents:
- Cover; Copyright; Credits; About the Author; Acknowledgments; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Getting Ready with the Essentials; Understanding Solr; Learning the powerful aspects of Solr; Working with Java installation; Downloading and installing Java; Configuring CLASSPATH and PATH variables for Java; Installing and testing Solr; Time for action - starting Solr for the first time; Taking a glance at the Solr interface; Time for action - posting some example data; Time for action - testing Solr with cURL; Who uses Solr?; Resources on Solr.
- How will we use Solr?; Summary; Chapter 2: Indexing with Local PDF Files; Understanding and using an index; Posting example documents to the first Solr core; Analyzing the elements we need in Solr core; Time for action - configuring Solr Home and Solr core discovery; Knowing the legacy solr.xml format; Time for action - writing a simple solrconfig.xml file; Time for action - writing a simple schema.xml file; Time for action - starting the new core; Time for action - defining an example document; Time for action - indexing an example document with cURL; Executing the first search on the new core.
- Adding documents to the index from the web UI; Time for action - updating an existing document; Time for action - cleaning an index; Creating an index prototype from PDF files; Time for action - defining the schema.xml file with only dynamic fields and tokenization; Time for action - writing a simple solrconfig.xml file with an update handler; Testing the PDF file core with dummy data and an example query; Defining a new tokenized field for fulltext; Time for action - using Tika and cURL to extract text from PDFs; Using cURL to index some PDF data.
- Time for action - finding copies of the same files with deduplication; Time for action - looking inside an index with SimpleTextCodec; Understanding the structure of an inverted index; Understanding how optimization affects the segments of an index; Writing the full configuration for our PDF index example; Writing the solrconfig.xml file; Writing the schema.xml file; Summarizing some easy recipes for the maintenance of an index; Summary; Chapter 3: Indexing Example Data from DBPedia - Paintings; Harvesting paintings' data from DBPedia; Analyzing the entities that we want to index.
- Analyzing the first entity - Painting; Writing Solr core configurations for the first tests; Time for action - defining the basic solrconfig.xml file; Looking at the differences between commits and soft commits; Time for action - defining the simple schema.xml file; Introducing analyzers, tokenizers, and filters; Thinking fields for atomic updates; Indexing a test entity with JSON; Understanding the update chain; Using the atomic update; Understanding how optimistic concurrency works; Time for action - listing all the fields with the CSV output; Defining a new Solr core for our Painting entity.
- Time for action - refactoring the schema.xml file for the paintings core by introducing tokenization and stop words.