An administrator can configure the location of the full-text search index and related settings, start and pause the crawler, and even delete and reindex the entire system if necessary. The Audit Log captures both the searches performed by users and activities such as reindexing.
Full-Text Search Configuration
The Full-Text Search Configuration page allows you to configure the index and review statistics about your index.
- Go to (Admin) > Site > Admin Console.
- In the Management section, click Full-text search.
Index Configuration
Setting the Path. You can change the directory that stores the index by entering a new path and clicking Set Path. The default location is:
<LABKEY_HOME>/files/@labkey_full_text_index
- The path can include substitution tokens, such as the server GUID or version.
- Hover over the ? to see a full list of available tokens and usage examples.
- You'll see what the path would resolve to (i.e. current values of any substitution tokens will be shown).
- Specifying a non-default index path and/or using substitution tokens is especially useful if you are running multiple LabKey deployments on the same machine, because it allows each LabKey deployment to use a unique index.
- Changing the location of the index requires re-indexing all data, which may affect performance.
- When the index is created, the server GUID is written into it so that on later restarts, the server can detect whether the index at the provided path was created for itself or for another server. If the GUID does not match, the server will delete and reindex.
- Developers who want to avoid this reindexing on a local machine can do so by using substitution tokens in the index path.
- Be sure to place your @labkey_full_text_index on a drive with sufficient space.
- We recommend never placing the search index on an NFS drive or AWS EFS.
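The GUID check described in the list above can be pictured with a small sketch. This is not LabKey's implementation: how the server actually records the GUID inside the index is not described here, so the sketch assumes a hypothetical marker file (`server_guid.txt`) and a hypothetical function name, purely to illustrate the "keep if it matches, otherwise delete and rebuild" decision.

```python
import shutil
import tempfile
from pathlib import Path

MARKER = "server_guid.txt"  # hypothetical marker file, for illustration only


def open_or_reset_index(index_dir: Path, server_guid: str) -> bool:
    """Return True if an existing index was kept, False if it was (re)created."""
    marker = index_dir / MARKER
    if marker.is_file() and marker.read_text().strip() == server_guid:
        return True  # the index was created by this server, so reuse it
    # Missing or mismatched GUID: discard whatever is there and start fresh.
    if index_dir.exists():
        shutil.rmtree(index_dir)
    index_dir.mkdir(parents=True)
    marker.write_text(server_guid)
    return False


if __name__ == "__main__":
    index_dir = Path(tempfile.mkdtemp()) / "@labkey_full_text_index"
    print(open_or_reset_index(index_dir, "guid-A"))  # new index is created
    print(open_or_reset_index(index_dir, "guid-A"))  # same server: index kept
    print(open_or_reset_index(index_dir, "guid-B"))  # other server: rebuilt
```

If two deployments point at the same directory, each restart takes the "mismatch" branch and rebuilds. Giving each deployment its own resolved path, for example by including the server GUID token, is what avoids that cycle.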
Other actions and settings on this page include:
- Start/Pause Crawler. The crawler, or document indexer, continually inventories your site when running. You might pause it to diagnose issues with memory or performance.
- Delete Index. You can delete the entire index for your server. Please do this with caution because rebuilding the index can be slow.
- Directory Type. This setting lets you change the search indexing directory type. The setting marked "Default (MMapDirectory)" allows the underlying search library to choose the directory implementation (based on operating system and 32-bit vs. 64-bit). The other options override the default heuristic and hard-code a specific directory implementation. These are provided in case the "Default" setting causes problems on a specific deployment. Use the default type unless you see a problem with search. Contact LabKey for assistance if full-text indexing or searching seems to have difficulty with the default setting.
- Indexed File Size Limit. The default setting is 100 MB. Files larger than the limit set on this page will not be indexed. You can change this cap, but this is generally not recommended. Increasing it will result in additional memory usage; if you increase it beyond 100 MB, we recommend you also increase your heap size to 4 GB or larger. The size limit for xlsx files is 1/5 of the total file size limit (i.e. 20 MB by default).
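To make the file size arithmetic concrete, here is a minimal sketch of the rule as described above. The function names are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption; the server's own calculation may differ.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def max_indexable_bytes(filename: str, limit_mb: int = 100) -> int:
    """Largest file size (in bytes) that would still be indexed."""
    limit = limit_mb * MB
    if filename.lower().endswith(".xlsx"):
        return limit // 5  # xlsx files are capped at 1/5 of the overall limit
    return limit


def should_index(filename: str, size_bytes: int, limit_mb: int = 100) -> bool:
    """Files larger than the applicable limit are not indexed."""
    return size_bytes <= max_indexable_bytes(filename, limit_mb)


if __name__ == "__main__":
    print(should_index("report.pdf", 50 * MB))   # under the 100 MB default
    print(should_index("assay.xlsx", 50 * MB))   # over the 20 MB xlsx cap
    print(should_index("assay.xlsx", 50 * MB, limit_mb=300))  # cap is now 60 MB
```

Note that raising the overall limit also raises the xlsx cap, since the cap is derived from it.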
There are a number of startup properties related to the search index that can be set automatically during server start. Users of Premium Editions can learn more about them on the admin console.
Index Statistics
This section provides information on the documents that have been indexed by the system, plus identifies the limits that have been set for the indexer by the LabKey team. These limits are used to manage index size and search precision. For example, you will see the "Maximum Size" of files that will be scanned by the indexer; the maximum size allows the system to avoid indexing exceptionally large files. Indexing large files will increase the index size without adding much (if any) value to searches.
Search Statistics
Lists the average time in milliseconds for each phase of searching the index, from creating the query to processing hits.
Audit Log
To see the search audit log:
- Go to (Admin) > Site > Admin Console.
- In the Management section, click Audit Log.
- Choose the Search option in the pulldown menu.
This displays the log of audited search events for your system. For example, you can see the terms entered by users in the search box. If someone has deleted your search index, this event will be displayed in the list, along with information on the user who ordered the delete.
Set Up Folder-Specific Search Boxes
By default, a site-wide search box is included in the LabKey header. You can add additional search boxes to individual projects or folders and choose how they are scoped.
- Add a Search web part to either column on the page.
- This search will only search the container where you created it.
- To also include subfolders, select Customize from the (triangle) menu and check the box to "Search subfolders".
As an example of a search box applied to a particular container, use the search box to the right of this page you are reading. It will search only the current folder (the LabKey documentation).
List and External Schema Metadata
By default, the search index includes metadata for lists and external schemas (including table names, table descriptions, column names, column labels, and column descriptions).
You can control indexing of List metadata when creating or editing a list definition under Advanced List Settings > Search Indexing Options. Learn more here:
Edit a List Design
You can turn off indexing of external schema metadata by unchecking the checkbox Index Schema Meta Data when creating or editing an external schema definition. Learn more here:
External Schemas and Data Sources
Include/Exclude a Folder from Search
You may want to exclude the contents of certain folders from searches. For example, you may not want archived folders or work in progress to appear in search results.
To exclude the contents of a folder from searches:
- Navigate to the folder and select (Admin) > Folder > Management.
- Click the Search tab.
- Uncheck the checkbox Include this folder's contents in multi-folder search results.
- Click Save.
Note that this does not exclude the contents from indexing, so searches that originate from that folder will still include its contents in the results. Searches from any other folder will not, even if they specify a site- or subfolder-inclusive scope.
Exclude a File/Directory from Search Indexing
LabKey generally indexes the file system directories associated with projects and folders, i.e. the contents of the @files and other filesets. Some file and directory patterns are ignored (skipped during indexing), including:
- Contents of directories named ".Trash" or ".svn"
- Files with names that start with "."
- Anything on a path that includes "no_crawl"
- Contents of any directory containing a file named ".nocrawl"
- On PostgreSQL, contents of any directory containing a file named "PG_VERSION"
To exclude a file or the content of a folder from indexing, you may be able to employ one of the above conventions.
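If you want to predict whether a given file would be skipped, the conventions above can be approximated with a short script. This is a sketch, not the crawler's actual logic: the function name is hypothetical, marker files are assumed to apply to all nested subdirectories, and the "PG_VERSION" rule is applied unconditionally rather than only on PostgreSQL.

```python
import tempfile
from pathlib import Path

SKIPPED_DIR_NAMES = {".Trash", ".svn"}
MARKER_FILES = {".nocrawl", "PG_VERSION"}  # PG_VERSION applied unconditionally here


def is_skipped(file_path: Path, root: Path) -> bool:
    """Return True if file_path (located under root) matches a skip convention."""
    rel = file_path.relative_to(root)  # raises ValueError if not under root
    if file_path.name.startswith("."):
        return True
    if "no_crawl" in rel.as_posix():
        return True
    # Check every directory from the file's parent up to and including root.
    directory = file_path.parent
    while True:
        if directory.name in SKIPPED_DIR_NAMES:
            return True
        if any((directory / marker).is_file() for marker in MARKER_FILES):
            return True
        if directory == root:
            return False
        directory = directory.parent


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    (root / "archive").mkdir()
    (root / "archive" / ".nocrawl").write_text("")  # marker file excludes this directory
    print(is_skipped(root / "archive" / "old.txt", root))
    print(is_skipped(root / "results.txt", root))
```

Dropping an empty ".nocrawl" file into a directory is usually the least intrusive of these conventions, since it does not require renaming anything.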
Export Index Contents
To export the contents of the search index:
- Go to (Admin) > Site > Admin Console.
- In the Management section, click Full-text search.
- At the bottom of the Index Statistics panel, click Export Index Contents.
- You can export the index in either Excel or TXT format.
Troubleshoot Search Indexing
Search Index not Initialized
When the search index cannot be found or properly initialized, you may see an error similar to the following. The <ERROR> can vary here, for example "Module Upgrade : Error: Unable to initialize search index." or "SearchService:index : Error: Unable to initialize search index."
ERROR LuceneSearchServiceImpl <DATE> <ERROR> Unable to initialize search index. Search will be disabled and new documents will not be indexed for searching until this is corrected and the server is restarted.
Options for resolving this include:
- If you know the configured path is incorrect and a path to where the .tip file resides is provided, you can correct the Path to full-text search index and try again.
- You can also delete the current index and start the crawler again, essentially re-indexing all of your data. This option may take time but runs in the background, so it can complete while you do other work on the server.
Search Index Corrupted (write.lock file)
If the labkey-errors.log includes an error like:
org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: <FULL_PATH_TO>/files/@labkey_full_text_index/write.lock
This indicates the index was corrupted, possibly because of an interrupted upgrade/reindex or problem connecting to the underlying file system during a previous reindexing.
To resolve, first "Pause Crawler", then "Delete Index". Next, restart the system and wait for it to come fully online for users. Once it is back online, return to the admin dashboard and "Start Crawler", noting that rebuilding the index may take significant time.
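To confirm that the log actually contains this error before deleting the index, you could scan it with a few lines of Python. This is a minimal sketch: the function name is hypothetical, the pattern is based only on the sample message shown above, and the location of labkey-errors.log on your server is an assumption you will need to adjust.

```python
import re
from pathlib import Path
from typing import List

# Pattern based on the sample error message; other wordings are not matched.
LOCK_ERROR = re.compile(
    r"LockObtainFailedException: Lock held by this virtual machine: (?P<path>\S.*)"
)


def find_lock_errors(log_text: str) -> List[str]:
    """Return the lock file path reported by each matching log line."""
    return [match.group("path").strip() for match in LOCK_ERROR.finditer(log_text)]


if __name__ == "__main__":
    log_file = Path("labkey-errors.log")  # assumed location; adjust for your server
    if log_file.is_file():
        for lock_path in find_lock_errors(log_file.read_text(errors="replace")):
            print("write.lock error reported for:", lock_path)
```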
Threads Hang During Search
If you experience threads hanging during search operation, check to ensure that your search index is NOT on an NFS filesystem or on AWS EFS. These filesystems should never be used for a full-text search index.
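On a Linux server, one way to check this is to look up the index path in the mount table. The sketch below assumes the /proc/mounts format (device, mount point, filesystem type per line) and that NFS and EFS mounts report a type of "nfs" or "nfs4"; the function names are hypothetical, and mount points containing escaped spaces are not handled.

```python
import posixpath

NETWORK_FS_TYPES = {"nfs", "nfs4"}  # assumption: EFS is mounted over NFSv4 and shows as nfs4


def filesystem_type(path: str, mounts_text: str) -> str:
    """Return the type of the longest mount point containing an absolute path."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a well-formed mount entry
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_network_fs(path: str, mounts_text: str) -> bool:
    return filesystem_type(path, mounts_text) in NETWORK_FS_TYPES


if __name__ == "__main__":
    index_path = "/labkey/files/@labkey_full_text_index"  # example path; use your own
    try:
        with open("/proc/mounts") as mounts:
            print(is_on_network_fs(index_path, mounts.read()))
    except FileNotFoundError:
        print("/proc/mounts not available on this platform")
```

If this reports a network filesystem, move the index to local storage by setting a new path on the Full-Text Search Configuration page.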
No Search Results When Expected
If you encounter unexpected search results, particularly a lack of any results when matches are known, you may need to rebuild the search index. This situation may occur in particular on a development machine where you routinely pause the crawler.
To rebuild the index, pause the crawler (if running), delete the index, and restart the crawler to rebuild it.