These instructions enable an administrator to configure the LabKey NLP pipeline so that tsv source files can be run through the NLP engine provided with LabKey Server. Once the administrator has properly configured the pipeline and server, any number of users can process tsv files through one or more versions of the NLP engine using the instructions here.

Install Required Components

Install python (2.7.9)

The NLP engine will not run under python 3. If possible, there should be only one version of python installed. If you require multiple versions, it is possible to configure the LabKey NLP pipeline accordingly, but that is not covered in this topic.

  • Download python 2.7.9 from https://www.python.org/download/
  • Double-click the .msi file to begin the install. Accept the wizard defaults and confirm that pip will be installed. Choose to add python.exe to the system path automatically by selecting the install option from the pulldown menu for that feature.
  • When the installation is complete, click Finish.
  • By default, python is installed on Windows in C:/Python27/.
  • Confirm that python was correctly added to your path by opening a command shell and typing "python -V" using a capital V. The version will be displayed.
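The version check above can also be done from inside the interpreter. The following is a minimal sketch, not part of the LabKey distribution; the helper name is_supported_python is hypothetical, and treating every 2.7.x release as acceptable is an assumption based only on the statement that the engine will not run under python 3.

```python
import sys


def is_supported_python(version_info):
    """Return True for any Python 2.7.x interpreter.

    Assumption: any 2.7.x release is acceptable; this topic installs 2.7.9.
    """
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    # Report which interpreter is first on the system path.
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    if not is_supported_python(sys.version_info):
        print("Warning: the NLP engine requires Python 2.7.")
```

Saving this as a script and running it with "python" shows which interpreter the system path resolves to, which is useful if more than one version is installed.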

Install the NumPy package (1.8.x)

NumPy is a package for scientific computation with Python. Learn more here: http://www.numpy.org/

  • For Windows, download a pre-compiled whl file for NumPy from: http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
  • The whl you select must match the python version you downloaded (for 2.7.9 select "cp27") as well as the bit-width (32 vs 64) of your system.
    • To confirm your bit-width, open the Windows Control Panel, select System and Security, then select System. The system type is shown about mid page.
    • For instance, if running 64-bit Windows, you would download: numpy-1.9.2+mkl-cp27-none-win_amd64.whl
  • Move the downloaded package to the Scripts directory under the location where python was installed. By default, this is C:/Python27/Scripts/
  • A bug in pip requires that you rename the downloaded package, replacing "win_amd64" with "any".
  • In a command shell, navigate to that same Scripts directory and run:
pip install numpy-1.9.2+mkl-cp27-none-any.whl
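The bit-width check and the rename step can be sketched in a few lines of python. This is only an illustration: the function names interpreter_bits and pip_workaround_name are hypothetical, and the sketch reports the bit-width of the running python interpreter, which is assumed here to match the bit-width of the system.

```python
import struct


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


def pip_workaround_name(wheel_name):
    """Return the file name with "win_amd64" replaced by "any"."""
    return wheel_name.replace("win_amd64", "any")


if __name__ == "__main__":
    print("This python is %d-bit" % interpreter_bits())
    print(pip_workaround_name("numpy-1.9.2+mkl-cp27-none-win_amd64.whl"))
```

The renamed file name is the one to pass to pip install from the Scripts directory.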

Install nltk package and "book" data collection

The Natural Language Toolkit (NLTK) provides support for natural language processing.

  • Update to the latest pip and setuptools by running:
    • On Windows:
      python -m pip install -U pip setuptools
    • On Linux: Instructions for using the package manager are available here
  • Install nltk by running:
    • On Windows:
      python -m pip install -U nltk
    • On Mac/Linux:
      sudo pip install -U nltk
  • Test the installation by running python, then typing:
    import nltk
  • If no errors are reported, your install was successful.

Next install the "book" data collection:

  • On Windows, in Python:
    • Type:
      nltk.download()
    • A GUI will open. Select the "book" identifier and download to "C:\nltk_data".
  • On Mac/Linux, run:
    sudo python -m nltk.downloader -d /usr/local/share/nltk_data book

Install python-crfsuite

Install python-crfsuite, version 0.8.4 or later, which is a python binding to CRFsuite.

  • On Windows, first install the Microsoft Visual C++ compiler for Python. Download and install instructions can be found here. Then run:
    python -m pip install -U python-crfsuite
  • On Mac or Linux, run:
    sudo pip install -U python-crfsuite
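Once the packages are installed, a short script can confirm that each one imports cleanly, in the same spirit as the "import nltk" test above. This is a sketch rather than an official check; the helper name missing_modules is hypothetical, and "pycrfsuite" is the module name under which python-crfsuite is imported.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Modules installed in the steps above.
    required = ["numpy", "nltk", "pycrfsuite"]
    problems = missing_modules(required)
    if problems:
        print("Not importable: " + ", ".join(problems))
    else:
        print("All required packages can be imported.")
```

An import check only shows that the packages are present; it does not verify their versions or that the "book" data collection was downloaded.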

Install the LabKey distribution

Install the LabKey distribution. Complete instructions can be found here. The location where you install your LabKey distribution is referred to in this topic as ${LABKEY_INSTALLDIR}.

Configure the NLP pipeline

The LabKey distribution already contains an NLP engine, located in:

${LABKEY_INSTALLDIR}\bin\nlp

If you want to be able to use one or more NLP engines installed elsewhere, an administrator may configure the server to use that alternate location. For example, if you want to use an engine located here:

C:\alternateLocation\nlp

Direct the pipeline to look first in that alternate location by adding it to the Pipeline tools path:

  • Select Admin > Site > Admin Console.
  • Click Site Settings.
  • The Pipeline tools field contains a semicolon-separated list of paths the server will use to locate tools, including the NLP engine. By default the path is "${LABKEY_INSTALLDIR}\bin" (for example, "C:\labkey\labkey\bin").
  • Add the location of the alternate NLP directory to the front of the Pipeline tools list of paths.
    • For example, to use an engine in "C:\alternateLocation\nlp", add "C:\alternateLocation;".
  • Click Save.
  • No server restart is required when adding a single alternate NLP engine location.
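To illustrate why the alternate location goes at the front of the list, here is a minimal sketch of a first-match search over a semicolon-separated path. It is not LabKey's implementation, which this topic does not describe; the function names prepend_tools_path and find_tool_dir are hypothetical, and the is_dir parameter exists only so the search can be tried without real directories.

```python
import ntpath  # Windows-style path handling, usable on any platform


def prepend_tools_path(current, new_dir):
    """Put new_dir at the front of a semicolon-separated path list."""
    entries = [p for p in current.split(";") if p]
    return ";".join([new_dir] + entries)


def find_tool_dir(tools_path, name, is_dir=ntpath.isdir):
    """Return the first <entry>\\<name> that is a directory, or None."""
    for entry in tools_path.split(";"):
        if not entry:
            continue  # skip empty entries such as a trailing semicolon
        candidate = ntpath.join(entry, name)
        if is_dir(candidate):
            return candidate
    return None


if __name__ == "__main__":
    tools = prepend_tools_path("C:\\labkey\\labkey\\bin", "C:\\alternateLocation")
    print(tools)
    # Pretend both nlp directories exist; the earlier entry is chosen.
    print(find_tool_dir(tools, "nlp", is_dir=lambda path: True))
```

Under this assumed lookup order, an engine in the alternate location is found before the one shipped in ${LABKEY_INSTALLDIR}\bin\nlp.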

Configure to use Multiple Engine Versions

You may also make multiple versions of the NLP engine available on your LabKey Server simultaneously. Each user would then configure their workspace folder to use a different version of the engine. The process for doing so involves additional steps, including a server restart to enable the use of multiple engines. Once configured, no restarting will be needed to update or add additional engines.

  • Download the nlpConfig.xml file.
  • Select or create a location for config files, for example "C:\labkey\configs", and place nlpConfig.xml in it.
  • The LabKey Server configuration file, named labkey.xml by default, or ROOT.xml on production servers, is typically located in a directory like [TOMCAT_HOME]\conf\Catalina\localhost. This file must be edited to point to the alternate config location.
  • Open it for editing, and locate the pipeline configuration line, which will look something like this:
<!-- Pipeline configuration -->
<!--@@pipeline@@ <Parameter name="org.labkey.api.pipeline.config" value="@@pipelineConfigPath@@"/> @@pipeline@@-->
  • Uncomment and edit to point to the location of nlpConfig.xml, in our example, "C:\labkey\configs". The edited line will look something like this:
<!-- Pipeline configuration -->
<Parameter name="org.labkey.api.pipeline.config" value="C:\labkey\configs"/>
  • Save the file.
  • Restart your LabKey Server.
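The uncomment-and-edit step can also be scripted. The sketch below operates on the text of the configuration file and assumes the commented line matches the form shown above; the function name enable_pipeline_config is hypothetical. Back up the configuration file before applying anything like this to a real server.

```python
import re

# Matches the commented-out pipeline parameter in the form shown above.
COMMENTED_PARAM = re.compile(
    r'<!--@@pipeline@@\s*'
    r'<Parameter name="org\.labkey\.api\.pipeline\.config"\s+value="[^"]*"\s*/>'
    r'\s*@@pipeline@@-->'
)


def enable_pipeline_config(xml_text, config_dir):
    """Uncomment the pipeline parameter and point it at config_dir."""
    replacement = (
        '<Parameter name="org.labkey.api.pipeline.config" value="%s"/>' % config_dir
    )
    # A function replacement keeps backslashes in Windows paths literal.
    return COMMENTED_PARAM.sub(lambda match: replacement, xml_text)


original = (
    '<!-- Pipeline configuration -->\n'
    '<!--@@pipeline@@ <Parameter name="org.labkey.api.pipeline.config" '
    'value="@@pipelineConfigPath@@"/> @@pipeline@@-->'
)
edited = enable_pipeline_config(original, r"C:\labkey\configs")
print(edited)
```

If the commented line is not found, the text is returned unchanged, so comparing the input and output is a simple way to detect a file that uses a different layout.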

Multiple alternate NLP engine versions should be placed in a directory structure one directory level down from the "nlp" directory where you would place a single engine. The person installing these engines must have write access to this location in the file system, but does not need to be the LabKey Server administrator. The directory names here will be used as 'versions' when you import, so it is good practice to include the version in the name, for example:

C:\alternateLocation\nlp\engineVersion1
C:\alternateLocation\nlp\engineVersion2
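Because the directory names act as version labels, it can help to list them before importing. The following sketch builds a throwaway directory tree that mirrors the layout above and lists its subdirectories; the function name list_engine_versions is hypothetical, and this is an illustration of the layout rather than how the server discovers engines.

```python
import os
import tempfile


def list_engine_versions(nlp_dir):
    """Return the sorted names of the subdirectories of nlp_dir."""
    return sorted(
        name for name in os.listdir(nlp_dir)
        if os.path.isdir(os.path.join(nlp_dir, name))
    )


# Build a temporary stand-in for C:\alternateLocation\nlp.
demo_root = tempfile.mkdtemp()
demo_nlp_dir = os.path.join(demo_root, "nlp")
for version in ("engineVersion1", "engineVersion2"):
    os.makedirs(os.path.join(demo_nlp_dir, version))

# A plain file next to the engine directories is not treated as a version.
open(os.path.join(demo_nlp_dir, "notes.txt"), "w").close()

versions = list_engine_versions(demo_nlp_dir)
print(versions)
```

Pointing list_engine_versions at the real "nlp" directory shows the names that would be offered as versions under this layout.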
