Document Abstraction

2024-04-16

Premium Feature — Available in the Enterprise Edition of LabKey Server. Learn more or contact LabKey.

Abstraction of information from clinical documents into tabular data can unearth a wealth of previously untapped data for integration and analysis, provided it is done efficiently and accurately. An NLP engine can automatically abstract information based on the type of document, and additional manual abstraction by one or more people using the process covered here can maximize information extraction.

Note that the term "Annotation" can be used in place of "Abstraction" in many places throughout the UI. Learn to customize terminology in this topic.

Abstraction Task List

Tasks have both a Report ID and a Batch Number, as well as an optional Identifier Name, all shown in the task list. The assigned user must have "Abstractor" permissions and will initiate a manual abstraction by clicking Abstract on any row in their task list.

The task list grid can be sorted and filtered as desired, and grid views saved for future use. After completion of a manual abstraction, the user will advance to the next document in the user's default view of the task list.

Abstraction UI Basics

The document abstraction UI is shown in two panels. Above the panels, the report number is prominently shown.

The imported text is shown on the right and can be scrolled and reviewed for key information. The left-hand panel shows field entry panels into which information found in the text will be abstracted.

One set of subtables for results is made available by default; in the above screenshot it is named "Specimen A". You can add more using the "Add another specimen" option described below. An administrator can also control the default name for the first subtable. For instance, it could be "1" instead of "Specimen A".

The abstracted fields are organized in categories that may vary based on the document type. For example, pathology documents might have categories as shown above: Pathology, PathologyStageGrade, EngineReportInfo, PathologyFinding, and NodePathFinding.

Expand and contract field category sections by clicking the title bars or the expand/collapse icons. By default, the first category is expanded when the abstractor first opens the UI. Fields in the "Pathology" category include:

  • PathHistology
  • PathSpecimenType
  • Behavior
  • PathSite
  • Pathologist

If an automated abstraction pass was done prior to manual abstraction, pulldowns may be prepopulated with some information gathered by the abstraction (NLP) engine. In addition, if the disease group has been automatically identified, this can focus the set of values for each field offered to a manual abstractor. The type of document also drives some decisions about how to interpret parts of the text.

Populating Fields

Select a field by clicking the label; the selected row will show in yellow, as will any associated text highlights previously added for that field. Some fields allow free text entry; other fields use pulldowns offering a set of possible values.

Start typing in either type of field to narrow the menu of options, or keep typing to enter free text as appropriate. There are two types of fields with pulldown menus.

  • Open-class fields allow you to either select a listed value or enter a new value of your own.
  • Closed-class fields (marked with an icon) require a selection of one of the values listed on the pulldown.
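
The difference between the two pulldown types can be sketched as a small validation rule. This is an illustration only, not LabKey Server code: the function name, the sample values, and the exact-match comparison are all assumptions.

```python
def accept_value(entered, listed_values, closed_class):
    """Return the value to store, or None if the entry is not allowed.

    Hypothetical helper for illustration; not part of any LabKey API.
    """
    entered = entered.strip()
    if not entered:
        return None
    # A listed value is always acceptable, for both field types.
    if entered in listed_values:
        return entered
    # Closed-class fields reject anything that is not on the pulldown.
    if closed_class:
        return None
    # Open-class fields keep the new value as typed.
    return entered
```

With the sample menu ["Benign", "Malignant"], a closed-class field would reject "Uncertain", while an open-class field would store it as a new value.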

Multiple Value Fields

Fields supporting multiple entries allow you to select several values from the pulldown menu. Hold the Shift or Ctrl key when you click to add the new value instead of replacing the prior choice. The values are shown in the field value separated by || (double pipe).
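
As a rough sketch of how such a field value could be built up, the following hypothetical helper appends or replaces a value depending on whether a modifier key was held. The function name, the absence of spaces around the separator, and the skipping of duplicates are assumptions made for the example.

```python
SEPARATOR = "||"  # double pipe, as displayed in the field value


def add_value(current, new_value, extend):
    """Return the field value after the user clicks new_value.

    extend is True when Shift or Ctrl was held during the click.
    Hypothetical helper for illustration only.
    """
    # A plain click (or an empty field) replaces the prior choice.
    if not extend or not current:
        return new_value
    values = current.split(SEPARATOR)
    # Assumption: selecting a value twice does not duplicate it.
    if new_value not in values:
        values.append(new_value)
    return SEPARATOR.join(values)
```

Under these assumptions, a field holding "Breast" that receives a Ctrl-click on "Lung" would show "Breast||Lung".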

Conditional or Branching Logic Fields

When an abstractor selects a value for a field supporting conditional or branching logic, it may trigger additional fields or sections to be shown or hidden. This behavior helps the abstractor focus on what is relevant as they determine more about the case report.

For example, if the abstractor sees that an ALK test was reported, they can enter the method and result, but if the test was not reported, those fields remain grayed out and inaccessible.

In other cases, the set of fields offered may change outright based on the abstractor's selections for other fields. Learn more about how administrators configure such fields in this topic.
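
The ALK example above can be pictured as a small rule table. The sketch below is illustrative only: the field names, the "Yes" trigger value, and the rule format are hypothetical and do not reflect how administrators actually configure these fields.

```python
# Each rule: (controlling field, triggering value, fields it enables).
# All names here are hypothetical.
RULES = [
    ("ALKTestReported", "Yes", ["ALKTestMethod", "ALKTestResult"]),
]
ALWAYS_ENABLED = ["ALKTestReported"]


def enabled_fields(values):
    """Return the set of field names that should be editable."""
    enabled = set(ALWAYS_ENABLED)
    for field, trigger, dependents in RULES:
        # Dependent fields stay grayed out until the trigger value is chosen.
        if values.get(field) == trigger:
            enabled.update(dependents)
    return enabled
```

Keeping the rules as data rather than code means new conditional fields could be added without changing the function.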

Highlighting Text

The abstractor scans for relevant details in the text, selects or enters information in the field in the results section, and can highlight one or more relevant pieces of text on the right to accompany it.

If you highlight a string of text before entering a value for the active field, the selected text will be entered as the value if possible. For a free text field, the entry is automatic. For a field with a pulldown menu, if you highlight a string in the text that matches a value on the given menu, it will be selected. If you had previously entered a different value, however, that earlier selection takes precedence and is not superseded by later text highlighting. You may multi-select several regions of text for any given field result as needed.
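
The rules in this paragraph can be summarized in a short sketch. The function name is hypothetical, and exact matching against the menu (after trimming whitespace) is an assumption; the actual matching behavior may differ.

```python
def value_after_highlight(current_value, highlighted, menu=None):
    """Return the field value after a region of text is highlighted.

    menu is None for a free text field, otherwise the list of
    pulldown values. Hypothetical helper for illustration only.
    """
    # An earlier entry takes precedence over later highlighting.
    if current_value:
        return current_value
    text = highlighted.strip()
    # Free text fields take the highlighted text automatically.
    if menu is None:
        return text
    # Pulldown fields are filled only when the text matches a listed value.
    if text in menu:
        return text
    return current_value
```

In this sketch, highlighting "Positive for malignancy" fills an empty free text field but leaves an empty pulldown field unchanged, because that string is not one of the listed values.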

In the following screenshot, several types of text highlighting are shown. When you click to select a field, the field and any associated highlights are colored yellow. If you double-click the field label, the text panel will be scrolled to place the first highlighted region within the visible window, typically three rows from the top. Shown selected here, the text "Positive for malignancy" was just linked to the active field Behavior with the value "Malignant". Also shown here, when you hover over the label or value for a field which is not active, in this case "PathHistology", the associated highlighted region(s) of text will be shown in green.

Text that has been highlighted for a field that is neither active (yellow) nor hovered-over (green) is shown in light blue. Click on any highlighting to activate the associated field and show both in yellow.
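
The three highlight colors follow a simple precedence, which can be written out as below. This is a sketch for clarity only; the function and argument names are hypothetical.

```python
def highlight_color(field, active_field, hovered_field):
    """Return the color used for a field's highlighted text."""
    # The active field always shows in yellow, even when hovered.
    if field == active_field:
        return "yellow"
    # Hovering over a field that is not active shows its highlights in green.
    if field == hovered_field:
        return "green"
    # Every other highlighted region is light blue.
    return "light blue"
```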

A given region of text can also be associated with multiple field results. The count of related fields is shown with the highlight region ("1 of 2", for example).

Unsaved changes are indicated by red corners on the entered fields. If you make a mistake or wish to remove highlighting on the right, click the 'x' attached to the highlight region.

Save Abstraction Work

Save work in progress at any time by clicking Save Draft. If you leave the abstraction UI, you will still see the document as a task waiting to be completed, with the message "Initial abstraction in progress". When you return to an abstraction in progress, you will see your previous highlighting and selections, and can continue to review and abstract more of the document.

Once you have completed the abstraction of the entire document, click Submit to close your task and pass the document on for review. If no review is selected, the document will be considered completed and approved.

When you submit the document, you will automatically advance to the next document assigned for you to abstract, according to the sort order established on your default view of your task list. There is no need to return to your task list explicitly to advance to the next task. The document you just completed will be shown as "Previously viewed" above the panels.

If you mistakenly submit abstraction results for a document too quickly, you can use the back button in your browser to return. Click Reopen to return it to an "abstraction in progress" status.

Abstraction Timer

If the abstraction timer is enabled, you can track how much time is spent actively abstracting the document, and separately how much time is spent actively reviewing that abstraction. The timer is enabled or disabled based on which pipeline is used.

The abstraction timer is displayed above the document title and automatically starts when the assigned abstractor lands on the page. If the abstractor needs to step away or work on another task, they may pause the timer, then resume when they resume work on the document. As soon as the abstractor submits the document, the abstraction timer stops.

The reviewer's time on the same document is also tracked, beginning from zero. The total time spent in process for the document is the sum of these two times.
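
To make the arithmetic concrete, here is a minimal sketch of a pausable timer and of the total time as the sum of the abstraction and review times. The class and its methods are hypothetical and are not LabKey's implementation; the clock is injectable so the behavior can be checked without waiting.

```python
import time


class PausableTimer:
    """Accumulates only the time between start() and pause() calls."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._elapsed = 0.0
        self._started_at = None  # None while paused or not yet started

    def start(self):
        # Starting an already running timer has no effect.
        if self._started_at is None:
            self._started_at = self._clock()

    def pause(self):
        # Fold the running interval into the accumulated total.
        if self._started_at is not None:
            self._elapsed += self._clock() - self._started_at
            self._started_at = None

    def elapsed(self):
        running = 0.0
        if self._started_at is not None:
            running = self._clock() - self._started_at
        return self._elapsed + running


def total_time(abstraction_timer, review_timer):
    """Total time in process: abstraction time plus review time."""
    return abstraction_timer.elapsed() + review_timer.elapsed()
```

For example, an abstractor who works for 60 seconds, pauses, and later works another 60 seconds has 120 seconds recorded; a reviewer who then spends 30 seconds brings the total for the document to 150 seconds.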

Note that the timer does not run when others, such as administrators, are viewing the document. It only applies to the edit-mode of active abstracting and reviewing.

Session Timeout Suspension

When abstracting a particularly lengthy or complicated document, or one requiring input from others, it is possible for a long period of time to elapse between interactions with the server. This could result in the user's session expiring, which is especially problematic because the values and highlights entered since the last save can be lost. To avoid this problem, the abstraction session in progress will keep itself alive by pinging the server with a lightweight "keepalive" function while the user continues interacting with the UI.
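
The general idea of an activity-driven keepalive can be sketched as follows. This is not LabKey's implementation: the class, the ping callable, and the 60-second interval are all assumptions chosen for illustration.

```python
import time


class KeepAlive:
    """Pings the server only while the user keeps interacting with the UI."""

    def __init__(self, ping, interval_seconds=60.0, clock=time.monotonic):
        self._ping = ping                  # hypothetical callable that contacts the server
        self._interval = interval_seconds  # assumed value, not a documented setting
        self._clock = clock
        self._last_ping = clock()
        self._last_interaction = self._last_ping

    def record_interaction(self):
        """Call on any UI activity, such as typing or highlighting."""
        self._last_interaction = self._clock()

    def tick(self):
        """Call periodically; returns True if a ping was sent."""
        now = self._clock()
        due = now - self._last_ping >= self._interval
        active = self._last_interaction > self._last_ping
        # Ping only when the interval has passed and the user was active since the last ping.
        if due and active:
            self._ping()
            self._last_ping = now
            return True
        return False
```

Tying the ping to user activity means an abandoned browser tab would still be allowed to time out normally, while an abstractor who is reading and highlighting keeps the session open.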

Multiple Specimens per Document

There may be information about multiple specimens in a single document. Each field results category can have multiple panels of fields, one for each specimen. You can also think of these groupings as subtables, and the default name for the first/default subtable can be configured by an administrator at the folder level. To add information for an additional specimen, open the relevant category in the field results panel, then click Add another specimen and select New Specimen from the menu.

Once you have defined multiple specimen groupings for the document, you will see a panel for each specimen into which values can be entered.

Specimen names can be changed and specimens deleted from the abstraction using the cog icon for each specimen panel.

Reopen an Abstraction Task

If you mistakenly submit abstraction results for a document too quickly, you can use the back button in your browser to return to the document. Click Reopen to return it to an unapproved status.
