Adapting the code to similar tasks

The en_edris9 model and dose_instruction_parser package have been tailored to the problem of parsing free text dose instructions from prescriptions. However, the code can be used as a starting point to solve similar problems.

Parsing dose instructions for specific drugs or conditions

The en_edris9 model was trained on a balanced set of data covering the whole of national prescribing information. This makes the model a good “all rounder” when it comes to performance. If you are interested in a specific subset of drugs or conditions, you should be able to boost performance by further training the model on data from that subset. To do this, follow the instructions in the Training a new named entity recogniser model section, taking care to:

  1. Create training data for the types of dose instruction you are interested in

  2. Install the en_edris9 model (if you don’t have access you can use en_core_med7_lg, obtained following the instructions here)

  3. Modify model/config/config.cfg to replace all instances of en_core_med7_lg with en_edris9

  4. Evaluate the performance compared to en_edris9 and/or en_core_med7_lg, using model/compare_models.py as a guide alongside output from source model/evaluate_model.sh
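The config change in the list above amounts to a text substitution. The following is a minimal sketch, assuming the config file is plain text and that the base model name appears only where it should be swapped. The helper names swap_base_model and update_config are hypothetical and are not part of the repository.

```python
from __future__ import annotations

from pathlib import Path


def swap_base_model(config_text: str,
                    old: str = "en_core_med7_lg",
                    new: str = "en_edris9") -> tuple[str, int]:
    """Return the config text with every `old` replaced by `new`, plus the number of swaps."""
    count = config_text.count(old)
    return config_text.replace(old, new), count


def update_config(path: str = "model/config/config.cfg") -> int:
    """Rewrite the config file in place and return how many names were swapped."""
    cfg = Path(path)
    updated, count = swap_base_model(cfg.read_text(encoding="utf-8"))
    cfg.write_text(updated, encoding="utf-8")
    return count
```

Checking the returned count against what you expect is a cheap way to confirm that no instance was missed before you start training.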

Extracting different structural information

In en_edris9 there are nine named entities extracted, which give rise to the following structured fields:

Field          Description
-------------  --------------------------------------------------------------
inputID        ID passed in with dose instruction for bookkeeping
text           The original free text dose instruction (can be later removed)
form           The form of drug e.g. “tablet”, “patch”, “injection”
dosageMin      The minimum dosage
dosageMax      The maximum dosage
frequencyMin   The minimum frequency
frequencyMax   The maximum frequency
frequencyType  The type of frequency for the dosage e.g. “Hour”, “Day”, “2 Week”
durationMin    The minimum duration of treatment
durationMax    The maximum duration of treatment
durationType   The type of duration for dosage e.g. “Day”, “Week”
asRequired     True/False: Whether to take as required / as needed
asDirected     True/False: Whether to take as directed
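To make the shape of this output concrete, here is a sketch of a record holding the fields listed above. The class name DoseRecord, the field types and the defaults are assumptions made for illustration. They are not taken from the package, and the example values are filled in by hand rather than produced by the model.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DoseRecord:
    """Hypothetical container mirroring the structured fields listed above."""
    inputID: Optional[int] = None
    text: Optional[str] = None
    form: Optional[str] = None
    dosageMin: Optional[float] = None
    dosageMax: Optional[float] = None
    frequencyMin: Optional[float] = None
    frequencyMax: Optional[float] = None
    frequencyType: Optional[str] = None
    durationMin: Optional[float] = None
    durationMax: Optional[float] = None
    durationType: Optional[str] = None
    asRequired: bool = False
    asDirected: bool = False


# Hand-written illustration; fields with no information stay at their defaults
record = DoseRecord(
    inputID=1,
    text="take 1 to 2 tablets twice daily as required",
    form="tablet",
    dosageMin=1.0,
    dosageMax=2.0,
    frequencyMin=2.0,
    frequencyMax=2.0,
    frequencyType="Day",
    asRequired=True,
)
print(asdict(record))
```

Adding a new output field then means adding one attribute here and one piece of logic that fills it from the relevant entity.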

Changing this output requires three main steps:

  1. Create new training data tagged with all the named entities you are interested in. You can add new entities here, e.g. “AS_REQUIRED” and “AS_DIRECTED” were new entities added on top of those in en_core_med7_lg

  2. Make sure that overwrite_ents = True in the [components.ner] section of model/config/config.cfg, then train a new model following Training a new named entity recogniser model

  3. Modify the dose_instruction_parser/dose_instruction_parser code to process the new entities into the output you desire. How involved this is depends on the complexity of the entities. You can use the existing entities as a guide.
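The last step can be pictured as a mapping from entity labels to output fields. The sketch below uses plain (text, label) tuples as a stand-in for model entities, so it runs without a trained model. The function flags_from_entities and the FLAG_FIELDS table are hypothetical, and this is not how the package itself implements the step. “DOSAGE” and “FORM” are used only as placeholder labels.

```python
from __future__ import annotations

# Maps an entity label to the boolean output field it switches on.
# "AS_REQUIRED" and "AS_DIRECTED" are the labels named in the text;
# a new flag-style entity would get its own row here.
FLAG_FIELDS = {
    "AS_REQUIRED": "asRequired",
    "AS_DIRECTED": "asDirected",
}


def flags_from_entities(entities: list[tuple[str, str]]) -> dict[str, bool]:
    """Set each flag field to True if at least one entity carries its label."""
    labels = {label for _text, label in entities}
    return {field: label in labels for label, field in FLAG_FIELDS.items()}


# Hand-written entities standing in for model output
entities = [("2", "DOSAGE"), ("tablets", "FORM"), ("as needed", "AS_REQUIRED")]
print(flags_from_entities(entities))  # {'asRequired': True, 'asDirected': False}
```

Flag-style entities like these are the simple case. Entities that carry numbers or ranges need extra parsing of the entity text, which is where the existing entities are most useful as a guide.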

Warning

Include training data that is fully representative of the data you intend to use the model on. If you further train the model on only one type of example, it will begin to “forget” what it already knows, i.e. it will get worse at extracting entities that it previously handled well but no longer sees in training.
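One way to act on this warning is to mix your specialist examples with a sample of general examples before further training. The sketch below is an illustration only. The function mix_training_data and its general_ratio parameter are hypothetical, and a suitable ratio has to be found empirically by evaluating the resulting model.

```python
import random


def mix_training_data(specialist, general, general_ratio=1.0, seed=0):
    """Combine all specialist examples with a random sample of general ones.

    general_ratio is the number of general examples kept per specialist
    example, capped at the number of general examples available.
    """
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    n_general = min(len(general), int(len(specialist) * general_ratio))
    mixed = list(specialist) + rng.sample(list(general), n_general)
    rng.shuffle(mixed)
    return mixed
```

For example, `mix_training_data(specialist, general, general_ratio=0.5)` keeps every specialist example and adds one general example for every two specialist ones.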

General application to medical free text parsing

Note

In this case it would be best to create a totally new repository, using this repository as a starting point.

This is a more involved version of the above. Broadly, you will need to:

  1. Create tagged training data with all the named entities you are interested in

  2. Train a model following Training a new named entity recogniser model

  3. Heavily alter the dose_instruction_parser/dose_instruction_parser code to process the output in the way you want.
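For the tagged training data in the first step, NER examples are commonly stored as text plus (start, end, label) character spans. The sketch below computes such spans from phrases you have tagged by hand. The helper tag_spans is hypothetical, the labels are placeholders, and the file format the repository's training scripts expect may differ, so check the training section before relying on this layout.

```python
def tag_spans(text, phrases):
    """Locate each (phrase, label) pair in text, in order, as (start, end, label)."""
    spans = []
    cursor = 0
    for phrase, label in phrases:
        start = text.find(phrase, cursor)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in {text!r} from position {cursor}")
        end = start + len(phrase)
        spans.append((start, end, label))
        cursor = end  # search onwards so spans stay ordered and non-overlapping
    return spans


text = "apply one patch every 72 hours"
phrases = [("one", "DOSAGE"), ("patch", "FORM"), ("every 72 hours", "FREQUENCY")]
print(tag_spans(text, phrases))
# [(6, 9, 'DOSAGE'), (10, 15, 'FORM'), (16, 30, 'FREQUENCY')]
```

Because phrases are searched for in order, list them in the order they appear in the text; a phrase that is out of order raises an error instead of silently producing a wrong span.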