.. _Adapting to similar tasks:

Adapting the code to similar tasks
==================================

The :program:`en_edris9` model and :program:`dose_instruction_parser` package have been tailored to the problem of parsing free text dose instructions from prescriptions. However, the code can be used as a starting point to solve similar problems.

Parsing dose instructions for specific drugs or conditions
----------------------------------------------------------

The :program:`en_edris9` model was trained on a balanced set of data covering the whole of national prescribing information. This makes the model a good "all rounder" when it comes to performance. If you are interested in a specific subset of drugs or conditions, you should be able to boost performance by further training the model on this subset of data. To do this, follow the instructions in the :ref:`Training a model` section, taking care to:

#. Create training data for the types of dose instruction you are interested in
#. Install the :program:`en_edris9` model (if you don't have access you can use :program:`en_core_med7_lg`, obtained following the instructions `here `_)
#. Modify :file:`model/config/config.cfg` to replace all instances of :program:`en_core_med7_lg` with :program:`en_edris9`
#. Evaluate the performance compared to :program:`en_edris9` and/or :program:`en_core_med7_lg`, using :file:`model/compare_models.py` as a guide alongside the output from :code:`source model/evaluate_model.sh`

Extracting different structural information
-------------------------------------------

The :program:`en_edris9` model extracts nine named entities, which give rise to the following structured fields:

=============== ==================================================================
Field           Description
=============== ==================================================================
inputID         ID passed in with dose instruction for bookkeeping
text            The original free text dose instruction (can be removed later)
form            The form of drug e.g. "tablet", "patch", "injection"
dosageMin       The minimum dosage
dosageMax       The maximum dosage
frequencyMin    The minimum frequency
frequencyMax    The maximum frequency
frequencyType   The type of frequency for the dosage e.g. "Hour", "Day", "2 Week"
durationMin     The minimum duration of treatment
durationMax     The maximum duration of treatment
durationType    The type of duration for dosage e.g. "Day", "Week"
asRequired      True/False: Whether to take as required / as needed
asDirected      True/False: Whether to take as directed
=============== ==================================================================

Changing this output requires three main steps:

#. Create new training data tagged with all the named entities you are interested in. You can add new entities here e.g. "AS_REQUIRED" and "AS_DIRECTED" were new entities surplus to those in :program:`en_med7`
#. Make sure that :code:`overwrite_ents = True` in the :code:`[components.ner]` section of :file:`model/config/config.cfg`, then train a new model following :ref:`Training a model`
#. Modify the :file:`dose_instruction_parser/dose_instruction_parser` code to process the new entities into the output you desire. How involved this is depends on the complexity of the entities. You can use the existing entities as a guide.

.. warning::

   Your training data should be fully representative of the data you would like to use the model for. If you only train the model further on a certain type of example, it will begin to "forget" what it already knows, i.e. get worse at extracting entities which it could extract before but is no longer being trained on.

General application to medical free text parsing
------------------------------------------------

.. note::

   In this case it would be best to create a totally new repository, using this repository as a starting point.

This is a more involved version of the above. Broadly, you will need to:

#. Create tagged training data with all the named entities you are interested in
#. Train a model following :ref:`Training a model`
#. Heavily alter the :file:`dose_instruction_parser/dose_instruction_parser` code to process the output in the way you want.
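
As a rough illustration of the final step, the sketch below groups extracted entities into one structured record per input text. It is not the code used in :program:`dose_instruction_parser`: the function name ``entities_to_record``, the ``LABEL_TO_FIELD`` mapping and the entity labels are all hypothetical, and the entities are assumed to arrive as ``(label, text)`` pairs (with spaCy these could be built from ``doc.ents`` using ``ent.label_`` and ``ent.text``).

.. code-block:: python

    # Hypothetical mapping from entity labels to output field names.
    # Replace these with the entities your own model was trained on.
    LABEL_TO_FIELD = {
        "TREATMENT": "treatment",
        "BODY_SITE": "bodySite",
        "SEVERITY": "severity",
    }


    def entities_to_record(text, entities):
        """Group (label, text) entity pairs into one structured record.

        Entities with a label missing from LABEL_TO_FIELD are ignored.
        """
        record = {"text": text}
        # Start every field with an empty list so all records share one shape.
        for field in LABEL_TO_FIELD.values():
            record[field] = []
        for label, span_text in entities:
            field = LABEL_TO_FIELD.get(label)
            if field is not None:
                record[field].append(span_text)
        return record


    example = entities_to_record(
        "apply cream to left arm",
        [("TREATMENT", "cream"), ("BODY_SITE", "left arm"), ("UNKNOWN", "apply")],
    )

Here ``example["treatment"]`` holds ``["cream"]``, ``example["bodySite"]`` holds ``["left arm"]``, and ``example["severity"]`` stays empty. Real post-processing will usually need more than a lookup table (for instance converting text to numbers or resolving ranges), so treat this only as a starting shape for the output.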