.. _Overview:

Overview
========

This is the documentation for a tool developed at Public Health Scotland to parse
prescription dose instructions.

Background
----------

Dose instructions are pieces of free text that
accompany a prescription, for example "Take 1 tablet every 3 hours" or 
"2 puffs daily via acuhaler".

Because the dose instructions are free text there are many ways of phrasing the 
same core information and this means it is difficult to use them for analysis when
there are a lot of different instructions to process. Additionally, free text can 
include patient-identifiable information like names, emails and addresses. 

By processing the dose instructions we can convert the free text into a structured
output - i.e. a series of columns with different information in them pulled from the 
text. This output is more suitable for analysis and also greatly reduces the chance
of patient-identifiable information being present.

This tool allows you to parse free text dose instructions to the following structured
fields:

===============     ==================================================================
Field               Description
===============     ==================================================================
inputID             ID passed in with dose instruction for bookkeeping 
text                The original free text dose instruction (can be later removed)
form                The form of drug e.g. "tablet", "patch", "injection"
dosageMin           The minimum dosage 
dosageMax           The maximum dosage
frequencyMin        The minimum frequency
frequencyMax        The maximum frequency 
frequencyType       The type of frequency for the dosage e.g. "Hour", "Day", "2 Week"
durationMin         The minimum duration of treatment 
durationMax         The maximum duration of treatment
durationType        The type of duration for dosage e.g. "Day", "Week"
asRequired          True/False: Whether to take as required / as needed
asDirected          True/False: Whether to take as directed
===============     ==================================================================

Here is a piece of sample output:

.. csv-table:: Sample output
   :header-rows: 1

    ,inputID,text,form,dosageMin,dosageMax,frequencyMin,frequencyMax,frequencyType,durationMin,durationMax,durationType,asRequired,asDirected
    0,eDRIS/XXXX-XXXX/example/001,daily 2 caps as directed,capsule,2.0,2.0,1.0,1.0,Day,,,,False,True
    1,eDRIS/XXXX-XXXX/example/002,daily 0.2ml,ml,0.2,0.2,1.0,1.0,Day,,,,False,False
    2,eDRIS/XXXX-XXXX/example/003,two mane + two nocte,,2.0,2.0,2.0,2.0,Day,,,,False,False
    3,eDRIS/XXXX-XXXX/example/004,2 tabs twice daily ,tablet,2.0,2.0,2.0,2.0,Day,,,,False,False
    4,eDRIS/XXXX-XXXX/example/005,take one in the morning and take two at night as directed,,3.0,3.0,1.0,1.0,Day,,,,False,False
    5,eDRIS/XXXX-XXXX/example/006,1 tablet(s) three times daily for pain/inflammation,tablet,1.0,1.0,3.0,3.0,Day,,,,False,False
    6,eDRIS/XXXX-XXXX/example/007,two puffs at night,puff,2.0,2.0,1.0,1.0,Day,,,,False,False
    7,eDRIS/XXXX-XXXX/example/008,0.6mls daily,ml,0.6,0.6,1.0,1.0,Day,,,,False,False
    8,eDRIS/XXXX-XXXX/example/009,to be applied tds prn,,,,3.0,3.0,Day,,,,True,False
    9,eDRIS/XXXX-XXXX/example/010,take 1 tablet for 3 weeks then take 3 tablets for 4 weeks,tablet,1.0,1.0,,,,3.0,3.0,Week,False,False
    10,eDRIS/XXXX-XXXX/example/010,take 1 tablet for 3 weeks then take 3 tablets for 4 weeks,tablet,3.0,3.0,,,,4.0,4.0,Week,False,False
    11,eDRIS/XXXX-XXXX/example/011,one to be taken twice a day  if sleepy do not drive/use machines. avoid alcohol. swallow whole.,,1.0,1.0,2.0,2.0,Day,,,,False,False
    12,eDRIS/XXXX-XXXX/example/012,1 tab take as required,tablet,1.0,1.0,,,,,,,True,False
    13,eDRIS/XXXX-XXXX/example/013,take one daily for allergy,,1.0,1.0,1.0,1.0,Day,,,,False,False
    14,eDRIS/XXXX-XXXX/example/014,one daily when required,,1.0,1.0,1.0,1.0,Day,,,,True,False

Methods
-------

The parsing process consists of three main stages:

#. Pre-process the dose instruction to clean up the free text
#. Use a machine learning Named Entity Recogniser (NER) model to associate key phrases
   in the text with "entities" of interest such as "DOSAGE" and "DURATION"
#. Apply rules to each key phrase to extract structured information

As an example, consider the dose instruction "one/two tabs bid prn". 

===============     ========================
Stage               Output
===============     ========================
(1) Pre-process     1 / 2 tablets bid prn
(2) NER             DOSAGE: "1 / 2", FORM: "tablets", FREQUENCY: "bid", AS_REQUIRED: "prn"    
(3) Rules           StructuredDI(text='one/two tabs bid prn', form='tablet', dosageMin=1.0, dosageMax=2.0, frequencyMin=2.0, frequencyMax=2.0, frequencyType='Day', durationMin=None, durationMax=None, durationType=None, asRequired=True, asDirected=False)   
===============     ========================

The whole process is carried out by the :mod:`dose_instruction_parser` package, available on `PyPI <https://pypi.org/>`_.
See :ref:`Installation` and :ref:`Parsing dose instructions` for information on how to get going.

Project layout
--------------

.. code::

   📦dose_instructions_parser
   ┣ 📂.github
   ┃ ┣ 📂workflows                
   ┣ 📂coverage                  # code coverage information 
   ┣ 📂doc                       # documentation
   ┃ ┣ 📂examples                # -- example scripts
   ┃ ┗ 📂sphinx                  # -- source behind github pages docs
   ┃ ┃ ┣ 📂source
   ┃ ┃ ┃ ┣ 📂doc_pages
   ┃ ┃ ┃ ┣ 📂modules
   ┃ ┃ ┃ ┃ ┗ 📂dose_instruction_parser
   ┃ ┃ ┃ ┣ 📂_static
   ┣ 📂dose_instruction_parser   # package for parsing dose instructions
   ┃ ┣ 📂dose_instruction_parser
   ┃ ┃ ┣ 📂data
   ┃ ┃ ┣ 📂tests
   ┣ 📂model                     # code for creating NER model
   ┃ ┣ 📂config                  # -- model configuration 
   ┃ ┣ 📂data                    # -- processed .spacy data created here
   ┃ ┣ 📂preprocess              # -- code for pre-processing training     
   ┃ ┃ ┣ 📂processed             # ---- intermediate processing carried out here
   ┃ ┃ ┣ 📂tagged                # ---- put tagged .json training data here
   ┗ ┗ 📂setup                   # -- script for setting up conda for model development