One of the main ways our customers communicate with us is email. We receive hundreds of travel booking emails daily, and all of them contain time expressions that we need to turn into a computer-readable format.
Humans naturally talk about “tomorrow afternoon” or “early next Monday morning”. These expressions are both contextual and ambiguous: the exact date of “next Monday” or “tomorrow” depends on what day it is today. Furthermore, “early morning” means different things to different people, although some common overlap can probably be agreed upon. Computers tend to prefer well-defined and unambiguous times; for instance, 1530961500 is the Unix timestamp for July 7th 2018, 11:05 (UTC). Note that the time zone, something most of us don’t usually actively think about, is also meaningful in this context.
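To make the point about context concrete, here is a minimal sketch using only the Python standard library. It decodes the timestamp above and resolves “tomorrow” and “next Monday” against a reference date. The helper names `resolve_tomorrow` and `resolve_next_monday` are made up for this illustration, the UTC+2 offset is hard-coded, and “next Monday” is assumed to mean the first Monday strictly after the reference date, which is only one of several possible readings.

```python
from datetime import datetime, timedelta, timezone

# The timestamp from the text, interpreted explicitly as UTC.
ts = 1530961500
utc_time = datetime.fromtimestamp(ts, tz=timezone.utc)
print(utc_time.isoformat())  # 2018-07-07T11:05:00+00:00

# The same instant seen from a fixed UTC+2 offset (assumption: hard-coded
# offset for illustration; real code would use a proper time zone database).
utc_plus_2 = timezone(timedelta(hours=2))
local_time = utc_time.astimezone(utc_plus_2)
print(local_time.isoformat())  # 2018-07-07T13:05:00+02:00


def resolve_tomorrow(reference):
    """Hypothetical helper: 'tomorrow' relative to a reference datetime."""
    return (reference + timedelta(days=1)).date()


def resolve_next_monday(reference):
    """Hypothetical helper: first Monday strictly after the reference."""
    # weekday() is 0 for Monday; the result is always between 1 and 7 days.
    days_ahead = (0 - reference.weekday() - 1) % 7 + 1
    return (reference + timedelta(days=days_ahead)).date()


print(resolve_tomorrow(utc_time))     # 2018-07-08
print(resolve_next_monday(utc_time))  # 2018-07-09
```

The same two expressions resolve to different dates as soon as the reference changes, which is why a parser needs the reference time as an explicit input.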
Parsing time expressions into structured, computer-readable data is therefore challenging. Many solutions exist, but they are either too simplistic or problematic to use in a Python setup. We therefore wrote a pure Python library for time parsing.
ctparse is an MIT-licensed library built on straightforward concepts. It allows parsing complex expressions efficiently and can easily be adjusted for domain-specific use cases: https://github.com/comtravo/ctparse
In many ways ctparse is similar to duckling, albeit with a significantly smaller scope for the time being.
ctparse implements a regular-expression and rule-based system for parsing time and date expressions. A statistical model ranks the different parses and favours reasonable solutions over others. Whilst still at an early stage, we currently outperform duckling in parsing date/time expressions from email booking requests, both in terms of speed and accuracy.
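The general idea of combining regular expressions with rules can be sketched in a few lines of standard-library Python. This is a toy illustration only, not ctparse's implementation or API: the pattern names, the `parse` function, and the hour ranges for each part of the day are all assumptions made for this example, and the statistical ranking step is left out entirely.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Stage 1: regular expressions that spot candidate fragments in the text.
PATTERNS = [
    ("relative_day", re.compile(r"\b(today|tomorrow)\b", re.I)),
    ("next_weekday", re.compile(r"\bnext (" + "|".join(WEEKDAYS) + r")\b", re.I)),
    ("part_of_day", re.compile(r"\b(morning|afternoon|evening)\b", re.I)),
]

# Assumed hour ranges; as noted earlier, people disagree on these boundaries.
PART_OF_DAY_HOURS = {"morning": (6, 12), "afternoon": (12, 18), "evening": (18, 22)}


def parse(text, reference):
    """Toy parser: apply one rule per matched pattern, relative to `reference`."""
    result = {}
    for name, pattern in PATTERNS:
        match = pattern.search(text)
        if match is None:
            continue
        word = match.group(1).lower()
        # Stage 2: rules that turn a matched fragment into structured data.
        if name == "relative_day":
            offset = 1 if word == "tomorrow" else 0
            result["date"] = reference + timedelta(days=offset)
        elif name == "next_weekday":
            # First such weekday strictly after the reference date.
            days_ahead = (WEEKDAYS.index(word) - reference.weekday() - 1) % 7 + 1
            result["date"] = reference + timedelta(days=days_ahead)
        elif name == "part_of_day":
            result["hours"] = PART_OF_DAY_HOURS[word]
    return result


reference = date(2018, 7, 7)  # a Saturday
print(parse("tomorrow afternoon", reference))
print(parse("early next Monday morning", reference))
```

A real system has to deal with overlapping matches and with several competing interpretations of the same text, which is where a ranking model becomes useful.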
For more details, have a look at my talk about ctparse at the previous PyData Berlin conference. In the talk I lay out the basic concepts and ideas behind building the PCFG-inspired (probabilistic context-free grammar) parser, discuss in detail some of the more challenging algorithmic building blocks, and demonstrate how Python is actually a very good choice for implementing such a system.
Have a look at ctparse on GitHub and let us know what you think: https://github.com/comtravo/ctparse