A Practical Parser for Time Expressions


Email is one of the main ways our customers communicate with us. We receive hundreds of travel booking emails daily, and all of them contain time expressions that we need to turn into a computer-readable format.

Humans naturally talk about “tomorrow afternoon” or “early next Monday morning”. These expressions are both contextual and ambiguous: the exact dates of “next Monday” and “tomorrow” depend, of course, on what day it is today. Furthermore, “early morning” means different things to different people, although some common overlap can probably be agreed upon. Computers prefer well-defined, unambiguous times; for instance, 1530961500 is the Unix timestamp for July 7th 2018, 11:05 (UTC). Note that the time zone, something most of us don’t actively think about, is also meaningful in this context.
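To make the point about time zones concrete, here is a short sketch using only Python's standard library: the same wall-clock time produces different Unix timestamps depending on the time zone attached to it. The fixed +02:00 offset is an assumption standing in for Central European Summer Time; a production system would use a proper time zone database instead of a hard-coded offset.

```python
from datetime import datetime, timedelta, timezone

# 11:05 on July 7th 2018, interpreted in UTC
utc_time = datetime(2018, 7, 7, 11, 5, tzinfo=timezone.utc)

# The same wall-clock time with a fixed +02:00 offset
# (assumption: stands in for Central European Summer Time)
cest = timezone(timedelta(hours=2))
berlin_time = datetime(2018, 7, 7, 11, 5, tzinfo=cest)

print(int(utc_time.timestamp()))     # 1530961500
print(int(berlin_time.timestamp()))  # 1530954300, two hours earlier in absolute time
```

Leaving out `tzinfo` altogether does not avoid the problem: Python then interprets the naive datetime in the local time zone of whatever machine runs the code.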

Parsing time expressions into structured, computer-readable data is therefore challenging. Many solutions exist, but they are either too simplistic or problematic to use in a Python setup. We therefore wrote a pure Python library for time parsing: ctparse, an MIT-licensed library built on straightforward concepts. It parses complex expressions efficiently and can easily be adjusted for domain-specific use cases: https://github.com/comtravo/ctparse.

In many ways ctparse is similar to Duckling, although its scope is, admittedly, significantly smaller for the time being. ctparse implements a system based on regular expressions and rules for parsing time and date expressions. A statistical model then ranks the different candidate parses so that reasonable interpretations are favoured over others. Whilst still at an early stage, we currently outperform Duckling in parsing date/time expressions from email booking requests, in terms of both speed and accuracy.
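A heavily simplified sketch of the regex-plus-rules idea follows. It is not ctparse's implementation or API: the rule table, the part-of-day boundaries, and the `parse` and `next_weekday` functions are all hypothetical. Regular expressions recognise a date phrase and an optional part of day, a rule combines the two into an interval, and a reference date supplies the context that makes “tomorrow” resolvable.

```python
import re
from datetime import date, datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Assumed hour ranges for parts of the day. These words are fuzzy,
# so the boundaries are illustrative only.
PART_OF_DAY = {"morning": (6, 12), "afternoon": (12, 18), "evening": (18, 22)}


def next_weekday(ref: date, weekday: int) -> date:
    """First occurrence of `weekday` (0 = Monday) strictly after `ref`.

    This is only one possible reading of "next Monday".
    """
    days_ahead = (weekday - ref.weekday() - 1) % 7 + 1
    return ref + timedelta(days=days_ahead)


# Date rules: a regex plus a function that resolves the match against
# the reference date. The first matching rule wins.
DATE_RULES = [
    (re.compile(r"\btoday\b"), lambda m, ref: ref),
    (re.compile(r"\btomorrow\b"), lambda m, ref: ref + timedelta(days=1)),
    (re.compile(r"\bnext (" + "|".join(WEEKDAYS) + r")\b"),
     lambda m, ref: next_weekday(ref, WEEKDAYS.index(m.group(1)))),
]
PART_RE = re.compile(r"\b(" + "|".join(PART_OF_DAY) + r")\b")


def parse(text: str, ref: date):
    """Resolve a date phrase plus optional part of day into a (start, end) interval."""
    text = text.lower()
    for pattern, resolve in DATE_RULES:
        match = pattern.search(text)
        if match:
            day = resolve(match, ref)
            break
    else:
        return None  # no date rule matched

    # Without a part of day, fall back to the whole day.
    part = PART_RE.search(text)
    start_h, end_h = PART_OF_DAY[part.group(1)] if part else (0, 24)
    midnight = datetime(day.year, day.month, day.day)
    return midnight + timedelta(hours=start_h), midnight + timedelta(hours=end_h)
```

With Saturday, July 7th 2018 as the reference date, `parse("tomorrow afternoon", date(2018, 7, 7))` yields 12:00 to 18:00 on July 8th, and `parse("early next Monday morning", date(2018, 7, 7))` yields 06:00 to 12:00 on July 9th. The word “early” is silently ignored and the first matching rule always wins; a real parser produces several competing interpretations and needs a principled way to rank them.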

For more details, have a look at my talk about ctparse at the previous PyData Berlin conference. In the talk I lay out the basic concepts and ideas behind building the parser, which is inspired by PCFGs (probabilistic context-free grammars), discuss some of the more challenging algorithmic building blocks in detail, and demonstrate that Python is actually a very good choice for implementing such a system.
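The core PCFG idea, scoring a parse by the probabilities of the grammar rules used to derive it, can be sketched in a few lines. Everything below is hypothetical: the rule names, the probabilities and the `Derivation` type are illustrative and do not reflect ctparse's internals or its actual statistical model. The two readings of “tomorrow at 5” share every rule except the one interpreting the bare number, so that single production decides the ranking.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical production probabilities. In a PCFG these would be estimated
# from how often each rule fires in correctly parsed training examples.
RULE_PROB = {
    "tomorrow": 0.95,
    "hour_am": 0.3,    # bare "5" read as 05:00
    "hour_pm": 0.7,    # bare "5" read as 17:00
    "date+time": 0.9,  # combine a date and a time of day
}


@dataclass
class Derivation:
    reading: str       # human-readable interpretation
    rules: List[str]   # productions applied to build it


def log_prob(d: Derivation) -> float:
    """Derivation probability = product of rule probabilities, summed in log space."""
    return sum(math.log(RULE_PROB[rule]) for rule in d.rules)


def rank(derivations: List[Derivation]) -> List[Derivation]:
    """Return derivations ordered from most to least probable."""
    return sorted(derivations, key=log_prob, reverse=True)


candidates = [
    Derivation("tomorrow 05:00", ["tomorrow", "hour_am", "date+time"]),
    Derivation("tomorrow 17:00", ["tomorrow", "hour_pm", "date+time"]),
]
print(rank(candidates)[0].reading)  # tomorrow 17:00
```

Because the score is a product of probabilities below one, longer derivations are penalised automatically. Comparing readings that cover different amounts of the input would therefore likely need some normalisation or a learned ranking model on top of the raw product.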

Have a look at ctparse on GitHub and let us know what you think: https://github.com/comtravo/ctparse.

Written by

Sebastian Mika

Build Things | Machine Learning

Sebastian has more than 25 years of experience in machine learning, from research to building teams and businesses around it. He founded a data science consulting boutique, co-founded an offline location analytics startup and worked as a pharma strategy consultant. He received his PhD in machine learning from Technische Universität Berlin and has co-authored more than 50 peer-reviewed articles and conference publications.