Since early March 2017, we have been preparing the data sets and exact configurations of downstream applications; data and software releases to participants are announced incrementally on the page below as they become available.
The EPE 2017 task somewhat generalizes conventional notions of dependency representations and emphasizes a stand-off perspective rather than token-centric representations (as have frequently been employed in parsing shared tasks). For these reasons, the task needs to define its own textual interchange format, to encode a broader range of morpho-syntactico-semantic analyses as dependency representations. This file format was formally specified in late March 2017, and we will work with parser developers to create a collection of conversion tools from other common formats (for example CoNLL-U), which recover character stand-off pointers into the underlying text and accommodate generalized dependency graphs transcending rooted trees.
Since April 9, 2017, a trial version of such a converter has been available in binary form for 64-bit x86 Linux environments. The EPE sample file, for example, was produced by taking the native UDPipe parser output for the negation development data and converting it to the EPE interchange format as follows:
./logon/bin/epe --convert --raw negation/development/raw.txt \
  negation/development/udpipe.conllu /tmp/sample.epe
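To make the notion of recovering character stand-off pointers more concrete, the following is a minimal sketch in Python of how token forms might be aligned against the underlying raw text. It is not the implementation used by the EPE converter; the function name align_tokens is hypothetical, and the sketch assumes that every token form occurs verbatim in the raw text and in the original order (which will not hold for parsers that normalize quotes, brackets, or multi-word tokens).

```python
def align_tokens(raw, tokens):
    """Return (start, end) character offsets into `raw` for each token form.

    Illustrative assumption: each form appears verbatim in `raw`, in order.
    Offsets are 0-based, and `end` is exclusive.
    """
    spans = []
    cursor = 0  # never search before the end of the previous token
    for form in tokens:
        start = raw.find(form, cursor)
        if start < 0:
            raise ValueError(f"token {form!r} not found after offset {cursor}")
        end = start + len(form)
        spans.append((start, end))
        cursor = end
    return spans


if __name__ == "__main__":
    raw = "I don't know."
    forms = ["I", "do", "n't", "know", "."]
    for form, (start, end) in zip(forms, align_tokens(raw, forms)):
        print(form, start, end)
```

Searching left to right from the end of the previous token keeps repeated forms (such as several occurrences of the same word) aligned to the correct occurrence; a real converter would additionally need to handle forms that the parser has rewritten.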
To enable participants to obtain empirical end-to-end results for the development data while preparing their system submissions, the downstream systems (including support for mostly automated re-training) will be provided to participants. Since mid-May 2017, the source code and instructions for the Sherlock negation resolution and TEES event extraction systems have been publicly available. Furthermore, the task organizers hope to establish an automated upload, re-training, and evaluation interface, such that participants can obtain end-to-end feedback more easily (for at least some of the downstream systems). However, this ‘self-help’ evaluation infrastructure will likely not become available before the completion of the EPE 2017 evaluation period.
We have selected a ‘baseline’ stack of simple, yet state-of-the-art pre-processing tools for sentence splitting, tokenization, part of speech tagging, and lemmatization (Velldal et al. 2012; pp. 370–372). These have been available to participants since early April as part of the trial release of the format converter and text preprocessor. For example, one might use the following command to prepare the ‘raw’ development text for the negation analysis downstream application for parsing with a system that expects tokenized and morphologically analyzed inputs:
./logon/bin/epe --prepare negation/development/raw.txt /tmp/sample.tt
Starting with version 1.2 of the EPE 2017 parser inputs, these automatically pre-processed variants of the ‘raw’ texts are included in the data package. Candidate participants are welcome to start from the ‘raw’, running texts, or to use any part or all of the segmentation and morphological information provided in the pre-processed files.