;;; -*- mode: fundamental; coding: utf-8; indent-tabs-mode: t; -*-

;;;
;;; Copyright (c) 2009 -- 2010 Stephan Oepen (oe@ifi.uio.no); 
;;; see `LICENSE' for conditions.
;;;


;;
;; deviating from the PTB conventions, we use one-character double quote marks
;; (i.e. |“| and |”| instead of |``| and |''|); much like the PTB, however, we
;; aim to disambiguate neutral quotes (|"| and |''|) at the string level, i.e.
;; opening quotes are preceded by a token boundary (white space), with a small
;; number of additional, token-initial characters that can intervene.  anything
;; else, we assume, is a closing quote.
;;
;; convert quotes to single characters prior to tokenizing off other characters
;; (group #1 below) to make adjacent whitespace detection easier, as e.g. in
;; |``$20!''|.  closing quotes can double as apostrophes and units of measure,
;; i.e. feet and inches, or minutes and seconds.
;;

;;
;; it appears we cannot trust writers to use `funny' quotes properly, hence we
;; neuter them to straight double or single quotes, which will then go through
;; disambiguation, based on adjacency to token boundaries.
;;
![«»]								"
![‹›]								'

;;
;; _fix_me_
;; in bio-medical texts we see names with double or triple apostrophes, e.g.
;; |Figure B''| or |(A–C'')| (presumably in a figure caption).  clearly, the
;; LaTeX-style conventions are incompatible with such usage of the apostrophe,
;; and probably we should limit support for LaTeX-style quotes to the LaTeX
;; REPP module.  at present, however, i doubt the ERG would do the right thing
;; for double-apostrophe inputs anyway, and a full analysis of |A'| and |A''|
;; could be expensive in terms of extra ambiguity.  discuss this with dan, one
;; fine day.                                                    (13-mar-10; oe)
;;

#1
!(„|``)								“
!(^| [[({“‘]*)("|'')						\1“
!("|'')								”
!(‚|`)								‘
!(^| [[({“‘]*)'							\1‘
!'								’
#
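
;;
;; a worked trace of group #1, as a sketch only: assuming that each string
;; below reaches the group unchanged by earlier rules, and that the rules are
;; tried top to bottom, we would expect
;;
;;   |``$20!''|          to become  |“$20!”|
;;   |say ("hi")|        to become  |say (“hi”)|
;;   |the 'best' dog's|  to become  |the ‘best’ dog’s|
;;
;; in the first string, the closing |''| follows |!| rather than white space,
;; so the opening rule does not apply and the catch-all closing rule fires.
;;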

>1