[10870010] |Text corpus
[10870020] |In [[linguistics]], a '''corpus''' (plural ''corpora'') or '''text corpus''' is a large and structured set of texts (now usually electronically stored and processed).
Corpora are used for statistical analysis, for checking the occurrence of words and constructions, and for validating linguistic rules within a specific language domain.
[10870040] |A corpus may contain texts in a single language (''monolingual corpus'') or text data in multiple languages (''multilingual corpus'').
[10870050] |Multilingual corpora that have been specially formatted for side-by-side comparison are called ''aligned parallel corpora''.
[10870060] |In order to make the corpora more useful for doing linguistic research, they are often subjected to a process known as [[annotation]].
[10870070] |An example of annotating a corpus is [[part-of-speech tagging]], or ''POS-tagging'', in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of ''tags''.
[10870080] |Another example is indicating the [[lemma (linguistics)|lemma]] (base) form of each word.
[10870090] |When the language of the corpus is not a working language of the researchers who use it, interlinear [[gloss]]ing is used to make the annotation bilingual.
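The following minimal sketch shows what such annotation can look like in practice, using Python with the NLTK library to add POS tags and lemmas to a short text; it assumes the relevant NLTK data packages (tokenizer, tagger and WordNet models) have already been downloaded, and the sample sentence is invented for illustration.
<syntaxhighlight lang="python">
import nltk
from nltk.stem import WordNetLemmatizer

# Assumes nltk.download() has been run for 'punkt',
# 'averaged_perceptron_tagger' and 'wordnet'.
text = "The cats were sitting on the mats."

tokens = nltk.word_tokenize(text)   # split the raw text into word tokens
tagged = nltk.pos_tag(tokens)       # attach a part-of-speech tag to each token

lemmatizer = WordNetLemmatizer()

def wordnet_pos(treebank_tag):
    """Map Penn Treebank tags to the coarse categories WordNet expects."""
    if treebank_tag.startswith("V"):
        return "v"
    if treebank_tag.startswith("J"):
        return "a"
    if treebank_tag.startswith("R"):
        return "r"
    return "n"

# Store each token with its tag and lemma, mimicking a simple annotated corpus.
annotated = [(tok, tag, lemmatizer.lemmatize(tok.lower(), wordnet_pos(tag)))
             for tok, tag in tagged]
print(annotated)
# e.g. [('The', 'DT', 'the'), ('cats', 'NNS', 'cat'), ('were', 'VBD', 'be'), ...]
</syntaxhighlight>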
[10870100] |Corpora are the main knowledge base in [[corpus linguistics]].
[10870110] |The analysis and processing of various types of corpora are also the subject of much work in [[computational linguistics]], [[speech recognition]] and [[machine translation]], where they are often used to create [[hidden Markov model]]s for POS-tagging and other purposes.
[10870120] |Corpora and [[frequency list]]s derived from them are useful for [[language teaching]].
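A frequency list of the kind mentioned above can be derived from raw corpus text with only the Python standard library; the short sample string below stands in for a real corpus file and is purely illustrative.
<syntaxhighlight lang="python">
import re
from collections import Counter

# Stand-in for a real corpus; in practice this would be read from corpus files.
corpus_text = """The quick brown fox jumps over the lazy dog.
The dog barks, and the fox runs away."""

# Lowercase the text and extract word tokens with a simple regular expression.
tokens = re.findall(r"[a-z']+", corpus_text.lower())

# Count token frequencies and list the most common words first.
frequency_list = Counter(tokens)
for word, count in frequency_list.most_common(5):
    print(word, count)
# the 4
# fox 2
# dog 2
# ...
</syntaxhighlight>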
[10870130] |==Archaeological corpora==
Text corpora are also used in the study of [[historical document]]s, for example in attempts to [[decipherment|decipher]] ancient scripts, or in [[Biblical scholarship]].
[10870150] |Some archaeological corpora can be of such short duration that they provide a snapshot in time.
One of the shortest such corpora in time may be the [[Amarna letters]] texts ([[1350 BC]]), which span only about 15-30 years.
The ''corpus'' of an ancient city (for example the "[[Kültepe]] Texts" of Turkey) may comprise a series of corpora, determined by the dates of their find sites.
[10870180] |== Some notable text corpora ==
[10870190] |English language:
[10870200] |* [[American National Corpus]]
[10870210] |* [[Bank of English]]
[10870220] |* [[British National Corpus]]
[10870230] |* [[Corpus Juris Secundum]]
* [[Corpus of Contemporary American English (COCA)]]: 360 million words, 1990-2007; freely available online.
[10870260] |* [[Brown Corpus]], forming part of the "Brown Family" of corpora, together with LOB, Frown and F-LOB.
[10870270] |* [[Oxford English Corpus]]
[10870280] |* [[Scottish Corpus of Texts & Speech]]
[10870290] |Other languages:
* [[Amarna letters]] (for [[Akkadian language|Akkadian]], Egyptian, [[Sumerogram]]s, etc.)
* [[Bijankhan Corpus]], a contemporary Persian corpus for NLP research
[10870320] |* [[Croatian National Corpus]]
* [[Hamshahri Corpus]], a contemporary Persian corpus for information-retrieval (IR) research
[10870340] |* [[Neo-Assyrian Text Corpus Project]]
[10870350] |* [[Persian Today Corpus]]
[10870360] |* [[Thesaurus Linguae Graecae]] (Ancient Greek)
[10880010] |Text mining
[10880020] |'''Text mining''', sometimes alternately referred to as ''text [[data mining]]'', refers generally to the process of deriving high quality [[information]] from text.
High-quality information is typically derived through the devising of patterns and trends by means such as [[pattern recognition|statistical pattern learning]].
Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a [[database]]), deriving patterns within the structured data, and finally evaluating and interpreting the output.
[10880050] |'High quality' in text mining usually refers to some combination of [[relevance (information retrieval)|relevance]], [[Novelty (patent)|novelty]], and interestingness.
[10880060] |Typical text mining tasks include [[text categorization]], [[text clustering]], [[concept mining|concept/entity extraction]], production of granular taxonomies, [[sentiment analysis]], [[document summarization]], and entity relation modeling (''i.e.'', learning relations between [[Named entity recognition|named entities]]).
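The workflow described above (structure the input text, learn patterns, apply them to new text) can be made concrete with a toy text-categorization example. The sketch below uses the scikit-learn library; the training documents, labels and test sentences are invented for illustration, and a real system would of course be trained on a far larger labelled collection.
<syntaxhighlight lang="python">
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented example documents and labels standing in for a real annotated collection.
train_docs = [
    "stock prices rose sharply after the earnings report",
    "the central bank raised interest rates again",
    "the striker scored twice in the final minutes",
    "the team clinched the championship with a late goal",
]
train_labels = ["finance", "finance", "sports", "sports"]

# Structuring step: turn raw text into TF-IDF feature vectors,
# then learn a simple Naive Bayes classifier over those features.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

# Pattern-application step: categorize previously unseen text.
print(model.predict(["bank shares fell as rates climbed"]))   # likely ['finance']
print(model.predict(["the goalkeeper saved a late penalty"])) # likely ['sports']
</syntaxhighlight>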
[10880070] |==History==
[10880080] |Labour-intensive manual text-mining approaches first surfaced in the mid-1980s, but technological advances have enabled the field to advance swiftly during the past decade.
[10880090] |Text mining is an [[interdisciplinary]] field which draws on [[information retrieval]], [[data mining]], [[machine learning]], [[statistics]], and [[computational linguistics]].
[10880100] |As most information (over 80%) is currently stored as text, text mining is believed to have a high commercial potential value.
[10880110] |Increasing interest is being paid to multilingual data mining: the ability to gain information across languages and cluster similar items from different linguistic sources according to their meaning.
[10880120] |== Sentiment analysis ==
[10880130] |[[Sentiment analysis]] may, for example, involve analysis of movie reviews for estimating how favorably a review is for a movie.
[10880140] |Such an analysis may require a labeled data set or labeling of the [[Affect_(psychology)|affectivity]] of words.
[10880150] |A resource for affectivity of words has been made for [[WordNet]].
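A bare-bones illustration of the lexicon-based approach is sketched below; the miniature word-affect lexicon is invented for this example rather than taken from WordNet or any published resource, but the idea is the same: look up the affective polarity of each word and aggregate the scores over the review.
<syntaxhighlight lang="python">
# Hypothetical, hand-made affect lexicon; a real system would use a
# WordNet-derived affect or sentiment resource instead.
AFFECT_LEXICON = {
    "excellent": 1.0, "great": 0.8, "enjoyable": 0.6,
    "boring": -0.7, "awful": -1.0, "disappointing": -0.8,
}

def review_sentiment(review):
    """Average the polarity of known words; a positive result suggests a favourable review."""
    words = review.lower().split()
    scores = [AFFECT_LEXICON[w] for w in words if w in AFFECT_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(review_sentiment("A great cast and an enjoyable plot"))        # 0.7 (favourable)
print(review_sentiment("Boring pacing and a disappointing ending"))  # -0.75 (unfavourable)
</syntaxhighlight>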
[10880160] |==Applications==
[10880170] |Recently, text mining has been receiving attention in many areas.
[10880180] |===Security applications===
[10880190] |One of the largest text mining applications that exists is probably the classified [[ECHELON]] surveillance system.
[10880200] |Additionally, many text mining software packages such as [[AeroText]], [[Attensity]], [[SPSS]] and [[Expert System]] are marketed towards security applications, particularly analysis of plain text sources such as Internet news.
[10880210] |In 2007, [[Europol]]'s Serious Crime division developed an analysis system in order to track transnational organized crime.
This Overall Analysis System for Intelligence Support (OASIS) integrates some of the most advanced text-analytics and text-mining technologies available on today's market.
The system has enabled Europol to make significant progress in supporting law-enforcement objectives at the international level.
[10880240] |=== Biomedical applications ===
[10880250] |A range of applications of text mining of the biomedical literature has been described.
[10880260] |One example is [[PubGene]] ([http://www.pubgene.org pubgene.org]) that combines biomedical text mining with network visualization as an Internet service.
Another example, which combines ontologies with text mining, is [http://www.gopubmed.org GoPubMed.org].
[10880310] |===Marketing applications===
Text mining is starting to be used in marketing as well, more specifically in analytical [[Customer relationship management]]. [http://www.textmining.UGent.be Coussement and Van den Poel] (2008) apply it to improve [[predictive analytics]] models for customer churn ([[Customer attrition]]).
[10880330] |===Academic applications===
[10880340] |The issue of text mining is of importance to publishers who hold large [[databases]] of information requiring [[Index (database)|indexing]] for retrieval.
[10880350] |This is particularly true in scientific disciplines, in which highly specific information is often contained within written text.
[10880360] |Therefore, initiatives have been taken such as [[Nature (journal)|Nature's]] proposal for an Open Text Mining Interface (OTMI) and [[National Institutes of Health|NIH's]] common Journal Publishing [[Document Type Definition]] (DTD) that would provide semantic cues to machines to answer specific queries contained within text without removing publisher barriers to public access.
[10880370] |Academic institutions have also become involved in the text mining initiative:
The [[National Centre for Text Mining]], a collaborative effort between the Universities of [[University of Manchester|Manchester]] and [[University of Liverpool|Liverpool]], provides customised tools and research facilities, and offers advice to the academic community.
It is funded by the [[Joint Information Systems Committee]] (JISC) and two of the UK [[Research Council]]s.
With an initial focus on text mining in the [[biology|biological]] and [[biomedical]] sciences, its research has since expanded into the [[Social Science]]s.
[10880410] |In the United States, the [[UC Berkeley School of Information|School of Information]] at [[University of California, Berkeley]] is developing a program called BioText to assist bioscience researchers in text mining and analysis.
== Software and applications ==
[10880430] |Research and development departments of major companies, including [[IBM]] and [[Microsoft]], are researching text mining techniques and developing programs to further automate the mining and analysis processes.
[10880440] |Text mining software is also being researched by different companies working in the area of search and indexing in general as a way to improve their results.
A large number of companies provide commercial text mining programs:
* [[AeroText]] - provides a suite of text mining applications for content analysis; content can be in multiple languages.
[10880480] |* [[Attensity]] - suite of text mining solutions that includes search, statistical and NLP based technologies for a variety of industries.
[10880490] |* [[Autonomy Corporation|Autonomy]] - suite of text mining, clustering and categorization solutions for a variety of industries.
[10880500] |* [[Endeca Technologies]] - provides software to analyze and cluster unstructured text.
[10880510] |* [[Expert System S.p.A.]] - suite of semantic technologies and products for developers and knowledge managers.
* [[Fair Isaac]] - provider of decision management solutions powered by advanced analytics, including text analytics.
[10880530] |* [[LanguageWare]] [http://www.alphaworks.ibm.com/tech/lrw] - the IBM Tools and Runtime for Text Mining.
* [[Inxight]] - provider of text analytics, search, and unstructured visualization technologies (Inxight was sold to [[Business Objects (company)|Business Objects]], which was in turn sold to [[SAP AG]] in 2007).
[10880560] |* Nstein Technologies [http://www.nstein.com] - provider of text mining, digital asset management, and web content management solutions
* [[Pervasive Data Integrator]] - includes Extract Schema Designer, which lets the user identify structural patterns in reports, HTML, emails, etc. by point and click, for extraction into any database
[10880580] |* [[RapidMiner|RapidMiner/YALE]] - open-source data and text mining software for scientific and commercial use.
[10880590] |* [[SPSS]] - provider of SPSS Text Analysis for Surveys, Text Mining for Clementine, LexiQuest Mine and LexiQuest Categorize, commercial text analytics software that can be used in conjunction with SPSS Predictive Analytics Solutions.
* [[Thomson Data Analyzer]] - enables complex analysis of patent information, scientific publications and news.
* [[Clearforest Developer]] - a suite of tools for developing NLP-based (natural language processing) text mining applications that derive structure from unstructured texts.
[10880620] |* VantagePoint [http://www.thevantagepoint.com] - Text mining software which includes tools for data cleanup, analysis, process automation, and reporting.
[10880630] |===Open-source software and applications===
[10880640] |* [[General Architecture for Text Engineering|GATE]] - natural language processing and language engineering tool.
[10880650] |* [[RapidMiner|YALE/RapidMiner]] with its Word Vector Tool plugin - data and text mining software.
[10880660] |* tm [http://cran.r-project.org/web/packages/tm/index.html] [http://www.jstatsoft.org/v25/i05] - text mining in the [[R programming language]]
[10880670] |==Implications==
[10880680] |Until recently websites most often used text-based lexical searches; in other words, users could find documents only by the words that happened to occur in the documents.
[10880690] |Text mining may allow searches to be directly answered by the [[semantic web]]; users may be able to search for content based on its meaning and context, rather than just by a specific word.
[10880700] |Additionally, text mining software can be used to build large dossiers of information about specific people and events.
For example, by using software that extracts specific facts about businesses and individuals from news reports, large datasets can be built to facilitate [[social network analysis]] or [[counter-intelligence]].
[10880720] |In effect, the text mining software may act in a capacity similar to an [[intelligence analyst]] or [[research librarian]], albeit with a more limited scope of analysis.
[10880730] |Text mining is also used in some email [[spam filter]]s as a way of determining the characteristics of messages that are likely to be advertisements or other unwanted material.
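Such filters typically work by learning which tokens are characteristic of unwanted mail. The sketch below is a deliberately simplified, hand-rolled version of that idea (a naive-Bayes-style log-likelihood score over word counts), trained on a tiny invented message set; it is not the implementation of any actual filter.
<syntaxhighlight lang="python">
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

# Invented training data standing in for a labelled mail archive.
spam_msgs = ["win a free prize now", "cheap meds buy now", "free offer limited time"]
ham_msgs  = ["meeting moved to friday", "please review the attached report"]

spam_counts = Counter(t for m in spam_msgs for t in tokens(m))
ham_counts  = Counter(t for m in ham_msgs for t in tokens(m))
vocab = set(spam_counts) | set(ham_counts)

def log_prob(counts, word):
    # Laplace smoothing so unseen words do not zero out the score.
    return math.log((counts[word] + 1) / (sum(counts.values()) + len(vocab)))

def spam_score(message):
    """Positive score suggests spam, negative suggests legitimate mail."""
    return sum(log_prob(spam_counts, w) - log_prob(ham_counts, w)
               for w in tokens(message))

print(spam_score("free prize offer"))       # positive: spam-like vocabulary
print(spam_score("the report for friday"))  # negative: ordinary work mail
</syntaxhighlight>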
[10890010] |Translation
[10890020] |'''Translation''' is the action of [[hermeneutics|interpretation]] of the [[Meaning (linguistic)|meaning]] of a text, and subsequent production of an [[Dynamic and formal equivalence|equivalent]] text, also called a '''translation''', that communicates the same [[message]] in another language.
[10890030] |The text to be translated is called the [[source text]], and the language it is to be translated into is called the [[target language]]; the final product is sometimes called the "target text."
[10890040] |Translation must take into account constraints that include [[wiktionary:context|context]], the rules of [[grammar]] of the two languages, their writing [[Convention (norm)|convention]]s, and their [[idiom]]s.
[10890050] |A common [[misconception]] is that there exists a simple [[literal translation|word-for-word]] correspondence between any two [[language]]s, and that translation is a straightforward [[mechanical]] process.
[10890060] |A word-for-word translation does not take into account context, grammar, conventions, and idioms.
[10890070] |Translation is fraught with the potential for "[[language contact|spilling over]]" of [[idiom]]s and [[style guide|usage]]s from one language into the other, since both languages repose within the single brain of the translator.
[10890080] |Such spilling-over easily produces [[mixed language|linguistic hybrids]] such as "[[Franglais]]" ([[French language|French]]-[[English language|English]]), "[[Spanglish]]" ([[Spanish language|Spanish]]-[[English language|English]]), "[[Poglish]]" ([[Polish language|Polish]]-[[English language|English]]) and "[[Portuñol/Portunhol|Portuñol]]" ([[Portuguese language|Portuguese]]-[[Spanish language|Spanish]]).
[10890090] |The art of translation is as old as written [[literature]].
[10890100] |Parts of the [[Sumer]]ian ''[[Epic of Gilgamesh]]'', among the oldest known literary works, have been found in translations into several [[Asia]]tic languages of the second millennium BCE.
[10890110] |The ''Epic of Gilgamesh'' may have been read, in their own languages, by early authors of the ''[[Bible]]'' and of the ''[[Iliad]]''.
[10890120] |With the advent of computers, attempts have been made to [[computer]]ize or otherwise [[automate]] the translation of [[natural language|natural-language]] texts ([[machine translation]]) or to use computers as an ''aid'' to translation ([[computer-assisted translation]]).
[10890130] |==The term==
[10890140] |[[Etymology|Etymologically]], "translation" is a "carrying across" or "bringing across."
[10890150] |The [[Latin]] "''translatio''" derives from the [[perfect aspect|perfect]] [[grammatical voice|passive]] [[participle#Latin|participle]], "''translatum''," of "''transferre''" ("to transfer" — from "''trans''," "across" + "''ferre''," "to carry" or "to bring").
[10890160] |The modern [[Romance languages|Romance]], [[Germanic languages|Germanic]] and [[Slavic language|Slavic]] [[European languages]] have generally formed their own [[Formal and dynamic equivalence|equivalent]] terms for this concept after the Latin model — after "''transferre''" or after the kindred "''traducere''" ("to bring across" or "to lead across").
[10890170] |Additionally, the [[Greek language|Greek]] term for "translation," "''metaphrasis''" ("a speaking across"), has supplied [[English language|English]] with "[[Wiktionary:metaphrase|metaphrase]]" (a "[[literal translation]]," or "word-for-word" translation)—as contrasted with "[[paraphrase]]" ("a saying in other words," from the Greek "''paraphrasis''").
[10890180] |"Metaphrase" equates, in one of the more recent terminologies, to "[[Translation#Equivalence|formal equivalence]]," and "paraphrase"—to "[[Translation#Equivalence|dynamic equivalence]]."
[10890190] |==Misconceptions==
[10890200] |Newcomers to translation sometimes proceed as if translation were an [[exact science]] — as if consistent, one-to-one [[correlation]]s existed between the words and phrases of different languages, rendering translations fixed and identically reproducible, much as in [[cryptography]].
[10890210] |Such [[novice]]s may assume that all that is needed to translate a text is to "[[encode]]" and "[[decode]]" equivalents between the two languages, using a [[translation dictionary]] as the "[[codebook]]."
[10890220] |On the contrary, such a fixed relationship would only exist were a new language [[constructed language|synthesized]] and simultaneously matched to a pre-existing language's scopes of [[meaning (linguistics)|meaning]], [[etymologies]], and [[lexicon|lexical]] [[ecological niche]]s.
[10890230] |If the new language were subsequently to take on a life apart from such cryptographic use, each word would spontaneously begin to assume new shades of meaning and cast off previous [[association (psychology)|association]]s, thereby vitiating any such artificial synchronization.
[10890240] |Henceforth translation would require the disciplines described in this article.
[10890250] |Another common misconception is that ''anyone'' who can speak a [[second language]] will make a good translator.
[10890260] |In the translation community, it is generally accepted that the best translations are produced by persons who are translating into their own [[native language]]s, as it is rare for someone who has learned a second language to have total fluency in that language.
[10890270] |A good translator understands the source language well, has specific experience in the subject matter of the text, and is a good writer in the target language.
[10890280] |Moreover, he is not only [[bilingual]] but [[bicultural]].
[10890290] |It has been debated whether translation is [[art]] or [[craft]].
[10890300] |Literary translators, such as [[Gregory Rabassa]] in ''If This Be Treason'', argue that translation is an art—a teachable one.
[10890310] |Other translators, mostly technical, commercial, and legal, regard their ''métier'' as a craft—again, a teachable one, subject to [[Discourse analysis|linguistic analysis]], that benefits from [[Academia|academic]] study.
[10890320] |As with other human activities, the distinction between art and craft may be largely a matter of degree.
[10890330] |Even a document which appears simple, e.g. a product [[brochure]], requires a certain level of linguistic skill that goes beyond mere technical terminology.
[10890340] |Any material used for marketing purposes reflects on the company that produces the product and the brochure.
[10890350] |The best translations are obtained through the combined application of good technical-terminology skills and good writing skills.
[10890360] |Translation has served as a writing school for many recognized writers.
[10890370] |Translators, including the early modern European translators of the ''[[Bible]]'', in the course of their work have shaped the very [[language]]s into which they have translated.
[10890380] |They have acted as bridges for conveying knowledge and ideas between [[culture]]s and [[civilization]]s.
Along with [[idea]]s, they have imported into their own languages [[calque]]s of [[grammar|grammatical structures]] and of [[vocabulary]] from the [[source language]]s.
[10890400] |==Interpreting==
[10890410] |Interpreting, or "interpretation," is the intellectual activity that consists of facilitating [[speech communication|oral]] or [[sign language|sign-language]] [[communication]], either simultaneously or consecutively, between two or among three or more speakers who are not speaking, or signing, the same language.
[10890420] |The words "interpreting" and "interpretation" both can be used to refer to this activity; the word "interpreting" is commonly used in the profession and in the translation-studies field to avoid confusion with other meanings of the word "[[Interpretation (disambiguation)|interpretation]]."
[10890430] |Not all languages employ, as [[English language|English]] does, two separate words to denote the activities of ''written'' and live-communication (''oral'' or ''sign-language'') translators.
[10890440] |==Fidelity vs. transparency==
[10890450] |[[Fidelity]] (or "faithfulness") and [[transparency (linguistic)|transparency]] are two qualities that, for millennia, have been regarded as ideals to be striven for in translation, particularly [[literary]] translation.
[10890460] |These two ideals are often at odds.
[10890470] |Thus a 17th-century French critic coined the phrase, "''les belles infidèles''," to suggest that translations, like women, could be ''either'' faithful ''or'' beautiful, but not both at the same time.
[10890480] |Fidelity pertains to the extent to which a translation accurately renders the meaning of the [[source text]], without adding to or subtracting from it, without intensifying or weakening any part of the meaning, and otherwise without distorting it.
[10890490] |[[Transparency (linguistic)|Transparency]] pertains to the extent to which a translation appears to a native speaker of the target language to have originally been written in that language, and conforms to the language's grammatical, syntactic and idiomatic conventions.
[10890500] |A translation that meets the first criterion is said to be a "faithful translation"; a translation that meets the second criterion, an "[[idiomatic]] translation."
[10890510] |The two qualities are ''not necessarily'' mutually exclusive.
[10890520] |The criteria used to judge the faithfulness of a translation vary according to the subject, the precision of the original contents, the type, function and use of the text, its literary qualities, its social or historical context, and so forth.
The criteria for judging the [[transparency (linguistic)|transparency]] of a translation would appear more straightforward: an unidiomatic translation "sounds wrong," and in the extreme case of [[literal translation|word-for-word translation]]s generated by many [[machine translation|machine-translation]] systems, the result is often patent nonsense with only [[humor]]ous value (see "[[round-trip translation]]").
[10890540] |Nevertheless, in certain contexts a translator may consciously ''strive'' to produce a literal translation.
[10890550] |[[Literary]] translators and translators of [[religious]] or [[historic]] texts often adhere as closely as possible to the source text.
[10890560] |In doing so, they often deliberately stretch the boundaries of the target language to produce an unidiomatic text.
[10890570] |Similarly, a literary translator may wish to adopt words or expressions from the [[source language]] in order to provide "local color" in the translation.
[10890580] |In recent decades, prominent advocates of such "non-transparent" translation have included the French scholar [[Antoine Berman]], who identified twelve deforming tendencies inherent in most prose translations, and the American theorist Lawrence Venuti, who has called upon translators to apply "foreignizing" translation strategies instead of domesticating ones.
[10890590] |Many non-transparent-translation theories draw on concepts from [[German Romanticism]], the most obvious influence on latter-day theories of "foreignization" being the German theologian and philosopher [[Friedrich Schleiermacher]].
[10890600] |In his seminal lecture "On the Different Methods of Translation" (1813) he distinguished between translation methods that move "the writer toward [the reader]," i.e., [[transparency (linguistic)|transparency]], and those that move the "reader toward [the author]," i.e., an extreme [[fidelity]] to the foreignness of the [[source text]].
[10890610] |Schleiermacher clearly favored the latter approach.
[10890620] |His preference was motivated, however, not so much by a desire to embrace the foreign, as by a nationalist desire to oppose France's cultural domination and to promote [[German literature]].
[10890630] |For the most part, current Western practices in translation are dominated by the concepts of "fidelity" and "transparency."
[10890640] |This has not always been the case.
[10890650] |There have been periods, especially in pre-Classical Rome and in the 18th century, when many translators stepped beyond the bounds of translation proper into the realm of ''adaptation''.
[10890660] |Adapted translation retains currency in some non-Western traditions.
[10890670] |Thus the [[India]]n epic, the ''[[Ramayana]]'', appears in many versions in the various [[Languages of India|Indian languages]], and the stories are different in each.
If one considers the words used for translating into the Indian languages, whether [[Aryan]] or [[Dravidian]], one is struck by the freedom granted to the translators.
[10890690] |This may relate to a devotion to [[prophecy|prophetic]] passages that strike a deep religious chord, or to a vocation to instruct [[unbeliever]]s.
[10890700] |Similar examples are to be found in [[medieval Christianity|medieval Christian]] literature, which adjusted the text to the customs and values of the audience.
[10890710] |==Equivalence==
[10890720] |The question of [[fidelity]] vs. [[transparency (linguistic)|transparency]] has also been formulated in terms of, respectively, "''formal'' equivalence" and "''dynamic'' equivalence."
[10890730] |The latter two expressions are associated with the translator [[Eugene Nida]] and were originally coined to describe ways of translating the ''[[Bible]]'', but the two approaches are applicable to any translation.
[10890740] |"Formal equivalence" equates to "[[wiktionary:metaphrase|metaphrase]]," and "dynamic equivalence"—to "[[paraphrase]]."
[10890750] |"Dynamic equivalence" (or "''functional'' equivalence") conveys the essential ''[[thought]]'' expressed in a source text — if necessary, at the expense of [[literal]]ity, original [[sememe]] and [[word order]], the source text's active vs. passive [[voice (grammar)|voice]], etc.
[10890760] |By contrast, "formal equivalence" (sought via [[literal translation|"literal" translation]]) attempts to render the text "[[literal]]ly," or "word for word" (the latter expression being itself a word-for-word rendering of the [[classical Latin]] "''verbum pro verbo''") — if necessary, at the expense of features natural to the [[target language]].
[10890770] |There is, however, '''''no sharp boundary''''' between dynamic and formal equivalence.
[10890780] |On the contrary, they represent a ''spectrum'' of translation approaches.
[10890790] |Each is used at various times and in various contexts by the same translator, and at various points within the same text — sometimes simultaneously.
[10890800] |Competent translation entails the judicious blending of dynamic and formal [[Dynamic and formal equivalence|equivalents]].
[10890810] |==Back-translation==
[10890820] |If one text is a translation of another, a '''back-translation''' is a translation of the translated text back into the language of the original text, made without reference to the original text.
[10890830] |In the context of [[machine translation]], this is also called a "'''round-trip translation'''."
[10890840] |Comparison of a back-translation to the original text is sometimes used as a [[quality control|quality check]] on the original translation, but it is certainly far from infallible and the reliability of this technique has been disputed.
[10890850] |==Literary translation==
[10890860] |Translation of [[literature|literary works]] ([[novel]]s, [[short story|short stories]], [[theatre|plays]], [[poetry|poems]], etc.) is considered a literary pursuit in its own right.
[10890870] |Notable in [[Canadian literature]] ''specifically'' as translators are figures such as [[Sheila Fischman]], [[Robert Dickson (writer)|Robert Dickson]] and [[Linda Gaboriau]], and the [[Governor General's Awards]] present prizes for the year's best English-to-French and French-to-English literary translations.
[10890880] |Other writers, among many who have made a name for themselves as literary translators, include [[Vasily Zhukovsky]], [[Tadeusz Boy-Żeleński]], [[Vladimir Nabokov]], [[Jorge Luis Borges]], [[Robert Stiller]] and [[Haruki Murakami]].
[10890890] |===History===
[10890900] |The first important translation in the West was that of the ''[[Septuagint]]'', a collection of [[Jew]]ish Scriptures translated into [[Koine Greek]] in [[Alexandria]] between the 3rd and 1st centuries BCE.
[10890910] |The dispersed [[Jew]]s had forgotten their ancestral language and needed Greek versions (translations) of their Scriptures.
[10890920] |Throughout the [[Middle Ages]], [[Latin]] was the ''[[lingua franca]]'' of the western learned world.
[10890930] |The 9th-century [[Alfred the Great]], king of [[Wessex]] in [[England]], was far ahead of his time in commissioning [[vernacular]] [[Anglo-Saxon language|Anglo-Saxon]] translations of [[Bede]]'s ''[[Ecclesiastical History]]'' and [[Boethius]]' ''[[Consolation of Philosophy]]''.
[10890940] |Meanwhile the [[Christian Church]] frowned on even partial adaptations of the standard [[Latin]] ''[[Bible]]'', [[St. Jerome]]'s ''[[Vulgate Bible|Vulgate]]'' of ca. 384 CE.
[10890950] |In [[Asia]], the spread of [[Buddhism]] led to large-scale ongoing translation efforts spanning well over a thousand years.
[10890960] |The [[Tangut Empire]] was especially efficient in such efforts; exploiting the then newly-invented [[block printing]], and with the full support of the government (contemporary sources describe the Emperor and his mother personally contributing to the translation effort, alongside sages of various nationalities), the Tanguts took mere decades to translate volumes that had taken the [[China|Chinese]] centuries to render.
[10890970] |Large-scale efforts at translation were undertaken by the [[Arabs]].
[10890980] |Having conquered the Greek world, they made [[Arabic]] versions of its philosophical and scientific works.
[10890990] |During the [[Middle Ages]], some translations of these Arabic versions were made into Latin, chiefly at [[Córdoba, Spain|Córdoba]] in [[Spain]].
[10891000] |Such Latin translations of Greek and original Arab works of scholarship and science would help advance the development of European [[Scholasticism]].
[10891010] |The broad historic trends in Western translation practice may be illustrated on the example of translation into the [[English language]].
[10891020] |The first fine translations into English were made by England's first great poet, the 14th-century [[Geoffrey Chaucer]], who adapted from the [[Italian language|Italian]] of [[Giovanni Boccaccio]] in his own ''[[Knight's Tale]]'' and ''[[Troilus and Criseyde]]''; began a translation of the [[French-language]] ''[[Roman de la Rose]]''; and completed a translation of [[Boethius]] from the [[Latin]].
[10891030] |Chaucer founded an English [[poetry|poetic]] tradition on ''[[Literary adaptation|adaptation]]s'' and translations from those earlier-established [[literary language]]s.
[10891040] |The first great English translation was the ''[[Wycliffe Bible]]'' (ca. 1382), which showed the weaknesses of an underdeveloped English [[prose]].
[10891050] |Only at the end of the 15th century would the great age of English prose translation begin with [[Thomas Malory]]'s ''[[Le Morte Darthur]]''—an adaptation of [[Arthurian romance]]s so free that it can, in fact, hardly be called a true translation.
[10891060] |The first great [[Tudor period|Tudor]] translations are, accordingly, the ''[[Tyndale Bible|Tyndale New Testament]]'' (1525), which would influence the ''[[Authorized Version]]'' (1611), and [[Lord Berners]]' version of [[Jean Froissart]]'s ''Chronicles'' (1523–25).
[10891070] |Meanwhile, in [[Renaissance]] [[Italy]], a new period in the history of translation had opened in [[Florence]] with the arrival, at the court of [[Cosimo de' Medici]], of the [[Byzantine]] scholar [[Georgius Gemistus Pletho]] shortly before the fall of [[Constantinople]] to the Turks (1453).
[10891080] |A Latin translation of [[Plato]]'s works was undertaken by [[Marsilio Ficino]].
[10891090] |This and [[Erasmus]]' Latin edition of the ''[[New Testament]]'' led to a new attitude to translation.
[10891100] |For the first time, readers demanded rigor of rendering, as philosophical and religious beliefs depended on the exact words of [[Plato]], [[Aristotle]] and [[Jesus]].
[10891110] |Non-scholarly literature, however, continued to rely on ''adaptation''.
[10891120] |[[France]]'s ''[[Pléiade]]'', [[England]]'s [[Tudor period|Tudor]] poets, and the [[Elizabethan]] translators adapted themes by [[Horace]], [[Ovid]], [[Petrarch]] and modern Latin writers, forming a new poetic style on those models.
[10891130] |The English poets and translators sought to supply a new public, created by the rise of a [[middle class]] and the development of [[printing]], with works such as the original authors ''would have written'', had they been writing in England in that day.
[10891140] |The [[Elizabethan]] period of translation saw considerable progress beyond mere [[paraphrase]] toward an ideal of [[Stylistics (linguistics)|stylistic]] equivalence, but even to the end of this period—which actually reached to the middle of the 17th century—there was no concern for [[verbal]] [[accuracy]].
[10891150] |In the second half of the 17th century, the poet [[John Dryden]] sought to make [[Virgil]] speak "in words such as he would probably have written if he were living and an Englishman."
[10891160] |Dryden, however, discerned no need to emulate the Roman poet's subtlety and concision.
[10891170] |Similarly, [[Homer]] suffered from [[Alexander Pope]]'s endeavor to reduce the Greek poet's "wild paradise" to order.
[10891180] |Throughout the 18th century, the watchword of translators was ease of reading.
[10891190] |Whatever they did not understand in a text, or thought might bore readers, they omitted.
[10891200] |They cheerfully assumed that their own style of expression was the best, and that texts should be made to conform to it in translation.
[10891210] |For scholarship they cared no more than had their predecessors, and they did not shrink from making translations from translations in third languages, or from languages that they hardly knew, or—as in the case of [[James Macpherson]]'s "translations" of [[Ossian]]—from texts that were actually of the "translator's" own composition.
[10891220] |The 19th century brought new standards of accuracy and style.
[10891230] |In regard to accuracy, observes J.M. Cohen, the policy became "the text, the whole text, and nothing but the text," except for any [[bawdy]] passages and the addition of copious explanatory [[footnote]]s.
[10891240] |In regard to style, the [[Victorians]]' aim, achieved through far-reaching metaphrase (literality) or ''pseudo''-metaphrase, was to constantly remind readers that they were reading a ''foreign'' classic.
[10891250] |An exception was the outstanding translation in this period, [[Edward FitzGerald]]'s ''[[Rubaiyat]]'' of [[Omar Khayyam]] (1859), which achieved its Oriental flavor largely by using Persian names and discreet Biblical echoes and actually drew little of its material from the Persian original.
[10891260] |In advance of the 20th century, a new pattern was set in 1871 by [[Benjamin Jowett]], who translated [[Plato]] into simple, straightforward language.
[10891270] |Jowett's example was not followed, however, until well into the new century, when accuracy rather than style became the principal criterion.
[10891280] |===Poetry===
[10891290] |[[Poetry]] presents special challenges to translators, given the importance of a text's [[form]]al aspects, in addition to its content.
[10891300] |In his influential 1959 paper "On Linguistic Aspects of Translation," the [[Russia]]n-born [[linguist]] and [[semiotician]] [[Roman Jakobson]] went so far as to declare that "poetry by definition [is] untranslatable."
[10891310] |In 1974 the American poet [[James Merrill]] wrote a poem, "[[Lost in Translation (poem)|Lost in Translation]]," which in part explores this idea.
[10891320] |The question was also discussed in [[Douglas Hofstadter]]'s 1997 book, ''[[Le Ton beau de Marot]]''.
[10891330] |===Sung texts===
[10891340] |Translation of a text that is sung in vocal music for the purpose of singing in another language — sometimes called "singing translation" — is closely linked to translation of poetry because most [[vocal music]], at least in the Western tradition, is set to [[verse]], especially verse in regular patterns with [[rhyme]].
(Since the late 19th century, musical setting of [[prose]] and [[free verse]] has also been practiced in some [[art music]], though [[popular music]] tends to remain conservative in its retention of [[stanza]]ic forms with or without [[refrain]]s.)
A rudimentary example of translating poetry for singing is church [[hymn]]s, such as the German [[chorale]]s translated into English by [[Catherine Winkworth]].
[10891370] |Translation of sung texts is generally much more restrictive than translation of poetry, because in the former there is little or no freedom to choose between a versified translation and a translation that dispenses with verse structure.
[10891380] |One might modify or omit rhyme in a singing translation, but the assignment of syllables to specific notes in the original musical setting places great challenges on the translator.
[10891390] |There is the option in prose sung texts, less so in verse, of adding or deleting a syllable here and there by subdividing or combining notes, respectively, but even with prose the process is almost like strict verse translation because of the need to stick as closely as possible to the original prosody of the sung melodic line.
[10891400] |Other considerations in writing a singing translation include repetition of words and phrases, the placement of rests and/or punctuation, the quality of vowels sung on high notes, and rhythmic features of the vocal line that may be more natural to the original language than to the target language.
[10891410] |A sung translation may be considerably or completely different from the original, thus resulting in a [[contrafactum]].
[10891420] |Translations of sung texts — whether of the above type meant to be sung or of a more or less literal type meant to be read — are also used as aids to audiences, singers and conductors, when a work is being sung in a language not known to them.
[10891430] |The most familiar types are translations presented as subtitles projected during [[opera]] performances, those inserted into concert programs, and those that accompany commercial audio CDs of vocal music.
[10891440] |In addition, professional and amateur singers often sing works in languages they do not know (or do not know well), and translations are then used to enable them to understand the meaning of the words they are singing.
[10891450] |==History of theory==
[10891460] |Discussions of the theory and practice of translation reach back into [[ancient history|antiquity]] and show remarkable [[Wiktionary:continuity|continuities]].
The distinction that had been drawn by the [[ancient Greeks]] between "[[Wiktionary:metaphrase|metaphrase]]" ("literal" translation) and "[[paraphrase]]" would be adopted by the English [[poet]] and [[translator]] [[John Dryden]] (1631-1700), who represented translation as the judicious blending of these two modes of phrasing when selecting, in the target language, "counterparts," or [[Dynamic and formal equivalence|equivalents]], for the expressions used in the source language.
[10891480] |Dryden cautioned, however, against the license of "imitation," i.e. of adapted translation: "When a painter copies from the life... he has no privilege to alter features and lineaments..."
[10891490] |This general formulation of the central concept of translation — [[Dynamic and formal equivalence|equivalence]] — is probably as adequate as any that has been proposed ever since [[Cicero]] and [[Horace]], in first-century-BCE [[Ancient Rome|Rome]], famously and literally cautioned against translating "word for word" ("''verbum pro verbo''").
[10891500] |Despite occasional theoretical diversities, the actual ''practice'' of translators has hardly changed since [[ancient history|antiquity]].
[10891510] |Except for some extreme [[Wiktionary:metaphrase|metaphrasers]] in the early [[Christian]] period and the [[Middle Ages]], and adapters in various periods (especially pre-Classical Rome, and the 18th century), translators have generally shown prudent flexibility in seeking [[Dynamic and formal equivalence|equivalents]] — "literal" where possible, [[paraphrase|paraphrastic]] where necessary — for the original [[meaning (linguistics)|meaning]] and other crucial "values" (e.g., style, [[verse form]], concordance with [[music]]al accompaniment or, in [[film]]s, with speech [[Manner of articulation|articulatory]] movements) as determined from context.
[10891520] |In general, translators have sought to preserve the context itself by reproducing the original order of [[sememe]]s, and hence [[word order]] — when necessary, reinterpreting the actual [[grammatical]] structure.
[10891530] |The grammatical differences between "fixed-word-order" [[language]]s (e.g., [[English language|English]], [[French language|French]], [[German language|German]]) and "free-word-order" languages (e.g., [[Greek language|Greek]], [[Latin]], [[Polish language|Polish]], [[Russian language|Russian]]) have been no impediment in this regard.
[10891540] |When a target language has lacked [[terminology|term]]s that are found in a source language, translators have borrowed them, thereby enriching the target language.
[10891550] |Thanks in great measure to the exchange of "''[[calque]]s''" (French for "[[tracing paper|tracings]]") between languages, and to their importation from Greek, Latin, [[Hebrew language|Hebrew]], [[Arabic language|Arabic]] and other languages, there are few [[concept]]s that are "[[untranslatability|untranslatable]]" among the modern European languages.
[10891560] |In general, the greater the contact and exchange that has existed between two languages, or between both and a third one, the greater is the ratio of [[Wiktionary:metaphrase|metaphrase]] to [[paraphrase]] that may be used in translating between them.
[10891570] |However, due to shifts in "[[ecological niche]]s" of words, a common [[etymology]] is sometimes misleading as a guide to current meaning in one or the other language.
[10891580] |The [[English language|English]] "actual," for example, should not be confused with the [[cognate]] [[French language|French]] "''actuel''" (meaning "present," "current") or the [[Polish language|Polish]] "''aktualny''" ("present," "current").
[10891590] |For the translation of [[Buddhist]] texts into [[Chinese language|Chinese]], the monk [[Xuanzang]] (602–64) proposed the idea of 五不翻 ("five occasions when terms are left untranslated"):
[10891600] |# 秘密故—terms carry secrecy, e.g., chants and spells;
[10891610] |# 含多义故—terms carry multiple meanings;
[10891620] |# 此无故—no corresponding term exists;
[10891630] |# 顺古故—out of respect for earlier translations;
# 生善故—terms whose transliteration inspires respect (e.g., ''prajñā''), which a translation might cheapen.
[10891650] |The translator's role as a [[bridge]] for "carrying across" values between [[culture]]s has been discussed at least since [[Terence]], Roman adapter of Greek comedies, in the second century BCE.
[10891660] |The translator's role is, however, by no means a passive and mechanical one, and so has also been compared to that of an [[artist]].
[10891670] |The main ground seems to be the concept of parallel creation found in critics as early as [[Cicero]].
[10891680] |[[John Dryden|Dryden]] observed that "Translation is a type of drawing after life..."
[10891690] |Comparison of the translator with a [[musician]] or [[actor]] goes back at least to [[Samuel Johnson]]'s remark about [[Alexander Pope]] playing [[Homer]] on a [[flageolet]], while Homer himself used a [[bassoon]].
[10891700] |If translation be an art, it is no easy one.
[10891710] |In the 13th century, [[Roger Bacon]] wrote that if a translation is to be true, the translator must know both [[language]]s, as well as the [[science]] that he is to translate; and finding that few translators did, he wanted to do away with translation and translators altogether.
[10891720] |The first [[Europe]]an to assume that one translates satisfactorily only toward his own language may have been [[Martin Luther]], translator of the ''[[Bible]]'' into [[German language|German]].
[10891730] |According to L.G. Kelly, since [[Johann Gottfried Herder]] in the 18th century, "it has been axiomatic" that one works only toward his own language.
[10891740] |Compounding these demands upon the translator is the fact that not even the most complete [[dictionary]] or [[thesaurus]] can ever be a fully adequate guide in translation.
[10891750] |[[Alexander Tytler]], in his ''Essay on the Principles of Translation'' (1790), emphasized that assiduous [[reading (activity)|reading]] is a more comprehensive guide to a language than are dictionaries.
[10891760] |The same point, but also including [[listening]] to the [[spoken language]], had earlier been made in 1783 by [[Onufry Andrzej Kopczyński]], member of [[Poland]]'s Society for Elementary Books, who was called "the last Latin poet."
The special role of the translator in society was well described in an essay, published posthumously in 1803, by [[Ignacy Krasicki]] — "Poland's [[La Fontaine]]", [[Primate of Poland]], poet, encyclopedist, author of the first Polish novel, and translator from French and Greek.
[10891780] |==Religious texts==
[10891790] |Translation of religious works has played an important role in history.
[10891800] |Buddhist monks who translated the [[India]]n [[sutra]]s into [[Chinese language|Chinese]] often skewed their translations to better reflect [[China]]'s very different [[culture]], emphasizing notions such as [[filial piety]].
[10891810] |A famous mistranslation of the ''[[Bible]]'' is the rendering of the [[Hebrew language|Hebrew]] word "''keren''," which has several meanings, as "horn" in a context where it actually means "beam of light."
[10891820] |As a result, artists have for centuries depicted [[Moses the Lawgiver]] with horns growing out of his forehead.
[10891830] |An example is [[Michelangelo]]'s famous sculpture.
[10891840] |[[Christian]] [[anti-Semite]]s used such depictions to spread hatred of the [[Jews]], claiming that they were [[devil]]s with horns.
[10891850] |One of the first recorded instances of translation in the West was the rendering of the [[Old Testament]] into [[Greek language|Greek]] in the third century B.C.E.
[10891860] |The resulting translation is known as the ''[[Septuagint]]'', a name that alludes to the "seventy" translators (seventy-two in some versions) who were commissioned to translate the ''[[Bible]]'' in [[Alexandria]].
[10891870] |Each translator worked in solitary confinement in a separate cell, and legend has it that all seventy versions were identical.
[10891880] |The ''Septuagint'' became the [[source text]] for later translations into many languages, including [[Latin]], [[Coptic language|Coptic]], [[Armenian language|Armenian]] and [[Georgian language|Georgian]].
[10891890] |[[Jerome|Saint Jerome]], the [[patron saint]] of translation, is still considered one of the greatest translators in history for rendering the ''[[Bible]]'' into [[Latin]].
[10891900] |The [[Roman Catholic Church]] used his translation (known as the [[Vulgate]]) for centuries, but even this translation at first stirred much controversy.
[10891910] |The period preceding and contemporary with the [[Protestant Reformation]] saw the translation of the ''[[Bible]]'' into local European languages, a development that greatly affected [[Western Christianity]]'s split into [[Roman Catholic Church|Roman Catholicism]] and [[Protestantism]], due to disparities between Catholic and Protestant versions of crucial words and passages.
[10891920] |[[Martin Luther]]'s ''[[Bible]]'' in [[German language|German]], [[Jakub Wujek]]'s in [[Polish language|Polish]], and the ''[[King James Bible]]'' in [[English language|English]] had lasting effects on the religions, cultures and languages of those countries.
[10891930] |==Machine translation==
[10891940] |[[Machine translation]] (MT) is a procedure whereby a computer program analyzes a [[source text]] and produces a target text ''without further human intervention''.
[10891950] |In reality, however, machine translation typically ''does'' involve human intervention, in the form of '''pre-editing''' and '''post-editing'''.
[10891960] |An exception to that rule might be, e.g., the translation of technical specifications (strings of [[terminology|technical terms]] and adjectives), using a [[dictionary-based machine translation|dictionary-based machine-translation]] system.
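To illustrate the dictionary-based case, the sketch below performs pure word-for-word glossary lookup with a tiny invented English-to-Spanish glossary; it is meant only to show why such systems can be serviceable for strings of technical terms yet fail on ordinary prose, and the glossary entries are illustrative assumptions rather than a real terminology database.
<syntaxhighlight lang="python">
# Hypothetical miniature glossary; a real system would use a curated
# terminology database for the technical domain in question.
GLOSSARY = {
    "voltage": "voltaje",
    "maximum": "máximo",
    "temperature": "temperatura",
    "operating": "de funcionamiento",
    "range": "rango",
}

def dictionary_translate(source):
    """Replace each word by its glossary entry, leaving unknown words untouched."""
    return " ".join(GLOSSARY.get(word, word) for word in source.lower().split())

# Each technical term receives a gloss, though word order stays as in English...
print(dictionary_translate("maximum operating voltage"))
# ...and ordinary prose fares far worse: grammar, inflection and context are ignored.
print(dictionary_translate("the voltage may exceed the operating range"))
</syntaxhighlight>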
[10891970] |To date, machine translation—a major goal of [[natural language processing|natural-language processing]]—has met with limited success.
[10891980] |A [[November 6]], [[2007]], example illustrates the hazards of uncritical reliance on [[machine translation]].
[10891990] |Machine translation has been brought to a large public by tools available on the Internet, such as [[Yahoo!]]'s [[Babel Fish (website)|Babel Fish]], [[Babylon translator|Babylon]], and [[StarDict]].
[10892000] |These tools produce a "gisting translation" — a rough translation that, with luck, "gives the gist" of the source text.
[10892010] |With proper [[terminology|terminology work]], with preparation of the source text for machine translation (pre-editing), and with re-working of the machine translation by a professional human translator (post-editing), commercial machine-translation tools can produce useful results, especially if the machine-translation system is integrated with a [[translation memory|translation-memory]] or [[Globalization Management System|globalization-management system]].
[10892020] |In regard to texts (e.g., [[meteorology|weather reports]]) with limited ranges of [[vocabulary]] and simple [[sentence (linguistics)|sentence]] [[structure]], machine translation can deliver results that do not require much human intervention to be useful.
[10892030] |Also, the use of a [[controlled language]], combined with a machine-translation tool, will typically generate largely comprehensible translations.
[10892040] |Relying on machine translation exclusively ignores the fact that communication in [[natural language|human language]] is [[wiktionary:context|context]]-embedded and that it takes a person to comprehend the context of the original text with a reasonable degree of probability.
[10892050] |It is certainly true that even purely human-generated translations are prone to error.
[10892060] |Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human.
[10892070] |== CAT ==
[10892080] |[[Computer-assisted translation]] (CAT), also called "computer-''aided'' translation," "machine-aided human translation (MAHT)" and "interactive translation," is a form of translation wherein a human translator creates a target text with the assistance of a computer program.
[10892090] |The '''machine''' supports a human '''translator'''.
[10892100] |Computer-assisted translation can include standard [[dictionary]] and grammar software.
[10892110] |The term, however, normally refers to a range of specialized programs available to the translator, including [[translation memory|translation-memory]], [[terminology|terminology-management]], [[concordancer|concordance]], and alignment programs.
[10892120] |With the internet, translation software can help non-native-speaking individuals understand web pages published in other languages.
[10892130] |Whole-page translation tools are of limited utility, however, since they offer only a limited potential understanding of the original author's intent and context; translated pages tend to be more humorous and confusing than enlightening.
[10892140] |Interactive translations with pop-up windows are becoming more popular.
[10892150] |These tools show several possible translations of each word or phrase.
[10892160] |Human operators merely need to select the correct translation as the mouse glides over the foreign-language text.
[10892170] |Possible definitions can be grouped by pronunciation.
[10900010] |Translation memory
[10900020] |A '''translation memory''', or '''TM''', is a type of database that is used in software programs designed to aid human [[translator]]s.
[10900030] |Some software programs that use translation memories are known as '''translation memory managers''' ('''TMM''').
[10900040] |Translation memories are typically used in conjunction with a dedicated [[computer assisted translation]] (CAT) tool, [[wordprocessor|word processing]] program, [[terminology management systems]], multilingual dictionary, or even raw [[machine translation]] output.
[10900050] |A translation memory consists of text segments in a source language and their translations into one or more target languages.
[10900060] |These segments can be blocks, paragraphs, sentences, or phrases.
[10900070] |Individual words are handled by terminology bases and are not within the domain of TM.
[10900080] |Research indicates that many companies producing multilingual documentation are using translation memory systems.
In a survey of language professionals in 2006, 82.5% of 874 respondents confirmed that they used a TM.
Use of TM correlated with text types characterised by technical terms and simple sentence structure (technical documentation, and to a lesser degree marketing and financial texts), with computing skills, and with repetitiveness of content.
[10900110] |== Using translation memories ==
[10900120] |The program breaks the '''source text''' (the text to be translated) into segments, looks for matches between segments and the source half of previously translated source-target pairs stored in a '''translation memory''', and presents such matching pairs as translation '''candidates'''.
[10900130] |The translator can accept a candidate, replace it with a fresh translation, or modify it to match the source.
[10900140] |In the last two cases, the new or modified translation goes into the database.
Some translation memory systems search for 100% matches only; that is, they can only retrieve segments of text that match entries in the database exactly, while others employ [[Fuzzy string searching|fuzzy matching]] algorithms to retrieve similar segments, which are presented to the translator with the differences flagged.
[10900160] |It is important to note that typical translation memory systems only search for text in the source segment.
[10900170] |The flexibility and robustness of the matching algorithm largely determine the performance of the translation memory, although for some applications the recall rate of exact matches can be high enough to justify the 100%-match approach.
[10900180] |Segments where no match is found will have to be translated by the translator manually.
[10900190] |These newly translated segments are stored in the database where they can be used for future translations as well as repetitions of that segment in the current text.
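As a concrete illustration of this workflow, the following sketch (Python) stores the memory as a simple dictionary and scores candidates with a generic string-similarity ratio; the function names, the threshold, and the data structure are illustrative assumptions rather than the design of any particular TM manager.
<source lang="python">
# Minimal translation-memory lookup and update: find exact or fuzzy candidates
# for a new source segment, and store the translation that is finally accepted.
from difflib import SequenceMatcher

def lookup(tm, source_segment, threshold=0.75):
    """Return (target, score) candidates whose stored source is similar enough."""
    candidates = []
    for stored_source, stored_target in tm.items():
        score = SequenceMatcher(None, source_segment, stored_source).ratio()
        if score >= threshold:
            candidates.append((stored_target, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

def commit(tm, source_segment, target_segment):
    """Store an accepted, modified, or freshly produced translation for reuse."""
    tm[source_segment] = target_segment

tm = {"Press the power button.": "Appuyez sur le bouton d'alimentation."}
print(lookup(tm, "Press the power button twice."))   # fuzzy candidate, score < 1.0
commit(tm, "Press the power button twice.", "Appuyez deux fois sur le bouton d'alimentation.")
</source>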
[10900200] |Translation memories work best on texts which are highly repetitive, such as technical manuals.
[10900210] |They are also helpful for translating incremental changes in a previously translated document, corresponding, for example, to minor changes in a new version of a user manual.
[10900220] |Traditionally, translation memories have not been considered appropriate for literary or creative texts, for the simple reason that there is so little repetition in the language used.
[10900230] |However, others find them valuable even for non-repetitive texts, because the database resources created are useful for concordance searches to determine appropriate usage of terms, for quality assurance (no empty segments), and for simplifying the review process (source and target segments are always displayed together, whereas in a traditional review environment translators have to work with two separate documents).
[10900240] |If a translation memory system is used consistently on appropriate texts over a period of time, it can save translators considerable work.
[10900250] |=== Main benefits ===
[10900260] |Translation memory managers are most suitable for translating technical documentation and documents containing specialized vocabularies.
[10900270] |Their benefits include:
[10900280] |* Ensuring that the document is completely translated (translation memories do not accept empty target segments)
[10900290] |* Ensuring that the translated documents are consistent, including common definitions, phrasings and terminology.
[10900300] |This is important when different translators are working on a single project.
[10900310] |* Enabling translators to translate documents in a wide variety of formats without having to own the software typically required to process these formats.
[10900320] |* Accelerating the overall translation process; since translation memories "remember" previously translated material, translators have to translate it only once.
[10900330] |* Reducing costs of long-term translation projects; for example the text of manuals, warning messages or series of documents needs to be translated only once and can be used several times.
[10900340] |* For large documentation projects, savings (in time or money) thanks to the use of a TM package may already be apparent even for the first translation of a new project, but normally such savings are only apparent when translating subsequent versions of a project that was translated before using translation memory.
[10900350] |=== Main obstacles ===
[10900360] |The main problems hindering wider use of translation memory managers include:
[10900370] |* The concept of "translation memories" is based on the premise that sentences used in previous translations can be "recycled".
[10900380] |However, a guiding principle of translation is that the translator must translate the ''message'' of the text, and not its component ''[[Sentence (linguistics)|sentences]]''.
[10900390] |* Translation memory managers do not easily fit into existing translation or localization processes.
[10900400] |In order to take advantage of TM technology, the [[translation process]]es must be redesigned.
[10900410] |* Translation memory managers do not presently support all documentation formats, and filters may not exist to support all file types.
[10900420] |* There is a learning curve associated with using translation memory managers, and the programs must be customized for greatest effectiveness.
[10900430] |* In cases where all or part of the translation process is outsourced or handled by freelance translators working off-site, the off-site workers require special tools to be able to work with the texts generated by the translation memory manager.
[10900440] |* Full versions of many translation memory managers can cost from [[US dollar|US$]]500 to US$2,500 per seat, which can represent a considerable investment (although lower cost programs are also available).
[10900450] |However, some developers produce free or low-cost versions of their tools with reduced feature sets that individual translators can use to work on projects set up with full versions of those tools.
[10900460] |(Note that there are freeware and shareware TM packages available, but none of these has yet gained a large market share.)
[10900470] |* The costs involved in importing the user's past translations into the translation memory database, training, as well as any add-on products may also represent a considerable investment.
[10900480] |* Maintenance of translation memory databases still tends to be a manual process in most cases, and failure to maintain them can result in significantly decreased usability and quality of TM matches.
[10900490] |* As stated previously, translation memory managers may not be suitable for text that lacks internal repetition or which does not contain unchanged portions between revisions.
[10900500] |Technical text is generally best suited for translation memory, while marketing or creative texts will be less suitable.
[10900510] |* The quality of the text recorded in the translation memory is not guaranteed; if the translation for a particular segment is incorrect, it is in fact more likely that the incorrect translation will be reused the next time the same source text, or a similar source text, is translated, thereby perpetuating the error.
[10900520] |* There is also a potential, and probably unconscious, effect on the translated text.
[10900530] |Different languages use different sequences for the logical elements within a sentence, and a translator presented with a multiple-clause sentence that is half translated is less likely to rebuild the sentence completely.
[10900540] |* There is also a potential for the translator to deal with the text mechanically sentence-by-sentence, instead of focusing on how each sentence relates to those around it and to the text as a whole.
[10900550] |* Translation memories also raise certain industrial relations issues as they make exploitation of human translators easier.
[10900560] |==Functions of a translation memory==
[10900570] |The following is a summary of the main functions of a Translation Memory.
[10900580] |=== Off-line functions ===
[10900590] |==== Import ====
[10900600] |This function is used to transfer a text and its translation from a text file to the TM.
[10900610] |[[Import]] can be done from a ''raw format'', in which an external source text is available for importing into a TM along with its translation.
[10900620] |Sometimes the texts have to be reprocessed by the user.
[10900630] |There is another format that can be used to import: the ''native format''.
[10900640] |The native format is the format the TM tool itself uses to save translation memories to a file.
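A minimal import sketch (Python), under the assumption that the raw format is one tab-separated source/target pair per line; real raw and native formats vary by tool, so this is illustrative only.
<source lang="python">
# Read a hypothetical tab-separated "raw" export into an in-memory TM.
def import_raw(path):
    tm = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            source, target = line.split("\t", 1)
            tm[source] = target
    return tm
</source>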
[10900650] |==== Analysis ====
[10900660] |The process of analysis is developed through the following steps:
[10900670] |; '''Textual parsing'''
[10900680] |: It is very important to recognize punctuation in order to distinguish, for example, the end of a sentence from an abbreviation.
[10900690] |Thus, mark-up is a kind of pre-editing.
[10900700] |Usually, the materials which have been processed through translators' aid programs contain mark-up, as the translation stage is embedded in a multilingual document production line.
[10900710] |Other special text elements may be set off by mark-up.
[10900720] |There are special elements which do not need to be translated, such as proper names and codes, while others may need to be converted to native format.
[10900730] |; '''Linguistic parsing'''
[10900740] |: The base form reduction is used to prepare lists of words and a text for automatic retrieval of terms from a term bank.
[10900750] |On the other hand, syntactic parsing may be used to extract multi-word terms or phraseology from a source text.
[10900760] |Parsing is thus used to normalise word-order variation in phraseology, that is, to determine which words can form a phrase.
[10900770] |; '''Segmentation'''
[10900780] |: Its purpose is to choose the most useful translation units (a minimal segmentation sketch appears after this list).
[10900790] |Segmentation is like a type of parsing.
[10900800] |It is done monolingually using superficial parsing, and alignment is based on segmentation.
[10900810] |If the translators correct the segmentations manually, later versions of the document will not find matches against the TM based on the corrected segmentation because the program will repeat its own errors.
[10900820] |Translators usually proceed sentence by sentence, although the translation of one sentence may depend on the translation of the surrounding ones.
[10900830] |; '''Alignment'''
[10900840] |: It is the task of defining translation correspondences between source and target texts.
[10900850] |There should be feedback from alignment to segmentation and a good alignment algorithm should be able to correct initial segmentation.
[10900860] |; '''Term extraction'''
[10900870] |: It can have as input a previous dictionary.
[10900880] |Moreover, when extracting unknown terms, it can use parsing based on text statistics.
[10900890] |These statistics are used to estimate the amount of work involved in a translation job.
[10900900] |This is very useful for planning and scheduling the work.
[10900910] |Translation statistics usually count the words and estimate the amount of repetition in the text.
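The segmentation and statistics steps described above can be sketched as follows (Python); the abbreviation list, the sentence-splitting rule, and the repetition measure are simplified assumptions, not the algorithm of any shipping tool.
<source lang="python">
# Toy segmentation plus translation statistics: split a text into sentence-like
# segments (treating a few known abbreviations as non-terminal punctuation),
# then count words and segments that repeat within the text.
import re
from collections import Counter

ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr."}

def segment(text):
    pieces = re.split(r"(?<=[.!?])\s+", text)
    segments, buffer = [], ""
    for piece in pieces:
        buffer = f"{buffer} {piece}".strip() if buffer else piece
        if not any(buffer.endswith(abbr) for abbr in ABBREVIATIONS):
            segments.append(buffer)
            buffer = ""
    if buffer:
        segments.append(buffer)
    return segments

def translation_statistics(text):
    segments = segment(text)
    counts = Counter(segments)
    return {
        "segments": len(segments),
        "words": sum(len(s.split()) for s in segments),
        "repeated_segments": sum(n - 1 for n in counts.values() if n > 1),
    }

print(translation_statistics("Press OK. Press OK. Contact Dr. Smith, e.g. by phone."))
</source>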
[10900920] |==== Export ====
[10900930] |Export transfers the text from the TM into an external text file.
[10900940] |Import and export should be inverses.
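Continuing the hypothetical tab-separated format from the import sketch above, the inverse property can be expressed directly:
<source lang="python">
# Export the in-memory TM so that importing the file again reproduces it.
def export_raw(tm, path):
    with open(path, "w", encoding="utf-8") as handle:
        for source, target in tm.items():
            handle.write(f"{source}\t{target}\n")

# Round trip: after export_raw(tm, path), import_raw(path) == tm should hold.
</source>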
[10900950] |=== Online functions ===
[10900960] |When translating, one of the main purposes of the TM is to retrieve the most useful matches in the memory so that the translator can choose the best one.
[10900970] |The TM must show both the source and target text pointing out the identities and differences.
[10900980] |==== Retrieval ====
[10900990] |It is possible to retrieve from the TM one or more types of matches.
[10901000] |; '''Exact match'''
[10901010] |: Exact matches appear when the current source segment matches a stored segment character for character.
[10901020] |When translating a sentence, an exact match means the same sentence has been translated before.
[10901030] |Exact matches are also called "100% matches".
[10901040] |; '''In Context Exact (ICE) match'''
[10901050] |: An ICE match is an exact match that occurs in exactly the same context, that is, the same location in a paragraph.
[10901060] |Context is often defined by the surrounding sentences and attributes such as document file name, date, and permissions.
[10901070] |; '''Fuzzy match'''
[10901080] |: When the match has not been exact, it is a "fuzzy" match.
[10901090] |Some systems assign percentages to these kinds of matches, in which case a fuzzy match is greater than 0% and less than 100%.
[10901100] |Those figures are not comparable across systems unless the method of scoring is specified (see the sketch after this list).
[10901110] |; '''Concordance'''
[10901120] |: This feature allows translators to select one or more words in the source segment and the system retrieves segment pairs that match the search criteria.
[10901130] |This feature is helpful for finding translations of terms and idioms in the absence of a terminology database.
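Two of these retrieval features can be illustrated with a short sketch (Python); the two scoring formulas and the naive substring concordance are illustrative assumptions, not the algorithms of any particular product, but they show why fuzzy-match percentages differ between scoring methods and how a concordance query retrieves stored segment pairs.
<source lang="python">
# Two different fuzzy-match scores for the same segment pair (character-level
# vs. word-level), plus a naive concordance search over the source segments.
from difflib import SequenceMatcher

def char_score(a, b):
    return SequenceMatcher(None, a, b).ratio() * 100

def word_score(a, b):
    return SequenceMatcher(None, a.split(), b.split()).ratio() * 100

def concordance(tm, term):
    return [(source, target) for source, target in tm.items() if term in source]

stored = "Tighten the locking nut before starting the engine."
new = "Tighten the locking pin before starting the engine."
print(round(char_score(stored, new)), round(word_score(stored, new)))  # two different percentages

tm = {stored: "Serrez l'écrou de blocage avant de démarrer le moteur."}
print(concordance(tm, "locking"))  # segment pairs containing the queried term
</source>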
[10901140] |==== Updating ====
[10901150] |A TM is updated with a new translation when it has been accepted by the translator.
[10901160] |As always when updating a database, there is the question of what to do with its previous contents.
[10901170] |A TM can be modified by changing or deleting entries in the TM.
[10901180] |Some systems allow translators to save multiple translations of the same source segment.
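One possible update policy, sketched in Python under the assumption that alternative translations of the same source segment should be kept rather than overwritten:
<source lang="python">
# Keep every accepted translation for a source segment instead of discarding
# the previous contents of the database.
from collections import defaultdict

def update(tm, source_segment, target_segment):
    if target_segment not in tm[source_segment]:
        tm[source_segment].append(target_segment)

tm = defaultdict(list)
update(tm, "Close the valve.", "Fermez la vanne.")
update(tm, "Close the valve.", "Fermer la vanne.")  # alternative translation is kept
print(tm["Close the valve."])
</source>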
[10901190] |==== Automatic translation ====
[10901200] |Some translation memory systems can perform retrieval and substitution automatically, without intervention from the translator; when they do, the following functions apply.
[10901220] |; '''Automatic retrieval'''
[10901230] |: A TM features automatic retrieval and evaluation of translation correspondences in a translator's workbench.
[10901240] |; '''Automatic substitution'''
[10901250] |: Exact matches come up in translating new versions of a document.
[10901260] |During automatic substitution, the translator does not check the translation against the original, so if there are any mistakes in the previous translation, they will carry over.
[10901270] |==== Networking ====
[10901280] |Networking makes it possible for a group of translators to translate a text together efficiently.
[10901290] |This way, the translations entered by one translator are available to the others.
[10901300] |Moreover, if translation memories are shared before the final translation, there is a chance that mistakes made by one translator will be corrected by other team members.
[10901310] |=== Text memory ===
[10901320] |"Text memory" is the basis of the proposed [http://www.xml-intl.com/docs/specification/xml-tm.html Lisa OSCAR xml:tm standard].
[10901330] |Text memory comprises author memory and translation memory.
[10901340] |==== Translation memory ====
[10901350] |The unique identifiers assigned to each text unit are remembered during translation, so that the target-language document is 'exactly' aligned at the text-unit level.
[10901360] |If the source document is subsequently modified, then those text units that have not changed can be directly transferred to the new target version of the document without the need for any translator interaction.
[10901370] |This is the concept of 'exact' or 'perfect' matching to the translation memory. xml:tm can also provide mechanisms for in-document leveraged and fuzzy matching.
[10901380] |==History of translation memories==
[10901390] |The concept behind translation memories is not recent — university research into the concept began in the late 1970s, and the earliest commercializations became available in the late 1980s — but they became commercially viable only in the late 1990s.
[10901400] |Originally translation memory systems stored aligned source and target sentences in a database, from which they could be recalled during translation.
[10901410] |The problem with this 'leveraged' approach is that there is no guarantee that the new source-language sentence comes from the same context as the original database sentence.
[10901420] |Therefore, all 'leveraged' matches require a translator to review the memory match for relevance in the new document.
[10901430] |Although cheaper than outright translation, this review still carries a cost.
[10901440] |==Support for new languages==
[10901450] |Translation memory tools from the majority of vendors do not support many emerging languages.
[10901460] |Asian countries such as India have recently moved into language computing, and there is considerable scope for translation memories in such developing markets.
[10901470] |Because most CAT software companies concentrate on established languages, little progress has been made for Asian languages.
[10901480] |===Recent trends===
[10901490] |One recent development is the concept of 'text memory' in contrast to translation memory (see [http://www.xml.com/pub/a/2004/01/07/xmltm.html Translating XML Documents with xml:tm]).
[10901500] |This is also the basis of the proposed LISA OSCAR [http://www.xml.com/pub/a/2004/01/07/xmltm.html xml:tm] standard.
[10901510] |Text memory within xml:tm comprises 'author memory' and 'translation memory'.
[10901520] |Author memory is used to keep track of changes during the authoring cycle.
[10901530] |Translation memory uses the information from author memory to implement translation memory matching.
[10901540] |Although primarily targeted at XML documents, xml:tm can be used on any document that can be converted to [http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff XLIFF] format.
[10901550] |===Second generation translation memories===
[10901560] |Much more powerful than first-generation TMs, they include a [[linguistic analysis]] engine, use chunk technology to break down segments into intelligent terminological groups, and automatically generate specific glossaries.
[10901570] |==Translation memory and related standards==
[10901580] |===TMX===
[10901590] |[http://www.lisa.org/tmx/ Translation Memory Exchange format].
[10901600] |This standard enables the interchange of translation memories between translation suppliers.
[10901610] |TMX has been adopted by the translation community as the best way of importing and exporting translation memories.
[10901620] |The current version is 1.4b - it allows for the recreation of the original source and target documents from the TMX data.
[10901630] |An updated version, 2.0, is due to be released in 2008.
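For illustration, a small sketch (Python standard library) of reading translation units from a TMX file; it ignores the header, any inline markup inside segments beyond their text, and older files that use a plain lang attribute instead of xml:lang.
<source lang="python">
# Collect the per-language segment texts of each <tu> (translation unit).
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang

def read_tmx(path):
    units = []
    for tu in ET.parse(path).getroot().iter("tu"):
        variants = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                variants[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(variants)
    return units
</source>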
[10901640] |===TBX===
[10901650] |[http://www.lisa.org/tbx/ Termbase Exchange format].
[10901660] |This LISA standard, which is currently being revised and republished as ISO 30042, allows for the interchange of terminology data including detailed lexical information.
[10901670] |The framework for TBX is provided by three ISO standards: ISO 12620, ISO 12200 and ISO 16642.
[10901680] |ISO 12620 provides an inventory of well-defined “data categories” with standardized names that function as data element types or as predefined values.
[10901690] |ISO 12200 (also known as MARTIF) provides the basis for the core structure of TBX.
[10901700] |ISO 16642 (also known as Terminological Markup Framework) includes a structural metamodel for Terminology Markup Languages in general.
[10901710] |===SRX===
[10901720] |[http://www.lisa.org/standards/srx/ Segmentation Rules Exchange format].
[10901730] |SRX is intended to enhance the TMX standard so that translation memory data that is exchanged between applications can be used more effectively.
[10901740] |The ability to specify the segmentation rules that were used in the previous translation increases the leveraging that can be achieved.
[10901750] |===GMX===
[10901760] |[http://www.lisa.org/oscar/seg/ GILT Metrics].
[10901770] |GILT stands for Globalization, Internationalization, Localization, and Translation.
[10901780] |The GILT Metrics standard comprises three parts: GMX-V for volume metrics, GMX-C for complexity metrics and GMX-Q for quality metrics.
[10901790] |The proposed GILT Metrics standard is tasked with quantifying the workload and quality requirements for any given GILT task.
[10901800] |===OLIF===
[10901810] |[http://www.olif.net/ Open Lexicon Interchange Format].
[10901820] |OLIF is an open, XML-compliant standard for the exchange of terminological and lexical data.
[10901830] |Although originally intended as a means for the exchange of lexical data between proprietary machine translation lexicons, it has evolved into a more general standard for terminology exchange.
[10901840] |
[10901850] |===XLIFF===
[10901860] |[http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff XML Localisation Interchange File Format].
[10901870] |It is intended to provide a single interchange file format that can be understood by any localization provider.
[10901880] |XLIFF is the preferred way of exchanging data in XML format in the translation industry.
[10901890] |===TransWS===
[10901900] |[http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws Translation Web Services].
[10901910] |TransWS specifies the calls needed to use Web services for the submission and retrieval of files and messages relating to localization projects.
[10901920] |It is intended as a detailed framework for the automation of much of the current localization process by the use of Web Services.
[10901930] |===[[xml:tm]]===
[10901940] |This approach to translation memory is based on the concept of text memory, which comprises author memory and translation memory. [[xml:tm]] has been donated to LISA OSCAR by [http://xml-intl.com/ XML-INTL].
[10901950] |===PO===
[10901960] |[[Gettext| Gettext Portable Object format]].
[10901970] |Though often not regarded as a translation memory format, Gettext PO files are bilingual files that can also be used in translation memory processes in much the same way as dedicated translation memories.
[10901980] |Typically, a PO translation memory system will consist of various separate files in a directory tree structure.
[10901990] |Common tools that work with PO files include the [http://gnuwin32.sourceforge.net/packages/gettext.htm GNU Gettext Tools] and the [http://translate.sourceforge.net/wiki/toolkit/index Translate Toolkit].
[10902000] |Several tools and programs also exist that edit PO files as if they are mere source text files.
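A minimal sketch (Python) of treating simple PO entries as a translation memory; it handles only single-line msgid/msgstr pairs, whereas real PO files also contain multi-line strings, plural forms, and comments, for which a dedicated library is the better choice.
<source lang="python">
# Turn simple gettext PO entries into source/target pairs.
import re

ENTRY = re.compile(r'msgid "(.*)"\nmsgstr "(.*)"')

def po_to_tm(path):
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    # The empty msgid is the PO header, so it is skipped.
    return {source: target for source, target in ENTRY.findall(text) if source}
</source>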
[10910010] |Turing test
[10910020] |The '''Turing test''' is a proposal for a test of a [[machine]]'s capability to demonstrate intelligence.
[10910030] |Described by [[Alan Turing]] in the 1950 paper "[[Computing Machinery and Intelligence]]," it proceeds as follows: a human judge engages in a natural language conversation with one human and one machine, each of which try to appear human; if the judge cannot reliably tell which is which, then the machine is said to pass the test.
[10910040] |In order to test the machine's intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard and screen (Turing originally suggested a [[Teleprinter|teletype machine]], one of the few text-only communication systems available in 1950).
[10910050] |==History==
[10910060] |While the field of [[artificial intelligence]] is said to have been founded in 1956, its roots extend back considerably further.
[10910070] |The question as to whether or not it is possible for machines to think has a long history, firmly entrenched in the distinction between [[Dualism (philosophy of mind)|dualist]] and [[materialism|materialist]] views of the mind.
[10910080] |From the perspective of dualism, the [[mind]] is [[Non-physical entity|non-physical]] (or, at the very least, has [[Property dualism|non-physical properties]]), and therefore cannot be explained in purely physical terms.
[10910090] |The materialist perspective, on the other hand, argues that the mind can be explained physically, and thus leaves open the possibility of minds that are artificially produced.
[10910100] |===Alan Turing===
[10910110] |In more practical terms, researchers in Britain had been exploring "machine intelligence" for up to ten years prior to 1956.
[10910120] |Alan Turing in particular had been tackling the notion of machine intelligence since at least 1941, and one of the earliest known mentions of "computer intelligence" was made by Turing in 1947.
[10910130] |In his report "Intelligent Machinery", Turing investigated "the question of whether or not it is possible for machinery to show intelligent behaviour", and as part of that investigation proposed what may be considered the forerunner to his later tests.
[10910140] |Thus by the time Turing published "Computing Machinery and Intelligence", he had been considering the possibility of machine intelligence for many years.
[10910150] |This, however, was the first published paper by Turing to focus exclusively on the notion.
[10910160] |Turing began his 1950 paper with the claim: "I propose to consider the question, 'Can machines think?'"
[10910170] |As Turing highlighted, the traditional approach to such a question is to start with definitions, defining both the terms [[machine]] and [[intelligence]].
[10910180] |Nevertheless, Turing chose not to do so.
[10910190] |Instead he replaced the question with a new question, "which is closely related to it and is expressed in relatively unambiguous words".
[10910200] |In essence, Turing proposed to change the question from "Do machines think?" into "Can machines do what we (as thinking entities) can do?"
[10910210] |The advantage of the new question, Turing argued, was that it "drew a fairly sharp line between the physical and intellectual capacities of a man".
[10910220] |To demonstrate this approach, Turing proposed a test that was inspired by a [[party game]] known as the "Imitation Game", in which a man and a woman go into separate rooms, and guests try to tell them apart by writing a series of questions and reading the typewritten answers sent back.
[10910230] |In this game, both the man and the woman aim to convince the guests that they are the other.
[10910240] |Turing then proposed recreating the imitation game with a computer taking the part of one of the human players.
[10910250] |Later in the paper he suggested an "equivalent" alternative formulation involving a judge conversing only with a computer and a man.
[10910260] |While neither of these two formulations precisely match the version of the Turing Test that is more generally known today, a third version was proposed by Turing in 1952.
[10910270] |In this version, which Turing discussed in a [[BBC]] radio broadcast, a jury asks questions of a computer, and the role of the computer is to make a significant proportion of the jury believe that it is really a man.
[10910280] |Turing's paper considered nine common objections, which include all the major arguments against artificial intelligence that have been raised in the years since his paper was first published.
[10910290] |(See ''[[Computing Machinery and Intelligence]]''.)
[10910300] |===ELIZA, PARRY and the Chinese room===
[10910310] |Blay Whitby lists four major turning points in the history of the Turing Test: the publication of "Computing Machinery and Intelligence" in 1950; the announcement of [[Joseph Weizenbaum]]'s [[ELIZA]] in 1966; Kenneth Colby's creation of [[PARRY]], which was first described in 1972; and the Turing Colloquium in 1990.
[10910320] |ELIZA works by examining a user's typed comments for keywords.
[10910330] |If a word is found a rule is applied which transforms the user's comments, and the resulting sentence is then returned.
[10910340] |If a keyword is not found, ELIZA responds with either a generic response or by repeating one of the earlier comments.
[10910350] |In addition, Weizenbaum developed ELIZA to replicate the behavior of a [[Person-centered psychotherapy|Rogerian psychotherapist]], allowing ELIZA to be "free to assume the pose of knowing almost nothing of the real world."
[10910360] |Due to these techniques, Weizenbaum's program was able to fool some people into believing that they were talking to a real person, with some subjects being "very hard to convince that ELIZA ... is ''not'' human."
[10910370] |Thus ELIZA is claimed by many to be one of the programs (perhaps the first) that are able to pass the Turing Test.
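The mechanism can be illustrated with a toy keyword-and-template sketch (Python); Weizenbaum's actual script language, ranking of keywords, and memory of earlier inputs were considerably richer than this.
<source lang="python">
# ELIZA-style processing in miniature: scan the input for a keyword pattern,
# apply the associated transformation, otherwise fall back to a generic reply.
import re

RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."

def respond(utterance):
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return FALLBACK

print(respond("I am worried about my exams"))  # keyword rule fires
print(respond("The weather is nice"))          # generic fallback
</source>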
[10910380] |Colby's PARRY has been described as "ELIZA with attitude" - it attempts to model the behavior of a [[Paranoia|paranoid]] [[Schizophrenic|schizophrenic]], using a similar (if more advanced) approach to that employed by Weizenbaum.
[10910390] |In order to help validate the work, PARRY was tested in the early 1970s using a variation of the Turing Test.
[10910400] |A group of experienced psychiatrists analyzed a combination of real patients and computers running PARRY through [[teletype]] machines.
[10910410] |Another group of 33 psychiatrists were shown transcripts of the conversations.
[10910420] |The two groups were then asked to identify which of the "patients" were human, and which were computer programs.
[10910430] |The psychiatrists were only able to make the correct identification 48% of the time - a figure consistent with random guessing.
[10910440] |While neither ELIZA nor PARRY were able to pass a strict Turing Test, they - and software like them - suggested that software might be written that was able to do so.
[10910450] |More importantly, they suggested that such software might involve little more than databases and the application of simple rules.
[10910460] |This led to [[John Searle]]'s 1980 paper, "Minds, Brains, and Programs", in which he proposed an argument against the Turing Test.
[10910470] |Searle described a [[thought experiment]] known as the [[Chinese room]] that highlighted what he saw as a fundamental misinterpretation of what the Turing Test could and could not prove: while software such as ELIZA might be able to pass the Turing Test, they might do so by simply manipulating symbols of which they have no understanding.
[10910480] |And without understanding, they could not be described as "thinking" in the same sense people do.
[10910490] |Searle concludes that the Turing Test can not prove that a machine can think, contrary to Turing's original proposal.
[10910500] |Arguments such as that proposed by Searle and others working in the [[philosophy of mind]] sparked off a more intense debate about the nature of intelligence, the possibility of intelligent machines and the value of the Turing test that continued through the 1980s and 1990s.
[10910510] |===1990s and beyond===
[10910520] |1990 was the 40th anniversary of the first publication of Turing's "Computing Machinery and Intelligence" paper, and thus saw renewed interest in the test.
[10910530] |Two significant events occurred in that year.
[10910540] |The first was the Turing Colloquium, held at the [[University of Sussex]] in April, which brought together academics and researchers from a wide variety of disciplines to discuss the Turing Test in terms of its past, present and future.
[10910550] |The second significant event was the formation of the annual [[Loebner prize]] competition.
[10910560] |The Loebner prize was instigated by [[Hugh Loebner]] under the auspices of the Cambridge Center for Behavioral Studies of [[Massachusetts]], [[United States]], with the first competition held in November, 1991.
[10910570] |As Loebner describes it, the competition was created to advance the state of AI research, at least in part because while the Turing Test had been discussed for many years, "no one had taken steps to implement it."
[10910580] |The Loebner prize has three awards: the first prize of $100,000 and a gold medal, to be awarded to the first program that passes the "unrestricted" Turing test; the second prize of $25,000, to be awarded to the first program that passes the "restricted" version of the test; and a sum of $2000 (now $3000) to the "most human-like" program that was entered each year.
[10910590] |[[As of 2007]], neither the first nor the second prize has been awarded.
[10910600] |The running of the Loebner prize led to renewed discussion of both the viability of the Turing Test and the aim of developing artificial intelligences that could pass it.
[10910610] |''[[The Economist]]'', in an article entitled "Artificial Stupidity", commented that the winning entry from the first Loebner prize won, at least in part, because it was able to "imitate human typing errors".
[10910620] |(Turing had considered the possibility that computers could be identified by their ''lack'' of errors, and had suggested that the computers should be programmed to add errors into their output, so as to be better "players" of the game).
[10910630] |The issue that ''The Economist'' raised was one that was already well established in the literature: perhaps we don't really ''need'' the types of computers that could pass the Turing Test, and perhaps trying to pass the Turing Test is nothing more than a distraction from more fruitful lines of research.
[10910640] |Equally, a second issue became apparent - by providing rules which restricted the abilities of the interrogators to ask questions, and by using comparatively "unsophisticated" interrogators, the Turing Test can be passed through the use of "trickery" rather than intelligence.
[10910650] |==Versions of the Turing test==
[10910660] |There are at least three primary versions of the Turing test - two offered by Turing in "Computing Machinery and Intelligence" and one which Saul Traiger describes as the "Standard Interpretation".
[10910670] |While there is some debate as to whether or not the "Standard Interpretation" is described by Turing or is, instead, based on a misreading of his paper, these three versions are not regarded as being equivalent, and are seen as having different strengths and weaknesses.
[10910680] |As [[empirical]] tests they conform to a proposal published in 1936 by [[A J Ayer]] on how to distinguish between a conscious man and an unconscious machine.
[10910690] |In his book ''[[Language, Truth and Logic]]'' Ayer states that 'The only ground I can have for asserting that an object which appears to be conscious is not really a conscious being, but only a dummy or a machine, is that it fails to satisfy one of the empirical tests by which the presence or absence of consciousness is determined'.
[10910700] |===The imitation game===
[10910710] |Turing described a simple party game which involves three players.
[10910720] |Player A is a man, player B is a woman, and player C (who plays the role of the interrogator) can be of either gender.
[10910730] |In the imitation game, player C - the interrogator - is unable to see either player A or player B, and can only communicate with them through written notes.
[10910740] |By asking questions of player A and player B, player C tries to determine which of the two is the man, and which of the two is the woman.
[10910750] |Player A's role is to trick the interrogator into making the wrong decision, while player B attempts to assist the interrogator.
[10910760] |In what Sterrett refers to as the "Original Imitation Game Test", Turing proposed that the role of player A be replaced with a computer.
[10910770] |The computer's task is therefore to pretend to be a woman and to attempt to trick the interrogator into making an incorrect evaluation.
[10910780] |The success of the computer is determined by comparing the outcome of the game when player A is a computer against the outcome when player A is a man.
[10910790] |If, as Turing puts it, "the interrogator decide[s] wrongly as often when the game is played [with the computer] as he does when the game is played between a man and a woman", then it can be argued that the computer is intelligent.
[10910800] |The second version comes later in Turing's 1950 paper.
[10910810] |As with the Original Imitation Game Test, the role of player A is performed by a computer.
[10910820] |The difference is that now the role of player B is to be performed by a man, rather than by a woman.
[10910830] |In this version both player A (the computer) and player B are trying to trick the interrogator into making an incorrect decision.
[10910840] |===The standard interpretation===
[10910850] |A common understanding of the Turing test is that the purpose was not specifically to test if a computer is able to fool an interrogator into believing that it is a woman, but to test whether or not a computer could ''imitate'' a human.
[10910860] |While there is some dispute as to whether or not this interpretation was intended by Turing (for example, Sterrett believes that it was, and thus conflates the second version with this one, while others, such as Traiger, do not), this has nevertheless led to what can be viewed as the "standard interpretation".
[10910870] |In this version, player A is a computer, and player B is a person of either gender.
[10910880] |The role of the interrogator is not to determine which is male and which is female, but to determine which is a computer and which is a human.
[10910890] |===Imitation game vs. standard Turing test===
[10910900] |There has been some controversy over which of the alternative formulations of the test Turing intended.
[10910910] |Sterrett argues that two distinct tests can be extracted from Turing's 1950 paper and that, ''pace'' Turing's remark, they are not equivalent.
[10910920] |The test that employs the party game and compares frequencies of success in the game is referred to as the "Original Imitation Game Test", whereas the test consisting of a human judge conversing with a human and a machine is referred to as the "Standard Turing Test"; Sterrett equates the latter with the "standard interpretation" rather than with the second version of the imitation game.
[10910930] |Sterrett agrees that the Standard Turing Test (STT) has the problems its critics cite, but argues that, in contrast, the Original Imitation Game Test (OIG Test) so defined is immune to many of them, due to a crucial difference: the OIG Test, unlike the STT, does not make similarity to a human performance the criterion of the test, even though it employs a human performance in setting a criterion for machine intelligence.
[10910940] |A man can fail the OIG Test, but it is argued that this is a virtue of a test of intelligence if failure indicates a lack of resourcefulness.
[10910950] |It is argued that the OIG Test requires the resourcefulness associated with intelligence and not merely "simulation of human conversational behaviour".
[10910960] |The general structure of the OIG Test could even be used with nonverbal versions of imitation games.
[10910970] |Still other writers have interpreted Turing as proposing that the imitation game itself is the test. These interpretations, however, do not specify how to take into account Turing's statement that the test he proposed using the party version of the imitation game is based upon a criterion of comparative frequency of success in that imitation game, rather than the capacity to succeed at a single round of the game.
[10910980] |===Should the interrogator know about the computer?===
[10910990] |Turing never makes it clear as to whether or not the interrogator in his tests is aware that one of the participants is a computer.
[10911000] |To return to the Original Imitation Game, Turing states only that Player A is to be replaced with a machine, not that player C is to be made aware of this replacement.
[10911010] |When Colby, Hilf, Weber and Kramer tested PARRY, they did so by assuming that the interrogators did not need to know that one or more of those being interviewed was a computer during the interrogation.
[10911020] |But, as Saygin and others highlight, this makes a big difference to the implementation and outcome of the test.
[10911030] |==Strengths of the test ==
[10911040] |The power of the Turing test derives from the fact that it is possible to talk about anything.
[10911050] |Turing wrote "the question and answer method seems to be suitable for introducing almost any one of the fields of human endeavor that we wish to include."
[10911060] |[[John Haugeland]] adds that "understanding the words is not enough; you have to understand the ''topic'' as well."
[10911070] |In order to pass a well designed Turing test, the machine would have to use [[natural language processing|natural language]], to [[commonsense reasoning|reason]], to have [[knowledge representation|knowledge]] and to [[machine learning|learn]].
[10911080] |The test can be extended to include video input, as well as a "hatch" through which objects can be passed, and this would force the machine to demonstrate the skill of [[computer vision|vision]] and [[robotics]] as well.
[10911090] |Together these represent almost all the major problems of [[artificial intelligence]].
[10911100] |==Weaknesses of the test ==
[10911110] |The test has been criticized on several grounds.
[10911120] |===Human intelligence vs. intelligence in general===
[10911130] |The test is explicitly [[anthropomorphic]].
[10911140] |It only tests if the subject ''resembles'' a human being.
[10911150] |It will fail to test for intelligence under two circumstances:
[10911160] |* It tests for many behaviors that we may not consider intelligent, such as the susceptibility to insults or the temptation to lie.
[10911170] |A machine may very well be intelligent without being able to chat ''exactly'' like a human.
[10911180] |* It fails to capture the ''general'' properties of intelligence, such as the ability to solve difficult problems or come up with original insights.
[10911190] |If a machine can solve a difficult problem that no person could solve, it would, in principle, fail the test.
[10911200] |[[Stuart J. Russell]] and [[Peter Norvig]] argue that the anthropomorphism of the test prevents it from being truly useful for the task of engineering intelligent machines.
[10911210] |They write: "Aeronautical engineering texts do not define the goal of their field as 'making machines that fly so exactly like pigeons that they can fool other pigeons.'"
[10911220] |The test is also vulnerable to naivete on the part of the test subjects.
[10911230] |If the testers have little experience with [[chatterbot]]s they may be more likely to judge a computer program to be responding coherently than someone who is aware of the various tricks that chatterbots use, such as changing the subject or answering a question with another question.
[10911240] |Such tricks may be misinterpreted as "playfulness" and therefore evidence of a human participant by uninformed testers, especially during brief sessions in which a chatterbot's inherent repetitiveness does not have a chance to become evident.
[10911250] |===Real intelligence vs. simulated intelligence===
[10911260] |The test is also explicitly [[behaviorist]] or [[functionalist]]: it only tests how the subject ''acts.''
[10911270] |A machine passing the Turing test may be able to ''simulate human conversational behaviour'' but the machine might just follow some cleverly devised rules.
[10911280] |Two famous examples of this line of argument against the Turing test are [[John Searle]]'s [[Chinese room]] argument and [[Ned Block]]'s [[Blockhead (computer system)|Blockhead]] argument.
[10911290] |Even if the Turing test is a good operational definition of intelligence, it may not indicate that the machine has [[consciousness]], or that it has [[intentionality]].
[10911300] |Perhaps intelligence and consciousness, for example, are such that neither one necessarily implies the other.
[10911310] |In that case, the Turing test might fail to capture one of the key differences between intelligent machines and intelligent people.
[10911320] |== Predictions and tests ==
[10911330] |Turing predicted that machines would eventually be able to pass the test.
[10911340] |In fact, he estimated that by the year 2000, machines with 10<sup>9</sup> [[bit]]s (about 119.2 [[mebibyte|MiB]]) of memory would be able to fool 30% of human judges during a 5-minute test.
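For reference, the arithmetic behind that figure (10<sup>9</sup> bits is 125,000,000 bytes):
<source lang="python">
print(10**9 / 8 / 2**20)  # 10^9 bits expressed in MiB: ≈ 119.2
</source>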
[10911350] |He also predicted that people would then no longer consider the phrase "thinking machine" contradictory.
[10911360] |He further predicted that [[machine learning]] would be an important part of building powerful machines, a claim which is considered to be plausible by contemporary researchers in [[Artificial intelligence]].
[10911370] |By extrapolating the [[Technological singularity#Accelerating change|exponential growth]] of technology over several decades, [[Future Studies|futurist]] [[Ray Kurzweil]] predicted that Turing-test-capable computers would be manufactured around the year 2020.
[10911380] |See the [[Moore's Law]] article and the references therein for discussions of the plausibility of this argument.
[10911390] |[[As of 2008]], no computer has passed the Turing test as such.
[10911400] |Simple conversational programs such as [[ELIZA]] have fooled people into believing they are talking to another human being, such as in an informal experiment termed [[AOLiza]].
[10911410] |However, such "successes" are not the same as a Turing Test.
[10911420] |Most obviously, the human party in the conversation has no reason to suspect they are talking to anything other than a human, whereas in a real Turing test the questioner is actively trying to determine the nature of the entity they are chatting with.
[10911430] |Documented cases are usually in environments such as [[Internet Relay Chat]] where conversation is sometimes stilted and meaningless, and in which no understanding of a conversation is necessary.
[10911440] |Additionally, many internet relay chat participants use English as a second or third language, thus making it even more likely that they would assume that an unintelligent comment by the conversational program is simply something they have misunderstood, and do not recognize the very non-human errors they make.
[10911450] |See [[ELIZA effect]].
[10911460] |The [[Loebner prize]] is an annual competition to determine the best Turing test competitors.
[10911470] |Although they award an annual prize for the computer system that, in the judges' opinions, demonstrates the "most human" conversational behaviour (with learning AI [[Jabberwacky]] winning in [[2005]] and [[2006]], and [[Artificial Linguistic Internet Computer Entity|A.L.I.C.E.]] before that), they have an additional prize for a system that in their opinion passes a Turing test.
[10911480] |This second prize has not yet been awarded.
[10911490] |The creators of Jabberwacky have proposed a personal Turing Test: the ability to pass the imitation test while attempting to specifically imitate the human player, with whom the AI will have conversed at length before the test.
[10911500] |In [[2008]] the competition for the [[Loebner prize]] is being co-organised by [[Kevin Warwick]] and held at the [[University of Reading]] on [[October 12]].
[10911510] |The directive for the competition is to stay as close as possible to Turing's original statements made in his 1950 paper, such that it can be ascertained if any machines are presently close to 'passing the test'.
[10911520] |An academic meeting discussing the Turing Test, organised by the [[Society for the Study of Artificial Intelligence and the Simulation of Behaviour]], is being held in parallel at the same venue.
[10911530] |Trying to pass the Turing test in its full generality is not, as of 2005, an active focus of much mainstream academic or commercial effort.
[10911540] |Current research in AI-related fields is aimed at more modest and specific goals.
[10911550] |The first bet of the [[Long Bet Project]] is a [[United States dollar|$]]10,000 one between [[Mitch Kapor]] (pessimist) and [[Ray Kurzweil]] (optimist) about whether a computer will pass a Turing Test by the year [[2029]].
[10911560] |The bet specifies the conditions in some detail.
[10911570] |==Variations of the Turing test==
[10911580] |A modification of the Turing test, in which the objective of one or more of the roles has been reversed between computers and humans, is termed a [[reverse Turing test]].
[10911590] |Another variation of the Turing test is described as the [[Subject matter expert Turing test]] where a computer's response cannot be distinguished from an expert in a given field.
[10911600] |As brain and body scanning techniques improve it may also be possible to replicate the essential [[data element]]s of a person to a computer system.
[10911610] |The [[Immortality test]] variation of the Turing test would determine if a person's essential character is reproduced with enough fidelity to make it impossible to distinguish a reproduction of a person from the original person.
[10911620] |The [[Minimum Intelligent Signal Test]] proposed by [[Chris McKinstry]], is another variation of Turing's test, but where only binary responses are permitted.
[10911630] |It is typically used to gather statistical data against which the performance of [[artificial intelligence]] programs may be measured.
[10911640] |Another variation of the reverse Turing test is implied in the work of psychoanalyst Wilfred Bion, who was particularly fascinated by the "storm" that resulted from the encounter of one mind by another.
[10911650] |Carrying this idea forward, R. D. Hinshelwood described the mind as a "mind recognizing apparatus", noting that this might be some sort of "supplement" to the Turing test.
[10911660] |To make this more explicit, the challenge would be for the computer to be able to determine if it were interacting with a human or another computer.
[10911670] |This is an extension of the original question Turing was attempting to answer, but would, perhaps, be a high enough standard to define a machine that could "think" in a way we typically define as characteristically human.
[10911680] |Another variation is the Meta Turing test, in which the subject being tested (for example a computer) is classified as intelligent if it itself has created something that the subject itself wants to test for intelligence.
[10911690] |==Practical applications==
[10911700] |[[Stuart J. Russell]] and [[Peter Norvig]] note that "AI researchers have devoted little attention to passing the Turing Test".
[10911710] |Real Turing tests, such as the [[Loebner prize]], do not usually force programs to demonstrate the full range of intelligence and are reserved for testing [[chatterbot]] programs.
[10911720] |However, even in this limited form these tests are still very rigorous.
[10911730] |The 2008 [[Loebner prize]], however, sticks closely to Turing's original conception; for example, conversations last for only five minutes.
[10911740] |[[CAPTCHA]] is a form of [[reverse Turing test]].
[10911750] |Before being allowed to do some action on a [[website]], the user is presented with alphanumerical characters in a distorted graphic image and asked to recognise it.
[10911760] |This is intended to prevent automated systems from abusing the site.
[10911770] |The rationale is that software sufficiently sophisticated to read the distorted image accurately does not exist (or is not available to the average user), so any system able to do so is likely to be a human being.
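A toy sketch of the idea (Python with the Pillow imaging library, assumed to be installed); the file name, distortion, and verification logic are illustrative only, and real CAPTCHA services use far stronger distortion plus server-side session handling.
<source lang="python">
# Render a random string with some line noise, then check the user's answer.
import random
import string
from PIL import Image, ImageDraw, ImageFont

def make_challenge(length=5, path="challenge.png"):
    text = "".join(random.choices(string.ascii_uppercase + string.digits, k=length))
    image = Image.new("RGB", (40 * length, 60), "white")
    draw = ImageDraw.Draw(image)
    font = ImageFont.load_default()
    for i, char in enumerate(text):
        draw.text((10 + 40 * i, random.randint(10, 30)), char, fill="black", font=font)
    for _ in range(8):  # line noise to hinder naive OCR
        draw.line(
            [(random.randint(0, image.width), random.randint(0, image.height)),
             (random.randint(0, image.width), random.randint(0, image.height))],
            fill="gray",
        )
    image.save(path)
    return text

expected = make_challenge()
answer = input("Type the characters shown in challenge.png: ")
print("human-like response" if answer.strip().upper() == expected else "failed challenge")
</source>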
[10911780] |== In popular culture ==
[10911790] |In the ''[[Dilbert]]'' comic strip of Sunday [[30 March]] [[2008]], Dilbert says, "The security audit accidentally locked all of the developers out of the system", and his boss responds with only meaningless, [[tautology (rhetoric)|tautological]] [[thought-terminating cliché]]s, "Well, it is what it is." Dilbert asks, "How does that help?" and his boss responds with another cliché, "You don't know what you don't know."
[10911800] |Dilbert replies, "Congratulations.
[10911810] |You're the first human to fail the Turing Test."
[10911820] |For that day, "turing test" was the 43rd most popular [[Google]] search.
[10911830] |The character of [[Ghostwheel]] in [[Roger Zelazny]]'s [[The Chronicles of Amber]] is mentioned to be capable of passing the Turing Test.
[10911840] |The webcomic [[xkcd]] has referred to Turing and the Turing test.
[10911830] |[[Rick Deckard]], in the movie [[Blade Runner]], used a Turing Test to determine whether Rachael was a [[Replicant]].