[10780010] |
Spanish language
[10780020] |'''Spanish''' or '''Castilian''' (''castellano'') is an [[Indo-European]], [[Romance languages|Romance language]] that originated in northern [[Spain]] and gradually spread through the [[Kingdom of Castile]], evolving into the principal language of government and trade. [10780030] |It was taken to [[Spanish Empire#Territories in Africa (1898–1975)|Africa]], the [[Spanish colonization of the Americas|Americas]], and [[Spanish East Indies|Asia Pacific]] with the expansion of the [[Spanish Empire]] between the fifteenth and nineteenth centuries. [10780040] |Today, between 322 and 400 million people speak Spanish as a native language, making it the world's second most-spoken language by native speakers (after [[Standard Mandarin|Mandarin Chinese]]). [10780050] |==Hispanosphere== [10780060] |It is estimated that the combined total of native and non-native Spanish speakers is approximately 500 million, likely making it the third most spoken language by total number of speakers (after [[English_language|English]] and [[Chinese_language|Chinese]]). [10780070] |Today, Spanish is an official language of Spain, most [[Latin American]] countries, and [[Equatorial Guinea]]; 21 nations speak it as their primary language. [10780080] |Spanish is also one of [[United Nations#Languages|six official languages]] of the [[United Nations]]. [10780090] |[[Mexico]] has the world's largest Spanish-speaking population, and Spanish is the second most-widely spoken language in the [[United States]] and the most widely studied foreign language in [[United States|U.S.]] schools and universities. [10780100] |[[Global internet usage]] statistics for 2007 show Spanish as the third most commonly used language on the Internet, after English and [[Chinese language|Chinese]]. [10780110] |==Naming and origin== [10780120] |Spaniards tend to call this language {{lang|es|'''''español'''''}} (Spanish) when contrasting it with languages of other states, such as [[French language|French]] and [[English language|English]], but call it {{lang|es|'''''castellano'''''}} (Castilian), that is, the language of the [[Castile (historical region)|Castile]] region, when contrasting it with other [[languages of Spain|languages spoken in Spain]] such as [[Galician language|Galician]], [[Basque language|Basque]], and [[Catalan language|Catalan]]. [10780130] |This reasoning also holds true for the language's preferred name in some [[Hispanic America]]n countries. [10780140] |In this manner, the [[Spanish Constitution of 1978]] uses the term {{lang|es|''castellano''}} to define the [[official language]] of the whole Spanish State, as opposed to {{lang|es|''las demás lenguas españolas''}} (lit. ''the other Spanish languages''). [10780150] |Article III reads as follows: [10780160] |The name ''castellano'' is, however, widely used for the language as a whole in Latin America. [10780170] |Some Spanish speakers consider ''{{lang|es|castellano}}'' a generic term with no political or ideological links, much as "Spanish" is in English. [10780180] |Latin Americans often use it to differentiate their own variety of Spanish from the variety spoken in Spain, or from whichever variety is considered standard in the region. 
[10780190] |==Classification and related languages== [10780200] |Spanish is closely related to the other [[West Iberian languages|West Iberian]] Romance languages: [[Asturian language|Asturian]] ({{lang|ast|''asturianu''}}), [[Galician language|Galician]] ({{lang|gl|''galego''}}), [[Ladino language|Ladino]] ({{lang|lad|''dzhudezmo/spanyol/kasteyano''}}), and [[Portuguese language|Portuguese]] ({{lang|pt|''português''}}). [10780210] |Catalan, an [[Iberian Romance languages|East Iberian language]] which exhibits many [[Gallo-Romance]] traits, is more similar to the neighbouring [[Occitan language]] ({{lang|oc|''occitan''}}) than to Spanish, or indeed than Spanish and Portuguese are to each other. [10780220] |Spanish and Portuguese share similar grammars and vocabulary as well as a common history of [[Influence of Arabic on other languages|Arabic influence]] while a great part of the peninsula was under [[Timeline of the Muslim presence in the Iberian peninsula|Islamic rule]] (both languages expanded over [[Islamic empire|Islamic territories]]). [10780230] |Their [[lexical similarity]] has been estimated as 89%. [10780240] |See [[Differences between Spanish and Portuguese]] for further information. [10780250] |===Ladino=== [10780260] |Ladino, which is essentially medieval Spanish and closer to modern Spanish than any other language, is spoken by many descendants of the [[Sephardi Jews]] who were [[Alhambra decree|expelled from Spain in the 15th century]]. [10780270] |Ladino speakers are currently almost exclusively [[Sephardim|Sephardi]] Jews, with family roots in Turkey, Greece or the Balkans: current speakers mostly live in Israel and Turkey, with a few pockets in Latin America. [10780280] |It lacks the [[Amerindian languages|Native American vocabulary]] which was influential during the [[Spanish Empire|Spanish colonial period]], and it retains many archaic features which have since been lost in standard Spanish. [10780290] |It contains, however, other vocabulary which is not found in standard Castilian, including vocabulary from [[Hebrew language|Hebrew]], some French, Greek and [[Turkish language|Turkish]], and other languages spoken where the Sephardim settled. [10780300] |Ladino is in serious danger of extinction because many native speakers today are elderly as well as elderly ''olim'' (immigrants to [[Israel]]) who have not transmitted the language to their children or grandchildren. [10780310] |However, it is experiencing a minor revival among Sephardi communities, especially in music. [10780320] |In the case of the Latin American communities, the danger of extinction is also due to the risk of assimilation by modern Castilian. [10780330] |A related dialect is [[Haketia]], the Judaeo-Spanish of northern Morocco. [10780340] |This too tended to assimilate with modern Spanish, during the Spanish occupation of the region. [10780350] |===Vocabulary comparison=== [10780360] |Spanish and [[Italian language|Italian]] share a very similar phonological system. [10780370] |At present, the [[lexical similarity]] with Italian is estimated at 82%. [10780380] |As a result, Spanish and Italian are mutually intelligible to various degrees. [10780390] |The lexical similarity with [[Portuguese language|Portuguese]] is greater, 89%, but the vagaries of Portuguese pronunciation make it less easily understood by Hispanophones than Italian. 
[10780400] |[[Mutual intelligibility]] between Spanish and [[French language|French]] or [[Romanian language|Romanian]] is even lower (lexical similarity being respectively 75% and 71%): comprehension of Spanish by French speakers who have not studied the language is as low as an estimated 45% - about the same as for English. [10780410] |The common features of the writing systems of the Romance languages allow for a greater amount of interlingual reading comprehension than oral communication would. [10780420] | 1. also {{lang|pt|''nós outros''}} in early modern Portuguese (e.g. ''[[The Lusiads]]'') [10780430] |2. {{lang|it|''noi '''altri'''''}} in Southern [[List of languages of Italy|Italian dialects and languages]] [10780440] |3. Alternatively {{lang|fr|''nous '''autres'''''}} [10780460] |==History== [10780470] |Spanish evolved from [[Vulgar Latin]], with major [[Arabic influence on the Spanish language|influences from Arabic]] in vocabulary during the [[Al-Andalus|Andalusian]] period and minor surviving influences from [[Basque language|Basque]] and [[Celtiberian language|Celtiberian]], as well as [[Germanic languages]] via the [[Visigoths]]. [10780480] |Spanish developed along the remote crossroad strips among the [[Alava]], [[Cantabria]], [[Burgos]], [[Soria]] and [[La Rioja (autonomous community)|La Rioja]] provinces of Northern Spain, as a strongly innovative variant that differed from its nearest cousin, [[Asturian|Leonese speech]], and showed a higher degree of Basque influence in these regions (see [[Iberian Romance languages]]). [10780490] |Typical features of Spanish diachronic [[phonology]] include [[lenition]] (Latin {{lang|la|''vita''}}, Spanish {{lang|es|''vida''}}), [[palatalization]] (Latin {{lang|la|''annum''}}, Spanish {{lang|es|''año''}}, and Latin {{lang|la|''anellum''}}, Spanish {{lang|es|''anillo''}}) and [[diphthong]]ation ([[stem (linguistics)|stem]]-changing) of short ''e'' and ''o'' from Vulgar Latin (Latin {{lang|la|''terra''}}, Spanish {{lang|es|''tierra''}}; Latin {{lang|la|''novus''}}, Spanish {{lang|es|''nuevo''}}). [10780500] |Similar phenomena can be found in other Romance languages as well. [10780510] |During the {{lang|es|''[[Reconquista]]''}}, this northern dialect from [[Cantabria]] was carried south, and it remains a [[minority language]] in northern coastal [[Morocco]]. [10780520] |The first Latin-to-Spanish grammar ({{lang|es|''Gramática de la Lengua Castellana''}}) was written in [[Salamanca]], Spain, in 1492, by [[Antonio de Nebrija|Elio Antonio de Nebrija]]. [10780530] |When it was presented to [[Isabel de Castilla]], she asked, "What do I want a work like this for, if I already know the language?", to which he replied, "Your highness, the language is the instrument of the Empire." [10780540] |From the 16th century onwards, the language was taken to the [[Americas]] and the [[Spanish East Indies]] via [[Spanish colonization of the Americas|Spanish colonization]]. [10780550] |In the 20th century, Spanish was introduced to [[Equatorial Guinea]] and the [[Western Sahara]], as well as to areas of the United States that had not been part of the Spanish Empire, such as [[Spanish Harlem]] in [[New York City]]. [10780560] |For details on borrowed words and other external influences upon Spanish, see [[Influences on the Spanish language]]. [10780570] |===Characterization=== [10780580] |A defining characteristic of Spanish was the [[diphthong]]ization of the Latin short vowels ''e'' and ''o'' into ''ie'' and ''ue'', respectively, when they were stressed. 
[10780590] |Similar [[sound law|sound changes]] are found in other Romance languages, but in Spanish they were significant. [10780600] |Some examples: [10780610] |* Lat. {{lang|la|''petra''}} > Sp. {{lang|es|''piedra''}}, It. {{lang|it|''pietra''}}, Fr. {{lang|fr|''pierre''}}, Rom. {{lang|ro|''piatrǎ''}}, Port./Gal. {{lang|pt|''pedra''}} "stone". [10780620] |* Lat. {{lang|la|''moritur''}} > Sp. {{lang|es|''muere''}}, It. {{lang|it|''muore''}}, Fr. {{lang|fr|''meurt''}} / {{lang|fr|''muert''}}, Rom. {{lang|ro|''moare''}}, Port./Gal. {{lang|pt|''morre''}} "die". [10780630] |Peculiar to early Spanish (as in the [[Gascon]] dialect of Occitan, and possibly due to a Basque [[substratum]]) was the mutation of Latin initial ''f-'' into ''h-'' whenever it was followed by a vowel that did not diphthongate. [10780640] |Compare for instance: [10780650] |* Lat. {{lang|la|''filium''}} > It. {{lang|it|''figlio''}}, Port. {{lang|pt|''filho''}}, Gal. {{lang|gl|''fillo''}}, Fr. {{lang|fr|''fils''}}, Occitan {{lang|oc|''filh''}} (but Gascon {{lang|gsc|''hilh''}}) Sp. {{lang|es|''hijo''}} (but Ladino {{lang|lad|''fijo''}}); [10780660] |* Lat. {{lang|la|''fabulari''}} > Lad. {{lang|lad|''favlar''}}, Port./Gal. {{lang|pt|''falar''}}, Sp. {{lang|es|''hablar''}}; [10780670] |* but Lat. {{lang|la|''focum''}} > It. {{lang|it|''fuoco''}}, Port./Gal. {{lang|pt|''fogo''}}, Sp./Lad. {{lang|es|''fuego''}}. [10780680] |Some [[consonant cluster]]s of Latin also produced characteristically different results in these languages, for example: [10780690] |* Lat. {{lang|la|''clamare''}}, acc. {{lang|la|''flammam''}}, {{lang|la|''plenum''}} > Lad. {{lang|lad|''lyamar''}}, {{lang|lad|''flama''}}, {{lang|lad|''pleno''}}; Sp. {{lang|es|''llamar''}}, {{lang|es|''llama''}}, {{lang|es|''lleno''}}. [10780700] |However, in Spanish there are also the forms {{lang|la|''clamar''}}, {{lang|lad|''flama''}}, {{lang|lad|''pleno''}}; Port. {{lang|pt|''chamar''}}, {{lang|pt|''chama''}}, {{lang|pt|''cheio''}}; Gal. {{lang|gl|''chamar''}}, {{lang|gl|''chama''}}, {{lang|gl|''cheo''}}. [10780710] |* Lat. acc. {{lang|la|''octo''}}, {{lang|la|''noctem''}}, {{lang|la|''multum''}} > Lad. {{lang|lad|''ocho''}}, {{lang|lad|''noche''}}, {{lang|lad|''muncho''}}; Sp. {{lang|es|''ocho''}}, {{lang|es|''noche''}}, {{lang|es|''mucho''}}; Port. {{lang|pt|''oito''}}, {{lang|pt|''noite''}}, {{lang|pt|''muito''}}; Gal. {{lang|gl|''oito''}}, {{lang|gl|''noite''}}, {{lang|gl|''moito''}}. [10780720] |==Geographic distribution== [10780730] |Spanish is one of the official languages of the [[European Union]], the [[Organization of American States]], the [[Organization of Ibero-American States]], the [[United Nations]], and the [[Union of South American Nations]]. [10780740] |===Europe=== [10780750] |Spanish is an official language of Spain, the country for which it is named and from which it originated. [10780760] |It is also spoken in [[Gibraltar]], though English is the official language. [10780770] |Likewise, it is spoken in [[Andorra]] though [[Catalan language|Catalan]] is the official language. [10780780] |It is also spoken by small communities in other European countries, such as the [[United Kingdom]], [[France]], and [[Germany]]. [10780790] |Spanish is an official language of the [[European Union]]. [10780800] |In Switzerland, Spanish is the [[mother tongue]] of 1.7% of the population, representing the first minority after the 4 official languages of the country. 
[10780810] |===The Americas=== [10780820] |====Latin America==== [10780830] |Most Spanish speakers are in [[Latin America]]; of the countries with the most Spanish speakers, only [[Spain]] is outside the [[Americas]]. [10780840] |[[Mexico]] has the most native Spanish speakers of any country. [10780850] |Nationally, Spanish is the official language of [[Argentina]], [[Bolivia]] (co-official [[Quechua]] and [[Aymara language|Aymara]]), [[Chile]], [[Colombia]], [[Costa Rica]], [[Cuba]], [[Dominican Republic]], [[Ecuador]], [[El Salvador]], [[Guatemala]], [[Honduras]], [[Mexico]], [[Nicaragua]], [[Panama]], [[Paraguay]] (co-official [[Guarani language|Guaraní]]), [[Peru]] (co-official [[Quechua]] and, in some regions, [[Aymara language|Aymara]]), [[Uruguay]], and [[Venezuela]]. [10780860] |Spanish is also the official language (co-official with [[English language|English]]) in the U.S. commonwealth of [[Puerto Rico]]. [10780870] |Spanish has no official recognition in the former [[British overseas territories|British colony]] of [[Belize]]; however, per the 2000 census, it is spoken by 43% of the population. [10780880] |It is mainly spoken by descendants of Hispanics who have remained in the region since the 17th century; English, however, is the official language. [10780890] |Spain first colonized [[Trinidad and Tobago]] in [[1498]], leaving the [[Carib]] people with the Spanish language. [10780900] |The [[Cocoa Panyol]]s, laborers from Venezuela, also took their culture and language with them; they are credited with the music of "[[Parang]]" ("[[Parranda]]") on the island. [10780910] |Because of Trinidad's location on the South American coast, the country is much influenced by its Spanish-speaking neighbors. [10780920] |A recent census shows that more than 1,500 inhabitants speak Spanish. [10780930] |The government launched the ''Spanish as a First Foreign Language'' (SAFFL) initiative in March 2005. [10780940] |Government regulations require Spanish to be taught, beginning in primary school, while thirty percent of public employees are to be linguistically competent within five years. [10780950] |The government also announced that Spanish will be the country's second official language by [[2020]], alongside English. [10780960] |Spanish is important in [[Brazil]] because of its proximity to and increased trade with its Spanish-speaking neighbors; for example, as a member of the [[Mercosur]] trading bloc. [10780970] |In 2005, the [[National Congress of Brazil]] approved a bill, signed into law by the [[President of Brazil|President]], making Spanish available as a foreign language in secondary schools. [10780980] |In many border towns and villages (especially on the Uruguayan-Brazilian border), a [[mixed language]] known as [[Riverense Portuñol|Portuñol]] is spoken. [10780990] |====United States==== [10781000] |In the 2006 census, 44.3 million people of the U.S. population were [[Hispanic]] or [[Latino]] by origin; 34 million people, or 12.2 percent of the population older than 5 years, speak Spanish at home. [10781005] |Spanish has a [[Spanish in the United States|long history in the United States]] (many south-western states were part of Mexico and Spain), and it has recently been revitalized by much immigration from Latin America. [10781010] |Spanish is the most widely taught foreign language in the country. [10781020] |Although the United States has no formally designated "official languages," Spanish is formally recognized at the state level alongside English; in the U.S. 
state of [[New Mexico]], 30 per cent of the population speak it. [10781030] |It also has strong influence in metropolitan areas such as Los Angeles, Miami and New York City. [10781040] |Spanish is the dominant spoken language in [[Puerto Rico]], a U.S. territory. [10781050] |In total, the U.S. has the world's fifth-largest Spanish-speaking population. [10781060] |===Asia=== [10781070] |Spanish was an official language of the [[Philippines]] but was never spoken by a majority of the population. [10781080] |Movements to have the broader population learn the language were started but were stopped by the friars. [10781090] |Its importance fell in the first half of the 20th century following the U.S. occupation and administration of the islands. [10781100] |The introduction of the English language in the Philippine government system put an end to the use of Spanish as the official language. [10781110] |The language lost its official status in 1973 during the [[Ferdinand Marcos]] administration. [10781120] |Spanish is spoken mainly by small communities of Filipino-born Spaniards, Latin Americans, and Filipino [[mestizo]]s (mixed race), descendants of the early colonial Spanish settlers. [10781130] |Throughout the 20th century, the Spanish language declined in importance compared to English and [[Tagalog language|Tagalog]]. [10781140] |According to the 1990 Philippine census, there were 2,658 native speakers of Spanish. [10781150] |No figures were provided during the 1995 and 2000 censuses; however, figures for 2000 did specify there were over 600,000 native speakers of [[Chavacano language|Chavacano]], a Spanish-based [[Creole language|creole]] language spoken in [[Cavite]] and [[Zamboanga]]. [10781160] |Some other sources put the number of Spanish speakers in the Philippines around two to three million; however, these sources are disputed. [10781170] |Tagalog has about 4,000 words adopted from Spanish, and there are around 6,000 Spanish loanwords in Visayan and other Philippine languages as well. [10781180] |Today, Spanish is offered as a foreign language in Philippine schools and universities. [10781190] |===Africa=== [10781200] |In Africa, Spanish is official in the UN-recognised but Moroccan-occupied [[Western Sahara]] (co-official [[Arabic language|Arabic]]) and [[Equatorial Guinea]] (co-official [[French language|French]] and [[Portuguese language|Portuguese]]). [10781210] |Today, nearly 200,000 refugee Sahrawis are able to read and write in Spanish, and several thousand have received [[university]] education in foreign countries as part of aid packages (mainly in [[Cuba]] and [[Spain]]). [10781220] |In Equatorial Guinea, Spanish is the predominant language when counting native and non-native speakers (around 500,000 people), while [[Fang language|Fang]] is the most spoken language by number of native speakers. [10781230] |It is also spoken in the Spanish cities in [[Plazas de soberanía|continental North Africa]] ([[Ceuta]] and [[Melilla]]) and in the autonomous community of the [[Canary Islands]] (143,000 and 1,995,833 people, respectively). [10781240] |Within Northern Morocco, a former [[History of Morocco#European influence|Franco-Spanish protectorate]] that is also geographically close to Spain, approximately 20,000 people speak Spanish. [10781250] |It is spoken by some communities in [[Angola]], because of the Cuban influence from the [[Cold War]], and in [[Nigeria]] by the descendants of [[Afro-Cuban]] ex-slaves. 
[10781260] |In [[Côte d'Ivoire]] and [[Senegal]], Spanish can be learned as a second foreign language in the public education system. [10781270] |In 2008, [[Cervantes Institute]] centers will be opened in [[Lagos]] and [[Johannesburg]], the first in [[Sub-Saharan Africa]]. [10781280] |===Oceania=== [10781290] |Among the countries and territories in [[Oceania]], Spanish is also spoken in [[Easter Island]], a territorial possession of Chile. [10781300] |According to the 2001 census, there are approximately 95,000 speakers of Spanish in Australia, 44,000 of whom live in Greater Sydney, where the older [[:Category: Australians of Mexican descent|Mexican]], [[:Category:Australians of Colombian descent|Colombian]], and [[:Category: Australians of Spanish descent|Spanish]] populations and newer [[:Category:Australians of Argentine descent|Argentine]], Salvadoran and [[:Category:Australians of Uruguayan descent|Uruguayan]] communities live. [10781310] |The island nations of [[Guam]], [[Palau]], [[Northern Marianas]], [[Marshall Islands]] and [[Federated States of Micronesia]] all once had Spanish speakers, since the [[Marianas Islands|Marianas]] and [[Caroline Islands]] were Spanish colonial possessions until the late 19th century (see [[Spanish-American War]]), but Spanish has since been forgotten. [10781320] |It now exists only as an influence on the local native languages and is also spoken by [[Hispanics in the United States|Hispanic American]] resident populations. [10781330] |==Dialectal variation== [10781340] |There are important variations among the regions of Spain and throughout Spanish-speaking America. [10781350] |In some countries of Hispanophone America, the word ''castellano'' is preferred for distinguishing the local version of the language from that of Spain, thus asserting autonomy and national identity. [10781360] |In Spain the Castilian dialect's pronunciation is commonly regarded as the national standard, although this dialect's use of slightly different pronouns, called [[Loísmo|{{lang|es|''laísmo''}}]], is deprecated. [10781370] |More accurately, for nearly everyone in Spain, "standard Spanish" means "pronouncing everything exactly as it is written," an ideal which does not correspond to any real dialect, though the northern dialects are the closest to it. [10781380] |In practice, the standard way of speaking Spanish in the media is "written Spanish" for formal speech and the "Madrid dialect" (one of the transitional variants between Castilian and Andalusian) for informal speech. [10781390] |===Voseo=== [10781400] |Spanish has three [[grammatical person|second-person]] [[grammatical number|singular]] [[pronoun]]s: {{lang|es|''tú''}}, {{lang|es|''usted''}}, and in some parts of Latin America, {{lang|es|''vos''}} (the use of this pronoun and/or its verb forms is called ''voseo''). [10781410] |In those regions where it is used, generally speaking, {{lang|es|''tú''}} and {{lang|es|''vos''}} are informal and used with friends; in other countries, {{lang|es|''vos''}} is considered an archaic form. [10781415] |{{lang|es|''Usted''}} is universally regarded as the formal address (derived from {{lang|es|''vuestra merced''}}, "your grace"), and is used as a mark of respect, as when addressing one's elders or strangers. 
[10781420] |{{lang|es|''Vos''}} is used extensively as the primary spoken form of the second-person singular pronoun, although with wide differences in social consideration, in many countries of [[Latin America]], including [[Argentina]], [[Chile]], [[Costa Rica]], the central mountain region of [[Ecuador]], the State of [[Chiapas]] in [[Mexico]], [[El Salvador]], [[Guatemala]], [[Honduras]], [[Nicaragua]], [[Paraguay]], [[Uruguay]], the [[Paisa region]] and Caleños of [[Colombia]] and the [[States]] of [[Zulia]] and Trujillo in [[Venezuela]]. [10781430] |There are some differences in the verbal endings for ''vos'' in each country. [10781440] |In Argentina, Uruguay, and increasingly in Paraguay and some Central American countries, it is also the standard form used in the [[mass media|media]], but the media in other countries with {{lang|es|''voseo''}} generally continue to use {{lang|es|''usted''}} or {{lang|es|''tú''}} except in advertisements, for instance. [10781445] |{{lang|es|''Vos''}} may also be used regionally in other countries. [10781450] |Depending on country or region, usage may be considered standard or (by better educated speakers) to be unrefined. [10781460] |Interpersonal situations in which the use of ''vos'' is acceptable may also differ considerably between regions. [10781470] |===Ustedes=== [10781480] |Spanish forms also differ regarding second-person plural pronouns. [10781490] |The Spanish dialects of Latin America have only one form of the second-person plural for daily use, {{lang|es|''ustedes''}} (formal or familiar, as the case may be, though {{lang|es|''vosotros''}} non-formal usage can sometimes appear in poetry and rhetorical or literary style). [10781500] |In Spain there are two forms — {{lang|es|''ustedes''}} (formal) and {{lang|es|''vosotros''}} (familiar). [10781510] |The pronoun {{lang|es|''vosotros''}} is the plural form of {{lang|es|''tú''}} in most of Spain, but in the Americas (and certain southern Spanish cities such as [[Cádiz]] or [[Seville]], and in the [[Canary Islands]]) it is replaced with {{lang|es|''ustedes''}}. [10781520] |It is notable that the use of {{lang|es|''ustedes''}} for the informal plural "you" in southern Spain does not follow the usual rule for pronoun-verb [[agreement (linguistics)|agreement]]; e.g., while the formal form for "you go", {{lang|es|''ustedes van''}}, uses the third-person plural form of the verb, in Cádiz or Seville the informal form is constructed as {{lang|es|''ustedes vais''}}, using the second-person plural of the verb. [10781530] |In the Canary Islands, though, the usual pronoun-verb agreement is preserved in most cases. [10781540] |Some words can be different, even embarrassingly so, in different Hispanophone countries. [10781550] |Most Spanish speakers can recognize other Spanish forms, even in places where they are not commonly used, but Spaniards generally do not recognise specifically American usages. [10781560] |For example, Spanish ''mantequilla'', ''aguacate'' and ''albaricoque'' (respectively, "butter", "avocado", "apricot") correspond to ''manteca'', ''palta'', and ''damasco'', respectively, in Argentina, Chile and Uruguay. [10781570] |The everyday Spanish words ''coger'' (to catch, get, or pick up), ''pisar'' (to step on) and ''concha'' (seashell) are considered extremely rude in parts of Latin America, where the meaning of ''coger'' and ''pisar'' is also "to have sex" and ''concha'' means "vulva". 
[10781580] |The Puerto Rican word for "bobby pin" (''pinche'') is an obscenity in Mexico, and in [[Nicaragua]] simply means "stingy". [10781590] |Other examples include ''[[taco]]'', which means "swearword" in Spain but is known to the rest of the world as a Mexican dish. [10781600] |''Pija'' in many countries of Latin America is an obscene slang word for "penis", while in [[Spain]] the word also signifies "posh girl" or "snobby". [10781610] |''Coche'', which means "car" in Spain, for the vast majority of Spanish-speakers actually means "baby-stroller", in Guatemala it means "pig", while ''carro'' means "car" in some Latin American countries and "cart" in others, as well as in Spain. [10781620] |The {{lang|es|[[Real Academia Española]]}} (Royal Spanish Academy), together with the 21 other national ones (see [[Association of Spanish Language Academies]]), exercises a standardizing influence through its publication of dictionaries and widely respected grammar and style guides. [10781630] |Due to this influence and for other sociohistorical reasons, a standardized form of the language ([[Standard Spanish]]) is widely acknowledged for use in literature, academic contexts and the media. [10781640] |==Writing system== [10781650] |Spanish is written using the [[Latin alphabet]], with the addition of the character ''[[ñ]]'' (''eñe'', representing the phoneme {{IPA|/ɲ/}}, a letter distinct from ''n'', although typographically composed of an ''n'' with a [[tilde]]) and the [[digraph (orthography)|digraph]]s ''ch'' ({{lang|es|''che''}}, representing the phoneme {{IPA|/tʃ/}}) and ''ll'' ({{lang|es|''elle''}}, representing the phoneme {{IPA|/ʎ/}}). [10781660] |However, the digraph ''rr'' ({{lang|es|''erre fuerte''}}, "strong ''r''", {{lang|es|''erre doble''}}, "double ''r''", or simply {{lang|es|''erre''}}), which also represents a distinct phoneme {{IPA|/r/}}, is not similarly regarded as a single letter. [10781670] |Since 1994, the digraphs ''ch'' and ''ll'' are to be treated as letter pairs for [[collation]] purposes, though they remain a part of the alphabet. [10781680] |Words with ''ch'' are now alphabetically sorted between those with ''ce'' and ''ci'', instead of following ''cz'' as they used to, and similarly for ''ll''. [10781690] |Thus, the Spanish alphabet has the following 29 letters: [10781700] |:a, b, c, ch, d, e, f, g, h, i, j, k, l, ll, m, n, ñ, o, p, q, r, s, t, u, v, w, x, y, z. [10781710] |With the exclusion of a very small number of regional terms such as ''México'' (see [[Toponymy of Mexico]]) and some neologisms like ''software'', pronunciation can be entirely determined from spelling. [10781720] |A typical Spanish word is stressed on the [[syllable]] before the last if it ends with a vowel (not including ''y'') or with a vowel followed by ''n'' or ''s''; it is stressed on the last syllable otherwise. [10781730] |Exceptions to this rule are indicated by placing an [[acute accent]] on the [[stress (linguistics)|stressed vowel]]. [10781740] |The acute accent is used, in addition, to distinguish between certain [[homophone]]s, especially when one of them is a stressed word and the other one is a [[clitic]]: compare {{lang|es|''el''}} ("the", masculine singular definite article) with {{lang|es|''él''}} ("he" or "it"), or {{lang|es|''te''}} ("you", object pronoun), {{lang|es|''de''}} (preposition "of" or "from"), and {{lang|es|''se''}} (reflexive pronoun) with {{lang|es|''té''}} ("tea"), {{lang|es|''dé''}} ("give") and {{lang|es|''sé''}} ("I know", or imperative "be"). 
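The default stress rule given above (penultimate stress when a word ends in a vowel other than ''y'', or in a vowel followed by ''n'' or ''s''; final stress otherwise, with exceptions marked by an acute accent) is mechanical enough to state as a procedure. The sketch below is purely illustrative: the function name is invented here, it inspects only the final letters rather than performing real syllabification, and it applies only to words written without an accent mark.

<source lang="python">
def default_stress(word: str) -> str:
    """Default stress position for a Spanish word written without an accent mark.

    Words ending in a vowel (other than 'y'), or in a vowel followed by
    'n' or 's', take penultimate stress; all other words take final stress.
    The exceptions to this default are exactly the words that carry a written
    acute accent on the stressed vowel.
    """
    vowels = set("aeiou")
    w = word.lower()
    if not w:
        raise ValueError("empty word")
    if w[-1] in vowels:
        return "penultimate"
    if w[-1] in ("n", "s") and len(w) > 1 and w[-2] in vowels:
        return "penultimate"
    return "final"

# hablan -> 'penultimate', hablar -> 'final', ciudad -> 'final'
print(default_stress("hablan"), default_stress("hablar"), default_stress("ciudad"))
</source>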
[10781750] |The interrogative pronouns ({{lang|es|''qué''}}, {{lang|es|''cuál''}}, {{lang|es|''dónde''}}, {{lang|es|''quién''}}, etc.) also receive accents in direct or indirect questions, and some demonstratives ({{lang|es|''ése''}}, {{lang|es|''éste''}}, {{lang|es|''aquél''}}, etc.) must be accented when used as pronouns. [10781760] |The conjunction {{lang|es|''o''}} ("or") is written with an accent between numerals so as not to be confused with a zero: e.g., {{lang|es|''10 ó 20''}} should be read as {{lang|es|''diez o veinte''}} rather than {{lang|es|''diez mil veinte''}} ("10,020"). [10781770] |Accent marks are frequently omitted in capital letters (a widespread practice in the early days of computers where only lowercase vowels were available with accents), although the [[Real Academia Española|RAE]] advises against this. [10781780] |When ''u'' is written between ''g'' and a front vowel (''e'' or ''i''), if it should be pronounced, it is written with a [[diaeresis (diacritic)|diaeresis]] (''ü'') to indicate that it is not silent as it normally would be (e.g., ''cigüeña'', "stork", is pronounced {{IPA|/θiˈɣweɲa/}}; if it were written ''cigueña'', it would be pronounced {{IPA|/θiˈɣeɲa/}}). [10781790] |Interrogative and exclamatory clauses are introduced with [[Inverted question and exclamation marks|inverted question ( ¿ ) and exclamation ( ¡ ) marks]]. [10781800] |==Sounds== [10781810] |The phonemic inventory listed in the following table includes [[phoneme]]s that are preserved only in some dialects, other dialects having merged them (such as ''[[yeísmo]]''); these are marked with an asterisk (*). [10781820] |Sounds in parentheses are [[allophone]]s. [10781830] |By the 16th century, the consonant system of Spanish underwent the following important changes that differentiated it from [[Iberian Romance languages|neighboring Romance languages]] such as [[Portuguese language|Portuguese]] and [[Catalan language|Catalan]]: [10781840] |*Initial {{IPA|/f/}}, when it had evolved into a vacillating {{IPA|/h/}}, was lost in most words (although this etymological ''h-'' is preserved in spelling and in some Andalusian dialects is still aspirated). [10781850] |*The [[bilabial approximant]] {{IPA|/β̞/}} (which was written ''u'' or ''v'') merged with the bilabial occlusive {{IPA|/b/}} (written ''b''). [10781860] |There is no difference between the pronunciation of orthographic ''b'' and ''v'' in contemporary Spanish, excepting emphatic pronunciations that cannot be considered standard or natural. [10781870] |*The [[voiced alveolar fricative]] {{IPA|/z/}} which existed as a separate phoneme in medieval Spanish merged with its voiceless counterpart {{IPA|/s/}}. [10781880] |The phoneme which resulted from this merger is currently spelled ''s''. [10781890] |*The [[voiced postalveolar fricative]] {{IPA|/ʒ/}} merged with its voiceless counterpart {{IPA|/ʃ/}}, which evolved into the modern velar sound {{IPA|/x/}} by the 17th century, now written with ''j'', or ''g'' before ''e, i''. [10781900] |Nevertheless, in most parts of Argentina and in Uruguay, ''y'' and ''ll'' have both evolved to {{IPA|/ʒ/}} or {{IPA|/ʃ/}}. [10781910] |*The [[voiced alveolar affricate]] {{IPA|/dz/}} merged with its voiceless counterpart {{IPA|/ts/}}, which then developed into the interdental {{IPA|/θ/}}, now written ''z'', or ''c'' before ''e, i''. [10781920] |But in [[Andalusia]], the [[Canary Islands]] and the Americas this sound merged with {{IPA|/s/}} as well. [10781930] |See ''[[Ceceo]]'' for further information. 
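The spelling correspondences these mergers left behind — ''j'', or ''g'' before ''e'' or ''i'', for {{IPA|/x/}}, and ''z'', or ''c'' before ''e'' or ''i'', for {{IPA|/θ/}} (merged with {{IPA|/s/}} in ''seseo'' varieties) — can be illustrated with a small sketch. The function name and the ''seseo'' flag are choices made here purely for illustration; real Spanish grapheme-to-phoneme conversion involves many more rules.

<source lang="python">
def consonant_phoneme(word: str, i: int, seseo: bool = False) -> str:
    """Broad IPA value of the consonant letter at position i.

    Covers only the letters discussed above: 'j', 'g' before a front vowel,
    'z', and 'c' before a front vowel; anything else is returned unchanged.
    """
    letter = word[i].lower()
    following = word[i + 1].lower() if i + 1 < len(word) else ""
    before_front_vowel = following in ("e", "i")
    if letter == "j" or (letter == "g" and before_front_vowel):
        return "x"                    # the modern velar sound, e.g. 'jamón', 'gente'
    if letter == "z" or (letter == "c" and before_front_vowel):
        return "s" if seseo else "θ"  # interdental in most of Spain, /s/ where the merger occurred
    return letter

# 'cielo': /θ/ in central and northern Spain, /s/ in Andalusia, the Canaries and the Americas
print(consonant_phoneme("cielo", 0), consonant_phoneme("cielo", 0, seseo=True))
</source>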
[10781940] |The consonant system of Medieval Spanish has been better preserved in [[Ladino language|Ladino]] and in Portuguese, neither of which underwent these shifts. [10781950] |===Lexical stress=== [10781960] |Spanish is a [[syllable-timed language]], so each syllable has the same duration regardless of stress. [10781970] |Stress most often occurs on any of the last three syllables of a word, with some rare exceptions on the fourth-to-last syllable. [10781980] |The ''tendencies'' of stress assignment are as follows: [10781990] |* In words ending in vowels and {{IPA|/s/}}, stress most often falls on the penultimate syllable. [10782000] |* In words ending in all other consonants, the stress more often falls on the ultimate syllable. [10782010] |* Preantepenultimate stress occurs rarely and only in words like ''guardándoselos'' ('saving them for him/her') where a clitic follows certain verbal forms. [10782020] |In addition to the many exceptions to these tendencies, there are numerous [[minimal pair]]s which contrast solely on stress. [10782030] |For example, ''sabana'', with penultimate stress, means 'savannah' while ''{{lang|es|sábana}}'', with antepenultimate stress, means 'sheet'; ''{{lang|es|límite}}'' ('boundary'), ''{{lang|es|limite}}'' ('[that] he/she limits') and ''{{lang|es|limité}}'' ('I limited') also contrast solely on stress. [10782040] |Phonological stress may be marked orthographically with an [[acute accent]] (''ácido'', ''distinción'', etc.). [10782050] |This is done according to the mandatory stress rules of [[Spanish orthography]] which are similar to the tendencies above (differing with words like ''distinción'') and are defined so as to unequivocally indicate where the stress lies in a given written word. [10782060] |An acute accent may also be used to differentiate homophones (such as ''[[wikt:té#Spanish|té]]'' for 'tea' and ''[[wikt:te#Spanish|te]]'', the object pronoun). [10782070] |An amusing example of the significance of intonation in Spanish is the phrase ''{{lang|es|¿Cómo "cómo como"? [10782080] |¡Como como como!}}'' [10782090] |("What do you mean / 'how / do I eat'? / I eat / the way / I eat!"). [10782100] |==Grammar== [10782110] |Spanish is a relatively [[inflected]] language, with a two-[[Grammatical gender|gender]] system and about fifty [[Grammatical conjugation|conjugated]] forms per [[verb]], but limited inflection of [[noun]]s, [[adjective]]s, and [[determiner]]s. [10782120] |(For a detailed overview of verbs, see [[Spanish verbs]] and [[Spanish irregular verbs]].) [10782130] |It is [[Branching (linguistics)|right-branching]], uses [[preposition]]s, and usually, though not always, places [[adjective]]s after [[noun]]s. [10782140] |Its [[syntax]] is generally [[Subject Verb Object]], though variations are common. [10782150] |It is a [[pro-drop language]] (allows the deletion of pronouns when pragmatically unnecessary) and [[verb framing|verb-framed]]. [10782160] |== Samples == [10790010] |
Speech recognition
[10790020] |'''Speech recognition''' (also known as '''automatic speech recognition''' or '''computer speech recognition''') converts spoken words to machine-readable input (for example, to keypresses, using the binary code for a string of [[Character (computing)|character]] codes). [10790030] |The term [[speaker recognition|voice recognition]] may also be used to refer to speech recognition, but more precisely refers to '''speaker recognition''', which attempts to identify the person speaking, as opposed to what is being said. [10790040] |Speech recognition applications include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), [[domotic]] appliance control and content-based spoken audio search (e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., [[word processor]]s or [[email]]s), and in aircraft [[cockpit]]s (usually termed [[Direct Voice Input]]). [10790050] |==History== [10790060] |One of the most notable domains for the commercial application of speech recognition in the United States has been health care and in particular the work of the [[medical transcription]]ist (MT). [10790070] |According to industry experts, at its inception, speech recognition (SR) was sold as a way to completely eliminate transcription rather than make the transcription process more efficient, hence it was not accepted. [10790080] |It was also the case that SR at that time was often technically deficient. [10790090] |Additionally, to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many if not all were reluctant to do. [10790100] |The biggest limitation to speech recognition automating transcription, however, is seen as the software. [10790110] |The nature of narrative dictation is highly interpretive and often requires judgment that may be provided by a real human but not yet by an automated system. [10790120] |Another limitation has been the extensive amount of time required by the user and/or system provider to train the software. [10790130] |A distinction in ASR is often made between "artificial syntax systems" which are usually domain-specific and "natural language processing" which is usually language-specific. [10790140] |Each of these types of application presents its own particular goals and challenges. [10790150] |==Applications== [10790160] |===Health care=== [10790170] |In the [[health care]] domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete. [10790180] |Many experts in the field anticipate that with increased use of speech recognition technology, the services provided may be redistributed rather than replaced. [10790190] |Speech recognition can be implemented in front-end or back-end of the medical documentation process. [10790200] |Front-End SR is where the provider dictates into a speech-recognition engine, the recognized words are displayed right after they are spoken, and the dictator is responsible for editing and signing off on the document. [10790210] |It never goes through an MT/editor. 
[10790220] |Back-End SR or Deferred SR is where the provider dictates into a digital dictation system, and the voice is routed through a speech-recognition machine and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report. [10790230] |Deferred SR is being widely used in the industry currently. [10790240] |Many [[Electronic Medical Records]] (EMR) applications can be more effective and may be performed more easily when deployed in conjunction with a speech-recognition engine. [10790250] |Searches, queries, and form filling may all be faster to perform by voice than by using a keyboard. [10790290] |===Military=== [10790300] |====High-performance fighter aircraft==== [10790310] |Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. [10790320] |Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/[[F-16]] aircraft ([[F-16 VISTA]]), the program in France on installing speech recognition systems on [[Mirage (aircraft)|Mirage]] aircraft, and programs in the UK dealing with a variety of aircraft platforms. [10790330] |In these programs, speech recognizers have been operated successfully in fighter aircraft with applications including: setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays. [10790340] |Generally, only very limited, constrained vocabularies have been used successfully, and a major effort has been devoted to integration of the speech recognizer with the avionics system. [10790350] |Some important conclusions from the work were as follows: [10790360] |#Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently. [10790370] |#Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful — with lower recognition rates, pilots would not use the system. [10790380] |#More natural vocabulary and grammar, and shorter training times would be useful, but only if very high recognition rates could be maintained. [10790390] |Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft. [10790400] |Working with Swedish pilots flying in the [[JAS-39]] Gripen cockpit, Englund (2004) found recognition deteriorated with increasing G-loads. [10790410] |It was also concluded that adaptation greatly improved the results in all cases and introducing models for breathing was shown to improve recognition scores significantly. [10790420] |Contrary to what might be expected, no effects of the broken English of the speakers were found. [10790430] |It was evident that spontaneous speech caused problems for the recognizer, as could be expected. [10790440] |A restricted vocabulary, and above all, a proper syntax, could thus be expected to improve recognition accuracy substantially. [10790450] |The [[Eurofighter Typhoon]] currently in service with the UK [[RAF]] employs a speaker-dependent system, i.e. 
it requires each pilot to create a template. [10790460] |The system is not used for any safety-critical or weapon-critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other [[cockpit]] functions. [10790470] |Voice commands are confirmed by visual and/or aural feedback. [10790480] |The system is seen as a major design feature in the reduction of pilot [[workload]], and even allows the pilot to assign targets to himself with two simple voice commands or to any of his wingmen with only five commands. [10790490] |====Helicopters==== [10790500] |The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment. [10790510] |The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone. [10790520] |Substantial test and evaluation programs have been carried out in the past decade on speech recognition system applications in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. [10790530] |Work in France has included speech recognition in the Puma helicopter. [10790540] |There has also been much useful work in Canada. [10790550] |Results have been encouraging, and voice applications have included: control of communication radios; setting of navigation systems; and control of an automated target handover system. [10790560] |As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. [10790570] |Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. [10790580] |Much remains to be done both in speech recognition and in overall speech technology in order to consistently achieve performance improvements in operational settings. [10790590] |====Battle management==== [10790600] |Battle management command centres generally require rapid access to and control of large, rapidly changing information databases. [10790610] |Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. [10790620] |Human-machine interaction by voice has the potential to be very useful in these environments. [10790630] |A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. [10790640] |In one feasibility study, speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications. [10790650] |Users were very optimistic about the potential of the system, although capabilities were limited. [10790660] |Speech understanding programs sponsored by the Defense Advanced Research Projects Agency (DARPA) in the U.S. have focused on this problem of natural speech interface. [10790670] |Speech recognition efforts have focused on a continuous speech recognition (CSR), large-vocabulary database of speech designed to be representative of the naval resource management task. 
[10790680] |Significant advances in the state-of-the-art in CSR have been achieved, and current efforts are focused on integrating speech recognition and natural language processing to allow spoken language interaction with a naval resource management system. [10790690] |====Training air traffic controllers==== [10790700] |Training for military (or civilian) air traffic controllers (ATC) represents an excellent application for speech recognition systems. [10790710] |Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the dialog which the controller would have to conduct with pilots in a real ATC situation. [10790720] |Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as pseudo-pilot, thus reducing training and support personnel. [10790730] |Air controller tasks are also characterized by highly structured speech as the primary output of the controller, hence reducing the difficulty of the speech recognition task. [10790740] |The U.S. Naval Training Equipment Center has sponsored a number of developments of prototype ATC trainers using speech recognition. [10790750] |Generally, the recognition accuracy falls short of providing graceful interaction between the trainee and the system. [10790760] |However, the prototype training systems have demonstrated a significant potential for voice interaction in these systems, and in other training applications. [10790770] |The U.S. Navy has sponsored a large-scale effort in ATC training systems, where a commercial speech recognition unit was integrated with a complex training system including displays and scenario creation. [10790780] |Although the recognizer was constrained in vocabulary, one of the goals of the training programs was to teach the controllers to speak in a constrained language, using a vocabulary specifically designed for the ATC task. [10790790] |Research in France has focused on the application of speech recognition in ATC training systems, directed at issues both in speech recognition and in application of task-domain grammar constraints. [10790800] |The USAF, USMC, US Army, and FAA are currently using ATC simulators with speech recognition provided by Adacel Systems Inc (ASI). [10790810] |Adacel's MaxSim software uses speech recognition and synthetic speech to enable the trainee to control aircraft and ground vehicles in the simulation without the need for pseudo pilots. [10790820] |Adacel's ATC In A Box Software provides a synthetic ATC environment for flight simulators. [10790830] |The "real" pilot talks to a virtual controller using speech recognition and the virtual controller responds with synthetic speech. [10790850] |===Telephony and other domains=== [10790860] |ASR in the field of telephony is now commonplace and in the field of computer gaming and simulation is becoming more widespread. [10790870] |Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use. [10790880] |Improvements in mobile processor speeds have made it possible to create speech-enabled Symbian and Windows Mobile smartphones. [10790890] |Current speech-to-text programs are too large and require too much CPU power to be practical for the Pocket PC. 
[10790900] |Speech is used mostly as a part of User Interface, for creating pre-defined or custom speech commands. [10790910] |Leading software vendors in this field are: Microsoft Corporation (Microsoft Voice Command); Nuance Communications (Nuance Voice Control); Vito Technology (VITO Voice2Go); Speereo Software (Speereo Voice Translator). [10790920] |===People with Disabilities=== [10790930] |People with disabilities are another part of the population that benefit from using speech recognition programs. [10790940] |It is especially useful for people who have difficulty with or are unable to use their hands, from mild repetitive stress injuries to involved disabilities that require alternative input for support with accessing the computer. [10790950] |In fact, people who used the keyboard a lot and developed [[Repetitive Strain Injury|RSI]] became an urgent early market for speech recognition. [10790960] |Speech recognition is used in [[deaf]] [[telephony]], such as [[spinvox]] voice-to-text voicemail, [[relay services]], and [[Telecommunications Relay Service#Captioned_telephone|captioned telephone]]. [10790970] |===Further applications=== [10790980] |*Automatic translation [10790990] |*Automotive speech recognition (e.g., [[Ford Sync]]) [10791000] |*Telematics (e.g. vehicle Navigation Systems) [10791010] |*Court reporting (Realtime Voice Writing) [10791020] |*[[Hands-free computing]]: voice command recognition computer [[user interface]] [10791030] |*[[Home automation]] [10791040] |*[[Interactive voice response]] [10791050] |*[[Mobile telephony]], including mobile email [10791060] |*[[Multimodal interaction]] [10791070] |*[[Pronunciation]] evaluation in computer-aided language learning applications [10791080] |*[[Robotics]] [10791090] |*[[Transcription (linguistics)|Transcription]] (digital speech-to-text). [10791100] |*Speech-to-Text (Transcription of speech into mobile text messages) [10791110] |==Performance of speech recognition systems== [10791120] |The performance of speech recognition systems is usually specified in terms of accuracy and speed. [10791130] |Accuracy may be measured in terms of performance accuracy which is usually rated with [[word error rate]] (WER), whereas speed is measured with the [[real time factor]]. [10791140] |Other measures of accuracy include [[Single Word Error Rate]] (SWER) and [[Command Success Rate]] (CSR). [10791150] |Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions. [10791160] |There is some confusion, however, over the interchangeability of the terms "speech recognition" and "dictation". [10791170] |Commercially available speaker-dependent dictation systems usually require only a short period of training (sometimes also called `enrollment') and may successfully capture continuous speech with a large vocabulary at normal pace with a very high accuracy. [10791180] |Most commercial companies claim that recognition software can achieve between 98% to 99% accuracy if operated under optimal conditions. [10791190] |`Optimal conditions' usually assume that users: [10791200] |* have speech characteristics which match the training data, [10791210] |* can achieve proper speaker adaptation, and [10791220] |* work in a clean noise environment (e.g. quiet office or laboratory space). [10791230] |This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected. 
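Word error rate, the accuracy measure mentioned above, is the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer's output, divided by the length of the reference. A minimal sketch, with an invented function name and toy strings purely for illustration:

<source lang="python">
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein (edit) distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between the first i reference words and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One insertion and one substitution against a four-word reference gives a WER of 0.5.
print(word_error_rate("call home now please", "call a home now pleas"))
</source>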
[10791240] |Speech recognition in video has become a popular search technology used by several video search companies. [10791250] |Limited vocabulary systems, requiring no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers. [10791260] |Such systems are popular for routing incoming phone calls to their destinations in large organizations. [10791270] |Both [[Acoustic Model|acoustic modeling]] and [[language model]]ing are important parts of modern statistically-based speech recognition algorithms. [10791280] |Hidden Markov models (HMMs) are widely used in many systems. [10791290] |Language modeling has many other applications such as [[smart keyboard]] and [[document classification]]. [10791300] |===Hidden Markov model (HMM)-based speech recognition=== [10791310] |Modern general-purpose speech recognition systems are generally based on [[Hidden Markov Model|HMMs]]. [10791320] |These are statistical models which output a sequence of symbols or quantities. [10791330] |One possible reason why HMMs are used in speech recognition is that a speech signal could be viewed as a piecewise stationary signal or a short-time stationary signal. [10791340] |That is, one could assume in a short-time in the range of 10 milliseconds, speech could be approximated as a [[stationary process]]. [10791350] |Speech could thus be thought of as a [[Markov model]] for many stochastic processes. [10791360] |Another reason why HMMs are popular is because they can be trained automatically and are simple and computationally feasible to use. [10791370] |In speech recognition, the hidden Markov model would output a sequence of ''n''-dimensional real-valued vectors (with ''n'' being a small integer, such as 10), outputting one of these every 10 milliseconds. [10791380] |The vectors would consist of [[cepstrum|cepstral]] coefficients, which are obtained by taking a [[Fourier transform]] of a short time window of speech and decorrelating the spectrum using a [[cosine transform]], then taking the first (most significant) coefficients. [10791390] |The hidden Markov model will tend to have in each state a statistical distribution that is a mixture of diagonal covariance Gaussians which will give a likelihood for each observed vector. [10791400] |Each word, or (for more general speech recognition systems), each [[phoneme]], will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes. [10791410] |Described above are the core elements of the most common, HMM-based approach to speech recognition. [10791420] |Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above. [10791430] |A typical large-vocabulary system would need context dependency for the phonemes (so phonemes with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. 
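The elements just described — frames of cepstral coefficients scored state by state with mixtures of diagonal-covariance Gaussians, and a best state path found over the model — can be sketched compactly. This is a deliberately minimal illustration under simplifying assumptions (a single left-to-right model that starts in its first state, parameters passed in as NumPy arrays, helper names invented here), not production decoding code:

<source lang="python">
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def log_gmm(x, weights, means, variances):
    """Log likelihood of x under a mixture of diagonal-covariance Gaussians."""
    components = [np.log(w) + log_gauss_diag(x, m, v)
                  for w, m, v in zip(weights, means, variances)]
    return np.logaddexp.reduce(components)

def viterbi(frames, log_trans, state_gmms):
    """Most likely state sequence for a sequence of feature frames.

    log_trans[i, j] is the log transition probability from state i to state j;
    state_gmms[j] is a (weights, means, variances) tuple for state j's mixture.
    """
    n_states, T = log_trans.shape[0], len(frames)
    delta = np.full((T, n_states), -np.inf)    # best log score ending in each state
    back = np.zeros((T, n_states), dtype=int)  # best predecessor state at each time
    delta[0, 0] = log_gmm(frames[0], *state_gmms[0])  # assume the model starts in state 0
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] + log_trans[:, j]
            back[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[back[t, j]] + log_gmm(frames[t], *state_gmms[j])
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):              # trace the best path backwards
        path.append(back[t, path[-1]])
    return path[::-1]
</source>

A full recognizer builds such models for each context-dependent phoneme, concatenates them into word and sentence models, combines their scores with a language model during decoding, and adds the normalization, adaptation and feature refinements discussed above and below.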
[10791440] |The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection followed perhaps by heteroscedastic linear discriminant analysis or a global semitied covariance transform (also known as maximum likelihood linear transform, or MLLT). [10791450] |Many systems use so-called discriminative training techniques which dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. [10791460] |Examples are maximum [[mutual information]] (MMI), minimum classification error (MCE) and minimum phone error (MPE). [10791470] |Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) would probably use the [[Viterbi algorithm]] to find the best path, and here there is a choice between dynamically creating a combination hidden Markov model which includes both the acoustic and language model information, or combining it statically beforehand (the [[finite state transducer]], or FST, approach). [10791480] |===Dynamic time warping (DTW)-based speech recognition=== [10791490] |Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. [10791500] |Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed. [10791510] |For instance, similarities in walking patterns would be detected, even if in one video the person was walking slowly and if in another they were walking more quickly, or even if there were accelerations and decelerations during the course of one observation. [10791520] |DTW has been applied to video, audio, and graphics – indeed, any data which can be turned into a linear representation can be analyzed with DTW. [10791530] |A well known application has been automatic speech recognition, to cope with different speaking speeds. [10791540] |In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) with certain restrictions, i.e. the sequences are "warped" non-linearly to match each other. [10791550] |This sequence alignment method is often used in the context of hidden Markov models. [10791560] |==Further information== [10791570] |Popular speech recognition conferences held each year or two include ICASSP, Eurospeech/ICSLP (now named Interspeech) and the IEEE ASRU. [10791580] |Conferences in the field of [[Natural Language Processing]], such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing. [10791590] |Important journals include the [[IEEE]] Transactions on Speech and Audio Processing (now named [[IEEE]] Transactions on Audio, Speech and Language Processing), Computer Speech and Language, and Speech Communication. [10791600] |Books like "Fundamentals of Speech Recognition" by [[Lawrence Rabiner]] can be useful to acquire basic knowledge but may not be fully up to date (1993). [10791610] |Another good source can be "Statistical Methods for Speech Recognition" by Frederick Jelinek which is a more up to date book (1998). [10791620] |Even more up to date is "Computer Speech", by Manfred R. Schroeder, second edition published in 2004. 
[10791630] |A good insight into the techniques used in the best modern systems can be gained by paying attention to government sponsored evaluations such as those organised by [[DARPA]] (the largest speech recognition-related project ongoing as of 2007 is the GALE project, which involves both speech recognition and translation components). [10791640] |In terms of freely available resources, the [[HTK (software)|HTK]] book (and the accompanying HTK toolkit) is one place to start to both learn about speech recognition and to start experimenting. [10791650] |Another such resource is [[Carnegie Mellon University]]'s SPHINX toolkit. [10791660] |The AT&T libraries [http://www.research.att.com/projects/mohri/fsm FSM Library], [http://www.research.att.com/projects/mohri/grm GRM library], and [http://www.cs.nyu.edu/~mohri DCD library] are also general software libraries for large-vocabulary speech recognition. [10791670] |A useful review of the area of robustness in ASR is provided by Junqua and Haton (1995). [10800010] |
Speech synthesis
[10800020] |'''Speech synthesis''' is the artificial production of human [[Speech communication|speech]]. [10800030] |A computer system used for this purpose is called a '''speech synthesizer''', and can be implemented in [[software]] or [[Computer hardware|hardware]]. [10800040] |A '''text-to-speech (TTS)''' system converts normal language text into speech; other systems render [[symbolic linguistic representation]]s like [[phonetic transcription]]s into speech. [10800050] |Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a [[database]]. [10800060] |Systems differ in the size of the stored speech units; a system that stores [[phone]]s or [[diphone]]s provides the largest output range, but may lack clarity. [10800070] |For specific usage domains, the storage of entire words or sentences allows for high-quality output. [10800080] |Alternatively, a synthesizer can incorporate a model of the [[vocal tract]] and other human voice characteristics to create a completely "synthetic" voice output. [10800090] |The quality of a speech synthesizer is judged by its similarity to the human voice, and by its ability to be understood. [10800100] |An intelligible text-to-speech program allows people with [[visual impairment]]s or [[reading disability|reading disabilities]] to listen to written works on a home computer. [10800110] |Many computer operating systems have included speech synthesizers since the early 1980s. [10800120] |== Overview of text processing == [10800130] |A text-to-speech system (or "engine") is composed of two parts: a [[front-end]] and a back-end. [10800140] |The front-end has two major tasks. [10800150] |First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. [10800160] |This process is often called ''text normalization'', ''pre-processing'', or ''[[tokenization]]''. [10800170] |The front-end then assigns [[phonetic transcription]]s to each word, and divides and marks the text into [[prosody (linguistics)|prosodic units]], like [[phrase]]s, [[clause]]s, and [[sentence (linguistics)|sentence]]s. [10800180] |The process of assigning phonetic transcriptions to words is called ''text-to-phoneme'' or ''[[grapheme]]-to-phoneme'' conversion. [10800190] |Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. [10800200] |The back-end—often referred to as the ''synthesizer''—then converts the symbolic linguistic representation into sound. [10800210] |== History == [10800220] |Long before [[electronics|electronic]] [[signal processing]] was invented, there were those who tried to build machines to create human speech. [10800230] |Some early legends of the existence of [[Brazen Head|"speaking heads"]] involved [[Pope Silvester II|Gerbert of Aurillac]] (d. 1003 AD), [[Albertus Magnus]] (1198–1280), and [[Roger Bacon]] (1214–1294). [10800240] |In 1779, the [[Denmark|Danish]] scientist Christian Kratzenstein, working at the [[Russian Academy of Sciences]], built models of the human [[vocal tract]] that could produce the five long [[vowel]] sounds (in [[help:IPA|International Phonetic Alphabet]] notation, they are {{IPA|[aː]}}, {{IPA|[eː]}}, {{IPA|[iː]}}, {{IPA|[oː]}} and {{IPA|[uː]}}). [10800250] |This was followed by the [[bellows]]-operated "acoustic-mechanical speech machine" by [[Wolfgang von Kempelen]] of [[Vienna]], [[Austria]], described in a 1791 paper. 
[10800260] |This machine added models of the tongue and lips, enabling it to produce [[consonant]]s as well as vowels. [10800270] |In 1837, [[Charles Wheatstone]] produced a "speaking machine" based on von Kempelen's design, and in 1857, M. Faber built the "Euphonia". [10800280] |Wheatstone's design was resurrected in 1923 by Paget. [10800290] |In the 1930s, [[Bell Labs]] developed the [[Vocoder|VOCODER]], a keyboard-operated electronic speech analyzer and synthesizer that was said to be clearly intelligible. [10800300] |[[Homer Dudley]] refined this device into the VODER, which he exhibited at the [[1939 New York World's Fair]]. [10800310] |The [[Pattern playback]] was built by [[Franklin S. Cooper|Dr. Franklin S. Cooper]] and his colleagues at [[Haskins Laboratories]] in the late 1940s and completed in 1950. [10800320] |There were several different versions of this hardware device but only one currently survives. [10800330] |The machine converts pictures of the acoustic patterns of speech in the form of a spectrogram back into sound. [10800340] |Using this device, [[Alvin Liberman]] and colleagues were able to discover acoustic cues for the perception of [[phonetic]] segments (consonants and vowels). [10800350] |Early electronic speech synthesizers sounded robotic and were often barely intelligible. [10800360] |However, the quality of synthesized speech has steadily improved, and output from contemporary speech synthesis systems is sometimes indistinguishable from actual human speech. [10800370] |=== Electronic devices === [10800380] |The first computer-based speech synthesis systems were created in the late 1950s, and the first complete text-to-speech system was completed in 1968. [10800390] |In 1961, physicist [[John Larry Kelly, Jr]] and colleague Louis Gerstman used an [[IBM 704]] computer to synthesize speech, an event among the most prominent in the history of [[Bell Labs]]. [10800400] |Kelly's voice recorder synthesizer (vocoder) recreated the song "[[Daisy Bell]]", with musical accompaniment from [[Max Mathews]]. [10800410] |Coincidentally, [[Arthur C. Clarke]] was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility. [10800420] |Clarke was so impressed by the demonstration that he used it in the climactic scene of his screenplay for his novel ''[[2001: A Space Odyssey (novel)|2001: A Space Odyssey]]'', where the [[HAL 9000]] computer sings the same song as it is being put to sleep by astronaut [[Dave Bowman]]. [10800430] |Despite the success of purely electronic speech synthesis, research is still being conducted into mechanical speech synthesizers. [10800440] |== Synthesizer technologies == [10800450] |The most important qualities of a speech synthesis system are ''naturalness'' and ''[[Intelligibility]]''. [10800460] |Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood. [10800470] |The ideal speech synthesizer is both natural and intelligible. [10800480] |Speech synthesis systems usually try to maximize both characteristics. [10800490] |The two primary technologies for generating synthetic speech waveforms are ''concatenative synthesis'' and ''[[formant]] synthesis''. [10800500] |Each technology has strengths and weaknesses, and the intended uses of a synthesis system will typically determine which approach is used. 
[10800510] |=== Concatenative synthesis === [10800520] |Concatenative synthesis is based on the [[concatenation]] (or stringing together) of segments of recorded speech. [10800530] |Generally, concatenative synthesis produces the most natural-sounding synthesized speech. [10800540] |However, differences between natural variations in speech and the nature of the automated techniques for segmenting the waveforms sometimes result in audible glitches in the output. [10800550] |There are three main sub-types of concatenative synthesis. [10800560] |
==== Unit selection synthesis ==== [10800570] |Unit selection synthesis uses large [[database]]s of recorded speech. [10800580] |During database creation, each recorded utterance is segmented into some or all of the following: individual [[phone]]s, [[diphone]]s, half-phones, [[syllable]]s, [[morpheme]]s, [[word]]s, [[phrase]]s, and [[Sentence (linguistics)|sentence]]s. [10800590] |Typically, the division into segments is done using a specially modified [[speech recognition|speech recognizer]] set to a "forced alignment" mode with some manual correction afterward, using visual representations such as the [[waveform]] and [[spectrogram]]. [10800600] |An [[index (database)|index]] of the units in the speech database is then created based on the segmentation and acoustic parameters like the [[fundamental frequency]] ([[pitch (music)|pitch]]), duration, position in the syllable, and neighboring phones. [10800610] |At [[runtime]], the desired target utterance is created by determining the best chain of candidate units from the database (unit selection). [10800620] |This process is typically achieved using a specially weighted [[decision tree]]. [10800630] |Unit selection provides the greatest naturalness, because it applies only a small amount of [[digital signal processing]] (DSP) to the recorded speech. [10800640] |DSP often makes recorded speech sound less natural, although some systems use a small amount of signal processing at the point of concatenation to smooth the waveform. [10800650] |The output from the best unit-selection systems is often indistinguishable from real human voices, especially in contexts for which the TTS system has been tuned. [10800660] |However, maximum naturalness typically requires unit-selection speech databases to be very large, in some systems ranging into the [[gigabyte]]s of recorded data, representing dozens of hours of speech. [10800670] |Also, unit selection algorithms have been known to select segments from a place that results in less than ideal synthesis (e.g. minor words become unclear) even when a better choice exists in the database.
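To make the idea of unit selection concrete, the following toy sketch runs a dynamic program over hypothetical candidate units, choosing one candidate per target position so that the summed target cost (how well a unit matches the requested context) plus concatenation cost (how smoothly adjacent units join) is minimized. The units and cost functions are invented purely for illustration; production systems use far richer features and much larger databases.

<syntaxhighlight lang="python">
# Toy unit-selection search: pick one candidate unit per target position,
# minimizing target cost + join (concatenation) cost with dynamic programming.
def select_units(targets, candidates, target_cost, join_cost):
    """targets: list of target specifications, one per position.
    candidates: candidates[i] is the list of recorded units usable at position i."""
    n = len(targets)
    best = [[target_cost(targets[0], c) for c in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, n):
        row, ptr = [], []
        for cand in candidates[i]:
            costs = [best[i - 1][k] + join_cost(prev, cand)
                     for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + target_cost(targets[i], cand))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Trace back the cheapest path through the lattice of candidates.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [candidates[i][path[i]] for i in range(n)]

# Hypothetical example: units described only by (phone, pitch in Hz).
targets = [("h", 120), ("e", 130), ("l", 125)]
candidates = [[("h", 118), ("h", 140)], [("e", 129), ("e", 100)], [("l", 126)]]
tc = lambda t, c: abs(t[1] - c[1])          # distance from the requested pitch
jc = lambda a, b: abs(a[1] - b[1]) * 0.1    # penalize pitch jumps at the join
print(select_units(targets, candidates, tc, jc))
</syntaxhighlight>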
[10800680] |
==== Diphone synthesis ==== [10800690] |Diphone synthesis uses a minimal speech database containing all the [[diphone]]s (sound-to-sound transitions) occurring in a language. [10800700] |The number of diphones depends on the [[phonotactics]] of the language: for example, Spanish has about 800 diphones, and German about 2500. [10800710] |In diphone synthesis, only one example of each diphone is contained in the speech database. [10800720] |At runtime, the target [[prosody]] of a sentence is superimposed on these minimal units by means of [[digital signal processing]] techniques such as [[linear predictive coding]], [[PSOLA]] or [[MBROLA]]. [10800730] |The quality of the resulting speech is generally worse than that of unit-selection systems, but more natural-sounding than the output of formant synthesizers. [10800740] |Diphone synthesis suffers from the sonic glitches of concatenative synthesis and the robotic-sounding nature of formant synthesis, and has few of the advantages of either approach other than small size. [10800750] |As such, its use in commercial applications is declining, although it continues to be used in research because there are a number of freely available software implementations.
[10800760] |
==== Domain-specific synthesis ==== [10800770] |Domain-specific synthesis concatenates prerecorded words and phrases to create complete utterances. [10800780] |It is used in applications where the variety of texts the system will output is limited to a particular domain, like transit schedule announcements or weather reports. [10800790] |The technology is very simple to implement, and has been in commercial use for a long time, in devices like talking clocks and calculators. [10800800] |The level of naturalness of these systems can be very high because the variety of sentence types is limited, and they closely match the prosody and intonation of the original recordings. [10800810] |Because these systems are limited by the words and phrases in their databases, they are not general-purpose and can only synthesize the combinations of words and phrases with which they have been preprogrammed. [10800820] |The blending of words within naturally spoken language can still cause problems, however, unless the many variations are taken into account. [10800830] |For example, in [[Rhotic and non-rhotic accents|non-rhotic]] dialects of English the "r" in words like "clear" {{IPA|/ˈkliːə/}} is usually only pronounced when the following word has a vowel as its first letter (e.g. "clear out" is realized as {{IPA|/ˌkliːəɹˈɑʊt/}}). [10800840] |Likewise in [[French language|French]], many final consonants that are normally silent are pronounced when followed by a word that begins with a vowel, an effect called [[Liaison (French)|liaison]]. [10800845] |This [[alternation (linguistics)|alternation]] cannot be reproduced by a simple word-concatenation system, which would require additional complexity to be [[context-sensitive]].
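As an illustration of how simple domain-specific concatenation can be, the following sketch assembles a time announcement from a small set of hypothetical pre-recorded word prompts; anything outside this fixed vocabulary simply cannot be spoken, which is exactly the limitation described above.

<syntaxhighlight lang="python">
# Toy domain-specific synthesis: a talking clock built purely by concatenating
# hypothetical pre-recorded word prompts (the file names are invented).
WORDS = ["the", "time", "is", "oh", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten", "eleven", "twelve",
         "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
         "eighteen", "nineteen", "twenty", "thirty", "forty", "fifty",
         "oclock", "am", "pm"]
PROMPT_FILES = {w: f"prompts/{w}.wav" for w in WORDS}

def number_words(n):
    """Spell out 0-59 using only the recorded vocabulary above."""
    if n == 0:
        return []
    if n <= 20:
        return [WORDS[3 + n]]              # index 3 is "oh", 4 is "one", ...
    tens, units = divmod(n, 10)
    tens_word = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}[tens]
    return [tens_word] + ([WORDS[3 + units]] if units else [])

def say_time(hour, minute, half):          # half is "am" or "pm"
    words = ["the", "time", "is"] + number_words(hour)
    if minute == 0:
        words.append("oclock")
    elif minute < 10:
        words += ["oh"] + number_words(minute)
    else:
        words += number_words(minute)
    words.append(half)
    # A real system would load and join the audio files; here we only
    # return the playlist of prompts to be played back in order.
    return [PROMPT_FILES[w] for w in words]

print(say_time(9, 5, "am"))   # prompts for "the time is nine oh five am"
</syntaxhighlight>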
[10800850] |=== Formant synthesis === [10800860] |[[Formant]] synthesis does not use human speech samples at runtime. [10800870] |Instead, the synthesized speech output is created using an acoustic model. [10800880] |Parameters such as [[fundamental frequency]], [[phonation|voicing]], and [[noise]] levels are varied over time to create a [[waveform]] of artificial speech. [10800890] |This method is sometimes called ''rules-based synthesis''; however, many concatenative systems also have rules-based components. [10800900] |Many systems based on formant synthesis technology generate artificial, robotic-sounding speech that would never be mistaken for human speech. [10800910] |However, maximum naturalness is not always the goal of a speech synthesis system, and formant synthesis systems have advantages over concatenative systems. [10800920] |Formant-synthesized speech can be reliably intelligible, even at very high speeds, avoiding the acoustic glitches that commonly plague concatenative systems. [10800930] |High-speed synthesized speech is used by the visually impaired to quickly navigate computers using a [[screen reader]]. [10800940] |Formant synthesizers are usually smaller programs than concatenative systems because they do not have a database of speech samples. [10800950] |They can therefore be used in [[embedded system]]s, where [[data storage device|memory]] and [[microprocessor]] power are especially limited. [10800960] |Because formant-based systems have complete control of all aspects of the output speech, a wide variety of prosodies and [[Intonation (linguistics)|intonation]]s can be output, conveying not just questions and statements, but a variety of emotions and tones of voice. [10800970] |Examples of non-real-time but highly accurate intonation control in formant synthesis include the work done in the late 1970s for the [[Texas Instruments]] toy [[Speak & Spell (game)|Speak & Spell]], and in the early 1980s [[Sega]] [[Video arcade|arcade]] machines. [10800980] |Creating proper intonation for these projects was painstaking, and the results have yet to be matched by real-time text-to-speech interfaces. [10800990] |=== Articulatory synthesis === [10801000] |[[Articulatory synthesis]] refers to computational techniques for synthesizing speech based on models of the human [[vocal tract]] and the articulation processes occurring there. [10801010] |The first articulatory synthesizer regularly used for laboratory experiments was developed at [[Haskins Laboratories]] in the mid-1970s by [[Philip Rubin]], Tom Baer, and Paul Mermelstein. [10801020] |This synthesizer, known as ASY, was based on vocal tract models developed at [[Bell Laboratories]] in the 1960s and 1970s by Paul Mermelstein, Cecil Coker, and colleagues. [10801030] |Until recently, articulatory synthesis models have not been incorporated into commercial speech synthesis systems. [10801040] |A notable exception is the [[NeXT]]-based system originally developed and marketed by Trillium Sound Research, a spin-off company of the [[University of Calgary]], where much of the original research was conducted. [10801050] |Following the demise of the various incarnations of NeXT (started by [[Steve Jobs]] in the late 1980s and merged with Apple Computer in 1997), the Trillium software was published under the [[GNU General Public License]], with work continuing as ''gnuspeech''. 
[10801060] |The system, first marketed in 1994, provides full articulatory-based text-to-speech conversion using a waveguide or transmission-line analog of the human oral and nasal tracts controlled by Carré's "distinctive region model". [10801070] |=== HMM-based synthesis === [10801080] |HMM-based synthesis is a synthesis method based on [[hidden Markov model]]s. [10801090] |In this system, the [[frequency spectrum]] ([[vocal tract]]), [[fundamental frequency]] (vocal source), and duration ([[prosody]]) of speech are modeled simultaneously by HMMs. [10801100] |Speech [[waveforms]] are generated from HMMs themselves based on the [[maximum likelihood]] criterion. [10801110] |=== Sinewave synthesis === [10801120] |[[Sinewave synthesis]] is a technique for synthesizing speech by replacing the [[formants]] (main bands of energy) with pure tone whistles. [10801130] |== Challenges == [10801140] |=== Text normalization challenges === [10801150] |The process of normalizing text is rarely straightforward. [10801160] |Texts are full of [[Heteronym (linguistics)|heteronym]]s, [[number]]s, and [[abbreviation]]s that all require expansion into a phonetic representation. [10801170] |There are many spellings in English which are pronounced differently based on context. [10801180] |For example, "My latest project is to learn how to better project my voice" contains two pronunciations of "project". [10801190] |Most text-to-speech (TTS) systems do not generate semantic representations of their input texts, as processes for doing so are not reliable, well understood, or computationally effective. [10801200] |As a result, various [[heuristic]] techniques are used to guess the proper way to disambiguate homographs, like examining neighboring words and using statistics about frequency of occurrence. [10801210] |Deciding how to convert numbers is another problem that TTS systems have to address. [10801220] |It is a simple programming challenge to convert a number into words, like "1325" becoming "one thousand three hundred twenty-five." [10801230] |However, numbers occur in many different contexts; when a year or part of an address, "1325" should likely be read as "thirteen twenty-five", or, when part of a [[social security number]], as "one three two five". [10801240] |A TTS system can often infer how to expand a number based on surrounding words, numbers, and punctuation, and sometimes the system provides a way to specify the context if it is ambiguous. [10801250] |Similarly, abbreviations can be ambiguous. [10801260] |For example, the abbreviation "in" for "inches" must be differentiated from the word "in", and the address "12 St John St." uses the same abbreviation for both "Saint" and "Street". [10801270] |TTS systems with intelligent front ends can make educated guesses about ambiguous abbreviations, while others provide the same result in all cases, resulting in nonsensical (and sometimes comical) outputs. [10801280] |=== Text-to-phoneme challenges === [10801290] |Speech synthesis systems use two basic approaches to determine the pronunciation of a word based on its spelling, a process which is often called text-to-phoneme or grapheme-to-phoneme conversion ([[phoneme]] is the term used by linguists to describe distinctive sounds in a language). [10801300] |The simplest approach to text-to-phoneme conversion is the dictionary-based approach, where a large dictionary containing all the words of a language and their correct pronunciations is stored by the program. 
[10801310] |Determining the correct pronunciation of each word is a matter of looking up each word in the dictionary and replacing the spelling with the pronunciation specified in the dictionary. [10801320] |The other approach is rule-based, in which pronunciation rules are applied to words to determine their pronunciations based on their spellings. [10801330] |This is similar to the "sounding out", or [[synthetic phonics]], approach to learning reading. [10801340] |Each approach has advantages and drawbacks. [10801350] |The dictionary-based approach is quick and accurate, but completely fails if it is given a word which is not in its dictionary. [10801360] |As dictionary size grows, so too do the memory requirements of the synthesis system. [10801370] |On the other hand, the rule-based approach works on any input, but the complexity of the rules grows substantially as the system takes into account irregular spellings or pronunciations. [10801380] |(Consider that the word "of" is very common in English, yet is the only word in which the letter "f" is pronounced [v].) [10801390] |As a result, nearly all speech synthesis systems use a combination of these approaches. [10801400] |Some languages, like [[Spanish language|Spanish]], have a very regular writing system, and the prediction of the pronunciation of words based on their spellings is quite successful. [10801410] |Speech synthesis systems for such languages often use the rule-based method extensively, resorting to dictionaries only for those few words, like foreign names and borrowings, whose pronunciations are not obvious from their spellings. [10801420] |On the other hand, speech synthesis systems for languages like [[English language|English]], which have extremely irregular spelling systems, are more likely to rely on dictionaries, and to use rule-based methods only for unusual words, or words that aren't in their dictionaries. [10801430] |=== Evaluation challenges === [10801440] |It is very difficult to evaluate speech synthesis systems consistently because there is no universally agreed evaluation criterion and different organizations usually use different speech data. [10801450] |The quality of a speech synthesis system also depends heavily on the quality of the recordings it is built from. [10801460] |Therefore, evaluating speech synthesis systems often amounts in part to evaluating the recording quality. [10801470] |Recently, researchers have started evaluating speech synthesis systems using a common speech dataset. [10801480] |This may help people to compare differences between technologies rather than between recordings. [10801490] |=== Prosodics and emotional content === [10801500] |A recent study in the journal ''Speech Communication'' by Amy Drahota and colleagues at the [[University of Portsmouth]], [[UK]], reported that listeners to voice recordings could determine, at better than chance levels, whether or not the speaker was smiling. [10801510] |It was suggested that identification of the vocal features which signal emotional content may be used to help make synthesized speech sound more natural.
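Returning to the text-to-phoneme challenges above, the following sketch combines a tiny hypothetical pronunciation dictionary with a handful of invented letter-to-sound rules that are used only as a fallback for words missing from the dictionary; real systems rely on far larger dictionaries and rule sets, or on statistically trained models.

<syntaxhighlight lang="python">
# Toy grapheme-to-phoneme conversion: dictionary lookup with a crude
# rule-based fallback.  The dictionary entries and rules are illustrative only.
LEXICON = {
    "of": ["AH", "V"],            # the famous exception: "f" pronounced [v]
    "project": ["P", "R", "AA", "JH", "EH", "K", "T"],
}

# Extremely simplified letter-to-sound rules, tried longest match first.
RULES = [("ch", ["CH"]), ("sh", ["SH"]), ("th", ["TH"]), ("ph", ["F"]),
         ("a", ["AE"]), ("e", ["EH"]), ("i", ["IH"]), ("o", ["AA"]),
         ("u", ["AH"]), ("b", ["B"]), ("c", ["K"]), ("d", ["D"]),
         ("f", ["F"]), ("g", ["G"]), ("h", ["HH"]), ("j", ["JH"]),
         ("k", ["K"]), ("l", ["L"]), ("m", ["M"]), ("n", ["N"]),
         ("p", ["P"]), ("q", ["K"]), ("r", ["R"]), ("s", ["S"]),
         ("t", ["T"]), ("v", ["V"]), ("w", ["W"]), ("x", ["K", "S"]),
         ("y", ["Y"]), ("z", ["Z"])]

def to_phonemes(word):
    word = word.lower()
    if word in LEXICON:                      # dictionary-based approach
        return LEXICON[word]
    phonemes, i = [], 0                      # rule-based fallback
    while i < len(word):
        for grapheme, phones in RULES:
            if word.startswith(grapheme, i):
                phonemes += phones
                i += len(grapheme)
                break
        else:
            i += 1                           # skip characters with no rule
    return phonemes

print(to_phonemes("of"), to_phonemes("ship"))   # ['AH', 'V'] ['SH', 'IH', 'P']
</syntaxhighlight>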
[10801520] |== Dedicated hardware == [10801530] |*Votrax [10801540] |**SC-01A (analog formant) [10801550] |**SC-02 / SSI-263 / "Arctic 263" [10801560] |*General Instruments SP0256-AL2 (CTS256A-AL2, MEA8000) [10801570] |*Magnevation SpeakJet (www.speechchips.com TTS256) [10801580] |*Savage Innovations SoundGin [10801590] |*National Semiconductor DT1050 Digitalker (Mozer) [10801600] |*Silicon Systems SSI 263 (analog formant) [10801610] |*Texas Instruments [10801620] |**TMS5110A (LPC) [10801630] |**TMS5200 [10801640] |*Oki Semiconductor [10801650] |**MSM5205 [10801660] |**MSM5218RS (ADPCM) [10801670] |*Toshiba T6721A [10801680] |*Philips PCF8200 [10801690] |== Computer operating systems or outlets with speech synthesis == [10801700] |=== Apple === [10801710] |The first speech system integrated into an [[operating system]] was [[Apple Computer]]'s [[PlainTalk#The original MacInTalk|MacInTalk]] in 1984. [10801720] |Since the 1980s, Macintosh computers have offered text-to-speech capabilities through the MacinTalk software. [10801730] |In the early 1990s, Apple expanded these capabilities to offer system-wide text-to-speech support. [10801740] |With the introduction of faster PowerPC-based computers, Apple included higher-quality voice sampling. [10801750] |Apple also introduced [[speech recognition]] into its systems, which provided a fluid command set. [10801760] |More recently, Apple has added sample-based voices. [10801770] |Starting as a curiosity, the speech system of the Apple [[Macintosh (computer)|Macintosh]] has evolved into a cutting-edge, fully supported program, [[PlainTalk]], for people with vision problems. [10801780] |[[VoiceOver]] was included in Mac OS X Tiger and, more recently, Mac OS X Leopard. [10801790] |The voice shipped with Mac OS X 10.5 ("Leopard") is called "Alex" and features realistic-sounding breaths between sentences, as well as improved clarity at high read rates. [10801800] |=== AmigaOS === [10801810] |The second operating system with advanced speech synthesis capabilities was [[AmigaOS]], introduced in 1985. [10801820] |The voice synthesis was licensed by [[Commodore International]] from a third-party software house (Don't Ask Software, now Softvoice, Inc.) and it featured a complete system of voice emulation, with both male and female voices and "stress" indicator markers, made possible by advanced features of the [[Amiga]] hardware audio [[chipset]]. [10801830] |It was divided into a narrator device and a translator library. [10801840] |Amiga [[AmigaOS#Speech synthesis|Speak Handler]] featured a text-to-speech translator. [10801850] |AmigaOS considered speech synthesis a virtual hardware device, so the user could even redirect console output to it. [10801860] |Some Amiga programs, such as word processors, made extensive use of the speech system. [10801870] |=== Microsoft Windows === [10801880] |Modern [[Microsoft Windows|Windows]] systems use [[Speech Application Programming Interface#SAPI 1-4 API family|SAPI4]]- and [[Speech Application Programming Interface#SAPI 5 API family|SAPI5]]-based speech systems that include a [[speech recognition]] engine (SRE). [10801890] |SAPI 4.0 was available on Microsoft-based operating systems as a third-party add-on for systems like [[Windows 95]] and [[Windows 98]]. [10801900] |[[Windows 2000]] added a speech synthesis program called [[Microsoft Narrator|Narrator]], directly available to users.
[10801910] |All Windows-compatible programs could make use of speech synthesis features, available through menus once installed on the system. [10801920] |[[Microsoft Speech Server]] is a complete package for voice synthesis and recognition, for commercial applications such as [[call centers]]. [10801930] |=== Internet === [10801940] |Currently, there are a number of [[Application software|applications]], [[plugin]]s and [[gadget]]s that can read messages directly from an [[e-mail client]] and web pages from a [[web browser]]. [10801950] |Some specialized [[Computer software|software]] can narrate [[RSS|RSS-feeds]]. [10801960] |On one hand, online RSS-narrators simplify information delivery by allowing users to listen to their favourite news sources and to convert them to [[podcast]]s. [10801970] |On the other hand, online RSS-readers are available on almost any [[Personal computer|PC]] connected to the Internet. [10801980] |Users can download generated audio files to portable devices, e.g. with the help of a [[podcast]] receiver, and listen to them while walking, jogging or commuting to work. [10801990] |A growing field in Internet-based TTS technology is web-based assistive technology, e.g. Talklets. [10802000] |This web-based approach to a traditionally locally installed form of software can give many of those who require software for accessibility reasons the ability to access web content from public machines or from machines belonging to others. [10802010] |While responsiveness is not as immediate as that of locally installed applications, the "access anywhere" nature of this approach is its key benefit. [10802020] |=== Others === [10802030] |* Some models of Texas Instruments home computers produced in 1979 and 1981 ([[TI-99/4A|Texas Instruments TI-99/4 and TI-99/4A]]) were capable of text-to-phoneme synthesis or reciting complete words and phrases (text-to-dictionary), using a very popular Speech Synthesizer peripheral. [10802040] |TI used a proprietary [[codec]] to embed complete spoken phrases into applications, primarily video games. [10802050] |* A variety of systems operate on free and open source software systems including [[Linux|GNU/Linux]]; these include [[open-source]] programs such as the [[Festival Speech Synthesis System]], which uses diphone-based synthesis (and can use a limited number of [[MBROLA]] voices), and gnuspeech, which uses articulatory synthesis, from the [[Free Software Foundation]]. [10802060] |Other commercial vendor software also runs on GNU/Linux. [10802070] |* Several commercial companies are also developing speech synthesis systems (this list is reporting them just for the sake of information, not endorsing any specific product): [http://www.acapela-group.com Acapela Group], [[AT&T]], [[Cepstral]], [[DECtalk]], [[IBM ViaVoice]], [[IVONA|IVONA TTS]], [http://www.loquendo.com Loquendo TTS], [http://www.neospeech.com NeoSpeech TTS], [[Nuance Communications]], Rhetorical Systems, [http://www.svox.com SVOX] and [http://www.yakitome.com YAKiToMe!]. [10802080] |* Companies which developed speech synthesis systems but which are no longer in this business include BeST Speech (bought by L&H), [[Lernout & Hauspie]] (bankrupt), and [[SpeechWorks]] (bought by Nuance). [10802090] |== Speech synthesis markup languages == [10802100] |A number of [[markup language]]s have been established for the rendition of text as speech in an [[XML]]-compliant format.
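For illustration, a short fragment in one such markup language (SSML, described next) might look like the following; the elements shown are standard, but the attribute values are only examples and support varies between synthesizers:

<syntaxhighlight lang="xml">
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  The time is
  <say-as interpret-as="time" format="hms12">2:30pm</say-as>.
  <break time="500ms"/>
  <prosody rate="slow" pitch="-10%">This sentence is read slowly
  and at a lower pitch.</prosody>
</speak>
</syntaxhighlight>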
[10802110] |The most recent is [[Speech Synthesis Markup Language]] (SSML), which became a [[W3C recommendation]] in 2004. [10802120] |Older speech synthesis markup languages include Java Speech Markup Language ([[JSML]]) and [[SABLE]]. [10802130] |Although each of these was proposed as a standard, none of them has been widely adopted. [10802140] |Speech synthesis markup languages are distinguished from dialogue markup languages. [10802150] |[[VoiceXML]], for example, includes tags related to speech recognition, dialogue management and touchtone dialing, in addition to text-to-speech markup. [10802160] |==Applications== [10802170] |===Accessibility=== [10802180] |Speech synthesis has long been a vital [[assistive technology]] tool and its application in this area is significant and widespread. [10802190] |It allows environmental barriers to be removed for people with a wide range of disabilities. [10802200] |The longest application has been in the use of [[screenreaders]] for people with [[visual impairment]], but text-to-speech systems are now commonly used by people with [[dyslexia]] and other reading difficulties as well as by pre-literate youngsters. [10802210] |They are also frequently employed to aid those with severe [[speech impairment]] usually through a dedicated [[voice output communication aid]]. [10802220] |===News service=== [10802230] |Sites such as [[Ananova]] have used speech synthesis to convert written news to audio content, which can be used for mobile applications. [10802240] |===Entertainment=== [10802250] |Speech synthesis techniques are used as well in the entertainment productions such as games, anime and similar. [10802260] |In 2007, Animo Limited announced the development of a software application package based on its speech synthesis software FineSpeech, explicitly geared towards customers in the entertainment industries, able to generate narration and lines of dialogue according to user specifications. [10802270] |Software such as [[Vocaloid]] can generate singing voices via lyrics and melody. [10802280] |This is also the aim of the Singing Computer project (which uses the [[GNU General Public License|GPL]] software [[GNU LilyPond|Lilypond]] and [[Festival Speech Synthesis System|Festival]]) to help blind people check their lyric input. [10810010] |
Statistical classification
[10810020] |'''Statistical classification''' is a procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, etc) and based on a [[training set]] of previously labeled items. [10810030] |Formally, the problem can be stated as follows: given training data <math>\{(\mathbf{x_1},y_1),\dots,(\mathbf{x_n}, y_n)\}</math> produce a classifier <math>h:\mathcal{X}\rightarrow\mathcal{Y}</math> which maps an object <math>\mathbf{x} \in \mathcal{X}</math> to its classification label <math>y \in \mathcal{Y}</math>. [10810040] |For example, if the problem is filtering spam, then <math>\mathbf{x_i}</math> is some representation of an email and <math>y</math> is either "Spam" or "Non-Spam". [10810050] |Statistical classification algorithms are typically used in [[pattern recognition]] systems. [10810060] |'''Note:''' in [[community ecology]], the term "classification" is synonymous with what is commonly known (in [[machine learning]]) as [[data clustering|clustering]]. [10810070] |See that article for more information about purely [[unsupervised learning|unsupervised]] techniques. [10810080] |* The second problem is to consider classification as an [[estimation]] problem, where the goal is to estimate a function of the form [10810090] |:<math>P({\rm class}|{\vec x}) = f\left(\vec x;\vec \theta\right)</math> where the feature vector input is <math>\vec x</math>, and the function <math>f</math> is typically parameterized by some parameters <math>\vec \theta</math>. [10810100] |In the [[Bayesian statistics|Bayesian]] approach to this problem, instead of choosing a single parameter vector <math>\vec \theta</math>, the result is integrated over all possible values of <math>\vec \theta</math>, weighted by how likely they are given the training data <math>D</math>: [10810110] |:<math>P({\rm class}|{\vec x}) = \int f\left(\vec x;\vec \theta\right)P(\vec \theta|D) \, d\vec \theta</math> [10810120] |* The third problem is related to the second, but the problem is to estimate the [[conditional probability|class-conditional probabilities]] <math>P(\vec x|{\rm class})</math> and then use [[Bayes' rule]] to produce the class probability as in the second problem. [10810130] |Examples of classification algorithms include: [10810140] |* [[Linear classifier]]s [10810150] |** [[Fisher's linear discriminant]] [10810160] |** [[Logistic regression]] [10810170] |** [[Naive Bayes classifier]] [10810180] |** [[Perceptron]] [10810190] |** [[Support vector machine]]s [10810200] |* [[Quadratic classifier]]s [10810210] |* [[Nearest_neighbor_(pattern_recognition)|k-nearest neighbor]] [10810220] |* [[Boosting]] [10810230] |* [[Decision tree]]s [10810240] |** [[Random forest]]s [10810250] |* [[Artificial neural networks|Neural network]]s [10810260] |* [[Bayesian network]]s [10810270] |* [[Hidden Markov model]]s [10810280] |An intriguing problem in pattern recognition yet to be solved is the relationship between the problem to be solved (data to be classified) and the performance of various pattern recognition algorithms (classifiers). [10810290] |Van der Walt and Barnard (see reference section) investigated very specific artificial data sets to determine conditions under which certain classifiers perform better and worse than others. [10810300] |Classifier performance depends greatly on the characteristics of the data to be classified. [10810310] |There is no single classifier that works best on all given problems (a phenomenon that may be explained by the [[No free lunch in search and optimization|No-free-lunch theorem]]).
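As a minimal illustration of a classifier <math>h:\mathcal{X}\rightarrow\mathcal{Y}</math> learned from labeled training data, the following sketch implements a k-nearest-neighbour classifier on an invented two-dimensional data set:

<syntaxhighlight lang="python">
import math

# Invented training set: feature vectors x with labels y.
training_data = [((1.0, 1.2), "A"), ((0.8, 0.9), "A"),
                 ((3.1, 2.9), "B"), ((3.3, 3.4), "B")]

def classify(x, k=3):
    """k-nearest-neighbour classifier h: X -> Y over the training set."""
    neighbours = sorted(training_data,
                        key=lambda item: math.dist(x, item[0]))[:k]
    labels = [y for _, y in neighbours]
    return max(set(labels), key=labels.count)   # majority vote

print(classify((1.1, 1.0)))   # -> "A"
print(classify((3.0, 3.0)))   # -> "B"
</syntaxhighlight>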
[10810320] |Various empirical tests have been performed to compare classifier performance and to find the characteristics of data that determine classifier performance. [10810330] |Determining a suitable classifier for a given problem is however still more an art than a science. [10810340] |The most widely used classifiers are the [[Neural Network]] (Multi-layer Perceptron), [[Support Vector Machines]], [[KNN|k-Nearest Neighbours]], Gaussian Mixture Model, Gaussian, [[Naive Bayes]], [[Decision Tree]] and [[Radial Basis Function|RBF]] classifiers. [10810350] |== Evaluation == [10810360] |The measures [[Precision and Recall]] are popular metrics used to evaluate the quality of a classification system. [10810370] |More recently, [[Receiver Operating Characteristic]] (ROC) curves have been used to evaluate the tradeoff between true- and false-positive rates of classification algorithms. [10810380] |==Application domains== [10810390] |* [[Computer vision]] [10810400] |** [[Medical Imaging]] and Medical Image Analysis [10810410] |** [[Optical character recognition]] [10810420] |* [[Geostatistics]] [10810430] |* [[Speech recognition]] [10810440] |* [[Handwriting recognition]] [10810450] |* [[Biometric]] identification [10810460] |* [[Natural language processing]] [10810470] |* [[Document classification]] [10810480] |* Internet [[search engines]] [10810490] |* [[Credit scoring]] [10820010] |
Statistical machine translation
[10820020] |'''Statistical machine translation''' ('''SMT''') is a [[machine translation]] paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual [[text corpora]]. [10820030] |The statistical approach contrasts with the rule-based approaches to [[machine translation]] as well as with [[example-based machine translation]]. [10820040] |The first ideas of statistical machine translation were introduced by [[Warren Weaver]] in 1949, including the ideas of applying [[Claude Shannon]]'s [[information theory]]. [10820050] |Statistical machine translation was re-introduced in 1991 by researchers at [[IBM]]'s [[Thomas J. Watson Research Center]] and has contributed to the significant resurgence in interest in machine translation in recent years. [10820060] |As of 2006, it is by far the most widely-studied machine translation paradigm. [10820070] |==Benefits== [10820080] |The benefits of statistical machine translation over traditional paradigms that are most often cited are the following: [10820090] |* '''Better use of resources''' [10820100] |**There is a great deal of natural language text available in machine-readable format. [10820110] |**Generally, SMT systems are not tailored to any specific pair of languages. [10820120] |**Rule-based translation systems require the manual development of linguistic rules, which can be costly, and which often do not generalize to other languages. [10820130] |* '''More natural translations''' [10820140] |The ideas behind statistical machine translation come out of [[information theory]]. [10820150] |Essentially, the document is translated according to the [[probability]] <math>p(e|f)</math> that a string <math>e</math> in the native language (for example, English) is the translation of a string <math>f</math> in the foreign language (for example, French). [10820160] |Generally, these probabilities are estimated using techniques of [[parameter estimation]]. [10820170] |[[Bayes Theorem|Bayes' theorem]] is applied to <math>p(e|f)</math>, the probability that the foreign string produces the native string, to get <math>p(e|f) \propto p(f|e) p(e)</math>, where the [[translation model]] <math>p(f|e)</math> is the probability that the native string is the translation of the foreign string, and the [[language model]] <math>p(e)</math> is the probability of seeing that native string. [10820180] |Mathematically speaking, finding the best translation <math>\tilde{e}</math> is done by picking the one that gives the highest probability: [10820190] |:<math>\tilde{e} = \arg\max_{e \in e^*} p(e|f) = \arg\max_{e \in e^*} p(f|e) \, p(e)</math>. [10820200] |For a rigorous implementation of this, one would have to perform an exhaustive search by going through all strings <math>e^*</math> in the native language. [10820210] |Performing the search efficiently is the work of a [[machine translation decoder]] that uses the foreign string, heuristics and other methods to limit the search space while keeping the quality acceptable. [10820220] |This trade-off between quality and time usage can also be found in [[speech recognition]]. [10820230] |As the translation systems are not able to store all native strings and their translations, a document is typically translated sentence by sentence, but even this is not enough. [10820240] |Language models are typically approximated by smoothed ''n''-gram models, and similar approaches have been applied to translation models, but there is additional complexity due to different sentence lengths and word orders in the languages.
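The decision rule above can be illustrated with a toy scorer that, instead of searching, simply enumerates a handful of candidate translations and keeps the one maximizing <math>p(f|e) p(e)</math>. The candidates and probabilities below are invented purely for illustration; a real decoder searches an enormous space of partial hypotheses rather than a fixed list.

<syntaxhighlight lang="python">
import math

# Toy noisy-channel scoring: pick e maximizing p(f|e) * p(e).
# All probabilities below are invented for illustration.
foreign = "la maison bleue"

candidates = {
    #  candidate e            (p(f|e),  p(e))
    "the blue house":         (0.20,    0.010),
    "the house blue":         (0.25,    0.0001),   # word-for-word gloss, unlikely English
    "blue the house":         (0.05,    0.00001),
}

def best_translation(cands):
    def log_score(item):
        p_f_given_e, p_e = item[1]
        return math.log(p_f_given_e) + math.log(p_e)   # log p(f|e) + log p(e)
    return max(cands.items(), key=log_score)[0]

print(best_translation(candidates))   # -> "the blue house"
</syntaxhighlight>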
[10820250] |The statistical translation models were initially [[word]] based (Models 1-5 from [[IBM]]), but significant advances were made with the introduction of [[phrase]] based models. [10820260] |Recent work has incorporated [[syntax]] or quasi-syntactic structures. [10820270] |==Word-based translation== [10820280] |In word-based translation, the translated elements are words. [10820290] |Typically, the number of words in the translated sentences differs, because of compound words, morphology and idioms. [10820300] |The ratio of the lengths of sequences of translated words is called fertility, which tells how many foreign words each native word produces. [10820310] |Simple word-based translation is not able to translate language pairs with fertility rates different from one. [10820320] |To make word-based translation systems handle, for instance, high fertility rates, the system can be designed to map a single word to multiple words, but not vice versa. [10820330] |For instance, if we are translating from French to English, each word in English could produce zero or more French words. [10820340] |But there is no way to group two English words producing a single French word. [10820350] |An example of a word-based translation system is the freely available [[GIZA++]] package ([[GPL]]ed), which includes the [[IBM]] models. [10820360] |==Phrase-based translation== [10820370] |Phrase-based translation aims to reduce the restrictions of word-based translation by translating whole sequences of words to sequences of words, where the lengths of the two sequences can differ. [10820380] |The sequences of words are called, for instance, blocks or phrases, but typically they are not linguistic [[phrase]]s but phrases found using statistical methods from the corpus. [10820390] |Restricting the phrases to linguistic phrases has been shown to decrease translation quality. [10820400] |==Syntax-based translation== [10820410] |==Challenges with statistical machine translation== [10820420] |Problems that statistical machine translation has to deal with include: [10820430] |=== Compound words === [10820440] |=== Idioms === [10820450] |=== Morphology === [10820460] |=== Different word orders === [10820470] |Word order differs between languages. [10820480] |Some classification can be done by naming the typical order of subject (S), verb (V) and object (O) in a sentence, and one can talk, for instance, of SVO or VSO languages. [10820490] |There are also additional differences in word orders, for instance, where modifiers for nouns are located. [10820500] |In [[Speech Recognition|speech recognition]], the speech signal and the corresponding textual representation can be mapped to each other in blocks in order. [10820510] |This is not always the case with the same text in two languages. [10820520] |For SMT, the translation model is only able to translate small sequences of words, and word order has to be taken into account somehow. [10820530] |A typical solution has been re-ordering models, where a distribution of location changes for each item of translation is approximated from aligned bi-text. [10820540] |Different location changes can be ranked with the help of the language model and the best can be selected. [10820550] |=== Syntax === [10820560] |=== Out of vocabulary (OOV) words === [10820570] |SMT systems store different word forms as separate symbols without any relation to each other, and word forms or phrases that were not in the training data cannot be translated.
[10820580] |The main reasons for out-of-vocabulary words are the limitations of the training data, domain changes, and morphology. [10830010] |
Statistics
[10830020] |'''Statistics''' is a [[Mathematics|mathematical science]] pertaining to the collection, analysis, interpretation or explanation, and presentation of [[data]]. [10830030] |It is applicable to a wide variety of [[academic discipline]]s, from the [[Natural science|natural]] and [[social science]]s to the [[humanities]], government and business. [10830040] |Statistical methods can be used to summarize or describe a collection of data; this is called '''[[descriptive statistics]]'''. [10830050] |In addition, patterns in the data may be [[mathematical model|modeled]] in a way that accounts for [[random]]ness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is called '''[[inferential statistics]]'''. [10830060] |Both descriptive and inferential statistics comprise '''applied statistics'''. [10830070] |There is also a discipline called '''[[mathematical statistics]]''', which is concerned with the theoretical basis of the subject. [10830080] |The word '''''statistics''''' is also the plural of '''''[[statistic]]''''' (singular), which refers to the result of applying a statistical algorithm to a set of data, as in [[economic statistics]], [[crime statistics]], etc. [10830090] |==History== [10830100] |: [10830110] |''"Five men, [[Hermann Conring|Conring]],[[Gottfried Achenwall| Achenwall]], [[Johann Peter Süssmilch|Süssmilch]], [[John Graunt|Graunt]] and [[William Petty|Petty]] have been honored by different writers as the founder of statistics."'' claims one source (Willcox, Walter (1938) ''The Founder of Statistics''. [10830120] |Review of the [[International Statistical Institute]] 5(4):321-328.) [10830130] |Some scholars pinpoint the origin of statistics to 1662, with the publication of "[[Observations on the Bills of Mortality]]" by John Graunt. [10830140] |Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data. [10830150] |The scope of the discipline of statistics broadened in the early 19th century to include the collection and analysis of data in general. [10830160] |Today, statistics is widely employed in government, business, and the natural and social sciences. [10830170] |Because of its empirical roots and its applications, statistics is generally considered not to be a subfield of pure mathematics, but rather a distinct branch of applied mathematics. [10830180] |Its mathematical foundations were laid in the 17th century with the development of [[probability theory]] by [[Pascal]] and [[Fermat]]. [10830190] |Probability theory arose from the study of games of chance. [10830200] |The [[method of least squares]] was first described by [[Carl Friedrich Gauss]] around 1794. [10830210] |The use of modern [[computer]]s has expedited large-scale statistical computation, and has also made possible new methods that are impractical to perform manually. [10830220] |==Overview== [10830230] |In applying statistics to a scientific, industrial, or societal problem, one begins with a process or [[statistical population|population]] to be studied. [10830240] |This might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period. [10830250] |It may instead be a process observed at various times; data collected about this kind of "population" constitute what is called a [[time series]]. 
[10830260] |For practical reasons, rather than compiling data about an entire population, one usually studies a chosen subset of the population, called a [[sampling (statistics)|sample]]. [10830270] |Data are collected about the sample in an observational or [[experiment]]al setting. [10830280] |The data are then subjected to statistical analysis, which serves two related purposes: description and inference. [10830290] |*[[Descriptive statistics]] can be used to summarize the data, either numerically or graphically, to describe the sample. [10830300] |Basic examples of numerical descriptors include the [[mean]] and [[standard deviation]]. [10830310] |Graphical summarizations include various kinds of charts and graphs. [10830320] |*[[Inferential statistics]] is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. [10830330] |These inferences may take the form of answers to yes/no questions ([[hypothesis testing]]), estimates of numerical characteristics ([[estimation]]), descriptions of association ([[correlation]]), or modeling of relationships ([[regression analysis|regression]]). [10830340] |Other [[mathematical model|modeling]] techniques include [[ANOVA]], [[time series]], and [[data mining]]. [10830350] |The concept of correlation is particularly noteworthy. [10830360] |Statistical analysis of a [[data set]] may reveal that two variables (that is, two properties of the population under consideration) tend to vary together, as if they are connected. [10830370] |For example, a study of annual income and age of death among people might find that poor people tend to have shorter lives than affluent people. [10830380] |The two variables are said to be correlated (which is a positive correlation in this case). [10830390] |However, one cannot immediately infer the existence of a causal relationship between the two variables. [10830400] |(See [[Correlation does not imply causation]].) [10830410] |The correlated phenomena could be caused by a third, previously unconsidered phenomenon, called a [[lurking variable]] or [[confounding variable]]. [10830420] |If the sample is representative of the population, then inferences and conclusions made from the sample can be extended to the population as a whole. [10830430] |A major problem lies in determining the extent to which the chosen sample is representative. [10830440] |Statistics offers methods to estimate and correct for randomness in the sample and in the data collection procedure, as well as methods for designing robust experiments in the first place. [10830450] |(See [[experimental design]].) [10830460] |The fundamental mathematical concept employed in understanding such randomness is [[probability]]. [10830470] |[[Mathematical statistics]] (also called [[statistical theory]]) is the branch of [[applied mathematics]] that uses probability theory and [[mathematical analysis|analysis]] to examine the theoretical basis of statistics. [10830480] |The use of any statistical method is valid only when the system or population under consideration satisfies the basic mathematical assumptions of the method. [10830490] |[[Misuse of statistics]] can produce subtle but serious errors in description and interpretation — subtle in the sense that even experienced professionals sometimes make such errors, serious in the sense that they may affect, for instance, social policy, medical practice and the reliability of structures such as bridges. 
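To make the numerical descriptors and the notion of correlation discussed above concrete, the following short sketch computes a mean, a standard deviation and a Pearson correlation coefficient for a small invented data set:

<syntaxhighlight lang="python">
import statistics

# Invented sample: annual income (thousands) and age at death for ten people.
income = [12, 18, 25, 31, 40, 48, 55, 63, 72, 90]
age_at_death = [66, 70, 69, 74, 76, 78, 77, 81, 80, 84]

print("mean income:", statistics.mean(income))
print("std dev of income:", statistics.stdev(income))

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * statistics.stdev(xs) * statistics.stdev(ys))

# A value near +1 indicates the two variables tend to increase together
# (correlation, not causation).
print("correlation:", round(pearson(income, age_at_death), 2))
</syntaxhighlight>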
[10830500] |Even when statistics is correctly applied, the results can be difficult for the non-expert to interpret. [10830510] |For example, the [[statistical significance]] of a trend in the data, which measures the extent to which the trend could be caused by random variation in the sample, may not agree with one's intuitive sense of its significance. [10830520] |The set of basic statistical skills (and skepticism) needed by people to deal with information in their everyday lives is referred to as [[statistical literacy]]. [10830530] |==Statistical methods== [10830540] |===Experimental and observational studies=== [10830550] |A common goal for a statistical research project is to investigate [[causality]], and in particular to draw a conclusion on the effect of changes in the values of predictors or [[independent variable]]s on response or [[dependent variable]]s. [10830560] |There are two major types of causal statistical studies, experimental studies and observational studies. [10830570] |In both types of studies, the effect of differences of an independent variable (or variables) on the behavior of the dependent variable is observed. [10830580] |The difference between the two types lies in how the study is actually conducted. [10830590] |Each can be very effective. [10830600] |An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. [10830610] |In contrast, an observational study does not involve experimental manipulation. [10830620] |Instead, data are gathered and correlations between predictors and response are investigated. [10830630] |An example of an experimental study is the famous [[Hawthorne studies]], which attempted to test the effect of changes to the working environment at the Hawthorne plant of the Western Electric Company. [10830640] |The researchers were interested in determining whether increased illumination would increase the productivity of the [[assembly line]] workers. [10830650] |The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and checked whether the changes in illumination affected productivity. [10830660] |It turned out that the productivity indeed improved (under the experimental conditions). [10830663] |(See [[Hawthorne effect]].) [10830665] |However, the study is heavily criticized today for errors in experimental procedures, specifically for the lack of a [[control group]] and [[double-blind|blindedness]]. [10830670] |An example of an observational study is a study which explores the correlation between smoking and lung cancer. [10830680] |This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. [10830690] |In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a [[case-control study]], and then look for the number of cases of lung cancer in each group. [10830700] |The basic steps of an experiment are: [10830710] |# Planning the research, including determining information sources, research subject selection, and [[ethics|ethical]] considerations for the proposed research and method. [10830720] |# [[Design of experiments]], concentrating on the system model and the interaction of independent and dependent variables.
[10830730] |# [[summary statistics|Summarizing a collection of observations]] to feature their commonality by suppressing details. [10830740] |([[Descriptive statistics]]) [10830750] |# Reaching consensus about what [[statistical inference|the observations tell]] us about the world being observed. [10830760] |([[Statistical inference]]) [10830770] |# Documenting / presenting the results of the study. [10830780] |===Levels of measurement=== [10830790] |:''See: [[Levels of measurement|Stanley Stevens' "Scales of measurement" (1946): nominal, ordinal, interval, ratio]]'' [10830800] |There are four types of measurements or [[level of measurement|levels of measurement]] or measurement scales used in statistics: nominal, ordinal, interval, and ratio. [10830810] |They have different degrees of usefulness in statistical [[research]]. [10830820] |Ratio measurements have both a defined zero value and defined distances between different measurements; they provide the greatest flexibility in statistical methods that can be used for analyzing the data. [10830830] |Interval measurements have meaningful distances between measurements defined, but have no meaningful zero value defined (as is the case with IQ measurements or with temperature measurements in [[Fahrenheit]]). [10830840] |Ordinal measurements have imprecise differences between consecutive values, but have a meaningful order to those values. [10830850] |Nominal measurements have no meaningful rank order among values. [10830860] |Since variables conforming only to nominal or ordinal measurements cannot be reasonably measured numerically, they are sometimes grouped together as categorical variables, whereas ratio and interval measurements are grouped together as quantitative or [[continuous variables]] due to their numerical nature. [10830870] |===Statistical techniques=== [10830880] |Some well-known statistical [[Statistical hypothesis testing|test]]s and [[procedure]]s for [[research]] [[observation]]s are: [10830890] |* [[Student's t-test]] [10830900] |* [[chi-square test]] [10830910] |* [[Analysis of variance]] (ANOVA) [10830920] |* [[Mann-Whitney U]] [10830930] |* [[Regression analysis]] [10830940] |* [[Factor Analysis]] [10830950] |* [[Correlation]] [10830960] |* [[Pearson product-moment correlation coefficient]] [10830970] |* [[Spearman's rank correlation coefficient]] [10830980] |* [[Time Series Analysis]] [10830990] |==Specialized disciplines== [10831000] |Some fields of inquiry use applied statistics so extensively that they have [[specialized terminology]].
[10831010] |These disciplines include: [10831020] |* [[Actuarial science]] [10831030] |* [[Applied information economics]] [10831040] |* [[Biostatistics]] [10831050] |* [[Bootstrapping (statistics)|Bootstrap]] & [[Resampling (statistics)|Jackknife Resampling]] [10831060] |* [[Business statistics]] [10831070] |* [[Data analysis]] [10831080] |* [[Data mining]] (applying statistics and [[pattern recognition]] to discover knowledge from data) [10831090] |* [[Demography]] [10831100] |* [[Economic statistics]] (Econometrics) [10831110] |* [[Energy statistics]] [10831120] |* [[Engineering statistics]] [10831130] |* [[Environmental Statistics]] [10831140] |* [[Epidemiology]] [10831150] |* [[Geography]] and [[Geographic Information Systems]], more specifically in [[Spatial analysis]] [10831160] |* [[Image processing]] [10831170] |* [[Multivariate statistics|Multivariate Analysis]] [10831180] |* [[Psychological statistics]] [10831190] |* [[Quality]] [10831200] |* [[Social statistics]] [10831210] |* [[Statistical literacy]] [10831220] |* [[Statistical modeling]] [10831230] |* [[Statistical survey]]s [10831240] |* Process analysis and [[chemometrics]] (for analysis of data from [[analytical chemistry]] and [[chemical engineering]]) [10831250] |* [[Structured data analysis (statistics)]] [10831260] |* [[Survival analysis]] [10831270] |* [[Reliability engineering]] [10831280] |* Statistics in various sports, particularly [[Baseball statistics|baseball]] and [[Cricket statistics|cricket]] [10831290] |Statistics is also a fundamental tool in business and manufacturing. [10831300] |It is used to understand variability in measurement systems, to control processes (as in [[statistical process control]] or SPC), to summarize data, and to make data-driven decisions. [10831310] |In these roles, it is a key tool, and perhaps the only reliable tool. [10831320] |==Statistical computing== [10831330] |The rapid and sustained increases in computing power starting from the second half of the 20th century have had a substantial impact on the practice of statistical science. [10831340] |Early statistical models were almost always from the class of [[linear model]]s, but powerful computers, coupled with suitable numerical [[algorithms]], caused an increased interest in [[nonlinear regression|nonlinear models]] (especially [[neural networks]] and [[decision tree]]s) as well as the creation of new types, such as [[generalized linear model|generalised linear model]]s and [[multilevel model]]s. [10831350] |Increased computing power has also led to the growing popularity of computationally intensive methods based on [[resampling (statistics)|resampling]], such as permutation tests and the [[bootstrapping (statistics)|bootstrap]], while techniques such as [[Gibbs sampling]] have made Bayesian methods more feasible. [10831360] |The computer revolution has implications for the future of statistics, with a new emphasis on "experimental" and "empirical" statistics. [10831370] |A large number of both general and special-purpose [[List of statistical packages|statistical software]] packages are now available. [10831380] |== Misuse == [10831400] |There is a general perception that statistical knowledge is all-too-frequently intentionally [[Misuse of statistics|misused]] by finding ways to interpret only the data that are favorable to the presenter.
[10831410] |A famous saying attributed to [[Benjamin Disraeli]] is, "[[Lies, damned lies, and statistics|There are three kinds of lies: lies, damned lies, and statistics]]"; and Harvard President [[Lawrence Lowell]] wrote in 1909 that statistics, ''"like veal pies, are good if you know the person that made them, and are sure of the ingredients"''. [10831420] |If various studies appear to contradict one another, then the public may come to distrust such studies. [10831430] |For example, one study may suggest that a given diet or activity raises [[blood pressure]], while another may suggest that it lowers blood pressure. [10831440] |The discrepancy can arise from subtle variations in experimental design, such as differences in the patient groups or research protocols, that are not easily understood by the non-expert. [10831450] |(Media reports sometimes omit this vital contextual information entirely.) [10831460] |By choosing (or rejecting, or modifying) a certain sample, results can be manipulated. [10831470] |Such manipulations need not be malicious or devious; they can arise from unintentional biases of the researcher. [10831480] |The graphs used to summarize data can also be misleading. [10831490] |Deeper criticisms come from the fact that the hypothesis testing approach, widely used and in many cases required by law or regulation, forces one hypothesis (the [[null hypothesis]]) to be "favored", and can also seem to exaggerate the importance of minor differences in large studies. [10831500] |A difference that is highly statistically significant can still be of no practical significance. [10831510] |(See [[Hypothesis test#Criticism|criticism of hypothesis testing]] and [[Null hypothesis#Controversy|controversy over the null hypothesis]].) [10831520] |One response is to give greater emphasis to the [[p-value|''p''-value]], rather than simply reporting whether a hypothesis is rejected at the given level of significance. [10831530] |The ''p''-value, however, does not indicate the size of the effect. [10831540] |Another increasingly common approach is to report [[confidence interval]]s. [10831550] |Although these are produced from the same calculations as those of hypothesis tests or ''p''-values, they describe both the size of the effect and the uncertainty surrounding it. [10840010] |
Syntax
[10840020] |In [[linguistics]], '''syntax''' (from [[Ancient Greek]] {{lang|grc|συν-}} ''syn-'', "together", and {{lang|grc|τάξις}} ''táxis'', "arrangement") is the study of the principles and rules for constructing [[sentence]]s in [[natural language]]s. [10840030] |In addition to referring to the discipline, the term ''syntax'' is also used to refer directly to the rules and principles that govern the sentence structure of any individual language, as in "the [[Irish syntax|syntax of Modern Irish]]". [10840040] |Modern research in syntax attempts to [[descriptive linguistics|describe languages]] in terms of such rules. [10840050] |Many professionals in this discipline attempt to find [[Universal Grammar|general rules]] that apply to all natural languages. [10840060] |The term ''syntax'' is also sometimes used to refer to the rules governing the behavior of mathematical systems, such as [[logic]], artificial formal languages, and computer programming languages. [10840070] |== Early history == [10840080] |Works on grammar were being written long before modern syntax came about; the ''Aṣṭādhyāyī'' of [[Pāṇini]] is often cited as an example of a pre-modern work that approaches the sophistication of a modern syntactic theory. [10840090] |In the West, the school of thought that came to be known as "traditional grammar" began with the work of [[Dionysius Thrax]]. [10840100] |For centuries, work in syntax was dominated by a framework known as {{lang|fr|''grammaire générale''}}, first expounded in 1660 by [[Antoine Arnauld]] in a book of the same title. [10840110] |This system took as its basic premise the assumption that language is a direct reflection of thought processes and therefore there is a single, most natural way to express a thought. [10840120] |That way, coincidentally, was exactly the way it was expressed in French. [10840130] |However, in the 19th century, with the development of [[historical-comparative linguistics]], linguists began to realize the sheer diversity of human language, and to question fundamental assumptions about the relationship between language and logic. [10840140] |It became apparent that there was no such thing as a most natural way to express a thought, and therefore logic could no longer be relied upon as a basis for studying the structure of language. [10840150] |The Port-Royal grammar modeled the study of syntax upon that of logic (indeed, large parts of the [[Port-Royal Logic]] were copied or adapted from the ''Grammaire générale''). [10840160] |Syntactic categories were identified with logical ones, and all sentences were analyzed in terms of "Subject – Copula – Predicate". [10840170] |Initially, this view was adopted even by the early comparative linguists such as [[Franz Bopp]]. [10840180] |The central role of syntax within theoretical linguistics became clear only in the 20th century, which could reasonably be called the "century of syntactic theory" as far as linguistics is concerned. [10840190] |For a detailed and critical survey of the history of syntax in the last two centuries, see the monumental work by Graffi (2001). [10840200] |==Modern theories== [10840210] |There are a number of theoretical approaches to the discipline of syntax. [10840220] |Many linguists (e.g. [[Noam Chomsky]]) see syntax as a branch of biology, since they conceive of syntax as the study of linguistic knowledge as embodied in the human [[mind]]. [10840240] |Others (e.g. 
[[Gerald Gazdar]]) take a more [[Philosophy of mathematics#Platonism|Platonistic]] view, since they regard syntax as the study of an abstract [[formal system]]. [10840260] |Yet others (e.g. [[Joseph Greenberg]]) consider grammar a taxonomical device to reach broad generalizations across languages. [10840280] |Some of the major approaches to the discipline are listed below. [10840290] |===Generative grammar=== [10840300] |The hypothesis of [[generative grammar]] is that language is a structure of the human mind. [10840310] |The goal of generative grammar is to make a complete model of this inner language (known as ''[[i-language]]''). [10840320] |This model could be used to describe all human language and to predict the [[grammaticality]] of any given utterance (that is, to predict whether the utterance would sound correct to native speakers of the language). [10840330] |This approach to language was pioneered by [[Noam Chomsky]]. [10840340] |Most generative theories (although not all of them) assume that syntax is based upon the constituent structure of sentences. [10840350] |Generative grammars are among the theories that focus primarily on the form of a sentence, rather than its communicative function. [10840360] |Among the many generative theories of linguistics are: [10840370] |*[[Transformational Grammar]] (TG) (now largely out of date) [10840380] |*[[Government and binding theory]] (GB) (common in the late 1970s and 1980s) [10840390] |*[[Linguistic minimalism|Minimalism]] (MP) (the most recent Chomskyan version of generative grammar) [10840400] |Other theories that find their origin in the generative paradigm are: [10840410] |*[[Generative semantics]] (now largely out of date) [10840420] |*[[Relational grammar]] (RG) (now largely out of date) [10840430] |*[[Arc Pair grammar]] [10840440] |*[[Generalised phrase structure grammar|Generalized phrase structure grammar]] (GPSG; now largely out of date) [10840450] |*[[Head-driven phrase structure grammar]] (HPSG) [10840460] |*[[Lexical-functional grammar]] (LFG) [10840470] |===Categorial grammar=== [10840480] |[[Categorial grammar]] is an approach that attributes the syntactic structure not to rules of grammar, but to the properties of the [[syntactic categories]] themselves. [10840490] |For example, rather than asserting that sentences are constructed by a rule that combines a noun phrase (NP) and a verb phrase (VP) (e.g. the [[phrase structure rule]] S → NP VP), in categorial grammar, such principles are embedded in the category of the [[head (linguistics)|head]] word itself. [10840500] |So the syntactic category for an [[intransitive]] verb is a complex formula representing the fact that the verb acts as a [[functor]] which requires an NP as an input and produces a sentence-level structure as an output. [10840510] |This complex category is notated as (NP\S) instead of V. [10840515] |NP\S is read as "a category that searches to the left (indicated by \) for an NP (the element on the left) and outputs a sentence (the element on the right)". [10840520] |The category of a [[transitive verb]] is defined as an element that requires two NPs (its subject and its direct object) to form a sentence. [10840530] |This is notated as (NP/(NP\S)), which means "a category that searches to the right (indicated by /) for an NP (the object), and generates a function (equivalent to the VP) which is (NP\S), which in turn represents a function that searches to the left for an NP and produces a sentence".
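How such categories combine can be illustrated with a small, purely hypothetical sketch. The helper below is a toy written for the notation used in this section (the argument appears to the left of the slash, the result to the right, and the slash direction indicates where the argument is sought); it is not part of any existing categorial grammar toolkit.
<source lang="python">
# Toy function application for the category notation used above:
#   NP\S       -- seeks an NP to its LEFT and yields S       (intransitive verb)
#   NP/(NP\S)  -- seeks an NP to its RIGHT and yields NP\S   (transitive verb)
# Purely illustrative; not a real categorial grammar parser.

def split_top(cat, slash):
    """Split a category at the first occurrence of `slash` outside parentheses."""
    depth = 0
    for i, ch in enumerate(cat):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == slash and depth == 0:
            return cat[:i], cat[i + 1:].strip("()")
    return None, None

def combine(left_cat, right_cat):
    """Combine two adjacent categories by function application, if possible."""
    arg, result = split_top(right_cat, "\\")   # backward:  NP  NP\S  =>  S
    if arg == left_cat and result:
        return result
    arg, result = split_top(left_cat, "/")     # forward:   NP/(NP\S)  NP  =>  NP\S
    if arg == right_cat and result:
        return result
    return None

# "Mary sleeps": the subject NP combines with the intransitive category NP\S.
print(combine("NP", "NP\\S"))                  # -> S

# "John sees Mary": the transitive category first consumes the object NP on its
# right, yielding NP\S, which then consumes the subject NP on its left.
vp = combine("NP/(NP\\S)", "NP")
print(vp)                                      # -> NP\S
print(combine("NP", vp))                       # -> S
</source>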
[10840540] |[[Tree-adjoining grammar]] is a categorial grammar that adds partial [[tree structure]]s to the categories. [10840550] |===Dependency grammar=== [10840560] |[[Dependency grammar]] is a different type of approach in which structure is determined by the [[relation]]s (such as [[grammatical relation]]s) between a word (a ''[[head (linguistics)|head]]'') and its dependents, rather than being based on constituent structure. [10840570] |For example, syntactic structure is described in terms of whether a particular [[noun]] is the [[subject]] or [[agent]] of the [[verb]], rather than describing the relations in terms of trees (one version of which is the [[parse tree]]) or other structural systems. [10840580] |Some dependency-based theories of syntax: [10840590] |*[[Algebraic syntax]] [10840600] |*[[Word grammar]] [10840610] |*[[Operator Grammar]] [10840620] |===Stochastic/probabilistic grammars/network theories=== [10840630] |Theoretical approaches to syntax that are based upon [[probability theory]] are known as [[stochastic grammar]]s. [10840640] |One common implementation of such an approach makes use of [[neural network]]s ([[connectionism]]). [10840650] |Some theories based within this approach are: [10840660] |*[[Optimality theory]] [10840670] |*[[Stochastic context-free grammar]] [10840680] |===Functionalist grammars=== [10840690] |Functionalist theories, although focused upon form, are driven by explanation based upon the function of a sentence (i.e. its communicative function). [10840700] |Some typical functionalist theories include: [10840710] |*[[Functional grammar]] (Dik) [10840720] |*[[Prague Linguistic Circle]] [10840730] |*[[Systemic functional grammar]] [10840740] |*[[Cognitive grammar]] [10840750] |*[[Construction grammar]] (CxG) [10840760] |*[[Role and reference grammar]] (RRG) [10850010] |
SYSTRAN
[10850020] |'''SYSTRAN''', founded by Dr. [[Peter Toma]] in [[1968]], is one of the oldest [[machine translation]] companies. [10850030] |SYSTRAN has done extensive work for the [[United States Department of Defense]] and the [[European Commission]]. [10850040] |SYSTRAN provides the technology for [[Yahoo!]] and [[AltaVista]]'s [[Babel Fish (website)|Babel Fish]], among others, but [[Google]]'s [[List of Google products#anchor_language_tools|language tools]] stopped using it (circa 2007) for all of the language combinations they offer. [10850050] |Commercial versions of SYSTRAN run on the [[Microsoft Windows]] (including [[Windows Mobile]]), [[Linux]] and [[Solaris (operating system)|Solaris]] operating systems. [10850060] |== History == [10850070] |With its origin in the [[Georgetown-IBM experiment|Georgetown]] machine translation effort, SYSTRAN was one of the few machine translation systems to survive the major decrease in funding after the [[ALPAC|ALPAC Report]] of the mid-1960s. [10850080] |The company was established in [[La Jolla, San Diego, California|La Jolla]], [[California]], to work on the translation of Russian text into English for the [[United States Air Force]] during the "[[Cold War]]". [10850090] |Large numbers of Russian scientific and technical documents were translated using SYSTRAN under the auspices of the USAF Foreign Technology Division (later the National Air and Space Intelligence Center) at [[Wright-Patterson Air Force Base]], Ohio. [10850100] |The quality of the translations, although only approximate, was usually adequate for understanding content. [10850110] |The company was sold in 1986 to the Gachot family, based in [[Paris]], [[France]], and is now traded publicly on the French stock exchange. [10850120] |It has a main office at the [[Grande Arche]] in [[La Defense]] and maintains a secondary office in [[La Jolla, San Diego, California]]. [10850130] |== Languages == [10850140] |The following is a list of the source and target languages SYSTRAN works with. [10850150] |Many of the pairs are to or from English or French. [10850160] |* Russian into English (1968) [10850170] |* English into Russian (1973) for the [[Apollo-Soyuz]] project [10850180] |* English source (1975) for the [[European Commission]] [10850190] |* Arabic [10850200] |* Chinese [10850210] |* Danish [10850220] |* Dutch [10850230] |* French [10850240] |* German [10850250] |* Greek [10850260] |* Hindi [10850270] |* Italian [10850280] |* Japanese [10850290] |* Korean [10850300] |* Norwegian [10850310] |* Serbo-Croatian [10850320] |* Spanish [10850330] |* Swedish [10850340] |* Persian [10850350] |* Polish [10850360] |* Portuguese [10850370] |* Ukrainian [10850380] |* Urdu [10860010] |
Text analytics
[10860020] |The term '''text analytics''' describes a set of linguistic, lexical, pattern recognition, extraction, tagging/structuring, visualization, and predictive techniques. [10860030] |The term also describes processes that apply these techniques, whether independently or in conjunction with query and analysis of fielded, numerical data, to solve business problems. [10860040] |These techniques and processes discover and present knowledge – facts, business rules, and relationships – that is otherwise locked in textual form, impenetrable to automated processing. [10860050] |A typical application is to scan a set of documents written in a [[natural language]] and either model the document set for predictive classification purposes or populate a database or search index with the information extracted. [10860060] |Current approaches to text analytics use [[natural language processing]] techniques that focus on specialized domains. [10860070] |Typical subtasks are: [10860080] |* [[Named Entity Recognition]]: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions (see the brief sketch after this list). [10860090] |* [[Coreference]]: identification of chains of [[noun phrase]]s that refer to the same object. [10860100] |For example, [[Anaphora (linguistics)|anaphora]] is a type of coreference. [10860110] |* [[Relationship Extraction]]: extraction of named relationships between entities in text.
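As a concrete illustration of the named-entity recognition subtask above, the short sketch below runs an off-the-shelf recognizer over a single sentence. The third-party spaCy library and its small English model are used here only as one possible tool of this kind, not as the method of any particular text-analytics product.
<source lang="python">
# Illustrative named-entity recognition using the open-source spaCy library.
# Assumes:  pip install spacy  and  python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")   # small English pipeline that includes an NER component
doc = nlp("SYSTRAN was founded by Peter Toma in 1968 in La Jolla, California.")

# Each recognized entity carries a surface string and a type label
# (e.g. PERSON, ORG, GPE, DATE), ready to populate a database or search index.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)
</source>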