[10710010] |
Predictive analytics
[10710020] |'''Predictive analytics''' encompasses a variety of techniques from [[statistics]] and [[data mining]] that analyze current and historical data to make predictions about future events. [10710030] |Such predictions rarely take the form of absolute statements, and are more likely to be expressed as values that correspond to the odds of a particular event or behavior taking place in the future. [10710040] |In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. [10710050] |Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions. [10710060] |One of the most well-known applications is [[credit scoring]], which is used throughout [[financial services]]. [10710070] |Scoring models process a customer’s [[credit history]], [[loan application]], customer data, etc., in order to rank-order individuals by their likelihood of making future credit payments on time. [10710080] |Predictive analytics are also used in [[insurance]], [[telecommunications]], [[retail]], [[travel]], [[healthcare]], [[Pharmaceutical company|pharmaceuticals]] and other fields. [10710090] |== Types of predictive analytics == [10710100] |Generally, predictive analytics is used to mean [[predictive modeling]], scoring of predictive models, and [[forecasting]]. [10710110] |However, people are increasingly using the term to describe related analytic disciplines, such as descriptive modeling and decision modeling or optimization. [10710120] |These disciplines also involve rigorous data analysis, and are widely used in business for segmentation and decision making, but have different purposes and the statistical techniques underlying them vary. 
[10710130] |===Predictive models=== [10710140] |Predictive models analyze past performance to assess how likely a customer is to exhibit a specific behavior in the future in order to improve [[marketing effectiveness]]. [10710150] |This category also encompasses models that seek out subtle data patterns to answer questions about customer performance, such as fraud detection models. [10710160] |Predictive models often perform calculations during live transactions, for example, to evaluate the risk or opportunity of a given customer or transaction, in order to guide a decision. [10710170] |===Descriptive models=== [10710180] |Descriptive models “describe” relationships in data in a way that is often used to classify customers or prospects into groups. [10710190] |Unlike predictive models that focus on predicting a single customer behavior (such as credit risk), descriptive models identify many different relationships between customers or products. [10710200] |However, descriptive models do not rank-order customers by their likelihood of taking a particular action the way predictive models do. [10710210] |Descriptive models are often used “offline,” for example, to categorize customers by their product preferences and life stage. [10710220] |Descriptive modeling tools can be used to develop agent-based models that simulate a large number of individualized agents to predict possible futures. [10710230] |===Decision models=== [10710240] |Decision models describe the relationship between all the elements of a decision (the known data, including results of predictive models; the decision; and the forecast results of the decision) in order to predict the results of decisions involving many variables. [10710250] |These models can be used in optimization, a data-driven approach to improving decision logic that involves maximizing certain outcomes while minimizing others. 
[10710260] |Decision models are generally used offline, to develop decision logic or a set of business rules that will produce the desired action for every customer or circumstance. [10710270] |== Predictive analytics == [10710280] |===Definition=== [10710290] |Predictive analytics is an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns. [10710300] |The core of predictive analytics relies on capturing relationships between explanatory variables and the predicted variables from past occurrences, and exploiting them to predict future outcomes. [10710310] |===Current uses=== [10710320] |Although predictive analytics can be put to use in many applications, we outline a few examples where predictive analytics has shown positive impact in recent years. [10710330] |====Analytical Customer Relationship Management (CRM)==== [10710340] |Analytical [[Customer Relationship Management]] is a frequent commercial application of predictive analytics. [10710350] |Methods of predictive analytics are applied to customer data to pursue CRM objectives. [10710360] |====Direct marketing==== [10710370] |Product [[marketing]] is constantly faced with the challenge of coping with the increasing number of competing products, different consumer preferences and the variety of methods (channels) available to interact with each consumer. [10710380] |Efficient marketing is a process of understanding the amount of variability and tailoring the marketing strategy for greater profitability. [10710390] |Predictive analytics can help identify consumers with a higher likelihood of responding to a particular marketing offer. [10710400] |Models can be built using data from consumers’ past purchasing history and past response rates for each channel. [10710410] |Additional information about the consumers’ demographic, geographic and other characteristics can be used to make more accurate predictions. 
[10710420] |Targeting only these consumers can lead to a substantial increase in response rate, which in turn can significantly reduce the cost per acquisition. [10710430] |Apart from identifying prospects, predictive analytics can also help to identify the most effective combination of products and marketing channels that should be used to target a given consumer. [10710440] |====Cross-sell==== [10710450] |Corporate organizations often collect and maintain abundant data (e.g. customer records, sales transactions), and exploiting hidden relationships in the data can provide a competitive advantage to the organization. [10710460] |For an organization that offers multiple products, an analysis of existing customer behavior can lead to efficient [[cross-selling|cross sell]] of products. [10710470] |This directly leads to higher profitability per customer and strengthening of the customer relationship. [10710480] |Predictive analytics can help analyze customers’ spending, usage and other behavior, and help cross-sell the right product at the right time. [10710490] |====Customer retention==== [10710500] |With the number of competing services available, businesses need to focus efforts on maintaining continuous [[consumer satisfaction]]. [10710510] |In such a competitive scenario, [[consumer loyalty]] needs to be rewarded and [[customer attrition]] needs to be minimized. [10710520] |Businesses tend to respond to customer attrition on a reactive basis, acting only after the customer has initiated the process to terminate service. [10710530] |At this stage, changing the customer’s decision is almost impossible. [10710540] |Proper application of predictive analytics can lead to a more proactive retention strategy. [10710550] |By frequently examining a customer’s past service usage, service performance, spending and other behavior patterns, predictive models can determine the likelihood of a customer wanting to terminate service sometime in the near future. 
[10710560] |An intervention with lucrative offers can increase the chance of retaining the customer. [10710570] |Silent attrition is the behavior of a customer to slowly but steadily reduce usage and is another problem faced by many companies. [10710580] |Predictive analytics can also predict this behavior accurately and before it occurs, so that the company can take proper actions to increase customer activity. [10710590] |====Underwriting==== [10710600] |Many businesses have to account for risk exposure due to their different services and determine the cost needed to cover the risk. [10710610] |For example, auto insurance providers need to accurately determine the amount of premium to charge to cover each automobile and driver. [10710620] |A financial company needs to assess a borrower’s potential and ability to pay before granting a loan. [10710630] |For a health insurance provider, predictive analytics can analyze a few years of past medical claims data, as well as lab, pharmacy and other records where available, to predict how expensive an enrollee is likely to be in the future. [10710640] |Predictive analytics can help [[underwriting]] of these quantities by predicting the chances of illness, [[Default (finance)|default]], [[bankruptcy]], etc. [10710650] |Predictive analytics can streamline the process of customer acquisition, by predicting the future risk behavior of a customer using application level data. [10710660] |Proper predictive analytics can lead to proper pricing decisions, which can help mitigate future risk of default. [10710670] |====Collection analytics==== [10710680] |Every portfolio has a set of delinquent customers who do not make their payments on time. [10710690] |The financial institution has to undertake collection activities on these customers to recover the amounts due. [10710700] |A lot of collection resources are wasted on customers who are difficult or impossible to recover. 
[10710710] |Predictive analytics can help optimize the allocation of collection resources by identifying the most effective collection agencies, contact strategies, legal actions and other strategies for each customer, thus significantly increasing recovery while reducing collection costs. [10710720] |====Fraud detection==== [10710730] |Fraud is a big problem for many businesses and can be of various types. [10710740] |Inaccurate credit applications, fraudulent transactions, [[identity theft]]s and false insurance claims are some examples of this problem. [10710750] |These problems plague firms all across the spectrum and some examples of likely victims are [[Credit card fraud|credit card issuers]], insurance companies, retail merchants, manufacturers, business-to-business suppliers and even service providers. [10710760] |This is an area where a predictive model is often used to help weed out the “bads” and reduce a business's exposure to fraud. [10710770] |====Portfolio, product or economy level prediction==== [10710780] |Often the focus of analysis is not the consumer but the product, portfolio, firm, industry or even the economy. [10710790] |For example, a retailer might be interested in predicting store level demand for inventory management purposes. [10710800] |Or the Federal Reserve Board might be interested in predicting the unemployment rate for the next year. [10710810] |These types of problems can be addressed by predictive analytics using time series techniques (see below). [10710830] |==Statistical techniques== [10710840] |The approaches and techniques used to conduct predictive analytics can broadly be grouped into regression techniques and machine learning techniques. [10710850] |====Regression Techniques==== [10710860] |Regression models are the mainstay of predictive analytics. 
[10710870] |The focus lies on establishing a mathematical equation as a model to represent the interactions between the different variables in consideration. [10710880] |Depending on the situation, there is a wide variety of models that can be applied while performing predictive analytics. [10710890] |Some of them are briefly discussed below. [10710900] |=====Linear Regression Model===== [10710910] |The linear regression model analyzes the relationship between the response or dependent variable and a set of independent or predictor variables. [10710920] |This relationship is expressed as an equation that predicts the response variable as a linear function of the parameters. [10710930] |These parameters are adjusted so that a measure of fit is optimized. [10710940] |Much of the effort in model fitting is focused on minimizing the size of the residual, as well as ensuring that it is randomly distributed with respect to the model predictions. [10710950] |The goal of regression is to select the parameters of the model so as to minimize the sum of the squared residuals. [10710960] |This is referred to as '''[[ordinary least squares]]''' (OLS) estimation and results in best linear unbiased estimates (BLUE) of the parameters provided the [[Gauss–Markov theorem|Gauss–Markov]] assumptions are satisfied. [10710970] |Once the model has been estimated we would be interested to know if the predictor variables belong in the model – i.e. is the estimate of each variable’s contribution reliable? [10710980] |To do this we can check the statistical significance of the model’s coefficients, which can be measured using the t-statistic. [10710990] |This amounts to testing whether the coefficient is significantly different from zero. [10711000] |How well the model predicts the dependent variable based on the value of the independent variables can be assessed by using the R² statistic. [10711010] |It measures the predictive power of the model, i.e. 
the proportion of the total variation in the dependent variable that is “explained” (accounted for) by variation in the independent variables. [10711020] |====Discrete choice models==== [10711030] |Multivariate regression (above) is generally used when the response variable is continuous and has an unbounded range. [10711040] |Often the response variable may not be continuous but rather discrete. [10711050] |While mathematically it is feasible to apply multivariate regression to discrete ordered dependent variables, some of the assumptions behind the theory of multivariate linear regression no longer hold, and there are other techniques such as discrete choice models which are better suited for this type of analysis. [10711060] |If the dependent variable is discrete, some of those superior methods are [[logistic regression]], [[multinomial logit]] and [[probit]] models. [10711070] |Logistic regression and probit models are used when the dependent variable is [[binary numeral system|binary]]. [10711080] |=====Logistic regression===== [10711090] |In a classification setting, assigning outcome probabilities to observations can be achieved through the use of a logistic model, which is basically a method which transforms information about the binary dependent variable into an unbounded continuous variable and estimates a regular multivariate model (See Allison’s Logistic Regression for more information on the theory of Logistic Regression). [10711100] |The [[Wald test|Wald]] and [[likelihood-ratio test]] are used to test the statistical significance of each coefficient b in the model (analogous to the t tests used in OLS regression; see above). [10711110] |A test assessing the goodness-of-fit of a classification model is the [[Hosmer and Lemeshow test]]. [10711120] |=====Multinomial logistic regression===== [10711130] |An extension of the [[binary logit model]] to cases where the dependent variable has more than 2 categories is the [[multinomial logit model]]. 
[10711140] |In such cases collapsing the data into two categories might not make good sense or may lead to loss in the richness of the data. [10711150] |The multinomial logit model is the appropriate technique in these cases, especially when the dependent variable categories are not ordered (for example, colors like red, blue, green). [10711160] |Some authors have extended multinomial regression to include feature selection/importance methods such as [[Random multinomial logit]]. [10711170] |=====Probit regression===== [10711180] |Probit models offer an alternative to logistic regression for modeling categorical dependent variables. [10711190] |Even though the outcomes tend to be similar, the underlying distributions are different. [10711200] |Probit models are popular in social sciences like economics. [10711210] |A good way to understand the key difference between probit and logit models is to assume that there is a latent variable z. [10711220] |We do not observe z but instead observe y which takes the value 0 or 1. [10711230] |In the logit model we assume that the error term of z follows a logistic distribution. [10711240] |In the probit model we assume that the error term of z follows a standard normal distribution. [10711250] |Note that in social sciences (for example, economics), probit is often used to model situations where the observed variable y is continuous but takes values between 0 and 1. [10711260] |=====Logit vs. Probit===== [10711270] |The probit model has been around longer than the logit model. [10711280] |The two distributions look almost identical, except that the logistic distribution has slightly heavier tails. [10711290] |In fact one of the reasons the logit model was formulated was that the probit model was extremely hard to compute because it involved calculating difficult integrals. [10711300] |Modern computing however has made this computation fairly simple. [10711310] |The coefficients obtained from the logit and probit model are also fairly close. 
[10711320] |However the odds ratio makes the logit model easier to interpret. [10711330] |For practical purposes the only reasons for choosing the probit model over the logistic model would be: [10711340] |* There is a strong belief that the underlying distribution is normal [10711350] |* The actual event is not a binary outcome (e.g. Bankrupt/not bankrupt) but a proportion (e.g. Proportion of population at different debt levels). [10711360] |==== Time series models==== [10711370] |[[Time series]] models are used for predicting or forecasting the future behavior of variables. [10711380] |These models account for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation) that should be accounted for. [10711390] |As a result standard regression techniques cannot be applied to time series data and methodology has been developed to decompose the trend, seasonal and cyclical component of the series. [10711400] |Modeling the dynamic path of a variable can improve forecasts since the predictable component of the series can be projected into the future. [10711410] |Time series models estimate difference equations containing stochastic components. [10711420] |Two commonly used forms of these models are [[autoregressive model]]s (AR) and [[Moving average (technical analysis)|moving average]] (MA) models. [10711430] |The [[Box-Jenkins]] methodology (1976) developed by George Box and G.M. Jenkins combines the AR and MA models to produce the [[Autoregressive moving average model|ARMA]] (autoregressive moving average) model which is the cornerstone of stationary time series analysis. [10711440] |ARIMA (autoregressive integrated moving average models) on the other hand are used to describe non-stationary time series. [10711450] |Box and Jenkins suggest differencing a non stationary time series to obtain a stationary series to which an ARMA model can be applied. 
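The differencing step can be illustrated with a minimal Python sketch. The series below is hypothetical (a linear trend plus a small repeating pattern), and the helper names `difference` and `mean` are illustrative assumptions rather than part of any particular package. Comparing the means of the two halves of each series is only a rough check of stationarity, not a formal test.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def mean(values):
    return sum(values) / len(values)

# Hypothetical non-stationary series: a linear trend plus a repeating wiggle.
wiggle = [0.5, -0.5, 0.25, -0.25]
series = [2.0 * t + wiggle[t % 4] for t in range(40)]

diffed = difference(series)

# The raw series drifts upward, so the means of its two halves differ widely.
raw_gap = abs(mean(series[20:]) - mean(series[:20]))
# After one round of differencing the trend is gone and both halves
# have nearly the same mean (close to the slope of the trend).
diff_gap = abs(mean(diffed[20:]) - mean(diffed[:20]))
```

In this sketch one round of differencing is enough because the trend is linear; an ARMA model would then be fitted to `diffed` rather than to the raw series.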
[10711460] |Non stationary time series have a pronounced trend and do not have a constant long-run mean or variance. [10711470] |Box and Jenkins proposed a three stage methodology which includes: model identification, estimation and validation. [10711480] |The identification stage involves identifying if the series is stationary or not and the presence of seasonality by examining plots of the series, autocorrelation and partial autocorrelation functions. [10711490] |In the estimation stage, models are estimated using non-linear time series or maximum likelihood estimation procedures. [10711500] |Finally the validation stage involves diagnostic checking such as plotting the residuals to detect outliers and evidence of model fit. [10711510] |In recent years time series models have become more sophisticated and attempt to model conditional heteroskedasticity with models such as ARCH ([[autoregressive conditional heteroskedasticity]]) and GARCH (generalized autoregressive conditional heteroskedasticity) models frequently used for financial time series. [10711520] |In addition time series models are also used to understand inter-relationships among economic variables represented by systems of equations using VAR (vector autoregression) and structural VAR models. [10711530] |==== Survival or duration analysis==== [10711540] |[[Survival analysis]] is another name for time to event analysis. [10711550] |These techniques were primarily developed in the medical and biological sciences, but they are also widely used in the social sciences like economics, as well as in engineering (reliability and failure time analysis). [10711560] |Censoring and non-normality which are characteristic of survival data generate difficulty when trying to analyze the data using conventional statistical models such as multiple linear regression. 
[10711570] |The Normal distribution, being a symmetric distribution, takes positive as well as negative values, but duration by its very nature cannot be negative and therefore normality cannot be assumed when dealing with duration/survival data. [10711580] |Hence the normality assumption of regression models is violated. [10711590] |A censored observation is defined as an observation with incomplete information. [10711600] |Censoring introduces distortions into traditional statistical methods and is essentially a defect of the sample data. [10711610] |The assumption is that if the data were not censored it would be representative of the population of interest. [10711620] |In survival analysis, censored observations arise whenever the dependent variable of interest represents the time to a terminal event, and the duration of the study is limited in time. [10711630] |An important concept in survival analysis is the hazard rate. [10711640] |The hazard rate is defined as the probability that the event will occur at time t conditional on surviving until time t. [10711650] |Another concept related to the hazard rate is the survival function which can be defined as the probability of surviving to time t. [10711660] |Most models try to model the hazard rate by choosing the underlying distribution depending on the shape of the hazard function. [10711670] |A distribution whose hazard function slopes upward is said to have positive duration dependence, a decreasing hazard shows negative duration dependence whereas constant hazard is a process with no memory usually characterized by the exponential distribution. [10711680] |Some of the distributional choices in survival models are: F, gamma, Weibull, log normal, inverse normal, exponential etc. [10711690] |All these distributions are for a non-negative random variable. [10711700] |Duration models can be parametric, non-parametric or semi-parametric. 
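The hazard rate and survival function described above can be estimated directly from censored data. The following Python sketch uses a product-limit (Kaplan-Meier style) calculation in discrete time; the data and the function name `kaplan_meier` are hypothetical and chosen only for illustration. Censored records count towards the number at risk until they leave the study, but never count as events.

```python
def kaplan_meier(durations, observed):
    """Product-limit estimate of the survival function.

    durations: time at which each subject left the study
    observed:  True if the event occurred, False if the record is censored
    Returns a list of (time, hazard, survival) for each distinct event time.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    table = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Uncensored events that happen exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        hazard = events / at_risk      # P(event at t | survived until t)
        survival *= 1.0 - hazard       # P(surviving beyond t)
        table.append((t, hazard, survival))
    return table

# Hypothetical data: six customers, two of them censored.
durations = [2, 3, 3, 5, 8, 8]
observed = [True, True, False, True, True, False]
table = kaplan_meier(durations, observed)
```

Each row of `table` pairs an event time with the estimated hazard at that time and the estimated probability of surviving beyond it.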
[10711710] |Some of the models commonly used are the Kaplan-Meier estimator (non-parametric) and the Cox proportional hazards model (semi-parametric). [10711720] |==== Classification and regression trees==== [10711730] |Classification and regression trees (CART) is a [[non-parametric statistics|non-parametric]] technique that produces either classification or regression trees, depending on whether the dependent variable is categorical or numeric, respectively. [10711740] |Trees are formed by a collection of rules based on values of certain variables in the modeling data set: [10711750] |* Rules are selected based on how well splits based on variables’ values can differentiate observations based on the dependent variable [10711760] |* Once a rule is selected and splits a node into two, the same logic is applied to each “child” node (i.e. it is a recursive procedure) [10711770] |* Splitting stops when CART detects no further gain can be made, or some pre-set stopping rules are met [10711780] |Each branch of the tree ends in a terminal node: [10711790] |* Each observation falls into exactly one terminal node [10711800] |* Each terminal node is uniquely defined by a set of rules [10711810] |A very popular method for predictive analytics is Leo Breiman's [[Random forests]] or derived versions of this technique like [[Random multinomial logit]]. [10711820] |==== Multivariate adaptive regression splines==== [10711830] |[[Multivariate adaptive regression splines]] (MARS) is a [[Non-parametric statistics|non-parametric]] technique that builds flexible models by fitting [[piecewise linear regression]]s. [10711840] |An important concept associated with regression splines is that of a knot. [10711850] |A knot is where one local regression model gives way to another and thus is the point of intersection between two splines. [10711860] |In multivariate adaptive regression splines, [[basis function]]s are the tool used for generalizing the search for knots. 
[10711870] |Basis functions are a set of functions used to represent the information contained in one or more variables. [10711880] |The MARS model almost always creates the basis functions in pairs. [10711890] |The MARS approach deliberately overfits the model and then prunes it to get to the optimal model. [10711900] |The algorithm is computationally very intensive and in practice we are required to specify an upper limit on the number of basis functions. [10711910] |=== Machine learning techniques=== [10711920] |[[Machine learning]], a branch of artificial intelligence, was originally employed to develop techniques to enable computers to learn. [10711930] |Today, since it includes a number of advanced statistical methods for regression and classification, it finds application in a wide variety of fields including [[medical diagnostics]], [[credit card fraud detection]], [[Face recognition|face]] and [[speech recognition]] and analysis of the [[stock market]]. [10711940] |In certain applications it is sufficient to directly predict the dependent variable without focusing on the underlying relationships between variables. [10711950] |In other cases, the underlying relationships can be very complex and the mathematical form of the dependencies unknown. [10711960] |For such cases, machine learning techniques emulate [[human cognition]] and learn from training examples to predict future events. [10711970] |A brief discussion of some of the methods commonly used for predictive analytics is provided below. [10711980] |A detailed study of machine learning can be found in Mitchell (1997). [10711990] |==== Neural networks==== [10712000] |[[Neural networks]] are sophisticated [[Nonlinearity|nonlinear]] modeling techniques that are able to [[Model (abstract)|model]] complex functions. 
[10712010] |They can be applied to problems of [[Time series|prediction]], [[Statistical classification|classification]] or [[Control theory|control]] in a wide spectrum of fields such as [[finance]], [[cognitive psychology]]/[[cognitive neuroscience|neuroscience]], [[medicine]], [[engineering]], and [[physics]]. [10712020] |Neural networks are used when the exact nature of the relationship between inputs and output is not known. [10712030] |A key feature of neural networks is that they learn the relationship between inputs and output through training. [10712040] |There are two types of training in neural networks used by different networks, [[Supervised learning|supervised]] and [[Unsupervised learning|unsupervised]] training, with supervised being the most common one. [10712050] |Some examples of neural network training techniques are [[backpropagation]], quick propagation, [[Conjugate gradient method|conjugate gradient descent]], [[Radial basis function|projection operator]], Delta-Bar-Delta etc. [10712060] |These are applied to network architectures such as multilayer [[perceptron]]s, [[Self-organizing map|Kohonen network]]s, [[Hopfield network]]s, etc. [10712070] |====Radial basis functions==== [10712080] |A [[radial basis function]] (RBF) is a function which has built into it a distance criterion with respect to a center. [10712090] |Such functions can be used very efficiently for interpolation and for smoothing of data. [10712100] |Radial basis functions have been applied in the area of [[neural network]]s where they are used as a replacement for the sigmoidal transfer function. [10712110] |Such networks have three layers: the input layer, the hidden layer with the RBF non-linearity, and a linear output layer. [10712120] |The most popular choice for the non-linearity is the Gaussian. [10712130] |RBF networks have the advantage of not getting locked into local minima, as [[feed-forward]] networks such as the multilayer perceptron can. 
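The three-layer structure of an RBF network can be sketched in a few lines of Python. This shows only the forward pass: the centers, width and output weights are hypothetical values set by hand, not the result of training, and the function names are illustrative assumptions.

```python
import math

def gaussian_rbf(x, center, width):
    """Gaussian radial basis function: depends only on the distance to center."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))

def rbf_network(x, centers, width, weights, bias=0.0):
    """Three-layer RBF network: input -> Gaussian hidden units -> linear output."""
    hidden = [gaussian_rbf(x, c, width) for c in centers]
    return bias + sum(w * h for w, h in zip(weights, hidden))

# Hypothetical network with two hidden units; weights are chosen by hand.
centers = [(0.0, 0.0), (1.0, 1.0)]
weights = [1.0, -1.0]

# At the first center the first hidden unit responds fully (activation 1.0).
at_first_center = rbf_network((0.0, 0.0), centers, width=0.5, weights=weights)
# Halfway between the centers both units respond equally and cancel out.
at_midpoint = rbf_network((0.5, 0.5), centers, width=0.5, weights=weights)
```

Because the output layer is linear in the hidden activations, the output weights for fixed centers and widths can be fitted with ordinary linear least squares.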
[10712140] |==== Support vector machines==== [10712150] |[[Support Vector Machine]]s (SVM) are used to detect and exploit complex patterns in data by clustering, classifying and ranking the data. [10712160] |They are learning machines that are used to perform binary classifications and regression estimations. [10712170] |They commonly use kernel based methods to apply linear classification techniques to non-linear classification problems. [10712180] |There are a number of types of SVM such as linear, polynomial, sigmoid etc. [10712190] |==== Naïve Bayes==== [10712200] |[[Naive Bayes classifier|Naïve Bayes]] based on Bayes conditional probability rule is used for performing classification tasks. [10712210] |Naïve Bayes assumes the predictors are statistically independent which makes it an effective classification tool that is easy to interpret. [10712220] |It is best employed when faced with the problem of ‘curse of dimensionality’ i.e. when the number of predictors is very high. [10712230] |==== k-nearest neighbours==== [10712240] |The [[K-nearest neighbor algorithm|nearest neighbour algorithm]] (KNN) belongs to the class of pattern recognition statistical methods. [10712250] |The method does not impose a priori any assumptions about the distribution from which the modeling sample is drawn. [10712260] |It involves a training set with both positive and negative values. [10712270] |A new sample is classified by calculating the distance to the nearest neighbouring training case. [10712280] |The sign of that point will determine the classification of the sample. [10712290] |In the k-nearest neighbour classifier, the k nearest points are considered and the sign of the majority is used to classify the sample. 
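The majority-vote rule can be written as a short Python sketch. The training points and the function name `knn_classify` are hypothetical, Euclidean distance is assumed as the distance measure, and labels are coded as +1 and -1 to match the description above.

```python
import math
from collections import Counter

def knn_classify(sample, training_points, training_labels, k=3):
    """Classify sample by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(sample, point), label)
        for point, label in zip(training_points, training_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set with labels +1 and -1.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = [-1, -1, -1, +1, +1, +1]

near_origin = knn_classify((0.5, 0.5), points, labels, k=3)
near_far_cluster = knn_classify((5.5, 5.5), points, labels, k=3)
```

Setting `k=1` reduces this to the plain nearest neighbour rule; an odd `k` avoids tied votes when there are two classes.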
[10712300] |The performance of the kNN algorithm is influenced by three main factors: (1) the distance measure used to locate the nearest neighbours; (2) the decision rule used to derive a classification from the k-nearest neighbours; and (3) the number of neighbours used to classify the new sample. [10712310] |It can be proved that, unlike other methods, this method is universally asymptotically convergent, i.e., as the size of the training set increases, if the observations are iid, regardless of the distribution from which the sample is drawn, the predicted class will converge to the class assignment that minimizes misclassification error. [10712320] |See Devroye et al. [10712330] |==Popular tools== [10712340] |There are numerous tools available in the marketplace which help with the execution of predictive analytics. [10712350] |These range from those which need very little user sophistication to those that are designed for the expert practitioner. [10712360] |The difference between these tools is often in the level of customization and heavy data lifting allowed. [10712370] |For traditional statistical modeling some of the popular tools are [[DAP (software)|DAP]]/[[SAS Institute|SAS]], S-Plus, [[PSPP]]/[[SPSS]] and Stata. [10712380] |For machine learning/data mining type of applications, KnowledgeSEEKER, KnowledgeSTUDIO, Enterprise Miner, GeneXproTools, [[Viscovery]], Clementine, [[KXEN Inc.|KXEN Analytic Framework]], [[InforSense]] and Excel Miner are some of the popularly used options. [10712390] |Classification tree analysis can be performed using CART software. [10712400] |SOMine is a predictive analytics tool based on [[self-organizing map]]s (SOMs) available from [[Viscovery Software]]. [10712410] |[[R (programming_language)|R]] is a very powerful tool that can be used to perform almost any kind of statistical analysis, and is freely downloadable. 
[10712420] |[[WEKA]] is a freely available [[open source|open-source]] collection of [[machine learning]] methods for pattern classification, regression, clustering, and some types of meta-learning, which can be used for predictive analytics. [10712430] |[[RapidMiner]] is another freely available integrated [[open source|open-source]] software environment for predictive analytics, [[data mining]], and [[machine learning]] fully integrating WEKA and providing an even larger number of methods for predictive analytics. [10712440] |Recently, in an attempt to provide a standard language for expressing predictive models, the [[Predictive Model Markup Language]] (PMML) has been proposed. [10712450] |Such an XML-based language provides a way for the different tools to define predictive models and to share these between PMML compliant applications. [10712460] |Several tools already produce or consume PMML documents, these include [[ADAPA]], [[IBM DB2]] Warehouse, CART, SAS Enterprise Miner, and [[SPSS]]. [10712470] |Predictive analytics has also found its way into the IT lexicon, most notably in the area of IT Automation. [10712480] |Vendors such as [[Stratavia]] and their [[Data Palette]] product offer predictive analytics as part of their automation platform, predicting how resources will behave in the future and automate the environment accordingly. [10712490] |The widespread use of predictive analytics in industry has led to the proliferation of numerous productized solutions firms. [10712500] |Some of them are highly specialized (focusing, for example, on fraud detection, automatic saleslead generation or response modeling) in a specific domain ([[Fair Isaac]] for credit card scores) or industry verticals (MarketRx in Pharmaceutical). [10712510] |Others provide predictive analytics services in support of a wide range of business problems across industry verticals ([[Fifth C]]). 
[10712520] |Predictive analytics competitions are also fairly common and often pit academics against industry practitioners (see, for example, the KDD Cup). [10712530] |==Conclusion== [10712540] |Predictive analytics adds great value to a business's decision-making capabilities by allowing it to formulate smart policies on the basis of predictions of future outcomes. [10712550] |A broad range of tools and techniques is available for this type of analysis, and their selection is determined by the analytical maturity of the firm as well as the specific requirements of the problem being solved. [10712560] |==Education== [10712570] |Predictive analytics is taught at the following institutions: [10712580] |* Ghent University, Belgium: [http://www.mma.UGent.be Master of Marketing Analysis], an 8-month advanced master's degree taught in English with strong emphasis on applications of predictive analytics in Analytical CRM. [10720010] |
RapidMiner
[10720020] |'''RapidMiner''' (formerly YALE (Yet Another Learning Environment)) is an environment for [[machine learning]] and [[data mining]] experiments. [10720030] |It allows experiments to be made up of a large number of arbitrarily nestable operators, described in [[XML]] files which can easily be created with RapidMiner's [[graphical user interface]]. [10720040] |Applications of RapidMiner cover both research and real-world data mining tasks. [10720050] |The initial version has been developed by the Artificial Intelligence Unit of [[Dortmund University of Technology|University of Dortmund]] since [[2001]]. [10720060] |It is distributed under a [[GNU]] license, and has been hosted by [[SourceForge]] since [[2004]]. [10720070] |RapidMiner provides more than 400 operators for all main machine learning procedures, including input and output, and data preprocessing and visualization. [10720080] |It is written in the [[Java (programming language)|Java programming language]] and therefore can work on all popular operating systems. [10720090] |It also integrates all learning schemes and attribute evaluators of the [[Weka (machine learning)|Weka]] learning environment. 
[10720100] |== Properties == [10720110] |Some properties of RapidMiner are: [10720120] |* written in Java [10720130] |* [[knowledge discovery]] processes are modeled as operator trees [10720140] |* internal XML representation ensures standardized interchange format of data mining experiments [10720150] |* scripting language allows for automatic large-scale experiments [10720160] |* multi-layered data view concept ensures efficient and transparent data handling [10720170] |* [[graphical user interface]], [[command line]] mode ([[Batch file|batch mode]]), and [[Java API]] for using RapidMiner from your own programs [10720180] |* [[plugin]] and [[Extension (computing)|extension]] mechanisms, several plugins already exist [10720190] |* [[plotting]] facility offering a large set of high-dimensional visualization schemes for data and models [10720200] |* applications include [[text mining]], multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining. [10730010] |
Russian language
[10730020] |'''Russian''' ({{IPA-ru|ˈruskʲɪj jɪˈzɨk}}) is the most geographically widespread language of [[Eurasia]], the most widely spoken of the [[Slavic languages]], and the largest [[native language]] in [[Europe]]. [10730030] |Russian belongs to the family of [[Indo-European languages]] and is one of three (or, according to some authorities, four) living members of the [[East Slavic languages]], the others being [[Belarusian language|Belarusian]] and [[Ukrainian language|Ukrainian]] (and possibly [[Rusyn language|Rusyn]], often considered a dialect of Ukrainian). [10730040] |It is also spoken in the countries of the [[Russophone]] world. [10730050] |Written examples of Old East Slavonic are attested from the 10th century onwards. [10730060] |Today Russian is widely used outside [[Russia]]. [10730070] |It serves as a means of coding and storing universal knowledge: 60–70% of all world information is published in English and Russian. [10730080] |Over a quarter of the world's scientific literature is published in Russian. [10730090] |Russian is also a necessary accessory of world communications systems (broadcasts, air and space communication, etc.). [10730100] |Due to the status of the [[Soviet Union]] as a [[superpower]], Russian had great political importance in the 20th century. [10730110] |Hence, the language is one of the [[United Nations#Languages|official languages]] of the [[United Nations]]. [10730120] |Russian distinguishes between [[consonant]] [[phoneme]]s with [[palatalization|palatal]] [[secondary articulation]] and those without, the so-called ''soft'' and ''hard'' sounds. [10730130] |This distinction is found between pairs of almost all consonants and is one of the most distinguishing features of the language.
[10730140] |Another important aspect is the [[vowel reduction|reduction]] of [[stress (linguistics)|unstressed]] [[vowel]]s, which is somewhat similar to [[Unstressed and reduced vowels in English|that of English]]. [10730150] |Stress, which is unpredictable, is not normally indicated orthographically. [10730160] |According to the Institute of Russian Language of the Russian Academy of Sciences, an optional [[acute accent]] may, and sometimes should, be used to mark stress. [10730170] |For example, it is used to distinguish between otherwise identical words, especially when context doesn't make it obvious: ''замо́к/за́мок'' (lock/castle), ''сто́ящий/стоя́щий'' (worthwhile/standing), ''чудно́/чу́дно'' (this is odd/this is marvellous), ''молоде́ц/мо́лодец'' (attaboy/fine young man), ''узна́ю/узнаю́'' (I shall learn it/I am learning it), ''отреза́ть/отре́зать'' (imperfective/perfective infinitive of "cut off"); to indicate the proper pronunciation of uncommon words, especially personal and family names (''афе́ра, гу́ру, Гарси́а, Оле́ша, Фе́рми''); and to mark the stressed word in a sentence (''Ты́ съел печенье?/Ты съе́л печенье?/Ты съел пече́нье?'' - Was it you who ate the cookie?/Did you eat the cookie?/Was the cookie your meal?). [10730180] |Acute accents are mandatory in lexical dictionaries and books intended to be used either by children or foreign readers. [10730190] |==Classification== [10730200] |Russian is a [[Slavic languages|Slavic language]] in the [[Indo-European Languages|Indo-European family]]. [10730210] |From the point of view of the [[spoken language]], its closest relatives are [[Ukrainian language|Ukrainian]] and [[Belarusian language|Belarusian]], the other two national languages in the [[East Slavic languages|East Slavic]] group. [10730220] |In many places in eastern [[Ukraine]] and [[Belarus]], these languages are spoken interchangeably, and in certain areas traditional bilingualism resulted in language mixture, e.g.
[[Surzhyk]] in eastern Ukraine and [[Trasianka]] in Belarus. [10730240] |An East Slavic [[Old Novgorod dialect]], although vanished during the fifteenth or sixteenth century, is sometimes considered to have played a significant role in formation of the modern Russian language. [10730250] |The vocabulary (mainly abstract and literary words), principles of word formation, and, to some extent, inflections and literary style of Russian have been also influenced by [[Church Slavonic language|Church Slavonic]], a developed and partly adopted form of the [[South Slavic languages|South Slavic]] [[Old Church Slavonic]] language used by the [[Russian Orthodox Church]]. [10730260] |However, the East Slavic forms have tended to be used exclusively in the various dialects that are experiencing a rapid decline. [10730270] |In some cases, both the [[East Slavic languages|East Slavic]] and the [[Church Slavonic]] forms are in use, with slightly different meanings. [10730280] |''For details, see [[Russian phonology]] and [[History of the Russian language]].'' [10730290] |Russian phonology and syntax (especially in northern dialects) have also been influenced to some extent by the numerous Finnic languages of the [[Finno-Ugric languages|Finno-Ugric subfamily]]: [[Merya language|Merya]], [[Moksha language|Moksha]], [[Muromian language|Muromian]], the language of the [[Meshchera]], [[Veps language|Veps]], et cetera. [10730300] |These languages, some of them now extinct, used to be spoken in the center and in the north of what is now the European part of Russia. [10730310] |They came in contact with Eastern Slavic as far back as the early Middle Ages and eventually served as substratum for the modern Russian language. [10730320] |The Russian dialects spoken north, north-east and north-west of [[Moscow]] have a considerable number of words of Finno-Ugric origin. 
[10730330] |Over the course of centuries, the vocabulary and literary style of Russian have also been influenced by Turkic/Caucasian/Central Asian languages, as well as Western/Central European languages such as [[Polish language|Polish]], [[Latin]], [[Dutch language|Dutch]], [[German language|German]], [[French language|French]], and [[English language|English]]. [10730340] |According to the [[Defense Language Institute]] in [[Monterey, California]], Russian is classified as a level III language in terms of learning difficulty for native English speakers, requiring approximately 780 hours of immersion instruction to achieve intermediate fluency. [10730350] |It is also regarded by the [[United States Intelligence Community]] as a "hard target" language, due to both its difficulty to master for English speakers as well as due to its critical role in American world policy. [10730360] |==Geographic distribution== [10730370] |Russian is primarily spoken in [[Russia]] and, to a lesser extent, the other countries that were once constituent republics of the [[Soviet Union|USSR]]. [10730380] |Until [[1917]], it was the sole official language of the [[Russian Empire]]. [10730390] |During the Soviet period, the policy toward the languages of the various other ethnic groups fluctuated in practice. [10730400] |Though each of the constituent republics had its own official language, the unifying role and superior status was reserved for Russian. [10730410] |Following the break-up of [[1991]], several of the newly independent states have encouraged their native languages, which has partly reversed the privileged status of Russian, though its role as the language of post-Soviet national intercourse throughout the region has continued. 
[10730420] |In [[Latvia]], notably, its official recognition and legality in the classroom have been a topic of considerable debate in a country where more than one-third of the population is Russian-speaking, consisting mostly of post-[[World War II]] immigrants from Russia and other parts of the former [[USSR]] (Belarus, Ukraine). [10730430] |Similarly, in [[Estonia]], the Soviet-era immigrants and their Russian-speaking descendants constitute 25.6% of the country's current population, and 58.6% of the native Estonian population is also able to speak Russian. [10730440] |In all, 67.8% of Estonia's population can speak Russian. [10730450] |In [[Kazakhstan]] and [[Kyrgyzstan]], Russian remains a co-official language with [[Kazakh language|Kazakh]] and [[Kyrgyz language|Kyrgyz]] respectively. [10730460] |Large Russian-speaking communities still exist in northern Kazakhstan, and ethnic Russians comprise 25.6% of Kazakhstan's population. [10730470] |A much smaller Russian-speaking minority in [[Lithuania]] represents less than one tenth of the country's overall population. [10730480] |Nevertheless, more than half of the population of the [[Baltic states]] are able to hold a conversation in Russian, and almost all have at least some familiarity with the most basic spoken and written phrases. [10730490] |The Russian control of [[Finland]] in 1809–1918, however, has left few Russian speakers in Finland. [10730500] |There are 33,400 Russian speakers in Finland, amounting to 0.6% of the population. [10730510] |Of these, 5,000 (0.1%) are late 19th-century and 20th-century immigrants, and the rest are recent immigrants, who have arrived since the 1990s. [10730520] |In the twentieth century, Russian was widely taught in the schools of the members of the old [[Warsaw Pact]] and in other [[Communist state|countries]] that used to be allies of the USSR.
[10730530] |In particular, these countries include [[Poland]], [[Bulgaria]], the [[Czech Republic]], [[Slovakia]], [[Hungary]], [[Romania]], [[Albania]] and [[Cuba]]. [10730540] |However, younger generations are usually not fluent in it, because Russian is no longer mandatory in the school system. [10730550] |It is currently the most widely-taught foreign language in [[Mongolia]]. [10730560] |Russian is also spoken in [[Israel]] by at least 750,000 ethnic [[Jew]]ish immigrants from the former [[Soviet Union]] (1999 census). [10730570] |The Israeli [[Mass media|press]] and [[website]]s regularly publish material in Russian. [10730580] |Sizable Russian-speaking communities also exist in [[North America]], especially in large urban centers of the [[United States|U.S.]] and [[Canada]] such as [[New York City]], [[Philadelphia]], [[Boston, Massachusetts|Boston]], [[Los Angeles, California|Los Angeles]], [[San Francisco]], [[Seattle]], [[Toronto]], [[Baltimore]], [[Miami, Florida|Miami]], [[Chicago]], [[Denver]], and the [[Cleveland, Ohio|Cleveland]] suburb of [[Richmond Heights, Ohio|Richmond Heights]]. [10730590] |In the former two, Russian-speaking groups total over half a million. [10730600] |In a number of locations they issue their own newspapers, and live in their self-sufficient neighborhoods (especially the generation of immigrants who started arriving in the early sixties). [10730610] |Only about a quarter of them are ethnic Russians, however. [10730620] |Before the [[dissolution of the Soviet Union]], the overwhelming majority of [[Russophone]]s in North America were Russian-speaking [[Jews]]. [10730630] |Afterwards the influx from the countries of the former [[Soviet Union]] changed the statistics somewhat. [10730640] |According to the [[United States 2000 Census]], Russian is the primary language spoken in the homes of over 700,000 individuals living in the United States. [10730650] |Significant Russian-speaking groups also exist in [[Western Europe]]. 
[10730660] |These have been fed by several waves of immigrants since the beginning of the twentieth century, each with its own flavor of language. [10730670] |[[Germany]], the [[United Kingdom]], [[Spain]], [[France]], [[Italy]], [[Belgium]], [[Greece]], [[Brazil]], [[Norway]], [[Austria]], and [[Turkey]] have significant Russian-speaking communities totaling 3 million people. [10730680] |Two thirds of them are actually Russian-speaking descendants of [[German people|Germans]], [[Greeks]], [[Jews]], [[Armenians]], or [[Ukrainians]] who either repatriated after the [[USSR]] collapsed or are just looking for temporary employment. [10730690] |Recent estimates of the total number of speakers of Russian: [10730700] |===Official status=== [10730710] |Russian is the official language of [[Russia]]. [10730720] |It is also an official language of [[Belarus]], [[Kazakhstan]] and [[Kyrgyzstan]], an unofficial but widely spoken language in [[Ukraine]], and the de facto official language of the [[List of unrecognized countries|unrecognized]] states of [[Transnistria]], [[South Ossetia]] and [[Abkhazia]]. [10730730] |Russian is one of the [[United Nations#Languages|six official languages]] of the [[United Nations]]. [10730740] |Education in Russian is still a popular choice both for speakers of Russian as a second language (RSL) and for native speakers, in Russia as well as in many of the former Soviet republics. [10730750] |97% of the public school students of Russia, 75% in Belarus, 41% in Kazakhstan, 25% in [[Ukraine]], 23% in Kyrgyzstan, 21% in [[Moldova]], 7% in [[Azerbaijan]], 5% in [[Georgia (country)|Georgia]] and 2% in [[Armenia]] and [[Tajikistan]] receive their education only or mostly in Russian.
[10730760] |By comparison, ethnic Russians make up 78% of the population in [[Russia]], 10% in [[Belarus]], 26% in [[Kazakhstan]], 17% in [[Ukraine]], 9% in [[Kyrgyzstan]], 6% in [[Republic of Moldova|Moldova]], 2% in [[Azerbaijan]], 1.5% in [[Georgia (country)|Georgia]] and less than 1% in both [[Armenia]] and [[Tajikistan]]. [10730770] |Russian-language schooling is also available in Latvia, Estonia and Lithuania, but due to education reforms, the number of subjects taught in Russian is reduced at the high school level. [10730780] |The language has a co-official status alongside [[Moldovan language|Moldovan]] in the autonomies of [[Gagauzia]] and [[Transnistria]] in [[Moldova]], and in seven [[Romania]]n [[Commune in Romania|communes]] in [[Tulcea County|Tulcea]] and [[Constanţa County|Constanţa]] counties. [10730790] |In these localities, Russian-speaking [[Lipovans]], who are a recognized ethnic minority, make up more than 20% of the population. [10730800] |Thus, according to Romania's minority rights law, education, signage, and access to public administration and the justice system are provided in Russian alongside Romanian. [10730810] |In the [[Crimea|Autonomous Republic of Crimea]] in Ukraine, Russian is an officially recognized language alongside [[Crimean Tatar language|Crimean Tatar]], but in reality it is the only language used by the government, thus being a ''[[de facto]]'' official language. [10730820] |===Dialects=== [10730830] |Despite leveling after 1900, especially in matters of vocabulary, a number of dialects exist in Russia. [10730840] |Some linguists divide the dialects of the Russian language into two primary regional groupings, "Northern" and "Southern", with [[Moscow]] lying in the zone of transition between the two. [10730850] |Others divide the language into three groupings, Northern, Central and Southern, with Moscow lying in the Central region. [10730860] |[[Dialectology]] within Russia recognizes dozens of smaller-scale variants.
[10730870] |The dialects often show distinct and non-standard features of pronunciation and intonation, vocabulary, and grammar. [10730880] |Some of these are relics of ancient usage now completely discarded by the standard language. [10730890] |The [[northern Russian dialects]] and those spoken along the [[Volga River]] typically pronounce unstressed {{IPA|/o/}} clearly (the phenomenon called [[vowel reduction in Russian#Back vowels|okanye]]/оканье). [10730900] |East of Moscow, particularly in [[Ryazan Region]], unstressed {{IPA|/e/}} and {{IPA|/a/}} following [[palatalization|palatalized]] consonants and preceding a stressed syllable are not reduced to {{IPA|[ɪ]}} (like in the Moscow dialect), being instead pronounced as {{IPA|/a/}} in such positions (e.g. несл'''и''' is pronounced as {{IPA|[nʲasˈlʲi]}}, not as {{IPA|[nʲɪsˈlʲi]}}) - this is called [[yakanye]]/ яканье; many southern dialects have a palatalized final {{IPA|/tʲ/}} in 3rd person forms of verbs (this is unpalatalized in the standard dialect) and a fricative {{IPA|[ɣ]}} where the standard dialect has {{IPA|[g]}}. [10730910] |However, in certain areas south of Moscow, e.g. in and around [[Tula, Russia|Tula]], {{IPA|/g/}} is pronounced as in the Moscow and northern dialects unless it precedes a voiceless plosive or a pause. [10730920] |In this position {{IPA|/g/}} is lenited and devoiced to the fricative {{IPA|[x]}}, e.g. друг {{IPA|[drux]}} (in Moscow's dialect, only Бог {{IPA|[box]}}, лёгкий {{IPA|[lʲɵxʲkʲɪj]}}, мягкий {{IPA|[ˈmʲæxʲkʲɪj]}} and some derivatives follow this rule). [10730930] |Some of these features (e.g. a [[debuccalization|debuccalized]] or [[lenition|lenited]] {{IPA|/g/}} and palatalized final {{IPA|/tʲ/}} in 3rd person forms of verbs) are also present in modern [[Ukrainian language|Ukrainian]], indicating either a linguistic continuum or strong influence one way or the other. 
[10730940] |The city of [[Veliky Novgorod]] has historically displayed a feature called chokanye/tsokanye (чоканье/цоканье), where {{IPA|/ʨ/}} and {{IPA|/ʦ/}} were confused (this is thought to be due to influence from [[Finnish language|Finnish]], which doesn't distinguish these sounds). [10730950] |So, '''ц'''апля ("heron") has been recorded as 'чапля'. [10730960] |Also, the second palatalization of [[Velar consonant|velar]]s did not occur there, so the so-called '''ě²''' (from the Proto-Slavonic diphthong *ai) did not cause {{IPA|/k, g, x/}} to shift to {{IPA|/ʦ, ʣ, s/}}; therefore where [[Standard Russian]] has '''ц'''епь ("chain"), the form '''к'''епь {{IPA|[kʲepʲ]}} is attested in earlier texts. [10730970] |Among the first to study Russian dialects was [[Mikhail Lomonosov|Lomonosov]] in the eighteenth century. [10730980] |In the nineteenth, [[Vladimir Dal]] compiled the first dictionary that included dialectal vocabulary. [10730990] |Detailed mapping of Russian dialects began at the turn of the twentieth century. [10731000] |In modern times, the monumental ''Dialectological Atlas of the Russian Language'' (''Диалектологический атлас русского языка'' {{IPA|[dʲɪɐˌlʲɛktəlɐˈgʲiʨɪskʲɪj ˈatləs ˈruskəvə jɪzɨˈka]}}), was published in 3 folio volumes 1986–1989, after four decades of preparatory work. [10731010] |The ''standard language'' is based on (but not identical to) the Moscow dialect. [10731020] |===Derived languages=== [10731030] |* [[Balachka]] a dialect, spoken primarily by [[Cossacks]], in the regions of Don, [[Kuban]] and [[Terek]]. [10731040] |* [[Fenya]], a criminal [[argot]] of ancient origin, with Russian grammar, but with distinct vocabulary. [10731050] |* [[Nadsat]], the fictional language spoken in '[[A Clockwork Orange]]' uses a lot of Russian words and Russian slang. 
[10731060] |* [[Surzhyk]] is a language with Russian and Ukrainian features, spoken in some areas of Ukraine. [10731070] |* [[Trasianka]] is a language with Russian and Belarusian features used by a large portion of the rural population in [[Belarus]]. [10731080] |* [[Quelia]], a pseudo pidgin of German and Russian. [10731090] |* [[Runglish]], Russian-English pidgin. [10731100] |This word is also used by English speakers to describe the way in which Russians attempt to speak English using Russian morphology and/or syntax. [10731110] |* [[Russenorsk language|Russenorsk]] is an extinct [[pidgin]] language with mostly Russian vocabulary and mostly [[Norwegian language|Norwegian]] grammar, used for communication between [[Russians]] and [[Norway|Norwegian]] traders in the Pomor trade in [[Finnmark]] and the [[Kola Peninsula]]. [10731120] |==Writing system== [10731130] |===Alphabet=== [10731140] |Russian is written using a modified version of the [[Cyrillic alphabet|Cyrillic (кириллица)]] alphabet. [10731150] |The Russian alphabet consists of 33 letters. [10731160] |The following table gives their upper case forms, along with [[help:IPA|IPA]] values for each letter's typical sound: [10731170] |Older letters of the Russian alphabet include <ѣ>, which merged to <е> ({{IPA|/e/}}); <і> and <ѵ>, which both merged to <и> ({{IPA|/i/}}); <ѳ>, which merged to <ф> ({{IPA|/f/}}); and <ѧ>, which merged to <я> ({{IPA|/ja/}} or {{IPA|/ʲa/}}). [10731180] |While these older letters have been abandoned at one time or another, they may be used in this and related articles. [10731190] |The [[yer]]s <ъ> and <ь> originally indicated the pronunciation of ''ultra-short'' or ''reduced'' {{IPA|/ŭ/}}, {{IPA|/ĭ/}}. [10731200] |The Russian alphabet has many systems of [[character encoding]]. [10731210] |[[KOI8-R]] was designed by the government and was intended to serve as the standard encoding. [10731220] |This encoding is still used in UNIX-like operating systems.
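The practical consequence of having several encodings can be seen in a short Python sketch. It uses the codec names `koi8_r`, `cp1251` (Windows-1251, a code page used on Microsoft Windows) and `utf-8` from the Python standard library; the helper `transcode` and the sample phrase are hypothetical and serve only as an illustration.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one character encoding to another via Unicode."""
    return data.decode(source).encode(target)


text = "Русский язык"  # sample phrase, chosen for illustration

koi8 = text.encode("koi8_r")    # KOI8-R bytes
cp1251 = text.encode("cp1251")  # Windows-1251 bytes
utf8 = text.encode("utf-8")     # Unicode (UTF-8) bytes

# The same text is stored as different byte values in the two 8-bit encodings.
print(koi8 != cp1251)

# Reading KOI8-R bytes as if they were Windows-1251 gives garbled text, not an error.
print(koi8.decode("cp1251") == text)

# Decoding with the correct source encoding first recovers the intended bytes.
print(transcode(koi8, "koi8_r", "cp1251") == cp1251)
```

Both 8-bit encodings use one byte per letter, whereas UTF-8 uses two bytes for each Cyrillic letter. A conversion therefore always has to know the source encoding, because the bytes alone do not identify it.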
[10731230] |Nevertheless, the spread of [[MS-DOS]] and [[Microsoft Windows]] created chaos and ended by establishing different encodings as de-facto standards. [10731240] |For communication purposes, a number of conversion applications were developed. [10731245] |"[[iconv]]" is an example that is supported by most versions of [[Linux]], [[Macintosh]] and some other [[operating system]]s. [10731250] |Most implementations (especially old ones) of the character encoding for the Russian language are aimed at simultaneous use of English and Russian characters only and do not include support for any other language. [10731260] |Certain hopes for a unification of the character encoding for the Russian alphabet are related to the [[Unicode|Unicode standard]], specifically designed for peaceful coexistence of various languages, including even [[dead language]]s. [10731270] |[[Unicode]] also supports the letters of the [[Early Cyrillic alphabet]], which have many similarities with the [[Greek alphabet]]. [10731280] |===Orthography=== [10731290] |Russian spelling is reasonably phonemic in practice. [10731300] |It is in fact a balance among phonemics, morphology, etymology, and grammar; and, like that of most living languages, has its share of inconsistencies and controversial points. [10731310] |A number of rigid [[spelling rule]]s introduced between the 1880s and 1910s have been responsible for the latter whilst trying to eliminate the former. [10731320] |The current spelling follows the major reform of 1918, and the final codification of 1956. [10731330] |An update proposed in the late 1990s has met a hostile reception, and has not been formally adopted. [10731340] |The punctuation, originally based on Byzantine Greek, was in the seventeenth and eighteenth centuries reformulated on the French and German models. 
[10731350] |==Sounds== [10731360] |The phonological system of Russian is inherited from [[Common Slavonic]], but underwent considerable modification in the early historical period, before being largely settled by about 1400. [10731370] |The language possesses five vowels, which are written with different letters depending on whether or not the preceding consonant is [[palatalization|palatalized]]. [10731380] |The consonants typically come in plain vs. palatalized pairs, which are traditionally called ''hard'' and ''soft.'' [10731390] |(The ''hard'' consonants are often [[velarization|velarized]], especially before back vowels, although in some dialects the velarization is limited to hard {{IPA|/l/}}). [10731400] |The standard language, based on the Moscow dialect, possesses heavy stress and moderate variation in pitch. [10731410] |Stressed vowels are somewhat lengthened, while unstressed vowels tend to be reduced to near-close vowels or an unclear [[schwa]]. [10731420] |(See also: [[vowel reduction in Russian]].) [10731430] |The Russian [[syllable]] structure can be quite complex with both initial and final consonant clusters of up to 4 consecutive sounds. [10731440] |Using a formula with V standing for the nucleus (vowel) and C for each consonant the structure can be described as follows: [10731450] |(C)(C)(C)(C)V(C)(C)(C)(C) [10731460] |Clusters of four consonants are not very common, however, especially within a morpheme. [10731470] |===Consonants=== [10731480] |Russian is notable for its distinction based on [[palatalization]] of most of the consonants. [10731490] |While {{IPA|/k/, /g/, /x/}} do have palatalized [[allophone]]s {{IPA|[kʲ, gʲ, xʲ]}}, only {{IPA|/kʲ/}} might be considered a phoneme, though it is marginal and generally not considered distinctive (the only native [[minimal pair]] which argues for {{IPA|/kʲ/}} to be a separate phoneme is "это ткёт"/"этот кот"). 
[10731500] |Palatalization means that the center of the tongue is raised during and after the articulation of the consonant. [10731510] |In the case of {{IPA|/tʲ/ and /dʲ/}}, the tongue is raised enough to produce slight frication (affricate sounds). [10731520] |These sounds: {{IPA|/t, d, ʦ, s, z, n and rʲ/}} are [[dental consonant|dental]], that is pronounced with the tip of the tongue against the teeth rather than against the [[alveolar ridge]]. [10731530] |==Grammar== [10731540] |Russian has preserved an [[Indo-European languages|Indo-European]] [[Synthetic language|synthetic]]-[[inflection]]al structure, although considerable leveling has taken place. [10731550] |Russian grammar encompasses [10731560] |* a highly [[Synthetic language|synthetic]] '''morphology''' [10731570] |* a '''syntax''' that, for the literary language, is the conscious fusion of three elements: [10731580] |** a polished [[vernacular]] foundation; [10731590] |** a [[Church Slavonic language|Church Slavonic]] inheritance; [10731600] |** a [[Western Europe]]an style. [10731610] |The spoken language has been influenced by the literary one, but continues to preserve characteristic forms. [10731620] |The dialects show various non-standard grammatical features, some of which are archaisms or descendants of old forms since discarded by the literary language. [10731630] |==Vocabulary== [10731640] |See [[History of the Russian language]] for an account of the successive foreign influences on the Russian language. [10731650] |The total number of words in Russian is difficult to reckon because of the ability to agglutinate and create manifold compounds, diminutives, etc. (see [[Russian grammar#Word Formation|Word Formation]] under [[Russian grammar]]). 
[10731660] |The number of listed words or entries in some of the major dictionaries published during the last two centuries, and the total vocabulary of [[Pushkin]] (who is credited with greatly augmenting and codifying literary Russian), are as follows: [10731670] |(As a historical aside, [[Vladimir Ivanovich Dal|Dahl]] was, in the second half of the nineteenth century, still insisting that the proper spelling of the adjective '''русский''', which was at that time applied uniformly to all the Orthodox Eastern Slavic subjects of the Empire, as well as to its one official language, be spelled '''руский''' with one s, in accordance with ancient tradition and what he termed the "spirit of the language". [10731680] |He was contradicted by the philologist Grot, who distinctly heard the s lengthened or doubled). [10731690] |=== Proverbs and sayings === [10731700] |The Russian language is replete with many hundreds of proverbs ('''пословица''' {{IPA|[pɐˈslo.vʲɪ.ʦə]}}) and sayings ('''поговоркa''' {{IPA|[pə.gɐˈvo.rkə]}}). [10731710] |These were already tabulated by the seventeenth century, and collected and studied in the nineteenth and twentieth, with the folk-tales being an especially fertile source. [10731720] |==History and examples== [10731730] |The history of Russian language may be divided into the following periods. 
[10731740] |* [[History of the Russian language#Kievan period and feudal breakup|Kievan period and feudal breakup]] [10731750] |* [[History of the Russian language#The Tatar yoke and the Grand Duchy of Lithuania|The Tatar yoke and the Grand Duchy of Lithuania]] [10731760] |* [[History of the Russian language#The Moscovite period (15th–17th centuries)|The Moscovite period (15th–17th centuries)]] [10731770] |* [[History of the Russian language#Empire (18th–19th centuries)|Empire (18th–19th centuries)]] [10731780] |* [[History of the Russian language#Soviet period and beyond (20th century)|Soviet period and beyond (20th century)]] [10731790] |Judging by the historical records, by approximately 1000 AD the predominant ethnic group over much of modern European [[Russia]], [[Ukraine]], and [[Belarus]] was the Eastern branch of the [[Slavic peoples|Slavs]], speaking a closely related group of dialects. [10731800] |The political unification of this region into [[Kievan Rus']] in about 880, from which modern Russia, Ukraine and Belarus trace their origins, established [[Old East Slavic]] as a literary and commercial language. [10731810] |It was soon followed by the adoption of [[Christianity]] in 988 and the introduction of the South Slavic [[Old Church Slavonic]] as the liturgical and official language. [10731820] |Borrowings and [[calque]]s from Byzantine [[Greek language|Greek]] began to enter the [[Old East Slavic]] and spoken dialects at this time, which in their turn modified the [[Old Church Slavonic]] as well. [10731830] |Dialectal differentiation accelerated after the breakup of [[Kievan Rus]] in approximately 1100. [10731840] |On the territories of modern [[Belarus]] and [[Ukraine]] emerged [[Ruthenian language|Ruthenian]] and in modern [[Russia]] [[History of the Russian language|medieval Russian]]. 
[10731850] |They had become definitely distinct by the 13th century, when that land was divided between the [[Grand Duchy of Lithuania]] in the west and, in the east, the independent Novgorod Feudal Republic plus small duchies which were vassals of the Tatars. [10731860] |The official language in Moscow and Novgorod, and later, in the growing Moscow Rus’, was [[Church Slavonic]], which evolved from [[Old Church Slavonic]] and remained [[Diglossia|the literary language]] until the Petrine age, when its usage shrank drastically to biblical and liturgical texts. [10731870] |Russian developed under the strong influence of Church Slavonic until the close of the seventeenth century; afterwards the influence reversed, leading to corruption of liturgical texts. [10731880] |The political reforms of [[Peter I of Russia|Peter the Great]] were accompanied by a reform of the alphabet, and achieved their goal of secularization and Westernization. [10731890] |Blocks of specialized vocabulary were adopted from the languages of Western Europe. [10731900] |By 1800, a significant portion of the gentry spoke [[French language|French]], less often [[German language|German]], on an everyday basis. [10731910] |Many Russian novels of the 19th century, e.g. Lev Tolstoy’s "War and Peace", contain entire paragraphs and even pages in French with no translation given, on the assumption that educated readers would not need one. [10731920] |The modern literary language is usually considered to date from the time of [[Aleksandr Pushkin]] in the first third of the nineteenth century. [10731930] |Pushkin revolutionized Russian literature by rejecting archaic grammar and vocabulary (the so-called "высокий стиль", or "high style") in favor of grammar and vocabulary found in the spoken language of the time. [10731940] |Even younger modern readers may experience only slight difficulty understanding some words in Pushkin’s texts, since only a few words used by Pushkin have become archaic or changed meaning.
[10731950] |On the other hand, many expressions used by Russian writers of the early 19th century, in particular Pushkin, [[Lermontov]], [[Gogol]], Griboiädov, became proverbs or sayings which can be frequently found even in the modern Russian colloquial speech. [10731960] |The political upheavals of the early twentieth century and the wholesale changes of political ideology gave written Russian its modern appearance after the spelling reform of 1918. [10731970] |Political circumstances and Soviet accomplishments in military, scientific, and technological matters (especially cosmonautics), gave Russian a world-wide prestige, especially during the middle third of the twentieth century. [10740010] |
Web search engine
[10740020] |A '''Web search engine''' is a [[search engine (computing)|search engine]] designed to search for information on the [[World Wide Web]]. [10740030] |Information may consist of [[web page]]s, images and other types of files. [10740040] |Some search engines also mine data available in newsbooks, databases, or [[Web directory|open directories]]. [10740050] |Unlike [[Web directories]], which are maintained by human editors, search engines operate algorithmically or are a mixture of [[algorithmic]] and human input. [10740060] |==History== [10740070] |Before there were search engines there was a complete list of all webservers. [10740080] |The list was edited by [[Tim Berners-Lee]] and hosted on the CERN webserver. [10740090] |One historical snapshot from 1992 remains. [10740100] |As more and more webservers went online the central list could not keep up. [10740110] |On the NCSA Site new servers were announced under the title "What's New!", but no complete listing existed any more. [10740120] |The very first tool used for searching on the (pre-web) Internet was [[Archie search engine|Archie]]. [10740130] |The name stands for "archive" without the "v". [10740140] |It was created in 1990 by [[Alan Emtage]], a student at [[McGill University]] in Montreal. [10740150] |The program downloaded the directory listings of all the files located on public anonymous FTP ([[File Transfer Protocol]]) sites, creating a searchable database of file names; however, Archie did not index the contents of these sites. [10740160] |The rise of [[Gopher (protocol)|Gopher]] (created in 1991 by [[Mark McCahill]] at the [[University of Minnesota]]) led to two new search programs, [[Veronica (computer)|Veronica]] and [[Jughead (computer)|Jughead]]. [10740170] |Like Archie, they searched the file names and titles stored in Gopher index systems. 
[10740180] |Veronica ('''V'''ery '''E'''asy '''R'''odent-'''O'''riented '''N'''et-wide '''I'''ndex to '''C'''omputerized '''A'''rchives) provided a keyword search of most Gopher menu titles in the entire Gopher listings. [10740190] |Jughead ('''J'''onzy's '''U'''niversal '''G'''opher '''H'''ierarchy '''E'''xcavation '''A'''nd '''D'''isplay) was a tool for obtaining menu information from specific Gopher servers. [10740200] |While the name of the search engine "[[Archie search engine|Archie]]" was not a reference to the [[Archie Comics|Archie comic book]] series, "[[Veronica Lodge|Veronica]]" and "[[Jughead Jones|Jughead]]" are characters in the series, thus referencing their predecessor. [10740210] |The first Web search engine was Wandex, a now-defunct index collected by the [[World Wide Web Wanderer]], a [[web crawler]] developed by Matthew Gray at [[Massachusetts Institute of Technology|MIT]] in 1993. [10740220] |Another very early search engine, [[Aliweb]], also appeared in 1993. [10740230] |[[JumpStation]] (released in early 1994) used a crawler to find web pages for searching, but search was limited to the title of web pages only. [10740240] |One of the first "full text" crawler-based search engines was [[WebCrawler]], which came out in 1994. [10740250] |Unlike its predecessors, it let users search for any word in any webpage, which became the standard for all major search engines since. [10740260] |It was also the first one to be widely known by the public. [10740270] |Also in 1994 [[Lycos]] (which started at [[Carnegie Mellon University]]) was launched, and became a major commercial endeavor. [10740280] |Soon after, many search engines appeared and vied for popularity. [10740290] |These included [[Magellan]], [[Excite]], [[Infoseek]], [[Inktomi]], [[Northern Light Group|Northern Light]], and [[AltaVista]]. 
[10740300] |[[Yahoo!]] was among the most popular ways for people to find web pages of interest, but its search function operated on its [[web directory]], rather than full-text copies of web pages. [10740310] |Information seekers could also browse the directory instead of doing a keyword-based search. [10740320] |In 1996, [[Netscape]] was looking to give a single search engine an exclusive deal to be its featured search engine. [10740330] |There was so much interest that Netscape instead struck a deal with five of the major search engines, under which, for $5 million per year, each search engine would be in a rotation on the Netscape search engine page. [10740340] |These five engines were [[Yahoo!]], [[Magellan]], [[Lycos]], [[Infoseek]] and [[Excite]]. [10740350] |Search engines were also known as some of the brightest stars in the Internet investing frenzy that occurred in the late 1990s. [10740360] |Several companies entered the market spectacularly, receiving record gains during their [[initial public offering]]s. [10740370] |Some, such as Northern Light, have taken down their public search engine and are marketing enterprise-only editions. [10740380] |Many search engine companies were caught up in the [[dot-com bubble]], a speculation-driven market boom that peaked in 1999 and ended in 2001. [10740390] |Around 2000, the [[Google Search|Google search engine]] rose to prominence. [10740400] |The company achieved better results for many searches with an innovation called [[PageRank]]. [10740410] |This iterative algorithm ranks web pages based on the number and PageRank of other web sites and pages that link there, on the premise that good or desirable pages are linked to more than others. [10740420] |Google also maintained a minimalist interface to its search engine. [10740430] |In contrast, many of its competitors embedded a search engine in a [[web portal]]. [10740440] |By 2000, Yahoo! was providing search services based on [[Inktomi]]'s search engine.
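The link-based ranking idea behind PageRank, mentioned above, can be illustrated with a short sketch. This is a simplified power iteration and not Google's actual implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages from a dict mapping each page to the pages it links to.

    Hypothetical helper: a page's score depends on how many pages link to it
    and on the scores of those linking pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}           # start with equal scores

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: pages "a" and "b" both link to "c", and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, "c" ends up with the highest score because two pages link to it, and "a" scores higher than "b" because its single incoming link comes from the highly ranked "c".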
[10740450] |Yahoo! acquired [[Inktomi]] in 2002, and [[Overture]] (which owned [[AlltheWeb]] and [[AltaVista]]) in 2003. [10740460] |Yahoo! switched to Google's search engine and used it until 2004, when it launched its own search engine based on the combined technologies of its acquisitions. [10740470] |Microsoft first launched MSN Search (since re-branded [[Live Search]]) in the fall of 1998 using search results from [[Inktomi]]. [10740480] |In early 1999 the site began to display listings from [[Looksmart]] blended with results from [[Inktomi]], except for a short time in 1999 when results from [[AltaVista]] were used instead. [10740490] |In 2004, Microsoft began a transition to its own search technology, powered by its own [[web crawler]] (called [[msnbot]]). [10740500] |As of late 2007, Google was by far the most popular Web search engine worldwide. [10740510] |A number of country-specific search engine companies have become prominent; for example, [[Baidu]] is the most popular search engine in the [[People's Republic of China]] and [[guruji.com]] in [[India]]. [10740520] |==How Web search engines work== [10740530] |A search engine operates in the following order: [10740540] |# [[Web crawling]] [10740550] |# [[Index (search engine)|Indexing]] [10740560] |# [[Web search query|Searching]] [10740570] |Web search engines work by storing information about many web pages, which they retrieve from the WWW itself. [10740580] |These pages are retrieved by a [[Web crawler]] (sometimes also known as a spider), an automated Web browser which follows every link it sees. [10740590] |Exclusions can be made by the use of [[robots.txt]]. [10740600] |The contents of each page are then analyzed to determine how it should be [[Search engine indexing|indexed]] (for example, words are extracted from the titles, headings, or special fields called [[meta tags]]). [10740610] |Data about web pages are stored in an index database for use in later queries.
[10740620] |Some search engines, such as [[Google]], store all or part of the source page (referred to as a [[web cache|cache]]) as well as information about the web pages, whereas others, such as [[AltaVista]], store every word of every page they find. [10740630] |This cached page always holds the actual search text since it is the one that was actually indexed, so it can be very useful when the content of the current page has been updated and the search terms are no longer in it. [10740640] |This problem might be considered to be a mild form of [[linkrot]], and Google's handling of it increases [[usability]] by satisfying [[user expectations]] that the search terms will be on the returned webpage. [10740650] |This satisfies the [[principle of least astonishment]] since the user normally expects the search terms to be on the returned pages. [10740660] |Increased search relevance makes these cached pages very useful, even beyond the fact that they may contain data that may no longer be available elsewhere. [10740670] |When a user enters a [[web search query|query]] into a search engine (typically by using [[Keyword (Internet search)|key word]]s), the engine examines its [[inverted index|index]] and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. [10740680] |Most search engines support the use of the [[boolean operators]] AND, OR and NOT to further specify the [[web search query|search query]]. [10740690] |Some search engines provide an advanced feature called [[Proximity search (text)|proximity search]] which allows users to define the distance between keywords. [10740700] |The usefulness of a search engine depends on the [[relevance (information retrieval)|relevance]] of the '''result set''' it gives back. 
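The index lookup and the boolean operators described above can be sketched with a toy inverted index. This is a minimal illustration rather than how any particular search engine is built: the helper names (`tokenize`, `build_index`, `search_and`, `search_or`, `search_not`) and the three sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: each word maps to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_and(index, *words):
    """Documents containing every one of the words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()


def search_or(index, *words):
    """Documents containing at least one of the words."""
    return set().union(*(index.get(word.lower(), set()) for word in words))


def search_not(index, all_ids, word):
    """Documents that do not contain the word."""
    return set(all_ids) - index.get(word.lower(), set())


# Hypothetical miniature collection standing in for crawled pages.
docs = {
    1: "Archie indexed FTP file names",
    2: "Veronica searched Gopher menu titles",
    3: "WebCrawler indexed the full text of web pages",
}
index = build_index(docs)
print(search_and(index, "indexed", "web"))
print(search_or(index, "gopher", "ftp"))
print(search_not(index, docs, "indexed"))
```

A real engine would also store word positions (which proximity search needs) and apply a ranking step to the matching set, both of which this sketch omits.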
[10740710] |While there may be millions of webpages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. [10740720] |Most search engines employ methods to [[rank order|rank]] the results to provide the "best" results first. [10740730] |How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. [10740740] |The methods also change over time as Internet usage changes and new techniques evolve. [10740750] |Most Web search engines are commercial ventures supported by [[advertising]] revenue and, as a result, some employ the controversial practice of allowing advertisers to pay money to have their listings ranked higher in search results. [10740760] |Those search engines which do not accept money for their search engine results make money by running search related ads alongside the regular search engine results. [10740770] |The search engines make money every time someone clicks on one of these ads. [10740780] |The vast majority of search engines are run by private companies using proprietary algorithms and closed databases, though [[List of search engines#Open source search engines|some]] are open source. [10740790] |Revenue in the web search portals industry is projected to grow in 2008 by 13.4 percent, with broadband connections expected to rise by 15.1 percent. [10740800] |Between 2008 and 2012, industry revenue is projected to rise by 56 percent as Internet penetration still has some way to go to reach full saturation in American households. [10740810] |Furthermore, broadband services are projected to account for an ever increasing share of domestic Internet users, rising to 118.7 million by 2012, with an increasing share accounted for by fiber-optic and high speed cable lines. [10750010] |
Semantics
[10750020] |'''Semantics''' is the study of meaning in communication. [10750030] |The word derives from [[Greek language|Greek]] ''σημαντικός'' (''semantikos''), "significant", from ''σημαίνω'' (''semaino''), "to signify, to indicate" and that from ''σήμα'' (''sema''), "sign, mark, token". [10750040] |In [[linguistics]] it is the study of interpretation of signs as used by [[agent]]s or [[community|communities]] within particular circumstances and contexts. [10750050] |It has related meanings in several other fields. [10750060] |Semanticists differ on what constitutes [[Meaning (linguistics)|meaning]] in an expression. [10750070] |For example, in the sentence, "John loves a bagel", the word ''bagel'' may refer to the object itself, which is its ''literal'' meaning or ''[[denotation]]'', but it may also refer to many other figurative associations, such as how it meets John's hunger, etc., which may be its ''[[connotation]]''. [10750080] |Traditionally, the [[formal semantic]] view restricts semantics to its literal meaning, and relegates all figurative associations to [[pragmatics]], but this distinction is increasingly difficult to defend. [10750090] |The degree to which a theorist subscribes to the literal-figurative distinction decreases as one moves from the [[formal semantic]], [[semiotic]], [[pragmatic]], to the [[cognitive semantic]] traditions. [10750100] |The word ''semantic'' in its modern sense is considered to have first appeared in [[French language|French]] as ''sémantique'' in [[Michel Bréal]]'s 1897 book, ''Essai de sémantique''. [10750110] |In [[International Scientific Vocabulary]] semantics is also called ''[[semasiology]]''. [10750120] |The discipline of Semantics is distinct from [[General semantics|Alfred Korzybski's General Semantics]], which is a system for looking at non-immediate, or abstract meanings.
[10750130] |==Linguistics== [10750140] |In [[linguistics]], '''semantics''' is the subfield that is devoted to the study of meaning, as inherent at the levels of words, phrases, sentences, and even larger units of [[discourse]] (referred to as ''texts''). [10750150] |The basic area of study is the meaning of [[sign (semiotics)|sign]]s, and the study of relations between different linguistic units: [[homonym]]y, [[synonym]]y, [[antonym]]y, [[polysemy]], [[paronyms]], [[hypernym]]y, [[hyponym]]y, [[meronymy]], [[metonymy]], [[holonymy]], [[exocentric]]ity / [[endocentric]]ity, linguistic [[compound (linguistics)|compounds]]. [10750160] |A key concern is how meaning attaches to larger chunks of text, possibly as a result of the composition from smaller units of meaning. [10750170] |Traditionally, semantics has included the study of connotative ''[[word sense|sense]]'' and denotative ''[[reference]]'', [[truth condition]]s, [[argument structure]], [[thematic role]]s, [[discourse analysis]], and the linkage of all of these to syntax. [10750180] |[[Formal semantics|Formal semanticists]] are concerned with the modeling of meaning in terms of the semantics of logic. [10750190] |Thus the sentence ''John loves a bagel'' above can be broken down into its constituents (signs), of which the unit ''loves'' may serve as both syntactic and semantic [[head (linguistics)|head]]. [10750200] |In the late 1960s, [[Richard Montague]] proposed a system for defining semantic entries in the lexicon in terms of [[lambda calculus]]. [10750210] |Thus, the syntactic [[parsing|parse]] of the sentence above would now indicate ''loves'' as the head, and its entry in the lexicon would point to the arguments as the agent, ''John'', and the object, ''bagel'', with a special role for the article "a" (which Montague called a quantifier). 
[10750220] |This resulted in the sentence being associated with the logical predicate ''loves (John, bagel)'', thus linking semantics to [[categorial grammar]] models of [[syntax]]. [10750230] |The logical predicate thus obtained would be elaborated further, e.g. using truth theory models, which ultimately relate meanings to a set of [[Tarski]]ian universals, which may lie outside the logic. [10750240] |The notion of such meaning atoms or primitives is basic to the [[language of thought]] hypothesis from the 1970s. [10750250] |Despite its elegance, [[Montague grammar]] was limited by the context-dependent variability in word sense, and led to several attempts at incorporating context, such as: [10750260] |*[[situation semantics]] (1980s): truth-values are incomplete and get assigned based on context [10750270] |*[[generative lexicon]] (1990s): categories (types) are incomplete and get assigned based on context [10750280] |===The dynamic turn in semantics=== [10750290] |In the [[Noam Chomsky|Chomskian]] tradition in linguistics there was no mechanism for the learning of semantic relations, and the [[Psychological nativism|nativist]] view considered all semantic notions as inborn. [10750300] |Thus, even novel concepts were proposed to have been dormant in some sense. [10750310] |This traditional view was also unable to address many issues such as [[metaphor]] or associative meanings, and [[semantic change]], where meanings within a linguistic community change over time, and [[qualia]] or subjective experience. [10750320] |Another issue not addressed by the nativist model was how perceptual cues are combined in thought, e.g. in [[mental rotation]].
[10750330] |This traditional view of semantics, as an innate finite meaning inherent in a [[lexical unit]] that can be composed to generate meanings for larger chunks of discourse, is now being fiercely debated in the emerging domain of [[cognitive linguistics]] and also in the non-[[Jerry Fodor|Fodorian]] camp in [[Philosophy of Language]]. [10750340] |The challenge is motivated by [10750350] |* factors internal to language, such as the problem of resolving [[indexical]] or [[anaphora]] (e.g. ''this x'', ''him'', ''last week''). [10750360] |In these situations "context" serves as the input, but the interpreted utterance also modifies the context, so it is also the output. [10750370] |Thus, the interpretation is necessarily dynamic and the meaning of sentences is viewed as context-change potentials instead of [[propositions]]. [10750380] |* factors external to language, i.e. language is not a set of labels stuck on things, but "a toolbox, the importance of whose elements lie in the way they function rather than their attachments to things." [10750390] |This view reflects the position of the later [[Wittgenstein]] and his famous ''game'' example, and is related to the positions of [[Willard Van Orman Quine|Quine]], [[Donald Davidson (philosopher)|Davidson]], and others. [10750400] |A concrete example of the latter phenomenon is semantic [[underspecification]] — meanings are not complete without some elements of context. [10750410] |To take an example of a single word, "red", its meaning in a phrase such as ''red book'' is similar to many other usages, and can be viewed as compositional. [10750420] |However, the colours implied in phrases such as "red wine" (very dark), and "red hair" (coppery), or "red soil", or "red skin" are very different. [10750430] |Indeed, these colours by themselves would not be called "red" by native speakers. 
[10750440] |These instances are contrastive, so "red wine" is so called only in comparison with the other kind of wine (which also is not "white" for the same reasons). [10750450] |This view goes back to [[Ferdinand de Saussure|de Saussure]]: [10750460] |:Each of a set of synonyms like ''redouter'' ('to dread'), ''craindre'' ('to fear'), ''avoir peur'' ('to be afraid') has its particular value only because they stand in contrast with one another. [10750470] |No word has a value that can be identified independently of what else is in its vicinity. [10750480] |and may go back to earlier [[India]]n views on language, especially the [[Nyaya]] view of words as [[Semantic indicator|indicators]] and not carriers of meaning. [10750490] |An attempt to defend a system based on propositional meaning for semantic underspecification can be found in the [[Generative Lexicon]] model of [[James Pustejovsky]], who extends contextual operations (based on type shifting) into the lexicon. [10750500] |Thus meanings are generated on the fly based on finite context. [10750510] |===Prototype theory=== [10750520] |Another set of concepts related to fuzziness in semantics is based on [[Prototype Theory|prototype]]s. [10750530] |The work of [[Eleanor Rosch]] and [[George Lakoff]] in the 1970s led to a view that natural categories are not characterizable in terms of necessary and sufficient conditions, but are graded (fuzzy at their boundaries) and inconsistent as to the status of their constituent members. [10750540] |Systems of categories are not objectively "out there" in the world but are rooted in people's experience. [10750550] |These categories evolve as [[learning theory (education)|learned]] concepts of the world — meaning is not an objective truth, but a subjective construct, learned from experience, and language arises out of the "grounding of our conceptual systems in shared [[embodied philosophy|embodiment]] and bodily experience". 
[10750560] |A corollary of this is that the conceptual categories (i.e. the lexicon) will not be identical for different cultures, or indeed, for every individual in the same culture. [10750570] |This leads to another debate (see the [[Whorf-Sapir hypothesis]] or [[Eskimo words for snow]]). [10750580] |==Computer science== [10750590] |In [[computer science]], where it is considered as an application of [[mathematical logic]], semantics reflects the meaning of programs or functions. [10750600] |In this regard, semantics permits programs to be separated into their syntactical part (grammatical structure) and their semantic part (meaning). [10750610] |For instance, the following statements use different syntaxes (languages), but have the same semantics: [10750620] |* x += y; ([[C (programming language)|C]], [[Java (programming language)|Java]], etc.) [10750630] |* x := x + y; ([[Pascal (programming language)|Pascal]]) [10750640] |* Let x = x + y; (early [[BASIC]]) [10750650] |* x = x + y (most BASIC dialects, [[Fortran]]) [10750660] |Generally these operations would all perform an arithmetical addition of 'y' to 'x' and store the result in a variable 'x'. [10750670] |Semantics for computer applications falls into three categories: [10750680] |* [[Operational semantics]]: The meaning of a construct is specified by the computation it induces when it is executed on a machine. [10750690] |In particular, it is of interest ''how'' the effect of a computation is produced. [10750700] |* [[Denotational semantics]]: Meanings are modelled by mathematical objects that represent the effect of executing the constructs. [10750710] |Thus ''only'' the effect is of interest, not how it is obtained. [10750720] |* [[Axiomatic semantics]]: Specific properties of the effect of executing the constructs are expressed as ''assertions''. [10750730] |Thus there may be aspects of the executions that are ignored.
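The three categories above can be loosely illustrated on the single statement `x := x + y`. The sketch below is informal and only mimics each style in Python; the function names, the scratch entry `_acc`, and the sample state are hypothetical and not part of any formal definition.

```python
# The single statement  x := x + y  viewed in three ways.
# All names here are hypothetical and chosen only for illustration.


def run_operational(state):
    """Operational view: record the machine states produced step by step."""
    trace = [dict(state)]                      # initial state
    value = state["x"] + state["y"]            # step 1: evaluate x + y
    trace.append({**state, "_acc": value})     # value held in a scratch register
    final = {**state, "x": value}              # step 2: store the value in x
    trace.append(final)
    return final, trace


def denote_add_assign(target, source):
    """Denotational view: the meaning is a function from states to states."""
    return lambda state: {**state, target: state[target] + state[source]}


def satisfies_assertion(before, after):
    """Axiomatic view: check {x = a, y = b} x := x + y {x = a + b, y = b}."""
    return after["x"] == before["x"] + before["y"] and after["y"] == before["y"]


start = {"x": 2, "y": 3}
final, trace = run_operational(start)
meaning = denote_add_assign("x", "y")
print(trace)                               # how the effect is produced
print(meaning(start) == final)             # only the effect matters
print(satisfies_assertion(start, final))   # a property of the effect
```

The operational function exposes intermediate states, the denotational one hides them behind a state-to-state mapping, and the axiomatic check looks only at a property relating the states before and after execution.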
[10750740] |The '''[[Semantic Web]]''' refers to the extension of the [[World Wide Web]] through the embedding of additional semantic [[metadata]]; see also [10750750] |[[Web Ontology Language]] (OWL). [10750760] |==Psychology== [10750770] |In [[psychology]], ''[[semantic memory]]'' is memory for meaning, in other words, the aspect of memory that preserves only the ''gist'', the general significance, of remembered experience, while [[episodic memory]] is memory for the ephemeral details, the individual features, or the unique particulars of experience. [10750780] |Word meaning is measured by the company words keep, that is, the relationships among words themselves in a [[semantic network]]. [10750790] |In a network created by people analyzing their understanding of the word (such as [[Wordnet]]), the links and decomposition structures of the network are few in number and kind, and include "part of", "kind of", and similar links. [10750800] |In automated [[ontologies]] the links are computed vectors without explicit meaning. [10750810] |Various automated technologies are being developed to compute the meaning of words: [[latent semantic indexing]] and [[support vector machines]] as well as [[natural language processing]], [[neural networks]] and [[predicate calculus]] techniques. [10750820] |Semantics has been reported to drive the course of psychotherapeutic interventions. [10750830] |Language structure can determine the treatment approach to drug-abusing patients. [10750840] |While working in Europe for the US Information Agency, the American psychiatrist Dr. A. James Giannini reported semantic differences in medical approaches to addiction treatment. [10750850] |English-speaking countries used the term "drug dependence" to describe a rather passive pathology in their patients. [10750860] |As a result the physician's role was more active. [10750870] |Southern European countries such as Italy and Yugoslavia utilized the concept of "tossicomania" (i.e.
toxic mania) to describe a more active rather than passive role of the addict. [10750880] |As a result the treating physician's role shifted to that of a more passive guide than that of an active interventionist. [10760010] |
Sentence (linguistics)
[10760020] |In [[linguistics]], a '''sentence''' is a grammatical unit of one or more words, bearing minimal syntactic relation to the words that precede or follow it, often preceded and followed in speech by pauses, having one of a small number of characteristic intonation patterns, and typically expressing an independent statement, question, request, command, etc. [10760030] |Sentences are generally characterized in most languages by the presence of a [[finite verb]], e.g. "[[The quick brown fox jumps over the lazy dog]]". [10760050] |==Components of a sentence== [10760060] |A simple ''complete sentence'' consists of a ''[[subject (grammar)|subject]]'' and a ''[[predicate (grammar)|predicate]]''. [10760070] |The subject is typically a [[noun phrase]], though other kinds of phrases (such as [[gerund]] phrases) work as well, and some languages allow subjects to be omitted. [10760080] |The predicate is a finite [[verb phrase]]: a finite verb together with zero or more [[object (grammar)|objects]], zero or more [[complement (linguistics)|complements]], and zero or more [[adverbial]]s. [10760090] |See also [[copula]] for the consequences of this verb on the theory of sentence structure. [10760100] |===Clauses=== [10760110] |A [[clause]] consists of a subject and a verb. [10760120] |There are two types of clauses: independent and subordinate (dependent). [10760130] |An independent clause consists of a subject and a verb and expresses a complete thought: for example, "I am sad." [10760140] |A subordinate clause consists of a subject and a verb, but expresses an incomplete thought: for example, "Because I had to move." [10760150] |==Classification== [10760160] |===By structure=== [10760170] |One traditional scheme for classifying [[English language|English]] sentences is by the number and types of [[finite verb|finite]] [[clause]]s: [10760180] |* A ''[[simple sentence]]'' consists of a single [[independent clause]] with no [[dependent clause]]s.
[10760190] |* A ''[[compound sentence (linguistics)|compound sentence]]'' consists of multiple independent clauses with no dependent clauses. [10760200] |These clauses are joined together using [[grammatical conjunction|conjunctions]], [[punctuation]], or both. [10760210] |* A ''[[complex sentence]]'' consists of one or more independent clauses with at least one dependent clause. [10760220] |* A ''[[complex-compound sentence]]'' (or ''compound-complex sentence'') consists of multiple independent clauses, at least one of which has at least one dependent clause. [10760230] |===By purpose=== [10760240] |Sentences can also be classified based on their purpose: [10760250] |*A ''declarative sentence'' or ''declaration'', the most common type, commonly makes a statement: ''I am going home.'' [10760260] |*A ''negative sentence'' or ''[[negation (linguistics)|negation]]'' denies that a statement is true: ''I am not going home.'' [10760270] |*An ''interrogative sentence'' or ''[[question]]'' is commonly used to request information — ''When are you going to work?'' — but sometimes not; ''see'' [[rhetorical question]]. [10760280] |*An ''exclamatory sentence'' or ''[[exclamation]]'' is generally a more emphatic form of statement: ''What a wonderful day this is!'' [10760290] |===Major and minor sentences=== [10760300] |A major sentence is a ''regular'' sentence; it has a [[subject (grammar)|subject]] and a [[predicate (grammar)|predicate]]. [10760310] |For example: ''I have a ball.'' [10760320] |In this sentence one can change the persons: ''We have a ball.'' [10760330] |However, a minor sentence is an irregular type of sentence. [10760340] |It does not contain a finite verb. [10760350] |For example, "Mary!" [10760360] |"Yes." [10760370] |"Coffee." etc. [10760380] |Other examples of minor sentences are headings (e.g. the heading of this entry), stereotyped expressions (''Hello!''), emotional expressions (''Wow!''), proverbs, etc. 
[10760390] |This can also include sentences which do not contain verbs (e.g. ''The more, the merrier.'') in order to intensify the meaning around the nouns (normally found in poetry and catchphrases). [10770010] |
Computer software
[10770020] |'''Computer software,''' or just '''software''' is a general term used to describe a collection of [[computer program]]s, [[procedures]] and documentation that perform some tasks on a computer system. [10770030] |The term includes [[application software]] such as [[word processor]]s which perform productive tasks for users, [[system software]] such as [[operating system]]s, which interface with [[hardware]] to provide the necessary services for application software, and [[middleware]] which controls and co-ordinates [[Distributed computing|distributed systems]]. [10770040] |"Software" is sometimes used in a broader context to mean anything which is not hardware but which is ''used'' with hardware, such as film, tapes and records. [10770050] |==Relationship to computer hardware== [10770060] |[[Computer]] software is so called to distinguish it from [[computer hardware]], which encompasses the physical interconnections and devices required to store and execute (or run) the software. [10770070] |At the lowest level, software consists of a [[machine language]] specific to an individual processor. [10770080] |A machine language consists of groups of binary values signifying processor instructions which change the state of the computer from its preceding state. [10770090] |Software is an ordered sequence of instructions for changing the state of the computer hardware in a particular sequence. [10770100] |It is usually written in [[high-level programming language]]s that are easier and more efficient for humans to use (closer to [[natural language]]) than machine language. [10770110] |High-level languages are [[compiler|compiled]] or [[interpreter (computing)|interpreted]] into machine language object code. [10770120] |Software may also be written in an [[assembly language]], essentially, a mnemonic representation of a machine language using a natural language alphabet. 
[10770130] |Assembly language must be assembled into object code via an [[assembly language#Assembler|assembler]]. [10770140] |The term "software" was first used in this sense by [[John W. Tukey]] in [[1958]]. [10770150] |In [[computer science]] and [[software engineering]], '''computer software''' is all computer programs. [10770160] |The theory that is the basis for most modern software was first proposed by [[Alan Turing]] in his [[1936]] paper ''On Computable Numbers, with an Application to the Entscheidungsproblem''. [10770170] |==Types== [10770180] |Practical [[computer system]]s divide [[software system]]s into three major classes: [[system software]], [[programming software]] and [[application software]], although the distinction is arbitrary, and often blurred. [10770190] |*'''[[System software]]''' helps run the [[computer hardware]] and [[computer system]]. [10770200] |It includes [[operating system]]s, [[device driver]]s, diagnostic tools, [[Server (computing)|server]]s, [[windowing system]]s, [[software utility|utilities]] and more. [10770210] |The purpose of systems software is to insulate the applications programmer as much as possible from the details of the particular computer complex being used, especially memory and other hardware features, and such accessory devices as communications, printers, readers, displays, keyboards, etc. [10770220] |*'''[[Programming software]]''' usually provides tools to assist a [[programmer]] in writing [[computer program]]s, and software using different [[programming language]]s in a more convenient way. [10770230] |The tools include [[text editors]], [[compilers]], [[interpreter (computing)|interpreters]], [[linkers]], [[debuggers]], and so on. 
[10770240] |An [[Integrated development environment]] (IDE) merges those tools into a software bundle, and a programmer may not need to type multiple [[command]]s for compiling, interpreting, debugging, tracing, etc., because the IDE usually has an advanced ''[[graphical user interface]],'' or GUI. [10770250] |*'''[[Application software]]''' allows end users to accomplish one or more specific (non-computer related) [[task]]s. [10770260] |Typical applications include [[Industry|industrial]] [[automation]], [[business software]], [[educational software]], [[medical software]], [[database]]s, and [[computer games]]. [10770270] |Businesses are probably the biggest users of application software, but almost every field of human activity now uses some form of application software. [10770280] |==Program and library== [10770290] |A [[Computer program|program]] may not be sufficiently complete for execution by a [[computer]]. [10770300] |In particular, it may require additional software from a [[software library]] in order to be complete. [10770310] |Such a library may include software components used by [[stand-alone]] programs, but which cannot work on their own. [10770320] |Thus, programs may include standard routines that are common to many programs, extracted from these libraries. [10770330] |Libraries may also ''include'' 'stand-alone' programs which are activated by some [[event-driven programming|computer event]] and/or perform some function (e.g., of computer 'housekeeping') but do not return data to their calling program. [10770340] |Libraries may be [[Execution (computers)|called]] by one to many other programs; programs may call zero to many other programs. [10770350] |==Three layers== [10770360] |Users often see things differently than programmers. [10770370] |People who use modern general purpose computers (as opposed to [[embedded system]]s, [[analog computer]]s, [[supercomputer]]s, etc.) 
usually see three layers of software performing a variety of tasks: platform, application, and user software. [10770380] |;Platform software: [10770390] |[[Platform (computing)|Platform]] includes the [[firmware]], [[device driver]]s, an [[operating system]], and typically a [[graphical user interface]] which, in total, allow a user to interact with the computer and its [[peripheral]]s (associated equipment). [10770400] |Platform software often comes bundled with the computer. [10770410] |On a [[Personal computer|PC]] you will usually have the ability to change the platform software. [10770420] |;Application software: [10770430] |[[Application software]] or Applications are what most people think of when they think of software. [10770440] |Typical examples include office suites and video games. [10770450] |Application software is often purchased separately from computer hardware. [10770460] |Sometimes applications are bundled with the computer, but that does not change the fact that they run as independent applications. [10770470] |Applications are almost always programs independent of the operating system, though they are often tailored for specific platforms. [10770480] |Most users think of compilers, databases, and other "system software" as applications. [10770490] |;User-written software: [10770500] |[[End-user development]] tailors systems to meet users' specific needs. [10770510] |User software includes spreadsheet templates, word processor macros, scientific simulations, and scripts for graphics and animations. [10770520] |Even email filters are a kind of user software. [10770530] |Users create this software themselves and often overlook how important it is. [10770535] |Depending on how competently the user-written software has been integrated into purchased application packages, many users may not be aware of the distinction between the purchased packages and what has been added by co-workers. 
[10770540] |==Creation== [10770550] |==Operation== [10770560] |Computer software has to be "loaded" into the [[computer storage|computer's storage]] (such as a ''[[hard drive]]'', ''memory'', or ''[[RAM]]''). [10770570] |Once the software has loaded, the computer is able to ''execute'' the software. [10770580] |This involves passing [[instruction (computer science)|instructions]] from the application software, through the system software, to the [[hardware]] which ultimately receives the instruction as [[machine language|machine code]]. [10770590] |Each instruction causes the computer to carry out an operation -- moving [[data (computing)|data]], carrying out a [[computation]], or altering the [[control flow]] of instructions. [10770600] |Data movement is typically from one place in memory to another. [10770610] |Sometimes it involves moving data between memory and registers which enable high-speed data access in the CPU. [10770620] |Moving data, especially large amounts of it, can be costly. [10770630] |So, this is sometimes avoided by using "pointers" to data instead. [10770640] |Computations include simple operations such as incrementing the value of a variable data element. [10770650] |More complex computations may involve many operations and data elements together. [10770660] |Instructions may be performed sequentially, conditionally, or iteratively. [10770670] |Sequential instructions are those operations that are performed one after another. [10770680] |Conditional instructions are performed such that different sets of instructions execute depending on the value(s) of some data. [10770690] |In some languages this is known as an "if" statement. [10770700] |Iterative instructions are performed repetitively and may depend on some data value. [10770710] |This is sometimes called a "loop." [10770720] |Often, one instruction may "call" another set of instructions that are defined in some other program or [[module (programming)|module]]. 
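The kinds of instructions described above can be illustrated with a minimal sketch in a high-level language. The example below is written in Python purely for illustration; the names <code>increment</code>, <code>run</code>, <code>data</code> and <code>alias</code> are hypothetical and do not come from any particular program. It shows sequential, conditional and iterative execution, a call to another set of instructions, and the use of a reference (playing the role of a "pointer") so that a list is not copied.

```python
def increment(value):
    """A separate set of instructions that other code can call."""
    return value + 1          # a simple computation: incrementing a value


def run(values):
    total = 0                 # sequential: statements execute one after another
    for v in values:          # iterative ("loop"): the body repeats per element
        if v % 2 == 0:        # conditional ("if"): branch on the value of data
            total += increment(v)   # call into another set of instructions
        else:
            total += v
    return total


data = [1, 2, 3, 4]
alias = data                  # no data is moved: both names refer to one list
alias.append(5)               # the change is visible through either name
result = run(data)
```

An interpreter or compiler ultimately translates each of these statements into lower-level instructions for the hardware; the sketch only shows how the three kinds of control flow appear to a programmer.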
[10770730] |When more than one computer processor is used, instructions may be executed simultaneously. [10770740] |A simple example of the way software operates is what happens when a user selects an entry such as "Copy" from a menu. [10770750] |In this case, a conditional instruction is executed to copy text from data in a 'document' area residing in memory, perhaps to an intermediate storage area known as a 'clipboard' data area. [10770760] |If a different menu entry such as "Paste" is chosen, the software may execute the instructions to copy the text from the clipboard data area to a specific location in the same or another document in memory. [10770770] |Depending on the application, even the example above could become complicated. [10770780] |The field of [[software engineering]] endeavors to manage the complexity of how software operates. [10770790] |This is especially true for software that operates in the context of a large or powerful [[computer system]]. [10770800] |Currently, almost the only limitation on the use of computer software in applications is the ingenuity of the designer/programmer. [10770810] |Consequently, large areas of activity (such as playing grand master level chess) formerly assumed to be incapable of software simulation are now routinely programmed. [10770820] |The only area that has so far proved reasonably secure from software simulation is the realm of human art, especially pleasing music and literature. [10770830] |Kinds of software by operation: [[computer program]] as [[executable]], [[source code]] or [[script (computer programming)|script]], [[computer configuration|configuration]]. [10770840] |==Quality and reliability== [10770850] |[[Software reliability]] considers the errors, faults, and failures related to the design, implementation and operation of software. [10770860] |'''See''' [[Computer security audit|Software auditing]], [[Software quality]], [[Software testing]], and [[Software reliability]]. 
[10770870] |==License== [10770880] |A [[Software license]] gives the user the right to use the software in the licensed environment. Some software comes with the license when purchased off the shelf, or with an OEM license when bundled with hardware. [10770890] |Other software comes with a [[free software licence]], granting the recipient the rights to modify and redistribute the software. [10770900] |Software can also be in the form of [[freeware]] or [[shareware]]. [10770910] |See also [[License Management]]. [10770920] |==Patents== [10770930] |The issue of [[software patent]]s is controversial. [10770940] |Some believe that they hinder [[software development]], while others argue that software patents provide an important incentive to spur software innovation. [10770950] |See [[software patent debate]]. [10770960] |==Ethics and rights for software users== [10770970] |Because software is a relatively new part of society, the idea of what rights its users should have is not well developed. [10770980] |Some, such as the [[free software community]], believe that software users should be free to modify and redistribute the software they use. [10770990] |They argue that these rights are necessary so that each individual can control their computer, and so that everyone can cooperate, if they choose, to work together as a community and control the direction in which software progresses. [10770995] |Others believe that software authors should have the power to say what rights the user will get. [10771000] |==Software companies and non-profit organizations== [10771010] |Examples of non-profit software organizations: [[Free Software Foundation]], [[GNU Project]], [[Mozilla Foundation]]. [10771020] |Examples of large software companies are: [[Microsoft]], [[IBM]], [[Oracle_Corporation|Oracle]], [[SAP AG|SAP]] and [[HP]].