and Policy

Purpose. The purpose of this study is to examine the internal consistency of wine guides by comparing the judgements of expert wine tasters and reviewers. A classification of wines is provided to establish whether expert reviews of similar wines are coherent. Design/methodology/approach. Sentiment analysis based on natural language processing techniques was used to compare quantitative and qualitative reviews between experts. In addition, a finite mixture model was used to classify wines into categories to analyse internal consistency between ratings. Findings. The results for a sample of more than 200,000 Wine Enthusiast ratings reveal significant differences between expert reviews. This finding indicates that there are no standard criteria for reviewing wines included in the guide. Originality. Wine guides are amongst the most widely used marketing resources in the wine industry. They provide a signal to consumers about the quality of wines, guiding their purchase decisions. They also influence the reputation of brands and the performance of companies producing these wines. The main contribution of this study is to propose a new way to compare the reviews of wine guide experts.


INTRODUCTION
Information influences users' decision-making processes. However, information asymmetry generally exists in the buyer-seller relationship because each party has a different amount of information about products [1]. Research on experiential and hedonic consumption has shown that consumers' behaviour is affected by "social influence including peer input (word-of-mouth) and judgments of respected experts (professional evaluations)" [2, p. 180].
Wine is an experience product whose quality cannot be assessed by consumers before purchase and consumption [3,4]. This feature of wine increases the complexity of the purchase decision process. Thus, information asymmetries arise between consumers and winemakers in relation to product quality. Accordingly, high- and low-quality products can coexist in the market [5]. Wineries employ different marketing strategies to reduce these asymmetries and inform the market about the quality of their products [6]. Some use advertising in the mainstream media and encourage positive word-of-mouth communication amongst consumers [7,8]. They also use awards in national and international competitions as part of their branding and communication strategies [6]. Finally, receiving high ratings in well-known wine guides, which are managed by experts and prescribers, can also help reduce information asymmetries between winemakers and consumers.
This study focuses on the social influence of experts in wine guides. Wine guides offer thousands of reviews of wines from around the world, basing their reviews on the opinions of panels of experts who taste these wines.
The assumption is that consumers use judgements of wine quality by expert reviewers in wine guides as a source of information to make purchase decisions [9]. These expert reviewers might consequently influence the performance of the wine-producing companies. Previous research has in fact shown that there is a relationship between online reviews and both consumer choice and firm sales [10,11]. However, despite the potential impact on consumers and wineries, the nature and effects of expert opinions in wine guides remain an under-researched topic.
Wine experts usually provide a quantitative (score) and a qualitative (comment) review. The aim of this study is to test the consistency between these two assessments (quantitative and qualitative) of tasted wines. For wine guides to offer a credible source of information, both assessments of the same wine should match. That is, higher scores should be aligned with more positive comments. This analysis can confirm the role of expert evaluations as a credible source of information for consumers.
To test the consistency of wine experts' reviews, the qualitative content (i.e. tasting notes) is examined using sentiment analysis based on natural language processing techniques. Then, these reviews and other relevant variables (origin and grape variety) are used to establish whether expert reviews of similar wines are coherent. Coherence is examined by classifying wines according to reviews and wine-related variables. A finite mixture model is employed for this classification. The study context is the Wine Enthusiast guide, one of the most prestigious wine guides in the world. The results show significant differences between expert reviews, which raises doubts about the usefulness and credibility of wine guides as a source of information.

Wine guides as a marketing tool
Guides are extremely popular in the wine industry because they offer a point of comparison across brands [12] and provide consumers with a signal of wine quality. Wine guides are based on the opinions of experts and professional tasters, who follow standardised, systematic procedures that aim to provide a rigorous assessment of wines. These experts and tasters are assumed to be independent of wineries, thus helping consumers make informed purchase decisions, as the learning process necessary for consumers to become wine experts themselves takes time [13].
Research has highlighted the effect of wine expert recommendations from a marketing perspective. Parsons and Thompson [14] showed that consumers attribute high credibility to independent wine expert recommendations. Friberg and Grönqvist [15] found a significant effect of positive reviews by experts on the sales of the wines they had tasted. The scores that wines receive in these guides can also influence other marketing variables. A line of research has focused on the effect of expert reviews on wine prices [16]. For instance, studies have shown a positive effect of this type of evaluation on product prices, associated with a greater product reputation [7,17]. Ashenfelter and Jones [18] showed that the influence of expert ratings on the price of wine is even greater than that of other factors such as terroir conditions or climate, which are commonly used to predict wine prices [19]. Wine research has also used the sensory reviews of experts in wine guides to measure wine quality and brand reputation [20]. Dressler [21] analysed the reputation of German wineries, individually and collectively, using three wine guides (Feinschmecker, Gault Millau and Eichelmann) and found consistent judgements across all three. Focusing on Sicilian wines, Roma et al. [9] used experts' scores in wine guides as a proxy of firm (wine) reputation. This approach is common in the wine literature [22]. However, despite this evidence, the impact of a positive expert review on the price of a wine may depend not only on the reputation of the wine itself but also on the reputation of the expert [23,24] because not all experts or guides have the same reputation and prestige [25].

Consistency of expert product reviews: an application to wine guides

The expert-consistency effect
According to dual-process theory [26], individuals' opinions and even behaviours are based on informational and normative influences such as those from expert reviews [27][28][29]. Information has a greater impact on the receiver if the sender is perceived as credible. Expert information is believed to be more credible and accurate (i.e. consistent) than non-expert information [30,31].
In the context of wine, it is difficult to identify the factors that each expert considers when making judgements and rating wines because there is no common frame of reference across guides [16,32]. An expert's rating is not necessarily an objective indicator of the quality of a wine because experts make judgements based on their own personal preferences. Thus, when an expert gives a high rating to a certain wine, it is not intended to convey the idea that the wine is of a higher quality than another wine with a lower rating. This lack of comparability arises because ratings of wines are conditioned by several factors such as origin, vintage, winery, price and even the expectations of the expert. Therefore, a higher score for one wine than for another simply indicates an expert's greater preference for that wine.
Consequently, despite their alleged objectivity (as stated in wine guides), expert reviews cannot be considered absolute objective assessments of wine quality. For instance, they may be biased by experts' personal preferences [33]. Evidence regarding the consistency of expert judgements is somewhat mixed. Some authors have found consistency between different experts' reviews of the same wine (e.g. [34]). However, other authors have expressed concern about inconsistencies between different experts' opinions of wine quality and even inconsistencies in reviews by the same expert over time (e.g. [35][36][37]). Cao and Stokes [38] reported that personal bias in wine expert reviews translates into different ratings, discriminatory capacity and variability in the ratings of different wines. Likewise, Ashton [35,39] observed that wine guides focus on a few wines and cannot be considered fair representations of the entire market, noting that even the number of tasters used to issue a rating can influence the rating. These guides continue to be highly important in many markets and are used as a reference by consumers around the world. Therefore, further investigation of the effects of expert consistency/inconsistency is warranted.

Sentiment analysis: a tool for analysing the consistency of expert reviews
In recent years, natural language processing research techniques have allowed researchers to perform textual and sentiment analysis of reviews by both experts and consumers (e.g. [40][41][42][43][44][45][46]). Sentiment analysis is a subfield within natural language processing techniques that focuses on automatically classifying a text through its valence [47]. It enables the extraction of information on opinions about a subject (from users or experts) for a certain product [48,49]. Previous research has shown that this type of analysis based on the characteristics of the product can provide more precise information than a general analysis of the overall (numerical) assessment [50]. Recent literature reviews have highlighted the importance and uniqueness of sentiment analysis in marketing research [51] and in hospitality and tourism [52].
In the context of wine guides, users typically find two ratings or judgements of a given wine. The first is a numerical score, usually on a scale of 0 to 100 points or 0 to 20 points, depending on the guide. Some guides only publish wines that receive a minimum score of 80 or 85 points. The second rating is a qualitative review based on tasting notes for the wine. These tasting notes consist of a brief literal description of the sensory and organoleptic qualities of the wine [53]. Although numerical scores are easily interpretable, the natural limitations of language complicate the task of using words to convey what a wine is really like and to describe the sensations that the expert wants to convey [54]. Sometimes, the sensory characteristics of wines are so special or unusual that there may not be the right words to describe them. Furthermore, some authors suggest that the language of professional tasting, which is used to describe the sensory properties of a wine, is based on jargon and vocabulary that are so complex and difficult to decipher that only the experts themselves or the most experienced consumers can understand them. In fact, Peynaud and Blouin [55] found that for professional tasting notes to be effective, consumers must have a high level of understanding about tasting, which is not always the case. Sometimes, these tasting notes may be pretentious, offering little informational validity for consumers [56].
Therefore, sentiment analysis based on each of the characteristics considered in the tasting notes could offer a broader and more accurate illustration of how experts review a wine. From an analytical perspective, the opinions of experts require analysis at the sentence level [57]. This sentence-level focus is necessary because experts who review wines consider different characteristics or attributes and generally have a different opinion on each of these aspects. Although many sentiment analysis tools can easily divide comments into negative, positive or neutral, a textual review of a given wine may contain phrases with different polarities because experts may have different feelings about each characteristic of the wine. For instance, the standard tasting phases (i.e. sight, smell and taste) may have different polarities, with some aspects being rated positively, others negatively and others neutrally. In addition, there may be different degrees of positive or negative opinions. Accordingly, reviews cannot be qualified simply as positive, negative or neutral. Instead, they include a series of additive perceptions that create a nuanced rating and provide specific information on each of the aspects evaluated by the expert. For instance, some characteristics of the wine (e.g. in the olfactory phase of tasting) may be rated positively, whereas others (e.g. related to the palate) may be negatively rated.
In sum, sentiment analysis techniques could lead to precise inference of the overall numerical score for the wine. Therefore, these techniques are particularly useful for examining the opinions of experts about the wines in a guide. Nguyen et al. [58] recently employed a similar approach, focusing on so-called online expert users.

METHOD
This study focuses on reviews by 19 professional wine tasters from the Wine Enthusiast guide between 1999 and 2019. Wine Enthusiast Magazine is one of the most prestigious international magazines in the sector, together with The Wine Advocate (Robert Parker). Each review included qualitative tasting notes, in which the expert gave a judgement on the tasted wine, a quantitative score of the wine (from 80 to 100 points), and some additional characteristics such as price, origin and grape variety (see Figure 1). The wines were from 43 countries and their price ranged from 4 dollars to 3,400 dollars. After the elimination of outliers and missing cases, the final sample contained 201,004 reviews.
The method had two stages. In the first stage, the quantitative ratings and the qualitative reviews were compared among the different experts in the guide. Reviews published in the guide were written by 19 experts, as well as some other anonymous reviewers. Although the comparison of quantitative ratings was straightforward, the comparison of qualitative reviews required prior analysis of tasting notes using sentiment analysis. This analysis was carried out using the AFINN lexicon. AFINN consists of 2,477 words in English that express a certain degree of positive or negative sentiment. This corpus of words, produced by Finn Årup Nielsen between 2009 and 2011, contains a rating for words ranging from −5 (most negative sentiment) to +5 (most positive sentiment). The lexicon displays the information in two columns: the word next to its corresponding value (e.g. "awesome", +4 or "awful", −3). In this study, the sentiment value of the expert review was calculated as the sum of the polarity of each of the words used in the review. In essence, each review was divided into sentences, and each sentence into words. To evaluate one sentence of the review, each word was assigned a value according to the AFINN lexicon. Adding up the values of all words in the sentence gave an evaluation of that specific comment. Once this process had been performed for all sentences in the review, the evaluations of each sentence or comment were summed to give an overall score for the review. Because an expert review covers different aspects, different opinions can be found in the same review. That is, the same review might contain both positive comments (e.g. regarding palate) and negative comments (e.g. regarding nose). However, the additive procedure employed in this study gave an overall evaluation of the intensity (value) and polarity (positive/negative) of the review based on the evaluation of each comment in the review. Compared to the alternative of using the average of the individual evaluations of each word, this additive procedure accounted for the length of the review because there is evidence that longer reviews provide greater added value to the tasting note of the wine [53]. In addition, it provided a broader ranking of the review than a simple classification as positive, negative or neutral.
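The additive scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the lexicon is a toy dictionary in the AFINN format (only "awesome" and "awful" come from the example in the text; the other entries and all function names are hypothetical), and the sentence splitter is deliberately crude.

```python
import re

# Toy lexicon in the AFINN format (word -> integer valence from -5 to +5).
# "awesome" and "awful" follow the example in the text; the remaining
# entries are hypothetical values chosen purely for illustration.
TOY_LEXICON = {"awesome": 4, "awful": -3, "fresh": 1, "harsh": -2}


def split_sentences(review):
    """Split a review into sentences on '.', '!' or '?' (a crude rule)."""
    return [s for s in re.split(r"[.!?]+", review) if s.strip()]


def sentence_score(sentence, lexicon=TOY_LEXICON):
    """Sum the valence of every word; words not in the lexicon count as 0."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(word, 0) for word in words)


def review_score(review, lexicon=TOY_LEXICON):
    """Additive review score: the sum of the scores of its sentences."""
    return sum(sentence_score(s, lexicon) for s in split_sentences(review))


def mean_word_score(review, lexicon=TOY_LEXICON):
    """The alternative mentioned in the text: average valence per word."""
    words = re.findall(r"[a-z']+", review.lower())
    return review_score(review, lexicon) / len(words) if words else 0.0
```

With this toy lexicon, a review such as "Awesome fresh nose. The palate is harsh." has one positive and one negative sentence, and the additive score nets them out. Unlike `mean_word_score`, the additive score grows with the number of sentiment-bearing words, which is how the procedure accounts for review length.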
In the second stage, the wines were classified according to their characteristics using techniques based on cluster analysis. The starting assumption was that wines in a given group were homogeneous but different from the wines in other groups. Each wine was defined by a set of variables related to its review (qualitative and quantitative), origin and grape. The objective of this stage was to group similar wines by comparing specific vectors for the set of variables used in this study. An $N \times d$ matrix was created for this analysis, where the columns were the variables, and the rows were the observations. Each observation (i.e. row) was a vector of dimension $d$, denoted as $x_i$. The data set was denoted as $x = (x_i)_{i \in \{1,\dots,N\}}$. Each observation had $d_{\text{cont}}$ continuous variables in $\mathbb{R}^{d_{\text{cont}}}$ and $d_{\text{cat}}$ categorical variables, with $\{1,\dots,m_j\}$ levels for each nominal variable $j$. Hence, $x_i \in \mathbb{R}^{d_{\text{cont}}} \times \{1,\dots,m_1\} \times \dots \times \{1,\dots,m_{d_{\text{cat}}}\}$. To classify the observations into groups that could be interpreted in a meaningful way, an unsupervised learning method was used. It was hypothesised that there existed hidden or latent variables (unobserved random variables) for all data points in the data set that associated a specific cluster to each observation. Thus, the latent variable model was a mixture model.
In a mixture model, $K$ distributions are mixed, and it is assumed that each observation belongs to one of them. The latent variable $z_i$ for observation $i$ corresponds to one of the distributions in the mixture. In other words, the latent variable $z_i$ is the cluster to which observation $x_i$ belongs. If the number of clusters is $K$, then $z_i \in \{1,\dots,K\}$, and the set of latent variables is denoted as $z = (z_i)_{i \in \{1,\dots,N\}}$. In a mixture model, the data generation process for each observation is assumed to be $p(z_i = k, x_i) = p(z_i = k)\,p(x_i \mid z_i = k)$. Here, $p(z_i)$ is a multinomial distribution, where $\eta_k = \Pr(z_i = k)$ is the probability that observation $i$ belongs to cluster $k$. The set of probabilities $\eta = (\eta_k)_{k \in \{1,\dots,K\}}$ is referred to as the mixing weights. Furthermore, $\phi_k(x_i \mid \theta_k) = p(x_i \mid z_i = k)$ is the probability distribution of the data in cluster $k$, and $\theta_k$ are the parameters of this distribution. The probability density function is given as follows: $p(x_i \mid \eta, \theta) = \sum_{k=1}^{K} \eta_k\,\phi_k(x_i \mid \theta_k)$, where $\theta = (\theta_k)_{k \in \{1,\dots,K\}}$ is the set of parameters of the distributions in the mixture; together with the mixing weights $\eta$, it forms the full parameter set of the model.
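For continuous variables, the cluster distributions were multivariate Gaussian distributions $\phi_k(x_i \mid \theta_k) = \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$, where the parameters of distribution $k$, $\theta_k = \{\mu_k, \Sigma_k\}$, were the mean vector $\mu_k$ and covariance matrix $\Sigma_k$. Categorical variables were assumed to be independent multivariate multinomial variables distributed conditional on the latent variable. Therefore, $\phi_k(x_i \mid \theta_k) = \prod_{j=1}^{d_{\text{cat}}} \mathcal{M}(x_{ij} \mid \alpha_{jk})$, where $x_{ij}$ is the level of categorical variable $j$ for observation $i$, $\mathcal{M}$ denotes the multinomial distribution, and $\alpha_{jk}$ is the vector of parameters (event probabilities) for the multinomial distribution associated with variable $j$ in cluster $k$, with dimension $m_j$.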
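To make the mixed-data cluster densities concrete, the following Python sketch evaluates the density of one observation under each cluster and converts it into posterior cluster probabilities with Bayes' rule. It is a simplified illustration, not the Rmixmod implementation: the continuous variables are treated as independent Gaussians (a diagonal covariance rather than a full matrix), and every name and parameter value is hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def component_density(x_cont, x_cat, comp):
    """Density of one observation under one cluster.

    Assumption: continuous variables are independent Gaussians (diagonal
    covariance), and each categorical variable has its own table of event
    probabilities, independent of the others given the cluster.
    """
    density = 1.0
    for value, mean, var in zip(x_cont, comp["means"], comp["vars"]):
        density *= gaussian_pdf(value, mean, var)
    for level, probs in zip(x_cat, comp["alphas"]):
        density *= probs[level]
    return density


def posterior(x_cont, x_cat, weights, comps):
    """Posterior probability of each cluster for one observation."""
    joint = [w * component_density(x_cont, x_cat, c) for w, c in zip(weights, comps)]
    total = sum(joint)  # the mixture density of the observation
    return [j / total for j in joint]


# Hypothetical two-cluster example: continuous = (score, sentiment),
# categorical = (country,). None of these values come from the study.
WEIGHTS = [0.4, 0.6]
COMPONENTS = [
    {"means": [92.0, 8.0], "vars": [4.0, 25.0], "alphas": [{"France": 0.7, "US": 0.3}]},
    {"means": [86.0, 1.0], "vars": [4.0, 25.0], "alphas": [{"France": 0.2, "US": 0.8}]},
]
```

Calling `posterior([93.0, 10.0], ["France"], WEIGHTS, COMPONENTS)` returns two probabilities that sum to one, with nearly all the mass on the first (high-score) cluster. Assigning each wine to the cluster with the largest posterior yields the classification.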
For the estimation of the parameters, the R package Rmixmod version 2.1.5 was used. This package maximises the log-likelihood with an expectation maximisation (EM) algorithm as follows: $L(\Theta; x) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} \eta_k\,\phi_k(x_i \mid \theta_k)$, where $\Theta = \{\eta, \theta\}$ is the set of all parameters of the mixture. Once the wines had been classified into similar groups, the differences between the expert reviews of the wines belonging to each cluster were analysed. The data processing and estimation were carried out in MATLAB.
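As a rough illustration of how an EM algorithm maximises a mixture log-likelihood, the sketch below fits a univariate Gaussian mixture in plain Python. It is a textbook simplification under stated assumptions (one continuous variable, no categorical variables, user-supplied starting values), not the algorithm as implemented in Rmixmod; all function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Sum over observations of ln(sum_k eta_k * N(x_i | mu_k, var_k))."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data, weights, means, variances, min_var=1e-6):
    """One EM iteration: E-step (responsibilities), then M-step (updates)."""
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])

    new_w, new_m, new_v = [], [], []
    for k in range(len(weights)):
        n_k = sum(r[k] for r in resp)  # effective number of points in cluster k
        mean = sum(r[k] * x for r, x in zip(resp, data)) / n_k
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / n_k
        new_w.append(n_k / len(data))
        new_m.append(mean)
        new_v.append(max(var, min_var))  # guard against a collapsing variance
    return new_w, new_m, new_v


def fit(data, weights, means, variances, iterations=50):
    """Run EM for a fixed number of iterations and track the log-likelihood."""
    history = [log_likelihood(data, weights, means, variances)]
    for _ in range(iterations):
        weights, means, variances = em_step(data, weights, means, variances)
        history.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, history
```

For example, `fit([84.0, 85.0, 86.0, 85.5, 91.0, 92.0, 93.0, 92.5], [0.5, 0.5], [85.0, 92.0], [4.0, 4.0])` separates the hypothetical scores into a low and a high group, and the recorded log-likelihood does not decrease from one iteration to the next, which is the defining property of EM.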

RESULTS
In the first stage, the quantitative and qualitative expert reviews in the guide were compared. The average score of the tasted wines was 88.81 points (SD = 3.03), with a minimum of 80 points and a maximum of 100. The experts used an average of 40.56 words in their descriptions of wines (SD = 11.28), with a minimum of three words and a maximum of 135. The average sentiment score was 3.2 points (SD = 7.02), with a minimum of −33 points and a maximum of 41. The average price was 36.62 dollars (SD = 43.17), with a minimum of 4 dollars and a maximum of 3,400 dollars.
Table 1 presents the average quantitative and sentiment ratings for each expert. It also shows the average number of words used by each expert in the tasting notes. There are statistically significant differences between the experts' quantitative ratings. There are also differences in the nuances provided in the tasting notes, as reflected by the differences in the number of words used and the sentiment ratings for the experts.
In the second stage, the wines were classified according to their characteristics using techniques based on cluster analysis. The proposed model was estimated for K = 2,…,7 clusters in relation to the wines appearing in this guide. To identify the clusters, four variables were used: the quantitative rating, sentiment score of the tasting note, country of origin of the wine and grape variety. The model selection criterion was the Bayesian information criterion (BIC) [59]. This criterion suggested that K = 4 was the number of groups that best fit the data (see Table 2). External validation is also desirable to confirm the usefulness of the cluster solution. External validation consisted of examining whether there were also intercluster differences in variables other than those used to classify the wines. This external validation served as an exploratory investigation of the influence of the cluster structure and main characteristics [60]. To this end, the price variable was also examined (see Table 2).
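The model selection step can be illustrated with a short Python sketch. It assumes the common "smaller is better" form of the criterion, BIC = −2 ln L + ν ln N, where ν is the number of free parameters and N the number of observations; some software reports the criterion with the opposite sign, so the convention should be checked before comparing values. The log-likelihoods and parameter counts below are hypothetical and are not results from the study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion in the 'smaller is better' form."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_k(candidates, n_obs):
    """Pick the number of clusters with the lowest BIC.

    `candidates` maps K -> (maximised log-likelihood, number of free parameters).
    Returns the selected K and the BIC of every candidate.
    """
    scores = {k: bic(ll, nu, n_obs) for k, (ll, nu) in candidates.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical fits for K = 2, 3, 4 on 500 observations.
EXAMPLE_FITS = {2: (-1200.0, 5), 3: (-1150.0, 8), 4: (-1148.0, 11)}
```

In this made-up example, `select_k(EXAMPLE_FITS, 500)` prefers K = 3: moving from three to four clusters improves the log-likelihood only slightly, and the penalty for the extra parameters outweighs the gain.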
The empirical findings reveal some interesting differences between the clusters. The first group, "top-of-the-range wines (best quality)", consists of wines with a well-above-average rating based on both sentiment and quantitative ratings. These wines are also on average more expensive. It consists of red and white wines, mainly from France. The second group, "low-price wines (affordable/low cost)", consists of wines with a below-average quantitative score but with a slightly positive sentiment rating. The average price of wines in this group is well below the average for the entire sample. This group includes white and red wines from North and South America, France and Spain. The third group, "overpriced wines", consists of wines with a neutral sentiment rating but a roughly average quantitative score. These wines' average price is well above the average for the entire sample. They are mostly red wines from the United States and Italy. Finally, the fourth group, "best-value wines (smart choice)", consists of wines with a roughly average quantitative score and a below-average qualitative rating. They also have a lower-than-average price. This group mainly consists of white wines from the United States.
The differences between the four groups were significant for the four variables considered in the analysis. In addition, for the external validation of the four clusters, ANOVA was used to test whether the prices differed between clusters. The price variable (F = 4064.87; p < 0.0001) was significantly different between clusters, thereby externally validating the classification presented in this research.
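For readers unfamiliar with the test, a one-way ANOVA compares the variation between group means with the variation within groups. The following Python sketch computes the F statistic from scratch; it is a minimal illustration with a hypothetical function name and made-up prices, not the procedure or data used in the study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [value for group in groups for value in group]
    n_total, n_groups = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2 for group, mean in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (value - mean) ** 2 for group, mean in zip(groups, group_means) for value in group
    )

    df_between, df_within = n_groups - 1, n_total - n_groups
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up prices (in dollars) for three clusters of wines.
EXAMPLE_PRICES = [[10.0, 12.0, 14.0], [20.0, 22.0, 24.0], [30.0, 32.0, 34.0]]
```

A large F value, as in `one_way_anova_f(EXAMPLE_PRICES)`, indicates that the group means differ by much more than the spread within each group would explain. The p-value is then read from the F distribution with the returned degrees of freedom.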
Once the wines had been classified into homogeneous groups, the average sentiment evaluations of the tasters were calculated for each group. The results indicate that the experts' reviews differ significantly, which shows that there are no standard criteria for reviewing the wines in the guide (see Table 3). This result reinforces the earlier idea (see Table 1) that tasting notes might differ amongst wine experts, even when the tasted wines are similar and receive a comparable quantitative rating.

CONCLUSIONS
Wine guides written by professional and expert tasters are widely used in the wine industry to market wine, providing important information signals for consumers around the world. However, despite the importance of these guides, some authors have expressed doubts about the consistency of the scores and reviews they provide. The objective of this study was to analyse the internal consistency of the scores and reviews of the experts and professional tasters writing for a specific guide. The method included sentiment analysis of the tasting notes and a novel clustering technique that identified groups of wines with similar characteristics.
The results show considerable divergence between the qualitative and quantitative assessments by professional tasters in the Wine Enthusiast wine guide. Although most consumers trust the guide to reduce their information asymmetries with respect to winemakers, disparity in the criteria used by the guide's experts raises doubts over its effectiveness as a source of reliable, verified, standardised information for consumers. In fact, even when wines are grouped according to their characteristics, there are still discrepancies amongst experts. Therefore, it cannot be said that the guide follows a single, uniform set of criteria for its wine reviews.
These results have managerial implications for the wine sector. First, the results have implications for wineries whose wines are tasted by experts writing for this guide. These wineries should be aware that experts' personal preferences may affect their judgements. Hence, knowing the personal tastes and background of each expert could help wineries improve the ratings of their wines. Second, these results are important for the management of the guide itself. The reputation and prestige of a particular guide is the basis of consumers' trust in that guide, which is considered a reliable and independent source of information. If the reviews in the guide are inconsistent and the experts do not reach a consensus when rating wines, doubts may arise about the reliability of these reviews, depending on which expert tasted the wine. These doubts could ultimately affect the publication's reputation.
Finally, regarding the limitations of this study, only one guide (Wine Enthusiast) was analysed. It is not possible to extrapolate these results to other specialist publications within the sector. Furthermore, the sentiment analysis was carried out using a specific lexicon. Although this lexicon has been widely used in academic studies, it is not the only available alternative, nor is it specific to the wine sector. These limitations open new research opportunities that should be addressed in the future. Future research could also explore the effect of reviewer expertise in the context of wine guides. Reviewer expertise has already been shown to influence reviewer ratings in the context of hotel and restaurant review platforms [58]. Finally, future research could extend this analysis to other markets where guides based on expert reviews are also common. Examples include the film and television industry, where sentiment analysis techniques have already been used to study expert and consumer opinions [2] but not to study specialised guides (e.g. Rotten Tomatoes).

Table 1. Ratings of wines according to experts.

Table 2. Descriptive analysis of clusters with mean and standard deviation (in parentheses).

Table 3. Test of differences of experts' sentiment ratings.