Measures of Agreement
On the surface, these data seem suitable for analysis using the methods for 2 × 2 tables (if the variable is categorical) or correlation (if it is numerical), which we discussed earlier in this series. [1,2] However, closer examination shows that this is not the case. In those methods, the two measurements on each individual refer to different variables (e.g., exposure and outcome, or height and weight), whereas in “agreement studies” both measurements refer to the same variable (e.g., chest X-rays read by two radiologists, or hemoglobin measured by two methods). Cohen's kappa is calculated as κ = (observed agreement [Po] − expected agreement [Pe]) / (1 − expected agreement [Pe]). For ordinal data with more than two categories, it is useful to know whether the ratings of the different raters differed by a small or a large amount. For example, microbiologists may grade bacterial growth on culture plates as none, occasional, moderate, or confluent. Here, ratings of a particular plate by two assessors as “occasional” and “moderate” would imply a lower degree of disagreement than ratings of “none” and “confluent”. The weighted kappa statistic takes this difference into account.
It therefore gives a higher value when the raters' responses correspond more closely, with the maximum value for perfect agreement; conversely, a larger difference between two ratings yields a lower weighted kappa. Different schemes can be used to weight the difference between categories (e.g., linear or quadratic). When two instruments or techniques are used to measure the same variable on a continuous scale, Bland-Altman plots can be used to estimate agreement. This plot is a scatter plot of the difference between the two measurements (Y axis) against the mean of the two measurements (X axis). It thus provides a graphical display of the bias (mean difference between the two observers or techniques) together with the 95% limits of agreement.
Figure 2: Scatter plot showing the correlation between hemoglobin measurements from two methods for the data presented in Table 3 and Figure 1. The dotted line is a trend line (least squares line) through the observed values; the correlation coefficient is 0.98. However, the individual points are far from the line of perfect agreement (solid black line).
The degree of agreement between repeated measurements of the same quantity reflects the reproducibility of a particular type of measurement. Agreement refers to the degree of concordance between two (or more) sets of measurements. Statistical methods for assessing agreement are used to evaluate inter-rater variability or to decide whether one technique for measuring a variable can replace another. In this article, we look at statistical measures of agreement for different types of data and discuss how these differ from the measures used to assess correlation.
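The weighted kappa for ordinal ratings, with the linear or quadratic weighting schemes mentioned above, can be sketched in Python as follows. This is a minimal illustration, assuming categories coded as integers 0 to k − 1 (e.g., 0 = none, 1 = occasional, 2 = moderate, 3 = confluent) and the common form κw = 1 − Σ(w × observed) / Σ(w × chance-expected), where w is the disagreement weight for each pair of categories. The function name `weighted_kappa` and the example ratings are hypothetical, not taken from the article.

```python
from typing import Sequence


def weighted_kappa(ratings_a: Sequence[int], ratings_b: Sequence[int],
                   k: int, scheme: str = "linear") -> float:
    """Weighted kappa for two raters on an ordinal scale coded 0..k-1.

    Disagreement weights: linear |i - j| / (k - 1) or
    quadratic ((i - j) / (k - 1)) ** 2. Returns
    1 - sum(w * observed) / sum(w * expected by chance).
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    n = len(ratings_a)
    # Observed joint proportions: row = rater A's category, column = rater B's
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1 / n
    # Marginal proportions for each rater (used for chance-expected agreement)
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            d = abs(i - j) / (k - 1)
            w = d if scheme == "linear" else d ** 2
            weighted_obs += w * observed[i][j]
            weighted_exp += w * row[i] * col[j]
    # Undefined (division by zero) if chance-expected disagreement is 0,
    # e.g. when both raters put every item in the same single category
    return 1 - weighted_obs / weighted_exp


# Hypothetical ratings of four plates; each pair disagrees on one plate only
near = weighted_kappa([0, 1, 2, 3], [0, 1, 2, 2], 4)  # "confluent" vs "moderate"
far = weighted_kappa([0, 1, 2, 3], [0, 1, 2, 0], 4)   # "confluent" vs "none"
print(round(near, 2), round(far, 2))  # prints: 0.78 0.4
```

In this toy example both rater pairs agree exactly on three of four plates, so unweighted kappa is the same (about 0.67) for both; the weighted kappa is higher when the single disagreement is one category apart than when it spans the whole scale.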
It is important to note that in each of the three situations in Table 1, the pass percentages are the same for both examiners, and if the two examiners were compared using the usual 2 × 2 test for paired data (McNemar test), there would be no difference between their performance; on the other hand, the agreement between the observers differs greatly across the three situations. The basic concept to understand here is that “agreement” quantifies the concordance between the two examiners for each “pair” of grades, rather than the similarity of the overall pass percentages between the examiners.
Methods for assessing agreement between observers according to the type of variable measured and the number of observers
Statistical methods for assessing agreement vary according to the type of variable studied and the number of observers between whom agreement is sought. These are summarized in Table 2 and explained below.
Now consider a hypothetical situation in which the examiners grade purely by guessing, i.e., by tossing a coin: heads = pass, tails = fail [Table 1, Situation 2]. In this case, one would expect 25% (= 0.50 × 0.50) of students to receive a “pass” grade from both examiners and 25% to receive a “fail” grade from both, giving an overall “expected” agreement rate for “pass” or “fail” of 50% (= 0.25 + 0.25 = 0.50). Therefore, the observed agreement rate (80% in Situation 1) must be interpreted in light of the fact that 50% agreement was expected purely by chance. These examiners could have improved on chance by at most 50% (best possible agreement minus chance-expected agreement = 100% − 50% = 50%), but achieved only 30% (observed agreement minus chance-expected agreement = 80% − 50% = 30%). Thus, their actual performance in agreeing is 30%/50% = 60%.
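The chance-corrected calculation above is Cohen's kappa, κ = (Po − Pe)/(1 − Pe). A minimal Python sketch, assuming the agreement table is given as a square list of counts with rater A in rows and rater B in columns (the function name `cohens_kappa` is illustrative), reproduces the figures for Situation 1 of Table 1:

```python
def cohens_kappa(table):
    """Return (Po, Pe, kappa) for a square table of counts.

    table[i][j] = number of subjects placed in category i by rater A
    and in category j by rater B.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of subjects on the diagonal
    p_o = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for each rater
    a_marg = [sum(table[i]) / n for i in range(k)]
    b_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance if the raters were independent
    p_e = sum(a_marg[i] * b_marg[i] for i in range(k))
    # Undefined if p_e == 1 (both raters use a single category throughout)
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


# Situation 1: rows = examiner A (pass, fail), columns = examiner B (pass, fail)
situation_1 = [[8, 2], [2, 8]]
p_o, p_e, kappa = cohens_kappa(situation_1)
print(f"Po = {p_o:.2f}, Pe = {p_e:.2f}, kappa = {kappa:.2f}")
# prints: Po = 0.80, Pe = 0.50, kappa = 0.60
```

The same function works for more than two categories, since it only sums the diagonal and the products of the two raters' marginal proportions.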
Readers are invited to consult the following articles, which use measures of agreement. Cohen's kappa (κ) calculates the agreement between observers while taking into account the agreement expected by chance. Two methods are available to assess agreement between measurements of a continuous variable across observers, instruments, time points, etc.
One of them, the intraclass correlation coefficient (ICC), provides a single measure of the degree of agreement; the other, the Bland-Altman plot, additionally provides a quantitative estimate of how closely the values from the two measurements agree. Take the case of two examiners, A and B, who evaluate the answer sheets of 20 students in a class and mark each student as “passed” or “failed”, with each examiner passing half of the students. Table 1 shows three different situations that can occur. In Situation 1 of this table, eight students receive a “passed” grade from both examiners, eight receive a “failed” grade from both examiners, and four receive a “passed” grade from one examiner but a “failed” grade from the other (two passed by A and the other two by B). Thus, the two examiners' results agree for 16 of the 20 students (agreement = 16/20 = 0.80; disagreement = 4/20 = 0.20). That sounds quite good. However, it does not take into account that some of the grades may have been guesses and that some of the agreement may have arisen purely by chance. Kalantri et al. studied the accuracy and reliability of pallor as a tool for detecting anemia.
[5] They concluded that “clinical assessment of pallor can rule out and modestly rule in severe anemia.” However, inter-observer agreement for the detection of pallor was very poor (kappa = 0.07 for conjunctival pallor and 0.20 for tongue pallor), meaning that pallor is an unreliable sign for diagnosing anemia. Limits of agreement = mean observed difference ± 1.96 × standard deviation of the observed differences. Cohen's κ can also be used when the same assessor rates the same patients at two time points (e.g., 2 weeks apart) or, in the example above, regrades the same answer sheets after 2 weeks. Its limitations are as follows: (i) it does not take into account the magnitude of the differences, which makes it unsuitable for ordinal data; (ii) it cannot be used when there are more than two raters; and (iii) it does not distinguish between agreement on positive and on negative results, which can be important in clinical situations (e.g., wrongly diagnosing a disease and wrongly excluding it may have different consequences). Accuracy indicates how close a measurement is to the true value of the quantity being measured. The precision of a measurement system refers to how closely repeated measurements (made under the same conditions) agree with one another. Measurements can be both accurate and precise, accurate but not precise, precise but not accurate, or neither. Imagine two ophthalmologists measuring intraocular pressure with a tonometer.
Each patient therefore has two measured values, one from each observer. The ICC provides an estimate of the overall agreement between these measurements. It is somewhat similar to analysis of variance in that it examines the between-pair variance expressed as a proportion of the total variance of the observations (i.e., the total variability of the 2n observations, which is the sum of the within-pair and between-pair variances). The ICC can take values from 0 to 1, where 0 indicates no agreement and 1 indicates perfect agreement. As stated above, correlation is not synonymous with agreement. Correlation refers to the presence of a relationship between two different variables, whereas agreement examines the concordance between two measurements of the same variable. Two sets of observations that are highly correlated may nevertheless agree poorly; however, if the two sets of values agree, they will certainly be highly correlated. For example, in the hemoglobin example, although the agreement is poor, the correlation coefficient between the values from the two methods is high (r = 0.98) [Figure 2]. Another way to look at it is that, although the individual points lie fairly close to the dotted line (the least squares line, [2] which indicates good correlation), they are quite far from the solid black line, which represents the line of perfect agreement [Figure 2]. If agreement is good, the points should fall on or near this solid black line.
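To illustrate how correlation and agreement can diverge, the sketch below computes Pearson's r, a one-way random-effects ICC (one of several ICC forms; other forms can give different values), and the Bland-Altman bias with 95% limits of agreement for a small set of hypothetical paired readings in which method B reads about 1 unit higher than method A. The data and the function names `pearson_r`, `icc_oneway`, and `bland_altman` are invented for illustration; they are not the hemoglobin data of Table 3.

```python
import statistics
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_oneway(x, y):
    """One-way random-effects ICC for paired readings (k = 2 per subject).

    Compares the between-subject mean square (MSB) with the within-pair
    mean square (MSW): ICC = (MSB - MSW) / (MSB + (k - 1) * MSW).
    This estimate can fall below 0 if within-pair variation is large.
    """
    n, k = len(x), 2
    means = [(a + b) / 2 for a, b in zip(x, y)]
    grand = statistics.fmean(means)
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(x, y, means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def bland_altman(x, y):
    """Bias (mean of y - x) and 95% limits of agreement (bias ± 1.96 SD)."""
    diffs = [b - a for a, b in zip(x, y)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings: method B reads roughly 1 unit higher than A
method_a = [10.2, 11.5, 9.8, 12.9, 13.4, 8.7, 11.0, 14.1]
method_b = [11.3, 12.4, 10.9, 14.0, 14.3, 9.8, 12.1, 15.0]

r = pearson_r(method_a, method_b)
icc = icc_oneway(method_a, method_b)
bias, lower, upper = bland_altman(method_a, method_b)
print(f"r = {r:.3f}, ICC = {icc:.3f}")
print(f"bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
```

Because method B is shifted by a nearly constant amount, r is close to 1, whereas the ICC is noticeably lower and both limits of agreement lie above zero, exposing the systematic bias that the correlation coefficient ignores.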
For the three situations presented in Table 1, the McNemar test (which is designed to compare paired categorical data) would show no difference. However, this cannot be interpreted as evidence of agreement. The McNemar test compares overall proportions; therefore, any situation in which the overall proportions for the two examiners are similar (e.g., Situations 1, 2, and 3 of Table 1) would show no difference.
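A short Python sketch of this point, assuming the McNemar statistic without continuity correction, (b − c)²/(b + c), where b and c are the two discordant counts: Situation 1 of Table 1 and a hypothetical table with only chance-level agreement both give a McNemar statistic of 0, yet their kappa values differ sharply. The second table is invented for illustration and is not taken from Table 1.

```python
def mcnemar_chi2(table):
    """McNemar statistic (no continuity correction) for a 2x2 paired table."""
    b, c = table[0][1], table[1][0]  # discordant cells
    if b + c == 0:
        raise ValueError("no discordant pairs; statistic undefined")
    return (b - c) ** 2 / (b + c)


def cohens_kappa(table):
    """Cohen's kappa for a 2x2 table (rows = examiner A, columns = B)."""
    n = sum(map(sum, table))
    p_o = (table[0][0] + table[1][1]) / n
    row = [sum(r) / n for r in table]
    col = [(table[0][j] + table[1][j]) / n for j in range(2)]
    p_e = row[0] * col[0] + row[1] * col[1]
    return (p_o - p_e) / (1 - p_e)


situation_1 = [[8, 2], [2, 8]]          # Table 1, Situation 1 (from the text)
hypothetical_chance = [[5, 5], [5, 5]]  # hypothetical: chance-level agreement

for name, t in [("situation 1", situation_1), ("hypothetical", hypothetical_chance)]:
    print(name, mcnemar_chi2(t), round(cohens_kappa(t), 2))
# prints:
# situation 1 0.0 0.6
# hypothetical 0.0 0.0
```

Both tables have equal pass rates for the two examiners, which is all the McNemar test looks at; only kappa distinguishes real agreement from chance agreement.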