Talk:Kurtosis
This level-5 vital article is rated B-class on Wikipedia's content assessment scale.
Range of Values Kurtosis can take
I think it would be helpful to be clear about what range of values the kurtosis statistic can take. I can infer that there is a lower bound of -2, where the article discusses the binomial distribution being an extreme case; this took a fair bit of reading. There is nothing about an upper bound; presumably one exists, otherwise you end up with an improper distribution? — Preceding unsigned comment added by 194.176.105.139 (talk) 09:48, 4 July 2012 (UTC)
- No, there is no upper bound. Try three atoms: two atoms at -1 and +1, of probability 0.01 each, and an atom at 0 of probability 0.98. And you'll easily guess how the kurtosis can be arbitrarily large. Boris Tsirelson (talk) 08:22, 25 March 2015 (UTC)
- The sample kurtosis has an upper bound for samples drawn from the real numbers. The upper bound can be derived from the binomial distribution: Kurtosis <= n - 2 + 1/(n-1) - 3, where n is the sample size [1]. Zechmann2 (talk) 23:47, 19 November 2021 (UTC)
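Both replies can be checked numerically. The following is a minimal sketch, not taken from the article; the helper names `kurtosis_discrete`, `three_atom_kurtosis`, `sample_kurtosis` and `sample_upper_bound` are illustrative assumptions. It computes the non-excess kurtosis (no -3) of the three-atom distribution, which works out to 1/(2p) and so grows without bound as p shrinks, and it compares the plug-in sample kurtosis of the most extreme sample (one point away from all the others) with n - 2 + 1/(n-1).

```python
def kurtosis_discrete(values, probs):
    """Non-excess kurtosis mu_4 / sigma^4 of a discrete distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    m2 = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    m4 = sum(p * (v - mean) ** 4 for v, p in zip(values, probs))
    return m4 / m2 ** 2


def three_atom_kurtosis(p):
    """Atoms at -1 and +1 with probability p each, atom at 0 with 1 - 2p."""
    return kurtosis_discrete([-1.0, 0.0, 1.0], [p, 1.0 - 2.0 * p, p])


def sample_kurtosis(xs):
    """Plug-in (biased) sample kurtosis m_4 / m_2^2, without the -3."""
    n = len(xs)
    return kurtosis_discrete(xs, [1.0 / n] * n)


def sample_upper_bound(n):
    """Bound quoted above, before subtracting 3."""
    return n - 2 + 1.0 / (n - 1)


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        print(p, three_atom_kurtosis(p))  # equals 1 / (2p)
    extreme = [0.0] * 9 + [1.0]           # n = 10, a single outlying point
    print(sample_kurtosis(extreme), sample_upper_bound(10))
```

Subtracting 3 from either quantity gives the excess-kurtosis version used in the article.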
What is Gamma?
The author defines kurtosis in terms of Gamma, but fails to define Gamma? Is it the Gamma Distribution? Then why doesn't it have 2 parameters? Is it the Factorial? Then why does it have a non-integer parameter? Is it acceptable to use ambiguous functions in a definition without disambiguating them?
Hsfrey (talk) 00:46, 17 March 2012 (UTC)
Wikipedia inconsistency
Hi statistic wikipedia folks. In this page the Kurtosis definition has a "-3" in it (because the normal has a Kurtosis of 3 so this definition "normalises" things so to say). Subtracting this 3 is actually a convention, maybe this should be mentioned.
A more important point is that every single page on distributions I've encountered here does NOT include the -3 in the Kurtosis formula given on the right (correct me if I'm wrong? I didn't recalculate them all manually :)). So while this is only a matter of convention, we should at least get wikipedia consistent with its own definition conventions? The easiest way seems adapting the definition in this page.
Regards
woutersmet
The reason for this (I think!) is that people who have contributed to this page are from an econometrics background where it's common to assume a conditional normal distribution. Hence the -3. —Preceding unsigned comment added by 62.30.156.106 (talk) 21:45, 14 March 2008 (UTC)
Standardized moment
If this is the "fourth standardized moment", what are the other 3 and what is a standardized moment anyway? Do we need an article on it? -- Tarquin 10:39 Feb 6, 2003 (UTC)
- The first three are the mean, standard deviation, and skewness, if I recall correctly.
- Actually, the word "standardized" refers to the fact that the fourth moment is divided by the 4th power of the standard deviation. — Miguel 15:53, 2005 Apr 19 (UTC)
- Thank you :-) It's nice when wikipedia comes up with answers so quickly! -- Tarquin 11:04 Feb 6, 2003 (UTC)
- I think the term "central moments" is also used. See also http://planetmath.org/encyclopedia/Moment.htm
- No, central moments are distinct from standardized moments. --MarkSweep (call me collect) 02:14, 6 December 2006 (UTC)
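The distinction drawn in this thread between raw, central and standardized moments can be made concrete with a short sketch. The function names below are illustrative assumptions, and population-style (divide by n) moments are used for simplicity. The k-th standardized moment is the k-th central moment divided by the k-th power of the standard deviation, so the first is always 0 and the second always 1; skewness (third) and kurtosis (fourth) are the first ones that carry information, and both are unchanged by shifting or rescaling the data.

```python
def raw_moment(xs, k):
    """k-th raw moment: mean of x^k."""
    return sum(x ** k for x in xs) / len(xs)


def central_moment(xs, k):
    """k-th central moment: mean of (x - mean)^k."""
    mean = raw_moment(xs, 1)
    return sum((x - mean) ** k for x in xs) / len(xs)


def standardized_moment(xs, k):
    """k-th central moment divided by sigma^k."""
    sigma = central_moment(xs, 2) ** 0.5
    return central_moment(xs, k) / sigma ** k


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 10.0]
    rescaled = [2.0 * x + 5.0 for x in data]  # shift and scale
    for k in (1, 2, 3, 4):
        print(k, standardized_moment(data, k), standardized_moment(rescaled, k))
```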
Peakedness
Kurtosis is a measure of the peakedness ... so what does that mean? If I have a positive kurtosis, is my distribution pointy? Is it flat? -- JohnFouhy, 01:53, 11 Nov 2004
- I've tried to put the answer to this in the article: high kurtosis is 'peaked' or 'pointy', low kurtosis is 'rounded'. Kappa 05:15, 9 Nov 2004 (UTC)
It has been pointed out that kurtosis is not synonymous with shape or peakedness, even for symmetric unimodal distributions, please see:
1) A common error concerning kurtosis, Kaplansky - Journal of the American Statistical Association, 1945
2) Kurtosis: a critical review, Balanda, HL MacGillivray - American Statistician, 1988 —Preceding unsigned comment added by Studentt2046 (talk • contribs) 16:27, 10 March 2009 (UTC)
Just backing up that we should not describe Kurtosis as "peakedness", see "Kurtosis as Peakedness, 1905–2014. R.I.P." in The American Statistician 11 Aug 2014 — Preceding unsigned comment added by 130.225.116.170 (talk) 09:33, 2 October 2014 (UTC)
Unfortunately, I couldn't find a citeable source for this, but I suspect that the use of the term "peakedness" is in part driven by surface metrology; if a surface has sharp peaks, its height distribution will have a pronounced tail and therefore high kurtosis. I have seen "kurtosis=peakedness" quite often in books on surface metrology. I will try to find more on that. 2A02:8071:28A:7700:4125:57A9:C1EB:6C80 (talk) 23:36, 7 March 2021 (UTC)
(Note: In whatever field of study, metrology, imaging, finance, or whatever, the beta(.5,1) distribution provides a canonical example to demonstrate that a sharp peak does not imply higher kurtosis.) — Preceding unsigned comment added by BigBendRegion (talk • contribs) 16:28, 5 June 2021 (UTC)
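The beta(.5,1) claim can be verified exactly. This is a minimal sketch using the standard formula for the raw moments of a Beta(a, b) variable; the function names are illustrative assumptions. The density of beta(.5,1) is 0.5·x^(-1/2) on (0, 1), which is unbounded at 0, yet the kurtosis comes out as 15/7, below the normal value of 3.

```python
from fractions import Fraction


def beta_raw_moment(a, b, k):
    """E[X^k] for X ~ Beta(a, b): product over r < k of (a + r) / (a + b + r)."""
    out = Fraction(1)
    for r in range(k):
        out *= (a + r) / (a + b + r)
    return out


def kurtosis_from_raw(m1, m2, m3, m4):
    """Non-excess kurtosis mu_4 / sigma^4 from the first four raw moments."""
    var = m2 - m1 ** 2
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu4 / var ** 2


a, b = Fraction(1, 2), Fraction(1)
moments = [beta_raw_moment(a, b, k) for k in (1, 2, 3, 4)]
beta_kurtosis = kurtosis_from_raw(*moments)

if __name__ == "__main__":
    print(moments)        # raw moments 1/3, 1/5, 1/7, 1/9
    print(beta_kurtosis)  # 15/7
```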
Peakedness, as I understand, is not intended to imply anything about the shape of the _distribution_, which the cited statistical reference above suggests. Such notions are just plain wrong, but that is not how it is supposed to be interpreted. The term implies that peaks in e.g. a time signal will contribute to a higher kurtosis value (because of the fourth moment, and _yes_, outliers such as peaks in a time signal _do_ contribute to higher kurtosis values). That is, if you want to compare signals (data series) with respect to how many peaks they contain, kurtosis may be used as a measure. Kurtosis has for instance been used as a complement to equivalent SPL (i.e. the average RMS) to catch sound signals with a lot of transients/peaks in cases where they don't contribute much to the SPL level. An example of such "correct" usage of peakedness/kurtosis can e.g. be found here: https://www.bksv.com/media/doc/bo0510.pdf. Therefore, it is misleading to state "scaled version of the fourth moment of the distribution. This number is related to the tails of the distribution, not its peak;[2] hence, the sometimes-seen characterization of kurtosis as "peakedness" is incorrect." (quotation from this Wiki article). This is ONLY a proper conclusion if the peakedness is supposed to refer to the shape of the _distribution_, which, as far as I know, is _not_ the intended meaning. Instead of saying it is "incorrect", the text should just explain that peakedness does not mean that higher kurtosis corresponds to more or higher peaks in the statistical distribution, but rather that higher kurtosis indicates more/higher peaks in an associated data _series_ (e.g. a time series). — Preceding unsigned comment added by 79.136.121.89 (talk) 08:19, 20 September 2023 (UTC)
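The time-series reading described in the last comment can be illustrated with a small sketch. The signal, the spike positions and the spike height below are all made-up assumptions for illustration, not data from the cited report. A pure sine sampled over whole periods has kurtosis 1.5; adding a few large transients leaves most samples unchanged but raises the kurtosis sharply, because the fourth power is dominated by the few extreme values.

```python
import math


def kurtosis(xs):
    """Plug-in (non-excess) kurtosis m_4 / m_2^2 of a data series."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2


N = 1000
steady = [math.sin(2.0 * math.pi * 5.0 * t / N) for t in range(N)]  # 5 full periods

spiky = list(steady)
for idx in (100, 400, 700):  # hypothetical transient positions
    spiky[idx] += 10.0       # hypothetical transient height

if __name__ == "__main__":
    print(kurtosis(steady))  # close to 1.5 for a pure sine
    print(kurtosis(spiky))   # much larger: the three transients dominate
```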
Mistake
I believe the equation for the sample kurtosis is incorrect (n should be in denominator, not numerator). I fixed it. Neema Sep 7, 2005
Ratio of cumulants
The statement, "This is because the kurtosis as we have defined it is the ratio of the fourth cumulant and the square of the second cumulant of the probability distribution," does not explain (to me, at least) why it is obvious that subtracting three gives the pretty sample mean result. Isn't it just a result of cranking through the algebra, and if so, should we include this explanation? More concretely, the kurtosis is a ratio of central moments, not cumulants. I don't want to change one false explanation that I don't understand to another, though. Gray 01:30, 15 January 2006 (UTC)
- After thinking a little more, I'm just going to remove the sentence. Please explain why if you restore it. Gray 20:58, 15 January 2006 (UTC)
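For reference, the link between the two descriptions can be written out. The sketch below uses the usual notation (central moments \mu_k, cumulants \kappa_k, excess kurtosis \gamma_2), which is an assumption about the article's symbols. The ratio of central moments and the ratio of cumulants differ exactly by 3, and the additivity of cumulants over independent summands is what gives the simple result for the sample mean of n i.i.d. observations.

```latex
\begin{align*}
\kappa_2 &= \mu_2 = \sigma^2, \qquad \kappa_4 = \mu_4 - 3\mu_2^2, \\
\gamma_2 &= \frac{\kappa_4}{\kappa_2^2} = \frac{\mu_4}{\sigma^4} - 3, \\
\kappa_r\Big(\sum_{i=1}^{n} X_i\Big) &= n\,\kappa_r(X)
\quad\Longrightarrow\quad
\gamma_2\big(\bar X_n\big) = \frac{n\,\kappa_4}{(n\,\kappa_2)^2} = \frac{\gamma_2(X)}{n}.
\end{align*}
```

The last step uses the fact that \gamma_2 is unchanged by rescaling, so the sum and the mean have the same excess kurtosis.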
Mesokurtic
It says: "Distributions with zero kurtosis are called mesokurtic. The most prominent example of a mesokurtic distribution is the normal distribution family, regardless of the values of its parameters." Yet here: http://en.wikipedia.org/wiki/Normal_distribution, we can see that Kurtosis = 3, it's Skewness that = 0 for normal. Agree? Disagree?
- Thanks, that's now fixed. There are two ways to define kurtosis (ratio of moments vs. ratio of cumulants), as explained in the article. Wikipedia uses the convention (as do most modern sources) that kurtosis is defined as a ratio of cumulants, which makes the kurtosis of the normal distribution identically zero. --MarkSweep (call me collect) 14:43, 24 July 2006 (UTC)
Unbiasedness
I have just added a discussion to the skewness page. Similar comments apply here. Unbiasedness of the given kurtosis estimator requires independence of the observations and does not therefore apply to a finite population.
The independent observations version is biased, but the bias is small. This is because, although we can make the numerator and denominator unbiased separately, the ratio will still be biased. Removing this bias can be done only for specific populations. The best we can do is either:
1 use an unbiased estimate for the fourth moment about the mean,
2 use an unbiased estimate of the fourth cumulant,
in the numerator; and either:
3 use an unbiased estimate for the variance,
4 use an unbiased estimate for the square of the variance,
in the denominator.
According to the article, the given formula is 2 and 3 but I have not checked this. User:Terry Moore 11 Jun 2005
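As an illustration of combining unbiased pieces in a ratio, here is a minimal sketch of two common estimators of excess kurtosis. The function names are illustrative assumptions, and whether the second one matches the formula in the article at the time of this comment has not been checked here. The first is the plain plug-in estimator; the second is the ratio of the unbiased fourth-cumulant estimator (k-statistic k4) to the square of the unbiased variance estimator, which simplifies to the closed form in the code. As noted above, the ratio itself is still biased in general.

```python
def excess_kurtosis_plugin(xs):
    """g2 = m4 / m2^2 - 3 using divide-by-n central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def excess_kurtosis_kstat(xs):
    """G2 = k4 / k2^2, where k4 and k2 are unbiased for kappa_4 and kappa_2."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    g2 = excess_kurtosis_plugin(xs)
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(excess_kurtosis_plugin(data))  # -1.3
    print(excess_kurtosis_kstat(data))   # -1.2
```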
So who's Kurt?
I mean, what is the etymology of the term? -FZ 19:48, 22 Jun 2005 (UTC)
- It's obviously a modern term of Greek origin (κυρτωσις, fem.). The OED gives the non-specialized meaning as "a bulging, convexity". The Liddell-Scott-Jones lexicon has "bulging, of blood-vessels", "convexity of the sea's surface" and "being humpbacked". According to the OED (corroborated by "Earliest Known Uses of Some of the Words of Mathematics" and by a search on JSTOR), the first occurrence in print of the modern technical term is in an article by Karl Pearson from June 1905. --MarkSweep 21:05, 22 Jun 2005 (UTC)
Kurtosis Excess?
I've heard of "excess kurtosis," but not vice-versa. Is "kurtosis excess" a common term? Gray 01:12, 15 January 2006 (UTC)
Diagram?
A picture would be nice ... (one is needed for skewness as well). I'd whip one up, but final projects have me beat right now. 24.7.106.155 08:27, 19 April 2006 (UTC)
- The current picture is nice because it shows real data, but it has some problems:
- it does not cite any references for the source of the data
- it is not what we need here: kurtosis comes along when variance is not enough, so for a real case to be interesting one should find a situation where two distributions with the same mean and variance are symmetric or similarly asymmetric (null or same skewness) and yet have a different kurtosis; is this the case here? I am not sure, as no mention of variance is made in the comment to the picture
- it is not in vector form (SVG) --Pot (talk) 13:21, 18 December 2008 (UTC)
The picture is taken from my doctoral thesis ("A. Meskauskas. Gravitropic reaction: the role of calcium and phytochrome", defended in 1997, Vilnius State University, Lithuania). I added this note to the picture description in commons. The picture represents my own experimental data but the dissertation should be considered a published work. The real experimental data cannot be "adjusted" in any "preferred" way but in reality likely no scientist will ever observe an "absolutely clean" effect that only changes kurtosis and nothing else. Audriusa (talk) 14:41, 19 December 2008 (UTC)
- And in fact kurtosis is rarely used in real experimental data. One of the fields where it is used is for big quantities of experimental data that would seem well modelled by a Gaussian process. If the weight of the tails turns out to be important, then a kurtosis estimate can be necessary. It allows one to tell a Gaussian apart from something that resembles it. As I said above, this mostly makes sense when comparing distributions with same mean, variance and skewness. And, if you want to give an example, I argue that this is indeed necessary. So I think that your example is illustrative only if the two variances are equal. Can you tell us if this is the case? --Pot (talk) 16:50, 19 December 2008 (UTC)
- The dispersion changes from 21.780 (control) to 16.597 (far red). The mean, however, does not change much if we take the +- intervals into consideration (from 10.173 +- 0.975 to 8.687 +- 0.831). So, if comparing only the mean, we would likely conclude that the far red light has no effect in the experiment. But the histograms do look very different. One of the possible explanations can be periodic oscillations around the mean value in time (when the experiment gives the "momentary picture"). Far red light may stop these oscillations, making the output more uniform. Audriusa (talk) 20:38, 20 December 2008 (UTC)
- Thank you for clarifying this. However, as I pointed out above, once you have two distributions with the same mean, you start considering higher moments. The first one above the mean is variance. Only if variances are equal you resort to using even higher moments; this is not very common, as the higher the moment the more sensitive to noise. And in practice using moments higher than the variance with few samples is not very significant. So, once again, are the variances equal for the two cases you proposed as an illustration? --Pot (talk) 14:27, 21 December 2008 (UTC)
- What follows from your comment is that higher moments should only be compared if the lower moments are equal. This is a simple and clear statement. How sure are you about this? Any references? Audriusa (talk) 17:59, 30 December 2008 (UTC)
- Google books may let you browse this. It is the famous handbook "Numerical Recipes in C" (but the Fortran version contains the same text). The issue is that higher moments are generally less robust than lower moments, because they involve higher powers of the input data. The advice given in the book is that skewness and kurtosis should be used with caution or, better yet, not at all. More specifically, the skewness of a batch of N samples from a normal distribution is about The book goes on suggesting that In real life it is good practice to believe in skewnesses only when they are several or many times as large as this. Here we are speaking about kurtosis, for which the relevant figure is For the example figure that you added, this means that the difference on sample kurtoses can be considered significant only if Even if this is the case, resorting to higher moments, which are inherently less robust, will only be justified where lower moments cannot do the job. I think that adding a section both in skewness and kurtosis explaining these concepts is a good idea. Pot (talk) 12:54, 8 January 2009 (UTC)
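To give a feel for the sampling noise being discussed, here is a minimal sketch. It does not reproduce the figures from the book; it uses the widely quoted large-sample approximations for normal data, a standard error of about sqrt(6/N) for sample skewness and sqrt(24/N) for sample kurtosis, and checks the latter with a small simulation. The function names, sample size, repetition count and seed are illustrative assumptions.

```python
import math
import random


def skewness_se_normal(n):
    """Large-sample standard error of sample skewness under normality."""
    return math.sqrt(6.0 / n)


def kurtosis_se_normal(n):
    """Large-sample standard error of sample kurtosis under normality."""
    return math.sqrt(24.0 / n)


def sample_excess_kurtosis(xs):
    """Plug-in excess kurtosis m4 / m2^2 - 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def simulated_kurtosis_sd(n, reps=1000, seed=0):
    """Empirical standard deviation of the sample kurtosis of normal data."""
    rng = random.Random(seed)
    ks = [
        sample_excess_kurtosis([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    mean_k = sum(ks) / reps
    return math.sqrt(sum((k - mean_k) ** 2 for k in ks) / (reps - 1))


if __name__ == "__main__":
    n = 100
    print(kurtosis_se_normal(n), simulated_kurtosis_sd(n))
```

Under these assumptions, a difference between two sample kurtoses is only worth interpreting when it is several times larger than this standard error, which for small samples is a large number.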
Range?
Is the range -2, +infinity correct? Why not -3, +infinity?
Yes, the range is correct. In general all distributions must satisfy $\gamma_2 \ge \gamma_1^2 - 2$. The minimum value of $\gamma_2$ is −2. --MarkSweep (call me collect) 02:26, 6 December 2006 (UTC)
- I take that back, will look into it later. --MarkSweep (call me collect) 09:32, 6 December 2006 (UTC)
I corrected the French article, which gave 0, +infinity for kurtosis (so -3, +infinity for excess kurtosis). The correct range for kurtosis is 1, +infinity, and for the excess kurtosis: -2, +infinity.
Very simple demonstration :
We have
or
so
with , we have
This demonstration can also be carried out with Jensen's inequality (added 10/16/09).
Jensen's inequality:
We have
so
Thierry —Preceding unsigned comment added by 132.169.19.128 (talk) 08:22, 4 June 2009 (UTC)
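The bound stated in this thread (kurtosis at least 1, excess kurtosis at least -2) follows from a short argument. The sketch below is one standard way to write it and is not necessarily the derivation originally posted; the symbols \mu, \sigma, \mu_4 and \gamma_2 follow the usual conventions, which is an assumption about the article's notation.

```latex
\begin{align*}
Y &= (X-\mu)^2, \qquad \operatorname{E}[Y] = \sigma^2, \qquad \operatorname{E}[Y^2] = \mu_4, \\
0 &\le \operatorname{Var}(Y) = \operatorname{E}[Y^2] - \big(\operatorname{E}[Y]\big)^2 = \mu_4 - \sigma^4, \\
\frac{\mu_4}{\sigma^4} &\ge 1
\qquad\text{and hence}\qquad
\gamma_2 = \frac{\mu_4}{\sigma^4} - 3 \ge -2 .
\end{align*}
```

The same inequality is Jensen's inequality applied to the convex function \varphi(y) = y^2, since \varphi(\operatorname{E}[Y]) \le \operatorname{E}[\varphi(Y)]. Equality requires Y to be constant, that is, X takes the two values \mu \pm \sigma with probability 1/2 each.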
Sample kurtosis
Is the given formula for the sample kurtosis really right? Isn't it supposed to have the -3 in the denominator? --60.12.8.166
In the discussion of the "D" formula, the summation seems to be over i terms, whereas the post lists: "xi - the value of the x'th measurement" I think this should read: "xi - the value of the i'th measurement of x" (or something close) --Twopoint718 19:25, 13 May 2007 (UTC)
Shape
In terms of shape, a leptokurtic distribution has a more acute "peak" around the mean (that is, a higher probability than a normally distributed variable of values near the mean) and "fat tails" (that is, a higher probability than a normally distributed variable of extreme values)
Is that right? How can a function have both a greater probability near the mean and a greater probability at the tails? Ditto for platykurtic distributions--DocGov 21:49, 18 November 2006 (UTC)
- Yes, that's right. One typically has in mind symmetric unimodal distributions, and leptokurtic ones have a higher peak at the mode and fatter tails than the standard normal distribution. For an example have a look at the section on the Pearson type VII family I just added. --MarkSweep (call me collect) 02:29, 6 December 2006 (UTC)
- On the other hand, the Cauchy distribution has a lower peak than the standard normal yet fatter tails than any density in the Pearson type VII family. However, its kurtosis and other moments are undefined. --MarkSweep (call me collect) 04:00, 6 December 2006 (UTC)
- Another explanation: it's not just peaks and tails, don't forget about the shoulders. Leptokurtic densities with a higher peak and fatter tails have lower shoulders than the normal distribution. Take the density of the Laplace distribution with unit variance: $f(x) = \frac{1}{\sqrt{2}}\, e^{-\sqrt{2}\,|x|}$
- For reference, the standard normal density is $g(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
- Now f and g intersect at four points, whose x values are $\pm 0.49$ and $\pm 2.34$. Focus on three intervals (on the positive half-line, the negative case is the same under symmetry):
- Peak $(0, 0.49)$: Here the Laplace density is greater than the normal density and so the Laplace probability of this interval (that is, the definite integral of the density) is greater (0.25 vs. 0.19 for the normal density).
- Shoulder $(0.49, 2.34)$: Here the normal density is greater than the Laplace. The normal probability of this interval is 0.30 vs. 0.23 for the Laplace.
- Tail $(2.34, \infty)$: Here the Laplace density is again greater. Laplace probability is 0.02, normal probability is 0.01.
- Because we focus on the positive half-line, the probabilities for each distribution sum to 0.5. And even though the Laplace density allocates about twice as much mass to the tail compared with the normal density, in absolute terms the difference is very small. The peak of the Laplace is acute and the region around it is narrow, hence the difference in probability between the two distributions is not very pronounced. The normal distribution compensates by having more mass in the shoulder interval (0.49,2.34). --MarkSweep (call me collect) 08:57, 6 December 2006 (UTC)
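The numbers in this comparison can be reproduced with a short sketch. It assumes the unit-variance Laplace density exp(-sqrt(2)|x|)/sqrt(2) and the standard normal density, finds the two positive crossing points by bisection, and integrates each density over the peak, shoulder and tail intervals using the closed-form CDFs. All function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)


def laplace_pdf(x):
    """Laplace density with mean 0 and variance 1 (scale 1/sqrt(2))."""
    return math.exp(-SQRT2 * abs(x)) / SQRT2


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def laplace_cdf(x):
    """CDF of the unit-variance Laplace, valid for x >= 0."""
    return 1.0 - 0.5 * math.exp(-SQRT2 * x)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / SQRT2))


def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == (f_lo > 0):
            lo, f_lo = mid, f(mid)
        else:
            hi = mid
    return 0.5 * (lo + hi)


def density_gap(x):
    return laplace_pdf(x) - normal_pdf(x)


x1 = bisect(density_gap, 0.1, 1.0)  # peak/shoulder boundary
x2 = bisect(density_gap, 1.0, 4.0)  # shoulder/tail boundary

laplace_probs = (
    laplace_cdf(x1) - 0.5,
    laplace_cdf(x2) - laplace_cdf(x1),
    1.0 - laplace_cdf(x2),
)
normal_probs = (
    normal_cdf(x1) - 0.5,
    normal_cdf(x2) - normal_cdf(x1),
    1.0 - normal_cdf(x2),
)

if __name__ == "__main__":
    print(round(x1, 2), round(x2, 2))
    print([round(p, 2) for p in laplace_probs])
    print([round(p, 2) for p in normal_probs])
```

Setting the two log-densities equal gives a quadratic in x, so the crossing points also have the closed form sqrt(2) ± sqrt(2 - ln π); bisection is used here only to keep the sketch generic.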
Looking at the Pearson Distribution page - isn't the example a Pearson V, not Pearson VII as stated in the title? And, if not, where is more info on Type VII - the Pearson Wikipedia page only goes up to V. 128.152.20.33 19:34, 7 December 2006 (UTC)
- Obviously the article on the Pearson distributions is woefully incomplete. As the present article points out, the Pearson type VII distributions are precisely the symmetric type IV distributions. --MarkSweep (call me collect) 05:35, 8 December 2006 (UTC)
- Hello Mark, I know it's 16 years since you wrote your comments, but I just want to say that I believe I understand what kurtosis is, and what you say is very good: the Laplace example is very good, "one typically has in mind symmetric unimodal distributions" is a good point (well, unimodal goes only with leptokurtic, right?), and the peak - shoulders - tails remark is spot-on (btw I think of this as head - shoulders - arms). People who say leptokurtic means fatter tails but not more peakedness obviously don't understand that the one is not possible without the other. The article has a lot of crap statements, because some Wikipedians blindly believe what is written in Scientific American or in their favourite book. Okay, enough ranting, thanks again for your words of reason. --Herbmuell (talk) 20:49, 22 March 2022 (UTC)
(Herbmuell's comment "one is not possible without the other" is false. Take the beta(.5,1) distribution and reflect it around the origin. The new distribution is (1) symmetric and (2) infinitely peaked, but (3) it is light tailed (kurtosis < 3). For another example, take the U(-1,1) distribution and mix it with the Cauchy, with mixing probabilities 0.99999 and 0.00001. The resulting distribution is (1) symmetric and (2) appears perfectly flat over 99.999% of the observable data, but (3) has fat tails (infinite kurtosis).) BigBendRegion (talk) 15:42, 20 June 2022 (UTC)
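The first counterexample can be checked exactly. This sketch assumes that "reflect around the origin" means attaching a random sign to a beta(.5,1) variable, so that X = S·B with S = ±1 equally likely; that reading and the function names are assumptions. The odd moments then vanish and the even moments equal those of B, where E[B^k] = a/(a+k) for a Beta(a, 1) variable.

```python
from fractions import Fraction


def beta_a1_moment(a, k):
    """E[B^k] for B ~ Beta(a, 1)."""
    return a / (a + k)


def reflected_kurtosis(a):
    """Kurtosis of X = S * B, with S = +/-1 equally likely and B ~ Beta(a, 1).

    X has mean 0, so the central moments are just E[B^2] and E[B^4].
    """
    m2 = beta_a1_moment(a, 2)
    m4 = beta_a1_moment(a, 4)
    return m4 / m2 ** 2


reflected = reflected_kurtosis(Fraction(1, 2))

if __name__ == "__main__":
    print(reflected, float(reflected))  # 25/9, about 2.78, below 3
```

The second counterexample cannot be checked this way, because the Cauchy component has no finite moments, which is exactly why the mixture's kurtosis is infinite.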
Was someone having us on ? (hoax)
"A distribution whose kurtosis is deemed unaccepatably large or small is said to be kurtoxic. Similarly, if the degree of skew is too great or little, it is said to be skewicked" – two words that had no hits in Google. I think someone was kidding us. DFH 20:33, 9 February 2007 (UTC)
- Agree, zero google hits. --Salix alba (talk) 21:24, 9 February 2007 (UTC)
leptokurtic / platykurtic
I think the definitions of lepto-/platy- kurtic in the article are confusing: the prefixes are reversed. I'm not confident enough in statistics to change this. Could someone who understands the subject check that this is the correct usage?
A distribution with positive kurtosis is called leptokurtic, or leptokurtotic. In terms of shape, a leptokurtic distribution has a more acute "peak" around the mean (that is, a higher probability than a normally distributed variable of values near the mean) and "thin tails" (that is, a lower probability than a normally distributed variable of extreme values). Examples of leptokurtic distributions include the Laplace distribution and the logistic distribution.
A distribution with negative kurtosis is called platykurtic, or platykurtotic. In terms of shape, a platykurtic distribution has a smaller "peak" around the mean (that is, a lower probability than a normally distributed variable of values near the mean) and "heavy tails" (that is, a higher probability than a normally distributed variable of extreme values).
leptokurtic: –adjective Statistics. 1. (of a frequency distribution) being more concentrated about the mean than the corresponding normal distribution. 2. (of a frequency distribution curve) having a high, narrow concentration about the mode. [Origin: 1900–05; lepto- + irreg. transliteration of Gk kyrt(ós) swelling + -ic]
lepto- a combining form meaning "thin," "fine," "slight"
platykurtic: 1. (of a frequency distribution) less concentrated about the mean than the corresponding normal distribution. 2. (of a frequency distribution curve) having a wide, rather flat distribution about the mode. [Origin: 1900–05; platy- + kurt- (irreg. < Gk kyrtós bulging, swelling) + -ic]
platy- a combining form meaning "flat," "broad".
--Blick 19:43, 21 February 2007 (UTC)
- The current usage is correct and agrees with other references, e.g. [1][2][3]. DFH 21:39, 21 February 2007 (UTC)
- I don't think that the problem is with the words platykurtic and leptokurtic, which is what your references are to. It's the issue that leptokurtic is described as having heavy tails. The more common explanation is that leptokurtic distributions have thin tails and that platykurtic distributions have heavy tails. Phillipkwood (talk)
- I'm not sure about the prefixes, but the changes you made earlier today were definitely wrong and did not agree with other outside sources. I wasted a lot of time trying to make sense of it before I noticed your edit, and then I undid it. Tabako (talk) 00:11, 11 November 2008 (UTC)
Well, I think it does agree with outside sources, at least the _American Statistician_. Maybe, to make it less confusing, it's helpful to talk about length (which is what you're talking about) and thinness. Here's a quote from Kevin P. Balanda and H. L. MacGillivray (The American Statistician, Vol. 42, No. 2 (May, 1988), pp. 111-119), who write: "Dyson (1943) gave two amusing mnemonics attributed to Student for these names: platykurtic curves, like platypuses, are square with short tails whereas leptokurtic curves are high with long tails, like kangaroos, noted for "lepping". The terms supposedly refer to the general shape of a distribution, with platykurtic distributions being flat topped compared with the normal, leptokurtic distributions being more sharply peaked than the normal, and mesokurtic distributions having shape comparable to that of the normal." So, yes, "leptokurtic" distributions have long and thin tails; platykurtic distributions have short heavy tails. —Preceding unsigned comment added by 128.206.28.43 (talk) 15:56, 11 November 2008 (UTC)
- I'm still not sure about this. I'm suggesting that instead of describing a platykurtic distribution as one with "thin tails", we should say "broad peak". Would you agree? --Blick 07:30, 5 March 2007 (UTC)
Not really. Moments are more sensitive to the tails, because of the way powers work. The squares of 1, 2, 3 etc. are 1, 4, 9 etc., which are successively spaced farther apart. The effect is greater for 4th powers. So, although the names platykurtic and leptokurtic are inspired by the appearance of the centre of the density function, the tails are more important. Also it is the behaviour of the tails that determines how robust statistical methods will be, and the kurtosis is one diagnostic for that. 203.97.74.238 00:46, 1 September 2007 (UTC) Terry Moore
- I agree with Terry, but given the American Statistician terminology, I made a minor edit to the page to reflect this discussion- i.e., that "thin" and "thick" refers to the height of the PDF. Reading the original American Statistician paper, reflects some of the language on this point and this seemed to be the most accurate compromise. I checked the "standard terminology" references above, and nothing is mentioned in those about thickness versus thinness- they're just definitions that all of us seem to agree on. Phillipkwood (talk) 15:00, 8 December 2008 (UTC)
- Hm. I looked at it and I suggest that:
- "peak" should be peak → done
- "fat tail", "thin tail" should be fat tail, thin tail → done
- fat tail should be a link → done
- thin tail should be sub Gaussian (not super Gaussian, and without quotes) → done
- fat tail should be super Gaussian (not sub Gaussian, and without quotes) → done
- Other than these typographical changes, the terms leptokurtic and mesokurtic should be made consistent in the article and between articles (such as those about fat tail and heavy tail) → done. --Fpoto (talk) 18:37, 8 December 2008 (UTC)
L-kurtosis
I don't have the time to write about that, but I think the article should mention L-kurtosis, too. --Gaborgulya (talk) 01:13, 22 January 2008 (UTC)
why 3?
To find out if it's mesokurtic, platykurtic or leptokurtic, why compare it to 3? —Preceding unsigned comment added by Reesete (talk • contribs) 10:18, 5 March 2008 (UTC)
The expected kurtosis for a sample of IID standard normal data is 3 (see the wiki article on the normal distribution for more). For that reason we tend to refer to the sample kurtosis of a series minus 3 as the excess kurtosis. —Preceding unsigned comment added by 62.30.156.106 (talk) 21:42, 14 March 2008 (UTC)
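The value 3 for the normal distribution itself can be derived in two lines. The sketch below uses the standard normal density \varphi and the identity \varphi'(z) = -z\,\varphi(z), then integrates by parts; the notation is an assumption and is not taken from the article.

```latex
\begin{align*}
\operatorname{E}[Z^4]
  &= \int_{-\infty}^{\infty} z^3 \cdot z\,\varphi(z)\,dz
   = \Big[-z^3 \varphi(z)\Big]_{-\infty}^{\infty}
     + 3\int_{-\infty}^{\infty} z^2 \varphi(z)\,dz \\
  &= 0 + 3\,\operatorname{E}[Z^2] = 3 .
\end{align*}
```

Since Z = (X - \mu)/\sigma is standard normal for any normal X, the fourth standardized moment is 3 regardless of the parameters, and subtracting 3 makes the normal family the zero point of the scale.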
Bias?
Perhaps the article should include more explicit notes on bias. In particular, I'm wondering why the formula is using biased estimates of sample moments about the mean; perhaps someone more knowledgeable than I might explain why this is the preferred formula? —Preceding unsigned comment added by 140.247.11.37 (talk) 14:30, 25 June 2008 (UTC)
Excess kurtosis - confusing phrasing
[edit]The way the "modern" definition is phrased in the article makes it look like could be what they're referring to as excess kurtosis.
- Yes, this is correct.
However, I get the impression that "excess kurtosis" is actually the "minus 3" term. Is this correct? kostmo (talk) 05:57, 25 September 2008 (UTC)
- It is the expression containing the -3, which is equal to . I think the phrase is correctly stated. --Pot (talk) 22:27, 3 January 2009 (UTC)
A nice definition
I found this:
k is best interpreted as a measure of dispersion of the values of Z^2 around their expected value of 1, where as usual Z = (X-mu)/sigma
It has been written by Dick Darlington in an old mail thread. It does not account for the -3 used in Wikipedia's article, but it is clear and could be added to the initial definition. --Pot (talk) 10:40, 19 February 2009 (UTC)
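The quoted interpretation corresponds to the identity kurtosis = Var(Z^2) + 1, since E[Z^2] = 1 by construction. A minimal numerical check is sketched below; the data values and function names are made up for illustration, and divide-by-n moments are used throughout so that the identity holds exactly for the sample.

```python
def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Population-style variance (divide by n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def standardize(xs):
    """Z = (X - mu) / sigma using the divide-by-n standard deviation."""
    m = mean(xs)
    sd = variance(xs) ** 0.5
    return [(x - m) / sd for x in xs]


def kurtosis(xs):
    """Non-excess kurtosis: mean of Z^4."""
    return mean([z ** 4 for z in standardize(xs)])


data = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0]  # arbitrary illustrative values
z_squared = [z * z for z in standardize(data)]

if __name__ == "__main__":
    print(mean(z_squared))            # 1, up to rounding
    print(kurtosis(data))
    print(variance(z_squared) + 1.0)  # same value as the kurtosis
```

Because a variance cannot be negative, the same identity shows at once that kurtosis is at least 1.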
Get Better Examples
Okay, I barely understand the statistical part of the article, so why do you have to use an example that involves something only botanists and biologists can understand? I understand that encyclopaedias are supposed to be erudite, but not pedantic. They shouldn't make you keep clicking on newer and newer subjects that you have to read up on just so you can understand the one you originally started with. An example in an encyclopaedia is supposed to be simple and straightforward, something the uninitiated layman can understand, not something having to do with red lights and gravitropic coleoptiles. It's the people's encyclopaedia, you don't have to dumb it down to make it more accessible. My point is, get a better visual example for what kurtosis is. —Preceding unsigned comment added by 70.73.34.109 (talk) 10:26, 30 April 2009 (UTC)
I agree - two clear examples, of very high and very low kurtosis, would make this article much clearer, and much easier to understand at a glance. Use a couple of every-day activities to prove the point. 165.193.168.6 (talk) 12:27, 13 August 2013 (UTC)
Glaring Error?
I'm no statistician, but the description of leptokurtosis currently says it has a more acute peak and fatter tails, whereas platykurtosis has a flatter peak and thinner tails. A quick mental diagram demonstrates to me that this is impossible, and the author(s) must have confused the thickness of the tails for the two cases. A leptokurtic curve must have thinner tails and a platykurtic curve must have fatter tails. Unless anyone objects, I'll correct this in a moment. —Preceding unsigned comment added by 194.153.106.254 (talk) 10:33, 23 July 2009 (UTC)
- No error, the description was correct, I've reverted your change. Just try to read and understand the article; the description is reasonably well done and graphical examples are in place. Next time, please do not change a math description unless you fully understand it. Raising a problem in the discussion page is good, but wait for someone to answer your doubts before editing. --Pot (talk) 14:17, 23 July 2009 (UTC)
- I suspect this paragraph has been reversed again. It says "In terms of shape, a leptokurtic distribution has a more acute peak around the mean and thinner tails". Looking at the rest of the article, it should say fatter tails, shouldn't it? However, I don't feel confident enough to change this. -- Pierre —Preceding unsigned comment added by 160.228.203.130 (talk) 13:52, 17 May 2011 (UTC)
- Well, since I've noticed the false change was just done today, I've reverted it myself -- Pierre —Preceding unsigned comment added by 160.228.203.130 (talk) 14:00, 17 May 2011 (UTC)
Alternative to -3 for kurtosis!
[edit]On Latin wiki page for Distributio normalis you find a recent (2003) scientific paper which rearranges differently the fourth moment to define a number said in English arch (and in Latin fornix) which ranges from 0 to infinity (and for the normal distribution is 1) instead of the quite strange [-2, infinity). by Alexor65 — Preceding unsigned comment added by Alexor65 (talk • contribs) 20:24, 29 March 2011 (UTC)
?Dead link "Celebrating 100 years of Kurtosis"
The link "Celebrating 100 years of Kurtosis" no longer works because the file has moved; it is now available at least at faculty.etsu.edu/seier/doc/Kurtosis100years.doc ----Alexor65 —Preceding unsigned comment added by 151.76.68.54 (talk) 21:02, 2 April 2011 (UTC)
First sentence and citation
The first sentence states "In probability theory and statistics, kurtosis is a measure of the "peakedness" of the probability distribution of a real-valued random variable, although some sources are insistent that heavy tails, and not peakedness, is what is really being measured by kurtosis.[1]"
The reference given says, "The heaviness of the tails of a distribution affects the behavior of many statistics. Hence it is useful to have a measure of tail heaviness. One such measure is kurtosis...Statistical literature sometimes reports that kurtosis measures the peakedness of a density. However, heavy tails have much more influence on kurtosis than does the shape of the distribution near the mean (Kaplansky 1945; Ali 1974; Johnson, et al. 1980)."
The reference seems to directly contradict the first sentence. — Preceding unsigned comment added by 140.226.46.75 (talk) 21:26, 7 October 2011 (UTC)
- Two things need to be distinguished: (i) kurtosis as a general concept or thing that might be measured ... I have 3 stats dictionaries that say that this is essentially "peakedness"; (ii) the specific measure of kurtosis based on centred ordinary fourth (and second) moments ... for which the property that it is more a measure of long-tailedness might well be valid. Overall this article needs to be re-arranged to distinguish these two points, along the lines of what is current in the article Skewness, which includes several different measures of skewness. The use of the SAS reference (alone) was unclear, as it can be considered correct as stated, since the SAS reference is an example of a "source" that does claim that "heavy tails have much more influence on kurtosis than does the shape of the distribution near the mean". (Thus a ref for the end of the sentence, not the whole thing.) I will add another ref and rearrange the start. Melcombe (talk) 12:03, 10 October 2011 (UTC)
Clarity of Introduction
When I first read through the introduction, I did not understand what it was saying. However, I read through it again, and, perhaps because I picked up on some word I missed the first time around, the meaning of the introduction became completely clear. As this seems to suggest that understanding the introduction hinges on (or at least is heavily dependent on) noticing and understanding a very small portion of it, I would suggest that a small, one-sentence "introduction introduction" be added above (or included in) the current introduction, such that it would quickly convey to readers a general "complexity level" at which the article deals with its subject. To clarify, in this context I am using the phrase "complexity level" to refer to a measure of a work's position on a "sliding scale" of sorts that measures the amount that a work is affected by the general tendency of larger words to become more critical to comprehension as the complexity of a work's subject (among other factors) increases. For instance, a college-level thermodynamics textbook is unlikely to spend the same amount of time leading up to a definition of thermal conduction and insulation that an elementary-level science textbook would. As such, a prior understanding of thermal conduction and insulation becomes more necessary to understand the rest of the book in the college textbook than in the elementary school textbook.
Alternately, the "complexity level" could be reduced, for instance, by using more familiar terms than, also for instance, "peakedness", which, while helping the reader to associate the concept with common phrases such as "highly peaked", could perhaps be moved lower in the introduction (or even put into the article itself) and replaced by another term, such as "sharpness", and reducing the repetition of clauses, such as removing the redundant "just as for skewness" in the second sentence.
Aero-Plex (talk) 17:24, 10 November 2011 (UTC)
(Split due to needing to reset the router)
EDIT:
Unfortunately, I do not have the necessary time to read and edit the article now, and probably won't for some time, so I cannot edit the article for now. However, from my brief skim through, I did notice that 1. there is a noticeable amount of repetition of terminology, which could be improved, 2. no compact, direct description of a graph with high/low kurtosis is made in the text, and I could only find one by looking in the image descriptions, which may not be noticed by some (I suggest that a sentence along the lines of "High kurtosis causes narrow curves, while low kurtosis causes wide graphs." be added somewhere in the article where it would be noticed), and 3. the "coin toss" example could be better elaborated on, as it seems like it could be very helpful, especially for people who are only coming to this page for a quick summary.
Aero-Plex (talk) 17:41, 10 November 2011 (UTC)
The usual estimator of the population kurtosis
I just conducted a simulation study which seems to confirm that "The usual estimator of the population kurtosis" is in fact an estimator of excess kurtosis. This seems to make sense given the last part of the formula, -3*X. — Preceding unsigned comment added by 83.89.29.84 (talk) 23:28, 24 January 2012 (UTC) ALSO: the article claims that the estimator is used in Excel; however, the Excel formula seems to use the standard deviation rather than the variance: http://office.microsoft.com/en-us/excel-help/kurt-HP005209150.aspx BUT the wiki article claims it must be the unbiased standard deviation estimator, which I don't believe exists. — Preceding unsigned comment added by 83.89.29.84 (talk) 00:11, 25 January 2012 (UTC)
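- For anyone wanting to check the two points above numerically, here is a minimal Python sketch. It assumes the bias-corrected formula that Excel documents for KURT (sample standard deviation with the n - 1 denominator) and compares it with the same expression written with the n - 1 variance. The function names and the sample data are made up for illustration. If the assumption holds, the two forms are algebraically identical (s^4 equals the squared variance), and the subtracted term shows the result is an excess kurtosis.

```python
import math

def kurt_with_sd(xs):
    """Bias-corrected sample excess kurtosis, written with the sample sd s."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(xs) / n
    # s is the square root of the n - 1 variance (s itself is not unbiased).
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    z4 = sum(((x - mean) / s) ** 4 for x in xs)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    # Subtracted term: tends to 3 as n grows, so this estimates EXCESS kurtosis.
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * z4 - correction

def kurt_with_variance(xs):
    """Same estimator, written with the n - 1 variance instead of s."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    fourth = sum((x - mean) ** 4 for x in xs)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth / var ** 2 - correction

sample = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]  # arbitrary illustrative data
print(kurt_with_sd(sample), kurt_with_variance(sample))
```

So the "standard deviation versus variance" difference looks like a matter of notation only, at least under the formula assumed here.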
wrong wrong wrong about peakedness
As has been pointed out above, kurtosis is NOT an accurate measure of peakedness. This should be obvious by looking at a graph of the Student's t-distribution with degrees of freedom above 4 and trying to see if you can see anything approaching sharp peakedness as the d.o.f. drops down to 4 and the kurtosis shoots up to infinity. Similarly, look at the gamma distribution graph and try to notice any correlation at all between the sharpness and softness of the peak when k > 1 and the smallness of k (higher kurtosis). The point is that kurtosis measures ONLY heaviness of the tails — and contrary to the former text, there's no difference in this respect between Pearson's kurtosis and excess kurtosis. (Nor can there be, since the two are identical save for being shifted by 3). In fact, it should be obvious that heavy tails and sharp peaks CANNOT in general be correlated -- i.e. one could radically change the shape of the peak in the middle of the graph by a strictly local rearrangement of the nearby area while leaving the tails entirely untouched. It's rather sad that a basic article like this had such basic errors for such a long time, but at least they are fixed now. Benwing (talk) 07:10, 23 March 2012 (UTC)
- OK, it's more complicated than I thought. In fact, Balanda and MacGillivray claim that kurtosis isn't necessarily an accurate measure of tail weight, either, and propose a vague definition of moving probability mass off the shoulders onto the peak and tail, which IMO isn't very helpful intuitively. I still think the most intuitive interpretation should basically specify tail-heaviness. I will rewrite the remainder of the article (post-intro) to be more careful about this. Benwing (talk) 08:09, 23 March 2012 (UTC)
I found the following article to be very helpful in explaining the common misconceptions about kurtosis, specifically related to its use as a measure of "peakedness" and "tail weight." (http://www.columbia.edu/~ld208/psymeth97.pdf) Basically, it explains that kurtosis is a movement of mass not explained by the variance. Thus, when we see heavier tails, this means that data points are spread further out, which should lead to an increase in the variance - BUT if there is also an increase in the number of data points near the mean, this leads to a decrease in the variance; kurtosis is able to explain the change in shape of a distribution when these two effects balance each other exactly.
I also believe this article makes an important point about the distributions crossing twice, which is helpful to dispel misconceptions about kurtosis.
- Peakedness (as used in practical applications) is not supposed to imply anything about the statistical distribution: Higher kurtosis suggests more/higher peaks in an associated data series (e.g. a time series). Hence, higher kurtosis implies more "peakedness" in that sense. Unfortunately, this is a concept that has acquired another interpretation in the statistical community, in which there is a debate about peaks in the distribution (which is a totally different question that results in the conclusion "kurtosis is NOT an accurate measure of peakedness", to which obviously should be appended "of STATISTICAL DISTRIBUTIONS"). But peakedness is not intended to be interpreted in that way when used correctly. See the last paragraph in the discussion above about Peakedness for details. — Preceding unsigned comment added by 79.136.121.89 (talk) 09:46, 20 September 2023 (UTC)
Expand Applications
Section Kurtosis#Applications is tagged with {{Expand section}}; here are some suggestions: Special:WhatLinksHere/Kurtosis. Fgnievinski (talk) 04:34, 19 January 2013 (UTC)
Add sources from IDRE/UCLA for the various definitions of kurtosis, including citations and which versions SAS, SPSS and STATA use
Here are some sources from IDRE/UCLA for the various definitions of kurtosis, including citations and which versions SAS, SPSS and STATA use [4], [5]. Regards, Anameofmyveryown (talk) 18:53, 11 March 2013 (UTC)
Edit protected erroneous reference ?
Hi. Ref. 4 (Pearson, Biometrika 1929), about the lower bound of the kurtosis, is edit protected and it is erroneous, including the doi: anybody can check. Even if the ref to Pearson can be corrected, the previous reference (paper of 2013), which was reverted on 5 February 2014, remains eligible because (a) it applies to a wider class of distributions than the finite discrete ones (because the 2013 proof uses mathematical expectations), and (b) it follows from a more general inequality applied to d-variate distributions, established in 2004 (ref cited in the 2013 paper). I can send the pdfs of the 2004 and 2013 papers to the administrator (please just tell me how to do that) and to interested people. At least please correct the erroneous reference about Pearson, if it is relevant. If that is not possible, please undo the change of 5 February 2014, and replace ref. 4 by the previous ref. 4, which is: [2] Thank you. Michel.
- Requested further details follow.
About the current ref 4: Pearson, K. (1929). "Editorial note to ‘Inequalities for moments of frequency functions and for various statistical constants’". Biometrika 21 (1–4): 370–375. doi:10.1093/biomet/21.1-4.361 (1) The toc of Biometrika 1929, 21(1-4) is at: http://biomet.oxfordjournals.org/content/21/1-4.toc I failed to find this paper of Pearson in this toc, and I failed to find it with ZMATH. The doi redirects to the paper of Joanes and Gill, "The Statistician" 1998, vol 47, part 1, pp. 183-189. Indeed it deals with skewness and kurtosis, but it does not cite Pearson and it does not give a general proof of the inequality valid for any random variable distribution. In any case, the doi and the ref to Pearson disagree. (2) The 2013 paper is publicly available at http://petitjeanmichel.free.fr/itoweb.petitjean.skewness.html (see ref. 2: "download pdf paper"): see the result at the top of p.3 and eq. 6. The proof of the more general inequality for random vectors is in my paper: "From Shape Similarity to Shape Complementarity: Toward a Docking Theory." J. Math. Chem. 2004,35[3],147-158. (DOI 10.1023/B:JOMC.0000033252.59423.6b), see eq. A10 in the appendix. I cannot upload it to the web due to the copyright. Only one assumption: the moments of order 4 must exist (so, it is not restricted to samples). I do not claim to have discovered the sharp lower bound of the kurtosis, even in its more general form, and I do not care if my 2013 paper is not cited. However I was the first to mention the inequality on the Wikipedia page, and at first glance my own proof seems to be original. I just say that the reader should be directed to a proof valid in all cases, e.g. via a valid source. If the ref works only for samples, the text should be updated accordingly.
To conclude, I give you the hint for the full proof for random variables (for vectors, see the 2004 paper), available to anybody aware of math expectations: X1 and X2 are random variables, translate X2, calculate the translation minimizing the variance of the squared difference of the random variables, and look at the expression of the minimized variance: it should be a non negative quantity, hence the desired inequality. Mailto: petitjean.chiral@gmail.com (preferred) or michel.petitjean@univ-paris-diderot.fr 81.194.29.18 (talk) 13:55, 10 December 2014 (UTC)
— Preceding unsigned comment added by 81.194.29.18 (talk) 18:54, 8 December 2014 (UTC)
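- For readers who want to see the inequality itself: a standard short argument gives kurtosis ≥ skewness² + 1 whenever the fourth moment exists. This is not necessarily the proof in the 2004 or 2013 papers, whose translation-based argument is only hinted at above. The symbols Z, γ, κ and a are notation introduced here for illustration.

```latex
\begin{aligned}
Z &= \frac{X-\mu}{\sigma}, \qquad \gamma = E[Z^3], \qquad \kappa = E[Z^4],\\
0 &\le E\!\left[(Z^2 - aZ - 1)^2\right]
   = \kappa - 2a\gamma + a^2 - 1 \quad \text{for every real } a
   \quad (\text{using } E[Z]=0,\ E[Z^2]=1),\\
a = \gamma:\quad 0 &\le \kappa - \gamma^2 - 1
   \;\Longrightarrow\; \kappa \ge \gamma^2 + 1 \ge 1 .
\end{aligned}
```

In terms of excess kurtosis this is the familiar lower bound of -2, attained only when Z² - γZ - 1 is zero with probability one, i.e. for a two-point distribution.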
References
- ^ Byers, R. H. (2000). On the maximum of the standardized fourth moment. InterStat, 1(2), 1-7. The population distribution could be interpreted as the sample size n approaching infinity, which would indicate the population kurtosis having an upper bound approaching infinity.
- ^ Petitjean M. (2013), "The Chiral Index: Applications to Multivariate Distributions and to 3D molecular graphs", Proceedings of 12th International Symposium on Operational Research in Slovenia SOR’13, pp. 11-16, L. Zadnik Stirn, J. Zerovnik, J. Povh, S. Drobne, A. Lisec, Eds., Slovenian Society INFORMATIKA (SDI), Section for Operations Research (SOR), ISBN 978-961-6165-40-2
- Not done: According to the page's protection level and your user rights, you should be able to edit the page yourself. If you seem to be unable to, please reopen the request with further details. Anupmehra -Let's talk! 13:19, 9 December 2014 (UTC)
I pasted the wrong doi of ref. 4 (my mouse caught the doi of the line above). In fact the Editorial is appended to the paper of Shohat. I cancel my request. Please accept my apologies for the inconvenience caused. Thanks for your patience. 81.194.29.18 (talk) 14:33, 10 December 2014 (UTC)
Why kurtosis should not be interpreted as "peakedness"
It seems that people keep wanting to insert something about "peakedness" into the interpretation of kurtosis.
What follows is a clear explanation of why “peakedness” is simply wrong as a descriptor of kurtosis.
Suppose someone tells you that they have calculated negative excess kurtosis either from data or from a probability distribution function (pdf). According to the “peakedness” dogma (started unfortunately by Pearson in 1905, and carried forward by R.A. Fisher through the 14th edition of his classic text, Statistical Methods for Research Workers), you are supposed to conclude that the distribution is “flat-topped” when graphed. But this is obviously false in general. For one example, the beta distribution beta(.5,1) has an infinite peak and has negative excess kurtosis. For another example, the 0.5*N(0, 1) + 0.5*N(4,1) mixture distribution is bimodal (wavy); not flat at all, and also has negative excess kurtosis. These are just two examples out of an infinite number of other non-flat-topped distributions having negative excess kurtosis.
Yes, the continuous uniform distribution U(0,1) is flat-topped and has negative excess kurtosis. But obviously, a single example does not prove the general case. If that were so, we could say, based on the beta(.5,1) distribution, that negative excess kurtosis implies that the pdf is "infinitely pointy." We could also say, based on the 0.5*N(0, 1) + 0.5*N(4,1) distribution, that negative excess kurtosis implies that the pdf is "wavy." It’s like saying, “well, I know all bears are mammals, so it must be the case that all mammals are bears.”
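- The two counterexamples above can be checked from raw moments alone. The following is a minimal Python sketch; the helper names are invented for illustration, and it uses only the standard raw-moment formulas for the beta and normal distributions.

```python
def excess_kurtosis_from_raw(m1, m2, m3, m4):
    """Excess kurtosis from the first four raw moments E[X^k]."""
    var = m2 - m1 ** 2
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # 4th central moment
    return mu4 / var ** 2 - 3

def beta_raw_moments(a, b, order=4):
    """E[X^k] for Beta(a, b): product of (a + r) / (a + b + r), r = 0..k-1."""
    moments, m = [], 1.0
    for r in range(order):
        m *= (a + r) / (a + b + r)
        moments.append(m)
    return moments

def normal_raw_moments(mu, var=1.0):
    """First four raw moments of N(mu, var)."""
    return [mu,
            mu ** 2 + var,
            mu ** 3 + 3 * mu * var,
            mu ** 4 + 6 * mu ** 2 * var + 3 * var ** 2]

# beta(.5, 1): unbounded density at 0, yet negative excess kurtosis.
beta_excess = excess_kurtosis_from_raw(*beta_raw_moments(0.5, 1.0))

# 0.5*N(0,1) + 0.5*N(4,1): raw moments of a mixture are the weighted averages.
mix = [0.5 * u + 0.5 * v
       for u, v in zip(normal_raw_moments(0.0), normal_raw_moments(4.0))]
mixture_excess = excess_kurtosis_from_raw(*mix)

print(beta_excess, mixture_excess)
```

Both values come out negative (about -0.86 and -1.28), even though neither density is flat-topped.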
Now suppose someone tells you that they have calculated positive excess kurtosis from either data or a pdf. According to the “peakedness” dogma (again, started by Pearson in 1905), you are supposed to conclude that the distribution is “peaked” or “pointy” when graphed. But this is also obviously false in general. For example, take a U(0,1) distribution and mix it with a N(0,1000000) distribution, with .00001 mixing probability on the normal. The resulting distribution, when graphed, appears perfectly flat at its peak, but has very high kurtosis.
You can play the same game with any distribution other than U(0,1). If you take a distribution with any shape peak whatsoever, then mix it with a much wider distribution like N(0,1000000), with small mixing probability, you will get a pdf with the same shape of peak (flat, bimodal, trimodal, sinusoidal, whatever) as the original, but with high kurtosis.
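- A rough numerical check of the U(0,1) example above, as a Python sketch. One assumption is needed: N(0,1000000) is read here as mean 0 and variance 1000000 (standard deviation 1000). All names are illustrative only.

```python
import math

P_NORMAL = 1e-5          # mixing probability on the wide normal
WIDE_VAR = 1_000_000.0   # ASSUMPTION: second parameter of N(0, 1000000) is the variance

# Raw moments E[X^k], k = 1..4, of each component.
uniform_raw = [1 / (k + 1) for k in range(1, 5)]     # U(0, 1)
normal_raw = [0.0, WIDE_VAR, 0.0, 3 * WIDE_VAR ** 2]  # N(0, WIDE_VAR)

# Raw moments of the mixture are weighted averages of the component moments.
m1, m2, m3, m4 = [(1 - P_NORMAL) * u + P_NORMAL * g
                  for u, g in zip(uniform_raw, normal_raw)]

var = m2 - m1 ** 2
mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
kurtosis = mu4 / var ** 2  # Pearson kurtosis (normal = 3)

def mixture_pdf(x):
    """Density of the mixture at a point x inside (0, 1)."""
    sd = math.sqrt(WIDE_VAR)
    normal_part = math.exp(-x * x / (2 * WIDE_VAR)) / (sd * math.sqrt(2 * math.pi))
    return (1 - P_NORMAL) * 1.0 + P_NORMAL * normal_part

print(kurtosis)                              # very large
print(mixture_pdf(0.1) - mixture_pdf(0.9))   # essentially zero: the top is flat
```

Under that reading, the kurtosis is in the hundreds of thousands while the density across (0,1) changes by far less than anything visible on a graph.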
And yes, the Laplace distribution has positive excess kurtosis and is pointy. But you can have any shape of the peak whatsoever and have positive excess kurtosis. So the bear/mammal analogy applies again.
One thing that can be said about cases where the data exhibit high kurtosis is that when you draw the histogram, the peak will occupy a narrow vertical strip of the graph. The reason this happens is that there will be a very small proportion of outliers (call them “rare extreme observations” if you do not like the term “outliers”) that occupy most of the horizontal scale, leading to an appearance of the histogram that some have characterized as “peaked” or “concentrated toward the mean.”
But the outliers do not determine the shape of the peak. When you zoom in on the bulk of the data, which is, after all, what is most commonly observed, you can have any shape whatsoever – pointy, inverted U, flat, sinusoidal, bimodal, trimodal, etc.
So, given that someone tells you that there is high kurtosis, all you can legitimately infer, in the absence of any other information, is that there are rare, extreme data points (or potentially observable data points). Other than the rare, extreme data points, you have no idea whatsoever as to what is the shape of the peak without actually drawing the histogram (or pdf), and zooming in on the location of the majority of the (potential) data points.
And given that someone tells you that there is negative excess kurtosis, all you can legitimately infer, in the absence of any other information, is that the outlier characteristic of the data (or pdf) is less extreme than that of a normal distribution. But you will have no idea whatsoever as to what is the shape of the peak, without actually drawing the histogram (or pdf).
The logic for why the kurtosis statistic measures outliers (rare, extreme observations in the case of data; potential rare, extreme observations in the case of a pdf) rather than the peak is actually quite simple. Kurtosis is the average (or expected value in the case of the pdf) of the Z-scores (Z-values), each taken to the 4th power. In the case where there are (potential) outliers, there will be some extremely large Z^4 values, giving a high kurtosis. If there are fewer outliers than, say, predicted by a normal pdf, then the most extreme Z^4 values will not be particularly large, giving smaller kurtosis.
What of the peak? Well, near the peak, the Z^4 values are extremely small and contribute very little to their overall average (which again, is the kurtosis). That is why kurtosis tells you virtually nothing about the shape of the peak. I give mathematical bounds on the contribution of the data near the peak to the kurtosis measure in the following article:
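- To make the "average of fourth powers of Z-scores" description concrete, here is a small Python sketch with made-up data: 99 points spread evenly over [-1, 1] plus a single outlier at 20. The data and function name are hypothetical, and the population-style standard deviation (dividing by n) is used so that the contributions sum exactly to the kurtosis.

```python
import math

def z4_contributions(xs):
    """Per-observation contributions z_i**4 / n; their sum is the kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)  # divide by n, not n - 1
    return [((x - mean) / sd) ** 4 / n for x in xs]

bulk = [-1 + 2 * i / 98 for i in range(99)]  # hypothetical "peak" region
data = bulk + [20.0]                          # one rare, extreme observation

parts = z4_contributions(data)
kurtosis = sum(parts)
outlier_share = parts[-1] / kurtosis   # fraction of the kurtosis due to the outlier
bulk_total = sum(parts[:-1])           # everything contributed by the other 99 points

print(kurtosis, outlier_share, bulk_total)
```

In this toy data set nearly all of the kurtosis comes from the single extreme point; the 99 central points could be rearranged into any shape with little effect on the result.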
Kurtosis as Peakedness, 1905 – 2014. R.I.P. The American Statistician, 68, 191–195.
I hope this helps.
Peter Westfall
P.S. The height of the peak is also unrelated to kurtosis; see Kaplansky, I. (1945), “A Common Error Concerning Kurtosis,” Journal of the American Statistical Association, 40, 259. But the “height” misinterpretation also seems to persist.
P.P.S. Some believe that the "peakedness" and "flatness" interpretation holds in the special case of symmetric unimodal distributions, because in the esteemed journal Psychological Methods, De Carlo (1997) states at the beginning of the abstract to his paper "On the Meaning and Use of Kurtosis," as follows:
“For symmetric unimodal distributions, positive kurtosis indicates heavy tails and peakedness relative to the normal distribution, whereas negative kurtosis indicates light tails and flatness.”
But this statement is also easily shown to be false as regards "peakedness" and "flatness":
Take a U(-1,1) mixed with a N(0,1000000), with mixing p=.0001 on the normal. The distribution is symmetric and unimodal, has extremely high kurtosis, but appears flat at its peak when graphed. The high kurtosis here, as with all distributions, is explained by potential outliers, not by the peak.
Now mix a beta(.5,1) with a -beta(.5,1), with equal probabilities. The distribution is symmetric and unimodal, has negative excess kurtosis, but has an infinite peak. The negative excess kurtosis here, as with all distributions, is explained by paucity of potential outliers, not by the peak. (The maximum possible absolute Z-value is around 2.24 for this distribution).
— Preceding unsigned comment added by 129.118.195.172 (talk) 16:07, 31 August 2017 (UTC)
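- The two symmetric examples in the P.P.S. can be checked with a few lines of arithmetic, sketched here in Python. Both mixtures are symmetric about 0, so central moments equal raw moments. As an assumption, N(0,1000000) is read as having variance 1000000; the variable names are illustrative.

```python
import math

# Example 1: U(-1, 1) mixed with N(0, 1000000), weight p = 0.0001 on the normal.
p = 1e-4
wide_var = 1_000_000.0  # ASSUMPTION: the second parameter is the variance
m2_flat = (1 - p) * (1 / 3) + p * wide_var          # E[X^2]; U(-1,1) has E[X^2] = 1/3
m4_flat = (1 - p) * (1 / 5) + p * 3 * wide_var ** 2  # E[X^4]; U(-1,1) has E[X^4] = 1/5
kurtosis_flat = m4_flat / m2_flat ** 2               # Pearson kurtosis (normal = 3)

# Example 2: equal mixture of beta(.5, 1) and its mirror image on [-1, 0].
# For beta(.5, 1), E[X^k] = 1 / (2k + 1); even moments survive the mirroring.
m2_pointy = 1 / 5
m4_pointy = 1 / 9
excess_pointy = m4_pointy / m2_pointy ** 2 - 3
max_abs_z = 1 / math.sqrt(m2_pointy)  # support ends at |x| = 1, mean is 0

print(kurtosis_flat, excess_pointy, max_abs_z)
```

The flat-topped mixture has a kurtosis in the tens of thousands, while the infinitely peaked one has excess kurtosis of about -0.22 and a largest possible |Z| of about 2.24.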
- Peakedness (as used in practical applications) is not supposed to imply anything about the statistical distribution. Higher kurtosis suggests more/higher peaks in an associated data series (e.g. a time series). Therefore, it does not help to argue about the shape of the distribution when you discuss peakedness. It is not intended to be interpreted that way (though a lot of people believe that). See the discussion above about Peakedness. 79.136.121.89 (talk) 08:41, 20 September 2023 (UTC)
Are the terms leptokurtic/platykurtic still meaningful?
If kurtosis is not a measure of the "peakedness", are the terms "leptokurtic" and "platykurtic" still meaningful? Don't they just mean "more peaked" and "less peaked"? Or do they need to be either abandoned or re-defined? --Roland (talk) 21:41, 5 November 2018 (UTC)
- Pearson really botched this with his poor word choices. How about "magnacaudatic" instead of "leptokurtic"; "parvacaudatic" instead of "platykurtic"; "mediacaudatic" instead of "mesokurtic"? Those are Latin-ish terms for "heavy-tailed", "light-tailed", and "medium-tailed." Or, just use the latter terms and eschew obfuscation. BigBendRegion (talk) 13:19, 6 November 2018 (UTC)
- It looks as though not only should the terms "leptokurtic", "platykurtic" and "mesokurtic" cease to be used, but the term "kurtosis" per se should also retire, as it just means "peakedness". Isn't it time to invent a Greek-based term meaning "tailedness"? The Greek word for "tail" is "ουρά", or "oura". What would be an Anglicized word based on that to mean "tailedness"? The word "kurtosis", if it continues to be used, will be a misnomer, or an anachronism. --Roland (talk) 20:58, 8 November 2018 (UTC)
Excess Kurtosis as a Measure of Agreement of 2 Datasets
When comparing 2 datasets, we often compute the distribution of the differences, or errors, of the 2 datasets. When excess kurtosis was considered a measure of "peakedness", a larger positive excess kurtosis would imply better agreement. Now it has been realized that excess kurtosis is actually a measure of "tailedness". What, then, should we wish for when we wish 2 datasets to agree? A kurtosis as small as possible? --Roland (talk) 00:59, 27 November 2018 (UTC)
- Please give a cite. I have heard of using quantile matching to measure agreement between data sets, but never kurtosis. In any event, similar kurtosis of two data sets indicates similar tail weight (or outlier character) for the two, so in that sense there is a match of sorts. BigBendRegion (talk) 13:43, 28 November 2018 (UTC)
- I have a copy of Numerical Recipes in FORTRAN: The Art of Scientific Computing by Press et al., 2nd Ed., 1992. A subroutine, moment, in this book gives you the mean, average deviation, standard deviation, skewness and excess kurtosis at the same time. (The explanation of excess kurtosis in the book is based on the old idea of "peakedness".)
- I often compare model results with observed data that were acquired at different times and locations. One thing I do is computing the differences between the model results and the observed data and examining the statistics of the differences. Naturally, the modelers aim to make their results agree with the observed data as much as possible, and they would look at every such statistic as a measure of level of agreement: smaller absolute values of the mean, the standard deviation and the skewness suggest better agreement than larger ones.
- As for the excess kurtosis, if it were a measure of "peakedness", a larger positive one would be better than a smaller one. However, now it is a measure of "tailedness", and a larger tailedness, or more outliers, would imply worse agreement. This seems to suggest that we now should aim to make excess kurtosis as small as possible rather than as large as possible. --Roland (talk) 21:24, 3 December 2018 (UTC)
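- For concreteness, the kind of summary described above can be sketched in Python as follows. This is loosely modeled on a "moment"-style routine and is not a transcription of the book's code: the function name, the choice of the n - 1 denominator for the variance, and the model/observed numbers are all assumptions made for illustration.

```python
import math

def moment_summary(xs):
    """Mean, average deviation, sd, skewness and excess kurtosis in one pass set."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least 2 values")
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    adev = sum(abs(d) for d in devs) / n
    var = sum(d * d for d in devs) / (n - 1)   # ASSUMPTION: n - 1 denominator
    if var == 0:
        raise ValueError("skewness and kurtosis are undefined for zero variance")
    sdev = math.sqrt(var)
    skew = sum(d ** 3 for d in devs) / (n * sdev ** 3)
    excess_kurt = sum(d ** 4 for d in devs) / (n * var ** 2) - 3
    return {"mean": mean, "adev": adev, "sdev": sdev,
            "skew": skew, "excess_kurt": excess_kurt}

# Hypothetical model results and observations; the last pair disagrees badly.
model = [10.1, 9.8, 10.4, 10.0, 10.2, 9.9, 10.1, 14.5]
observed = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
diffs = [m - o for m, o in zip(model, observed)]

print(moment_summary(diffs))
```

Under the "tailedness" reading, a large excess kurtosis of the differences points to a few unusually large errors relative to the typical error, so it flags outlying disagreements without saying how large the typical error is.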
Sole reference in section "Kurtosis convergence" is a dead link
The section's reference, http://www.cs.albany.edu/~lsw/homepage/PUBLICATIONS_files/ICCP.pdf, is dead.
Kurtosis in SPSS
[edit]It may also be mentioned that is used by SPSS to calculate the kurtosis. — Preceding unsigned comment added by Ad van der Ven (talk • contribs) 08:54, 25 April 2019 (UTC)
Section on entropy
The section on the maximality of the normal distribution's entropy does not seem very relevant to this article. Also the formula for the entropy is missing a minus sign. FilipeS (talk) 00:22, 13 August 2024 (UTC)