Measures of Central Tendency | Mean, Median, Mode

Central Tendency:

A descriptive summary of a data set through a single value that reflects the center of the data distribution.


Let's say you're in your final year of college. You're choosing a career in data science and debating between becoming a Python or an R developer, so you decide to look up how much Python developers make as opposed to R developers. Fortunately, these days data is often easily accessible. You find the frequency distributions of Python developers' salaries and of R developers' salaries, which look like this.

R developers salary

Python developers salary

Let's say these histograms were created from data on everyone who works as a Python or R developer, and the X axis represents their annual income in thousands. From these distributions, approximately what income do most Python developers make? And approximately what income do most R developers make?

Judging by these distributions, it looks like most Python developers make somewhere between the two blue dots (as marked in the image above) per year. So if you said anything between these two numbers, you got it right. Here we're focusing on the center of the distribution.

Here we determined an interval estimate of what you're likely to make in either career; in the case of Python developers, it lies between the two blue dots. But ideally we want one number that describes the entire data set. This allows us to quickly summarize all of our data.

How would you choose one number (or a small range of numbers) that accurately represents the typical salary of Python or R developers?

The value at which the frequency is highest is called the mode. In other words, the mode is the most common value, and it certainly works in describing the distribution.

The value in the middle of the distribution is called the median.

And finally, the average (mean) is a statistic that rests at a specific spot in the middle of the distribution.

So we know that the mode, median and average can each describe the distribution. Each has strengths and weaknesses.

Mode:

Remember that the mode occurs with the highest frequency. It is an actual value, which has the highest concentration of items in and around it.

According to Croxton and Cowden, “The mode of a distribution is the value at the point around which the items tend to be most heavily concentrated. It may be regarded as the most typical of a series of values”.

Properties of Mode:

Remember that the mode occurs on the X axis: it is the value itself, not the frequency of that value.

The mode can be used to describe any type of data we have, whether it's numerical or categorical.
Not every score in the data set affects the mode.
It may not be unique.
It may not exist.

There is a procedure for finding the mode: you look at all the data values and see which one occurs the most, or you look at the histogram and see which bin has the highest frequency. But we can't describe the mode with an equation, and this is why we often use the mean, or average.
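The counting procedure described above can be sketched in Python. This is only an illustration: the function name find_modes and the salary figures are made up for the example, and returning an empty list when no value repeats is one common convention for "the mode may not exist", not the only one.

```python
from collections import Counter

def find_modes(values):
    """Return every value tied for the highest frequency.

    Assumed convention: if there are several values and none of them
    repeats, the data set is treated as having no mode ([] is returned).
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1 and len(counts) > 1:
        return []  # every value occurs exactly once: no mode
    return [value for value, count in counts.items() if count == highest]

# Hypothetical salaries in thousands
print(find_modes([40, 50, 50, 60, 70]))       # a single mode
print(find_modes([40, 40, 50, 60, 60]))       # two modes: not unique
print(find_modes([40, 50, 60]))               # no mode
print(find_modes(["Python", "R", "Python"]))  # categorical data works too
```

Notice that the function only counts occurrences. It never adds or averages the values, which is why the mode also works for categorical data.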

Mean:

Unlike the mode, the mean (average) takes all values into account, because we add them all up and then divide by how many values there are.
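As a minimal sketch of "add them all up, then divide by how many there are", here is the calculation in Python. The function name mean and the salary values are hypothetical and used only for illustration.

```python
def mean(values):
    """Add up all the values, then divide by how many there are."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical salaries in thousands
salaries = [40, 50, 50, 60, 70]
print(mean(salaries))  # 270 / 5 = 54.0
```

In practice you would probably use statistics.mean from the Python standard library instead of writing your own.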

Properties of Mean:

All scores in the distribution affect the mean.
The mean can be described with a formula.
Many samples from the same population will have similar means.
The mean of a sample can be used to make inferences about the population it came from.
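The formula mentioned in the list above is usually written as follows. This is the standard textbook form, not something specific to this post, and the symbols are notation assumed here: x_i is the i-th value, n is the sample size and N is the population size.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\quad \text{(sample mean)}
\qquad
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\quad \text{(population mean)}
```

The sample mean \bar{x} is what we use to make inferences about the population mean \mu.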

Outliers:

The mean, or average, can be misleading when we have outliers: values that are unexpectedly different from the other observed values. Outliers create skewed distributions and pull the mean towards the outlier. This makes the mean a lot less representative of the middle of the data. This is where the median comes in, because it sits in the middle of the data.

outlier
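To see how strongly one outlier can pull the mean, here is a small sketch using Python's standard statistics module. The salary figures are invented for illustration only.

```python
import statistics

# Hypothetical salaries in thousands
salaries = [50, 55, 60, 65, 70]
with_outlier = salaries + [1000]  # one unexpectedly large value

# Without the outlier, mean and median agree
print(statistics.mean(salaries), statistics.median(salaries))

# With the outlier, the mean jumps while the median barely moves
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

With the outlier included the mean is above 200, even though five of the six people earn 70 or less, while the median only moves from 60 to 62.5.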

Median:

The median is that value of the variate which divides the group into two equal parts, one part comprising all values greater, and the other, all values less than median.
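One common way to locate the median is to sort the values and take the middle one, or the average of the two middle ones when the count is even. The sketch below assumes that convention; the function name median and the data are hypothetical.

```python
def median(values):
    """Return the middle value of the sorted data.

    For an even number of values, the two middle values are averaged
    (an assumed, but widely used, convention).
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([70, 40, 60, 50, 50]))  # odd count: the middle value, 50
print(median([40, 50, 60, 70]))      # even count: (50 + 60) / 2 = 55.0
```

Only the position of a value in the sorted list matters here, which is why the median is called a positional average.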

Properties of Median:

Median is not influenced by extreme values because it is a positional average.
Median can be calculated for distributions with open-ended intervals.
Median can be located even if the data are incomplete.
Median can be located even for qualitative factors such as ability, honesty etc.

When to use the Mean:

Symmetric Distribution, Continuous data.

When to use the Median:

Skewed Distribution, Continuous Data, Ordinal data.

When to use the Mode:

Categorical data, Ordinal Data, Probability Distributions

Effective methods for calculating central tendency in statistical analysis

Which Measure of central tendency is best and why?

It depends upon the data that we have. We have to investigate the data: whether it is skewed or not, what type of data it is, and so on.

When we have a symmetrical distribution for continuous data, the mean, median and mode are equal (strictly, this holds when the distribution also has a single peak). In this case we should use the mean, because it includes all the values in the data. In a skewed distribution, the median is the best measure of central tendency.
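A quick way to check this on your own data is to compute all three measures and compare them. The sketch below uses Python's standard statistics module on two small made-up data sets; the helper name summarize is hypothetical.

```python
import statistics

def summarize(values):
    """Return (mean, median, mode) for a data set."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )

# Made-up data: symmetric around 3, with a single peak at 3
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
# Same data, but the largest value is replaced by an extreme one
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 50]

print(summarize(symmetric))  # all three measures agree
print(summarize(skewed))     # the mean is pulled up; median and mode stay at 3
```

When the mean and the median are far apart, as in the second data set, that is a hint the distribution is skewed and the median is the safer summary.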

Kulli Data Science
