- The five-number summary consists of five values that help describe the Center, Spread and Shape of the data:
- X(smallest)
- First Quartile (Q1)
- Median (Q2)
- Third Quartile (Q3)
- X(largest)
- BoxPlot is based on these five measures
- BoxPlot has Q1 and Q3 as its edges
- BoxPlot can be horizontally or vertically plotted
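- If the data is symmetric around the Median, the box and its central line are centered between the endpoints
- A BoxPlot can be Left-Skewed, Right-Skewed or Symmetric, depending on the distribution of the data-set (Left-Skewed: the Median sits closer to Q3 and the lower tail is longer; Right-Skewed: the Median sits closer to Q1 and the upper tail is longer)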
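As a rough illustration of the five measures listed above, the sketch below computes them in Python. It assumes the quartiles are taken at the (n + 1)-based positions, which is the default `method="exclusive"` of the standard-library `statistics.quantiles`; other tools may use slightly different quartile conventions. The function name `five_number_summary` and the sample values are made up for this example.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for a list of numbers.

    Assumption: quartiles use the (n + 1)-based positions
    (statistics.quantiles with method="exclusive").
    """
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

# Hypothetical sample, for illustration only
sample = [7, 1, 10, 4, 3, 8, 6]
print(five_number_summary(sample))  # (1, 3.0, 6.0, 8.0, 10)
```

Q1 and Q3 from this function are the edges of the box, and the median is the line drawn inside it.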
BoxPlot example showing an outlier
- BoxPlot plotted using the following data-set: 0, 2, 2, 2, 3, 3, 4, 5, 5, 9, 27
- X(smallest) = 0
- First Quartile (Q1) = 3rd element = 2 (the second 2)
- Median (Q2) = 6th element = 3
- Third Quartile (Q3) = 9th element = 5 (the second 5)
- X(largest) = 27
- A value is considered an outlier if it lies more than 1.5 times the Inter-Quartile-Range (IQR) below Q1 or above Q3.
- IQR = Q3 - Q1 = 5 - 2 = 3
- Lower Limit for outlier below Q1 = Q1 - (1.5 * IQR) = 2 - 4.5 = -2.5
- Upper Limit for outlier above Q3 = Q3 + (1.5 * IQR) = 5 + 4.5 = 9.5
- 27 > 9.5, so 27 is an outlier; no value falls below -2.5, so it is the only outlier in the data-set
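The outlier check above can be sketched in Python as follows. The function names `outlier_limits` and `find_outliers` are illustrative and do not come from any library; Q1 = 2 and Q3 = 5 are passed in directly, using the values worked out in the example.

```python
def outlier_limits(q1, q3, factor=1.5):
    """Return (lower_limit, upper_limit) from the quartiles using the IQR rule."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

def find_outliers(values, q1, q3, factor=1.5):
    """Return the values lying below the lower limit or above the upper limit."""
    lower, upper = outlier_limits(q1, q3, factor)
    return [v for v in values if v < lower or v > upper]

data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 9, 27]
print(outlier_limits(2, 5))       # (-2.5, 9.5)
print(find_outliers(data, 2, 5))  # [27]
```

Note that 9 stays inside the limits (9 < 9.5), so only 27 is flagged.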
BoxPlot usage
- Used to compare the performance of different segments side by side
- Used to identify the shape of the data-set (its spread and skewness) and any outliers in it