Five Number Summary and BoxPlot

  • Five numbers together describe the Center, Spread and Shape of the data:
    • X(smallest)
    • First Quartile (Q1)
    • Median (Q2)
    • Third Quartile (Q3)
    • X(largest)
  • BoxPlot is based on these five measures
 
  • The edges of the box are Q1 and Q3, so the box covers the middle 50% of the data
  • BoxPlot can be plotted horizontally or vertically
  • If the data is symmetric around the Median, the box and its central line (the Median) are centered between the endpoints, as shown below

  • A BoxPlot appears Left-Skewed, Right-Skewed or Symmetric depending on the distribution of the data-set, as shown below




BoxPlot example showing an outlier
  • BoxPlot plotted using the following data-set: 0, 2, 2, 2, 3, 3, 4, 5, 5, 9, 27
    • X(smallest) = 0
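    • First Quartile (Q1) = 2 (3rd element, the median of the lower half 0, 2, 2, 2, 3)
    • Median (Q2) = 3 (6th element of the 11 sorted values)
    • Third Quartile (Q3) = 5 (9th element, the median of the upper half 4, 5, 5, 9, 27)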
    • X(largest) = 27
  • A value is considered an outlier if it lies more than 1.5 times the Inter-Quartile-Range (IQR) below Q1 or above Q3.
    • IQR = Q3 - Q1 = 5 - 2 = 3
    • Lower Limit for outlier below Q1 = Q1 - (1.5 * IQR) = 2 - 4.5 = -2.5
    • Upper Limit for outlier above Q3 = Q3 + (1.5 * IQR) = 5 + 4.5 = 9.5
    • 27 > 9.5, hence 27 is an outlier; it is the only one, since no value lies below -2.5
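
The calculation above can be reproduced with a short Python sketch. It assumes the median-of-halves convention used in this example (when the number of values is odd, the middle value is left out of both halves); other tools use different quartile conventions and may give slightly different Q1 and Q3. The function names five_number_summary and outlier_limits are illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest); needs at least 2 values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values before the median position
    upper = data[half + (n % 2):]    # values after it (middle value skipped if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]


def outlier_limits(q1, q3, k=1.5):
    """Return the (lower, upper) limits Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 9, 27]
smallest, q1, q2, q3, largest = five_number_summary(data)
low, high = outlier_limits(q1, q3)
outliers = [x for x in data if x < low or x > high]

print(smallest, q1, q2, q3, largest)   # 0 2 3 5 27
print(low, high)                       # -2.5 9.5
print(outliers)                        # [27]
```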
 
BoxPlot usage
  • Used to compare the performance of different segments side by side (one BoxPlot per segment)
  • Used to identify the pattern (center, spread and skew) of the data-set and any outliers in it
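
As a minimal sketch of the segment comparison, the numbers behind one BoxPlot per segment can be computed with Python's standard library. The segment names and values are made up for illustration, and statistics.quantiles with method="exclusive" is only one of several quartile conventions.

```python
from statistics import quantiles


def summarize(values):
    """Five-number summary plus outliers under the 1.5 * IQR rule."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": data[0], "q1": q1, "median": q2, "q3": q3, "max": data[-1],
        "outliers": [x for x in data if x < low or x > high],
    }


# Hypothetical performance figures for two segments
segments = {
    "North": [12, 15, 15, 18, 20, 22, 30],
    "South": [8, 9, 14, 14, 15, 16, 60],
}

for name, values in segments.items():
    s = summarize(values)
    print(f"{name}: min={s['min']} Q1={s['q1']} median={s['median']} "
          f"Q3={s['q3']} max={s['max']} outliers={s['outliers']}")
```

With these made-up numbers, North has the higher Median (18 against 14) and no outliers, while the 60 in South is flagged as an outlier.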
