Recapitulation

In the previous post, we had looked at the basic principles of making combinatorial arguments. We saw that these arguments, while deficient in that they do not yield any numerical values, give us insights about how we can count ways of doing a task. In the conclusion of that post, I introduced the idea of the possibility of double counting and suggested that combinatorial arguments allow us to avoid doing that.

Double counting is a particularly pressing problem, especially in democracies. When we wish to determine the ‘will of the people’, we dare not forget that each citizen falls within a number of demographic categories. For example, I am a Christian male, who is currently self-employed and who has graduate degrees. If I give my opinion on some matter and am counted separately under each of these categories, then my opinion will be given undue weight and would not be fair to others.

The question then arises, “How do we ensure that we count every item without counting any item more than once?” Anyone who is involved in any kind of statistical study needs to pay attention to this so that each data point is given the same weightage. So let me introduce you to the inclusion-exclusion principle.

Basics with Two Categories

We begin with the most basic set-up with all items classified into two categories (say A and B). Since items can have multiple classifications, we can represent this using the Venn diagram below:

The Venn diagram has three regions labelled 1, 2, and 3. Set A includes regions 1 and 2, while set B includes regions 2 and 3. If we just add the number of items in sets A and B, we can see that the central region (2) will be counted twice. In other words, the items that belong in both sets will be double counted. To avoid this, we need to subtract the items that belong in the intersection of the two sets.

This can be written as

The last term on the right side of both equations recognizes that some items have been ‘included’ more than they need to be and explicitly ‘excludes’ them from the count. This adjustment by excluding what was included too many times is what gives the principle its name.

Counting with Three Categories

Of course, we know that demographics includes many more than two categories. So how would the inclusion-exclusion change if we had three categories? The Venn diagram below indicates the situation.

As shown in the diagram, there are now 7 distinct regions. Set A includes regions 1, 2, 4, and 5. Set B includes regions 2, 3, 4, and 6. And set C includes regions 4, 5, 6, and 7. If we add sets A, B, and C, we will count regions 2, 5, and 6 twice and region 4 thrice. We can proceed as before and exclude the intersection regions of two sets. But since region 4 belongs to A∩B, B∩C, and C∩A, we will have excluded region 4 thrice. This means that regions 1, 2, 3, 5, 6, and 7 are now counted once, but region 4 is not counted. Of course, it is a simple matter to add it back to include it. This is represented as

The terms in red we have encountered before in the case where we had only two categories, A and B, and had to compensate for overcounting the regions in the intersection of the two sets. The term in green is a new term, introduced because, by compensating for the overcounting of the regions in the intersections of pairs of sets, we ended up undercounting the intersection of all three sets.

Extension to Four Categories

What would happen if there were more than 3 categories? Unfortunately, a 2 dimensional representation will no longer suffice. So let us label the regions as follows: 1: only A; 2: only B; 3: only C; 4: only D; 5: only A and B; 6: only A and C; 7: only A and D; 8: only B and C; 9: only B and D; 10: only C and D; 11: only A, B, and C; 12: only A, B, and D; 13: only A, C, and D; 14: only B, C, and D; and 15: A, B, C, and D.

Now set A includes regions 1, 5, 6, 7, 11, 12, 13, and 15. Similarly set B includes regions 2, 5, 8, 9, 11, 12, 14, and 15. Set C includes regions 3, 6, 8, 10, 11, 13, 14, and 15, And D includes regions 4, 7, 9, 10, 12, 13, 14, and 15. Now, when we just add A, B, C, and D, the regions 5 to 10 will be counted twice, regions 11, 12, 13, and 14 will be counted thrice, and region 15 four times. The over counting for regions 5 to 10 would need an ‘exclusion’ by subtracting the intersection of pairs of sets. However, this would mean that the regions that involve the intersection of exactly three sets (i.e. 11, 12, 13, and 14) would have been ‘excluded’ three times over, meaning that they would not be ‘included’ anymore. It would also mean that the region 15 has been counted negative 2 times! This can be adjusted by now ‘including’ these four regions. However, this means that region 15 is now ‘included’ 2 times. Hence, we make a final adjustment and subtract the intersection of all four sets. This is represented as

Here the red terms represent the adjustments for regions 5 to 10, the green ones for regions 11 to 14 and the blue on for region 15.

Generalization

If we continue in this manner, adding more categories, we will be able to demonstrate that

This can be written in a more compact form as

I recommend you take a close look at how the last two equations are actually the same as we will need such ‘parsing’ of formulas in future posts!

What we have seen here is that we can recursively ‘exclude’ intersections with greater number of sets and in so doing avoid the twin traps of overcounting and undercounting. In the next post we will look at a way of counting an interesting arrangements of objects known as a ‘derangement’. Hope you stay sane till then!

Posted in

2 responses to “Abjuring Double Counting”

  1. No Place at Home – Acutely Obtuse Avatar

    […] of combinatorial arguments, which allowed us to gain some insight on the process of selection. In Abjuring Double Counting we saw the recursive process used to ensure that, when we count members of sets that have common […]

    Like

  2. Return of an Old Friend – Acutely Obtuse Avatar

    […] us to the area of combinatorics and specifically combinatorial arguments. In the second post, Abjuring Double Counting, we looked at the inclusion-exclusion principle that is used to ensure that every element is […]

    Like

Leave a reply to Return of an Old Friend – Acutely Obtuse Cancel reply