We are in the middle of a series of posts on π. Our journey began with A Piece of the Pi, which was followed by Off On a Tangent. Last week, I posted The Rewards of Repetition, in which I promised that we would break down the Gauss-Legendre algorithm into its distinct parts. Of course, as mentioned in the previous post, the algorithm uses mathematics that is way above the level of this blog. I will not be explaining any of that; I will explain only those parts that are at the level of this blog. So let’s proceed.
The Gauss-Legendre Iteration
First, let us recall the algorithm. The Gauss-Legendre algorithm begins with the initial values

a₀ = 1, b₀ = 1/√2, t₀ = 1/4, p₀ = 1.

The iterative steps are defined as follows:

aₙ₊₁ = (aₙ + bₙ)/2, bₙ₊₁ = √(aₙbₙ), tₙ₊₁ = tₙ − pₙ(aₙ − aₙ₊₁)², pₙ₊₁ = 2pₙ.

Then the approximation for π is defined as

π ≈ (aₙ₊₁ + bₙ₊₁)²/(4tₙ₊₁).
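Before asking why, we can at least check that the process does work. Here is a minimal floating-point sketch of the iteration, using the standard starting values a₀ = 1, b₀ = 1/√2, t₀ = 1/4, p₀ = 1:

```python
from math import sqrt

def gauss_legendre(iterations):
    """Approximate pi with the Gauss-Legendre iteration."""
    a, b, t, p = 1.0, 1.0 / sqrt(2.0), 0.25, 1.0
    for _ in range(iterations):
        a_next = (a + b) / 2       # arithmetic mean
        b = sqrt(a * b)            # geometric mean
        t -= p * (a - a_next) ** 2
        p *= 2
        a = a_next
    return (a + b) ** 2 / (4 * t)

for k in range(1, 4):
    print(k, gauss_legendre(k))  # correct digits roughly double every iteration
```

Even in ordinary double precision, three iterations already exhaust the available digits.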
The question, of course, is why this process even works. Surely this can’t be some random method that Gauss and Legendre independently stumbled upon! So what is the rationale behind this strange algorithm? After all, if we are honest, there is nothing in the steps that leads self-evidently to the conclusion that the iterations converge to π. So let us consider the steps involved and why these initial values and this iterative method lead to an approximation of π.
Comparison of Means
The method begins with the realization that, given two distinct positive numbers a and b, their arithmetic mean (AM) is always greater than their geometric mean (GM). Of course, the AM and GM are defined as

AM = (a + b)/2 and GM = √(ab).
So, since a and b are distinct, we can proceed as follows:
Convergence of an and bn
Now the iterative formulas
assure us that
and
This means that the sequence bn is steadily (monotonically) increasing while the sequence an is steadily (monotonically) decreasing. Since bn is always less than an, both sequences are bounded between the fixed values b0 and a0, and a bounded monotonic sequence must converge. Hence the two sequences will necessarily converge.
Now if we define cₙ = aₙ − bₙ,
we can obtain
which further yields
This means that the sequence cn also converges. And since cn cannot be negative, it must converge to 0. This means that an and bn converge to the same value. Since this common value is obtained by converging sequences of arithmetic and geometric means, it is called the arithmetic-geometric mean (AGM). Let us designate it with m.
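As a quick numerical illustration (the starting values 1 and 0.5 are arbitrary), a few iterations show aₙ and bₙ pinching together:

```python
from math import sqrt

a, b = 1.0, 0.5            # any two distinct positive numbers will do
for n in range(6):
    print(n, a, b, a - b)  # the gap c_n = a_n - b_n collapses very quickly
    a, b = (a + b) / 2, sqrt(a * b)
```

After only five or six steps the gap is already below the resolution of double-precision arithmetic, which hints at the quadratic convergence that makes the algorithm so fast.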
The Magic of Calculus
The next stage involves some calculus. Since I haven’t introduced calculus in the blog yet, let me just present the results. Those who wish to read in detail can look at the proofs here. Anyway, the method begins by defining
Then with some nifty substitutions and change of limits, we are able to prove
The observant reader will now realize why we had done all the stuff related to AM and GM and the convergence of the same. Now it is a trivial step to conclude that
From this we can conclude that
where m, as defined earlier, is the AGM of a0 and b0. What this means is that
and
Now using the Gamma and Beta functions it is possible to prove
What this means is that if we start with the initial values as stated earlier, each iteration of
gets us closer to the value of π.
The iterative process of the Gauss-Legendre method converges so quickly that, after only 25 iterations, the algorithm generates 45 million digits of π.
Conclusion
I apologize to the reader for not including all the calculus behind the algorithm. I have done that in the interests of not making this post too heavy. I know that it is still quite heavy. But that is the nature of the beast! There are no rapidly converging methods for approximating the value of π that do not involve a lot of calculus. Indeed, some algorithms derived using calculus are really quite strange – so strange that even the Gauss-Legendre algorithm would seem quite sensible! We will tackle some of them in the next post, after which I will lay this π to rest.
The past week got quite busy all of a sudden, and the week ahead also looks like it will be the same. I know I’m in the middle of a series of posts on π. I will continue with the series in a couple of weeks. For now, I leave you with some collections of earlier posts.
It’s frustrating. Yes, you heard me right. It’s frustrating. You may be wondering why I am saying this and in what context. So here goes. Just the other day, I was looking up some perspectives on a mathematics discussion forum. Quite naturally, there were many mathematics teachers in the forum. And quite naturally there was some discussion about questions asked by students.
As a mathematics teacher, I find these questions quite illuminating. In fact, this is one of the reasons I love being a teacher. You never know what would spark the interest of some student enough to formulate a question. So even though what I teach might be much the same from year to year, because who I teach changes, it is never boring. Indeed, even when you teach the same group of students, as I did from 2016 to 2018 and then 2020 to 2023, you can see their questions change as they themselves mature and change.
But it’s time to stop this nostalgic reverie and return to my frustration!
Case I: Equivalent Fractions
One question on the forum was asked by the teacher of a student in elementary school. I presume this is between Grade 1 and Grade 5. The student was learning fractions. When they came to the concept of equivalent fractions, the student asked the teacher what was different between the fractions 6/8 and 3/4. The teacher asked the question on the forum, requesting advice on how to respond to the student.
The poor teacher! The backlash she faced was unconscionable. So many of her fellow mathematics teachers pounced on her to tell her not just that both fractions are the same but that any mathematics teacher worth his/her salt should know this. Some questioned her competence to teach the subject. I commented, taking her side, and posted a link to a previous post in which I demonstrate that equivalent fractions might have the same value but do not contain the same information.
I faced the same issue in a recent class. A student asked me what the difference between various equivalent fractions was. That this student is currently in Grade 12 and is a reasonably astute student is an indictment against all his mathematics teachers.
Anyway, the question arose when, in answer to a problem, the student gave me the following numbers:
The student had calculated accurately, and the numerical value of each of these fractions was spot on. However, I told him that, by reducing the fractions to their lowest forms, he had lost something crucial. It was then that he asked me the question about equivalent fractions.
So I gave him another set of fractions:
I asked him what links this set of fractions. He struggled, unsuccessfully, for a few minutes. Then I told him that I had generated this set of fractions in a very similar way to the one with which he had generated the first set. I gave him a few more minutes to try his luck, to no avail.
What was it that gave rise to these two sets of numbers? There is clearly no discernible pattern in either of the sets and they may seem to be just some random fractions generated by an addled mind. However, what if I told you that the first set, before reducing to equivalent fractions, was
Right away, you will recognize that there is a pattern. The numerator is composed of multiples of 5, while the denominator is composed of factorials, starting with the factorial of 3. If I asked you to give me a formula for generating the terms, you will quickly be able to tell me:
Similarly, if, for the second set of numbers, I gave you
you will equally quickly be able to tell me
You see, while the numbers in the first set and third set have the same corresponding values (and similarly with the second and fourth sets), something is lost by reducing the fractions to their lowest terms. And with this loss, we are rendered incapable of generating further terms in the sequence because the process of reducing the fractions destroys the pattern.
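The text above describes the unreduced first sequence as multiples of 5 over factorials starting at 3!. Taking that description at face value (the general term would then be 5n/(n + 2)!, which is my assumption, not necessarily the exact sequence from the class), a few lines of Python show how reduction destroys the pattern, because the Fraction type always reduces to lowest terms:

```python
from fractions import Fraction
from math import factorial

# Assumed stand-in for the unreduced sequence: 5n over (n + 2)!
unreduced = [(5 * n, factorial(n + 2)) for n in range(1, 5)]
print(unreduced)   # [(5, 6), (10, 24), (15, 120), (20, 720)] -- pattern visible

# Fraction always reduces to lowest terms, and the pattern vanishes:
reduced = [Fraction(p, q) for p, q in unreduced]
print(reduced)     # [Fraction(5, 6), Fraction(5, 12), Fraction(1, 8), Fraction(1, 36)]
```

From 5/6, 5/12, 1/8, 1/36 alone, no student could guess the next term; from (5, 6), (10, 24), (15, 120), (20, 720), anyone can.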
Case II: Quadratic Expressions
This shows up in other places too. For example, one of the students I am currently tutoring is studying quadratics at school. Now this student is exceptionally bright. When I was concluding a class, I asked her to think about the difference between
and
For those unfamiliar with the above, let me give an example.
The above is an example of the fact that any quadratic expression of the form
can be expressed in the form
But what was the point of the re-expression? Why should we bother to do it? In the next class I asked my student if she had given this some thought. She said she had but that she could not figure out in what way
is different from
Since we had just started learning about quadratics, I did not fault her for not being able to answer the question. After all, that’s what learning is. But she told me that she had asked her teacher about the difference. And her teacher had told her that there was no difference because
I’m glad the class was online. Otherwise, my student would have felt my fury palpably. It is likely that she did sense it even over our Zoom call! The statement of the teacher is absolute rot and betrays a failure to look below the surface. For, while both expressions are indeed equivalent, the expression on the right tells me that this quadratic expression, if plotted as a function of x, has a vertex at the point (1, -13). You may wonder, “What’s the big deal? So what if the expression on the right gives me the coordinates of the vertex? I could do a few steps of algebra on the left one and get it.” Absolutely true.
However, if you were given the functions
you would not know that they all have the same vertex. However, if they were written as
the fact that the vertices are at (1, -13) is indisputable. But more than that, the second set of equations also tells us, from the signs on the coefficient pairs 3 and -13, 5 and -13, and -6 and -13, that f(x) = 0 and g(x) = 0 will have real roots, while h(x) = 0 will have only complex roots.
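A short sketch makes the conversion concrete. The coefficient triples below are hypothetical examples of my own, chosen only so that each quadratic has its vertex at (1, -13), matching the discussion above; the conversion itself is just the standard completing-the-square formulas h = -b/(2a) and k = c - b²/(4a):

```python
def vertex_form(a, b, c):
    """Complete the square: a*x**2 + b*x + c == a*(x - h)**2 + k."""
    h = -b / (2 * a)
    k = c - b ** 2 / (4 * a)
    return h, k

# Hypothetical coefficients, each giving vertex (1, -13):
for a, b, c in [(1, -2, -12), (3, -6, -10), (5, -10, -8), (-6, 12, -19)]:
    print(vertex_form(a, b, c))  # each prints (1.0, -13.0)
```

In the expanded forms the shared vertex is invisible; the vertex form makes it immediate.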
Case III: Theory of Equations
The same kind of issue shows up when we attempt to solve equations in general. Suppose, for example, you are asked to solve the equation
How would you proceed? Of course, you can use some general results from the theory of equations and conclude
But after this, what? Every student who has fumbled with these results will know that, if you try to solve these equations by some standard methods like elimination or substitution, you will revert to the original equation. So while the above four results give some valuable insights about sums and products of the roots of the equation, they do not bring us any closer to actually solving the equation.
When it comes to just simple quadratic equations, of course, the above method is powerful. After all, suppose we are given the equation
A little back-of-the-napkin calculation will tell me that I need to find two numbers α and β such that
With a little effort, I can determine that the divisors of 3705 are 1, 3, 5, 13, 15, 19, 39, 57, 65, 95, 195, 247, 285, 741, 1235, and 3705. I can place them in pairs as follows: (1, 3705), (3, 1235), (5, 741), (13, 285), (15, 247), (19, 195), (39, 95), and (57, 65), where the numbers in each pair give a product of 3705. Now since I have -3705 and not 3705, I need two numbers of opposite signs. And since I have -56 and not 56, I need the number with the larger magnitude to be negative. This gives the pair of numbers as 39 and -95. Now we can proceed to split the middle term and factorize, giving
This allows us to see that we have the product of two factors equalling zero, which must mean that at least one of the factors is zero. This allows us to conclude
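The divisor-pair search can be sketched in a few lines. Since the equation itself is not reproduced in the text, the code assumes the quadratic was x² - 56x - 3705 = 0, which is consistent with the numbers -56 and -3705 and the pair (39, -95) found above:

```python
# Hedged reconstruction: assumes the worked equation was x**2 - 56*x - 3705 = 0.
def split_pair(b, c):
    """Find integers alpha, beta with alpha + beta == b and alpha * beta == c,
    i.e. the pair used to split the middle term of x**2 + b*x + c."""
    for alpha in range(-abs(c), abs(c) + 1):
        if alpha != 0 and c % alpha == 0 and alpha + c // alpha == b:
            return alpha, c // alpha
    return None

print(split_pair(-56, -3705))  # (-95, 39)
```

The pair (-95, 39) is exactly the split 39x - 95x of the middle term, giving the factors (x + 39)(x - 95) and hence the roots -39 and 95.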
While this method is relatively simple and straightforward in the case of quadratics, there is absolutely no way to use it when it comes to any generic polynomial equation of higher degree, like the quartic equation we saw earlier.
However, the actual way of solving the quadratic, by having a product equal to zero, should allow us to see that, if we had a quartic equation of similar form, we would be able to solve it. Indeed, the earlier quartic equation can be written as
This will allow us to conclude that
In other words, though it is almost child’s play for a student in grade 9 or above to show that
can be expanded to give
it would take even a skilled mathematician quite a few iterations of trial and error before she was able to solve
However, most high school students and perhaps even some middle school students would be able to solve
It is, therefore, quite frustrating when students come to me in Grade 11 and tell me that their teachers either found no difference between the expanded forms of expressions and the factorized forms or that they actually preferred the expanded form. Granted, the parentheses seem intrusive at times, and a person may certainly prefer the visual appeal of the expanded form to the factorized form. However, when it is clear that the expanded form is difficult, if not impossible, to solve, this preference would be like preferring opacity to lucidity.
Case IV: Permutations and Combinations
I attribute this unfortunate preference to the noose-like grip that textbook publishers and calculator manufacturers have on the global high school examination boards and schools. This fourth case should highlight the point well. Suppose we were faced with the following problem. In how many ways can a committee of 7 people and another of 8 people be chosen out of 25 people such that 2 members of each committee are to be designated as president and secretary and such that no person belongs to both committees? How would we proceed?
We could first select the 7 member committee, which can be done in 25C7 ways. Now among these 7 members, we have to choose 1 to be president and 1 to be secretary, which can be done in 7P2 ways. We use a permutation because it matters whether a person is president or if she is secretary. Now we have 18 people remaining. We can choose the second committee of 8 members in 18C8 ways and the president and secretary in 8P2 ways. So in all we have
Most textbooks will state that the answer is 49473074851200 or 4.947 × 10¹³ ways or something of the sort, where each individual term has been calculated and then the product evaluated. But this is an onerous task to do by hand. Hence, expecting the student to get either answer is expecting them to use a calculator. But perhaps the above number is too large for most of us to even process. Let’s deal with some smaller numbers.
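For the record, the product can be evaluated exactly with Python's built-in math.comb and math.perm, confirming the figure above:

```python
from math import comb, perm

# Choose the 7-member committee, assign its president and secretary,
# then choose and staff the 8-member committee from the remaining 18:
ways = comb(25, 7) * perm(7, 2) * comb(18, 8) * perm(8, 2)
print(ways)  # 49473074851200
```

Note that the expression itself, not the 14-digit number, is what records the reasoning.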
Suppose I have 5 students. On one day, I wish to send a team of 3 of them for a debate and to name one of the remaining 2 as a reserve. Quite naturally, this can be done in 5C3 × 2C1 = 10 × 2 = 20 ways. On another day, I need to send one student to the staff room to collect my books and another to the library to collect a journal. This can be done in 5C1 × 4C1 = 5 × 4 = 20 ways. Both tasks can be done in 20 ways. However, the number 20 obscures how it was obtained. If the textbook gives the answer as 20 for the first situation, but a student used the second method and obtained 20, she would think she has understood the concepts since she obtained the ‘correct’ numerical answer. However, her thinking is obviously flawed. However, if the textbook gave the answers as 5C3 × 2C1 and 5C1 × 4C1 respectively, any diligent student would be able to see where she had gone wrong.
When I have taught large classes of students, I have often given them a random number and asked them to give me a series of operations with only a particular single digit number that would yield the random number. For example, suppose the random number is 23 and the chosen single digit number is 4. Then I could have
There are infinitely many ways in which I can use just the number 4 and obtain 23. The challenge would be for each student to give a unique set of operations. And when I have done this exercise, never once have we needed to repeat a set of operations.
What I am trying to say is that expressions such as 5C1 × 4C1 or 5C3 × 2C1 or, as in the original problem,
may be numerically equal to 20, 20, and 49473074851200 but contain far more information and give far more insight than the explicit numbers. However, if you are a calculator manufacturer, none of these insightful forms will help you sell your products!
Escaping the Trap
So you, that is, the calculator manufacturer, and the examination boards and the textbook publishers form a nexus in which these explicit but uninsightful forms are given preference over the forms that can be used to yield mathematical insight and promote mathematical understanding.
Unfortunately, most teachers go ‘by the book’. And so over time they forget that 3/4 and 6/8 may be numerically equivalent but have quite different meanings. They forget that 20 is not just a number but could have been obtained in many ways, most of which are incorrect. They toe the line and look at the bottom line, that is, the final answer, concluding, if the answers match, that the student must have correctly executed the right sequence of operations. But as we have seen, this need not be the case.
It is time that the examination boards took a firm stance and actually favored the development of mathematical understanding of students rather than their ability to use a tool that is obsolete. I mean, where in the world does anyone use a calculator these days? Some old style retail outlets may use four function calculators to add the prices of items on a list and maybe also to calculate any sales tax or GST. However, where does anyone use a calculator for trigonometric or logarithmic or exponential functions except in school where it is forced onto the students? If we want students to be able to use some kind of technology to evaluate complicated functions, then we should teach them a programming language rather than make them use calculators. At least the former has real world relevance for many jobs while the latter has absolutely none!
Smoke and Mirrors. Photo by theilr. (Source: Flickr)
I have long been fascinated by patterns that exist in numerical sequences. Early exploration led me to the discovery of the Fibonacci series and the associated Golden Ratio. I also came across Pascal’s Triangle, which remains a source of intrigue. Hence, when I was introduced to the tests for divisibility, I devoured them with a voraciousness unmatched till then in my life. By some accounts, I stumbled upon the tests for divisibility by 3 and 9 all by myself, without the direct involvement of my teachers.
And when my teachers ‘failed’ – or so it seemed to my young mind – to give me a test for divisibility by 7, I attempted to derive one such test. Of course, I failed at the task. Try as I might, I could come away with no universal test that would hold true for a number having any number of digits. What was it about 7?
You see, we have tests for divisibility by 2 (number ends with 0, 2, 4, 6, or 8), 3 (sum of digits is divisible by 3), 4 (last 2 digits are divisible by 4), 5 (number ends in 0 or 5), 6 (number satisfies conditions for 2 and 3), 8 (last 3 digits are divisible by 8), 9 (sum of digits is divisible by 9), 10 (number ends with 0), 11 (difference of the sums of alternate digits is divisible by 11), and 12 (number satisfies conditions for 3 and 4). Why was 7 so stubborn?
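Every test in that list can be checked empirically against direct division before we derive any of them. A quick brute-force sketch (the helper names are my own):

```python
def digit_sum(n):
    """Sum of the decimal digits of n."""
    return sum(int(d) for d in str(n))

def alternating_difference(n):
    """Sum of digits in even places minus sum in odd places (from the right)."""
    digits = [int(d) for d in str(n)][::-1]
    return sum(digits[0::2]) - sum(digits[1::2])

for n in range(1, 10_000):
    assert (n % 2 == 0) == (n % 10 in (0, 2, 4, 6, 8))
    assert (n % 3 == 0) == (digit_sum(n) % 3 == 0)
    assert (n % 4 == 0) == (n % 100 % 4 == 0)
    assert (n % 5 == 0) == (n % 10 in (0, 5))
    assert (n % 6 == 0) == (n % 2 == 0 and n % 3 == 0)
    assert (n % 8 == 0) == (n % 1000 % 8 == 0)
    assert (n % 9 == 0) == (digit_sum(n) % 9 == 0)
    assert (n % 11 == 0) == (alternating_difference(n) % 11 == 0)
    assert (n % 12 == 0) == (n % 3 == 0 and n % 4 == 0)

print("all digit tests agree with direct division")
```

Of course, checking that the tests work is not the same as understanding why, which is where we turn next.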
Moreover, why did these tests work out the way they did? Perhaps if we uncovered the reasons for which the tests for 2, 3, 4, 5, 6, 8, etc. are what they are, we can gain some insight into why 7 refuses to submit to our efforts.
In what follows, I will use the notation [wxyz] to denote a 4 digit number which has w in the thousands position, x in the hundreds position, y in the tens position, and z in the units position. Hence, the value of the number is [wxyz] = 1000w + 100x + 10y + z.
Naturally, the first test of divisibility is that by 2. In order to derive this test, we do the following:
Here it is clear that the sum of the first three terms is divisible by 2. Hence, the number [wxyz] as a whole will be divisible by 2 if and only if z is divisible by 2, which happens when z = 0, 2, 4, 6, or 8, giving the test for divisibility by 2.
Divisibility by 3
When we reach divisibility by 3, we can do the following:
That is
Here it is clear that the first term is divisible by 3 and 9. Hence, the number [wxyz] as a whole would be divisible by 3 or 9 if the second term, that is the sum of the digits, is divisible by 3 or 9 respectively. This gives rise to the tests for 3 and 9.
Divisibility by 4
For the test for divisibility by 4, we manipulate our number as follows:
Here it is clear that the first term is divisible by 4. Hence the number [wxyz] as a whole will be divisible by 4 if the second term, which is the number [yz] formed by the last two digits, is divisible by 4. This yields the test for 4.
Divisibility by 5
The test for divisibility by 5 is easy to remember. But it is obtained by doing the following:
Here it is clear that the first term is divisible by 5. Hence, the number [wxyz] as a whole will be divisible by 5 if z is divisible by 5, which happens when z is either 0 or 5, giving us the test for 5.
Divisibility by 6 and 12
Since 6 is the product of 2 and 3, a number that satisfies the condition for 2 and 3 will be divisible by 6. Similarly, since 12 is the product of 3 and 4, a number that satisfies the condition for 3 and 4 will be divisible by 12.
Divisibility by 8
The test for divisibility by 8 is an extension of that derived for 4. We proceed as follows:
Here it is clear that the first term is divisible by 8. Hence, the number [wxyz] as a whole will be divisible by 8 if the second term, which is the number formed by the last three digits, is divisible by 8. Hence, we get the test for 8.
Divisibility by 10
The test for divisibility by 10 follows the same logic as that for 5. In order to derive it we do the following:
Here it is clear that the first term is divisible by 10. Hence, the number [wxyz] as a whole will be divisible by 10 if z is divisible by 10, which happens when z is 0, giving us the test for 10.
Divisibility by 11
When we come to divisibility by 11, we realize that it is not as straightforward as the ones that came before. Indeed, while the test for divisibility by 11 was taught when I was in school, it has mostly been removed from the mathematics curriculum today. Nevertheless, the test for divisibility by 11 is quite instructive. So let us see what we can learn from it. Given the number [wxyz], we can do the following:
Here it is clear that the term in red is divisible by 11. Hence, the number [wxyz] will be divisible by 11 if the term in green, which is the difference of the sums of alternate digits, is divisible by 11, yielding the test for 11.
Divisibility by 7
When we come to the number 7, there is indeed a test for divisibility by 7. It goes as follows. Suppose we have the number […xyz]. Then if […xy] – 2z is divisible by 7, the original number is divisible by 7.
For example, consider the number 658. We can proceed as follows:
Since 49 is divisible by 7, it follows that 658 is divisible by 7. But how does this work? Consider the number […xyz]. We can proceed as follows:
Here the term in green is obviously divisible by 7. The term in blue is the one we are asked to test for divisibility by 7. If it is divisible by 7, then the whole number is divisible by 7. But what if we had a much larger number? How would we proceed? Suppose, for example, that we were testing for 548975. We would have to proceed as follows:
Since 35 is divisible by 7, we can conclude that 548975 is divisible by 7. This is a recursive method that is foolproof. It is also easy to remember. However, due to the recursion and the fact that most of us probably barely remember multiples of 7 that are 3 or more digits long, for a number with n digits, we will have to perform n – 2 recursions before reaching a 2 digit number that we can recognize as a multiple of 7 or not.
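A minimal sketch of this recursive test:

```python
def divisible_by_7(n):
    """Apply the rule: replace [...xyz] by [...xy] - 2*z until 2 digits remain."""
    n = abs(n)
    while n >= 100:
        n = abs(n // 10 - 2 * (n % 10))  # intermediate values can go negative
    return n % 7 == 0

print(divisible_by_7(658))     # True: 65 - 2*8 = 49 = 7*7
print(divisible_by_7(548975))  # True: 54887 -> 5474 -> 539 -> 35
```

The rule is sound because 10a + z = 10(a - 2z) + 21z, and 21z is always a multiple of 7.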
Each recursion involves a multiplication and a subtraction. This results in 2n – 4 operations for the n – 2 recursions.
Attempt at Divisibility by 7
We can see that in most of the tests, we generate 2 terms, the first of which is automatically divisible by the number under consideration. Then the test devolves into testing if the second term is divisible by that number. For the tests for 3 and 9 we saw that one less than a power of 10 is automatically divisible by 3 and 9. For the tests for 4 and 8 we saw that multiples of 100 and 1000 are automatically divisible by 4 and 8 respectively.
Even for the test for 7 we followed the same basic idea indicated by the 21z term. However, while the other tests yielded the answer in a single step, the test for divisibility by 7 involved a recursive approach. Hence, for a 6 digit number, we had to perform 4 recursions to obtain a 2 digit number for which divisibility by 7 can be placed in memory.
The test for 11 was most illuminating. We saw that some powers of 10 (i.e. 10 and 1000) are 1 less than a multiple of 11 while other powers of 10 (i.e. 1 and 100) are 1 more than a multiple of 11. Could we use a similar approach to obtain another test for divisibility by 7?
To attempt that, let us see how we fare when we divide powers of 10 by 7. The table below shows the remainders.
After the 6th power of 10, that is 1,000,000, the pattern repeats. And this is because
Hence, when we have a number of the form [xyzxyz], it is necessarily divisible by 7, since [xyzxyz] = [xyz] × 1001 and 1001 = 7 × 11 × 13. And since 1,000,000 is 1 more than a number that follows this pattern (i.e. 999999), the pattern of remainders will repeat after every 6th power of 10.
Now we observe that the first three powers of 10 give remainders of 1, 3, and 2 respectively. The next three powers of 10 give negative remainders in the same order.
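These remainders are easy to reproduce with Python's three-argument pow:

```python
# Remainders of 10**k on division by 7:
remainders = [pow(10, k, 7) for k in range(6)]
print(remainders)  # [1, 3, 2, 6, 4, 5]

# Rewriting 6, 4, 5 as -1, -3, -2 (remainder minus 7) gives the signed pattern:
signed = [r if r <= 3 else r - 7 for r in remainders]
print(signed)      # [1, 3, 2, -1, -3, -2]
```

Treating 6, 4, and 5 as -1, -3, and -2 is legitimate because adding or subtracting a multiple of 7 does not change divisibility by 7.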
So if I had the number [uvwxyz], we can rewrite it as
Here the term in red is obviously divisible by 7. Hence, the whole number [uvwxyz] is divisible by 7 if the term in green is divisible by 7. Let’s put this to the test and let’s label the term in braces as D.
Consider a 2 digit number (say 42). Here, u = v = w = x = 0, y = 4 and z = 2.
Since 14 is divisible by 7, 42 is divisible by 7.
Consider 91. Here
Since 28 is divisible by 7, 91 is divisible by 7.
Consider a 3 digit number (say 434). Here, u = v = w = 0, x = 4, y = 3 and z = 4.
Since 21 is divisible by 7, 434 is divisible by 7.
Consider a 4 digit number (say 2072). Here, u = v = 0, w = 2, x = 0, y = 7 and z = 2.
Since 21 is divisible by 7, 2072 is divisible by 7.
Consider a 5 digit number (say 24962). Here, u = 0, v = 2, w = 4, x = 9, y = 6 and z = 2.
Since 28 is divisible by 7, 24962 is divisible by 7.
Consider a 6 digit number (say 548975). Here, u = 5, v = 4, w = 8, x = 9, y = 7 and z = 5.
Since 14 is divisible by 7, 548975 is divisible by 7.
Now I grant that this process is not intuitive. There is no identifiable uniformity that would allow it to function as a test that is as easy and quick to use as the other tests. There is, of course, a pattern. In each block of 3 digits, the digits are multiplied by 1, 3, and 2, respectively, going from right to left. Alternate blocks are added together, and then the difference between the two resulting sums is evaluated. The resulting number, which will, in all likelihood, be a 2-digit number, is tested for divisibility by 7, and the result tells us about the divisibility by 7 of the original number.
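The whole procedure fits in a few lines; the repeating weights 1, 3, 2, -1, -3, -2 are exactly the signed remainders of the powers of 10:

```python
def weighted_test_7(n):
    """Weigh the digits of n (right to left) by 1, 3, 2, -1, -3, -2, repeating."""
    weights = [1, 3, 2, -1, -3, -2]
    return sum(int(d) * weights[i % 6]
               for i, d in enumerate(str(n)[::-1]))

# 548975: (5 + 3*7 + 2*9) - (8 + 3*4 + 2*5) = 44 - 30 = 14
print(weighted_test_7(548975))  # 14, which is divisible by 7
```

The original number is divisible by 7 exactly when this weighted sum is, because each weight is congruent to the corresponding power of 10 modulo 7.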
If we count the number of operations involved, we will see that each of the parentheses in the green term above involves 4 operations. For a number that has 3m digits, we will have m groups with m – 1 operations linking the groups, yielding 5m – 1 operations in all. With the previous test, a number of similar length (i.e. 3m digits) would require 6m – 4 operations. For longer numbers, then, the second test is quicker than the first (5m – 1 < 6m – 4 once m exceeds 3), though the first test remains the more straightforward of the two.
Stepping Back
However, despite the fact that the second test is not intuitive, such a process reveals the logic behind the other tests. Those tests are quite easy to implement, allowing us to take for granted their efficacy. And if my memory serves me right, none of my teachers ever really explained to me why those tests work, as I have done above. Hence, these algorithms come across to many students as some kind of mathematical wizardry, where reliable results are obtained by some numerical sleight of hand. Actually, I have a nagging feeling that many mathematics teachers themselves have no idea why the tests for 3, 4, 6, 8, and 9 work and most probably have no clue about the test for 11.
If we take a step back to understand what happened when we derived this test for divisibility by 7 we can see that, when we divided powers of 10 by 7, we obtained 6 different remainders. We took powers of 10 because that is precisely what the place value system of writing numbers shows us as we have seen when we expanded [uvwxyz] and other numbers. Concerning the remainders, we can further observe that in fact we obtain all the possible remainders we can get when we divide by 7. This is actually revealed in the recurring pattern ‘142857’ when we divide any integer that is not a multiple of 7 by 7. The increasing powers of 10 only shift this pattern rightward to give us the remainders as follows
Here the numbers with the same color indicate how the pattern is generated, that is, how one remainder morphs into the next as division is continued. Note especially the numbers in parentheses in the last column and compare them with the numbers in the second column of the previous table.
Since the recurring pattern for division by 7 is a string of 6 numbers (142857), we get 6 different remainders, which, in the case for 7, are all the remainders we could obtain. Due to this our test involved 6 different numbers u, v, w, x, y, and z. Since the pattern of remainders 1, 3, 2, -1, -3, -2 conveniently formed 2 blocks of 3 numbers, we were able to reduce the test to a replicating pattern involving the sum of 3 numbers for which the difference is finally taken.
Now, when we obtain tests for divisibility, we need to focus only on the prime numbers, as seen in the very brief descriptions for divisibility by 6 and 12. Powers of 2 present a unique situation, as seen in the tests for 2, 4, and 8. However, if we focus on the prime numbers, the next one we encounter is 13. What would a test for divisibility by 13 look like? It pays to note that the recurring pattern for division by 13 is ‘076923’ and then to generate the following table:
Now it helps to not rush to any conclusions. While division by 7 and 13 produced patterns of length 6, this is not the case for all primes, as we should expect. For example, division by 17, the next prime, produces the full complement of 16 possible remainders. Anyway, getting back to 13, I encourage the reader to use the table above and the ones before it to obtain a test for divisibility by 13. Also do attempt a test similar to the recursive first one. Please do put your findings in the comments. And I will see you next week.
In the last three posts, we have been dealing with some issues related to counting. In Learning to Count, I introduced us to the area of combinatorics and specifically combinatorial arguments. In the second post, Abjuring Double Counting, we looked at the inclusion-exclusion principle that is used to ensure that every element is counted exactly once. In last week’s post, No Place at Home, the third in this series, we looked at the idea of derangements and used the inclusion-exclusion principle and a recursive formula to derive the expression for the number of derangements in a set of n items. To jog your memory, we proved that

Dn = n!(1 - 1/1! + 1/2! - 1/3! + ⋯ + (-1)^n/n!)
Revisiting an Old Friend
In the previous series of posts, we had dealt with the number e. In the second post in that series, Naturally Bounded, we had derived an infinite series expression for e as

e = 1 + 1/1! + 1/2! + 1/3! + ⋯
In the above equation, we make the following substitution:
This gives us
Please read the above carefully. Raising the equation to the power x we get
Using the same techniques as we used in Naturally Bounded, we will obtain
If we substitute x = -1 in the above, we will get

1/e = 1 - 1/1! + 1/2! - 1/3! + ⋯
A Surprising Result
Now consider the equation
We can rewrite this as
Taking a limit on both sides as n approaches infinity, we will get
Expanding the summation on the right side we will obtain
But we just showed that the right side is related to e. Specifically, we can write
Or taking reciprocals we get
In other words, as n increases, the ratio of the number of permutations of n items to the number of derangements of the n items approaches e. This is a surprising result as there is no obvious reason for which e should show up in this ratio.
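If you would like to watch this convergence happen, here is a small Python sketch (the names are mine) that computes Dn using the expression we derived in the previous post and prints the ratio n!/Dn:

```python
import math

def derangements(n):
    # D(n) = n! * (1 - 1/1! + 1/2! - ... + (-1)^n / n!), in exact integer arithmetic
    return sum((-1) ** k * math.factorial(n) // math.factorial(k) for k in range(n + 1))

for n in (5, 10, 15):
    print(n, math.factorial(n) / derangements(n))  # the ratio rapidly approaches e
```

Even for n as small as 10, the ratio agrees with e to several decimal places.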
An Approximate Explanation
However, let us consider that we have a set of n items. If we now place the items in random positions, the probability that item 1 occupies its own spot will be 1/n. Hence, the probability that it does not occupy its own spot will be 1-1/n. The probability that item 2 occupies its own spot will be 1/(n-1), since one spot has already been occupied by item 1. Hence, the probability that item 2 is not in its own spot is 1-1/(n-1). For large values of n, 1-1/n and 1-1/(n-1) are nearly equal. For example, if n=1,000,000, the corresponding values are 0.999999 and 0.999998999998…, where the difference is approximately 0.000000000001. Since the probability values are so close to each other, we can approximate the probability that each of the n items does not occupy its own position to be 1-1/n.
This leads us to the overall probability being
Taking the limit as n approaches infinity we get
So we can see that the reason e appears in our study of derangements is that, for large values of n, the placements of the items in positions not their own are almost independent of each other and the associated probabilities are almost equal to each other. That we have used the word ‘almost’ twice is the reason why e only shows up in the limiting case and not for any finite value of n. Of course, as we place more items in different positions, the error involved in the approximation will increase. Hence, the above reasoning cannot fully account for the appearance of e in the expressions about derangements.
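The limiting behavior above is easy to check numerically. The following sketch compares (1-1/n)^n with 1/e for increasing n:

```python
import math

for n in (10, 1000, 1_000_000):
    approx = (1 - 1 / n) ** n
    print(n, approx, abs(approx - 1 / math.e))  # the gap shrinks roughly like 1/n
```

For n equal to one million, the approximation already matches 1/e to about six decimal places.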
Hidden in Plain Sight
However, recall that, in the previous post, we had derived
If we divide throughout by n!, we will get
This reduces to
The term in red is the probability that a random permutation of n items is a derangement. The term in green is the probability that a random permutation of n-1 items is a derangement. We can see that, as n gets larger, the right side gets closer to zero at a factorial rate. Hence, we can conclude that the two probabilities on the left side converge to some specific value.
Further, we had derived the expression for the number of derangements as
which can be rearranged to give
As observed in the previous post, the nesting of the parentheses yields the alternating signs of the terms. We also saw that the nesting of the parentheses was the result of the application of the inclusion-exclusion principle. In other words, the alternating signs are the result precisely of the way the derangements are counted. Hence, even though we might not have expected the appearance of e and might even be quite surprised by it, it is actually something that was in the expression all along.
Deranged No More
I have enjoyed writing these posts on counting, especially the last two on derangements. When I first encountered the idea, it was a novelty. Why would we ever want to place something where it does not belong? Being a person who does like order (though you may not suspect it if you looked at my desk!), the thought of placing something where it did not belong seemed only to be an instance of mathematicians playing around. But, as we saw in the previous post, derangements have a wide range of applications, some not quite intuitive. My interest was further piqued when I first saw how derangements are linked to e. The startling result hammered home the idea that mathematics, though often divided into incommensurate silos while we are taught it in school, is actually a unified body of knowledge.
I hope you have also received some enjoyment and maybe some stirring of interest through these posts. I would really like to hear from you. Please do write a comment and let me know what you liked, what you disliked, and what you found confusing.
In the previous two posts, I introduced ideas related to counting. In Learning to Count we introduced the idea of combinatorial arguments, which allowed us to gain some insight into the process of selection. In Abjuring Double Counting we saw the recursive process used to ensure that, when we count members of sets that have common elements, we neither omit them nor double count them. I had signed off last time with the promise that we would look at something I called ‘derangements’. Of course, the dictionary states that derangement is “the state of being completely unable to think clearly or behave in a controlled way, especially because of mental illness.” Quite obviously, this is not what I’m referring to! So what am I referring to?
Understanding Derangements
In mathematics, specifically in the area of combinatorics, a derangement is a permutation or arrangement in which no item occupies a position it previously occupied. This, I guess, is still a little vague. So let us consider a couple of examples.
Suppose we have a set of n letters that are in the mailperson’s bag. Obviously, each of these letters has an address on the envelope. However, if the mailperson delivers them such that no letter is delivered to the address on its envelope, then we have a derangement.
Or suppose n patrons of a restaurant check their coats in with the maître d’. On their way out the patrons collect the coats, but none of them receives their own coat. In that case, we have a derangement.
We could visualize a derangement as below
Here, we have five items (A, B, C, D, and E). However, none of the items in green matches the corresponding item in red. Using the ordering ABCDE, the above derangement is BEACD. In what follows, I will use red lettering to indicate an item that is not deranged, that is, an item that is placed where it belongs. And I will use green lettering to indicate an item that is deranged, that is, an item that is not placed where it belongs.
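Checking whether a given arrangement is a derangement is straightforward. Here is a minimal Python sketch (the function name is mine) that compares an arrangement against the original ordering position by position:

```python
def is_derangement(arrangement, original):
    """True if no item sits in the position it occupies in the original."""
    return all(a != b for a, b in zip(arrangement, original))

print(is_derangement("BEACD", "ABCDE"))  # True: no letter is in its home position
```

Try it on “BACDE”: the C sits in its own position, so the function reports False.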
We can begin counting the number of possible derangements with small set sizes. For example, with 2 items, A and B, there are 2 permutations:
As shown above, there is only one derangement.
If we increase the set size to 3 items, A, B, and C, there will be 6 permutations:
As can be seen above, there are only 2 full derangements.
We can increase the set size to 4 items, A, B, C, and D, to get 24 permutations:
As seen above, there are 9 full derangements, indicated in green boxes.
Now, with a set of 5 items, we will have 5! or 120 different permutations. I don’t know about you, but I’m disinclined to even write out all these permutations, let alone scrutinize each to determine which items are deranged, then determine which permutations are indeed full derangements, and then count the number of full derangements. If you have way too much time on your hands for mind numbing tasks, please feel free to step away and undertake this not just for a set of 5 items, but for a set of (say) 10 items, for which you can write down and scrutinize all the 3,628,800 permutations!
You see, the factorial function grows at an alarming rate. What might even be manageable for a set of 5 items, with 120 permutations, becomes an onerous task for a set that is just double the size, with millions of permutations. Hence, there must be a better way of determining the number of derangements given the size of the set. And, thankfully, there is!
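Of course, what is mind numbing for us is exactly the sort of task a computer excels at, so long as n stays small. Here is a brute-force Python sketch (the names are mine) that enumerates every permutation and counts the full derangements:

```python
from itertools import permutations

def count_derangements_brute_force(n):
    # count permutations of 0..n-1 in which no item sits at its own index
    return sum(
        all(item != pos for pos, item in enumerate(perm))
        for perm in permutations(range(n))
    )

for n in range(2, 7):
    print(n, count_derangements_brute_force(n))  # counts: 1, 2, 9, 44, 265
```

The factorial growth bites quickly, though: this approach is hopeless well before n reaches 15.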
We will proceed along two routes, both yielding the same result. But because the paths are different, we will discover different insights on the process. The first route we will adopt will use the inclusion-exclusion principle that we saw in the previous post. The second route will use a recursive formula.
Derangements using the Inclusion-Exclusion Principle
Given n items, there are n! different ways of permuting them. Out of these n! ways, there are (n-1)! arrangements in which item 1 is fixed in its own position. These (n-1)! arrangements include arrangements in which some other items may also be in their own positions. Similarly, there are (n-1)! arrangements in which item 2 is fixed in its own position. Again, these (n-1)! arrangements include arrangements in which some other items may also be in their own positions.
Of course, for each of the n items there are (n-1)! arrangements in which that item is fixed in its own position. Since there are n items in all, we are looking at
arrangements in which at least 1 item is fixed in its own position.
So now we have in consideration the initial n! less the arrangements in which at least 1 item is fixed. Hence, under consideration now are
arrangements. It is clear, however, that there has been double counting for arrangements in which any pairs of items are both in their own positions. Hence, per the inclusion-exclusion principle, we now need to add all the arrangements in which pairs of items are fixed in their positions.
Suppose items 1 and 2 are fixed. There are (n-2)! such arrangements and these arrangements include arrangements in which items other than 1 and 2 are also fixed in their positions. There are, of course, nC2 different pairs of items and for each such pair there will be (n-2)! arrangements in which the items in the pair are fixed in their positions. So we are looking at
arrangements in which 2 or more items are fixed in their positions. As mentioned, due to double counting, these need to be added back. After this adjustment we will be considering
arrangements. Of course, now we have overcompensated for the arrangements in which 3 items are fixed in their positions. We will need to subtract these.
Now suppose items 1, 2, and 3 are fixed in their positions. There are (n-3)! such arrangements and these will include arrangements in which items other than 1, 2, and 3 are also fixed in their positions. There are nC3 different triplets of items and for each such triplet there will be (n-3)! arrangements in which the items of that triplet are fixed in their positions. So we are looking at
These arrangements will need to be subtracted from the running total that we have, yielding
Continuing in this manner, we can see that the final number of derangements after all the appropriate inclusion-exclusion will be
This can be written in a more compact form as
Using this formula we obtain the table below for values of n up to 10.
Derangements using a Recursive Formula
The second method for counting derangements is by using a recursive formula. We will first derive the recursive formula and then modify it to demonstrate that it is equivalent to the one we just derived. Once again, we start with a set of n items. Now consider item 1. In order for us to end with a derangement, this item must be placed in one of positions 2 to n. Hence, there are n-1 ways of placing item 1. Since, by symmetry, each of these n-1 choices contributes the same number of derangements, we can conclude that
where X is another whole number that we have to determine. Till now, we have placed 1 of the n items in a position other than its own.
Now let us focus on the kth item, since it has been dislodged by item 1. We can either place it in position 1 or in one of the remaining n-2 positions. If we place the kth item in position 1, then we still have to derange the remaining n-2 items, which can be done in Dn-2 ways. If we do not place the kth item in position 1, then there are still n-1 items, including the kth item, that have to be deranged, which can be done in Dn-1 ways. Since the kth item must go either in position 1 or in one of the other n-2 positions, the two cases are mutually exclusive and exhaustive. Hence, we can conclude that
The expression in the square brackets should remind us of the binomial identity
You will recall from two posts back that we had considered the matter of selecting r out of n+1 items and we focused on one item, concluding that the selection either does not contain this item, represented by the first term on the left, or does contain this item, represented by the second term on the left.
While deriving the recursive equation
we used the same idea of focusing on one item and deciding whether it still needs to be deranged, represented by the first term in the square brackets, or has already been deranged, represented by the second term.
This can be expanded and rearranged as follows:
Suppose we designate
Then we can see that the earlier equation can be written as
Using this recursively we get
Substituting in terms of the derangement values we get
This can be rearranged as
We can now substitute values for n to obtain the corresponding expressions for Dn. This will give us
Considering the expression for D6, we can see that it can be rewritten as
And since 0!=1!=1, we can further write it as
Using sigma notation, this can be written as
This can be generalized to
which is exactly the same expression we obtained earlier using the inclusion-exclusion principle.
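The recursive route is also easy to put to work on a computer. Here is a small Python sketch of the recursion Dn = (n-1)(Dn-1 + Dn-2), assuming the standard base cases D0 = 1 and D1 = 0:

```python
def derangements_recursive(n):
    """D(n) from the recursion D(n) = (n - 1) * (D(n-1) + D(n-2)),
    with base cases D(0) = 1 and D(1) = 0."""
    if n == 0:
        return 1
    d_prev, d_curr = 1, 0  # D(0), D(1)
    for k in range(2, n + 1):
        d_prev, d_curr = d_curr, (k - 1) * (d_curr + d_prev)
    return d_curr

print([derangements_recursive(n) for n in range(2, 8)])  # [1, 2, 9, 44, 265, 1854]
```

Unlike the brute-force enumeration, this runs in time proportional to n, so even large values are no trouble.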
Derangements and the Inclusion-Exclusion Principle
The formula for derangements can be further written as
where the nested parentheses show how the inclusion-exclusion principle works, much like the matryoshka dolls in the image at the start of this post. What this formula tells us is that we can calculate the number of derangements by considering the total number of permutations, subtracting the cases where at least 1 item is fixed, adding back the cases where at least 2 items are fixed, subtracting the cases where at least 3 items are fixed, and so on. Due to the fact that (-1)n oscillates between 1 and -1, the nested parentheses accomplish the same final result as do the alternating signs in the original formula we obtained, namely
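We can even watch the matryoshka dolls in action. Below, both forms are evaluated for n = 6: the alternating-sign sum, and one common way of writing out the nesting, in which each layer multiplies by the next integer and adds the next alternating sign (the variable names are mine):

```python
from math import factorial

n = 6
# the alternating-sign form: 6! * (1 - 1/1! + 1/2! - 1/3! + 1/4! - 1/5! + 1/6!)
alternating = sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))

# the nested-parentheses form, written out literally, innermost doll first
nested = 6 * (5 * (4 * (3 * (2 * (1 * 1 - 1) + 1) - 1) + 1) - 1) + 1

print(alternating, nested)  # both give 265
```

Peeling the parentheses from the inside out reproduces exactly the sequence of subtractions and additions described above.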
Why Derangements aren’t Deranged
So far the discussion has been reasonably theoretical. And you might wonder if there is any reason for understanding derangements beyond the curiosity of mathematicians. I’ve tried to determine how the idea of derangements came about but have not been able to land on a definitive answer. It is quite likely, as in many cases in mathematics, that some mathematician just toyed with the idea and that the idea later found applications. So what are some of these applications?
In the workplace, if you wish to ensure everyone in a team has the opportunity to be responsible for every process, the idea of derangements is invaluable. In fact, if the team leader intentionally uses the idea of derangements, he/she could reallocate responsibilities in a more efficient manner than just reassigning them at random and checking whether a full derangement has been obtained.
In coding, the idea of derangements is used to check for coding errors. In stochastic processes, derangements are used to determine the entropy of a system. In chemistry, the idea of derangements is used in the study of isomers, the number of which increases rapidly, especially for compounds with long carbon chains. In graph theory, the idea of derangements can be used to determine unique walks or paths in a graph. In optimization, derangements are used in situations similar to the traveling salesman problem or the assignment problem.
In other words, while it is likely that the idea of derangements began with the toying of a mathematician, it has a wide range of applications today. But what surprises me is something that I am going to deal with in the next post. It is a result that is shocking to say the least and downright mysterious. Whatever could it be? And with that cliffhanger, I bid you au revoir for a week.
In the previous post, we had looked at the basic principles of making combinatorial arguments. We saw that these arguments, while deficient in that they do not yield any numerical values, give us insights about how we can count ways of doing a task. In the conclusion of that post, I introduced the idea of the possibility of double counting and suggested that combinatorial arguments allow us to avoid doing that.
Double counting is a particularly pressing problem, especially in democracies. When we wish to determine the ‘will of the people’, we dare not forget that each citizen falls within a number of demographic categories. For example, I am a Christian male, who is currently self-employed and who has graduate degrees. If I give my opinion on some matter and am counted separately under each of these categories, then my opinion will be given undue weight and would not be fair to others.
The question then arises, “How do we ensure that we count every item without counting any item more than once?” Anyone who is involved in any kind of statistical study needs to pay attention to this so that each data point is given the same weightage. So let me introduce you to the inclusion-exclusion principle.
Basics with Two Categories
We begin with the most basic set-up with all items classified into two categories (say A and B). Since items can have multiple classifications, we can represent this using the Venn diagram below:
The Venn diagram has three regions labelled 1, 2, and 3. Set A includes regions 1 and 2, while set B includes regions 2 and 3. If we just add the number of items in sets A and B, we can see that the central region (2) will be counted twice. In other words, the items that belong in both sets will be double counted. To avoid this, we need to subtract the items that belong in the intersection of the two sets.
This can be written as
The last term on the right side of both equations recognizes that some items have been ‘included’ more than they need to be and explicitly ‘excludes’ them from the count. This adjustment by excluding what was included too many times is what gives the principle its name.
Counting with Three Categories
Of course, we know that demographics includes many more than two categories. So how would the inclusion-exclusion principle change if we had three categories? The Venn diagram below indicates the situation.
As shown in the diagram, there are now 7 distinct regions. Set A includes regions 1, 2, 4, and 5. Set B includes regions 2, 3, 4, and 6. And set C includes regions 4, 5, 6, and 7. If we add sets A, B, and C, we will count regions 2, 5, and 6 twice and region 4 thrice. We can proceed as before and exclude the intersection regions of two sets. But since region 4 belongs to A∩B, B∩C, and C∩A, we will have excluded region 4 thrice. This means that regions 1, 2, 3, 5, 6, and 7 are now counted once, but region 4 is not counted. Of course, it is a simple matter to add it back to include it. This is represented as
The terms in red we have encountered before in the case where we had only two categories, A and B, and had to compensate for overcounting the regions in the intersection of the two sets. The term in green is a new term, introduced because, by compensating for the overcounting of the regions in the intersections of pairs of sets, we ended up undercounting the intersection of all three sets.
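For the skeptical reader, here is a quick check of the three-category formula using Python’s built-in sets (the example sets are entirely made up):

```python
# three made-up sets with overlapping members
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 5, 6, 7}

lhs = len(A | B | C)  # size of the union, counted directly
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(B & C) - len(C & A)
       + len(A & B & C))  # the inclusion-exclusion formula
print(lhs, rhs)  # 7 7
```

Both sides agree, with the intersection of all three sets duly added back at the end.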
Extension to Four Categories
What would happen if there were more than 3 categories? Unfortunately, a 2 dimensional representation will no longer suffice. So let us label the regions as follows: 1: only A; 2: only B; 3: only C; 4: only D; 5: only A and B; 6: only A and C; 7: only A and D; 8: only B and C; 9: only B and D; 10: only C and D; 11: only A, B, and C; 12: only A, B, and D; 13: only A, C, and D; 14: only B, C, and D; and 15: A, B, C, and D.
Now set A includes regions 1, 5, 6, 7, 11, 12, 13, and 15. Similarly set B includes regions 2, 5, 8, 9, 11, 12, 14, and 15. Set C includes regions 3, 6, 8, 10, 11, 13, 14, and 15. And D includes regions 4, 7, 9, 10, 12, 13, 14, and 15. Now, when we just add A, B, C, and D, the regions 5 to 10 will be counted twice, regions 11, 12, 13, and 14 will be counted thrice, and region 15 four times. The overcounting for regions 5 to 10 would need an ‘exclusion’ by subtracting the intersection of pairs of sets. However, this would mean that the regions that involve the intersection of exactly three sets (i.e. 11, 12, 13, and 14) would have been ‘excluded’ three times over, meaning that they would not be ‘included’ anymore. It would also mean that the region 15 has been counted negative 2 times! This can be adjusted by now ‘including’ these four regions. However, this means that region 15 is now ‘included’ 2 times. Hence, we make a final adjustment and subtract the intersection of all four sets. This is represented as
Here the red terms represent the adjustments for regions 5 to 10, the green ones for regions 11 to 14, and the blue one for region 15.
Generalization
If we continue in this manner, adding more categories, we will be able to demonstrate that
This can be written in a more compact form as
I recommend you take a close look at how the last two equations are actually the same as we will need such ‘parsing’ of formulas in future posts!
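The compact form translates almost directly into code. Here is a sketch (the names are mine) that computes the size of a union by summing the sizes of all non-empty intersections with alternating signs, and compares the result against taking the union directly:

```python
from itertools import combinations
from functools import reduce

def union_size_by_inclusion_exclusion(sets):
    # sum the sizes of all non-empty intersections, with alternating signs
    total = 0
    for r in range(1, len(sets) + 1):
        sign = (-1) ** (r + 1)  # +1 for odd-sized groups, -1 for even-sized ones
        for combo in combinations(sets, r):
            total += sign * len(reduce(set.intersection, combo))
    return total

sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {1, 5, 6}]  # made-up example sets
print(union_size_by_inclusion_exclusion(sets), len(set().union(*sets)))  # 6 6
```

Note how the loop over group sizes mirrors the sigma over intersection sizes in the compact formula, sign and all.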
What we have seen here is that we can recursively ‘exclude’ intersections with a greater number of sets and in so doing avoid the twin traps of overcounting and undercounting. In the next post we will look at a way of counting an interesting arrangement of objects known as a ‘derangement’. Hope you stay sane till then!
I have been teaching high school students for a while now. While I have taught physics and theory of knowledge, my main focus has been mathematics. One of the more fascinating areas of the high school mathematics syllabus is probability and statistics. Now don’t get me wrong. I know that, in an earlier post, I bemoaned the early inclusion of statistics, especially when it is done only to justify the use of a calculator. I have not changed my position on that. I still believe that many of our syllabuses, especially those that come from the Western hemisphere, are artificially bloated with topics that require the use of calculators, a move that reduces the priority of conceptual understanding in favor of brute force methods.
Indeed, it is precisely a method used in the area of probability and statistics that demonstrates how there is elegance in one approach to problem solving that is completely absent in another. The method I am referring to is commonly known as Combinatorial Proofs or Combinatorial Arguments.
The Meaning of ‘Combinatorics’
Now, many of you may be wondering what in the world ‘combinatorial’ means. Combinatorics is the branch of mathematics that deals with grouping of elements of finite sets of objects. Specifically, combinatorics is concerned with the number of such groups that can be formed rather than the groups per se. So, for example, if we have a set
Then the subsets with exactly 2 elements are
While I have considered the elements of the set to be the first three natural numbers, there is no reason we cannot extend this idea to any set with 3 distinct elements. Hence, in general we can say that, given a set containing 3 distinct elements, there are 3 ways to form a subset with exactly 2 elements. Another way of stating this is that the number of ways of selecting 2 out of 3 distinct objects is 3.
Backtracking a Bit
We have seen this idea before. In a post on fractions, I introduced the idea that the number of ways of selecting r out of n distinct items is
We revisited this idea in a post on the number e, in which we obtained the lower and upper bounds for e.
Using the above formula, we can prove some crucial results. For example, let us prove that
We can proceed as follows:
Or let us prove that
We can proceed as follows
Here, the proof depends, in large part, on our ability to recognize that the expression in red is common in both terms, allowing us to consider it a factor of the entire expression. However, this is not necessarily obvious. And unless one has developed some skill with manipulating binomial coefficients, it is unlikely one would be able to identify that such an expression actually is common to both terms.
The two ‘equations’ we have proved are known as ‘binomial’ identities because they involve the coefficients that arise from the binomial expansion. Some of them can be quite difficult to prove using the brute force method of substituting the formula for nCr and then manipulating the various expressions. However, the method of using combinatorial arguments allows us to make logical arguments to prove the result. This actually gives us the insight about why the identity exists, which the brute force method cannot do – or at least cannot easily do.
Selection Equals Unselection
Suppose we have to prove that
using combinatorial arguments.
We can argue as follows. Suppose we have n items and we have to select r of them for some purpose. This obviously can be done in nCr ways. However, by selecting the r items for the purpose, we automatically create a selection of the remaining n-r items not for that purpose – that is, for some other purpose. Selecting n-r items for some other purpose can obviously be done in nCn-r ways. Since the selection of r items automatically creates a selection of the remaining n-r items, the number of ways must be equal. Hence,
What this means is that selecting r out of n things or n-r items out of n things can be done in the same number of ways. Alternatively, we could say that the number of r member committees that can be formed from n candidates is the same as the number of n-r member committees that can be formed from the same n candidates.
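Although the combinatorial argument needs no computation, a one-minute numerical check never hurts. Python’s standard library computes binomial coefficients for us:

```python
from math import comb

# the identity nCr = nC(n-r), checked for every r with n up to 11
for n in range(12):
    for r in range(n + 1):
        assert comb(n, r) == comb(n, n - r)
print("selection equals unselection, verified")
```

Every committee of r members leaves behind a committee of n-r non-members, and the numbers agree.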
Included or Excluded
Suppose we have to prove that
using combinatorial arguments.
We can argue as follows. Suppose we have n+1 items and we have to select r of them. This can obviously be done in n+1Cr ways. Now let us focus on any one of the n+1 items and let us call it item X. Now there are two cases. Either X is in our final selection or X is not. If X is not, then we still have to select r items from the remaining n items, which can obviously be done in nCr ways. If X is in the final selection, then we have to select r-1 items from the remaining n items, which can obviously be done in nCr-1 ways.
Since in both cases we end up selecting r items out of n+1 items it follows that
In other words, the number of ways of selecting r items out of n+1 items equals the sum of the number of ways of selecting r items out of n items, excluding one particular item, and the number of ways of selecting r-1 items out of n items, including the same item.
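This identity, too, is easy to confirm numerically across a range of values:

```python
from math import comb

# the identity (n+1)Cr = nCr + nC(r-1), checked for every valid r with n up to 11
for n in range(12):
    for r in range(1, n + 1):
        assert comb(n + 1, r) == comb(n, r) + comb(n, r - 1)
print("included-or-excluded, verified")
```

The two terms on the right are exactly the ‘X excluded’ and ‘X included’ cases of the argument above.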
Committees and Sub-committees
Suppose we have to prove that
using combinatorial arguments.
We could argue as follows. Suppose we have n items from which we have to select r items. This can be done in nCr ways. Now suppose we select k items from the r that we originally selected. This can be done in rCk ways. The two selections can obviously be done in nCr×rCk ways.
However, we can make these selections in a different way. Let us first select the k items from the original n items. This can be done in nCk ways. Now we have to select r-k items from the remaining n-k items, which can be done in n-kCr-k ways. The two selections can obviously be done in nCk×n-kCr-k ways.
Since in both cases we end up with r-k items for one purpose and k items for a second purpose, the number of ways must be the same. Hence,
In other words, the number of ways of forming a committee with r members that contains a sub-committee with k members is equal to the number of ways of first forming the k-member sub-committee and then selecting the r-k committee members who are not a part of the sub-committee from the remaining unselected n-k candidates.
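As a final numerical sanity check, we can run through all the valid triples for small n:

```python
from math import comb

# the identity nCr * rCk = nCk * (n-k)C(r-k), checked for all valid triples with n up to 9
for n in range(10):
    for r in range(n + 1):
        for k in range(r + 1):
            assert comb(n, r) * comb(r, k) == comb(n, k) * comb(n - k, r - k)
print("committees and sub-committees, verified")
```

Whether the committee or the sub-committee comes first, the count is the same.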
Wrapping Up
Now it is true that the combinatorial arguments do not give us any idea about how many ways the selection can be done. In that sense, they are deficient. However, the combinatorial arguments give us a clear understanding of how we can go about counting some specified subsets of a given set. The insights gained by the combinatorial arguments allow us to understand how best to enumerate sets so as to avoid double-counting and omission. This is not possible with a formula because much is lost when we express things in a compact formula. The frugality and austerity of mathematics itself causes things to become opaque. By using logical proofs, such as those involved in the combinatorial arguments, we can understand how the results we memorize come about. Unfortunately, because we all too often want a numerical answer to a question, we do not spend much time teaching and learning such elegant forms of argumentation. But by doing so we miss out on yet another factor that gives rise to beauty in mathematics.
In the graphs below, I have used the data from the relevant sources and may or may not have performed simple mathematical operations (addition, subtraction, multiplication, or division) on the data. The purpose of this is to obscure the factor being discussed, before revealing it, while ensuring that the essential shape of the distribution remains the same. In most cases, as expected, the data is not strictly normally distributed. However, for the sake of making the discussion less burdensome, I have assumed a normal distribution and have done a best fit for the data.
Statistics and the Potential Abuse of Mathematics
We have perhaps all heard the saying, “Lies, damned lies, and statistics.” This is presumably from an 1894 paper read by a doctor called M. Price, who argued that there were “the proverbial kinds of falsehoods, ‘lies, damned lies, and statistics.’” According to Book Browse, “This expression is generally used in order to cast doubt on statistics produced by somebody a person does not agree with, or to describe how statistics can be manipulated to support almost any position.”
It is true that statistics can be misused. I remember when I was working at a place that coached students for the IIT-JEE. This institute had classes at about a dozen locations in the city. At each location we started out with about 40 students in each class. This meant that we began with about 500 students. However, toward the end of the 2 year program, most classes were down to between 10 and 30 students. At one location we had 25 students. When the results of the JEE were announced, it turned out that 15 of these 25 students had succeeded. From the remaining locations about another 15 had succeeded.
What should we have reported? That our success rate was 40 out of 500 students, including those who had dropped out because we had ‘failed’ them, for a success rate of 8%, which was then more than 3 times the national average? Or that at this particular location our success was 60% – about 25 times the national average? What would have made for better advertising? Obviously the second strategy! And while it told the truth, namely that at that location 60% of the students had succeeded, it did not tell the whole truth, namely that, of the students excluded from the sample, only 15 out of 475 had succeeded? Interestingly, even 15 out of 475, or 3.15%, is slightly higher than the national average. In other words, by any metric, the institute had done better than the national average. However, human greed is such that I do not have to tell you which statistic was finally used in the advertisement campaigns for the next year!
Mathematics itself is unbiased and unmoved by our preferences and prejudices. However, it can be used, misused, and abused to suit all sorts of positions. Unless we have a strong ethical foundation, then, we will abuse the realm of mathematics. And as I have explained in another post, mathematics carries great weight in our societies. Hence, if we are able to support some position using mathematics, even if the mathematics is abused, it will most likely carry a lot of weight and manage to convince many people. The only way to counter this is to delve into the mathematics to give the full picture of what is being discussed, hopefully to highlight the ways in which the mathematics has been abused.
Recently, I came across an example of just such abuse. I will, however, assume that the people who propagated this abuse actually did not understand the mathematics involved. Otherwise, I would have to question their intentions. I think the lesser accusation is that they have failed to understand how mathematics works rather than that they intentionally have misled people down a path that is, at least from the perspective of mathematics, a blatant lie. But before we get to that, let me set the stage with a few other contexts in which similar data can be used.
Compatibility of Data
Consider, for example, the figure below:
Fig. 1. Variable A on the x-axis for two groups in green and orange.
Fig. 1 shows the distribution of some measure for two groups. One group is in green, while the other is in orange. As declared in the opening disclaimer, I have represented the data as being normally distributed. Apart from this, the population size for both groups is identical. What we can see is that the green group has a lower mean, accounting for its peak being to the left of the orange peak, and a lower standard deviation, accounting for its peak being higher than the orange peak.
Can the data be combined? Of course, from the perspective of mathematics, we just have sets of numbers! So we can combine the two datasets to get:
Fig. 2. Variable A on the x-axis for two groups in green and orange. The combined distribution is in blue.
In Fig. 2, the blue graph indicates what we would get if we combined the two datasets. Since the size of both original groups was the same and because of another factor we will shortly discuss, the blue graph seems to also be normally distributed. It actually isn’t, as we will see as we continue. However, even if we assume that the blue graph is normally distributed, we can see that it has a mean and standard deviation between those of the two original datasets. As mentioned earlier, from just a number crunching perspective there is no problem doing this since we are only dealing with some measure that is valid for every element in both datasets. However, we should ask if this makes sense in the real world since we are using mathematics to represent the real world.
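For readers who like to tinker, the pooling can be sketched in a few lines of Python. The parameters below are made up purely for illustration (I am not reproducing the actual data behind the figures); the point is that the pooled curve is a mixture of the two normal curves, not itself a normal curve, even though its mean and variance are easily computed from the two groups:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Made-up parameters standing in for the two groups
mu_g, s_g = 10.0, 1.0   # green: lower mean, lower standard deviation
mu_o, s_o = 11.0, 1.5   # orange: higher mean, higher standard deviation
w = 0.5                 # equal group sizes, so equal weights

def combined_pdf(x):
    """Density of the pooled (blue) data: a 50/50 mixture of the two normals."""
    return w * normal_pdf(x, mu_g, s_g) + (1 - w) * normal_pdf(x, mu_o, s_o)

# Pooled mean: the weighted mean of the group means
mix_mean = w * mu_g + (1 - w) * mu_o
# Pooled variance: weighted average of the variances plus the spread of the means
mix_var = (w * s_g ** 2 + (1 - w) * s_o ** 2
           + w * (mu_g - mix_mean) ** 2 + (1 - w) * (mu_o - mix_mean) ** 2)
```

Evaluating combined_pdf at mix_mean and comparing it with a true normal density of the same mean and variance shows that the two curves differ, which is why the blue graph only seems normal.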
So we have to first ask what was being measured. Suppose I told you that for both groups what was measured was the diameter of a component. What would you conclude? You may conclude that combining the two datasets is perfectly fine since we were measuring the same quantity for both groups.
However, what if I told you that the green group represents the outer diameter of a group of bolts and that the orange group represents the inner diameter of a group of nuts? Right away you would see that combining the two groups is a meaningless activity because bolts are bolts and nuts are nuts! In fact, by combining the two sets we lose the ability to determine what percentage of nuts and bolts in the two groups actually fit each other within some tolerance band. The blue line is as meaningless as any graph drawn by a random monkey with a pen!
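To see concretely what the pooled data can no longer tell us, here is a small Python sketch with made-up diameters and a hypothetical clearance band. The count below needs both labelled datasets; it cannot be recovered from a single pooled list of diameters:

```python
# Made-up measurements (mm) and a hypothetical clearance band
bolt_outer = [9.8, 10.0, 10.1, 10.3]   # green group: bolt outer diameters
nut_inner = [10.0, 10.2, 10.4]         # orange group: nut inner diameters
max_clearance = 0.2                    # a nut fits a bolt if 0 <= gap <= 0.2

# Count compatible (bolt, nut) pairs; this requires knowing which
# measurement came from a bolt and which from a nut
fitting_pairs = sum(1 for b in bolt_outer for n in nut_inner
                    if 0 <= n - b <= max_clearance)
```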
What we can conclude from this is that it is crucially important when combining two datasets to know whether or not such a combination actually makes sense. In the context of the nuts and bolts, a bolt with a diameter greater than the mean diameter of the nuts does not function as a nut! It is still a bolt, but may have fewer compatible nuts among the orange group.
Loss of Specificity
Or consider the graphs below:
Fig. 3. Variable B on the x-axis for two groups in green and orange.
Once again, we have the distribution of some measure for two groups – green and orange. Here the size of the green group is larger than the size of the orange group. What we can glean from the graphs is that each distribution has a mode, which happens to be the mean since, as mentioned in the opening disclaimer, I have adjusted the data so that the distributions are normal. We can also see that the mode of the green group is higher than that of the orange group, while the standard deviation of the green group is smaller than that of the orange group. Once again, we can combine the two datasets to get:
Fig. 4. Variable B on the x-axis for two groups in green and orange. The combined distribution is in blue.
Since the green group was larger than the orange group, the resultant blue distribution is visibly no longer normal. It is skewed toward the green graph because of the large size of the green group. The mean of the combined data is also lower than that of the green group. This is because the means of the original green and orange groups were quite distinct. If we compare Fig. 3 with Fig. 1 we will see that the peak of the orange graph in Fig. 1 is inside the peak of the green graph, whereas in Fig. 3 the two peaks are at quite distinct values.
In other words, Fig. 3 says that whatever is being measured has significantly different values for the green group than for the orange group. Hence, the two distributions do not reach their peaks or taper off near each other as they do in Fig. 1.
Because the means of the two distributions are markedly different and the green group is larger than the orange group, the effect is akin to pulling the right tail of the orange graph, resulting in the blue graph, which now has a non-normal distribution. And while the original graphs had two clear modal values, the blue graph now has a single, indistinct modal value that is much closer to the modal value of the green graph than to that of the orange graph.
But are we allowed to combine the two datasets? In this case, what we are looking at are salaries in Austria, with the green graph representing men and the orange graph representing women. Combining the two graphs is certainly permissible since it would tell us the salary distribution without sex being a factor. Such information is certainly meaningful and could, in some contexts, be relevant.
However, once the data is combined we must observe three things. First, the combined dataset is not bimodal, but has a single mode. This is not necessarily the case, as we will shortly see. However, the point made here is that it is fallacious to assume that two sets of data, each with a mode, can be combined in a meaningful way and still remain bimodal. Second, the blue graph does not tell us about the sex-based wage discrepancy that the green and orange graphs communicate. This is only to be expected. Third, once we combine the two graphs that were based on sex, we have a single dataset that does not have any sex identifiers and hence can no longer be used to make sex-based conclusions. Once we combine datasets that were separated on the basis of some factor, that factor can no longer be distinguished from within the combined dataset.
What this means, in this particular context, is that, if we wish to reduce the sex salary gap, we must not combine the two datasets, but must allow them to stand alongside each other as in Fig. 3.
Loss of Information
Suppose, though, we had the following distributions:
Fig. 5. Variable C on the x-axis for two groups in green and orange.
Once again, we have two groups, represented by green and orange, and the data in both sets is normally distributed. In actuality, the data in these two sets is very close to normal, so I have not had to ‘massage’ the data much. Here the size of the green set is slightly larger than that of the orange set. We can see that the orange set has a mean that is lower than that of the green set. It also has a smaller standard deviation than the green set. What would happen if we combined the two sets? We would get the following:
Fig. 6. Variable C on the x-axis for two groups in green and orange. The combined distribution is in blue.
If we pay close attention to the blue graph we will notice the following. First, the data has remained bimodal. As mentioned earlier, this is a possibility but not a guarantee. Second, the modes are not as pronounced as before. This is because the population size has increased, thereby reducing the associated probability for any particular value of the measure. In other words, assuming it is meaningful to combine the two datasets, then if we do, we can no longer refer to the original green and orange lines because now only the blue graph exists since we have ignored whatever it was that separated the green and orange graphs.
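Whether a combination of two normal curves keeps two modes depends on how far apart the means are relative to the spreads. A small Python sketch (with made-up parameters) counts the modes of the mixture numerically:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def count_modes(mu1, mu2, sigma=1.0, w=0.5, lo=-5.0, hi=10.0, step=0.01):
    """Count interior local maxima of the mixture density on a grid."""
    n = int((hi - lo) / step) + 1
    xs = [lo + i * step for i in range(n)]
    ys = [w * normal_pdf(x, mu1, sigma) + (1 - w) * normal_pdf(x, mu2, sigma)
          for x in xs]
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])
```

Well-separated means (say 0 and 5, with unit standard deviations) leave the mixture bimodal, while close means (0 and 1) merge the two peaks into one.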
Hence, now, even though the blue graph is bimodal, we have actually lost the ability to determine what factor contributed to the two modes. Hence, by combining the two graphs we have set aside any discussion based on whatever it was that separated the green and orange graphs.
In this case, the measure is the weight of persons in a study of automobile accidents involving pedestrians. Here combining the data would yield the information for humans without any consideration of sex. However, given the anatomical and physiological differences between men and women, combining the data would actually make it less useful. Remember, this data is obtained from a study about automobile accidents involving pedestrians. The blue graph only tells us that there are two modes, but does not tell us what the modes represent since the data concerning sex was ignored. Indeed, for the region between the modes of the blue graph, which contains the majority of the persons studied, there is no way of knowing whether the greater representation is that of women or of men. In particular, there is no way of knowing that, for weights lower than indicated by the point of intersection of all three graphs, more than 80% of the accidents involve women. This means that any company that relies on the blue graph is in no position to design an automobile that protects women as well as men, but can only guess about who will be affected.
Loss of Distinctions
Another set of graphs I wish to deal with before proceeding to the reason for which I wrote this post is below:
Fig. 7. Variable D on the x-axis for two groups in green and orange.
Here we have some measure that has a very similar profile for the green and the orange graphs. The modal heights are almost identical, leading to the conclusion that the spreads of the data, that is, the standard deviations, are almost identical. The only major difference here is the value of the mean, with the green data having a larger mean than the orange data. Also, the size of the green dataset is only slightly larger than that of the orange set. If we combine the two sets we get:
Fig. 8. Variable D on the x-axis for two groups in green and orange. The combined distribution is in blue.
As with the change between Fig. 3 and Fig. 4, we see a stretching of the line, yielding a lower modal height. Since the size of the green and orange sets are roughly equal, the resultant is almost symmetric, like the original two datasets. However, because the modal values are so different, the blue graph is actually not a normal curve. This is different from what we saw between Fig. 1 and Fig. 2, where the proximity of the two modal values and the equal sizes of the two datasets yielded a resultant blue graph that was very close to being normally distributed.
However, as we can see from Fig. 8, the resultant actually is not normally distributed, as anyone with some familiarity with normal distributions would expect. Note that here we are not adding two normally distributed variables. If that were what we were doing, it would yield a normally distributed variable that was the sum of the two independent variables. Rather, what we are doing here is combining the datasets and then determining what the distribution of the combination will be.
Here the two original datasets represent the heights of people from 20 countries. Once again, the green graph represents men and the orange graph represents women. What does the blue graph represent? Obviously, it represents height distribution without consideration of sex. While that may be worthwhile in some contexts, what this does is get rid of something that is crucial to our understanding of humans, namely that we are sexual beings and, as a sexually reproducing species, there is something like sexual dimorphism that actually does serve to distinguish between the sexes. In other words, while it may be true that a greater percentage of men than women have a height greater than 200 cm, it does not follow, on the basis of height, that a woman who is actually 200 cm tall is more a man than a woman! The original green and orange distributions enable us to recognize this. But the blue distribution does not allow us to say anything. In fact, the Bayesian question, “Given that a particular human has a height of 200 cm, what is the probability that this human is a woman?” cannot be answered by using only the blue graph.
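That Bayesian question can be written down directly, but only from the separate, labelled distributions. A small Python sketch with made-up height parameters (not real anthropometric figures) shows the structure of the computation that the blue curve alone cannot supply:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Made-up parameters for illustration only (heights in cm)
mu_w, s_w = 165.0, 7.0   # women
mu_m, s_m = 178.0, 8.0   # men

def p_woman_given_height(h, p_w=0.5):
    """Bayes' rule: needs the two labelled densities and the prior p_w.
    None of these ingredients survive in the pooled (blue) curve."""
    num = p_w * normal_pdf(h, mu_w, s_w)
    den = num + (1 - p_w) * normal_pdf(h, mu_m, s_m)
    return num / den
```

With these made-up parameters, p_woman_given_height(200.0) comes out small but nonzero: a 200 cm human is far more likely a man, yet is not thereby any less a woman if she is one.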
Necessary Biology Excursus
In the preceding section, I have mentioned sexual dimorphism. As sexually reproducing animals, sexual dimorphism is something that is expected for humans. While some measurements, like intelligence, cannot be reliably used to distinguish between women and men, archeologists regularly use the size and shape of skeletal bones to determine whether they are studying the remains of a woman or a man. Skeletal remains are less useful when the population being studied is itself relatively unknown. The one skeletal factor that provides almost certain identification of the sex of the person is the shape of the pelvis. There are, of course, other factors that play a role in sexual dimorphism, including muscle mass, body fat, lean body mass, and fat distribution. And, from the perspective of reproduction itself, the gametes produced by women and men are considerably different.
What can be said, then, of these two different kinds of traits, one for which the difference between women and men is negligible or inconsequential, and the other for which the difference is substantial? Let us consider each of these in turn.
Suppose we consider a trait like intelligence, for which there is no significant difference between women and men. We would get the same profile for the whole human population as we did for each sex separately since there is no significant difference. In this case, the sex of the person does not matter since the data leads us to understand that men can be as intelligent as women.
But suppose we consider traits for which there are significant differences. It has been found that, in every factor that contributes to the strength of an athlete, like lean body mass, muscle length, and muscle thickness, women are considerably weaker than men. While this difference is likely partly due to the role of testosterone, recent studies indicate that another factor is the sex chromosomes that women and men possess. The XX chromosome pair characterizes the cells that constitute women’s bodies, while the XY pair characterizes the cells that constitute men’s bodies.
In other words, while there is a distribution among members of the same sex, it would be ludicrous to claim that someone with XY chromosomes in the cells of his body and who happens to be short and less muscular is less of a man or actually a woman!
Spurious Mathematics
Fig. 9. Contrived figure created without data and with spurious ‘variables’ to support claims about ‘gender spectrum’. (Source: Cade Hildreth)
Despite this, some people claim that sex exists on a spectrum. One resource, for instance, declares, “A person’s sex can be female, male, or intersex—which can present as an infinite number of biological combinations.” (sic) As an aside, as discussed elsewhere, infinity is not a number, so saying ‘an infinite number’ is misleading at best and probably indicates that the author of the article has a tenuous grasp of mathematics. Anyway, it pays to observe that, while I presented diameter (Fig. 1 & 2), income (Fig. 3 & 4), weight (Fig. 5 & 6), and height (Fig. 7 & 8) on the x-axis, the figure above refers to ‘gender spectrum’, which is itself the very issue being discussed. Since there is no quantifiable way of specifying the variable that determines one’s ‘gender spectrum’ value, this is nothing but a spurious variable and an example of circular reasoning.
Moreover, even if we assume that we can quantify this ‘gender spectrum’ variable, as we have seen, once we combine datasets, we lose the ability to identify anything on the graph. In fact, combining the datasets need not yield two modal values at all, as we saw with Figs. 2, 4, and 8. Hence, asserting that there are still two modal values is to assume the result. Indeed, to label one peak ‘Women’ and the other ‘Men’ after combining the datasets and discarding the differences is disingenuous. And without any actual data concerning what belongs on the x- and y-axes, the figure is just something concocted to give the impression that there is a mathematical basis for the claim being made.
Yet, let us assume that, were we given some data, it still would give us two modal values. That people find themselves in the region labeled ‘Other Genders Exist’ does not mean that other genders exist, any more than the existence of a short man means he belongs to some other gender. Rather, such a combination could only remain bimodal if there is sufficient distinction between the graphs for women and men, as we saw in Fig. 6 but not in Figs. 2, 4, and 8. Such distinct graphs should actually lead to the conclusion that the two sexes are markedly different and that the datasets should not be combined, rather than that there is a region in the middle that indicates an infinite variability of sexes or genders. In fact, as we saw with Fig. 2, if we have two incommensurable datasets, in that case nuts and bolts, the existence of a large-diameter bolt does not make it less of a bolt or something in between a nut and a bolt! And the fact that there are thousands of nuts and bolts that lie in the intermediate region does not mean that there are infinitely many ‘species’ between nuts and bolts!
In fact, the argument about infinite sexes and genders is shown to be specious when we consider the example of the nuts and bolts. Unless we are able to demonstrate that two datasets can be legitimately combined, as was the case with Fig. 3 & 4, combining them has no mathematical basis.
Consider, for example, what would happen if Fig. 3 & 4 did not represent the distribution of salaries but the distribution of the amount of testosterone in an athlete’s blood sample, which works since there are in general more men athletes than women athletes. The blue line in Fig. 4 would then represent absolutely nothing because the original data was obtained using the sex of the person in mind. A woman athlete who had a high testosterone level would not qualify as less than a woman on these grounds. In such a situation, any combination of the datasets would reduce our ability to determine, for example, if a woman athlete had actually doped herself with testosterone. After all, the data line appropriate for women is the orange one. However, once we combine the datasets, we only have the blue graph to refer to. But, as we can see, the loss of the left peak renders even women athletes who have doped themselves impossible to identify since they may still fall to the left of the blue peak.
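A small Python sketch (with entirely made-up numbers, not real testosterone data) shows why a doping check must reference the sex-specific distribution rather than the pooled one:

```python
# Made-up distribution parameters (arbitrary units); illustrative only
mu_w, s_w = 1.0, 0.3    # women athletes (orange)
mu_m, s_m = 8.0, 2.0    # men athletes (green)

def flagged_for_woman(level, z_cutoff=3.0):
    """Flag a woman athlete's sample as anomalous against the women's
    distribution, using a simple z-score rule."""
    return (level - mu_w) / s_w > z_cutoff

# Pooled mean, assuming equal-size groups for simplicity
pooled_mean = 0.5 * mu_w + 0.5 * mu_m

# A sample far outside the women's distribution can still sit to the left
# of the pooled mean and so raise no alarm against the blue curve
sample = 3.0
```

Here sample is wildly anomalous for the orange (women's) distribution, yet lies below the pooled mean, illustrating how the combined graph hides exactly what we need to detect.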
Conclusion
This does not mean I believe there are no people who experience discomfort with their bodies. However, we need to be careful what we mean by this. While I may experience some discomfort with my body, I may conclude that I have discomfort with being in a man’s body since this is the only body I have experienced. However, to draw the conclusion that this must mean I am a woman trapped in a man’s body is illogical because, no matter how many resources I read, no matter how many women I speak to, I will never actually know what it means to be a woman, let alone in a woman’s body.
For instance, I may surround myself with indigenous South Africans day in and day out. But that would not make me truly understand what it was for them to go through apartheid. I may immerse myself in Chinese culture, but it would not enable me to understand the Century of Humiliation. Unless we experience something in our bodies we actually cannot truly appreciate or understand what that experience entails. Anything that is experienced in our bodies, such as our sexuality or gender, requires just such an embodied experience before we can claim it is something we are experiencing.
However, returning to the mathematical side of things, what we can say is that we need some basis outside mathematics that would allow for treating the datasets obtained for women and men as commensurable. Without such a basis external to mathematics, mathematics can be abused, as I have demonstrated. What could such an external basis be? We need to be able to identify some variable that can be measured in all humans without first considering them separately as women and men. Then the single dataset should demonstrate a bimodal behavior. But this only provides mathematical support for a claim. It is certainly necessary. But mathematical support cannot be considered sufficient.
Rather, we need to be able to provide a biological explanation for the phenomenon. And if we are claiming that sex or gender is non-binary, there needs to be a biological basis for such an explanation. I doubt we can find such a basis because we exist, as a species, to propagate the species. Reproduction is the key purpose of any species and, for our species, this happens through sexual reproduction. This means that there are distinct gametes that facilitate the reproduction of the species. That some members of the species do not have any gametes or have both kinds of gametes does not mean that there are more than two gametes.
Indeed, such an argument would be like saying that, because some people are born with no limbs and others with extra limbs, there are infinitely many ways of being limbed and that a person who has no limbs represents another way of being limbed. This is a dangerous line of thought that normalizes what is clearly a physical disability. People with physical disabilities have only recently, and in not too many countries, earned hard won liberties and access to learning and physical spaces. Saying that being born with no limbs is another way of being limbed rather than recognizing that such a person deserves genuine support from society so that they can benefit and contribute as much as anyone else would only betray a lack of compassion on our part.
Hence, I would conclude by claiming that, until our species evolves to require at least a third gamete, the idea that sex and gender are not a binary is wishful thinking at best and unmathematical and unscientific propaganda at worst.
Anyone who has studied mathematics up to at least middle school will know that there are certain mathematical ‘artifacts’ called ‘proofs’. Whether we understand them or not, proofs form one of the foundational structures of mathematics, allowing us to take one idea and extend it logically in a variety of directions to obtain new, and often surprising, results. Since proofs are often just thrown at us without much of an explanation of how the argument is mathematically rigorous, I wish to devote a few posts (not consecutive) to dealing with specific approaches to proofs.
If we had a good mathematics teacher in middle and high school, she/he would have at least told us about a few different approaches to proofs. Very likely, this would have come in the context of geometry, though it is possible that some teachers experimented with introducing their students to proofs in arithmetic, such as that there are infinitely many primes or that every number has a unique prime factorization. While these may not have been done in a very formal manner, given that the students might have lacked the symbolic language for executing a formal proof, I applaud such teachers.
In this post, I wish to address a common approach to mathematical proof known as ‘proof by contradiction’ or more technically reductio ad absurdum, Latin for ‘reducing to the absurd’. I don’t know about you, but to me there’s something more visceral in the statement ‘reducing to the absurd’ than ‘proof by contradiction’. I think it’s time we recovered some of these older, more visceral statements and junked the more cerebral statements. I mean, mathematics is cerebral enough on its own! We don’t need more phrases that are cerebral. We need something that gets us in the guts! So let’s see what absurdities we can avoid.
A Note on Proof
Before we move to that, I wish to say a word about the word ‘proof’. It has a very distinct meaning in mathematical contexts. Unfortunately, the use of the word in everyday speech does two things that render our civil discourse difficult. First, most of us know that ‘proof’ is something that belongs to the rigorous realm of mathematics. Hence, when anyone uses the word ‘proof’, we assume they are speaking of the same kind of thing as mathematicians speak of when they use the word ‘proof’. In other words, we are not discriminating enough to recognize that words are equivocal and have differing meanings in different contexts. Second, we assume that rigorous ‘proof’ is possible in fields outside of mathematics. Since the word ‘proof’ is used, in my view illegitimately, in other fields, and since we have not allowed for the equivocality of words, we conclude that rigorous ‘proof’ is possible in other areas.
Hence, I have often heard claims such as that the theory of evolution has been proved or that the theory of relativity has been proved. Similarly, the legal idea of ‘proof beyond reasonable doubt’ is also not an instance of ‘proof’ per se. All these are examples of abduction, which I dealt with in an earlier post. They are inferences to the best explanation based on the available data. Abduction is a powerful tool that should not be discounted. However, since it is based on limited data, it cannot function as mathematical proof. Rather, logicians use the term ‘abduction’, or ‘inference to the best explanation’, to denote this kind of reasoning.
The conflation of the mathematical term ‘proof’ with other methods of reasoning not only undermines what the explorations in other fields actually entail, but also assumes that these theories rest on foundations as firm as those of mathematical theorems. This results in a failure to understand what is being claimed in other fields when a theory is proposed, thereby actually proving to be a hindrance to inquiry in those fields. The most notable difference is that mathematical theorems are not subject to change based on any further evidence. This is patently untrue of theories in the sciences and other non-mathematical fields, which, being data driven, are subject to revision when new data becomes available.
Mathematical proof, however, leads to a statement – possibly a theorem – that is true for all instances of the item being studied and is not subject to change. If it could change, it is not something that has been proved. For example, the theorem that, in Euclidean geometry, the internal angles of a triangle add up to 180° is not something that is tentatively held. The claim of this statement is that there is no triangle in Euclidean geometry that does not satisfy this property.
With that out of the way, we can turn our attention to the method of reductio ad absurdum. The process of this method of proof is ingenious and I would like to thank the first person who thought of it. It involves making an assumption and following that assumption logically until we reach a point where the assumption is disproved. This means that, by assuming something is true, we are able to prove that its negation is also true. Since this is absurd, the conclusion is that the assumption must be false. Hence, the bottom line for this method of proof is the understanding that a claim and its opposite cannot both be true at the same time. When we assume that statement A is true and conclude that this must mean that its negation, ¬A, is also true, we have reached something that is ‘absurd’, hence the name.
I will consider two examples of reductio ad absurdum, which will enable us to see the brilliance and elegance of this line of reasoning. After that, we will take a step back to identify some important aspects of this method of proof. Following that, I will look at an interesting third example of reductio ad absurdum before drawing this post to a close.
Rationally Irrational
As mentioned in an earlier post, if we have an isosceles right angled triangle with legs of length 1 unit, Pythagoras’ theorem yields the length of the hypotenuse as √2 units, which I claimed is an irrational number. But how do we prove it?
We start by assuming the opposite to be true, namely that √2 is a rational number. Hence, we must be able to find two integers p and q such that p÷q = √2. Using the idea of equivalent fractions, which I dealt with in an earlier post, we make the additional assumption that p and q do not have any common prime divisors. (To illustrate only this lowest-terms requirement: p = 18 and q = 8 would not be allowed since 2 is a common divisor; for the same numerical value we would instead choose p = 9 and q = 4.)
So we proceed to square the equation to get:

p² ÷ q² = 2, which gives p² = 2q²
Now, since p and q are integers, their squares must also be integers, by the closure of the integers under multiplication. Hence, the right side of the last equation must be even since q², which is an integer, is multiplied by 2. This leads us to conclude that the left side must be even as well, since an odd number cannot equal an even number. Now, since the left side is a perfect square, it can be even only if p is even, since the square of an odd number is necessarily odd.
So we assume that p = 2k, where k is an integer. This will ensure that p² is even. This leads to:

(2k)² = 2q², that is, 4k² = 2q², which simplifies to 2k² = q²
Now, the left side of the equation is necessarily even, since k² is multiplied by 2. This must mean that the right side of the equation is also even, which can happen only if q is even.
So what we have concluded is that both p and q are even. This means that they both have 2 as a divisor, which is absurd since we assumed they had no common prime divisors.
Suppose, though, that we tried this method on a number that is not irrational. So suppose we tried this with √4, which we know is equal to 2 and, hence, rational. If we proceed as before, we get:

p² ÷ q² = 4, which gives p² = 4q²
Once again, since the right side is a multiple of 4, it must be a multiple of 2 and, hence, even. This means the left side is even as well. Proceeding as we did earlier, with p = 2k, we get:

(2k)² = 4q², that is, 4k² = 4q², which simplifies to k² = q²
Here we do not have an absurdity because all we can conclude is that k = q, which does not violate anything we initially assumed.
What we can conclude is that this process is foolproof. If the number we are dealing with is irrational, it will yield an absurdity. But if the number is rational, we will not reach an absurdity. In fact, we can use this method to test any number of the form:

√(m÷n)
where both m and n are positive integers. The reader is encouraged to proceed with the proof. I will post one in the comments in a week or two.
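For those who want to experiment numerically before attempting the proof, here is a small Python check. It relies on the fact the proof generalizes to, namely that, in lowest terms, √(m÷n) is rational exactly when both the numerator and the denominator are perfect squares:

```python
from math import gcd, isqrt

def sqrt_is_rational(m, n):
    """Decide whether sqrt(m / n) is rational, for positive integers m and n."""
    g = gcd(m, n)            # reduce the fraction to lowest terms first
    m, n = m // g, n // g
    # rational exactly when both parts are perfect squares
    return isqrt(m) ** 2 == m and isqrt(n) ** 2 == n
```

For instance, sqrt_is_rational(2, 1) is False, while sqrt_is_rational(9, 4) and sqrt_is_rational(18, 8) are both True, since 18÷8 reduces to 9÷4.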
Infinitely Primed
This brings us to the second instance of reductio ad absurdum. Here I present Euclid’s proof that there are infinitely many primes. Note that I did not say, “The number of primes is infinity” because, as mentioned in an earlier post, infinity is not a number!
As with the case of the square root of two, we begin by assuming the opposite. In this case, we assume that the number of primes is finite. Let us say that there are n primes designated as p1, p2, p3, …, pn.
Now consider the number

N = (p1 × p2 × p3 × … × pn) + 1
Now, all natural numbers greater than 1 fall into two categories: they are either prime or composite. N is obviously greater than 1 and, hence, must be either prime or composite. If it is prime, then we have found another prime apart from those among p1, p2, p3, …, pn, which is absurd since we assumed that we had listed all the primes when we considered the n primes in this list.
So, perhaps N is composite. However, consider the product

p1 × p2 × p3 × … × pn
It is clear that all the primes in our original list (i.e., p1, p2, p3, …, pn) are divisors of this product. Since N is one more than this product, dividing N by any of these primes leaves a remainder of 1. And since the smallest prime is 2, a remainder of 1 means that none of the n primes is a divisor of N. However, if N is composite, it must have a divisor between 1 and itself, and this divisor must itself have one or more prime divisors, which also divide N and hence cannot be in our original list. Hence, even if N is composite, we have proved the existence of at least one prime not in our original list, which is absurd since we assumed we had listed all the primes. For example, suppose our full list of primes is 2, 3, 5, 7, 11 and 13. Then N = 30031. But 30031 = 59 × 509, both of which are primes not in the original list.
In both cases, that is, whether N is prime or composite, we have reached an absurdity, which means that our original assumption, namely that there are only finitely many primes, is false.
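The worked example above can be checked by machine. Here is a small Python sketch (my own illustration, not part of Euclid's proof) that builds N from the supposedly complete list 2, 3, 5, 7, 11, 13, confirms that every listed prime leaves a remainder of 1, and factors N to reveal the new primes:

```python
def prime_factors(n):
    """Return the prime factors of n (with multiplicity) by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

primes = [2, 3, 5, 7, 11, 13]   # our supposedly complete list of primes
product = 1
for p in primes:
    product *= p                # 2 * 3 * 5 * 7 * 11 * 13 = 30030
N = product + 1                 # Euclid's number: 30031

print(N)                                 # 30031
print(all(N % p == 1 for p in primes))   # True: each listed prime leaves remainder 1
print(prime_factors(N))                  # [59, 509] -- neither is in our list
```

Running this confirms the claim in the text: no prime in the original list divides N, yet N factors into primes (59 and 509) that the list failed to capture.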
Down to Brass Tacks
Let us now step back to see what features of both proofs allow us to pull off the reductio ad absurdum. In the case of √2, we had two options: either the number was rational or it was irrational. In the case of the primes, N was either prime or composite. We can see that, in both cases, the possibilities describe what mathematicians call mutually exclusive and exhaustive sets. What in the world does this mean?
Mutually exclusive means that there is no overlap between the sets. That is, we cannot find a single element that belongs to both sets. In the case of rational and irrational numbers, this is achieved by the definition of the rational numbers, leading to the conclusion that any number that does not satisfy the definition must be irrational. Hence, through the definitions themselves we ensure that no number can be both rational and irrational. In the case of the prime and composite numbers, once again, mutual exclusivity is achieved through the definition of a prime number. Here it pays to note that the number 1 is considered to be neither prime nor composite. And since N is greater than 1, we know then that 1 is excluded from consideration. Among all the remaining natural numbers, each number either satisfies the definition of being a prime number or it doesn’t, thereby making it a composite number. Hence, once again, through the definitions themselves we ensure that no number can be both prime and composite.
I also mentioned that the sets are exhaustive. This means that there is no number under consideration that does not belong to one of the sets that have been defined. Once again, this is achieved by the definitions themselves. In the case of the rational and irrational numbers, one set is defined as satisfying the definition, leading to the other set automatically including the numbers that do not satisfy the definition. In other words, there can be no real number that does not fall into either category. Similarly, in the case of the primes and composites, the definition of one provides the definition of the other through the negation of the first definition. This means that, barring the exception of 1, there is no natural number that is neither prime nor composite.
So what we achieve by the categorization into rational and irrational or prime and composite is that every number under consideration, real numbers in the first case and natural numbers greater than 1 in the second, belongs to one and only one of these categories. In other words, for any number under consideration, there is no ambiguity about the set to which it belongs and there can be no other heretofore undefined set to which it belongs.
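To make this concrete, here is a minimal Python sketch (my own illustration, not from the proofs above) of a mutually exclusive and exhaustive classification: every natural number greater than 1 receives exactly one of the two labels, with no overlap and no number left out.

```python
def classify(n):
    """Label a natural number n > 1 as 'prime' or 'composite'.
    A number that fails the definition of prime is, by definition, composite."""
    if n < 2:
        raise ValueError("only natural numbers greater than 1 are considered")
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return "composite"   # fails the definition of prime
    return "prime"               # satisfies the definition

# Exhaustive: every n in the range gets a label.
# Mutually exclusive: each n gets exactly one label.
labels = [classify(n) for n in range(2, 20)]
print(labels.count("prime"), labels.count("composite"))  # 8 10
```

Note how the exclusivity and exhaustiveness fall out of the structure of the function itself: the `if`/`return` logic can emit one label or the other, never both and never neither, which mirrors the role the definitions play in the proofs.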
Actual vs. Potential Infinities
The idea of defining mutually exclusive and exhaustive sets is so groundbreaking that it is used beyond reductio ad absurdum, and I will deal with these uses in a later post. However, here I wish to address one more example of reductio ad absurdum. Mathematicians differentiate between what they call ‘actual infinities’ and ‘potential infinities’. An ‘actual infinity’ refers to an infinite collection regarded as a completed whole, with all of its infinitely many elements actually listed. A ‘potential infinity’, by contrast, refers to a process of building the set that can be continued indefinitely, so that the infinitely many elements are only potentially listed.
If the set of natural numbers is an actual infinity, then consider the following mapping from the set of natural numbers to itself.
We can readily recognize that this maps each natural number to its square. Quite obviously, the top row can be incremented by 1 indefinitely, yielding infinitely many numbers in the top row. Since the set of natural numbers is closed under multiplication, each element in the top row has a corresponding square in the bottom row.
However, it is clear that the bottom row does not have quite a few numbers that actually belong to the set of natural numbers. For example 2, 3, 5, 6, 7, 8, 10, 11, … are natural numbers that are not in the bottom row.
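The two rows can be sketched in a few lines of Python (my own illustration of the mapping described above): every number in the top row is paired with its square in the bottom row, yet most naturals never appear among the squares.

```python
top = list(range(1, 11))          # 1, 2, 3, ..., 10
bottom = [n * n for n in top]     # 1, 4, 9, ..., 100

# Each top entry is paired with exactly one bottom entry...
pairs = list(zip(top, bottom))    # (1, 1), (2, 4), (3, 9), ...

# ...yet most naturals up to 100 are missing from the bottom row.
missing = [m for m in range(1, 101) if m not in bottom]
print(missing[:8])                # [2, 3, 5, 6, 7, 8, 10, 11]
print(len(bottom), len(missing))  # 10 90
```

Extending `top` further only makes the gap more striking: the farther out we go, the sparser the squares become among the naturals.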
Now suppose the set of natural numbers represents an ‘actual infinity’. This would mean that it is possible to list all the natural numbers. Now consider the fact that every number in the top row has exactly one corresponding number in the bottom row. Hence, the two sets represented by the two rows must have the same size. However, the numbers 2, 3, 5, 6, 7, 8, 10, 11, … do not appear in the bottom row. Hence, the bottom row is missing some numbers that appear in the top row. This means that the set in the bottom row is smaller than the set in the top row.
So we have proved that the two sets have the same size and that they have different sizes, which is absurd. Hence, our assumption that the set of natural numbers is an ‘actual infinity’ must be a false assumption and should be rejected.
Applicability of Reductio ad Absurdum
We can see that reductio ad absurdum can be used in a variety of contexts. It is one of the more powerful methods of proof that mathematicians use, precisely because the logic behind it is simple and elegant: if an assumption leads to an absurdity, then the assumption must be false. From the three examples I have considered, we can also see that it applies in highly symbolic contexts (e.g., the irrationality of √2) as well as in contexts where symbols are not even needed (e.g., the ‘actual or potential infinity’ of the natural numbers). Because of this, reductio ad absurdum can also be used in non-mathematical contexts, as long as the logic is followed rigorously.
Suppose, for example, someone, trying to undermine some view that I hold, says, “All opinions are equally valid.” In response, I could propose the opinion, “The opinion that ‘all opinions are equally valid’ is invalid.” Since this is itself an opinion, and the original claim was that all opinions are equally valid, my opinion must be equally valid, rendering the first invalid!
The preceding paragraph actually highlights some serious flaws in argumentation that one encounters today. Many people make all sorts of claims that, if put to the test of reductio ad absurdum, would quickly fall to pieces. This applies to claims about climate change, politics, psychology, and religion, to say nothing of gender and sexuality, where, of late, some quite laughable assertions have been made that do not withstand logical scrutiny. I plan to explore one such claim in the next post. Till then, allow the absurdities to come to the rescue!