• The Siege of Yodfat

    Site of ancient Yodfat. (Source: Wikipedia)

    In AD 67, in the middle of the First Jewish-Roman War, a ragtag band of forty Jewish revolutionaries managed to get themselves besieged in Yodfat. Realizing they were going to be captured and perhaps tortured, the group decided to commit mass suicide. However, given the Jewish aversion to suicide, none of them wanted to kill himself. Besides, some of them argued, if each person were given the responsibility of killing himself, some squeamish rebels might simply not carry through with the decision, so that some would lose their lives while others saved their own.

    To prevent such a betrayal of their collective decision to die rather than be captured, they devised a plan: they cast lots, and the lots determined the order in which they would arrange themselves in a circle. The first person would then drive his spear through the second, killing him. Next, the third person would drive his spear through the fourth, and so on around the circle. This would proceed until only one person was left, and only that last person would have to die by his own hand.

    As it so happened, when thirty-eight of them had been killed, the last two decided that they would stop the killing and surrender to the Romans! One of the survivors was the Jewish historian Josephus, who narrates this story in The Wars of the Jews (Book 3, chapter 8, paragraph 7).

    The Josephus Problem

    This problem has come to be known as the ‘Josephus Problem’ because Josephus himself reports that he did not want to die but actually wanted to surrender to the Romans.

    Memorial to the Jewish defenders of Yodfat, which fell to Roman forces on July 20, 67 CE. (Source: Wikipedia)

    Hence, the problem can be framed as follows: Given forty people arranged in a circle, with each alternate person being killed by the person before him, what is the position you should be in to survive? We can generalize this as follows: Given n people arranged in a circle, with each alternate person being killed by the person before him, what is the position you should be in to survive? This involves skipping 1 person at each stage. However, the problem can be further generalized as follows: Given n people arranged in a circle, with k people being skipped and the person in position k+1 being killed by the person before him, what is the position you should be in to survive? Is there a solution to this general problem and, if so, how would we find it?

    The Survivor Position

    We start small! Suppose we have 3 people: A, B, and C. A kills B and C kills A. Hence, the survivor is in position 3. Suppose we have 4 people: A, B, C, and D. A kills B, C kills D, and A kills C. Hence, the survivor is in position 1. If we continue in this way we will obtain the following table:

    n (people):    2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    s (survivor):  1  3  1  3  5  7  1  3  5  7  9 11 13 15  1

    We should be able to observe some patterns here. No one in an even position is ever the survivor. This is because, when we skip just one person each time, all the even positions are killed in the first round itself. Next, whenever the number of people equals a power of 2 (i.e. 2, 4, 8, 16, etc.), the survivor is in position 1. Between two consecutive powers of 2 (i.e. 2 and 4, or 4 and 8 or 8 and 16, etc.), the survivor position proceeds according to successive odd numbers.

    Hence, we can obtain the following algorithm:

    1. Given the number of people (n), determine the largest power of 2, say 2^m, that is less than or equal to n.
    2. Calculate p = n – 2^m + 1.
    3. The survivor will be in position s = 2p – 1.

    Let’s test this algorithm. Say n = 13. Now 2^3 = 8 < 13 < 16 = 2^4. Hence, m = 3. This gives us p = 13 – 2^3 + 1 = 6. This gives s = 2(6) – 1 = 11, which is the survivor position according to the table.

    Say n = 16. Now 2^4 = 16. Hence, m = 4. This gives us p = 16 – 2^4 + 1 = 1. This gives s = 2(1) – 1 = 1, which is the survivor position according to the table.

    For the situation in which Josephus found himself, n = 40. Now 2^5 = 32 < 40 < 64 = 2^6. Hence, m = 5. This gives us p = 40 – 2^5 + 1 = 9. This gives s = 2(9) – 1 = 17. Hence, Josephus would have been safe if he had drawn the lot for the 17th position.
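    As a sketch, the algorithm can be written in a few lines of Python and cross-checked against a direct simulation of the circle (the function names are my own):

```python
from collections import deque

def survivor_formula(n):
    """Survivor position: s = 2p - 1, where p = n - 2^m + 1 and
    2^m is the largest power of 2 not exceeding n."""
    m = n.bit_length() - 1       # 2^m <= n < 2^(m+1)
    p = n - 2 ** m + 1
    return 2 * p - 1

def survivor_simulation(n):
    """Direct simulation: each person in turn kills the next one."""
    circle = deque(range(1, n + 1))
    while len(circle) > 1:
        killer = circle.popleft()   # this person acts...
        circle.popleft()            # ...and the next person dies
        circle.append(killer)       # the killer rejoins the circle
    return circle[0]
```

    For n = 40 both functions give 17, the position identified above.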

    The Second Last Survivor Position

    However, remember that Josephus was one of two who remained and who decided to renege on their commitment to their fallen comrades. So he need not have been the last man standing. He could have been the second last man, the one whose fate would have been to be killed by the man in the 17th position. What position would this second last man have held? We can generate a similar table as before to obtain:

    n (people):       2  3  4  5  6  7  8  9 10 11 12 13
    t (second last):  2  1  3  5  1  3  5  7  9 11  1  3

    The entry for n = 2 is anomalous: it is the one strange case in which a person occupying an even numbered position survives, and only because we are starting with just 2 people. If we observe the second row, we see once again that, apart from this anomalous ‘2’, there are only odd numbers. Further, the position resets to 1 whenever we reach 3, 6, 12, 24, etc. This is a geometric sequence whose general term can be expressed as

    u_k = 6 × 2^(k–1)
    Between successive terms of this series, the third column merely goes through all the odd numbers. So we can obtain the following algorithm:

    1. Given the number of people (n), determine the largest term of the series u_k that is less than or equal to n.
    2. Calculate q = n – u_k + 1.
    3. The second last survivor will be in position t = 2q – 1.

    Let’s test this algorithm. Say n = 13. Now 6×2^(2–1) = 12 < 13 < 24 = 6×2^(3–1). Hence, k = 2. This gives us q = 13 – 6×2^(2–1) + 1 = 2. This gives t = 2(2) – 1 = 3, which is the second last survivor position according to the table.

    Say n = 24. Now 6×2^(3–1) = 24. Hence, k = 3. This gives us q = 24 – 6×2^(3–1) + 1 = 1. This gives t = 2(1) – 1 = 1, which is the second last survivor position according to the table.

    For the situation in which Josephus found himself, n = 40. Now 6×2^(3–1) = 24 < 40 < 48 = 6×2^(4–1). Hence, k = 3. This gives us q = 40 – 6×2^(3–1) + 1 = 17. This gives t = 2(17) – 1 = 33. Hence, Josephus would have been the second last survivor if he had drawn the lot for the 33rd position.
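    This too can be sketched in Python and cross-checked by simulation. In the formula below I extend the geometric series one term down, to 3, so that it also covers n = 3, 4, and 5 (a convention of mine, not a claim about the original table):

```python
from collections import deque

def second_last_formula(n):
    """Position of the final victim (n >= 3): t = 2q - 1 with
    q = n - u + 1, where u is the largest term of 3, 6, 12, 24, ...
    that does not exceed n."""
    u = 3
    while u * 2 <= n:
        u *= 2
    return 2 * (n - u + 1) - 1

def second_last_simulation(n):
    """Direct simulation, remembering the most recent victim."""
    circle = deque(range(1, n + 1))
    victim = None
    while len(circle) > 1:
        killer = circle.popleft()
        victim = circle.popleft()
        circle.append(killer)
    return victim
```

    For n = 40 both functions give 33.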

    Throwing Down the Gauntlet

    So we have now obtained positions for the last and second last survivors. However, what we have done is look at the tables and use them to obtain the ‘algorithms’ for the positions. We have not used any rigorous mathematics. What we have done is play around with the situations and keep an eye out for patterns. While there are certainly ways of deriving these ‘algorithms’, we can see that we do not need any rigorous mathematics to pull off something quite remarkable. For example, suppose we have n = 1000. Imagine a circle of one thousand people. What would we obtain?

    Using the first algorithm we would get: 2^9 = 512 < 1000 < 1024 = 2^10. Hence, m = 9. This gives us p = 1000 – 2^9 + 1 = 489. This gives s = 2(489) – 1 = 977. Using the second algorithm we would get: 6×2^(8–1) = 768 < 1000 < 1536 = 6×2^(9–1). Hence, k = 8. This gives us q = 1000 – 6×2^(8–1) + 1 = 233. This gives t = 2(233) – 1 = 465. Hence, the last survivor is the person in position 977 and the second last survivor the one in position 465.

    Now that you know how to set up the problem and draw up the tables, can you obtain the ‘algorithms’ if we skip not one but two people each time? What kind of patterns would we see?

  • Confession

    Playing with numbers is one of my favorite ways to pass time. Of course, we can get to numbers per se in a variety of ways. One can be from simply doodling on a page. I remember drawing grids of dots on a page and playing all kinds of games with my classmates when I was in school. I’d like to say that this was only during breaks or free periods, but I know I won’t be able to pull the wool over your eyes, at least in this regard. Those of you who know me personally also know the glint that appears in my eyes when I am up to no good. Oh well!

    As the well known saying goes, “An idle mind is the devil’s workshop.” And I often found myself bored stiff in many of my classes. In some of them I did take a perverse pleasure in disrupting the class. This was in classes where I had some kind of antagonism toward the teacher. But in other classes, where I found the teacher likable, I would engage in my random play, one kind of which was drawing grids and either contriving new games to play on the grid or discovering intriguing patterns.

    Anyway, during my dot grid days I would attempt to come up with different types of grids. And of course, as mentioned earlier, I had to count the number of dots and recognize patterns. It was only much later that I realized that my (successful) efforts at distracting myself in some classes were actually leading to a common recreational mathematics idea – the centered square numbers. Let me explain.

    Introduction to Centered Square Numbers

    The centered square sequence starts with a single dot. Each successive centered square is then obtained by surrounding the existing centered square with another layer of dots to complete a new square. The first four members of this sequence are shown in the figure below.

    Allow me to explain the notation. In the term C(4,3), the ‘4’ refers to the number of sides of the regular polygon we are dealing with. In this case, since we are considering centered squares, the first number is a ‘4’, indicating a quadrilateral. The ‘3’ refers to the third member of the sequence. The C indicates that we are counting dots.

    In the figure, apart from the first member, each figure has both red dots and gray dots. If you focus on either the red or the gray dots, you will notice that the dots of that color form a square. So in the third member, the red dots form a square of side 2 dots while the gray dots form a square of side 3 dots. Similarly, in the fourth member, the gray dots form a square of side 3 dots and the red dots form a square of side 4 dots. This is what is indicated on the right side of each equation. For the third member, 4 is the number of dots in a square of side 2 while 9 is the number of dots in a square of side 3. Similarly, for the fourth member, the square of side 3 has 9 dots while the square of side 4 has 16 dots.

    We can now add the numbers on the right side of the equations to get

    C(4,3) = 4 + 9 = 13 and C(4,4) = 9 + 16 = 25

    From this it is clear that the number of dots in the nth member is equal to the sum of the squares of n and n – 1. Hence, we can write

    C(4,n) = n^2 + (n – 1)^2
    Fruits of Algebraic Massaging

    With a little bit of algebraic ‘massaging’ we can show that

    C(4,n) = n^2 + (n – 1)^2 = 2n^2 – 2n + 1 = ((2n – 1)^2 + 1) / 2
    What this tells us is that the nth centered square number is equal to half the sum of the square of the nth odd number and 1. This can be illustrated as in the diagrams below:

    Here the dots in gray indicate the count for the centered square while the total number of dots is the square of the nth odd number.

    We can also ‘massage’ the expression for C(4,n) as follows:

    C(4,n) = 2n^2 – 2n + 1 = 4 × (n(n – 1)/2) + 1

    Now, if we consider the sum of the first n – 1 natural numbers we can see that

    1 + 2 + … + (n – 1) = n(n – 1)/2, and hence C(4,n) = 4(1 + 2 + … + (n – 1)) + 1
    This means that the nth centered square number is one more than four times the sum of the first n – 1 natural numbers. We can depict this as follows:

    Here, the central black dot represents the 1 that is added at the end. The remaining dots are divided into 4 triangles, 2 in gray and 2 in red. Each of these triangles represents the sum of the first n – 1 natural numbers.

    Further, since

    C(4,n) = 2n(n – 1) + 1

    we can conclude that each centered square number is necessarily odd. We could also see this intuitively from the fact that a centered square number is the sum of two consecutive squares. Of n and n – 1, one must be odd and the other even; the square of the odd one is odd, the square of the even one is even, and an odd number plus an even number is odd.

    Of course, given the mth odd number, 2m – 1, we can obtain

    (2m – 1)^2 = 4m^2 – 4m + 1 = 4m(m – 1) + 1
    This means that the square of an odd number is always 1 more than a multiple of 4. This leads easily to the conclusion that all centered square numbers are 1 more than a multiple of 4. This was depicted in the previous diagram since there are 4 identical triangles and the single black dot.
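    All of these identities are easy to check numerically; a small sketch (the function name is mine):

```python
def centered_square(n):
    """nth centered square number: the sum of two consecutive squares."""
    return n * n + (n - 1) * (n - 1)

for n in range(1, 100):
    c = centered_square(n)
    odd = 2 * n - 1                         # the nth odd number
    assert c == (odd * odd + 1) // 2        # half of (odd square plus 1)
    assert c == 4 * (n * (n - 1) // 2) + 1  # four triangles plus the center dot
    assert c % 2 == 1 and c % 4 == 1        # odd, and 1 more than a multiple of 4
```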

    Absolution

    As I played around with the grid of dots in my early days, I did recognize some of these patterns. Not all, of course. But it was quite intriguing that the simple endeavor of making a grid of dots could yield so many interesting properties. This began out of boredom in the classroom. I am confident that countless discoveries and inventions we now know about and enjoy had their beginnings in the boredom of a child in a class he/she was forced to attend. If this is the case, I think I can be absolved for my boredom-fueled explorations!

  • Inhale

    Ok. It’s time for a little breather. On 17 March 2024 we began a series of posts on e. Then we moved to another series on principles of counting. Then last week we completed a series on π. All these were quite heavy and I’m sure some of you are still reeling from the shock caused by the posts on π. So for a few weeks I think we should lighten up a bit. I will dedicate the next few posts to recreational mathematics.

    For as long as I can remember, I have enjoyed playing with numbers. I enjoyed performing all sorts of operations on them in the attempt to recognize patterns. At an early age, and I believe independent of any external influence, I noticed that when you multiply single digit numbers by nine, the digit in the units place keeps reducing by 1 while the digit in the tens place increases by 1.

    Playing with numbers is exhilarating. The joy that one can get from a few simple operations cannot be expressed. In this post, I wish to give you one way I have occupied myself for endless hours and one easy but mathematically based trick. The challenge is to use four copies of the same number (from 1 to 9) and any of the mathematical operations (+, −, ×, ÷, a^b, and √a), along with as many sets of parentheses as needed, to make as many numbers as possible.

    Massaging the Numbers

    For example, we can choose to use four ‘2’s. In that case, we can have:

    (2 ÷ 2) ÷ (2 ÷ 2) = 1

    (2 ÷ 2) + (2 ÷ 2) = 2

    (2 + 2 + 2) ÷ 2 = 3

    (2 + 2) ÷ 2 × 2 = 4

    2 × 2 + 2 ÷ 2 = 5

    2 × 2 × 2 – 2 = 6

    2^(2 + 2 ÷ 2) = 8

    (2 + 2 ÷ 2)^2 = 9

    2 × 2 × 2 + 2 = 10

    You get the idea. Is there a way of getting 7 as the answer? What about 11? What’s the largest number you can get using four ‘2’s?
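    One way to explore such questions is a brute-force search. The sketch below is my own: it builds up every value obtainable from k copies of a digit using the binary operations, with square roots omitted (they can be applied indefinitely) and huge intermediates discarded to keep the search finite, so it only suggests which targets are reachable under those restrictions:

```python
from itertools import product

LIMIT = 1e6  # discard huge intermediate values to keep the search finite

def reachable(digit=2, copies=4):
    """vals[k] = all real values obtainable from k copies of `digit`
    using +, -, x, / and powers."""
    vals = {1: {float(digit)}}
    for k in range(2, copies + 1):
        found = set()
        for i in range(1, k):
            for a, b in product(vals[i], vals[k - i]):
                candidates = [a + b, a - b, a * b]
                if b != 0:
                    candidates.append(a / b)
                try:
                    p = a ** b
                    if isinstance(p, float):  # skip complex results
                        candidates.append(p)
                except (OverflowError, ZeroDivisionError, ValueError):
                    pass
                found.update(c for c in candidates if abs(c) <= LIMIT)
        vals[k] = found
    return vals[copies]

def reachable_integers(digit=2, copies=4, upto=12):
    """Which whole numbers up to `upto` can be made?"""
    vals = reachable(digit, copies)
    return sorted(n for n in range(upto + 1)
                  if any(abs(v - n) < 1e-9 for v in vals))
```

    Inspecting reachable_integers(2, 4) shows every target listed above; whether 7 appears under these restricted operations is left for you to check.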

    We could repeat the same with another number. Try it on your own. Believe me, it keeps your mind engaged and, when you make a breakthrough, there is a serious dopamine rush.

    A Constant Result

    Now concerning the trick I mentioned earlier. Ask someone to choose a three digit number whose hundreds digit and units digit are not the same. Now ask them to reverse the order of the digits. So if they choose 356, after reversing, they get 653.

    Now ask them to subtract the smaller from the larger. With the above numbers, 653 – 356 = 297. (If they get a two digit number, ask them to place a 0 in the hundreds place. So if they had chosen 433, they would have 433 – 334 = 099.) Ask them to reverse the digits again and add the two numbers. So with the original number we chose, we would get 297 + 792 = 1089. (If they had 099, they would now get 099 + 990 = 1089.) But tell them not to give you any of these results.

    Now, the answer is always 1089. So you could keep a book with around 200 pages and memorize the 9th line on page 108. So when they have finished adding, tell them to turn to the page indicated by the first three digits (i.e. 108) and check the line indicated by the units digit (i.e. 9). Now you read out the line you have memorized and bamboozle everyone. Of course, since the answer is always 1089, you cannot do this trick with the same person multiple times.

    But why does it work? Say we have a in the hundreds place, b in the tens place, and c in the units place. For now let’s assume that a > c. Then the value of the original number is 100a + 10b + c

    When we reverse the order, we get a number whose value is 100c + 10b + a. When you subtract this from the original number you will get 100(a – c) + (c – a).

    Now since a > c, it follows that c – a is negative. Hence, the subtraction requires borrowing. However, in the form 100(a – c) + (c – a), there is nothing that represents the tens place. So we borrow 1 from the tens place for the units, which in turn means borrowing 1 from the hundreds place for the tens, giving

    100(a – c) + (c – a) = 100(a – c – 1) + 10 × 9 + (10 + c – a)

    When we reverse this number we get 100(10 + c – a) + 10 × 9 + (a – c – 1)

    Now when we add the last two numbers we get

    100(a – c – 1 + 10 + c – a) + 10 × 9 + 10 × 9 + (10 + c – a + a – c – 1)

    which on simplification gives 900 + 180 + 9 = 1089
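    A short script can confirm that the trick never fails (a sketch; the names are mine):

```python
def reverse_3(n):
    """Reverse the digits, padding to three digits (99 -> 099 -> 990)."""
    return int(str(n).zfill(3)[::-1])

def trick(n):
    diff = abs(n - reverse_3(n))   # subtract the smaller from the larger
    return diff + reverse_3(diff)  # reverse again and add

# Every three-digit number whose first and last digits differ yields 1089.
results = {trick(n) for n in range(100, 1000) if str(n)[0] != str(n)[2]}
```

    The set `results` contains only the single value 1089.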

    Exhale

    As we have seen, numbers provide us with a lot of entertainment. We will continue to look at this lighter side of mathematics in the few posts that follow. Don’t forget to try your hand with four ‘3’s or four ‘4’s. And astound someone who does not read this blog with the trick. And tell them to come over here and read the other posts. I’ll see you next week.

  • Recapitulation

    We are in the middle of a series of posts on π. Our journey began with A Piece of the Pi, which was followed by Off On a Tangent. Last week, I posted The Rewards of Repetition, in which I promised that we would break the Gauss-Legendre algorithm down into its distinct parts. Of course, as mentioned in the previous post, the algorithm uses mathematics that is way above the level of this blog. I will not be explaining any of that. I will only explain those parts that are at the level of this blog. So let’s proceed.

    The Gauss-Legendre Iteration

    First, let us recall the algorithm. The Gauss-Legendre algorithm begins with the initial values:

    a_0 = 1,  b_0 = 1/√2,  t_0 = 1/4,  p_0 = 1

    The iterative steps are defined as follows:

    a_(n+1) = (a_n + b_n)/2
    b_(n+1) = √(a_n b_n)
    t_(n+1) = t_n – p_n (a_n – a_(n+1))^2
    p_(n+1) = 2 p_n

    Then the approximation for π is defined as

    π ≈ (a_(n+1) + b_(n+1))^2 / (4 t_(n+1))
    The question, of course, is why this process works at all. Surely this can’t be some random method that Gauss and Legendre independently stumbled upon! So what is the rationale behind this strange algorithm? After all, if we are honest, there is nothing in the steps that leads self-evidently to the conclusion that the iterations converge to π. So let us consider the steps involved and why these initial values and this iterative method lead to an approximation of π.

    Comparison of Means

    The method begins with the realization that, given two distinct positive numbers a and b, their arithmetic mean (AM) is always greater than their geometric mean (GM). Of course, the AM and GM are defined as

    AM = (a + b)/2,  GM = √(ab)

    So we can proceed as follows since a and b are distinct:

    (√a – √b)^2 > 0
    a – 2√(ab) + b > 0
    (a + b)/2 > √(ab)
    AM > GM
    Convergence of an and bn

    Now the iterative formulas

    a_(n+1) = (a_n + b_n)/2 and b_(n+1) = √(a_n b_n)

    assure us that

    b_(n+1) = √(a_n b_n) > √(b_n b_n) = b_n

    and

    a_(n+1) = (a_n + b_n)/2 < (a_n + a_n)/2 = a_n
    This means that the sequence b_n is steadily (monotonically) increasing while the sequence a_n is steadily (monotonically) decreasing. Moreover, each b_n is bounded above by a_0 and each a_n is bounded below by b_0, so the two sequences will necessarily converge.

    Now if we define

    c_n = a_n – b_n

    we can obtain

    c_(n+1) = a_(n+1) – b_(n+1) = (a_n + b_n)/2 – √(a_n b_n) = (√a_n – √b_n)^2 / 2

    which further yields

    c_(n+1) = (√a_n – √b_n)^2 / 2 < (√a_n – √b_n)(√a_n + √b_n) / 2 = (a_n – b_n)/2 = c_n / 2
    This means that the sequence c_n also converges. And since c_n cannot be negative, it must converge to 0. This means that a_n and b_n converge to the same value. Since this common value is obtained through convergent sequences of arithmetic and geometric means, it is called the arithmetic-geometric mean (AGM). Let us designate it with m.

    The Magic of Calculus

    The next stage involves some calculus. Since I haven’t introduced calculus in the blog yet, let me just present the results. Those who wish to read in detail can look at the proofs here. Anyway, the method begins by defining

    I(a, b) = ∫_0^(π/2) dθ / √(a^2 cos^2 θ + b^2 sin^2 θ)

    Then with some nifty substitutions and change of limits, we are able to prove

    I((a + b)/2, √(ab)) = I(a, b)

    The observant reader will now realize why we had done all the stuff related to AM and GM and the convergence of the same. Now it is a trivial step to conclude that

    I(a_0, b_0) = I(a_1, b_1) = I(a_2, b_2) = … = I(a_n, b_n)

    From this we can conclude that

    I(a_0, b_0) = I(m, m) = π / (2m)

    where m, as defined earlier, is the AGM of a_0 and b_0. What this means is that

    π = 2m × I(a_0, b_0)

    and

    m = π / (2 I(a_0, b_0))
    Now using the Gamma and Beta functions it is possible to prove

    What this means is that if we start with the initial values as stated earlier, each iteration of

    π ≈ (a_(n+1) + b_(n+1))^2 / (4 t_(n+1))
    gets us closer to the value of π.

    The iterative process of the Gauss-Legendre method converges so quickly that, after only 25 iterations, the algorithm generates 45 million digits of π.
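    The whole algorithm fits in a few lines. Here is a double-precision sketch (real record-setting computations use arbitrary-precision arithmetic, which ordinary floats cannot provide):

```python
import math

def gauss_legendre_pi(iterations=4):
    """Approximate pi with the Gauss-Legendre iteration. The number of
    correct digits roughly doubles each pass, so a handful of
    iterations exhausts double precision."""
    a, b, t, p = 1.0, 1.0 / math.sqrt(2.0), 0.25, 1.0
    for _ in range(iterations):
        a_next = (a + b) / 2          # arithmetic mean
        b = math.sqrt(a * b)          # geometric mean
        t -= p * (a - a_next) ** 2    # correction term
        p *= 2
        a = a_next
    return (a + b) ** 2 / (4 * t)
```

    With just four iterations the result already agrees with math.pi to machine precision.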

    Conclusion

    I apologize to the reader for not including all the calculus behind the algorithm. I have done that in the interests of not making this post too heavy. I know that it is still quite heavy. But that is the nature of the beast! There are no rapidly converging methods for approximating the value of π that do not involve a lot of calculus. Indeed, some algorithms derived using calculus are really quite strange – so strange that even the Gauss-Legendre algorithm would seem quite sensible! We will tackle some of them in the next post, after which I will lay this π to rest.

  • The past week got quite busy all of a sudden and the week ahead also looks like it will be the same. I know I’m in the middle of a series of posts on π. I will continue with the series in a couple of weeks. For now, I leave you with some collections of earlier posts.

  • The Growing Frustration

    It’s frustrating. Yes, you heard me right. It’s frustrating. You may be wondering why I am saying this and in what context. So here goes. Just the other day, I was looking up some perspectives on a mathematics discussion forum. Quite naturally, there were many mathematics teachers in the forum. And quite naturally there was some discussion about questions asked by students.

    As a mathematics teacher, I find these questions quite illuminating. In fact, this is one of the reasons I love being a teacher. You never know what would spark the interest of some student enough to formulate a question. So even though what I teach might be much the same from year to year, because who I teach changes, it is never boring. Indeed, even when you teach the same group of students, as I did from 2016 to 2018 and then 2020 to 2023, you can see their questions change as they themselves mature and change.

    But it’s time to stop this nostalgic reverie and return to my frustration!

    Case I: Equivalent Fractions

    One question on the forum was asked by the teacher of a student in elementary school. I presume this is between Grade 1 and Grade 5. The student was learning fractions. When they came to the concept of equivalent fractions, the student asked the teacher what was different between the fractions 6/8 and 3/4. The teacher asked the question on the forum, requesting advice on how to respond to the student.

    The poor teacher! The backlash she faced was unconscionable. So many of her fellow mathematics teachers pounced on her to tell her not just that both fractions are the same but that any mathematics teacher worth his/her salt should know this. Some questioned her competence to teach the subject. I commented, taking her side, and posted a link to a previous post in which I demonstrate that equivalent fractions might have the same value but do not contain the same information.

    I faced the same issue in a recent class. A student asked me what the difference between various equivalent fractions was. That this student is currently in Grade 12 and is a reasonably astute student is an indictment against all his mathematics teachers.

    Anyway, the question arose when, in answer to a problem, the student gave me the following numbers:

    The student had calculated accurately and the numerical value of each of these fractions was spot on. However, I told him that, by reducing the fractions to their lowest forms, he had lost something crucial. It was then that he asked me the question about equivalent fractions.

    So I gave him another set of fractions:

    I asked him what links this set of fractions. He struggled, unsuccessfully, for a few minutes. Then I told him that I had generated this set of fractions in a very similar way to the one with which he had generated the first set. I gave him a few more minutes to try his luck, to no avail.

    What was it that gave rise to these two sets of numbers? There is clearly no discernible pattern in either of the sets and they may seem to be just some random fractions generated by an addled mind. However, what if I told you that the first set, before reducing to equivalent fractions, was

    5/3!, 10/4!, 15/5!, 20/6!, …

    Right away, you will recognize that there is a pattern. The numerator is composed of multiples of 5, while the denominator is composed of factorials, starting with the factorial of 3. If I asked you to give me a formula for generating the terms, you will quickly be able to tell me:

    a_n = 5n / (n + 2)!
    Similarly, if, for the second set of numbers, I gave you

    you will equally quickly be able to tell me

    You see, while the numbers in the first set and third set have the same corresponding values (and similarly with the second and fourth sets), something is lost by reducing the fractions to their lowest terms. And with this loss, we are rendered incapable of generating further terms in the sequence because the process of reducing the fractions destroys the pattern.
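    To illustrate with a made-up sequence of the same flavor (successive multiples of 5 over factorials starting from 3!; the specific fractions here are my own choice for illustration), Python's fractions module shows how reduction hides the structure:

```python
from fractions import Fraction
from math import factorial

# Hypothetical unreduced terms: multiples of 5 over factorials from 3! on.
unreduced = [(5 * n, factorial(n + 2)) for n in range(1, 5)]
# -> [(5, 6), (10, 24), (15, 120), (20, 720)]: the pattern is plain.

reduced = [Fraction(num, den) for num, den in unreduced]
# -> [5/6, 5/12, 1/8, 1/36]: after reduction the structure is invisible.
```

    The values are unchanged, but only the unreduced forms let us write down the next term.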

    Case II: Quadratic Expressions

    This shows up in other places too. For example, one of the students I am currently tutoring is studying quadratics at school. Now this student is exceptionally bright. When I was concluding a class, I asked her to think about the difference between

    and

    For those unfamiliar with the above, let me give an example:

    3x^2 – 6x – 10 = 3(x – 1)^2 – 13

    The above is an example of the fact that any quadratic expression of the form

    ax^2 + bx + c

    can be expressed in the form

    a(x – h)^2 + k
    But what was the point of the re-expression? Why should we bother to do it? In the next class I asked my student if she had given this some thought. She said she had but that she could not figure out in what way

    3x^2 – 6x – 10

    is different from

    3(x – 1)^2 – 13

    Since we had just started learning about quadratics, I did not fault her for not being able to answer the question. After all, that’s what learning is. But she told me that she had asked her teacher about the difference. And her teacher had told her that there was no difference because

    3x^2 – 6x – 10 = 3(x – 1)^2 – 13
    I’m glad the class was online. Otherwise, my student would have felt my fury palpably. It is likely that she did sense it even over our Zoom call! The statement of the teacher is absolute rot and betrays a failure to look below the surface. For, while both expressions are indeed equivalent, the expression on the right tells me that this quadratic expression, if plotted as a function of x, has a vertex at the point (1, -13). You may wonder, “What’s the big deal? So what if the expression on the right gives me the coordinates of the vertex? I could do a few steps of algebra on the left and get them.” Absolutely true.

    However, if you were given the functions

    f(x) = 3x^2 – 6x – 10,  g(x) = 5x^2 – 10x – 8,  h(x) = –6x^2 + 12x – 19

    you would not know that they all have the same vertex. However, if they were written as

    f(x) = 3(x – 1)^2 – 13,  g(x) = 5(x – 1)^2 – 13,  h(x) = –6(x – 1)^2 – 13
    the fact that the vertices are all at (1, -13) is indisputable. But more than that, the second set of expressions also tells us, from the signs of the coefficient pairs 3 and -13, 5 and -13, and -6 and -13, that f(x) = 0 and g(x) = 0 will have real roots, while h(x) = 0 will have only complex roots.
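    Completing the square can be sketched mechanically. The quadratics below are reconstructions for illustration, chosen to have leading coefficients 3, 5, and -6 and the shared vertex (1, -13):

```python
def vertex_form(a, b, c):
    """Rewrite a*x^2 + b*x + c as a*(x - h)^2 + k; return (a, h, k)."""
    h = -b / (2 * a)
    k = c - b * b / (4 * a)
    return a, h, k

# Three illustrative quadratics sharing the vertex (1, -13):
for coeffs in [(3, -6, -10), (5, -10, -8), (-6, 12, -19)]:
    a, h, k = vertex_form(*coeffs)
    # a*(x - h)^2 + k = 0 has real roots exactly when k/a <= 0
    print(coeffs, "-> vertex", (h, k), "real roots:", k / a <= 0)
```

    The expanded coefficients look unrelated; the vertex form makes the shared geometry, and the nature of the roots, immediate.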

    Case III: Theory of Equations

    The same kind of issue shows up when we attempt to solve equations in general. Suppose, for example, you are asked to solve the equation

    How would you proceed? Of course, you can use some general results from the theory of equations and conclude

    But after this, what? Every student who has fumbled with these results will know that, if you try to solve these equations by some standard methods like elimination or substitution, you will revert to the original equation. So while the above four results give some valuable insights about sums and products of the roots of the equation, they do not bring us any closer to actually solving the equation.

    When it comes to just simple quadratic equations, of course, the above method is powerful. After all, suppose we are given the equation

    x^2 – 56x – 3705 = 0

    A little back-of-the-napkin calculation will tell me that I need to find two numbers α and β such that

    α + β = –56 and α × β = –3705

    With a little effort, I can determine that the divisors of 3705 are 1, 3, 5, 13, 15, 19, 39, 57, 65, 95, 195, 247, 285, 741, 1235, and 3705. I can place them in pairs as follows: (1, 3705), (3, 1235), (5, 741), (13, 285), (15, 247), (19, 195), (39, 95), and (57, 65), where the numbers in each pair give a product of 3705. Now since I have –3705 and not 3705, I need two numbers of opposite signs. And since I have –56 and not 56, I need the number with the larger magnitude to be negative. This gives the pair of numbers as 39 and –95. Now we can proceed to split the middle term and factorize, giving

    x^2 – 56x – 3705 = x^2 + 39x – 95x – 3705 = x(x + 39) – 95(x + 39) = (x + 39)(x – 95) = 0

    This allows us to see that we have the product of two factors equalling zero, which must mean that at least one of the two factors is zero. This allows us to conclude

    x = –39 or x = 95
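    The divisor-pair hunt can be automated; a sketch (the helper name is mine):

```python
def split_pair(sum_needed, product_needed):
    """Find integers (alpha, beta) with the given sum and product by
    scanning the divisor pairs of |product_needed|."""
    n = abs(product_needed)
    for d in range(1, int(n ** 0.5) + 1):
        if n % d == 0:
            # try all four sign combinations of the divisor pair
            for alpha, beta in [(d, n // d), (d, -(n // d)),
                                (-d, n // d), (-d, -(n // d))]:
                if alpha + beta == sum_needed and alpha * beta == product_needed:
                    return alpha, beta
    return None
```

    Here split_pair(-56, -3705) returns (39, -95), the pair found above.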
    While this method is relatively simple and straightforward in the case of quadratics, there is absolutely no way to use it when it comes to any generic polynomial equation of higher degree, like the quartic equation we saw earlier.

    However, the actual way of solving the quadratic, by having a product equal to zero, should allow us to see that, if we had a quartic equation of similar form, we would be able to solve it. Indeed, the earlier quartic equation can be written as

    This will allow us to conclude that

    In other words, though it is almost child’s play for a student in grade 9 or above to show that

    can be expanded to give

    it would take even a skilled mathematician quite a few iterations of trial and error before she was able to solve

    However, most high school students and perhaps even some middle school students would be able to solve

    It is, therefore, quite frustrating when students come to me in Grade 11 and tell me that their teachers either found no difference between the expanded forms of expressions and the factorized forms or actually preferred the expanded form. Granted, the parentheses seem intrusive at times, and a person may certainly prefer the visual appeal of the expanded form to the factorized form. However, when it is clear that the expanded form is difficult, if not impossible, to solve, this would be like preferring opacity to lucidity.

    Case IV: Permutations and Combinations

    I attribute this unfortunate preference to the noose-like grip that textbook publishers and calculator manufacturers have on the global high school examination boards and schools. This fourth case should highlight the point well. Suppose we were faced with the following problem: In how many ways can a committee of 7 people and another of 8 people be chosen out of 25 people, such that 2 members of each committee are designated as president and secretary and no person belongs to both committees? How would we proceed?

    We could first select the 7 member committee, which can be done in 25C7 ways. Now among these 7 members, we have to choose 1 to be president and 1 to be secretary, which can be done in 7P2 ways. We use a permutation because it matters whether a person is president or secretary. Now we have 18 people remaining. We can choose the second committee of 8 members in 18C8 ways and its president and secretary in 8P2 ways. So in all we have

    25C7 × 7P2 × 18C8 × 8P2

    Most textbooks will state that the answer is 49473074851200 or about 4.947 × 10^13 ways, where each individual term has been calculated and then the product evaluated. But this is an onerous task to do by hand. Hence, expecting the student to get this answer is expecting them to use a calculator. But perhaps the above number is too large for most of us even to process. Let’s deal with some smaller numbers.
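    With Python's math.comb and math.perm, evaluating the product of the four counting terms described above is immediate:

```python
import math

# Choose the 7-member committee and its two officers, then the 8-member
# committee from the remaining 18 people and its two officers.
ways = (math.comb(25, 7) * math.perm(7, 2)
        * math.comb(18, 8) * math.perm(8, 2))
print(ways)  # 49473074851200
```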

    Suppose I have 5 students. On one day, I wish to send a team of 3 of them for a debate. Quite naturally, this can be done in 5C3 = 10 ways. On another day, I need to send one student to the staff room to collect my books and another to the library to collect a journal. This can be done in 5C1 × 4C1 = 5 × 4 = 20 ways. Now the bare numbers 10 and 20 obscure how they were obtained. If a textbook gives only the numerical answer, a student who arrived at that number through flawed reasoning would think she has understood the concepts, since she obtained the ‘correct’ numerical answer. If, instead, the textbook gave the answers as 5C3 and 5C1 × 4C1 respectively, any diligent student could compare her expression with the textbook’s and see exactly where she had gone wrong.

    When I have taught large classes of students, I have often given them a random number and asked them to give me a series of operations with only a particular single digit number that would yield the random number. For example, suppose the random number is 23 and the chosen single digit number is 4. Then I could have

    4 × 4 + 4 + 4 − 4/4 = 23

    There are infinitely many ways in which I can use just the number 4 and obtain 23. The challenge would be for each student to give a unique set of operations. And when I have done this exercise, never once have we needed to repeat a set of operations.
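    If you would like to play this game yourself, a short script can verify candidate expressions. The expressions below are my own illustrations, not ones actually produced in class:

```python
# Candidate expressions built only from the digit 4, targeting 23.
exprs = [
    "4 * 4 + 4 + 4 - 4 // 4",
    "(4 + 4) * (4 - 4 // 4) - 4 // 4",
    "4 * (4 + 4 // 4) + 4 - 4 // 4",
]
for expr in exprs:
    value = eval(expr)
    assert value == 23, (expr, value)
print("all expressions evaluate to 23")
```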

    What I am trying to say is that expressions such as 5C1 × 4C1 or 5C3 or, as in the original problem, 25C7 × 7P2 × 18C8 × 8P2, may be numerically equal to 20, 10, and 49473074851200 respectively, but they contain far more information and give far more insight than the explicit numbers do. However, if you are a calculator manufacturer, none of these insightful forms will help you sell your products!

    Escaping the Trap

    So the calculator manufacturers, the examination boards, and the textbook publishers form a nexus in which these explicit but uninsightful forms are given preference over the forms that yield mathematical insight and promote mathematical understanding.

    Unfortunately, most teachers go ‘by the book’. And so over time they forget that 3/4 and 6/8 may be numerically equivalent, but have quite different meanings. They forget that 20 is not just a number but could have been obtained in many ways, most of which are incorrect. They toe the line and look at the bottom line, that is, the final answer, concluding, if the answers match, that the student must have correctly executed the right sequence of operations. But as we have seen, this need not be the case.

    It is time that the examination boards took a firm stance and actually favored the development of students’ mathematical understanding rather than their ability to use a tool that is obsolete. I mean, where in the world does anyone use a calculator these days? Some old-style retail outlets may use four-function calculators to add the prices of items on a list and maybe also to calculate any sales tax or GST. But where does anyone use a calculator for trigonometric or logarithmic or exponential functions except in school, where it is forced onto the students? If we want students to be able to use some kind of technology to evaluate complicated functions, then we should teach them a programming language rather than make them use calculators. At least the former has real-world relevance for many jobs while the latter has absolutely none!

  • Setting the Stage

    Smoke and Mirrors. Photo by theilr. (Source: Flickr)

    I have long been fascinated by patterns that exist in numerical sequences. Early exploration led me to the discovery of the Fibonacci series and the associated Golden Ratio. I also came across Pascal’s Triangle, which remains a source of intrigue. Hence, when I was introduced to the tests for divisibility, I devoured them with a voraciousness unmatched till then in my life. By some accounts, I stumbled upon the tests for divisibility by 3 and 9 all by myself, without the direct involvement of my teachers.

    And when my teachers ‘failed’ – or so it seemed to my young mind – to give me a test for divisibility by 7, I attempted to derive one such test. Of course, I failed at the task. Try as I might, I could come away with no universal test that would hold true for a number having any number of digits. What was it about 7?

    You see, we have tests for divisibility by 2 (number ends with 0, 2, 4, 6, or 8), 3 (sum of digits is divisible by 3), 4 (last 2 digits are divisible by 4), 5 (number ends in 0 or 5), 6 (number satisfies conditions for 2 and 3), 8 (last 3 digits are divisible by 8), 9 (sum of digits is divisible by 9), 10 (number ends with 0), 11 (difference of the sums of alternate digits is divisible by 11), and 12 (number satisfies conditions for 3 and 4). Why was 7 so stubborn?

    Moreover, why did these tests work out the way they did? Perhaps if we uncovered the reasons for which the tests for 2, 3, 4, 5, 6, 8, etc. are what they are, we can gain some insight into why 7 refuses to submit to our efforts.

    In what follows, I will use the notation [wxyz] to denote a 4 digit number which has w in the thousands position, x in the hundreds position, y in the tens position, and z in the units position. Hence, the value of the number is

    [wxyz] = 1000w + 100x + 10y + z

    Divisibility by 2

    Tessellation pattern. (Source: Mathematical Mysteries)

    Naturally, the first test of divisibility is that by 2. In order to derive this test, we do the following:

    [wxyz] = 1000w + 100x + 10y + z

    Here it is clear that the sum of the first three terms is divisible by 2. Hence, the number [wxyz] as a whole will be divisible by 2 if and only if z is divisible by 2, which happens when z = 0, 2, 4, 6, or 8, giving the test for divisibility by 2.

    Divisibility by 3

    When we reach divisibility by 3, we can do the following:

    [wxyz] = 1000w + 100x + 10y + z = (999 + 1)w + (99 + 1)x + (9 + 1)y + z

    That is

    [wxyz] = (999w + 99x + 9y) + (w + x + y + z)

    Here it is clear that the first term is divisible by 3 and 9. Hence, the number [wxyz] as a whole would be divisible by 3 or 9 if the second term, that is the sum of the digits, is divisible by 3 or 9 respectively. This gives rise to the tests for 3 and 9.
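    The digit-sum tests can be checked mechanically. A small Python sketch comparing them against direct division over a range:

```python
# Check the digit-sum tests for 3 and 9 against direct division.
def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))

for n in range(1, 10000):
    assert (digit_sum(n) % 3 == 0) == (n % 3 == 0)
    assert (digit_sum(n) % 9 == 0) == (n % 9 == 0)
print("digit-sum tests agree with direct division")
```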

    Divisibility by 4

    For the test for divisibility by 4, we manipulate our number as follows:

    [wxyz] = (1000w + 100x) + (10y + z) = 100 × (10w + x) + [yz]

    Here it is clear that the first term is divisible by 4. Hence, the number [wxyz] as a whole will be divisible by 4 if the second term, which is the number [yz] formed by the last two digits, is divisible by 4. This yields the test for 4.

    Divisibility by 5

    The test for divisibility by 5 is easy to remember. But it is obtained by doing the following:

    [wxyz] = 10 × (100w + 10x + y) + z

    Here it is clear that the first term is divisible by 5. Hence, the number [wxyz] as a whole will be divisible by 5 if z is divisible by 5, which happens when z is either 0 or 5, giving us the test for 5.

    Divisibility by 6 and 12

    Since 6 is the product of 2 and 3, a number that satisfies the condition for 2 and 3 will be divisible by 6. Similarly, since 12 is the product of 3 and 4, a number that satisfies the condition for 3 and 4 will be divisible by 12.

    Depiction of the Mandelbrot Set. (Source: Boston University)

    Divisibility by 8

    The test for divisibility by 8 is an extension of that derived for 4. We proceed as follows:

    [wxyz] = 1000w + (100x + 10y + z) = 1000w + [xyz]

    Here it is clear that the first term is divisible by 8. Hence, the number [wxyz] as a whole will be divisible by 8 if the second term, which is the number formed by the last three digits, is divisible by 8. Hence, we get the test for 8.

    Divisibility by 10

    The test for divisibility by 10 follows the same logic as that for 5. In order to derive it we do the following:

    [wxyz] = 10 × (100w + 10x + y) + z

    Here it is clear that the first term is divisible by 10. Hence, the number [wxyz] as a whole will be divisible by 10 if z is divisible by 10, which happens when z is 0, giving us the test for 10.

    Divisibility by 11

    When we come to divisibility by 11, we realize that it is not as straightforward as the ones that came before. Indeed, while the test for divisibility by 11 was taught when I was in school, it has mostly been removed from the mathematics curriculum today. Nevertheless, the test for divisibility by 11 is quite instructive. So let us see what we can learn from it. Given the number [wxyz], we can do the following:

    [wxyz] = 1000w + 100x + 10y + z = (1001 − 1)w + (99 + 1)x + (11 − 1)y + z = (1001w + 99x + 11y) + [(x + z) − (w + y)]

    Here it is clear that the first term is divisible by 11, since 1001 = 11 × 91 and 99 = 11 × 9. Hence, the number [wxyz] will be divisible by 11 if the second term, which is the difference of the sums of alternate digits, is divisible by 11, yielding the test for 11.
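    The alternating-digit test can likewise be verified by brute force. A short sketch (my own, for checking, not part of the derivation):

```python
# The test for 11: difference of the sums of alternate digits.
def alt_digit_diff(n: int) -> int:
    digits = [int(d) for d in str(n)][::-1]   # units digit first
    return sum(digits[0::2]) - sum(digits[1::2])

for n in range(1, 100000):
    assert (alt_digit_diff(n) % 11 == 0) == (n % 11 == 0)
print("alternating-digit test agrees with direct division")
```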

    Divisibility by 7

    When we come to the number 7, there is indeed a test for divisibility by 7. It goes as follows. Suppose we have the number […xyz]. Then the original number is divisible by 7 if and only if […xy] − 2z is divisible by 7.

    For example, consider the number 658. We can proceed as follows:

    65 − 2(8) = 65 − 16 = 49

    Since 49 is divisible by 7, it follows that 658 is divisible by 7. But how does this work? Consider the number […xyz]. We can proceed as follows:

    […xyz] = 10 × […xy] + z = 10 × ([…xy] − 2z) + 21z

    Here the term 21z is obviously divisible by 7, and, since 10 shares no factor with 7, the term 10 × ([…xy] − 2z) is divisible by 7 exactly when […xy] − 2z is. Hence, if […xy] − 2z is divisible by 7, then the whole number is divisible by 7. But what if we had a much larger number? How would we proceed? Suppose, for example, that we were testing for 548975. We would have to proceed as follows:

    54897 − 2(5) = 54887
    5488 − 2(7) = 5474
    547 − 2(4) = 539
    53 − 2(9) = 35

    Since 35 is divisible by 7, we can conclude that 548975 is divisible by 7. This is a recursive method that is foolproof. It is also easy to remember. However, due to the recursion and the fact that most of us probably barely remember multiples of 7 that are 3 or more digits long, for a number with n digits we will have to perform n − 2 recursions before reaching a 2 digit number that we can recognize as a multiple of 7 or not.
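    The recursive test translates directly into code. A sketch of the procedure just described:

```python
# Recursive test for 7: replace [...xyz] with [...xy] - 2z until the
# number has at most 2 digits, then check that remainder directly.
def divisible_by_7(n: int) -> bool:
    n = abs(n)
    while n > 97:                       # reduce to a 2-digit number
        n = abs(n // 10 - 2 * (n % 10))
    return n % 7 == 0

assert divisible_by_7(658)
assert divisible_by_7(548975)
assert not divisible_by_7(548976)
```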

    Each recursion involves a multiplication and a subtraction. This results in 2n – 4 operations for the n – 2 recursions.

    Attempt at Divisibility by 7

    We can see that in most of the tests, we generate 2 terms, the first of which is automatically divisible by the number under consideration. Then the test devolves into testing if the second term is divisible by that number. For the tests for 3 and 9 we saw that one less than a power of 10 is automatically divisible by 3 and 9. For the tests for 4 and 8 we saw that multiples of 100 and 1000 are automatically divisible by 4 and 8 respectively.

    Even for the test for 7 we followed the same basic idea, indicated by the 21z term. However, while the other tests yielded the answer in a single step, the test for divisibility by 7 involved a recursive approach. Hence, for a 6 digit number, we had to perform 4 recursions to obtain a 2 digit number whose divisibility by 7 can be recalled from memory.

    The test for 11 was most illuminating. We saw that some powers of 10 (i.e. 10 and 1000) are 1 less than a multiple of 11 while other powers of 10 (i.e. 1 and 100) are 1 more than a multiple of 11. Could we use a similar approach to obtain another test for divisibility by 7?

    To attempt that, let us see how we fare when we divide powers of 10 by 7. The table below shows the remainders.

    Power of 10 | Remainder on division by 7 | Negative equivalent
    1           | 1                          | –
    10          | 3                          | –
    100         | 2                          | –
    1,000       | 6                          | −1
    10,000      | 4                          | −3
    100,000     | 5                          | −2

    After the 6th power of 10, that is 1,000,000, the pattern repeats. And this is because

    999999 = 7 × 142857

    Hence, when we have a number of the form [xyzxyz], it is necessarily divisible by 7, since [xyzxyz] = [xyz] × 1001 = [xyz] × 7 × 11 × 13. And since 1,000,000 is 1 more than a number that follows this pattern (i.e. 999999), the pattern of remainders will repeat after every 6th power of 10.

    Now we observe that the first three powers of 10 give remainders of 1, 3, and 2 respectively. The next three powers of 10 give negative remainders in the same order.

    So if I had the number [uvwxyz], we can rewrite it as

    [uvwxyz] = 100000u + 10000v + 1000w + 100x + 10y + z
             = (100002u + 10003v + 1001w + 98x + 7y) + {(2x + 3y + z) − (2u + 3v + w)}

    Here the first term is obviously divisible by 7, since 100002, 10003, 1001, 98, and 7 are all multiples of 7. Hence, the whole number [uvwxyz] is divisible by 7 if the term in braces is divisible by 7. Let’s put this to the test and let’s label the term in braces as D.

    Consider a 2 digit number (say 42). Here, u = v = w = x = 0, y = 4 and z = 2.

    D = 3(4) + 2 = 14

    Since 14 is divisible by 7, 42 is divisible by 7.

    Consider 91. Here, u = v = w = x = 0, y = 9 and z = 1.

    D = 3(9) + 1 = 28

    Since 28 is divisible by 7, 91 is divisible by 7.

    Consider a 3 digit number (say 434). Here, u = v = w = 0, x = 4, y = 3 and z = 4.

    D = 2(4) + 3(3) + 4 = 21

    Since 21 is divisible by 7, 434 is divisible by 7.

    Consider a 4 digit number (say 2072). Here, u = v = 0, w = 2, x = 0, y = 7 and z = 2.

    D = [2(0) + 3(7) + 2] − [2(0) + 3(0) + 2] = 23 − 2 = 21

    Since 21 is divisible by 7, 2072 is divisible by 7.

    Consider a 5 digit number (say 24962). Here, u = 0, v = 2, w = 4, x = 9, y = 6 and z = 2.

    D = [2(9) + 3(6) + 2] − [2(0) + 3(2) + 4] = 38 − 10 = 28

    Since 28 is divisible by 7, 24962 is divisible by 7.

    Consider a 6 digit number (say 548975). Here, u = 5, v = 4, w = 8, x = 9, y = 7 and z = 5.

    D = [2(9) + 3(7) + 5] − [2(5) + 3(4) + 8] = 44 − 30 = 14

    Since 14 is divisible by 7, 548975 is divisible by 7.

    Now I grant that this process is not intuitive. There is no identifiable uniformity that would allow it to function as a test that is as easy and quick to use as the other tests. There is, of course, a pattern. In each block of 3 digits, the digits are multiplied by 1, 3, and 2, respectively, going from right to left. The sums of alternate blocks are then subtracted from one another, and the resulting number, which will in all likelihood be a 2-digit number, is tested for divisibility by 7; the result tells us about the divisibility by 7 of the original number.

    If we counted the number of operations involved, we would see that each of the parenthesized blocks involves 4 operations (2 multiplications and 2 additions). For a number that has 3m digits, we will have m blocks with m − 1 operations linking them, yielding 5m − 1 operations in all. With the previous test, a number of similar length (i.e. 3m digits) would require 6m − 4 operations. So for m > 3, that is, for numbers with more than 9 digits, the second test will be quicker than the first. However, the first test is more straightforward than the second.
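    For comparison, the one-shot weighted test can also be coded directly; the weights 1, 3, 2, −1, −3, −2 below are the remainders derived above:

```python
# One-shot weighted test for 7: weights applied to the digits from the
# units digit leftward, repeating every 6 digits.
WEIGHTS = [1, 3, 2, -1, -3, -2]

def weighted_test_7(n: int) -> int:
    digits = [int(d) for d in str(n)][::-1]   # units digit first
    return sum(d * WEIGHTS[i % 6] for i, d in enumerate(digits))

assert weighted_test_7(548975) == 14          # the value 14 found above
for n in range(1, 100000):
    assert (weighted_test_7(n) % 7 == 0) == (n % 7 == 0)
```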

    Stepping Back

    However, despite the fact that the second test is not intuitive, such a process reveals the logic behind the other tests. Those tests are quite easy to implement, allowing us to take for granted their efficacy. And if my memory serves me right, none of my teachers ever really explained to me why those tests work, as I have done above. Hence, these algorithms come across to many students as some kind of mathematical wizardry, where reliable results are obtained by some numerical sleight of hand. Actually, I have a nagging feeling that many mathematics teachers themselves have no idea why the tests for 3, 4, 6, 8, and 9 work and most probably have no clue about the test for 11.

    If we take a step back to understand what happened when we derived this test for divisibility by 7 we can see that, when we divided powers of 10 by 7, we obtained 6 different remainders. We took powers of 10 because that is precisely what the place value system of writing numbers shows us as we have seen when we expanded [uvwxyz] and other numbers. Concerning the remainders, we can further observe that in fact we obtain all the possible remainders we can get when we divide by 7. This is actually revealed in the recurring pattern ‘142857’ when we divide any integer that is not a multiple of 7 by 7. The increasing powers of 10 only shift this pattern rightward to give us the remainders as follows

    Here matching digits indicate how the pattern is generated, that is, how one remainder morphs into the next as the division is continued. Note especially the numbers in parentheses in the last column and compare them with the numbers in the second column of the previous table.

    Since the recurring pattern for division by 7 is a string of 6 digits (142857), we get 6 different remainders, which, in the case of 7, are all the nonzero remainders we could obtain. Because of this, our test involved 6 different digits u, v, w, x, y, and z. Since the pattern of remainders 1, 3, 2, −1, −3, −2 conveniently formed 2 blocks of 3 numbers, we were able to reduce the test to a replicating pattern involving the sum of 3 numbers, for which the difference is finally taken.

    Now, when we obtain tests for divisibility, we need to focus only on the prime numbers, as seen in the very brief descriptions for divisibility by 6 and 12. Powers of 2 present a unique situation, as seen in the tests for 2, 4, and 8. However, if we focus on the prime numbers, the next one we encounter is 13. What would a test for divisibility by 13 look like? It pays to note that the recurring pattern for division by 13 is ‘076923’ and then to generate the following table:

    Power of 10 | Remainder on division by 13 | Negative equivalent
    1           | 1                           | –
    10          | 10                          | −3
    100         | 9                           | −4
    1,000       | 12                          | −1
    10,000      | 3                           | –
    100,000     | 4                           | –

    Now it helps to not rush to any conclusions. While division by 7 and 13 produced patterns of length 6, this is not the case for all primes, as we should expect. For example, division by 17, the next prime, produces the full complement of 16 possible remainders. Anyway, getting back to 13, I encourage the reader to use the table above and the ones before it to obtain a test for divisibility by 13. Also do attempt a test similar to the recursive first one. Please do put your findings in the comments. And I will see you next week.
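    If you would like to check your own answer to this exercise, the remainders and a candidate test for 13 can be generated in a few lines. This is my own sketch, not the worked answer from the post:

```python
# Remainders of powers of 10 on division by 13, and their signed forms.
rems = [10**k % 13 for k in range(6)]
print(rems)      # [1, 10, 9, 12, 3, 4]
signed = [r if r <= 6 else r - 13 for r in rems]
print(signed)    # [1, -3, -4, -1, 3, 4]

# Candidate weighted-digit test for 13 built from these remainders.
def weighted_test_13(n: int) -> int:
    digits = [int(d) for d in str(n)][::-1]   # units digit first
    return sum(d * signed[i % 6] for i, d in enumerate(digits))

for n in range(1, 100000):
    assert (weighted_test_13(n) % 13 == 0) == (n % 13 == 0)
```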

    In the last three posts, we have been dealing with some issues related to counting. In Learning to Count, I introduced us to the area of combinatorics and specifically combinatorial arguments. In the second post, Abjuring Double Counting, we looked at the inclusion-exclusion principle that is used to ensure that every element is counted exactly once. In last week’s post, No Place at Home, the third in this series, we looked at the idea of derangements and used the inclusion-exclusion principle and a recursive formula to derive the expression for the number of derangements in a set of n items. To jog your memory, we proved that

    D_n = n! × (1 − 1/1! + 1/2! − 1/3! + ⋯ + (−1)^n/n!)

    Revisiting an Old Friend

    In the previous series of posts, we had dealt with the number e. In the second post in that series, Naturally Bounded, we had derived an infinite series expression for e as

    e = 1 + 1/1! + 1/2! + 1/3! + ⋯

    In the above equation, we make the following substitution:

    1/n = x/m, that is, n = m/x

    This gives us

    e = lim(m→∞) (1 + x/m)^(m/x)

    Please read the above carefully. Raising the equation to the power x we get

    e^x = lim(m→∞) (1 + x/m)^m

    Using the same techniques as we used in Naturally Bounded, we will obtain

    e^x = 1 + x/1! + x^2/2! + x^3/3! + ⋯

    If we substitute x = −1 in the above we will get

    e^(−1) = 1 − 1/1! + 1/2! − 1/3! + ⋯

    A Surprising Result

    Now consider the equation

    D_n = n! × (1 − 1/1! + 1/2! − ⋯ + (−1)^n/n!)

    We can rewrite this as

    D_n/n! = Σ(k=0 to n) (−1)^k/k!

    Taking a limit on both sides as n approaches infinity, we will get

    lim(n→∞) D_n/n! = Σ(k=0 to ∞) (−1)^k/k!

    Expanding the summation on the right side we will obtain

    lim(n→∞) D_n/n! = 1 − 1/1! + 1/2! − 1/3! + ⋯

    But we just showed that the right side is related to e. Specifically, we can write

    lim(n→∞) D_n/n! = e^(−1) = 1/e

    Or taking reciprocals we get

    lim(n→∞) n!/D_n = e

    In other words, as n increases, the ratio of the number of permutations of n items to the number of derangements of the n items approaches e. This is a surprising result as there is no obvious reason for which e should show up in this ratio.
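    The convergence is easy to watch numerically. A sketch that computes D_n exactly from the series formula and prints the ratio n!/D_n for a few values of n:

```python
# Watching n!/D_n approach e, with D_n computed exactly from the series
# formula D_n = n! * sum((-1)^k / k!), k = 0..n.
from math import e, factorial

def derangements(n: int) -> int:
    # factorial(k) divides factorial(n), so each term is an exact integer
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))

for n in (5, 10, 15, 20):
    print(n, factorial(n) / derangements(n))   # ratios approach e = 2.71828...

assert abs(factorial(15) / derangements(15) - e) < 1e-9
```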

    An Approximate Explanation

    However, let us consider that we have a set of n items. If we now place items in random positions, the probability that item 1 occupies its own spot will be 1/n. Hence, the probability that it does not occupy its own spot will be 1 − 1/n. The probability that item 2 occupies its own spot will be roughly 1/(n−1), since one spot has already been taken by item 1. Hence, the probability that item 2 is not in its own spot is roughly 1 − 1/(n−1). For large values of n, 1 − 1/n and 1 − 1/(n−1) are nearly equal. For example, if n = 1,000,000, the corresponding values are 0.999999 and 0.999998999998…, where the difference is approximately 0.000000000001. Since the probability values are so close to each other, we can approximate the probability that each of the n items does not occupy its own position as 1 − 1/n.

    This leads us to the overall probability being

    (1 − 1/n)^n

    Taking the limit as n approaches infinity we get

    lim(n→∞) (1 − 1/n)^n = e^(−1) = 1/e

    So we can see that the reason e appears in our study of derangements is that, for large values of n, the placements of the items in positions not their own are almost independent of each other and almost equally probable. That we have used the word ‘almost’ twice is the reason why e only shows up in the limiting case and not for any finite value of n. Of course, as we place more items in different positions, the error involved in the approximation grows. Hence, the above reasoning cannot fully account for the appearance of e in the expressions about derangements.

    Hidden in Plain Sight

    However, recall that, in the previous post, we had derived

    D_n = n × D_(n−1) + (−1)^n

    If we divide throughout by n!, we will get

    D_n/n! = D_(n−1)/(n−1)! + (−1)^n/n!

    This reduces to

    D_n/n! − D_(n−1)/(n−1)! = (−1)^n/n!

    The term D_n/n! is the probability that a random permutation of n items is a derangement. The term D_(n−1)/(n−1)! is the probability that a random permutation of n − 1 items is a derangement. We can see that, as n gets larger, the right side approaches zero at a factorial rate. Since these differences shrink so quickly, we can conclude that the two probabilities on the left side converge to some specific value.

    Further, we had derived the expression for the number of derangements as

    D_n = n! × (1 − 1/1! + 1/2! − ⋯ + (−1)^n/n!)

    which can be rearranged to give

    D_n = n! × (1 − (1/1! − (1/2! − (1/3! − ⋯ ))))

    As observed in the previous post, the nesting of the parentheses yields the alternating signs of the terms. We also saw that the nesting of the parentheses was the result of the application of the inclusion-exclusion principle. In other words, the alternating signs are precisely the result of the way the derangements are counted. Hence, even though we might not have expected the appearance of e and might even be quite surprised by it, it was actually in the expression all along.

    Deranged No More

    I have enjoyed writing these posts on counting, especially the last two on derangements. When I first encountered the idea, it was a novelty. Why would we ever want to place something where it does not belong? Being a person who does like order (though you may not suspect it if you looked at my desk!), the thought of placing something where it did not belong seemed only to be an instance of mathematicians playing around. But, as we saw in the previous post, derangements have a wide range of applications, some not quite intuitive. My interest was further piqued when I first saw how derangements are linked to e. The startling result hammered home the idea that mathematics, though often divided into incommensurate silos while we are taught it in school, is actually a unified body of knowledge.

    I hope you have also received some enjoyment and maybe some stirring of interest through these posts. I would really like to hear from you. Please do write a comment and let me know what you liked, what you disliked, and what you found confusing.

  • Recapitulation on Counting

    Russian Matryoshka dolls. (Source: Cambridge Mathematics)

    In the previous two posts, I introduced ideas related to counting. In Learning to Count we introduced the idea of combinatorial arguments, which allowed us to gain some insight into the process of selection. In Abjuring Double Counting we saw the inclusion-exclusion principle, which ensures that, when we count members of sets that have common elements, we neither omit them nor double count them. I had signed off last time with the promise that we would look at something I called ‘derangements’. Of course, the dictionary states that derangement is “the state of being completely unable to think clearly or behave in a controlled way, especially because of mental illness.” Quite obviously, this is not what I’m referring to! So what am I referring to?

    Understanding Derangements

    In mathematics, specifically in the area of combinatorics, a derangement is a permutation or arrangement in which no item occupies a position it previously occupied. This, I guess, is still a little vague. So let us consider a couple of examples.

    Suppose we have a set of n letters that are in the mailperson’s bag. Obviously, each of these letters has an address on the envelope. However, if the mailperson delivers them such that no letter is delivered to the address on its envelope, then we have a derangement.

    Or suppose n patrons of a restaurant check their coats in with the maître d’. On their way out the patrons collect the coats, but none of them receives their own coat. In that case, we have a derangement.

    We could visualize a derangement as below

    Here, we have five items (A, B, C, D, and E). However, none of the items in green matches the corresponding item in red. Using the ordering ABCDE, the above derangement is BEACD. In what follows, I will use red lettering to indicate an item that is not deranged, that is, an item that is placed where it belongs. And I will use green lettering to indicate an item that is deranged, that is, an item that is not placed where it belongs.

    We can begin counting the number of possible derangements with small set sizes. For example, with 2 items, A and B, there are 2 permutations: AB and BA.

    As shown above, there is only one derangement.

    If we increase the set size to 3 items, A, B, and C, there will be 6 permutations: ABC, ACB, BAC, BCA, CAB, and CBA.

    As can be seen above, there are only 2 full derangements.

    We can increase the set size to 4 items, A, B, C, and D, to get 24 permutations. Of these, the full derangements are BADC, BCDA, BDAC, CADB, CDAB, CDBA, DABC, DCAB, and DCBA.

    As seen above, there are 9 full derangements.

    Now, with a set of 5 items, we will have 5! or 120 different permutations. I don’t know about you, but I’m disinclined to even write out all these permutations, let alone scrutinize each to determine which items are deranged, then determine which permutations are indeed full derangements, and then count the number of full derangements. If you have way too much time on your hands for mind-numbing tasks, please feel free to step away and undertake this not just for a set of 5 items, but for a set of (say) 10 items, for which you can write down and scrutinize all the 3,628,800 permutations!

    You see, the factorial function grows at an alarming rate. What might even be manageable for a set of 5 items, with 120 permutations, becomes an onerous task for a set that is just double the size, with millions of permutations. Hence, there must be a better way of determining the number of derangements given the size of the set. And, thankfully, there is!
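    For small sets, at least, brute force makes a useful benchmark against any formula we derive. A sketch using itertools:

```python
# Brute-force count: scan all permutations and keep those in which no
# item sits in its own position. Feasible only for small n -- which is
# exactly the point made above.
from itertools import permutations

def count_derangements(n: int) -> int:
    return sum(
        all(p[i] != i for i in range(n))
        for p in permutations(range(n))
    )

print([count_derangements(n) for n in range(2, 8)])  # [1, 2, 9, 44, 265, 1854]
```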

    We will proceed along two routes, both yielding the same result. But because the paths are different, we will discover different insights on the process. The first route we will adopt will use the inclusion-exclusion principle that we saw in the previous post. The second route will use a recursive formula.

    Derangements using the Inclusion-Exclusion Principle

    Given n items, there are n! different ways of permuting them. Out of these n! ways, there are (n-1)! arrangements in which item 1 is fixed in its own position. These (n-1)! arrangements include arrangements in which some other items may also be in their own positions. Similarly, there are (n-1)! arrangements in which item 2 is fixed in its own position. Again, these (n-1)! arrangements include arrangements in which some other items may also be in their own positions.

    Of course, for each of the n items there are (n-1)! arrangements in which that item is fixed in its own position. Since there are n items in all, we are looking at

    n × (n−1)! = n!

    arrangements in which at least 1 item is fixed in its own position.

    So now we have in consideration the initial n! less the arrangements in which 1 item is fixed. Hence under consideration now are

    n! − n × (n−1)!

    arrangements. It is clear, however, that there has been double counting for arrangements in which any pairs of items are both in their own positions. Hence, per the inclusion-exclusion principle, we now need to add all the arrangements in which pairs of items are fixed in their positions.

    Suppose items 1 and 2 are fixed. There are (n-2)! such arrangements, and these arrangements include arrangements in which items other than 1 and 2 are also fixed in their positions. There are, of course, nC2 different pairs of items, and for each such pair there will be (n-2)! arrangements in which the items in the pair are fixed in their positions. So we are looking at

    nC2 × (n−2)!

    arrangements in which 2 or more items are fixed in their positions. As mentioned, due to double counting, these need to be added back. After this adjustment we will be considering

    n! − n × (n−1)! + nC2 × (n−2)!

    arrangements. Of course, now we have overcompensated for the arrangements in which 3 items are fixed in their positions. We will need to subtract these.

    Now suppose items 1, 2, and 3 are fixed in their positions. There are (n-3)! such arrangements and these will include arrangements in which items other than 1, 2, and 3 are also fixed in their positions. There are nC3 different triplets of items and for each such triplet there will be (n-3)! arrangements in which the items of that triplet are fixed in their positions. So we are looking at

    nC3 × (n−3)!

    These arrangements will need to be subtracted from the running total that we have, yielding

    n! − n × (n−1)! + nC2 × (n−2)! − nC3 × (n−3)!

    Continuing in this manner, we can see that the final number of derangements after all the appropriate inclusion-exclusion will be

    D_n = n! − nC1 × (n−1)! + nC2 × (n−2)! − nC3 × (n−3)! + ⋯ + (−1)^n × nCn × 0!

    This can be written in a more compact form as

    D_n = Σ(k=0 to n) (−1)^k × nCk × (n−k)! = n! × Σ(k=0 to n) (−1)^k/k!

    Using this formula we obtain the table below for values of n up to 10.

    n   : 1   2   3   4   5    6     7     8      9       10
    D_n : 0   1   2   9   44   265   1854  14833  133496  1334961

    Derangements using a Recursive Formula

    The second method for counting derangements is by using a recursive formula. We will first derive the recursive formula and then modify it to demonstrate that it is equivalent to the one we just derived. Once again, we start with a set of n items. Now consider item 1. In order for us to end with a derangement, this item must be placed in one of positions 2 to n. Hence, there are n − 1 ways of placing item 1. Since the number of derangements is a whole number, we can conclude that

    D_n = (n − 1) × X

    where X is another whole number that we have to determine. Till now, we have placed 1 of the n items in a position other than its own.

    Now let us focus on the kth item, supposing that item 1 has been placed in position k and has thereby dislodged the kth item. We can either place the kth item in position 1 or in one of the remaining n − 2 positions. If we place the kth item in position 1, then we still have to derange the remaining n − 2 items, which can be done in D_(n−2) ways. If we do not place the kth item in position 1, then there are still n − 1 items, including the kth item, that have to be deranged, which can be done in D_(n−1) ways. Since the kth item must go either in position 1 or in one of the other n − 2 positions, the two cases are mutually exclusive and exhaustive. Hence, we can conclude that

    D_n = (n − 1) × [D_(n−1) + D_(n−2)]

    The expression in the square brackets should remind us of the binomial identity

    nCr + nC(r−1) = (n+1)Cr

    You will recall from two posts back that we had considered the matter of selecting r out of n+1 items and we focused on one item, concluding that the selection either does not contain this item, represented by the first term on the left, or does contain this item, represented by the second term on the left.

    While deriving the recursive equation

    D_n = (n − 1) × [D_(n−1) + D_(n−2)]

    we used the same idea of focusing on one item and deciding whether it still needs to be deranged, represented by the first term in the square brackets, or has already been deranged, represented by the second term.

    This can be expanded and rearranged as follows:

    D_n = (n − 1) × D_(n−1) + (n − 1) × D_(n−2)

    D_n − n × D_(n−1) = −[D_(n−1) − (n − 1) × D_(n−2)]

    Suppose we designate

    A_n = D_n − n × D_(n−1)

    Then we can see that the earlier equation can be written as

    A_n = −A_(n−1)

    Using this recursively we get

    A_n = −A_(n−1) = A_(n−2) = ⋯ = (−1)^(n−2) × A_2 = (−1)^n × A_2

    Substituting in terms of the derangement values we get

    D_n − n × D_(n−1) = (−1)^n × (D_2 − 2 × D_1) = (−1)^n

    since D_2 = 1 and D_1 = 0.

    This can be rearranged as

    D_n = n × D_(n−1) + (−1)^n

    We can now substitute values for n to obtain the corresponding expressions for D_n. This will give us

    D_2 = 2 × 0 + 1 = 1
    D_3 = 3 × 1 − 1 = 2
    D_4 = 4 × 2 + 1 = 9
    D_5 = 5 × 9 − 1 = 44
    D_6 = 6 × 44 + 1 = 265
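    The recursion lends itself to a quick computational check; a sketch seeded with D_1 = 0:

```python
# Generating derangement numbers from D_n = n*D_(n-1) + (-1)^n, D_1 = 0.
D = {1: 0}
for n in range(2, 11):
    D[n] = n * D[n - 1] + (-1) ** n

print(D[6])   # 265
print(D[10])  # 1334961
```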

    Considering the expression for D_6, we can see that it can be rewritten as

    D_6 = 6! × (1/2! − 1/3! + 1/4! − 1/5! + 1/6!)

    And since 0! = 1! = 1, we can further write it as

    D_6 = 6! × (1/0! − 1/1! + 1/2! − 1/3! + 1/4! − 1/5! + 1/6!)

    Using sigma notation, this can be written as

    D_6 = 6! × Σ(k=0 to 6) (−1)^k/k!

    This can be generalized to

    D_n = n! × Σ(k=0 to n) (−1)^k/k!

    which is exactly the same expression we obtained earlier using the inclusion-exclusion principle.

    Derangements and the Inclusion-Exclusion Principle

    The formula for derangements can be further written as

    D_n = n! − (nC1 × (n−1)! − (nC2 × (n−2)! − (nC3 × (n−3)! − ⋯ )))

    where the nested parentheses show how the inclusion-exclusion principle works, much like the matryoshka dolls in the image at the start of this post. What this formula tells us is that we can calculate the number of derangements by starting from the total number of permutations and correcting, level by level, for the arrangements in which at least 1 item is fixed, then for those in which at least 2 items are fixed, then for those in which at least 3 items are fixed, and so on. Because (−1)^n oscillates between 1 and −1, the nested parentheses accomplish the same final result as do the alternating signs in the original formula we obtained, namely

    D_n = n! × (1 − 1/1! + 1/2! − ⋯ + (−1)^n/n!)

    Why Derangements aren’t Deranged

    So far the discussion has been reasonably theoretical. And you might wonder if there is any reason for understanding derangements beyond the curiosity of mathematicians. I’ve tried to determine how the idea of derangements came about but have not been able to land on a definitive answer. It is quite likely, as in many cases in mathematics, that some mathematician just toyed with the idea and that the idea later found applications. So what are some of these applications?

    In the workplace, if you wish to ensure that everyone on a team gets the opportunity to be responsible for every process, the idea of derangements is essential. In fact, a team leader who uses derangements deliberately can reallocate responsibilities more efficiently than by simply reassigning at random and checking whether the result happens to be a full derangement.
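
    As a simple illustration, one naive way to produce such a reallocation is rejection sampling: shuffle the assignments and retry until no one keeps their own. Derangements make up a large enough share of all permutations that only a few attempts are typically needed. A Python sketch, with made-up team members (this is my own toy example, not a method from the post):

```python
import random

def random_derangement(items):
    """Return the items reordered so that no item stays in its original position.

    Works by rejection sampling: shuffle and retry. Assumes len(items) >= 2,
    since a single item can never be deranged (D_1 = 0).
    """
    n = len(items)
    while True:
        perm = list(range(n))
        random.shuffle(perm)
        if all(perm[i] != i for i in range(n)):
            return [items[i] for i in perm]

team = ["Asha", "Ben", "Chitra", "David"]
print(random_derangement(team))  # everyone ends up with someone else's role
```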

    In coding, the idea of derangements is used to check for coding errors. In stochastic processes, derangements are used to determine the entropy of a system. In chemistry, the idea of derangements is used in the study of isomers, the number of which increases rapidly, especially for compounds with long carbon chains. In graph theory, the idea of derangements can be used to determine unique walks or paths in a graph. In optimization, derangements are used in situations similar to the traveling salesman problem or the assignment problem.

    In other words, while it is likely that the idea of derangements began with a mathematician toying with the idea, it has a wide range of applications today. But what surprises me is something that I am going to deal with in the next post. It is a result that is shocking, to say the least, and downright mysterious. Whatever could it be? And with that cliffhanger, I bid you au revoir for a week.

  • Recapitulation

    In the previous post, we had looked at the basic principles of making combinatorial arguments. We saw that these arguments, while deficient in that they do not yield any numerical values, give us insights about how we can count ways of doing a task. In the conclusion of that post, I introduced the idea of the possibility of double counting and suggested that combinatorial arguments allow us to avoid doing that.

    Double counting is a particularly pressing problem, especially in democracies. When we wish to determine the ‘will of the people’, we dare not forget that each citizen falls within a number of demographic categories. For example, I am a Christian male, who is currently self-employed and who has graduate degrees. If I give my opinion on some matter and am counted separately under each of these categories, then my opinion will be given undue weight and would not be fair to others.

    The question then arises, “How do we ensure that we count every item without counting any item more than once?” Anyone who is involved in any kind of statistical study needs to pay attention to this so that each data point is given the same weightage. So let me introduce you to the inclusion-exclusion principle.

    Basics with Two Categories

    We begin with the most basic set-up with all items classified into two categories (say A and B). Since items can have multiple classifications, we can represent this using the Venn diagram below:

    The Venn diagram has three regions labelled 1, 2, and 3. Set A includes regions 1 and 2, while set B includes regions 2 and 3. If we just add the number of items in sets A and B, we can see that the central region (2) will be counted twice. In other words, the items that belong in both sets will be double counted. To avoid this, we need to subtract the items that belong in the intersection of the two sets.

    This can be written as

    n(A ∪ B) = n(A) + n(B) - n(A ∩ B)

    |A ∪ B| = |A| + |B| - |A ∩ B|

    The last term on the right side of both equations recognizes that some items have been ‘included’ more than they need to be and explicitly ‘excludes’ them from the count. This adjustment by excluding what was included too many times is what gives the principle its name.
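
    This is easy to check concretely in Python with two small sets (the element names are arbitrary, chosen just for this illustration):

```python
A = {"alice", "bob", "carol", "dan"}
B = {"carol", "dan", "erin"}

# Adding the two sizes counts the intersection {"carol", "dan"} twice,
# so subtracting it once gives the true size of the union.
assert len(A | B) == len(A) + len(B) - len(A & B)  # 5 == 4 + 3 - 2
```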

    Counting with Three Categories

    Of course, we know that demographics includes many more than two categories. So how would the inclusion-exclusion principle change if we had three categories? The Venn diagram below indicates the situation.

    As shown in the diagram, there are now 7 distinct regions. Set A includes regions 1, 2, 4, and 5. Set B includes regions 2, 3, 4, and 6. And set C includes regions 4, 5, 6, and 7. If we add sets A, B, and C, we will count regions 2, 5, and 6 twice and region 4 thrice. We can proceed as before and exclude the intersection regions of two sets. But since region 4 belongs to A∩B, B∩C, and C∩A, we will have excluded region 4 thrice. This means that regions 1, 2, 3, 5, 6, and 7 are now counted once, but region 4 is not counted. Of course, it is a simple matter to add it back to include it. This is represented as

    n(A ∪ B ∪ C) = n(A) + n(B) + n(C) - n(A ∩ B) - n(B ∩ C) - n(C ∩ A) + n(A ∩ B ∩ C)

    The subtracted pairwise-intersection terms we have encountered before in the case where we had only two categories, A and B, and had to compensate for overcounting the regions in the intersection of the two sets. The final term is new, introduced because, by compensating for the overcounting of the regions in the intersections of pairs of sets, we ended up undercounting the intersection of all three sets.
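
    Using the region labels from the Venn diagram as stand-in elements, we can check the three-set formula directly in Python:

```python
# Each set contains exactly the regions listed in the text above.
A = {1, 2, 4, 5}
B = {2, 3, 4, 6}
C = {4, 5, 6, 7}

# Subtract each pairwise intersection, then add back the triple
# intersection {4}, which the pairwise corrections removed three times.
count = (len(A) + len(B) + len(C)
         - len(A & B) - len(B & C) - len(C & A)
         + len(A & B & C))
assert count == len(A | B | C)  # both equal 7, the number of regions
```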

    Extension to Four Categories

    What would happen if there were more than 3 categories? Unfortunately, a 2 dimensional representation will no longer suffice. So let us label the regions as follows: 1: only A; 2: only B; 3: only C; 4: only D; 5: only A and B; 6: only A and C; 7: only A and D; 8: only B and C; 9: only B and D; 10: only C and D; 11: only A, B, and C; 12: only A, B, and D; 13: only A, C, and D; 14: only B, C, and D; and 15: A, B, C, and D.

    Now set A includes regions 1, 5, 6, 7, 11, 12, 13, and 15. Similarly, set B includes regions 2, 5, 8, 9, 11, 12, 14, and 15. Set C includes regions 3, 6, 8, 10, 11, 13, 14, and 15. And set D includes regions 4, 7, 9, 10, 12, 13, 14, and 15. Now, when we just add A, B, C, and D, the regions 5 to 10 will be counted twice, regions 11, 12, 13, and 14 will be counted thrice, and region 15 four times. The overcounting for regions 5 to 10 would need an ‘exclusion’ by subtracting the intersections of pairs of sets. However, this would mean that the regions that involve the intersection of exactly three sets (i.e. 11, 12, 13, and 14) would have been ‘excluded’ three times over, meaning that they would not be ‘included’ anymore. It would also mean that region 15 has been counted negative 2 times! This can be adjusted by now ‘including’ these four regions. However, this means that region 15 is now ‘included’ 2 times. Hence, we make a final adjustment and subtract the intersection of all four sets. This is represented as

    n(A ∪ B ∪ C ∪ D) = n(A) + n(B) + n(C) + n(D) - n(A ∩ B) - n(A ∩ C) - n(A ∩ D) - n(B ∩ C) - n(B ∩ D) - n(C ∩ D) + n(A ∩ B ∩ C) + n(A ∩ B ∩ D) + n(A ∩ C ∩ D) + n(B ∩ C ∩ D) - n(A ∩ B ∩ C ∩ D)

    Here the pairwise-intersection terms represent the adjustments for regions 5 to 10, the triple-intersection terms those for regions 11 to 14, and the final term that for region 15.

    Generalization

    If we continue in this manner, adding more categories, we will be able to demonstrate that

    n(A_1 ∪ A_2 ∪ … ∪ A_m) = Σ_i n(A_i) - Σ_{i<j} n(A_i ∩ A_j) + Σ_{i<j<k} n(A_i ∩ A_j ∩ A_k) - … + (-1)^{m+1} n(A_1 ∩ A_2 ∩ … ∩ A_m)

    This can be written in a more compact form as

    n(⋃_{i=1}^{m} A_i) = Σ_{k=1}^{m} (-1)^{k+1} Σ_{1≤i_1<…<i_k≤m} n(A_{i_1} ∩ … ∩ A_{i_k})

    I recommend you take a close look at how the last two equations are actually the same as we will need such ‘parsing’ of formulas in future posts!
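
    The compact form translates almost directly into code. The Python sketch below (the function name is my own) sums over every non-empty subcollection of the sets, with the sign determined by how many sets are being intersected, and checks the result against the union computed directly:

```python
from itertools import combinations

def union_size_by_inclusion_exclusion(sets):
    """Size of the union via inclusion-exclusion.

    For every non-empty subcollection of k sets, add (-1)^(k+1) times
    the size of their intersection.
    """
    total = 0
    for k in range(1, len(sets) + 1):
        for combo in combinations(sets, k):
            total += (-1) ** (k + 1) * len(set.intersection(*combo))
    return total

sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {1, 5, 6}]
assert union_size_by_inclusion_exclusion(sets) == len(set().union(*sets))  # 6
```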

    What we have seen here is that we can recursively ‘exclude’ intersections of a greater number of sets and in so doing avoid the twin traps of overcounting and undercounting. In the next post we will look at a way of counting an interesting arrangement of objects known as a ‘derangement’. Hope you stay sane till then!