• A Slice of the Pi

We are in the middle of a study of π. In the previous post, A Piece of the Pi, we introduced π. We saw how its definition, based as it is on measurement, is inadequate for calculating the value of π to great accuracy. We then saw how mathematicians circumvented this problem by using inscribed and circumscribed regular polygons to obtain lower and upper bounds for the value of π. We saw that that endeavor culminated with the efforts of Christoph Grienberger, who obtained 38 digits in 1630 CE using a polygon with 10⁴⁰ sides! We realized that this method, while mathematically sound, converges too slowly to be of any long-term practical use.

    At the end of the post, I stated that there were two broad approaches to address this issue, one using iterative methods and the other using infinite series. We will deal with the infinite series approach here and keep the iterative method for later.

    Snail’s Pace To Infinity

Since π is a constant related to the circle, it should come as no surprise that the most common infinite series are derived using the circular functions, a.k.a. the trigonometric functions. Specifically, since 2π is the measure in radians of the angle around a point, the inverse trigonometric functions, which map real numbers to angles, are used.

Now we haven’t yet dealt with deriving infinite series using calculus in this blog. I assure you I will get to that soon. For now, I ask you to take my word for what follows. Using what is known as the Taylor series, we can obtain

arctan x = x - x³/3 + x⁵/5 - x⁷/7 + … (for |x| ≤ 1)

which is known as the Gregory-Leibniz series after James Gregory and Gottfried Leibniz, who independently derived it in 1671 and 1673 respectively.

Using the Gregory-Leibniz series and knowing that arctan 1 is π÷4, we can substitute x = 1 to obtain

π/4 = 1 - 1/3 + 1/5 - 1/7 + 1/9 - …

We can take increasingly many terms of this series to see how it converges. Doing so gives us the following

We can see that this series also converges quite slowly. Even after n = 10000, we only have π accurate to 3 decimal places! Unfortunately, even though I have a Wolfram Alpha subscription, the computational engine timed out for values greater than 10000. So that’s the maximum I was able to generate. But it should be clear that this series produces values that are no better than the ones generated using the regular polygons.
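If you would like to experiment beyond where Wolfram Alpha timed out, a few lines of code will do. Here is a minimal Python sketch (my own illustration, not the computation from the post) of the partial sums of the series above:

```python
def leibniz_pi(n_terms):
    """Approximate pi from n_terms terms of the Gregory-Leibniz series."""
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k / (2 * k + 1)  # 1 - 1/3 + 1/5 - ...
    return 4 * total

for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: {leibniz_pi(n):.10f}")
# n = 10000 gives 3.1414926536..., still wrong from the fourth decimal place
```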

    Machin-like Formulas

However, mathematicians are quite creative. Using the formulas

arctan x + arctan y = arctan((x + y)/(1 - xy)) and arctan x - arctan y = arctan((x - y)/(1 + xy))

it is not too difficult to obtain

π/4 = 4 arctan(1/5) - arctan(1/239)

as derived by John Machin in 1706 or

π/4 = 5 arctan(1/7) + 2 arctan(3/79)

as derived by Leonhard Euler in 1755. Of course, to evaluate the inverse tangent functions, both Machin and Euler would have had to use the Gregory-Leibniz series above, that is, the Maclaurin series for the arctangent. Using his equation, Machin obtained 100 digits of π, while Euler used his equation to obtain 20 digits of π in under one hour, a remarkable feat considering that this had to be done by hand!
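To see just how dramatic the speedup is, here is a small sketch of mine (using ordinary floating-point arithmetic rather than the hand calculations of Machin or Euler) that evaluates both formulas with only ten terms per arctangent:

```python
import math

def arctan_series(x, n_terms):
    """Maclaurin series for arctan x, truncated to n_terms terms."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(n_terms))

# Machin (1706) and Euler (1755), each with just 10 terms per arctangent
machin = 4 * (4 * arctan_series(1 / 5, 10) - arctan_series(1 / 239, 10))
euler = 4 * (5 * arctan_series(1 / 7, 10) + 2 * arctan_series(3 / 79, 10))

print(machin - math.pi)  # errors on the order of 1e-15: float rounding territory
print(euler - math.pi)
```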

Formulas such as the previous two are called Machin-like formulas and were very common in the era before computational devices like calculators and computers were invented. This approach culminated in the 620 digits of π calculated by D. F. Ferguson in 1946, which remains the record for a calculation of π done without using a computational device. Interestingly, in 1844 Zacharias Dase used a Machin-like formula to calculate π to 200 decimal places. And he did it all in his head! Now that’s not only some commitment, it’s a gargantuan feat of mental gymnastics!

    Convergence of Machin-like Series

We can see why the Machin-like formulas converge more quickly than the original Gregory-Leibniz series with x = 1. There, the powers of x are all equal to 1, which means that the convergence depends solely on the slowly increasing denominators of the alternating series.

However, the Machin-like formulas split x into smaller parts. And since

x^(2k+3) = x² × x^(2k+1)

each successive term of the arctangent series also picks up a factor of x², so the terms shrink like a geometric series with common ratio between 0 and 1. Since a geometric series converges reasonably rapidly, all we have to do is determine an appropriate split, such as those chosen by Machin and Euler. Naturally, the smaller the parts, the more rapid the convergence. Hence, even though a split as simple as

π/4 = arctan(1/2) + arctan(1/3)

is available, the Machin-like formulas use smaller fractions, like 1/5 and 1/239, as used by Machin, or 1/7 and 3/79, as used by Euler, to ensure much more rapid convergence.

    Other Infinities

In addition to infinite series, which involve the summation of infinitely many terms, mathematicians have also derived infinite products. In 1655, John Wallis derived the infinite product known as the Wallis product, given by

π/2 = (2/1) × (2/3) × (4/3) × (4/5) × (6/5) × (6/7) × …

which can also be written as

π/2 = Π_{n=1}^{∞} (2n × 2n)/((2n - 1)(2n + 1))

The derivation of this, however, requires an iterative formula. Hence, we will leave a further study of the Wallis product for the next post.
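Even without the derivation, we can watch the product converge. A minimal sketch (mine, for illustration):

```python
def wallis_pi(n_factors):
    """Approximate pi from the first n_factors factors of the Wallis product."""
    product = 1.0
    for n in range(1, n_factors + 1):
        product *= (2 * n) * (2 * n) / ((2 * n - 1) * (2 * n + 1))
    return 2 * product

for n in (10, 100, 1000, 10000):
    print(f"{n:>5} factors: {wallis_pi(n):.8f}")
# even 10000 factors give only about 3.14151, so the convergence is slow
```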

What we can understand from the study of π, compared with what we discovered with e, is that mathematicians are skilled at overcoming limitations by looking at things from a different perspective. There is a certain intuition and creativity involved in such innovative ideas, and this calls into question the unfortunately oft-repeated untruth that mathematics is a purely left-brained activity.

  • Definitionally Limited

    (Source: Live Science)

    Almost everyone who has studied beyond Grade 4 has probably heard of the mathematical constant π. Students are told that π represents the ratio of the circumference of a circle to its diameter. That is all well and good. But how do we calculate it? I mean, we could potentially use a piece of string to measure both the circumference and the diameter. Even assuming no error in determining one exact rotation around the circle (for the circumference) and diametrically opposite points (for the diameter), the experiment is limited by the measuring instruments.

Suppose, however, we have access to measuring instruments (necessarily hypothetical) that can measure to the accuracy of the Planck length (i.e. 1.616255×10⁻³⁵ m). We will still only get measurements with about 40 significant figures. Since the accuracy of a mathematical calculation cannot exceed the accuracy of the least accurate number in the calculation, we will get a result that has 40 significant figures for π. Actually, if we look at any real measuring instrument, we will see that there are hardly any that give more than 4 significant figures. This means that even obtaining 3.1416 as a 5-digit approximation for π is a pipedream if we take the measurement route.

    However, we are told quite early on that π is an irrational number. This means that, when we attempt to write π in a decimal format, the expression continues endlessly without any pattern of repetition. What this means is that the definition of π cannot be used to determine its value! In fact, we should have known this precisely from the definition. If C÷D is irrational, at least one of the numbers, C and D, must be irrational. Otherwise, C÷D will give us a rational number. So we can see that the definition of π that is based on measurement is an impractical way of obtaining the value of an irrational number like π.

    Now, if you refer back to our posts on e, especially the opening post, you will realize that this is exactly the opposite issue we faced there. The definition of e was able to give us results to as great an accuracy as we desire. While we had to do quite a bit of heavy mathematics before we obtained a usable infinite series, once we had obtained the desired infinite series, we could use it to obtain the value of e to any arbitrary level of accuracy.

    Not so with π since its definition in terms of measured lengths renders it limited precisely because no measuring instrument can be indefinitely accurate.

    Why the Symbol

Before we proceed, however, it is interesting to make a brief detour concerning the history of how the Greek letter π came to be used to denote this ratio. The earliest known use of π in a similar context was to designate the semiperimeter of the circle. With the Greek letters δ and ρ representing the diameter and radius of the circle, many mathematicians used the ratios

    to denote what we currently would denote as π and 2π respectively.

In his 1727 paper An Essay Explaining the Properties of Air, Euler used π to denote the ratio of the circumference to the radius. Just 9 years later, in his 1736 book Mechanica, Euler used π to denote the ratio of the circumference to the diameter. From then on he consistently used the latter designation and, because he corresponded prodigiously with other mathematicians, his notation was eventually adopted, though even as late as 1761 there was quite a bit of fluidity.

Despite the confusing history of the symbol, something utterly lacking in the case of e, π has been the more popular number, and more efforts have been made to calculate its digits than the digits of e. Obviously, this means that mathematicians have some other results, necessarily stemming from the definition of π, that allow the generation of different infinite series that can be used to calculate the value of π. The current record stands at 105 trillion digits, which required 75 days of dedicated computer time by a computer company. But if the definition of π is limited, how in the world do mathematicians calculate its digits?

    The Polygon Method

    The original impetus came from the study of polygons, specifically regular polygons. In the figure below we see the same circle repeatedly inscribed and circumscribed by regular polygons. On the left we have pentagons. In the middle are hexagons. And on the right are octagons.

Here it is possible to define the circle as having a radius of 1 unit. Then the distance from each vertex of the inscribed polygon to the center of the circle will also be 1 unit. And the perpendicular distance from the center of the circle to each side of the circumscribed polygon will also be 1 unit. Using this, the perimeters of both polygons can be determined. We can also easily see that the circumference of the circle lies between the perimeters of the two polygons. This allows us to place lower and upper bounds on the value of π. With polygons having more sides, the bounds become tighter and our accuracy improves.
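For readers who want to see the numbers, here is a short sketch of mine. Note that it leans on the library’s value of π to compute the angles, so it only illustrates how slowly the bounds tighten; it does not independently determine π. The half-perimeters of the inscribed and circumscribed n-gons around a unit circle are n·sin(π/n) and n·tan(π/n):

```python
import math

for n in (6, 96, 3072, 12288):
    lower = n * math.sin(math.pi / n)  # half-perimeter of inscribed n-gon
    upper = n * math.tan(math.pi / n)  # half-perimeter of circumscribed n-gon
    print(f"{n:>5} sides: {lower:.10f} < pi < {upper:.10f}")
```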

Using this method with 96-sided regular polygons, Archimedes, in the 3rd century BCE, obtained that

223/71 < π < 22/7

This gives us

3.1408… < π < 3.1429…

which, as most students in middle school will realize, pins π down to only 3 significant figures! 96 sides and 3 significant figures! I don’t know about you, but I think this is a really poor payoff. I applaud Archimedes for sticking with his 96-sided polygons.

Not to be outdone by the redoubtable Archimedes, Liu Hui of the Wei Kingdom used a 3,072-sided polygon and determined the value of π to be 3.1416. That’s a factor of 32 more sides yielding just 2 more digits!

Not having any other methods at the time, around 480 CE and using Liu Hui’s algorithm with a 12,288-sided polygon, Zu Chongzhi showed that π could be approximated by 355÷113, which gives 3.1415929…, accurate to 6 decimal places. In 499 CE, Aryabhata used the value 3.1416, though, unfortunately, he does not tell us the method he used or whether he was using someone else’s value.

Mathematicians continued using this polygon method for the next few centuries. In 1424 CE, Jamshīd al-Kāshī determined 9 sexagesimal digits of π, the equivalent of about 16 decimal digits, using a polygon with 3×2²⁸ sides! Finally, Christoph Grienberger obtained 38 digits in 1630 CE using a polygon with 10⁴⁰ sides!

I do not wish to detract from the achievements of any of these great mathematicians. The patience and perseverance they displayed are exemplary. Moreover, they achieved what they did in an age before calculators and computers. They had only their hands and minds to work with. Despite their achievements – or rather precisely because of them – it is evident that the polygon method is wonderfully accurate, but also woefully inefficient.

    The Path Forward

Apart from the slowness with which the polygon method converges to the value of π, it is evident that, when we have a finite number of terms, here represented by the finite number of sides of the polygon, the final result, regardless of how large the number of sides may be, is only ever a value trapped between the two bounds specified by the inscribed and circumscribed polygons. Due to this, mathematicians have devised what can only honestly be described as ‘workarounds’: methods that yield the value of π, but not through some direct application of the definition.

    These methods fall into two large branches. First, we have the methods that rely on some kind of iterative formula. Second, we have methods that yield an infinite series. We must ask ourselves how an iterative formula or infinite series can be generated for a number that is defined in terms of the ratio of two lengths. In the next two posts we will see just how this is done.

  • The Growing Frustration

    It’s frustrating. Yes, you heard me right. It’s frustrating. You may be wondering why I am saying this and in what context. So here goes. Just the other day, I was looking up some perspectives on a mathematics discussion forum. Quite naturally, there were many mathematics teachers in the forum. And quite naturally there was some discussion about questions asked by students.

    As a mathematics teacher, I find these questions quite illuminating. In fact, this is one of the reasons I love being a teacher. You never know what would spark the interest of some student enough to formulate a question. So even though what I teach might be much the same from year to year, because who I teach changes, it is never boring. Indeed, even when you teach the same group of students, as I did from 2016 to 2018 and then 2020 to 2023, you can see their questions change as they themselves mature and change.

    But it’s time to stop this nostalgic reverie and return to my frustration!

    Case I: Equivalent Fractions

    One question on the forum was asked by the teacher of a student in elementary school. I presume this is between Grade 1 and Grade 5. The student was learning fractions. When they came to the concept of equivalent fractions, the student asked the teacher what was different between the fractions 6/8 and 3/4. The teacher asked the question on the forum, requesting advice on how to respond to the student.

    The poor teacher! The backlash she faced was unconscionable. So many of her fellow mathematics teachers pounced on her to tell her not just that both fractions are the same but that any mathematics teacher worth his/her salt should know this. Some questioned her competence to teach the subject. I commented, taking her side, and posted a link to a previous post in which I demonstrate that equivalent fractions might have the same value but do not contain the same information.

    I faced the same issue in a recent class. A student asked me what the difference between various equivalent fractions was. That this student is currently in Grade 12 and is a reasonably astute student is an indictment against all his mathematics teachers.

    Anyway, the question arose when, in answer to a problem, the student gave me the following numbers:

The student had calculated accurately, and the numerical value of each of these fractions was spot on. However, I told him that, by reducing the fractions to their lowest forms, he had lost something crucial. It was then that he asked me the question about equivalent fractions.

    So I gave him another set of fractions:

    I asked him what links this set of fractions. He struggled, unsuccessfully, for a few minutes. Then I told him that I had generated this set of fractions in a very similar way to the one with which he had generated the first set. I gave him a few more minutes to try his luck, to no avail.

What was it that gave rise to these two sets of numbers? There is clearly no discernible pattern in either set, and they may seem to be just some random fractions generated by an addled mind. However, what if I told you that the first set, before the fractions were reduced to their lowest terms, was

Right away, you will recognize that there is a pattern. The numerator is composed of multiples of 5, while the denominator is composed of factorials, starting with the factorial of 3. If I asked you to give me a formula for generating the terms, you would quickly be able to tell me:

    Similarly, if, for the second set of numbers, I gave you

you would equally quickly be able to tell me

    You see, while the numbers in the first set and third set have the same corresponding values (and similarly with the second and fourth sets), something is lost by reducing the fractions to their lowest terms. And with this loss, we are rendered incapable of generating further terms in the sequence because the process of reducing the fractions destroys the pattern.
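This loss of information is easy to demonstrate in code. The sketch below uses a hypothetical rule of my own, 5n/(n + 2)!, chosen only because it matches the description above (multiples of 5 over factorials starting at 3!); Python’s Fraction type reduces to lowest terms automatically, and the pattern vanishes:

```python
from fractions import Fraction
from math import factorial

# Hypothetical generating rule for illustration: term n is 5n / (n + 2)!
terms = [(5 * n, factorial(n + 2)) for n in range(1, 6)]

print([f"{p}/{q}" for p, q in terms])           # the pattern is visible
print([str(Fraction(p, q)) for p, q in terms])  # reduced: pattern destroyed
```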

    Case II: Quadratic Expressions

    This shows up in other places too. For example, one of the students I am currently tutoring is studying quadratics at school. Now this student is exceptionally bright. When I was concluding a class, I asked her to think about the difference between

    and

    For those unfamiliar with the above, let me give an example.

The above is an example of the fact that any quadratic expression of the form

ax² + bx + c

can be expressed in the form

a(x - h)² + k

But what was the point of the re-expression? Why should we bother to do it? In the next class I asked my student if she had given this some thought. She said she had but that she could not figure out in what way

    is different from

    Since we had just started learning about quadratics, I did not fault her for not being able to answer the question. After all, that’s what learning is. But she told me that she had asked her teacher about the difference. And her teacher had told her that there was no difference because

I’m glad the class was online. Otherwise, my student would have felt my fury palpably. It is likely that she did sense it even over our Zoom call! The statement of the teacher is absolute rot and betrays a failure to look below the surface. For, while both expressions are indeed equivalent, the expression on the right tells me that this quadratic expression, if plotted as a function of x, has a vertex at the point (1, -13). You may wonder, “What’s the big deal? So what if the expression on the left doesn’t show me the coordinates of the vertex? I could do a few steps of algebra and get them.” Absolutely true.

    However, if you were given the functions

    you would not know that they all have the same vertex. However, if they were written as

the fact that the vertices are at (1, -13) is indisputable. But more than that, the second set of equations also tells us, from the signs of the coefficients 3 and -13, 5 and -13, and -6 and -13, that f(x) = 0 and g(x) = 0 will have real roots, while h(x) = 0 will have only complex roots.
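The conversion the teacher dismissed is mechanical: completing the square sends ax² + bx + c to a(x - h)² + k with h = -b/(2a) and k = c - b²/(4a). Here is a small sketch of mine; the three coefficient triples are reconstructions consistent with the shared vertex (1, -13) and the leading coefficients 3, 5, and -6 mentioned above:

```python
def vertex_form(a, b, c):
    """Return the vertex (h, k) of a*x**2 + b*x + c."""
    h = -b / (2 * a)
    k = c - b * b / (4 * a)
    return h, k

for a, b, c in [(3, -6, -10), (5, -10, -8), (-6, 12, -19)]:
    h, k = vertex_form(a, b, c)
    # a*k < 0 is equivalent to a positive discriminant, hence two real roots
    print(f"{a}x^2 + {b}x + {c}: vertex ({h:g}, {k:g}), real roots: {a * k < 0}")
```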

    Case III: Theory of Equations

    The same kind of issue shows up when we attempt to solve equations in general. Suppose, for example, you are asked to solve the equation

    How would you proceed? Of course, you can use some general results from the theory of equations and conclude

    But after this, what? Every student who has fumbled with these results will know that, if you try to solve these equations by some standard methods like elimination or substitution, you will revert to the original equation. So while the above four results give some valuable insights about sums and products of the roots of the equation, they do not bring us any closer to actually solving the equation.

When it comes to just simple quadratic equations, of course, the above method is powerful. After all, suppose we are given the equation

x² - 56x - 3705 = 0

A little back-of-the-napkin calculation will tell me that I need to find two numbers α and β such that

α + β = -56 and α × β = -3705
With a little effort, I can determine that the divisors of 3705 are 1, 3, 5, 13, 15, 19, 39, 57, 65, 95, 195, 247, 285, 741, 1235, and 3705. I can place them in pairs as follows: (1, 3705), (3, 1235), (5, 741), (13, 285), (15, 247), (19, 195), (39, 95), and (57, 65), where the numbers in each pair multiply to 3705. Now since I have -3705 and not 3705, I need two numbers of opposite signs. And since I have -56 and not 56, I need the number with the larger magnitude to be negative. This gives the pair of numbers to be 39 and -95. Now we can proceed to split the middle term and factorize, giving

x² - 56x - 3705 = x² + 39x - 95x - 3705 = x(x + 39) - 95(x + 39) = (x + 39)(x - 95) = 0
This allows us to see that we have the product of two numbers equalling zero, which must mean that at least one of the two numbers is zero. This allows us to conclude

x = -39 or x = 95
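The divisor search itself is easy to mechanize. A sketch of mine, assuming the equation above, x² - 56x - 3705 = 0:

```python
N, B = 3705, -56  # required product is -3705, required sum is -56

# divisor pairs (d, N // d) of 3705, found below the square root
pairs = [(d, N // d) for d in range(1, int(N ** 0.5) + 1) if N % d == 0]

for small, large in pairs:
    if small - large == B:  # one factor of each sign; larger magnitude negative
        print(f"split the middle term using {small} and {-large}")  # 39 and -95
```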
    While this method is relatively simple and straightforward in the case of quadratics, there is absolutely no way to use it when it comes to any generic polynomial equation of higher degree, like the quartic equation we saw earlier.

    However, the actual way of solving the quadratic, by having a product equal to zero, should allow us to see that, if we had a quartic equation of similar form, we would be able to solve it. Indeed, the earlier quartic equation can be written as

    This will allow us to conclude that

    In other words, though it is almost child’s play for a student in grade 9 or above to show that

    can be expanded to give

    it would take even a skilled mathematician quite a few iterations of trial and error before she was able to solve

    However, most high school students and perhaps even some middle school students would be able to solve

It is, therefore, quite frustrating when students come to me in Grade 11 and tell me that their teachers either found no difference between the expanded forms of expressions and the factorized forms or actually preferred the expanded forms. Granted, the parentheses seem intrusive at times, and one may certainly prefer the visual appeal of the expanded form to the factorized form. However, when it is clear that the expanded form is difficult, if not impossible, to solve, such a preference is like preferring opacity to lucidity.

    Case IV: Permutations and Combinations

I attribute this unfortunate preference to the noose-like grip that textbook publishers and calculator manufacturers have on the global high school examination boards and schools. This fourth case should highlight the point well. Suppose we were faced with the following problem: In how many ways can a committee of 7 people and another of 8 people be chosen out of 25 people, such that 2 members of each committee are to be designated as president and secretary, and such that no person belongs to both committees? How would we proceed?

We could first select the 7-member committee, which can be done in 25C7 ways. Now among these 7 members, we have to choose 1 to be president and 1 to be secretary, which can be done in 7P2 ways. We use a permutation because it matters whether a person is president or secretary. Now we have 18 people remaining. We can choose the second committee of 8 members in 18C8 ways and its president and secretary in 8P2 ways. So in all we have

25C7 × 7P2 × 18C8 × 8P2

Most textbooks will state that the answer is 49473074851200 or 4.947×10¹³ ways or something of the sort, where each individual term has been calculated and then the product evaluated. But this is an onerous task to do by hand. Hence, expecting the student to get either answer is expecting them to use a calculator. But perhaps the above number is too large for most of us to even process. Let’s deal with some smaller numbers.
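For what it is worth, the product can be checked in a few lines with Python’s standard library (math.comb and math.perm, available from Python 3.8):

```python
import math

ways = (math.comb(25, 7) * math.perm(7, 2)
        * math.comb(18, 8) * math.perm(8, 2))
print(ways)           # 49473074851200
print(f"{ways:.3e}")  # 4.947e+13
```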

Suppose I have 8 students. On one day, I wish to send a team of 3 of them for a debate. Quite naturally, this can be done in 8C3 = 56 ways. On another day, I need to send one student to the staff room to collect my books and another to the library to collect a journal. This can be done in 8C1 × 7C1 = 8 × 7 = 56 ways. Both tasks can be done in 56 ways. However, the number 56 obscures how it was obtained. If the textbook gives the answer as 56 for the first situation, but a student used the second method and obtained 56, she would think she has understood the concepts since she obtained the ‘correct’ numerical answer. However, her thinking is obviously flawed. If, instead, the textbook gave the answers as 8C3 and 8C1 × 7C1 respectively, any diligent student would be able to see where she had gone wrong.

When I have taught large classes of students, I have often given them a random number and asked them to give me a series of operations, using only a particular single-digit number, that would yield the random number. For example, suppose the random number is 23 and the chosen single-digit number is 4. Then I could have

4 × 4 + 4 + 4 - 4 ÷ 4 = 23 or 4! - 4 + 4 - 4 ÷ 4 = 23

There are infinitely many ways in which I can use just the number 4 and obtain 23. The challenge would be for each student to give a unique set of operations. And whenever I have done this exercise, never once have we needed to repeat a set of operations.

What I am trying to say is that expressions such as 8C1 × 7C1 or 8C3 or, as in the original problem,

25C7 × 7P2 × 18C8 × 8P2

may be numerically equal to 56, 56, and 49473074851200, but they contain far more information and give far more insight than the explicit numbers. However, if you are a calculator manufacturer, none of these insightful forms will help you sell your products!

    Escaping the Trap

So you (that is, the calculator manufacturer), the examination boards, and the textbook publishers form a nexus in which these explicit but uninsightful forms are given preference over the forms that can be used to yield mathematical insight and promote mathematical understanding.

Unfortunately, most teachers go ‘by the book’. And so over time they forget that 3/4 and 6/8 may be numerically equivalent, but have quite different meanings. They forget that 56 is not just a number but could have been obtained in many ways, most of which are incorrect. They toe the line and look at the bottom line, that is, the final answer, concluding, if the answers match, that the student must have correctly executed the right sequence of operations. But as we have seen, this need not be the case.

It is time that the examination boards took a firm stance and actually favored developing students’ mathematical understanding rather than their ability to use a tool that is obsolete. I mean, where in the world does anyone use a calculator these days? Some old-style retail outlets may use four-function calculators to add the prices of items on a list and maybe also to calculate any sales tax or GST. However, where does anyone use a calculator for trigonometric or logarithmic or exponential functions except in school, where it is forced onto the students? If we want students to be able to use some kind of technology to evaluate complicated functions, then we should teach them a programming language rather than make them use calculators. At least the former has real-world relevance for many jobs, while the latter has absolutely none!

  • Setting the Stage

    Smoke and Mirrors. Photo by theilr. (Source: Flickr)

    I have long been fascinated by patterns that exist in numerical sequences. Early exploration led me to the discovery of the Fibonacci series and the associated Golden Ratio. I also came across Pascal’s Triangle, which remains a source of intrigue. Hence, when I was introduced to the tests for divisibility, I devoured them with a voraciousness unmatched till then in my life. By some accounts, I stumbled upon the tests for divisibility by 3 and 9 all by myself, without the direct involvement of my teachers.

    And when my teachers ‘failed’ – or so it seemed to my young mind – to give me a test for divisibility by 7, I attempted to derive one such test. Of course, I failed at the task. Try as I might, I could come away with no universal test that would hold true for a number having any number of digits. What was it about 7?

    You see, we have tests for divisibility by 2 (number ends with 0, 2, 4, 6, or 8), 3 (sum of digits is divisible by 3), 4 (last 2 digits are divisible by 4), 5 (number ends in 0 or 5), 6 (number satisfies conditions for 2 and 3), 8 (last 3 digits are divisible by 8), 9 (sum of digits is divisible by 9), 10 (number ends with 0), 11 (difference of the sums of alternate digits is divisible by 11), and 12 (number satisfies conditions for 3 and 4). Why was 7 so stubborn?

    Moreover, why did these tests work out the way they did? Perhaps if we uncovered the reasons for which the tests for 2, 3, 4, 5, 6, 8, etc. are what they are, we can gain some insight into why 7 refuses to submit to our efforts.

In what follows, I will use the notation [wxyz] to denote a 4-digit number which has w in the thousands position, x in the hundreds position, y in the tens position, and z in the units position. Hence, the value of the number is

[wxyz] = 1000w + 100x + 10y + z

    Divisibility by 2

    Tessellation pattern. (Source: Mathematical Mysteries)

Naturally, the first test of divisibility is that by 2. In order to derive this test, we do the following:

[wxyz] = 1000w + 100x + 10y + z

Here it is clear that the sum of the first three terms is divisible by 2. Hence, the number [wxyz] as a whole will be divisible by 2 if and only if z is divisible by 2, which happens when z = 0, 2, 4, 6, or 8, giving the test for divisibility by 2.

    Divisibility by 3

When we reach divisibility by 3, we can do the following:

[wxyz] = (999 + 1)w + (99 + 1)x + (9 + 1)y + z

That is

[wxyz] = (999w + 99x + 9y) + (w + x + y + z)

Here it is clear that the first term is divisible by both 3 and 9. Hence, the number [wxyz] as a whole will be divisible by 3 or 9 if the second term, that is, the sum of the digits, is divisible by 3 or 9 respectively. This gives rise to the tests for 3 and 9.

    Divisibility by 4

For the test for divisibility by 4, we manipulate our number as follows:

[wxyz] = 100(10w + x) + (10y + z)

Here it is clear that the first term is divisible by 4. Hence, the number [wxyz] as a whole will be divisible by 4 if the second term, which is the number [yz] formed by the last two digits, is divisible by 4. This yields the test for 4.

    Divisibility by 5

The test for divisibility by 5 is easy to remember. But it is obtained by doing the following:

[wxyz] = 10(100w + 10x + y) + z

Here it is clear that the first term is divisible by 5. Hence, the number [wxyz] as a whole will be divisible by 5 if z is divisible by 5, which happens when z is either 0 or 5, giving us the test for 5.

    Divisibility by 6 and 12

    Since 6 is the product of 2 and 3, a number that satisfies the condition for 2 and 3 will be divisible by 6. Similarly, since 12 is the product of 3 and 4, a number that satisfies the condition for 3 and 4 will be divisible by 12.

    Depiction of the Mandelbrot Set. (Source: Boston University)

    Divisibility by 8

The test for divisibility by 8 is an extension of that derived for 4. We proceed as follows:

[wxyz] = 1000w + (100x + 10y + z)

Here it is clear that the first term is divisible by 8. Hence, the number [wxyz] as a whole will be divisible by 8 if the second term, which is the number [xyz] formed by the last three digits, is divisible by 8. Hence, we get the test for 8.

    Divisibility by 10

The test for divisibility by 10 follows the same logic as that for 5. In order to derive it, we do the following:

[wxyz] = 10(100w + 10x + y) + z

Here it is clear that the first term is divisible by 10. Hence, the number [wxyz] as a whole will be divisible by 10 if z is divisible by 10, which happens when z is 0, giving us the test for 10.

    Divisibility by 11

When we come to divisibility by 11, we realize that it is not as straightforward as the ones that came before. Indeed, while the test for divisibility by 11 was taught when I was in school, it has mostly been removed from the mathematics curriculum today. Nevertheless, the test for divisibility by 11 is quite instructive. So let us see what we can learn from it. Given the number [wxyz], we can do the following:

[wxyz] = (1001 - 1)w + (99 + 1)x + (11 - 1)y + z = (1001w + 99x + 11y) + (z - y + x - w)

Here it is clear that the first term is divisible by 11, since 1001, 99, and 11 are all multiples of 11. Hence, the number [wxyz] will be divisible by 11 if the second term, which is the alternating sum and difference of the digits, is divisible by 11, yielding the test for 11.

    Divisibility by 7

    When we come to the number 7, there is indeed a test for divisibility by 7. It goes as follows. Suppose we have the number […xyz]. Then if […xy] – 2z is divisible by 7, the original number is divisible by 7.

For example, consider the number 658. We can proceed as follows:

65 - 2 × 8 = 65 - 16 = 49

Since 49 is divisible by 7, it follows that 658 is divisible by 7. But how does this work? Consider the number […xyz]. We can proceed as follows:

[…xyz] = 10 × […xy] + z = 10([…xy] - 2z) + 21z

Here the 21z term is obviously divisible by 7. The term in parentheses is the one we are asked to test for divisibility by 7. Since 10 shares no factor with 7, if […xy] - 2z is divisible by 7, then the whole number is divisible by 7. But what if we had a much larger number? How would we proceed? Suppose, for example, that we were testing 548975. We would have to proceed as follows:

54897 - 2 × 5 = 54887
5488 - 2 × 7 = 5474
547 - 2 × 4 = 539
53 - 2 × 9 = 35

Since 35 is divisible by 7, we can conclude that 548975 is divisible by 7. This is a recursive method that is foolproof. It is also easy to remember. However, due to the recursion and the fact that most of us probably barely remember multiples of 7 that are 3 or more digits long, for a number with n digits, we will have to perform n - 2 recursions before reaching a 2-digit number that we can recognize as a multiple of 7 or not.

    Each recursion involves a multiplication and a subtraction. This results in 2n – 4 operations for the n – 2 recursions.
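The recursion is also easy to express in code. A minimal sketch of mine:

```python
def divisible_by_7(n):
    """Apply [...xy] - 2z repeatedly, then check the small remainder."""
    n = abs(n)
    while n >= 100:           # keep reducing until at most two digits remain
        n, z = divmod(n, 10)  # n becomes [...xy], z is the last digit
        n = abs(n - 2 * z)
    return n % 7 == 0

print(divisible_by_7(658))     # True: 65 - 16 = 49
print(divisible_by_7(548975))  # True: reduces to 35
print(divisible_by_7(100))     # False
```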

    Attempt at Divisibility by 7

We can see that in most of the tests, we generate 2 terms, the first of which is automatically divisible by the number under consideration. The test then reduces to testing whether the second term is divisible by that number. For the tests for 3 and 9 we saw that one less than a power of 10 is automatically divisible by 3 and 9. For the tests for 4 and 8 we saw that multiples of 100 and 1000 are automatically divisible by 4 and 8 respectively.

Even for the test for 7 we followed the same basic idea, indicated by the 21z term. However, while the other tests yielded the answer in a single step, the test for divisibility by 7 involved a recursive approach. Hence, for a 6-digit number, we had to perform 4 recursions to obtain a 2-digit number whose divisibility by 7 we could recognize from memory.

    The test for 11 was most illuminating. We saw that some powers of 10 (i.e. 10 and 1000) are 1 less than a multiple of 11 while other powers of 10 (i.e. 1 and 100) are 1 more than a multiple of 11. Could we use a similar approach to obtain another test for divisibility by 7?

To attempt that, let us see how we fare when we divide powers of 10 by 7. The table below shows the remainders.

Power of 10 : Remainder on division by 7
1 : 1
10 : 3
100 : 2
1,000 : 6 (i.e. -1)
10,000 : 4 (i.e. -3)
100,000 : 5 (i.e. -2)
1,000,000 : 1

After the 6th power of 10, that is 1,000,000, the pattern repeats. And this is because

999999 = 7 × 142857 and 1001 = 7 × 11 × 13

Hence, when we have a number of the form [xyzxyz], which equals 1001 × [xyz], it is necessarily divisible by 7. Since 1,000,000 is 1 more than a number that follows this pattern (i.e. 999999), the pattern of remainders will repeat after every 6th power of 10.

    Now we observe that the first three powers of 10 give remainders of 1, 3, and 2 respectively. The next three powers of 10 give negative remainders in the same order.

So if we had the number [uvwxyz], we can rewrite it as

[uvwxyz] = (100002u + 10003v + 1001w + 98x + 7y) + {(2x + 3y + z) - (2u + 3v + w)}

Here the first term is obviously divisible by 7, since 100002, 10003, 1001, 98, and 7 are all multiples of 7. Hence, the whole number [uvwxyz] is divisible by 7 if the term in braces is divisible by 7. Let’s put this to the test, labelling the term in braces as D.

Consider a 2-digit number (say 42). Here, u = v = w = x = 0, y = 4 and z = 2.

D = (2 × 0 + 3 × 4 + 2) - 0 = 14

    Since 14 is divisible by 7, 42 is divisible by 7.

Consider 91. Here, y = 9 and z = 1, with all other digits 0.

D = (3 × 9 + 1) - 0 = 28

    Since 28 is divisible by 7, 91 is divisible by 7.

Consider a 3-digit number (say 434). Here, u = v = w = 0, x = 4, y = 3 and z = 4.

D = (2 × 4 + 3 × 3 + 4) - 0 = 21

    Since 21 is divisible by 7, 434 is divisible by 7.

Consider a 4-digit number (say 2072). Here, u = v = 0, w = 2, x = 0, y = 7 and z = 2.

D = (2 × 0 + 3 × 7 + 2) - (2) = 23 - 2 = 21

    Since 21 is divisible by 7, 2072 is divisible by 7.

Consider a 5-digit number (say 24962). Here, u = 0, v = 2, w = 4, x = 9, y = 6 and z = 2.

D = (2 × 9 + 3 × 6 + 2) - (3 × 2 + 4) = 38 - 10 = 28

    Since 28 is divisible by 7, 24962 is divisible by 7.

Consider a 6-digit number (say 548975). Here, u = 5, v = 4, w = 8, x = 9, y = 7 and z = 5.

D = (2 × 9 + 3 × 7 + 5) - (2 × 5 + 3 × 4 + 8) = 44 - 30 = 14

    Since 14 is divisible by 7, 548975 is divisible by 7.

Now I grant that this process is not intuitive. There is no identifiable uniformity that would allow it to function as a test as easy and quick to use as the other tests. There is, of course, a pattern. In each block of 3 digits, the digits are multiplied by 1, 3, and 2 respectively, going from right to left. The block totals are then alternately added and subtracted, and the difference between the sums of alternate blocks is evaluated. The resulting number, which will in all likelihood be a 2-digit number, is tested for divisibility by 7, and the result tells us about the divisibility by 7 of the original number.

If we count the number of operations involved, we will see that each of the parenthesized blocks in D above involves 4 operations. For a number that has 3m digits, we will have m blocks with m - 1 operations linking them, yielding 5m - 1 operations in all. With the previous test, a number of similar length (i.e. 3m digits) would require 6m - 4 operations. It is clear, then, that the second test will be quicker than the first. However, the first test is more straightforward than the second.
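Here is the block test in code, a sketch of mine that applies the weights 1, 3, 2 with alternating signs on successive blocks of three digits:

```python
def d_value(n):
    """Weighted digit sum D for the divisibility-by-7 test."""
    weights = (1, 3, 2)  # applied right to left within each block of three
    total, sign = 0, 1
    for i, ch in enumerate(str(n)[::-1]):  # least significant digit first
        if i > 0 and i % 3 == 0:
            sign = -sign                   # a new block: flip the sign
        total += sign * weights[i % 3] * int(ch)
    return total

for n in (42, 91, 434, 2072, 24962, 548975):
    print(n, d_value(n), d_value(n) % 7 == 0)
```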

    Stepping Back

    However, despite the fact that the second test is not intuitive, such a process reveals the logic behind the other tests. Those tests are quite easy to implement, allowing us to take for granted their efficacy. And if my memory serves me right, none of my teachers ever really explained to me why those tests work, as I have done above. Hence, these algorithms come across to many students as some kind of mathematical wizardry, where reliable results are obtained by some numerical sleight of hand. Actually, I have a nagging feeling that many mathematics teachers themselves have no idea why the tests for 3, 4, 6, 8, and 9 work and most probably have no clue about the test for 11.

    If we take a step back to understand what happened when we derived this test for divisibility by 7 we can see that, when we divided powers of 10 by 7, we obtained 6 different remainders. We took powers of 10 because that is precisely what the place value system of writing numbers shows us as we have seen when we expanded [uvwxyz] and other numbers. Concerning the remainders, we can further observe that in fact we obtain all the possible remainders we can get when we divide by 7. This is actually revealed in the recurring pattern ‘142857’ when we divide any integer that is not a multiple of 7 by 7. The increasing powers of 10 only shift this pattern rightward to give us the remainders as follows

    Here the numbers with the same color indicate how the pattern is generated, that is, how one remainder morphs into the next as division is continued. Note especially the numbers in parentheses in the last column and compare them with the numbers in the second column of the previous table.

    Since the recurring pattern for division by 7 is a string of 6 numbers (142857), we get 6 different remainders, which, in the case for 7, are all the remainders we could obtain. Due to this our test involved 6 different numbers u, v, w, x, y, and z. Since the pattern of remainders 1, 3, 2, -1, -3, -2 conveniently formed 2 blocks of 3 numbers, we were able to reduce the test to a replicating pattern involving the sum of 3 numbers for which the difference is finally taken.

Now, when we obtain tests for divisibility, we need to focus only on the prime numbers, as seen in the very brief descriptions for divisibility by 6 and 12. Powers of 2 present a unique situation, as seen in the tests for 2, 4, and 8. However, if we focus on the prime numbers, the next one we encounter is 13. What would a test for divisibility by 13 look like? It pays to note that the recurring pattern for division by 13 is ‘076923’ and then to generate the following table:

Power of 10 : Remainder on division by 13
1 : 1
10 : 10 (i.e. -3)
100 : 9 (i.e. -4)
1,000 : 12 (i.e. -1)
10,000 : 3
100,000 : 4
1,000,000 : 1

Now it helps not to rush to any conclusions. While division by 7 and 13 produces patterns of length 6, this is not the case for all primes, as we should expect. For example, division by 17, the next prime, produces the full complement of 16 possible remainders. Anyway, getting back to 13, I encourage the reader to use the table above and the ones before it to obtain a test for divisibility by 13. Also do attempt a test similar to the recursive first one. Please do put your findings in the comments. And I will see you next week.
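If you would like a head start on the exercise, the remainder tables for any modulus can be generated mechanically. A sketch of mine; try p = 13 and p = 17:

```python
def remainder_table(p, max_power):
    """Remainders of powers of 10 modulo p, with their negative forms."""
    for k in range(max_power):
        r = pow(10, k, p)
        signed = r if r <= p // 2 else r - p
        print(f"10^{k} mod {p} = {r:>2}  (i.e. {signed:+d})")

remainder_table(13, 7)  # the cycle 1, -3, -4, -1, 3, 4 appears
```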

In the last three posts, we have been dealing with some issues related to counting. In Learning to Count, I introduced us to the area of combinatorics and specifically combinatorial arguments. In the second post, Abjuring Double Counting, we looked at the inclusion-exclusion principle that is used to ensure that every element is counted exactly once. In last week’s post, No Place at Home, the third in this series, we looked at the idea of derangements and used the inclusion-exclusion principle and a recursive formula to derive the expression for the number of derangements in a set of n items. To jog your memory, we proved that

Dn = n! Σ_{k=0}^{n} (-1)^k/k! = n! × (1 - 1/1! + 1/2! - 1/3! + … + (-1)^n/n!)

Revisiting an Old Friend

In the previous series of posts, we had dealt with the number e. In the second post in that series, Naturally Bounded, we had derived an infinite series expression for e as

e = 1 + 1/1! + 1/2! + 1/3! + …

    In the above equation, we make the following substitution:

    This gives us

    Please read the above carefully. Raising the equation to the power x we get

Using the same techniques as we used in Naturally Bounded, we will obtain

e^x = 1 + x/1! + x²/2! + x³/3! + …

If we substitute x = -1 in the above we will get

1/e = 1 - 1/1! + 1/2! - 1/3! + 1/4! - …

    A Surprising Result

Now consider the equation

Dn = n! Σ_{k=0}^{n} (-1)^k/k!

We can rewrite this as

Dn/n! = Σ_{k=0}^{n} (-1)^k/k!

Taking a limit on both sides as n approaches infinity, we will get

lim_{n→∞} Dn/n! = Σ_{k=0}^{∞} (-1)^k/k!

Expanding the summation on the right side we will obtain

lim_{n→∞} Dn/n! = 1 - 1/1! + 1/2! - 1/3! + 1/4! - …

But we just showed that the right side is related to e. Specifically, we can write

lim_{n→∞} Dn/n! = 1/e

Or taking reciprocals we get

lim_{n→∞} n!/Dn = e
In other words, as n increases, the ratio of the number of permutations of n items to the number of derangements of those n items approaches e. This is a surprising result, as there is no obvious reason why e should show up in this ratio.
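We can watch the ratio settle down numerically. A sketch of mine, which builds Dn with the recursion Dn = n × Dn-1 + (-1)^n that is recalled later in this post:

```python
import math

d = 1  # D_2 = 1
for n in range(3, 16):
    d = n * d + (-1) ** n  # D_n = n * D_(n-1) + (-1)^n
    print(n, math.factorial(n) / d)

print("e =", math.e)  # the ratios approach 2.718281828...
```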

    An Approximate Explanation

However, let us consider that we have a set of n items. If we now place the items in random positions, the probability that item 1 occupies its own spot will be 1/n. Hence, the probability that it does not occupy its own spot will be 1-1/n. The probability that item 2 occupies its own spot will be 1/(n-1), since one spot has already been occupied by item 1. Hence, the probability that item 2 is not in its own spot is 1-1/(n-1). For large values of n, 1-1/n and 1-1/(n-1) are nearly equal. For example, if n=1,000,000, the corresponding values are 0.999999 and 0.999998999998…, where the difference is approximately 0.000000000001. Since the probability values are so close to each other, we can approximate the probability that each of the n items does not occupy its own position as 1-1/n.

This leads us to the overall probability being

(1 - 1/n)^n

Taking the limit as n approaches infinity we get

lim_{n→∞} (1 - 1/n)^n = 1/e

So we can see that the reason e appears in our study of derangements is that, for large values of n, the placements of the items in positions not their own are almost independent of each other, with probabilities almost equal to each other. That we have used the word ‘almost’ twice is the reason why e only shows up in the limiting case and not for any finite value of n. Of course, as we place more items in different positions, the error involved in the approximation accumulates. Hence, the above reasoning cannot fully account for the appearance of e in the expressions about derangements.

    Hidden in Plain Sight

However, recall that, in the previous post, we had derived

Dn = n × Dn-1 + (-1)^n

If we divide throughout by n!, we will get

Dn/n! = Dn-1/(n-1)! + (-1)^n/n!

This reduces to

Dn/n! - Dn-1/(n-1)! = (-1)^n/n!

The first term on the left, Dn/n!, is the probability that a random permutation of n items is a derangement. The second term, Dn-1/(n-1)!, is the probability that a random permutation of n-1 items is a derangement. We can see that, as n gets larger, the right side gets closer to zero at a factorial rate. Hence, we can conclude that the two probabilities on the left side converge to a common value.

Further, we had derived the expression for the number of derangements as

Dn = n! × (1 - 1/1! + 1/2! - 1/3! + … + (-1)^n/n!)

which can be rearranged to give

Dn = n! × (1 - (1/1! - (1/2! - (1/3! - …))))

As observed in the previous post, the nesting of the parentheses yields the alternating signs of the terms. We also saw that the nesting of the parentheses was the result of the application of the inclusion-exclusion principle. In other words, the alternating signs are precisely the result of the way the derangements are counted. Hence, even though we might not have expected the appearance of e and might even be quite surprised by it, it was actually sitting in the expression all along.

    Deranged No More

    I have enjoyed writing these posts on counting, especially the last two on derangements. When I first encountered the idea, it was a novelty. Why would we ever want to place something where it does not belong? Being a person who does like order (though you may not suspect it if you looked at my desk!), the thought of placing something where it did not belong seemed only to be an instance of mathematicians playing around. But, as we saw in the previous post, derangements have a wide range of applications, some not quite intuitive. My interest was further piqued when I first saw how derangements are linked to e. The startling result hammered home the idea that mathematics, though often divided into incommensurate silos while we are taught it in school, is actually a unified body of knowledge.

    I hope you have also received some enjoyment and maybe some stirring of interest through these posts. I would really like to hear from you. Please do write a comment and let me know what you liked, what you disliked, and what you found confusing.

  • Recapitulation on Counting

    Russian Matryoshka dolls. (Source: Cambridge Mathematics)

    In the previous two posts, I introduced ideas related to counting. In Learning to Count we introduced the idea of combinatorial arguments, which allowed us to gain some insight on the process of selection. In Abjuring Double Counting we saw the recursive process used to ensure that, when we count members of sets that have common elements, we neither omit them nor double count them. I had signed off last time with the promise that we will look at something I called ‘derangements’. Of course, the dictionary states that derangement is “the state of being completely unable to think clearly or behave in a controlled way, especially because of mental illness.” Quite obviously, this is not what I’m referring to! So what am I referring to?

    Understanding Derangements

    In mathematics, specifically in the area of combinatorics, a derangement is a permutation or arrangement in which no item occupies a position it previously occupied. This, I guess, is still a little vague. So let us consider a couple of examples.

Suppose we have a set of n letters that are in the mailperson’s bag. Obviously, each of these letters has an address on the envelope. However, if the mailperson delivers them such that no letter is delivered to the address on its envelope, then we have a derangement.

Or suppose n patrons of a restaurant check their coats in with the maître d’. On their way out the patrons collect the coats, but none of them receives their own coat. In that case, we have a derangement.

    We could visualize a derangement as below

    Here, we have five items (A, B, C, D, and E). However, none of the items in green matches the corresponding item in red. Using the ordering ABCDE, the above derangement is BEACD. In what follows, I will use red lettering to indicate an item that is not deranged, that is, an item that is placed where it belongs. And I will use green lettering to indicate an item that is deranged, that is, an item that is not placed where it belongs.

We can begin counting the number of possible derangements with small set sizes. For example, with 2 items, A and B, there are 2 permutations:

AB and BA

As shown above, there is only one derangement: BA.

If we increase the set size to 3 items, A, B, and C, there will be 6 permutations:

ABC, ACB, BAC, BCA, CAB, CBA

As can be seen above, there are only 2 full derangements: BCA and CAB.

We can increase the set size to 4 items, A, B, C, and D, to get 24 permutations:

ABCD, ABDC, ACBD, ACDB, ADBC, ADCB, BACD, BADC, BCAD, BCDA, BDAC, BDCA, CABD, CADB, CBAD, CBDA, CDAB, CDBA, DABC, DACB, DBAC, DBCA, DCAB, DCBA

As seen above, there are 9 full derangements: BADC, BCDA, BDAC, CADB, CDAB, CDBA, DABC, DCAB, DCBA.

    Now, with a set of 5 items, we will have 5! or 120 different permutations. I don’t know about you, but I’m disinclined to even write out all these permutations, let alone scrutinize each to determine which items are deranged, then determine which permutations are indeed full derangements, and then to count the number of full derangements. If you have way too much time on hand for mind numbing tasks, please feel free to step away and undertake this not just for a set of 5 items, but a set of (say) 10 items, for which you can write down and scrutinize all the 3,628,800 permutations!

    You see, the factorial function grows at an alarming rate. What might even be manageable for a set of 5 items, with 120 permutations, becomes an onerous task for a set that is just double the size, with millions of permutations. Hence, there must be a better way of determining the number of derangements given the size of the set. And, thankfully, there is!
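And if you would rather make the computer do the mind-numbing part, a brute-force count is a few lines (a sketch of mine; it is only feasible for small n, which is precisely the point):

```python
from itertools import permutations

def count_derangements(n):
    """Count permutations of 0..n-1 that leave no item in its own place."""
    return sum(
        all(p[i] != i for i in range(n))
        for p in permutations(range(n))
    )

for n in range(2, 9):
    print(n, count_derangements(n))  # 1, 2, 9, 44, 265, 1854, 14833
```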

    We will proceed along two routes, both yielding the same result. But because the paths are different, we will discover different insights on the process. The first route we will adopt will use the inclusion-exclusion principle that we saw in the previous post. The second route will use a recursive formula.

    Derangements using the Inclusion-Exclusion Principle

    Given n items, there are n! different ways of permuting them. Out of these n! ways, there are (n-1)! arrangements in which item 1 is fixed in its own position. These (n-1)! arrangements include arrangements in which some other items may also be in their own positions. Similarly, there are (n-1)! arrangements in which item 2 is fixed in its own position. Again, these (n-1)! arrangements include arrangements in which some other items may also be in their own positions.

Of course, for each of the n items there are (n-1)! arrangements in which that item is fixed in its own position. Since there are n items in all, we are looking at

n × (n-1)! = n!/1!

    arrangements in which at least 1 item is fixed in its own position.

So now we have in consideration the initial n! less the arrangements in which 1 item is fixed. Hence under consideration now are

n! - n!/1!

    arrangements. It is clear, however, that there has been double counting for arrangements in which any pairs of items are both in their own positions. Hence, per the inclusion-exclusion principle, we now need to add all the arrangements in which pairs of items are fixed in their positions.

Suppose items 1 and 2 are fixed. There are (n-2)! such arrangements and these arrangements include arrangements in which items other than 1 and 2 are also fixed in their positions. There are, of course, nC2 different pairs of items and for each such pair there will be (n-2)! arrangements in which the items in the pair are fixed in their positions. So we are looking at

nC2 × (n-2)! = n!/2!

arrangements in which 2 or more items are fixed in their positions. As mentioned, due to double counting, these need to be added back. After this adjustment we will be considering

n! - n!/1! + n!/2!

    arrangements. Of course, now we have overcompensated for the arrangements in which 3 items are fixed in their positions. We will need to subtract these.

Now suppose items 1, 2, and 3 are fixed in their positions. There are (n-3)! such arrangements and these will include arrangements in which items other than 1, 2, and 3 are also fixed in their positions. There are nC3 different triplets of items and for each such triplet there will be (n-3)! arrangements in which the items of that triplet are fixed in their positions. So we are looking at

nC3 × (n-3)! = n!/3!

These arrangements will need to be subtracted from the running total that we have, yielding

n! - n!/1! + n!/2! - n!/3!

Continuing in this manner, we can see that the final number of derangements, after all the appropriate inclusion-exclusion, will be

Dn = n! - n!/1! + n!/2! - n!/3! + … + (-1)^n × n!/n!

This can be written in a more compact form as

Dn = n! Σ_{k=0}^{n} (-1)^k/k!

Using this formula we obtain the table below for values of n up to 10.

n : Dn
1 : 0
2 : 1
3 : 2
4 : 9
5 : 44
6 : 265
7 : 1854
8 : 14833
9 : 133496
10 : 1334961

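The table is reproduced by a direct translation of the formula into integer arithmetic (a sketch of mine):

```python
from math import factorial

def derangements(n):
    """D_n = n! * sum of (-1)^k / k! for k = 0..n, kept exact with integers."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))

for n in range(1, 11):
    print(n, derangements(n))
```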
    Derangements using a Recursive Formula

The second method for counting derangements is by using a recursive formula. We will first derive the recursive formula and then modify it to demonstrate that it is equivalent to the one we just derived. Once again, we start with a set of n items. Now consider item 1. In order for us to end up with a derangement, this item must be placed in one of positions 2 to n. Hence, there are n-1 ways of placing item 1. Since the number of derangements is a whole number, we can conclude that

Dn = (n-1) × X

    where X is another whole number that we have to determine. Till now, we have placed 1 of the n items in a position other than its own.

Now let us focus on the kth item, whose position has been taken by item 1. We can either place it in position 1 or in one of the remaining n-2 positions. If we place the kth item in position 1, then we still have to derange the remaining n-2 items, which can be done in Dn-2 ways. If we do not place the kth item in position 1, then there are still n-1 items, including the kth item, that have to be deranged, which can be done in Dn-1 ways. Since the kth item must go either in position 1 or in one of the other n-2 positions, the two cases are mutually exclusive and exhaustive. Hence, we can conclude that

Dn = (n-1) × [Dn-1 + Dn-2]

The expression in the square brackets should remind us of the binomial identity

nCr + nCr-1 = n+1Cr

    You will recall from two posts back that we had considered the matter of selecting r out of n+1 items and we focused on one item, concluding that the selection either does not contain this item, represented by the first term on the left, or does contain this item, represented by the second term on the left.

While deriving the recursive equation

Dn = (n-1) × [Dn-1 + Dn-2]

    we used the same idea of focusing on one item and deciding whether it still needs to be deranged, represented by the first term in the square brackets, or has already been deranged, represented by the second term.

This can be expanded and rearranged as follows:

Dn = (n-1)Dn-1 + (n-1)Dn-2
Dn - n × Dn-1 = -[Dn-1 - (n-1)Dn-2]

Suppose we designate

En = Dn - n × Dn-1

Then we can see that the earlier equation can be written as

En = -En-1

Using this recursively we get

En = -En-1 = En-2 = -En-3 = … = (-1)^(n-2) × E2

Substituting in terms of the derangement values, and noting that E2 = D2 - 2D1 = 1 - 0 = 1, we get

Dn - n × Dn-1 = (-1)^n

This can be rearranged as

Dn = n × Dn-1 + (-1)^n

We can now substitute values for n to obtain the corresponding values of Dn. This will give us

D2 = 1
D3 = 3 × 1 - 1 = 2
D4 = 4 × 2 + 1 = 9
D5 = 5 × 9 - 1 = 44
D6 = 6 × 44 + 1 = 265

Considering the expression for D6, we can see that it can be rewritten as

D6 = 6! × (1/2! - 1/3! + 1/4! - 1/5! + 1/6!)

And since 0! = 1! = 1, so that 1/0! - 1/1! = 0, we can further write it as

D6 = 6! × (1/0! - 1/1! + 1/2! - 1/3! + 1/4! - 1/5! + 1/6!)

Using sigma notation, this can be written as

D6 = 6! Σ_{k=0}^{6} (-1)^k/k!

This can be generalized to

Dn = n! Σ_{k=0}^{n} (-1)^k/k!

    which is exactly the same expression we obtained earlier using the inclusion-exclusion principle.

    Derangements and the Inclusion-Exclusion Principle

The formula for derangements can be further written as

Dn = n! - (n!/1! - (n!/2! - (n!/3! - (… - (n!/(n-1)! - n!/n!)…))))

where the nested parentheses show how the inclusion-exclusion principle works, much like the matryoshka dolls in the image at the start of this post. What this formula tells us is that we can calculate the number of derangements by taking the total number of permutations and subtracting the count for at least 1 fixed item, which is itself corrected by the count for at least 2 fixed items, which is in turn corrected by the count for at least 3 fixed items, and so on. Due to the fact that (-1)^k oscillates between 1 and -1, the nested parentheses accomplish the same final result as do the alternating signs in the original formula we obtained, namely

Dn = n! Σ_{k=0}^{n} (-1)^k/k!

    Why Derangements aren’t Deranged

    So far the discussion has been reasonably theoretical. And you might wonder if there is any reason for understanding derangements beyond the curiosity of mathematicians. I’ve tried to determine how the idea of derangements came about but have not been able to land on a definitive answer. It is quite likely, as in many cases in mathematics, that some mathematician just toyed with the idea and that the idea later found applications. So what are some of these applications?

In the workplace, if you wish to ensure everyone in a team has the opportunity to be responsible for every process, the idea of derangements is essential. In fact, if the team leader intentionally uses the idea of derangements, he/she can reallocate responsibilities far more efficiently than by just reassigning at random and checking whether a full derangement has been obtained.

    In coding, the idea of derangements is used to check for coding errors. In stochastic processes, derangements are used to determine the entropy of a system. In chemistry, the idea of derangements is used in the study of isomers, the number of which increases rapidly, especially for compounds with long carbon chains. In graph theory, the idea of derangements can be used to determine unique walks or paths in a graph. In optimization, derangements are used in situations similar to the traveling salesman problem or the assignment problem.

    In other words, while it is likely that the idea of derangements began with the toying of a mathematician, it has a wide range of applications today. But what surprises me is something that I am going to deal with in the next post. It is a result that is shocking to say the least and downright mysterious. Whatever could it be? And with that cliffhanger, I bid you au revoir for a week.

  • Recapitulation

    In the previous post, we had looked at the basic principles of making combinatorial arguments. We saw that these arguments, while deficient in that they do not yield any numerical values, give us insights about how we can count ways of doing a task. In the conclusion of that post, I introduced the idea of the possibility of double counting and suggested that combinatorial arguments allow us to avoid doing that.

    Double counting is a particularly pressing problem, especially in democracies. When we wish to determine the ‘will of the people’, we dare not forget that each citizen falls within a number of demographic categories. For example, I am a Christian male, who is currently self-employed and who has graduate degrees. If I give my opinion on some matter and am counted separately under each of these categories, then my opinion will be given undue weight and would not be fair to others.

    The question then arises, “How do we ensure that we count every item without counting any item more than once?” Anyone who is involved in any kind of statistical study needs to pay attention to this so that each data point is given the same weightage. So let me introduce you to the inclusion-exclusion principle.

    Basics with Two Categories

    We begin with the most basic set-up with all items classified into two categories (say A and B). Since items can have multiple classifications, we can represent this using the Venn diagram below:

    The Venn diagram has three regions labelled 1, 2, and 3. Set A includes regions 1 and 2, while set B includes regions 2 and 3. If we just add the number of items in sets A and B, we can see that the central region (2) will be counted twice. In other words, the items that belong in both sets will be double counted. To avoid this, we need to subtract the items that belong in the intersection of the two sets.

    This can be written as

    |A \cup B| = |A| + |B| - |A \cap B|

    The last term on the right side recognizes that some items have been ‘included’ more than they need to be and explicitly ‘excludes’ them from the count. This adjustment by excluding what was included too many times is what gives the principle its name.

    Counting with Three Categories

    Of course, we know that demographics includes many more than two categories. So how would the inclusion-exclusion principle change if we had three categories? The Venn diagram below indicates the situation.

    As shown in the diagram, there are now 7 distinct regions. Set A includes regions 1, 2, 4, and 5. Set B includes regions 2, 3, 4, and 6. And set C includes regions 4, 5, 6, and 7. If we add sets A, B, and C, we will count regions 2, 5, and 6 twice and region 4 thrice. We can proceed as before and exclude the intersection regions of two sets. But since region 4 belongs to A∩B, B∩C, and C∩A, we will have excluded region 4 thrice. This means that regions 1, 2, 3, 5, 6, and 7 are now counted once, but region 4 is not counted. Of course, it is a simple matter to add it back to include it. This is represented as

    |A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |B \cap C| - |C \cap A| + |A \cap B \cap C|

    The three subtracted terms we have encountered before in the case where we had only two categories, A and B, and had to compensate for overcounting the regions in the intersection of the two sets. The final, added term is a new term, introduced because, by compensating for the overcounting of the regions in the intersections of pairs of sets, we ended up undercounting the intersection of all three sets.

    Extension to Four Categories

    What would happen if there were more than 3 categories? Unfortunately, a 2 dimensional representation will no longer suffice. So let us label the regions as follows: 1: only A; 2: only B; 3: only C; 4: only D; 5: only A and B; 6: only A and C; 7: only A and D; 8: only B and C; 9: only B and D; 10: only C and D; 11: only A, B, and C; 12: only A, B, and D; 13: only A, C, and D; 14: only B, C, and D; and 15: A, B, C, and D.

    Now set A includes regions 1, 5, 6, 7, 11, 12, 13, and 15. Similarly set B includes regions 2, 5, 8, 9, 11, 12, 14, and 15. Set C includes regions 3, 6, 8, 10, 11, 13, 14, and 15. And D includes regions 4, 7, 9, 10, 12, 13, 14, and 15. Now, when we just add A, B, C, and D, the regions 5 to 10 will be counted twice, regions 11, 12, 13, and 14 will be counted thrice, and region 15 four times. The overcounting for regions 5 to 10 would need an ‘exclusion’ by subtracting the intersections of pairs of sets. However, this would mean that the regions that involve the intersection of exactly three sets (i.e. 11, 12, 13, and 14) would have been ‘excluded’ three times over, meaning that they would not be ‘included’ anymore. It would also mean that region 15 has been counted negative 2 times! This can be adjusted by now ‘including’ these four regions. However, this means that region 15 is now ‘included’ 2 times. Hence, we make a final adjustment and subtract the intersection of all four sets. This is represented as

    |A \cup B \cup C \cup D| = |A| + |B| + |C| + |D| - |A \cap B| - |A \cap C| - |A \cap D| - |B \cap C| - |B \cap D| - |C \cap D| + |A \cap B \cap C| + |A \cap B \cap D| + |A \cap C \cap D| + |B \cap C \cap D| - |A \cap B \cap C \cap D|

    Here the subtracted pairwise-intersection terms represent the adjustments for regions 5 to 10, the added triple-intersection terms the adjustments for regions 11 to 14, and the final subtracted term the adjustment for region 15.

    Generalization

    If we continue in this manner, adding more categories, we will be able to demonstrate that

    |A_1 \cup A_2 \cup \cdots \cup A_n| = \sum_{i} |A_i| - \sum_{i<j} |A_i \cap A_j| + \sum_{i<j<k} |A_i \cap A_j \cap A_k| - \cdots + (-1)^{n+1} |A_1 \cap A_2 \cap \cdots \cap A_n|

    This can be written in a more compact form as

    \left|\bigcup_{i=1}^{n} A_i\right| = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \left|A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}\right|
    I recommend you take a close look at how the last two equations are actually the same as we will need such ‘parsing’ of formulas in future posts!
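    As a sanity check, here is a short Python sketch of my own that tests the compact form against a brute-force union count on a few randomly generated sets:

        from itertools import combinations
        import random

        def union_size_direct(sets):
            """Count the union by actually forming it."""
            u = set()
            for s in sets:
                u |= s
            return len(u)

        def union_size_inclusion_exclusion(sets):
            """Sum over all non-empty sub-collections with alternating signs."""
            total = 0
            for k in range(1, len(sets) + 1):
                sign = (-1) ** (k + 1)
                for combo in combinations(sets, k):
                    total += sign * len(set.intersection(*combo))
            return total

        random.seed(1)
        sets = [set(random.sample(range(30), 12)) for _ in range(4)]
        assert union_size_direct(sets) == union_size_inclusion_exclusion(sets)
        print(union_size_direct(sets))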

    What we have seen here is that we can recursively ‘exclude’ intersections with greater numbers of sets and in so doing avoid the twin traps of overcounting and undercounting. In the next post we will look at a way of counting an interesting arrangement of objects known as a ‘derangement’. Hope you stay sane till then!

  • Background

    I have been teaching high school students for a while now. While I have taught physics and theory of knowledge, my main focus has been mathematics. One of the more fascinating areas of the high school mathematics syllabus is probability and statistics. Now don’t get me wrong. I know that, in an earlier post, I bemoaned the early inclusion of statistics, especially when it is done only to justify the use of a calculator. I have not changed my position on that. I still believe that many of our syllabuses, especially those that come from the Western hemisphere, are artificially bloated with topics that require the use of calculators, a move that reduces the priority of conceptual understanding in favor of brute force methods. 

    Indeed, it is precisely a method used in the area of probability and statistics that demonstrates how there is elegance in one approach to problem solving that is completely absent in another. The method I am referring to is commonly known as Combinatorial Proofs or Combinatorial Arguments.

    The Meaning of ‘Combinatorics’

    Now, many of you may be wondering what in the world ‘combinatorial’ means. Combinatorics is the branch of mathematics that deals with the grouping of elements of finite sets of objects. Specifically, combinatorics is concerned with the number of such groups that can be formed rather than the groups per se. So, for example, if we have a set

    S = \{1, 2, 3\}

    then the subsets with exactly 2 elements are

    \{1, 2\}, \{1, 3\}, \{2, 3\}
    While I have considered the elements of the set to be the first three natural numbers, there is no reason we cannot extend this idea to any set with 3 distinct elements. Hence, in general we can say that, given a set containing 3 distinct elements, there are 3 ways to form a subset with exactly 2 elements. Another way of stating this is that the number of ways of selecting 2 out of 3 distinct objects is 3.
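    If you would like to see this enumeration done mechanically, here is a minimal Python sketch using the standard library (the variable names are mine, for illustration):

        from itertools import combinations

        s = (1, 2, 3)
        two_element_subsets = list(combinations(s, 2))
        print(two_element_subsets)       # [(1, 2), (1, 3), (2, 3)]
        print(len(two_element_subsets))  # 3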

    Backtracking a Bit

    We have seen this idea before. In a post on fractions, I introduced the idea that the number of ways of selecting r out of n distinct items is

    {}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}
    We revisited this idea in a post on the number e, in which we obtained the lower and upper bounds for e.

    Using the above formula, we can prove some crucial results. For example, let us prove that

    {}^{n}C_{r} = {}^{n}C_{n-r}

    We can proceed as follows:

    {}^{n}C_{n-r} = \frac{n!}{(n-r)!\,\left(n-(n-r)\right)!} = \frac{n!}{(n-r)!\,r!} = {}^{n}C_{r}

    Or let us prove that

    {}^{n}C_{r} + {}^{n}C_{r-1} = {}^{n+1}C_{r}

    We can proceed as follows

    {}^{n}C_{r} + {}^{n}C_{r-1} = \frac{n!}{r!\,(n-r)!} + \frac{n!}{(r-1)!\,(n-r+1)!} = \frac{n!}{(r-1)!\,(n-r)!}\left[\frac{1}{r} + \frac{1}{n-r+1}\right] = \frac{n!}{(r-1)!\,(n-r)!} \cdot \frac{n+1}{r\,(n-r+1)} = \frac{(n+1)!}{r!\,(n+1-r)!} = {}^{n+1}C_{r}
    Here, the proof depends, in large part, on our ability to recognize that the expression n!/((r-1)!(n-r)!) is common to both terms, allowing us to consider it a factor of the entire expression. However, this is not necessarily obvious. And unless one has developed some skill with manipulating binomial coefficients, it is unlikely one would be able to identify that such an expression actually is common to both terms.

    The two ‘equations’ we have proved are known as ‘binomial’ identities because they involve the coefficients that arise from the binomial expansion. Some of them can be quite difficult to prove using the brute force method of substituting the formula for nCr and then manipulating the various expressions. However, the method of using combinatorial arguments allows us to make logical arguments to prove the result. This actually gives us insight into why the identity exists, which the brute force method cannot do – or at least cannot easily do.

    Selection Equals Unselection

    Suppose we have to prove that

    {}^{n}C_{r} = {}^{n}C_{n-r}
    using combinatorial arguments. 

    We can argue as follows. Suppose we have n items and we have to select r of them for some purpose. This obviously can be done in nCr ways. However, by selecting the r items for the purpose, we automatically create a selection of the remaining n-r items not for that purpose – that is, for some other purpose. Selecting n-r items for some other purpose can obviously be done in nCn-r ways. Since the selection of r items automatically creates a selection of the remaining n-r items, the number of ways must be equal. Hence,

    {}^{n}C_{r} = {}^{n}C_{n-r}
    What this means is that selecting r out of n things or n-r items out of n things can be done in the same number of ways. Alternatively, we could say that the number of r member committees that can be formed from n candidates is the same as the number of n-r member committees that can be formed from the same n candidates.

    Included or Excluded

    Suppose we have to prove that

    {}^{n}C_{r} + {}^{n}C_{r-1} = {}^{n+1}C_{r}
    using combinatorial arguments.

    We can argue as follows. Suppose we have n+1 items and we have to select r of them. This can obviously be done in n+1Cr  ways. Now let us focus on any one of the n+1 items and let us call it item X. Now there are two cases. Either X is in our final selection or X is not. If X is not, then we still have to select r items from the remaining n items, which can obviously be done in nCr  ways. If X is in the final selection, then we have to select r-1 items from the remaining n items, which can obviously be done in nCr-1  ways.

    Since in both cases we end up selecting r items out of n+1 items it follows that

    {}^{n}C_{r} + {}^{n}C_{r-1} = {}^{n+1}C_{r}
    In other words, the number of ways of selecting r items out of n+1 items equals the sum of the number of ways of selecting r items out of n items, excluding one particular item, and the number of ways of selecting r-1 items out of n items, including the same item.

    Committees and Sub-committees

    Suppose we have to prove that

    {}^{n}C_{r} \times {}^{r}C_{k} = {}^{n}C_{k} \times {}^{n-k}C_{r-k}
    using combinatorial arguments.

    We could argue as follows. Suppose we have n items from which we have to select r items. This can be done in nCr  ways. Now suppose we select k items from the r that we originally selected. This can be done in rCk  ways. The two selections can obviously be done in nCr×rCk  ways.

    However, we can make these selections in a different way. Let us first select the k items from the original n items. This can be done in nCk  ways. Now we have to select r-k items from the remaining n-k items, which can be done in n-kCr-k  ways. The two selections can obviously be done in nCk×n-kCr-k  ways.

    Since in both cases we end up with r-k items for one purpose and k items for a second purpose, the number of ways must be the same. Hence,

    {}^{n}C_{r} \times {}^{r}C_{k} = {}^{n}C_{k} \times {}^{n-k}C_{r-k}
    In other words, the number of ways of forming an r-member committee with a k-member sub-committee is equal to the number of ways of first forming the k-member sub-committee and then selecting the r-k committee members who are not part of the sub-committee from the remaining unselected n-k candidates.
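    For those who prefer numerical evidence alongside the combinatorial arguments, the following Python sketch of mine spot-checks all three identities over a small range of values (math.comb computes nCr):

        from math import comb

        for n in range(12):
            for r in range(n + 1):
                # Selection equals unselection: nCr = nC(n-r)
                assert comb(n, r) == comb(n, n - r)
                # Committees and sub-committees: nCr * rCk = nCk * (n-k)C(r-k)
                for k in range(r + 1):
                    assert comb(n, r) * comb(r, k) == comb(n, k) * comb(n - k, r - k)
            for r in range(1, n + 1):
                # Included or excluded (Pascal's rule): nCr + nC(r-1) = (n+1)Cr
                assert comb(n, r) + comb(n, r - 1) == comb(n + 1, r)
        print("All three identities hold on the tested range.")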

    Wrapping Up

    Now it is true that the combinatorial arguments do not give us any idea about how many ways the selection can be done. In that sense, they are deficient. However, the combinatorial arguments give us a clear understanding of how we can go about counting some specified subsets of a given set. The insights gained by the combinatorial arguments allow us to understand how best to enumerate sets so as to avoid double-counting and omission. This is not possible with a formula because much is lost when we express things in a compact formula. The frugality and austerity of mathematics itself causes things to become opaque. By using logical proofs, such as those involved in the combinatorial arguments, we can understand how the results we memorize come about. Unfortunately, because we all too often want a numerical answer to a question, we do not spend much time teaching and learning such elegant forms of argumentation. But by doing so we miss out on yet another factor that gives rise to beauty in mathematics.

  • Recapitulation

    We have reached our final post in this series on the number e. In the first post of the series, we introduced e and looked at the reasons for which it is the base of the natural logarithm and the exponential function. In the second post of the series, we used various techniques to determine that e lies between 2 and 3, a fact that will be important in today’s post. In the previous post, we considered three examples of infinite series for e to show that more advanced mathematics does not guarantee quicker convergence to the value of e. In today’s post, we will demonstrate that e is an irrational number.

    Reductio ad Absurdum using Bounds

    Since e lies between 2 and 3, e cannot be an integer. We will now use reductio ad absurdum, a method we introduced in another post. So we will begin by assuming that e is a rational number. Hence, suppose

    e = \frac{p}{q}, \quad p, q \in \mathbb{N}
    Since e is not an integer, it follows that q cannot be 1. Hence, q>1. Now, we also know that

    e = \sum_{n=0}^{\infty} \frac{1}{n!} = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots
    Combining the two we get

    \frac{p}{q} = 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{q!} + \frac{1}{(q+1)!} + \frac{1}{(q+2)!} + \cdots
    If we multiply this equation by q! we will get

    p\,(q-1)! = q!\left(1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{q!} + \frac{1}{(q+1)!} + \frac{1}{(q+2)!} + \cdots\right)
    Distributing the q! into the parentheses on the right side we get

    p\,(q-1)! = q! + \frac{q!}{1!} + \frac{q!}{2!} + \cdots + \frac{q!}{q!} + \frac{q!}{(q+1)!} + \frac{q!}{(q+2)!} + \cdots
    Now the left side, being the product of natural numbers, is a natural number. On the right side, the terms up to and including q!/q! are all integers since the numerator q! is necessarily a multiple of each of the denominators. This means that, for the equation to hold, the remaining terms on the right must also add up to an integer. Let us designate this sum as R. Then

    R = \frac{q!}{(q+1)!} + \frac{q!}{(q+2)!} + \frac{q!}{(q+3)!} + \cdots = \frac{1}{q+1} + \frac{1}{(q+1)(q+2)} + \frac{1}{(q+1)(q+2)(q+3)} + \cdots
    Now, since q > 1, it follows that

    q+1 > 2, \quad (q+1)(q+2) > 2^2, \quad (q+1)(q+2)(q+3) > 2^3, \quad \ldots
    Taking the reciprocals of each term we get

    \frac{1}{q+1} < \frac{1}{2}, \quad \frac{1}{(q+1)(q+2)} < \frac{1}{2^2}, \quad \frac{1}{(q+1)(q+2)(q+3)} < \frac{1}{2^3}, \quad \ldots
    This means that

    R < \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots
    Hence, we can conclude that

    R < \sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^{k}
    However, we have seen, when we obtained the lower and upper bounds for e, that the infinite series

    1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots = 2
    Hence, without the leading 1, the sum must be 1. This allows us to conclude that R < 1. However, since all terms in R involve the products and quotients of positive integers, it must follow that R is positive. However, there is no integer between 0 and 1, which means that R cannot be an integer. This means that the trailing terms in the equation

    p\,(q-1)! = q! + \frac{q!}{1!} + \frac{q!}{2!} + \cdots + \frac{q!}{q!} + \frac{q!}{(q+1)!} + \frac{q!}{(q+2)!} + \cdots
    do not give an integral sum, which is something that was required for the equation to hold. Since this requirement arose from our assumption that e is rational, we have reached something absurd, namely that the equation can hold only if we can find an integer between 0 and 1. This concludes the proof by reductio ad absurdum.
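    To make the bound on R concrete, here is a small Python sketch of my own that computes a truncation of R exactly, as a fraction, and confirms that it lies strictly between 0 and 1 for a few values of q:

        from fractions import Fraction
        from math import factorial

        def remainder_R(q, terms=40):
            """Truncation of R = q!/(q+1)! + q!/(q+2)! + ... as an exact fraction.

            Forty terms are plenty here: each neglected term is smaller than 1/41!."""
            return sum(Fraction(factorial(q), factorial(q + j)) for j in range(1, terms + 1))

        for q in (2, 3, 5, 8):
            R = remainder_R(q)
            assert 0 < R < 1  # strictly between 0 and 1, so R cannot be an integer
            print(q, float(R))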

    Reductio ad Absurdum using Partial Fractions

    I would like to contribute another proof using reductio ad absurdum that I have not seen anywhere else. It uses the concepts surrounding partial fractions. While partial fractions are normally used in the context of algebra, I’d like to explore their implications in the context of arithmetic.

    Now, it is clear that

    \frac{1}{4} = \frac{1}{6} + \frac{1}{12}

    What sets apart the expression on the left from the expressions on the right is that the one on the left only has a power of a prime number in the denominator, since 4 = 2². This is not true in the case of the expressions on the right, since 6 and 12 do not satisfy the criterion.

    So suppose we restrict the splitting of a given fraction into its arithmetic partial fractions where the denominators consist only of prime numbers or the powers of prime numbers. Suppose then that we have a number N such that

    N = p_1^{a_1} \, p_2^{a_2} \cdots p_m^{a_m}

    where

    p_1, p_2, \ldots, p_m

    are distinct prime numbers and

    a_1, a_2, \ldots, a_m \in \mathbb{N}
    Now any divisor of N can have a particular prime pk appear from 0 to ak times. For example, the divisors of 12 are 1, 2, 3, 4, 6, and 12. The prime factorization of 12 is

    12 = 2^2 \times 3

    Hence, the prime factorization of any divisor of 12 can have 2⁰, as in 1 and 3, or 2¹, as in 2 and 6, or 2², as in 4 and 12. Or it could have 3⁰, as in 1, 2, and 4, or 3¹, as in 3, 6, and 12. In other words, any prime pk, which appears with power ak in the prime factorization of N, can appear with a power from 0 to ak in divisors of N. Hence, there are ak ways in which pk can appear in divisors of N, excluding 1. Hence, the number of divisors of N that are prime numbers or powers of prime numbers is

    P = a_1 + a_2 + \cdots + a_m
    This means that, if we are able to write

    e = \frac{p}{N}
    then splitting it into its arithmetic partial fractions would involve at most P terms. However, P is a finite natural number.
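    Here is a short Python sketch, with helper names of my own invention, that lists the prime-power divisors for the example N = 12 and confirms that their count equals a1 + a2 + ⋯ + am:

        def prime_factorization(n):
            """Return {prime: exponent} for n >= 2, by trial division."""
            factors = {}
            d = 2
            while d * d <= n:
                while n % d == 0:
                    factors[d] = factors.get(d, 0) + 1
                    n //= d
                d += 1
            if n > 1:
                factors[n] = factors.get(n, 0) + 1
            return factors

        def prime_power_divisors(n):
            """All divisors of n that are a prime or a power of a single prime."""
            return [p ** e for p, a in prime_factorization(n).items()
                    for e in range(1, a + 1)]

        N = 12
        factors = prime_factorization(N)    # {2: 2, 3: 1}
        divisors = prime_power_divisors(N)  # [2, 4, 3]
        assert len(divisors) == sum(factors.values())  # P = a1 + a2 + ... + am
        print(sorted(divisors), len(divisors))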

    Yet, from the expression

    e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots

    it is clear that e is expressed as the sum of an infinite series. But in the denominators we have all the natural numbers without exception. And we know that there are infinitely many primes in the set of natural numbers. This means that the infinite series for e includes terms that involve infinitely many prime numbers. For example, the term that contains pk! in the denominator, when split into its arithmetic partial fractions, will contain a finite number of terms that involve prime numbers and powers of prime numbers that are at most pk.

    However, since there is no limit to pk, there is no way to combine all the terms into a finite series, such as is required by the arithmetic partial fraction expansion of

    \frac{p}{N}
    Now, we know that

    1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 2
    The inherent pattern in the denominators allows us a concise way of combining the infinitely many terms to give the resulting sum as 2. In general, if we have an infinite set of fractions where the denominators form a discernible pattern, we might be able to combine the terms to give us a determined sum. If the primes themselves appeared according to some discernible pattern, then too we would possibly be able to combine the infinitely many terms to yield a determined sum. Since the primes do not appear in any discernible pattern, such a combination is impossible, even in theory.

    In other words, we have reached a contradiction: infinitely many fractions with distinct denominators, involving infinitely many unpatterned primes and their powers, would have to combine to produce the sum of a finite number of fractions, and therefore a rational number. So once again, by reductio ad absurdum the result is proved and we conclude that e is irrational.

    Wrapping Up

    We have devoted four posts to the study of the number e. The impetus for this study came from the opening post of this blog in which I introduced Euler’s Identity, stating that it was an example of beauty in mathematics. During the course of these four posts, we have looked at the definition of e. We also saw how e is related to the ideas of compound interest and, therefore, to the ideas of growth and decay that occur in natural systems. We were able, then, to see why e is the ‘natural’ base for logarithms and exponential functions. 

    Then we obtained a lower and an upper bound for the value of e. Along the way we introduced some key ideas, notable among which were the limiting process, infinite geometric sequences, and the binomial expansion. This allowed us to obtain an infinite series for e

    e = \sum_{n=0}^{\infty} \frac{1}{n!}
    Following this we explored three infinite series for e to determine the speed at which they converge to the value of e. We saw that more advanced mathematics does not guarantee speedier results. This allowed us to conclude that we need to be wary when claims are made that something is based on more rigorous mathematics. 

    Finally, we proved that e is irrational. We did this in two ways, both involving reductio ad absurdum, a mathematical strategy for proofs that we had introduced earlier. I introduced a proof that I have not seen elsewhere, though this may just be an indication of my ignorance rather than my ingenuity.

    This does not exhaust the study of e. By no means! But this does conclude our exploration at this stage. It has been fruitful and insightful for me. I hope you can say the same.

  • Recapitulation

    I’m back with another post after taking a break for a week. The last week of May was particularly busy as I was teaching a class called Introduction to the New Testament. During the week I also gave a lecture on Causes and Symptoms of Religious Discord, which you can find here if you’re interested. Anyway, back to Acutely Obtuse, we are in the middle of a four part series on the number e. In the last but one post, I started this series. We then saw why e is the base of the natural logarithm and the base of the exponential function. We also saw the relation between e and compound interest. In the previous post, we determined that e lies between 2 and 3. In the next post we will show that e is irrational. Given that e is irrational, it follows that it cannot be expressed as a ratio of two integers. This also means that it cannot be expressed as the sum of a finite number of rational numbers because the sum of rational numbers is necessarily rational.

    Introduction to Infinite Series

    Of course, this leaves open the possibility that e could be represented as the sum of an infinite series, that is, a series that does not have a finite number of terms. In the previous post we looked at the limiting process and concluded that

    e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots

    Using sigma notation, we can write

    e = \sum_{n=0}^{\infty} \frac{1}{n!}
    This means that e can indeed be expressed as the sum of an infinite series. However, the above series is not the only one that has been shown to converge to the value of e. In this post, I wish to consider the above series and two others that have been derived for determining the value of e. However, while we just about managed to derive the first series without the use of any calculus, most of the series that have been derived require the use of calculus. Since I do not wish to introduce any advanced calculus at this stage in the blog, I will be considering one series that is derived from the first one itself and accepting, but not deriving, a second that has been derived using some advanced calculus.

    But first we must ask ourselves why additional series are needed. If we already have a workable series, why should we bother to obtain more series? Mathematicians, after all, tend to be quite frugal, rarely doing more than is required. So why would they bother with more series representations for e when they already have one that one can use to obtain the value of e to any level of accuracy that might be needed?

    Rationale for Other Infinite Series

    To answer this question, we need to look at the values obtained by using the original series. Here are the first twenty values we obtain from the original series.

    The red digits indicate where the current approximation for e begins to differ from the previous approximation. As can be seen, this series gives on average one additional digit of e for each additional term. Of course, for most purposes 10 or 12 digits of e is more than sufficient. In the table above, we have determined the value of e accurately to 18 digits. Why then would we need to determine more digits?
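    If you would like to reproduce such a table yourself, the following Python sketch, my own illustration rather than the code behind the table above, prints the first twenty partial sums using 30-digit decimal arithmetic:

        from decimal import Decimal, getcontext

        getcontext().prec = 30  # work with 30 significant digits

        term = Decimal(1)   # 1/0!
        total = Decimal(0)
        for n in range(20):
            if n > 0:
                term /= n   # turns 1/(n-1)! into 1/n!
            total += term
            print(f"{n:2d}  {total}")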

    Computations of the digits of irrational numbers like π and e are used to measure the computational power of supercomputers. As we include more terms of an infinite series, two things must be done. First, the existing approximation of e must be kept in memory while the latest term is calculated. Second, the existing approximation and the latest term need to be added to obtain the new approximation. However, as we go down the terms in the series, each term becomes smaller and smaller, requiring ever more digits of precision, and hence memory, to store. Since pushing things to and pulling things from memory are time-intensive processes, at least from the perspective of a supercomputer, each new approximation tests the limits of the computational system to a greater extent.

    However, once we have obtained the first (say) 18 digits of e, as we have above, it is pointless to use the same process to obtain the same results. Mathematicians hate repeating things because they know that they aren’t going to get new results. Hence, using the same series to obtain 1 billion digits from a set of computers to see which one is the fastest would be fine once. After that no new information is being generated.

    However, mathematicians think, “If we are going to use infinite series to test the power of computational systems, why not generate series that give more digits per term?” In other words, if we can generate a new series that gives 2 additional digits per term, then with the same amount of time and same number of terms, we can generate 2 billion digits instead of 1 billion. And if someone can produce a series that gives 10 digits per term, then for the same price we can generate 10 billion digits. For people involved in the computational side of things, it does not matter which series is used. Any series can be used to arbitrate between competing computational systems. However, for mathematicians, the goal is to obtain new information. Hence, any series that promises more digits per term is to be valued for the additional information it can provide.

    So mathematicians resort to all sorts of manipulations to produce new series that could provide more digits per term. The two that I have selected do precisely that.

    Combining Terms

    The approach for the second series recognizes that we can pair up terms and not affect the sum. Now, in general, consecutive terms can be written as

    \frac{1}{n!} + \frac{1}{(n+1)!}

    This sum simplifies to

    \frac{1}{n!} + \frac{1}{(n+1)!} = \frac{n+1}{(n+1)!} + \frac{1}{(n+1)!} = \frac{n+2}{(n+1)!}
    However, since we have grouped the terms in pairs, n cannot now take all the values earlier specified or we will overcount. We can overcome this by replacing n with 2k to get

    \frac{2k+2}{(2k+1)!}
    Hence, the original sum can be expressed as

    e = \sum_{k=0}^{\infty} \frac{2k+2}{(2k+1)!}

    Substituting values for k we get

    e = \frac{2}{1!} + \frac{4}{3!} + \frac{6}{5!} + \frac{8}{7!} + \cdots
    We can see that each term is quite simple and that there is an easily recognizable pattern, which makes this easy to program. If we use this infinite series, we will get the following table:

    Of course, as we should have expected, since we combined two terms of the original series to give a single term of the new series, the new series approaches the value of e twice as quickly as the original series. This means that we get, on average, 2 additional digits of e for each term of the new series. While this is laudable, as mentioned earlier, once this new series has been used, it will fail to produce any new information after a single use. This means that mathematicians will look for newer series to put into play. We could, as the approach to the second series suggests, combine more terms. For example, if we combine three consecutive terms we get

    \frac{1}{n!} + \frac{1}{(n+1)!} + \frac{1}{(n+2)!} = \frac{(n+1)(n+2) + (n+2) + 1}{(n+2)!} = \frac{n^2 + 4n + 5}{(n+2)!}

    Replacing n+2 with 3k we get

    e = \sum_{k=0}^{\infty} \frac{9k^2 + 1}{(3k)!}
    This series obviously will converge to the value of e three times as fast as the original series. However, we can see that this process might become quite unwieldy. Finding the LCM of multiple terms is not a difficult matter. However, expressing them in a compact form that is conducive to programming is not necessarily a given. Already, while combining only 3 terms, we actually have quite a bit of deft algebraic manipulations to undertake. And remember, once a series is used, it pretty much becomes obsolete! Hence, we must discover new ways with which to express e.
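    To see the difference in convergence concretely, here is a Python sketch of my own, assuming the paired and tripled forms derived above, that counts how many decimal places agree with a high-precision reference value of e after a given number of terms:

        from decimal import Decimal, getcontext
        from math import factorial

        getcontext().prec = 50
        E = sum(Decimal(1) / factorial(n) for n in range(60))  # reference value of e

        def correct_places(approx):
            """Rough count of correct decimal places in approx."""
            diff = abs(approx - E)
            places, tol = 0, Decimal(1)
            while diff < tol / 10 and places < 45:
                tol /= 10
                places += 1
            return places

        for terms in (2, 4, 6, 8):
            original = sum(Decimal(1) / factorial(n) for n in range(terms))
            paired = sum(Decimal(2 * k + 2) / factorial(2 * k + 1) for k in range(terms))
            tripled = sum(Decimal(9 * k * k + 1) / factorial(3 * k) for k in range(terms))
            print(terms, correct_places(original), correct_places(paired), correct_places(tripled))

    As expected, for the same number of terms the paired series yields roughly twice, and the tripled series roughly three times, as many correct decimal places as the original.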

    Using Advanced Calculus

    The third series I wish to consider is derived using the lower and upper incomplete gamma functions and the Taylor series. While the mathematics involved is certainly above the level of the blog at this stage, the end result is

    If we substitute the values of k, we will get

    Once again, each term is quite simple and we can easily recognize the pattern. So this too would be a good series to use for programming. But what kind of approximations of e does this series yield? The table below is instructive.

    What we can see is that the third series converges slightly quicker than the original series but considerably slower than the second series we considered. 

    Rejecting the Snake Oil

    Over the years, mathematicians have derived many infinite series for e using all sorts of techniques. They have also used continued fractions, infinite products, and recursive functions. All these techniques are needed precisely because e is an irrational number, which we will discuss in the next post. However, as we have seen, there is little real world advantage to be gained from deriving more expressions for e. The requirement for new series comes from the realm of computation, where the computing power of a computational engine can be measured by having it perform processor intensive calculations such as are needed by these expressions for e. Mathematicians use this requirement to get the computational engines to churn out more digits of famous irrational numbers like e, π, and φ.

    However, what we have seen in this post is something that we are rarely told about. In fact, I think what we have discovered is intentionally kept from us because it would reveal something we dare not admit. Now, hopefully, the mathematics involved in obtaining the original series, which was discussed in the previous post, was not too onerous. Granted that it was heavier than most other mathematical concepts I have dealt with so far in the blog, I believe it was not too difficult for most readers. The second series we considered was derived by combining the terms of the original series as pairs. Hence, this too did not require much advanced mathematics. However, the third series did require quite a bit of advanced mathematics and I avoided explaining it in this post. Despite this, the resultant series does not converge as quickly as the second series, which was obtained using much simpler mathematics.

    What this tells us is that the level of mathematics involved in deriving an infinite series for e, or for that matter any irrational number, is no indicator of how quickly the series will converge to the desired value. This runs contrary to the intuition many people have about mathematics, namely that we need increasingly more complicated mathematics for more complicated problems. This is not the case, and there are many instances when relatively simple mathematics can be put to use to solve extremely complicated problems. In my opinion, this is precisely what undergirds mathematical theorems like Gödel’s Incompleteness Theorems and some of the Millennium Prize Problems, such as P vs NP and the Riemann Hypothesis. It is also what lies behind the deceptively simple, but heretofore unsolved, Collatz Conjecture.

    In these days of increased dependence on Machine Learning, it pays to step back and understand that mathematics itself does not give us any guarantees or advice about what methods would work best for a given problem. Until we develop machines that are actually able to replicate human thought, any idea that more advanced computing capabilities would yield lasting solutions is, in my view, a pipe dream. Unfortunately, too many of us are mathematically ill-equipped to recognize when we are being sold snake oil in mathematical garb!