Opening Disclaimer

In the graphs below, I have used the data from the relevant sources and may or may not have performed simple mathematical operations (addition, subtraction, multiplication, or division) on the data. The purpose of this is to obscure the factor being discussed, before revealing it, while ensuring that the essential shape of the distribution remains the same. In most cases, as expected, the data is not strictly normally distributed. However, for the sake of making the discussion less burdensome, I have assumed a normal distribution and have done a best fit for the data.

Statistics and the Potential Abuse of Mathematics

We have perhaps all heard the saying, “Lies, damned lies, and statistics.” This is presumably from an 1894 paper read by a doctor called M. Price, who argued that there were “the proverbial kinds of falsehoods, ‘lies, damned lies, and statistics.’” According to Book Browse, “This expression is generally used in order to cast doubt on statistics produced by somebody a person does not agree with, or to describe how statistics can be manipulated to support almost any position.”

It is true that statistics can be misused. I remember when I was working at a place that coached students for the IIT-JEE. This institute had classes at about a dozen locations in the city. At each location we started out with about 40 students in each class. This meant that we began with about 500 students. However, toward the end of the two-year program, most classes were down to between 10 and 30 students. At one location we had 25 students. When the results of the JEE were announced, it turned out that 15 of these 25 students had succeeded. From the remaining locations about another 15 had succeeded.

What should we have reported? That our success rate was 30 out of 500 students, including those who had dropped out because we had ‘failed’ them, for a success rate of 6%, which was then about two and a half times the national average? Or that at this particular location our success rate was 60% – about 25 times the national average? What would have made for better advertising? Obviously the second strategy! And while it told the truth, namely that at that location 60% of the students had succeeded, it did not tell the whole truth, namely that, of the students excluded from the sample, only 15 out of 475 had succeeded. Interestingly, even 15 out of 475, or 3.16%, is slightly higher than the national average. In other words, by any metric, the institute had done better than the national average. However, human greed is such that I do not have to tell you which statistic was finally used in the advertisement campaigns for the next year!
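The two ways of reporting can be worked out from the counts in the story. The snippet below is only a sketch: it uses the rounded figures quoted above (15 of 25 at one location, 15 of 475 everywhere else), derives the pooled rate from those two counts, and the function name `success_rate` is my own.

```python
def success_rate(successes, students):
    """Return the success rate as a percentage."""
    return 100 * successes / students

# Rounded counts from the story, as (successes, students).
best_location = (15, 25)
other_locations = (15, 475)

rates = {
    "best location only": success_rate(*best_location),
    "other locations": success_rate(*other_locations),
    # The pooled rate is derived from the two counts above, not assumed.
    "all locations pooled": success_rate(
        best_location[0] + other_locations[0],
        best_location[1] + other_locations[1],
    ),
}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f}%")
```

Every one of these percentages is true. Which one ends up in the advertisement is a choice of sample, not a property of the mathematics.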

Mathematics itself is unbiased and unmoved by our preferences and prejudices. However, it can be used, misused, and abused to suit all sorts of positions. Unless we have a strong ethical foundation, then, we will abuse the realm of mathematics. And as I have explained in another post, mathematics carries great weight in our societies. Hence, if we are able to support some position using mathematics, even if the mathematics is abused, it will most likely carry a lot of weight and manage to convince many people. The only way to counter this is to delve into the mathematics to give the full picture of what is being discussed, hopefully to highlight the ways in which the mathematics has been abused. 

Recently, I came across an example of just such abuse. I will, however, assume that the people who propagated this abuse actually did not understand the mathematics involved. Otherwise, I would have to question their intentions. I think the lesser accusation is that they have failed to understand how mathematics works rather than that they intentionally have misled people down a path that is, at least from the perspective of mathematics, a blatant lie. But before we get to that, let me set the stage with a few other contexts in which similar data can be used.

Compatibility of Data

Consider, for example, the figure below:

Fig. 1. Variable A on the x-axis for two groups in green and orange.

Fig. 1 shows the distribution of some measure for two groups. One group is in green, while the other is in orange. As declared in the opening disclaimer, I have represented the data as being normally distributed. Apart from this, the population size for both groups is identical. What we can see is that the green group has a lower mean, accounting for its peak being to the left of the orange peak, and a lower standard deviation, accounting for its peak being higher than the orange peak.

Can the data be combined? Of course, from the perspective of mathematics, we just have sets of numbers! So we can combine the two datasets to get:

Fig. 2. Variable A on the x-axis for two groups in green and orange. The combined distribution is in blue.

In Fig. 2, the blue graph indicates what we would get if we combined the two datasets. Since the size of both original groups was the same and because of another factor we will shortly discuss, the blue graph seems to also be normally distributed. It actually isn’t, as we will see as we continue. However, even if we assume that the blue graph is normally distributed, we can see that it has a mean and standard deviation between those of the two original datasets. As mentioned earlier, from just a number crunching perspective there is no problem doing this since we are only dealing with some measure that is valid for every element in both datasets. However, we should ask if this makes sense in the real world since we are using mathematics to represent the real world.
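For readers who want to see where the mean and standard deviation of the blue graph come from, here is a minimal sketch. The group sizes, means, and standard deviations are hypothetical values I have made up for illustration; they are not the data behind Fig. 1, and the function name `pooled_mean_and_sd` is my own.

```python
from math import sqrt

def pooled_mean_and_sd(groups):
    """groups: list of (size, mean, sd) tuples, one per original dataset."""
    total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total
    # Pooled variance = average within-group variance
    #                 + spread of the group means around the pooled mean.
    variance = sum(n * (s ** 2 + (m - mean) ** 2) for n, m, s in groups) / total
    return mean, sqrt(variance)

# Hypothetical values, NOT the data behind Fig. 1.
green = (1000, 10.0, 0.5)    # lower mean, smaller spread
orange = (1000, 10.4, 1.5)   # higher mean, larger spread

mean, sd = pooled_mean_and_sd([green, orange])
print(f"pooled mean = {mean:.3f}, pooled sd = {sd:.3f}")
```

With these made-up numbers the pooled standard deviation falls between the two original ones, as in Fig. 2. That is not guaranteed in general: if the two means are moved far enough apart, the second term in the variance can push the pooled standard deviation above both.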

So we have to first ask what was being measured. Suppose I told you that for both groups what was measured was the diameter of the component. What would you conclude? You may conclude that combining the two datasets is perfectly fine since we were measuring the same quantity for both groups.

However, what if I told you that the green group represents the outer diameter of a group of bolts and that the orange group represents the inner diameter of a group of nuts? Right away you would see that combining the two groups is a meaningless activity because bolts are bolts and nuts are nuts! In fact, by combining the two sets we lose the ability to determine what percentage of nuts and bolts in the two groups actually fit each other within some tolerance band. The blue line is as meaningless as any graph drawn by a random monkey with a pen!

What we can conclude from this is that it is crucially important when combining two datasets to know whether or not such a combination actually makes sense. In the context of the nuts and bolts, a bolt with a diameter greater than the mean diameter of the nuts does not function as a nut! It is still a bolt, but may have fewer compatible nuts among the orange group.

Loss of Specificity

Or consider the graphs below:

Fig. 3. Variable B on the x-axis for two groups in green and orange.

Once again, we have the distribution of some measure for two groups – green and orange. Here the size of the green group is larger than the size of the orange group. What we can glean from the graphs is that each distribution has a mode, which happens to be the mean since, as mentioned in the opening disclaimer, I have adjusted the data so that the distributions are normal. We can also see that the mode of the green group lies at a higher value than that of the orange group, while the standard deviation of the green group is smaller than that of the orange group. Once again, we can combine the two datasets to get:

Fig. 4. Variable B on the x-axis for two groups in green and orange. The combined distribution is in blue.

Since the green group was larger than the orange group, the resultant blue distribution is visibly no longer normal. It is skewed toward the green graph because of the large size of the green group. Its mean is also lower than that of the green group, because the means of the original green and orange groups were quite distinct. If we compare Fig. 3 with Fig. 1, we will see that in Fig. 1 the peak of the orange graph lies within the span of the green peak, whereas in Fig. 3 the two peaks are at quite distinct values.

In other words, Fig. 3 says that whatever is being measured has significantly different values for the green group than for the orange group. Hence, the two distributions do not reach their peaks or taper off near each other as they do in Fig. 1.

Because the means of the two distributions are markedly different and the green group is larger than the orange group, the effect is akin to pulling the right tail of the orange graph, resulting in the blue graph, which now has a non-normal distribution. However, while the original graphs had two clear modal values, the blue graph now has a single, less distinct modal value that is much closer to the modal value of the green graph than to the modal value of the orange graph.

But are we allowed to combine the two datasets? In this case, what we are looking at are salaries in Austria, with the green graph representing men and the orange graph representing women. Combining the two graphs is certainly permissible since it would tell us the salary distribution without sex being a factor. Such information is certainly meaningful and could, in some contexts, be relevant. 

However, once the data is combined we must observe three things. First, the combined dataset is not bimodal, but has a single mode. This is not necessarily the case, as we will shortly see. However, the point made here is that it is fallacious to assume that two sets of data, each with a mode, will necessarily remain bimodal when they are combined in a meaningful way. Second, the blue graph does not tell us about the sex wage discrepancy that the green and orange graphs communicate. This is only to be expected. Third, once we combine the two graphs that were based on sex, we have a single dataset that does not have any sex identifiers and hence can no longer be used to draw sex-based conclusions. Once we combine datasets that were separated on the basis of some factor, that factor can no longer be distinguished from within the combined dataset.
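Whether a combined dataset keeps two modes can be checked numerically. The sketch below builds the combined curve as a weighted sum of two normal curves and counts its peaks on a grid. All weights, means, and standard deviations are hypothetical and chosen only to show the two possible outcomes; the helper names `mixture_pdf` and `count_modes` are my own.

```python
from statistics import NormalDist

def mixture_pdf(x, components):
    """components: list of (weight, NormalDist); the weights sum to 1."""
    return sum(w * d.pdf(x) for w, d in components)

def count_modes(components, lo, hi, steps=2000):
    """Count local maxima of the combined density on a grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [mixture_pdf(x, components) for x in xs]
    return sum(
        1 for i in range(1, steps)
        if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1]
    )

# Hypothetical parameters, NOT the data behind any figure in this post.
overlapping = [(0.65, NormalDist(50, 8)), (0.35, NormalDist(40, 10))]
separated = [(0.55, NormalDist(80, 7)), (0.45, NormalDist(60, 6))]

print("overlapping groups:", count_modes(overlapping, 0, 120), "mode(s)")
print("separated groups:", count_modes(separated, 0, 120), "mode(s)")
```

With these made-up values the first pair merges into a single peak and the second keeps two. In both cases each original group had exactly one mode, so bimodality of the combination depends on how far apart and how wide the two groups are, not on the mere fact that there were two groups.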

What this means, in this particular context, is that, if we wish to reduce the sex salary gap, we must not combine the two datasets, but must allow them to stand alongside each other as in Fig. 3.

Loss of Information

Suppose, though, we had the following distributions:

Fig. 5. Variable C on the x-axis for two groups in green and orange.

Once again, we have two groups, represented by green and orange, and the data in both sets is normally distributed. In actuality, the data in these two sets is very close to normal. Hence, I have not had to ‘massage’ the data much to make it normal. Here the size of the green set is slightly larger than that of the orange set. We can see that the orange set has a mean that is lower than that of the green set. It also has a smaller standard deviation than the data in the green set. What would happen if we combined the two sets? We would get the following:

Fig. 6. Variable C on the x-axis for two groups in green and orange. The combined distribution is in blue.

If we pay close attention to the blue graph we will notice the following. First, the data has remained bimodal. As mentioned earlier, this is a possibility but not a guarantee. Second, the modes are not as pronounced as before. This is because the population size has increased, thereby reducing the associated probability for any particular value of the measure. In other words, assuming it is meaningful to combine the two datasets, then if we do, we can no longer refer to the original green and orange lines because now only the blue graph exists since we have ignored whatever it was that separated the green and orange graphs. 

Hence, now, even though the blue graph is bimodal, we have actually lost the ability to determine what factor contributed to the two modes. Hence, by combining the two graphs we have set aside any discussion based on whatever it was that separated the green and orange graphs.

In this case, the measure is the weight of persons in a study of automobile accidents involving pedestrians. Here combining the data would yield the information for humans without any consideration of sex. However, given the anatomical and physiological differences between men and women, combining the data would actually make it less useful. Remember, this data is obtained from a study about automobile accidents involving pedestrians. The blue graph only tells us that there are two modes, but does not tell us what the modes represent since the data concerning sex was ignored. Indeed, between the modes of the blue graph, where the majority of the persons being studied lie, there is no way of knowing where women have the greater representation and where men do. In particular, there is no way of knowing that, for weights lower than indicated by the point of intersection of all three graphs, more than 80% of the accidents involve women. This means that any company that relies on the blue graph is in no position to design an automobile that protects women as well as men, but can only guess about who will be affected.

Loss of Distinctions

Another set of graphs I wish to deal with before proceeding to the reason for which I wrote this post is below:

Fig. 7. Variable D on the x-axis for two groups in green and orange.

Here we have some measure that has a very similar profile for the green and the orange graphs. The modal heights are almost the same, leading to the conclusion that the variations of the data, or standard deviations, are almost identical. The only major difference here is the value of the mean, with the green data having a larger mean than the orange data. Also, the size of the green dataset is only slightly larger than that of the orange set. If we combine the two sets we get:

Fig. 8. Variable D on the x-axis for two groups in green and orange. The combined distribution is in blue.

As with the change between Fig. 3 and Fig. 4, we see a stretching of the line, yielding a lower modal height. Since the sizes of the green and orange sets are roughly equal, the resultant is almost symmetric, like the original two datasets. However, because the modal values are so different, the blue graph is actually not a normal curve. This is different from what we saw between Fig. 1 and Fig. 2, where the proximity of the two modal values and the equal sizes of the two datasets yielded a resultant blue graph that was very close to being normally distributed.

However, as we can see from Fig. 8, the resultant actually is not normally distributed. Anyone with some familiarity with normal distributions will know that the resultant will not be normally distributed. Note that here we are not adding two normally distributed variables. If that were what we were doing, it would yield a normally distributed variable that was the sum of the two independent variables. Rather, what we are doing here is combining the datasets and then determining what the distribution of the combination will be.
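The difference between adding two variables and combining two datasets can be made concrete. In the sketch below the two groups are hypothetical (equal sizes, equal standard deviations, made-up means) and are not the data behind Fig. 7. It relies on the fact that Python's `statistics.NormalDist` supports `+` for independent normal variables; the helper `pooled_pdf` is my own.

```python
from statistics import NormalDist

# Hypothetical groups of equal size; NOT the data behind Fig. 7.
a = NormalDist(mu=40, sigma=7)
b = NormalDist(mu=53, sigma=7)

# Adding two independent normal variables gives another normal variable:
# the means add and the variances add.
total = a + b
print("sum:", total.mean, round(total.stdev, 3))

# Combining the two datasets instead gives an equal-weight mixture.
def pooled_pdf(x):
    return 0.5 * a.pdf(x) + 0.5 * b.pdf(x)

pooled_mean = 0.5 * a.mean + 0.5 * b.mean
pooled_var = (
    0.5 * (a.variance + (a.mean - pooled_mean) ** 2)
    + 0.5 * (b.variance + (b.mean - pooled_mean) ** 2)
)
# A normal curve with the same mean and spread as the combined data.
matched = NormalDist(pooled_mean, pooled_var ** 0.5)

# If the combined data were normal, these two heights would agree.
print("combined curve at its mean:", round(pooled_pdf(pooled_mean), 5))
print("matched normal at its mean:", round(matched.pdf(pooled_mean), 5))
```

The sum is centered on the total of the two means, a value that neither group comes close to, whereas the combined dataset stays in the original range. And the combined curve is flatter at its center than a normal curve with the same mean and standard deviation, which is one way of seeing that it is symmetric without being normal.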

Here the two original datasets represent the heights of people from 20 countries. Once again, the green graph represents men and the orange graph represents women. What does the blue graph represent? Obviously, it represents height distribution without consideration of sex. While that may be worthwhile in some contexts, what this does is get rid of something that is crucial to our understanding of humans, namely that we are sexual beings and, as a sexually reproducing species, there is something like sexual dimorphism that actually does serve to distinguish between the sexes. In other words, while it may be true that a greater percentage of men than women have a height greater than 200 cm, it does not follow, on the basis of height, that a woman who is actually 200 cm tall is more a man than a woman! The original green and orange distributions enable us to recognize this. But the blue distribution does not allow us to say anything. In fact, the Bayesian question, “Given that a particular human has a height of 200 cm, what is the probability that this human is a woman?” cannot be answered by using only the blue graph.
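To see why the Bayesian question needs the separate curves, here is a minimal sketch of the calculation. The means, standard deviations, and the share of women in the sample are hypothetical values of my own choosing, not the 20-country data behind Fig. 7, and the function name `posterior_woman` is mine.

```python
from statistics import NormalDist

def posterior_woman(height_cm, women, men, share_women):
    """P(woman | height) by Bayes' rule, using the separate group curves."""
    w = share_women * women.pdf(height_cm)        # needs the orange curve
    m = (1 - share_women) * men.pdf(height_cm)    # needs the green curve
    return w / (w + m)                            # w + m is the blue curve

# Hypothetical parameters, NOT the data behind Fig. 7.
women = NormalDist(mu=163, sigma=7)
men = NormalDist(mu=176, sigma=7)

p = posterior_woman(200, women, men, share_women=0.48)
print(f"P(woman | 200 cm) = {p:.5f}")
```

Whatever parameters are used, the numerator requires the women's curve on its own. The blue curve supplies only the denominator, which is why it cannot answer the question by itself. Note also that the answer, however small, is a probability about a population; it says nothing about whether a particular 200 cm tall woman is a woman.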

Necessary Biology Excursus

In the preceding section, I have mentioned sexual dimorphism. Since humans are sexually reproducing animals, sexual dimorphism is something that is expected of us. While some measurements, like intelligence, cannot be reliably used to distinguish between women and men, archeologists regularly use the size and shape of skeletal bones to determine whether they are studying the remains of a woman or a man. Skeletal remains are less useful for this purpose when the population being studied is itself relatively unknown. The one skeletal factor that provides almost certain identification of the sex of the person is the shape of the pelvis. There are, of course, other factors that play a role in sexual dimorphism, including muscle mass, body fat, lean body mass, and fat distribution. Of course, from the perspective of reproduction itself, the gametes produced by women and men are considerably different.

What can be said, then, of these two different kinds of traits, one for which the difference between women and men is negligible or inconsequential, and the other for which the difference is substantial? Let us consider each of these in turn.

Suppose we consider a trait like intelligence, for which there is no significant difference between women and men. We would get the same profile for the whole human population as we did for each sex separately since there is no significant difference. In this case, the sex of the person does not matter since the data leads us to understand that men can be as intelligent as women.

But suppose we consider traits for which there are significant differences. It has been found that, in every factor that contributes to the strength of an athlete, like lean body mass, muscle length, and muscle thickness, women are considerably weaker than men. While this difference is likely partly due to the role of testosterone, recent studies indicate that another factor is the sex chromosomes that women and men possess. The XX chromosome pair produces cells that constitute women’s bodies while the XY chromosome pair produces cells that constitute men’s bodies.

In other words, while there is a distribution among members of the same sex, it would be ludicrous to claim that someone with XY chromosomes in the cells of his body and who happens to be short and less muscular is less of a man or actually a woman!

Spurious Mathematics

Fig. 9. Contrived figure created without data and with spurious ‘variables’ to support claims about ‘gender spectrum’. (Source: Cade Hildreth)

Despite this, some people claim that sex exists on a spectrum. One resource, for instance, declares, “A person’s sex can be female, male, or intersex—which can present as an infinite number of biological combinations.” (sic) As an aside, as discussed elsewhere, infinity is not a number, so saying ‘an infinite number’ is misleading. This probably indicates that the author of the article has a tenuous grasp of mathematics at best. Anyway, it pays to observe that, while I presented diameter (Fig. 1 & 2), income (Fig. 3 & 4), weight (Fig. 5 & 6), and height (Fig. 7 & 8) on the x-axis, the figure above refers to ‘gender spectrum’, which itself is the issue being discussed. Since there is no quantifiable way of specifying what the variable that determines one’s ‘gender spectrum’ value is, this is nothing but a spurious variable and an example of circular reasoning.

Moreover, even if we assume that we can quantify this ‘gender spectrum’ variable, as we have seen, once we combine datasets, we lose the ability to identify anything on the graph. In fact, combining the datasets need not yield two modal values, as we saw with Fig. 2, 4, and 8. Hence, to assert that there are still two modal values is to assume the result. And to label one peak ‘Women’ and the other ‘Men’ after combining the datasets and getting rid of the differences is disingenuous. Indeed, without any actual data concerning what belongs on the x- and y-axes, the figure is just something that is concocted to give the impression that there is a mathematical basis for the claim being made.

Yet, let us assume that, were we given some data, it still would give us two modal values. The fact that people find themselves in the region labeled ‘Other Genders Exist’ does not mean that other genders exist, any more than the existence of a short man means he belongs to some other gender. Rather, such a combination could only exist if there is sufficient distinction between the graphs for women and men, as we saw in Fig. 6 but not in Figs. 2, 4, and 8. Such distinct graphs should actually lead to the conclusion that the two sexes are markedly different and that the datasets should not be combined rather than that there is a region in the middle that indicates an infinite variability of sexes or gender. In fact, as we saw with Fig. 2, if we have two incommensurable datasets, in that case nuts and bolts, the existence of a large diameter bolt does not make it less of a bolt and something in between a nut and a bolt! And the fact that there are thousands of nuts and bolts that lie in the intermediate region does not mean that there are infinitely many ‘species’ between nuts and bolts!

In fact, the argument about infinite sexes and genders is shown to be specious when we consider the example of the nuts and bolts. Unless we are able to demonstrate that two datasets can be legitimately combined, as in the case with Fig. 3 & 4, we do so without any mathematical basis. 

Consider, for example, what would happen if Fig. 3 & 4 did not represent the distribution of salaries but the distribution of the amount of testosterone in an athlete’s blood sample, which works since there are in general more men athletes than women athletes. The blue line in Fig. 4 would then represent absolutely nothing because the original data was obtained using the sex of the person in mind. A woman athlete who had a high testosterone level would not qualify as less than a woman on these grounds. In such a situation, any combination of the datasets would reduce our ability to determine, for example, if a woman athlete had actually doped herself with testosterone. After all, the data line appropriate for women is the orange one. However, once we combine the datasets, we only have the blue graph to refer to. But, as we can see, the loss of the left peak renders even women athletes who have doped themselves impossible to identify since they may still fall to the left of the blue peak. 
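The loss described here can be put in numbers. The sketch below compares where one reading falls within the women's distribution with where it falls within the combined data. Every value is hypothetical and unitless: these are not physiological reference values, they are not meant to reproduce the shapes in Fig. 3 & 4, and the 60/40 split between men and women is an assumption. The helper `pooled_cdf` is my own.

```python
from statistics import NormalDist

# Hypothetical, unitless levels for illustration only;
# these are NOT physiological reference values.
women = NormalDist(mu=2.0, sigma=0.5)
men = NormalDist(mu=6.0, sigma=1.5)
share_men = 0.6  # assumed: more men than women in the sample

def pooled_cdf(x):
    """Fraction of the combined (blue) dataset at or below x."""
    return (1 - share_men) * women.cdf(x) + share_men * men.cdf(x)

sample = 4.0  # a reading far above the women's mean
z_among_women = (sample - women.mean) / women.stdev

print(f"standard deviations above the women's mean: {z_among_women:.1f}")
print(f"percentile among women: {100 * women.cdf(sample):.3f}")
print(f"percentile in combined data: {100 * pooled_cdf(sample):.1f}")
```

Under these assumptions, a reading that is extreme when judged against the women's distribution sits near the middle of the combined data. The reading itself has not changed; only the reference against which it is judged has, and that is precisely what is lost when the datasets are combined.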

Conclusion

This does not mean I believe there are no people who experience discomfort with their bodies. However, we need to be careful what we mean by this. While I may experience some discomfort with my body, I may conclude that I have discomfort with being in a man’s body since this is the only body I have experienced. However, to draw the conclusion that this must mean I am a woman trapped in a man’s body is illogical because, no matter how many resources I read, no matter how many women I speak to, I will never actually know what it means to be a woman, let alone in a woman’s body. 

For instance, I may surround myself with indigenous South Africans day in and day out. But that would not make me truly understand what it was for them to go through apartheid. I may immerse myself in Chinese culture, but it would not enable me to understand the Century of Humiliation. Unless we experience something in our bodies we actually cannot truly appreciate or understand what that experience entails. Anything that is experienced in our bodies, such as our sexuality or gender, requires just such an embodied experience before we can claim it is something we are experiencing.

However, returning to the mathematical side of things, what we can say is that we need some basis outside mathematics that would allow for treating the datasets obtained for women and men as commensurable. Without such a basis external to mathematics, mathematics can be abused, as I have demonstrated. What could such an external basis be? We need to be able to identify some variable that can be measured in all humans without first considering them separately as women and men. Then the single dataset should demonstrate a bimodal behavior. But this only provides mathematical support for a claim. It is certainly necessary. But mathematical support cannot be considered sufficient.

Rather, we need to be able to provide a biological explanation for the phenomenon. And if we are claiming that sex or gender is non-binary, there needs to be a biological basis for such an explanation. I doubt we can find such a basis because we exist, as a species, to propagate the species. Reproduction is the key purpose of any species and, for our species, this happens through sexual reproduction. This means that there are distinct gametes that facilitate the reproduction of the species. That some members of the species do not have any gametes or have both kinds of gametes does not mean that there are more than two gametes. 

Indeed, such an argument would be like saying that, because some people are born with no limbs and others with extra limbs, there are infinitely many ways of being limbed and that a person who has no limbs represents another way of being limbed. This is a dangerous line of thought that normalizes what is clearly a physical disability. People with physical disabilities have only recently, and in not too many countries, earned hard won liberties and access to learning and physical spaces. Saying that being born with no limbs is another way of being limbed rather than recognizing that such a person deserves genuine support from society so that they can benefit and contribute as much as anyone else would only betray a lack of compassion on our part.

Hence, I would conclude by claiming that, until our species evolves to require at least a third gamete, the idea that sex and gender are not a binary is wishful thinking at best and unmathematical and unscientific propaganda at worst.
