Estimation, Modeling, and Accuracy
I’m currently studying both mathematics and physics in university, and I have to admit that it can be difficult to straddle the line between the two. Both are similar, yet demand different mindsets in terms of how to think about tackling a problem and actually coming up with a solution. In mathematics, the right answer alone isn’t enough: every step along the way should be rigorously justified. That’s because the conclusion that one wants to get to rests on the arguments that come beforehand. Without those arguments, you don’t have anything. This is why mathematics classes require students to create proofs that carefully apply definitions. I’m not saying that there isn’t any playfulness involved, but when it comes down to making an argument, the clearer the supporting propositions, the easier it is for others to become convinced of the truth of your claim.
In physics, I’ve found that the situation is quite different. Being mathematically coherent is of course necessary when developing a theory, but the truth is that physicists are much “looser” with their mathematics, for lack of a better word. In physics, it’s often taken for granted that certain complications “are so small that they won’t make a difference”, which allows physicists to drop them. This is something that absolutely would not be allowed when proving statements in mathematics, because any weak argument is the first thing that gets attacked when someone critiques a proof. Many people think that π+e is transcendental, but since we don’t have a proof of this, it’s an unjustified belief.
The difference in physics (and science in general) is the fact that we often know what the answer should be. This makes a huge difference in terms of the way that we work through theory to get to a result. It’s a lot easier to say “these other contributions won’t have a large effect” when we know that continuing in this manner will give the observed result. Of course, it probably is true that certain contributions aren’t as important (and one can show this mathematically), but that extra work is often hand-waved away. Because of this, I’ve observed that we often will simplify matters a lot more than what I would have thought appropriate, because it gives the correct answer.
I’ve had mixed feelings about this, particularly because I’ve been on the other side in my mathematics classes, where it was necessary to go through the steps, even if something seemed obvious or didn’t make a huge difference. I often thought it was annoying (and still do, at times) when the mathematics was “simplified” in the sense that rigour was sacrificed for brevity and the final result. I wished we would rigorously justify each and every step, in order to make things mathematically correct. I also didn’t like the fact that sometimes we would “guess” results, in the sense that the best way to solve an equation was to try a solution and see what came out of it. This all seemed far removed from my studies in mathematics.
Recently though, I’ve not had a change of heart, but rather I’ve understood more of the rationale behind a lot of these decisions. As I’ve written about before, science is about making models of the world that both explain and predict the various features we see around us. However, in order to be mathematically tractable, simplifications and approximations are necessary. Furthermore, they aren’t fundamentally a bad thing, as long as one keeps in mind the simplifications throughout. This was the key I was missing. It’s not that we’re deliberately ignoring thorny issues, it’s that we are making a first model, which can always be refined and improved. It’s unrealistic to expect to have hyper-realistic models when first learning a subject, so these toy models with their approximations will have to do. Even if I don’t like the fact that we approximate irregular shapes as spheres, it’s done so that the problem is tractable and it doesn’t change the end result drastically.
My shift in mindset has come after really digging into some of the work of Tadashi Tokieda, who has some interesting resources from an old course available here. He is an applied mathematician who is also a great communicator. If you look at the website I linked to, you will see that he is very good at explaining things, and I particularly like how he characterizes the kind of work an applied mathematician should do. He says that an applied mathematician should try to do a back-of-the-envelope calculation every day in order to sharpen their skills. The goal here isn’t to be analytically exact. Instead, it’s about probing the relationships between the items of interest. It’s about using mathematics to get to a result, without being overly worried about the formalism. That can wait for later. This has inspired me to start doing the same.
I’ve begun working on asking myself questions that delve into this sort of thing, where it’s unclear how to exactly begin, but by making approximations, a reasonable estimate can be found. It’s not easy, but it has gotten me to be more open with estimation. As the author of a book I’m reading on the subject writes, “It’s okay to say that 2 3=10.” The point isn’t to be precise. It’s to make a calculation tractable.
The more that I think about it, the more that I realize that we quickly discourage students from doing this at school. We say, “Don’t guess. Find the exact answer.” The truth is that estimation is important, and should be more frequently used. We should be able to take any kind of statement with a number attached and make sense of it. This ability is crippled when everything has to be exact. As such, I think we should be encouraging more estimation and less accuracy in order to get a foothold into a problem. Only then should we move on to refining and making a model more accurate. After all, that’s what we often do in science. We start with something we can handle, and make it more and more sophisticated.
Why Do We Make a Fuss About Definitions?
If you’ve ever taken a mathematics course, chances are that you’ve seen how definitions are one of the common items on the board. Definitions form the heart and soul of mathematics. They allow us to pose problems in very precise ways, yet they are the bane of many students, who get back their assignments and see that points were deducted because things that seemed “minor” and weren’t included were in fact quite important.
There’s a case to be made that the details aren’t always so important, but mathematics is a bit of a special case, because when one wants to refine their arguments, it’s critical to have clear definitions. Without them, there’s a good chance that you can get stuck trying to convince others of the veracity of your claims. Additionally, it turns out that while we have a pretty good intuition about many mathematical properties, capturing these properties can be more difficult than it seems at first.
For example, what would you tell me if I asked you to define a circle? Really, what would you say? Of course, we all know what a circle is, so you might draw one for me. But, I insist on you giving me a definition that I could apply to any shape I drew, without you there to tell me if it was a circle or not.
Perhaps you would say that a circle is the shape with only one side. That seems reasonable, and is certainly true for the circles I know of. So I draw this: an egg shape.
Evidently, this is not what you were talking about.
“No egg shapes!” you say. But then I might draw you something else, such as an ellipse or the shape that a running track makes. What about those shapes? Or what about more complicated shapes that still have one “side”, but aren’t circles?
As you can see, this is quickly becoming a much more difficult problem than what it should be. I mean, we should be able to define a circle without any problem! It’s not exactly the most sophisticated shape in existence. However, the problem is that none of the above proposed definitions solely capture the notion of a circle. They do define sets of objects with a certain property, but the sets aren’t restricted to only circles.
This is an important realization, because it points us to the right definition of a circle. What we want is a definition that includes only circles, and all circles. Both the “only” and the “all” are important. What we are looking for is the defining characteristic of a circle (of course, there could be more than one, and so we would have a list of such characteristics).
After a bit of investigation and drawing both circles and shapes that are close to circles, you will find an important property: a circle has a radius. This isn’t just a random sort of measurement on a circle. What you might notice is that the radius is the distance that any point on the circle has from the centre of the circle. If you compare this to any other shape you drew that wasn’t a circle, you will quickly realize that only the circle has this property. As such, this seems like a good candidate for a definition of a circle.
Therefore, we define a circle to be the (x,y) points that are exactly a distance r from a chosen point (the centre of the circle). Go ahead and try out this definition on the shapes you know of. Is it satisfied only for circles, and does it include all of them? You will find that yes, this definition does indeed work.
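If you would rather let a computer try the definition out, here is a minimal Python sketch. The function names, the tolerance, and the two sample shapes are choices made purely for illustration, and the tolerance is only there because floating-point arithmetic can’t check “exactly a distance r”.

```python
import math

def is_on_circle(point, centre, r, tol=1e-9):
    """Return True if `point` lies at distance r from `centre` (within tol)."""
    x, y = point
    cx, cy = centre
    return math.isclose(math.hypot(x - cx, y - cy), r, abs_tol=tol)

def all_on_circle(points, centre, r):
    """Return True if every sampled point satisfies the circle definition."""
    return all(is_on_circle(p, centre, r) for p in points)

# Twelve sample points around a circle of radius 2 centred at (1, -1) ...
angles = [2 * math.pi * k / 12 for k in range(12)]
circle = [(1 + 2 * math.cos(t), -1 + 2 * math.sin(t)) for t in angles]
# ... and around an ellipse with semi-axes 2 and 1 and the same centre.
ellipse = [(1 + 2 * math.cos(t), -1 + 1 * math.sin(t)) for t in angles]

print(all_on_circle(circle, (1, -1), 2))   # True
print(all_on_circle(ellipse, (1, -1), 2))  # False
```

The ellipse fails because some of its points sit closer to the centre than others, which is exactly the property that the “one side” definition couldn’t detect.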
What we’ve done here is an important phenomenon in mathematics, which is to abstract or “boil down” some sort of structure to its essence. Frequently, mathematicians begin with some sort of structure that is familiar or intuitive to them, and then they ask, “What is the essence of this object?” From there, mathematicians come up with the appropriate definitions to talk about their chosen object of study.
You might notice that our definition of a circle doesn’t talk about being round. This suggests that being round isn’t necessarily “fundamental” to being a circle. (Though, we know that graphically, their curvature is apparent.) Instead, the fundamental essence of a circle seems to be its centre, and the distance r at which all points are away from the centre. In other words, if I wanted to describe any circle to you, I only have to specify two things: its centre and radius. With those two parameters, you can exactly reconstruct the circle I was imagining, even if you didn’t see what I had beforehand.
Good definitions admit further explorations
Another consequence of a good definition is that it enables further avenues of exploration. To continue with our study of the circle, it turns out that the circle is only a special case of a more general class of objects, called n-spheres. Here, n is the dimension of the object itself (and is greater than or equal to zero), so the circle we are working with is a one-sphere, since the circle is simply a one-dimensional curve. (Remember, a circle is only the boundary of the object. If we were talking about the area inside, it would be called a disk [1].) The definition of the n-sphere is a straightforward generalization of our definition for a circle. An n-sphere is the set of points (x₁, x₂, …, xₙ₊₁) which are a distance r from a centre point.
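Written in symbols, the definition might look like the following. The notation (the name S^n, the centre c, and the use of the usual Euclidean distance) is an assumption on my part, not something fixed by the text above.

```latex
% n-sphere of radius r centred at c = (c_1, \dots, c_{n+1})
S^n(c, r) = \left\{ (x_1, x_2, \dots, x_{n+1}) \in \mathbb{R}^{n+1}
            \;:\; \sqrt{\sum_{i=1}^{n+1} (x_i - c_i)^2} = r \right\}

% The circle is the case n = 1:
(x_1 - c_1)^2 + (x_2 - c_2)^2 = r^2
```

Notice that the only thing that changes from one dimension to the next is how many coordinates appear in the sum.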
Let’s look at the two other familiar examples. What is the zero-sphere? Well, it’s the set of points (x) which are a distance r from a certain point. But these points are only defined by one coordinate, which means they all lie on the same line! In other words, if we’re given a point (b) on a line, there are only two points that are exactly a distance r from (b). These points are given by (b+r) and (b-r), and can be seen below.
I fully grant you that this isn’t the most interesting object we can think of, but it is consistent with our definition. The other one that we can easily visualize is the two-sphere, which is what you were likely imagining when I said the word sphere in the first place.
Two more notes about spheres. You might think that it would have made more sense to call the two-sphere (the regular sphere) a three-sphere, since it’s a three-dimensional object. However, we only need two free coordinates to describe a point on the regular sphere (since the radius is constant), which is why we use the description we do.
Additionally, this generalization of the circle means that we can think about spheres in higher dimensions. We can’t visualize them, of course, but we can work with them mathematically. And that’s the important part. We can only do so much mathematics with a definition of a circle that says it’s a “round object”. With our new definition, we can precisely study a sphere in any dimension we want, which is quite useful.
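As a rough illustration of working with spheres we can’t picture, here is a small Python sketch of the same membership test in any number of coordinates. The function name and the tolerance are hypothetical choices, and it relies on `math.dist`, which is available from Python 3.8 onwards.

```python
import math

def on_sphere(point, centre, r, tol=1e-9):
    """True if `point` is at distance r from `centre`; works in any dimension."""
    if len(point) != len(centre):
        raise ValueError("point and centre must have the same number of coordinates")
    return math.isclose(math.dist(point, centre), r, abs_tol=tol)

# Zero-sphere: the two points b - r and b + r on a line (here b = 5, r = 2).
print(on_sphere((3,), (5,), 2), on_sphere((7,), (5,), 2))  # True True
print(on_sphere((6,), (5,), 2))                            # False

# A three-sphere needs four coordinates: (1, 1, 1, 1) is at distance 2 from the origin.
print(on_sphere((1, 1, 1, 1), (0, 0, 0, 0), 2))            # True
```

The code never asks what the object looks like. It only asks whether the distance condition holds, which is all the definition requires.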
Definitions (along with axioms) work as the building blocks of mathematics. What’s nice about them is that even though we can quickly build up complex machinery and theory that surrounds these definitions, we can always bring ourselves back to the basics if we get stuck. One of my professors captures this perfectly. He often tells us that, if we get stuck on a problem with the new theory we’ve learned, we should always be able to go back to the basics in order to answer a question, or at least find a foothold into the problem.
For myself, this is often how I go about writing proofs. In mathematics, we write proofs to be as minimal as possible, which means that we don’t want to assume anything that we don’t absolutely have to. Therefore, if a certain kind of object is necessary to get a result, there’s a good chance that knowing the definition of that object will help with the proof. That’s why I always try to keep the definitions of the objects I work with handy. You never know if they can be just the thing you need in order to more deeply understand or solve a problem.
[1] You can also either include or exclude the boundary, which gives a closed or an open disk, respectively.
Two Types of Mathematics
Broadly speaking, I classify mathematics into two bins. On the one hand, there’s the mathematics that many students in university must learn, such as calculus and linear algebra. These form the bedrock of many jobs in the corporate world, and so students from business, computer science, finance, and other disciplines have good reason to learn these topics. Likewise, there are parts of mathematics that are useful to scientific disciplines like physics, chemistry, biology, and so on. The common thread here is that these mathematical courses are operational.
What do I mean by that? I’m simply trying to underline the fact that most of these classes deal with computation. In other words, you have a problem, and you need to know the technique used to solve it. After that, you practice using the same technique over and over again on related problems until the technique is ingrained in your mind. These classes are operational in the sense that they don’t require you to know why something is working. As long as you can use the tools developed, that’s good enough.
I want to note here that I am absolutely not looking down at those who only come into contact with this type of course in mathematics. Being able to actually solve a problem is a useful skill to have. Applying the theory gained in class is a crucial part of getting on with your work. I think about it like this. In physics, we use mathematics a lot. However, the mindset in these classes is that the mathematics serves the physics. In other words, we aren’t studying mathematics for its own sake, but to use it for some purpose. This isn’t a bad thing, but it is a difference from what I would call “real” mathematics (I’m already cringing here as I write this).
I study both physics and mathematics, so I’m also immersed in the other bin, which is the “knowledge” part of mathematics. Here, we don’t really care about applying the knowledge gained from mathematics in a particular direction. Instead, we want to answer questions such as, “What kind of conclusions can I reach from these few axioms?” The kinds of topics studied in these courses (such as abstract algebra) involve working with a structure and trying to derive properties in order to fully study it. It’s a completely different beast than operational mathematics. It’s necessarily more abstract, which often carries the stereotype of being more difficult.
However, I think it’s important to know the difference between the two. Mathematics as a discipline concerns itself with the latter bin I’ve described for the most part, while the former is more of use towards other fields. Of course, keep in mind that these are all artificial silos, and the real world is a much more messy and interdisciplinary space. It’s just that many students only get to glimpse the operational part of mathematics, while I think it would be worth it for many to study some of the more abstract aspects of mathematics (which is why I think many students should take discrete mathematics as a course).
One-to-One and Onto
When learning about functions, a few properties come up over and over. In particular, we often hear about functions being one-to-one (injective) or onto (surjective). These are important properties of functions that allow one to set up correspondences between sets (bijections), as well as study other features of various functions. I wanted to go through these two properties in a slightly different way than what most sources will do to explain them, so hopefully this will be a good analogy to keep in mind when discussing these two properties.
First, let’s define our function. We will consider a function ƒ that moves objects from a set X to a set Y. We usually call X the domain, and Y the codomain (often loosely called the range).
If ƒ is one-to-one, it has the special property that any time you look at an element y in Y that has some element from X sent to it, only that element gets sent to y. If we think of the function as a bunch of arrows linking up elements from X to elements in Y, there will be at most one arrow pointing to any given element in Y.
The way I think about it is this. Imagine the function as a machine with a specific set of instructions, and an input and output area. You stand at the output area, and I stand at the input area. If you know the set of instructions that the machine executes, and you receive something at the output of the machine, can you tell me what I put into the machine without seeing it beforehand?
Your initial reaction might be, “Of course, I can tell you what you put in! I’ll just do the instructions in reverse.”
This is fine, but let’s go through an example. Suppose that the machine takes in a number, and then outputs that number modulo two. This means that the machine outputs the remainder of the input number after division by two. In other words, this machine will output 1 if the input number was odd, and 0 if the input number was even.
Okay, so you wait at one end of the machine while I insert my number, and the output of the machine is 0. What number did I put in? Well, you know that the number wasn’t odd, because only even numbers get sent to zero. However, apart from that piece of information, you have no idea what number I put in. It could have been any even number.
This is an example of a function that isn’t one-to-one. If we go back to our definition, the reason it isn’t one-to-one is that every single even number “points” to zero. Therefore, you can’t “undo” the machine’s instructions, because they destroyed (for lack of a better word) the nature of the input while executing the instructions.
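The modulo-two machine is easy to play with in code. Below is a small Python sketch, with hypothetical helper names, that groups a handful of sample inputs by the output they produce. A check over a finite sample like this can show that a function is not one-to-one, but it can’t prove that a function is one-to-one on all integers.

```python
from collections import defaultdict

def mod_two_machine(n):
    """The machine from the text: output the remainder after dividing by two."""
    return n % 2

def preimages(f, inputs):
    """Group the sample inputs by the output they produce."""
    groups = defaultdict(list)
    for x in inputs:
        groups[f(x)].append(x)
    return dict(groups)

def is_one_to_one_on(f, inputs):
    """True if no two distinct sample inputs share an output."""
    return all(len(xs) == 1 for xs in preimages(f, set(inputs)).values())

sample = range(-4, 5)
print(preimages(mod_two_machine, sample))
# {0: [-4, -2, 0, 2, 4], 1: [-3, -1, 1, 3]}
print(is_one_to_one_on(mod_two_machine, sample))   # False
print(is_one_to_one_on(lambda n: n + 1, sample))   # True
```

Standing at the output and seeing 0, all you can say is that the input was one of the numbers in the first list, which is exactly the situation described above.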
I like this analogy with the machine, because it helps us see that not all functions “go both ways”. You can’t always get back to the input from the output. In fact, a function can be fully reversed, with every element of Y traced back to exactly one input, only when it is both one-to-one and onto.
Which brings us to the property of being onto. This one’s slightly different. A function is onto if, no matter what element you pick in Y, there is some arrow pointing to it from the set X. More formally, for any y ∈ Y, there exists an x ∈ X such that ƒ(x) = y.
I like to think of this from the perspective of the set X. When I think of a function being onto, I imagine we are shooting arrows from the set X and doing so in a way that ensures every element in Y is hit. Put differently, our function ƒ doesn’t “miss” any of the elements in Y.
For example, if we take a very simple function, ƒ(x) = x+1, we can see that this function is onto, because it “hits” any value in our output set (here, we will let x be any real number, so our set Y = R). However, a different function such as ƒ(x) = sin x is not onto because the outputs of our function only belong to the interval [-1, 1], which means we miss a lot of values in R.
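Here is a rough Python sketch of the “no misses” idea. The helper `is_onto` is a hypothetical name and only makes sense for finite sets, so for the real-valued examples the code takes a different route: it exhibits an explicit input for ƒ(x) = x+1 and merely samples sin x, which illustrates the claim without proving it.

```python
import math

def is_onto(f, X, Y):
    """True if every element of the finite set Y is hit by some element of X."""
    return {f(x) for x in X} >= set(Y)

# Finite illustration: the remainder after division by two hits 0 and 1, but never 2.
X = range(10)
print(is_onto(lambda n: n % 2, X, {0, 1}))      # True
print(is_onto(lambda n: n % 2, X, {0, 1, 2}))   # False

# f(x) = x + 1 on the reals: any target y has the explicit preimage x = y - 1.
def shift(x):
    return x + 1

for y in (-3.5, 0.0, 42.0):
    assert shift(y - 1) == y

# sin never leaves [-1, 1], so a target such as y = 2 is always missed.
samples = [k / 100 for k in range(-1000, 1001)]
print(all(-1 <= math.sin(x) <= 1 for x in samples))  # True
```

Whether a function is onto depends on the set Y you aim at: the same remainder function is onto {0, 1} but not onto {0, 1, 2}.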
Sometimes, you will work with a function that is both one-to-one and onto. In that case, the function is called bijective, and it’s a very nice property for a function, because it allows one to identify an inverse transformation. This is something you’ve undoubtedly come across in secondary school. The idea is that if you have a function y = ƒ(x), you can “solve” for x in order to get a new function in terms of y.
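As a quick worked example of “solving for x”, take the function ƒ(x) = x+1 on the real numbers from earlier. The choice of this function and the notation ƒ⁻¹ are just for illustration.

```latex
y = f(x) = x + 1
\quad\Longrightarrow\quad
x = y - 1
\quad\Longrightarrow\quad
f^{-1}(y) = y - 1

% Checking that the two functions undo each other:
f^{-1}(f(x)) = (x + 1) - 1 = x,
\qquad
f(f^{-1}(y)) = (y - 1) + 1 = y
```

Both checks matter: the first says we recover the input from the output, and the second says every y in Y is reached.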