The Limitations of Models

As many students in the sciences know, we use mathematics to describe our results because it is the most precise language we possess. It’s not that we have some sort of favouritism towards mathematics that we don’t have towards other languages like English or French. Quite frankly, it’s an issue of precision in what one is communicating. It’s the difference between saying I can see a red light and saying I can see light with a wavelength of about 600 nanometres. It’s the difference between basing a prediction on past results and basing it on an extrapolation from a model.

However, what the public often misses is that science is built on mathematical models. And, as any scientist will tell you, a model is only as good as the assumptions it makes. This means the models are inherently different from what we would call “real life”.

Simplicity to complexity

When you first learn physics in secondary school, you typically learn about the big picture concepts, such as Newton’s laws, some optics, and maybe even something about waves. If we focus on only Newton’s famous $\vec{F} = m \vec{a}$, you learn about solving this equation for an easy system. Additionally, one usually starts without any notion of calculus, so the questions revolve around either finding a force or finding the acceleration of the system. Personally, I remember analyzing systems such as a block that is subject to a variety of constant forces. This made the analysis easy, compared to what one does a few years later in more advanced classes.

However, what one must keep in mind is that the systems I was analyzing weren’t realistic. If one stops to think about it, there aren’t many forces that are constant with time (even our favourite gravitational force $m\vec{g}$ isn’t technically a constant). However, we weren’t going to be thrown to the lions in our first physics class, so these simple systems were enough to begin.

Years later, I would refine these models to become gradually more realistic. To give an explicit example, consider the equations for kinematics, which one learns about in secondary school. What one immediately learns following this is that these equations describe the motion of a *free-falling* object under a *constant acceleration*. These two emphasized terms are important, because unless you’re trying to describe the motion of projectiles in outer space, these equations don’t actually describe the motion of the systems.

There are a few reasons why this is so. First, as alluded to above, these equations are only valid where there is no force acting on the system except for gravity. This is obviously not realistic, since there are other forces that can act on a system when it is launched (such as air friction). Therefore, modeling the situation as if air friction didn’t exist can only give an approximate answer at best. The presence of only gravity as a force is what is meant by the term free-falling.
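For reference, the constant-acceleration equations usually taught under the name of kinematics can be written as below. Which set of equations is meant here is an assumption, and the symbols $\vec{r}_0$ and $\vec{v}_0$ (initial position and initial velocity) are introduced only for this sketch:

$$\vec{v}(t) = \vec{v}_0 + \vec{a}\,t, \qquad \vec{r}(t) = \vec{r}_0 + \vec{v}_0\,t + \tfrac{1}{2}\vec{a}\,t^2.$$

For a free-falling object, the constant acceleration is simply $\vec{a} = \vec{g}$, which is where both of the emphasized conditions enter.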

Second, the acceleration needs to be constant, and this isn’t true either. If we simply take the example of launching a system into the air, the fact that air friction acts as a force on the system changes the acceleration of the system, thereby nullifying our kinematic equations once again.

Alright, so those are a few reasons why the kinematic equations don’t work, but what does the difference look like in our models? I won’t go through the whole derivation of the kinematic equations when we add in air friction, but here’s a plot that shows the difference between the two for a tennis ball.

As you can see, the difference is very small. Indeed, it took me a bit of time to figure out what kind of object would show some more obvious deviations from the original parabola (which is the red curve). Finally, I found a good example, which is a table tennis ball. The more accurate curve that takes air friction into account (in blue) is quite close to the red curve, so to a first approximation, our original model without air friction is pretty good. Actually, if you take the whole trajectory into account, you can see that the two curves diverge in the latter half of the trajectory.

Table tennis ball trajectory of height as a function of distance

You might be thinking, “Alright great, we have the solution for the trajectory, so now this problem is solved.” But that’s not quite true. If you’ve ever hit a table tennis ball, you know that it doesn’t just fly through the air in one orientation. It spins, and that rotation changes how the ball moves (as anyone who plays table tennis knows). As such, the moral of the story is that we can always add more elements into the models that make them more accurate. However, that always comes at the cost of simplicity, so your model becomes more difficult to compute as you increase the features it encodes. At some point, you have to choose where you want to fall on the spectrum of simplicity to complexity.
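To make the comparison between the two curves concrete, here is a minimal sketch that integrates the trajectory numerically with and without air friction. It is not the calculation behind the plot: the quadratic drag law, the drag coefficient of 0.5, the air density, and the launch speed and angle are all assumptions chosen for illustration, and the function names `simulate` and `drag_constant` are hypothetical. The mass (2.7 g) and diameter (40 mm) are those of a regulation table tennis ball.

```python
import math

def simulate(v0, angle_deg, drag_k=0.0, g=9.81, dt=1e-4):
    """Integrate 2D projectile motion with small semi-implicit Euler steps.

    drag_k is the quadratic drag constant per unit mass (units 1/m), so the
    drag acceleration is -drag_k * |v| * v. Setting drag_k = 0 recovers the
    ideal parabola. Returns the (x, y) points until the ball is back at y = 0.
    """
    theta = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = v0 * math.cos(theta), v0 * math.sin(theta)
    points = [(x, y)]
    while True:
        speed = math.hypot(vx, vy)
        ax = -drag_k * speed * vx          # drag opposes the velocity
        ay = -g - drag_k * speed * vy      # gravity plus drag
        vx += ax * dt
        vy += ay * dt
        x += vx * dt
        y += vy * dt
        if y < 0.0:
            break
        points.append((x, y))
    return points

def drag_constant(mass, diameter, drag_coeff=0.5, air_density=1.2):
    """Quadratic drag constant per unit mass: k = rho * C_d * A / (2 m)."""
    area = math.pi * (diameter / 2) ** 2
    return 0.5 * air_density * drag_coeff * area / mass

# Assumed launch: 10 m/s at 45 degrees. Ball: 2.7 g, 40 mm diameter.
k = drag_constant(mass=0.0027, diameter=0.040)
ideal = simulate(10.0, 45.0)
with_drag = simulate(10.0, 45.0, drag_k=k)

print(f"range without drag: {ideal[-1][0]:.2f} m")
print(f"range with drag:    {with_drag[-1][0]:.2f} m")
```

Even this sketch leaves out spin, which is exactly the kind of feature that costs extra complexity to include.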

How much stock can we put into models?

So who cares about the trajectory of a ball when we throw it? Chances are, not many. The reason I wanted to show you this one was just to illustrate what we need to take into account when we want to model some sort of phenomena. There are always tradeoffs, and these tradeoffs affect our accuracy.

The problem that we as scientists can fall into is failing to communicate how these models work to the public. It’s nice to give big, qualitative statements about the future, but often we don’t share the limitations of these statements. What I mean by this is simply that our statements in science are often predicated on models. And, as I mentioned in the beginning of this piece, models are only as good as their built-in assumptions. Einstein’s theory of general relativity is a fantastic framework for understanding and predicting many features of spacetime, but if we suddenly see that there isn’t a speed barrier in the universe, then the whole model is useless physically. That’s obviously an extreme example, but the broader point is to keep in mind the limitations of models.


A model is something we use to describe the world. If it’s a very good model, then it may even make predictions about things we haven’t yet discovered. But what you shouldn’t do is keep yourself tied to a specific model. That’s because every model has its own domain of applicability, and trying to apply the model past this domain isn’t a good idea.

We should all keep this in mind when we hear news reports about extraordinary things. First, what kind of model is being used? Is it a model that has proven value, or is it something completely new? Second, what kind of domain of applicability does this model have, and does going past it significantly change the results? As you can see from the example we did above, not including air friction didn’t significantly change the results. However, the amount that is “bad” is very subjective, which means it depends on the application. If we are trying to understand simple models of astrophysical phenomena, we might not be too picky if our results could be up to 20% off (depending of course on the situation). However, if you have a model that predicts certain health issues in patients, misidentifying one in five patients is much too high (for myself).

Therefore, the next time you hear something extraordinary on the news, think about the model that’s being used. I understand that we can’t possibly research every single ridiculous claim that is made, but a bit more skepticism and curiosity about the details of such claims would not be a bad thing.

Cramer's Rule for Solving Linear Systems

Throughout secondary school, we learn about solving simple systems of equations. We learn that there are several methods (which are more or less the same thing): comparison, substitution, and elimination are the big three. We then go through tons of practice questions which all focus on doing this kind of solving. In particular, we solve systems of two equations, and I don’t recall ever doing more than that. The sense I get from students is that solving these kinds of equations is sometimes confusing. I personally think it’s because there’s a lack of equivalence between the methods, so they all seem like pulling magic tricks out of a hat. That’s a pedagogical/time problem, but I don’t want to focus on that today. Instead, I want to focus on something that makes solving these systems of two equations super quick. It’s called Cramer’s rule, and this method makes it possible to forego using comparison, substitution, or elimination to solve for variables. Instead, you apply this procedure, and you can simply read off the answer.

Going with my new decision to do examples first and theory second, let’s use the following system of equations:

Solving this is a bit tedious. Personally, I would eliminate one of the variables, but there’s no “right” way to do it. However, we’re going to be clever, and change how we write these equations. Instead of writing two separate equations, we’re going to write them as one matrix equation, which looks like this:

To make sense of how this works, I’ll explain the three different parts you see in this equation. The first block is called a 2 by 2 matrix, and it encodes the various coefficients of our system. The next part is our vector $\textbf{x}$, given by $\textbf{x} = (x,y)^T$. This is just our variables in the equation. Lastly, we have another vector, which is simply what our two equations are equal to when you write out the equations.

To see how this is equivalent to our above system of equations, we need to multiply the matrix by the vector on the left-hand side. If you’ve taken a linear algebra course, you remember that matrix multiplication is given by multiplying the row of the matrix by the column of the vector. As such, we would get: Now, since this is a vector (in the sense that it has only one column, and a certain number of rows), we just match up the first component of this vector with the first component of the right hand side of our system. Similarly, we do the same for the second component, recovering our two first equations.
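The row-times-column rule described here can be sketched in a few lines of Python. The function name `matvec` and the numbers are hypothetical and are not the system from this post; the point is only to show how each row of the matrix produces one component of the result.

```python
def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a column vector (a list)."""
    # Each output component is one row of the matrix "dotted" with the vector.
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# Hypothetical coefficients and a hypothetical vector (x, y) = (1, 4).
A = [[2, 3],
     [7, -5]]
x = [1, 4]

print(matvec(A, x))  # [14, -13], i.e. 2*1 + 3*4 and 7*1 - 5*4
```

Matching the two components of this output against a right-hand side vector is what gives back the two separate equations.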

Let’s name our matrix $A$, and our vectors $\textbf{x}$ and $\textbf{b}$, with the latter being the vector on the right-hand side of our equation. We can then write this as:

$$A\textbf{x} = \textbf{b}.$$

Our end goal is to solve for $\textbf{x}$. Since the notation above looks a lot like regular algebra, we may be tempted to write the following:

$$\textbf{x} = \frac{\textbf{b}}{A}.$$

This temptation must be resisted! This is because we don’t have a notion of what it means to divide a quantity by a matrix. You could make one up, of course, but in the standard theory of linear algebra, this operation isn’t defined. However, there is a similar notion in linear algebra, and it’s called the inverse of a matrix. The idea is simple. In regular multiplication (as I wrote about here), if we have a non-zero number (call it $c$), then its inverse is $c^{-1}$, so that we get $cc^{-1} = 1$. It’s that same idea here with matrix multiplication, except that instead of the product being equal to one, it will be equal to the equivalent of one for matrix multiplication, which is called the *identity matrix*. It looks like this:

$$I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$

Here, $n$ just refers to the dimension you want the matrix to be. Basically, the identity matrix is a matrix with zeros everywhere except for the main diagonal, which contains ones. The important part is that if you multiply any matrix or vector of compatible size by the identity matrix, you get back the same vector/matrix.

With that out of the way, instead of dividing by our matrix $A$ from before to solve for $\textbf{x}$, we will multiply both sides of the equation on the left by the inverse matrix $A^{-1}$. Since $A^{-1}A$ is the identity matrix, doing so will give us:

$$A^{-1}A\textbf{x} = A^{-1}\textbf{b} \quad \Longrightarrow \quad \textbf{x} = A^{-1}\textbf{b}.$$

And that’s how you solve the equation! Now, the only small detail you might be asking yourself is, “How do I find the inverse of a matrix?”

This is a good question, and in general it requires a long procedure of manipulating the rows of the matrix. However, for the case of a 2 by 2 matrix like we have in our example, there’s a simple formula for the inverse of any 2 by 2 matrix with something called a non-zero determinant. The expression is given by:

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \Longrightarrow \quad A^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

The denominator $ad-bc$ is the determinant of the matrix $A$, and we obviously need this to be non-zero so our fraction is defined. Apart from that, this gives us our expression for $A^{-1}$, which we can then throw into our equation above.

Doing exactly this with our original matrix gives: Therefore, solving our equation for our vector $\textbf{x}$ gives: Just like that, we have our answer. This is why it’s nice to use this method for 2 by 2 systems. Calculating the inverse is very straightforward and efficient, so we can get our answer quickly without doing a bunch of substitutions.
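The inverse-matrix procedure can be sketched in Python as follows. The coefficients below are hypothetical: they are one choice of system whose determinant is $-31$, the value quoted for $\det(A)$ in this post, and they are not necessarily the original example. The helper names `inverse_2x2` and `matvec` are also made up for this sketch, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Invert a 2 by 2 matrix [[a, b], [c, d]] using the closed-form formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("determinant is zero, so the matrix has no inverse")
    inv_det = 1 / Fraction(det)
    return [[d * inv_det, -b * inv_det],
            [-c * inv_det, a * inv_det]]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical system: 2x + 3y = 10 and 7x - 5y = 13 (assumed coefficients).
A = [[2, 3],
     [7, -5]]
b = [10, 13]

solution = matvec(inverse_2x2(A), b)   # x = A^{-1} b
print(solution)  # [Fraction(89, 31), Fraction(44, 31)]
```

Substituting $x = 89/31$ and $y = 44/31$ back into the two assumed equations gives $10$ and $13$, as it should.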

You might be wondering if you can do this for larger systems, and the answer is most definitely yes. However, as soon as you go to 3 by 3 systems, it becomes much more time-consuming to calculate the inverse of your matrix, so it might be faster to do substitutions. I just wanted to outline the method here because it’s much faster than the usual comparison, substitution, and elimination methods.

One thing to note though is that this isn’t quite Cramer’s rule. If you look at the “Applications” section of the Wikipedia entry for Cramer’s rule, the procedure is to simply compute the solution using the following:

$$x_i = \frac{\det(A_i)}{\det(A)}.$$

In this formula, you take your regular matrix $A$ as we saw before and take its determinant (the denominator of the fraction). Then, the matrix $A_i$ is the same matrix as $A$ except that you replace the $i$-th column with the vector $\textbf{b}$.

As such, if we wanted to calculate the first component of our solution vector (which is $x_1$), our matrix $A_1$ would be:

Therefore, the determinant of this matrix is given by $\det(A_1) = (10)(-5) - (13)(3) = -89$. And, the determinant of just our regular matrix $A$ as before is $\det(A) = -31$. Putting this together gives us the first component of our solution:

$$x_1 = \frac{\det(A_1)}{\det(A)} = \frac{-89}{-31} = \frac{89}{31}.$$

You can check to see this is indeed the first component of our solution given by equation (11).
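Here is a minimal Python sketch of the column-swapping procedure for the 2 by 2 case. The entries of $A$ and $\textbf{b}$ are hypothetical: they were chosen only so that $\det(A) = -31$ and $\det(A_1) = -89$ agree with the values quoted above, and the function names are made up for the sketch.

```python
from fractions import Fraction

def det_2x2(m):
    """Determinant ad - bc of a 2 by 2 matrix [[a, b], [c, d]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def cramer_2x2(A, b):
    """Solve A x = b with Cramer's rule: x_i = det(A_i) / det(A)."""
    det_A = det_2x2(A)
    if det_A == 0:
        raise ValueError("det(A) is zero, so Cramer's rule does not apply")
    solution = []
    for i in range(2):
        A_i = [row[:] for row in A]      # copy A ...
        for r in range(2):
            A_i[r][i] = b[r]             # ... and replace column i with b
        solution.append(Fraction(det_2x2(A_i), det_A))
    return solution

# Hypothetical system consistent with the determinants quoted in the text.
A = [[2, 3],
     [7, -5]]
b = [10, 13]

print(det_2x2(A))        # -31
print(cramer_2x2(A, b))  # [Fraction(89, 31), Fraction(44, 31)]
```

The first component comes out as $89/31$, the same value as $\frac{-89}{-31}$ above.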

As such, this generalizes better to higher dimensions (by doing the swapping of columns of your matrix $A$), but I wanted to show the 2 by 2 case because it is very easy to follow. Now, you should be able to solve these systems much more easily than going through a long substitution.
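To illustrate how the column swapping carries over to larger systems, here is a sketch for a general $n$ by $n$ system. The determinant is computed by cofactor expansion along the first row, which is one standard choice and is only practical for small matrices. The 3 by 3 system used as an example is hypothetical, as are the function names.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: drop the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def cramer(A, b):
    """Solve A x = b via x_i = det(A_i) / det(A) for a square matrix A."""
    det_A = det(A)
    if det_A == 0:
        raise ValueError("det(A) is zero, so Cramer's rule does not apply")
    n = len(A)
    solution = []
    for i in range(n):
        # A_i is A with column i replaced by b.
        A_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(A_i), det_A))
    return solution

# Hypothetical 3 by 3 system:
#   x +  y +  z =  6
#       2y + 5z = -4
#  2x + 5y -  z = 27
A = [[1, 1, 1],
     [0, 2, 5],
     [2, 5, -1]]
b = [6, -4, 27]

print(cramer(A, b))  # [Fraction(5, 1), Fraction(3, 1), Fraction(-2, 1)]
```

The cost of the determinants grows quickly with $n$, which fits the remark that larger systems become much more time-consuming by hand.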

The Distance Formula (Equation Explainer)

The distance formula is one of the most frequently used relations in physics, and it goes hand in hand with decomposing a variety of vectors into different components. It’s something that every physics student uses, and so it becomes second nature for most of us. However, I’ve come across the sad fact that many secondary schools don’t seem to teach how the distance formula comes about and its connections with earlier work. As such, the link between algebra and geometry is lost and the distance formula gets lost in calculating differences between $x$ and $y$. Here, the goal is for us to look at the distance formula and see how it relates to other concepts that are of much use.

Algebra

In secondary school, students are presented with a formula that is supposed to give the distance between two points on a graph:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.$$

In my mind, this is a bit of an intimidating formula. There are several moving parts (variables), and it looks like you have to keep everything in check in order to get the distance right. As such, I find that almost no student actually remembers this formula without consulting their memory aid. It’s finicky, and it seems like one wrongly placed number will make everything go haywire.¹

However, what is important here is that the formula isn’t very illuminating to the student. I wouldn’t say it’s necessarily because the formula is “ugly” (which is subjective), but that it isn’t explained. To see the formula properly, it’s critical that we don’t only look at the algebra, but that we also draw some sketches.

Graphical

Let’s set up our problem. Given two points $A$ and $B$, we want to find the distance between these points when we draw a straight line between them. We can represent this as such:

In our sketch, we simply have the two points that are placed in a regular Cartesian plane. Now, we know that the equation for the distance between points $A$ and $B$ is what we saw above, but what many students don’t see is that this equation can be seen from the graph. It’s not a magical formula that one gets independent of the graph. The formula is intimately tied to geometry. To make this suggestive, let me draw a few extra lines on our graph:

So how does this help us? Well, one thing you might notice is that if we connect the two points by a straight line, the length of that line is what we want. Furthermore, when we take this line with the other two I have drawn, we get a right triangle.

If there’s one thing I’m hoping that you remember, it’s that we have a very special relation between the three sides of a right triangle: the Pythagorean theorem. This is the quintessential secondary school formula, and I bet you can recite this one from memory. It’s given by $a^2 + b^2 = c^2$, where $a$ and $b$ are the legs of the triangle and $c$ is the hypotenuse.

But what I want you to see is that it’s exactly this relation that gives us our distance formula! If you look again at the lines I drew in our graph, we just need to find the two legs in order to compute the hypotenuse, which is our distance $d$. It might look like we don’t have the information to do this, but if we inspect our points closer, you’ll find that we do.

Start with the green side. For simplicity, let’s suggestively call it $\Delta y$, since I think you would agree that we are looking at the change in the $y$ coordinate. The length for this leg of the triangle is simply how much the $y$ coordinate changes when we go from $A$ to $B$. Note here that when we talk about a length, our measurement is positive. This means that, even if point $B$ is lower than point $A$, we still say that the length is positive.²

We can apply a similar argument for what I’ll call $\Delta x$, and this will give us a picture that looks like this:

Finding the distance between $A$ and $B$ is now a piece of cake. Since we know the two legs, we can apply the Pythagorean theorem to get:

$$d^2 = (\Delta x)^2 + (\Delta y)^2.$$

Solving for $d$ gives us:

$$d = \sqrt{(\Delta x)^2 + (\Delta y)^2}.$$

The final thing to do is find an expression for $\Delta x$ and $\Delta y$. This isn’t as difficult as it may look, because one thing we do know about this problem is where $A$ and $B$ are (which means we know their coordinates). Therefore, if we let $A = (x_1, y_1)$ and $B = (x_2, y_2)$, we know that $\Delta x = x_2 - x_1$ and $\Delta y = y_2 - y_1$. Putting this into our expression for $d$ gives us:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.$$

This is exactly the expression you probably have somewhere on your memory aid, but now you know why it’s there. When we compute the distance between two points in the plane, what we are really doing is chopping that distance into a new path, one that gives us the legs of a right triangle. Then, using geometry we’re used to, we work backwards to find the original distance we are looking for. There’s no magic here, just geometry.
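The same steps (find the two legs, then apply the Pythagorean theorem) translate directly into a few lines of Python. The function name `distance_2d` and the sample points are hypothetical, chosen so that the triangle is the familiar 3-4-5 one.

```python
import math

def distance_2d(A, B):
    """Distance between points A = (x1, y1) and B = (x2, y2) in the plane."""
    dx = B[0] - A[0]   # horizontal leg of the right triangle
    dy = B[1] - A[1]   # vertical leg of the right triangle
    return math.sqrt(dx ** 2 + dy ** 2)   # hypotenuse via Pythagoras

print(distance_2d((1, 2), (4, 6)))  # 5.0
print(distance_2d((4, 6), (1, 2)))  # 5.0, the order of the points does not matter
```

Swapping the points flips the signs of both legs, but squaring removes the signs, so the distance is unchanged.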

Three Dimensions

You might now be wondering what the distance formula would be in three dimensions. Well, I’m not going to go through a whole rigorous proof, but here’s the idea. Imagine we have the following situation where we want to find the distance between the origin $O$ and another point $A$ (I’m using the origin just for a bit of added simplicity).

Hopefully you can see that the distance now occupies three dimensions, which I’ve denoted as $x$, $y$, and $z$. The algorithm is then simple. First, we compute the distance that lies on the $x-y$ plane, as shown below (the solid orange line). Note that I’ve once again used $\Delta x$ and $\Delta y$ to denote the side lengths.

Now that we have the length along the plane, we can simply extend this to our third dimension using the Pythagorean theorem a second time.

Putting these steps together, we can see that the distance between a point given by $A = (x_0, y_0, z_0)$ and the origin $O$ is given by:

$$d = \sqrt{x_0^2 + y_0^2 + z_0^2}.$$

This is then easily modified for any two points, by simply replacing every squared term by the square of the difference of the coordinates, such as $(x_1 - x_0)^2$.
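The two-step idea (first the distance within the $x$-$y$ plane, then the extension into the third dimension) can be sketched as follows. The function name `distance_3d` and the sample points are hypothetical.

```python
import math

def distance_3d(P, Q):
    """Distance between two points in 3D, built from two right triangles."""
    dx, dy, dz = (Q[i] - P[i] for i in range(3))
    planar = math.sqrt(dx ** 2 + dy ** 2)       # step 1: length in the x-y plane
    return math.sqrt(planar ** 2 + dz ** 2)     # step 2: extend along z

# From the origin to the (assumed) point (3, 4, 12): planar length 5, total 13.
print(distance_3d((0, 0, 0), (3, 4, 12)))  # 13.0
```

Since `planar ** 2` is just `dx ** 2 + dy ** 2`, the two steps collapse into a single square root over all three squared differences.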

Going Up in Abstraction

Now that we’ve covered our physical dimensions, we might start asking what seem like more difficult questions, such as:

What’s the distance between two points in 4 dimensions? What about $n$ dimensions?

This isn’t a distance you can visualize, of course (unless you’re very good at visualization). We are limited by three dimensions, but the good news is that the mathematics doesn’t care if we work in one, two, three, or $n$ dimensions. It turns out that we can just continue applying the procedure that I outlined above for any number of dimensions you are interested in. Therefore, we can write the distance between two points given by $A = (x_1, x_2, x_3, \ldots , x_n)$ and $B = (y_1, y_2, y_3, \ldots , y_n)$ as:

$$d = \sqrt{(y_1 - x_1)^2 + (y_2 - x_2)^2 + \cdots + (y_n - x_n)^2} = \sqrt{\sum_{i=1}^{n} (y_i - x_i)^2}.$$

In other words, just keep on taking the difference between the two points along each dimension, squaring the result, and adding it to the sum under the square root to get the distance in any dimension.
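As a sketch, the $n$-dimensional version is barely longer than the two-dimensional one. The function name `distance` and the four-dimensional sample points are hypothetical.

```python
import math

def distance(A, B):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(A) != len(B):
        raise ValueError("both points must have the same number of coordinates")
    # Sum the squared differences along every dimension, then take the root.
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(A, B)))

# Two assumed points in 4 dimensions: the differences are 1, 2, 2 and 4.
print(distance((1, 2, 3, 4), (2, 4, 5, 8)))  # 5.0
```

Python 3.8 and later also provide `math.dist`, which computes the same quantity.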


So why does this work? Or rather, did I just get lucky in the way that I drew my graph that the lines came out to be a right triangle?

The answer is no. I didn’t choose the lines randomly. If you look back at the graph, the lines I drew in are parallel to the axes. What this does is guarantee two things. First, it ensures that we can easily calculate the distance along either the horizontal ($\Delta x$) line or the vertical ($\Delta y$) line. The second thing is that, by drawing our lines parallel to the $x$ and $y$ axes, the two lines are perpendicular (orthogonal), which means the triangle we get is a right triangle, just what we need to use the Pythagorean theorem.

I hope this makes the distance formula seem a bit more clear. The formula comes directly from the geometry of the situation, which I have demonstrated here. As such, the only thing to remember is that we need to use the Pythagorean theorem to find the distance between two points. After that, everything is a piece of cake.

  1. To assuage this fear a bit, I want to point out a property of a term that looks like $(a-b)^2$. We know that the only difference between $a-b$ and $b-a$ is that one will be negative and one will be positive (we’re assuming they aren’t the same number), but the absolute value is the same. Now, if we square this term, the result is always positive (a property of squaring a number). Therefore, the lesson I want to impart here is that the order of $x_{1}$ and $x_{2}$ doesn’t matter, since we are squaring the result. 

  2. So why are lengths always positive, even though we know our $\Delta y$ or $\Delta x$ may be negative? I explained to you the “physical” way of thinking about it (I often think in the way of my background in physics), but this is also encoded mathematically in our distance formula. When I presented these two “extra” lengths in our graph, you agreed with me that they were $\Delta x$ and $\Delta y$. However, if you think about it, each of those line segments should be able to be computed using our distance formula, and indeed they can. If we take $\Delta x$ as an example, we know that $y$ does not change as we move along this line, so the distance formula gives $d = \sqrt{(\Delta x)^2} = |\Delta x|$. Note here that it doesn’t matter if $\Delta x$ is negative, since you have to first square it, and then take the square root. This has the pleasant effect of taking care of the issue with signs, so you don’t have to worry, and it matches our physical intuition about lengths being positive. 

Examples Before Abstraction

When learning a new topic, there’s always a certain tension between two approaches: going straight to abstraction, or starting off easier with examples. I see this more and more as I learn about more complex and detailed physics and mathematics, and it has always made me wonder which way I should go about trying to learn. Just like anyone else, I want to get to a place where I feel fully comfortable with the concept in abstraction, but I don’t want to subject myself to a painful learning process by hitting myself against the brick wall of abstraction.

For a feeling of what this is like, do the following. Pick a topic that you feel comfortable with, and then look up that topic in some other resource. Chances are (particularly if you’re looking at a mathematics book), the equations and explanations you will see will look completely confusing. This has happened many times to me, and it’s humbling every time I go through this exercise. However, the simple reality is that it’s discouraging, since a topic you thought you knew has a bunch of other facets that you didn’t know about. Of course, this may be an incentive to learn more, but I know that, at least for me, it would make me want to stop.

The other reason I’ve been thinking about this is because I work with a lot of students who are taking classes I’ve already done. When I work with them, I’m often tempted to give them answers that are more abstract than they are used to. In my mind, the idea was to encourage them to see the concepts a bit more abstractly. However, after reflecting on this for a while now, I’ve been worried that perhaps the abstraction was too much at their stage. It’s not that they couldn’t see the abstraction, but simply that they were already working hard to understand the concepts, so piling on more wasn’t helpful. I see that now in a way that I didn’t quite see before. It has come through my own self-study, where I’ve found it’s not always helpful to give the full abstraction or generalization before concretely looking at examples.

In order to illustrate this, consider the mathematical object called a tensor. The way I like to think of tensors is that they are generalizations of familiar objects like scalars, vectors, and matrices. I personally use tensors in my research within general relativity, but the point I want to make is that one way we can define a tensor is through its transformation properties. In general relativity (and in other applications of differential geometry), we like to work with the tools of calculus. However, since you may have heard that spacetime is curved, it’s not so simple to ask what a “regular” derivative is. In order to make up for this curvature, we have a new notion for derivatives, called the covariant derivative. It’s given by (for a tensor of rank $k + l$):

The point with this long equation isn’t for you to understand it. Heck, it’s still difficult for me to follow. However, what I think we can easily agree on is that this is not the first equation you want to show someone who is learning about tensors for the first time. Sure, it’s a general equation that applies to most situations, but the drawback is that it has a lot going on. This is a repeating pattern that I see a lot in general relativity. The equations are very long, which means they are difficult to analyze and it isn’t always easy to understand what they mean. Therefore, I would argue this is not suited to the beginner. Instead, in this particular case, I would make sure the student understands the two fundamental transformations that play a role here: the upper indices transformation, and the lower indices one. Putting these together gives us the equation above, but it also gives the student a sense of what is going on. Simply giving the student this above equation isn’t helpful on its own.

This is only one personal example, but like I’ve also mentioned, I’ve seen this kind of situation happen with the students I work with. As such, I’m trying to constantly remind myself that the material itself can be challenging to learn, and so I shouldn’t make things too abstract before they are ready. I’m sure there are some who thrive by looking directly at the most abstract and general concepts, but I know that many people aren’t like that. Furthermore, I experience this exact scenario myself during my self-studying, so I shouldn’t try to bring the level of abstraction up too quickly.

After all, the next rung in the ladder of abstraction will always be there for the student to climb.