On Uncertainty in Science

I’ll let you in on a bit of a secret. For most of my life, I hated doing experiments in science.

It didn’t really matter if the experiments were in physics, chemistry, or biology class (though I enjoyed the fact that physics experiments tended not to be as messy). In fact, when I was in secondary school, my grade was asked at the end of the year to vote on what kind of science class they wanted the next year. There were two choices. One was to keep the material more theoretical and from the textbook. The second was to introduce the content in a much more “hands-on” sort of way, which meant more laboratory experiments. If I recall correctly, I was one of the only students who chose the first option.

I didn’t really understand why everyone wanted to do the hands-on program. In my eyes, it just made things seem less exact and more messy. Other students seemed to like the idea that they could do experiments, but it wasn’t my idea of a fun time.

Moving into CÉGEP, I kept this attitude of not enjoying lab experiments. They were annoying to do, and completing the lab reports afterward was the worst part. One had to deal with uncertainties, significant figures, and sources of error that made everything seem much messier than the theoretical predictions made using mathematics. I longed for simple relations without error bars.

From reading the above, it may seem like I think science should be all theoretical. Of course, this is not the case, and I think, if anything, we need to talk more about the uncertainty and messiness in science. If we want to have a society that understands the way we get results in science, we need to communicate this uncertainty more clearly.

Science is not mathematics. Sure, we want to describe the world using mathematics as our language, but we need to keep in mind that nature will not bend to our will. There will always be fluctuations, imprecise measurements, and sheer randomness in some data. We use mathematics to make these uncertainties as small as possible, but we can never fully eliminate them. As such, it’s crucial to realize that a measurement means nothing without its corresponding uncertainty. The reason is simple: we take measurements in order to compare them. If we just dealt with measurements as precise quantities that have no uncertainty, then we would find far less agreement with our predictions. This would make it nearly impossible to do science.

Let’s take a very simple example. Imagine we wanted to measure an object that is said to be 4.500 metres long. To verify this claim, we take a metre stick that has graduations every centimetre and measure the object. Say it comes out to 4.52 metres. Do we say that these two measurements are different?

The answer is, it depends. To find out for sure, we need to know the uncertainties that are associated with each measurement. When the object was measured to be 4.500 metres long originally, what were the uncertainties on that measurement? Was it $\pm 1\ \text{mm}$? These are critical questions to ask when making comparisons.

If we imagine that the metre stick has an uncertainty of $\pm 1\ \text{cm}$ (because this metre stick is only marked off in centimetres), and that the original claim carries an uncertainty of $\pm 1\ \text{mm}$, then the two values we are comparing are:

$$4.500 \pm 0.001\ \text{m} \quad \text{and} \quad 4.52 \pm 0.01\ \text{m}.$$

The question now becomes: do these two measurements overlap? This is the key question, and in our case, the measurements don’t overlap, since the first measurement could be at most 4.501 m and the second measurement could be at least 4.51 m. Since these two measurements don’t overlap, we consider them to not be in agreement.
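The overlap test described above can be sketched in a few lines of Python. This is only an illustration: the function names `interval` and `agree` are made up for this example, and the $\pm 1\ \text{mm}$ on the original claim is the assumed value discussed in the text.

```python
def interval(value, uncertainty):
    """Return the (low, high) range covered by value ± uncertainty."""
    return (value - uncertainty, value + uncertainty)


def agree(first, second):
    """Two measurements agree if their uncertainty ranges overlap."""
    low1, high1 = interval(*first)
    low2, high2 = interval(*second)
    return low1 <= high2 and low2 <= high1


claimed = (4.500, 0.001)   # metres, assuming ±1 mm on the original claim
measured = (4.52, 0.01)    # metres, ±1 cm from the metre stick

print(agree(claimed, measured))       # False: 4.501 m is below 4.51 m
print(agree(claimed, (4.52, 0.02)))   # True: a coarser stick would overlap
```

Notice that the same pair of central values can agree or disagree depending only on the uncertainties attached to them.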

As you may notice, this isn’t a trivial matter. It may have seemed like the two measurements did agree at first glance, but without knowing their associated uncertainties, we have no idea. This means that if someone tells you some figure that came from experiment and wasn’t just a theoretical calculation, you need to know its uncertainty if you want to compare the figure to anything else. Without it, the measurement is meaningless.

What I want to stress here is that uncertainty is inherent in science. There’s no getting around this fact, no matter how precise and careful your experiment is. This is why I find it so amusing when people attack scientific results on the basis that they are simply uncertain. Of course they are! This isn’t mathematics, where results have infinite precision. In science, we have this inherent uncertainty, but we use the tools of mathematics to make sure that the uncertainty is as small as possible, and we make our claims using this uncertainty. We make do with what nature presents us.

If there’s one thing I want to ask of you, it is this: make sure you’re aware of the inherent uncertainty in science, so that you aren’t worried when you see scientists saying that the measurements agree with theory, despite the seeming non-equivalence. Chances are, the uncertainties in the measurements are what allow scientists to make this claim. Conversely, look out for those who try to exploit this characteristic of science to push information that simply isn’t supported by the scientific method.

Mathematical Sophistication

When I reflect on my education in science (and in physics in particular), the common theme I see is how the sophistication of the computations and concepts I learned kept increasing each year. If there was one thing I could count on, it wasn’t learning something “new”. Instead, it was about viewing things I might have once taken for granted as a process that was much deeper than I realized.

For example, take Snell’s law. In secondary school, I learned how this phenomenon worked in the sense that I could calculate the effects. I learned that Snell’s law could be written like this:

$$n_1 \sin\theta_1 = n_2 \sin\theta_2,$$

where $n_1$ and $n_2$ are the refractive indices of the two media, and $\theta_1$ and $\theta_2$ are the angles the ray makes with the normal on each side. This allows one to calculate the angle of refraction for various simple systems, and this is exactly what I remember doing. Additionally, the “reason” for why this was true seemed to be something about the light “slowing down” in a different medium, but the reasoning wasn’t all that clear. In the end, it was more of a “here’s the law, now calculate it” sort of concept.

At the time, I don’t remember being bothered by this. Now though, it makes me frustrated, since what is the point of learning these ideas if one doesn’t learn why this specific result occurs? It’s something I’ve been thinking about a fair amount lately.

Fast-forward a few years, and now Snell’s law gets derived using Fermat’s principle of least time, which uses the calculus of variations, and gives one a more satisfying explanation concerning what is going on when the light rays “bend”. In this sense, the mathematics produces the result, which is better than being told the result.
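To see this idea at work without the full derivation, here is a small numerical sketch in Python. It is not the calculus-of-variations argument itself: it simply searches for the crossing point on the interface that minimizes the travel time, then checks that the resulting angles satisfy Snell’s law. The geometry (a source one unit above the interface, a target one unit below and two units across), the refractive indices, and the function names are all assumptions chosen for illustration.

```python
import math


def travel_time(x, n1, n2, a, b, d):
    """Travel time (in units of 1/c) from (0, a) to (d, -b),
    crossing the interface y = 0 at the point (x, 0)."""
    return n1 * math.hypot(x, a) + n2 * math.hypot(d - x, b)


def fastest_crossing(n1, n2, a, b, d, iterations=200):
    """Ternary search for the crossing point of least time.
    This works because the travel time is a convex function of x."""
    lo, hi = 0.0, d
    for _ in range(iterations):
        left = lo + (hi - lo) / 3
        right = hi - (hi - lo) / 3
        if travel_time(left, n1, n2, a, b, d) < travel_time(right, n1, n2, a, b, d):
            hi = right
        else:
            lo = left
    return (lo + hi) / 2


# Illustrative values only: roughly air (1.0) into water (1.33).
n1, n2 = 1.0, 1.33
a, b, d = 1.0, 1.0, 2.0

x = fastest_crossing(n1, n2, a, b, d)
sin1 = x / math.hypot(x, a)            # sine of the angle of incidence
sin2 = (d - x) / math.hypot(d - x, b)  # sine of the angle of refraction

# The two sides of Snell's law should match to within numerical error.
print(n1 * sin1, n2 * sin2)
```

The point is that nothing about Snell’s law was put into the code. Only “take the fastest path” was, and the law comes out.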

Another example is one that I hadn’t thought about much until I came across it. Anyone who has gone through a class in statistics has seen how to fit a curve to a collection of data points. Usually, one is concerned only with fitting a straight line, but sometimes we fit quadratic curves as well (with software).

In the case of linear plots, in secondary school, the recipe went like this. First, one plots the points on a graph. Then, one carefully draws a rectangle around the data points and measures the dimensions of this rectangle. From there, the slope can be calculated, and a representative point is chosen in order to find the initial value of the line. Basically, this was an exercise in graphing and drawing accuracy, not something you’d want from a mathematics class. As such, while the results were qualitatively correct, they could differ widely from student to student.

Fast-forward a few years once again, and the story is much different. In my introductory statistics for science class, we were given the equation that would give us the slope of our linear equation, as well as the correct point to use for the initial value. This undoubtedly produced more accurate results, but once again it lacked the motivation behind it (due to a lack of time, in this case). Thankfully, this lack of explanation was addressed in my linear algebra class, where we learned the method of least squares. Here was finally an explanation as to how these curves were computed. In the statistics class, it was a long and complicated formula that was given. However, in linear algebra, the reasoning behind how to compute such a curve was much simpler and more straightforward. In other words, it made sense as a process. Even better, this method generalizes well to other types of curve fitting, not just linear functions. As such, this explanation was much more useful than all of the other ones.
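As a rough sketch of the linear algebra view, a straight-line fit $y = mx + c$ can be written as the system $A\textbf{p} \approx \textbf{y}$, where each row of $A$ is $(x_i, 1)$ and $\textbf{p} = (m, c)^T$. The least-squares solution then comes from the normal equations $A^T A \textbf{p} = A^T \textbf{y}$. The Python below solves that 2 by 2 system directly. The function name `fit_line` and the data points are invented for illustration, and this is only one way of setting it up, not necessarily how it was presented in either class.

```python
def fit_line(xs, ys):
    """Least-squares line y = m*x + c from the normal equations
    (A^T A) p = A^T y, where the rows of A are (x_i, 1)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # A^T A = [[sxx, sx], [sx, n]] and A^T y = [sxy, sy].
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values")

    m = (n * sxy - sx * sy) / det
    c = (sxx * sy - sx * sxy) / det
    return m, c


# Made-up data scattered around the line y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1.1, 2.9, 5.2, 7.0]

m, c = fit_line(xs, ys)
print(m, c)  # roughly m = 2.0 and c = 1.05
```

Fitting a quadratic works the same way: add a column of $x_i^2$ values to $A$ and solve the resulting 3 by 3 normal equations.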

The lesson that I personally get is that, no matter the topic you’re learning, there often is another layer of understanding that can complement it. This means that I shouldn’t stop looking at concepts that I’ve seen many times just because I think they are boring! There are often new perspectives from which to look at these situations, and they usually come tied with more mathematical sophistication. This is something that I love to see, because it brings new viewpoints to concepts I might have thought I had completely figured out. This shows me that I can always learn and understand a concept more thoroughly, and hopefully this can be good inspiration for you to seek out varied explanations of your favourite concepts.

Just because classical mechanics is, well, classical, doesn’t mean you can’t look at it in more sophisticated ways.

The Limitations of Models

As many students in the sciences know, the reason we use mathematics to describe our results is that mathematics is the most precise language we possess. It’s not like we have some sort of favouritism towards mathematics that we don’t have towards other languages like English or French. Quite frankly, it’s an issue of precision in what one is communicating. It’s the difference between saying I can see a red light and that I can see a light of about 600 nanometres. It’s the difference between basing a prediction on past results and basing it on extrapolation from a model.

However, what is often missed by the public is the fact that science is based on mathematical models. And, as any scientist will tell you, a model is only as good as the assumptions it makes. This means the models are inherently different from what we would call “real life”.

Simplicity to complexity

When you first learn physics in secondary school, you typically learn about the big picture concepts, such as Newton’s laws, some optics, and maybe even something about waves. If we focus on only Newton’s famous $\vec{F} = m \vec{a}$, you learn about solving this equation for an easy system. Additionally, one usually starts without any notion of calculus, so the questions revolve around either finding a force or finding the acceleration of the system. Personally, I remember analyzing systems such as a block that is subject to a variety of constant forces. This made the analysis easy, compared to what one does a few years later in more advanced classes.

However, what one must keep in mind is that the systems I was analyzing weren’t realistic. If one stops to think about it, there aren’t many forces that are constant with time (even our favourite gravitational force $m\vec{g}$ isn’t technically a constant). However, we weren’t going to be thrown to the lions in our first physics class, so these simple systems were enough to begin.

Years later, I would refine these models to become gradually more realistic. To give an explicit example, consider the equations for kinematics, which one learns about in secondary school and are given by:

$$x(t) = x_0 + v_0 t + \frac{1}{2} a t^2, \qquad v(t) = v_0 + a t.$$

What one immediately learns following this is that these are equations that describe the motion of a *free-falling* object under a *constant acceleration*. These two emphasized terms are important, because unless you’re trying to describe the motion of projectiles in outer space, these equations don’t exactly describe the motion of real systems.

There are a few reasons why this is so. First, as alluded to above, these equations are only valid when no force acts on the system except for gravity. This is obviously not realistic, since other forces can act on a system when it is launched (such as air friction). Therefore, modeling the situation as if air friction didn’t exist can only give an approximate answer at best. The presence of gravity as the only force is what is meant by the term free-falling.

Second, the acceleration needs to be constant, and this isn’t true either. If we simply take the example of launching a system into the air, the fact that air friction acts as a force on the system changes the acceleration of the system, thereby nullifying our kinematic equations once again.

Alright, so those are a few reasons why the kinematic equations don’t work, but what does the difference look like in our models? I won’t go through the whole derivation of the kinematic equations when we add in air friction, but here’s a plot that shows the difference between the two for a tennis ball.

As you can see, the difference is very small. Indeed, it took me a bit of time to figure out what kind of object would show some more obvious deviations from the original parabola (which is the red curve). Finally, I found a good example, which is a table tennis ball. The more accurate curve that takes air friction into account (in blue) is quite close to the red curve, so to a first approximation, our original model without air friction is pretty good. Actually, if you take the whole trajectory into account, you can see that the two curves diverge in the latter half of the trajectory.

Table tennis ball trajectory of height as a function of distance
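For readers who want to reproduce a comparison like this, here is a minimal Python sketch. The drag model behind the plot isn’t spelled out in this piece, so the code assumes a common choice: a drag force proportional to the square of the speed. The drag coefficient of 0.5, the air density, the launch speed and angle, and the function name `landing_distance` are illustrative assumptions. The mass (2.7 g) and diameter (40 mm) are the standard values for a table tennis ball. The integration is a simple Euler step, which is crude but good enough to see the effect.

```python
import math


def landing_distance(speed, angle_deg, k, g=9.81, dt=1e-4):
    """Horizontal distance travelled before returning to launch height.
    The drag acceleration is modelled as -k * |v| * v (k = 0 means no drag)."""
    vx = speed * math.cos(math.radians(angle_deg))
    vy = speed * math.sin(math.radians(angle_deg))
    x, y = 0.0, 0.0
    while True:
        v = math.hypot(vx, vy)
        ax = -k * v * vx
        ay = -g - k * v * vy
        x_new = x + vx * dt
        y_new = y + vy * dt
        vx += ax * dt
        vy += ay * dt
        if y_new < 0:
            # Interpolate linearly to the point where the height crosses zero.
            fraction = y / (y - y_new)
            return x + fraction * (x_new - x)
        x, y = x_new, y_new


# Assumed parameters for a table tennis ball.
mass = 0.0027               # kg
radius = 0.020              # m
drag_coefficient = 0.5      # assumed, roughly that of a smooth sphere
air_density = 1.2           # kg/m^3
area = math.pi * radius ** 2
k = 0.5 * air_density * drag_coefficient * area / mass

no_drag = landing_distance(10.0, 45.0, 0.0)
with_drag = landing_distance(10.0, 45.0, k)
print(no_drag, with_drag)   # the ball with drag lands noticeably shorter
```

Setting `k` to zero recovers the familiar parabola, which gives a quick sanity check against the textbook range $v_0^2 \sin(2\theta)/g$.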

You might be thinking, “Alright great, we have the solution for the trajectory, so now this problem is solved.” But that’s not quite true. If you’ve ever hit a table tennis ball, you know that it doesn’t just fly through the air in a fixed orientation. It spins, and that rotation changes how the ball moves (as anyone who plays table tennis knows). As such, the moral of the story is that we can always add more elements into the models that make them more accurate. However, that always comes at the cost of simplicity, so your model becomes more difficult to compute as you increase the features it encodes. At some point, you have to choose where you want to fall on the spectrum of simplicity to complexity.

How much stock can we put into models?

So who cares about the trajectory of a ball when we throw it? Chances are, not many. The reason I wanted to show you this one was just to illustrate what we need to take into account when we want to model some sort of phenomenon. There are always tradeoffs, and these tradeoffs affect our accuracy.

The problem that we as scientists can fall into is failing to communicate how these models work to the public. It’s nice to give big, qualitative statements about the future, but often we don’t share the limitations of these statements. What I mean by this is simply that our statements in science are often predicated on models. And, as I mentioned at the beginning of this piece, models are only as good as their built-in assumptions. Einstein’s theory of general relativity is a fantastic framework for understanding and predicting many features of spacetime, but if we suddenly see that there isn’t a speed barrier in the universe, then the whole model is useless physically. That’s obviously an extreme example, but the broader point is to keep in mind the limitations of models.

A model is something we use to describe the world. If it’s a very good model, then it may even make predictions about things we haven’t yet discovered. But what you shouldn’t do is keep yourself tied to a specific model. That’s because every model has its own domain of applicability, and trying to apply the model past this domain isn’t a good idea.

We should all keep this in mind when we hear news reports about extraordinary things. First, what kind of model is being used? Is it a model that has proven value, or is it something completely new? Second, what kind of domain of applicability does this model have, and does going past it significantly change the results? As you can see from the example we did above, not including air friction didn’t significantly change the results. However, how much error counts as “bad” is very subjective, which means it depends on the application. If we are trying to understand simple models of astrophysical phenomena, we might not be too picky if our results could be up to 20% off (depending of course on the situation). However, if you have a model that predicts certain health issues in patients, misidentifying one in five patients is much too high (for me, at least).

Therefore, the next time you hear something extraordinary on the news, think about the model that’s being used. I understand that we can’t possibly research every single ridiculous claim that is made, but a bit more skepticism and curiosity about the details of such claims would not be a bad thing.

Cramer's Rule for Solving Linear Systems

Throughout secondary school, we learn about solving simple systems of equations. We learn that there are several methods (which are more or less the same thing): comparison, substitution, and elimination are the big three. We then go through tons of practice questions which all focus on doing this kind of solving. In particular, we solve systems of two equations, and I don’t recall ever doing more than that. The sense I get from students is that solving these kinds of equations is sometimes confusing. I personally think it’s because there’s a lack of equivalence between the methods, so they all seem like pulling magic tricks out of a hat. That’s a pedagogical/time problem, but I don’t want to focus on that today. Instead, I want to focus on something that makes solving these systems of two equations super quick. It’s called Cramer’s rule, and this method makes it possible to forego using comparison, substitution, or elimination to solve for variables. Instead, you apply this procedure, and you can simply read off the answer.

Going with my new decision to do examples first and theory second, let’s use the following system of equations: Solving this is a bit tedious. Personally, I would eliminate one of the variables, but there’s no “right” way to do it. However, we’re going to be clever, and change how we write these equations. Instead of writing two separate equations, we’re going to write them as one matrix equation, which looks like this: To make sense of how this works, I’ll explain the three different parts you see in this equation. The first block is called a 2 by 2 matrix, and it encodes the various coefficients of our system. The next part is our vector $\textbf{x}$, given by $\textbf{x} = (x,y)^T$. This is just our variables in the equation. Lastly, we have another vector, which is simply what our two equations are equal to when you write out the equations.

To see how this is equivalent to our above system of equations, we need to multiply the matrix by the vector on the left-hand side. If you’ve taken a linear algebra course, you remember that matrix multiplication is given by multiplying each row of the matrix by the column of the vector. As such, we would get: Now, since this is a vector (in the sense that it has only one column, and a certain number of rows), we just match up the first component of this vector with the first component of the right-hand side of our system. Similarly, we do the same for the second component, recovering our two original equations.

Let’s name our matrix $A$, and our vectors $\textbf{x}$ and $\textbf{b}$, with the latter being the vector on the right-hand side of our equation. We can then write this as:

$$A\textbf{x} = \textbf{b}.$$

Our end goal is to solve for $\textbf{x}$. Since the notation above looks a lot like regular algebra, we may be tempted to write the following:

$$\textbf{x} = \frac{\textbf{b}}{A}.$$

This temptation must be resisted! This is because we don’t have a notion of what it means to divide a quantity by a matrix. You could make one up, of course, but in the standard theory of linear algebra, this operation isn’t defined. However, there is a similar notion in linear algebra, and it’s called the inverse of a matrix. The idea is simple. In regular multiplication (as I wrote about here), if we have a non-zero number (call it $c$), then its inverse is $c^{-1}$, so that we get $cc^{-1} = 1$. It’s the same idea here with matrix multiplication, except that instead of the product being equal to one, it will be equal to the matrix equivalent of one, which is called the *identity matrix*. It looks like this:

$$I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$

Here, $n$ just refers to the dimension you want the matrix to be. Basically, the identity matrix is a matrix with zeros everywhere except on the main diagonal, where the entries are ones. The important part is that if you multiply any matrix or vector of compatible size by the identity matrix, you get back the same vector/matrix.

With that out of the way, instead of dividing by our matrix $A$ from before to solve for $\textbf{x}$, we will multiply both sides of the equation on the left by the inverse matrix $A^{-1}$. Doing so will give us:

$$A^{-1}A\textbf{x} = A^{-1}\textbf{b} \quad \Longrightarrow \quad \textbf{x} = A^{-1}\textbf{b}.$$

And that’s how you solve the equation! Now, the only small detail you might be asking yourself is, “How do I find the inverse of a matrix?”

This is a good question, and in general it requires a long procedure of manipulating the rows of the matrix. However, for the case of a 2 by 2 matrix like we have in our example, there’s a simple formula for the inverse of any 2 by 2 matrix with something called a non-zero determinant. The expression is given by:

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \Longrightarrow \quad A^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

The denominator $ad-bc$ is the determinant of the matrix $A$, and we obviously need this to be non-zero so our fraction is defined. Apart from that, this gives us our expression for $A^{-1}$, which we can then throw into our equation above.

Doing exactly this with our original matrix gives: Therefore, solving our equation for our vector $\textbf{x}$ gives: Just like that, we have our answer. This is why it’s nice to use this method for 2 by 2 systems. Calculating the inverse is very straightforward and efficient, so we can get our answer quickly without doing a bunch of substitutions.
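The whole procedure fits in a few lines of Python. One caveat: the system used in the worked example isn’t reproduced here, so the coefficients below are an assumed stand-in, chosen only to be consistent with the determinants quoted later in this piece ($\det(A) = -31$ and $\det(A_1) = -89$). The function names `inverse_2x2` and `solve_2x2` are likewise made up for illustration. Exact fractions are used so that no rounding sneaks in.

```python
from fractions import Fraction


def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]], using 1/(ad - bc) * [[d, -b], [-c, a]]."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible (determinant is zero)")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def solve_2x2(matrix, rhs):
    """Solve A x = b for a 2 by 2 system by computing x = A^{-1} b."""
    (a, b), (c, d) = matrix
    inv = inverse_2x2(a, b, c, d)
    return [inv[0][0] * rhs[0] + inv[0][1] * rhs[1],
            inv[1][0] * rhs[0] + inv[1][1] * rhs[1]]


# Assumed example system: 2x + 3y = 10 and 7x - 5y = 13.
A = [[2, 3], [7, -5]]
b = [10, 13]

solution = solve_2x2(A, b)
print(solution)  # [Fraction(89, 31), Fraction(44, 31)]
```

Substituting back confirms the result: $2 \cdot \frac{89}{31} + 3 \cdot \frac{44}{31} = 10$ and $7 \cdot \frac{89}{31} - 5 \cdot \frac{44}{31} = 13$.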

You might be wondering if you can do this for larger systems, and the answer is most definitely yes. However, as soon as you go to 3 by 3 systems, it becomes much more time-consuming to calculate the inverse of your matrix, so it might be faster to do substitutions. I just wanted to outline the method here because it’s much faster than the usual comparison, substitution, and elimination methods.

One thing to note though is that this isn’t quite Cramer’s rule. If you look at the “Applications” section of the Wikipedia entry for Cramer’s rule, the procedure is to simply compute the solution using the following:

$$x_i = \frac{\det(A_i)}{\det(A)}.$$

In this formula, you take your regular matrix $A$ as we saw before and take its determinant (the denominator of the fraction). Then, the matrix $A_i$ is the same matrix as $A$ except that you replace the $i$-th column with the vector $\textbf{b}$.

As such, if we wanted to calculate the first component of our solution vector (which is $x_1$), our matrix $A_1$ would be: Therefore, the determinant of this matrix is given by $\det(A_1) = (10)(-5) - (13)(3) = -89$. And, the determinant of just our regular matrix $A$ as before is $\det(A) = -31$. Putting this together gives us the first component of our solution:

$$x_1 = \frac{\det(A_1)}{\det(A)} = \frac{-89}{-31} = \frac{89}{31}.$$

You can check to see this is indeed the first component of our solution given by equation (11).
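Here is a short Python sketch of Cramer’s rule for a system of any (small) size. As a caveat, the example system below is an assumption: it is one choice of coefficients consistent with the determinants quoted above, not necessarily the system from the worked example. The function names `det` and `cramer` are illustrative, and the determinant is computed by cofactor expansion, which is fine for small matrices but far too slow for large ones.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by cofactor expansion along the first row."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for j, entry in enumerate(matrix[0]):
        # Minor: drop the first row and the j-th column.
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def cramer(A, b):
    """Solve A x = b using x_i = det(A_i) / det(A), where A_i is A
    with its i-th column replaced by b."""
    d = det(A)
    if d == 0:
        raise ValueError("det(A) is zero, so Cramer's rule does not apply")
    solution = []
    for i in range(len(A)):
        A_i = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]
        solution.append(Fraction(det(A_i), d))
    return solution


# Assumed example system: 2x + 3y = 10 and 7x - 5y = 13.
A = [[2, 3], [7, -5]]
b = [10, 13]

print(det(A))        # -31
print(cramer(A, b))  # [Fraction(89, 31), Fraction(44, 31)]
```

The same `cramer` function handles 3 by 3 systems without any changes, which is the sense in which the column-swapping procedure generalizes.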

As such, this generalizes better to higher dimensions (by doing the swapping of columns of your matrix $A$), but I wanted to show the 2 by 2 case because it is very easy to follow. Now, you should be able to solve these systems much more easily than going through a long substitution.