One-to-One and Onto

When learning about functions, a few properties come up over and over. In particular, we often hear about functions being one-to-one (injective) or onto (surjective). These are important properties that allow one to set up correspondences between sets (bijections), as well as study other features of various functions. I want to go through these two properties in a slightly different way than most sources do, so hopefully this will be a good analogy to keep in mind when discussing them.

First, let’s define our function. We will consider a function ƒ that moves objects from a set X to a set Y. We usually call X the domain, and Y the codomain.

If ƒ is one-to-one, it has the special property that any time you look at an element y in Y that has some element from X sent to it, only that element gets sent to y. If we think of the function as a bunch of arrows linking up elements from X to elements in Y, there will be at most one arrow pointing to any given element in Y.

The way I think about it is this. Imagine the function as a machine with a specific set of instructions, and an input and output area. You stand at the output area, and I stand at the input area. If you know the set of instructions that the machine executes, and you receive something at the output of the machine, can you tell me what I put into the machine without seeing it beforehand?

Your initial reaction might be, “Of course, I can tell you what you put in! I’ll just do the instructions in reverse.”

This is fine, but let’s go through an example. Suppose that the machine takes in a number, and then outputs that number modulo two. This means that the machine outputs the remainder of the input number after division by two. In other words, this machine will output 1 if the input number was odd, and 0 if the input number was even.

Okay, so you wait at one end of the machine while I insert my number, and the output of the machine is 0. What number did I put in? Well, you know that the number wasn’t odd, because only even numbers get sent to zero. However, apart from that piece of information, you have no idea what number I put in. It could have been any even number.

This is an example of a function that isn’t one-to-one. If we go back to our definition, the reason it isn’t one-to-one is because every single even number “points” to zero. Therefore, you can’t “undo” the machine’s instructions, because they destroyed (for lack of a better word) the nature of the input while executing the instructions.
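This failure is easy to see in code. Here is a minimal sketch of the mod-two machine in Python (the function name is mine, just for illustration): every even input collapses to the same output, so no “reverse instructions” can recover the input.

```python
def machine(x):
    """The machine's instructions: output the input modulo two."""
    return x % 2

# Several different even inputs all produce the same output, 0,
# so the output alone cannot tell us which number went in.
outputs = {machine(x) for x in (2, 4, 6, 100)}
print(outputs)  # {0}
```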

I like this analogy with the machine, because it helps us see that not all functions “go both ways”. You can’t always get back to the input from the output. In fact, this property only holds when a function is both one-to-one and onto.

Which brings us to the property of being onto. This one’s slightly different. A function is onto if, no matter what element you pick in Y, there is some arrow pointing to it from the set X. More formally, for any y ∈ Y, there exists an x ∈ X such that ƒ(x) = y.

I like to think of this from the perspective of the set X. When I think of a function being onto, I imagine shooting arrows from the set X in a way that ensures each element in Y is hit. Put differently, our function ƒ doesn’t “miss” any of the elements in Y.

For example, if we take a very simple function, ƒ(x) = x+1, we can see that this function is onto, because it “hits” any value in our output set (here, we will let x be any real number, so our set Y = R). However, a different function such as ƒ(x) = sin x is not onto, because the outputs of the function only belong to the interval [-1, 1], which means we miss a lot of values in R.
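We can’t brute-force these properties over all of R, but on finite sets both are easy to check. Here is a hedged sketch in Python (the helper names are my own, for illustration only):

```python
# Check the two properties on *finite* sets only.
def is_one_to_one(f, X):
    images = [f(x) for x in X]
    return len(images) == len(set(images))  # no two arrows share a target

def is_onto(f, X, Y):
    return set(Y) <= {f(x) for x in X}      # every element of Y gets hit

X = range(8)
print(is_one_to_one(lambda x: x + 1, X))    # True: distinct inputs give distinct outputs
print(is_one_to_one(lambda x: x % 2, X))    # False: inputs collapse onto 0 and 1
print(is_onto(lambda x: x % 2, X, {0, 1}))  # True: both 0 and 1 are hit
```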

Sometimes, you will work with a function that is both one-to-one and onto. In that case, the function is called bijective, and it’s a very nice property for a function, because it allows one to identify an inverse transformation. This is something you’ve undoubtedly come across in secondary school. The idea is that if you have a function y = ƒ(x), you can “solve” for x in order to get a new function in terms of y.
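For a concrete sketch, take the bijection ƒ(x) = 2x + 3 on the real numbers (my choice of example). Solving y = 2x + 3 for x gives x = (y − 3)/2, and composing the two functions round-trips any input:

```python
def f(x):
    return 2 * x + 3        # one-to-one and onto as a map on the reals

def f_inverse(y):
    return (y - 3) / 2      # obtained by solving y = 2x + 3 for x

print(f_inverse(f(10)))     # 10.0 -- the inverse undoes f
```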


When you learn a new concept, chances are that there’s some sort of procedure to follow in order to come up with an answer to a problem. This helps students when they are first learning, because it lets them follow clearly laid out steps that will culminate in the correct answer. For example, if we were trying to add two fractions together, we know that a common denominator is needed. As a result, students might be told that they should multiply each fraction by the other’s denominator, which will guarantee that the denominators are the same. It might even be written in a nice three-step method like this:

  • Identify your fractions as a/b and c/d.
  • Multiply a by d and c by b.
  • Write the denominator as bd. Your new fraction should be of the form (ad+bc)/bd.
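The three steps above can be transcribed directly into Python. This is a literal sketch of the recipe, not how I’d recommend computing it:

```python
def add_fractions(a, b, c, d):
    """Add a/b + c/d by the cross-multiplication recipe."""
    numerator = a * d + c * b   # step 2: multiply a by d and c by b, then add
    denominator = b * d         # step 3: the common denominator is bd
    return numerator, denominator

print(add_fractions(1, 2, 3, 4))  # (10, 8)
```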

This is an easy-to-follow recipe that will produce the correct answer all of the time. It’s made so that you can go through the steps one line at a time and arrive at the answer you want. Students might even be encouraged to memorize this procedure.

I want to argue that this is a fundamentally flawed idea of how we should approach teaching students to solve problems.

First, let’s take a look at the example above. Imagine we wanted to add 1/2 and 3/4. The algorithm tells us that we should transform the two fractions in order to get 4/8 and 6/8, so that in total we get 10/8. This is the correct answer, but anyone who has taught this concept knows that this isn’t the most efficient way to go about it. Instead, we notice that 4 is a multiple of 2, so we only have to change the first fraction in order to create common denominators. Doing so gives us 2/4+3/4=5/4, which is the same answer as above, but simplified. In fact, when doing these problems, students are usually required to simplify anyway, so the student would have to do more work after following the recipe.
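The shortcut the recipe hides can also be written down. Here is a sketch using the least common multiple of the denominators (note that `math.lcm` needs Python 3.9+), which lands directly on the simplified answer; the standard library’s `Fraction` type agrees:

```python
from fractions import Fraction
from math import gcd, lcm   # lcm requires Python 3.9+

def add_fractions_lcm(a, b, c, d):
    """Add a/b + c/d over the least common denominator, then reduce."""
    common = lcm(b, d)
    numerator = a * (common // b) + c * (common // d)
    g = gcd(numerator, common)
    return numerator // g, common // g

print(add_fractions_lcm(1, 2, 3, 4))    # (5, 4), already in lowest terms
print(Fraction(1, 2) + Fraction(3, 4))  # 5/4
```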

This is a very simple example, but it illustrates an important point. Recipes blind students to shortcuts or other ways to solve a problem. Once a student is given a recipe, why should they look for a quicker way to do the problem? They know that the way they were given will work, so it usually isn’t worth the extra effort to look for a shortcut or another method. This is true even if the recipe takes longer to do! In this sense, I think we do students a great disservice if we only emphasize recipes.

It’s not that a recipe is bad. In fact, there are many wonderful recipes that solve many problems (more commonly known as algorithms in computer science and mathematics). However, the idea with those recipes is to feed them into a computer so that the computer can do the work. They aren’t necessarily for students themselves to follow. Plus, we don’t care (for the most part) if a computer takes a little longer to solve a problem, because it can still do it fairly fast.

On the other hand, we want students to be able to solve all sorts of problems, and to use the techniques they’ve learned to tackle them. But recipes make students home in on one way to solve a problem, without thinking about anything else. A recipe offloads the thinking about a problem and reduces it to following predetermined steps. This could lead a student to completely miss the point of a problem, or to miss an interesting connection they would have noticed if they weren’t strictly following an algorithm.

Some may scoff and say that seeing a better path to a problem rarely actually happens, but that’s because they aren’t trying hard enough. There are all sorts of tricks and techniques that one can use to solve problems without needing to go through recipes.

Another ripe example is the factoring of expressions, something that is the bane of many students. In secondary school, students focus on factoring quadratic polynomials, and there are several cases that one has to consider in order to get the factorization just right. Since memory aids are allowed for students, these procedures are typically written down, and some students may even analyze each expression to see which case it falls into.

This is a horrible way to go about learning factorization. It reduces the process to a classification problem, where students simply match their expression with the corresponding case, and follow the steps. This means that students will often miss shortcuts and other techniques that could be just as helpful to them as following the procedure!

The other problem with recipes is that they substitute aptitude at following a procedure for knowledge. Instead of knowing why the recipe works, the student is only responsible for using it correctly. This encourages students not to think about what they are doing, since they know the output is what they want. As a result, students aren’t thinking about the problem so much as going through the motions. This can lead teachers to think that a student knows why a concept works, when really the student can only tell you how it works.

My vivid example comes from learning how to complete the square in secondary school. The idea is that you want to rewrite a quadratic expression in the form a(x−b)² + c. When I first learned about it, the recipe seemed like magic (and not in a good way!). I had little idea about what was happening, but I did know that if I followed the steps on my memory aid, I could solve the problem.

Did I look deeper in order to understand what was actually going on? Of course not. It was only years later, when I looked at the technique again, that I saw it did make sense, and that I should have understood it when I was first introduced to it. However, since the recipe was there, I traded the responsibility of knowing for simply being able to use it.
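For the record, here is the algebra the recipe compresses, written with coefficients p, q, r to avoid clashing with the a, b, c in the target form above:

```latex
\begin{align*}
p x^2 + q x + r
  &= p\left(x^2 + \frac{q}{p}\,x\right) + r \\
  &= p\left(x^2 + \frac{q}{p}\,x + \frac{q^2}{4p^2}\right) + r - \frac{q^2}{4p} \\
  &= p\left(x + \frac{q}{2p}\right)^2 + r - \frac{q^2}{4p}.
\end{align*}
```

For instance, x² + 6x + 1 = (x + 3)² − 8. Every step is ordinary algebra; the “magic” is just adding and subtracting the square of half the linear coefficient.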

I want to finish with the acknowledgment that recipes can be useful. They can speed problems up, particularly the one’s that are repeated over and over again (like finding the roots of a quadratic function). However, you should always ask yourself, “Could I work from first principles to get back to this point?” If the answer is “yes”, then you have understood the concept. If it’s “no”, then you should investigate that uncertainty! It’s an opportunity to learn. This is why I always try to ask the students I work with conceptual questions, because if the point of education is just to follow recipes, A.I. will replace us much sooner than we might want.

Intuition about Ideals

When studying rings in abstract algebra, one also learns about subrings. They are pretty much exactly what you would expect: subsets of a ring with the same operations defined on them. However, a more interesting type of subring is an ideal.

Definition: An ideal $I$ of a ring $R$ is a subring with the special property that, for any element $a \in I$ and any element $r \in R$, $ar \in I$ and $ra \in I$.

Also, note that if we have a commutative ring, then you only need to check one of the cases above. So that’s the formal definition. It’s a little clunky, but there’s a nice intuition behind it.

An ideal “absorbs” the elements that it comes into contact with. In other words, any time an element of your ring $R$ comes into contact with an element of the ideal, the product becomes part of the ideal. (I kind of want to give a zombie analogy, but I’ll let you fill in the details for now!)

On its own, this isn’t particularly interesting. So what if ideals absorb their elements?

The real magic requires a bit more theory. First, we can have a particular kind of ideal, called a principal ideal, which is an ideal of the form $\left< a \right> = \{ ar : r \in R \}$. This simply means that the element $a$ “generates” (or is a factor of) every single element in the ideal. For a quick example, if we consider the ring of integers (which is just the integers with addition and multiplication defined on them), $\left< 2 \right>$ is a principal ideal of the integers, which consists of all the even integers. No matter what integer you multiply $2$ by, even one that isn’t in $\left< 2 \right>$, the result will be in the ideal.
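A quick finite-sample sketch of this absorption in Python (checking a range of integers, since we obviously can’t check them all):

```python
def in_ideal(n, generator=2):
    """Membership in the principal ideal <generator> of the integers."""
    return n % generator == 0

# Absorption: an element of <2> times ANY integer lands back in <2>.
a = 6  # 6 is in <2>
print(all(in_ideal(a * r) for r in range(-50, 50)))  # True
```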

We can now consider the factor ring $R / \left< a \right> = \{ x + \left< a \right> : x \in R \}$. I’m going to avoid talking about classes and equivalence relations here, so instead, I’ll describe the idea behind these factor rings. Essentially, the factor ring above is the set of all elements in $R$ such that any “factors” of $\left< a \right>$ are taken out. In other words, we are taking the elements $x \in R$ modulo the elements that are of the form $ay, y \in R$.

If we go back to our example with the ring of integers and $\left< 2 \right>$, we can consider the factor ring $\mathbb{Z} / \left< 2 \right>$. What are the elements of this set? Well, the elements of $\left< 2 \right>$ are all of the even integers. Therefore, the elements that are left in our factor ring can’t have any factors of two in them, and what survives of each integer is its remainder after division by two. To see this explicitly, consider $65$. This number is odd, so it isn’t divisible by two. However, $65 = 32 \cdot 2 + 1$. Since $32 \cdot 2 \in \left< 2 \right>$, the ideal “absorbs” it. Operationally, that means it disappears, so we are simply left with $1$.

If you continue with this factor ring, you will see that the only numbers we can be left with are $-1, 0$, and $1$. We can then remove $-1$ by simply adding $2$ (which is equivalent to zero in our factor ring). What we end up with is precisely the integers modulo two.
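The same bookkeeping can be sketched in Python: reducing modulo the ideal keeps only the remainder, and just two cosets survive.

```python
def reduce_mod_ideal(x, generator=2):
    """Send x to its representative in Z/<generator>: the part <generator> doesn't absorb."""
    return x % generator

print(reduce_mod_ideal(65))  # 1, since the 32*2 part is absorbed by <2>
print({reduce_mod_ideal(x) for x in range(-10, 10)})  # {0, 1}
```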

As you can see, these special ideals act as absorbers that take in certain elements and remove them from the ring. Why do we want to remove elements? Well, a particular type of ring that is useful to work in is a field, where all the nonzero elements are units and there are no zero-divisors. Creating a factor ring by “dividing” out the elements generated by an ideal can give us a field. In fact, for a commutative ring with unity, there’s a theorem which says that $R / I$ is a field precisely when $I$ is a maximal ideal, but that’s for another time.

Why Can't We Reach the Speed of Light By Boosting?

If you have ever come across someone talking about special relativity, there’s a good chance you will be able to tell me one of the two fundamental axioms of the subject: the fact that the speed of light is constant in all inertial frames of reference.

After thinking about this for a few moments, one might come up with a thought experiment that looks like a counterexample. Imagine (for the sake of the experiment) that there’s a train moving at a constant speed (we’ll call it v) along the ground with respect to you, the unmoving person. Then, imagine that there’s another train on top of the first train, and it’s moving at a speed v, but with respect to the first train. As such, you would undoubtedly agree that the second train seems to be moving faster than the first train, since it has the speed of the first train plus its own speed.

Following this argument, it seems reasonably straightforward that if you keep on stacking trains on top of each other such that each one moves with speed v relative to the one below it, at some point, no matter how slow the speed v is, one of the trains should exceed the speed of light. Bingo, we’ve done it!

Unfortunately, this is not so. But what is the actual problem here? Why can’t we get past the speed of light?

To answer this, I’ll have to introduce a few things. But first, some notes. One, we are trying to see why the speed of light acts as a “cosmic speed limit”, so we aren’t going to simply say, “Let the speed of the first train be faster than the speed of light.” Instead, we want to see why we can’t build past it. Second, this isn’t just a strange situation with no applications. If you don’t like the scenario with the trains, imagine continually boosting yourself to a faster and faster speed. Of course, this still has to be done with the right Lorentz transformation, but you will hit a barrier.

With that out of the way, let’s dig into the problem. We could try to do successive Lorentz transformations in order to get the relative velocity between the observer on the ground and the nth train. However, this isn’t the “natural” or easy way to do the problem. In special relativity, velocities don’t add like we are used to. Instead, there’s a fancy factor γ and several other differences when transforming velocities. However, there is a quantity that does simply add when two velocities are composed. It is called the rapidity, and is denoted with the letter φ. It is defined as:

$$\varphi = \tanh^{-1} \beta = \frac{1}{2} \ln \left( \frac{1+\beta}{1-\beta} \right).$$
Here, β is just the ratio of the speed (that any train is moving at with respect to the one underneath it) to the speed of light. For our problem, this value is constant. We want to know how fast the nth train is moving, so we just have to keep on adding the rapidity factors. But they’re all the same! This means we get the rather simple result that the rapidity for the nth train is n times the first train’s rapidity, $\varphi_n = n \varphi_1$. We then use the definition for φ from above in order to get:

$$\beta_n = \tanh \left( n \tanh^{-1} \beta_1 \right).$$
We could stop here, since everything in the above expression is a constant that we would input, but we don’t just want to know what happens for the nth train. We want to keep boosting forever, until we get past the speed of light! This means we need to take a limit as n tends to infinity. Therefore, it would be nice to have our expression in a form whose limit we can take much more cleanly. I’ll spare you the gory details, but basically we can use the identities that relate the hyperbolic functions to exponentials. After doing this, you should be able to get something like the following:

$$\beta_n = \frac{(1+\beta_1)^n - (1-\beta_1)^n}{(1+\beta_1)^n + (1-\beta_1)^n}.$$
This is actually a nice expression, because its limit is very easy to evaluate. Before that though, let’s look at a plot of this function as a function of n. In other words, we want to see the behaviour of this function as n gets larger.

Plot of the function defined above in terms of n.

As you can see, the function gets closer and closer to one as n increases. Why isn’t the limit the speed of light c? Remember what β is. It’s the speed of the train divided by the speed of light, so having the plot asymptotically approach one is exactly what we should expect. Also, if you’re curious, the value for β₁ in the plot is 0.5, which is quite a large value. This is why the function asymptotes to one very quickly.
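To see this numerically, here is a sketch (in units where c = 1, and with β₁ = 0.5 as in the plot) that composes the boosts one at a time with the relativistic velocity-addition law and compares the result to the closed form from the rapidity argument:

```python
import math

beta1 = 0.5  # each train's speed relative to the one below, in units of c

def compose(u, v):
    """Relativistic velocity addition, (u + v) / (1 + u*v), with c = 1."""
    return (u + v) / (1 + u * v)

beta = 0.0
for _ in range(20):          # stack 20 trains
    beta = compose(beta, beta1)

closed = math.tanh(20 * math.atanh(beta1))  # the rapidity shortcut
print(beta, closed)          # the two agree, and both stay strictly below 1
print(beta < 1)              # True
```

No matter how many boosts you compose, the result stays below 1: the addition law can never push β past the speed of light.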