I have a basically limitless pile of papers to read, a large fraction of which (I’ve not counted) contain a heavy math element. To take one example basically at random: Stein’s paper on “The Inadmissibility Of The Usual Estimator For The Mean Of A Multivariate Normal Distribution”. I don’t really get that paper. In particular, I don’t have the intuition about the topic that I do about, say, Unix or even economics: I can’t come to the topic and consume the whole thing in big chunks, which I can understand as a unit; instead I’m groveling through it line by line, and maybe understanding one tree at a time while the forest remains unassaulted. My reading of mathematics is a lot like my reading of French, while my reading of other topics is fluid like my reading of English.

Backing up from academic papers: it’s been a long-term goal of mine to learn measure theory, which underlies probability theory. I have ever so many books that cover aspects of measure theory: this one, and also this one, not to mention this one, and of course we’d be remiss if we didn’t mention this one. They’re all good, but I find them all hard. To pick another random mathematics book off the shelf: Körner’s book on Fourier analysis is fantastic at the high level at which I’ve been able to understand it, but digging into the details has always felt to me like an impossible slog.

Is the answer no more complicated than

1. go to a library with just those books (no phone, no laptop), a pad of paper, and a pen
2. bang your head against the problem sets for hours and hours
3. GOTO 1

? It’s always felt to me like, no matter how much I bang my head against them, I’m going to be unable to prove theorems. I wonder if that’s true, or if that’s just the wrong side of my brain talking. I wonder if it’s the side of my brain that Ira Glass mentioned when we saw him in Boston a while back; a canonical form of the quote seems to be the one here:

All of us who do creative work, we get into it because we have good taste. But there is this gap. For the first couple years you make stuff, it’s just not that good. It’s trying to be good, it has potential, but it’s not. But your taste, the thing that got you into the game, is still killer. And your taste is why your work disappoints you. A lot of people never get past this phase, they quit. Most people I know who do interesting, creative work went through years of this. We know our work doesn’t have this special thing that we want it to have. We all go through this. And if you are just starting out or you are still in this phase, you gotta know it’s normal and the most important thing you can do is do a lot of work. Put yourself on a deadline so that every week you will finish one story. It is only by going through a volume of work that you will close that gap, and your work will be as good as your ambitions. And I took longer to figure out how to do this than anyone I’ve ever met. It’s gonna take a while. It’s normal to take a while. You’ve just gotta fight your way through.

An unspoken part of this is that not everyone is able to do creative work at a high level. (And mathematics is certainly included within “creative work”). Something similar would have to be said about, say, basketball: I watched a lot of the Bulls during the Michael Jordan era, and I knew from quite early on that my playing basketball at that level was just not an option. Or maybe it was — maybe I just needed to put in the hours, hire the coaches, etc. But it would always be harder for me to reach that level than it would be for Jordan. If I put in 9 billion hours of work, maybe I could get where he got with the mythical 10,000 hours. Probably not, though. There I stood on this side of the chasm, and there he stood on the other, and nothing I did was going to get me to his side.

Obviously I’m closer to being a professional mathematician than I am to being a professional basketball player. I feel like I’ve got more inherent writing talent than I do inherent mathematical talent. This isn’t to say that I’m John McPhee, but writing comes to me more fluidly than mathematics does.

Anyway, so yeah: do I just go to the library with some books and let the rest happen naturally? An obvious alternative is to pay someone to sit by my side while I read books and do problem sets; here in Boston, anyway, we call this “college”. So do I go to this mythical “college” to learn some more math? Or do I adopt some good self-study methods and do it on my own? Or is there some better way to learn math on one’s own?

I feel like I should learn some subject that’s entirely new to me. I basically did that with behavioral economics a few years back, and I found that fun. Now I’d like to find some subject that’s really interesting, totally new to me, and hopefully somewhat off the beaten path. Behavioral economics, for instance, is rather overplayed in the public discourse. Sociology and anthropology, on the other hand, are probably underplayed: among the social sciences, they get the least treatment in mass media, and the least respect among the sort of technical folks who would laud economics. So sociology and anthropology would be good candidates. Maybe I should start with Weber (apart from The Protestant Ethic, which I read and which is a Bad book — a book that is Bad), since he seems to be a father of sociology. And maybe Durkheim?

What about other fields? Anyone have any suggestions on books I really ought to be reading, in subjects about which I know little?

As always, I’d also like to learn more math. I’m much less good at it than I’d like to be. I think writing code may be the right way to learn it, so I think I’ll try to take that avenue in. It’d be interesting to actually code up some crypto and primality-testing algorithms, for instance. I tried to code up the AKS algorithm, but I didn’t really understand why I was coding what I was coding. So a textbook of number theory or crypto, taught via programming examples, would be useful. Likewise for linear algebra and complex analysis. Maybe Coding The Matrix would be the way to go.

Open thread. Let me know what I should stuff into my brain this year!

1. It’s a long-term goal of mine to understand the proof of the Prime Number Theorem. The PNT, for those who are unfamiliar, is that the number of primes less than or equal to N is approximately N/ln(N), with the approximation improving as N grows to infinity; the percent error of the approximation approaches zero as N grows. I found a really interesting paper that walks through a proof. (The paper won an award for math exposition.) I’ve only done a quick first read-through, but it seems interesting. On that first pass, I think I now get why we even bother with the von Mangoldt function. Baby steps. If anyone would like to read along with me, y’all are welcome to.

One question I have is whether it’s mathematically impossible to find a function that exactly equals the number of primes less than or equal to a number. The N/ln(N) approximation is good, but it’s just an approximation with a known rate of error; while the percent error drops to zero, the absolute error grows without bound. Is there a proof that a function which counts the exact number of primes less than or equal to N is impossible?

(I could be more precise about this. Obviously the function which sums the number 1 over all primes p less than or equal to x is an exact function. But it’s obviously not what I’m looking for. There’s probably a formal way to describe exactly what I’m looking for. It’s probably either “a function containing only a certain set of symbols” or “a function computable in at most O(log n) steps” or something.)
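To see those error claims numerically, here is a small Python sketch I've added (the helper names `prime_sieve`, `prime_pi`, and `chebyshev_psi` are my own, not from the paper). It counts primes exactly with a sieve of Eratosthenes, which is exactly the "sum 1 over all primes" function, and compares that count with N/ln(N). It also computes the Chebyshev function ψ(N), the sum of the von Mangoldt function Λ(n) over n ≤ N, where Λ(p^k) = ln p for prime powers and 0 otherwise; the PNT is equivalent to ψ(N)/N → 1.

```python
import math


def prime_sieve(n: int) -> list[bool]:
    """is_prime[k] is True exactly when k is prime, for 0 <= k <= n (n >= 1)."""
    is_prime = [False, False] + [True] * (n - 1)
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross off every multiple of p from p*p onward.
            is_prime[p * p :: p] = [False] * len(range(p * p, n + 1, p))
    return is_prime


def prime_pi(n: int) -> int:
    """Exact number of primes <= n (sum of 1 over primes)."""
    return sum(prime_sieve(n))


def chebyshev_psi(n: int) -> float:
    """psi(n): sum of Lambda(k) for k <= n, where Lambda(p**j) = ln p."""
    total = 0.0
    for p, flag in enumerate(prime_sieve(n)):
        if flag:
            power = p
            while power <= n:  # p, p^2, p^3, ... each contribute ln p
                total += math.log(p)
                power *= p
    return total


for n in (10**3, 10**4, 10**5, 10**6):
    exact = prime_pi(n)
    approx = n / math.log(n)
    print(f"N={n:>8}  pi(N)={exact:>6}  N/ln N={approx:>9.1f}  "
          f"abs err={exact - approx:>7.1f}  rel err={(exact - approx) / exact:.3f}  "
          f"psi(N)/N={chebyshev_psi(n) / n:.4f}")
```

The relative error column shrinks as N grows while the absolute error column keeps growing, which is the pattern described above.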

2. If someone gives you the equation “ax = b”, you can solve that really easily with elementary algebra; the solution is x = b/a. Likewise if someone gives you an equation involving x-squared; the solution is the quadratic formula. The solutions get more and more complicated for x-cubed and the fourth power of x. There’s a theorem (Abel-Ruffini) which says that no solution using only addition, subtraction, multiplication, division, and root extraction (square roots, cube roots, etc.) is possible for the fifth power of x and higher in the general case. When I limit it to “the general case”, I mean that certain specific equations at higher powers may have solutions of this kind: $x^{100} - 1 = 0$ has the solution $x = 1$. But the general equation $ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0$ has no solution in radicals, nor do general equations with higher powers. (The roots still exist; they just can’t be written down using only those operations.)

The question I’ve had for a while (apart from knowing enough Galois theory to know how to prove Abel-Ruffini) is what happens if you allow mathematical operations other than addition, subtraction, and the rest. What if you allow sines and cosines? Fourier series and their inverses? Bessel functions? Elliptic functions? The Wiki says that a certain kind of transformation turns the general quintic into a simpler quintic polynomial, which can then be solved using a Bring radical. Is the Bring radical the simplest function which can solve the general quintic?

So then my first question is: what’s the simplest set of functions you can add to addition, subtraction, multiplication, and division, such that the general quintic has a solution in this augmented set?
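For a concrete feel for the Bring radical, here is a small numerical sketch. It assumes one common convention (sign conventions vary): BR(a) is the real root of x^5 + x + a = 0. Because x^5 + x is strictly increasing, that root exists and is unique, so plain bisection finds it. The function name `bring_radical` is hypothetical, and the sketch does not implement the transformation that reduces a general quintic to this form.

```python
def bring_radical(a: float) -> float:
    """Real root of x**5 + x + a = 0 (one common convention for BR(a)).

    f(x) = x**5 + x + a is strictly increasing, so there is exactly one real
    root, and it lies in [-B, B] with B = max(1, |a|). Fine for moderate |a|;
    huge |a| would overflow x**5.
    """
    def f(x: float) -> float:
        return x**5 + x + a

    bound = max(1.0, abs(a))
    lo, hi = -bound, bound  # f(lo) < 0 <= f(hi) by the bound argument
    for _ in range(200):  # 200 halvings exhaust double precision
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


root = bring_radical(1.0)  # the real root of x^5 + x + 1 = 0
print(root, root**5 + root + 1)  # the residual should be essentially zero
```

As a sanity check, `bring_radical(-2.0)` should return 1, since 1 + 1 - 2 = 0.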

I’ve been updating the code underlying the Supreme Court death calculator, with an emphasis on greater mathematical precision and increased code elegance. Meanwhile, Justice Scalia has died.

For the record, I would not have predicted this. If you look at the CDC life tables, and scroll to page 35 (“Table 14. Life table for non-Hispanic white males: United States, 2010”), then scroll a little further down, you’ll see that, of 100,000 non-Hispanic white males born in the United States, 53,857 will survive to their 79th birthday. Justice Scalia would have been 84.9 years old on Inauguration Day, 2021. Of those 53,857 who make it to their 79th birthday, 38,053 are still alive on their 84th birthdays; 34,567 are alive on their 85th. So if you count 84.9 as “84 years old”, he had a 29% chance of dying before the 2021 inauguration. If you count it as 85, the number rises to about 36%. If you interpolate between those two probabilities, you get (duh) a number between 29% and 36%. So I would not have bet money on his death.
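The life-table arithmetic above is just a ratio of survivor counts. A minimal sketch using only the three counts quoted from the CDC table; the linear interpolation to age 84.9 is an assumption, since the table is annual.

```python
# Survivors (out of 100,000 non-Hispanic white males) at each birthday,
# as quoted from CDC Table 14 (2010).
SURVIVORS = {79: 53_857, 84: 38_053, 85: 34_567}


def prob_dead_by(age_now: int, age_then: int) -> float:
    """P(dies before age_then | alive at age_now), read off the life table."""
    return 1 - SURVIVORS[age_then] / SURVIVORS[age_now]


p84 = prob_dead_by(79, 84)
p85 = prob_dead_by(79, 85)
# Assumed linear interpolation between the 84th and 85th birthdays.
p849 = p84 + 0.9 * (p85 - p84)
print(f"by 84: {p84:.1%}  by 85: {p85:.1%}  by 84.9: {p849:.1%}")
```

On these numbers it prints about 29.3%, 35.8%, and 35.2%.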

The probability of one or more Justices’ dying, independently, between now and Inauguration Day 2021 was in the 85+% range. The expected number of Justices who’d die was around 1.7. I’ll have to re-run the numbers now with only 8 Justices.
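Those two summary numbers come from different formulas. Under the independence assumption, P(at least one death) = 1 − Π(1 − p_i), while the expected number of deaths is simply Σ p_i (linearity of expectation, which doesn't need independence at all). A minimal sketch; the probabilities in the usage lines are made-up placeholders, not the calculator's actual per-Justice numbers.

```python
import math


def prob_at_least_one(probs: list[float]) -> float:
    """P(at least one event happens), assuming the events are independent."""
    return 1 - math.prod(1 - p for p in probs)


def expected_count(probs: list[float]) -> float:
    """Expected number of events; linearity of expectation needs no independence."""
    return sum(probs)


# Hypothetical placeholder probabilities, for illustration only.
example = [0.5, 0.3, 0.2]
print(prob_at_least_one(example), expected_count(example))
```

Re-running with only eight Justices just means dropping one entry from the list.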

Akamai has an Employee Stock Purchase Plan, which I’ve tried very hard not to think of as magical free money. But I think it basically is. It works like this: you set aside some fraction of your after-tax paycheck, and every six months the company uses that money to buy the company’s own stock for you. There are some limits on how the ESPP can be structured: the company can give you the stock at a discount, but the discount can’t be any more than 15% off the fair market value (FMV); you can’t get more than \$25,000 in stock (at FMV) per year; and Akamai (in keeping, apparently, with general practice) imposes a further limit, such that you can contribute at most 15% of your eligible compensation.

To see how great the return on this is, consider first a simplified form of an ESPP. You put some money in, then wait six months; the company buys the stock, and you sell it immediately. They gave it to you at a 15% discount, i.e., 85 cents on the dollar. So basically you take your 85 cents, turn around, and sell it for a dollar. That’s a 17.6% return (1/.85 ~ 1.176) in six months. To turn that into an annual rate, square it. That makes it a 38% annual return.

Introducing some more realism into the computation makes it even better, because your money isn’t actually locked up for six months. In practice, you’re putting away a little of your money with every paycheck. So the money that you put in at the start of the ESPP period is locked up for six months, but the money you put in just before the end of the period is locked up for no time at all. The average dollar is locked up for three months. So in exchange for three months when you can’t touch the average dollar, that dollar turns into \$1.176. Annualized, that’s about a 91.6% return.
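Both back-of-the-envelope annualizations can be checked in a few lines. This sketch follows the post's simplifications: a flat 15% discount, immediate sale, and compounding the per-period gain either twice a year (six-month lockup) or four times a year (three-month average lockup).

```python
DISCOUNT = 0.15
gross = 1 / (1 - DISCOUNT)  # each $0.85 buys $1 of stock at FMV, about 1.176x

# Simplified view: every dollar is locked up for six months, two periods a year.
simple_annual = gross**2 - 1
# Averaged view: the average dollar is locked up for three months, four a year.
averaged_annual = gross**4 - 1

print(f"per purchase: {gross - 1:.1%}  six-month lockup: {simple_annual:.1%}  "
      f"three-month lockup: {averaged_annual:.1%}")
```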

Doing this in full rigor means accurately counting how long each dollar is locked up. A dollar deposited at the start is locked up for 6 months; a dollar deposited two weeks later is locked up for six months minus two weeks; and so forth. It looks like this:

End of pay period 0: You have \$0 in the bank.

End of pay period 1: You deposit \$1. Now you have \$1.

End of pay period 2: You deposit \$1, and you earn a rate r on the money that’s already in the bank. So now you have 1 + (1 + r) dollars in the bank.

End of pay period 3: You deposit \$1. The $1 + (1 + r)$ dollars already in there earn rate $r$, meaning that they grow to $(1 + (1 + r))(1 + r) = (1 + r) + (1 + r)^2$. In total you have $1 + (1 + r) + (1 + r)^2$.

In general, at the end of period $n$, you have $1 + (1 + r) + (1 + r)^2 + \dots + (1 + r)^{n-1}$ in the bank. That’s a geometric series, so it simplifies nicely: at the end of period $n$, you have $\frac{1 - (1 + r)^n}{1 - (1 + r)}$, or $\frac{1}{r}\left((1 + r)^n - 1\right)$ dollars in the bank.

At the end of the $n$-th period, you get back $n/0.85$ dollars for the $n$ dollars that you put in. So what does $r$ have to be so that you end up with $n/0.85$ dollars when period $n$ is over? You need to solve $\frac{1}{r}\left((1 + r)^n - 1\right) - \frac{n}{0.85} = 0$ for $r$. Use your favorite root-finding method. In our case it’s a 6-month ESPP period, with money contributed every two weeks, so there are about $n = 13$ periods, and I get $r = 0.02662976$. That’s the per-period interest rate. (It’s also known as the Internal Rate of Return (IRR).) So the return on your money is $1.0266^{13} \approx 1.41$ in six months, or $1.0266^{26} \approx 1.98$ in a year. That comes out to about a 98% return. Which is, to my mind, insane.
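Here is one way to do that root-finding, a sketch under the same simplifications (13 two-week deposits of \$1, a flat 15% discount, no lookback, no taxes). It also checks the geometric-series closed form against the period-by-period recurrence. The function names are my own, and bisection is just one convenient root finder.

```python
def balance(n: int, r: float) -> float:
    """Dollars in the bank after depositing $1 at the end of each of n periods
    at per-period rate r, computed by stepping the recurrence directly."""
    total = 0.0
    for _ in range(n):
        total = total * (1 + r) + 1  # existing money earns r, then deposit $1
    return total


def balance_closed_form(n: int, r: float) -> float:
    """Geometric-series closed form: ((1 + r)**n - 1) / r."""
    return ((1 + r) ** n - 1) / r


def espp_rate(n: int = 13, discount: float = 0.15) -> float:
    """Per-period r solving ((1 + r)**n - 1) / r = n / (1 - discount)."""
    target = n / (1 - discount)
    lo, hi = 1e-12, 1.0  # balance rises with r: ~n at r=0, 2**n - 1 at r=1
    for _ in range(200):
        mid = (lo + hi) / 2
        if balance_closed_form(n, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


r = espp_rate()
print(f"r per two-week period: {r:.8f}")
print(f"six months: {(1 + r) ** 13:.4f}x   one year: {(1 + r) ** 26:.4f}x")
```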

The full story would be both somewhat better and somewhat worse than that. Somewhat better, in that the terms of our ESPP are even more generous: when it comes time to buy the stock, Akamai buys it for you at the six-months-ago price, or the today price, whichever is lower. So imagine you have \$12,500 in the ESPP account, that the stock is worth \$60 today, and that it was worth \$40 six months ago. You get shares valued at \$40 apiece, minus the 15% discount. So the company buys shares at .85*\$40=\$34. It can buy at most \$12,500 in shares (at FMV) per six-month period, so it can buy floor(12500/40)=312 shares. Cool. Those 312 shares cost 312 × \$34 = \$10,608; the other \$1,892 in the account can’t buy any more stock, and typically goes back to you or rolls into the next period. Now you have 312 shares, which you can turn around and sell for \$60 apiece, for a total of \$18,720. That is, you spent \$10,608, and you got out \$18,720. Magic \$8,112 profit.
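The lookback arithmetic can be sketched as a small function. This is a simplified model with assumed rules: the purchase price is 85% of the lower of the two prices, and the per-period cap is \$12,500 of stock valued at the start-of-period price. Real plans differ in details, such as what happens to unspent contributions, so treat the hypothetical `lookback_purchase` helper as illustration only. Profit here counts only the cash actually spent on shares.

```python
import math


def lookback_purchase(contributions: float, price_start: float, price_end: float,
                      discount: float = 0.15, fmv_cap: float = 12_500.0):
    """One ESPP purchase with a lookback provision (simplified model).

    Buys at (1 - discount) * min(price_start, price_end), but caps the share
    count so the shares are worth at most fmv_cap at the start-of-period price.
    Returns (shares, cash_spent, leftover_cash).
    """
    purchase_price = (1 - discount) * min(price_start, price_end)
    affordable = math.floor(contributions / purchase_price)
    capped = math.floor(fmv_cap / price_start)
    shares = min(affordable, capped)
    spent = shares * purchase_price
    return shares, spent, contributions - spent


shares, spent, leftover = lookback_purchase(12_500, 40, 60)
proceeds = shares * 60  # sell everything at today's price
print(shares, spent, leftover, proceeds, proceeds - spent)
```

With the example numbers this buys 312 shares for \$10,608, leaves \$1,892 unspent, and sells for \$18,720.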

The “somewhat worse” part is that you pay taxes on two pieces of that. First, you pay taxes on the discount that they gave you (since it’s basically like salary). Second, if you hold the stock for any period of time and pick up a capital gain, you pay tax on that; if you held the stock for less than a year, that’s short-term capital gains (taxed at your regular marginal rate), whereas if you hold for a year or more you pay long-term cap gains (15%, I believe).

I’ve not refined my return calculation to incorporate the tax piece, but I doubt it changes the story substantially. First, it’s hard for me to imagine that the taxes lower the rate of return from 98% down to, say, 15%. Second, any other investment (a house, stocks, bonds, a savings account) would also require you to consider taxes. And since the question isn’t “Is an ESPP good?” but rather “Is an ESPP better than the alternatives?”, I suspect that taxes would affect all alternatives equally. It strikes me that ESPP must win in a rout here — which would explain why the amount you can put in the ESPP is strictly limited; otherwise it really would be an infinite magical money-pumping machine.

(…as we were), does anyone have an intuition (or can anyone *point me to* an intuition) for why Fourier series would be so much more powerful than power series? Intuitively, I would think that very-high-order polynomials would buy you the power to represent very spiky functions, functions with discontinuities at a point (e.g., f(x) = -1 for x less than 0, f(x) = 1 for x >= 0), etc. Yet the functions that can be represented by power series are very smooth (“analytic”), whereas the functions representable by Fourier series can be very spiky indeed.
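A quick numerical illustration of the contrast, added here as a sketch rather than an answer to the question: on (-π, π) the sign function (-1 for x < 0, 1 for x > 0) has the Fourier series (4/π)·Σ sin(kx)/k over odd k. Every term is perfectly smooth, yet the partial sums approach a function with a jump at 0. At the jump itself the series converges to 0, the average of the two sides, rather than to the value 1 in the example above. No power series centered at 0 can do this, since a power series that converges on an interval is analytic there.

```python
import math


def square_wave_partial_sum(x: float, terms: int) -> float:
    """Partial sum of the Fourier series of sign(x) on (-pi, pi):
    (4/pi) * sum of sin(k x) / k over the first `terms` odd k."""
    return 4 / math.pi * sum(math.sin(k * x) / k for k in range(1, 2 * terms, 2))


points = (-math.pi / 2, 0.0, 0.5, math.pi / 2)
for n in (1, 10, 100, 1000):
    print(n, [round(square_wave_partial_sum(x, n), 4) for x in points])
```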

This could lead me down a generalization path, namely: develop a hierarchy of series representations, with representations higher on the hierarchy being those that can represent all the functions that those lower on the hierarchy can represent, plus others. In this way you’d get a total ordering of the set of series representations. I don’t know if this is even possible; maybe there are some series representations that intersect with, but are not sub- or supersets of, other series representations. I don’t think I’ve ever read a book that treated series representations generally; it’s always been either Fourier or power series, but rarely both, and never any others. Surely these books exist; I just don’t know them.

And now, back to reading Hawkins.

Check out Terry Tao’s measure-theory book, starting with ‘let us try to formalise some of the intuition for measure discussed earlier’ on page 18, through to ‘it turns out that the Jordan concept of measurability is not quite adequate, and must be extended to the more general notion of Lebesgue measurability, with the corresponding notion of Lebesgue measure that extends Jordan measure’ on p. 18.

I’ve understood for some time that there’s a notion of “non-measurable set”, and that you want your definition of ‘measure’ to preserve certain intuitive ideas — e.g., that taking an object and moving it a few feet doesn’t change its measure. I didn’t understand that there was any connection between non-measurability and the axiom of choice. Tao’s words here are some of the first that have properly oriented me toward the problem that we’re trying to solve, and the origins of that problem to begin with.

My partner is taking a biostatistics course, which is reminding me of how much I loved this stuff at CMU. I’m inclined to find a course in measure theory around here. We have a university or two.

Someone is wrong on the Internet. In particular, today my friend Paul sent me a link to this guy, who credulously buys someone’s argument that 1 + 2 + 3 + 4 + 5 + 6 + … equals a small negative number. This is completely false, but it’s false for reasons that trip up a lot of people, so I think it’s worth spending some time on.

This is the same genre of argument by which you can “prove” that 1 = 2. So here’s the first step in arguing against it: think to yourself, “If I find this nonsensical, then it’s probably nonsense.” That’s really an okay way to feel. But people are scared of math, so they often think, “Well, mathematics says a lot of crazy things, so what do I know?” They’re likely to blame mathematicians for being unrealistic and for endorsing absurd conclusions just because their axioms made them say so.

The next step is to ask why mathematicians *don’t* just follow their axioms off a cliff. 1 is not equal to 2, and mathematicians know it. But who knows, maybe some abstruse chain of reasoning would lead a mathematician somewhere absurd. The reason that doesn’t happen is that *mathematics eventually has to collide with the real world*. Eventually physicists are going to use mathematics. Eventually engineers are going to build buildings; if they prove that a steel beam can handle 2 tons of weight, it damn well better not actually be 1 ton of weight. Mathematics is used in all sorts of real contexts. Sound logic, applied to sound axioms, can’t lead us to unreasonable conclusions.

Now, mathematics is nice, because it consists of axioms and logic. You start with some axioms, and you follow some logic, and you get a conclusion. If the conclusion is absurd, then it must be because either the axioms were wrong or the logic was wrong. So you only have a small number of places to check for mistakes. (As opposed to your gut, which is less subject to verification.)

But infinity is weird, right? Surely infinities can do weird things. That’s absolutely true, which is why a couple hundred years of mathematicians and philosophers, starting with Isaac Newton and Bishop Berkeley, worked very hard to create a set of tools that allow us to talk about infinity in a sensible way that makes it hard for us to trip ourselves up. This is what calculus is, and why calculus is one of the monuments of Western civilization. It’s not just a very useful collection of tools used in everything from humdrum contexts like building buildings to literally heavenly pursuits like astronomy, though it is that. It’s also a philosophical marvel that makes the infinite comprehensible to mere finite humans. It is a way of keeping our language precise and avoiding hopeless muddles, even when we’re talking about incomprehensible vastness.

The basic trick that the essayist and the video creator are (mis)using, and the trick that lands them in such a muddle, is the following. We start with this:

x = 1 – 1 + 1 – 1 + …

and we add another copy like so:

2x = (1 – 1 + 1 – 1 + …) + (1 – 1 + 1 – 1 + …)

Then we write them on separate lines and shift things, like so:

```
2x = (1 - 1 + 1 - 1 + …)
   + (1 - 1 + 1 - 1 + …)
   = (1 - 1 + 1 - 1 + …)
   +     (1 - 1 + 1 - 1 + …)
```

Nothing too complicated, right? We just shifted everything down a line and over by a couple of spaces. Great. Now, goes the argument, we see that every +1 on one line is paired with a -1 on the next line, or vice versa. From this they conclude that

```
2x = 1 + (-1 + 1) + (1 - 1) + (-1 + 1) + …
   = 1 + 0 + 0 + 0 + …
```

And that equals 1. So then 2x = 1, which means x = 1/2.

Your intuition should tell you that this is absurd. The sum up to the first term is 1. The sum up to the second term is 0. The sum up to the third term is 1. And on we go, back and forth, forever. The sum never settles down at a single value. Your intuition should tell you this, and your intuition is correct.
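That back-and-forth is easy to see by listing the partial sums. A tiny sketch using Python's `itertools`; the helper name `partial_sums` is my own. For contrast it also lists the partial sums of 1 + 2 + 3 + …, which never settle down either.

```python
from itertools import accumulate, count, cycle, islice


def partial_sums(terms, how_many):
    """The first `how_many` partial sums of an (infinite) iterable of terms."""
    return list(islice(accumulate(terms), how_many))


print(partial_sums(cycle([1, -1]), 10))  # 1 - 1 + 1 - ...  → [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(partial_sums(count(1), 6))         # 1 + 2 + 3 + ...  → [1, 3, 6, 10, 15, 21]
```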

Another way to respond to this essayist’s nonsense is to use his argument against him. Take the same chain of reasoning as before: we put the definitions of x and 2x on separate lines, except this time we shift everything ahead *two* positions rather than just one. Like so:

```
2x = (1 - 1 + 1 - 1 + …)
   + (1 - 1 + 1 - 1 + …)
   = (1 - 1 + 1 - 1 + …)
   +         (1 - 1 + 1 - 1 + …)
```

Again, nothing suspicious about this, right? Only this time, the same chain of reasoning — that we pair the row above with the row below — leads us to conclude that

```
2x = 1 - 1 + (1 + 1) + (-1 + -1) + (1 + 1) + (-1 + -1) + …
   = 0 + 2 + -2 + 2 + -2 + …
```

which lands us back where we started. If just shifting things around by an arbitrary amount leads to wildly varying results, then your intuition should tell you that something is probably wrong with the “shifting” method.

Basically everything in that essay and that video reduces to this “shifting” trick. By repeated application of the method they end up concluding that 1+2+3+4+5+… equals a negative number. It doesn’t, which is obvious. Your intuition doesn’t fail you here.

The actual answer is that talking about the sum of this series makes no sense, because it has no sum. If a sum is going to eventually settle down to something nice and finite, the terms have to shrink toward zero. (That alone isn’t enough, though: the terms of 1 + 1/2 + 1/3 + 1/4 + … shrink toward zero, and that sum still grows without bound.) Here the terms aren’t getting smaller; they’re just oscillating. Likewise, the terms in 1+2+3+4+5+… aren’t getting smaller; they’re increasing. So that sum doesn’t converge either, and for a different reason: it’s blowing up, and will grow without bound.

The mathematical answer is that if a sum “diverges” like this one does, then you can’t arbitrarily rearrange terms in it and expect the sum to keep working out. Your intuition should tell you that the problem with 1+2+3+4+5+… isn’t the sort of problem that can be solved by just shifting things around; the problem with that sum is that *you’re adding things that keep getting larger*. No amount of shifting things is going to make that sum up to something nice.

Indeed, the 1-1+1-1+… example is one that they give you in calculus textbooks to show you that we can’t treat infinite sums the way we treat finite ones. The example shows that you need to be much more careful with infinities. It shows you that the logic and axioms you thought were sensible for finite quantities don’t quite work out for infinite ones.

Your intuition does, then, need help sometimes. In particular, it regularly fails when it’s faced with infinities. But there are times when your intuition leads you the right way, and mathematics can help you confirm it.

There are other examples that are facially similar but differ in crucial ways from this 1-1+1-1+… nonsense. There’s a mathematical proof, for instance, that .99999…=1. That happens to be true. The basic intuition there is that if I can bring two numbers as close together as I want, then those two numbers are indeed equal. If I am standing a foot away from you, and tell you that I’m going to halve the distance between us, then halve it again, then continue halving it forever, then — assuming we both live forever — I will eventually be standing 0.00000… inches away from you.

This can be proven rigorously. It’s important to note, though, that it can be proved entirely with finite numbers. I never need to use an “actual infinity” to prove to you that this works. All I need to say is that, essentially, I have a recipe for coming close to you. The recipe is “at every step, close half the distance between me and you.” Then you challenge me: “I bet you can’t get within 1/4 of a foot of me.” I reply, “My recipe will get me there in two steps: after one step I’m 6 inches away, and after two steps I’m 3 inches away.” So you say, “Fine, but I bet you can’t get within an inch of me,” to which I reply, “My recipe will get me there in four steps: after 1 step I’m 6 inches away, after 2 steps I’m 3 inches away, after 3 steps I’m 1.5 inches away, and after 4 steps I’m 3/4 of an inch away. At that point I’m within an inch of you.”

You see what’s happening. I never actually say anything about how “after an infinite number of steps, I’m 0.000… inches away from you.” Instead I just show that I have a recipe that will get me as close as you could wish, in a finite number of steps. That is what we call a “limit” in calculus. The labor that went into making that word intellectually coherent is one of our species’s greatest accomplishments.
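The recipe is easy to write down as a program, which makes the “finite steps only” point concrete: for any positive challenge distance, the loop below stops after finitely many halvings. A small sketch, with distances in inches starting a foot away; the function name `steps_to_get_within` is my own, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def steps_to_get_within(target_inches, start_inches=12):
    """Number of halvings of the gap, starting a foot away, until the gap is
    at most target_inches. Terminates for every positive target."""
    if target_inches <= 0:
        raise ValueError("the challenge has to be a positive distance")
    gap = Fraction(start_inches)
    steps = 0
    while gap > target_inches:
        gap /= 2
        steps += 1
    return steps


print(steps_to_get_within(3))                  # within 1/4 of a foot → 2
print(steps_to_get_within(1))                  # within an inch → 4
print(steps_to_get_within(Fraction(1, 1000)))  # within 1/1000 inch → 14
```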

So please: use your intuition here. And if you question whether your intuition is the proper guide, learn a little bit of math. The mathematics of infinities is both spectacularly beautiful and really fun. Maybe in subsequent posts I’ll give some examples of how fun it is.

__P.S.__ (same day): This is an excellent response to the #slatepitch quackery, also via my friend Paul.

(Proofs of any of the individual steps are available upon request, should you find yourself thinking that I’m pulling a 1=0 trick.) So then

This converges very slowly, though, because for every two steps forward you take a step back. (More precisely: for every 1 step forward, you take steps back.) You can make it converge faster by combining the forward step and the smaller backward step into a single, smaller, forward step:

whence