More specifically, they tend to take full advantage of the fact that modern computers are really fast without considering a concept I've come to call the Law of Exponential Degradation. A developer can make one or two bad decisions over the course of a project and not notice much of an impact. As the poor code choices accumulate, however, their impact on the project grows exponentially.
Let's look at something as simple as computing a polynomial. Suppose we have a polynomial of degree \(3\) such as \(2x^3 - 6x^2 + 2x - 1\) and we want to evaluate it at \(x = 3\).
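For reference, plugging in \(x = 3\) term by term gives

\[ 2(3)^3 - 6(3)^2 + 2(3) - 1 = 54 - 54 + 6 - 1 = 5 \,, \]

so any correct evaluation should return \(5\).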
If we're only going to compute this once, it doesn't necessarily make sense to write a function for it, so suppose we decide to compute it inline using the typical operations: multiplication and powers.
using BenchmarkTools

# using multiplications and powers -- inline
@btime 2x^3 - 6x^2 + 2x - 1
> 100.768 ns (0 allocations: 0 bytes)

# using multiplications -- inline
@btime 2x*x*x - 6x*x + 2*x - 1
> 97.310 ns (0 allocations: 0 bytes)
We write it once with multiplications and powers and get a runtime of around one hundred nanoseconds. Since powers are usually a bit more resource intensive than pure multiplications, we also write it with only multiplication operations, yet don't see much of an improvement.
Julia has a macro, @fastmath, which enables optimizations that trade a little bit of accuracy for speed. For such a simple computation we wouldn't expect much of an impact, but let's try the same code with the macro.
# using multiplications and powers -- inline/fastmath
@btime @fastmath 2x^3 - 6x^2 + 2x - 1
> 56.752 ns (0 allocations: 0 bytes)

# using multiplications -- inline/fastmath
@btime @fastmath 2x*x*x - 6x*x + 2*x - 1
> 58.184 ns (0 allocations: 0 bytes)
This really small change has yielded roughly a 45% improvement in the speed of the code, which is pretty huge. But we can probably do a bit better.
A common tool computer scientists use to speed up polynomial evaluation is Horner's method. The basic idea is that instead of computing the expanded polynomial, we evaluate a nested form built from pure multiplications, obtained by repeatedly applying the distributive law (one of the ring axioms):

\[ ((2x - 6)x + 2)x - 1 \,. \]
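The nesting comes from factoring \(x\) out of the leading terms one step at a time:

\[ 2x^3 - 6x^2 + 2x - 1 = (2x^2 - 6x + 2)x - 1 = ((2x - 6)x + 2)x - 1 \,. \]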
We'll also leverage Julia's code optimizer by moving our previously inlined code to functions.
x = 3

# Horner's method: coefficients from highest degree to lowest
function f_1(poly, x)
    result = poly[1]
    for k in 1:length(poly)-1
        result = result*x + poly[k+1]
    end
    return result
end

poly = [2; -6; 2; -1]
@btime f_1(poly, x)
> 15.372 ns (0 allocations: 0 bytes)

f_2(x) = 2*x*x*x - 6*x*x + 2*x - 1
@btime f_2(x)
> 13.754 ns (0 allocations: 0 bytes)

f_3(x) = 2*x^3 - 6*x^2 + 2*x - 1
@btime f_3(x)
> 13.739 ns (0 allocations: 0 bytes)
Whoa! We ended up getting roughly the same improvement across all three functions, and in each case we're talking about a nearly 90% improvement over the original code! That's really huge! The inline version of the computation took around one hundred nanoseconds, but by simply moving it into a function the same code runs in under 15 nanoseconds. Why is this? When the code is compiled, the function version essentially reduces to a constant, so it executes very fast.
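Speed only matters if the answer stays right, so here's a quick self-contained sanity check (a sketch, redefining the three functions from above) confirming that all three formulations agree with the hand-computed value at \(x = 3\):

```julia
# Horner's method over coefficients given highest degree first
function f_1(poly, x)
    result = poly[1]
    for k in 1:length(poly)-1
        result = result*x + poly[k+1]
    end
    return result
end

# expanded form with pure multiplications
f_2(x) = 2*x*x*x - 6*x*x + 2*x - 1

# expanded form with powers
f_3(x) = 2*x^3 - 6*x^2 + 2*x - 1

poly = [2; -6; 2; -1]
f_1(poly, 3) == f_2(3) == f_3(3) == 5  # true
```

All three return \(5\), so the rewrite changed only the cost of the computation, not its result.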