I’m trying to write a fast long division using techniques like register lowering and rewriting the whole operation with cheaper instructions such as shifts.
Why?

FOR SCIENCE!
Benchmark code here: https://gist.github.com/badamczewski/4361974487c102bf7c02680257c7e49f
Other Methods:
(posting images crashes my browser, so I’m going to post links to Twitter pictures):
https://twitter.com/badamczewski01/status/1344371530323603459/photo/1
https://twitter.com/badamczewski01/status/1344974438245216256/photo/1

You should try a benchmark where the inputs are seemingly randomly higher or lower than 2^32. In your benchmark the inputs are always lower than 2^32, which makes branch prediction easy. With branch mispredictions, I don’t think it would do as well (and I would be curious to know exactly how it does).
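For anyone who wants to try this, here is a minimal sketch in C of what such a mixed-input benchmark could look like. Nothing here is taken from the gist: `div_lowered` is a hypothetical stand-in that assumes the fast path is a branch on whether both operands fit in 32 bits, and `xorshift64` and `fill_mixed` are illustrative helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lowered division (an assumption, not the
   gist's code): use a 32-bit divide only when both operands fit in 32 bits. */
uint64_t div_lowered(uint64_t n, uint64_t d)
{
    assert(d != 0);
    if (((n | d) >> 32) == 0) {
        return (uint32_t)n / (uint32_t)d;   /* narrow path */
    }
    return n / d;                           /* full 64-bit path */
}

/* One step of a xorshift64 generator; the state must be nonzero. */
uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Fill buf with nonzero values that fall below or above 2^32 in a
   pseudo-random pattern, so the branch above is hard to predict. */
void fill_mixed(uint64_t *buf, size_t count, uint64_t seed)
{
    uint64_t state = seed ? seed : 1;
    for (size_t i = 0; i < count; i++) {
        uint64_t r = xorshift64(&state);
        /* Top bit set: keep all 64 bits. Otherwise: truncate to 32 bits. */
        uint64_t v = (r >> 63) ? r : (r & 0xFFFFFFFFu);
        buf[i] = v | 1;                     /* never zero, safe as a divisor */
    }
}
```

Filling the input arrays once, before the timed loop, keeps the generator's cost out of the measurement. Comparing the same loop on all-small inputs and on mixed inputs should show how much of the speedup depends on the branch being predictable.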

Cool! Coincidentally, I just saw this video on YouTube about Quake’s 1/sqrt(x) function (https://youtu.be/p8u_k2LIZyo). Would you, @levelUp_01, be interested in benchmarking this as well? I could really use this for normalizing arrays.
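For reference, a minimal sketch in C of that trick applied to normalizing an array. It uses the widely published magic constant 0x5f3759df and one Newton-Raphson step; the names `fast_rsqrt` and `normalize` are illustrative, and `memcpy` is used for the bit reinterpretation instead of the pointer cast in the original Quake source, to stay within defined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite x. */
float fast_rsqrt(float x)
{
    assert(x > 0.0f);
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* read the float's bit pattern */
    bits = 0x5f3759dfu - (bits >> 1);        /* initial guess from the exponent trick */
    float y;
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);    /* one Newton-Raphson refinement */
}

/* Scale v in place to roughly unit length; zero vectors are left untouched. */
void normalize(float *v, size_t n)
{
    float sum_sq = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum_sq += v[i] * v[i];
    }
    if (sum_sq <= 0.0f) {
        return;
    }
    float inv_len = fast_rsqrt(sum_sq);
    for (size_t i = 0; i < n; i++) {
        v[i] *= inv_len;
    }
}
```

With a single refinement step the relative error stays under about 0.2%, so the result is only approximately unit length. It is worth benchmarking against plain `1.0f / sqrtf(x)` and, on x86, against the hardware `rsqrtss` instruction before adopting it.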
