Least common multiple of natural numbers up to a limit, say, 10'000'000
I'm working on a small Python program for myself and I need an algorithm for fast multiplication of a huge array of prime powers (over 660 000 numbers, each 7 digits). The resulting number is over 4 million digits. Currently I'm using math.prod, which calculates it in ~10 minutes, but that's too slow, especially if I want to increase the amount of numbers.

I checked some algorithms for faster multiplication, for example the Schönhage–Strassen algorithm and Toom–Cook multiplication, but I didn't understand how they work or how to implement them. I tried some versions I found on the internet, but they didn't work well and were even slower. Does anyone know how to multiply this many numbers faster, or can anyone explain what math would help here?

Constringe asked 15/7 at 23:55 Comment(7)
Python has those fancy algorithms built in anyway, but they only really kick in when multiplying two numbers that are both big, not when one is big and the other is not. BTW, what sort of thing are you really computing here? Perhaps there is some shortcut or alternative.Mammalian
You could spend days writing something passable, or just use one of the many efficient mathematical and scientific libraries out there that have had a team of engineers, scientists, and programmers work on them for years.Yseult
I don't think there is; all of the numbers in the array are prime powers. This program calculates the LCM of the range from 1 to N; the numbers above are for N = 10 000 000. So this 4-million-digit number is the least number that is divisible by every number from 1 to 10 million (see the sketch after these comments).Constringe
@Constringe what about the context in which this number is used, can you avoid computing it, perhaps by working with it implicitly? (you know its factors, that's a good start to working with it implicitly)Mammalian
In other words, what are you going to do with that number? Surely you're not going to print and read it. And maybe your actual goal can be achieved without computing that number.Geotropism
Also, the LCM of 1 to 10 000 000 being the product of the highest prime powers in that range for each prime in that range – did you mean 664579 7-digit numbers?Analogous
@Analogous yeah, my bad, the numbers are 7 digits here.Constringe
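
For reference, a minimal sketch (not from the question) of how such a list of highest prime powers up to N could be built; the function name and the plain Python sieve are illustrative only:

def prime_powers_up_to(N):
    # Sieve of Eratosthenes: sieve[i] == 1 means "i is prime".
    sieve = bytearray([1]) * (N + 1)
    sieve[0:2] = b"\x00\x00"
    powers = []
    for p in range(2, N + 1):
        if sieve[p]:
            for m in range(p * p, N + 1, p):
                sieve[m] = 0
            pk = p
            while pk * p <= N:  # raise p to the largest power that stays <= N
                pk *= p
            powers.append(pk)
    # lcm(1, 2, ..., N) is the product of these numbers.
    return powers

For N = 10 this returns [8, 9, 5, 7], whose product is 2520 = lcm(1..10).
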
C
8

There are two keys to making this fast. First, use the fastest mult implementation you can get. For "sufficiently large" multiplicands, Python's Karatsuba mult is O(n^1.585). The decimal module's much fancier NTT mult is more like O(n log n). But fastest of all is to install the gmpy2 extension package, which wraps GNU's GMP library, whose chief goal is peak speed. That has essentially the same asymptotics as decimal mult, but with a smaller constant factor.

Second, the advanced mult algorithms work best when multiplying two large ints of about the same size (number of bits). You can leave that to luck, or, as below, you can force it by using a priority queue and, at each step, multiplying the "two smallest" partial products remaining.

from gmpy2 import mpz
from heapq import heapreplace, heappop, heapify

# Assuming your input ints are in `xs`.
mpzs = list(map(mpz, xs))
heapify(mpzs)
# Repeatedly pop the smallest value, multiply it by the new smallest (the heap
# root), and push the product back; this keeps the operands balanced in size.
for _ in range(len(mpzs) - 1):
    heapreplace(mpzs, heappop(mpzs) * mpzs[0])
assert len(mpzs) == 1
# the result is mpzs[0]

That's the code I'd use. Note that the cost of recursion (which this doesn't use) is trivial compared to the cost of huge-int arithmetic. Heap operations are more expensive than recursion, but still relatively cheap, and can waaaaay more than repay their cost if the input is in an order such that the "by luck" methods aren't lucky enough.
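
For illustration, a hypothetical end-to-end run of the same approach; the made-up xs list and the digit-count check are not part of the original answer:

from gmpy2 import mpz
from heapq import heapreplace, heappop, heapify
import random

# Made-up stand-in for the ~660,000 7-digit prime powers from the question.
xs = [random.randrange(1_000_000, 10_000_000) for _ in range(660_000)]

mpzs = list(map(mpz, xs))
heapify(mpzs)
for _ in range(len(mpzs) - 1):
    heapreplace(mpzs, heappop(mpzs) * mpzs[0])

result = mpzs[0]
print(result.num_digits())  # decimal digit count (GMP may report one too many)
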

Colunga answered 16/7 at 2:52 Comment(17)
Can you give an idea of how fast this is, particularly compared with the current accepted answer?Unshapen
Using their way of constructing test data (which is fine - saying so just to make it clear), this code runs in less than a third of the time, regardless of whether the time to convert from int to mpz is included (that's cheap). And, BTW, they compute the same result :-)Colunga
Does that mean it's slower than my decimal one? Since that took a fifth of the time.Geotropism
On my box, the times in seconds are 10.1 (original), 3.3 (decimal), and 2.1 (mine). If 18 digits are used instead of 9, 32.0 (original), 7.1 (decimal), and 3.4 (mine). For small inputs, mine is enormously slower (heap operations dominate then). Not on a quiet machine.Colunga
Thanks. Though did you get different times earlier? I wouldn't call 2.1 "less than a third" of 10.1.Geotropism
Won't let me edit the last comment: the time for mine should say 2.9 instead of 2.1 for 9 digits.Colunga
Put them into the answer, then you can edit at will :-) (And they'd better be there anyway.)Geotropism
Nope - I'm done. Speed of all methods here will vary by platform. Recording just one isn't that interesting to me. Irrelevant to the high-level points anyway, the most important of which is: if you care a lot about giant int speed, use gmpy2.Colunga
I'd call the other one more important. Using gmpy2 won't help much with unbalanced operands (as in the question / math.prod).Geotropism
If you care about worst case, sure. I do. But I expect most inputs, most of the time, won't be in a pathological order. For the specific test data here, using a heap is mostly needless overhead. And no, I'm not running more tests: I'm done :-) Not at all disputing that "one at a time left to right" is horrid. Not talking about that at all. Talking about the 3 non-insane methods on the table.Colunga
Your heap might be more overhead than necessary, but they need to do something to balance the operands. I mean, if they use gmpy2 with math.prod, it'll still be slow, whereas the recursion or my way or your heap with Python's ints is fast. So I consider using gmpy2 the (far) less important point, and using a better high-level algorithm the more important one.Geotropism
"Talking about the 3 non-insane methods on the table" - Argh, that edit appeared only after I posted my above comment.Geotropism
That's OK: my "if you care a lot about giant int speed, use gmpy2" didn't really refer to this post anyway: regardless of what you're doing with giant ints, "use gmpy2" is going to be part of the best answer you're ever likely to find.Colunga
This works even better than what I accepted at first. @nocomment it's about the same speed as your decimal way for me.Constringe
A few tests later: decimal calculated the result for 5.7 million 8-digit numbers in ~63 seconds, gmpy2 in ~59 seconds. They're probably about the same speed.Constringe
The larger the products, the better gmpy2 will do. Under the covers, extremely complex things are going on. gmpy2 actually implements many different mult algorithms, and changes the ones it uses depending on the exact bit sizes, which can even vary depending on which hardware it's running on. decimal only implements 3 mult algorithms, and picks cutoff sizes independent of hardware. Don't expect to reproduce the same differences on different platforms either.Colunga
@Constringe Still: What are you going to do with that huge number, now even over 40 million digits?Geotropism
A
5

math.prod will accumulate a product one number at a time. You can do better by recursively dividing the list, taking the product of each half, for example, which reduces the size of the intermediate products.

This runs in a few seconds for me:

import math


def recursive_prod(ns, r):
    # `r` is a range of indices into `ns`; slicing a range is cheap (no list copies).
    if len(r) <= 10:  # arbitrary small base case
        return math.prod(ns[i] for i in r)

    split_at = len(r) // 2
    return recursive_prod(ns, r[:split_at]) * recursive_prod(ns, r[split_at:])


import random
ns = [random.randrange(1_000_000_000) for _ in range(660_000)]
p = recursive_prod(ns, range(len(ns)))
Analogous answered 16/7 at 0:3 Comment(10)
I would avoid recursion if speed is a priority.Lump
Thanks, multiplies everything in 15 seconds for me!Constringe
It would be better to explain *why* that is faster.Geotropism
@Lump How much faster do you think a non-recursive solution would be?Geotropism
I'd say "reduces the size of the intermediate products" is kinda rather the opposite of the reason it's faster. You want large operands so that the better multiplication algorithm gets used. And the string concatenation seems completely misleading, as concatenation doesn't use different algorithms and is linear. If you think of the master theorem, your recursive multiplication falls under case 3 but your recursive concatenation falls under case 1. Being case 1 is the reason it's faster, not because a different algorithm gets used. That's a different reason.Geotropism
@nocomment: It’s kind of hard to disentangle them from each other since it doesn’t make sense to use the faster multiplication algorithm on small inputs, but I would say that the smaller size of the intermediates is key if anything is. And yes, the string concatenation is meant to be a lower bound on the improvement, since multiplication is worse than linear.Analogous
@nocomment: On second thought, it’s really just a different way of saying the same thing, emphasizing maximum vs. minimum size at each level in a problem where a smaller maximum implies a larger minimum and vice versa.Analogous
Don't you mean an upper bound on the improvement? Because it really benefits from the divide-and-conquer itself, improving from O(n^2) to O(n log n). Whereas the product only improves from O(n^2) to O(n^1.58), which is a much smaller improvement, and due not really to the divide-and-conquer but due to Karatsuba kicking in.Geotropism
@nocomment: I mean a lower bound if you treat the product as having the same asymptotic complexity of O(n^1.58) in both cases. The cost of a single product doesn’t improve from O(n^2) to O(n^1.58) when switching to the recursive approach – it worsens from O(n) (because one operand has a constant upper bound)… which maybe does mean my analogy is wrong (but for the opposite reason?). Will revisit when I get the chance.Analogous
The cost of a single product does improve to O(n^1.58) when switching to the recursive approach. Because the recursive approach makes the operands balanced and that triggers Karatsuba when they become large enough. Although I wasn't talking about a single product but the total one. Let me put it differently: If Python didn't switch to Karatsuba but kept using the naive algorithm, then your recursive solution would have the same complexity as math.prod and would probably be similarly slow.Geotropism
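
To summarize the complexity accounting in the comments above (a rough sketch, assuming n inputs of bounded size and writing M(m) for the cost of multiplying two m-digit numbers):

    T_prod(n) = sum over i = 1..n of Θ(i) = Θ(n^2)    (left to right: one operand is always tiny)
    T_rec(n)  = 2·T_rec(n/2) + M(Θ(n))                (balanced halves)

With Karatsuba, M(m) = Θ(m^1.585) and the recurrence solves to Θ(n^1.585) (master theorem, case 3). With schoolbook multiplication, M(m) = Θ(m^2) and it solves to Θ(n^2), the same as left-to-right accumulation, which is the point of the last comment above.
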
G
4

A tricky one I've been using, always multiplying the oldest two not-yet-multiplied numbers until only one is left:

from operator import mul

def prod(ns):
    ns = list(ns)
    it = iter(ns)
    # Append the product of each consecutive pair to the end of the list; the
    # iterator then reaches those appended partial products too, so they keep
    # getting paired up until only the total product remains (at ns[-1]).
    ns += map(mul, it, it)
    return ns[-1]

About as fast as @Analogous's recursive approach. The speed comes from Karatsuba kicking in when multiplying two large numbers instead of one large and one tiny.

And using decimal seems about five times faster due to even faster multiplication algorithms (and has the advantage of being much faster to print in decimal, if you want that):

from operator import mul
from decimal import *

# Maximum precision and exponent range, so huge integer products stay exact.
setcontext(Context(prec=MAX_PREC, Emax=MAX_EMAX))

def prod(ns):
    ns = list(map(Decimal, ns))
    it = iter(ns)
    ns += map(mul, it, it)
    return ns[-1]
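
A tiny hypothetical usage of the Decimal-based prod above (the inputs are made up):

result = prod([8, 9, 5, 7])  # the highest prime powers <= 10
print(result)       # 2520, i.e. lcm(1..10), printed straight from the Decimal
print(int(result))  # convert back to a Python int if needed
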
Geotropism answered 16/7 at 1:16 Comment(0)
