-1

I want to create a list of numbers that contains three consecutive 1s in its binary representation. To do this, I make set of number up to 2**n, and subtract it with a set of numbers that contains NO three consecutive 1s to get the result. No need for iterator or anything, just a set of number so I can do set subtraction (basically I have another set of numbers, and I want to subtract the set of all numbers from 0 to 2**n-1 with it).

Right now I'm using {i for i in range(2**n)}, which is pretty slow when n is about ~30. I tried using np.arange, but it's even slower. What would be the fastest way of doing this?

16
  • 2
    How long do you expect it to take to generate a set of about a billion integers? Commented Jul 7 at 12:40
  • just create a range-like class, and implement the set-methods you will use into it: you really don't need all thenumbers to exist "physically" in memory, since its trivial to know which are the set members. Not saying any operations you want to perform on this many numbers will be fast.
    – jsbueno
    Commented Jul 7 at 12:48
  • 1
    I'm not convinced you should build that set and do the subtraction. Better ask about what you really want.
    – no comment
    Commented Jul 7 at 12:54
  • 2
    The set you want to produce by set subtraction is almost certainly not a good tool for whatever job you're hoping to use it for. You should look for a fundamentally different approach to solving your underlying problem. Commented Jul 7 at 12:54
  • 1
    Actually, I'll later use it as a list and iterate over them. Basically I have a task where if I detect a 1 in the string I do a certain task, and if I detect a 0 I do a different task. And I check this over all possible combinations that contain three consecutive 1s. Perhaps I should ask another question, since the bottleneck probably won't come from generating the numbers anyway.
    – Kim Dong
    Commented Jul 7 at 18:25

2 Answers 2

2

I tested your solution

{i for i in range(2**n)}

against just doing set(range(2**n)) for n equal 10 using timeit, results at my machine are as follows

python -m timeit "{i for i in range(2**10)}"
10000 loops, best of 5: 35.6 usec per loop
python -m timeit "set(range(2**10))"
10000 loops, best of 5: 21 usec per loop

so latter is faster, but order of magnitude is same.

What would be the fastest way of doing this?

If you are able to provide enough disk space, you might try create your giant set once and pickle it to file, then load them when you need it. You should then test if it is faster on your machine or not.

4
  • How about n=20?
    – no comment
    Commented Jul 7 at 12:52
  • 4
    "create your giant set once and pickle it to file" — that's unlikely to be faster. Commented Jul 7 at 12:53
  • @nocomment for 2**20 I got 5 loops, best of 5: 77.3 msec per loop and 5 loops, best of 5: 63.1 msec per loop respectively so again latter is faster and order of magnitude is same
    – Daweo
    Commented Jul 7 at 13:09
  • Yes, and they're closer together and it's more like their 30. I don't see why you used the very small n=10.
    – no comment
    Commented Jul 7 at 13:12
0

I think the faster solution is to generate them. The idea is to start with the simplest case:

only 3 consecutive 1

Given N=5 there are only 3 solutions:

11100
01110
00111

This can be implemented using a code like this:

from collections import deque
N=5
S = deque('111'+'0'*(N-3))
for _ in range(N-2):
    print(''.join(S))
    S.rotate()

This shifting technique, allows us to see the problem from another angle, explore all valid '3 consecutive 1' (3c1) having the left and right remaining part all equal to 0 '0'*(N-3)

[left]<3c1>[right]

Now the problem is generating all the remaining numbers for left and right. We need an algorithm able to Generate them for all lengths from 1 to N-3

It is not clear to me if 4 consecutive 1 are valid, or multiple 3c1 are valid as well, in any way this can be implemented in this piece of code.

Supposing all other cases are allowed, we can generate them using itertools:

itertools.product('01',repeat=n)

Putting all together:

shift the three ones from left to right at each step we generate all other remaining part both for left and right

Example with N=5:

Step 1:

left: lengt=0
right: lenght=2

['111'+''.join(r) for r in itertools.product('01',repeat=2)]
Out[3]: ['11100', '11101', '11110', '11111']

Step 2:

left: lenght=1
right: lenght=1

[''.join(l)+'111'+''.join(r) for r in itertools.product('01',repeat=1) for l in itertools.product('01',repeat=1)]
Out[5]: ['01110', '11110', '01111', '11111']

Step 2:

useless to explain.It should be already clear

This code explains a possible algorithm to generate the numbers but has no optimization at all. I think some profiling should be used to address further opt. maybe numpy, or create the left/right set and load it can give good speed-up.

10
  • How would you adapt this for the range(2**30) ?
    – SIGHUP
    Commented Jul 9 at 12:30
  • In the example N represent the number of bits used, so for 2^30 N should be 30. This is oneliner solution: set([''.join(l)+'111'+''.join(r) for offset in range(N-3) for r in itertools.product('01',repeat=N-3-offset) for l in itertools.product('01',repeat=offset)]) it takes less 1sec for N=20, 1min for N=25
    – Glauco
    Commented Jul 9 at 13:28
  • Then you don't produce all possibilities and for N=20 you generate 2,228,224 strings, way more than even 2^20.
    – no comment
    Commented Jul 9 at 13:40
  • Also, using a list comprehension seems just silly, why not use a set comprehension?
    – no comment
    Commented Jul 9 at 13:42
  • This was just an example, of the algorithm, int(x,2) should transform the string representation into the final number. Right for set comprehension.
    – Glauco
    Commented Jul 9 at 13:44

Not the answer you're looking for? Browse other questions tagged or ask your own question.