Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation Fault If binary_search_num_conflicts Is Used #4006

Open
jawbroken opened this issue Nov 25, 2023 · 4 comments
Open

Segmentation Fault If binary_search_num_conflicts Is Used #4006

jawbroken opened this issue Nov 25, 2023 · 4 comments
Assignees
Labels
Bug Solver: CP-SAT Solver Relates to the CP-SAT solver
Milestone

Comments

@jawbroken
Copy link

What version of OR-Tools and what language are you using?
Version: v9.8.3296
Language: Python

Which solver are you using (e.g. CP-SAT, Routing Solver, GLOP, BOP, Gurobi)

CP-SAT

What operating system (Linux, Windows, ...) and version?

macOS Sonoma 14.1.1

What did you do?

Setting the binary_search_num_conflicts to a non-negative value (e.g. 1) causes random segmentation faults in CP-SAT for me. I can reproduce it using the Weighted Latency Problem example code, but you need to set the problem size large enough (e.g. num_nodes = 100, grid_size = 100), remove max_time_in_seconds: 5 from the solver parameters, and add binary_search_num_conflicts: 1. It might take some time to happen (e.g. in my case it took about 110 seconds), and the stack trace is always as below.

What did you expect to see

No segmentation fault.

What did you see instead?

-------------------------------------
Translated Report (Full Report Below)
-------------------------------------

Process:               Python [32966]
Path:                  /opt/homebrew/*/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python
Identifier:            org.python.python
Version:               3.11.6 (3.11.6)
Code Type:             ARM-64 (Native)
Parent Process:        zsh [31090]
Responsible:           Electron [30943]
User ID:               501

Date/Time:             2023-11-25 22:18:42.9063 +1100
OS Version:            macOS 14.1.1 (23B81)
Report Version:        12
Anonymous UUID:        4B83107E-5C12-FA78-4D6B-7147F0F91C46


Time Awake Since Boot: 160000 seconds

System Integrity Protection: enabled

Crashed Thread:        14

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x000000000000000b
Exception Codes:       0x0000000000000001, 0x000000000000000b

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   Python [32966]

VM Region Info: 0xb is not in any region.  Bytes before following region: 68719476725
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      commpage (reserved)        1000000000-7000000000   [384.0G] ---/--- SM=NUL  ...(unallocated)

Thread 14 Crashed:
0   libsystem_kernel.dylib        	       0x1886f911c __pthread_kill + 8
1   libsystem_pthread.dylib       	       0x188730cc0 pthread_kill + 288
2   libsystem_c.dylib             	       0x188609540 raise + 32
3   Python                        	       0x10121da00 faulthandler_fatal_error + 448
4   libsystem_platform.dylib      	       0x18875fa24 _sigtramp + 56
5   libortools.9.dylib            	       0x13669c3d4 operations_research::sat::IntegerEncoder::AddImplications(absl::lts_20230802::btree_map<operations_research::StrongInt64<operations_research::sat::IntegerValue_integer_tag_>, operations_research::sat::Literal, std::__1::less<operations_research::StrongInt64<operations_research::sat::IntegerValue_integer_tag_>>, std::__1::allocator<std::__1::pair<operations_research::StrongInt64<operations_research::sat::IntegerValue_integer_tag_> const, operations_research::sat::Literal>>> const&, absl::lts_20230802::container_internal::btree_iterator<absl::lts_20230802::container_internal::btree_node<absl::lts_20230802::container_internal::map_params<operations_research::StrongInt64<operations_research::sat::IntegerValue_integer_tag_>, operations_research::sat::Literal, std::__1::less<operations_research::StrongInt64<operations_research::sat::IntegerValue_integer_tag_>>, std::__1::allocator<std::__1::pair<operations_research::StrongInt64<operations_research::sat::IntegerValue_integer_tag_> const, operations_research::sat::Literal>>, 256, false>> const, std::__1::pair<operations_research::StrongInt64<operations_research::sat::IntegerValue_integer_tag_> const, operations_research::sat::Literal> const&, std::__1::pair<operations_research::StrongInt64<operations_research::sat::IntegerValue_integer_tag_> const, operations_research::sat::Literal> const*>, operations_research::sat::Literal) + 224
6   libortools.9.dylib            	       0x13669d4e8 operations_research::sat::IntegerEncoder::AssociateToIntegerLiteral(operations_research::sat::Literal, operations_research::sat::IntegerLiteral) + 688
7   libortools.9.dylib            	       0x13669cfec operations_research::sat::IntegerEncoder::GetOrCreateAssociatedLiteral(operations_research::sat::IntegerLiteral) + 644
8   libortools.9.dylib            	       0x136726568 operations_research::sat::RestrictObjectiveDomainWithBinarySearch(operations_research::StrongIndex<operations_research::sat::IntegerVariable_index_tag_>, std::__1::function<void ()> const&, operations_research::sat::Model*) + 448
9   libortools.9.dylib            	       0x1365f2540 operations_research::sat::(anonymous namespace)::SolveLoadedCpModel(operations_research::sat::CpModelProto const&, operations_research::sat::Model*) + 1392
10  libortools.9.dylib            	       0x1366027fc operations_research::sat::(anonymous namespace)::LnsSolver::GenerateTask(long long)::'lambda'()::operator()() const + 2728
11  libortools.9.dylib            	       0x1367b8578 std::__1::__function::__func<operations_research::sat::NonDeterministicLoop(std::__1::vector<std::__1::unique_ptr<operations_research::sat::SubSolver, std::__1::default_delete<operations_research::sat::SubSolver>>, std::__1::allocator<std::__1::unique_ptr<operations_research::sat::SubSolver, std::__1::default_delete<operations_research::sat::SubSolver>>>>&, int)::$_2, std::__1::allocator<operations_research::sat::NonDeterministicLoop(std::__1::vector<std::__1::unique_ptr<operations_research::sat::SubSolver, std::__1::default_delete<operations_research::sat::SubSolver>>, std::__1::allocator<std::__1::unique_ptr<operations_research::sat::SubSolver, std::__1::default_delete<operations_research::sat::SubSolver>>>>&, int)::$_2>, void ()>::operator()() + 52
12  libortools.9.dylib            	       0x1360b39e0 operations_research::RunWorker(void*) + 112
13  libortools.9.dylib            	       0x1360b42a4 void* std::__1::__thread_proxy[abi:v15006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(void*), operations_research::ThreadPool*>>(void*) + 52
14  libsystem_pthread.dylib       	       0x188731034 _pthread_start + 136
15  libsystem_pthread.dylib       	       0x18872be3c thread_start + 8

Anything else we should know about your project / environment

I don't think I can reproduce it with OR-Tools v9.7.2996 (though it's hard to tell because the crash isn't deterministic).

@lperron
Copy link
Collaborator

lperron commented Nov 27, 2023

Thanks. We will investigate. I can reproduce the crash.

Note that you should just not use this flag. There are better techniques for the same effect.

@jawbroken
Copy link
Author

Thanks for looking at it. I mainly reported it because it seemed from the stack trace like it might be a deeper issue that was just more likely to trigger with this flag.

I was just playing around with the parameter because it seemed to help improve lower bounds faster on my problem, but I haven't done any rigorous testing. What are the better techniques?

@lperron
Copy link
Collaborator

lperron commented Nov 27, 2023 via email

@jawbroken
Copy link
Author

Great, thanks. I only have 12 cores and didn't appreciate that increasing the number of workers (or just customising the subsolvers) might be a big improvement despite the increased contention. No wonder binary_search_num_conflicts seemed helpful.

For reference, if anyone else stumbles across this, I get lb_tree_search at 13 workers, probing at 14, objective_lb_search at 15, objective_shaving_search_no_lp at 17, objective_shaving_search_max_lp at 19, probing_max_lp at 21, objective_lb_search_no_lp at 23 and objective_lb_search_max_lp at 25, in OR-Tools v9.8.3296. I'll look into perhaps customising the list of subsolvers to the ones that seem most useful for my problem.

@Mizux Mizux added this to the v10.0 milestone Feb 12, 2024
@Mizux Mizux added Bug Solver: CP-SAT Solver Relates to the CP-SAT solver labels Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Solver: CP-SAT Solver Relates to the CP-SAT solver
3 participants