
Context: My company runs Solr 7.0 in cloud mode, kept in sync by a ZooKeeper ensemble: 3 Solr nodes, 3 ZooKeeper nodes, and 1 collection with 1 shard and 3 replicas (one replica per Solr instance): collection1_shard1_replica_n1 (on Solr1), collection1_shard1_replica_n2 (on Solr2) and collection1_shard1_replica_n3 (on Solr3), all hosted in AWS. This Solr setup was created many years ago.

Due to circumstances, I must change the OS of the Solr instances, which means I need to create new instances with a new OS.

I created new Solr instances with the new OS, took backups of the original Solr instances, powered down the originals, moved the original Solr data volumes to the new instances, and started Solr from /etc/rc.local with:

    su solr -c '/my/solr/path/solr/bin/solr start -cloud -p 8983 -z zk1-ip,zk2-ip,zk3-ip -m 8g'

but there is a problem (see below).
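As I understand it, the address a node publishes into ZooKeeper comes from the -h flag (or SOLR_HOST in solr.in.sh) and is auto-detected if unset. A sketch of the same start line with the host pinned explicitly, in case that's relevant (new-solr1-ip is a placeholder for the new instance's address):

```shell
# Hypothetical variant of the rc.local start line: pin the address this node
# publishes into ZooKeeper via -h (equivalent to setting SOLR_HOST in solr.in.sh).
# "new-solr1-ip" is a placeholder, not a real hostname.
su solr -c '/my/solr/path/solr/bin/solr start -cloud -p 8983 \
    -h new-solr1-ip \
    -z zk1-ip,zk2-ip,zk3-ip -m 8g'
```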

In the past, on the old Solr instances' web interface, under the "Cloud" menu, I could correctly see all 3 IPs of the Solr nodes, one of which was the master/primary. Whenever we needed to reboot the master/primary Solr1 node, another node would take leadership, and we cycled through all 3 of them until we got back to Solr1 as the primary/master node.

Problem: Not knowing much about how Solr works, I thought this was going to be easy ... boy, was I wrong. Once I started the new Solr1, I noticed that the "Cloud" menu of the web interface was still showing the old Solr1 IP, even after the node recovered and took active leadership (at this point, all old Solr instances were powered down and only the new Solr1 was powered up). After some time I decided to power on the new Solr2 instance and see what happened. The new Solr2 instance came online, but the "Cloud" menu showed the old Solr2 IP and it never got past the recovery step.

Checking the new Solr logs, I discovered that Solr was actively trying to use the old IPs (even though the new instances have different IPs), so I figured they must be recorded somewhere in the Solr collection, the Solr configs, or the ZooKeeper data. The new Solr2 kept trying to reach the old Solr1 master/primary node's IP, spamming the logs that it couldn't be reached (because that instance was powered down). The new Solr1 also stated in its logs that it was going to be the leader, but with the IP of the old leader. This was taken from the logs of the new Solr1:

(OverseerElectionContext I am going to be the leader old-solr-master-IP:8983_solr)

However, after spending some time checking both the new Solr and ZooKeeper configs, I could not find a place to change this. After a few more hours I had to power down the new Solr instances and power the old/original ones back up, because this was holding back production.
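In case it helps to see where such addresses could live: a sketch of inspecting the cluster state that SolrCloud keeps inside ZooKeeper, assuming Solr 7's bin/solr zk subcommands and the usual /collections/collection1/state.json layout (zk1-ip:2181 is a placeholder):

```shell
# List the nodes currently registered as live (each entry embeds a host:port).
/my/solr/path/solr/bin/solr zk ls /live_nodes -z zk1-ip:2181

# Copy the collection's state file out of ZooKeeper for inspection;
# the base_url / node_name fields in it record each replica's address.
/my/solr/path/solr/bin/solr zk cp zk:/collections/collection1/state.json \
    /tmp/state.json -z zk1-ip:2181
```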

Question: How can I migrate/move the Solr collection from the 3 original Solr instances to the 3 new Solr instances without losing data and without re-indexing? I'm not sure we can re-index :(
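For completeness, a sketch of the Collections API backup/restore route, in case that's the right tool here (the backup name and /mnt/shared/backups location are placeholders; as far as I know the location must be a filesystem visible to all Solr nodes):

```shell
# Back up the collection from the old cluster (run against any old node).
curl 'http://old-solr1-ip:8983/solr/admin/collections?action=BACKUP&name=coll1bak&collection=collection1&location=/mnt/shared/backups'

# Restore it into the new cluster (run against any new node).
curl 'http://new-solr1-ip:8983/solr/admin/collections?action=RESTORE&name=coll1bak&collection=collection1&location=/mnt/shared/backups'
```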

Thanks

    I don't remember exactly when it was introduced, but the Collections backup API is usually the preferred way to move content to a new cluster: solr.apache.org/guide/solr/latest/deployment-guide/… - if you've copied the old collection cluster state (stored inside zk) over to the new cluster, the nodes won't know where the other nodes are actually located, since the state file says they're somewhere else from where they currently are.
    – MatsLindh
    Commented Jun 21 at 10:24

