
sqs_events: SQS messages processed before reactor engine is ready #49404

Open
cyrilgdn opened this issue Aug 29, 2018 · 11 comments
Labels
Feature new functionality including changes to functionality and code refactors, etc.
@cyrilgdn

cyrilgdn commented Aug 29, 2018

Description of Issue/Question

See also: https://groups.google.com/forum/#!topic/salt-users/tAm-c-Cs5ss

We are using the sqs_events engine but we are losing messages.
When Salt starts, the sqs_events engine is ready and begins sending events before the reactor engine is properly started.

I also tried sleeping a bit at the beginning of the start function of the sqs_events engine.
With this, messages are no longer lost at startup, but we still lose messages when restarting, so it seems that when Salt is stopping, events may still be sent after the reactors have already stopped.
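As a minimal sketch of that "sleep before polling" workaround (this is not the real salt/engines/sqs_events.py; the 30-second delay and the abbreviated signature are assumptions):

import time

def start(queue, **kwargs):
    # Crude delay so the reactor engine has time to attach to the event bus
    # before any SQS message is turned into an event (value chosen arbitrarily).
    time.sleep(30)
    # ... the original SQS polling loop would follow here ...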

Test setup

For testing purposes, I've created a reactor which creates empty files with filenames retrieved from the SQS messages, and I'm sending messages to the queue manually.

Master configuration (relevant part):

log_level: debug
engines:
- sqs_events:
    queue: salt-queue
sqs:
  message_format: json
sqs.region: eu-central-1
reactor:
- salt/engine/sqs:
  - salt://test_sqs_events.sls

states/test_sqs_events.sls

#!py
# vim:ft=python

import json
import os

def run():
    # Ensure the target directory exists.
    if not os.path.isdir("/tmp/test_salt"):
        os.mkdir("/tmp/test_salt")

    # 'data' is injected by the reactor and holds the event payload;
    # data['message'] is the raw SQS message body.
    message = json.loads(data['message'])
    open("/tmp/test_salt/{}".format(message["filename"]), 'a').close()

    return {}

Steps to Reproduce Issue

The first test is to stop salt-master, then run:

for i in $(seq 10); do aws sqs send-message --queue-url https://eu-central-1.queue.amazonaws.com/ACCOUNT_ID/salt-queue --message-body "{\"filename\": $i}"; done

Then start salt-master.

No messages will be processed by reactors:

$ ls /tmp/test_salt
ls: cannot access '/tmp/test_salt': No such file or directory

Logs (relevant part)

[...]
2018-08-29 15:55:03,584 [salt.utils.lazy  :100 ][DEBUG   ][2694] LazyLoaded sqs_events.start
2018-08-29 15:55:03,584 [salt.engines     :53  ][INFO    ][2694] Starting Engine salt.engines.Engine(salt.loaded.int.engines.sqs_events)
2018-08-29 15:55:03,586 [salt.utils.process:418 ][DEBUG   ][2694] Started 'salt.engines.Engine(salt.loaded.int.engines.sqs_events)' with pid 2709
2018-08-29 15:55:03,589 [salt.utils.lazy  :100 ][DEBUG   ][2694] LazyLoaded reactor.start
2018-08-29 15:55:03,590 [salt.engines     :53  ][INFO    ][2694] Starting Engine salt.engines.Engine(salt.loaded.int.engines.reactor)
2018-08-29 15:55:03,592 [salt.utils.process:418 ][DEBUG   ][2694] Started 'salt.engines.Engine(salt.loaded.int.engines.reactor)' with pid 2710
2018-08-29 15:55:03,598 [salt.utils.lazy  :100 ][DEBUG   ][2709] LazyLoaded sqs_events.start
2018-08-29 15:55:03,600 [salt.utils.event :321 ][DEBUG   ][2709] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
2018-08-29 15:55:03,600 [salt.utils.event :324 ][DEBUG   ][2709] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
2018-08-29 15:55:03,631 [boto             :383 ][DEBUG   ][2709] Retrieving credentials from metadata server.
2018-08-29 15:55:03,621 [salt.utils.lazy  :100 ][DEBUG   ][2710] LazyLoaded reactor.start
2018-08-29 15:55:03,638 [salt.config      :2117][DEBUG   ][2710] Reading configuration from /etc/salt/master
2018-08-29 15:55:03,646 [boto             :404 ][DEBUG   ][2709] Retrieved credentials will expire in 5:06:27.353955 at: 2018-08-29T21:01:31Z
2018-08-29 15:55:03,724 [boto             :897 ][DEBUG   ][2709] Method: GET
2018-08-29 15:55:03,724 [boto             :898 ][DEBUG   ][2709] Path: /ACCOUNT_ID/salt-queue
2018-08-29 15:55:03,725 [boto             :899 ][DEBUG   ][2709] Data:
2018-08-29 15:55:03,726 [boto             :900 ][DEBUG   ][2709] Headers: {}
2018-08-29 15:55:03,726 [boto             :901 ][DEBUG   ][2709] Host: eu-central-1.queue.amazonaws.com
2018-08-29 15:55:03,727 [boto             :902 ][DEBUG   ][2709] Port: 443
2018-08-29 15:55:03,750 [boto             :1172][DEBUG   ][2709] <?xml version="1.0"?><ReceiveMessageResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><ReceiveMessageResult><Message><MessageId>9b3789ec-cba0-4e5e-a3e3-6ca0d4ceda13</MessageId><ReceiptHandle>...</ReceiptHandle><MD5OfBody>aea163445e8c67e6d22ded8aa8dab526</MD5OfBody><Body>{&quot;filename&quot;: 6}</Body></Message></ReceiveMessageResult><ResponseMetadata><RequestId>7b63582f-dbc4-5af7-be4d-0c2f1fd39f06</RequestId></ResponseMetadata></ReceiveMessageResponse>
2018-08-29 15:55:03,751 [salt.transport.ipc:254 ][DEBUG   ][2709] Initializing new IPCClient for path: /var/run/salt/master/master_event_pull.ipc
2018-08-29 15:55:03,763 [salt.utils.event :739 ][DEBUG   ][2709] Sending event: tag = salt/engine/sqs; data = {u'message': u'{"filename": 6}', u'_stamp': '2018-08-29T15:55:03.763372'}
[Idem with all messages]
2018-08-29 15:55:03,961 [salt.config      :2117][DEBUG   ][2710] Reading configuration from /etc/salt/master
[Reading all configuration files]
2018-08-29 15:55:06,083 [salt.loaded.int.grains.core:598 ][DEBUG   ][2710] Please install 'virt-what' to improve results of the 'virtual' grain.
2018-08-29 15:55:06,281 [salt.loaded.int.grains.extra:70  ][DEBUG   ][2710] Loading static grains from /etc/salt/grains
2018-08-29 15:55:06,497 [salt.utils.lazy  :100 ][DEBUG   ][2710] LazyLoaded jinja.render
2018-08-29 15:55:06,502 [salt.utils.lazy  :100 ][DEBUG   ][2710] LazyLoaded yaml.render
2018-08-29 15:55:06,504 [salt.utils.event :321 ][DEBUG   ][2710] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
2018-08-29 15:55:06,506 [salt.utils.event :324 ][DEBUG   ][2710] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
2018-08-29 15:55:06,506 [salt.transport.ipc:254 ][DEBUG   ][2710] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc
2018-08-29 15:55:12,693 [salt.utils.reactor:109 ][DEBUG   ][2710] Gathering reactors for tag salt/auth
2018-08-29 15:55:22,707 [salt.utils.reactor:109 ][DEBUG   ][2710] Gathering reactors for tag salt/auth
[...]

Restart

If instead we restart salt-master while messages are being sent:

for i in $(seq 20); do aws sqs send-message --queue-url https://eu-central-1.queue.amazonaws.com/ACCOUNT_ID/salt-queue --message-body "{\"filename\": $i}"; done

Then, just after the messages start being sent, run:

systemctl restart salt-master

We can see that some messages have been lost:

$ ls -1 /tmp/test_salt/ | sort -n
1
2
3
15
16
17
18
19
20

Versions Report

(also tested on Salt 2016.11.X)

Salt Version:
           Salt: 2018.3.2
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.5.3
      docker-py: Not Installed
          gitdb: 2.0.0
      gitpython: 2.1.1
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: 1.3.7
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.13 (default, Nov 24 2017, 17:33:09)
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 16.0.2
           RAET: Not Installed
          smmap: 2.0.1
        timelib: Not Installed
        Tornado: 4.4.3
            ZMQ: 4.2.1
 
System Versions:
           dist: debian 9.5 
         locale: UTF-8
        machine: x86_64
        release: 4.9.0-8-amd64
         system: Linux
        version: debian 9.5 
@Ch3LL
Contributor

Ch3LL commented Aug 31, 2018

Ping @saltstack/team-core, any ideas on how to solve the issue where the engine starts sending events before the reactor is up?

@Ch3LL Ch3LL added the Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged label Aug 31, 2018
@Ch3LL Ch3LL added this to the Blocked milestone Aug 31, 2018
@gtmanfred
Contributor

We would need to add the ability to order the starting of engines, to make sure the reactor engine is up before the sqs engine.

@srinivassundar

Hi team, any update on this issue? I am facing the same problem.
root@salt:~# salt --versions-report
Salt Version:
Salt: 2018.3.3

Dependency Versions:
cffi: 1.11.5
cherrypy: unknown
dateutil: 2.7.5
docker-py: Not Installed
gitdb: 2.0.3
gitpython: 2.1.8
ioflo: Not Installed
Jinja2: 2.10
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: 1.0.7
msgpack-pure: Not Installed
msgpack-python: 0.5.6
mysql-python: Not Installed
pycparser: 2.19
pycrypto: 2.6.1
pycryptodome: 3.7.2
pygit2: Not Installed
Python: 2.7.15rc1 (default, Nov 12 2018, 14:31:15)
python-gnupg: 0.4.1
PyYAML: 3.12
PyZMQ: 16.0.2
RAET: Not Installed
smmap: 2.0.3
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.2.5

System Versions:
dist: Ubuntu 18.04 bionic
locale: UTF-8
machine: x86_64
release: 4.15.0-1021-aws
system: Linux
version: Ubuntu 18.04 bionic

@gtmanfred
Contributor

AFAIK there is no work currently being done to allow ordering of engine starts.

But you could try directly specifying the reactor in the list of engines.

log_level: debug
engines:
  - reactor: {}
  - sqs_events:
      queue: salt-queue
sqs:
  message_format: json
sqs.region: eu-central-1
reactor:
- salt/engine/sqs:
  - salt://test_sqs_events.sls

And see if explicitly putting the reactor engine before the sqs events engine causes the reactor to be started first.

The way the reactor is started now: if you have a reactor key in the master config, it adds the reactor engine to the list of engines and starts them in order.
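For reference, a rough sketch (not the actual Salt source, just an illustration of the behaviour described above), under the assumption that the reactor engine is simply appended to the configured engines and everything is launched in list order:

def build_engine_list(opts):
    # Hypothetical helper mirroring the described behaviour: a 'reactor' key in
    # the master config appends the reactor engine after the user-listed
    # engines, with no readiness handshake between them.
    engines = list(opts.get('engines') or [])
    if opts.get('reactor'):
        engines.append({'reactor': {}})
    return engines

# With the master config from this issue this would yield
# [{'sqs_events': {'queue': 'salt-queue'}}, {'reactor': {}}],
# so the SQS poller is launched before the reactor.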

@srinivassundar

@gtmanfred
Ordering the engine starts doesn't seem to help.
Is there any other solution?

thanks
Srinivas

@gtmanfred
Contributor

Nope, it would require a change to allow ordering of the engines at start, and something to signify that an engine has finished starting before the next one is started.
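To make that idea concrete, a minimal self-contained sketch (plain Python, not Salt code) of the kind of readiness signal this would require, where the SQS poller only starts working once the reactor reports that it has finished starting:

import multiprocessing
import time

def reactor_engine(ready):
    # Stand-in for the reactor engine's real initialization (e.g. subscribing
    # to the master event bus); signal readiness once that is done.
    time.sleep(2)
    ready.set()
    # ... would then process events until shutdown ...

def sqs_engine(reactor_ready):
    # Block until the reactor has signalled readiness before polling SQS.
    reactor_ready.wait()
    # ... would then poll SQS and fire events on the bus ...

if __name__ == '__main__':
    ready = multiprocessing.Event()
    multiprocessing.Process(target=reactor_engine, args=(ready,)).start()
    multiprocessing.Process(target=sqs_engine, args=(ready,)).start()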

and that would be a feature request if @saltstack/team-triage will tag it.

thanks

@srinivassundar

srinivassundar commented Jan 3, 2019 via email

@garethgreenaway garethgreenaway added Feature new functionality including changes to functionality and code refactors, etc. and removed Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged labels Jan 3, 2019
@garethgreenaway
Contributor

Since this is asking for new functionality which doesn't currently exist, namely the ability to specify dependencies for the order in which engines load, I'll go ahead and label it as a feature request and approve it for a future feature addition.

@garethgreenaway garethgreenaway modified the milestones: Blocked, Approved Jan 3, 2019
@stale

stale bot commented Jan 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Jan 9, 2020
@mchugh19
Contributor

mchugh19 commented Jan 9, 2020

Still valid

@stale

stale bot commented Jan 9, 2020

Thank you for updating this issue. It is no longer marked as stale.

@stale stale bot removed the stale label Jan 9, 2020