Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

waitForReport YAML instruction is racy #19861

Open
mrjerryjohns opened this issue Jun 22, 2022 · 4 comments · May be fixed by #28257
Open

waitForReport YAML instruction is racy #19861

mrjerryjohns opened this issue Jun 22, 2022 · 4 comments · May be fixed by #28257
Assignees
Labels
p1 priority 1 work testing V1.0 yaml Missing features or bugs in the YAML test harness

Comments

@mrjerryjohns
Copy link
Contributor

mrjerryjohns commented Jun 22, 2022

Problem

CI Tests have been failing a lot more on Darwin recently. One example of that is here.

In that test, there is CPU starvation/context switching happening that's exposing some racy test logic.

That test in particular does a few things:

  1. Sends UpOrOpen command to get the window blind 'moving'. That should change the OperationalStatus value immediately to 0x21.
  2. Waits for 2s to get some initial movement.
  3. Subscribes to the OperationalStatus attribute with a min-interval of 4s, max of 5s, which returns back the value of 0x21. However, at this point, the test app side has a big, unexplained jump in timestamp for some reason:
2022-06-21T19:30:30.6266150Z 2022-06-21 19:30:30.626 ERROR   19:29:08.528 - TEST OUT  : [0;34m[1655839748528] [28482:124272] CHIP: [DMG] 				DataVersion = 0x7da58729,[0m
2022-06-21T19:30:30.6267080Z 2022-06-21 19:30:30.626 ERROR   19:29:08.528 - TEST OUT  : [0;34m[1655839748528] [28482:124272] CHIP: [DMG] 				AttributePathIB =[0m
2022-06-21T19:30:30.6267950Z 2022-06-21 19:30:30.626 ERROR   19:29:19.707 - TEST OUT  : [0;34m[1655839748528] [28482:124272] CHIP: [DMG] 				{[0m
2022-06-21T19:30:30.6269390Z 2022-06-21 19:30:30.626 ERROR   19:29:19.707 - TEST OUT  : [0;34m[1655839748528] [28482:124272] CHIP: [DMG] 					Endpoint = 0x1,[0m
2022-06-21T19:30:30.6270010Z 2022-06-21 19:30:30.626 ERROR   19:29:19.707 - TEST OUT  : [0;34m[1655839748528] [28482:124272] CHIP: [DMG] 					Cluster = 0x102,[0m

This time jump of 10s happens right while it's parsing the data.

  1. Then, we call StopMotion on the target. That results in a report being generated immediately containing a value of 0 for the status back to the client, because on the server, the minimum interval requirement of 4s had already been satisfied.
  2. We wait for 3s for some reason on the test in Step 2b, and then check if we received the report. The report was already received earlier however, so we end up waiting and subsequently, timing out waiting for the change.

This test only passes when there is no CPU starvation happening, because it relies on the min interval of 4s holding off the report generation till we've gotten to the right test stage. That is really racy.

Proposal

waitForReport YAML command should really be checking for the report to have been received right from the previous call to subscribeAttribute, and not when the that test step is executed.

@mrjerryjohns
Copy link
Contributor Author

@jmeg-sfy jmeg-sfy self-assigned this Aug 23, 2022
@bzbarsky-apple bzbarsky-apple added the yaml Missing features or bugs in the YAML test harness label Oct 6, 2022
@franck-apple franck-apple added the p1 priority 1 work label Oct 24, 2022
@stale
Copy link

stale bot commented Apr 25, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@ericzijian1994
Copy link
Contributor

Think the problem is still happening...

@bzbarsky-apple
Copy link
Contributor

Yes, #28257 is not merged yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p1 priority 1 work testing V1.0 yaml Missing features or bugs in the YAML test harness
5 participants