You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
CI Tests have been failing a lot more on Darwin recently. One example of that is here.
In that test, there is CPU starvation/context switching happening that's exposing some racy test logic.
That test in particular does a few things:
Sends UpOrOpen command to get the window blind 'moving'. That should change the OperationalStatus value immediately to 0x21.
Waits for 2s to get some initial movement.
Subscribes to the OperationalStatus attribute with a min-interval of 4s, max of 5s, which returns back the value of 0x21. However, at this point, the test app side has a big, unexplained jump in timestamp for some reason:
2022-06-21T19:30:30.6266150Z 2022-06-21 19:30:30.626 ERROR 19:29:08.528 - TEST OUT : [0;34m[1655839748528] [28482:124272] CHIP: [DMG] DataVersion = 0x7da58729,[0m
2022-06-21T19:30:30.6267080Z 2022-06-21 19:30:30.626 ERROR 19:29:08.528 - TEST OUT : [0;34m[1655839748528] [28482:124272] CHIP: [DMG] AttributePathIB =[0m
2022-06-21T19:30:30.6267950Z 2022-06-21 19:30:30.626 ERROR 19:29:19.707 - TEST OUT : [0;34m[1655839748528] [28482:124272] CHIP: [DMG] {[0m
2022-06-21T19:30:30.6269390Z 2022-06-21 19:30:30.626 ERROR 19:29:19.707 - TEST OUT : [0;34m[1655839748528] [28482:124272] CHIP: [DMG] Endpoint = 0x1,[0m
2022-06-21T19:30:30.6270010Z 2022-06-21 19:30:30.626 ERROR 19:29:19.707 - TEST OUT : [0;34m[1655839748528] [28482:124272] CHIP: [DMG] Cluster = 0x102,[0m
This time jump of 10s happens right while it's parsing the data.
Then, we call StopMotion on the target. That results in a report being generated immediately containing a value of 0 for the status back to the client, because on the server, the minimum interval requirement of 4s had already been satisfied.
We wait for 3s for some reason on the test in Step 2b, and then check if we received the report. The report was already received earlier however, so we end up waiting and subsequently, timing out waiting for the change.
This test only passes when there is no CPU starvation happening, because it relies on the min interval of 4s holding off the report generation till we've gotten to the right test stage. That is really racy.
Proposal
waitForReport YAML command should really be checking for the report to have been received right from the previous call to subscribeAttribute, and not when the that test step is executed.
The text was updated successfully, but these errors were encountered:
Problem
CI Tests have been failing a lot more on Darwin recently. One example of that is here.
In that test, there is CPU starvation/context switching happening that's exposing some racy test logic.
That test in particular does a few things:
OperationalStatus
value immediately to 0x21.This time jump of 10s happens right while it's parsing the data.
StopMotion
on the target. That results in a report being generated immediately containing a value of 0 for the status back to the client, because on the server, the minimum interval requirement of 4s had already been satisfied.This test only passes when there is no CPU starvation happening, because it relies on the min interval of 4s holding off the report generation till we've gotten to the right test stage. That is really racy.
Proposal
waitForReport
YAML command should really be checking for the report to have been received right from the previous call to subscribeAttribute, and not when the that test step is executed.The text was updated successfully, but these errors were encountered: