This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

RegistryManager GetModulesOnDeviceAsync Returning Removed Modules Twins #570

Open
jason-e-gross opened this issue Apr 3, 2018 · 16 comments

Comments

@jason-e-gross

jason-e-gross commented Apr 3, 2018

GetModulesOnDeviceAsync is returning modules that are no longer part of the device.

  • Created an IoT Edge device.
  • Added a module. Played around for a few days (was called "barkModule").
  • Removed the "barkModule" module, and later that day added another, "BarkModule" (uppercase B).

Now, for the last week -- any time I call GetModulesOnDeviceAsync (tested with Microsoft.Azure.Devices v1.16.0-preview-001 and 003) to get a list of module twins -- it brings back four ($edgeAgent, $edgeHub, barkModule and BarkModule).

            foreach (var _device in _deviceTwins) {
                // Returns the module identities registered for the device in IoT Hub.
                var moduleList = await registryManager.GetModulesOnDeviceAsync(_device.DeviceId);
            }

The Azure portal page for the IoT Hub shows only three, as expected. I looked at all the module twins for $edgeAgent, $edgeHub, and the non-existent "barkModule", and can't find any indicator that says it is deleted/removed.

Expected:
To not see removed modules in a GetModulesOnDeviceAsync call to get module twins for a device.

Attached are two images: one (image "2") showing the IoT Hub portal listing only three modules, and another (image "3") showing the GetModulesOnDeviceAsync call bringing back a result of four.

@dsajanice
Member

Thanks for reporting this issue. We are taking a look at it and will get back with more details.

@jason-e-gross
Author

To clarify -- I'm seeing it with any removed module.

Not just ones named the same but with different upper/lowercase in the name (i.e., barkModule vs BarkModule). I had a device today that had ModuleA and ModuleB - removed ModuleA - and this method is still returning it.

@dsajanice
Member

How did you create and delete the barkModule?

@jason-e-gross
Author

jason-e-gross commented Apr 5, 2018

Through the IoT Hub interface on Azure portal. In all instances, so far - it's doing this. I'll click Edge Devices (Preview), then select the device. Then target the module and click "Delete". It removes it from the portal, but the GetModulesOnDeviceAsync method still shows it as part of the device.

Same with creating it. Done through the portal, configuring the device, adding and removing modules.

@dsajanice
Member

Hi Jason, thanks for the info. This is a bug in the portal and the button name, 'Delete Module', is certainly confusing. What the delete button is actually doing is removing the module from the Edge device's deployment (as reflected in the $edgeAgent module's twin JSON). Contrary to what the name suggests, the delete button is not deleting the module from the device. This is why you still see the module after clicking delete. Did you start the Edge runtime on the device? If you do, the Edge runtime should detect that the "deleted" module does not belong to its deployment and it should delete it from IoT Hub. Can you please try that and let us know if that doesn't work? Thanks!

@jason-e-gross
Author

jason-e-gross commented Apr 5, 2018

I don't think this is accurate -- because on the device, I've done these things:

  • Stopped the agent.
  • Started the agent.
  • Confirmed it's not showing up in the $edgeAgent's module twin.
  • Stopped, removed containers, removed images of $edgeAgent, $edgeHub and any of the device modules' containers/images - so iotedgectl is forced to "start over again" and pull everything down (essentially an empty Docker).

Yet it still shows up in the GetModulesOnDeviceAsync method when I call it.

@dsajanice
Member

There is definitely work to be done in the UI based on what I posted above. I am investigating why the edge runtime did not delete the module.

@jason-e-gross
Author

jason-e-gross commented Apr 6, 2018

I think we're getting our wires crossed. Adding a module to a device is done through the IoT Hub interface -- do we not also remove them from there as well? How does that play well with IoT Edge Deployment? That's sort of off-topic, because:

If I remove a module, or add a module using the IoT Hub portal -- the edgeAgent at the device does do the work in modifying its module twin. If it's a removed module, the edgeAgent's module twin reflects that (no more entries for it). If it's an added module - again, it reflects that (adds an entry). The device also adds module images and fires up the containers, or stops removed module containers. That all works.

The IoT Hub portal and the actual device/modules are behaving as they should.

BUT -- that command -- the reason I filed this ticket -- still reports those modules as if they existed. Even though:

  • it's not in the $edgeAgent module twin properties,
  • not listed on the IoT Hub portal
  • and even if I go to the extra steps of stopping the device, deleting all images and containers in Docker entirely, and restarting the device (forcing it to bring back down all new fresh agent/hub/modules).

The device is working as expected. The portal is acting as expected. The $edgeAgent module looks like I would expect (no mention of the removed modules).

What's not working as expected is the method call. It still shows any removed modules as if they still existed.

@dsajanice
Member

Module identities in the cloud are different from module containers on an Edge device. Deleting a module container does not delete its identity. The edge agent is responsible for deleting module identities in the cloud. What we are investigating is why the edge agent did not delete the module identity for barkModule.

@jason-e-gross
Author

OK, I'll let you at it. I've doubly confirmed that:

  1. The deleted module is not on the portal's version of the module twin.
  2. The deleted module is not on the $edgeAgent's module twin at the device.
  3. The deleted module is not a container in the device's Docker.
  4. The deleted module is not an image in the device's Docker.

I don't have any flippin' clue where these ghost modules are hiding. I have 6-7 devices now that are acting like this.

@dsajanice
Member

Please call RemoveModuleAsync if you want to delete the modules.
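
For readers following along, a minimal sketch of that call might look like the following. It assumes the Microsoft.Azure.Devices NuGet package; the environment variable name `IOTHUB_CONNECTION_STRING` and the device ID `myEdgeDevice` are hypothetical placeholders, and `barkModule` is the stale identity from this thread.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Devices;

static class RemoveStaleModuleIdentity
{
    public static async Task Main()
    {
        // Assumption: a service connection string with registry write
        // permission is stored in this (hypothetical) environment variable.
        var connectionString =
            Environment.GetEnvironmentVariable("IOTHUB_CONNECTION_STRING");

        const string deviceId = "myEdgeDevice";     // hypothetical device ID
        const string staleModuleId = "barkModule";  // identity left behind in IoT Hub

        using (var registryManager =
            RegistryManager.CreateFromConnectionString(connectionString))
        {
            // Deletes the module identity in IoT Hub. This does not touch
            // containers or images on the device itself.
            await registryManager.RemoveModuleAsync(deviceId, staleModuleId);

            // List the remaining identities to confirm the removal.
            var modules = await registryManager.GetModulesOnDeviceAsync(deviceId);
            foreach (var module in modules)
            {
                Console.WriteLine(module.Id);
            }
        }
    }
}
```

Module IDs are treated as distinct strings here, so removing `barkModule` should leave `BarkModule` in place.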

@jason-e-gross
Author

I can do that for my OCD's sake, but the monitoring tool I'm trying to build, which uses that method to get a list of modules and whether they're alive or not, is going to be barking alerts that there are dead modules, not knowing they're ghosts.

Thanks again for your help, appreciate you diving into this.

@dsajanice
Member

While not desirable, it is not against the architectural principles of Edge to have a module identity in the cloud that Edge Agent is unaware of. For example one could use the AddModuleAsync call to create a module identity in the cloud and Edge Agent would have no idea this identity existed. I do not recommend calling GetModulesOnDeviceAsync to determine whether containers are alive. As I mentioned earlier, module identities are independent of module containers. The best way to monitor containers is through the Docker API.

I was able to reproduce your issue. The way one can get into the state you are in is by clicking on the 'Delete Module' button in the UI when the Edge Agent is not running on the device. Can you please confirm that was the case in your setup?

@jason-e-gross
Author

Yep, that does do the trick. I can reproduce it by doing as you said - modifying it in the Hub Portal while the device is off/agent not running.

What would be the purpose of creating a module in the module identities if it doesn't end up reflecting (twinning) what I want at the device -- or vice versa? I guess, what's the point of having this method, and the properties showing the connection state of the modules at the device -- if it is a "best guess" decoupled from the device?

It seems ungainly that if I want to get an overview of the devices and their module states, I would have to walk through thousands of devices and dozens of modules each - interrogating all those module twins - just to see if they're up and running (or what version of an image they're using in case an update to a module happens [for which IoT Edge Deployments is not suitable]).

"show me all the devices that are using tempsensor:v31 module" -- would be ugly.

Am I using this method/MS.Azure.Devices incorrectly? Was the GetModulesOnDeviceAsync more geared towards the IoT Edge Deployment implementation? Is the best-practice to interrogate the module twins across all the devices for a healthy state / overview?

@dsajanice
Member

The Edge Agent twin's reported properties section has details about the state of all the modules. You can use the service client to read those reported properties. The GetModulesOnDeviceAsync method is decoupled from the state of a container and so is not the correct API to use for your scenario. Take a look at the agent's reported properties and let us know if that suffices.
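
As an illustration of that suggestion, here is a rough sketch that reads the $edgeAgent twin's reported properties and flags module identities that the agent does not report. Several things in it are assumptions rather than confirmed details from this thread: the `modules` / `systemModules` sections and the per-module `runtimeStatus` field follow the documented $edgeAgent twin layout as I understand it (preview builds may differ), system modules are assumed to be keyed without the `$` prefix, and the environment variable name and device ID are hypothetical. Newtonsoft.Json is assumed to be available, since the SDK depends on it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Devices;
using Newtonsoft.Json.Linq;

static class EdgeModuleReport
{
    // Maps module identity name -> runtimeStatus, read from the JSON of the
    // $edgeAgent twin's reported properties (layout is an assumption).
    public static Dictionary<string, string> ParseRuntimeStatus(string reportedJson)
    {
        var result = new Dictionary<string, string>();
        var reported = JObject.Parse(reportedJson);
        foreach (var section in new[] { "systemModules", "modules" })
        {
            var modules = reported[section] as JObject;
            if (modules == null) continue;

            // Assumption: system modules are reported as "edgeAgent"/"edgeHub",
            // while their identities are "$edgeAgent"/"$edgeHub".
            var prefix = section == "systemModules" ? "$" : "";
            foreach (var module in modules.Properties())
            {
                var status = (module.Value as JObject)?["runtimeStatus"];
                result[prefix + module.Name] = status?.ToString() ?? "unknown";
            }
        }
        return result;
    }

    // Identities that exist in IoT Hub but are absent from the agent's report.
    // The comparison is case-sensitive, so barkModule and BarkModule differ.
    public static IEnumerable<string> FindOrphanedIdentities(
        IEnumerable<string> identityIds, IDictionary<string, string> reportedStatus)
    {
        return identityIds.Where(id => !reportedStatus.ContainsKey(id));
    }

    public static async Task Main()
    {
        // Hypothetical environment variable and device ID.
        var connectionString =
            Environment.GetEnvironmentVariable("IOTHUB_CONNECTION_STRING");
        const string deviceId = "myEdgeDevice";

        using (var registryManager =
            RegistryManager.CreateFromConnectionString(connectionString))
        {
            var agentTwin = await registryManager.GetTwinAsync(deviceId, "$edgeAgent");
            var status = ParseRuntimeStatus(agentTwin.Properties.Reported.ToJson());
            foreach (var pair in status)
            {
                Console.WriteLine($"{pair.Key}: {pair.Value}");
            }

            var identities = await registryManager.GetModulesOnDeviceAsync(deviceId);
            var orphans = FindOrphanedIdentities(identities.Select(m => m.Id), status);
            foreach (var orphan in orphans)
            {
                Console.WriteLine($"identity with no reported module: {orphan}");
            }
        }
    }
}
```

The parsing and comparison are kept in separate methods from the hub calls so a monitoring tool could exercise them against saved twin JSON without a live hub.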

@dsajanice
Member

@jason-e-gross there are some UI improvements that we are tracking internally. Besides that, we have not identified any Edge-specific product changes. I plan to close this bug in a couple of days. Please let me know if there are any other open or blocking issues related to this bug.

2 participants