Skip to content
This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

[V2] edgeHub just stopped working? #558

Open
Hammatt opened this issue Mar 27, 2018 · 28 comments
Open

[V2] edgeHub just stopped working? #558

Hammatt opened this issue Mar 27, 2018 · 28 comments

Comments

@Hammatt
Copy link

Hammatt commented Mar 27, 2018

We've been developing an application built on IoT Edge, I just restarted the runtime and the edgeHub module will no longer start up.

docker logs edgeHub
docker :
    + CategoryInfo          : NotSpecified: (:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

Unhandled Exception: System.AggregateException: One or more errors occurred. (Access is denied) --->
Internal.Cryptography.CryptoThrowHelper+WindowsCryptographicException: Access is denied
   at Internal.Cryptography.Pal.CertificatePal.FromBlobOrFile(Byte[] rawData, String fileName, SafePasswordHandle
password, X509KeyStorageFlags keyStorageFlags)
   at System.Security.Cryptography.X509Certificates.X509Certificate..ctor(String fileName, String password,
X509KeyStorageFlags keyStorageFlags)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Hosting.Initialize(String certPath) in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Hosting.cs:line 23
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.<MainAsync>d__1.MoveNext() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 48
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.Main() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 27

Edit:
I've tried restarting the device. it's running windows 10 iot core if that's of any relevance. I've re-ran the setup command a few times to see if that changed anything and it hasn't.

@darobs
Copy link
Contributor

darobs commented Mar 27, 2018

Hello @Hammatt

Thanks for the information.

Do you know which version of the edgeHub you're running? It will be at the top of the starting banner, Like this:

2018-03-27 19:52:01.404 +00:00 [INF] - Starting Edge Hub
2018-03-27 19:52:01.406 +00:00 [INF] - Version - 1.0.0-preview022.11567621 (12a8e1bb63e619b17ca685efd470ad3f412034f4)
2018-03-27 19:52:01.407 +00:00 [INF] -
        █████╗ ███████╗██╗   ██╗██████╗ ███████╗
       ██╔══██╗╚══███╔╝██║   ██║██╔══██╗██╔════╝
       ███████║  ███╔╝ ██║   ██║██████╔╝█████╗
       ██╔══██║ ███╔╝  ██║   ██║██╔══██╗██╔══╝
       ██║  ██║███████╗╚██████╔╝██║  ██║███████╗
       ╚═╝  ╚═╝╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚══════╝

 ██╗ ██████╗ ████████╗    ███████╗██████╗  ██████╗ ███████╗
 ██║██╔═══██╗╚══██╔══╝    ██╔════╝██╔══██╗██╔════╝ ██╔════╝
 ██║██║   ██║   ██║       █████╗  ██║  ██║██║  ███╗█████╗
 ██║██║   ██║   ██║       ██╔══╝  ██║  ██║██║   ██║██╔══╝
 ██║╚██████╔╝   ██║       ███████╗██████╔╝╚██████╔╝███████╗
 ╚═╝ ╚═════╝    ╚═╝       ╚══════╝╚═════╝  ╚═════╝ ╚══════╝

I'm also curious about the iotedgectl setup command line (minus the connection string), if you're willing to share.

@Hammatt
Copy link
Author

Hammatt commented Mar 27, 2018

Hi @darobs ,

I'm not able to see any logs in the edge hub at all, what I posted in the original post is the entire output of docker logs edgeHub

This is what we're using for the setup command:
iotedgectl setup --connection-string "{connection-string}" --auto-cert-gen-force-no-passwords

Edit: possibly relevant to mention, all other modules start up (but time out because they can't connect to the hub)

@Hammatt
Copy link
Author

Hammatt commented Mar 27, 2018

Just some more information, I've been trying to verify ways to reproduce.

So far: This doesn't seem to happen on windows 10 Pro, the module starts up and everything works fine.

But on every Windows 10 IoT Core based device that I've tested this on, the issue occurs.

Hope this is of use.

@michael-chi
Copy link

michael-chi commented Mar 28, 2018

not sure if this is relevant, but I am also seeing edgeHub keep on start/stop.
I am running on Raspberry PI 3

Linux raspberrypi 4.9.59-v7+ #1047 SMP Sun Oct 29 12:19:23 GMT 2017 armv7l GNU/Linux

docker logs -f edgeHub results:

Edge Hub Server Certificate File: /mnt/edgehub/edge-hub-server.cert.pfx
Edge Hub CA Server Certificate File: /mnt/edgehub/edge-chain-ca.cert.pem
SSL_CERTIFICATE_PATH=/mnt/edgehub
SSL_CERTIFICATE_NAME=edge-hub-server.cert.pfx
Executing: cp /mnt/edgehub/edge-chain-ca.cert.pem /usr/local/share/ca-certificates/edge-chain-ca.crt
Executing: update-ca-certificates
Updating certificates in /etc/ssl/certs...
1 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Certificates installed successfully!
runuser: user  does not exist

I couldn't see version information thru docker logs -f edgeHub, below is result of docker logs -f edgeAgent

pi@raspberrypi:~ $ sudo docker logs -f edgeAgent
2018-03-28 04:03:40.721 +00:00 [INF] - Starting module management agent.
2018-03-28 04:03:47.799 +00:00 [INF] - Version - 1.0.0-preview022.11567621 (12a8e1bb63e619b17ca685efd470ad3f412034f4)
2018-03-28 04:03:47.801 +00:00 [INF] -

result of docker images

pi@raspberrypi:~ $ sudo docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             SIZE
kalschi/rpi-camera-module      0.0.1-arm32v7       9f48fee2c123        About an hour ago   176MB
microsoft/azureiotedge-hub     1.0-preview         d40af83309cd        22 hours ago        234MB
microsoft/azureiotedge-agent   1.0-preview         b7d29616e809        22 hours ago        218MB
kalschi/rpi-camera-module      <none>              53858f83eaaa        2 days ago          230MB
microsoft/azureiotedge-hub     <none>              a679d016e9d2        3 weeks ago         235MB
microsoft/azureiotedge-agent   <none>              8c623975bae5        3 weeks ago         218MB
@darobs
Copy link
Contributor

darobs commented Mar 28, 2018

This seems to be ARM-specific, we will be investigating.

@Hammatt
Copy link
Author

Hammatt commented Mar 28, 2018

Hey @darobs, I can confirm that this is not limited only to ARM as the Windows 10 IoT Core Devices that we have are all x64 architecture. Specifically we've been able to reproduce on a Minnowboard turbot dual Ethernet Quad-Core (Intel Atom E3845) model, and a number of other x64 based devices. I don't have access to any ARM devices to test this on.

@darobs
Copy link
Contributor

darobs commented Mar 28, 2018

Got my problems of the day mixed up... The problem @michael-chi is seeing on the Raspberry Pi has been fixed and should be pushed out to Docker.

@Hammatt - we're still looking at this problem.

@Hammatt
Copy link
Author

Hammatt commented Mar 28, 2018

Thanks @darobs , is there a way that I could roll back to a working version here? It's blocking me pretty hard at work.

@darobs
Copy link
Contributor

darobs commented Mar 28, 2018

Here's what I would try in the following order:

  1. docker rm $(docker ps -aq) and restart
  • There are issues with Docker/Windows restarting and creating a new network with the same name.
  1. Roll back to preview21
  • docker rm $(docker ps -aq) to clean out preview22 images.
  • add --agent microsoft/azureiotedge-agent:1.0.0-preview021 to the iotedgectl setup command, and set the edge hub image in the deployment to microsoft/azureiotedge-hub:1.0.0-preview021

If this is the Windows networking issue we're seeing, the first should fix the problem.

@Hammatt
Copy link
Author

Hammatt commented Mar 28, 2018

That first command isn't working:

PS C:\Data\Users\Administrator\Documents> docker rm $(docker ps -aq)
docker : "docker rm" requires at least 1 argument.
At line:1 char:1
+ docker rm $(docker ps -aq)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: ("docker rm" req...ast 1 argument.:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

See 'docker rm --help'.
Usage:  docker rm [OPTIONS] CONTAINER [CONTAINER...] [flags]
Remove one or more containers

Edit: Neither is the 2nd:

iotedgectl setup --connection-string "{connection-string}" --auto-cert-gen-force-no-passwords --agent microsoft/azureiotedge-agent:1.0.0-preview
021
iotedgectl : usage: iotedgectl setup [-h] [--config-file] [--connection-string]
    + CategoryInfo          : NotSpecified: (usage: iotedgec...nection-string]:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

                        [--edge-config-dir] [--edge-home-dir]
                        [--edge-hostname] [--runtime-log-level] [--image]
                        [--docker-registries  [...]] [--docker-uri]
                        [--upstream-protocol]
                        [--auto-cert-gen-force-no-passwords]
                        [--owner-ca-cert-file] [--device-ca-cert-file]
                        [--device-ca-chain-cert-file]
                        [--device-ca-private-key-file]
                        [--device-ca-passphrase-file] [--device-ca-passphrase]
                        [--agent-ca-passphrase-file] [--agent-ca-passphrase]
                        [-C] [-ST] [-L] [-OR] [-OU] [-CN]
iotedgectl setup: error: ambiguous option: --agent could match --agent-ca-passphrase-file, --agent-ca-passphrase
@darobs
Copy link
Contributor

darobs commented Mar 28, 2018

My bad. That command works as is in Linux, and I thought the same form worked in Powershell.

Essentially, you want to run docker rm -f on all existing containers.

...and the other mistake is it's not --agent but --image

iotedgectl setup --connection-string "{connection-string}" --auto-cert-gen-force-no-passwords --image microsoft/azureiotedge-agent:1.0.0-preview
021
@Hammatt
Copy link
Author

Hammatt commented Mar 28, 2018

Alright, so what I've done is stop all the containers, run docker system prune -a and then double checked that the containers are gone with docker ps -a. Then I've ran the setup command without the image argument and the result was the same:

docker logs edgeHub
docker :
    + CategoryInfo          : NotSpecified: (:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

Unhandled Exception: System.AggregateException: One or more errors occurred. (Access is denied) --->
Internal.Cryptography.CryptoThrowHelper+WindowsCryptographicException: Access is denied
   at Internal.Cryptography.Pal.CertificatePal.FromBlobOrFile(Byte[] rawData, String fileName, SafePasswordHandle
password, X509KeyStorageFlags keyStorageFlags)
   at System.Security.Cryptography.X509Certificates.X509Certificate..ctor(String fileName, String password,
X509KeyStorageFlags keyStorageFlags)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Hosting.Initialize(String certPath) in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Hosting.cs:line 23
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.<MainAsync>d__1.MoveNext() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 48
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.Main() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 27

I then pruned again and tried with the --image flag this time. I wasn't quite sure what to put after the image flag as you said a couple of different things but i went with microsoft/azureiotedge-agent:1.0.0-preview021, and not the one for the hub because it seemed to change the Edge Agent Image field that the setup command displayed.

I then check and it has the same output again:

docker logs edgeHub
docker :
    + CategoryInfo          : NotSpecified: (:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

Unhandled Exception: System.AggregateException: One or more errors occurred. (Access is denied) --->
Internal.Cryptography.CryptoThrowHelper+WindowsCryptographicException: Access is denied
   at Internal.Cryptography.Pal.CertificatePal.FromBlobOrFile(Byte[] rawData, String fileName, SafePasswordHandle
password, X509KeyStorageFlags keyStorageFlags)
   at System.Security.Cryptography.X509Certificates.X509Certificate..ctor(String fileName, String password,
X509KeyStorageFlags keyStorageFlags)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Hosting.Initialize(String certPath) in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Hosting.cs:line 23
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.<MainAsync>d__1.MoveNext() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 48
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.Main() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 27

Not really sure what's going on now, if it happens in this version too. Have you been able to reproduce the issue at all?

@aribeironovaes
Copy link
Contributor

aribeironovaes commented Mar 29, 2018 via email

@Hammatt
Copy link
Author

Hammatt commented Mar 29, 2018

Sorry I may have not been clear, I ran iotedgectl setup --connection-string "{connection string}" --auto-cert-gen-force-no-passwords --image microsoft/azureiotedge-agent:1.0.0-preview021 and it still failed wit hthe same error.

edit: oh, i think i see what you mean now

@Hammatt
Copy link
Author

Hammatt commented Mar 29, 2018

Sorry for the confusion earlier, I'm up and running now on edgeHub Version - 1.0.0-preview021.10543704

@yphuangms
Copy link

yphuangms commented Mar 29, 2018

Hi, I was curious, is there anyone that can successfully run preview22 version of edgeAgent , edgeHub using windows container?

I have run into the same issue as @Hammatt, and I've tried both Windows 10 destop, Windows IoT Core, but all failed with the same exception.

Can we say latest edgeHub windows container (privew22) has blocking issue, and the only way to start IoT Edge on windows platform is to rollback to preview21?

And what's the steps to rollback to preview21? By running iotedgectl setup to change edgeAgent image version doesn't make change to edgeHub version, it still use the latest edgeHub image (preview22)... Any help would be very appreciated, thanks!

@Orlando1991
Copy link

@yphuangms , you have to change it in the Azure portal. Go to where you would set your modules. Click on Configure advanced Edge runtime settings. And change image to:
microsoft/azureiotedge-hub:1.0.0-preview021

@darobs
Copy link
Contributor

darobs commented Mar 29, 2018

@yphuangms

I was able to run preview22 error free with Windows containers on my Windows 10 PC, but it was a completely new deployment.

@yphuangms
Copy link

yphuangms commented Mar 30, 2018

@Orlando1991, Thanks! It helps a lot! And from trial and error, I found that only edgeHub requires rollback.

@darobs , Do you know if there will be a quick release to recover edgeHub preview22? Or just leave it as it is, those who encounter the same issue have to resolve on their own?

@michael-chi
Copy link

I can confirm that new version of edge runtime works fine with RPi now.

@Hammatt
Copy link
Author

Hammatt commented Apr 2, 2018

Hey @darobs, I can run preview22 with windows containers on windows 10 pro, the problem occurs for me on Windows 10 IoT Core only. When you say that you were able to run it, was that windows 10 pro or windows 10 iot core?

@darobs
Copy link
Contributor

darobs commented Apr 2, 2018

Windows 10 pro, not IoT Core. I've reached out to our Windows experts for more help.

@Hammatt
Copy link
Author

Hammatt commented Apr 2, 2018

I take what I said back, on windows 10 pro I'm getting endless timeouts:
CONNECT failed: RefusedNotAuthorized, caused by: Microsoft.Azure.Devices.Client

After running docker system prune -a preview022 worked for me on windows 10 pro.

@v-tbert
Copy link

v-tbert commented Apr 2, 2018

@Hammatt do you by chance use SetMethodHandlerAsync and/or SetMethodDefaultHandlerAsync?

@Hammatt
Copy link
Author

Hammatt commented Apr 2, 2018

@darobs
Copy link
Contributor

darobs commented Apr 3, 2018

Hello @Hammatt

According to our Windows experts, this looks like a permissions issue on the certificate file. They suggested possibly a missing read or read/execute permission. Would you please check this?

@Hammatt
Copy link
Author

Hammatt commented Apr 3, 2018

Hey @darobs , I should have permission as I'm running the commands from an admin account.

([Security.Principal.WindowsPrincipal] `
 [Security.Principal.WindowsIdentity]::GetCurrent()
).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)

returns true.

If you're able to tell me the location on disk where the certificate could be, I could double check.

@Hammatt
Copy link
Author

Hammatt commented Apr 20, 2018

Just updating, still seeing this issue here. Edge Hub Preview022 won't start up on any of our iot core devices, but downgrading to Preview 021 without changing anything else does start up.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
7 participants