
Allow passing of profile in Spark options instead of a profile-file (#102) #103

Open · wants to merge 1 commit into main

Conversation


@jacob-heldenbrand-cl jacob-heldenbrand-cl commented Jan 6, 2022

This PR adds the ability to pass the secrets stored in the profile file directly to Spark as options.

While I've added an integration test, I was unable to run it directly since I do not have access to the test environment. However, using a debugger, I confirmed that the options reach the parameters of DeltaSharingDataSource's createRelation method, and that the method constructs the RemoteDeltaLog object correctly.
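To illustrate the intent, the sketch below shows the same credential fields a Delta Sharing profile file carries, expressed as an in-memory options map instead of a file on disk. The `profile.`-prefixed option names are hypothetical placeholders, not necessarily the names this PR uses:

```python
import json

# A Delta Sharing profile file stores connection secrets as JSON fields.
profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",
    "bearerToken": "<token>",
}

# Instead of writing this JSON to a file and passing its path, the idea is
# to supply the same fields as Spark read options (hypothetical key names):
options = {f"profile.{key}": str(value) for key, value in profile.items()}
print(json.dumps(options, indent=2))
```

The secrets never touch disk; they flow from wherever the caller obtained them straight into the reader's options.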

Closes #102

Signed-off-by: Jacob Heldenbrand jacob.heldenbrand@closedloop.ai


zsxwing commented Jan 7, 2022

Thanks for the contribution. We discussed how to pass secrets in the past and decided on the profile-file approach. Setting secrets directly in code is discouraged, so we don't want to support this.


jacob-heldenbrand-cl commented Jan 7, 2022

The reason we want to pass credentials as read options is not so we can hardcode secrets in the code, but so we can inject them into Spark dynamically. For our use case, we are not allowed to save secrets to an arbitrary S3 file; instead we must store them in an audited secret management system (in our case, HashiCorp Vault). We follow this pattern with other technologies as well, such as JDBC.
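The JDBC pattern referred to above looks roughly like this: credentials are fetched from the secret store at runtime and passed as reader options, never written to source or to disk. `get_secret` here is a hypothetical stand-in for a Vault lookup:

```python
def get_secret(name: str) -> str:
    """Hypothetical stand-in for an audited secret-store lookup (e.g. Vault)."""
    vault = {"db/user": "svc_reader", "db/password": "s3cr3t"}
    return vault[name]

# Credentials are injected into the reader options at runtime:
jdbc_options = {
    "url": "jdbc:postgresql://db.example.com/analytics",
    "user": get_secret("db/user"),
    "password": get_secret("db/password"),
}

# With Spark, this map would then feed a standard JDBC read, e.g.:
#   spark.read.format("jdbc").options(**jdbc_options).load()
```

The ask is for the Delta Sharing connector to accept its credentials the same way.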

Is there an alternative approach we could use/implement to inject these secrets into Spark dynamically?


zsxwing commented Jan 12, 2022

This is a fair point. Let me think about this, and also about how to support SQL.

> Is there an alternative approach we could use/implement to inject these secrets into Spark dynamically?

As a workaround, you can manually read them from your audited secret management system, store them in a temp file, and use the temp file path for access.

@ssimeonov

I'll second the request for dynamic configuration.

The original design decision to use files seems to have been made on a faulty assumption. Putting secrets in source code has been discouraged since the earliest days of source code. It's not this project's responsibility to save developers from themselves, especially not at the cost of increased configuration and deployment complexity.

@stevenayers-bge

@linzhou-db @zsxwing could we get this merged in, please? I'm happy to help resolve the conflicts.
