Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(initContainer): Tune start-up process #10454

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

kiblik
Copy link
Contributor

@kiblik kiblik commented Jun 25, 2024

Previous implementation used initialDelaySeconds set to 120s which might be fine for standard (default resource allocation), first-time starting (with empty database) instances. Mentioned fact is problematic for:

  • instances that are "under-resourced" and start-ups might be longer and 120s might not be enough
  • instances that are "over-resourced" and start-ups might be much shorter and 120s might be wasting of time (to wait until the instance is marked as Ready)
  • instances that are already deployed and just need upgrade (with no or only just a couple of migrations). In this case, it is wasting of time again.
  • instances with multiple replicas when each replica needs to wait 2 minutes without the reason. Waisting of time again.

This solution does not solve #10197 but at least do not silently allow full rollout until the database is fully migrated.

This solution might be solved as well #10141

@github-actions github-actions bot added the helm label Jun 25, 2024
Copy link

dryrunsecurity bot commented Jun 25, 2024

Hi there 👋, @DryRunSecurity here, below is a summary of our analysis and findings.

DryRun Security Status Findings
Configured Codepaths Analyzer 0 findings
IDOR Analyzer 0 findings
Sensitive Files Analyzer 0 findings
Server-Side Request Forgery Analyzer 0 findings
SQL Injection Analyzer 0 findings
Authn/Authz Analyzer 0 findings
Secrets Analyzer 0 findings

Note

🟢 Risk threshold not exceeded.

Change Summary (click to expand)

The following is a summary of changes in this pull request made by me, your security buddy 🤖. Note that this summary is auto-generated and not meant to be a definitive list of security issues but rather a helpful summary from a security perspective.

Summary:

The code changes in this pull request are focused on improving the security and reliability of the DefectDojo application's deployment using Helm. The key changes include the addition of a database migration checker, the use of Kubernetes Secrets to store sensitive information, and the implementation of security-related configurations such as session and CSRF cookie security, security contexts, and resource limits.

The database migration checker is a particularly noteworthy addition, as it helps ensure that the application's database is in a valid state before the main containers start. This can prevent issues related to database schema changes or migrations, which is an important security consideration.

Additionally, the use of Kubernetes Secrets to store sensitive information, such as database passwords and the application's secret key, follows security best practices and helps prevent the accidental exposure of sensitive data. The code also sets appropriate security-related configurations, such as session and CSRF cookie security, which can help mitigate common web application vulnerabilities.

Overall, the changes in this pull request appear to be focused on improving the security and reliability of the DefectDojo application's deployment, which is a positive step from an application security perspective.

Files Changed:

  1. helm/defectdojo/templates/_helpers.tpl:

    • Added a new template called "dbMigrationChecker" to define a container that periodically checks if the database has been migrated to the latest state.
    • The container retrieves the database password from a secret and sets the DD_DEBUG environment variable based on the django.uwsgi.enable_debug configuration.
    • The container can be configured with a security context and resource limits.
  2. helm/defectdojo/templates/celery-beat-deployment.yaml:

    • Included an optional dbMigrationChecker initContainer to check the database migration status before the main Celery Beat container starts.
    • Securely retrieved sensitive information, such as the Celery broker password, database password, and secret key, from Kubernetes secrets.
    • Specified a security context for the Celery Beat container, including running the container as a non-root user.
  3. helm/defectdojo/templates/celery-worker-deployment.yaml:

    • Introduced a database migration checker initContainer to ensure the database is in a valid state before the main containers start.
    • Included a cloudsql-proxy container to securely connect to a Cloud SQL database.
    • Updated the environment variables to use Kubernetes secrets for sensitive values, such as the Celery broker and database passwords.
    • Added annotations to the pod template to trigger pod restarts when the application's configuration changes.
  4. helm/defectdojo/templates/django-deployment.yaml:

    • Added an initContainer to perform database migration checks before the main containers start.
    • Used Kubernetes Secrets to store sensitive environment variables, such as the database password, Celery broker password, and the application's secret key.
    • Set the DD_SESSION_COOKIE_SECURE and DD_CSRF_COOKIE_SECURE environment variables to "True" if TLS is enabled.
    • Included liveness and readiness probes for the Django and Nginx containers.
    • Set security context configurations for the Django and Nginx containers, including running the containers as a non-root user.
  5. helm/defectdojo/values.yaml:

    • Reduced the initial delay for the liveness probe from 120 seconds to 3 seconds.
    • Enabled the database migration checker feature.
    • Allowed for the mounting of additional volumes, such as configuration files or certificates.
    • Provided sections for adding extra environment variables and secrets.

Powered by DryRun Security

@kiblik kiblik changed the title feat(startupProbe): Tune start-up process Jun 26, 2024
@kiblik kiblik force-pushed the startupProbe branch 3 times, most recently from f55d424 to 925dc4f Compare June 26, 2024 12:57
@kiblik kiblik marked this pull request as ready for review June 26, 2024 13:26
@fcecagno
Copy link
Contributor

I believe you forgot to add the startupProbe to the deployment. I was expecting to see the startupProbe just after https://github.com/DefectDojo/django-DefectDojo/blob/dev/helm/defectdojo/templates/django-deployment.yaml#L210-L223, along with default values for it.
As far as I understand, this PR tries to solve two different issues that are unrelated: probes and migration. I'd suggest to split it into two different PRs, this one looks good for migration, I'd just revert https://github.com/DefectDojo/django-DefectDojo/pull/10454/files#diff-62fee928ffe25d7f0d884b137db6f0ad7ffd1fb15d4fb217b922e3ef201f2e37L223-R223 and leave the whole probes modification to a new PR.

Copy link
Contributor

@mtesauro mtesauro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

@kiblik
Copy link
Contributor Author

kiblik commented Jul 2, 2024

I believe you forgot to add the startupProbe to the deployment. I was expecting to see the startupProbe just after https://github.com/DefectDojo/django-DefectDojo/blob/dev/helm/defectdojo/templates/django-deployment.yaml#L210-L223, along with default values for it. As far as I understand, this PR tries to solve two different issues that are unrelated: probes and migration. I'd suggest to split it into two different PRs, this one looks good for migration, I'd just revert https://github.com/DefectDojo/django-DefectDojo/pull/10454/files#diff-62fee928ffe25d7f0d884b137db6f0ad7ffd1fb15d4fb217b922e3ef201f2e37L223-R223 and leave the whole probes modification to a new PR.

@fcecagno, thank you for your feedback.
TBH, originally, I planned to use startupProbe. However, during testing, I found out it is not the best approach in my opinion.
Based on my experience, until now, probes have been failing because the application has not been ready (mandatory migrations have not been finished).
The previous implementation was trying to "solve" it with 2 minutes long initial period. However, this is not the best (check the reasoning in the description of this PR)
Using startupProbe would be usable however it would be necessary to add it to both containers. I wanted to avoid starting redundant probes and using initContainer has the same outcome = liveness and readiness probes are not started until initContainers are not Completed.
But please tell me if you see it otherwise. Or if you have a different experience.

@fcecagno
Copy link
Contributor

fcecagno commented Jul 2, 2024

Hi @kiblik , thanks for your reply. IMO this PR improves the current state.

@mtesauro
Copy link
Contributor

@kiblik Got a new/different opinion of this one? Wanted to double check since we're getting close to 4 approvals.

@kiblik
Copy link
Contributor Author

kiblik commented Jul 12, 2024

@kiblik Got a new/different opinion of this one? Wanted to double check since we're getting close to 4 approvals.

I still see it as useful. I saw also @fcecagno's proposal #10506 and I believe that both of them are beneficial, they can coexist and help to improve smoother deployments.

Comment on lines +352 to +354
dbMigrationChecker:
enabled: true

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put this either at the top of values.yaml (somewhere near podLabels) or at the bottom (somewhere near extraConfigs). It's likely to be missed where it is right now sandwiched between the blocks configuring individual services.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6 participants