
Proposal: Pluggable Image Vulnerability Scanning

Authors: Daniel Pacak @danielpacak, Zach Hill @zhill

Discussions:

  1. The first issue opened by @lizrice
  2. Initial working draft of pluggable scanning proposal and working group by @zhill
  3. Scanner Adapters architecture proposal by @danielpacak
  4. Pluggable Scanners PRD by Alex Xu


Abstract

Add support to Harbor for using image scanners other than Clair by replacing the current Clair-specific scanning job implementation with an adapter layer implemented as an HTTP API between Harbor and the scanners' native interfaces. This provides runtime-configurable scanner invocation, initially for vulnerability scanning, with the option to support other types of scanning in the future.

Introduce:

  1. Scanner Adapter HTTP API (defined and maintained by Harbor)
    • Core operations:
      • Execute a scan (non-blocking)
      • Retrieve a scan report (polling by Harbor)
      • Describe the scanner’s capabilities, i.e. supported artifacts and reports
  2. Scanner Adapter HTTP client in Harbor
  3. Scanner Adapter configuration management and persistence in the Harbor DB

The adapter interface is a well-defined REST API specified and maintained by Harbor. Harbor will have a client for the API and will manage the client's configuration. The configuration is primarily an endpoint registration, and multiple configurations will be supported concurrently in the system. This allows user-selectable and configurable scanning of images at runtime without restarting Harbor. Scanner adapters must implement the specified API, but the deployment and configuration of the adapter services themselves are out of scope for this proposal; Harbor is not responsible for managing or deploying the adapter services.

Background

Current Understanding of Long-Term Harbor Interrogation-Service Objectives

Based on discussions with Harbor users and maintainers, there is a long-term desire to introduce flexible and configurable artifact scanning integration into Harbor, with the aim of allowing Harbor admins and users to configure, at runtime and on a per-project basis:

  1. The set of scans that should be executed on artifacts in the project
    • Examples: vulnerability scan, software license scan, malware scan
    • Artifacts to be scanned examples: images, helm charts, CPAN bundles
  2. The way that the results of the scans should be interpreted and combined to produce a final binary acceptance result for the artifact that can subsequently be used to optionally control user access to the artifact
  3. Persistence of the scan results in support of audit history
  4. Visualizations of some or all of the scan results for users directly in Harbor

Alignment and Scope of this Proposal to Long-Term Objectives

Given that broader long-term objective, this proposal addresses the initial steps required to provide pluggable image scanning for vulnerabilities, while providing a framework to build upon towards the longer-term artifact scanning objective. Thus, this proposal can be considered phase 1 of a multi-phase effort to achieve the above objectives.

Scope of this proposal:

  1. Introduce a stable adapter API that can be extended for scanning multiple types of artifacts and returning multiple report types.
  2. Introduce a minimally invasive framework for configuring and executing scans against any scanner that implements the adapter API
  3. Scan Docker images for vulnerabilities
  4. Support configuration of multiple scanners with a single scanner at a time in use across all projects in Harbor (the "default" scanner)
  5. Avoid design and API decisions that would require breaking changes to add capabilities defined in the long-term objectives above

Image Scanning with Clair


The components responsible for submitting scan requests to Clair and fetching scan results from it are part of Harbor's deployment.

[Figure: harbor-clair-deployment]

The diagram below shows the current workflow for scanning images with the detailed explanation beneath.

[Figure: harbor-clair-sequence]

  1. A User requests a scan of a selected image by clicking the Scan button.
  2. The system schedules a ClairJob for execution.
  3. The ClairJob pulls the manifest of the image from Registry.
  4. The ClairJob parses the manifest. For each image layer it creates an instance of the ClairLayer structure, which is Clair's internal representation of an image layer. Each ClairLayer has Name, Path, and Authorization header properties that allow Clair to pull the layer through the Docker Registry v2 API exposed by Harbor.
  5. The ClairJob pushes a slice of ClairLayer items to Clair API for the actual scanning. Note that this is a blocking operation.
  6. The ClairJob pulls the scan result from Clair API.
  7. The ClairJob transforms the scan result into the components overview model represented by the ComponentsOverview structure. The ComponentsOverview model is understood by the Harbor web console and the simple policy checker.
  8. The ClairJob saves the components overview to the Harbor DB. (It is used later on to enforce simple policy rules, e.g. preventing users from pulling an image that contains severe vulnerabilities.) The name of the parent ClairLayer, a.k.a. DetailsKey, is stored in the Harbor DB and used later on to fetch scan results.
  9. The User clicks the Refresh button or the UI timer triggers the scan results refresh.
  10. The Repository API Handler calls the VulnerabilitiesDetails method of Repository API.
  11. The Repository API downloads Clair scan result for the corresponding DetailsKey.
  12. The ClairScanResult is transformed into Harbor's model, effectively a slice of VulnerabilityItems, so it can be rendered in the UI as a grid of security vulnerabilities.

Proposal

To achieve the goal of runtime configurable and pluggable container image vulnerability scanning, the current Clair-specific logic will be abstracted out and replaced with a generic scanner configuration, selection, and invocation framework based on a simple HTTP API that Harbor will call on adapters that wrap the scanner implementations.

The proposed work focuses on building the adapter framework and leaves scan result interpretation, combination, and selection issues to future work by leveraging the existing lifecycle and invocation triggers for the scans themselves and only replacing the mechanisms for executing scans and persisting results. Current scan invocation triggers will remain:

  1. Users can set a scan schedule on some set of images in a project
  2. Users can explicitly request a scan

New Components:

  1. Scanner Adapter API - HTTP API defining the interface between Harbor and artifact scanners.
    1. Defined and maintained by Harbor
    2. Versioned
    3. REST-based
    4. Specified by OpenAPI/Swagger spec
    5. Authentication specifics are out-of-scope, but should be supported using the HTTP Authorization header
  2. Scanner Adapter - HTTP Service that implements the Scanner Adapter API and manages translation of the Scanner Adapter API to/from native APIs/CLIs that scanners implement
    1. Deployed outside the system boundary of Harbor, not considered an internal component
    2. Implementations are out-of-tree of Harbor
    3. Has independent state management, configuration, and deployment lifecycle from Harbor
  3. Scanner Registry - An internal component logically responsible for managing the configurations for invoking a scanner adapter
    1. Backed by persistence in the DB
    2. Basic CRUD semantics via new Harbor APIs
  4. Scanner Registration - A named configuration for invoking a scanner via its adapter
    1. Name - The name of the entry, must be globally unique in Harbor
    2. Description - A description of the scanner for human consumption
    3. Endpoint - The base URL (scheme, hostname, and port) to invoke the Adapter API calls against
    4. Authorization Configuration - The optional value to set in the HTTP Authorization header
    5. NOTE: there is no plan to prevent duplicate entries based on endpoint, etc., as such constraints can be left to the system administrator's discretion based on deployment-specific requirements.
  5. Scanner Adapter Client - An HTTP client that takes a registration as configuration input and invokes Scanner Adapter API calls to initiate scans, retrieve results, query health, and refresh metadata from Scanner Adapter(s).
  6. Scan Job - The unit-of-work for executing and retrieving results for a scan of an artifact in the Harbor registry.
    1. This may be extended to include an aggregation meta-job that defines and executes multiple Scan Jobs themselves and combines their results in a specific way
    2. Identified by a unique identifier, each job is unique
  7. Scan Report - the output of a Scan Job, the type and content of which depend on the adapter that was called by the Scan Job (e.g. image vulnerability scan vs license compliance scan will have different results with different schemas and MIME types)
  8. Scan Store - An internal component responsible for persisting Scan Reports.
    1. Only the latest Scan Report will be persisted in Harbor DB.
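The "only the latest report" rule for the Scan Store can be illustrated with a minimal in-memory sketch. It assumes reports are keyed by artifact digest and report MIME type (the same values the Event Handler saves); MemoryScanStore and ScanReport are hypothetical names, and a map stands in for the Harbor DB table.

```go
package main

import (
	"sync"
	"time"
)

// ScanReport is a hypothetical persisted report: the report body plus the
// metadata saved alongside it.
type ScanReport struct {
	Digest    string    // digest of the scanned artifact
	MimeType  string    // MIME type of the report
	Report    []byte    // report body as returned by the Scanner Adapter
	CreatedAt time.Time // when the report was stored
}

type reportKey struct {
	digest   string
	mimeType string
}

// MemoryScanStore stands in for the DB-backed Scan Store. Saving a report for
// a (digest, MIME type) pair replaces the previous one, so only the latest
// report is retained.
type MemoryScanStore struct {
	mu      sync.RWMutex
	reports map[reportKey]ScanReport
}

func NewMemoryScanStore() *MemoryScanStore {
	return &MemoryScanStore{reports: make(map[reportKey]ScanReport)}
}

// Save stores the report, overwriting any earlier report with the same key.
func (s *MemoryScanStore) Save(r ScanReport) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.reports[reportKey{r.Digest, r.MimeType}] = r
}

// Latest returns the most recently saved report for the digest and MIME type.
func (s *MemoryScanStore) Latest(digest, mimeType string) (ScanReport, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	r, ok := s.reports[reportKey{digest, mimeType}]
	return r, ok
}
```

Keying by MIME type as well as digest lets a Harbor-format report and a raw report for the same image coexist, each holding only its latest version.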

Updated Components:

  1. Scan Controller - The main coordinator of the whole scanning procedure.

The main focus of this proposal and the subsequent sections is the Scanner Adapter API design as it must be stable for Scanner Adapter implementers. Harbor-internal design is secondary as this can evolve independently of the Scanner Adapter API as long as the contract between the two is set.

Image Scanning with Scanner Adapter

The indirection introduced by the Scanner Adapter framework allows implementation of a generic Scan Job which in turn uses a generic Scanner Adapter Client. Both are agnostic to the underlying scanner internals or upgrades.

Such architecture is also extensible to triggering multiple Scanner Adapters for a single scan request. This would require aggregating Scan Reports coming from each Scanner Adapter but is certainly doable without changing the contract between Harbor service and Scanner Adapters. For example, there might be a ScanReportAggregate struct which holds a collection of ScanReports. Such an aggregate can be visualised in Harbor Web console as a tabbed pane, with each tab corresponding to a scanner involved.
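The ScanReportAggregate idea mentioned above could look roughly like this. The field names and the Tabs helper are hypothetical; the sketch only shows that aggregation happens on the Harbor side, with one entry per Scanner Adapter and no change to the adapter contract.

```go
package main

// ScanReport is a hypothetical report produced by one Scanner Adapter.
type ScanReport struct {
	Scanner  string // name of the scanner registration that produced the report
	MimeType string // report MIME type
	Report   []byte // report body
}

// ScanReportAggregate holds the reports from every Scanner Adapter triggered
// for a single scan request, in the order the scanners were invoked.
type ScanReportAggregate struct {
	ArtifactDigest string
	Reports        []ScanReport
}

// Add appends the report returned by one Scanner Adapter.
func (a *ScanReportAggregate) Add(r ScanReport) {
	a.Reports = append(a.Reports, r)
}

// Tabs returns one tab title per scanner, mirroring the tabbed-pane idea.
func (a *ScanReportAggregate) Tabs() []string {
	tabs := make([]string, 0, len(a.Reports))
	for _, r := range a.Reports {
		tabs = append(tabs, r.Scanner)
	}
	return tabs
}
```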

[Figure: harbor-scanner-adapter-deployment]

[Figure: harbor-scanner-adapter-sequence]

  1. A User requests a scan of selected Artifact by clicking the Scan button.
  2. The Harbor API forwards the request to the Scan Controller.
  3. The Scan Controller retrieves the default Scanner Registration from the Scanner Registry.
  4. The Scan Controller enqueues a Scan Job with params.
  5. The Scan Job instantiates the generic Scanner Adapter Client, passing the Registration Settings as constructor arguments.
  6. The Scan Job creates a ScanRequest.
  7. The Scan Job sends the ScanRequest to a Scanner Adapter, which in turn, forwards the request to the underlying Scanner.
  8. At this stage the Scanner pulls the Artifact to perform the actual scan for vulnerabilities and generate the report.
  9. The Scan Job blocks and periodically polls the Scanner Adapter for the ScanReport until the report is available. Note that the identifier of the ScanRequest, generated by the Scanner Adapter, and the desired report MIME type are passed to retrieve a ScanReport.
  10. The Scan Job emits a ScanCompleted event.
  11. The ScanCompleted event is dispatched to the Event Handler.
  12. The Event Handler saves the ScanReport along with its MIME type and the Artifact's digest to the Harbor DB.
  13. The User requests refresh of the vulnerabilities report for an Artifact.
  14. The Harbor API retrieves the ScanReport for the given Artifact from the Harbor DB through the Scan Store.
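The submit-and-poll part of the sequence (steps 6 to 10) can be sketched in Go as below. This is a minimal illustration, not Harbor's implementation: the Adapter interface, ErrNotReady, ErrScanNotFound, runScanJob, and the in-memory fakeAdapter are hypothetical names, and maxAttempts is an assumed safeguard so a job does not poll forever.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrNotReady signals that the report is still being generated.
var ErrNotReady = errors.New("scan report not ready")

// ErrScanNotFound signals that the adapter no longer knows the request ID.
var ErrScanNotFound = errors.New("scan request not found")

// Adapter is a hypothetical view of the Scanner Adapter Client used by a Scan Job.
type Adapter interface {
	SubmitScan(repository, digest string) (id string, err error)
	GetReport(id, mimeType string) (report []byte, retryAfter time.Duration, err error)
}

// runScanJob submits a scan, then polls until the report is available, the
// adapter returns a failure, or maxAttempts polls have been made.
func runScanJob(a Adapter, repository, digest, mimeType string, maxAttempts int) ([]byte, error) {
	id, err := a.SubmitScan(repository, digest)
	if err != nil {
		return nil, fmt.Errorf("submitting scan: %w", err)
	}
	for attempt := 0; attempt < maxAttempts; attempt++ {
		report, retryAfter, err := a.GetReport(id, mimeType)
		switch {
		case err == nil:
			return report, nil // the caller would now emit ScanCompleted
		case errors.Is(err, ErrNotReady):
			time.Sleep(retryAfter) // wait as instructed by the adapter
		default:
			// Includes ErrScanNotFound: the job fails and Harbor resubmits later.
			return nil, fmt.Errorf("getting report for %s: %w", id, err)
		}
	}
	return nil, fmt.Errorf("report for %s not ready after %d attempts", id, maxAttempts)
}

// fakeAdapter is an illustrative adapter that reports "not ready" for the
// first `pending` polls and then returns a report.
type fakeAdapter struct{ pending int }

func (f *fakeAdapter) SubmitScan(repository, digest string) (string, error) {
	return "3fa85f64-5717-4562-b3fc-2c963f66afa6", nil
}

func (f *fakeAdapter) GetReport(id, mimeType string) ([]byte, time.Duration, error) {
	if f.pending > 0 {
		f.pending--
		return nil, time.Millisecond, ErrNotReady
	}
	return []byte(`{"severity":"High"}`), 0, nil
}
```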

Artifact Data Access

Scanners need access to the artifact data, images in this case. Scanners will retrieve data from Harbor for analysis using existing Harbor APIs and optionally credentials. The specific APIs used will vary by the artifact being analyzed, but for the initial image analysis, the image data is retrieved by the scanner using the Docker Registry v2 API exposed by Harbor. Necessary credentials are presented to the scanner by the scanner adapter as provided in the API call by Harbor during initiation of a scan.

Policy Check Interceptor

Harbor can block image distribution based on the severity of vulnerabilities found during a scan. Since Clair is shipped with Harbor, they are both deployed in the same network and communicate through a private IP. Docker clients are supposed to access the registry through an external IP configured by an ingress or load balancer.

[Figure: Harbor Clair Networking]

$ kubectl describe ingress harbor-harbor-ingress
Name:             harbor-harbor-ingress
Namespace:        harbor
Address:          10.0.2.15
TLS:
  harbor-harbor-ingress terminates core.harbor.domain
  harbor-harbor-ingress terminates notary.harbor.domain
Rules:
  Host                  Path  Backends
  ----                  ----  --------
  core.harbor.domain
                        /             harbor-harbor-portal:80 (172.17.0.8:80)
                        /api/         harbor-harbor-core:80 (172.17.0.18:8080)
                        /service/     harbor-harbor-core:80 (172.17.0.18:8080)
                        /v2/          harbor-harbor-core:80 (172.17.0.18:8080)
                        /chartrepo/   harbor-harbor-core:80 (172.17.0.18:8080)
                        /c/           harbor-harbor-core:80 (172.17.0.18:8080)
  notary.harbor.domain
                        /   harbor-harbor-notary-server:4443 (172.17.0.12:4443)

In the example, http://harbor-harbor-registry:5000 is the internal registry service endpoint accessible to Clair but not to Docker clients. https://core.harbor.domain refers to the external IP exposed to Docker clients.

In this proposal we stated that Scanner Adapters should be deployed outside the system boundary of Harbor. This introduces a problem: scanners have to access the registry via the external endpoint, where the Policy Check Interceptor might block pulling an image.

OAuth 2 Bearer Tokens

Harbor provides a JWT Bearer token to Clair with each scan request. The token is generated using the OAuth Client Credentials flow (with client_id and client_secret) and then passed directly to Clair in an HTTP POST request to scan a Clair Layer.

Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiIsImtpZCI6IkJWM0Q6MkFWWjpVQjVaOktJQVA6SU5QTDo1RU42Ok40SjQ6Nk1XTzpEUktFOkJWUUs6M0ZKTDpQT1RMIn0.eyJpc3MiOiJhdXRoLmRvY2tlci5jb20iLCJzdWIiOiJCQ0NZOk9VNlo6UUVKNTpXTjJDOjJBVkM6WTdZRDpBM0xZOjQ1VVc6NE9HRDpLQUxMOkNOSjU6NUlVTCIsImF1ZCI6InJlZ2lzdHJ5LmRvY2tlci5jb20iLCJleHAiOjE0MTUzODczMTUsIm5iZiI6MTQxNTM4NzAxNSwiaWF0IjoxNDE1Mzg3MDE1LCJqdGkiOiJ0WUpDTzFjNmNueXk3a0FuMGM3cktQZ2JWMUgxYkZ3cyIsInNjb3BlIjoiamxoYXduOnJlcG9zaXRvcnk6c2FtYWxiYS9teS1hcHA6cHVzaCxwdWxsIGpsaGF3bjpuYW1lc3BhY2U6c2FtYWxiYTpwdWxsIn0.Y3zZSwaZPqy4y9oRBVRImZyv3m_S9XDHF1tWwN7mL52C_IiA73SJkWVNsvNqpJIn5h7A2F8biv_S2ppQ1lgkbw

Clair, in turn, uses the token to pull image layers from the Harbor registry. This works because Clair uses a standard HTTP library and sets the Authorization header programmatically.

In order to enable Scanner Adapters to bypass the Policy Check Interceptor, Harbor's authentication service will generate a dedicated JWT access token and hand it over to the underlying Scanner through the Scanner Adapter in a ScanRequest.
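The wire form of such a ScanRequest, matching the JSON shown in the sample interaction later in this document, can be produced with Go struct tags as sketched here. The tag names are inferred from that sample, and newScanRequest is a hypothetical helper that wraps the Harbor-issued token as a Bearer credential.

```go
package main

import "encoding/json"

// Registry and Artifact mirror the nested objects of the sample scan request.
type Registry struct {
	URL           string `json:"url"`
	Authorization string `json:"authorization"`
}

type Artifact struct {
	Repository string `json:"repository"`
	Digest     string `json:"digest"`
}

type ScanRequest struct {
	Registry Registry `json:"registry"`
	Artifact Artifact `json:"artifact"`
}

// newScanRequest serializes a scan request carrying the dedicated access token,
// so the adapter can forward it unchanged to the registry.
func newScanRequest(registryURL, token, repository, digest string) ([]byte, error) {
	return json.Marshal(ScanRequest{
		Registry: Registry{URL: registryURL, Authorization: "Bearer " + token},
		Artifact: Artifact{Repository: repository, Digest: digest},
	})
}
```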

Robot Accounts

The existing OCI/Docker tooling does not support passing OAuth Bearer tokens in directly. It relies on a username and password to initiate an OAuth password-grant flow to get a bearer token, or on an existing refresh token to get a new access token. This means that work will be required in the adapters (and likely in the scanners themselves, depending on their implementation and libraries) to consume a raw token such as the one Harbor currently issues to Clair. Clair uses a very raw HTTP client rather than a Docker/image-aware client, so it can already handle the tokens.

One option discussed that made sense to us is to use the robot account mechanism to generate credentials that work with these common OCI/Docker tooling libraries and provide credentialed access to the image data. The lifecycle of the robot account credentials can be bound to the Scan Job, so that a set of credentials is deleted when the Scan Job ends (with success or failure), ensuring the credentials are not long-lived. Additionally, a modification is needed to ensure that the generated credentials can bypass the policy checks configured on the image that normal users are subject to, if such checks are configured.

Scanner Adapter API

Scanner vendors are supposed to implement Scanner Adapters exposing the Scanner Adapter API as specified in Scanner Adapter v1.0 - OpenAPI Specification.

Note: OpenAPI spec yaml file can be opened in the online Swagger Editor.

  • The deployment method is up to the vendor as long as the mounted API endpoint URL is accessible to Harbor services.
  • For each ScanRequest a Scanner Adapter generates a unique identifier which is used to poll for the corresponding ScanReport.
  • The lifetime of the ScanRequest identifier returned by a Scanner Adapter is defined by the adapter. The TTL of the ScanReport and its identifier should be long enough to support polling with reasonable timeouts.
  • Scanner Adapters are not expected to persist scan reports forever; Harbor is supposed to cache at least the latest Scan Report.
  • Scanner Adapters are not expected to make responses for the given artifact immutable, i.e. Scan Reports might change over time when new vulnerabilities are discovered.
  • A Scan Job may get a 404 response status for a ScanRequest identifier and should treat it as failed and return a failure in the job. Harbor is expected to send a new Scan Request in that case.
  • Scanner Adapter API leverages content negotiation by using MIME types in the Accept header to define schema of a result returned by GET /scan/{scan_request_id}/report requests.

Sample Interaction between Harbor and Scanner Adapter

  1. Make sure that the Scanner Adapter has expected capabilities:
    curl -H 'Accept: application/vnd.scanner.adapter.metadata+json; version=1.0' \
      http://scanner-adapter:8080/api/v1/metadata
    
    Content-Type: application/vnd.scanner.adapter.metadata+json; version=1.0
    Status: 200 OK
    
    {
      "scanner": {
        "name": "Microscanner",
        "vendor": "Aqua Security",
        "version": "3.0.5"
      },
      "capabilities": [
        {
          "consumes_mime_types": [
            "application/vnd.oci.image.manifest.v1+json",
            "application/vnd.docker.distribution.manifest.v2+json"
          ],
          "produces_mime_types": [
            "application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0",
            "application/vnd.scanner.adapter.vuln.report.raw"
          ]
        }
      ],
      "properties": {
        "harbor.scanner-adapter/scanner-type": "os-package-vulnerability",
        "harbor.scanner-adapter/vulnerability-database-updated-at": "2019-08-13T08:16:33.345Z"
      }
    }
    
  2. Submit the scan request
    1. Submit an invalid scan request:
      curl http://scanner-adapter:8080/api/v1/scan \
      -H 'Content-Type: application/vnd.scanner.adapter.scan.request+json; version=1.0' \
      -d @- << EOF
      {
        "registry": {
          "url": "INVALID_REGISTRY_URL",
          "authorization": "Bearer JWTTOKENGOESHERE"
        },
        "artifact": {
          "repository": "library/mongo",
          "digest": "sha256:917f5b7f4bef1b35ee90f03033f33a81002511c1e0767fd44276d4bd9cd2fa8e"
        }
      }
      EOF
      
      Status: 422 Unprocessable Entity
      Content-Type: application/vnd.scanner.adapter.error+json; version=1.0
      
      {
        "error": {
          "message": "invalid registry_url"
        }
      }
      
    2. Submit a valid scan request:
      curl http://scanner-adapter:8080/api/v1/scan \
      -H 'Content-Type: application/vnd.scanner.adapter.scan.request+json; version=1.0' \
      -d @- << EOF
      {
        "registry": {
          "url": "http://harbor-harbor-registry:5000",
          "authorization": "Bearer JWTTOKENGOESHERE"
        },
        "artifact": {
          "repository": "library/mongo",
          "digest": "sha256:917f5b7f4bef1b35ee90f03033f33a81002511c1e0767fd44276d4bd9cd2fa8e"
        }
      }
      EOF
      
      Status: 202 Accepted
      Content-Type: application/vnd.scanner.adapter.scan.response+json; version=1.0
      
      {
        "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
      }
      
  3. Try getting scan report (in unified Harbor format understandable by Harbor Web console):
    curl -H 'Accept: application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0' \
      http://scanner-adapter:8080/api/v1/scan/3fa85f64-5717-4562-b3fc-2c963f66afa6/report
    
    Retry-After: 15
    Status: 302 Found
    
  4. Wait 15 seconds or use your own retry interval ...
  5. ... and try getting scan report again:
    curl -H 'Accept: application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0' \
      http://scanner-adapter:8080/api/v1/scan/3fa85f64-5717-4562-b3fc-2c963f66afa6/report
    
    Content-Type: application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0
    Status: 200 OK
    
    {
      "generated_at": "2019-08-07T12:17:21.854Z",
      "artifact": {
        "repository": "library/mongo",
        "digest": "sha256:917f5b7f4bef1b35ee90f03033f33a81002511c1e0767fd44276d4bd9cd2fa8e"
      },
      "scanner": {
        "name": "Microscanner",
        "vendor": "Aqua Security",
        "version": "3.0.5"
      },
      "severity": "High",
      "vulnerabilities": [
        {
          "id": "CVE-2017-8283",
          "package": "dpkg",
          "version": "1.17.27",
          "fix_version": "1.18.0",
          "severity": "High",
          "description": "...",
          "links": [
            "https://security-tracker.debian.org/tracker/CVE-2017-8283"
          ]
        },
        ...
      ]
    }
    
  6. Alternatively we could request a proprietary vulnerability report (with an example report generated by MicroScanner in JSON format):
    curl -H 'Accept: application/vnd.scanner.adapter.vuln.report.raw' \
       http://scanner-adapter:8080/api/v1/scan/3fa85f64-5717-4562-b3fc-2c963f66afa6/report
    
    Content-Type: application/vnd.scanner.adapter.vuln.report.raw
    Status: 200 OK
    
    {
      "scan_started": {
        "seconds": 1561386673,
        "nanos": 390482870
      },
      "scan_duration": 2,
      "digest": "b3c8bc6c39af8e8f18f5caf53eec3c6c4af60a1332d1736a0cd03e710388e9c8",
      "os": "debian",
      "version": "8",
      "resources": [
        {
          "resource": {
            "format": "deb",
            "name": "apt",
            "version": "1.0.9.8.5",
            "arch": "amd64",
            "cpe": "pkg:/debian:8:apt:1.0.9.8.5",
            "name_hash": "583f72a833c7dfd63c03edba3776247a"
          },
          "scanned": true,
          "vulnerabilities": [
            {
              "name": "CVE-2011-3374",
              "vendor_score_version": "Aqua",
              "vendor_severity": "negligible",
              "vendor_statement": "Not exploitable in Debian, since no keyring URI is defined",
              "vendor_url": "https://security-tracker.debian.org/tracker/CVE-2011-3374",
              "classification": "..."
            }
          ]
        }
      ]
    }
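Step 1 of the interaction, checking that the adapter has the expected capabilities, can be sketched by decoding the metadata response and searching its capabilities. The JSON field names follow the sample response above; Metadata's anonymous nested structs, supports, and parseMetadata are illustrative names rather than the final Harbor client API.

```go
package main

import "encoding/json"

// Metadata follows the field names of the sample metadata response.
type Metadata struct {
	Scanner struct {
		Name    string `json:"name"`
		Vendor  string `json:"vendor"`
		Version string `json:"version"`
	} `json:"scanner"`
	Capabilities []struct {
		ConsumesMIMETypes []string `json:"consumes_mime_types"`
		ProducesMIMETypes []string `json:"produces_mime_types"`
	} `json:"capabilities"`
	Properties map[string]string `json:"properties"`
}

// supports reports whether a single capability both consumes the artifact MIME
// type and produces the report MIME type.
func (m *Metadata) supports(artifactType, reportType string) bool {
	for _, c := range m.Capabilities {
		if contains(c.ConsumesMIMETypes, artifactType) && contains(c.ProducesMIMETypes, reportType) {
			return true
		}
	}
	return false
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// parseMetadata decodes the body of a GET /api/v1/metadata response.
func parseMetadata(body []byte) (*Metadata, error) {
	var m Metadata
	if err := json.Unmarshal(body, &m); err != nil {
		return nil, err
	}
	return &m, nil
}
```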
    
  • The Accept request header indicates to the Scanner Adapter the intended scan report format.
  • If the client does not specify the Accept header it's assumed to be Harbor vulnerability report with the MIME type application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0.
  • In phase 1 each Scanner Adapter should support at least the following artifact MIME types:
    • application/vnd.oci.image.manifest.v1+json
    • application/vnd.docker.distribution.manifest.v2+json
  • In phase 1 each Scanner Adapter should support at least the following scan report MIME types:
    • application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0 - corresponds to HarborVulnerabilityReport
      • fixed schema described in Scanner Adapter API spec
      • can be parsed in type-safe manner and displayed in Harbor Web console.
    • application/vnd.scanner.adapter.vuln.report.raw
      • corresponds to a raw scan report
      • no fixed schema, documented by a scanner's vendor
  • New scan report MIME types might be introduced without breaking the backward compatibility of the API and without introducing new URL paths to the Scanner Adapter API spec.
  • For example, there can be a vendor specific policy report returned by Anchore with the corresponding MIME type application/vnd.anchore.policy.report+json; version=0.3:
    [
      {
        "sha256:57334c50959f26ce1ee025d08f136c2292c128f84e7b229d1b0da5dac89e9866": {
          "docker.io/alpine:latest": [
            {
              "detail": {},
              "last_evaluation": "2019-08-07T06:33:48Z",
              "policyId": "2c53a13c-1765-11e8-82ef-23527761d060",
              "status": "pass"
            }
          ]
        }
      }
    ]
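On the adapter side, the content negotiation rules above can be sketched as follows. An empty Accept header (or */*) selects the Harbor report; otherwise the first listed type the adapter supports wins. The whitespace normalization, the lack of q-value handling, and answering 406 Not Acceptable for unsupported types are simplifying assumptions, not requirements stated in the spec.

```go
package main

import (
	"net/http"
	"strings"
)

// harborReportMIME is the report type assumed when no Accept header is sent.
const harborReportMIME = "application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0"

// normalizeMIME drops spaces so "a; version=1.0" matches "a;version=1.0".
func normalizeMIME(s string) string {
	return strings.ReplaceAll(s, " ", "")
}

// negotiateReportType picks the report MIME type for GET /scan/{scan_request_id}/report.
func negotiateReportType(accept string, supported []string) (string, bool) {
	accept = strings.TrimSpace(accept)
	if accept == "" || accept == "*/*" {
		return harborReportMIME, true
	}
	for _, candidate := range strings.Split(accept, ",") {
		want := normalizeMIME(candidate)
		for _, s := range supported {
			if normalizeMIME(s) == want {
				return s, true
			}
		}
	}
	return "", false
}

// reportHandler wires the negotiation into an adapter's report endpoint.
func reportHandler(supported []string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		mimeType, ok := negotiateReportType(r.Header.Get("Accept"), supported)
		if !ok {
			http.Error(w, "unsupported report MIME type", http.StatusNotAcceptable)
			return
		}
		w.Header().Set("Content-Type", mimeType)
		// A real adapter would look up the report here and might answer
		// 302 (not ready) or 404 (unknown scan request) instead.
		w.WriteHeader(http.StatusOK)
	}
}
```

Adding a vendor-specific report such as the Anchore policy report then only means appending its MIME type to the supported list; the URL path stays the same.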

Scanner Registry

Provides the Harbor API and the Scan Controller with a management interface to register Scanner Adapters and to retrieve the configuration of the default Scanner Adapter used to perform the actual scans.

package scanner

// RegistrationSettings consists of EndpointURL and optional Authorization properties to test connection
// with the corresponding Scanner Adapter.
type RegistrationSettings struct {
    // A base URL of the Scanner Adapter.
    EndpointURL    string
    // An optional value of the HTTP Authorization header sent with each request to the Scanner Adapter API.
    Authorization  string
    // A flag indicating whether the client should skip verification of the Scanner Adapter's TLS certificate.
    SkipCertVerify bool
}

// Registration represents a named configuration for invoking a scanner via its adapter.
type Registration struct {
    // The unique identifier of this registration.
    ID             int64
    // The name of this registration.
    Name           string
    // An optional description of this registration.
    Description    string
    // A flag indicating whether this registration is the default one.
    IsDefault      bool
    // A flag indicating whether this registration is enabled or disabled.
    IsEnabled      bool

    RegistrationSettings
}

// Registry defines methods for managing the configurations for invoking a Scanner Adapter.
type Registry interface {
    // List returns a list of currently configured scanner registrations.
    List() ([]*Registration, error)
    // Create creates a new scanner registration with the given data.
    // Returns the scanner registration identifier.
    Create(registration *Registration) (int64, error)
    // Get returns the details of the specified scanner registration.
    Get(registrationID int64) (*Registration, error)
    // Update updates the specified scanner registration.
    Update(registration *Registration) error
    // Delete deletes the specified scanner registration.
    Delete(registrationID int64) (*Registration, error)
    // SetAsDefault marks the specified scanner registration as default.
    // The implementation is supposed to unset any registration previously set as default.
    SetAsDefault(registrationID int64) error
    // GetDefault returns the default scanner registration or `nil` if there are no registrations configured.
    GetDefault() (*Registration, error)
    // Ping pings Scanner Adapter to test EndpointURL and Authorization settings.
    // The implementation is supposed to call the GetMetadata method on scanner.Client.
    // Returns `nil` if connection succeeded, a non `nil` error otherwise.
    Ping(settings *RegistrationSettings) error
}
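The "at most one default" contract of SetAsDefault and GetDefault can be illustrated with an in-memory implementation of part of the Registry interface. memRegistry is a hypothetical stand-in for the DB-backed implementation, Registration is trimmed to the fields needed here, and ignoring IsDefault on Create (so that only SetAsDefault changes the default) is a simplifying assumption.

```go
package main

import (
	"fmt"
	"sync"
)

type Registration struct {
	ID          int64
	Name        string
	EndpointURL string
	IsDefault   bool
}

// memRegistry sketches Create, SetAsDefault, and GetDefault in memory.
type memRegistry struct {
	mu     sync.Mutex
	nextID int64
	regs   map[int64]*Registration
}

func newMemRegistry() *memRegistry {
	return &memRegistry{regs: make(map[int64]*Registration)}
}

// Create stores a copy of the registration; names must be globally unique.
func (r *memRegistry) Create(reg *Registration) (int64, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, existing := range r.regs {
		if existing.Name == reg.Name {
			return 0, fmt.Errorf("registration %q already exists", reg.Name)
		}
	}
	r.nextID++
	stored := *reg
	stored.ID = r.nextID
	stored.IsDefault = false // the default is only changed via SetAsDefault
	r.regs[stored.ID] = &stored
	return stored.ID, nil
}

// SetAsDefault marks the registration as default and unsets any previous default.
func (r *memRegistry) SetAsDefault(id int64) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.regs[id]; !ok {
		return fmt.Errorf("registration %d not found", id)
	}
	for _, reg := range r.regs {
		reg.IsDefault = reg.ID == id
	}
	return nil
}

// GetDefault returns a copy of the default registration, or nil if none is set.
func (r *memRegistry) GetDefault() (*Registration, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, reg := range r.regs {
		if reg.IsDefault {
			c := *reg
			return &c, nil
		}
	}
	return nil, nil
}
```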

The implementation of the scanner.Registry interface will persist Registration entities in the Harbor DB. There will be a new scanner_registration table; the corresponding scanner_registration_id_seq sequence is created implicitly by PostgreSQL for the serial id column.

create table scanner_registration
(
  id               serial       not null constraint scanner_registration_pkey primary key,
  name             varchar(255) not null constraint scanner_registration_name_key unique,
  description      varchar(255),
  endpoint_url     varchar(255) not null,
  "authorization"  varchar(255),
  skip_cert_verify boolean default false not null,
  enabled_flag     boolean default true not null,
  default_flag     boolean default false not null
);

-- scanner_registration_id_seq is created implicitly by the serial column above.

The Harbor API will be extended with paths pertinent to Scanner Registry management as specified in Harbor API for Scanner Registry management (DELTA) v1.10.

Note: Swagger 2.0 spec yaml file can be opened in the online Swagger Editor.

Configuration

[Figure: scanner-registration-list]

A Harbor admin can configure Scanner Registrations in Harbor web console. There is a dedicated tab named Scanners in the Configuration tabbed pane. By default the tab displays a list of currently configured Scanner Registrations.

  1. Only one Scanner Registration can be marked as default at a time.
  2. The admin can change the default Scanner Registration and the change will take effect only for subsequent scan requests.
  3. The default Scanner Registration is inherited by each Harbor project. This could be extended to configure multiple scanners per project but is not in scope of phase 1.
  4. The Scanners tab supports CRUD operations for Scanner Registrations.
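One way to make a change of the default Scanner Registration affect only subsequent scan requests is for the Scan Controller to copy the settings into the Scan Job parameters at enqueue time. This is an assumed design, not one stated in the proposal; ScanJobParams and enqueueScan are hypothetical, and a buffered channel stands in for the job queue.

```go
package main

// RegistrationSettings mirrors the settings a Scan Job needs (simplified).
type RegistrationSettings struct {
	EndpointURL   string
	Authorization string
}

// ScanJobParams is a hypothetical job payload. It holds a copy of the settings
// rather than a pointer, so later edits to the registration do not affect
// jobs that are already queued.
type ScanJobParams struct {
	Repository string
	Digest     string
	Settings   RegistrationSettings
}

// enqueueScan snapshots the current default settings into the job params.
func enqueueScan(queue chan<- ScanJobParams, current *RegistrationSettings, repository, digest string) {
	queue <- ScanJobParams{Repository: repository, Digest: digest, Settings: *current}
}
```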

[Figure: scanner-registration-form]

When creating or updating a Scanner Registration a Harbor admin has to specify the following properties: Name, (Optional) Description, Endpoint URL, and (Optional) Authorization header. The form allows the admin to test connection params by making an HTTP GET request to the /api/v1/metadata endpoint path of the Scanner Adapter being configured.

Scanner Adapter Client

Given Scanner Registration settings, the client provides the Scan Job with functions for accessing the Scanner Adapter API. It transparently polls the API until the results are ready or an error occurs.

package scanner

import (
    "errors"
    "time"
)

// Metadata represents scanner metadata and capabilities.
type Metadata struct {
    Scanner         Scanner
    Capabilities    []Capability
    // There might be some predefined properties that Harbor uses, e.g. harbor.scanner-adapter/scanner-type
    // or harbor.scanner-adapter/vulnerability-database-updated-at.
    Properties      map[string]string
}

type Scanner struct {
    Name    string
    Vendor  string
    Version string
}

// Capability consists of the set of recognized artifact MIME types and the set of scanner report MIME types.
// For example, a scanner capable of analyzing Docker images and producing a vulnerabilities report recognizable
// by Harbor web console might be represented with the following capability:
// - consumed MIME types:
//   - `application/vnd.oci.image.manifest.v1+json`
//   - `application/vnd.docker.distribution.manifest.v2+json`
// - produced MIME types:
//   - `application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0`
type Capability struct {
    // Artifact MIME types the scanner can analyze.
    ConsumesMIMETypes []string
    // Report MIME types the scanner can produce.
    ProducesMIMETypes []string
}

// Registry represents Registry connection settings.
type Registry struct {
    // A base URL of the Docker Registry v2 API exposed by Harbor.
    URL           string
    // An optional value of the HTTP Authorization header sent with each request to the Docker Registry v2 API.
    // For example, `Bearer JWTTOKENGOESHERE`.
    Authorization string
}

// Artifact represents an artifact stored in Registry.
type Artifact struct {
    // The name of a Harbor repository containing the artifact.
    // For example, `library/oracle/nosql`.
    Repository    string
    // The artifact's digest, consisting of an algorithm and hex portion.
    // For example, `sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b`,
    // represents sha256 based digest.
    Digest        string
    // The MIME type of this artifact to distinguish Docker images from Helm 3 Charts or CNABs.
    MimeType      string
}

// ScanRequest represents a structure that is sent to a Scanner Adapter to initiate artifact scanning.
// It carries all the details required to pull the artifact from a Harbor registry.
type ScanRequest struct {
    // Connection settings for the Docker Registry v2 API exposed by Harbor.
    Registry Registry
    // Artifact to be scanned.
    Artifact Artifact
}

type ScanResponse struct {
    // The unique identifier generated by Scanner Adapter.
    // Used to poll and fetch the corresponding Scan Reports.
    ID  string
}

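// HarborVulnerabilityReport represents a unified image vulnerabilities report that can be rendered in the Harbor web console.
// It corresponds to the `application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0` report MIME type.
// Severity and VulnerabilityItem are assumed to be declared elsewhere in the package.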
type HarborVulnerabilityReport struct {
    GeneratedAt     time.Time
    Artifact        Artifact
    Scanner         Scanner
    Severity        Severity
    Vulnerabilities []*VulnerabilityItem
}

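// RegistrationSettings holds the Scanner Registration settings used to reach a Scanner Adapter.
type RegistrationSettings struct {
    // The Scanner Adapter's endpoint URL.
    URL           string
    // An optional value of the HTTP Authorization header sent with each request to the Scanner Adapter API.
    Authorization string
}

// Client provides functions for accessing the Scanner Adapter API.
type Client struct{}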

// NewClient constructs a client with the given Scanner Registration settings,
// i.e. a combination of Scanner Adapter's endpoint URL and optional Authorization header.
func NewClient(settings *RegistrationSettings) (*Client, error) {
    return nil, errors.New("not implemented")
}

// GetMetadata gets the scanner's metadata.
func (c *Client) GetMetadata() (*Metadata, error) {
    return nil, errors.New("not implemented")
}

// SubmitScan initiates scanning of the given artifact.
// Returns a ScanResponse carrying the scan request identifier if the request was accepted, a non-nil error otherwise.
func (c *Client) SubmitScan(req ScanRequest) (*ScanResponse, error) {
    return nil, errors.New("not implemented")
}

// GetScanReport gets the scan report for the corresponding ScanRequest identifier.
// Note that this is a blocking method: it polls the Scanner Adapter until it can return either a non-nil
// scan report or an error. The caller is expected to type-assert the returned interface{} to the structure
// that corresponds to the requested MIME type.
func (c *Client) GetScanReport(scanRequestID, reportMIMEType string) (interface{}, error) {
    return nil, errors.New("not implemented")
}

Scan Controller

The Scan Controller is the top-level coordinator of the whole scanning procedure. It runs in the Harbor core service and exposes a small set of standard interface methods that the upper Harbor API layer calls.

package scanner

type Options struct {
    RegistryAuthorization string
}

// ScanReportAggregate aggregates scan reports. It might contain additional properties to track the status of the underlying Scan Job(s).
type ScanReportAggregate struct {
    Reports []HarborVulnerabilityReport
}

// Controller for the scanning procedure
type Controller interface {
    // Scan scans the given artifact.
    //
    //  Arguments:
    //    artifact *Artifact : object includes the kind and metadata of the scanning artifact.
    //    options  Options   : options for scanning, such as the Harbor registry access token
    //
    //  Returns:
    //    error  : non-nil error if any errors occurred
    Scan(artifact *Artifact, options Options) error

    // GetScanReport gets the scan report for the given artifact.
    //
    //  Arguments:
    //    artifact *Artifact : the artifact to get the report for
    //
    //  Returns:
    //    *ScanReportAggregate : the aggregated scanning result data
    //    error                : non-nil error if any error occurred
    GetScanReport(artifact *Artifact) (*ScanReportAggregate, error)
}

When the upper API layer dispatches a scan request, the Scan Controller gets the default Scanner Registration from the Scanner Registry and launches a Scan Job to handle the scanning workload.
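The proposal does not say whether the Scan Controller checks the default scanner's capabilities before launching a Scan Job. If it did, the check could look like the following sketch, which reuses the Capability shape with exported fields. The helpers supports and contains are hypothetical, and requiring a single capability to cover both MIME types is an assumption.

```go
package main

// Capability mirrors the proposal's Capability type (artifact MIME types consumed,
// report MIME types produced).
type Capability struct {
	ConsumesMIMETypes []string
	ProducesMIMETypes []string
}

const harborVulnReportMIMEType = "application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0"

// supports reports whether a single capability both consumes the artifact's
// MIME type and produces the requested report MIME type.
func supports(caps []Capability, artifactMIMEType, reportMIMEType string) bool {
	for _, c := range caps {
		if contains(c.ConsumesMIMETypes, artifactMIMEType) && contains(c.ProducesMIMETypes, reportMIMEType) {
			return true
		}
	}
	return false
}

// contains reports whether s is an exact member of list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}
```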

A unique scan identifier is generated to track the status of the launched Scan Job, and it is later used to fetch the results returned by the Scanner Adapter. The results have a unified format suitable for rendering in the Harbor web console: the client explicitly requests that report MIME type by setting the Accept header to application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0.
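A sketch of the report request with the Accept header set. The path /api/v1/scan/{id}/report and the helper name newReportRequest are assumptions for illustration; the text above only fixes the Accept value (and elsewhere the /api/v1/metadata path).

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

const harborVulnReportMIMEType = "application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0"

// newReportRequest builds the GET request for a scan report identified by scanRequestID.
// The URL path is a hypothetical choice; the Accept header selects the Harbor report format.
func newReportRequest(endpointURL, scanRequestID, authorization string) (*http.Request, error) {
	u := fmt.Sprintf("%s/api/v1/scan/%s/report", endpointURL, url.PathEscape(scanRequestID))
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", harborVulnReportMIMEType)
	if authorization != "" {
		req.Header.Set("Authorization", authorization)
	}
	return req, nil
}
```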

Periodic scanning can also be supported via the job service web hook: a special job is submitted as a periodic job, and on each tick a hook event is generated and published to the Hook Listener, which then asks the Scan Controller to launch the scan.

Scan Store

For result storage, Harbor should store the Scan Job UUID, the artifact identifier (repository + digest of the image or object), the MIME type, a timestamp, and the text/JSON blob of the returned report (including its MIME type for unmarshalling). Any additional processing of the result can be done from that record (e.g. indexing with structured rows) and is considered future work. For this proposal, only the most recent result should be maintained; future work can extend this to provide a scan history/audit as needed, independently of the scanner adapter implementations. This keeps the data model simple and allows future report types to be persisted in the same database.

For performance, a Scan Store will probably persist the scanning results in the database using a predefined model (open questions: full, partial, or reference copy? only the latest copy or all previous copies?) so that they can be queried easily. In that case, the Scan Controller reads the results from the Scan Store, and the Scan Job publishes the results and related status changes to the Scan Store for updates and persistence.

Non-Goals

  1. Initial multi-scanner support - scanning each image with multiple scanners in a single scanning job (as perceived by the end user)
    1. The proposal allows a natural extension to multi-scanner support but does not propose delivering it initially. Such support should be a new proposal based on this work
  2. Supporting UI visualization of results beyond vulnerabilities

Rationale

  1. Externalized Scanner Adapters - code out-of-tree and deployment out-of-band of Harbor deployment
    1. Advantages
      1. Harbor services protected from crashes/failures of adapters
      2. Independent release cycles
      3. No source license implications (adapters need not even be open source)
      4. Development independent of Harbor processes, no burden on core maintainers
    2. Disadvantages
      1. Operational overhead for users to run more services as part of Harbor deployment
      2. The restricted API gives adapters limited data context from Harbor
      3. API management (versioning, etc.) is required, since Harbor cannot update adapters to keep up with API changes
      4. Requires more security considerations because it presents another external attack surface
  2. Harbor defined data model for scan results
    1. Advantages:
      1. Common schema needed to support UI visualization
      2. Can be augmented with scanner-specific data later that could be presented raw to users in UI to avoid per-scanner UI work
      3. Simpler for non-admins to understand results without detailed knowledge of scanners
    2. Disadvantages
      1. Lowest common denominator for scanners
      2. Harbor limited to consuming only things it understands, so must keep up with scanners to add new capabilities

Compatibility

  • <TODO: Preserving previous scan results?>
  • <TODO: Existing Clair implementation vs require new Adapter for Clair? Upgrade implications>

Components Overview

Currently, the components overview in the Harbor web console shows a summary of OS packages and their respective vulnerability severities. In particular, it shows the number of OS packages with no vulnerabilities.


The components overview will no longer display the number of non-vulnerable OS packages, because some scanners might not provide that information in a scan report.
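The reason is visible in how such a summary would be computed: it can only be derived from the findings a report lists, so packages without vulnerabilities never appear. The sketch assumes a pared-down, hypothetical VulnerabilityItem and the severity labels Unknown, Low, Medium, High and Critical; it counts each package version once, under its highest severity.

```go
package main

// VulnerabilityItem is a hypothetical, minimal shape for one finding in a report.
type VulnerabilityItem struct {
	ID       string
	Package  string
	Version  string
	Severity string
}

// severityRank orders the assumed severity labels; unknown labels rank as "Unknown".
var severityRank = map[string]int{"Unknown": 0, "Low": 1, "Medium": 2, "High": 3, "Critical": 4}

// packagesBySeverity counts distinct vulnerable package versions under their highest
// severity. Packages without findings are absent from the input, so no count of
// non-vulnerable packages can be produced.
func packagesBySeverity(items []VulnerabilityItem) map[string]int {
	highest := make(map[string]string) // "package@version" -> highest severity seen
	for _, it := range items {
		key := it.Package + "@" + it.Version
		if cur, ok := highest[key]; !ok || severityRank[it.Severity] > severityRank[cur] {
			highest[key] = it.Severity
		}
	}
	counts := make(map[string]int)
	for _, sev := range highest {
		counts[sev]++
	}
	return counts
}
```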

Vulnerability Database Update Timestamp

Currently, the Harbor web console shows the update timestamp of the vulnerability database used by Clair.


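The update timestamp will be displayed along with the Scanner Registration properties and metadata, for example taken from the harbor.scanner-adapter/vulnerability-database-updated-at property that an adapter may report in its metadata.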

Versioning Scanner Adapter API

  • The API follows semantic versioning.
  • API URLs are versioned by major version only, such as v1, v2, and v3. A new major version signifies breaking changes to the API.
  • API requests and responses use custom content MIME types that include the version, e.g. application/vnd.scanner.adapter.scan.request+json; version=1.0
  • Adding a property to the v1 ScanRequest, whose MIME type is application/vnd.scanner.adapter.scan.request+json; version=1.0, increments the minor version to application/vnd.scanner.adapter.scan.request+json; version=1.1.
  • Adding functionality to the v1 API increments the minor version, e.g. to v1.1.

Implementation

[A description of the steps in the implementation, who will do them, and when.]

Work Items

  1. Adapter selection and configuration framework
  2. Adapter client implementation
  3. Anchore Engine Adapter
  4. Aqua MicroScanner Adapter
  5. Clair Adapter (?)

Open issues (if applicable)