Authors: Daniel Pacak @danielpacak, Zach Hill @zhill
Discussions:
- The first issue opened by @lizrice
- Initial working draft of pluggable scanning proposal and working group by @zhill
- Scanner Adapters architecture proposal by @danielpacak
- Pluggable Scanners PRD by Alex Xu
Add support to Harbor for using other image scanners than just Clair by replacing the current Clair-specific scanning job implementation with an adapter layer implemented as an HTTP API between Harbor and the scanners' native interfaces. This will provide runtime configurable scanner invocation to provide vulnerability scanning initially with the option for other types of scanning in the future.
Introduce:
- Scanner Adapter HTTP API (defined and maintained by Harbor)
- Core operations:
- Execute a scan (non-blocking)
- Retrieve a scan report (polling by Harbor)
- Describe the scanner’s capabilities, i.e. supported artifacts and reports
- Scanner Adapter HTTP client in Harbor
- Scanner Adapter configuration management and persistence in the Harbor DB
The adapter interface is a well-defined REST API specified and maintained by Harbor. Harbor will have a client for the API, and manage configuration of the client. The configuration is primarily an endpoint registration, and multiple configurations will be supported concurrently in the system. This will allow user-selectable and configurable scanning of images at runtime with no restarts for Harbor required. Scanner adapters must implement the specified API, but the deployment and configuration of the adapter services themselves is out-of-scope for this proposal and Harbor itself is not responsible for management or deployment of the adapter services.
Based on discussions with the Harbor users and maintainers, there is a long-term desire to introduce flexible and configurable artifact scanning integration capabilities into Harbor with the aim of allowing Harbor admins and users to configure, at runtime, and on a per-project basis:
- The set of scans that should be executed on artifacts in the project
- Examples: vulnerability scan, software license scan, malware scan
- Examples of artifacts to be scanned: images, Helm charts, CNAB bundles
- The way that the results of the scans should be interpreted and combined to produce a final binary acceptance result for the artifact that can subsequently be used to optionally control user access to the artifact
- Persistence of the scan results in support of audit history
- Visualizations of some or all of the scan results for users directly in Harbor
Given that broader long-term objective, this proposal addresses the initial steps required to provide pluggable image scanning for vulnerabilities, while providing the framework to build upon towards the longer-term artifact scanning objective. Thus, this proposal can be considered phase 1 of a multi-phase effort to achieve the above objectives.
Scope of this proposal:
- Introduce a stable adapter API that can be extended for scanning multiple types of artifacts and returning multiple report types.
- Introduce a minimally impactful framework for configuring and executing scans against any scanner that implements the adapter API
- Scan Docker container images for vulnerabilities
- Support configuration of multiple scanners with a single scanner at a time in use across all projects in Harbor (the "default" scanner)
- Avoid design and API decisions that would require breaking changes to add capabilities defined in the long-term objectives above
The components responsible for submitting scan requests and fetching scan results from Clair are part of Harbor’s deployment.
The diagram below shows the current workflow for scanning images with the detailed explanation beneath.
- A User requests a scan of the selected image by clicking the Scan button.
- The system schedules a ClairJob for execution.
- The ClairJob pulls the manifest of the image from Registry.
- The ClairJob parses the manifest. For each image layer it creates an instance of the ClairLayer structure, which is an internal representation of an image layer in Clair. Each ClairLayer has Name, Path, and Authorization header properties that allow Clair to pull the image thru the Docker Registry v2 API exposed by Harbor.
- The ClairJob pushes a slice of ClairLayer items to Clair API for the actual scanning. Note that this is a blocking operation.
- The ClairJob pulls the scan result from Clair API.
- The ClairJob transforms the scan result to the components overview model represented by the ComponentsOverview structure. The ComponentsOverview model is understandable by the Harbor web console and simple policy checker.
- The ClairJob saves the components overview to the Harbor DB. (It’s used later on to enforce simple policy rules, e.g. preventing users from pulling an image which contains severe vulnerabilities.) The name of the parent ClairLayers, aka DetailsKey, is stored in Harbor DB and used later on to fetch scan results.
- The User clicks the Refresh button or the UI timer triggers the scan results refresh.
- The Repository API Handler calls the VulnerabilitiesDetails method of Repository API.
- The Repository API downloads the Clair scan result for the corresponding DetailsKey.
- The ClairScanResult is transformed to Harbor’s model, effectively a slice of VulnerabilityItems, so it can be rendered in the UI as a grid of security vulnerabilities.
To achieve the goal of runtime configurable and pluggable container image vulnerability scanning, the current Clair-specific logic will be abstracted out and replaced with a generic scanner configuration, selection, and invocation framework based on a simple HTTP API that Harbor will call on adapters that wrap the scanner implementations.
The proposed work focuses on building the adapter framework and leaves scan result interpretation, combination, and selection issues to future work by leveraging the existing lifecycle and invocation triggers for the scans themselves and only replacing the mechanisms for executing scans and persisting results. Current scan invocation triggers will remain:
- Users can set a scan schedule on some set of images in a project
- Users can explicitly request a scan
New Components:
- Scanner Adapter API - HTTP API defining the interface between Harbor and artifact scanners.
- Defined and maintained by Harbor
- Versioned
- REST-based
- Specified by OpenAPI/Swagger spec
- Authentication specifics are out-of-scope, but should be supported using the HTTP Authorization header
- Scanner Adapter - HTTP Service that implements the Scanner Adapter API and manages translation of the Scanner
Adapter API to/from native APIs/CLIs that scanners implement
- Deployed outside the system boundary of Harbor, not considered an internal component
- Implementations are out-of-tree of Harbor
- Has independent state management, configuration, and deployment lifecycle from Harbor
- Scanner Registry - An internal component logically responsible for managing the configurations for invoking a
scanner adapter
- Backed by persistence in the DB
- Basic CRUD semantics via new Harbor APIs
- Scanner Registration - A named configuration for invoking a scanner via its adapter
- Name - The name of the entry, must be globally unique in Harbor
- Description - A description of the scanner for human consumption
- Endpoint - The hostname and port to invoke the Adapter API calls against
- Authorization Configuration - The optional value to set in the HTTP Authorization header
- NOTE: no plan for preventing duplicate entries based on endpoint etc as such constraints can be delegated to the system administrator’s discretion based on deployment specific requirements.
- Scanner Adapter Client - An HTTP client that takes a registration as configuration input and invokes Scanner Adapter API calls to initiate scans, retrieve results, query health, and refresh metadata from Scanner Adapter(s).
- Scan Job - The unit-of-work for executing and retrieving results for a scan of an artifact in the Harbor
registry.
- This may be extended to include an aggregation meta-job that defines and executes multiple Scan Jobs themselves and combines their results in a specific way
- Identified by a unique identifier, each job is unique
- Scan Report - the output of a Scan Job, the type and content of which depend on the adapter that was called by the Scan Job (e.g. image vulnerability scan vs license compliance scan will have different results with different schemas and MIME types)
- Scan Store - An internal component responsible for persisting Scan Reports.
- Only the latest Scan Report will be persisted in Harbor DB.
Updated Components:
- Scan Controller - The main coordinator of the whole scanning procedure.
The main focus of this proposal and the subsequent sections is the Scanner Adapter API design as it must be stable for Scanner Adapter implementers. Harbor-internal design is secondary as this can evolve independently of the Scanner Adapter API as long as the contract between the two is set.
The indirection introduced by the Scanner Adapter framework allows implementation of a generic Scan Job which in turn uses a generic Scanner Adapter Client. Both are agnostic to the underlying scanner internals or upgrades.
Such architecture is also extensible to triggering multiple Scanner Adapters for a single scan request. This would require aggregating Scan Reports coming from each Scanner Adapter but is certainly doable without changing the contract between Harbor service and Scanner Adapters. For example, there might be a ScanReportAggregate struct which holds a collection of ScanReports. Such an aggregate can be visualised in Harbor Web console as a tabbed pane, with each tab corresponding to a scanner involved.
- A User requests a scan of the selected Artifact by clicking the Scan button.
- The Harbor API forwards the request to the Scan Controller.
- The Scan Controller retrieves the default Scanner Registration from the Scanner Registry.
- The Scan Controller enqueues a Scan Job with params.
- The Scan Job instantiates the generic Scanner Adapter Client passing Registration Settings as constructor args.
- The Scan Job creates a ScanRequest.
- The Scan Job sends the ScanRequest to a Scanner Adapter, which in turn, forwards the request to the underlying Scanner.
- At this stage the Scanner pulls the Artifact to perform the actual scan for vulnerabilities and generate the report.
- The Scan Job blocks and periodically pulls the ScanReport from the Scanner Adapter until the report is available. Note the identifier of the ScanRequest generated by the Scanner Adapter and the MIME type passed to retrieve a ScanReport.
- The Scan Job emits a ScanCompleted event.
- The ScanCompleted event is dispatched to the Event Handler.
- The Event Handler saves the ScanReport along with its MIME type and the Artifact's digest to the Harbor DB.
- The User requests refresh of the vulnerabilities report for an Artifact.
- The Harbor API retrieves the ScanReport for the given Artifact from the Harbor DB thru Scan Store.
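The polling step of the workflow above can be sketched as follows. This is only an illustration under stated assumptions: the ReportFetcher interface, the ErrNotReady sentinel, and the fakeAdapter type are hypothetical names introduced here, and the fixed retry interval and attempt limit are arbitrary choices rather than values defined by the proposal.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrNotReady is a hypothetical sentinel returned while the adapter is
// still generating the report.
var ErrNotReady = errors.New("scan report not ready")

// ReportFetcher is an illustrative abstraction over the Scanner Adapter Client.
type ReportFetcher interface {
	FetchReport(scanID, mimeType string) ([]byte, error)
}

// PollReport asks for the report until it is available, a non-retryable
// error occurs, or maxAttempts requests have been made.
func PollReport(f ReportFetcher, scanID, mimeType string, interval time.Duration, maxAttempts int) ([]byte, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		report, err := f.FetchReport(scanID, mimeType)
		if err == nil {
			return report, nil
		}
		if !errors.Is(err, ErrNotReady) {
			return nil, fmt.Errorf("fetching report for scan %s: %w", scanID, err)
		}
		time.Sleep(interval)
	}
	return nil, fmt.Errorf("scan %s: report not ready after %d attempts", scanID, maxAttempts)
}

// fakeAdapter answers "not ready" pendingCalls times before returning a report.
type fakeAdapter struct {
	pendingCalls int
	calls        int
}

func (a *fakeAdapter) FetchReport(scanID, mimeType string) ([]byte, error) {
	a.calls++
	if a.calls <= a.pendingCalls {
		return nil, ErrNotReady
	}
	return []byte(`{"severity":"High"}`), nil
}

func main() {
	adapter := &fakeAdapter{pendingCalls: 2}
	report, err := PollReport(adapter, "3fa85f64-5717-4562-b3fc-2c963f66afa6",
		"application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0",
		10*time.Millisecond, 5)
	fmt.Println(string(report), err, adapter.calls)
}
```

Keeping the retry policy outside the fetcher means the same loop could serve any report MIME type, which matches the generic Scan Job described above.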
Scanners need access to the artifact data, images in this case. Scanners will retrieve data from Harbor for analysis using existing Harbor APIs and optionally credentials. The specific APIs used will vary by the artifact being analyzed, but for the initial image analysis, the image data is retrieved by the scanner using the Docker Registry v2 API exposed by Harbor. Necessary credentials are presented to the scanner by the scanner adapter as provided in the API call by Harbor during initiation of a scan.
Harbor can block image distribution based on severity of vulnerabilities found during scan. Since Clair is shipped with Harbor they're both deployed in the same network and communicate thru a private IP. Docker clients are supposed to access the registry thru external IP configured by ingress or load balancer.
$ kubectl describe ingress harbor-harbor-ingress
Name: harbor-harbor-ingress
Namespace: harbor
Address: 10.0.2.15
TLS:
harbor-harbor-ingress terminates core.harbor.domain
harbor-harbor-ingress terminates notary.harbor.domain
Rules:
Host Path Backends
---- ---- --------
core.harbor.domain
/ harbor-harbor-portal:80 (172.17.0.8:80)
/api/ harbor-harbor-core:80 (172.17.0.18:8080)
/service/ harbor-harbor-core:80 (172.17.0.18:8080)
/v2/ harbor-harbor-core:80 (172.17.0.18:8080)
/chartrepo/ harbor-harbor-core:80 (172.17.0.18:8080)
/c/ harbor-harbor-core:80 (172.17.0.18:8080)
notary.harbor.domain
/ harbor-harbor-notary-server:4443 (172.17.0.12:4443)
In the example, http://harbor-harbor-registry:5000 is the internal registry service endpoint accessible to Clair but not accessible to docker clients. The https://core.harbor.domain URL refers to the external endpoint exposed to docker clients.
In the proposal we stated that Scanner Adapters should be deployed outside the system boundary of Harbor. This introduces a problem: such adapters access the registry via the external endpoint, where pulling might be blocked.
Harbor provides a JWT Bearer token to Clair on scan request. The token is generated using the OAuth Client Credentials flow (with client_id and client_secret) and then passed directly to Clair in an HTTP POST request to scan a Clair Layer.
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiIsImtpZCI6IkJWM0Q6MkFWWjpVQjVaOktJQVA6SU5QTDo1RU42Ok40SjQ6Nk1XTzpEUktFOkJWUUs6M0ZKTDpQT1RMIn0.eyJpc3MiOiJhdXRoLmRvY2tlci5jb20iLCJzdWIiOiJCQ0NZOk9VNlo6UUVKNTpXTjJDOjJBVkM6WTdZRDpBM0xZOjQ1VVc6NE9HRDpLQUxMOkNOSjU6NUlVTCIsImF1ZCI6InJlZ2lzdHJ5LmRvY2tlci5jb20iLCJleHAiOjE0MTUzODczMTUsIm5iZiI6MTQxNTM4NzAxNSwiaWF0IjoxNDE1Mzg3MDE1LCJqdGkiOiJ0WUpDTzFjNmNueXk3a0FuMGM3cktQZ2JWMUgxYkZ3cyIsInNjb3BlIjoiamxoYXduOnJlcG9zaXRvcnk6c2FtYWxiYS9teS1hcHA6cHVzaCxwdWxsIGpsaGF3bjpuYW1lc3BhY2U6c2FtYWxiYTpwdWxsIn0.Y3zZSwaZPqy4y9oRBVRImZyv3m_S9XDHF1tWwN7mL52C_IiA73SJkWVNsvNqpJIn5h7A2F8biv_S2ppQ1lgkbw
Clair, on the other hand, uses the token to pull image layers from the Harbor registry. It works because Clair uses a standard http library and sets the Authorization header programmatically.
In order to enable Scanner Adapters to bypass Policy Check Interceptor, Harbor's authentication service will generate a dedicated JWT access token and hand it over to the underlying Scanner thru Scanner Adapter in a ScanRequest.
The existing OCI/Docker tooling does not support passing OAuth Bearer tokens in directly. It relies on a username/password to initiate an OAuth password-grant flow to get a bearer token, or on an existing refresh token to get a new access token. This means that work is required in the adapters (and likely in the scanners themselves, depending on their implementation and libraries) to be able to consume a raw token of the kind Harbor currently issues to Clair. Clair uses a very raw HTTP client rather than a docker/image aware client so it can handle the tokens already.
One option discussed that made sense to us was to use the robot account mechanism to generate credentials that work with these common OCI/Docker tooling libraries to provide credentialed access to the image data. The lifecycle of the robot account credentials can be bound to the Scan Job so that a set of credentials is deleted when the Scan Job is ended (success or failure) to ensure the credentials are not long lived. Additionally, a modification is needed to ensure that the generated credentials have access to bypass the configured policy checks on the image that normal users are subject to if those checks are configured.
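Binding the credential lifecycle to the Scan Job could look roughly like the sketch below. It is not Harbor's robot account API: the CredentialIssuer interface, the memIssuer type, the generated account name, and the errScanFailed value are all hypothetical and exist only to show credentials being removed whether the scan succeeds or fails.

```go
package main

import (
	"errors"
	"fmt"
)

// Credentials is a hypothetical short-lived username/secret pair.
type Credentials struct{ Username, Secret string }

// CredentialIssuer is an illustrative stand-in for the robot account mechanism.
type CredentialIssuer interface {
	Create(jobID string) (Credentials, error)
	Delete(username string) error
}

// errScanFailed is a sample failure used by the demo below.
var errScanFailed = errors.New("scan failed")

// RunScanJob creates credentials for one job, runs the scan with them, and
// deletes them afterwards regardless of the scan outcome.
func RunScanJob(issuer CredentialIssuer, jobID string, scan func(Credentials) error) (err error) {
	creds, err := issuer.Create(jobID)
	if err != nil {
		return fmt.Errorf("creating credentials: %w", err)
	}
	defer func() {
		// Report a cleanup failure together with any scan error.
		if delErr := issuer.Delete(creds.Username); delErr != nil {
			err = errors.Join(err, delErr)
		}
	}()
	return scan(creds)
}

// memIssuer tracks active credentials in memory.
type memIssuer struct{ active map[string]bool }

func (m *memIssuer) Create(jobID string) (Credentials, error) {
	name := "scan-job-" + jobID // hypothetical naming scheme
	m.active[name] = true
	return Credentials{Username: name, Secret: "generated-secret"}, nil
}

func (m *memIssuer) Delete(username string) error {
	delete(m.active, username)
	return nil
}

func main() {
	issuer := &memIssuer{active: map[string]bool{}}
	err := RunScanJob(issuer, "42", func(Credentials) error { return errScanFailed })
	fmt.Println(err, len(issuer.active))
}
```

The deferred cleanup is the design point: no code path through the job leaves the credentials behind.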
Scanner vendors are supposed to implement Scanner Adapters exposing the Scanner Adapter API as specified in Scanner Adapter v1.0 - OpenAPI Specification.
Note: OpenAPI spec yaml file can be opened in the online Swagger Editor.
- The deployment method is up to the vendor as long as the mounted API endpoint URL is accessible to Harbor services.
- For each ScanRequest a Scanner Adapter generates a unique identifier which is used to poll for the corresponding ScanReport.
- The lifetime of ScanRequest identifier returned by a Scanner Adapter is defined by the adapter. The TTL of the ScanReport and its identifier is long enough to support polling with reasonable timeouts.
- Scanner Adapters are not expected to persist scan reports forever. Harbor is supposed to cache at least the latest Scan Report.
- Scanner Adapters are not expected to make responses for the given artifact immutable, i.e. Scan Reports might change over time when new vulnerabilities are discovered.
- A Scan Job may get a 404 response status for a ScanRequest identifier and should treat it as failed and return a failure in the job. Harbor is expected to send a new Scan Request in that case.
- Scanner Adapter API leverages content negotiation by using MIME types in the Accept header to define schema of a result returned by GET /scan/{scan_request_id}/report requests.
- Make sure that the Scanner Adapter has expected capabilities:
curl -H 'Accept: application/vnd.scanner.adapter.metadata+json; version=1.0' \
  http://scanner-adapter:8080/api/v1/metadata

Content-Type: application/vnd.scanner.adapter.scanner.metadata+json; version=1.0
Status: 200 OK

{
  "scanner": {
    "name": "Microscanner",
    "vendor": "Aqua Security",
    "version": "3.0.5"
  },
  "capabilities": [
    {
      "consumes_mime_types": [
        "application/vnd.oci.image.manifest.v1+json",
        "application/vnd.docker.distribution.manifest.v2+json"
      ],
      "produces_mime_types": [
        "application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0",
        "application/vnd.scanner.adapter.vuln.report.raw"
      ]
    }
  ],
  "properties": {
    "harbor.scanner-adapter/scanner-type": "os-package-vulnerability",
    "harbor.scanner-adapter/vulnerability-database-updated-at": "2019-08-13T08:16:33.345Z"
  }
}
- Submit the scan request
- Submit an invalid scan request:
curl http://scanner-adapter:8080/api/v1/scan \
  -H 'Content-Type: application/vnd.scanner.adapter.scan.request+json; version=1.0' \
  -d @- << EOF
{
  "registry": {
    "url": "INVALID_REGISTRY_URL",
    "authorization": "Bearer JWTTOKENGOESHERE"
  },
  "artifact": {
    "repository": "library/mongo",
    "digest": "sha256:917f5b7f4bef1b35ee90f03033f33a81002511c1e0767fd44276d4bd9cd2fa8e"
  }
}
EOF

Status: 422 Unprocessable Entity
Content-Type: application/vnd.scanner.adapter.error+json; version=1.0

{
  "error": {
    "message": "invalid registry_url"
  }
}
- Submit a valid scan request:
curl http://scanner-adapter:8080/api/v1/scan \
  -H 'Content-Type: application/vnd.scanner.adapter.scan.request+json; version=1.0' \
  -d @- << EOF
{
  "registry": {
    "url": "harbor-harbor-registry:5000",
    "authorization": "Bearer JWTTOKENGOESHERE"
  },
  "artifact": {
    "repository": "library/mongo",
    "digest": "sha256:917f5b7f4bef1b35ee90f03033f33a81002511c1e0767fd44276d4bd9cd2fa8e"
  }
}
EOF

Status: 202 Accepted
Content-Type: application/vnd.scanner.adapter.scan.response+json; version=1.0

{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
}
- Try getting scan report (in unified Harbor format understandable by Harbor Web console):
curl -H 'Accept: application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0' \
  http://scanner-adapter:8080/api/v1/scan/3fa85f64-5717-4562-b3fc-2c963f66afa6/report

Retry-After: 15
Status: 302 Found
- Wait 15 seconds or use your own retry interval ...
- ... and try getting scan report again:
curl -H 'Accept: application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0' \
  http://scanner-adapter:8080/api/v1/scan/3fa85f64-5717-4562-b3fc-2c963f66afa6/report

Content-Type: application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0
Status: 200 OK

{
  "generated_at": "2019-08-07T12:17:21.854Z",
  "artifact": {
    "repository": "library/mongo",
    "digest": "sha256:917f5b7f4bef1b35ee90f03033f33a81002511c1e0767fd44276d4bd9cd2fa8e"
  },
  "scanner": {
    "name": "Microscanner",
    "vendor": "Aqua Security",
    "version": "3.0.5"
  },
  "severity": "High",
  "vulnerabilities": [
    {
      "id": "CVE-2017-8283",
      "package": "dpkg",
      "version": "1.17.27",
      "fix_version": "1.18.0",
      "severity": "High",
      "description": "...",
      "links": [
        "https://security-tracker.debian.org/tracker/CVE-2017-8283"
      ]
    },
    ...
  ]
}
- Alternatively we could request a proprietary vulnerability report (with an example report generated
by MicroScanner in JSON format):
curl -H 'Accept: application/vnd.scanner.adapter.vuln.report.raw' \
  http://scanner-adapter:8080/api/v1/scan/3fa85f64-5717-4562-b3fc-2c963f66afa6/report

Content-Type: application/vnd.scanner.adapter.vuln.report.raw
Status: 200 OK

{
  "scan_started": {
    "seconds": 1561386673,
    "nanos": 390482870
  },
  "scan_duration": 2,
  "digest": "b3c8bc6c39af8e8f18f5caf53eec3c6c4af60a1332d1736a0cd03e710388e9c8",
  "os": "debian",
  "version": "8",
  "resources": [
    {
      "resource": {
        "format": "deb",
        "name": "apt",
        "version": "1.0.9.8.5",
        "arch": "amd64",
        "cpe": "pkg:/debian:8:apt:1.0.9.8.5",
        "name_hash": "583f72a833c7dfd63c03edba3776247a"
      },
      "scanned": true,
      "vulnerabilities": [
        {
          "name": "CVE-2011-3374",
          "vendor_score_version": "Aqua",
          "vendor_severity": "negligible",
          "vendor_statement": "Not exploitable in Debian, since no keyring URI is defined",
          "vendor_url": "https://security-tracker.debian.org/tracker/CVE-2011-3374",
          "classification": "..."
        }
      ]
    }
  ]
}
- The Accept request header is used to indicate to the Scanner Adapter the intended scan report format.
- If the client does not specify the Accept header, the requested format is assumed to be the Harbor vulnerability report with the MIME type application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0.
- In phase 1 each Scanner Adapter should support at least the following artifact MIME types:
application/vnd.oci.image.manifest.v1+json
application/vnd.docker.distribution.manifest.v2+json
- In phase 1 each Scanner Adapter should support at least the following scan report MIME types:
application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0
- corresponds to HarborVulnerabilityReport
- fixed schema described in Scanner Adapter API spec
- can be parsed in type-safe manner and displayed in Harbor Web console.
application/vnd.scanner.adapter.vuln.report.raw
- corresponds to a raw scan report
- no fixed schema, documented by a scanner's vendor
- New scan report MIME types might be introduced without breaking the backward compatibility of the API or introducing new URL paths to the Scanner Adapter API spec.
- For example, there can be a vendor specific policy report returned by Anchore with the corresponding MIME type application/vnd.anchore.policy.report+json; version=0.3:
[
  {
    "sha256:57334c50959f26ce1ee025d08f136c2292c128f84e7b229d1b0da5dac89e9866": {
      "docker.io/alpine:latest": [
        {
          "detail": {},
          "last_evaluation": "2019-08-07T06:33:48Z",
          "policyId": "2c53a13c-1765-11e8-82ef-23527761d060",
          "status": "pass"
        }
      ]
    }
  }
]
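The content negotiation and status handling shown in the walkthrough above might be handled on the Harbor side along these lines. This is a sketch under assumptions: the function names, the ReportState type, and the 5 second fallback delay are invented for illustration, while the default MIME type, the 302 plus Retry-After response, and the treatment of other statuses (such as 404) as failures follow the text above.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// DefaultReportMIMEType is the report type assumed when the caller names none.
const DefaultReportMIMEType = "application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0"

// NewReportRequest builds a GET {baseURL}/scan/{scanID}/report request and
// sets the Accept header, falling back to the Harbor report MIME type.
func NewReportRequest(baseURL, scanID, mimeType string) (*http.Request, error) {
	if mimeType == "" {
		mimeType = DefaultReportMIMEType
	}
	req, err := http.NewRequest(http.MethodGet, baseURL+"/scan/"+scanID+"/report", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", mimeType)
	return req, nil
}

// ReportState is a hypothetical classification of a report response.
type ReportState int

const (
	ReportReady ReportState = iota
	ReportPending
	ReportFailed
)

// Classify maps a response status to a state and, for pending reports,
// the delay requested by the Retry-After header (in seconds).
func Classify(status int, header http.Header) (ReportState, time.Duration) {
	switch status {
	case http.StatusOK:
		return ReportReady, 0
	case http.StatusFound:
		secs, err := strconv.Atoi(header.Get("Retry-After"))
		if err != nil || secs < 0 {
			secs = 5 // assumed fallback when the header is missing or invalid
		}
		return ReportPending, time.Duration(secs) * time.Second
	default:
		// Includes 404: the scan request identifier is no longer known.
		return ReportFailed, 0
	}
}

func main() {
	req, err := NewReportRequest("http://scanner-adapter:8080/api/v1",
		"3fa85f64-5717-4562-b3fc-2c963f66afa6", "")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Accept"))

	h := http.Header{}
	h.Set("Retry-After", "15")
	state, wait := Classify(http.StatusFound, h)
	fmt.Println(state == ReportPending, wait)
}
```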
The Scanner Registry provides a management interface to the Harbor API and Scan Controller to register Scanner Adapters and retrieve the configuration of the default Scanner Adapter to perform the actual scans.
package scanner
// RegistrationSettings consists of EndpointURL and optional Authorization properties to test connection
// with the corresponding Scanner Adapter.
type RegistrationSettings struct {
// A base URL of the Scanner Adapter.
EndpointURL string
// An optional value of the HTTP Authorization header sent with each request to the Scanner Adapter API.
Authorization string
// SkipCertVerify is a flag indicating whether the client should skip verification of the Scanner Adapter's certificate.
SkipCertVerify bool
}
// Registration represents a named configuration for invoking a scanner via its adapter.
type Registration struct {
// The unique identifier of this registration.
ID int64
// The name of this registration.
Name string
// An optional description of this registration.
Description string
// A flag indicating whether this registration is the default one.
IsDefault bool
// A flag indicating whether this registration is enabled or disabled.
IsEnabled bool
RegistrationSettings
}
// Registry defines methods for managing the configurations for invoking a Scanner Adapter.
type Registry interface {
// List returns a list of currently configured scanner registrations.
List() ([]*Registration, error)
// Create creates a new scanner registration with the given data.
// Returns the scanner registration identifier.
Create(registration *Registration) (int64, error)
// Get returns the details of the specified scanner registration.
Get(registrationID int64) (*Registration, error)
// Update updates the specified scanner registration.
Update(registration *Registration) error
// Delete deletes the specified scanner registration.
Delete(registrationID int64) (*Registration, error)
// SetAsDefault marks the specified scanner registration as default.
// The implementation is supposed to unset any registration previously set as default.
SetAsDefault(registrationID int64) error
// GetDefault returns the default scanner registration or `nil` if there are no registrations configured.
GetDefault() (*Registration, error)
// Ping pings Scanner Adapter to test EndpointURL and Authorization settings.
// The implementation is supposed to call the GetMetadata method on scanner.Client.
// Returns `nil` if connection succeeded, a non `nil` error otherwise.
Ping(settings *RegistrationSettings) error
}
The implementation of the scanner.Registry interface will persist Registration entities in the Harbor DB.
There will be a new scanner_registration table and the corresponding scanner_registration_id_seq sequence.
create table scanner_registration
(
id serial not null constraint scanner_registration_pkey primary key,
name varchar(255) not null constraint scanner_registration_name_key unique,
description varchar(255),
endpoint_url varchar(255) not null,
"authorization" varchar(255),
skip_cert_verify boolean default false not null,
enabled_flag boolean default true not null,
default_flag boolean default false not null
);
-- In PostgreSQL, the serial column above implicitly creates the scanner_registration_id_seq sequence.
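The single-default rule behind SetAsDefault and GetDefault can be illustrated with an in-memory stand-in for the table above. This is a minimal sketch, not the proposed DB-backed implementation: memRegistry, newMemRegistry, ErrNotFound, and the simplified Create signature are hypothetical, and the Registration type is trimmed to the fields needed here.

```go
package main

import (
	"errors"
	"fmt"
)

// Registration is a trimmed-down version of the proposal's Registration type.
type Registration struct {
	ID        int64
	Name      string
	IsDefault bool
}

// ErrNotFound is a hypothetical error for unknown registration IDs.
var ErrNotFound = errors.New("scanner registration not found")

// memRegistry keeps registrations in memory, keyed by ID.
type memRegistry struct {
	nextID int64
	byID   map[int64]*Registration
}

func newMemRegistry() *memRegistry {
	return &memRegistry{nextID: 1, byID: map[int64]*Registration{}}
}

// Create stores a registration, enforcing the unique name constraint.
func (r *memRegistry) Create(name string) (int64, error) {
	for _, reg := range r.byID {
		if reg.Name == name {
			return 0, fmt.Errorf("registration %q already exists", name)
		}
	}
	id := r.nextID
	r.nextID++
	r.byID[id] = &Registration{ID: id, Name: name}
	return id, nil
}

// SetAsDefault marks one registration as default and unsets all others.
func (r *memRegistry) SetAsDefault(id int64) error {
	target, ok := r.byID[id]
	if !ok {
		return ErrNotFound
	}
	for _, reg := range r.byID {
		reg.IsDefault = false
	}
	target.IsDefault = true
	return nil
}

// GetDefault returns the default registration, or nil if none is set.
func (r *memRegistry) GetDefault() (*Registration, error) {
	for _, reg := range r.byID {
		if reg.IsDefault {
			return reg, nil
		}
	}
	return nil, nil
}

func main() {
	reg := newMemRegistry()
	clair, _ := reg.Create("clair")
	micro, _ := reg.Create("microscanner")
	_ = reg.SetAsDefault(clair)
	_ = reg.SetAsDefault(micro)
	def, _ := reg.GetDefault()
	fmt.Println(def.Name)
}
```

A DB-backed implementation would presumably perform the unset and set steps in one transaction so that two defaults never coexist.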
The Harbor API will be extended with paths pertinent to Scanner Registry management as specified in Harbor API for Scanner Registry management (DELTA) v1.10.
Note: Swagger 2.0 spec yaml file can be opened in the online Swagger Editor.
A Harbor admin can configure Scanner Registrations in Harbor web console. There is a dedicated tab named Scanners in the Configuration tabbed pane. By default the tab displays a list of currently configured Scanner Registrations.
- Only one Scanner Registration can be marked as default at a time.
- The admin can change the default Scanner Registration and the change will take effect only for subsequent scan requests.
- The default Scanner Registration is inherited by each Harbor project. This could be extended to configure multiple scanners per project but is not in scope of phase 1.
- The Scanners tab supports CRUD operations for Scanner Registrations.
When creating or updating a Scanner Registration a Harbor admin has to specify the following properties: Name, (Optional) Description, Endpoint URL, and (Optional) Authorization header. The form allows the admin to test connection params by making an HTTP GET request to the /api/v1/metadata endpoint path of the Scanner Adapter being configured.
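The connection test could be sketched as below. The Ping function here takes plain strings rather than the proposal's RegistrationSettings, and the 5 second timeout, the newFakeAdapter helper, and the rule that only 200 counts as success are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// Ping issues GET {endpointURL}/api/v1/metadata and reports an error unless
// the adapter answers with 200 OK. The Authorization value is optional.
func Ping(endpointURL, authorization string) error {
	url := strings.TrimRight(endpointURL, "/") + "/api/v1/metadata"
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return fmt.Errorf("building request: %w", err)
	}
	if authorization != "" {
		req.Header.Set("Authorization", authorization)
	}
	client := &http.Client{Timeout: 5 * time.Second} // assumed timeout
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("calling adapter: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d from %s", resp.StatusCode, url)
	}
	return nil
}

// newFakeAdapter starts a local stand-in adapter that only accepts the
// given Authorization header value.
func newFakeAdapter(wantAuth string) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/metadata", func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != wantAuth {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, `{"scanner":{"name":"Microscanner"}}`)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newFakeAdapter("Bearer secret")
	defer srv.Close()
	fmt.Println(Ping(srv.URL, "Bearer secret"))
	fmt.Println(Ping(srv.URL, "") != nil)
}
```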
Given Scanner Registration settings, the Scanner Adapter Client provides the Scan Job with functions for accessing the Scanner Adapter API. It transparently polls the API until the results are ready or an error occurs.
package scanner
import (
"errors"
"time"
)
// Metadata represents scanner metadata and capabilities.
type Metadata struct {
Scanner Scanner
Capabilities []Capability
// There might be some predefined properties that Harbor uses, e.g. harbor.scanner-adapter/scanner-type
// and harbor.scanner-adapter/vulnerability-database-updated-at.
Properties map[string]string
}
type Scanner struct {
Name string
Vendor string
Version string
}
// Capability consists of the set of recognized artifact MIME types and the set of scanner report MIME types.
// For example, a scanner capable of analyzing Docker images and producing a vulnerabilities report recognizable
// by Harbor web console might be represented with the following capability:
// - consumed MIME types:
// - `application/vnd.oci.image.manifest.v1+json`
// - `application/vnd.docker.distribution.manifest.v2+json`
// - produced MIME types:
// - `application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0`
type Capability struct {
// Fields are exported so that they are visible to encoders such as encoding/json.
ConsumesMIMETypes []string
ProducesMIMETypes []string
}
// Registry represents Registry connection settings.
type Registry struct {
// A base URL of the Docker Registry v2 API exposed by Harbor.
URL string
// An optional value of the HTTP Authorization header sent with each request to the Docker Registry v2 API.
// For example, `Bearer JWTTOKENGOESHERE`.
Authorization string
}
// Artifact represents an artifact stored in Registry.
type Artifact struct {
// The name of a Harbor repository containing the artifact.
// For example, `library/oracle/nosql`.
Repository string
// The artifact's digest, consisting of an algorithm and hex portion.
// For example, `sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b`,
// represents sha256 based digest.
Digest string
// The MIME type of this artifact to distinguish Docker images from Helm 3 Charts or CNABs.
MimeType string
}
// ScanRequest represents a structure that is sent to a Scanner Adapter to initiate artifact scanning.
// Carries all the details required to pull the artifact from a Harbor registry.
type ScanRequest struct {
// Connection settings for the Docker Registry v2 API exposed by Harbor.
Registry Registry
// Artifact to be scanned.
Artifact Artifact
}
type ScanResponse struct {
// The unique identifier generated by Scanner Adapter.
// Used to poll and fetch the corresponding Scan Reports.
ID string
}
// HarborVulnerabilityReport represents a unified image vulnerabilities report that can be rendered in the Harbor web console.
// Corresponds to the `application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0` report MIME type.
type HarborVulnerabilityReport struct {
GeneratedAt time.Time
Artifact Artifact
Scanner Scanner
Severity Severity
Vulnerabilities []*VulnerabilityItem
}
// Client provides functions for accessing Scanner Adapter API.
type Client struct{}
// NewClient constructs a client with the given Scanner Registration settings,
// i.e. a combination of Scanner Adapter's endpoint URL and optional Authorization header.
func NewClient(settings *RegistrationSettings) (*Client, error) {
return nil, errors.New("not implemented")
}
// GetMetadata gets the scanner's metadata.
func (c *Client) GetMetadata() (*Metadata, error) {
return nil, errors.New("not implemented")
}
// SubmitScan initiates scanning of the given artifact.
// Returns a ScanResponse carrying the scan request identifier if the request was accepted, a non `nil` error otherwise.
func (c *Client) SubmitScan(req ScanRequest) (*ScanResponse, error) {
return nil, errors.New("not implemented")
}
// GetScanReport gets the scan result for the corresponding ScanRequest identifier.
// Note that this is a blocking method which either returns a non `nil` scan report or error.
// A caller is supposed to cast the returned interface{} to a structure that corresponds
// to the specified MIME type.
func (c *Client) GetScanReport(scanRequestID, reportMIMEType string) (interface{}, error) {
return nil, errors.New("not implemented")
}
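How SubmitScan might encode its request and decode the adapter's reply can be sketched with encoding/json. The JSON keys registry, url, authorization, artifact, repository, digest, and id come from the curl examples earlier in this proposal; the mime_type key, the helper function names, and the rejection of an empty id are assumptions made for this illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Registry mirrors the "registry" object of a scan request.
type Registry struct {
	URL           string `json:"url"`
	Authorization string `json:"authorization,omitempty"`
}

// Artifact mirrors the "artifact" object of a scan request.
type Artifact struct {
	Repository string `json:"repository"`
	Digest     string `json:"digest"`
	MimeType   string `json:"mime_type,omitempty"` // assumed key name
}

// ScanRequest is the body sent to POST /scan.
type ScanRequest struct {
	Registry Registry `json:"registry"`
	Artifact Artifact `json:"artifact"`
}

// ScanResponse is the body returned with 202 Accepted.
type ScanResponse struct {
	ID string `json:"id"`
}

// EncodeScanRequest serializes the request as compact JSON.
func EncodeScanRequest(req ScanRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", fmt.Errorf("encoding scan request: %w", err)
	}
	return string(b), nil
}

// DecodeScanResponse parses the adapter's reply and rejects an empty id.
func DecodeScanResponse(body []byte) (*ScanResponse, error) {
	var resp ScanResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding scan response: %w", err)
	}
	if resp.ID == "" {
		return nil, errors.New("scan response has no id")
	}
	return &resp, nil
}

func main() {
	body, err := EncodeScanRequest(ScanRequest{
		Registry: Registry{URL: "harbor-harbor-registry:5000", Authorization: "Bearer JWTTOKENGOESHERE"},
		Artifact: Artifact{Repository: "library/mongo", Digest: "sha256:917f5b7f4bef1b35ee90f03033f33a81002511c1e0767fd44276d4bd9cd2fa8e"},
	})
	fmt.Println(body, err)

	resp, err := DecodeScanResponse([]byte(`{"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6"}`))
	fmt.Println(resp, err)
}
```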
Scan Controller is the top coordinator for the whole scanning procedure. It exposes several standard interface methods for the upper Harbor API layer to call. It's running in the core service of Harbor.
package scanner
type Options struct {
RegistryAuthorization string
}
// A scan reports aggregate. Might contain additional properties to track the status of the underlying Scan Job(s).
type ScanReportAggregate struct {
Reports []HarborVulnerabilityReport
}
// Controller for the scanning procedure
type Controller interface {
// Scan scans the given artifact.
//
// Arguments:
// artifact *Artifact : object includes the kind and metadata of the scanning artifact.
// options Options : options for scanning, such as access token of Harbor etc.
//
// Returns:
// error : non-nil error if any errors occurred
Scan(artifact *Artifact, options Options) error
// Get the scan report for the given artifact.
//
// Arguments:
// artifact *Artifact : the artifact to get the report for
//
// Returns:
// *ScanReportAggregate : the aggregated scanning result data
// error : non-nil error if any error occurred
GetScanReport(artifact *Artifact) (*ScanReportAggregate, error)
}
When a scanning request is dispatched from the upper API layer, the Scan Controller gets the default Scanner Registration from the Scanner Registry and launches a Scan Job to handle the scanning workload.
A unique scan identifier is generated to track the status of the launched Scan Job. The scan identifier is later used to fetch the results returned by the Scanner Adapter. The results have a unified format suitable for rendering in the Harbor web console: the client explicitly requests that format by setting the Accept header to application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0.
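To illustrate the content negotiation described above, here is a small sketch of a report request that sets the Accept header to the Harbor report MIME type. Only the MIME type comes from this proposal; the URL layout (`/scan/{id}/report`), the host name, the function names, and the placeholder payload are assumptions made for the example. The fake adapter is exercised in-process, so no network access is involved.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// MIME type quoted from the proposal text.
const harborVulnReportMIME = "application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0"

// newReportRequest builds the report request. The URL layout is hypothetical.
func newReportRequest(baseURL, scanID string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/scan/"+scanID+"/report", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", harborVulnReportMIME)
	return req, nil
}

// fakeAdapter is a stand-in handler that only serves the Harbor report type.
// The response body is a placeholder, not a real report.
func fakeAdapter(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get("Accept") != harborVulnReportMIME {
		http.Error(w, "unsupported report type", http.StatusNotAcceptable)
		return
	}
	w.Header().Set("Content-Type", harborVulnReportMIME)
	fmt.Fprint(w, `{"placeholder":"report"}`)
}

// demo passes the request to the fake adapter in-process and returns
// the response status, content type, and body.
func demo() (int, string, string, error) {
	req, err := newReportRequest("http://adapter.example", "scan-123")
	if err != nil {
		return 0, "", "", err
	}
	rec := httptest.NewRecorder()
	fakeAdapter(rec, req)
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String(), nil
}

func main() {
	status, contentType, body, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(status, contentType, body)
}
```

An adapter that cannot produce the requested report type can reject the request, which is what lets Harbor ask for its unified format without knowing the scanner's native one.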
Periodic triggering can also be supported via the job service webhook: a special job is submitted as a periodic job, and on each tick a hook event is generated and published to the Hook Listener, which then sends a request to the Scan Controller to launch the scan.
For result storage, Harbor should store the ScanJob UUID, artifact Identifier (repo + digest of image or object), MIME type, timestamp, and text/json blob of the report returned (including its MIME type for marshalling). Any additional processing of the result can be done from that record (e.g. indexing with structured rows) and is considered future work. For this proposal, only the most recent result should be maintained, but future work can extend it to provide a scan history/audit as needed and independently of the scanner adapter implementations. This keeps the data model simple and supports future work for multiple report types that all can be persisted in the same database.
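The record described above could look roughly like the following sketch. The Go type and field names are hypothetical, and an in-memory map stands in for the Harbor database. It also assumes that "most recent" means the last result saved for a given repository and digest, so each save overwrites the previous one.

```go
package main

import (
	"fmt"
	"time"
)

// ScanResult mirrors the fields listed in the paragraph above.
// The Go names are hypothetical.
type ScanResult struct {
	ScanJobUUID string
	Repository  string
	Digest      string
	MIMEType    string
	CreatedAt   time.Time
	Report      []byte // raw JSON blob as returned by the adapter
}

// artifactKey identifies an artifact by repository and digest.
type artifactKey struct{ repository, digest string }

// LatestStore keeps a single result per artifact; saving overwrites
// whatever was stored before for the same artifact.
type LatestStore struct {
	results map[artifactKey]ScanResult
}

func NewLatestStore() *LatestStore {
	return &LatestStore{results: make(map[artifactKey]ScanResult)}
}

// Save stores r as the current result for its artifact.
func (s *LatestStore) Save(r ScanResult) {
	s.results[artifactKey{r.Repository, r.Digest}] = r
}

// Get returns the stored result for the artifact, if any.
func (s *LatestStore) Get(repository, digest string) (ScanResult, bool) {
	r, ok := s.results[artifactKey{repository, digest}]
	return r, ok
}

// demo saves two results for the same artifact and returns the UUID of the
// one that remains, together with the number of stored records.
func demo() (string, int) {
	const mimeType = "application/vnd.scanner.adapter.vuln.report.harbor+json; version=1.0"
	first := time.Date(2019, 1, 1, 0, 0, 0, 0, time.UTC)
	store := NewLatestStore()
	store.Save(ScanResult{"job-1", "library/app", "sha256:abc", mimeType, first, []byte(`{}`)})
	store.Save(ScanResult{"job-2", "library/app", "sha256:abc", mimeType, first.Add(time.Hour), []byte(`{}`)})
	r, _ := store.Get("library/app", "sha256:abc")
	return r.ScanJobUUID, len(store.results)
}

func main() {
	uuid, count := demo()
	fmt.Println(uuid, count)
}
```

Keeping the report as an opaque blob next to its MIME type is what allows later indexing or history features to be added without changing the adapters.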
If performance is a concern, a Scan Store could persist the scanning results in the database using a predefined model that makes them easy to query. Open questions include whether to store full, partial, or referenced results, and whether to keep only the latest copy or all past copies. In this case, the Scan Controller reads the results from the Scan Store, and the Scan Job publishes results and status changes to the Scan Store for persistence.
- Initial multi-scanner support - Scanning each image with multiple scanners in a single scanning job (as perceived by the end-user)
- The proposal allows for a natural extension to multi-scanner support but does not propose to deliver it initially. Such support should be a new proposal based upon this work.
- Supporting UI visualization of results beyond vulnerabilities
- Externalized Scanner Adapters - code out-of-tree and deployment out-of-band of Harbor deployment
- Advantages
- Harbor services protected from crashes/failures of adapters
- Independent release cycles
- No source license implications (or even requirement for open-source adapters)
- Development independent of Harbor processes, no burden on core maintainers
- Disadvantages
- Operational overhead for users to run more services as part of Harbor deployment
- Restricted API has limited data context from Harbor
- API management (versioning, etc.) is required, and Harbor maintainers cannot update external adapters to meet API changes
- Requires more security considerations because it presents another external attack surface
- Harbor defined data model for scan results
- Advantages:
- Common schema needed to support UI visualization
- Can be augmented with scanner-specific data later that could be presented raw to users in UI to avoid per-scanner UI work
- Simpler for non-admins to understand results without detailed knowledge of scanners
- Disadvantages
- Lowest common denominator for scanners
- Harbor limited to consuming only things it understands, so must keep up with scanners to add new capabilities
- <TODO: Preserving previous scan results?>
- <TODO: Existing Clair implementation vs require new Adapter for Clair? Upgrade implications>
The components overview in the Harbor web console shows a summary of OS packages and their respective vulnerability severities. In particular, it shows the number of OS packages with no vulnerabilities.
The components overview will no longer display the number of non-vulnerable OS packages, because some scanners might not provide that information in a scan report.
Currently, the Harbor web console shows the update timestamp of the vulnerability database used by Clair.
The update timestamp will instead be displayed along with the Scanner Registration properties and metadata.
- The API follows semantic versioning.
- API URLs are versioned by major number changes such as v1, v2, and v3. This number scheme signifies breaking changes to the API.
- API requests and responses use custom content MIME types that include the version, e.g. application/vnd.scanner.adapter.scan.request+json; version=1.0
- A property added to the ScanRequest v1 with MIME type application/vnd.scanner.adapter.scan.request+json; version=1.0 increments a point version to application/vnd.scanner.adapter.scan.request+json; version=1.1.
- Functionality added to the v1 API increments a point version, e.g. feature additions can increment API version to v1.1.
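As a sketch of how the versioned MIME types above might be checked, the following parses the `version` parameter and applies one possible compatibility rule. The rule is an assumption layered on the semantic-versioning statement, not something this proposal specifies: the media type and major version must match, and the requested minor version must not exceed the supported one. `parseVersion` and `compatible` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"mime"
	"strconv"
	"strings"
)

// parseVersion extracts the media type and the major and minor numbers of
// the "version" parameter from a MIME type such as
// "application/vnd.scanner.adapter.scan.request+json; version=1.0".
func parseVersion(mimeType string) (string, int, int, error) {
	mediaType, params, err := mime.ParseMediaType(mimeType)
	if err != nil {
		return "", 0, 0, err
	}
	majorStr, minorStr, ok := strings.Cut(params["version"], ".")
	if !ok {
		return "", 0, 0, fmt.Errorf("missing or malformed version in %q", mimeType)
	}
	major, err := strconv.Atoi(majorStr)
	if err != nil {
		return "", 0, 0, fmt.Errorf("invalid major version in %q: %w", mimeType, err)
	}
	minor, err := strconv.Atoi(minorStr)
	if err != nil {
		return "", 0, 0, fmt.Errorf("invalid minor version in %q: %w", mimeType, err)
	}
	return mediaType, major, minor, nil
}

// compatible applies the assumed rule: same media type, same major version,
// and requested minor version no newer than the supported one.
func compatible(requested, supported string) (bool, error) {
	reqType, reqMajor, reqMinor, err := parseVersion(requested)
	if err != nil {
		return false, err
	}
	supType, supMajor, supMinor, err := parseVersion(supported)
	if err != nil {
		return false, err
	}
	return reqType == supType && reqMajor == supMajor && reqMinor <= supMinor, nil
}

func main() {
	const v10 = "application/vnd.scanner.adapter.scan.request+json; version=1.0"
	const v11 = "application/vnd.scanner.adapter.scan.request+json; version=1.1"
	ok, err := compatible(v10, v11)
	fmt.Println(ok, err)
}
```

Under this rule a peer that supports version 1.1 accepts a 1.0 payload, while a 1.0 peer rejects a 1.1 payload that may carry properties it does not know.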
[A description of the steps in the implementation, who will do them, and when.]
- Adapter selection and configuration framework
- Adapter client implementation
- Anchore Engine Adapter
- Aqua MicroScanner Adapter
- Clair Adapter (?)