Apigee Hybrid consists of a control plane and a runtime plane. The control plane is hosted on GCP and is securely accessible over the internet. The runtime plane can be hosted on any supported Kubernetes platform on GCP, AWS, or Azure, or on the customer's premises (using Anthos, RKE, or OpenShift). The runtime plane needs consistent connectivity to the control plane. This blog describes the impact on Apigee hybrid runtime components when connectivity to the Apigee hybrid control plane is lost.
The runtime plane communicates with the control plane over the internet. The table below lists the GCP URLs that each hybrid runtime component uses for this communication:
| Apigee Hybrid Component | GCP URL Accessed |
| --- | --- |
| Ingress | NA |
| Message Processor | NA |
| Synchronizer | |
| UDCA (Analytics) | |
| Apigee Connect | |
| Prometheus (metrics) | |
| fluentd (logging) | |
| MART | |
| Message Processor (Optional) | |
| Watcher | |
The following image shows the ports used for external communications with the hybrid runtime plane:
The connection to the control plane may be lost for various reasons, including:
Synchronizer retrieves data from the control plane and stores it in Cassandra, which is a shared backend used by all synchronizers. After data replication in Cassandra, a zip file is created locally for use by message processors.
In a multi-region setup, Cassandra is shared by all synchronizers. To prevent redundant data downloads, only the first synchronizer to poll and discover that the data is unavailable locally will retrieve it from the control plane. Subsequent synchronizers will then download the data from Cassandra.
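The fetch-once behavior described above can be sketched roughly as follows. This is a minimal in-memory illustration, not Apigee's implementation: `SharedStore` is a hypothetical stand-in for Cassandra, `fetch_from_control_plane` is a hypothetical callback, and keying the data by a version string is an assumption made for the example.

```python
from typing import Callable, Dict, Optional


class SharedStore:
    """Hypothetical stand-in for the Cassandra backend shared by all synchronizers."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, version: str) -> Optional[bytes]:
        return self._data.get(version)

    def put(self, version: str, payload: bytes) -> None:
        self._data[version] = payload


class Synchronizer:
    """Illustrative synchronizer that only calls the control plane on a store miss."""

    def __init__(
        self,
        name: str,
        store: SharedStore,
        fetch_from_control_plane: Callable[[str], bytes],
    ) -> None:
        self.name = name
        self.store = store
        self.fetch = fetch_from_control_plane
        self.control_plane_calls = 0

    def sync(self, version: str) -> bytes:
        payload = self.store.get(version)
        if payload is None:
            # First synchronizer to notice the gap downloads and shares the data.
            payload = self.fetch(version)
            self.control_plane_calls += 1
            self.store.put(version, payload)
        return payload


store = SharedStore()
fetch = lambda version: f"config-{version}".encode()  # fake control plane response
region_a = Synchronizer("region-a", store, fetch)
region_b = Synchronizer("region-b", store, fetch)

first = region_a.sync("v1")   # miss: goes to the (fake) control plane
second = region_b.sync("v1")  # hit: served from the shared store
```

In this sketch only `region_a` contacts the control plane. A real deployment would also need to handle two synchronizers polling at the same moment, which the example leaves out.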
The configuration data downloaded by the Synchronizer includes:
{"level":"SEVERE","thread":"NIOThread@1","mdc":{},"className":"com.apigee.probe.ProbeAPI","method":"getResponse","severity":"SEVERE","message":"probe failed with details ProbeStatusResponse{isProbeSuccessful=false, failureMessages=[Probe ControlPlaneErrorMonitor failed due to Error in connecting to control plane more than 5 times consecutively.]}","formattedDate":"2024-05-15T05:54:17.724Z","logger":"ProbeAPI"}
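Because the synchronizer emits structured JSON logs, a failed control plane probe like the one above can be spotted programmatically. The helper below is a small sketch that assumes one JSON object per log line and matches on the `logger` and `message` fields seen in the sample entry; the function name is hypothetical and not part of any Apigee tooling.

```python
import json


def control_plane_probe_failed(log_line: str) -> bool:
    """Return True if a log line reports a failed ControlPlaneErrorMonitor probe."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return False  # not a JSON log line
    if not isinstance(entry, dict):
        return False
    return (
        entry.get("logger") == "ProbeAPI"
        and "ControlPlaneErrorMonitor failed" in entry.get("message", "")
    )


# Sample entry rebuilt from the fields of the log line shown above.
sample_line = json.dumps(
    {
        "level": "SEVERE",
        "className": "com.apigee.probe.ProbeAPI",
        "severity": "SEVERE",
        "message": (
            "probe failed with details ProbeStatusResponse{isProbeSuccessful=false, "
            "failureMessages=[Probe ControlPlaneErrorMonitor failed due to Error in "
            "connecting to control plane more than 5 times consecutively.]}"
        ),
        "formattedDate": "2024-05-15T05:54:17.724Z",
        "logger": "ProbeAPI",
    }
)

probe_failed = control_plane_probe_failed(sample_line)
```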
The apigee-runtime processes incoming API requests, executes policies, and forwards requests to the appropriate target services. To carry out these tasks, the runtime interacts with the synchronizer and Cassandra.
The apigee-runtime continuously polls the synchronizer to get the latest configuration containing proxies, resources, target servers, and related entities, such as trace data and encryption keys. The runtime data is stored in the Cassandra database.
The apigee-runtime configuration is set at the environment level, and each environment has one or more apigee-runtime pods depending on the number of replicas.
{"level":"SEVERE","thread":"Apigee-Timer-1","mdc":{"action":"RUNTIME-SYNC","env":"test1","org":"apigee-hybrid-378710"},"className":"com.apigee.hybrid.runtime.contract.load.sync.context.HttpContractDownloader","method":"lambda$download$0","severity":"SEVERE","message":"Failed to get version. Cause: Not Found [CONTEXT ratelimit_period=\"10 MINUTES [skipped: 13]\" ]","formattedDate":"2024-05-15T06:02:36.977Z","logger":"API-SECURITY-CONTRACT-REPLICATION"}
{"level":"SEVERE","thread":"Apigee-Timer-1","mdc":{},"className":"com.apigee.threadpool.PollTask","method":"runTask","severity":"SEVERE","message":"Error during refresh [CONTEXT ratelimit_period=\"10 MINUTES [skipped: 13]\" ]","formattedDate":"2024-05-15T06:02:36.977Z","logger":"API-SECURITY-CONTRACT-REPLICATION","exceptionStackTrace":"com.apigee.hybrid.runtime.contract.replication.DownloadException{ code = sync.replicators.DownloadError, message = Error downloading Version zip file : cause Not Found, associated contexts = []}\n"}
The Universal Data Collection Agent (UDCA) is a service running in the runtime plane that extracts analytics, debug, and deployment status data and sends it to the UAP (Unified Analytics Platform) on GCP.
{"level":"error","ts":1715752630.1903248,"caller":"log/logger.go:85","msg":"Encountered http error while uploading file \"api.xxxx-xxxx.test1.MP-UDCA-CHANNEL_0\". Details: http error received with code = xxx for service = \"DATALOCATION\" with message = \"unable to generate signed url. details: {\\n \\\"error\\\": {\\n \\\"code\\\": xxx,\\n \\\"message\\\": \\\"xxx \\\\\\\"organizations/xxxx/environments/test1/datalocation\\\\\\\" (or it may not exist)\\\",\\n \\\"status\\\": \\\"xxx\\\"\\n }\\n}\\n\"","stacktrace":"edge-internal.git.corp.google.com/uap/aau/log.Errorf\n\t/go/src/edge-internal/uap/aau/log/logger.go:85\nedge-internal.git.corp.google.com/uap/aau/handler.(*handler).HandleError\n\t/go/src/edge-internal/uap/aau/handler/handler.go:132\nedge-internal.git.corp.google.com/uap/aau/handler.(*handler).Run\n\t/go/src/edge-internal/uap/aau/handler/handler.go:72"}
{"level":"info","ts":1715752630.1903994,"caller":"log/logger.go:65","msg":"Reverting file \"/opt/apigee/data/api/staging/1715471290207.api.xxxxx.test1.65fef1f5-baa7-47f7-966e-f4cb5b0c86f1_0.gz\" to original name \"/opt/apigee/data/api/api.xxxxx.test1.MP-UDCA-CHANNEL_0.gz\""}
{"level":"info","ts":1715752630.1903994,"caller":"log/logger.go:65","msg":"Updating retry count for file \"api.apigee-xx-xx.test1.MP-UDCA-CHANNEL_0\" to 1"}
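The log entries above suggest a stage, upload, and revert cycle: the data file is moved to a staging location, the upload fails, the file is restored to its original name, and a retry counter is incremented. The sketch below imitates that cycle under stated assumptions. The function and variable names are hypothetical, the staged file keeps its original base name (UDCA appears to rename it with a timestamp and ID), and deleting the staged file after a successful upload is a guess rather than documented behavior.

```python
import os
import tempfile
from typing import Callable, Dict


def try_upload(
    path: str,
    staging_dir: str,
    upload: Callable[[str], None],
    retry_counts: Dict[str, int],
) -> bool:
    """Stage a file, attempt the upload, and revert the file if the upload fails."""
    staged = os.path.join(staging_dir, os.path.basename(path))
    os.rename(path, staged)  # move the file into staging before uploading
    try:
        upload(staged)
    except OSError:
        # Upload failed (for example, control plane unreachable): put the file
        # back under its original name and record the retry.
        os.rename(staged, path)
        retry_counts[path] = retry_counts.get(path, 0) + 1
        return False
    os.remove(staged)  # assumption: staged data is discarded once uploaded
    return True


def offline_upload(_path: str) -> None:
    """Fake uploader that simulates lost connectivity."""
    raise ConnectionError("control plane unreachable")


with tempfile.TemporaryDirectory() as root:
    staging = os.path.join(root, "staging")
    os.mkdir(staging)
    data_file = os.path.join(root, "analytics_0.gz")
    with open(data_file, "wb") as handle:
        handle.write(b"payload")

    counts: Dict[str, int] = {}
    ok = try_upload(data_file, staging, offline_upload, counts)
    still_there = os.path.exists(data_file)
    retries = counts[data_file]
```

While the connection is down, the file stays on local disk and the retry count grows, which matches the pattern visible in the UDCA log.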
All Apigee hybrid components deployed on Kubernetes clusters expose an HTTP/HTTPS Prometheus endpoint that the Apigee metrics pods can scrape. Each application publishes its metrics in OpenCensus format and should support either one-way TLS or mTLS.
Apigee uses the OpenTelemetry Collector to scrape metrics from the Kubernetes pods over HTTP(S) and sends customer-facing metrics to the customer project.
The OpenTelemetry Collector sends the following customer-facing metrics:
| Monitored Resource | Metric Name |
| --- | --- |
| apigee.googleapis.com/Proxy | apigee.googleapis.com/proxy/request_count |
| apigee.googleapis.com/Proxy | apigee.googleapis.com/proxy/response_count |
| apigee.googleapis.com/Proxy | apigee.googleapis.com/proxy/latencies |
| apigee.googleapis.com/Target | apigee.googleapis.com/target/request_count |
| apigee.googleapis.com/Target | apigee.googleapis.com/target/response_count |
| apigee.googleapis.com/Target | apigee.googleapis.com/target/latencies |
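To check whether these metrics are still arriving in the customer project, each resource and metric pair can be turned into a Cloud Monitoring filter string. The snippet below is a small sketch that only builds the filter text; the helper name is hypothetical, and how the filter is then used (Metrics Explorer, an alerting policy, or an API call) is left open.

```python
from typing import List, Tuple

# Monitored resource and metric type pairs taken from the table above.
APIGEE_METRICS: List[Tuple[str, str]] = [
    ("apigee.googleapis.com/Proxy", "apigee.googleapis.com/proxy/request_count"),
    ("apigee.googleapis.com/Proxy", "apigee.googleapis.com/proxy/response_count"),
    ("apigee.googleapis.com/Proxy", "apigee.googleapis.com/proxy/latencies"),
    ("apigee.googleapis.com/Target", "apigee.googleapis.com/target/request_count"),
    ("apigee.googleapis.com/Target", "apigee.googleapis.com/target/response_count"),
    ("apigee.googleapis.com/Target", "apigee.googleapis.com/target/latencies"),
]


def monitoring_filter(resource_type: str, metric_type: str) -> str:
    """Build a Cloud Monitoring filter for one monitored resource and metric."""
    return f'metric.type = "{metric_type}" AND resource.type = "{resource_type}"'


filters = [monitoring_filter(resource, metric) for resource, metric in APIGEE_METRICS]
```

A gap in the time series returned for one of these filters during an outage window is one way to see the effect of lost connectivity on metrics delivery.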
Apigee logging pods are deployed as a DaemonSet across the Kubernetes clusters. The DaemonSet contains one fluentd container augmented with plugins for SD storage, Prometheus, and tailing files. The container tails log files from the container logs directory for all containers with the apigee-* prefix.
The logs are then pushed to GCP Logging.
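The apigee-* selection can be pictured as a simple glob match over log file names. The snippet below is only an illustration in Python, not fluentd configuration, and the file names in it are made up for the example.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_apigee_logs(file_names: Iterable[str], pattern: str = "apigee-*") -> List[str]:
    """Keep only the log files whose names match the apigee-* prefix pattern."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical file names, used only to demonstrate the matching.
candidates = [
    "apigee-runtime-example.log",
    "apigee-synchronizer-example.log",
    "kube-proxy-example.log",
]

selected = select_apigee_logs(candidates)
```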
Data that belongs to your Apigee organization and is accessed during runtime API calls is stored by Cassandra in the runtime plane.
This data includes:
To access and update that data (for example, to add a new KVM or to remove an environment), you can use the Apigee hybrid UI or the Apigee APIs.
The MART server (Management API for Runtime data) processes the API calls against the runtime datastore.
With Apigee Connect, the Apigee hybrid management plane can securely connect to the MART service in the runtime plane, eliminating the need to expose the MART endpoint on the internet.
Watcher periodically executes tasks in the Apigee runtime Kubernetes cluster. Tasks currently performed by Watcher:
Apigee Hybrid is built for resilience, not isolation. While it can tolerate temporary disruptions, extended offline periods may impair its core functionality, especially the ability to deploy new or modified proxies. This highlights the need for robust network infrastructure and a focus on maintaining consistent connectivity. By understanding these details, organizations can better plan for and mitigate the risks associated with network disruptions.