# Development guide for Remote Development

The `remote_development` module in the GitLab Agent manages the lifecycle of workspace resources in
Kubernetes clusters. Understanding how the module's components work together helps you develop and
troubleshoot workspace functionality.

The module consists of high-level architectural components that control the reconciliation process
between the GitLab Agent and the Rails service.

## High-level components

### Reconciler

The reconciler contains the core business logic for the Remote Development module. It's defined in
`reconciler.go` and implements the primary `Run` function that encompasses all tasks in one
reconciliation cycle between the GitLab Agent and the Rails service.

`Run` is a blocking call that returns when the reconciliation cycle finishes.
To run another reconciliation cycle, invoke `Run` again on the same reconciler or on a new one.

### Worker

The worker manages when different types of reconciliation cycles execute.
The reconciler handles the core logic, while the worker determines the timing.

Two types of reconciliation cycles exist:

- Partial reconciliation: Reuses application state in a pre-existing reconciler instance for the next reconciliation cycle.
  Partial reconciliation occurs frequently to keep data synchronized between the GitLab Agent and Rails service.

- Full reconciliation: Creates a new reconciler instance for the next reconciliation cycle.
  Full reconciliation completely rebuilds reconciler state and freshly synchronizes all workspace metadata for the associated cluster.
  Full reconciliation is considered expensive and occurs periodically but less frequently than partial reconciliation.

The worker coordinates execution of both reconciliation types based on their timing settings.
The main entrypoint is the `Run(context.Context)` method. When invoked, `Run` blocks while scheduling
and executing reconciliation cycles in the same goroutine.
The invocation returns when the worker stops, which happens when you cancel the context passed to `Run`.
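
The scheduling described above can be sketched as follows. This is a minimal illustration, not the actual worker code: the `reconciler` and `worker` types, their fields, the `runFor` helper, and the interval values are all hypothetical, and the sketch assumes that the first cycle is a full reconciliation.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// reconciler is a hypothetical stand-in for the real reconciler.
type reconciler struct {
	cycles int // cycles executed by this instance (its retained state)
}

// Run executes one blocking reconciliation cycle.
func (r *reconciler) Run(ctx context.Context) error {
	r.cycles++
	return ctx.Err()
}

// worker decides when each type of reconciliation cycle runs.
type worker struct {
	partialInterval time.Duration
	fullInterval    time.Duration
	full, partial   int // counters kept only for illustration
}

// Run blocks and executes cycles in the calling goroutine until ctx is cancelled.
// Errors from the reconciler are ignored here to keep the sketch short.
func (w *worker) Run(ctx context.Context) {
	partialTicker := time.NewTicker(w.partialInterval)
	defer partialTicker.Stop()
	fullTicker := time.NewTicker(w.fullInterval)
	defer fullTicker.Stop()

	// Assumption: the first cycle is a full reconciliation with a fresh reconciler.
	r := &reconciler{}
	w.full++
	_ = r.Run(ctx)

	for {
		select {
		case <-ctx.Done():
			return
		case <-fullTicker.C:
			r = &reconciler{} // full: discard the existing state
			w.full++
			_ = r.Run(ctx)
		case <-partialTicker.C:
			w.partial++ // partial: reuse the existing instance
			_ = r.Run(ctx)
		}
	}
}

// runFor runs a worker for ms milliseconds and reports how many cycles ran.
func runFor(ms int) (full, partial int) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel()
	w := &worker{partialInterval: 10 * time.Millisecond, fullInterval: 40 * time.Millisecond}
	w.Run(ctx)
	return w.full, w.partial
}

func main() {
	full, partial := runFor(100)
	fmt.Printf("full=%d partial=%d\n", full, partial)
}
```

Because both cycle types run in the same goroutine, a partial and a full reconciliation never overlap in this sketch.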

### Module entrypoint

The module entrypoint in `module.go` maintains consistency with other GitLab Agent modules.
It monitors changes to the `remote_development` module in the agent configuration and starts the
worker to begin reconciliation when enabled. When the module is disabled, it stops the running worker.

The module could assimilate the worker's orchestration responsibilities, but these were kept distinct
for simplicity and clarity.
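
As a rough sketch of this start and stop behavior, the following example models configuration changes as a stream of boolean flags. The names `runningWorker`, `startWorker`, `runModule`, and `simulate` are hypothetical and do not come from `module.go`; the real module reads the agent configuration instead of a channel of flags.

```go
package main

import (
	"context"
	"fmt"
)

// runningWorker is a handle to a worker running in its own goroutine.
type runningWorker struct {
	cancel context.CancelFunc
	done   chan struct{}
}

// startWorker launches run with a cancellable context.
func startWorker(parent context.Context, run func(context.Context)) *runningWorker {
	ctx, cancel := context.WithCancel(parent)
	rw := &runningWorker{cancel: cancel, done: make(chan struct{})}
	go func() {
		defer close(rw.done)
		run(ctx)
	}()
	return rw
}

// stop cancels the worker's context and waits for the worker to return.
func (rw *runningWorker) stop() {
	rw.cancel()
	<-rw.done
}

// runModule starts the worker when the module is enabled and stops it when
// the module is disabled. It returns when ctx is cancelled or the channel closes.
func runModule(ctx context.Context, enabled <-chan bool, run func(context.Context)) {
	var current *runningWorker
	defer func() {
		if current != nil {
			current.stop()
		}
	}()
	for {
		select {
		case <-ctx.Done():
			return
		case on, ok := <-enabled:
			switch {
			case !ok:
				return
			case on && current == nil:
				current = startWorker(ctx, run)
			case !on && current != nil:
				current.stop()
				current = nil
			}
		}
	}
}

// simulate feeds a sequence of flags to runModule and counts worker starts and stops.
func simulate(flags []bool) (starts, stops int) {
	ch := make(chan bool, len(flags))
	for _, f := range flags {
		ch <- f
	}
	close(ch)
	runModule(context.Background(), ch, func(ctx context.Context) {
		starts++
		<-ctx.Done() // a real worker would run reconciliation cycles here
		stops++
	})
	return starts, stops
}

func main() {
	starts, stops := simulate([]bool{true, true, false, true})
	fmt.Println(starts, stops) // prints "2 2"
}
```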

## Reconciler internals

The reconciler contains the core business logic of the reconciliation process.
A reconciliation cycle begins when `reconciler.Run` is invoked and follows this sequence:

1. [Collect metadata to exchange with Rails service](#collect-metadata-to-exchange-with-rails-service)
1. [Send collected metadata to Rails service and receive updates](#send-collected-metadata-to-rails-service-and-receive-updates)
1. [Apply changes to the cluster based on received metadata](#apply-changes-to-the-cluster-based-on-received-metadata)

### Collect metadata to exchange with Rails service

During this step, the following information is collected for exchange with Rails:

#### Workspace updates

This information comes from two components: an informer and a `persistedStateTracker`.

- Informer: Internally subscribes to Kubernetes events of type `Deployments` labeled with the agent ID.
  This allows the informer to serve as the single source of truth for the history of events relevant
  to workspaces hosted on the cluster that were created by a specific agent.
  However, this data grows over time, and communication between the GitLab Agent and Rails must be
  optimized to prevent resending information already persisted in the Rails service.

- `persistedStateTracker`: An internal store that tracks information already persisted in Rails using
  the response received from the Rails service. By comparing the version of the Kubernetes resource available
  in the informer with the version in the `persistedStateTracker`, you can determine whether the information must be sent to Rails.

To collect information not yet persisted, all existing workspace data in the informer is compared
with workspace data in the `persistedStateTracker`. If the version in the `persistedStateTracker` doesn't
match the version returned by the informer, you can safely assume it corresponds to an update not yet
shared with the Rails instance.
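
The comparison can be illustrated with a small sketch. The `workspace` type, the map-based `persistedStateTracker`, and the `unpersisted` function are simplified assumptions made for this example and are not the actual data structures.

```go
package main

import "fmt"

// workspace is a hypothetical, simplified view of what the informer returns.
type workspace struct {
	Name            string
	ResourceVersion string
}

// persistedStateTracker maps a workspace name to the resource version
// last acknowledged by Rails (an assumed representation).
type persistedStateTracker map[string]string

// unpersisted returns the workspaces whose version in the informer is missing
// from the tracker or differs from the tracked version.
func unpersisted(informer []workspace, tracker persistedStateTracker) []workspace {
	var out []workspace
	for _, ws := range informer {
		if v, ok := tracker[ws.Name]; !ok || v != ws.ResourceVersion {
			out = append(out, ws)
		}
	}
	return out
}

func main() {
	informer := []workspace{
		{Name: "ws-a", ResourceVersion: "101"}, // already persisted
		{Name: "ws-b", ResourceVersion: "205"}, // changed since last persisted
		{Name: "ws-c", ResourceVersion: "7"},   // never persisted
	}
	tracker := persistedStateTracker{"ws-a": "101", "ws-b": "200"}
	for _, ws := range unpersisted(informer, tracker) {
		fmt.Println(ws.Name, ws.ResourceVersion)
	}
}
```

In this sketch, only `ws-b` and `ws-c` are selected for the next request to Rails.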

#### Termination progress for relevant workspaces

When a workspace terminates successfully, it's removed from the informer, so the informer serves as
the source of truth for workspaces no longer present in the cluster.
However, the informer alone isn't sufficient because:

- The informer doesn't track workspaces in the middle of termination.
- The informer doesn't track workspaces for which termination was requested.

For these reasons, the reconciler uses another component called `terminationTracker`. When the agent
receives a request to terminate a workspace, it saves the request in `terminationTracker` along with
termination progress (initially `Terminating`). The `terminationTracker` is essentially a key-value
store using workspace identification details as keys and progress information as values.

During each subsequent reconciliation, workspaces tracked by `terminationTracker` are compared with
workspaces returned by the informer:

- If a workspace exists in both the informer and `terminationTracker`, the workspace is in the middle
  of termination.
- If a workspace exists only in the `terminationTracker`, the workspace terminated successfully and
  no longer exists in the cluster.

The `terminationTracker` also tracks whether the termination progress of a workspace was persisted in Rails.

Using the `terminationTracker` and informer, you can collect termination progress information for
workspaces not yet persisted in Rails.
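
The following sketch shows one way to express this comparison. The `terminationEntry` type, the map-based tracker, and the `collect` method are assumptions made for illustration; only the `Terminating` and `Terminated` progress values come from the description above.

```go
package main

import "fmt"

type terminationProgress string

const (
	terminating terminationProgress = "Terminating"
	terminated  terminationProgress = "Terminated"
)

// terminationEntry holds the tracked progress for one workspace.
type terminationEntry struct {
	progress  terminationProgress
	persisted bool // whether this progress value was already persisted in Rails
}

// terminationTracker is keyed by workspace name (an assumed identifier).
type terminationTracker map[string]*terminationEntry

// collect compares tracked workspaces with the workspaces known to the informer
// and returns the progress values that still have to be sent to Rails.
func (t terminationTracker) collect(inInformer map[string]bool) map[string]terminationProgress {
	out := make(map[string]terminationProgress)
	for name, entry := range t {
		current := terminated // only in the tracker: no longer in the cluster
		if inInformer[name] {
			current = terminating // in both: termination is in progress
		}
		if current != entry.progress {
			entry.progress = current
			entry.persisted = false // new progress, not yet acknowledged by Rails
		}
		if !entry.persisted {
			out[name] = current
		}
	}
	return out
}

func main() {
	tracker := terminationTracker{
		"ws-a": {progress: terminating, persisted: true},  // still terminating, already sent
		"ws-b": {progress: terminating, persisted: true},  // gone from the cluster
		"ws-c": {progress: terminating, persisted: false}, // termination just requested
	}
	toSend := tracker.collect(map[string]bool{"ws-a": true, "ws-c": true})
	fmt.Println(len(toSend), toSend["ws-b"], toSend["ws-c"]) // prints "2 Terminated Terminating"
}
```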

{{< alert type="note" >}}

Even if a workspace exceeds its lifetime or time-to-stop, the current Rails implementation updates the
desired state to Terminated/Stopped only when it receives information from the agent for the corresponding workspace.
If there's no change in the workspace's current status in Kubernetes, no information is sent to Rails.
However, full reconciliation runs every hour and reconstructs the tracker, which sends information
about all workspaces to Rails, so this approach works in practice.

{{< /alert >}}

#### Errors encountered when managing workspaces

Managing workspaces in a Kubernetes cluster involves operations during reconciliation that might fail.
In some cases, errors are returned asynchronously, such as when applying Kubernetes manifests to a cluster.
To capture and return errors consistently, the reconciler uses the `errorDetailsTracker`.

The `errorDetailsTracker` is a key-value store that uses workspace identifiers as keys and error details as values.
Error details might be supplied asynchronously through channels.

During the first phase of reconciliation, a snapshot of the error details captured so far is taken
to prepare the list of error details to send to the Rails service.
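
A minimal sketch of such a tracker is shown below, assuming a mutex-protected map. The `errorDetails` type and the `watch` and `snapshot` methods are hypothetical names chosen for this example. The snapshot is a copy, so errors that arrive later do not change a request that is already being prepared.

```go
package main

import (
	"fmt"
	"sync"
)

// errorDetails is a hypothetical, simplified error payload.
type errorDetails struct {
	Message string
}

// errorDetailsTracker stores the latest error details per workspace.
type errorDetailsTracker struct {
	mu    sync.Mutex
	store map[string]errorDetails
}

func newErrorDetailsTracker() *errorDetailsTracker {
	return &errorDetailsTracker{store: make(map[string]errorDetails)}
}

// watch records the first value received on ch for the given workspace.
// The returned channel is closed when watching ends.
func (t *errorDetailsTracker) watch(workspace string, ch <-chan errorDetails) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		if d, ok := <-ch; ok {
			t.mu.Lock()
			t.store[workspace] = d
			t.mu.Unlock()
		}
	}()
	return done
}

// snapshot returns a copy of the error details captured so far.
func (t *errorDetailsTracker) snapshot() map[string]errorDetails {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := make(map[string]errorDetails, len(t.store))
	for k, v := range t.store {
		out[k] = v
	}
	return out
}

func main() {
	tracker := newErrorDetailsTracker()
	ch := make(chan errorDetails, 1)
	done := tracker.watch("ws-a", ch)
	ch <- errorDetails{Message: "failed to apply manifest"}
	<-done
	snap := tracker.snapshot()
	fmt.Println(len(snap), snap["ws-a"].Message)
}
```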

### Send collected metadata to Rails service and receive updates

After preparing the Rails request using information from the previous step, the reconciler calls
the `reconcile` API in the Rails service.

If the API call succeeds, all exchanged information is persisted in Rails. The API response might
contain data for additional actions carried out in the Rails service that must be reflected in the
cluster, such as creating, stopping, or terminating a workspace.

The API call also returns settings that affect the reconciliation frequency of both full and
partial reconciliation. These settings are returned at the end of reconciliation so the worker can
adjust reconciliation frequency if required.

If the API call fails for any reason, the reconciliation process stops immediately and returns an
error to the worker.
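
The overall flow of one cycle, including the early return on failure, can be sketched as follows. All types and field names here (`settings`, `request`, `response`, `railsAPI`, `fakeRails`) are assumptions for illustration, the interval values are placeholders, and context handling is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// settings carries the reconciliation intervals returned by Rails
// (field names are assumptions).
type settings struct {
	PartialIntervalSeconds int
	FullIntervalSeconds    int
}

type request struct {
	WorkspaceUpdates []string
}

type response struct {
	Workspaces []string // workspaces that need changes in the cluster
	Settings   settings
}

// railsAPI abstracts the call to the reconcile API.
type railsAPI interface {
	reconcile(req request) (response, error)
}

// fakeRails is a test double that either fails or echoes the request.
type fakeRails struct{ fail bool }

func (f fakeRails) reconcile(req request) (response, error) {
	if f.fail {
		return response{}, errors.New("rails unavailable")
	}
	return response{
		Workspaces: req.WorkspaceUpdates,
		Settings:   settings{PartialIntervalSeconds: 10, FullIntervalSeconds: 3600},
	}, nil
}

type reconciler struct {
	rails   railsAPI
	pending []string // metadata collected for this cycle
	applied []string // workspaces changed in the cluster
}

// Run performs one cycle and returns the settings for the worker.
func (r *reconciler) Run() (settings, error) {
	req := request{WorkspaceUpdates: r.pending} // 1. collect metadata
	resp, err := r.rails.reconcile(req)         // 2. exchange with Rails
	if err != nil {
		// Stop immediately: nothing is applied to the cluster.
		return settings{}, fmt.Errorf("reconcile API call: %w", err)
	}
	r.applied = append(r.applied, resp.Workspaces...) // 3. apply changes
	return resp.Settings, nil
}

func main() {
	r := &reconciler{rails: fakeRails{}, pending: []string{"ws-a"}}
	cfg, err := r.Run()
	fmt.Println(cfg.FullIntervalSeconds, err)

	failing := &reconciler{rails: fakeRails{fail: true}, pending: []string{"ws-a"}}
	_, err = failing.Run()
	fmt.Println(err)
}
```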

### Apply changes to the cluster based on received metadata

Based on information received from the Rails service, the reconciler performs these operations:

#### Update the various trackers

- `persistedStateTracker`: Updated with the latest version persisted in Rails.
  This ensures the same information isn't repeatedly published to Rails after persistence.
- `errorDetailsTracker`: If error details were published to Rails successfully, they're evicted from the tracker.
- `terminationTracker`:
  - If the actual state of a workspace persisted in Rails is `Terminated`, the entry is evicted from
    both the `terminationTracker` and `persistedStateTracker`, as no further tracking is needed.
  - If the actual state isn't `Terminated` but the intent is to terminate the workspace, an entry is
    created in the `terminationTracker` with progress set to `Terminating` (if the workspace exists in the cluster)
    or `Terminated` (if the workspace no longer exists).
  - If the workspace exists in the cluster and the `Terminating` status was already persisted in Rails,
    the tracker is updated to prevent sending this information again in future partial reconciliations.

#### Update workspace resources in the cluster

If the desired state of the workspace is `Terminated`, the agent checks whether workspace resources
still exist, deletes them if required, and makes the necessary changes to the various trackers.

For any other desired state, the agent attempts to apply the Kubernetes manifests received from the
Rails service to the cluster. Then, the `errorDetailsTracker` is invoked to watch for asynchronous
errors that might result from these operations.

When tracking errors, you must provide a version to the tracker so it only tracks errors for the latest
operation performed on a workspace. This is useful when consecutive partial reconciliation cycles
perform different operations on a workspace, and it makes sense to return errors only for the latest operation.
The version must be monotonically increasing (the current implementation uses an atomic counter) to
ensure that errors for only the latest operation are tracked and eventually sent to the Rails instance.
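
The version check can be sketched as follows. The `versionedErrorTracker` type and its `begin`, `report`, and `get` methods are hypothetical; the only detail taken from the description above is the use of an atomic counter to produce monotonically increasing versions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// operationCounter backs nextVersion. An atomic counter yields strictly
// increasing versions even when called from several goroutines.
var operationCounter uint64

func nextVersion() uint64 { return atomic.AddUint64(&operationCounter, 1) }

// versionedErrorTracker keeps errors only for the latest operation per workspace.
type versionedErrorTracker struct {
	mu      sync.Mutex
	latest  map[string]uint64 // latest operation version per workspace
	details map[string]string // error message per workspace
}

func newVersionedErrorTracker() *versionedErrorTracker {
	return &versionedErrorTracker{
		latest:  make(map[string]uint64),
		details: make(map[string]string),
	}
}

// begin registers a new operation and drops errors from older operations.
func (t *versionedErrorTracker) begin(workspace string, version uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if version > t.latest[workspace] {
		t.latest[workspace] = version
		delete(t.details, workspace)
	}
}

// report stores msg only if version belongs to the latest operation.
// It returns false when the error is stale and was ignored.
func (t *versionedErrorTracker) report(workspace string, version uint64, msg string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if version != t.latest[workspace] {
		return false
	}
	t.details[workspace] = msg
	return true
}

// get returns the tracked error for a workspace, if any.
func (t *versionedErrorTracker) get(workspace string) (string, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	msg, ok := t.details[workspace]
	return msg, ok
}

func main() {
	tracker := newVersionedErrorTracker()
	first := nextVersion()
	tracker.begin("ws-a", first)
	second := nextVersion()
	tracker.begin("ws-a", second)

	fmt.Println(tracker.report("ws-a", first, "stale error"))   // false
	fmt.Println(tracker.report("ws-a", second, "apply failed")) // true
}
```

An error that arrives late from the first operation is ignored because a newer operation was already registered for the same workspace.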
