# Graph API design

## Abstract

Kubernetes objects can reference each other.
References can be one to one or one to many (e.g. via label selectors).
These relationships can be represented in a graph.
We need an API that allows loading and querying the graph.
Our first use case is a [dashboard](https://gitlab.com/groups/gitlab-org/-/epics/13963).

## Requirements

Functional:

- Load the graph of objects.
- Watch the graph for changes in real time. We want a dashboard that is updated as the changes happen in the cluster.
- Support both namespace-scoped and cluster-scoped objects.
- Support known (e.g. built-in and Flux) and unknown objects (e.g. defined via CRDs or future built-in).
- Support various ways for objects to refer to each other:
  - Field with referred object name. E.g. a `Deployment` referring to a `ConfigMap`.
  - Label selector. In practice label selectors are often used together with controller
    [owner references](https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/)
    to resolve situations where multiple objects use overlapping selectors.
  - Owner references. Apart from the above, owner references can be used for
    [deletion ordering](https://kubernetes.io/docs/concepts/architecture/garbage-collection/#foreground-deletion).
    They have extra flags: `controller` and `blockOwnerDeletion`.
  - List references. Some objects (e.g. Flux `Kustomization` or `HelmRelease`'s `Secret`) store the list of
    managed objects in their `data` or `status` fields.

Non-functional:

- Watch implementation should be in agentk.
  We'll use Kubernetes watch API to watch objects in the cluster to deliver real time updates to the graph.
  A robust way to use the watch API is via informers.
  Informers can consume a lot of RAM since they load all objects in a namespace or even cluster-wide (label selectors can narrow this down).
  Because of that, we cannot implement this in kas as the API will need to handle an unknown and potentially large
  number of objects.
  An extra benefit of implementing the functionality in agentk is that it can filter objects out relatively cheaply,
  without sending anything to kas.
- Minimize RAM usage. For the same reasons as above, the implementation should minimize RAM usage.
  We don't know how much RAM agentk has available, and we want to minimize the chance of running out of RAM.
- Flexible API.
  It's impractical to make significant changes to an implementation in agentk since users don't update very often.
  Hence, since a considerable chunk of functionality will be in agentk,
  it should be made more flexible from the get-go so that it's somewhat future-proof.
- Kubernetes is an eventually consistent system. It's possible for an object to refer to a
  non-existent object. It may have been deleted or never existed in the first place.
  This graph API will expose the information as-is. API clients need to handle this gracefully, including situations
  where the referred object didn't exist but was created later (or the opposite - existed and then was deleted).
- For security reasons, the contents of `Secret` objects are never returned.
- The solution should scale to 5,000 resources in a tree. Customers shared concerns that the tree might fail
  to render for their clusters given the size of it.

## Background information

### Groups, versions, resources

From [Kubernetes API terminology](https://kubernetes.io/docs/reference/using-api/api-concepts/#standard-api-terminology)
note `resource` and `kind`.
From [Resource URIs](https://kubernetes.io/docs/reference/using-api/api-concepts/#resource-uris)
note `group` and `version`.
We don't care about other things in this document.

In Kubernetes all resources are part of the following hierarchy.
Root of the hierarchy is an array of groups.
Each group has one or more versions.
This document refers to a particular version of a group as "group+version".
Each group+version contains an array of resources.
Note that in this structure it is possible that a resource may exist in group+version `v1` but be absent
in group+version `v2` (or vice versa).

```yaml
- mygroup1
  - v1
    - foos
    - abcs
  - v2beta1
    - bars
    - abcs
- mygroup2.mydomain.com
  - v2
    - foos
    - abcs
```

The `foos` resource exists in `mygroup1/v1` but not in `mygroup1/v2beta1`.
`bars` is the other way around.

The `foos` resource from `mygroup2.mydomain.com/v2` is not the same thing as the `foos` resource from `mygroup1/v1`.
They are completely separate resources.
Hence, the full "identifier" of a resource is group+version+resource (otherwise there is ambiguity).
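
The hierarchy above can be flattened into full identifiers. The following is a minimal sketch, not agentk code: the `GVR` class and the `versions_of` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class GVR:
    """Hypothetical full identifier of a resource: group + version + resource."""

    group: str
    version: str
    resource: str


# The example hierarchy from above, flattened into full identifiers.
discovered: Set[GVR] = {
    GVR("mygroup1", "v1", "foos"),
    GVR("mygroup1", "v1", "abcs"),
    GVR("mygroup1", "v2beta1", "bars"),
    GVR("mygroup1", "v2beta1", "abcs"),
    GVR("mygroup2.mydomain.com", "v2", "foos"),
    GVR("mygroup2.mydomain.com", "v2", "abcs"),
}


def versions_of(group: str, resource: str) -> List[str]:
    """Return the versions of `group` that contain `resource`, sorted by name."""
    return sorted(
        gvr.version
        for gvr in discovered
        if gvr.group == group and gvr.resource == resource
    )


# Same resource name, different groups: two unrelated resources.
same = GVR("mygroup1", "v1", "foos") == GVR("mygroup2.mydomain.com", "v2", "foos")
```

Because the group and version are part of the identity, the two `foos` entries never compare equal, which is the ambiguity the full identifier is meant to remove.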

There is a concept of co-habitation that is apparently not documented anywhere.
It's a way to move a resource to another group without breaking existing consumers.
Here is the
[list of co-habitating resources](https://github.com/kubernetes/kubernetes/blob/v1.30.3/pkg/kubeapiserver/default_storage_factory_builder.go#L129-L136)
in Kubernetes v1.30.3, for example.
This information is not exposed via discovery, so we need to be careful not to expose the same thing multiple times
to avoid any user confusion.
In practice this mechanism was used to get rid of the `extensions` and `events` groups.
We can avoid any problems if we always exclude these two groups.

### Selectors

[Label selectors](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors).

[Field selectors](https://kubernetes.io/docs/concepts/overview/working-with-objects/field-selectors/).

[JSONPath Support](https://kubernetes.io/docs/reference/kubectl/jsonpath/).

### CEL

[Introduction](https://github.com/google/cel-spec/blob/master/doc/intro.md).

[Language Definition](https://github.com/google/cel-spec/blob/master/doc/langdef.md).

## Proposal

Expose a new WebSocket API that lets the client configure which objects (group+version+resource) to watch, how to filter them, and what information to return.
Information about resources (graph vertices) and references among them (graph arcs aka directed edges) is returned as a stream of
WebSocket messages.

Note that the resources graph is a directed graph.
Vertices are connected by arcs.
A graph may have cycles and loops (an arc that connects a vertex to itself).
Two vertices may be connected by more than one arc (directly or indirectly), but in that case arcs will be of different types.
If a resource has multiple references of the same type to another resource, they are considered a single arc.

Types of arcs:

- Owner reference (`t=or`). E.g. a `Pod` has an owner reference to a `ReplicaSet`.
- Reference (`t=r`). E.g. a `Deployment` may mount a `ConfigMap`.
- Transitive reference (`t=t`). E.g. a `Deployment` can refer to a `Secret` but it doesn't directly use it.
  Hence, a transitive arc type is used.

A vertex may have multiple arcs of the same type, but they will be connected to distinct vertices.

Having loops is a pathological case but Kubernetes API allows for that.
For example, an owner reference to itself.
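
The rules above imply that an arc is identified by its source, destination and type. A minimal sketch of that model, assuming vertices are plain tuples and using a hypothetical `add_arc` helper (neither is part of the actual implementation):

```python
from typing import Set, Tuple

# A vertex is identified by (group, version, resource, namespace, name).
Vertex = Tuple[str, str, str, str, str]
# An arc is identified by (source, destination, type).
Arc = Tuple[Vertex, Vertex, str]

ARC_TYPES = ("or", "r", "t")

pod: Vertex = ("", "v1", "pods", "myns", "pod1")
replica_set: Vertex = ("apps", "v1", "replicasets", "myns", "rs1")
deployment: Vertex = ("apps", "v1", "deployments", "myns", "my-deployment")
config_map: Vertex = ("", "v1", "configmaps", "myns", "my-config")

arcs: Set[Arc] = set()


def add_arc(source: Vertex, destination: Vertex, arc_type: str) -> None:
    """Record an arc; repeated references of the same type collapse into one."""
    if arc_type not in ARC_TYPES:
        raise ValueError(f"unknown arc type: {arc_type}")
    arcs.add((source, destination, arc_type))


add_arc(deployment, config_map, "r")  # e.g. a volume mount
add_arc(deployment, config_map, "r")  # a second reference of the same type: still one arc
add_arc(deployment, config_map, "t")  # a different type: a second, distinct arc
add_arc(pod, replica_set, "or")       # owner reference
add_arc(pod, pod, "or")               # pathological loop: owner reference to itself
```

Using a set keyed by the full triple gives the deduplication for free and still allows loops and parallel arcs of different types.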

### Client request

The API path is `/graph` on the Kubernetes API proxy endpoint.
The endpoint accepts a WebSocket upgrade request on this path.
The WebSocket subprotocol `gitlab-agent-graph-api` is accepted here.

The first WebSocket message the client sends configures what information is requested.
No more messages are expected.
The server closes the connection if another message is received.
The message is a text message with JSON payload:

```json
{
  "queries": [
    {
      "exclude": {
        "resource_selector_expression": "group == '' && resource == 'pods'"
      }
    },
    {
      "include": {
        "resource_selector_expression": "group == 'apps' && version == 'v1' && resource == 'deployments'",
        "object": {
          "label_selector": "app=my-app",
          "field_selector": "",
          "object_selector_expression": "obj.status.ready == 'true'",
          "json_path": ""
        }
      }
    }
  ],
  "namespaces": {
    "names": ["my-ns"],
    "label_selector": "app=my-app",
    "field_selector": "",
    "object_selector_expression": ""
  },
  "roots": {
    "object_selector_expressions": [
      "group == 'apps' && resource == 'deployments' && namespace == 'myns' && name == 'my-deployment' && 'app' in labels"
    ],
    "ignore_arc_direction": ["or", "r", "t"]
  }
}
```
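
A client could assemble this message programmatically. The sketch below is illustrative only: `build_graph_request` is a hypothetical helper, and it assumes that each query holds exactly one of `include` or `exclude` and that `namespaces` and `roots` may be left out of the message when not needed.

```python
import json
from typing import Any, Dict, List, Optional


def build_graph_request(
    queries: List[Dict[str, Any]],
    namespaces: Optional[Dict[str, Any]] = None,
    roots: Optional[Dict[str, Any]] = None,
) -> str:
    """Serialize the configuration message sent as the first WebSocket text message."""
    if not queries:
        raise ValueError("`queries` must contain at least one element")
    for query in queries:
        # Assumption: a query is either an include or an exclude, never both.
        if set(query) not in ({"include"}, {"exclude"}):
            raise ValueError("each query must be either an include or an exclude")
    message: Dict[str, Any] = {"queries": queries}
    if namespaces is not None:
        message["namespaces"] = namespaces
    if roots is not None:
        message["roots"] = roots
    return json.dumps(message)


request_text = build_graph_request(
    queries=[
        {
            "include": {
                "resource_selector_expression": (
                    "group == 'apps' && version == 'v1' && resource == 'deployments'"
                ),
                "object": {"label_selector": "app=my-app"},
            }
        }
    ],
    namespaces={"names": ["my-ns"]},
    roots={
        "object_selector_expressions": [
            "resource == 'deployments' && name == 'my-deployment'"
        ],
        "ignore_arc_direction": ["or"],
    },
)
```

Validating the message on the client avoids a round trip just to learn that the server rejected it.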

#### High-level algorithm

1. Discovery information is fetched.
   All group+version+resource triples are filtered through `queries` to select which group+version+resource to watch.

1. An array of namespaces is either provided or all namespaces are listed and watched.
   The list is optionally filtered down using a label selector, a field selector, and a CEL expression.

1. For each of the selected namespaces, a watch is established for all the selected group+version+resource triples.
   Watches filter objects using the corresponding `label_selector`, `field_selector`, and `object_selector_expression`.

1. Objects that pass the filtering are added to the graph.
   When `roots` is specified, an object is only made visible to the client if it has a direct or transitive inbound arc from a root.

`roots` dynamically selects root objects out of the objects that passed informer-level filtering.

`json_path` selects the parts of an object that are returned to the client.

#### `queries`

`queries` is a set of queries to select which group+version+resource triples to watch.
Required field; the array must have at least one element.
Agentk gets the list of all groups and resources Kubernetes supports via the discovery API.
It then evaluates each one using the queries.
Queries are evaluated in the provided order, from the first to the last one in the array.
The first query that matches a particular group+version+resource decides its outcome; later queries are not evaluated for it.
Subresources (e.g. `deployments/scale`) are never evaluated.

`exclude` queries exclude the matched group+version+resource from further consideration.

`include` queries add the matched group+version+resource to the list of group+version+resource triples to watch.
This is subject to version selection, which picks a single version for each group+resource if multiple versions matched.

`include.resource_selector_expression` and `exclude.resource_selector_expression` are CEL expressions.
Query is a match if the expression evaluates to boolean `true`.
The expression must return a boolean. The following variables are available:

- `group` group of group+version+resource.
- `version` version of group+version+resource.
- `resource` resource of group+version+resource.
- `namespaced` scope of group+version+resource. A `bool`: `true` for namespace-scoped resources, `false` for cluster-scoped ones.
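
The first-match-wins evaluation can be sketched as follows. This is an illustration under stated assumptions, not the agentk implementation: Python callables stand in for the CEL expressions, `GVR`, `Query` and `is_selected` are hypothetical names, and a group+version+resource that matches no query is assumed not to be watched.

```python
from typing import Callable, List, NamedTuple


class GVR(NamedTuple):
    group: str
    version: str
    resource: str
    namespaced: bool


class Query(NamedTuple):
    kind: str  # "include" or "exclude"
    # Stand-in for `resource_selector_expression`; real queries use CEL.
    matches: Callable[[GVR], bool]


def is_selected(gvr: GVR, queries: List[Query]) -> bool:
    """Return True if the first matching query is an include."""
    if "/" in gvr.resource:
        # Subresources such as `deployments/scale` are never evaluated.
        return False
    for query in queries:
        if query.matches(gvr):
            return query.kind == "include"
    # Assumption: no matching query means the resource is not watched.
    return False


queries = [
    Query("exclude", lambda r: r.group == "" and r.resource == "pods"),
    Query("include", lambda r: r.namespaced),
]

pods = GVR("", "v1", "pods", True)
deployments = GVR("apps", "v1", "deployments", True)
scale = GVR("apps", "v1", "deployments/scale", True)
nodes = GVR("", "v1", "nodes", False)
```

Here `pods` would also satisfy the second query, but the earlier `exclude` wins because evaluation stops at the first match.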

`include.object.*` filters objects from the matched group+version+resource using label selectors,
field selectors and a CEL expression in `object_selector_expression`.
The expression must return a boolean.
The following variables are available in `object_selector_expression`:

- `obj` is the Kubernetes object being evaluated.
- `group` group of the object.
- `version` version of the object.
- `resource` resource name of the object. E.g. `pods` for the `Pod` kind.
- `namespace` namespace of the object.
- `name` name of the object.
- `labels` labels of the object.
- `annotations` annotations of the object.

`include.object.json_path` selects the parts of an object that are returned to the client.

#### `queries` - version selection

A version is called "stable" if it matches the `^v\d+$` regex, e.g. `v1`. Otherwise, it's called "unstable".

Each group+version+resource is matched against the queries.
At most one version of a group+resource is ever selected to be watched.

The newest stable version is selected.
If there are no stable versions, the newest unstable version is selected.
Versions are sorted using natural order e.g. `v2` comes before `v10`.

#### `namespaces`

Selects the namespaces to watch.
There are two modes of operation:

##### All namespaces

- `namespaces.label_selector` sets the label selector to filter namespaces.
  Optional field. No filtering is done if the field is an empty string or not specified.
- `namespaces.field_selector` sets the field selector to filter namespaces.
  Optional field. No filtering is done if the field is an empty string or not specified.
- `namespaces.object_selector_expression` allows to provide a CEL expression to further filter namespaces to watch.
  Optional field.

In this mode, if no filtering is used, agentk establishes a cluster-wide watch for the selected group+version+resource triples.

##### List of namespaces

- `namespaces.names` is the list of namespaces to watch.
  Should have at least one namespace name specified.
- `namespaces.object_selector_expression` allows to provide a CEL expression to further filter namespaces to watch.
  Optional field.

#### `roots`

Selects the subgraph of resources that are connected to the provided roots.
Optional field.
Root selection is additive: an object becomes a root if any of the root expressions selects it.
If no roots are specified, no root-based filtering is performed.
Otherwise, an object is only made visible to the client if it has a direct or transitive inbound arc from a root.

`roots.object_selector_expressions` is an array of CEL expressions that select the objects to be used as roots.
If an expression returns `true`, the object is selected as a root.
Each expression must evaluate to a boolean.
An empty or unspecified array means no root-based filtering is required.

It's an array to avoid huge expressions.
Expressions are evaluated in the provided order.
Hence, expressions that are more likely to evaluate to `true` should be listed earlier.

The following variables are available in `roots.object_selector_expressions`:

- `obj` is the Kubernetes object being evaluated.
- `group` group of the object.
- `version` version of the object.
- `resource` resource name of the object. E.g. `pods` for the `Pod` kind.
- `namespace` namespace of the object.
- `name` name of the object.
- `labels` labels of the object.
- `annotations` annotations of the object.

To ignore the direction of arcs (i.e. to treat them as undirected) when performing roots filtering,
provide `roots.ignore_arc_direction` with a list of arc types.
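
Root-based filtering is essentially a reachability search. The sketch below is one possible reading, not the actual algorithm: `visible_vertices` is a hypothetical helper, vertices are reduced to short strings, and roots themselves are assumed to be visible.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set, Tuple

Arc = Tuple[Hashable, Hashable, str]  # (source, destination, type)


def visible_vertices(
    roots: Iterable[Hashable],
    arcs: Iterable[Arc],
    ignore_arc_direction: Iterable[str] = (),
) -> Set[Hashable]:
    """Return the vertices reachable from any root via breadth-first search."""
    undirected_types = set(ignore_arc_direction)
    neighbours: Dict[Hashable, Set[Hashable]] = {}
    for source, destination, arc_type in arcs:
        neighbours.setdefault(source, set()).add(destination)
        if arc_type in undirected_types:
            # Treat this arc type as undirected: also walk it backwards.
            neighbours.setdefault(destination, set()).add(source)
    seen = set(roots)  # Assumption: roots are visible themselves.
    queue = deque(seen)
    while queue:
        vertex = queue.popleft()
        for neighbour in neighbours.get(vertex, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


arcs = [
    ("pod", "replicaset", "or"),         # Pod is owned by the ReplicaSet
    ("replicaset", "deployment", "or"),  # ReplicaSet is owned by the Deployment
    ("deployment", "configmap", "r"),    # Deployment refers to the ConfigMap
]

directed = visible_vertices(["deployment"], arcs)
undirected = visible_vertices(["deployment"], arcs, ignore_arc_direction=["or"])
```

Owner references point from the dependent to the owner, so with a `Deployment` as the root its `ReplicaSet` and `Pod` objects are only reached when `or` arcs are treated as undirected.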

### Response to client

Server will send WebSocket text messages with JSON of the following format:

```json
{
  "actions": [],
  "warnings": [],
  "error": {}
}
```

`actions` may contain zero or more actions to alter the graph.
Actions must be carried out in the order they appear in the array.
There may be lots and lots of actions, so short field names are used to save bandwidth and reduce loading times.
Possible actions are listed below.

`warnings` may contain zero or more warning objects.
This field contains non-terminal errors i.e. warnings.

`error` is set when an error occurs and the error reason string is too long to fully include in the WebSocket close
frame. The error is still sent as part of the close frame, but truncated to the maximum allowed length.
Hence, when this field is set, use the error message from it, not from the close frame.
Message structure is as follows:

- `error.code` is the WebSocket status code as per [RFC 6455](https://datatracker.ietf.org/doc/html/rfc6455#section-7.4.1).
- `error.code_string` is the textual representation of the code.
- `error.reason` is the human-readable error reason.

Example:

```json
{
  "error": {
    "code": 1007,
    "code_string": "StatusInvalidFramePayloadData",
    "reason": "error description"
  }
}
```
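
On the client side, the rule "prefer `error.reason` over the close frame" can be sketched like this. `effective_error_reason` is a hypothetical helper; it assumes the client keeps the text of the last message received before the connection closed.

```python
import json
from typing import Optional


def effective_error_reason(last_message_text: Optional[str], close_reason: str) -> str:
    """Return the full error reason, falling back to the close frame reason."""
    if last_message_text:
        error = json.loads(last_message_text).get("error") or {}
        reason = error.get("reason")
        if reason:
            # The close frame may carry a truncated copy; the message has the full text.
            return reason
    return close_reason


full = effective_error_reason(
    '{"error": {"code": 1007, "code_string": "StatusInvalidFramePayloadData", '
    '"reason": "error description"}}',
    "error descr",
)
fallback = effective_error_reason('{"actions": []}', "short reason")
```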

#### Warnings

A warning object has these fields:

- `t`. One of the predefined types of warnings. This can be used to programmatically understand what happened.
- `m`. A free-form message to show to the human user.
- `a`. A type-specific set of attributes to programmatically understand what happened.

Known warning type (`t`) constants:

- `INFORMER_SYNC_FAILED`. Happens when an informer fails to sync within a timeout.
- `DISCOVERY_FAILED`. Happens when discovery API failed to return a full response.
- `NAMESPACE_LIST_FAILED`. Happens when namespace retrieval or processing failed.
- `OBJECT_PROCESSING_FAILED`. Happens when object retrieval or processing failed.
- `INTERNAL_ERROR`. Shouldn't really happen.

Attributes for `INFORMER_SYNC_FAILED`:

- `g` is the group.
- `v` is the version.
- `r` is the resource.
- `ns` is the namespace. Omitted when empty i.e. for cluster-scoped resources.

Example:

```json
{
  "t": "INFORMER_SYNC_FAILED",
  "m": "Failed to sync informer for apps/v1/deployments in myns in 30 seconds. Check agent's log, agent's permissions",
  "a": {
    "g": "apps",
    "v": "v1",
    "r": "deployments",
    "ns": "myns"
  }
}
```

Attributes for `DISCOVERY_FAILED`: N/A

Attributes for `NAMESPACE_LIST_FAILED`:

- `ns` is the namespace.

Attributes for `OBJECT_PROCESSING_FAILED`:

- `g` is the group.
- `v` is the version.
- `r` is the resource.
- `ns` is the namespace. Omitted when empty i.e. for cluster-scoped resources.
- `n` is the name.

Attributes for `INTERNAL_ERROR`: N/A

#### Set a vertex in the graph

This action is used to add or update a vertex in the graph.

- `svx` stands for "set vertex".
- `vx` stands for "vertex".
  - `g` is the group.
  - `v` is the version.
  - `r` is the resource.
  - `ns` is the namespace. Omitted when empty i.e. for cluster-scoped resources.
  - `n` is the name.
- `o` is the contents of the resource.
  Only set if `j` is not set.
- `j` is the contents of the resource filtered by JSON path.
  `j` is an array because JSON path may select multiple nodes from the resource's JSON representation.
  It may be an empty array if JSON path didn't select anything.
  Only set if `json_path` was provided in the matched rule that included the resource.

At most one of `o` or `j` is set, never both.
Neither is set if both are empty (empty object/array).
The contents of `Secret` objects are never returned: their `o`/`j` is always an empty object/array.

```json
{
  "svx": {
    "vx": {
      "g": "apps",
      "v": "v1",
      "r": "deployments",
      "ns": "myns",
      "n": "my-deployment"
    },
    "o": {},
    "j": []
  }
}
```

#### Delete a vertex from the graph

- `dvx` stands for "delete vertex".
- `vx` stands for "vertex".
  - `g` is the group.
  - `v` is the version.
  - `r` is the resource.
  - `ns` is the namespace. Omitted when empty i.e. for cluster-scoped resources.
  - `n` is the name.

```json
{
  "dvx": {
    "vx": {
      "g": "apps",
      "v": "v1",
      "r": "deployments",
      "ns": "myns",
      "n": "my-deployment"
    }
  }
}
```

#### Set an arc in the graph

This action is used to add or update an arc in the graph.

- `sarc` stands for "set arc".
- `s` is the object defining the "source" vertex.
  - `g` is the group.
  - `v` is the version.
  - `r` is the resource.
  - `ns` is the namespace. Omitted when empty i.e. for cluster-scoped resources.
  - `n` is the name.
- `d` is the object defining the "destination" vertex.
  - `g` is the group.
  - `v` is the version.
  - `r` is the resource.
  - `ns` is the namespace. Omitted when empty i.e. for cluster-scoped resources.
  - `n` is the name.
- `t` is the type of the arc. See the section "types of arcs" above.
- `a` is the attributes object. Keys are attribute names and values are attribute values. Omitted when empty.

Known attributes:

- For owner reference arcs (i.e. `t=or`):
  - `c` marks an owner reference as a controller reference. It's only set when it's `true`.
  - `b` marks an owner reference as blocking the deletion. It's only set when it's `true`.
- When an arc's destination vertex does not exist, or we don't know if it exists or not (e.g. informer is not up to date),
  `e` is set to `true`. Arcs are not emitted for GVRs that are not being watched.

Owner reference arc example:

```json
{
  "sarc": {
    "s": {
      "g": "",
      "v": "v1",
      "r": "pods",
      "ns": "myns",
      "n": "pod1"
    },
    "d": {
      "g": "apps",
      "v": "v1",
      "r": "deployments",
      "ns": "myns",
      "n": "my-deployment"
    },
    "t": "or",
    "a": {
      "c": true
    }
  }
}
```

#### Delete an arc from the graph

- `darc` stands for "delete arc".
- `s` is the object defining the "source" vertex.
  - `g` is the group.
  - `v` is the version.
  - `r` is the resource.
  - `ns` is the namespace. Omitted when empty i.e. for cluster-scoped resources.
  - `n` is the name.
- `d` is the object defining the "destination" vertex.
  - `g` is the group.
  - `v` is the version.
  - `r` is the resource.
  - `ns` is the namespace. Omitted when empty i.e. for cluster-scoped resources.
  - `n` is the name.
- `t` is the type of the arc. See the section "types of arcs" above.

```json
{
  "darc": {
    "s": {
      "g": "",
      "v": "v1",
      "r": "pods",
      "ns": "myns",
      "n": "pod1"
    },
    "d": {
      "g": "apps",
      "v": "v1",
      "r": "deployments",
      "ns": "myns",
      "n": "my-deployment"
    },
    "t": "or"
  }
}
```
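
A client can keep a local copy of the graph by applying the four actions in order. The following is a minimal sketch: `ClientGraph` and `vertex_key` are hypothetical names, and it assumes that deleting a vertex does not implicitly delete its arcs (the server is expected to send separate `darc` actions).

```python
from typing import Any, Dict, Tuple

VertexKey = Tuple[str, str, str, str, str]
ArcKey = Tuple[VertexKey, VertexKey, str]


def vertex_key(vx: Dict[str, str]) -> VertexKey:
    # `ns` is omitted for cluster-scoped resources, so default it to "".
    return (vx["g"], vx["v"], vx["r"], vx.get("ns", ""), vx["n"])


class ClientGraph:
    def __init__(self) -> None:
        self.vertices: Dict[VertexKey, Any] = {}   # vertex -> `o` or `j` payload (or None)
        self.arcs: Dict[ArcKey, Dict[str, Any]] = {}  # arc -> attributes

    def apply(self, message: Dict[str, Any]) -> None:
        """Apply the actions of one server message, in array order."""
        for action in message.get("actions", []):
            if "svx" in action:
                body = action["svx"]
                # At most one of `o` or `j` is set; both may be absent.
                self.vertices[vertex_key(body["vx"])] = body.get("o", body.get("j"))
            elif "dvx" in action:
                self.vertices.pop(vertex_key(action["dvx"]["vx"]), None)
            elif "sarc" in action:
                body = action["sarc"]
                key = (vertex_key(body["s"]), vertex_key(body["d"]), body["t"])
                self.arcs[key] = body.get("a", {})
            elif "darc" in action:
                body = action["darc"]
                key = (vertex_key(body["s"]), vertex_key(body["d"]), body["t"])
                self.arcs.pop(key, None)


pod = {"g": "", "v": "v1", "r": "pods", "ns": "myns", "n": "pod1"}
owner = {"g": "apps", "v": "v1", "r": "replicasets", "ns": "myns", "n": "rs1"}

graph = ClientGraph()
graph.apply({"actions": [
    {"svx": {"vx": pod, "o": {"kind": "Pod"}}},
    {"svx": {"vx": owner}},
    {"sarc": {"s": pod, "d": owner, "t": "or", "a": {"c": True}}},
]})
```

Keying vertices and arcs by their identifying fields makes `svx` and `sarc` idempotent: setting the same vertex or arc again simply overwrites the stored payload or attributes.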

### Error responses to client

- kas returns an error if it cannot find an agent supporting the new functionality.
  The UI can then show a message asking the user to update the agent.

### kas to agentk API

The new functionality will need a new RPC exposed by agentk.

```protobuf
syntax = "proto3";

package gitlab.agent.kubernetes_api.rpc;

message Action {
  // Graph actions.
}

message Warning {
  // Details.
}

message WatchGraphRequest {
  // The parsed request that comes from the client.
}

message WatchGraphResponse {
  repeated Action actions = 1;
  repeated Warning warnings = 2;
}

service KubernetesApi {
  rpc WatchGraph(WatchGraphRequest) returns (stream WatchGraphResponse) {
  }
}
```
