# Feature Flags

Historically, the GitLab Agent for Kubernetes, and KAS specifically,
used the boring approach of environment-variable-based feature flags.
This approach has several limitations:

- requires a KAS configuration change to change a feature flag.
- requires a KAS restart for a changed feature flag to take effect.
- only allows KAS-wide feature flags.
- only allows global feature flags, no actor support.
- only allows fully rolled out feature flags, no percentage-based rollouts.

In its current form, this document is a proposal for how to implement
a first iteration of improved feature flagging that addresses most of the
limitations listed above.
Once an implementation is in place, this document must be rephrased
to match that implementation.

## Goals

- feature flags in KAS.
- feature flags can be changed with the same feature flagging infrastructure
  as Rails uses (see [development guide](https://docs.gitlab.com/development/feature_flags/)).
- feature flag changes do not require a KAS restart.
- feature flags consider the actor (project or group).
- feature flags support percentage-based rollouts.

## Non-Goals

- feature flags evaluated in agents (e.g. `agentk`).
- feature flags with an agent as the actor. Agents don't have identities yet.
  Also see the [future iterations section](#future-iterations).

## Defining feature flags

- feature flags must be defined in Rails with the `gitlab_kas_` prefix.
- feature flags must be defined in KAS.

### Define feature flags in KAS

The feature flags must be defined in separate files in the `featureflag` package.

Follow this pattern for consistency:

```go
package featureflag

var (
	// ExampleFeature defines the feature flag for ...
	// Rollout issue: https://gitlab.com/gitlab-org/...
	ExampleFeature = NewFeatureFlag(
		// the name of the feature flag. Use all lower-case letters
		// and underscores to separate words.
		// The name here must match the definition in Rails without
		// the `kas_` prefix.
		"example_feature",
		// if the feature flag is enabled by default or not.
		false,
	)
)
```
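
To illustrate how a definition and its default value could interact, here is a minimal, self-contained sketch. The `FeatureFlag` and `Set` types and the fallback to the default are assumptions made for illustration only; the names `NewFeatureFlag` and `IsEnabled` follow this document, but the actual `featureflag` package may be shaped differently.

```go
package main

// FeatureFlag is a hypothetical stand-in for the value returned by
// NewFeatureFlag in the pattern above.
type FeatureFlag struct {
	Name             string
	EnabledByDefault bool
}

// NewFeatureFlag mirrors the call shape used in the pattern above.
func NewFeatureFlag(name string, enabledByDefault bool) *FeatureFlag {
	return &FeatureFlag{Name: name, EnabledByDefault: enabledByDefault}
}

// Set is a hypothetical container for the flag states that were
// reported for a single request.
type Set map[string]bool

// IsEnabled returns the reported state of the flag. If the flag was not
// reported at all, it falls back to the default from the definition
// (this fallback is an assumption of the sketch).
func (s Set) IsEnabled(ff *FeatureFlag) bool {
	if enabled, ok := s[ff.Name]; ok {
		return enabled
	}
	return ff.EnabledByDefault
}

// ExampleFeature is defined as in the pattern above.
var ExampleFeature = NewFeatureFlag("example_feature", false)
```

With this sketch, `Set{}.IsEnabled(ExampleFeature)` yields the default, while a set that contains `"example_feature": true` overrides it.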

## Retrieving feature flags

- feature flags should not be cached in KAS, so that percentage-based rollouts work properly.
  If caching is unavoidable, keep the cache lifetime short (under 5 minutes) and make sure
  there is a legitimate reason for it (e.g. performance).
- feature flags can be returned as part of KAS-specific REST API response headers.
  This applies only to REST API endpoints where it makes sense. There is no need to
  add them to every endpoint up front.
- feature flags can be sent as part of `rails -> kas` gRPC requests in their metadata section.

Depending on the context of where the feature flag should be retrieved,
a different method has to be used. Currently, there are two main methods to retrieve
feature flags in KAS.

### Retrieve from REST API response

In KAS you can retrieve a feature flag from REST API responses made from the
`gitlab` package. The endpoint functions in this package must be extended to
support feature flags when it becomes necessary. You can use the following pattern:

- define a wrapper `struct` around the API response object (usually defined in protobuf).
- by convention, the wrapper `struct` should contain a `Response` field holding the actual
  protobuf response message and a `FeatureFlags` field
  mapping to `featureflag.ParsedFeatureFlagSet`.
- the `gitlab.WithResponseHeaderHandler` and `gitlab.WithFeatureFlags` option functions
  can be used when making the request.

An example is in the
[`AuthorizeProxyUser`](/internal/gitlab/api/authorize_proxy_user.go) endpoint:

```go
type AuthorizeProxyUserResponse struct {
	Response     *AuthorizeProxyUserAPIResponse
	FeatureFlags featureflag.Set
}

func AuthorizeProxyUser(ctx context.Context, client gitlab.ClientInterface, agentKey api.AgentKey, accessType AccessType, accessKey, csrfToken string, opts ...gitlab.DoOption) (*AuthorizeProxyUserResponse, error) {
	auth := &AuthorizeProxyUserAPIResponse{}
	s := featureflag.NewSet()
	err := client.Do(ctx,
		joinOpts(opts,
			gitlab.WithMethod(http.MethodPost),
			gitlab.WithPath(AuthorizeProxyUserAPIPath),
			gitlab.WithJWT(true),
			gitlab.WithProtoJSONRequestBody(&AuthorizeProxyUserAPIRequest{
				AgentId:    agentKey.ID,
				AccessType: string(accessType),
				AccessKey:  accessKey,
				CsrfToken:  csrfToken,
			}),
			gitlab.WithResponseHandler(gitlab.ProtoJSONResponseHandlerWithStructuredErrReason(auth)),
			gitlab.WithResponseHeaderHandler(gitlab.WithFeatureFlags(&s)),
		)...,
	)
	if err != nil {
		return nil, err
	}
	return &AuthorizeProxyUserResponse{
		Response:     auth,
		FeatureFlags: s,
	}, nil
}
```

The caller of this endpoint function can then easily check if
a specific feature flag is enabled:

```go
resp, err := gapi.AuthorizeProxyUser(ctx, ...)
if err != nil {
	return err // or handle the error as appropriate for the caller
}
enabled := resp.FeatureFlags.IsEnabled(featureflag.ExampleFeature)
// ...
```

### Retrieving from incoming context in gRPC handlers

When Rails performs a gRPC request to KAS, it includes KAS-related
feature flags in the request metadata. They can be retrieved from the context like this:

```go
func (s *server) ListEnvironmentTemplates(ctx context.Context, r *rpc.ListEnvironmentTemplatesRequest) (*rpc.ListEnvironmentTemplatesResponse, error) {
	rpcAPI := modshared.RPCAPIFromContext[modserver.RPCAPI](ctx)
	enabled := rpcAPI.IsEnabled(featureflag.ExampleFeature)
	// ...
}
```

## Testing feature flags

- feature flags must be tested. That is, code that relies on a feature flag should
  be tested in both cases: with the feature flag enabled and with it disabled.
- by default, the test suite is run with all feature flags enabled.
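
A table-driven test is one way to cover both states. The sketch below is self-contained and uses hypothetical stand-ins (`flagSet`, `handlerVersion`) instead of the real KAS types and test helpers, which may look different.

```go
package main

import "testing"

// flagSet is a hypothetical stand-in for the feature flag set that is
// available to the code under test.
type flagSet map[string]bool

// IsEnabled reports whether the named flag is enabled in the set.
func (s flagSet) IsEnabled(name string) bool { return s[name] }

// handlerVersion is a hypothetical function whose behavior depends on
// a feature flag.
func handlerVersion(flags flagSet) string {
	if flags.IsEnabled("example_feature") {
		return "new"
	}
	return "legacy"
}

// TestHandlerVersion exercises the code path for both flag states.
func TestHandlerVersion(t *testing.T) {
	tests := []struct {
		name    string
		enabled bool
		want    string
	}{
		{name: "flag enabled", enabled: true, want: "new"},
		{name: "flag disabled", enabled: false, want: "legacy"},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			flags := flagSet{"example_feature": tc.enabled}
			if got := handlerVersion(flags); got != tc.want {
				t.Errorf("handlerVersion() = %q, want %q", got, tc.want)
			}
		})
	}
}
```

Listing both states explicitly in the table keeps the disabled case covered even though the suite enables all feature flags by default.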

## Process

- each feature flag must have an associated rollout issue.
- the feature flag rollout should happen according to the rollout issue
  and be communicated to the team in `#f_agent_for_kubernetes`.

## Future iterations

- consider adding an explicit endpoint to actively retrieve feature flags.
  This will be useful for feature flags that need to be known at KAS startup
  or feature flags for background processes or where KAS doesn't communicate
  with Rails at all.
- consider agent-based rollouts.
