Files in v1.4.0
Name Modified Size
kuberay_1.4.0_checksums.txt 2025-06-21 418 Bytes
kubectl-ray_v1.4.0_darwin_amd64.tar.gz 2025-06-21 34.2 MB
kubectl-ray_v1.4.0_darwin_arm64.tar.gz 2025-06-21 32.8 MB
kubectl-ray_v1.4.0_linux_amd64.tar.gz 2025-06-21 34.0 MB
kubectl-ray_v1.4.0_linux_arm64.tar.gz 2025-06-21 31.9 MB
README.md 2025-06-20 59.7 kB
v1.4.0 source code.tar.gz 2025-06-20 7.9 MB
v1.4.0 source code.zip 2025-06-20 8.3 MB
Totals: 8 Items   149.1 MB

Highlights

Enhanced Kubectl Plugin

KubeRay v1.4.0 introduces major improvements to the Kubectl Plugin:

  • Added a new scale command to scale worker groups in a RayCluster.
  • Extended the get command to support listing Ray nodes and worker groups.
  • Improved the create command:
      • Allows overriding default values in config files.
      • Supports additional fields such as Kubernetes labels and annotations, node selectors, ephemeral storage, ray start parameters, TPUs, autoscaler version, and more.

See Using the Kubectl Plugin (beta) and ray-project/ray#53886 (link will be updated to the docs site after merging) for more details.

KubeRay Dashboard (alpha)

Starting from v1.4.0, you can use the open source dashboard UI for KubeRay. This component is still experimental and not considered ready for production, but feedback is welcome.

The KubeRay dashboard is a web-based UI that allows you to view and manage KubeRay resources running on your Kubernetes cluster. It differs from the Ray dashboard, which is part of the Ray cluster itself. The KubeRay dashboard provides a centralized view of all KubeRay resources.

See ray-project/ray#53830 for more information. (The link will be replaced with a link to the docs website after the PR is merged.)

Integration with kubernetes-sigs/scheduler-plugins

Starting with v1.4.0, KubeRay integrates with one more scheduler, kubernetes-sigs/scheduler-plugins, to support gang scheduling for RayCluster resources. Currently, only single scheduler mode is supported.

See KubeRay integration with scheduler plugins for details.

KubeRay APIServer V2 (alpha)

The new APIServer v2 provides an HTTP proxy interface compatible with the Kubernetes API. It enables users to manage Ray resources using standard Kubernetes clients.

Key features:

  • Full compatibility with Kubernetes OpenAPI Spec and CRDs.
  • Available as a Go library for building custom proxies with pluggable HTTP middleware.

APIServer v1 is now in maintenance mode and will no longer receive new features. v2 is still in alpha. Contributions and feedback are encouraged.
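
Because APIServer v2 is described as compatible with the Kubernetes API, requests to it should follow the usual Kubernetes path layout for custom resources. The sketch below only builds such URLs and sends nothing. The base URL and the helper name `ray_resource_url` are assumptions made for illustration; the `ray.io/v1` group and version and the plural resource names are those of the KubeRay CRDs.

```python
from typing import Optional
from urllib.parse import quote

# Assumption: the address where the APIServer v2 proxy is reachable.
# Replace it with whatever host and port your deployment exposes.
PROXY_BASE_URL = "http://localhost:8888"

# KubeRay custom resources live in the ray.io API group.
RAY_API_GROUP = "ray.io"
RAY_API_VERSION = "v1"


def ray_resource_url(base_url: str, namespace: str, plural: str,
                     name: Optional[str] = None) -> str:
    """Build a Kubernetes-style URL for a namespaced Ray custom resource.

    `plural` is the plural resource name, for example "rayclusters",
    "rayjobs", or "rayservices". Without `name` the URL addresses the
    collection; with `name` it addresses a single object.
    """
    path = (
        f"/apis/{RAY_API_GROUP}/{RAY_API_VERSION}"
        f"/namespaces/{quote(namespace, safe='')}/{plural}"
    )
    if name is not None:
        path += f"/{quote(name, safe='')}"
    return base_url.rstrip("/") + path


if __name__ == "__main__":
    # Collection URL for RayClusters in the "default" namespace.
    print(ray_resource_url(PROXY_BASE_URL, "default", "rayclusters"))
    # URL of one hypothetical RayJob named "my-job".
    print(ray_resource_url(PROXY_BASE_URL, "default", "rayjobs", "my-job"))
```

Any standard Kubernetes client pointed at the proxy would produce paths of this shape on its own; the helper is only meant to show what goes over the wire.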

Service Level Indicator (SLI) Metrics

KubeRay now includes SLI metrics to help monitor the state and performance of KubeRay resources.

See KubeRay Metrics Reference for details.
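
As a rough illustration of consuming these metrics, the sketch below filters Prometheus text-format output down to samples whose names start with `kuberay_`. The metric names come from the changelog entries in these notes; the label names, values, and the `SAMPLE_SCRAPE` text are made up for the example and may not match what the operator actually exports.

```python
from typing import Dict, List

# Hypothetical scrape output. Metric names appear in the changelog below,
# but the labels and values here are invented for illustration.
SAMPLE_SCRAPE = """\
# HELP kuberay_cluster_info Example help text
# TYPE kuberay_cluster_info gauge
kuberay_cluster_info{name="demo",namespace="default"} 1
kuberay_service_ready{name="svc",namespace="default"} 1
go_goroutines 42
"""


def kuberay_samples(exposition_text: str,
                    prefix: str = "kuberay_") -> Dict[str, List[str]]:
    """Group sample lines by metric name, keeping only names with `prefix`."""
    grouped: Dict[str, List[str]] = {}
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name_end = len(line)
        for separator in ("{", " "):
            index = line.find(separator)
            if index != -1:
                name_end = min(name_end, index)
        name = line[:name_end]
        if name.startswith(prefix):
            grouped.setdefault(name, []).append(line)
    return grouped


if __name__ == "__main__":
    for metric, lines in sorted(kuberay_samples(SAMPLE_SCRAPE).items()):
        print(metric, len(lines))
```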

Breaking Changes

Default to Non-Login Bash Shell

Prior to v1.4.0, KubeRay ran most commands using a login shell. Starting from v1.4.0, the default shell is a non-login Bash shell. You can temporarily revert to login shell behavior using the ENABLE_LOGIN_SHELL environment variable, but using a login shell is not recommended, and this environment variable will be removed in a future release. (#3679)

If you encounter any issues with the new default behavior, please report them in [#3822] rather than opening new issues.
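
The practical difference is that a login Bash shell (`bash -l`) reads profile files such as /etc/profile and ~/.bash_profile, while a non-login, non-interactive shell does not, so environment setup that lives only in those files stops applying. The sketch below shows the switch in isolation. The helper name `bash_argv`, the exact argv layout, and the assumption that the variable is enabled with the value `true` are illustrative and are not taken from the KubeRay source.

```python
from typing import List, Mapping


def bash_argv(command: str, env: Mapping[str, str]) -> List[str]:
    """Return a bash argv for `command`.

    A login shell (-l) is used only when ENABLE_LOGIN_SHELL is set to
    "true" (assumed value); otherwise a plain non-login shell is used,
    which mirrors the new default described above.
    """
    use_login_shell = env.get("ENABLE_LOGIN_SHELL", "").strip().lower() == "true"
    flags = "-lc" if use_login_shell else "-c"
    return ["/bin/bash", flags, command]


if __name__ == "__main__":
    # New default: non-login shell, profile files are not sourced.
    print(bash_argv("ray --version", {}))
    # Temporary opt-out: login shell, profile files are sourced.
    print(bash_argv("ray --version", {"ENABLE_LOGIN_SHELL": "true"}))
```

If a container image relied on a profile file to set PATH or activate an environment, moving that setup into the image's ENV or the command itself avoids depending on the opt-out.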

Resource Name Changes and Length Validation

Before v1.4.0, KubeRay silently truncated resource names that were too long to fit the Kubernetes 63-character limit. Starting from v1.4.0, we no longer truncate resource names implicitly. Instead, we emit an invalid spec event if the names are too long. (#3083)

We also shortened some of the generated resource names to leave more room for user-chosen names. The following changes were made:

  • The suffix of the headless service for RayCluster changes from headless-worker-svc to headless. (#3101)
  • The suffix of the RayCluster name changes from -raycluster-xxxxx to -xxxxx. (#3102)
  • The suffix of the head pod for RayCluster changes from -head-xxxxx to -head. (#3028)
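
To make the effect of the shorter suffixes concrete, the sketch below does the arithmetic against the 63-character limit and applies a DNS-1035 label check like the one the changelog mentions for RayCluster, RayService, and RayJob names (#3239). The helper names are hypothetical, and the exact maximum length KubeRay enforces for each resource kind is not stated in these notes, so treat the numbers as an upper bound on the prefix that a single suffix leaves room for.

```python
import re
from typing import List

# Kubernetes limits DNS labels to 63 characters.
MAX_LABEL_LENGTH = 63

# DNS-1035 label: starts with a lowercase letter, then lowercase letters,
# digits, or "-", and ends with a letter or digit.
DNS1035_LABEL = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])?")


def validate_name(name: str) -> List[str]:
    """Return a list of problems with `name`; an empty list means it passes."""
    problems = []
    if len(name) > MAX_LABEL_LENGTH:
        problems.append(f"name is {len(name)} characters, limit is {MAX_LABEL_LENGTH}")
    if not DNS1035_LABEL.fullmatch(name):
        problems.append("name is not a valid DNS-1035 label")
    return problems


def headroom(suffix: str) -> int:
    """Characters left for the name prefix once `suffix` is appended."""
    return MAX_LABEL_LENGTH - len(suffix)


if __name__ == "__main__":
    # Compare old and new suffixes from the list above ("xxxxx" stands for
    # the five generated characters).
    for old, new in [("-raycluster-xxxxx", "-xxxxx"), ("-head-xxxxx", "-head")]:
        print(f"{old!r}: {headroom(old)} -> {new!r}: {headroom(new)}")
    print(validate_name("my-cluster"))
    print(validate_name("x" * 64))
```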

Updated Autoscaler v2 configuration

Starting from v1.4.0, configure autoscaler v2 using:

```yaml
spec:
  autoscalerOptions:
    version: v2
```

You should not use the old RAY_enable_autoscaler_v2 environment variable.

See Autoscaler v2 Configuration for guidance.
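
When migrating existing manifests, a small check can flag RayClusters that still use the old setting. The sketch below assumes the manifest has already been loaded into a Python dict (for example from YAML or JSON) and that the old variable, if present, sits in the `env` list of a head group container. The function name and messages are hypothetical.

```python
from typing import Any, Dict, List

OLD_ENV_VAR = "RAY_enable_autoscaler_v2"


def autoscaler_v2_findings(raycluster: Dict[str, Any]) -> List[str]:
    """Return migration findings for one RayCluster manifest.

    An empty list means the manifest selects autoscaler v2 through
    spec.autoscalerOptions.version and no head container still sets the
    old environment variable.
    """
    spec = raycluster.get("spec") or {}
    findings = []

    # New style: spec.autoscalerOptions.version must be "v2".
    version = (spec.get("autoscalerOptions") or {}).get("version")
    if version != "v2":
        findings.append("spec.autoscalerOptions.version is not set to v2")

    # Old style: look for the environment variable on head containers.
    head_template = (spec.get("headGroupSpec") or {}).get("template") or {}
    containers = (head_template.get("spec") or {}).get("containers") or []
    for container in containers:
        for env_entry in container.get("env") or []:
            if env_entry.get("name") == OLD_ENV_VAR:
                findings.append(
                    f"container {container.get('name')!r} still sets {OLD_ENV_VAR}"
                )
    return findings


if __name__ == "__main__":
    legacy = {
        "spec": {
            "headGroupSpec": {
                "template": {
                    "spec": {
                        "containers": [
                            {"name": "ray-head",
                             "env": [{"name": OLD_ENV_VAR, "value": "1"}]}
                        ]
                    }
                }
            }
        }
    }
    for finding in autoscaler_v2_findings(legacy):
        print(finding)
```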

Changelog

  • [Release] Update KubeRay version references for 1.4.0 (#3816, @MortalHappiness)
  • [kubectl-plugin] fix get cluster all namespace (#3809, @fscnick)
  • [Docs] Add kubectl plugin create cluster sample yaml config files (#3804, @MortalHappiness)
  • [Helm Chart] Set honorLabel of serviceMonitor to true (#3805, @owenowenisme)
  • [Metrics] Remove serviceMonitor.yaml (#3795, @owenowenisme)
  • [Chore][Sample-yaml] Upgrade pytorch-lightning to 1.8.5 for ray-job.pytorch-distributed-training.yaml (#3796, @MortalHappiness)
  • Use ImplementationSpecific in ray-cluster.separate-ingress.yaml (#3781, @troychiu)
  • Remove vLLM examples in favor of Ray Serve LLM (#3786, @kevin85421)
  • Update update-ray-job.kueue-toy-sample.yaml (#3782, @troychiu)
  • [Feat] Add e2e test for applying ray-job.interactive-mode.yaml (#3779, @CheyuWu)
  • [Release] Update KubeRay version references for 1.4.0-rc.2 (#3784, @MortalHappiness)
  • [Doc][Fix] correct the indention of storageClass in ray-cluster.persistent-redis.yaml (#3780, @rueian)
  • [doc] Improve APIServer v2 doc (#3773, @kevin85421)
  • [Release] Reset ray-operator version in root go.mod to v0.0.0 (#3774, @MortalHappiness)
  • Revert "Fix issue where unescaped semicolons caused task execution failures. (#3691)" (#3771, @MortalHappiness)
  • support scheduler plugins (#3612, @KunWuLuan)
  • Added Ray-Serve Config For LLMs (#3517, @Blaze-DSP)
  • [Release] Fix helm chart tag missing "v" prefix and release rc1 (#3757, @MortalHappiness)
  • [Release] Update KubeRay version references for 1.4.0-rc.0 (#3698, @MortalHappiness)
  • Improve Grafana Dashboard (#3734, @troychiu)
  • [Fix][CI] Fix ray operator image build error by setting up docker buildx (#3750, @MortalHappiness)
  • [Test][Autoscaler] deflaky unexpected dead actors in tests by setting max_restarts=-1 (#3700, @rueian)
  • add go.mod for operator (#3735, @troychiu)
  • [fix][operator] RayJob.Status.RayJobStatusInfo.EndTime nil deref error (#3742, @davidxia)
  • [operator] fix TPU multi-host RayJob and RayCluster samples (#3733, @davidxia)
  • [chore] upgrade Ray to 2.46.0 in remaining places (#3724, @davidxia)
  • chore: run yamlft pre-commit hook (#3729, @davidxia)
  • [Grafana] Update Grafana dashboard (#3726, @win5923)
  • [Test][Autoscaler] deflaky autoscaler idle timeout e2e tests by a longer timeout (#3727, @rueian)
  • [Chore] Upgrade Ray to 2.46.0 follow-up (#3722, @MortalHappiness)
  • [doc] Update API server v1 doc (#3723, @kevin85421)
  • feat: upgrade to Ray 2.46.0 (#3547, @davidxia)
  • [Test][Autoscaler] deflaky unexpected dead actors in tests by higher resource requests (#3707, @rueian)
  • [Doc] add ray cluster uv sample yaml (#3720, @fscnick)
  • [apiserver] Use ClusterIP instead of NodePort for KubeRay API server service (#3708, @machichima)
  • Bump next from 15.2.3 to 15.2.4 in /dashboard (#3709, @dependabot[bot])
  • [Feat][apiserver] Support CORS config (#3711, @MortalHappiness)
  • Add kuberay operator servicemonitor (#3717, @troychiu)
  • [CI] Split Autoscaler e2e tests into 2 buildkite runners (#3715, @kevin85421)
  • Add Grafana Dashboard for KubeRay Operator (#3676, @win5923)
  • [Fix][Release] Fix KubeRay dashboard image build pipeline (#3702, @MortalHappiness)
  • Fix issue where unescaped semicolons caused task execution failures. (#3691, @xianlubird)
  • [refactor] Refactor enable login shell (#3704, @kevin85421)
  • [chore] Update user to kuberay instead of a contributor's name (#3706, @kevin85421)
  • [DOCS] Apiserver improve docs readability (#3564, @machichima)
  • [Ray-operator] Feature flag login bash (#3679, @fscnick)
  • [Grafana] Add flag for enabling auto load dashboards (#3689, @owenowenisme)
  • [Doc] Fix broken link in documentation (#3697, @nadongjun)
  • [kubectl-plugin] Generate submission_id in job_submit.go (#3693, @LeoLiao123)
  • [Doc] Update README (#3695, @kevin85421)
  • [ray-operator][Bug] Rayjob is Failed or Succeed, but Raycluster status(jobDeploymentStatus) is still Running(#3553) (#3642, @dushulin)
  • [API Server] consolidate e2e test (#3674, @troychiu)
  • [CI] Fix autoscaler e2e test flakiness caused by timeout (#3668, @nadongjun)
  • [Test][Autoscaler] Add an E2E test for placement groups (#3687, @rueian)
  • [Feature] [kubectl-plugin] Expose setting shutdownAfterJobFinishes and ttlSecondsAfterFinished in ray job submit (#3627, @CheyuWu)
  • [API Server] Add v2 related helm (#3677, @troychiu)
  • [apiserver] Make local-e2e-test hermetic (#3513, @troychiu)
  • [docs] Remove unused docs (#3684, @kevin85421)
  • [docs] Remove unused docs (#3683, @kevin85421)
  • [Autoscaler] Improve TestRayClusterAutoscalerAddNewWorkerGroup (#3682, @kevin85421)
  • [Chore] Add kubectl plugin and dashboard to components in issue template (#3678, @MortalHappiness)
  • [Prometheus] Add serviceMonitor for KubeRay Operator (#3530, @win5923)
  • [Test][Autoscaler] Add an E2E test for adding a new worker group (#3680, @kenmcheng)
  • [RayService] don't update serveConfigV2 in current ray cluster if ray… (#3559, @fscnick)
  • [Metric] kuberay_job_deployment_status (#3656, @troychiu)
  • [Fix][kubectl-plugin] Remove controller-runtime logger warning in kubectl ray job submit (#3669, @EagleLo)
  • [RayJob] Add RayJobInfo to RayJob CRD status (#3673, @kevin85421)
  • [SLI-Metric] kuberay_service_condition_upgrade_in_progress (#3663, @owenowenisme)
  • [Test][Autoscaler] Add E2E test for ray.autoscaler.sdk.request_resources (#3649, @nadongjun)
  • [Hotfix] Extend Autoscaler e2e tests timeout (#3665, @kevin85421)
  • [SLI-Metrics] kuberay_service_ready (#3577, @owenowenisme)
  • test: reduce requests in sample ray service yaml config (#3636, @pawelpaszki)
  • [Feature][Ray-operator] Improve RayJob validation for shutdownAfterJobFinishes and ttlSecondsAfterFinished (#3653, @CheyuWu)
  • [Grafana] Allow auto-load dashboard jsons (#3643, @owenowenisme)
  • [Test][Autoscaler] Add an E2E test for not removing idle nodes required by an upcoming placement group (#3647, @rueian)
  • [Test][Autoscaler] Add an E2E test for updating maxReplicas on a worker group (#3623, @machichima)
  • [apiserver] Start apiserver v2 in apiserver/cmd/main.go (#3603, @troychiu)
  • doc: mention kubectl plugin in README (#3652, @davidxia)
  • [SLI Metrics] Add metric kuberay_cluster_condition_provisioned (#3635, @win5923)
  • Single go.mod file (#3640, @troychiu)
  • [kubectl-plugin] Support node selectors for kubectl ray job submit (#3562, @CheyuWu)
  • [CI] fix missing Go module release step (#3644, @davidxia)
  • refactor tests: use testify instead of Fatal everywhere (#3600, @davidxia)
  • [Test][Autoscaler] Add an E2E test for CPU tasks on GPU nodes. (#3629, @LeoLiao123)
  • [Doc][CI] Align K8s version in Doc and CI with minimal required version (#3628, @kenchung285)
  • [SLI-Metrics] Ray service info (#3604, @owenowenisme)
  • chore CI: use Go 1.24 everywhere (#3584, @davidxia)
  • [SLI Metrics] Add metric kuberay_job_info (#3621, @troychiu)
  • [docs] Fix typos (#3609, @omahs)
  • [Refactor] Remove duplicate definition of get_ray_cluster_status (#3608, @LeoLiao123)
  • [Prometheus] Add kuberay_cluster_info metric (#3535, @win5923)
  • feat plugin: support enabling autoscaler both v1 and v2 (#3459, @davidxia)
  • test: remove duplicate delete worker group test (#3605, @emmanuel-ferdman)
  • [chore] Remove misleading log (#3601, @kevin85421)
  • [refactor] Use mutate funcs to clearly show per-test field changes (#3587, @LeoLiao123)
  • [kubectl-plugin] Use dashboard API instead of the stdout of the ray job submit CLI to get Job submission ID (#3569, @LeoLiao123)
  • [refactor] Combine TestCalculateMinReplicas and TestCalculateMaxReplicas into a single test (#3579, @tinaxfwu)
  • feat: add Version to AutoscalerOptions (#3578, @davidxia)
  • Bump nanoid from 3.3.7 to 3.3.11 in /dashboard (#3589, @dependabot[bot])
  • Bump braces from 3.0.2 to 3.0.3 in /dashboard (#3590, @dependabot[bot])
  • Bump @babel/runtime from 7.24.1 to 7.27.1 in /dashboard (#3591, @dependabot[bot])
  • [apiserversdk] make withFieldSelector private and consistent 'KubeRay' (#3595, @rueian)
  • Remove unused icon from dashboard (#3599, @han-steve)
  • chore operator: improve TestIsAutoscalingEnabled test (#3583, @davidxia)
  • refactor tests: use testify instead of Fatal (#3593, @davidxia)
  • Add dashboard component to master (#3566, @han-steve)
  • [apiserversdk] check service belongs to kuberay (#3563, @troychiu)
  • refactor: remove unnecessary type args when type can be inferred (#3585, @davidxia)
  • [Feature] Add unit test for update service request validation (#3546, @LeoLiao123)
  • [refactor][operator]: make RayStartParams optional (#3202, @davidxia)
  • [Feature] Auto detect MIG GPUs and pass them into Ray’s logical resources (#3567, @siyuanfoundation)
  • Add more grouping to dependabot.yml to resolve inconsistencies when bumping versions (#3554, @kenmcheng)
  • [apiserver] ListAllServices with pagination (#3490, @tinaxfwu)
  • [RayCluster][Expectation] Add a test to ensure expectations work well during scaling down (#3543, @kenchung285)
  • [CI] Fix MultiArch image push (#3575, @kevin85421)
  • Revert "[Bug][CI] Multi-platform build fails with docker driver in GitHub Actions (#3570)" (#3573, @kevin85421)
  • [Bug][CI] Multi-platform build fails with docker driver in GitHub Actions (#3570, @400Ping)
  • [Bug][kubectl-plugin] Wrong behavior for InteractiveMode RayJob with BackoffLimit set (#3555, @CheyuWu)
  • [CI] Fix: /etc/docker/daemon.json: No such file or directory (#3565, @win5923)
  • [Feature] Fix dependency upgrade for gomock (#3558, @400Ping)
  • [apiserversdk] Add query filter to proxy (#3534, @owenowenisme)
  • [Feature] Add timeout for apiserver grpc server (#3427, @machichima)
  • [Fix] RayCluster fails to transit Status.State to Ready when numOfHosts > 1 (#3353, @CheyuWu)
  • [Prometheus] Refactor kuberay_cluster_provisioned_duration_seconds (#3497, @win5923)
  • [TEST] Improve unit test coverage for apiserver pkg/model (#3495, @JiangJiaWei1103)
  • Bump github.com/jarcoal/httpmock from 1.2.0 to 1.4.0 in /ray-operator (#3536, @dependabot[bot])
  • [TEST] Unit tests for ray_job_submission_service_server.go (#3532, @machichima)
  • [apiserver] Support setting headServiceAnnotations (#3523, @troychiu)
  • [Fix][Operator] Explicitly wait for pod not found for satisfying the delete scale expectation (#3520, @MortalHappiness)
  • [Apiserver] Determine the minimum resource requirements for KubeRay API server e2e tests (#3526, @kenchung285)
  • [Refactor] Improve developer experience of API server e2e-test (#3466, @JiangJiaWei1103)
  • [Feat][kubectl-plugin] Support -v flag for kubectl ray job submit (#3524, @MortalHappiness)
  • [Fix]remove broken link in doc (#3519, @simo-hsieh)
  • [Fix][kubectl-plugin] Remove filepath.Clean for ray job submit workingDir (#3518, @MortalHappiness)
  • [apiserversdk] use config.Middleware at most once (#3522, @rueian)
  • [apiserversdk] implement the apiserversdk proxy (#3494, @rueian)
  • docs: update dev docs to use Golang 1.24 (#3515, @davidxia)
  • [SLI Metrics] kuberay_job_execution_duration_seconds (#3488, @troychiu)
  • feat: use specified --ray-version in --image (#3514, @davidxia)
  • [Feature] Manually upgrade k8s package group (#3486, @LeoLiao123)
  • Bump github.com/spf13/cobra from 1.8.1 to 1.9.1 in /kubectl-plugin (#3499, @dependabot[bot])
  • [Feature] Upgrade net package (#3485, @400Ping)
  • Bump the google-golang group across 5 directories with 3 updates (#3493, @dependabot[bot])
  • [CI] Upload logs as artifacts to BuildKite (#3405, @win5923)
  • [Feature] Upgrade ginkgo (#3503, @LeoLiao123)
  • Fix: Helm lint and test CI failed (#3505, @ChenYi015)
  • Bump github.com/Masterminds/semver/v3 from 3.2.1 to 3.3.1 in /ray-operator (#3500, @dependabot[bot])
  • [CI][HELM] Use chart-testing to install Helm charts (#3412, @ChenYi015)
  • Fix upgrade gomega (#3483, @owenowenisme)
  • [Apiserver] Set the right amount of resource in e2e test (#3465, @owenowenisme)
  • [Prometheus] Add kuberay_cluster_provisioned_duration_seconds metric (#3212, @win5923)
  • Only try once in HTTP health check commands (#3469, @epall)
  • [TEST] e2e test for Cluster in resource_manager (#3432, @machichima)
  • [Feature] Upgrade grpc gateway version manually (#3491, @JiangJiaWei1103)
  • [docs] Correct typos in CONTRIBUTING.md and api-server README.md (#3492, @LeoLiao123)
  • Bump github.com/onsi/gomega from 1.36.2 to 1.37.0 in /apiserver (#3475, @dependabot[bot])
  • chore: move go mod download before copy source in Dockerfile (#3460, @fscnick)
  • [Refactor] Improve API server developer experience (#3458, @JiangJiaWei1103)
  • Add a grouping for 'google.golang.org/*' to avoid inconsistency between sub-projects (#3470, @kenmcheng)
  • [Refactor] Encapsulate RayJob metrics in a custom Prometheus collector (#3444, @troychiu)
  • Bump golang.org/x/net from 0.33.0 to 0.38.0 in /experimental (#3407, @dependabot[bot])
  • [TEST] Improve unit test coverage for apiserver pkg/model (#3419, @JiangJiaWei1103)
  • [N/N] [Lint] Group imports by sections (#3454, @troychiu)
  • [Feature] Upgrade golang version (#3461, @CheyuWu)
  • [Apiserver] Use Eventually from Gomega instead of wait from apimachinery (#3433, @owenowenisme)
  • [HELM] Fix serviceAccount name inconsistency in templates (#3451, @archsyscall)
  • [3/N] [Lint] Group imports by sections (#3430, @troychiu)
  • [TEST] Improve test coverage for apiserver pkg/manager (#3386, @machichima)
  • [2/N] [Lint] Group imports by sections (#3429, @troychiu)
  • [Refactor] Format API server Makefile for consistency (#3435, @JiangJiaWei1103)
  • [HELM] Typo correction (operatorComand -> operatorCommand) (#3450, @archsyscall)
  • [apiserver] Update entry point readme page (#3437, @dentiny)
  • bump protobuf version (#3443, @troychiu)
  • [Feature] Add e2e test for UpdateRayService function (#3446, @400Ping)
  • [Feature] Manually fix controller runtime package upgrade [#3397] (#3448, @machichima)
  • [Feature] Manually fix net package upgrade (#3447, @owenowenisme)
  • [Feature] Fix auto upgrade prometheus (#3449, @400Ping)
  • [1/N] [Lint] Group imports by sections (#3428, @troychiu)
  • [Refactor][Helm] Adjust the indention of ray-cluster template files (#3410, @ChenYi015)
  • [Feature] [API Server] Support activeDeadlineSeconds in API Server RayJob resource (#3335, @machichima)
  • [Docs] Correct command to load KubeRay operator image (#3387, @JiangJiaWei1103)
  • [apiserver] List services with pagination (#3309, @dentiny)
  • Use helm-docs to generate README for chart kuberay-operator automatically (#3331, @ChenYi015)
  • [api-server] change cluster expected status to speed up e2e test (#3415, @troychiu)
  • [HELM] Add Helm unit tests for chart kuberay-apiserver (#3361, @ChenYi015)
  • [Refactor][Helm] Define name templates for kuberay-operator resources (#3381, @ChenYi015)
  • fix precommit fail in master (#3416, @troychiu)
  • Api server refactor/allow multiple job statuses in servicee2e (#3375, @owenowenisme)
  • [doc] [apiserver] Remove wording on pagination and sort (#3411, @dentiny)
  • [kubectl-plugin] TestKubectlRayCommand is flaky (#3371, @kenchung285)
  • Bump google.golang.org/protobuf from 1.34.2 to 1.36.6 in /experimental (#3395, @dependabot[bot])
  • Bump google.golang.org/protobuf from 1.36.5 to 1.36.6 in /apiserver (#3391, @dependabot[bot])
  • [CI] Bump Go version to 1.23 to support E2E Operator Version Upgrade tests in Buildkite (#3406, @win5923)
  • [Apiserver][Refactor] Use polling in autoscaler e2e test (#3402, @owenowenisme)
  • [Refactor][Helm] Use with to simplify ray-cluster templates (#3376, @ChenYi015)
  • Bump the kubernetes group across 3 directories with 9 updates (#3390, @dependabot[bot])
  • Bump github.com/rs/zerolog from 1.33.0 to 1.34.0 in /apiserver (#3393, @dependabot[bot])
  • Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.20.4 in /apiserver (#3392, @dependabot[bot])
  • Bump github.com/prometheus/client_golang from 1.20.5 to 1.22.0 in /apiserver (#3394, @dependabot[bot])
  • [apiserver] Implement pagination to list all jobs (#3359, @dentiny)
  • [CI] Add apiserver e2e test to buildkite (#3351, @win5923)
  • [apiserver] Add make command to start apiserver (#3362, @dentiny)
  • [Docs][ray-operator] Add types of tests and debug tips to development doc (#3401, @MortalHappiness)
  • Add dependabot.yml for enabling "Dependabot version updates" (#3357, @kenmcheng)
  • [apiserver] Fix typos in apiserver proto (#3389, @zaporter-work)
  • [DOCS] document step to do before running e2e test (#3385, @machichima)
  • [test][apiserver] Add unittests for pkg/interceptor (#3346, @chuang0221)
  • [apiserver] Enable golang linter for apiserver (#3367, @dentiny)
  • Add Helm chart unit tests to ray-cluster (#3374, @ChenYi015)
  • [apiserver] Delete resource manager interface (#3370, @dentiny)
  • [apiserver] Remove unused code in apiserver error utils (#3352, @dentiny)
  • Api server refactor/allow multiple job statuses in jobe2e (#3363, @owenowenisme)
  • [apiserver] Fix error propagation and add unit test (#3334, @dentiny)
  • [Feature] Add apiserver unit test(pkg/util/cluster.go) (#3348, @kenchung285)
  • [2/N] [apiserver] Fix second-half apiserver lint (#3338, @dentiny)
  • [apiserver] [easy] Add http error status test (#3347, @dentiny)
  • [apiserver] [high-priority] Fix API server merge conflict (#3343, @dentiny)
  • [apiserver] Move http request spec out of entry page (#3337, @dentiny)
  • [Refactor] Encapsulate RayCluster metrics in a custom Prometheus collector (#3310, @win5923)
  • [apiserver][feat] add pagination to ListRayJobs (#3285, @owenowenisme)
  • [apiserver] Remove unnecessary test image (#3340, @dentiny)
  • [Refactor] use constant variable for port name and value (#3341, @machichima)
  • [kubectl-plugin] TestSwitchesIncompatibleWithConfigFilePresent is flaky (#3333, @kenchung285)
  • [1/N] [apiserver] Fix half of linter issues for apiserver (#3328, @dentiny)
  • [apiserver] Enable test coverage for e2e test (#3332, @dentiny)
  • [Feature][RayService] Set default ports (#3262, @machichima)
  • [refactor] move isOpenShift to options (#3304, @troychiu)
  • [lint] Apply golangci-lint autofix on apiserver directory (#3323, @dentiny)
  • [release] Remove images directory because all documentation has been moved to the Ray website a long time ago (#3322, @kevin85421)
  • [release] Remove unused ray-cluster.pod-security.yaml (#3320, @kevin85421)
  • Upgrade golang linter for precommit hook (#3319, @dentiny)
  • Add Helm chart unittests to CI (#3280, @ChenYi015)
  • [feat][plugin] support creating RayCluster with config file (#3225, @davidxia)
  • Add basic Helm chart unittests for kuberay-operator (#3253, @ChenYi015)
  • [refactor] Make options as a direct field in reconciler (#3299, @troychiu)
  • Fix apiserver linter (#3296, @dentiny)
  • chore: improve kubectl ray session error message (#3300, @davidxia)
  • [easy] Remove cmd folder from test command (#3297, @dentiny)
  • [Autoscaler][Sample] Add comment for AUTOSCALER_UPDATE_INTERVAL_S (#3294, @nadongjun)
  • [apiserver][feat] support pagination for ListAllClusters (#3291, @troychiu)
  • [apiserver] No lint at test (#3292, @dentiny)
  • [readme] Make collapsible section for apiserver readme (#3290, @dentiny)
  • [refactor][plugin] update generation_test.go (#3276, @davidxia)
  • [fix][plugin] typos in some tests (#3263, @davidxia)
  • [Feat][kubectl-plugin] Create cluster with TPUs (--worker-tpu, --num-of-hosts) and TPUs' validation (#3258, @CheyuWu)
  • refactor metrics (#3236, @troychiu)
  • [docs][operator] fix incorrect example output (#3268, @davidxia)
  • [apiserver][feat] add pagination to ListClustersRequest (#3240, @rueian)
  • [feat] enforce DNS1035 validations on RayCluster, RayService, and RayJob names (#3239, @rueian)
  • docs: minor formatting changes to ray-operator dev (#3267, @davidxia)
  • Update KubeRay release documentation (#3226, @andrewsykim)
  • [operator] add +optional to CRD fields that are optional (#3231, @davidxia)
  • [Refactor] move rayStartParams to options.complete (#3260, @CheyuWu)
  • [CI] Remove create tag step from release (#3249, @MortalHappiness)
  • [Docs] Align development guide with Makefile docker-build logic (#3248, @JiangJiaWei1103)
  • [feat] move rbac_test.py to scripts (#3256, @robert-cronin)
  • [fix][plugin] add missing CLI flags for create cluster cmd (#3237, @davidxia)
  • [RayJob][Test] refactor TestValidateRayJobSpec with table test (#3223, @fscnick)
  • [kubectl-plugin] fix rayStartParams error when creating workgroup (#3251, @CheyuWu)
  • [refactor][plugin] RayClusterSpecObject (#3238, @davidxia)
  • [kubectl-plugin] remove CPU limits by default (#3243, @andrewsykim)
  • [Chore][CI] Limit the release-image-build github workflow to only take tag as input (#3117, @tinaxfwu)
  • add node selector option for kubectl plugin create worker group (#3235, @troychiu)
  • [Docs] update development md (#3230, @machichima)
  • kubectl-plugin: set global flags at the root cmd (#3203, @davidxia)
  • [kubectl-plugin] Add head/worker node selector option (#3228, @troychiu)
  • Integrate with rayci (#3215, @dayshah)
  • [RayJob][Fix] Use --no-wait for job submission to avoid carrying the error return code to the log tailing (#3216, @rueian)
  • [operator] add // +optional to CRD fields with omitempty (#3220, @davidxia)
  • [fix][proto] install missing unzip in Dockerfile (#3221, @davidxia)
  • [operator] remove incorrect comment (#3218, @davidxia)
  • [RayService][Test] make sure annotation populated to RayCluster (#3210, @fscnick)
  • Allow app.kubernetes.io/component to be overriden in kuberay-operator helm chart (#3198, @acrewdson)
  • [RayJob][Test] Make sure annotation populated to RayCluster (#3199, @fscnick)
  • kubectl plugin: version shows commit for dev (#3188, @spencer-p)
  • plugin job submit consistently uses field manager (#3187, @spencer-p)
  • RayService event can't set redis password in both GCSFaultTolerance and rayStartParam (#3153, @fscnick)
  • [RayCluster]Upgrade volcano to 1.11.0 (#3159, @owenowenisme)
  • [doc] update file name for container images page (#3189, @chewong)
  • [feat][kubectl-plugin] add scale command (#2926, @davidxia)
  • kubectl ray job submit: provide entrypoint (#3186, @spencer-p)
  • [RayService][Test] util for creating empty RayClusterSpec in test (#3182, @fscnick)
  • [Doc] Add a YAML to explain why some worker pod are not ready in RayService (#3139, @kenchung285)
  • [RayService] make RayClusterSpec required (#3169, @fscnick)
  • [feat] validate the name length of RayCluster, RayService, and RayJob (#3083, @rueian)
  • [chore][docs] enable Markdown lint rule MD013 (#3167, @davidxia)
  • Revert PR [#3127] (#3165, @MortalHappiness)
  • [chore][docs] enable Markdown lint rule MD024 (#3104, @davidxia)
  • [feat][kubectl-plugin] support setting K8s ephemeral storage resource (#3150, @davidxia)
  • [Feat][kubectl-plugin] Add dynamic shell completion for kubectl ray get node & workergroup (#3154, @win5923)
  • [chore][docs] enable Markdown lint rules MD050 MD052 (#3152, @davidxia)
  • [CI][RayService] deflaky the TestAutoscalingRayService (#3119, @rueian)
  • [chore][docs] enable Markdown lint rule MD036 (#3137, @davidxia)
  • [Fix][kubectl-plugin] Fix the flaky test "should reconnect after pod connection is lost" (#3147, @MortalHappiness)
  • Changes required make a build after update of component-base (#3004, @mszadkow)
  • Rayjob event can't set redis password in both GCSFaultTolerance and rayStartParam (#3093, @fscnick)
  • [ci] remove the test for modin because its certificate is broken (#3135, @rueian)
  • [feat][RayCluster] shorten HeadlessServiceSuffix to have more space for CR names (#3101, @rueian)
  • [Bug] Re-enable flaky kubectl plugin e2e test in kubectl_ray_job_submit_test.go (#3124, @owenowenisme)
  • [chore][docs] enable Markdown lint rule MD041 (#3125, @davidxia)
  • [bug] use passed context instead of TODO context (#3129, @troychiu)
  • kubectl ray job submit: provide empty entrypoint (#3127, @spencer-p)
  • [Fix] Adjust crd path to verify changed files (#3103, @mszadkow)
  • [feat][kubectl-plugin] delete accepts multiple resources (#3097, @davidxia)
  • [Refactor][kubectl-plugin] Move fakeClient in version_test.go to client/fake (#3113, @owenowenisme)
  • [CI] Remove test_security.py and all python test dependencies in CI (#3123, @MortalHappiness)
  • [Chore] Set default go version in pre commit (#3121, @troychiu)
  • [Bug] Re-enable flaky kubectl plugin e2e test "should reconnect after pod connection is lost" (#3116, @owenowenisme)
  • [RayCluster] IsAutoscalingEnabled takes RayClusterSpec (#3111, @fscnick)
  • [chore][docs] enable Markdown lint rules MD033 MD034 (#3105, @davidxia)
  • [feat][kubectl-plugin] support K8s labels and annotations (#3095, @davidxia)
  • [feat][kubectl-plugin] support setting ray start params (#3046, @davidxia)
  • [feat] shorten RayClusterSuffix to have more space for RayService and RayJob names (#3102, @rueian)
  • [fix][kubectl-plugin] error when creating cluster that already exists (#3099, @davidxia)
  • Add e2e test make sure resource quota error is surfaced (#3087, @han-steve)
  • [CI] apply resource logger to ray service test (#3081, @simo-hsieh)
  • [CI] fix locust versions (#3100, @rueian)
  • [chore][docs] enable Markdown lint rule MD040 (#3096, @davidxia)
  • [Fix][kubectl-plugin] Release bot opens PRs to Krew repo with unexpected whitespace changes (#3090, @MortalHappiness)
  • [chore][docs] enable more Markdown lint rules (#3088, @davidxia)
  • [docs] remove unnecessary kubectl-plugin/README.md (#3089, @davidxia)
  • [chore][docs] enable Markdown lint rules MD012 and MD022 (#3080, @davidxia)
  • [Fix][CI] Redirect stderr to stdout in Test Autoscaler E2E (nightly operator) (#3074, @tinaxfwu)
  • chore: update version string in examples and docs (#3086, @davidxia)
  • Add e2e KubeRay operator upgrade test (#3060, @ryanaoleary)
  • [feat][kubectl-plugin] show last Condition and State (#2919, @davidxia)
  • [CI] apply resource logger to ray cluster test (#3075, @simo-hsieh)
  • [refactor][kubectl-plugin] simplify cobra Validate() tests (#3040, @davidxia)
  • [chore][docs] enable Markdown lint rules MD005 and MD007 (#3073, @davidxia)
  • [CI] Composable kube resource logger when test failed (#3070, @simo-hsieh)
  • [feat][kubectl-plugin] support waiting until Ray cluster is provisioned (#3067, @davidxia)
  • [CI] dump failed test k8s resources (#3025, @simo-hsieh)
  • [kubectl-plugin] Validate resource values for negative and non-numeric inputs (#3045, @win5923)
  • [Chore] Add RayJob InteractiveMode sample yaml (#3062, @MortalHappiness)
  • [refactor][kubectl-plugin] make --node-type an enum flag (#3018, @davidxia)
  • [refactor][kubectl-plugin] require one posarg in session (#3017, @davidxia)
  • [CI] enable CI checks for all release branches (#3066, @andrewsykim)
  • [RayCluster] Make headpod name deterministic (#3028, @owenowenisme)
  • [Fix][kubectl-plugin] ray job submit runtime-env-json null error (#3063, @MortalHappiness)
  • [release] Use ray-ml:2.41.0.deprecated instead of ray-ml:2.41.0.deprecated-gpu (#3058, @kevin85421)
  • [Fix][RayCluster] fix missing pod name in CreatedWorkerPod and Failed… (#3057, @rueian)
  • [feat][kubectl-plugin] list Ray nodes (#3007, @davidxia)
  • [feat][kubectl-plugin] add get workergroup cmd (#2996, @davidxia)
  • [refactor][kubectl-plugin] use cobra's posargs length validator (#2985, @davidxia)
  • [Feature] Add timestamps for logs in e2e tests (#3006, @kenchung285)
  • [Fix][kubectl-plugin] Don't print wrapped error for job submit startup (#3027, @MortalHappiness)
  • [fix][kubectl-plugin] return error when getting a non-existent cluster (#2990, @davidxia)
  • Add vLLM TPU example RayService manifest (#3000, @ryanaoleary)
  • [Fix][CI] E2E tests do not reflect error (#3021, @MortalHappiness)
  • [kubectl-plugin] Support specifying number of head GPUs and worker GPUs for Rayjob (#2989, @win5923)
  • [refactor][kubectl-plugin] require cluster in create workergroup (#3010, @davidxia)
  • [Fix][CI] kubectl plugin krew index CI error (#3015, @MortalHappiness)
  • Fix incorrect comment in raycluster_controller.go (#3003, @owenowenisme)
  • [Autoscaler] Print the value of WorkerGroupSpec.Replicas (#3005, @fscnick)
  • [kubectl-plugin] Fix panic when working_dir is not set (#2988, @win5923)
  • [fix][kubectl-plugin] fix some test errors (#2997, @davidxia)
  • [fix][kubectl-plugin] use explicit context for session cmd (#2994, @davidxia)
  • [Refactor] Merge raycluster_gcs_ft_test.go and raycluster_gcsft_test.go (#3008, @LeoLiao123)
  • [Fix] Standardize Buildkite Display Format Across All Tests (#2992, @tinaxfwu)
  • [refactor][kubectl-plugin] remove unnecessary lines (#2991, @davidxia)
  • [CI][#2905] Improvement: enable testifylint compares rule (#2977, @cchung100m)
  • [Feature] [Fix] Ensure Correct Logs Display for Go Test Logs in Buildkite Runner (#2837, @tinaxfwu)
  • [Refactor] Use constants for image tag, image repo, and versions in golang to avoid hard-coded strings (#2978, @400Ping)
  • Update TPU Ray CR manifests to use Ray 2.41.0 (#2965, @ryanaoleary)
  • Update samples to use Ray 2.41.0 images (#2964, @andrewsykim)
  • [Test] Use GcsFaultToleranceOptions in test and backward compatibility (#2972, @fscnick)
  • [chore][docs] enable Markdownlint rule MD004 (#2973, @davidxia)
  • [release] Update Volcano YAML files to Ray 2.41 (#2976, @win5923)
  • [release] Update Yunikorn YAML file to Ray 2.41 (#2969, @kenchung285)
  • [CI] Change Pre-commit-shellcheck-to-shellcheck-py (#2974, @owenowenisme)
  • [chore][docs] enable Markdownlint rule MD010 (#2975, @davidxia)
  • [Release] Upgrade ray-job.batch-inference.yaml image to 2.41 (#2971, @MortalHappiness)
  • [RayService] adapter vllm 0.6.1.post2 (#2823, @pxp531)
  • [release][9/N] Update text summarizer RayService to Ray 2.41 (#2961, @kevin85421)
  • [RayService] Deflaky RayService envtest (#2962, @kevin85421)
  • [RayJob] Deflaky RayJob e2e tests (#2963, @kevin85421)
  • [fix][kubectl-plugin] set worker group CPU limit (#2958, @davidxia)
  • [docs][kubectl-plugin] fix incorrect example commands (#2951, @davidxia)
  • [release][8/N] Upgrade Stable Diffusion RayService to Ray 2.41 (#2960, @kevin85421)
  • [kubectl-plugin] Fix panic when GPU resource is not set (#2954, @win5923)
  • [docs][kubectl-plugin] improve help messages (#2952, @davidxia)
  • [CI] Enable testifylint len rule (#2945, @LeoLiao123)
  • [release][7/N] Update RayService YAMLs (#2956, @kevin85421)
  • [Fix][RayJob] Invalid quote for RayJob submitter (#2949, @MortalHappiness)
  • [chore][kubectl-plugin] use consistent capitalization (#2950, @davidxia)
  • [chore] add Markdown linting pre-commit hook (#2953, @davidxia)
  • [chore][kubectl-plugin] use better test assertions (#2955, @davidxia)
  • [CI] Add shellcheck and fix error of it (#2933, @owenowenisme)
  • [docs][kubectl-plugin] add dev docs (#2912, @davidxia)
  • [release][6/N] Remove unnecessary YAMLs (#2946, @kevin85421)
  • [release][5/N] Update some RayJob YAMLs from Ray 2.9 to Ray 2.41 (#2941, @kevin85421)
  • [release][4/N] Update Ray images / versions in kubectl plugin (#2938, @kevin85421)
  • [release][3/N] Update RayService e2e tests YAML files from Ray 2.9 to Ray 2.41 (#2937, @kevin85421)
  • [release][2/N] Update RayCluster Helm chart from Ray 2.9 to Ray 2.41 (#2936, @kevin85421)
  • Delete [raycluster|rayjob|rayservice]_types_test.go unnecessary tests (#2935, @kevin85421)
  • [release][1/N] Update YAMLs from Ray 2.9 to Ray 2.41 (#2934, @kevin85421)
  • [CI] Generate CRD json schema separately in pre-commit (#2930, @MortalHappiness)
  • [CI] Enable testifylint expected-actual rule (#2914, @davidxia)
  • [docs] move pre-commit instructions to main dev docs (#2921, @davidxia)
  • [CI] Enable testifylint float-compare rule (#2910, @MortalHappiness)
  • [CI] Fix lint error (require-error) (#2931, @MortalHappiness)
  • [kubectl-plugin] support general kubectl switches like --context (#2883, @davidxia)
  • [CI] Enable testifylint require-error rule (#2909, @MortalHappiness)
  • [chore][kubectl-plugin] use consistent capitalization (#2922, @davidxia)
  • [RayService] Refactor unit tests for ShouldPrepareNewCluster (#2928, @kevin85421)
  • [RayService] Add a safeguard to prevent overriding the pending cluster during an upgrade (#2887, @rueian)
  • [CI] Auto download golang tools in pre-commit (#2917, @MortalHappiness)
  • [CI] Enable testifylint bool-compare rule (#2911, @400Ping)
  • [CI] Enable testifylint empty rule (#2908, @400Ping)
  • [CI] Enable testifylint formatter rule (#2915, @400Ping)
  • [Fix][kubectl-plugin] make tests use a temporary kube config (#2894, @davidxia)
  • [kubectl-plugin] update context error messages (#2891, @davidxia)
  • Use webhook.CustomValidator instead of deprecated webhook.Validator. (#2803, @mbobrovskyi)
  • [kubectl-plugin][feat] support specifying number of head GPUs (#2895, @davidxia)
  • [CI] Enable testifylint error-nil rule (#2907, @MortalHappiness)
  • [CI] Enable testifylint rule (#2896, @MortalHappiness)
  • [Fix][kubectl-plugin] Fix no context nil error SIGSEGV in tests (#2892, @MortalHappiness)
  • [docs][ray-operator] fix typo in Golang version (#2893, @davidxia)
  • [RayService] Refactor envtests (#2888, @kevin85421)
  • [RayService] Remove outdated env tests (#2886, @kevin85421)
  • [RayService] More envtests that follow the most common scenario in the RayService code path (#2880, @rueian)
Source: README.md, updated 2025-06-20