KubeRay is an open-source Kubernetes operator that simplifies deploying and managing scalable Ray applications on Kubernetes clusters.
A toolkit to run Ray applications on Kubernetes
KubeRay is used primarily by developers and DevOps teams to run distributed AI and machine learning workloads with Ray on Kubernetes. Through its custom resources (RayCluster, RayJob, and RayService), it automates cluster lifecycle management, job submission, and service deployment, enabling scalable, fault-tolerant model training and serving in cloud-native environments.
KubeRay is tightly integrated with the Kubernetes ecosystem and requires a working Kubernetes cluster. Refer to the official Ray documentation for detailed setup instructions and best practices. The dashboard and API server components are experimental and may not be production-ready. The kubectl ray plugin simplifies managing Ray workloads, but familiarity with core Kubernetes concepts is still beneficial.
Refer to the Ray documentation at https://docs.ray.io/en/latest/cluster/kubernetes/index.html for user-facing installation and setup instructions
Use the kubectl ray plugin (available from KubeRay v1.3.0) to simplify workflows
Deploy the KubeRay operator and CRDs on your Kubernetes cluster by following Ray's Kubernetes cluster setup guides (a Helm sketch follows this list)
Optionally install the KubeRay Dashboard (experimental) starting from v1.4.0 for resource visualization
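A minimal sketch of the operator install, assuming Helm is available; the repository URL and chart name are those published for the KubeRay Helm charts, and the version pin is only an example to be replaced with the release you intend to run:

helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.3.0

The kubectl ray plugin can typically be installed through Krew (kubectl krew install ray), assuming Krew is set up locally; check the Ray documentation for the current installation path.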
kubectl apply -f raycluster.yaml
Applies a RayCluster custom resource; the KubeRay operator then creates and manages the Ray cluster's pods on Kubernetes
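A minimal raycluster.yaml sketch for the command above; the Ray version, image tag, group name, and replica counts are placeholders:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-sample
spec:
  rayVersion: "2.41.0"            # example version; keep it consistent with the image tag
  headGroupSpec:
    rayStartParams: {}
    template:                     # standard pod template for the Ray head node
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.41.0
  workerGroupSpecs:
    - groupName: workers
      replicas: 2                 # example replica counts
      minReplicas: 1
      maxReplicas: 5
      rayStartParams: {}
      template:                   # pod template for the worker nodes
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.41.0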
kubectl ray job submit -f rayjob.yaml
Submits a RayJob which automatically creates a RayCluster, runs the job, and optionally deletes the cluster after completion
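A minimal rayjob.yaml sketch for the command above; the entrypoint and image are placeholders, and rayClusterSpec follows the same structure as a RayCluster spec:

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
spec:
  entrypoint: python -c "import ray; ray.init(); print(ray.cluster_resources())"  # placeholder workload
  shutdownAfterJobFinishes: true   # tear the cluster down once the job completes
  rayClusterSpec:                  # same structure as a RayCluster spec
    rayVersion: "2.41.0"
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.41.0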
kubectl apply -f rayservice.yaml
Applies a RayService resource, which manages Ray Serve applications with zero-downtime upgrades and high availability
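A minimal rayservice.yaml sketch for the command above; the application name and import_path are hypothetical, and in practice the container image must include the Serve application code:

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  serveConfigV2: |
    applications:
      - name: my_app
        import_path: my_module:app   # hypothetical module exposing a Ray Serve application
        route_prefix: /
  rayClusterConfig:                  # same structure as a RayCluster spec
    rayVersion: "2.41.0"
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.41.0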
kubectl ray --help
Displays help and usage information for the kubectl ray plugin commands
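Because RayCluster, RayJob, and RayService are ordinary custom resources, the standard kubectl verbs also work for inspection, for example:

kubectl get rayclusters
kubectl get rayjobs
kubectl get rayservices
kubectl describe raycluster raycluster-sample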