KubeRay is an open-source Kubernetes operator that simplifies deploying and managing scalable Ray applications on Kubernetes clusters.
A toolkit to run Ray applications on Kubernetes
KubeRay is used primarily by developers and DevOps teams to run distributed AI and machine learning workloads with Ray on Kubernetes. Through its custom resources (RayCluster, RayJob, and RayService), it automates cluster lifecycle management, job submission, and service deployment, enabling scalable, fault-tolerant model training and serving in cloud-native environments.
KubeRay is tightly integrated with the Kubernetes ecosystem and requires a working Kubernetes cluster. Refer to the official Ray documentation for detailed setup instructions and best practices. The dashboard and API server components are experimental and may not be production-ready. The kubectl ray plugin simplifies managing Ray workloads, but familiarity with core Kubernetes concepts is still beneficial.
Refer to the Ray documentation at https://docs.ray.io/en/latest/cluster/kubernetes/index.html for user-facing installation and setup instructions
Use the kubectl ray plugin (available from KubeRay v1.3.0) to simplify workflows
Deploy the KubeRay operator and CRDs on your Kubernetes cluster by following Ray's Kubernetes cluster setup guides (a Helm sketch follows this list)
Optionally install the KubeRay Dashboard (experimental) starting from v1.4.0 for resource visualization
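A minimal sketch of the operator install, assuming Helm is available; the repository URL and chart name are those published for the KubeRay Helm charts, and the version pin is only an example to be replaced with the release you intend to run:

helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.3.0

The kubectl ray plugin can typically be installed through Krew (kubectl krew install ray), assuming Krew is set up locally; check the Ray documentation for the current installation path.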
kubectl apply -f raycluster.yaml
Applies a RayCluster custom resource; the KubeRay operator then creates and manages the Ray cluster's pods on Kubernetes
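A minimal raycluster.yaml sketch for the command above; the Ray version, image tag, group name, and replica counts are placeholders:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-sample
spec:
  rayVersion: "2.41.0"            # example version; keep it consistent with the image tag
  headGroupSpec:
    rayStartParams: {}
    template:                     # standard pod template for the Ray head node
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.41.0
  workerGroupSpecs:
    - groupName: workers
      replicas: 2                 # example replica counts
      minReplicas: 1
      maxReplicas: 5
      rayStartParams: {}
      template:                   # pod template for the worker nodes
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.41.0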
kubectl ray job submit -f rayjob.yaml
Submits a RayJob which automatically creates a RayCluster, runs the job, and optionally deletes the cluster after completion
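A minimal rayjob.yaml sketch for the command above; the entrypoint and image are placeholders, and rayClusterSpec follows the same structure as a RayCluster spec:

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
spec:
  entrypoint: python -c "import ray; ray.init(); print(ray.cluster_resources())"  # placeholder workload
  shutdownAfterJobFinishes: true   # tear the cluster down once the job completes
  rayClusterSpec:                  # same structure as a RayCluster spec
    rayVersion: "2.41.0"
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.41.0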
kubectl apply -f rayservice.yaml
Applies a RayService resource, which manages Ray Serve applications with zero-downtime upgrades and high availability
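A minimal rayservice.yaml sketch for the command above; the application name and import_path are hypothetical, and in practice the container image must include the Serve application code:

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  serveConfigV2: |
    applications:
      - name: my_app
        import_path: my_module:app   # hypothetical module exposing a Ray Serve application
        route_prefix: /
  rayClusterConfig:                  # same structure as a RayCluster spec
    rayVersion: "2.41.0"
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.41.0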
kubectl ray --help
Displays help and usage information for the kubectl ray plugin commands
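Because RayCluster, RayJob, and RayService are ordinary custom resources, the standard kubectl verbs also work for inspection, for example:

kubectl get rayclusters
kubectl get rayjobs
kubectl get rayservices
kubectl describe raycluster raycluster-sample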