| Name | Modified | Size |
|---|---|---|
| Ray-2.51.0 source code.tar.gz | 2025-10-25 | 184.8 MB |
| Ray-2.51.0 source code.zip | 2025-10-25 | 190.6 MB |
| README.md | 2025-10-25 | 15.7 kB |
# Release Highlights

**Ray Serve:**
- Application-level autoscaling: Introduces custom autoscaling policies that operate across all deployments in an application, enabling coordinated scaling decisions based on aggregate metrics. This is a significant advancement over per-deployment autoscaling, allowing for more intelligent resource management at the application level.
- Enhanced autoscaling capabilities with replica-level metrics: Wires up `AutoscalingContext` with `total_running_requests`, `total_queued_requests`, and `total_num_requests`, plus adds support for min, max, and time-weighted average aggregation functions. These improvements give users fine-grained control to implement sophisticated custom autoscaling policies based on real-time workload metrics.
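A custom application-level policy of this kind can be sketched in plain Python. The `AutoscalingContext` below is a hypothetical stand-in carrying only the metrics named above (not Ray Serve's actual class), and the scaling rule is illustrative:

```python
import math
from dataclasses import dataclass


@dataclass
class AutoscalingContext:
    """Hypothetical stand-in for Serve's context object; the real class
    exposes these metrics among others."""
    total_running_requests: float
    total_queued_requests: float
    current_num_replicas: int


def scale_on_total_load(ctx: AutoscalingContext,
                        target_per_replica: float = 10.0) -> int:
    """Return a target replica count for the whole application based on
    aggregate load across all deployments, not per-deployment metrics."""
    total = ctx.total_running_requests + ctx.total_queued_requests
    # Round up so a fraction of a replica's worth of load still adds capacity.
    return max(1, math.ceil(total / target_per_replica))
```

The key difference from per-deployment autoscaling is that the decision sees the whole application's request volume at once, so coordinated scaling (for example, keeping upstream and downstream deployments in proportion) becomes possible.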
# Ray Libraries

## Ray Data

### 🎉 New Features:
- Added enhanced support for Unity Catalog integration (#57954, #58049)
- New expression evaluator infrastructure for improved query optimization (#57778, #57855)
- Support for SaveMode in write operations (#57946)
- Added approximate quantile aggregator (#57598)
- MCAP datasource support for robotics data (#55716)
- Callback-based stat computation for preprocessors and ValueCounter (#56848)
- Support for multiple download URIs with improved error handling (#57775)
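The approximate quantile aggregator (#57598) typically trades exactness for bounded memory. A minimal sampling-based sketch of the idea, using reservoir sampling (illustrative only; not necessarily the algorithm Ray Data uses internally):

```python
import random


def approx_quantile(stream, q, sample_size=1000, seed=0):
    """Approximate the q-quantile of a stream using a fixed-size
    reservoir sample, so memory stays O(sample_size) no matter how
    large the stream is."""
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(stream):
        if len(reservoir) < sample_size:
            reservoir.append(x)
        else:
            # Each element replaces a reservoir slot with probability
            # sample_size / (i + 1), keeping the sample uniform.
            j = rng.randrange(i + 1)
            if j < sample_size:
                reservoir[j] = x
    reservoir.sort()
    idx = min(len(reservoir) - 1, int(q * len(reservoir)))
    return reservoir[idx]
```

When the stream fits inside the reservoir the result is exact; beyond that, accuracy degrades gracefully with the sampling ratio.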
### 💫 Enhancements:
- Improved projection pushdown handling with renamed columns (#58033, #58037, #58040, #58071)
- Enhanced hash-shuffle performance with better retry policies (#57572)
- Streamlined concurrency parameter semantics (#57035)
- Improved execution progress rendering (#56992)
- Better handling of empty columns in pandas blocks (#57740)
- Enhanced support for complex data types and column operations (#57271)
- Reduced memory usage with improved streaming generator backpressure (#57688)
- Enhanced preemption testing and utilities (#57883)
- Improved Download operator display names (#57773)
- Better handling of variable-shaped tensors and tensor columns (#57240)
- Optimized aggregator execution with out-of-order processing by default (#57753)
### 🔨 Fixes:
- Fixed renamed columns to be appropriately dropped from output (#58040, #58071)
- Fixed handling of renames in projection pushdown (#58033, #58037)
- Fixed `vLLMEngineStage` field name inconsistency for images (#57980)
- Fixed driver hang during streaming generator block metadata retrieval (#56451)
- Fixed retry policy for hash-shuffle tasks (#57572)
- Fixed prefetch loop to avoid blocking on fetches (#57613)
- Fixed empty projection handling (#57740)
- Fixed errors with concatenation of mixed pyarrow native and extension types (#56811)
### 📖 Documentation:
- Updated document embedding benchmark to use canonical Ray Data API (#57977)
- Improved concurrency-related documentation (#57658)
- Updated preprocessing and data handling examples
## Ray Train

### 🎉 New Features:
- Ray Train V2 is now enabled by default (#57857)
- Top-level `ray.train` aliases for public APIs (#57758)
- Enhanced checkpoint validation with `validate_function` parameter (#57742)
- Improved error handling for V1/V2 API mixing (#57570)
### 💫 Enhancements:
- Improved async checkpointing and validation benchmarks (#57530)
- Better training ingest performance with soak testing (#57120)
- Enhanced checkpoint upload functionality with better retry logic
- Improved error messages with TrainingFailedError module updates (#57865)
- Better timeout handling for torch trainer tests (#57873)
- Enhanced pytorch profiler integration (#57133)
- Improved circular dependency isolation (#57710)
### 🔨 Fixes:
- Fixed `after_worker_group_poll_status` errors resulting in `ControllerError` (#57869)
- Fixed job working directory test issues (#58010)
- Fixed Vicuna release test example to use V2 (#57767, #58053)
- Disabled train_colocate_trainer test (#57963)
- Fixed various V2 migration and compatibility issues
### 📖 Documentation:
- Documented `checkpoint_upload_fn` (#57742)
- Added PyTorch Profiler and Ray Train template (#57133)
- Improved checkpoint and validation documentation
### 🏗 Architecture refactoring:
- Migrated multiple release tests to Train V2
- Better integration with Ray Data for training workloads
- Improved controller error handling and reporting
## Ray Tune

### 💫 Enhancements:
- Updated release tests to import from tune (#57956)
- Better integration with Train V2 backend
## Ray Serve

### 🎉 New Features:
- Application-level autoscaling. Introduces support for custom autoscaling policies that operate across all deployments in an application, enabling coordinated scaling decisions based on aggregate metrics. (#57535, #57548, #57637, #57756)
- Autoscaling metrics aggregation functions. Adds support for min, max, and time-weighted average aggregation over timeseries data, providing more flexible autoscaling control. (#56871)
- Enhanced autoscaling context with replica-level metrics. Wires up `AutoscalingContext` constructor arguments to expose `total_running_requests`, `total_queued_requests`, and `total_num_requests` for use in custom autoscaling policies. (#57202)
- Multiple task consumers in a single application. Ray Serve applications can now run multiple task consumer deployments concurrently. (#56618)
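The time-weighted average aggregation (#56871) weights each sample by how long it was in effect, rather than treating all samples equally. A minimal sketch of that computation in plain Python (independent of Serve's implementation):

```python
def time_weighted_average(samples):
    """Time-weighted average of (timestamp, value) pairs sorted by
    timestamp: each value is weighted by the interval until the next
    sample. A single sample is returned as-is."""
    if not samples:
        raise ValueError("no samples")
    if len(samples) == 1:
        return samples[0][1]
    total_time = samples[-1][0] - samples[0][0]
    weighted = sum(
        value * (samples[i + 1][0] - t)
        for i, (t, value) in enumerate(samples[:-1])
    )
    return weighted / total_time
```

Compared with a plain mean, this keeps a brief metrics spike between two long quiet periods from dominating the autoscaling signal.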
### 💫 Enhancements:
- Reconfigure invoked on replica rank changes. The reconfigure method now receives both user_config and rank parameters when ranks change, enabling replicas to adapt their configuration dynamically. (#57091)
- Celery adapter configuration improvements. Added default serializer and new configuration fields to enhance Celery integration flexibility. (#56707)
- `AutoscalingContext` promoted to public API. The autoscaling context is now officially part of the public API with comprehensive documentation. (#57600)
- Async inference telemetry. Added telemetry tracking to monitor the number of replicas using asynchronous inference. (#57665)
- Rank logging verbosity reduced. Changed seven rank-related INFO logs to DEBUG level, reducing log noise during normal operations. (#57831)
- Controller logging optimized. Removed expensive debug logs from the controller that were costly in large clusters. (#57813)
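To illustrate the rank-aware `reconfigure` change (#57091), here is a minimal, Ray-free sketch of a replica that uses its rank to pin itself to one partition of the workload. The class and config keys are illustrative, not Serve's API:

```python
class ShardedReplica:
    """Illustrative replica class (no Ray dependency). Serve invokes
    reconfigure with the deployment's user_config and, per the change
    above, also passes the replica's rank when ranks change."""

    def __init__(self):
        self.shard = None

    def reconfigure(self, user_config: dict, rank=None) -> None:
        num_shards = user_config.get("num_shards", 1)
        if rank is not None:
            # Pin this replica to one partition of the workload;
            # a rank change re-pins it without restarting the process.
            self.shard = rank % num_shards
```

Receiving the rank at reconfigure time lets replicas repartition state dynamically (for example after scale-up) instead of only at construction.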
### 🔨 Fixes:
- Max constructor retry count test fixed for Windows. Adjusted test resource requirements to account for Windows process creation overhead compared to Linux forking. (#57541)
- Streaming test stability improvements. Added synchronization mechanisms to prevent chunk coalescing and rechunking, eliminating test flakiness. (#57592, #57728)
- Autoscaling test deflaking. Fixed race conditions in application-level autoscaling tests and removed flaky min aggregation test scenario. (#57784, #57967)
- State API usage test corrected. Fixed a unit test that was broken but not running in CI. (#56948)
- Controller recovery logging condition fixed. Updated test condition to properly verify debug and JSON logs after controller recovery. (#57568)
### 📖 Documentation:
- Custom autoscaling documentation. Added comprehensive guide for implementing custom autoscaling policies with examples and best practices. (#57600)
- Replica ranks documentation. Documented the replica rank feature, including how ranks are assigned and how to use them in reconfigure methods. (#57649)
- Application-level autoscaling guide. Added documentation explaining how to configure and use application-level autoscaling policies. (#57756)
- Autoscaling documentation improvements. Updated serve autoscaling docs with clearer explanations and examples. (#57652)
- Performance flags documentation. Documented performance-related configuration flags for Ray Serve. (#57845)
- Metrics documentation fix. Corrected the `ray_serve_deployment_queued_queries` metric name discrepancy in the documentation. (#57629)
- AutoscalingContext import added to examples. Fixed missing import statement in custom autoscaling policy example. (#57876)
- App builder guide typo corrected. Fixed command syntax error in typed application builder example. (#57634)
- Celery filesystem broker note. Added warning about using filesystem as a broker in Celery workers. (#57686)
- Async inference alpha stage warning. Added notice that async inference is in alpha stage. (#57268)
### 🏗 Architecture refactoring:
- Autoscaling control moved to application state. Migrated autoscaling control loop from deployment state to application state, preparing for application-level autoscaling. (#57548)
- Async capability enum removed. Cleaned up unused async capability enum from codebase. (#57666)
## Ray Serve/Data LLM

### 🎉 New Features:
- Updated vLLM to 0.11.0 and Nixl to 0.6.0 (#57201)
- Video processor support for multimodal pipelines (#56785)
- Enhanced callback API for engine customization (#57257)
- Unified and extended builder configuration for LLM deployments (#57724)
### 💫 Enhancements:
- Protocol-based typing improvements and cleaner inheritance structure (#57743)
- Better engine metrics enabled by default (#57615)
- Simplified NIXL dependency management in ray-llm images (#57706)
- Per-stage map kwargs for LLM processor preprocessing/postprocessing (#57826)
- Improved architecture documentation (#57830)
- Better code structure alignment with architectural design (#57889)
- Enhanced multimodal support with Deepseek compatibility (#56906)
### 🔨 Fixes:
- Fixed NIXL limitations with proper exception handling (#58159)
- Improved `runai_streamer` for vLLM 0.10.2+ integration (#56906)
### 📖 Documentation:
- Added comprehensive architecture documentation for Ray Serve LLM (#57830)
- Reorganized LLM documentation with improved navigation (#57787)
- Added benchmark page for performance reference (#57960)
- Converted quick-start guide to MyST Markdown (#57782)
- Better organization of Ray Serve LLM documentation (#57181)
## RLlib

### 🎉 New Features:
- Prometheus metrics support for selected RLlib components (#57932)
- Enhanced support for complex observations in SingleAgentEpisode (#57017)
### 💫 Enhancements:
- Lint improvements: enabled ruff import rules for `rllib/utils` (#56737)
- Better type hints for learner_connector (#57673)
- Improved throughput metrics to avoid biasing (#57215)
### 🔨 Fixes:
- Fixed `segment_tree.py` edge case (#57599)
- Fixed small bug in type hints (#57673)
# Ray Core

### 🎉 New Features:
- Enhanced Ray Direct Transport (RDT) with improved NIXL integration and garbage collection (#57671, #57603, #58159)
- Cgroups support improvements with better system resource management (#57776, #57864, #57731, #58017, #58028, #58064)
- Fault-tolerant RPC improvements for better distributed reliability (#57786, #57861)
- Exponential backoff for retryable gRPCs (#56568)
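Exponential backoff (#56568) spaces out retries so a struggling server is not hammered by synchronized re-sends. A minimal sketch of the delay schedule (constants are illustrative, not Ray's actual settings):

```python
def backoff_delays(base: float = 0.1, cap: float = 10.0, attempts: int = 8):
    """Seconds to wait before each retry: the delay doubles per
    attempt and is capped at `cap`. Production clients usually also
    randomize ("jitter") each delay to avoid retry storms."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```

For example, with `base=1.0` and `cap=8.0` the schedule is 1, 2, 4, 8, 8, ... seconds.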
### 💫 Enhancements:
- Migrated from STATS to metric interface in RPC components (#57926)
- Improved histogram metrics midpoint calculation (#57948)
- Made `FreeObjects` non-fatal for better error handling (#57550)
- Enhanced `ReleaseUnusedBundles` fault tolerance (#57786)
- Made `DrainRaylet` and `ShutdownRaylet` fault tolerant (#57861)
- Better error handling for metric and event exporter agent (#57925)
- Improved raylet shutdown process and file organization (#57817)
- Reporter agent can now get PID via RPC to raylet (#57004)
- Enhanced `ray.get` thread safety (#57911)
- Configurable proto naming during event JSON conversion (#57705)
- Better handling of detached actor restarts (#57931)
- Improved lease rescheduling in local lease manager during node draining (#57834)
### 🔨 Fixes:
- Fixed "RayEventRecorder::StartExportingEvents() should be called only once" error (#57917)
- Fixed deadlock when cancelling stale requests on in-order actors (#57746)
- Fixed raylet shutdown races (#57198)
- Fixed log monitor seeking bug after log rotation (#56902)
- Deflaked multiple test suites for better CI reliability
- Fixed various memory and resource management issues
- Better handling of actor and task failures
### 📖 Documentation:
- Added JaxTrainer API overview to Ray docs (#57182)
- Fixed various typos and documentation issues
- Updated autoscaling and system configuration guides
- Enhanced SLURM documentation with symmetric-run support (#56775)
### 🏗 Architecture refactoring:
- Dashboard API server subprocesses moved into system cgroup (#57864)
- Driver moved into workers cgroup for better isolation (#57776)
- Improved worker-raylet interface separation (#57804)
- Better plasma store provider architecture
# Dashboard

### 💫 Enhancements:
- Added percentage usage graphs for resources (#57549)
- Introduced sub-tabs with full Grafana dashboard embeds on Metrics tab (#57561)
- Added queued blocks to operator panels (#57739)
- Improved operator metrics logging for better clarity (#57702)
- Better filtering and display in job lists
### 🔨 Fixes:
- Fixed filtering issue in job list (#56946)
- Fixed incomplete card content on overview page (#56947)
- Filtered out ANSI escape codes from logs (#53370)
# Autoscaler

### 🎉 New Features:
- KubeRay autoscaling support with top-level Resources and Labels fields (#57260)
- Bundle label selector support in the `request_resources` SDK (#54843)
- Application Gateway for Containers as ingress for Ray clusters on Azure
### 💫 Enhancements:
- Azure improvements: cleaning up extra resources (MSI, VNET, NSG) during cluster teardown (#57610)
- Updated defaults for Azure cluster templates (#57716)
- Better availability zone support for Azure node pools (#55532)
- Hello world release tests for Azure and GCE (#57597, #57695)
- Improved cluster resource state handling to fix over-provisioning (#57130)
### 🔨 Fixes:
- Fixed autoscaler state synchronization issues (#57010)
- Better handling of node state information (#57130)
- Improved timeout handling for patch requests (#56605)
Thank you to everyone who contributed to Ray 2.51.0 through bug fixes, features, documentation improvements, and testing efforts!