Solving the inference and training crisis with dynamic, popularity-based scheduling.
Two traffic classes contend for the training network: the all-to-all communication required for expert parallelism and the allreduce communication required for data parallelism.
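For concreteness, here is how those two competing collectives typically appear in a PyTorch-style MoE training step. This is an illustrative sketch only: the tensor names, process-group handles, and helper functions are placeholders, not the Orchestrator's actual API.

```python
# Illustrative sketch: the two collectives that compete for the network.
import torch
import torch.distributed as dist

def moe_dispatch(tokens: torch.Tensor, ep_group) -> torch.Tensor:
    # Expert parallelism: every rank exchanges routed tokens with every other
    # rank. This all-to-all sits on the critical path -- the experts cannot
    # start computing until their tokens arrive.
    # (Assumes tokens are already packed so they split evenly across ranks.)
    routed = torch.empty_like(tokens)
    dist.all_to_all_single(routed, tokens, group=ep_group)
    return routed

def sync_gradients(grad: torch.Tensor, dp_group) -> None:
    # Data parallelism: gradients are averaged across replicas. This allreduce
    # moves a lot of data but is not immediately blocking -- it only has to
    # finish before the optimizer step.
    dist.all_reduce(grad, op=dist.ReduceOp.SUM, group=dp_group)
    grad /= dist.get_world_size(group=dp_group)
```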
The Orchestrator’s underlying training fabric incorporates a prioritized communication scheduler. It uses tensor partitioning to break large communication operations into smaller “micro-ops.” The scheduler gives the blocking, critical-path all-to-all operations exclusive access to the network and opportunistically schedules the allreduce micro-ops in the gaps between them. This strategy can reduce training step time by up to 1.73x, providing a significant economic advantage and enabling faster iteration for creators building on the platform.
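The scheduling idea can be captured in a few lines. The following is a minimal sketch under a simplified model in which every pending transfer is a micro-op tagged with a priority; `CommScheduler`, `MicroOp`, and `partition` are hypothetical names chosen for illustration, not the platform's actual interfaces.

```python
# Minimal sketch of prioritized, micro-op-based communication scheduling.
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List

ALL_TO_ALL = 0  # highest priority: blocking, on the critical path
ALLREDUCE = 1   # lower priority: can be overlapped with computation

@dataclass(order=True)
class MicroOp:
    priority: int
    seq: int
    launch: Callable[[], None] = field(compare=False)  # issues the actual transfer

class CommScheduler:
    """Always drains queued all-to-all micro-ops before any allreduce micro-ops."""

    def __init__(self) -> None:
        self._queue: List[MicroOp] = []
        self._seq = itertools.count()

    def submit(self, priority: int, launch: Callable[[], None]) -> None:
        heapq.heappush(self._queue, MicroOp(priority, next(self._seq), launch))

    def step(self) -> None:
        # Pop exactly one micro-op. All-to-all work always wins, so allreduce
        # chunks only go out when no all-to-all is waiting -- i.e. in the gaps
        # between expert dispatches.
        if self._queue:
            heapq.heappop(self._queue).launch()

def partition(num_bytes: int, chunk_bytes: int) -> List[int]:
    """Tensor partitioning: split one large transfer into fixed-size micro-ops."""
    full, rem = divmod(num_bytes, chunk_bytes)
    return [chunk_bytes] * full + ([rem] if rem else [])

# Usage sketch: a 512 MB gradient allreduce is chopped into 32 MB micro-ops and
# queued at low priority; when an expert all-to-all arrives, it jumps the queue.
sched = CommScheduler()
for size in partition(512 * 2**20, 32 * 2**20):
    sched.submit(ALLREDUCE, lambda s=size: print(f"allreduce chunk: {s} bytes"))
sched.submit(ALL_TO_ALL, lambda: print("all-to-all: expert dispatch"))
sched.step()  # the all-to-all runs first despite arriving last
```

Because the allreduce is pre-split into small chunks, the scheduler never has to wait behind a single monolithic gradient transfer: whatever gap the all-to-all leaves can be filled one micro-op at a time.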