We make spot instances suitable for ML training. Spot instances can be around 70% cheaper than on-demand instances, but the cloud provider can interrupt them at short notice.
We mitigate the downsides with persistence features, including optional fallback to on-demand instances.
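The optional fallback to on-demand instances could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: `CapacityError`, `launch_with_fallback`, and the two launcher callables are hypothetical names standing in for whatever cloud API calls are really used.

```python
from typing import Callable, Optional, Tuple


class CapacityError(Exception):
    """Hypothetical error raised when no spot capacity is available."""


def launch_with_fallback(
    launch_spot: Callable[[], str],
    launch_on_demand: Callable[[], str],
    allow_fallback: bool = True,
    spot_attempts: int = 3,
) -> Tuple[str, str]:
    """Return (market, instance_id), preferring the cheaper spot market."""
    last_error: Optional[CapacityError] = None
    for _ in range(spot_attempts):
        try:
            return "spot", launch_spot()
        except CapacityError as err:
            # Remember the failure and try the spot market again.
            last_error = err
    if not allow_fallback:
        # Fallback is optional: without it, surface the spot failure.
        raise CapacityError("no spot capacity available") from last_error
    return "on-demand", launch_on_demand()
```

Keeping the fallback behind a flag lets cost-sensitive jobs wait for spot capacity, while deadline-sensitive jobs pay the on-demand price instead.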
Instances are started automatically when needed and stopped automatically once idle time is detected.
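One way to picture the idle detection is a small monitor that tracks the last moment of activity and reports when a timeout has elapsed. The class name `IdleMonitor`, the timeout parameter, and the idea of passing timestamps explicitly are all assumptions made for illustration; how activity is actually measured is not specified here.

```python
import time
from typing import Optional


class IdleMonitor:
    """Tracks the last activity time and decides when to stop an instance."""

    def __init__(self, idle_timeout_s: float, now: Optional[float] = None):
        self.idle_timeout_s = idle_timeout_s
        self.last_active = time.monotonic() if now is None else now

    def record_activity(self, now: Optional[float] = None) -> None:
        # Call this whenever a training job or user session is seen.
        self.last_active = time.monotonic() if now is None else now

    def should_stop(self, now: Optional[float] = None) -> bool:
        # The instance counts as idle once the timeout has fully elapsed.
        current = time.monotonic() if now is None else now
        return current - self.last_active >= self.idle_timeout_s
```

The optional `now` argument exists only to make the logic easy to test without waiting; in normal use the monotonic clock is read automatically.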
After an interruption, training resumes from the last checkpoint, which is kept on an EBS volume so that it outlives the instance.
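The resume-from-checkpoint idea can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the function names, the JSON checkpoint format, and the simulated `interrupt_at` parameter are hypothetical, and a real training loop would save model and optimizer state rather than a step counter.

```python
import json
import os
import tempfile


class Interrupted(RuntimeError):
    """Stands in for a spot interruption in this sketch."""


def save_checkpoint(path, state):
    # Write to a temp file and rename, so an interruption mid-write
    # never leaves a half-written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path):
    # Start from scratch when no checkpoint exists yet.
    if not os.path.exists(path):
        return {"step": 0, "loss": None}
    with open(path) as f:
        return json.load(f)


def train(path, total_steps, checkpoint_every=10, interrupt_at=None):
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        if interrupt_at is not None and step == interrupt_at:
            raise Interrupted(f"interrupted at step {step}")
        # Placeholder for one real training step.
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

In this sketch, `path` would point at a file on the mounted EBS volume. Work done since the last saved checkpoint is repeated after a restart, so `checkpoint_every` trades write overhead against lost progress.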
Develop your model locally in a Docker container, then run the same container in the cloud.