
When your ML.NET training data does not fit: ‘The asynchronous operation has not completed’

When you have created a machine learning model, you will retrain that model when new data is available. But when I recently added a couple of images to the training set of my own ML.NET model, I was faced with the following exception:

System.InvalidOperationException: 'The asynchronous operation has not completed.'

The application had worked for weeks, so what changed? And more importantly, how do I fix it?

Finding the problem

The exception is thrown by the call to the Fit method of my preprocessing pipeline.

var imageData = mlContext.Data.LoadFromEnumerable(images);
var shuffledData = mlContext.Data.ShuffleRows(imageData);

var preprocessedData = preprocessingPipeline
    .Fit(shuffledData)
    .Transform(shuffledData);

When searching for this exception message, I found a GitHub issue mentioning this exact exception. Jon Wood points to a change introduced in version 1.5.1 of the Microsoft.ML NuGet package as the cause. However, I am still on version 1.5.0, so maybe the two problems are not really related?

Get it running again

Of course, I first checked whether upgrading from version 1.5.0 to version 1.5.1 would help. Not very surprisingly, the exception remained.

The fix for the GitHub issue has already been merged, so it will be part of version 1.5.2. Because that version is not available yet, I added the daily NuGet feed and tested with the daily preview version 1.5.2-29129-1.

Now the exception is gone! Great.

Moving forward

You will have to wait until the machine learning team releases version 1.5.2 of the Microsoft.ML NuGet package.

However, if you are in a hurry, you can use the preview version of the NuGet package as I did. Another workaround might be to limit the size of your sample set. For my data set, 1046 seems to be the magic number: with that many images, Fit completes without the exception.

var imageData = mlContext.Data.LoadFromEnumerable(images.Take(1_046)); // also works

This might not be ideal, though, if you have a much larger data set.
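
If you do cap the sample set, keep in mind that Take keeps only the first items, which can skew the training set when the images happen to be ordered by label. Below is a minimal sketch of one alternative: picking the capped subset at random. The helper SampleLimiter.TakeRandom is hypothetical (it is not part of ML.NET), the list of file names is only a stand-in for the real images collection, and the limit of 1046 is simply the value that worked for my data set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET.
public static class SampleLimiter
{
    // Returns at most maxCount items, chosen at random, so the subset is not
    // biased towards whatever happens to come first in the source sequence.
    public static List<T> TakeRandom<T>(IEnumerable<T> source, int maxCount, int seed)
    {
        var random = new Random(seed); // fixed seed keeps runs reproducible
        return source
            .OrderBy(_ => random.Next()) // random sort key per item = shuffle
            .Take(maxCount)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the real image collection used in this post.
        var images = Enumerable.Range(0, 5_000)
            .Select(i => $"image-{i}.jpg")
            .ToList();

        var limited = SampleLimiter.TakeRandom(images, maxCount: 1_046, seed: 42);
        Console.WriteLine(limited.Count); // prints 1046

        // With the real collection, the capped list would be loaded as before:
        // var imageData = mlContext.Data.LoadFromEnumerable(limited);
    }
}
```

The subsequent ShuffleRows call can stay as it is. The random pick only decides which images make it into the capped set, and the right limit for your own data may well differ from mine.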