Launch HN: Evidently AI (YC S21) – Track and Debug ML Models in Production

shcheklein · on July 7, 2021

Hey, Elena, Emeli, congrats on the launch! Question: am I right that you don't "dictate" a specific monitoring infrastructure? You provide a tool that can take a model and calculate/plot certain important characteristics about it (e.g. drift)? Do you have a plan to make connectors, templates, or some additional infrastructure to simplify integration with Grafana? What other monitoring infrastructure tools are you considering integrating with?

elenasamuylova · on July 7, 2021

True, we do not want to force users into a rigid workflow. We build the tool so it can integrate with other parts of the ML stack. Right now you can for example use Evidently to calculate if the model has drifted, and then log the results in a JSON format, or take only some parts of the JSON (for example metrics you want) and then send them to an external dashboarding tool. As long as you can process a JSON output or store an HTML file you can build any workflow around that.
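To make the "take only some parts of the JSON" step concrete, here is a minimal sketch in plain Python. The JSON layout and the key names (`data_drift`, `metrics`, `drift_detected`, `p_value`) are assumptions for illustration, not the actual Evidently profile schema; check the real output before relying on any key.

```python
import json

# Hypothetical JSON profile. The real Evidently output has its own structure.
profile_json = """
{
  "data_drift": {
    "metrics": {
      "age": {"drift_detected": false, "p_value": 0.41},
      "income": {"drift_detected": true, "p_value": 0.003}
    }
  }
}
"""


def extract_drift_flags(raw: str) -> dict:
    """Return {feature: drift_detected} from the assumed layout above."""
    profile = json.loads(raw)
    metrics = profile["data_drift"]["metrics"]
    return {name: values["drift_detected"] for name, values in metrics.items()}


flags = extract_drift_flags(profile_json)
# Keep only the drifted features; this is the part you would forward
# to a dashboarding tool or a log.
drifted = sorted(name for name, is_drifted in flags.items() if is_drifted)
print(drifted)
```

The same pattern applies to any subset of the output: parse once, keep the few values you care about, and send only those downstream.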

We plan to work on tutorials and prepare native integrations with some popular tools like Grafana, MLflow, DVC and some others.

rodrigorivera · on July 7, 2021

Hi Evidently team, congratulations on the launch! I really like what you are doing. I work primarily with time-series data. I hope we will see a module for time-series problems such as forecasting in the future. Also, please include integration with Google Colab. My colleagues and I work extensively with it, and it is the platform that we use to test new libraries. Otherwise, keep up the great work!

elenasamuylova · on July 7, 2021

Thanks - integration with Colab is on the near-term roadmap. You can already use it to generate JSON profiles and export HTML reports. We are working now to make it possible to display all the interactive plots directly inside Colab.

Longer term, we also plan to add more reports for specific problem types, such as time series or recommendation systems. Keep an eye on the repo!

pplonski86 · on July 8, 2021

Congratulations on the launch! Model monitoring is really important and there is a huge lack of tools to do this. How would you notify users (model owners) about potential problems? Are you going to use email notifications?

Does the user need to manually set the thresholds at which alarms should be triggered?

What should you do if there is data drift?

What is the most difficult-to-detect example of drift that you have seen?

elenasamuylova · on July 8, 2021

Thanks for the support!

For the moment the workflows you describe are built externally:
- To set notifications you can send the output from Evidently to other tools like Grafana and then build a notification workflow around it. If you have a batch model, you can use some workflow manager (like Airflow, or simply a cron job) to schedule a monitoring job at every model run and then log the results or send an email report.
- Thresholds are manual. We learnt that the model owners usually have to tune them anyway since the models are very different (a small deviation in one model is nothing, in another it is a disaster). But we plan to add the ability to generate default thresholds as the tool grows.
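As a rough illustration of the manual-threshold workflow, the check that a scheduled job (cron, Airflow) might run after each batch could look like the sketch below. The metric names and threshold values are hypothetical, chosen only for this example; how the alert is delivered (email, Grafana) is left to the surrounding tooling.

```python
# Hypothetical per-model thresholds, tuned by the model owner.
THRESHOLDS = {
    "share_drifted_features": 0.3,     # alert if more than 30% of features drift
    "prediction_drift_p_value": 0.05,  # alert if the p-value falls below 0.05
}


def find_alerts(metrics: dict, thresholds: dict) -> list:
    """Compare one monitoring run against manual thresholds."""
    alerts = []
    if metrics["share_drifted_features"] > thresholds["share_drifted_features"]:
        alerts.append("too many drifted features")
    if metrics["prediction_drift_p_value"] < thresholds["prediction_drift_p_value"]:
        alerts.append("prediction drift detected")
    return alerts


# Metrics from one (made-up) monitoring run.
latest_run = {"share_drifted_features": 0.5, "prediction_drift_p_value": 0.2}
alerts = find_alerts(latest_run, THRESHOLDS)
print(alerts)
```

Keeping the thresholds in one dictionary per model makes it easy to tune them separately, which matches the point that a deviation harmless for one model can be a disaster for another.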

We are working on native integrations and tutorials for MLflow and Grafana in the next couple of weeks.

When you detect data drift, there are usually 3 options:
- Retrain the model if you can (if you can label the data, for example)
- Limit the model application (for example, tune the classification threshold, or exclude certain segments)
- Pause the model or use a fall-back strategy (e.g. human-in-the-loop decision making)

Drift detection is really non-trivial when you have a lot of data (it will often show "drift" just due to the volume). We know some users need a solution for this use case, and plan to add something here.
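The volume effect is easy to reproduce with a generic statistical test. The sketch below is not Evidently's method: it uses a plain two-sample z-test on the mean, and the "samples" are deterministic normal quantiles so the result is repeatable. The shift of 0.05 standard deviations and the sample sizes are arbitrary choices for the illustration.

```python
import math
from statistics import NormalDist, fmean, variance


def quantile_sample(mu: float, sigma: float, n: int) -> list:
    """Deterministic stand-in for a sample: n evenly spaced quantiles of N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


def mean_shift_p_value(reference: list, current: list) -> float:
    """Two-sided p-value of a two-sample z-test on the difference of means."""
    standard_error = math.sqrt(
        variance(reference) / len(reference) + variance(current) / len(current)
    )
    z = (fmean(current) - fmean(reference)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same tiny shift in the mean, two very different data volumes.
p_values = {}
for n in (200, 50_000):
    reference = quantile_sample(0.0, 1.0, n)
    current = quantile_sample(0.05, 1.0, n)
    p_values[n] = mean_shift_p_value(reference, current)
    print(n, p_values[n])
```

With the small sample the shift is not statistically significant, while with the large one the p-value collapses toward zero even though the shift is identical. This is why a fixed p-value cutoff tends to flag "drift" on high-volume data, and why an effect-size measure or a subsample is often used instead.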

Another aspect is that you have to know which features are important for the model, so that you do not get too many false alarms.

elenasamuylova · on July 8, 2021

Would be cool to know if you see other tools in your stack you'd want to integrate with - we want to make Evidently very easy to plug into existing workflows. Let us know!

streetcat1 · on July 7, 2021

Thanks for the info. Can you please elaborate on how you access the prediction logs? Is there a specific log format? How do you know the model input schema?

elenasamuylova · on July 7, 2021

Right now we ask the user to prepare the logs on their side (or schedule a job to push the logs to the tool). We learnt that most teams store the prediction logs anyways - since they are usually used for retraining. So we thought that is the simplest and most universal interface for integration for now.

The tool now works with tabular data. Depending on the report type you can include only the input features (e.g. for data drift report), or also add the prediction and target column to the table (e.g. for model performance report). So you might need to perform some basic transformations (e.g. to add the target column if this data comes later) to prepare the input.

To specify the schema, you need to configure a simple column mapping (basically show where the target or prediction columns are, and optionally specify which features are categorical and numerical).

You can check the requirements for each report in the documentation https://docs.evidentlyai.com/

emelidral · on July 7, 2021

To add to this, if the column_mapping is not provided we try to parse the data automatically, assuming that the schema is standard (e.g. you use column names like "target" and "prediction"). We also process the features based on the pandas data type. In the future we want to make it super easy to avoid writing extra configuration, so we will try to parse as much as possible, but of course give the user the opportunity to override.
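The "infer defaults, let the user override" behaviour described above can be sketched roughly as follows. This is a plain-Python illustration, not Evidently's implementation: the function `infer_column_mapping`, the mapping keys, and the rule of typing features from the first row's Python types (instead of pandas dtypes) are all assumptions made for the example.

```python
def infer_column_mapping(rows: list, column_mapping: dict = None) -> dict:
    """Fill in a hypothetical column mapping; explicitly provided values win."""
    mapping = dict(column_mapping or {})
    first_row = rows[0]
    columns = list(first_row.keys())

    # Fall back to the conventional column names when nothing is specified.
    mapping.setdefault("target", "target" if "target" in columns else None)
    mapping.setdefault("prediction", "prediction" if "prediction" in columns else None)

    # Everything that is not target or prediction is treated as a feature.
    reserved = {mapping["target"], mapping["prediction"]}
    features = [c for c in columns if c not in reserved]
    numerical = [
        c for c in features
        if isinstance(first_row[c], (int, float)) and not isinstance(first_row[c], bool)
    ]
    categorical = [c for c in features if c not in numerical]
    mapping.setdefault("numerical_features", numerical)
    mapping.setdefault("categorical_features", categorical)
    return mapping


rows = [{"age": 42, "city": "Oslo", "target": 1, "prediction": 0}]
inferred = infer_column_mapping(rows)
overridden = infer_column_mapping(rows, {"categorical_features": ["age", "city"]})
print(inferred)
print(overridden)
```

Using `setdefault` keeps the override logic in one place: any key the user passes is left untouched, and only missing keys are inferred.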

fighterpilot · on July 7, 2021

If the ground truth is unavailable in real time, are you still able to detect anomalies with the inputs?

elenasamuylova · on July 7, 2021

We do not directly detect anomalies, but we have two types of checks to run when there are no actuals or ground truth labels:
1) data drift, to compare the statistical distribution of the input features to the past
2) prediction drift, to compare the distribution of the model predictions to the past
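Both checks come down to comparing a current distribution with a reference one, with no labels needed. One common way to quantify the gap is the two-sample Kolmogorov-Smirnov statistic, sketched below in plain Python. It is shown as a generic example of such a comparison, not as a statement of which test Evidently applies, and the sample values are made up.

```python
from bisect import bisect_right


def ks_statistic(reference: list, current: list) -> float:
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref = sorted(reference)
    cur = sorted(current)
    largest_gap = 0.0
    # Both empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every observed point.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


# Works the same for an input feature (data drift)
# or for model outputs (prediction drift).
reference_predictions = [1, 2, 3, 4]
current_predictions = [3, 4, 5, 6]
print(ks_statistic(reference_predictions, reference_predictions))
print(ks_statistic(reference_predictions, current_predictions))
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap), so it can be logged per feature and compared against a manually chosen threshold.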

To control the sensitivity of monitoring you can manually decide if you want to monitor all features, or maybe only the most important ones. This is not automated yet.

We also generate a few dashboards to show the relationship between the features and predicted values - to help with visual debugging.

We plan to add some unsupervised approaches like outlier detection later on. But for the moment we do not have checks on the level of individual objects in the data.

rehabemam · on July 7, 2021

That will be very helpful, excited to contribute and see this growing, thanks for sharing.

elenasamuylova · on July 7, 2021

Thanks! Looking forward to your contributions! <3

garuti · on July 8, 2021

Amazing! Congrats on the launch. We are looking forward to using it!

billconan · on July 7, 2021

How do you obtain ground truth in production?

elenasamuylova · on July 7, 2021

Depends on the use case and available instrumentation!

If the feedback is available almost instantly (e.g. you recommend something to a user based on a model prediction and you know if they clicked on it or not), you can log the user action in your data warehouse to have the ground truth easily available for further analysis. Then you run the performance reports on top of complete logs.

In other cases you might have to wait for the ground truth (e.g. you predict the demand for some future period and then wait for it to materialize, or you need to label the data first). In this case you can log the ground truth to the data warehouse once it becomes available and join with the prediction logs. You can then run complete performance monitoring with error analysis as a batch job. In the meantime, you can still monitor the data drift.
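The delayed-feedback case can be sketched as a small batch job: join whatever ground truth has arrived onto the prediction log, then compute a performance metric on the completed rows only. The record layout (`id`, `prediction`, `target`) and the numbers are assumptions for illustration; in practice the join would usually happen in the data warehouse or in pandas.

```python
# Prediction log written at serving time (hypothetical layout).
prediction_log = [
    {"id": 1, "prediction": 120.0},
    {"id": 2, "prediction": 80.0},
    {"id": 3, "prediction": 95.0},
]

# Ground truth that arrived later; id 3 has not materialized yet.
actuals_by_id = {1: 110.0, 2: 90.0}


def join_actuals(predictions: list, actuals: dict) -> list:
    """Keep only rows whose ground truth is known and attach it as 'target'."""
    return [
        {**row, "target": actuals[row["id"]]}
        for row in predictions
        if row["id"] in actuals
    ]


def mean_absolute_error(rows: list) -> float:
    """Average absolute gap between target and prediction."""
    return sum(abs(row["target"] - row["prediction"]) for row in rows) / len(rows)


complete_rows = join_actuals(prediction_log, actuals_by_id)
mae = mean_absolute_error(complete_rows)
print(len(complete_rows), mae)
```

Rows still waiting for their actuals are simply skipped in the performance calculation; their inputs and predictions can still feed the drift checks in the meantime.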

Could you describe a specific use case and environment? We can brainstorm how to best arrange it.