Launch HN: Encord (YC W21) – Unit testing for computer vision models
91 points by ulrikhansen54 on Jan 31, 2024 | 16 comments
Eric and Ulrik from Encord here. We build developer tooling that helps computer vision (CV) teams build better models. Today we are proud to launch our model and data unit testing toolkit, Encord Active (https://encord.com/active/) [1].

Imagine you're building a device that needs to see and understand the world around it – like a self-driving car or a robot that sorts recycling. To do this, you need a vision model that processes the real world as a sequence of frames and makes decisions based on what it sees.

Bringing such a model to production is hard. You can’t just train it once and call it done; you need to constantly test and improve it to make sure it understands the world correctly. For example, you don't want a self-driving car to confuse a stop sign with a billboard, or to classify a pedestrian as an unknown object [2].

This is where Encord Active comes in. It's a toolkit that helps developers “unit test”, understand, and debug their vision models. We put “unit test” in quotes because while it isn’t classic software unit testing, the idea is similar: to see which parts of your model are working well and which aren't. Here’s a short video that shows the tool: https://youtu.be/CD7_lw0PZNY?si=MngLE7PwH3s2_VTK [3]

For instance, if you're working on a self-driving car, Encord Active can help you figure out why the car is confusing stop signs with billboards. It lets you dive into the data the model has seen and understand what's going wrong. Maybe the model hasn't seen enough stop signs at night, or maybe it gets confused when the sign is partially blocked by a tree.

Having extensive unit test coverage won’t guarantee that your software (or vision model) is correct, but it helps a lot, and it is awesome at catching regressions (i.e. things that work at one point and then stop working later). For example, consider retraining your model on a 25% larger dataset that now includes examples from a new US state with distinctly different weather (e.g., California vs. Vermont). Intuitively, one might think ‘the more signs, the merrier.’ However, adding new signs can confuse the model: perhaps it suddenly learns to rely mostly on the surroundings because the signs are covered in snow. This can cause the model to regress below your desired performance threshold (e.g., 85% accuracy) on your existing test data.
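
To make the analogy concrete, a slice-based regression test might look something like the sketch below (illustrative only, not Encord Active's API; the dataset, slices, and thresholds are all made up):

    # Hypothetical slice-based regression test for a stop-sign detector.
    # Not Encord Active's API; slices and thresholds are invented.
    from dataclasses import dataclass

    @dataclass
    class EvalSample:
        is_night: bool   # was the image captured at night?
        occluded: bool   # is the sign partially blocked?
        correct: bool    # did the model get this sample right?

    def accuracy(samples):
        return sum(s.correct for s in samples) / max(len(samples), 1)

    def test_stop_sign_model(results):
        # Overall accuracy must stay above the agreed threshold...
        assert accuracy(results) >= 0.85
        # ...and so must the known-hard slices, so a retrain that helps
        # on average but regresses on night-time or occluded signs fails.
        assert accuracy([s for s in results if s.is_night]) >= 0.80
        assert accuracy([s for s in results if s.occluded]) >= 0.75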

These issues are not easily solved by changing the model architecture or tuning hyperparameters (e.g., adjusting learning rates), especially as the problems you are trying to solve with the model get more complex. Rather, they are solved by training or fine-tuning the model on more of "the right" data.

Unlike purely embeddings-based data exploration and model analytics/evaluation tools, which help folks discover surface-level problems without offering suggestions for solving them, Encord Active automatically analyzes your model performance and gives concrete recommendations and actionable steps to fix the identified model and data errors. Specifically, the system detects the weakest and strongest parts of the data distribution, serving as a guide for where to focus in subsequent iterations of model training. The analysis encompasses various factors: the ‘qualities’ of the images (size, brightness, blurriness), the geometric characteristics of objects and model predictions (aspect ratio, outliers), as well as metadata and class distribution. It correlates these factors with your chosen model performance metrics and surfaces low-performing subsets for attention, giving you actionable next steps.

One of our early customers, for example, reduced their dataset size by 35% yet increased their model’s accuracy (in this case, the mAP score) by 20% [4], which is a huge improvement in this domain. This is counterintuitive to most people, as the thinking is generally “more data = better models”.
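
As an illustration of this kind of analysis, you can score each image on simple quality heuristics and check how strongly each heuristic correlates with per-image model performance. This is a generic sketch of the idea, not Encord Active's internals:

    # Generic sketch: correlate simple image-quality heuristics with
    # per-image model performance. Not Encord Active's internals.
    import cv2
    from scipy.stats import spearmanr

    def brightness(img_bgr):
        # Mean grayscale intensity.
        return cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).mean()

    def blurriness(img_bgr):
        # Variance of the Laplacian: a common sharpness heuristic
        # (low variance suggests a blurry image).
        gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
        return cv2.Laplacian(gray, cv2.CV_64F).var()

    def rank_quality_metrics(images, per_image_scores):
        # per_image_scores: e.g., per-image IoU or accuracy. Metrics that
        # correlate strongly with performance point at what to fix first.
        correlations = {}
        for name, fn in [("brightness", brightness), ("blurriness", blurriness)]:
            values = [fn(img) for img in images]
            rho, _ = spearmanr(values, per_image_scores)
            correlations[name] = rho
        return correlations

A strong correlation between, say, blurriness and poor performance suggests collecting or relabeling sharper examples rather than touching the model.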

If any of these experiences resonate with you, we'd love for you to try out the product and share your feedback. We're happy to answer any questions you may have!

[1] https://encord.com/active/

[2] https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg

[3] https://youtu.be/CD7_lw0PZNY?si=MngLE7PwH3s2_VTK

[4] https://encord.com/customers/automotus-customer-story/




This is really cool. The annotation-to-testing-to-annotation-etc. feedback loop makes a ton of sense, and I'd encourage others who may be confused by this post to look at the Automotus case study https://encord.com/customers/automotus-customer-story/ which has a great diagram.

For those of us with similar needs for annotation and "unit testing," but on text corpora, I'm aware of https://prodi.gy/ for the annotation side, but my understanding is the relationship between model outputs and annotation steering is out of scope for that project - do you know of tooling (open source or paid) that integrates an "Active" component similarly to what you do? Or is text a direction you want to go as well?

[I'm a fan of Vellum (YC W23) for evaluation and testing of multiple prompts https://www.vellum.ai/blog/introducing-vellum-test-suites - but I don't believe they feed annotation workflows in an automated and full-circle way.]


Good question! We are focused on vision at the moment, but we are indeed looking at text in the future. Happy to connect and chat about that if you're open to it, as we'd be curious to hear more about new text use cases.


I had a look at your pricing page — https://encord.com/pricing/ — and was sad to see no pricing is actually communicated there.

What could I expect to pay for my company to use the Team plan?


We base our pricing on your user and consumption scale and would be happy to discuss it with you directly. Please feel free to explore the OS version of Active at https://github.com/encord-team/encord-active. Note that some features, such as natural language search using GPU-accelerated APIs, are not included in the OS version.


Can't you set usage-based pricing?

edit: It looks like you just launched appropriately early :) I assume you're aware of products like Stigg.


We run usage-based and tiered pricing, but we haven't gotten around to building out a self-serve "sign-up-with-credit-card" product yet. For all the advances in Stripe and automated billing, these things still take some time to implement for a short-staffed engineering team :-)


Does this include tools to evaluate for performance on out-of-distribution and adversarial images?


Yes - the tool can definitely help with that. We combine the newest embedding models with various other heuristics to help identify performance outliers in your unseen data.
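
For illustration, one common recipe is to embed both datasets and flag images whose nearest neighbors in the training set are unusually far away. A generic sketch (not our exact implementation; the embeddings could come from any image encoder, e.g. a CLIP model):

    # Generic sketch of embedding-based outlier detection, not Encord's
    # exact implementation. Embeddings can come from any image encoder.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def ood_scores(train_embs, new_embs, k=5):
        # Mean distance to the k nearest training embeddings; higher
        # scores mean the image looks unlike anything seen in training.
        nn = NearestNeighbors(n_neighbors=k).fit(train_embs)
        dists, _ = nn.kneighbors(new_embs)
        return dists.mean(axis=1)

    # Placeholder embeddings just to make the example runnable.
    train_embs = np.random.randn(1000, 512)
    new_embs = np.random.randn(200, 512)
    suspects = np.argsort(ood_scores(train_embs, new_embs))[-20:]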


This looks promising - but how is this different from tools like Aquarium Learning or Voxel51?


Those are both great tools. There are a number of differences, however; the two most prominent are that 1) Encord Active automatically analyses internal metrics to find the most relevant data and labels to focus on to improve model performance, and 2) it is optimised for the full 'continuous' training-data workflow, including human-in-the-loop model validation and annotation.


This is amazing!!!


Congratulations Eric and Ulrik!


Congrats on the launch!

I haven’t had a chance to try out Active yet, but having worked with Eric and the team on a project a while back, I can say they’re a great team to work with :)


Thank you! It was great working with you and your team as well :)


Congrats on the launch Eric!


Thanks Kyle, appreciate it! Has been very nice collaborating with you!



