Show HN: An open-source tool that semantically profiles your data using LLMs (github.com/cocoon-data-transformation)
10 points by zh2408 4 months ago | 2 comments
The problem we solve is table profiling: the initial step where you need to understand a table and identify any anomalies in it.

During profiling, many small decisions require semantic understanding. For example, missing values are normal for 'deathdate' (the person is still alive) but abnormal for 'name'. For outliers, an age of 100 is fine, but an age of -1 is impossible! We use LLMs to semantically understand your tables and detect anomalies.
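To make the idea concrete, here is a rough sketch of what column-aware checks could look like once the per-column semantics are known. It is not how the tool itself works: the `ColumnRule` class, the `RULES` table, and the age range of 0 to 130 are all hypothetical and hand-written here, standing in for judgments an LLM might make about each column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnRule:
    """Hypothetical per-column expectations (assumed, not from the tool)."""
    nullable: bool
    min_value: Optional[float] = None
    max_value: Optional[float] = None


# Hand-written stand-ins for what a semantic profiler might infer.
RULES = {
    "name": ColumnRule(nullable=False),        # a missing name is abnormal
    "deathdate": ColumnRule(nullable=True),    # missing means "still alive"
    "age": ColumnRule(nullable=False, min_value=0, max_value=130),
}


def find_anomalies(rows, rules=RULES):
    """Return (row_index, column, reason) for each value that breaks a rule."""
    anomalies = []
    for i, row in enumerate(rows):
        for col, rule in rules.items():
            value = row.get(col)
            if value is None or value == "":
                if not rule.nullable:
                    anomalies.append((i, col, "unexpected missing value"))
                continue
            if rule.min_value is None and rule.max_value is None:
                continue  # no numeric range to check for this column
            number = float(value)
            if rule.min_value is not None and number < rule.min_value:
                anomalies.append((i, col, "below plausible range"))
            elif rule.max_value is not None and number > rule.max_value:
                anomalies.append((i, col, "above plausible range"))
    return anomalies


rows = [
    {"name": "Ada", "age": "100", "deathdate": ""},
    {"name": "", "age": "-1", "deathdate": "2001-05-03"},
]
print(find_anomalies(rows))
```

The first row passes even though 'deathdate' is empty and the age is 100; the second row is flagged for the missing name and the age of -1.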

You can try it by uploading a CSV, and we will email back the profile: https://cocoon-data-transformation.github.io/page/

Let me know your feedback. Thanks!




cool project. getting insecure form warnings when submitting.

you'll want to spin up an ingress (nginx, ..) to front your requests & use TLS (Let's Encrypt)

edit: the CSV i used had dates in 2024 -- got this back

"Timestamps are from year 2024 which is in the future."


Thank you so much! I'll fix the warning. Yes, that's an issue with LLMs: I'm using Claude 3, which was trained on data up to August 2023, so it assumes 2024 is still in the future. I'll add the current date to the prompt. Thanks!
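A minimal sketch of that fix, assuming the prompt is assembled as a plain string before being sent to the model. The function name `build_profile_prompt` and the prompt wording are hypothetical, not taken from the project.

```python
from datetime import date
from typing import Optional


def build_profile_prompt(column_summary: str, today: Optional[date] = None) -> str:
    """Prefix the profiling prompt with the current date (hypothetical helper)."""
    # Fall back to the system date so the model is not limited to its training cutoff.
    today = today or date.today()
    return (
        f"Today's date is {today.isoformat()}.\n"
        "Treat timestamps after this date as being in the future, "
        "and timestamps on or before it as valid.\n\n"
        f"Column summary:\n{column_summary}"
    )


print(build_profile_prompt("created_at: min 2024-01-02, max 2024-03-15"))
```

Passing `today` explicitly keeps the function easy to test; in normal use it can be left out.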



