Data Wrangler (observablehq.com)
72 points by polm23 on Aug 22, 2021 | 8 comments



Data wrangling is the reason why BI should be done by developers.

Alas, somehow the no-code movement sold big-company managers tools that promised to solve it all, and when things get hard, they sell “expert consulting” time to set up the reports.


I'm on a team of devs and non-devs. The non-dev BI people on the team set timelines in days for things that would take me minutes in a spreadsheet and hours in code.

It's kind of awkward.


Give them the benefit of the doubt; they may have larger context and specific experience. Usually it is not that simple.

The initial requirement may turn out to be wrong once users play with the implementation. There may be corner cases, or dirty data that needs manual or automatic cleanup. A simple query that runs just fine on a small amount of data can choke on the production database, and then you need to debug with the query planner, introduce additional indexes, and so on.
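
To make that last point concrete, here is a minimal Python/sqlite3 sketch of that debugging loop; the table, columns, and data are all made up for illustration:

    # Hypothetical example: check the query plan, add an index, re-check.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                     [(i % 100, i * 1.5) for i in range(1000)])

    query = "SELECT SUM(total) FROM orders WHERE customer_id = ?"

    # Before indexing, the planner reports a full table scan.
    for row in conn.execute("EXPLAIN QUERY PLAN " + query, (42,)):
        print(row)  # detail column reads like 'SCAN orders'

    # Index the filtered column, then confirm the plan uses it.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    for row in conn.execute("EXPLAIN QUERY PLAN " + query, (42,)):
        print(row)  # 'SEARCH orders USING INDEX idx_orders_customer ...'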


It would be more awkward if it was the other way around…


Excellent executive summary for non-executives. Some of these (client) companies are so afraid of code it's bewildering. It's almost like they don't trust themselves to test & maintain even small code bases.

I've seen them go out of their way to pursue point-and-click solutions to data warehouses/lakes: thousands of ETL jobs manually coded (and manually tested) with very little "code" reuse. The inconsistencies/deviations from conventions were worse than the development waste. Consumers of these data will have to deal with inconsistencies in naming conventions, versioning strategies, and broken SCDs for years to come. Too often the architects/data modelers/ETL developers don't even know what the primary key is! (Don't lose your keys folks.)


Developers are often uninterested in data pipelines, with the exception of high-scale, high-throughput, business-critical ones, e.g. analytics product offerings, backend work, ML, and similar.

Standard BI suffers from incoherent requirements and often lacks an internal customer. In some ways it's a good problem to have 100 different independent reports, in case you decide to shut 50-80 of them off one day.


I couldn't quite figure out how to do anything... "Group by" does nothing, and the docs say it takes effect when you use some aggregate function, of which I couldn't find any?

Would there be any advantage to using ruby or python in an interactive shell or jupyter-style environment?

There are also some spreadsheet-like tools I've found useful, most of all https://openrefine.org.


You can try group by and then count.
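
For what it's worth, a rough pandas analog of that sequence (this is not the Data Wrangler UI itself; the data frame below is made up):

    # "Group by" alone produces no visible change; it only takes effect
    # once an aggregate such as count is applied to the groups.
    import pandas as pd

    df = pd.DataFrame({
        "city":  ["Oslo", "Oslo", "Bergen", "Bergen", "Bergen"],
        "sales": [10, 20, 5, 7, 3],
    })

    print(df.groupby("city").size())          # row count per city
    print(df.groupby("city")["sales"].sum())  # another example aggregate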



