Hacker News | ethink's comments

Hello guys,

Since cleaning data takes up most of our time in data science tasks, I've created an ebook that makes doing it at the command line as easy as possible.

The ebook includes terminal code snippets that work with data from the COVID Tracking Project, Reddit users, a scientific paper on clickbait and non-clickbait article headlines, and more.

I used GNU and BSD commands along with command-line utilities such as csvkit and xsv, the fastest of them. Some benchmark results are included as well.

Be one of the first 10 to get this ebook for free: How to Clean Data at the Command Line

I would love to hear your feedback. Thanks!


Hey, I would like to share How to Clean CSV Data at the Command Line: https://www.ezzeddinabdullah.com/posts/how-to-clean-csv-data...


Thanks for replying. I really appreciate the many comments, and I respect all opinions except insults. I'm still learning and always will be, and I consider my writing a way to learn more about the field.


Thanks for sharing, will check it out :)


An in-depth tutorial on how to build technical documentation, and even write your own book, with the documentation generator Sphinx.

Happy reading!


Thanks for noting this issue. I guess I will have to fix that so the page is more responsive on mobile screens.


An 800-point Codeforces problem


A tutorial about using Tweepy, an API wrapper for Twitter, to get trends. I also explain the difference between authentication and authorization when using the Twitter API, along with some best practices.


With Python and JavaScript solutions: learn how to think about Bad Triangle, an *800 Codeforces problem.


