Cleaning data takes up most of our time in data science tasks,
so I've written an ebook that makes doing it at the command line as easy as possible.
The ebook includes terminal code snippets that work with plenty of data from the COVID Tracking Project, Reddit users, a scientific paper on clickbait and non-clickbait article headlines, and more.
I used some GNU and BSD commands along with command-line utilities like csvkit and the fastest tool, xsv. Some benchmark results are included as well.
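To give a rough idea of what this kind of cleanup can look like, here is a minimal sketch that uses only standard tools (awk and sed) that behave the same on GNU and BSD systems. The sample data and column names are made up for illustration and are not taken from the ebook; csvkit and xsv are deliberately left out so the snippet runs without installing anything.

```shell
# Hypothetical sample data: a tiny CSV with a duplicate row, a blank line,
# and stray spaces after the commas.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
state,positive
NY, 120
CA,95
NY, 120

TX,60
EOF

# 1. awk 'NF'           drops empty lines
# 2. sed 's/, */,/g'    removes spaces that follow a comma
# 3. awk '!seen[$0]++'  keeps only the first occurrence of each line
cleaned=$(awk 'NF' "$tmp" | sed 's/, */,/g' | awk '!seen[$0]++')

printf '%s\n' "$cleaned"
rm -f "$tmp"
```

This approach treats every line as plain text, so it is only safe when fields contain no quoted commas or embedded newlines; for those cases a CSV-aware tool is the better choice.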
Be one of the first 10 to get this ebook for free: How to Clean Data at the Command Line
Thanks for replying. I really appreciate the many comments, and I respect all opinions except insults. I'm still learning and always will be, and I consider my writing a way to learn more about the field.
A tutorial on using Tweepy, an API wrapper for Twitter, to get trends.
I also explain the difference between authentication and authorization when using the Twitter API, along with some best practices.
Would love to see your feedback. Thanks!