Hacker News
CSVfix is a tool for manipulating CSV data (code.google.com)
44 points by pessimizer on Nov 12, 2013 | 15 comments



I can't speak for this lib, but I've had a lot of success with csvkit [0], json2csv [1] and json directly with jq [2].

[0] https://github.com/onyxfish/csvkit

[1] https://github.com/jehiah/json2csv

[2] https://github.com/stedolan/jq


This is a great list, but IMO lacks the most powerful (but unfortunately unpopular) one:

https://github.com/dbro/csvquote

Apply it first, then do the normal processing with GNU coreutils and you'll cover most use cases.
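
For anyone wondering why this works: as far as I understand it, the idea is to temporarily swap the commas and newlines that sit inside quoted fields for non-printing characters, so line-oriented tools see one record per line and one comma per field boundary, then swap them back at the end. Here is a minimal Python sketch of that idea. It is not csvquote's actual code; the stand-in characters and the `sanitize`/`restore` names are my own choices for illustration.

```python
# Stand-in characters for commas and newlines found inside quoted fields.
# These particular choices are an assumption made for this sketch.
FIELD_STANDIN = "\x1f"   # ASCII unit separator
RECORD_STANDIN = "\x1e"  # ASCII record separator


def sanitize(text):
    """Replace commas and newlines that occur inside double-quoted fields."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            # An escaped quote ("") toggles twice, so the state is preserved.
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch == ",":
            out.append(FIELD_STANDIN)
        elif in_quotes and ch == "\n":
            out.append(RECORD_STANDIN)
        else:
            out.append(ch)
    return "".join(out)


def restore(text):
    """Undo sanitize() once the line-oriented processing is done."""
    return text.replace(FIELD_STANDIN, ",").replace(RECORD_STANDIN, "\n")


raw = 'id,note\n1,"hello, world"\n2,"two\nlines"\n'
safe = sanitize(raw)

# Naive processing, roughly what cut or awk would do: one record per line,
# fields split on commas. split("\n") is used instead of splitlines()
# because splitlines() also breaks on "\x1e".
lines = [line for line in safe.split("\n") if line]
second_column = [restore(line.split(",")[1]) for line in lines]
print(second_column)
```

The sanitized text has exactly three lines for three records, so the second column can be pulled out with a plain split and still comes back with its embedded comma and newline intact.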


Thanks very much! You just made my day!


I've had a good time with it. Fast, portable, feature-rich and well documented: http://csvfix.byethost5.com/csvfix15/csvfix.html


csvkit is awesome. My only complaint is that it correctly parsed a file and left me confused when I ended up with half the number of rows I expected, because I hadn't realized that each row contained a line break.
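
The same surprise is easy to reproduce with Python's standard `csv` module. This is only a small sketch with made-up sample data, not anything csvkit-specific: it compares the number of physical lines with the number of records a CSV-aware parser returns.

```python
import csv
import io

# Made-up sample: the first data row has a line break inside a quoted field.
data = 'name,comment\nalice,"first line\nsecond line"\nbob,"single line"\n'

# What a line counter such as wc -l would report.
physical_lines = data.count("\n")

# What a CSV-aware parser sees. newline="" keeps embedded newlines untouched.
records = list(csv.reader(io.StringIO(data, newline="")))

print(physical_lines)  # 4
print(len(records))    # 3 (header plus two data rows)
print(records[1])      # ['alice', 'first line\nsecond line']
```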


I've had some success cleaning up CSV data in the past using OpenRefine[0] (née Google Refine, and Freebase Gridworks before that). It is a really powerful tool for getting data in a consistent format.

[0] http://openrefine.org/


I love OpenRefine, but I prefer scripted steps so I can repeat the cleanup easily and move on to other tools. Currently I use Python and Pandas.

I want a way to use Open Refine and export the code to Python.


To quickly examine CSV data in PostgreSQL, you can do this:

  CREATE EXTENSION file_fdw;

  CREATE SERVER my_server FOREIGN DATA WRAPPER file_fdw;

  CREATE FOREIGN TABLE my_csv (
    field_a text,
    field_b smallint,
    ...
  ) SERVER my_server
  OPTIONS ( filename 'some/file/path.csv', format 'csv', header 'true' );

  SELECT * FROM my_csv;


The first Python script I wrote is a fairly ugly hack that converts CSV to SQLite-compatible SQL so you can query it. https://github.com/elidickinson/csv-tools/blob/master/csv2sq...
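
For comparison, here is a rough sketch of the same general idea using only the Python standard library. It is not the linked script: it loads the rows straight into an in-memory SQLite database instead of emitting SQL text, and the `load_csv` and `quote_ident` helpers, the `sales` table and the sample data are all made up for illustration. It assumes every row has the same number of fields as the header.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote a table or column name for use in SQLite statements."""
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header row and insert the remaining rows."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(quote_ident(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_ident(table), columns))
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_ident(table), placeholders),
        reader,
    )


sample = "city,amount\nOslo,3\nOslo,4\nLima,5\n"
conn = sqlite3.connect(":memory:")
load_csv(conn, "sales", io.StringIO(sample, newline=""))

# Values are stored as text, so cast before doing arithmetic.
totals = conn.execute(
    "SELECT city, SUM(CAST(amount AS INTEGER)) FROM sales "
    "GROUP BY city ORDER BY city"
).fetchall()
print(totals)  # [('Lima', 5), ('Oslo', 7)]
```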


I once wrote a tool to insert Apache logfiles into a SQLite database to run queries against.

I'm frequently surprised by how popular that project remains:

  * http://steve.org.uk/Software/asql
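
To give a feel for the approach, here is a minimal Python sketch. It is not how asql itself works; the regular expression, the `logs` table layout and the sample lines are assumptions for illustration, and it only handles the leading fields of the common log format.

```python
import re
import sqlite3

# Matches the leading fields of a common-log-format line; anything after the
# response size (referrer, user agent) is ignored.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

sample_lines = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /missing HTTP/1.0" 404 209',
    '10.0.0.2 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 2326',
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (host TEXT, time TEXT, method TEXT, path TEXT, "
    "status INTEGER, size INTEGER)"
)

for line in sample_lines:
    match = LOG_LINE.match(line)
    if match is None:
        continue  # skip lines that do not look like log entries
    size = None if match["size"] == "-" else int(match["size"])
    conn.execute(
        "INSERT INTO logs VALUES (?, ?, ?, ?, ?, ?)",
        (match["host"], match["time"], match["method"], match["path"],
         int(match["status"]), size),
    )

by_status = conn.execute(
    "SELECT status, COUNT(*) FROM logs GROUP BY status ORDER BY status"
).fetchall()
print(by_status)  # [(200, 2), (404, 1)]
```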


That reminds me, there's actually a funky old Microsoft skunkworks project that lets you query CSVs, Apache logfiles and all sorts of other stuff via SQL: http://www.microsoft.com/en-us/download/details.aspx?id=2465 I have no idea whether it still runs on current Windows versions.


The best tool I've ever used for cleaning up large datasets, CSV or otherwise, is Google Refine [1].

It takes some time to get used to the workflow, but it's very powerful and does a great job making messy data usable.

[1] https://code.google.com/p/google-refine/


I've used this in an (unfortunately unsuccessful) hackathon project - it definitely got us out of CSV-cleaning-hell.


I'm surprised no one has mentioned Microsoft's Log Parser. It provides query access to CSV, XML and many types of log files.

http://technet.microsoft.com/en-us/scriptcenter/dd919274.asp...


Way off topic, but one of my favorite iOS apps is CSVtouch. It works well with .csv files and it lets me keep my data in an open format (although it can only view files, not edit them).



