
It would be so great if awk had a csv mode. For whatever reason (Excel), CSV seems to be the default text format for field oriented data.

Maybe I’m dumb but I’ve never come up with a separator regex that is quite right.




The simple case is:

    BEGIN { FS = "," }

or:

    awk -F ',' <program>
If you're working with CSV data that has quoted strings with embedded commas, FPAT is your friend:

    BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" }  # [^,]* rather than [^,]+ so empty fields are kept
See: https://www.gnu.org/software/gawk/manual/gawk.html#Splitting...


Just want to say thanks for this. I haven't had to deal with CSV for years, and only a week after you posted this, I needed it.

brew install gawk

good to go

:)


For _actually_ comma-separated values, just use awk -F , '...'

Is this not what you mean?

Edit: This comment might be helpful to you: https://news.ycombinator.com/item?id=22110036


The problem is that the fields themselves can be wrapped in double quotes ('"'), and a comma inside a quoted field is part of the data, not a separator. So the standard FS= / -F approach doesn't work properly.
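A quick way to see the problem with plain `awk -F ','`: the sketch below uses a hypothetical helper, `naive_field_count`, and assumes only a POSIX `sh` and `awk`. It counts the fields awk sees on a simple line and on a line with a comma inside a quoted field.

```sh
# naive_field_count (hypothetical helper): split one line on every comma
# and print how many fields awk sees. Quotes get no special treatment.
naive_field_count() {
  printf '%s\n' "$1" | awk -F ',' '{ print NF }'
}

naive_field_count 'alice,42,NYC'           # prints 3, as expected
naive_field_count 'bob,"Portland, OR",37'  # prints 4: the quoted comma splits the city
```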

It looks like FPAT from your linked article is gawk-specific. Gawk is great, but it's not everywhere. Still, it's good to know. Thanks!
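Where gawk isn't available, one portable workaround (a sketch, not something proposed in this thread) is to walk each line character by character in plain POSIX awk. The names `csv_fields` and `csv_split` are hypothetical. The sketch assumes one record per line (no newlines inside quoted fields) and treats a doubled quote (`""`) inside a quoted field as a literal quote, which is the RFC 4180 convention.

```sh
# csv_fields (hypothetical helper): read CSV lines on stdin and print each
# record with its fields joined by "|". Assumes no newlines inside quotes.
csv_fields() {
  awk '
  # Split s into arr[1..n]. A comma inside double quotes is data, and a
  # doubled quote inside quotes is a literal quote. Returns n.
  function csv_split(s, arr,    n, i, c, field, inq) {
    n = 0; field = ""; inq = 0
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if (inq) {
        if (c == "\"") {
          if (substr(s, i + 1, 1) == "\"") {
            field = field "\""; i++     # doubled quote: keep one, skip the other
          } else {
            inq = 0                     # closing quote
          }
        } else {
          field = field c
        }
      } else if (c == "\"") {
        inq = 1                         # opening quote
      } else if (c == ",") {
        arr[++n] = field; field = ""    # unquoted comma ends the field
      } else {
        field = field c
      }
    }
    arr[++n] = field
    return n
  }
  {
    n = csv_split($0, f)
    out = f[1]
    for (j = 2; j <= n; j++) out = out "|" f[j]   # join for display only
    print out
  }'
}
```

For example, `printf '%s\n' 'bob,"Portland, OR",37' | csv_fields` prints `bob|Portland, OR|37`. The `|` join is only there to make the split visible; a real script would work with the `f` array directly.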



