I spend a lot of time in the terminal and want to quickly glance at csv files without writing a new script, opening excel, or using a tui. I made tidy-viewer (tv) because current tools like cat and column were not pretty enough.
tv transforms the raw file for display in the following ways:
1. NA detection and highlighting
2. Printing only significant digits
3. Header and footer meta data
I have been using this a lot at work. There is a lot more work to do, but it is in a usable state.
Give it a try! If you like it then star on Github!
The NA detection and highlighting is nice, but I'm not sure how I feel about showing anything other than the exact textual value. I don't mind abridging quotes when they're not necessary, but showing "N/A", NA, etc. as the same value is a bit iffy.
I was going to complain about null bytes in text (never, period), but then realized you actually did mean the U+2400 SYMBOL FOR NULL[0] character itself. That's surprisingly viable (though you do now have to worry about the string "\xE2\x90\x80" ending up in your data).
0: Which is actually incorrectly named - it should be "SYMBOL FOR NUL".
It is rough. There are many ways that different tools put NAs, na, N/A, "", etc. in a file. To choose only "NA" would mean I would be excluding the output of other tools. I chose accessibility over specificity. #trade-offs.
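To make that concrete, here is a minimal sketch (in Rust, since that is what tv is written in) of what lenient NA detection could look like; the set of spellings matched here is illustrative, not tv's actual list:

    // Sketch of lenient NA detection; the spellings matched here are
    // illustrative, not tv's actual list.
    fn is_na(cell: &str) -> bool {
        matches!(
            cell.trim().to_lowercase().as_str(),
            "" | "na" | "n/a" | "nan" | "null" | "none" | "missing"
        )
    }

    fn main() {
        assert!(is_na("N/A"));
        assert!(is_na("  NULL "));
        assert!(!is_na("Navarre")); // whole-cell match only, not a prefix match
    }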
Fair enough, but that doesn't explain why you chose to display all of them as "NA". As you say, there are lots of different ones, hence it would be a bad idea to pick one as the 'default' to display. To me it's important whether something is missing, filled with "N/A", or "null", or "Not Applicable", etc.
Simple: provide CLI switches to let the user decide what they want for NA detection (current behavior as default; the user can provide alternate NA values, per the source file or the natural language it is expressed in), and how they want NAs displayed: as-is, blank, or a consistent custom value (as-is should be the default).
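For illustration only, those switches might look something like this with the clap v4 derive API; the flag names and defaults are invented here, not tv's actual interface:

    // Hypothetical NA-handling switches, sketched with the clap v4 derive API.
    // Flag names and defaults are invented for illustration.
    use clap::Parser;

    #[derive(Parser)]
    struct Cli {
        /// Extra strings to treat as missing, e.g. --na-values n/a --na-values "-"
        #[arg(long = "na-values")]
        na_values: Vec<String>,

        /// How to display missing cells: as-is, blank, or a custom literal
        #[arg(long = "na-display", default_value = "as-is")]
        na_display: String,
    }

    fn main() {
        let cli = Cli::parse();
        println!("extra NA values: {:?}", cli.na_values);
        println!("display mode: {}", cli.na_display);
    }

A "pretty"/"literal" split like the one suggested below could hang off the same mechanism.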
Well, I think being promiscuous with "NA", "N/A", nan, etc. is a separate issue from a blank cell. A blank cell is literally missing. That should be filled with NA.
haha. You are right, "NA" stands for "Not Applicable". That is not always how people/programs use it though. What are some alternatives that you would suggest? I am happy to learn.
I would suggest similar to what other people have suggested where you color the background of the cell red and then just display the literal content of the cell. I think it would be reasonable to have this configurable via command line arguments though, so if you like the "NA" that could also be a mode.
Perhaps it would make sense to have a "pretty" mode and a "literal" mode (which would also turn off the clever processing of numbers)?
First of all - kudos on tackling this task - it is indeed very annoying to get CSVs to render nicely on a terminal.
1. How does tidy-viewer compare with csvlook?
2. Looking at the demo video, there seems to be an odd fixation with "N/A". The CSV spec, AFAIK, doesn't recognize this phrase. I don't understand why someone would expect a quoted string field whose raw characters are "n/a" should be rendered as anything other than n/a (i.e. lowercase and without the quotes). I'm guessing maybe in your workflow you want to use that phrase a lot, but for a tool for the general public I'd not do this kind of interpretation; and I would leave an empty field as empty.
3. tidy-viewer seems to require "unstable library features", or at least ones which were unstable as of Rust 1.48.0. It would be nice if you could be compatible with older rust distributions/versions.
4. Many systems, especially older ones, especially ones which you access remotely and don't have root privileges on, won't have a rust installation. It would be even more convenient if you could provide binaries with little or no extra dynamic library dependencies, which could be used on older / rustless systems. I realize this is a tall order, however.
5. What about scrolling? The worst part of viewing CSVs is having to handle wide ones which exceed the terminal width, and having decent horizontal as well as vertical scrolling ability. less doesn't cut it, because it doesn't keep the header row, plus it doesn't recognize field widths.
6. tidy-viewer does not seem to support wrapping longer fields onto multiple terminal lines.
7. When the user doesn't specify the color scheme, are you choosing one based on the terminal colors, or are you using absolute color values? I suggest the former.
8. tidy-viewer loads and parses the entire CSV immediately; and, in fact, seems to keep two copies of it in memory at once. This means it cannot be used with large files without thrashing; and even if your CSV does fit in global memory, it will still be kind of unusable, trying to dump gigabytes onto the terminal.
Bottom line: A nice initial effort, but the more serious challenges are yet to be tackled, plus it needs to be more robustly cross-platform.
The norm of treating missing data as NA exists in R (which the developer of this is clearly inspired by, based on the GitHub readme). Pandas in Python is stuck with NaN for numeric types (not quite correct) and "" or None for string types. Personally I like the choice to both explicitly render missing data in colour and to apply NA as a placeholder text to display that colour.
The less(1) command has horizontal scrolling, just invoke it with the -S or --chop-long-lines options or toggle that feature while paging a file.
I agree it would be nicer if less(1) had a user-configurable header, with a format option to set it to the contents of the first line of a file or stdin (or perhaps the most recent line matching a regex, to allow for multiple tables), and an option to make the header scroll horizontally in -S or --chop-long-lines mode.
> First of all - kudos on tackling this task - it is indeed very annoying to get CSVs to render nicely on a terminal.
> How does tidy-viewer compare with csvlook?
The most important issue to me is that csvlook is a much less pleasant viewing experience, but there is also this: csvlook reads and parses all of the data. Try pushing diamonds.csv to csvlook. When I do it on my machine it takes 15.228 seconds, while tv takes 0.0042 seconds. For this reason tv is much faster, but speed is not the goal of the package. tv's purpose is to maximize viewer enjoyment.
> 2. Looking at the demo video, there seems to be an odd fixation with "N/A". The CSV spec, AFAIK, doesn't recognize this phrase. I don't understand why someone would expect a quoted string field whose raw characters are "n/a" should be rendered as anything other than n/a (i.e. lowercase and without the quotes). I'm guessing maybe in your workflow you want to use that phrase a lot, but for a tool for the general public I'd not do this kind of interpretation; and I would leave an empty field as empty.
I could not say it better than this:
> The norm of treating missing data as NA exists in R (which the developer of this is clearly inspired by, based on the GitHub readme). Pandas in Python is stuck with NaN for numeric types (not quite correct) and "" or None for string types. Personally I like the choice to both explicitly render missing data in colour and to apply NA as a placeholder text to display that colour.
> 3. tidy-viewer seems to require "unstable library features", or at least ones which were unstable as of Rust 1.48.0. It would be nice if you could be compatible with older rust distributions/versions.
That is a good point. I also release binaries, which I think makes this requirement less pressing. What are your thoughts?
> 4. Many systems, especially older ones, especially ones which you access remotely and don't have root privileges on, won't have a rust installation. It would be even more convenient if you could provide binaries with little or no extra dynamic library dependencies, which could be used on older / rustless systems. I realize this is a tall order, however.
> 5. What about scrolling? The worst part of viewing CSVs is having to handle wide ones which exceed the terminal width, and having decent horizontal as well as vertical scrolling ability. less doesn't cut it, because it doesn't keep the header row, plus it doesn't recognize field widths.
Scrolling is nice. To offer scrolling, the only option I am aware of is turning this cli into a tui. I made the choice early on to take the more minimal path and stick to a cli. The goal is to be a `column` replacement, not a spreadsheet replacement.
> 6. tidy-viewer does not seem to support wrapping longer fields onto multiple terminal lines.
The goal is to glance at the data as a whole, not a single cell or field. If there are cells with long text they get cut at 20 characters. I like this a lot. I would prefer to know that there is a lot of text that I can dig into later, but when I am glancing at the csv I just want an overall picture. In my view tables of data are data visualizations, meaning that I don't have to show everything to understand enough of it.
> 7. When the user doesn't specify the color scheme, are you choosing one based on the terminal colors, or are you using absolute color values? I suggest the former.
Great question. I want to eventually add the ability for users to make a config file with their own colors. At this time I just have absolute presets. If you are interested, I would happily take a contribution that allows users the option to configure tv with some dotfile.
> 8. tidy-viewer loads and parses the entire CSV immediately; and, in fact, seems to keep two copies of it in memory at once. This means it cannot be used with large files without thrashing; and even if your CSV does fit in global memory, it will still be kind of unusable, trying to dump gigabytes onto the terminal.
That is almost true. tidy-viewer reads the entire csv, but only parses the head. If I knew of a way to get the number of rows and columns of a csv without reading the whole file then I would. I know there is a good deal more room for memory optimization. This is not my strength and I am still learning.
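For what it's worth, a single pass can do both jobs: count every record but retain only the first n. A rough sketch with the csv crate (error handling kept minimal); note it still has to scan, though not store, the whole file:

    // One-pass sketch: count all records for the footer metadata, but keep
    // only the first `n` parsed rows for display. Assumes the `csv` crate.
    use std::error::Error;

    fn head_and_count(
        path: &str,
        n: usize,
    ) -> Result<(Vec<csv::StringRecord>, usize), Box<dyn Error>> {
        let mut rdr = csv::Reader::from_path(path)?;
        let mut head = Vec::with_capacity(n);
        let mut count = 0;
        for result in rdr.records() {
            let record = result?;
            if head.len() < n {
                head.push(record); // retain only the rows that will be rendered
            }
            count += 1; // but count everything for the row total
        }
        Ok((head, count))
    }

(Using byte_records() instead of records() would also skip UTF-8 validation on the rows that are only being counted.)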
> Bottom line: A nice initial effort, but the more serious challenges are yet to be tackled, plus it needs to be more robustly cross-platform.
Thanks for the compliment. It is still a work in progress.
The file doesn't need to be pre-parsed. Perhaps give the filename to the utility instead of content via stdin; the filename then gives a hint. If there is none, run "file filename" (via a library) beforehand.
This looks great! I wonder how long it’ll be until someone posts a long awk snippet that will do something similar and claims this isn’t progress, but rest assured that they are wrong. I’m adding tv to my toolbox.
Cool project! I'm familiar with column, and this looks like a good replacement.
Curious, how do you handle formatting on cells with long strings that need to overflow to multiple lines? As soon as you try to optimize the column widths for table length, you start hitting an NP-hard problem.
I actually read that article when I started making the package. You can see some of the input data here: https://github.com/alexhallam/tv/blob/main/data/a.csv. I let the user choose how long the max column width should be, then append "...". The default value is 20 characters.
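The truncation itself can be very simple. A sketch of the idea (counting chars rather than bytes so multi-byte UTF-8 doesn't panic; not tv's actual code):

    // Cap a cell at `max` characters and append an ellipsis.
    fn truncate_cell(cell: &str, max: usize) -> String {
        if cell.chars().count() <= max {
            cell.to_string()
        } else {
            let cut: String = cell.chars().take(max).collect();
            format!("{}...", cut)
        }
    }

    fn main() {
        assert_eq!(truncate_cell("short", 20), "short");
        assert_eq!(truncate_cell("a very long description field", 10), "a very lon...");
    }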
I think it's a great first effort, but there are a number of possible improvements to make. The most obvious one would be to support passing the file as an argument instead of using cat or the redirection operator every time. It's great that it works with stdin to allow piping into it, but it's cumbersome if you just want to take a file and print it, which will no doubt be a common use case.
It works, but almost all UNIX commands that work on pipelines can take a list of files as arguments. Out of the commands I use regularly, "patch" is the only one that works exclusively from stdin, probably because file arguments have a different, somewhat obscure, and probably historical meaning.
If appropriate, using files as arguments instead of using shell pipelines is a best practice. Commands can optimize for that use case, print better error messages, etc...
And it is not a good thing to encourage useless use of cat. If your goal is to show how your tool is to be used with pipelines, show an actually useful pipeline, for example "sed '1b;/abc/!d' file.csv | tv". The "sed" command prints the first line (the header) and all lines containing "abc".
Some (most?) tools that output data in columns and fit each one to the largest value in that column need to scan the whole file as a first pass just to start displaying data.
Not only is that the case with this tool, but from what I'm reading in main.rs it looks like it's also loading the whole file into memory. I was going to say that scanning the file was a deal-breaker, but if true, this is much more resource-intensive.
This looks like a nice tool, but these design choices seem to limit its use to relatively small files. It could be updated to use a read-ahead buffer instead and adjust its output as new lines are discovered with values of different widths, although doing this without a jarring resize could be challenging.
Could someone with better knowledge of Rust than mine confirm this?
I see the full dataset being loaded here[1] and the column widths being computed here.[2]
> these design choices seem to limit its use to relatively small files
1. As a rule-of-thumb, I have been working on functionality before optimization. That said, `tv` is really fast. It is completely false that `tv` only works for relatively small files. I just pushed a 624MB file to `tv`. It ran in 2.8 seconds. With `column` it takes 5.0 seconds. Now, I would love help from programmers smarter than me. I am sure there are a lot of optimization gains to be had in `tv`. I just wanted to make sure potential users are not misled. `tv` is performant.
> Some (most?) tools that output data in columns and fit each one to the largest value in that column need to scan the whole file as a first pass just to start displaying data.
> Not only is it the case with this tool, but from what I'm reading in main.rs it looks like it's also loading the whole file in memory.
2. `tv` reads once, but parses partially. This means that it reads the full file only to grab the number of rows; it parses (takes) only the first n rows.
If the goal is to calculate the correct column width, you have to do one pass through the data before writing the first row.
If the file can be read multiple times (not a UNIX stream), you can just read the file twice.
If the file is a stream, instead of retaining the entire dataset in memory, you can write to a temporary file and re-parse it after calculating the widths.
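A rough sketch of that temp-file approach (assuming the tempfile crate, with naive comma splitting standing in for real CSV parsing):

    // Two-pass sketch for streams: spool stdin to a temp file while tracking
    // the widest cell per column, then rewind and print with the final widths.
    use std::io::{self, BufRead, BufReader, Seek, SeekFrom, Write};

    fn main() -> io::Result<()> {
        let mut spool = tempfile::tempfile()?;
        let mut widths: Vec<usize> = Vec::new();

        // Pass 1: copy the stream and record per-column widths.
        let stdin = io::stdin();
        for line in stdin.lock().lines() {
            let line = line?;
            for (i, field) in line.split(',').enumerate() {
                if widths.len() <= i { widths.push(0); }
                widths[i] = widths[i].max(field.chars().count());
            }
            writeln!(spool, "{}", line)?;
        }

        // Pass 2: rewind the temp file and print padded fields.
        spool.seek(SeekFrom::Start(0))?;
        for line in BufReader::new(spool).lines() {
            let line = line?;
            for (i, field) in line.split(',').enumerate() {
                print!("{:<width$} ", field, width = widths[i]);
            }
            println!();
        }
        Ok(())
    }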
The correct column width is calculated from the first n rows not the full file.
A stream does not work for tv because a stream does not know how many rows are in the file a priori. Displaying the dimensions of the file is a priority for `tv`. I am very happy with that trade-off. I would rather know the dimensions of a file than have a file stream of unknown dimensions.
If you did it the way he's talking about, you would stream through the file to find how many rows it has and write it out as a temp file that you could re-parse for the actual data.
I'm not saying you should or shouldn't, but your use case doesn't bar you from using streams.
I like this idea. I don't think it would be jarring if the read-ahead buffer was a minimal number of lines, i.e. looking like distinct pages. The default could be at least the line height of the terminal, or some multiple.
There could be an option to redisplay the header row for resized "pages".
There could be a CLI switch giving the user control, i.e. make everyone happy.
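A sketch of that paged idea (the page size is hard-coded here where it would really come from the terminal height; naive comma splitting again):

    // Read-ahead paging sketch: buffer one page of lines, size columns for
    // that page only, and repeat the header so each page stands alone.
    use std::io::{self, BufRead};

    fn print_page(header: &str, rows: &[String]) {
        let mut widths: Vec<usize> = Vec::new();
        for line in std::iter::once(header).chain(rows.iter().map(|s| s.as_str())) {
            for (i, field) in line.split(',').enumerate() {
                if widths.len() <= i { widths.push(0); }
                widths[i] = widths[i].max(field.chars().count());
            }
        }
        for line in std::iter::once(header).chain(rows.iter().map(|s| s.as_str())) {
            for (i, field) in line.split(',').enumerate() {
                print!("{:<width$} ", field, width = widths[i]);
            }
            println!();
        }
    }

    fn main() -> io::Result<()> {
        let page_size = 40; // would come from the terminal height
        let stdin = io::stdin();
        let mut lines = stdin.lock().lines();
        let header = match lines.next() { Some(h) => h?, None => return Ok(()) };
        let mut page = Vec::with_capacity(page_size);
        for line in lines {
            page.push(line?);
            if page.len() == page_size {
                print_page(&header, &page);
                page.clear();
            }
        }
        if !page.is_empty() { print_page(&header, &page); }
        Ok(())
    }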
It is more resource intensive, but it pushes the problem you mentioned onto tv. If tv doesn't work with embedded EOLs, then you need to fix your data or fix your tool.
> Just show me the top 5 rows. That's all most people are looking for.
Is it? I'd wager that can't be more than half its use at most. Accessing a specific section that could be anywhere in the file is very common in my experience, as is truly random access. Both of these, as well as the first-few-rows use case, are far better served by a paging system.
$ open test/example.csv | format generic
Login email Identifier One-time password Recovery code First name Last name Department Location
rachel@example.com 9012 12se74 rb9012 Rachel Booker Sales Manchester
laura@example.com 2070 04ap67 lg2070 Laura Grey Depot London
craig@example.com 4081 30no86 cj4081 Craig Johnson Depot London
mary@example.com 9346 14ju73 mj9346 Mary Jenkins Engineering Manchester
jamie@example.com 5079 09ja61 js5079 Jamie Smith Engineering Manchester
My shell also aims for closer compatibility with POSIX (albeit it's not a POSIX shell), so you can use all the same command line tools you're already familiar with (which, for me at least, was the biggest hurdle in my adoption of PowerShell).
It also supports other file types out of the box, e.g. jsonlines:
$ open test/example.csv | format jsonl
["Login email","Identifier","One-time password","Recovery code","First name","Last name","Department","Location"]
["rachel@example.com","9012","12se74","rb9012","Rachel","Booker","Sales","Manchester"]
["laura@example.com","2070","04ap67","lg2070","Laura","Grey","Depot","London"]
["craig@example.com","4081","30no86","cj4081","Craig","Johnson","Depot","London"]
["mary@example.com","9346","14ju73","mj9346","Mary","Jenkins","Engineering","Manchester"]
["jamie@example.com","5079","09ja61","js5079","Jamie","Smith","Engineering","Manchester"]
PowerShell is actually pretty good at manipulating CSV and JSON. However, I would definitely recommend using v7 (i.e. pwsh) since it has many improvements over v5 (default on Windows). For example, Group-Object seems to be several orders of magnitude faster using the latest version.
Edit: this reminds me of Jimmy Kimmel's segment where they "bleep and blur whether they need it or not" so that innocent TV clips appear to have profanity/innuendo etc.
It, for example, allowed me to make an educated guess as to the answer to the question “how does this handle huge files?”. It by default only reads 25 lines.
(That makes the example from the header,
cat diamonds.csv | head -n 35 | tv
a bad example. You shouldn’t need that head in-between.)
However, line 167 says
//.take(row_display_option + 1)
That seems to indicate this reads the entire file into memory, and that guess wasn’t that educated at all.
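For reference, if that take were re-enabled, the record iterator would stop pulling from the reader after n rows instead of draining the whole file. A self-contained sketch (not tv's actual code):

    use std::io;

    // `.take(n)` makes the lazy record iterator stop reading after n rows.
    fn first_n_records(n: usize) -> csv::Result<Vec<csv::StringRecord>> {
        let mut rdr = csv::Reader::from_reader(io::stdin());
        rdr.records().take(n).collect() // stops consuming input after n records
    }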
I have some work to do on the README. I will show the output better. The difficulty with showing the output only is that it does not capture the coloring. Maybe I will show the output, or add a picture, or have an animated gif. Maybe all three.
Hey, great start. I spend half my day in CSVs and I am definitely your target audience. Most of the time I use bat, visidata or tabview. In many ways tabview is the best, though recently the project has been abandoned.
tv looks excellent. Fun name. I think if you added a couple of features it would ascend to my toolbox:
(1) scrolling (horizontal and vertical)
(2) better command line parsing. Running "tv" without stdin or arguments should produce an error/help. Running "tv xyz.csv" should read that file.
cat just regurgitates the contents of the file, but the resulting piped fd is non-seekable. Since almost every command that can operate on a file from stdin can also operate on the file by name/path, at best this is just a needless invocation of a process (i.e. `tv foo.csv` should have been used instead of `cat foo.csv | tv`) - if the app in question can't handle paths, then you can have the shell pipe the file into it instead (e.g. `tv < foo.csv`). At worst, the recipient program would need to buffer the entire contents of the input if it needs to perform non-sequential operations on the source data - this is the case with things like `tac` that need to seek to the end of the input (see https://github.com/neosmart/tac for how `cat foo | tac` requires buffering but both `tac foo` and even `tac < foo` don't).
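To illustrate the seekability point in Rust (a toy example; the file name is made up):

    use std::fs::File;
    use std::io::{Seek, SeekFrom};

    // A real file handle can jump straight to the end, as tac-like tools want;
    // a pipe from `cat` cannot seek at all and would force full buffering.
    fn main() -> std::io::Result<()> {
        let mut f = File::open("foo.csv")?; // hypothetical input file
        let len = f.seek(SeekFrom::End(0))?; // O(1), no reading required
        println!("{} bytes; a reverse reader could start here", len);
        Ok(())
    }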
To some, it's a faux-pas. Personally, I like the aesthetics of cat for my own scripts. It follows the "pipe flowing" idiom better.
There are performance reasons why "useless cat" should be avoided though. So avoid it where performance is important (or when some other hardcore CLI jockey is going to see your code :))
Avoiding “useless cat” on the command line is premature optimization. Sure, don’t do it in a script that is invoked a lot but it shouldn’t be a concern when prototyping a filter pipeline.
$ cat foo | head -4
b
a
c
b
$ cat foo | head -4 | sort
a
b
b
c
$ cat foo | head -4 | sort | uniq -c
1 a
2 b
1 c
$ cat foo | head -4 | sort | uniq -c | sort -k1nr | head -1
2 b
Very nice! How does it handle CSVs that are wider or longer than the terminal? How does it deal with columns that are exceptionally long, or multiline?
Often when working with large CSV files, I'll need to show or hide specific columns, especially if they are very long. Also, grepping the output for a specific line will hide the header as well, not to mention make the output unnecessarily wide if non-matching lines have longer fields than do the matching lines. So a built-in grepping feature would make this very useful.
This is quite nice, but I don't like how it cuts off the output (instead of making it scrollable). Also, why require the use of `cat`? Accepting a filename so I can do `tv foo.csv` would be much more ergonomic, in my opinion.
xsv is one of my favorite data manipulation tools. Also, the author of that package is one of the best developers I know. I use xsv with tv. I normally pipe the output of xsv to tv.
1. As you noted, NA comprehension
2. Column overflow logic for different sized terminals
3. Summary meta data in the header
4. Significant digits logic. This allows users to view more columns than they otherwise would, since decimal dust no longer shifts the columns over (see the sketch after this list).
5. This is the most important! It looks really pretty!
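Here is a minimal sketch of what item 4's significant-digits formatting could look like (three significant figures; tv's actual rounding rules may differ):

    // Round to 3 significant digits so "decimal dust" doesn't widen a column.
    fn sigfig3(x: f64) -> String {
        if x == 0.0 {
            return "0".to_string();
        }
        let magnitude = x.abs().log10().floor() as i32;
        let decimals = (2 - magnitude).max(0) as usize; // 3 sig figs total
        format!("{:.*}", decimals, x)
    }

    fn main() {
        assert_eq!(sigfig3(3.14159265), "3.14");
        assert_eq!(sigfig3(0.000123456), "0.000123");
        assert_eq!(sigfig3(12345.678), "12346");
    }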
There should be a new package next to the traditional GNU tools containing the modern tools we need, e.g. jq, curl, or tv. Sometimes I really miss such an extended software package on some machines.
If you come across an edge case that tv does not handle, let me know. I will add a test csv file to the current portfolio of test csvs. https://github.com/alexhallam/tv/tree/main/data
I have been using tv now for a couple months at work. It has been working well on the data I see. If you find edge cases then please open an issue with an example csv.
I recently moved away from manually building binaries to automated builds for many architectures. I am still learning how to use GitHub Actions to build for a matrix of architectures.
I love visidata! But when I want to just glance at a csv file I reach for tv (I used to use `column` which is more of a tv competitor than visidata). This is for a couple reasons.
1. tv gives a quick summary of the count of rows and columns
2. tv lists at the bottom all the columns that don't fit in the terminal. With vd I have to scroll on wide data.
3. tv guides the eye to missing data better with NA highlights
4. tv has better sigfig logic. I work with files where the decimal dust can get long. Those unnecessary characters push the remaining columns off the screen, which means the user needs to scroll over to see additional columns. I generally think it is better to avoid additional key presses if possible.
5. tv is fast for large files. It does not have to read and format all of the data like vd. tv is focused on looking at the file, not operating on the file. It does not have to do as much as vd, which helps tv with what it is uniquely good at. "Do one thing and do it well."
It does not matter if your file is really wide (lots of columns) or really long: tv will give the user a compact, useful pretty-print of the data. Why not use vd as a TUI spreadsheet and tv for glancing at csv files? They are both great tools in my eyes, with different purposes.
Hey there, VisiData author here. Nice work with tv! I'm sure it's more useful than VisiData for certain use cases. I just want to clear some things up since there are a few misconceptions here (which will happen if you don't use VisiData a lot):
1. In VisiData, the number of rows is always shown in the lower right, and you can see the number of columns with Ctrl+G, or a list of the columns with Shift+C, or Shift+I for the list of columns with summary statistics (mode/distinct/errors/etc). This is an extra keystroke, but the amount of data you can get with that keystroke more than justifies it.
5. VisiData will instantly open and show any file it can, and continue to load the rest until it's done or you press Ctrl+C (or quit). Everything in VisiData is lazily evaluated, so it's not actually doing any more work than tv when you view the first page of rows, and then you can see the next few pages of rows with only one keystroke (PgDn, as opposed to having to edit a command and rerun it). Fewer keypresses ftw!
A lot of people think VisiData is a TUI spreadsheet, but vd is not a "spreadsheet" in the classic sense, as it's not cell-based. Its primary use-case is exploring and wrangling tabular data. It just turns out that this is what a lot of people are doing with their spreadsheets, but they have to bend over backwards to get Excel/whatever to play nice with their data's structure. By the same token, if you try to do little single-cell formulas in VisiData, it's going to be quite difficult.
For people who like static binaries and only need to view a few rows of CSV files, or produce part of a larger report in a pipeline, tv could be a better fit than VisiData, especially if it continues to be maintained. I'm always excited to see new data tools in the terminal space!
Oh, I am sorry. I see I misrepresented VisiData. I apologize. Thank you for the corrections.
I have a lot of respect for your work. Let me know if I can make it up to you. I would be happy to point people to VisiData in my README as a recommendation for a tool built to explore and wrangle tabular data.
Also, thanks for the compliment! Like you, I like seeing more data tools in the terminal.
This is why I love HN. I never knew this existed, and it has become my favorite tool in the 5 minutes since I installed it. It also reminds me of mainframe programs that I encountered in the past. I wish we had more tools like this instead of Electron mouse-click apps, for those of us who prefer speed and the keyboard.
For scripting I would use grep and cut, maybe awk. For scripting with CSV files, at least in my experience, you usually want specific columns from specific lines.
If tv had a switch for specifying only certain columns, that would make the job much easier.
XSV [0] can also pretty-print (minus the colors), but that's just the tip of the iceberg as far as what it can do. It's very handy for quick statistical analysis of CSV input.
That's exactly the comment I was looking for! xsv is super powerful and I think you might both draw inspiration from one another. I read above that tv reads everything into memory: maybe you can borrow some xsv tricks to avoid that. I feel tv looks great for visualising the outcome at the end of a pipeline, perhaps with xsv. I am no Ruby expert either, but this could become a cool Homebrew binary: people on macOS will use it too!
I will add some Homebrew installation instructions. That is now an open issue. I want this tool to be highly accessible. Again, xsv is the best. I like the idea of small utilities that specialize in a specific task.
From the Unix philosophy:
> Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features".
Sure! I completely understand, and I am really glad you made this great tool to display the data in such a beautiful format. Maybe there could also be a `--head` flag to display only the top n rows by default, given that the data is read into memory?
It's very weird for a project made to "maximize viewer enjoyment" to not put a space after the prompt. The one saved character on the line is definitely not worth the illegible resulting line: this doesn't maximize my enjoyment at all when viewing the examples.