
> No need to read the lines all into memory first.

It looks like that code does read the whole file:

(with a foo.csv that is 350955 bytes long):

  % python -V
  Python 3.11.4
  % python
  >>> f = open("foo.csv")
  >>> f.tell()
  0
  >>> header, *records = [row.strip().split(',') for row in f]
  >>> f.tell()
  350955
I thought that the list comprehension used to bind header and records was what eagerly consumed the file, so I changed it to a generator expression:

  >>> f.close()
  >>> f = open("foo.csv")
  >>> header, *records = (row.strip().split(',') for row in f)
  >>> f.tell()
  350955
nope, I guess the destructuring bind does it?
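
(A quick check that a starred target in an assignment drains any iterator into a list:)

  >>> first, *rest = iter(range(5))
  >>> rest
  [1, 2, 3, 4]

So I pulled the header off by hand instead: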

  >>> f.close()
  >>> f = open("foo.csv")
  >>> headers, records = f.readline().strip().split(','), (row.strip().split(',') for row in f)
  >>> f.tell()
  125
not as neat, though. Is there a golf-ier way to do it?
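
(One possibility, sketched with the stdlib csv module, which also copes with quoted commas; untested against this foo.csv:)

  >>> import csv
  >>> rows = csv.reader(open("foo.csv"))
  >>> header, records = next(rows), rows

records stays a lazy iterator, so only the header line is read up front.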



The parent poster was pointing out that this requires two complete in-memory copies of the file:

    [... for row in open(filename).readlines()]
The readlines() return value is one copy, and the list built by the comprehension is another. The first copy can be avoided with:

    [... for row in open(filename)]
The entire file must still be read to evaluate the list comprehension, so one full copy of the parsed rows remains in memory.
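
If even that single copy is unwanted, the comprehension can become a generator expression and each row handled as it is read (a sketch; process() is a hypothetical per-row handler):

    with open(filename) as f:
        for fields in (line.strip().split(',') for line in f):
            process(fields)  # hypothetical handler; one row in memory at a time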

Additionally, this doesn't do what you think it does:

    >>> header, *records = (row.strip().split(',') for row in f)
Compare to this, using a variable for clarity:

    >>> gen = (row.strip().split(',') for row in f)
    >>> header, *records = next(gen)
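
If the upthread goal is header bound to the whole first row with records left lazy, a sketch of one way, assuming f is freshly opened:

    >>> gen = (row.strip().split(',') for row in f)
    >>> header = next(gen)   # only the first row is read
    >>> records = gen        # remaining rows, consumed on demand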



