This is an elegant solution for sequences, but it doesn't work for arbitrary iterables, which need not support slicing (generators, for example). While this generalization might not be needed for language ngrams, the general problem of taking n items at a time from an iterable pops up in various places.

Here's a generator that yields ngrams from an arbitrary iterable:

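  from collections import deque
  from itertools import islice
  
  def ngram_generator(iterable, n):
      """Yield successive n-tuples (a sliding window) from any iterable."""
      iterator = iter(iterable)
      # Prefill with the first n-1 items; maxlen=n drops the oldest item on each append.
      d = deque(islice(iterator, n - 1), maxlen=n)
      for item in iterator:
          d.append(item)
          yield tuple(d)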
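
To make the point about non-sliceable iterables concrete, here is a small usage sketch. The `words` generator and its sample sentence are made up for illustration, and the function is repeated only so the snippet runs on its own:

```python
from collections import deque
from itertools import islice

def ngram_generator(iterable, n):
    # Same generator as in the comment above, repeated for a standalone snippet.
    iterator = iter(iterable)
    d = deque(islice(iterator, n - 1), maxlen=n)
    for item in iterator:
        d.append(item)
        yield tuple(d)

def words():
    # Hypothetical input: a generator, which has no len() and cannot be sliced.
    for w in "the quick brown fox".split():
        yield w

bigrams = list(ngram_generator(words(), 2))
print(bigrams)  # [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]

# If the input has fewer than n items, nothing is yielded.
too_short = list(ngram_generator(words(), 5))
print(too_short)  # []
```

Note that each window is copied into a new tuple, so callers can keep earlier ngrams around without them changing as the deque advances.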


