mwotton's comments (Hacker News)

I used SQLite as a backend for scraping 160 million websites a day. It works extraordinarily well if you hold it right; in my case, that meant devoting a database file to each thread. (Arguably, in this context it's competing with flat files, but it does a really good job there.)
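
A dependency-free sketch of the one-file-per-thread layout, for illustration only: each worker thread owns its own output file, so no two threads ever contend for the same file or lock. Plain writeFile stands in for a per-thread SQLite connection (with a SQLite binding you would open one connection per file instead). The worker-N.dat naming, the round-robin split, and the function names are assumptions, not the setup described above.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO)
import Control.Monad (forM)

-- Hypothetical naming scheme: worker k owns exactly one file.
workerFile :: FilePath -> Int -> FilePath
workerFile dir k = dir ++ "/worker-" ++ show k ++ ".dat"

-- Deal the work out round-robin so each worker gets a disjoint slice.
roundRobin :: Int -> [a] -> [[a]]
roundRobin n xs = [ [x | (i, x) <- zip [0 ..] xs, i `mod` n == k] | k <- [0 .. n - 1] ]

-- Start one thread per slice; each thread writes only to its own file,
-- standing in for "one database file per thread".
runWorkers :: FilePath -> Int -> [String] -> IO [FilePath]
runWorkers dir n items = do
  handles <- forM (zip [0 ..] (roundRobin n items)) $ \(k, slice) -> do
    done <- newEmptyMVar :: IO (MVar (Either SomeException ()))
    let path = workerFile dir k
    _ <- forkFinally (writeFile path (unlines slice)) (putMVar done)
    return (path, done)
  -- Wait for every worker and re-throw the first failure, if any.
  forM handles $ \(path, done) -> do
    result <- takeMVar done
    either throwIO (const (return path)) result
```

For example, runWorkers "." 2 ["a", "b", "c"] writes a and c to worker-0.dat and b to worker-1.dat.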


Me too. It's got its quirks (I wish it noticed Stack-relevant changes and ran intero-restart itself), but I find it really valuable, especially with Anthony Cowley's suggestion-handling stuff.


I've implemented this. It isn't too bad up to a certain point. You have to be a bit careful about your filesystem and file layout: lots of filesystems don't particularly like it when you have a few hundred million files in one directory.
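
One common way to keep any single directory small is to fan files out into nested subdirectories keyed by a hash of the file name. A minimal sketch, assuming a 32-bit FNV-1a hash (chosen here for illustration; any stable hash would do) and a hypothetical two-level xx/yy/ layout, which spreads names across up to 65,536 leaf directories.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32)
import Numeric (showHex)

-- 32-bit FNV-1a. Simplification: hashes each Char's code point rather than
-- its UTF-8 bytes, so it only matches the reference vectors for ASCII names.
fnv1a32 :: String -> Word32
fnv1a32 = foldl' step 2166136261
  where
    step h c = (h `xor` fromIntegral (ord c)) * 16777619

-- Zero-padded 8-digit hex rendering of the hash.
hex8 :: Word32 -> String
hex8 w = replicate (8 - length s) '0' ++ s
  where s = showHex w ""

-- Hypothetical layout: the first two hash bytes pick two directory levels,
-- so a name lands in "xx/yy/name", where xx and yy are hex digits.
shardedPath :: String -> FilePath
shardedPath name = a ++ "/" ++ b ++ "/" ++ name
  where
    (a, rest) = splitAt 2 (hex8 (fnv1a32 name))
    b = take 2 rest
```

Because the hash is deterministic, a reader can recompute the path from the name alone; no index of where each file went is needed.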


I have a crawler downloading 160 million websites a day, almost all in Haskell. It is a much more practical language than the alternatives.


As a contributor to the other Elasticsearch client: kudos. This is much nicer than ours in many ways :)


Thank you very much and thanks to everyone that helped me build this. You, carter, and the others in the Haskell/Haskell.au IRC channels were a big help. The thank-you list would be huge if I tried to list everybody that made a library I used or whose blog post helped me along.


Other ES client? In Haskell?


Yeah, I worked on it a bit with Ollie Charles.

https://github.com/ocharles/Elasticsearch

It's not very principled, tbh: it uses error etc. rather than proper sum types for returning errors.
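
For contrast, a minimal sketch of the sum-type approach: failures become constructors of an error type and travel in Either, so callers have to handle them, instead of being raised with error. The EsError type, its constructors, and the status-code mapping are hypothetical, not the API of either Elasticsearch client.

```haskell
-- Hypothetical error type for an HTTP-based search client.
data EsError
  = NotFound String         -- 404: index or document missing
  | BadRequest String       -- other 4xx: the request was rejected
  | ServerError Int String  -- 5xx: failure on the server side
  | Unexpected Int String   -- anything else (1xx, 3xx, ...)
  deriving (Eq, Show)

-- Turn an HTTP status and response body into a typed result
-- instead of calling error.
classify :: Int -> String -> Either EsError String
classify status body
  | status >= 200 && status < 300 = Right body
  | status == 404                 = Left (NotFound body)
  | status >= 400 && status < 500 = Left (BadRequest body)
  | status >= 500 && status < 600 = Left (ServerError status body)
  | otherwise                     = Left (Unexpected status body)

-- Callers pattern-match; with -Wall, GHC warns if a constructor is unhandled.
describe :: Either EsError String -> String
describe (Right body)             = "ok: " ++ body
describe (Left (NotFound what))   = "missing: " ++ what
describe (Left (BadRequest why))  = "bad request: " ++ why
describe (Left (ServerError s _)) = "server error " ++ show s
describe (Left (Unexpected s _))  = "unexpected status " ++ show s
```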


To be fair, errors are implicitly encoded in the HTTP statuses and I'm not doing anything about Conduit's exceptions right now. This is something I'd like to fix later.


Ah right, I thought there might be another one I didn't know about! ocharles tends to be a lot more principled these days if it's any comfort :-)


Yes, I probably should have mentioned the other 90% of the project, hey :)


This is true :) It's been a great tool to write a spider in: fast native code, a strict type system, and very cheap threads have made it much faster than it otherwise would have been. ZeroMQ has also been a great help in modularising it. I'll try to get a piece out on the technical architecture later. (I'm the CTO, btw.)
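
A small sketch of the cheap-threads-plus-message-passing shape described above, using in-process Chan values as a stand-in for ZeroMQ sockets (an assumption made to keep it dependency-free; this is not the crawler's actual architecture). fetchStage is a hypothetical placeholder for real fetching and parsing.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM, replicateM_)

-- Hypothetical stage: a real crawler would fetch and parse here;
-- this just pairs each URL with its length.
fetchStage :: String -> (String, Int)
fetchStage url = (url, length url)

-- A worker reads jobs until it sees Nothing, the shutdown signal.
worker :: Chan (Maybe String) -> Chan (String, Int) -> IO ()
worker jobs results = do
  job <- readChan jobs
  case job of
    Nothing -> return ()
    Just url -> do
      let result = fetchStage url
      -- Force the length here so the work runs on this worker thread,
      -- not lazily in whichever thread later reads the result.
      snd result `seq` writeChan results result
      worker jobs results

-- Fan URLs out to n green threads and collect one result per URL.
-- Results arrive in completion order, not input order.
pipeline :: Int -> [String] -> IO [(String, Int)]
pipeline n urls = do
  jobs <- newChan
  results <- newChan
  replicateM_ n (forkIO (worker jobs results))
  forM_ urls (writeChan jobs . Just)
  replicateM_ n (writeChan jobs Nothing)
  replicateM (length urls) (readChan results)
```

Each stage only knows the channels it reads from and writes to, which is the property that makes swapping in a socket between processes plausible later.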


What benefit does that have over what's actually there?


My point was not to argue that something like that would be a better solution, but since you're asking: having a special syntax would make the learning curve a little shallower for newcomers. And it would simplify certain constructs -- instead of having to lift IO values or use mapM_ or whatever, you could actually deal with the results from impure functions directly, no unwrapping or rewrapping needed.

While using the type system to implement an effects system is theoretically elegant, I think it's a beautiful hack that has made the language fussier and more obtuse in practice.
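
For reference, the standard tools that cut down on explicit unwrapping and rewrapping are fmap (written <$>) and traverse/mapM_. A minimal sketch: the pure function never mentions IO, and the lifting code works for any Functor or Applicative, so it can be run in IO or tested with Maybe and lists. wordCount, countWordsIn and countAll are illustrative names.

```haskell
-- Pure core: nothing in this type mentions IO.
wordCount :: String -> Int
wordCount = length . words

-- Apply the pure function to the result of any effectful action with <$>,
-- with no explicit unwrapping or rewrapping. In IO: countWordsIn getLine.
countWordsIn :: Functor f => f String -> f Int
countWordsIn action = wordCount <$> action

-- traverse runs one effect per element and collects the results;
-- mapM_ (or traverse_) is the variant that discards them.
countAll :: Applicative f => (String -> f String) -> [String] -> f [Int]
countAll load names = traverse (fmap wordCount . load) names
```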


That would certainly be true if purity enforced by monadic I/O were the end of the story, but it isn't. While new users create a lot of hot air about monads and I/O, intermediate and experienced Haskell users just use them for various different purposes and get on with life.

At the end of the day most of us have differing opinions on what constitutes simplicity and elegance. It's certainly true that a "pure" annotation like you're proposing is a much smaller change to introduce in an imperative setting. I recollect D or Rust or something is doing this. But in the functional programming context the monadic solution is more general, and a two-function type class with 3 (IIRC) algebraic laws is not considered an overbearing amount of complexity, though there are of course interesting alternatives with their own merits.
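
To make the two functions and three laws concrete, a minimal sketch with a hand-rolled Maybe-like type (Opt is an illustrative name). In current GHC the Monad class sits on top of Functor and Applicative, so those instances are required too, and return defaults to pure. The laws are written as functions that check them on sample values.

```haskell
data Opt a = None | Some a deriving (Eq, Show)

instance Functor Opt where
  fmap _ None     = None
  fmap f (Some x) = Some (f x)

instance Applicative Opt where
  pure = Some
  Some f <*> Some x = Some (f x)
  _      <*> _      = None

-- The two functions: return (defaulting to pure, above) and >>=.
instance Monad Opt where
  None   >>= _ = None
  Some x >>= k = k x

-- The three monad laws, as checks on sample values.
leftIdentity :: Eq b => a -> (a -> Opt b) -> Bool
leftIdentity a k = (return a >>= k) == k a

rightIdentity :: Eq a => Opt a -> Bool
rightIdentity m = (m >>= return) == m

associativity :: Eq c => Opt a -> (a -> Opt b) -> (b -> Opt c) -> Bool
associativity m k h = (m >>= (\x -> k x >>= h)) == ((m >>= k) >>= h)
```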


Sure. IO plays a part of a larger system of monads and functional programming, but it's not a prerequisite for impurity to exist -- it is more like a happy confluence of various strands of functional theory. After all, Haskell had I/O before the IO monad existed (although it was apparently not a happy solution).

Personally, I find monadic I/O theoretically elegant, but it comes at the cost of clumsiness when applied to real-world programs. To me, Haskell's "do" blocks feel like an implicit admission of this clumsiness; they are a crutch to work around the fact that having to constantly wrap and unwrap data is something of a chore.
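
For readers weighing that point: a do block is syntactic sugar for a chain of >>= and lambdas, so the first two definitions below mean the same thing. A minimal sketch using Maybe so it runs without IO; addBoth and its variants are illustrative names.

```haskell
-- do notation: each <- binds the value inside a monadic result.
addBoth :: Maybe Int -> Maybe Int -> Maybe Int
addBoth mx my = do
  x <- mx
  y <- my
  return (x + y)

-- The same function with the sugar removed: explicit >>= and lambdas.
addBoth' :: Maybe Int -> Maybe Int -> Maybe Int
addBoth' mx my = mx >>= \x -> my >>= \y -> return (x + y)

-- For this particular shape, Applicative style avoids naming x and y at all.
addBoth'' :: Maybe Int -> Maybe Int -> Maybe Int
addBoth'' mx my = (+) <$> mx <*> my
```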


I guess I just don't see them that way. Most of the time my code is in pure land, and I don't avoid "do" when the result is more readable. I think the ugliest thing in Haskell is probably monad transformer stacks, but that's mainly because I think they're overused by folks who create more abstraction than they need as a matter of habit.

That said, the kinds of things I do on the side with Haskell tend to have a small, well-defined I/O surface, so it could be that I'm spared the worst of it by my interests. I suppose if that weren't the case I'd probably favor OCaml more than I do.
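
As an illustration of the overuse concern (not a rule), a minimal sketch of the same small job written twice: once with a StateT-over-Maybe stack from the transformers package, and once with a plain foldM. For a computation this small, the fold expresses the same thing with less machinery. The function names are illustrative.

```haskell
import Control.Monad (foldM)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, evalStateT, get, put)

-- Transformer version: a running total (StateT) layered over failure (Maybe).
addChecked :: Int -> StateT Int Maybe ()
addChecked n = do
  total <- get
  if n < 0 then lift Nothing else put (total + n)

sumNonNegativeT :: [Int] -> Maybe Int
sumNonNegativeT xs = evalStateT (mapM_ addChecked xs >> get) 0

-- Plain version: thread the total through foldM in the Maybe monad.
sumNonNegative :: [Int] -> Maybe Int
sumNonNegative = foldM step 0
  where
    step total n
      | n < 0     = Nothing
      | otherwise = Just (total + n)
```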


Rust used to have a "pure" annotation, but it's gone, partially because it was a pain to have to write the annotation everywhere, partially because it's not needed for memory safety anymore, partially because nobody can agree on what "pure" means.


I wouldn't have anticipated that, but now that you say it I can see why that would lead to a debate. How would you deal with mutable data structures, for instance? What if a function accesses the environment, but in a fashion you could somehow guarantee was safe? In Haskell the programmer can circumvent the system with unsafePerformIO if they know something they can't convince the compiler of, but it almost seems like you'd need a "pure-but-not-really" annotation to do this kind of thing in an imperative language that actually enforced purity.
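
On the mutable-data question, Haskell's type-checked answer is the ST monad: a function can allocate and mutate references internally, and the type of runST guarantees that none of that state escapes, so the function is pure from the outside. unsafePerformIO, by contrast, skips that check and leaves the guarantee to the programmer. A minimal sketch (sumST is an illustrative name):

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Externally pure: same input, same output, no IO in the type.
-- Internally it mutates an accumulator in place.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x)) xs
  readSTRef acc
```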


C++ walked in these very same footsteps. First by not having const, then by having it, then by allowing exceptions to constness, then by introducing const_cast and finally by allowing temporarily mutable const objects.


C++ const is defective because it's a shallow const. You can modify an object through a const pointer.

The D language "fixes" this by making const transitive (and also adding an immutable annotation, which means the object is truly read-only, as in "read-only memory").


"pure nothrow @safe" is what you get to prepend to your functions in D. I prefer Haskell's approach.


Row polymorphism has something to say about this, doesn't it? Coming from Haskell, I'm still finding concatenative programming hard to handle.
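
One way to see the connection from the Haskell side: encode a stack as nested pairs, (rest, top). Each word is then a function whose type mentions only the items it touches, while the type variable s stands for "the rest of the stack" and plays the role of a row variable. Concatenating words is left-to-right composition with >>>. A minimal sketch; push, dup, add, mul and square are illustrative names, not any concatenative language's library.

```haskell
import Control.Arrow ((>>>))

-- A stack is a nested pair (rest, top). The variable s is "whatever else
-- is on the stack", so each word is polymorphic in the rest of the stack.
push :: a -> s -> (s, a)
push x s = (s, x)

dup :: (s, a) -> ((s, a), a)
dup (s, a) = ((s, a), a)

add :: Num a => ((s, a), a) -> (s, a)
add ((s, x), y) = (s, x + y)

mul :: Num a => ((s, a), a) -> (s, a)
mul ((s, x), y) = (s, x * y)

-- Concatenation is composition: "dup mul" squares the top of the stack
-- and leaves everything below it untouched.
square :: Num a => (s, a) -> (s, a)
square = dup >>> mul
```

For example, (push 2 >>> push 3 >>> add) () evaluates to ((), 5), the tuple version of "2 3 +".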


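Well, let's look.

```haskell
-- Hide the Prelude's map so this definition doesn't clash with it.
import Prelude hiding (map)

map :: (a -> b) -> [a] -> [b]
map f (h:t) = f h : map f t
map _ []    = []

member :: Eq a => a -> [a] -> Bool
member x (h:t) = x == h || member x t
member _ []    = False
```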

I think the only difference is that you can't match for equality directly in the pattern.
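
For completeness, the idiomatic Haskell way to express the equality "match" is a guard, which reads close to a pattern while keeping the comparison explicit; the Prelude's elem already provides this function. A minimal sketch (memberG is an illustrative name):

```haskell
-- Haskell patterns are linear: memberG x (x:_) = True is rejected because
-- x would be bound twice. A guard performs the comparison instead.
memberG :: Eq a => a -> [a] -> Bool
memberG _ [] = False
memberG x (h:t)
  | x == h    = True
  | otherwise = memberG x t
```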

