
Off topic, but this might be of interest.

http://maizure.org/projects/decoded-gnu-coreutils/


That's close to the current version, but a little out of date. Here's the code running now:

    (= gravity* 1.8 timebase* 120 front-threshold* 1
       nourl-factor* .4 lightweight-factor* .17 gag-factor* .1)

    (def frontpage-rank (s (o scorefn realscore) (o gravity gravity*))
      (* (/ (let base (- (scorefn s) 1)
              (if (> base 0) (expt base .8) base))
            (expt (/ (+ (item-age s) timebase*) 60) gravity))
         (if (no (in s!type 'story 'poll))  .8
             (blank s!url)                  nourl-factor*
             (mem 'bury s!keys)             .001
                                            (* (contro-factor s)
                                               (if (mem 'gag s!keys)
                                                    gag-factor*
                                                   (lightweight s)
                                                    lightweight-factor*
                                                   1)))))
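
For readers who don't speak Arc, here is a rough Python transliteration of the function above. It is a sketch, not the production code: the `Item` class and its fields are made up for illustration, `contro_factor` and `lightweight` stand in for the `contro-factor` and `lightweight` helpers whose definitions aren't shown, and I'm assuming `item-age` is measured in minutes. `front-threshold*` is left out because the function doesn't use it.

```python
from dataclasses import dataclass

# Constants copied from the Arc snippet above.
GRAVITY = 1.8
TIMEBASE = 120
NOURL_FACTOR = 0.4
LIGHTWEIGHT_FACTOR = 0.17
GAG_FACTOR = 0.1


@dataclass
class Item:
    """Hypothetical stand-in for a news item; not the real data model."""
    score: float
    age_minutes: float             # assumption: item-age is in minutes
    type: str = "story"
    url: str = ""
    keys: frozenset = frozenset()
    contro_factor: float = 1.0     # placeholder for (contro-factor s)
    lightweight: bool = False      # placeholder for (lightweight s)


def frontpage_rank(s: Item, gravity: float = GRAVITY) -> float:
    # Numerator: score minus one, dampened by a 0.8 exponent when positive.
    base = s.score - 1
    numerator = base ** 0.8 if base > 0 else base

    # Denominator: age (plus a time base) converted to hours, raised to gravity.
    denominator = ((s.age_minutes + TIMEBASE) / 60) ** gravity

    # Multiplier: the first matching condition wins, as in the Arc `if` chain.
    if s.type not in ("story", "poll"):
        multiplier = 0.8
    elif not s.url:
        multiplier = NOURL_FACTOR
    elif "bury" in s.keys:
        multiplier = 0.001
    else:
        if "gag" in s.keys:
            inner = GAG_FACTOR
        elif s.lightweight:
            inner = LIGHTWEIGHT_FACTOR
        else:
            inner = 1
        multiplier = s.contro_factor * inner

    return numerator / denominator * multiplier


# Example: a two-point story with a URL, submitted just now.
rank = frontpage_rank(Item(score=2, age_minutes=0, url="http://example.com"))
```

Note that the order of the checks matters: a story with no URL gets `nourl-factor*` even if it is also buried or gagged.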

Scientific software development (mostly biomedical), Telecommunication software development, Computer science / Budapest, Hungary / Laszlo K / kocsis1977 gmail

Details can be found here: https://www.linkedin.com/in/lkocsis


> I don't think you're really correct about "email addresses" being context-free, or at least, citation, please?

> When I look at a generic "email address" entry field on a random form on the Internet, say on the sign-up page for some hot new startup's service, I expect it to take what RFC 5322 §3.4.1[1] calls an `addr-spec`; specifically, I don't ever expect such fields to take the grammar of what that RFC calls an `address`.

Well, sure. Let's look at what RFC 5322 defines as an addr-spec[1]:

    addr-spec       =   local-part "@" domain

And how does it define a local-part?

    local-part      =   dot-atom / quoted-string / obs-local-part

Let's ignore quoted-string and obs-local-part for the moment. What is a dot-atom?

    dot-atom        =   [CFWS] dot-atom-text [CFWS]

And what is CFWS?

    CFWS            =   (1*([FWS] comment) [FWS]) / FWS

What's a comment?

    comment         =   "(" *([FWS] ccontent) [FWS] ")"

So far, all of this has been matchable with a regular expression. But what's a ccontent?

    ccontent        =   ctext / quoted-pair / comment

See that there? A comment is composed of a balanced pair of parentheses around, perhaps, another comment! Thus (this (is (a (heavily (commented (email \(address))))))foo@bar.example(some more (to prove (the point))) is a perfectly viable RFC5322 address!

Pair-balancing, of course, is impossible with regular expressions, since matching arbitrarily nested pairs requires a push-down automaton (which recognizes context-free languages) and cannot be done with a finite-state machine (which recognizes regular languages).

QED.
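
To make the pair-balancing point concrete, here is a minimal Python sketch of a scanner for the `comment` rule. The function name `parse_comment` is mine, and the sketch is deliberately loose: it doesn't check that each character is legal `ctext` or handle folded lines, it only tracks nesting and quoted-pairs. The thing to notice is the `depth` counter, which can grow without bound; that is exactly the memory a finite-state machine doesn't have.

```python
def parse_comment(text: str, pos: int = 0) -> int:
    """Return the index just past the comment starting at text[pos].

    Returns -1 if text[pos] does not start a comment or if the
    parentheses never balance. Simplified: character classes from
    RFC 5322 (ctext, FWS) are not validated.
    """
    if pos >= len(text) or text[pos] != "(":
        return -1

    depth = 0  # unbounded counter: the part a plain regex cannot emulate
    i = pos
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # quoted-pair: the backslash and the next character are literal
            if i + 1 >= len(text):
                return -1
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1  # ran out of input with a comment still open


# Example: skip a nested comment in front of an address.
address = "(a (nested) comment)foo@bar.example"
rest = address[parse_comment(address):]
```

An explicit stack would do the same job; since there is only one kind of bracket here, a counter is enough.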

> Also, using that assumption, your "perfectly valud[sic] email addresses such as …" would appear to not be valid, as it has unbalanced quotes.

Nope, there are no unbalanced quotes in (this)"()<>[]:,;@\\\"!#$%&'-/=?^_`{}| ~.a"(is)@(valid)example.org(honest): the first quote balances with the third, while the second quote is one of a quoted pair \" (which is allowed within a quoted-string, which is allowed within a local-part). It's all allowed per the spec.
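
If you'd rather check the quoting mechanically, here is a small Python sketch that scans a quoted-string while honoring quoted-pairs. The helper name `quoted_string_end` is hypothetical, and the sketch only looks at quotes and backslashes; it doesn't validate the other characters against the `qtext` rule.

```python
def quoted_string_end(text: str, start: int) -> int:
    """Return the index of the quote that closes the quoted-string
    opening at text[start], or -1 if it is never closed."""
    if start >= len(text) or text[start] != '"':
        return -1
    i = start + 1
    while i < len(text):
        if text[i] == "\\":
            i += 2          # quoted-pair: skip the escaped character
            continue
        if text[i] == '"':
            return i        # first unescaped quote closes the string
        i += 1
    return -1


# The address from this thread, as a raw string so backslashes survive.
address = r"""(this)"()<>[]:,;@\\\"!#$%&'-/=?^_`{}| ~.a"(is)@(valid)example.org(honest)"""

opening = address.index('"')
closing = quoted_string_end(address, opening)
```

Here the scanner skips `\\` and then `\"` as two quoted-pairs, so the middle quote never gets a chance to close the string.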

I'll admit that it's a bit surprising, but it's true. One simply cannot match a valid RFC5322 addr-spec with a regular expression. One can, of course, match it with something which pretends to be regular but isn't really (as I noted).

