The 25 minute value is only the time to verify a single Coq proof. I'm measuring...

h91wka · on Oct 25, 2019

> The 25 minute value is only the time to verify a single Coq proof.

Logically it is a singe proof, but technically it consists of quite a few theorems and lemmas, so this is not a valid comparison. Also proofs in Coq are typically structured and pretty high-level because of all the automation, while metamath proofs look like core dumps :)

  ${
    $d A a b v w x y $.  $d B a b v w x y $.  $d F a b v w x y $.
    $( The ` 2nd ` (second member of an ordered pair) function restricted to a
       one-to-one function ` F ` is a one-to-one function of ` F ` onto the
       range of ` F ` .  (Contributed by Alexander van der Vekens,
       4-Feb-2018.) $)
    f1o2ndf1 $p |- ( F : A -1-1-> B -> ( 2nd |` F ) : F -1-1-onto-> ran F ) $=
      ( vx vy va vv vb vw c2nd syl cv cfv wceq wi wcel wa wrex ex com23 wf1 crn
      cres wfo ccnv wfun wf1o f1f fo2ndf weq wral f2ndf cxp wss fssxp cop ssel2
      elxp2 sylib anim12dan fvres adantr ad2antlr eqeq12d op2nd eqeq12i funopfv
      wf vex f1fun anim12d eqcom biimpi eqeqan12d simpl anim12i f1veqaeq sylan2
      opeq12 syl6 com14 syl6bi pm2.43i syld impcom syl5bi sylbid adantl adantlr
      com13 com12 wb eleq1 bi2anan9 anbi2d fveq2 simpllr imbi12d imbi2d 3imtr4d
      simpr rexlimdvva rexlimivv imp mpcom ralrimivv dff13 df-f1 simprbi dff1o3
      sylanbrc ) ABCUAZCCUBZJCUCZUDZXNUEUFZCXMXNUGXLABCVHZXOABCUHZABCUIKXLCBXNU
      AZXPXLCBXNVHZDLZXNMZELZXNMZNZDEUJZOZECUKDCUKXSXLXQXTXRABCULKXLYGDECCCABUM
      ZUNZXLYACPZYCCPZQZYGOXLXQYIXRABCUOKYIYLXLYGYIYLXLYGOZYAFLZGLZUPZNZGBRFARZ
      YCHLZILZUPZNZIBRHARZQYIYLQZYMYIYJYRYKUUCYIYJQYAYHPYRCYHYAUQFGYAABURUSYIYK
      QYCYHPUUCCYHYCUQHIYCABURUSUTYRUUCUUDYMOZYQUUCUUEOZFGABYNAPZYOBPZQZYQUUFUU
      IYQQZUUBUUEHIABUUJYSAPZYTBPZQZQZUUBUUEUUNUUBQZYIYPCPZUUACPZQZQZXLYPXNMZUU
      AXNMZNZYPUUANZOZOZUUDYMUUNUUSUVEOZUUBUUIUUMUVFYQUUSUUIUUMQZUVEUURUVGUVEOY
      IUURUVGUVEUURUVGQZUVBXLUVCUVHUVBYPJMZUUAJMZNZXLUVCOZUVHUUTUVIUVAUVJUURUUT
      UVINZUVGUUPUVMUUQYPCJVAVBVBUUQUVAUVJNUUPUVGUUACJVAVCVDUVKGIUJZUVHUVLUVIYO
      UVJYTYNYOFVIGVIVEYSYTHVIIVIVEVFUVHXLUVNUVCUVGUURXLUVNUVCOZOXLUURUVGUVOXLU
      URYNCMZYONZYSCMZYTNZQZUVGUVOOXLCUFZUURUVTOABCVJUWAUUPUVQUUQUVSYNYOCVGYSYT
      CVGVKKXLUVGUVTUVOUVNUVGUVTXLUVCUVNUVGUVTUVLOOUVTUVNUVGUVNUVLUVTUVNUVPUVRN
      ZUVGUVNUVLOOUVQUVSYOUVPYTUVRUVQYOUVPNUVPYOVLVMUVSYTUVRNUVRYTVLVMVNXLUVGUV
      NUWBUVCXLUVGUVNUWBUVCOOXLUVGQZUWBUVNUVCUWCUWBFHUJZUVOUVGXLUUGUUKQUWBUWDOU
      UIUUGUUMUUKUUGUUHVOUUKUULVOVPABYNYSCVQVRUWDUVNUVCYNYOYSYTVSSVTTSWAWBWAWCW
      ATWDWJWETWFWGTSWHWKWIVBUUOYLUURYIUUNYJUUPUUBYKUUQYQYJUUPWLUUIUUMYAYPCWMVC
      YCUUACWMWNWOUUOYGUVDXLUUOYEUVBYFUVCUUNUUBYBUUTYDUVAYQYBUUTNUUIUUMYAYPXNWP
      VCYCUUAXNWPVNUUOYAYPYCUUAUUIYQUUMUUBWQUUNUUBXAVDWRWSWTSXBSXCXDXESTXEXFDEC
      BXNXGXKXSXTXPCBXNXHXIKCXMXNXJXK $.
  $}

...

dwheeler · on Oct 25, 2019

> Logically it is a single proof, but technically it consists of quite a few theorems and lemmas, so this is not a valid comparison.

I think it's 100% valid, indeed, the Metamath and/or Metamath Zero verifiers are verifying much more. I'm measuring time to verify ALL proofs of an entire system, including every theorem and lemma. The Coq measure you're quoting is the time to verify just a single proof with its lemmas. Coq is slower at verifying "a single proof of a theorem including all its lemmas" (though a complex one) than Metamath is at verifying "proofs of all theorems including all lemmas". To be fair, Coq is not designed to do this quickly, so that should not be surprising.

> Also proofs in Coq are typically structured and are pretty complex because of all the automation, while metamath proofs look like core dumps :)

You're showing the "compressed" format. That's simply its internal storage format, which is normally optimized to reduce storage space. Humans do not need to work in that format (and usually don't). Normally for final display the proofs are shown in HTML, for example:

> http://us.metamath.org/mpeuni/f1o2ndf1.html

Note that in the HTML display every step is justified by a hyperlink; you can click on the hyperlink to see the justification (recursively). For example, step 3 uses the justification "syl"; if you click on "syl" you'll find that it says that if ⊢ (𝜑 → 𝜓) and ⊢ (𝜓 → 𝜒) then ⊢ (𝜑 → 𝜒). The theorem syl is itself proved, and you can click on its links to see how it derives all the way back to definitions and axioms.

If you want to see just simple text, you can see the proof in text format by installing "metamath" and running this:

> metamath 'read set.mm' 'show proof f1o2ndf1'

As far as "structured" goes, I think structuring proofs is necessary for any system. A complex proof in Metamath is typically built up from axioms, previous proofs, and lemmas as needed. I think that's necessary for any math verification system.

digama0 · on Oct 25, 2019

It goes without saying that metamath is not designed to be read from the source directly. To be perfectly fair, Coq isn't either; you can kind of get the gist but to really understand what's going on you have to start up Coq and look at proof states. Metamath as it's meant to be seen looks like this: http://metamath.tirix.org/f1o2ndf1.html .

I agree that calling it "one proof" is a mischaracterization, but comparing all of Raft to all of set.mm is probably about right. I would guess set.mm is 10-100 times larger than Raft in terms of number and complexity of actual proof content, but it checks in a fraction of the time, in part because of that "core dump". You might think it ugly, but have you ever looked at a .vo file? The difference is metamath checks its 'cache' in to version control.

h91wka · on Oct 25, 2019

> The difference is metamath checks its 'cache' in to version control.

As I said in the initial post, there is an entire world of difference between checking the proof term and, well, proving. Coq can dump proof terms that can also be verified relatively quickly, but typically people prefer to run proof scripts in CI. And proof scripts actually search for the proof, rather than simply describe it. So, travis run time is not a terribly precise metric.

As for comparing the number of lemmas... In Coq one has "magical" tactics such as `omega' or `lattice'. Metamath doesn't have tactics (?), so I presume that's the reason they need so many lemmas in standard library; just like you need 100x time more code in asm than in Haskell.

digama0 · on Oct 25, 2019

Metamath has tactics, but they aren't part of metamath per se; they are part of the tools that you use to write proofs. You don't make the rest of the world run the same proof search you did hundreds of times, finding the same proof every time, when you could just write down the proof.

I'm aware this is an entirely unfair comparison. Coq and Metamath aren't doing anywhere near the same amount of stuff, so of course Metamath will be faster. But the question is: are you doing what is necessary? Is running omega all the time a requirement for large scale proof checking, or is it just the way Coq does things? I think you will find that it's not necessary in the abstract, but it's also pretty hard to write Coq any other way. (To be absolutely clear, I don't mean don't use omega, I mean use omega once and generate a proof, and don't make the rest of the world have to re-run your proof script.)

The sizes of things are actually hard to gauge in Metamath if you want to compare LOC to amount of human work, because what you see in a Metamath file are actually the output of some honest to goodness "tactic" or high level user interface. No one is writing asm here.

If you are not writing lemmas, and are relying on Coq tactics to rediscover the proof all the time, well, that's just demonstrably slow and redundant. In Metamath we use a combination of tactics to find proofs and optimizers to extract common lemmas and use the library of existing lemmas effectively; the Coq model leaves no space for the optimizer, so you get something... suboptimal.