And this scaling of the work factor - is that tied to timestamps in the previous blocks? (I.e., is the work factor something like max(average(time_blockN+1 - time_blockN)))?
[ed: from other comments, I gather that "passage of time" means "added blocks (hashes)" - but I'm still not clear on where the agreement on what constitutes "in 10 minutes on average" comes from. If A sees an addition to the blockchain of N blocks with X proof-of-work from B (let's call it N * X), and sees M blocks with Y proof-of-work from C, it should be easy to figure out which of N * X and M * Y is bigger (here we don't really assume multiplication, just some way to figure out the maximum proof-of-work).
But how does A know how long either took? Is the time local, so that A looks at the heads of the B and C chains, considers when A first saw each head, and figures the elapsed time based on that?]
Yes, but it is not adjusted per block; it is adjusted every 2016 blocks (which is two weeks if each block takes 10 minutes to find).
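To make the adjustment concrete, here is a rough sketch of the retargeting arithmetic as I understand it. The function and constant names are my own, and it leaves out details such as the cap on the easiest allowed target, so treat it as an illustration rather than the actual client code. The "target" is the number a block hash must fall below, so a smaller target means a higher difficulty.

```python
TARGET_SPACING = 10 * 60        # desired seconds between blocks
RETARGET_INTERVAL = 2016        # blocks between difficulty adjustments
EXPECTED_TIMESPAN = TARGET_SPACING * RETARGET_INTERVAL  # 1,209,600 s = 14 days


def retarget(old_target: int, first_timestamp: int, last_timestamp: int) -> int:
    """Return the new target given the block timestamps that bound the period.

    Assumption: the measured timespan is clamped to a factor of 4 in either
    direction, so one adjustment can never move the target by more than 4x.
    """
    actual = last_timestamp - first_timestamp
    actual = max(EXPECTED_TIMESPAN // 4, min(actual, EXPECTED_TIMESPAN * 4))
    # Blocks came too fast -> actual < expected -> target shrinks (harder).
    return old_target * actual // EXPECTED_TIMESPAN
```

So if the last 2016 blocks took one week instead of two, the target halves and each block needs about twice as many hashes on average. Note that the only clock involved is the timestamps inside the blocks themselves.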
It's also worth keeping in mind that timestamps are technically forgeable by the miner. Bitcoin doesn't really solve this problem, but it does require that a block's timestamp be greater than the median of the past 11 blocks' timestamps (and no more than about two hours ahead of the validating node's network-adjusted clock). Thus, if mining is distributed enough, a miner giving you the most misleading timestamp that would still be accepted won't end up affecting things long-term, since they can't generate blocks fast enough to heavily skew the timestamps the adjustment relies on. It's still technically an attack vector, though.
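A minimal sketch of that timestamp check, with names I made up for illustration (`timestamp_acceptable`, `MEDIAN_SPAN`, `MAX_FUTURE_DRIFT`); it assumes the two rules are "strictly greater than the median of the previous 11 timestamps" and "at most two hours ahead of the validating node's clock":

```python
MEDIAN_SPAN = 11                 # how many previous blocks feed the median
MAX_FUTURE_DRIFT = 2 * 60 * 60   # seconds a timestamp may run ahead of our clock


def timestamp_acceptable(new_timestamp: int,
                         previous_timestamps: list[int],
                         local_time: int) -> bool:
    """Check a block timestamp against the two bounds described above.

    previous_timestamps holds the timestamps of earlier blocks, oldest first.
    """
    recent = sorted(previous_timestamps[-MEDIAN_SPAN:])
    median_past = recent[len(recent) // 2]
    # Must be strictly later than the median of recent blocks, and must not
    # claim to come from too far in the validator's future.
    return median_past < new_timestamp <= local_time + MAX_FUTURE_DRIFT
```

A lying miner can therefore nudge a timestamp within that window, but cannot push it arbitrarily far back or forward.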
To respond to your edit (which I don't think I quite addressed):
In that situation, A doesn't care how long it actually took to find those blocks; the scaling of the difficulty/work factor handles that. A can rely on the fact that (unless someone introduced or removed a significant chunk of hashing power, or broke the hashing algorithm) the current network difficulty represents the amount of work that takes the network approximately 10 minutes. PoW below that difficulty isn't accepted, and PoW above it would be considered proving more work than required (but is also, of course, harder to find, so it proves more but should take the network longer than 10 minutes).
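For the "which is bigger, N * X or M * Y" part, here is a hedged sketch of how A could compare the two chains without any clock at all. It assumes each block's work is estimated from its target as roughly 2^256 / (target + 1) expected hashes, and that the chain with the larger sum wins; the function names are mine.

```python
def block_work(target: int) -> int:
    """Expected number of hashes needed to find a hash at or below target."""
    return 2**256 // (target + 1)


def chain_work(targets: list[int]) -> int:
    """Total expected work for a chain, given the target of each block."""
    return sum(block_work(t) for t in targets)


def heavier_chain(chain_b: list[int], chain_c: list[int]) -> str:
    """Name the chain that represents more total work ('B' wins ties)."""
    return "B" if chain_work(chain_b) >= chain_work(chain_c) else "C"


# Three easier blocks from B versus two harder blocks from C.
easy, hard = 2**240, 2**239
winner = heavier_chain([easy] * 3, [hard] * 2)
```

Here C's chain is shorter but represents more work, so counting blocks alone would pick the wrong one.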
Since Bitcoin calculates the difficulty to result in about 10 minutes between blocks (as I explained in my last comment), that's approximately how long it should take. It's possible B or C got lucky and found a block faster than that, but over many blocks that kind of luck averages out, since the hash result is effectively random. If significantly more hashing power is added to the network and blocks are consistently found faster than the target time, then when the difficulty is recalculated later on (after the 2016 blocks are done) it will go up, making blocks again take about 10 minutes with the new hashing power taken into account.
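To put rough numbers on the "luck" part: if each hash is an independent try with a tiny success chance, the time to the next block is approximately exponentially distributed. That is a modelling assumption on my part, not something the protocol states, and the helper names below are made up.

```python
import math

TARGET_SPACING = 600.0  # seconds, the 10-minute average


def prob_block_within(seconds: float) -> float:
    """Chance the network finds the next block within `seconds`."""
    return 1.0 - math.exp(-seconds / TARGET_SPACING)


def relative_spread(n_blocks: int) -> float:
    """Standard deviation of the total time for n_blocks, relative to its mean."""
    return 1.0 / math.sqrt(n_blocks)
```

Under this model a single block arriving within a minute is not rare (`prob_block_within(60)` is a bit under 10%), but the total time for a full 2016-block period only varies by a couple of percent, which is why the per-period average is a usable signal for the adjustment.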