Hacker News | ptx's comments

The menu key was always useless anyway, because shift+F10 does the same thing.

I believe the Windows key is just Ctrl+Esc.

Doesn't work for the shortcut combinations.

But the equivalent pixel value depends on the root element font size, so the comment will be wrong when that changes. If you leave the math to the browser dev tools you'll get accurate results without any AI figuring out patterns.


Yep, but in our workflow we've never deviated from the 16px base. The comments in the code are purely to help when translating designs to rem, in particular with Tailwind.


Like XMLVM[0] with English instead of XML.

[0] http://www.xmlvm.org/overview/


How do the "special tokens" work? Is this a completely reliable mechanism for delimiting the different parts of the prompt?

Are they guaranteed to be distinct from anything that could occur in the prompt, something like JavaScript's Symbol?

Or are they strings that are pretty likely not to occur in the prompt, something like a MIME boundary?

Or are they literally the strings "<|start|>" etc. used to denote them in the spec?


They are "literally the strings", but I believe they will be escaped, or encoded differently, if a user tries to inject them as part of a prompt.


Yeah the tokens are more akin to JS Symbol.

If you're parsing untrusted user inputs into tokens, you can make sure your tokenizer will never produce the actual numbers corresponding to those tokens.

A simplified example: I can `.charCodeAt` a string all I want but I'll never get a negative number, so I can safely use -1 to mean something special in the transformed sequence of "tokens".
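A Python analogue of that sketch (purely illustrative; real tokenizers use subword vocabularies, but the sentinel principle is the same): `ord()` can only yield non-negative code points, so a negative number can serve as a special token that user text can never produce.

```python
START = -1  # reserved: ord() can never return a negative number
END = -2

def tokenize_user_text(text):
    # User input always maps to non-negative "tokens".
    return [ord(c) for c in text]

def build_prompt(user_text):
    return [START] + tokenize_user_text(user_text) + [END]

# Even the literal string "<|start|>" becomes plain character codes,
# never the START sentinel itself.
tokens = build_prompt("<|start|>")
assert tokens[0] == START
assert START not in tokens[1:]
```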


Do you have any publicly available code demonstrating this pattern?


I don't actually, but it can be explained in a few lines of code. Consider the following two simple functions:

    import ctypes

    def ref(obj):
        return id(obj)

    def deref(addr):
        return ctypes.cast(addr, ctypes.py_object).value
Basically, this relies on an implementation detail of `id()` in CPython: the unique id of an object is its memory address. `ref()` returns a reference to an object (think `&` in C), and `deref()` dereferences it back (think `*` in C). This is close to the standard `weakref` module in essence, but weakref is a black box.

Now even though the call stack is cleared upon fork of the worker processes, you still have the parent objects available, properly tracked and refcounted, as you can check from `gc.get_objects()`. This is in fact a feature of `gc`, as explained in the docs (https://docs.python.org/3/library/gc.html):

> If a process will fork() without exec(), avoiding unnecessary copy-on-write in child processes will maximize memory sharing and reduce overall memory usage. This requires both avoiding creation of freed “holes” in memory pages in the parent process and ensuring that GC collections in child processes won’t touch the gc_refs counter of long-lived objects originating in the parent process. To accomplish both, call gc.disable() early in the parent process, gc.freeze() right before fork(), and gc.enable() early in child processes.
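The recipe from the quoted docs, as a minimal sketch (POSIX-only, since it relies on fork()):

```python
import gc
import os

gc.disable()   # early in the parent: avoid creating freed "holes" in pages
# ... build the large, long-lived objects here ...
gc.freeze()    # right before fork: move tracked objects to a permanent generation

pid = os.fork()
if pid == 0:
    gc.enable()   # child: GC runs again, but frozen parent objects are
    os._exit(0)   # never touched, preserving copy-on-write pages
else:
    os.waitpid(pid, 0)
    gc.enable()   # parent: resume normal collection
```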

Now whenever you want to send large objects to a `multiprocessing.Pool` or `concurrent.futures.ProcessPoolExecutor`, you can avoid expensive pickling by just sending these references.

    from multiprocessing import Pool

    class BigObject:
        def compute_something(self):
            return "something"

    def child(rbo):
        bo = deref(rbo)
        return bo.compute_something()

    def parent():
        bo1 = BigObject()
        bo2 = BigObject()
        with Pool(2) as pool:
            result = pool.map(child, [ref(bo1), ref(bo2)])
            return result
In a real codebase, though, there are some caveats. You cannot take a reference to just anything: there are temporaries, cached small integers, etc. You will need some form of higher-level wrapper around `ref()` to choose properly when and what to reference or to copy.

Also, it may be inconvenient to have your child functions explicitly dereference their parameters, as it forces you to write _dereference wrappers_ around your original functions. A good strategy I've used is to create a proxy class that stores a reference and overrides `__getstate__`/`__setstate__`, pickling itself as a reference and unpickling itself as a proxy. That way, you can transparently pass these proxies to your original functions without any modification.
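A minimal sketch of that proxy idea (the class and its delegation strategy are made up for illustration, built on the `ref()`/`deref()` pair from above): the proxy pickles itself down to a bare address and resurrects the real object on the child side, delegating attribute access so callers never notice.

```python
import ctypes

def ref(obj):
    return id(obj)  # CPython detail: id() is the object's address

def deref(addr):
    return ctypes.cast(addr, ctypes.py_object).value

class InheritedProxy:
    """Pickles as a bare address; unpickles into a transparent proxy."""

    def __init__(self, obj):
        self._obj = obj

    def __getstate__(self):
        return ref(self._obj)      # parent side: ship only the address

    def __setstate__(self, addr):
        self._obj = deref(addr)    # child side: resurrect the object

    def __getattr__(self, name):
        if name.startswith("_"):   # avoid recursion before _obj is set
            raise AttributeError(name)
        return getattr(self._obj, name)  # delegate everything else
```

Since a fork()ed child shares the parent's address space (copy-on-write), `deref()` in the child finds the live object at the same address; the same round-trip also works within a single process, as long as the parent keeps the original object alive.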


Oh, I see. You want to avoid serializing the objects since they will be copied anyway with fork(), but the parent needs a way to refer to a particular object when talking to the child, so it needs to pass some kind of ID.

You could also do it without pointers and ctypes by using e.g. an array index as the ID:

    inherited_objects = []

    def ref(obj):
        object_id = len(inherited_objects)
        inherited_objects.append(obj)
        return object_id

    def deref(object_id):
        return inherited_objects[object_id]
Although this part needs a small change as well, so that the object ID is assigned before forking:

    import multiprocessing as mp

    def parent():
        bo1 = BigObject()
        bo2 = BigObject()
        refs = list(map(ref, [bo1, bo2]))
        with mp.Pool(2) as pool:
            result = pool.map(child, refs)
            return result


> You want to avoid serializing the objects since they will be copied anyway with fork(), but the parent needs a way to refer to a particular object when talking to the child, so it needs to pass some kind of ID.

Yes, that is exactly and succinctly the crux of the idea :-)

As you found out, you can rely on indices or keys in a global object to achieve the same result. The annoying part, though, is that you need to pre-provision these objects before creating the pool, and clean them up after to avoid keeping references to them alive. That means some explicit boilerplate every time you use a pool.
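That provisioning/cleanup boilerplate can at least be packaged up once; a small sketch (the `inherited` helper is hypothetical) built on the same global-list scheme:

```python
from contextlib import contextmanager

inherited_objects = []

@contextmanager
def inherited(*objs):
    """Provision objects before forking the pool; clean up afterwards."""
    start = len(inherited_objects)
    inherited_objects.extend(objs)
    try:
        # hand back the IDs (list indices) to pass to the children
        yield list(range(start, start + len(objs)))
    finally:
        # drop the parent's references so the objects can be collected
        del inherited_objects[start:]

# Usage:
#     with inherited(bo1, bo2) as refs:
#         with Pool(2) as pool:
#             result = pool.map(child, refs)
```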

The nice thing with the id() trick is that it's very unobtrusive for the caller: the reference count stays the same in the parent process; it is only increased in the child, unbeknownst to the parent.


Not actually "in Excel", though. The Python code runs on Microsoft's servers (they say in the introduction) and Excel is just a client.

There's no reason they couldn't embed CPython in Excel, but maybe the intention was for the online version of Excel to have feature parity without having to compile Python to JavaScript?


The intention is to lock orgs into their cloud services. This is a value-add. They know that Excel and Word are "feature complete", and the only way they're going to make money on them is by harvesting and locking in the users.


> fortunately I could use Ed Satterthwaite’s excellent offline debugging system for ALGOL W

What is he referring to here? What's an "offline" debugger?


And another HN bug, I think: Comments are wrapped in "span" elements (which are inline elements) but contain "p" elements (block elements) which is not valid:

  <span class="commtext c00">...<p>...<p>...</span>
The W3C HTML validator says: "Element p not allowed as child of element span in this context."

This causes Dillo to render the text in grey from the second paragraph on.


Thanks, I reported it too by email.


I wonder if HN sticks to the current HTML markup (which is full of validation errors!) to preserve compatibility with third-party clients that scrape and parse it?

In that case, maybe we could get a query parameter along the lines of "useValidMarkup=1" (or a user setting might be better) to produce valid HTML for the benefit of niche browsers that expect it while preserving the current (invalid but stable) markup by default.


> I wonder if HN sticks to the current HTML markup (which is full of validation errors!) to preserve compatibility with third-party clients that scrape and parse it?

I don't think so, but if it does I recommend they instead stick to the HTML (4.01 or 5) spec by default.

> In that case, maybe we could get a query parameter along the lines of "useValidMarkup=1" (or a user setting might be better) to produce valid HTML for the benefit of niche browsers that expect it while preserving the current (invalid but stable) markup by default.

They can guess the browser by the User-Agent and serve a "patched" version if needed (bluedwarf.top does it). But I would recommend against doing this, as it reduces the incentive for clients to get fixed and moves the responsibility to webmasters.


Fixed at HN side (thanks dang).

Bash feels more like a C buffer overflow vulnerability than a Lisp: When you're not careful your data tends to leak across its boundaries and get mixed up with your code.

Is there a way to quote data in Bash to safely insert it into a piece of code, the way you can in Lisp?


In Bash, yes. Or, almost.

The printf built-in command has a %q conversion specifier which quotes the argument such that the result can be used as input into the shell.

Such a thing cannot be inserted into a quote without affecting its meaning, but what you can do is terminate the quote before and after it.

  'stuff before'$(printf "%q" "$value")'stuff after'
In this case, because it's a single-quoted string, we need to terminate it anyway, in order to activate the command substitution syntax.
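As a sketch (the value here is made up), you can check that %q survives a round-trip through the shell's parser:

```shell
# Untrusted input containing spaces, quotes and command substitution syntax.
value='foo bar; $(date) "double" '\''single'\'''

# %q re-quotes it so the shell will parse it back as a single word.
quoted=$(printf "%q" "$value")

# Round-trip: eval re-parses the quoted form without executing anything.
eval "roundtrip=$quoted"
[ "$roundtrip" = "$value" ] && echo "round-trip OK"
```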


Is it actually valid? W3C's HTML validator complains about invalid nesting.

It might work because the HTML5 parsing algorithm produces well-defined results for broken HTML, but I'm not sure that's the same as "perfectly valid" – the spec calls such markup "misnested" and "erroneous" [0].

[0] https://html.spec.whatwg.org/#an-introduction-to-error-handl...


It certainly was valid according to the original specs. I first saw it as an example over 20 years ago in a debate about whether XHTML should replace HTML 4.0.

