
This thought led me to a strange conversation with ChatGPT. I just base64-encoded "Hello chatgpt how are you" as "SGVsbG8gY2hhdGdwdCBob3cgYXJlIHlvdQ==" and sent it as the first prompt. Notice that there is no question mark in my phrase. Here's my conversation with ChatGPT:

User SGVsbG8gY2hhdGdwdCBob3cgYXJlIHlvdQ==

ChatGPT This appears to be a Base64 encoded string that translates to "Hello chattpt how are you?"

User not quite... almost but not quite

ChatGPT I apologize for the mistake. The encoded string actually translates to "Hello chatgpt how you are".

User also not quite, you were closer the first time

ChatGPT Thank you for letting me know. The correct decoded message for the given Base64 encoded string is "Hello chatgpt how are you?" I apologize for any confusion my earlier responses may have caused.

User can you now re-encode it into base64

ChatGPT Certainly! The message "Hello chatgpt how are you?" encoded in Base64 is:

SGVsbG8gY2hhdGdwdCBob3cgYXJlIHlvdT8=

I hope this helps!

I find it particularly odd that it was misspelling things in its decoding and adding punctuation where there was none. I suppose an LLM would be trained to do something like that, but the "chattpt" really threw me off. Also, its base64-encoded message back to me decodes as "Hello chatgpt how are you?". Maybe not that impressive of a demonstration, but definitely not what I was expecting.
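
For reference, both strings check out with Python's standard base64 module:

    import base64

    # the original prompt: no question mark in the decoded text
    print(base64.b64decode("SGVsbG8gY2hhdGdwdCBob3cgYXJlIHlvdQ==").decode())
    # -> Hello chatgpt how are you

    # the string it sent back: note the trailing "?" it introduced
    print(base64.b64decode("SGVsbG8gY2hhdGdwdCBob3cgYXJlIHlvdT8=").decode())
    # -> Hello chatgpt how are you?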


From what I understand, these LLMs deliberately incorporate randomness when sampling their output (the "temperature" setting). So it's not surprising at all that some of their answers would be imperfect.

Something else to consider is that these models work on tokens, not characters, so that might also account for why it made a mistake.
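
You can see the token boundaries directly (a sketch, assuming OpenAI's tiktoken package):

    import tiktoken

    # tokenize the base64 string with a GPT-style BPE vocabulary
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("SGVsbG8gY2hhdGdwdCBob3cgYXJlIHlvdQ==")

    # each token generally spans several characters, so the model never
    # operates on the individual characters that base64 decoding requires
    print([enc.decode_single_token_bytes(t) for t in tokens])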


Well, it's almost like transliteration when you think about it; it should be even easier than proper language translation, and the model can do that reasonably well for most languages.

There probably just aren't enough examples in the training set to make it super reliable at it.


It must have learned from your prompt because this is what I got:

User: SGVsbG8gY2hhdGdwdCBob3cgYXJlIHlvdQ==

ChatGPT: Hello! I'm doing well as an AI assistant, thank you. It seems you've sent a message in Base64 encoded format. The decoded message is "Hello chatgpt how are you". I'm here to assist you with any questions or tasks you have. How can I help you today?


Yes, I would agree that technical and non-technical usability is one appealing spreadsheet use case. Another thing I find myself using spreadsheets for is simple data entry. By using a Google Sheet to enter data, you've already got yourself an authenticated, multi-user web form for easily entering data. It even integrates with Google Forms, which is a very handy tool for collecting data. It's certainly a lot easier than building and hosting it all yourself.

Once the data is entered, there are much better tools for data visualization and aggregation (in my opinion). It's nice to have the "frontend" taken care of so that your time is freed up to analyze and extract meaning from the data. I'm speaking from the perspective of someone who does a lot of rapid prototyping and data analysis. There are likely use cases where this won't scale.
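
Getting the entered data back out programmatically is also easy (a sketch, assuming the third-party gspread package, a service-account credential file, and a hypothetical sheet named "data-entry"):

    import gspread

    # authenticate with a Google service account (filename is hypothetical)
    gc = gspread.service_account(filename="service-account.json")
    sheet = gc.open("data-entry").sheet1

    # rows come back as dicts keyed by the header row
    records = sheet.get_all_records()
    print(len(records), "rows entered so far")

From there it's straight into pandas or whatever analysis tool you prefer.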


Is this something that the Internet Archive would preserve?


Can you elaborate a little here? I made the switch from classic notebooks to JupyterLab recently and find the experience very similar. (There are a few subtle differences that I can't remember offhand, but I don't recall having any problems with cell navigation.)


In my experience, there's a balance to be struck. I really like notebooks for documenting the algorithm development process. I used to do a ton of REPL-driven development, and Jupyter is a REPL that lets you persist commands across sessions; it saves a ton of time that I used to spend scrolling through the IPython history after closing and restarting a session.

Jupyter also allows you to manage different kernels in the same environment, which makes tasks like testing code between py2 and py3 trivial. My final point here is that there's also excellent Cython integration, so you can do a lot of prototyping of Cython code without having to mess with configuration or multiple files.

I will agree that there are tasks better suited to IDEs, but Jupyter is not just a plotting frontend; it can be used very effectively in algorithm development and in communicating/documenting the development thought process.
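
The Cython integration is just a pair of cell magics (a minimal sketch, assuming the Cython package is installed in the kernel's environment):

    In [1]: %load_ext Cython

    In [2]: %%cython
       ...: # this cell is compiled to C; the typed loop runs at C speed
       ...: def csum(int n):
       ...:     cdef long total = 0
       ...:     cdef int i
       ...:     for i in range(n):
       ...:         total += i
       ...:     return total

After the second cell runs, csum is callable in the notebook like any other Python function, just compiled.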

Typically, I have a git repo with the final code products; some of the more complex code gets written in notebooks, then transferred to git and thoroughly tested. I've been dreaming of this debugging experience in Jupyter, because debugging is still not a task that's suitable for notebooks, and I'm hoping it will come to vanilla Python kernels so I can adopt it.


Indeed, one can only imagine the cost of sequencing the DNA to retrieve the data (not to mention the current lack of random access). Illumina's highest-capacity sequencer will do 6 Tb (terabases) per run. The machine costs about half a million dollars, and each run is tens of thousands of dollars, not to mention the lab costs of preparing/storing the DNA. Additionally, the depth at which one would have to sequence to get _all_ of the data back reliably would be >1, meaning that every base would have to be sequenced more than once (to avoid sequencing errors).
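
Some back-of-the-envelope math on those figures (the coverage and per-run cost below are illustrative assumptions, not quoted numbers):

    # rough DNA-storage capacity/cost per sequencing run
    bases_per_run = 6e12     # 6 Tb (terabases), per the figure above
    bits_per_base = 2        # A/C/G/T -> 2 bits, ignoring coding overhead
    coverage = 30            # assumed read depth for reliable recovery
    cost_per_run = 20_000    # assumed; "tens of thousands of dollars"

    payload_gb = bases_per_run * bits_per_base / 8 / coverage / 1e9
    print(payload_gb, "GB recovered per run")      # ~50 GB
    print(cost_per_run / payload_gb, "$/GB")       # ~$400/GB

Even under these generous assumptions, that's orders of magnitude more expensive per gigabyte than hard drives.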


"each run is tens of thousands of dollars"

I'm not familiar with sequencing economics, would you mind explaining the cost of a "run"?


A sequencing run (at least in the context of Illumina's technology) requires a few very expensive consumables: the flow cell (the microscope slide the DNA sticks to while being read by a laser), the reagents (enzymes with fluorophores and other chemistry for amplifying and manipulating DNA), and the power consumed by the machinery itself. This does not factor in prep/lab costs, which can be kept to a minimum with automation, though that is itself a high-startup-cost endeavor. Each sequencing run takes ~1-3 days depending on the format.

Edit: this video may be able to explain a little better how this process works: https://www.youtube.com/watch?v=fCd6B5HRaZ8


Thanks for the explanation. I’ve seen research on graphene-based nanopore sequencing, but my knowledge and understanding are shallow.


Illumina has been stuck at $1,000 per human genome for a while.

New methods are already here: long-read sequencing direct from the source with minimal preparation [0]. The paper doesn't say much about cost, but considering the reduced preparation steps and smaller equipment, it should be a fraction of Illumina's.

[0] https://www.nature.com/articles/nbt.4125


Nanopore sequencers may require less prep and may be cheaper, but their sequencing error rate is astronomical, so you'd end up sequencing the same material many more times before reaching a consensus sequence.
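
(A toy illustration of the consensus idea, ignoring the real alignment problem: with enough reads of the same molecule, a per-position majority vote recovers the true sequence despite per-read errors.)

    from collections import Counter

    # three noisy, already-aligned reads of the same (toy) molecule
    reads = [
        "ACGTTACA",
        "ACGTAACA",   # error at position 4
        "ACCTTACA",   # error at position 2
    ]

    # majority vote at each position
    consensus = "".join(
        Counter(col).most_common(1)[0][0] for col in zip(*reads)
    )
    print(consensus)  # ACGTTACA

The higher the per-read error rate, the more reads you need before the vote converges, which is where nanopore's cost advantage can get eaten up.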


I believe nanopore sequencers would fare a lot better if the parasitic capacitance across the membrane could be minimized by decreasing its surface area, or alternatively by using fluorescent readout of the pore itself.



I thought that it was implemented in Arc: http://arclanguage.org/


Arc is a Racket dialect


Arc is a Lisp dialect that's been implemented in Racket.


That's my understanding.

Early on, I started doing an incomplete implementation of Arc more like what one of the parent comments suggested, and the working title was "morc: Mock Arc Programming Language as Scheme Extension". I just wanted to learn Arc, while learning Scheme macros better.

Besides possibly adding to the confusion over the relationship between Arc and Racket (or PLT Scheme), my poor choice of title upset at least one community member, who thought I meant to mock (insult) Arc, when I meant only to be self-deprecating about my own exercise, like a poor imitation or approximation, and not to presume to be implementing a full Arc this way.


A big use case for Python is glue for high-performance C/C++ code. How is Racket at this? Also, is there a Cython alternative for Racket? I would also argue that, though Racket may have more batteries included, there are definitely not as many externally developed libraries. Especially important are numpy/scipy/pandas/pysam; the list could go on... That's not to say that I'm against Racket; I really like the language and its level of design and documentation.


I've used the Racket FFI for some important tasks, and it's very easy to use. However, it's not as fully baked as the Python one (I've found bugs due to GC interactions) and it's weirdly slow (a few too many levels of wrapping). That said, I still use Racket extensively and enjoy it.


This is the write-up I was looking for, as this is the use case I've had in mind for Racket of late, after reading Felleisen's LOP paper.

Would love to know more about your experiences with the Racket FFI. How hard were the bugs around GC to discover and work around? What were you building in Racket?


See my responses to a sibling comment for GitHub bug reports. I had a great experience with them: very fast triage and fixes, plus a lot of help developing a workaround so we could target a range of Racket versions. (Distros still ship some old versions!)

These bugs were found while working on Herbie <http://herbie.uwplse.org/>. We use the FFI to bind to your machine's math libraries so we can properly evaluate how accurate a floating-point expression is on your machine.


You found bugs in the FFI mechanism itself, or in FFI bindings that someone had written for a particular library?


In the interaction between FFI, GC, and hash tables, so yes, in the FFI itself, not in particular bindings:

https://github.com/racket/racket/issues/2702

https://github.com/racket/racket/issues/2263

There were workarounds (in Racket you can instruct the GC not to move your FFI objects) and I believe the underlying issues have been fixed.

I must add that I got immediate and comprehensive support from the core developers, who not only fixed the bug but also suggested the workaround (so I could continue to support all our target versions of Racket).


Thank you for your great bug report on this, and I'm glad to see it was in good hands.


While it's nowhere near as mature as Racket, those interested in this use case might also be interested in Clasp, a Common Lisp implementation built on LLVM specifically to enable the combination of Lisp-style dynamic development approaches with C++ libraries.

https://github.com/clasp-developers/clasp

https://www.youtube.com/watch?v=mbdXeRBbgDM


There's a nice FFI for C stuff, but writing glue for C++ is more work.


FFI into C++ is a lot of work in any language. A lot of projects just define a plain C wrapper for the C++ code they want to call, and call that C code from whatever their glue language is.
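
From the glue-language side, such a C shim is then trivial to call; here's a sketch in Python using only the standard library (libwidget.so and the widget_* functions are hypothetical names for an extern "C" wrapper around a C++ class):

    import ctypes

    # load the hypothetical shared library containing the C wrapper
    lib = ctypes.CDLL("./libwidget.so")

    # declare the C signatures: create / use / destroy an opaque handle
    lib.widget_new.restype = ctypes.c_void_p
    lib.widget_frob.argtypes = [ctypes.c_void_p, ctypes.c_int]
    lib.widget_frob.restype = ctypes.c_int
    lib.widget_free.argtypes = [ctypes.c_void_p]

    w = lib.widget_new()
    try:
        print(lib.widget_frob(w, 42))
    finally:
        lib.widget_free(w)   # the C++ destructor runs behind the C shim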

However, Python does have CLIF[1], which is the nicest solution for calling into C++ that I'm aware of.

[1] https://github.com/google/clif



Not in enough detail to have an informed opinion.


There's also Boost.Python, which boasts "seamless interoperability between C++ and Python": https://www.boost.org/doc/libs/1_70_0/libs/python/doc/html/i...

(In general, the Boost C++ libraries are renowned in the C++ world.)


I think the biggest thing for me is Cython; I've not seen anything quite like it in other languages. It allows you to compile Python code to C, with gradual typing. It also lets you write C code inline with your Python, or interface with other C/C++ libraries: https://cython.org/ Other languages will be hard-pressed to beat its utility (especially in the scientific computing world).
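
Interfacing with a C library is a one-liner-per-function affair (a minimal sketch of a .pyx file, compiled the usual way with cythonize):

    # fastmath.pyx -- call a C library function directly from Cython
    cdef extern from "math.h":
        double sqrt(double x)

    def hypot2(double a, double b):
        # typed arguments; sqrt here is a plain C call, no Python overhead
        return sqrt(a * a + b * b)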


pybind11 is a modern take on the Boost.Python approach to Python bindings. It's header-only and can be installed trivially from PyPI. I like it well enough that I even use it for binding C code, especially because it has nice numpy array support (albeit a little underdocumented).


In my experience, it's far from what I'd call seamless, but it's definitely good enough.


On Windows it is much better, thanks to COM and now UWP, which improves on COM a lot regarding what can be exposed across the ABI.

Yes, bare bones COM is full of boilerplate, however there are more productive ways to use it.


How does CLIF compare to PyBind11[1]?

[1] https://github.com/pybind/pybind11


Nicest for Python, I assume.

Many languages have an FFI: Lua, PHP, etc. https://en.wikipedia.org/wiki/Foreign_function_interface

I don't understand the choice of Python for things like gluing together C programs. Seems like a performance mistake, at the very least.


It's the 80:20 rule, which naturally doesn't always hold. Often, however, a small part of the code is responsible for almost all of the performance issues.

By writing the performance-sensitive areas of the program in C, one can often write the rest of the program in a slower, more convenient language.


It would be nice if this language could be extended to run on various compute-cluster managers; from what I can tell, these workflows only run on one machine. I like the bioinformatics tool examples, though... you can tell who their target market is ;P


I'm also a heavy user of an X-only window manager (xmonad). Hearing about this makes me wonder if I should start learning to work with another window manager so I'm not caught off guard when something stops working.


Do you know about Waymonad? It's supposedly Xmonad for Wayland and uses wlroots like sway.


I've heard of it; not sure how mature it is these days. Maybe I'll give it another look.


I'm really attached to StumpWM, which is also X-only. We have been hearing reports of the impending death of X for at least the past decade, so I'm never certain how seriously to take them, but I've kept an eye on what looks like the closest thing to a Wayland implementation of StumpWM (https://github.com/malcolmstill/ulubis).


You could try transitioning to the Sway Wayland compositor (which is largely compatible with i3wm), or maybe to https://github.com/letoram/arcan, which is a lot more programmable than Sway.


XMonad is (pardon the phun) very high level and a switch to any other window system should be trivial.

