Ask HN: How to get back into AI?
266 points by quibono on Dec 10, 2022 | 140 comments
I was involved in machine learning and AI a few years ago, mainly before the onset of the new diffusion models, large transformers (GPT*), Graph NNs and Neural ODE stuff.

I am comfortable with autograd/computation graphs, PyTorch, "classic" neural nets and ones used for vision-type applications, as well as the basics of Transformer networks (I've trained a few smaller ones myself) and RNNs.

Do you know of any good resources to slowly get back into the loop?

So far I plan on reading through the original Diffusion/GPT papers and start going from there but I'd love to see what you think are some good sources. I would especially love to see some Jupyter notebooks to fiddle with as I find I learn best when I get to play around with the code.

Thank you




I am an ML researcher working in industry: by far the most effective way to maintain/advance my understanding of ML methods is to implement the core of an interesting paper and reproduce (some of) its results. Completing a working implementation forces your understanding onto another level compared to just reading the paper and thinking "I get it". It is easy to read (for example) a diffusion/neural ODE paper and come away thinking that you "get it" while still having a wildly inadequate understanding of how to actually make it work yourself.

You can view this approach in the same way that a beginner learns to program. The best way to learn is by attempting to implement (as much on your own as possible) something that solves a problem you're interested in. This has been my approach from the start (for both programming and ML), and is also what I would recommend for a beginner. I've found that continuing this practice, even while working on AI systems professionally, has been critical to maintaining a robust understanding of the evolving field of ML.

The key is finding a good method/paper that meets all of the following:

0) is inherently very interesting to you

1) you don't already have a robust understanding of the method

2) isn't so far above your head that you can't begin to grasp it

3) doesn't require access to datasets/compute resources you don't have

Of course, finding such a method isn't always easy and often takes some searching.

I want to contrast this with other approaches to learning AI, which include:

- downloading and running other people's ML code (in a jupyter notebook or otherwise)

- watching lecture series / talks giving overviews of AI methods

- reading (without putting into action) the latest ML papers

all of which I have found to be significantly less impactful on my learning.


Sorry if this is a stupid question, but from a non-practitioner's perspective, how or why is this sensible?

Most of the cutting edge papers are trained on several $100k worth of GPU time, so does it even make sense to implement the algorithm without the available data & compute? How can you be sure that your implementation is correct, if you can't train it (hence you can't run proper inference with a good model)?

Compare that to e.g. reimplementing a pure CS paper, almost anything can be reimplemented in a simple way - even something like "distributed database over 1000 nodes", well you don't technically need 1000 servers, you can just, you know, simulate them quite cheaply.

Of course there might be similar techniques for ML but I'm just not aware of them.


> how or why is this sensible?

The whole objective here is personal learning and this advice would be wildly different for how to practice ML professionally. The approach is directly analogous to advising a beginner programmer to get better at programming by actually writing computer programs.

> Most of the cutting edge papers are trained on several $100k worth of GPU time

It's beside the point, but I said nothing about a requirement that the methods you choose to implement and learn from have to be cutting edge. More to the point, unless we have different definitions of what "cutting edge" means, you're wrong that "most of the cutting edge papers" require high computational resources. If that were true it would be nearly impossible for the field to make progress at the pace it does. There is a plethora of research into purely algorithmic approaches which do not require massive compute resources, and in fact this is the most productive portion of research to learn from, because there the focus is on theory and on progress in how to conceptualize/frame ML problems. Works which amount to "we took method X and massively scaled it up" are (in my opinion) less intellectually interesting to someone seeking to grow their knowledge in ML (though the results may be extremely impressive and impactful, and it may be intellectually very interesting for those working directly on that project).

> How can you be sure that your implementation is correct, if you can't train it (hence you can't run proper inference with a good model)?

This is like asking how you can be sure that you've correctly implemented a B tree if you haven't used it to serve a distributed database to 1 million users. The answer is small isolated tests.

One of the best ways to really test your knowledge of an ML algorithm is to design and write unit tests asserting it behaves correctly on trivial cases. You'll find bugs in your implementation, but you'll also be forced to think carefully about which core characteristics of the algorithm must be asserted in order to convince yourself that it's correct. It's a common beginner mistake in ML to just run/train your model and have that be the only test of its correctness. It's like deploying a web service with zero tests and letting "do I get X number of users" be the only test of your code's correctness. That sounds insane, but it's basically equivalent to what most beginners do in ML (my former self included).
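
To make "small isolated tests" concrete, here is a rough sketch of the kind of trivial-case assertion I mean, using PyTorch's built-in attention as the module under test (for your own implementation you'd swap in your module; the property tested assumes no positional encodings and no dropout):

    import torch
    from torch import nn

    def test_self_attention_is_permutation_equivariant():
        # With no positional encodings, permuting the input sequence
        # should permute the output in exactly the same way.
        torch.manual_seed(0)
        attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True).eval()
        x = torch.randn(1, 10, 16)
        perm = torch.randperm(10)
        out, _ = attn(x, x, x)
        out_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])
        assert torch.allclose(out_perm, out[:, perm], atol=1e-5)

    test_self_attention_is_permutation_equivariant()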


Thanks for the extensive answer.

Do you have a few of these cutting-edge algorithmic-advance papers in mind? Could you list them?

I guess I got too pessimistic because of things like "emergent features" [1] / "grokking" [2] that seem to happen only with a lot of compute, and also the fact that the original (vanilla) transformer architecture remains (one of) the best despite many additional ideas and "advances" (something that is only evident at large scale) [3].

Because of the points above, it's really hard for me, as a non-expert, to assess which papers are true advancements, and which were only published in pursuit of vanity metrics (e.g. publication counts) but actually represent overfit/cherry-picked results rather than robust progress.

[1] https://timdettmers.com/2022/08/17/llm-int8-and-emergent-fea...

[2] https://www.lesswrong.com/posts/N6WM6hs7RQMKDhYjB/a-mechanis...

[3] https://twitter.com/YiTayML/status/1551657355036676096


I posted in another comment on this thread a list of papers which met these criteria for me at the time and which I learned a lot by implementing.

> it's really hard for me, as a non-expert, to assess which papers are true advancements

It's hard for me too, though I wouldn't consider myself an expert, just someone with a moderate amount of experience. Learning to discriminate important from less-important papers is another skill that takes effort to develop.


> How can you be sure that your implementation is correct, if you can't train it (hence you can't run proper inference with a good model)?

It seems impossible to know for sure. The best I can think of is to train your model to overfit a small amount of data.
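
Something like the following, as a minimal sketch (the tiny model and random data are just placeholders for whatever you're implementing):

    import torch
    from torch import nn

    # Placeholder model/data: swap in whatever you're implementing.
    model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
    x, y = torch.randn(16, 8), torch.randint(0, 2, (16,))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    for _ in range(500):
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # A sound implementation should memorize 16 fixed samples almost perfectly.
    assert loss.item() < 0.05, f"failed to overfit: loss={loss.item():.3f}"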


Often it might be viable to implement prediction w/o necessarily implementing training (especially if there are published weights or a reference implementation). Not viable for papers where the key contribution is a change to the pre-training objective / training methodology / optimizer, but useful for papers where the key contribution is architectural.
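
For example, if the paper ships weights, you can often load them straight into your re-implementation and diff the outputs against the reference. A rough sketch using torchvision's ResNet-18 as the reference, where MyResNet18 is your own hypothetical re-implementation whose parameter names match the reference state_dict:

    import torch
    from torchvision.models import resnet18, ResNet18_Weights

    from my_models import MyResNet18  # hypothetical re-implementation

    reference = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
    mine = MyResNet18().eval()
    # strict=True fails loudly if parameter names or shapes don't line up.
    mine.load_state_dict(reference.state_dict(), strict=True)

    x = torch.randn(2, 3, 224, 224)
    with torch.no_grad():
        assert torch.allclose(mine(x), reference(x), atol=1e-4)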


> Most of the cutting edge papers are trained on several $100k worth of GPU time

You can scale some things down. VGG-16 is basically a stack of convolutional layers; there's no reason you need 16 of them with an input size of 224x224x3. You can just as easily watch a 4-layer CNN learn filters on inputs of size 64x64x1, as in the sketch below. Obviously this won't work if the paper's result comes from sheer compute, but plenty of results come purely from the architecture.
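
Something like this is all you need to watch filters being learned (layer sizes here are arbitrary):

    import torch
    from torch import nn

    # A deliberately tiny VGG-style stack: 4 conv layers on 64x64 grayscale input.
    model = nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 10),
    )

    print(model(torch.randn(8, 1, 64, 64)).shape)  # torch.Size([8, 10])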

You could also implement and run networks that are designed to be really cheap to compute. ResNet/InceptionNet, for example. I think this is a pretty important part of the space right now, considering how performant, general, and therefore inefficient Transformer architectures are.


But these are "old" models from 5+ years ago. Implementing them is not going to help you get up to speed with more recent AI research. From the OP's post, it seems like he already knows these basics.


+1 on implementing papers; that's one of the best things you can do to improve your skills (anywhere in science or engineering, actually). A warning: I remember trying to do this back in my uni/grad days, and more often than not there is key information (perhaps even accidentally) left out of the implementation descriptions. I was more in mechanical engineering, so perhaps this is less common in AI-oriented papers, but I still think it's a valid thing to look out for.


> I was more in mechanical engineering so perhaps this is less common in AI oriented papers

No, you got it right. This is EXTREMELY prevalent in modern AI/ML papers, to everyone's detriment. In the majority of interesting cases, reproduction is only possible with the original code.


I think it's actually often worse in AI papers. Fortunately at least some bigger journals/conferences encourage or require releasing source code, which makes it easier to track down subtle details that the authors didn't clearly mention in the paper.

On top of that, due to its dependence on data and the ability to 'fudge' statistics, a lot of AI papers aren't really that replicable even when there aren't any implementation subtleties. For example, I've run into papers on image generation which describe some trick to improve quality but focus entirely on standardized scores without providing any visual comparisons (and which thus, as feared, turn out not to deliver as much of a visual improvement on other datasets as the scores would suggest).

While in a lot of sciences or engineering many things can be attributed to being standard practice for experts in the field, AI moves too fast to have such standards and tends to be a bit too arbitrary for such standards to mean much.


This is an issue in biomedical research as well. Sometimes I've reached out to researchers who've done similar studies to ask them about missing details in their methods.


Can you recommend some papers which fit those criteria?


Because of criteria 0, 1, 2 these entirely depend on the individual. However, some papers which fit the criteria for me at the time were the following:

- Score-Based Generative Modeling through Stochastic Differential Equations https://arxiv.org/abs/2011.13456

- Structured Denoising Diffusion Models in Discrete State-Spaces https://arxiv.org/abs/2107.03006

- Efficient and Modular Implicit Differentiation https://arxiv.org/abs/2105.15183

- Scalable Gradients for Stochastic Differential Equations https://arxiv.org/abs/2001.01328

- Bayesian Optimization with Unknown Constraints https://arxiv.org/pdf/1403.5607.pdf

- SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks https://arxiv.org/abs/2006.10503

- DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking https://arxiv.org/abs/2210.01776


I am not sure if this is exactly what you are looking for, but paperswithcode.com has a well organized selection of research with publicly available source code. Anyone trying to reproduce the code independently from the paper can always take a peek at the original source for details which may not be clear.


I liked this site initially, but decided to read the ToS and was a bit turned off:

> To the extent that you provide User Content, you hereby grant us (and represent and warrant that you have the right to grant) an irrevocable, non-exclusive, royalty-free and fully-paid-up, worldwide license to reproduce, distribute, publicly display and perform, prepare derivative works of, incorporate into other works and otherwise use and exploit such User Content, and to grant sublicenses of the foregoing rights.

So not only can Meta use these cutting edge techniques in their products without needing to request permission from the implementer, they can also sell those derivatives to anyone and practically have full ownership over what was submitted. Furthermore:

> You assume all risks associated with the use of your User Content.

and

> For the avoidance of doubt, Meta Platforms does not claim ownership of User Content you submit or other content made available for inclusion via our Website.

So, if anything goes wrong, it is the implementer’s fault entirely, but Meta may freely profit from those submissions in any way they see fit.

Sure, this is a cynical reading of the ToS, but I’m assuming the ToS will only ever be used in Meta’s favor…


Most recent papers, in NLP at least, are so sparse on detail that it is impossible to reproduce their models. And then there's the compute cost, as at least one other poster has mentioned.


I'm in the same boat, kinda, but even more outdated on some parts. I had some AI specialization back in college, but that was before deep learning was even a thing, so we did stuff like self-organizing maps and evolutionary algorithms, which wasn't really all that useful for much back then. I've been following deep learning from the sidelines, but at work my AI work was restricted to GOFAI until recently.

Some of the stuff I'm currently reading/watching or have recently finished:

Practical Deep Learning, though it sounds like you may know this stuff already (https://course.fast.ai/)

Practical Deep learning part 2, more about diffusion models. Full course coming early next year (https://www.fast.ai/posts/part2-2022-preview.html)

Hugging Face course (https://huggingface.co/course/chapter1/1)

Diffusion models from hugging face https://huggingface.co/blog/annotated-diffusion https://huggingface.co/docs/diffusers/index

Andrej Karpathy's Neural Networks: Zero to Hero. He goes from the basics up to how GPT works, so you can start wherever suits you (https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs...)

3blue1brown's videos. I've found all his videos on neural networks and math worth watching; even for stuff I already know, he sometimes has new perspectives and nice animations.

brilliant.org. Nice math refresher and the courses there are almost like fun little games.


Have you had a look at https://nnfs.io/ ? I bought the book and am gearing up to start working through it, I would be interested to know your thoughts. Generally I want to chart a personal curriculum from data engineer to practical application of modern AI to real business problems.


I hadn't seen that book before.

Looks like it's a companion to this YouTube series that looks pretty interesting https://youtube.com/playlist?list=PLQVvvaa0QuDcjD5BAw2DxE6OF...

I will check it out for sure


I'd be interested to know what you think, it's quite a time investment to study the entire book.


The neuromatch computational neuroscience course also seems quite interesting, though maybe less of practical use.

https://compneuro.neuromatch.io/

Recent research like "Relating transformers to models and neural representations of the hippocampal formation" might make it more relevant though (https://arxiv.org/abs/2112.04035v2)

quote from the abstract of that paper: "Many deep neural network architectures loosely based on brain networks have recently been shown to replicate neural firing patterns observed in the brain. One of the most exciting and promising novel architectures, the Transformer neural network, was developed without the brain in mind. In this work, we show that transformers, when equipped with recurrent position encodings, replicate the precisely tuned spatial representations of the hippocampal formation; most notably place and grid cells. Furthermore, we show that this result is no surprise since it is closely related to current hippocampal models from neuroscience. We additionally show the transformer version offers dramatic performance gains over the neuroscience version. This work continues to bind computations of artificial and brain networks, offers a novel understanding of the hippocampal-cortical interaction, and suggests how wider cortical areas may perform complex tasks beyond current neuroscience models such as language comprehension."


OP here.

For some context, something I should have mentioned in the original post but failed to do: I was not intending to do a professional pivot to an AI role; it is more of a personal interest. I used to be really excited about this stuff and am looking forward to getting involved in it again just because I find it interesting.

Thank you, I really appreciate everyone's responses.


I'm not sure why people are telling you professional stuff. You do you.

I am an ML researcher working on generative modeling. I think you have enough experience that you'll catch up quickly. But the question is mostly: what are you interested in? With that I can give better advice. Don't let anyone stop you from learning just to learn. Not everything has to be a career. A lot of us got here because it's fun.

I do think you'll pick up diffusion models quickly. I like the explicit-density side of things more, and density estimation. So I like Song's works and similar ones with Kingma. Also check out Lilian Weng's blogs. They are a wealth of material and sources; you can't go wrong there. You'll find that diffusion and VAEs are kinda similar. The difficulty you'll have in understanding something like Stable Diffusion is actually the programming (at least this was the hardest part for me).

Good luck and let me know if I can help.


I'm an ML researcher (reinforcement learning). I learned by implementing papers from scratch, endlessly. Any specific subfield you are interested in?


What kind of programming and math proficiency is good to have before someone starts to implement papers?


I think you should be careful about dropping whatever you are doing and running back to this new iteration of AI.

Quite honestly, the opportunity all seems to be on the front end. The idea that you are going to airdrop yourself as a hands-on AI programmer into this market doesn't make a huge amount of sense to me from a career perspective.

The opportunity is with the tools and how they are applied. Building front end experiences on ChatGPT and integrations and applied scenarios.

Actually doing the AI yourself means competing with PhDs and elite academics immersed in the field.

I think knowledge of AI is far less valuable than knowledge of the emerging landscape combined with a broad understanding of different tools and how they are applied.

The new trend here is very strongly Large Language Models (LLM). You should be far more specific with what your goal is and where to spend your time.

A lot of the "AI" you are referring to seems to be no longer relevant or interesting to the market.

If you are spending time with Jupyter notebooks, I would say you are probably completely wasting your time and heading in the wrong direction.

LLM is the major trend. Focus entirely on that and the tools landscape, and on how to integrate and apply it. It feels like you are navigating using an out-of-date map.


I find the implicit assumption a bit funny that the only reason OP might be asking this is for career reasons rather than say, curiosity, the joy of learning, love of knowledge for its own sake.


That's exactly correct; I currently want to learn this more as a hobby/personal interest. I think the field is quite competitive already and I'm not fooling myself about doing novel model-architecture work professionally.


Not surprising though. I'm sure this is just my stereotype, but I swear that most of the time such an assumption comes along with interesting words which I don't understand at all (in this context), like "tools", "integration", "landscape", etc.


And that he isn't in the "PhD, elite academics" crowd.


He can grow into all of that.


I wouldn't say that "the opportunity all seems to be on the front end". Specifically for stable diffusion, there are a lot of different ways to use the model. I think we're just starting to scratch the surface of what SD can do, so there is some value in tinkering with different ways to use and apply the model.

Example 1: have a look through here: http://synapticpaint.com/dreambooth/browse/ for some examples of dreambooth models people have created

Example 2: you can merge different dreambooth models together to varying degrees of success (the idea being, you train model A on subject A, model B on subject B, and now you want to generate pictures of A and B together). My understanding is that this doesn't work too well at the moment, but it's possible that a different interpolation algorithm can yield better results.

I do agree with the general sentiment that you wouldn't necessarily be training your own models or creating your own architecture, just want to provide the perspective that understanding the AI side is valuable because it can lead to different capabilities and products.


I'd say the "opportunity" is primarily in creating business models with these new AIs. The AI field is going to keep innovating on them regardless of what any individual chooses to do. Discussions about viable applications of these new capabilities are scant; beyond reducing existing technical-artist head counts, I see little to no discussion about new capabilities and new applications not possible before. Sure, we're developers, but we're also supposed to be entrepreneurs, and this lack of creative discussion about what can be done that was not possible before is curious in itself.


Guess based on historical anecdata: I think that is because it looks like the current generation of AIs will help automate a lot of stuff but won't enable anything new. I would say we are at the point analogous to when 'computer' no longer meant a chain of humans with calculators. First the automation needs to become entrenched, then new innovations can emerge.


Can I quote your "current gen AIs will help automate a lot of stuff, but won't enable anything new" in the future, when that "something new" tears a new economic hole in our global economy?

Personally, I see this tech as capable of destroying and recreating the advertising industry completely by inserting everyday consumers into ad media, depicting them as happy consumers of a product they've not used yet - while celebrity spokespeople, appearing as their personal friends, inform them how much the celebrity idolizes them for using said product. This is an obvious, non-subtle application. There will be many, many more.


The current gen is not ready for that. It will be ready soon enough, for sure. When I said "gen" I meant "not currently", but with this speed of development I'm not ready to bet on whether the scenario you described is 3 months or 5 years away.


http://synapticpaint.com/dreambooth/browse/

Can you explain the value of these models, please?

This is a serious question. My eyes just see one horrendously ugly (and dystopian) eyesore after another.


I agree that the images currently produced by, say, SD are more of a curiosity than true art. Give it a year or so, I would say, and we will change our minds. Remember the early VR models? Not comparable with the quality you have now in real time. Only AI seems to increase the quality of its output at a much faster pace.


This comment does not quite make sense: "The new trend here is very strongly Large Language Models (LLM)." Is every problem a language problem? No, of course not. Is every problem going to be solved by an LLM? What about problems that require unique data sources that no LLM will ever be trained on?

And this: "If you are spending time with Jupiter notebooks I would say you are probably completely wasting your time". How do you suggest one performs data analysis on any problem that's not an LLM -- data analysis of any kind, such as "does the model I am trying to build a front end for even work for my problem?"


I don't like how this comment sounds...

Also, you're wrong. Look at what the OP wrote and then look at how the latest models are actually built, and you will see at least 2/3 of their knowledge is relevant.


There is a lot of transferable knowledge to gain from learning this stuff properly, even if you don’t expect to do core AI work in a commercial setting. Optimization, function fitting, probability/statistics, GPU programming…

My impression is that the field is more disciplined in terms of knowledge now than it was ~8 years ago - the fundamentals are better understood and more clearly expressed in literature.

Also there are still plenty of topics on which the new techniques can probably be fruitfully applied, especially if you have some domain knowledge that the math/CS PhDs don’t have.

For OP - I’m in a similar situation and have been going through Kevin Murphy’s “Probabilistic Machine Learning”, which is pretty massive and dense but also very lucid.


> My impression is that the field is more disciplined in terms of knowledge now than it was ~8 years ago - the fundamentals are better understood and more clearly expressed in literature.

Is that really true? That's not my impression at all (though to be fair I haven't been keeping up with current research as much as I used to). My understanding is that there is still hardly any knowledge of what deep learning models (and large language models in particular) actually learn. The loss surfaces are opaque, and one still doesn't know why local minima reached by gradient descent tend to generalize fairly well for these models. The latent representations that generative language models learn are, with the exception of the occasional paper that finds some superficial correlations, hardly investigated at all and overall extremely poorly understood.

Very much interested in any references that contradict that sentiment.


Maybe I'm biased specifically because of the book I mentioned. For me it's providing a theoretical basis for many things that earlier I learned in a hand-wavy way (e.g. way back I took Hinton's NN course and Ng's ML course, and learned about gradient descent, momentum, regularization, training/validation loss etc, and now with this book for the first time I feel like I get the bigger picture in the context of optimization/stats).

The previous version of this book was from 2012 though and I'm not 100% sure how much of the material in the current edition is new (there is definitely a _lot_ more deep learning stuff in it).

So yeah it could be that my impression is wrong, or that I made the scope at which it applies sound bigger than it is.


I've only read (parts of) the Murphy book from 2012, I assume you're reading this one: https://probml.github.io/pml-book/toc1.pdf?

Almost all of the content that the new book covers, with the exception of the third part on deep learning, is about theory that was almost exclusively invented before 2012. Classical ML (non deep learning) is actually very rigorous compared to modern ML. There exist good theorems (statistical learning theory) for most of the classical models I'm aware of.


Yes that’s the one. Good to know, thanks for making me aware!


I disagree with this comment, and anyone reading it should take it with a big grain of salt. Let's go back to 2016 and replace "LLM" with "Reinforcement Learning". Everyone thought every problem could be solved by RL because it's a looser restriction on the problem space. But then RL failed to deliver real-world benefits beyond some very specific circumstances (well-defined games), and supervised learning is/was still king for 99% of problems.

Yes, LLMs are amazing but they won't be winning every single Kaggle competition, displacing every other ML algorithms in every setting.


Sure enough, LLMs are not going to win every Kaggle competition. But... I am fairly certain that transformers may. Embed all categorical values, scale continuous features by embeddings, and run it through a graph neural network; with high probability it will beat nearly everything.
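
Roughly what I have in mind, as a sketch (column counts and sizes are made up; I'm using a plain TransformerEncoder here, i.e. the fully-connected-graph case):

    import torch
    from torch import nn

    cards, emb = [12, 7, 30], 32            # cardinalities of 3 categorical columns
    cat_embs = nn.ModuleList(nn.Embedding(c, emb) for c in cards)
    cont_proj = nn.Linear(1, emb)           # shared projection for continuous columns
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=emb, nhead=4, batch_first=True),
        num_layers=2)
    head = nn.Linear(emb, 1)

    x_cat = torch.randint(0, 7, (64, 3))    # 64 rows, 3 categorical columns
    x_cont = torch.randn(64, 5)             # 5 continuous columns

    tokens = [e(x_cat[:, i]) for i, e in enumerate(cat_embs)]
    tokens += [cont_proj(x_cont[:, i:i + 1]) for i in range(5)]
    h = encoder(torch.stack(tokens, dim=1))  # (64, 8 tokens, 32)
    pred = head(h.mean(dim=1))               # pool over columns -> (64, 1)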


Transformers require a lot of data to converge. There's a reason tree models are still king of Kaggle even though transformers have been around for 5 years now.


To me, the vast majority of people in the field seem to hold on to the idea that different technology suits different problems.

Also, are you aware that one of the most prominent AI tools of this month (ChatGPT) was obtained with RL?


Applied AI/ML is still a great career field, especially if you have a knowledge of physics.


Hi, would you mind elaborating, or throwing some links my way regarding the physics connection in particular? I am a physics postdoc that has been interested in the field for some time (this becoming less of a unique characteristic as the technology develops and hype cycles peak). I was motivated by Stuart Russell's Reith Lectures in 2021 to pursue the field outright and had been working towards this, but am becoming increasingly aware that seeking direct research involvement in the field is a bit quixotic starting from where I am.


> I was motivated by Stuart Russell's Reith Lectures in 2021 to pursue the field outright

For anybody intrigued, here are those Reith Lectures:

https://www.bbc.co.uk/programmes/m001216k/episodes/player

https://wikipedia.org/wiki/Reith_Lectures


Thanks! The key point for me was that Russell put forward a strong case that the moral approach to the problems of artificial intelligence is to involve oneself with the evolution of these technologies where you can (80,000 Hours style) and seek (in whatever limited capacity one can) to guide them away from the excesses and damages of the invisible hand (and the human condition it is a proxy for).

This would have been influential to me when I was a teenager and nihilistic about accelerating technology in the hands of distinctly-not-developing humankind (reading all about the Manhattan Project forebears of AI), but after devoting a decade to science while being fascinated by yet boycotting AI research and its implications, I think it has ended up a bit too late for a change of heart and direction to have as much positive impact.

For those fortunate enough to be in a position of even trivial influence over the cutting-edge, and whose moral sophistication can therefore matter more than the next person, I hope you don't take the implications of whatever agency you may have lightly!


Anyone with even that kind of "trivial influence" can be absolutely sure that an "agency" is constantly tracking them.

I wouldn't be surprised if AI conferences have a minimum of 10 spooks from every major western intelligence agency in attendance.


One connection I've seen is energy-based models. I suppose you could try applying ML to things like fluid dynamics?

In general, it is thought that physics knowledge transfers better to ML than math because it's less abstract and physicists are more likely to be used to dealing with large datasets and software.


Thanks for the insight! I have some friends who do computational fluid dynamics, with particle physics being similarly numerical, and I was looking at physics-informed ML for my own particular area in quantum physics in a recent grant application, in the hope of funding to close the gap a bit myself. What is so powerful about ML and related statistical techniques is their versatility and genericity, so a project that can benefit from that area of statistics tends not to be too far away. I will look into energy-based models, too.


His comment is quite good. I work with physics-informed ML for CFD and other dynamical systems (temperature, hydrology, etc.); there is just a ton of opportunity and funding for this type of work in research. Coming from a typical physics education, where you're learning quantum and astro, realizing that 90% of government physics funding is in the earth sciences and the related physics was eye-opening. I felt shortchanged by my physics education not even including fluids.


It was the same here - fluid dynamics was an elective at my university as well (one I took, but still not core syllabus). I guess the amount of funding for a domain depends strongly on impact, and in the earth sciences the output is much more immediately tangible than uncovering another supremely true but at-the-time inapplicable pattern of physical behaviour in the quantum domain, or context for humanity in the astronomical or cosmological domains.


check my reply to the other comment


Great! Thank you for taking the time. With that information I can look into a more local equivalent.


Could you elaborate? I'm in the AI field myself, but as a software engineer. I do have a PhD in physics.


https://www.usajobs.gov/Search/Results?k=physical%20scientis...

If you're already in the field you'd likely have to take a pay cut, but the federal government is always interested in research physical scientists and will snatch up anyone who knows the physics and also knows ML.


One thing I'd like to add is that you do not have the computing power to generate a large language or vision model. Period. Unless you have hundreds of thousands of dollars for compute time you are just not going to do anything interesting with model building and AI.

Upgrading existing systems with AI is probably where it's at, using existing models like Stable Diffusion, GPT-3, or some of the smaller downloadable language models if the task is very simple and the economics of using GPT-3 don't make sense.


Not OP but thanks for this response, as someone on the front end with a passing interest in AI this helped me recalibrate my thinking on this.


Is LLM even applicable to many things though? If we want more nuanced and contextual vision, does LLM help?


Multimodal models are also a red hot area. The interesting thing might be to combine specialized instances of LLMs and other models together, where each runs a specific subtask or event processing loop.

I think it'd be fun to use vision models to pick up the interesting parts of an image, describe only the high-yield context as English text, and put them in a sort of Unix pipeline or node graph to connect it to other models that can input/output text. With fine-tuned or prompt-engineered LLMs as the intelligent centerpiece.
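
A toy version of that kind of pipeline is already doable with off-the-shelf Hugging Face pipelines; a rough sketch, where the checkpoint names and the image path are just examples:

    from transformers import pipeline

    # A vision model summarizes the image as text...
    captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
    caption = captioner("photo.jpg")[0]["generated_text"]

    # ...and the text becomes context for a language model downstream.
    generator = pipeline("text-generation", model="gpt2")
    prompt = f"Scene description: {caption}\nWhat happens next?"
    print(generator(prompt, max_new_tokens=40)[0]["generated_text"])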


Fine-tuning models is where the future will be. It doesn't necessarily mean LLMs, but I suspect that LLMs will shortly become multimodal, and at that point they will be the ultimate models to fine-tune for a given task.


Thanks for the comment.

>Fine tuning models is where the future will be

Who would request fine tuning? Businesses?

Are you suggesting to build some sort of data set broker?


Businesses that have their own data fine-tune trained models (this is already a thing). So you need model repositories (like Hugging Face), not dataset brokers.
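
The workflow is pretty standard at this point; a rough sketch with the Hugging Face stack (the dataset and base model here are stand-ins for your own data and whatever checkpoint you pick):

    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    # Stand-in for "your own data": any labeled text dataset in this format.
    ds = load_dataset("imdb", split="train[:2000]")
    ds = ds.map(lambda b: tok(b["text"], truncation=True, padding="max_length",
                              max_length=256), batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=16),
        train_dataset=ds,
    )
    trainer.train()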


The LLM component is almost correct, but not exactly. The whole point is the transformer architecture, which really is just graph neural networks. Once you start seeing things through these lenses, the possibilities are endless and it starts making sense.


> LLM is the major trend. Focus entirely on that and the tools landscape and how to integrate it and apply it.

Rather than jumping on the same horse everyone else is jumping on, maybe one should start looking at what language models fail at -- and, by the very nature of how they are conceived and what they are made of, will most likely never be good at.


Most failures of these models are usually only 1-2 research cycles away from being fixed; the amount of brain power dedicated to these problems is astounding.


There is a huge opportunity on the backend for ML Engineers, running these models in production is expensive.


Yes, this. Be a subject-matter expert who knows how to take the technology and apply it to some problem in a novel way. Don't be the method maker, because you will always be behind. How do I know you will always be behind? If you weren't behind, then you wouldn't ask questions like OP's.


Nearly everyone in a field starts out behind.

MS and PhD students start out behind and they’ll spend most of their time during the next 2-5 years on irrelevant things (e.g. 95%+ of their graduate coursework will be behind the state of the art; most PhDs will focus on a niche project that fails to have major impact or relevance by the time they graduate).

It sounds like OP just wants to learn out of general interest, which is fine. But others shouldn’t be discouraged. A sufficiently-dedicated person with talent and a strong classic ML foundation can catch up reasonably fast, to the point of getting their foot in the door professionally.


Exactly this. Having an MS/PhD does not guarantee that you are at the forefront of a field. When I started my MS program, I was behind, and when I finished, I was behind.


What would you say about someone new to all AI starting by working through the Neural Networks from Scratch book? https://nnfs.io/


Any resources you can suggest to get this:

"knowledge of the emerging landscape combined with a broad understanding of different tools and how they are applied."


This is a good reply, but the OP might be generally interested in the fundamentals of ML


Similar situation as you. I stopped keeping up around 2015ish.

I just read Francois Chollet's Deep Learning with Python and found it to be a fantastic high level overview of all the recent progress. There's some code, but not a lot. I mostly just appreciated it as a very straightforward plain-language treatment of RNNs, CNNs, and transformers.

Now I'm going through Stanford's CS224 lectures.

I'm sort of planning to read papers, but as some other comments have pointed out, I'm less sure of the ROI on that since I'm not sure how feasible a future in AI is for me.


Following. I got off the train about two years ago to work more in engineering. The way I see it, if you're not a research scientist, this field is best addressed as an ML engineer, as there are more challenges in systems. Would love to be proved wrong.


Hey,

I am a research engineer/applied AI person building vision models in the healthcare domain. I am currently preparing to transition to engineering roles like you did. For that, I am currently going through web dev - both frontend and backend. Would love to get some pointers from you on my approach and any recommendations from your side. Thanks!


The best advice I could give:

Build out your portfolio so you have projects that you can speak to and show your tech depth.

FWIW, I do AI backend at FAANG and mentor as tech lead.


Possibly too basic for you, but I was hugely impressed with https://www.nlpdemystified.org/course (found on HN a week or two back). Each chapter has a large jupyter notebook with lots of annotated sample code.

Also there's no cost, no book to buy, no email signup, it's just a guy sharing knowledge like the old days. Great course.


> I am comfortable with autograd/computation graphs, PyTorch, "classic" neural nets and ones used for vision-type applications, as well as the basics of Transformer networks (I've trained a few smaller ones myself) and RNNs.

It sounds like you're into it already. And you already know which new papers are interesting to you.


Maybe it's just me, but it sounds like you're only interested in getting back into AI because it is currently in the limelight. "Working in AI" a few years ago doesn't mean much if you have nothing to show for it. What I'm getting at is that your motivations don't seem genuine; there's nothing that tells me you care about the technology advancing more than about a paycheck.

As someone with 10 years of professional experience in software, I find every AI "trend" that has come up in that time to be incredibly odd. It is certainly remarkable what ChatGPT, Stable Diffusion, and other examples are doing today... Ultimately people are giving waaaaaaaay too much credit without understanding the technical details. These are pigeonholed examples that still aren't solving any real problems.

AI is still just statistics with marketing.


On the side, I make cheap Wordpress sites for small businesses (mostly as a favor to folks I know personally) and always had a hard time writing good marketing copy.

I’ve found that what ChatGPT comes up with is often a great start and has already saved me hours of time. Is it as good as paying a professional? Probably not. But I think it’s fair to say these models are already solving real world problems, even if they still need a bit of a helping hand. Just my thoughts, I’m not an expert on the models themselves.


Also when programming, I run into small things four or five times a day, that I used to Google for, maybe search Stack Overflow. But there is a lot of blog spam to wade through and finding the right search terms is hard.

Now I just ask ChatGPT.


> I think it’s fair to say these models are already solving real world problems

> mostly as a favor to folks I know personally

Scale is where the rubber meets the road. I'm glad ChatGPT has been helpful to you, but it is still a gimmick for the majority.


For sure. I definitely can't speak to how well it would work at scale.


Huh, WTF? What's wrong with getting interested in something because there is new progress in it? And how can you assume he is only interested in it because of the money, when he only asked about how to learn and play around with code, and didn't ask anything about finding a job or projects to work on?


It's disingenuous. There's a difference between getting into a topic because it interests someone, and getting into a topic because it's incredibly hyped right now. Do you really believe it's the former...?

Considering OP mentioned a lot of recent hype trains relating to AI, it's perfectly reasonable to assume they are chasing dollars, rather than contributing to the underlying technology without further expectations.

So, OP comes off as very disingenuous when they lead with the hype and provide nothing of substance.


> Maybe it's just me

Maybe it is and he's simply somebody with an interest for the topic and some progress to catch up with.


Augmenting writers is one (ab)use of GPT. And getting bespoke stack overflow copy-paste source-code (for better or worse) too.


> AI is still just statistics with marketing.

I would say AI is the opposite of statistics, for good or for bad.


To be honest, for transformers just go to huggingface.co and see what interests you. They have tons of examples to run and they also link to all the papers in the documentation. It doesn't get much easier to get into it. Even for the more recent stuff like vision transformers and diffusion models.
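
For example, the diffusers quick start is only a handful of lines (the exact checkpoint name may change over time, and this one assumes you have a CUDA GPU):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
    image = pipe("an astronaut riding a horse on the moon").images[0]
    image.save("astronaut.png")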


The “hard” part is taking an example and adapting it to new tasks. This is the best way to learn imo


How are your software engineering skills? That's the biggest gap I currently see at my employer. Way too many data scientists are unable to make an impact because they can't put their notebooks into a product and run it in production.


I partly agree with this but I think it's a bit overrated. There's no massive barrier to getting an actually good model into production. If the model seems promising but no one can figure out how to productionize it, it probably has fatal flaws as a model. There's no guarantee that a mess of Python code that somehow produces a nice AUC curve is actually doing anything valuable. As in science in general, there are many ways to fool yourself in data science.


I am not saying it's a hard skill or anything. But asking your manager every time to go find someone to put your notebooks into production is not a good strategy for succeeding at work. If you can get your notebook pretty far toward production yourself and have software engineering skills, it goes a long way.


I completely agree, I think we're saying the same thing. I guess I'm just emphasizing that it's hard to believe that such a person is even a good data scientist. Throwing everything over the wall is denying yourself a lot of valuable lessons about data science.


>I guess I'm just emphasizing that it's hard to believe that such a person is even a good data scientist.

Odd take, to be honest. It's like suggesting you cannot believe someone is a good materials scientist because their CAD/mechanical design skills are not up to par with a mechanical engineer's.


Not sure how CAD skills are relevant for a materials scientist, but the ability to put models into production is very relevant to data science.

If you always throw over the wall, you probably don't understand how results that look good in a Jupyter notebook can easily fall apart in the real world.


I see this too. So many well-intended projects were scrapped because management didn't see a business case from Jupyter notebook results...


What is involved in doing this and why isn’t there a service that exists to make it easy? Is it all very bespoke?


This should jog your memory and catch you up with the latest and greatest.

https://course.fast.ai


This is the course I used last time I wanted to get up to date, I would recommend.


I'm curious what you discover, as I did some AI decades ago and now have a new AI problem. I'm trying to research how to build a generic self-learning board game agent for my platform (https://www.adama-platform.com/), as I've reduced the game flow to a decision problem. How I intend to start is to experiment with simple stuff at a low level, and then use that experience to figure out what to buy.


I signed up for Jeremy Howard’s second AI course at the University of Queensland. The course lectures were streamed live to participants around the world in October and November. An online forum organizes everything.

When this course becomes public next year, I think it will be a great way to get caught up. In the meantime, you might still be able to pay the AU $500 fee and watch the course content, which was all recorded, if you are anxious to get going.


Personally I think that contributing to an open source community is the move. Join the Eleuther Discord. Futz around on Hugging Face. Play with notebooks on Uberduck. Have fun!!! Gatekeeping is dumb.


Eleuther has many members who are very difficult to work with.

Also, if they ever make another dataset like The Pile, don't expect to get your own data into it (even if they say they will add it).

I'd work with other groups if I were you.


The latest developments in AI are very cool, but a big lure in my opinion.

You have two options:

1. Work full-time for companies doing this state-of-the-art stuff (OpenAI, Meta, etc.)

2. Work full-time for a (good) AI company that is doing interesting AI work, but most likely not based on GPT/SD/etc.

In both cases, you will learn a lot. Anything else seems like a costly and dangerous distraction to me.


Read all the leading papers, many times, to get a deep understanding. The writing quality is usually pretty low, but the information density can be very high; you'll probably miss the important details the first time.

Most medium and low-quality papers are full of errors and noise, but you can still learn from them.

Get your hands dirty with real code.

I would take a look at those:

https://github.com/geohot/tinygrad

https://github.com/ggerganov/whisper.cpp


Disagree. The information density of 99% of ML research papers is not high. It's usually just one idea, often just an evolutionary modification of something that already exists, or something that worked in a different ML area applied here, along with an intro, maybe some supporting math, and some experiments. The information density might seem high if you're not familiar with the previous papers that are referenced. I also don't agree that medium-quality papers (how do you decide what's medium quality anyway? does it include main conferences?) are full of errors.


This comes from my experience with Computer Vision papers about ten years ago.

I've done some work in the field, and I've seen errors in almost all papers.

Sometimes they are not really errors, but they give a solution that only works in the very specific context they tested it and completely falls apart anywhere else.

I've lost countless hours and days studying and replicating papers. Of course, ML could be an entirely different experience.


I do expect that it was different back then. The replication situation has improved somewhat lately but it's still not great.


If you're something of a snake-oil salesman with some, but not deep, technical knowledge, then AI ethics is for you. There are (or were) companies out there who would pay big bucks for folks to tell them some models could potentially be biased or otherwise discriminatory because of poorly selected data. Wash, rinse, repeat and watch your paychecks roll in.


Related question, what's the current state of paraphrasing text with off-the-shelf Python libs — PyTorch/Transformers?


Very easy to do.

Just use Hugging Face.


Please give an actual answer next time.

The OP wants specifically sentence-transformers, which is hosted on Hugging Face but is a separate library.

You'll want to go with whatever models that the sentence transformers documentation recommends.


Yeah, the last time I played around with it I used something like FB/Bart-CNN, but I was under the impression that things have evolved/changed a lot since then.


> I am comfortable with autograd/computation graphs, PyTorch, "classic" neural nets and ones used for vision-type applications, as well as the basics of Transformer networks (I've trained a few smaller ones myself) and RNNs.

Maybe I am a bit off track, but how does someone reach this state?


You can try out fast.ai's courses.

Also, there is a YouTube series by Andrej Karpathy that goes into much more detail and builds a library from scratch.


Similarly, my linear regression and decision tree ML skills are feeling really outdated :)

But understanding AI fundamentals gives me a fresh perspective on how to build applications that leverage ChatGPT (for example).

Crafting the inputs to achieve desired outputs. Training the models with a corpus of data relevant to a niche industry, etc...


What's your goal?

For example, do you want to develop models as a hobby? Make models or software for a living? Use AI in some particular problem domain?


I meant more as a hobby, though I would love to practically apply some of these models in a little side project.


Everyone is different, but I'd have trouble making time for a project like this unless it served some goal I truly cared about.

If you're the same, is there something more specific to help you focus and put in the time?



In the same spirit, what are the three learning algorithms to implement for learning? My to-do list has:

1) Refresher: RNNs, deep net classifier

2) LSTMs

3) Self Attention anything

Any other suggestions for getting back in the loop?


Do you guys expect DL to have longevity generally speaking, or is the usurper already on the horizon?

There are a few non-equivalent universal approximation approaches. I'm not sure I fully understand why this one will end up being the winner, even on a 10-year horizon.


By deep learning, do you mean neural networks with more than one hidden layer? I've seen some results from massive random forests that are on par, but I don't know of anything that is set to replace the NN architecture, which at this point has been around in one form or another since 1957.


For diffusion models, I recommend this blog post: https://eugeneyan.com/writing/text-to-image/


I recommend this video from Andrew Ng: https://www.youtube.com/watch?v=avoijDORAlc


Find the paper that got you into that (or anything else) then see who's cited it since and catch up that way.


Start with survey papers. Starting with GPT is jumping in the middle. It's better to know the big picture first.


+1 to this.

This is the way to start PhD research (assuming OP is trying to stay up to date in this area by "getting back").


AI is so much more than the eye-catching generative art experiments.


don't



