One of the best math teachers on YouTube has to be Professor Leonard. He has a lot of different courses and doesn't shy away from details. His way of teaching really worked for me.
If you follow along you will get a solid foundation accompanied by a lot of "aha" moments.
Mathpix PDF search is fully visually powered and does not use underlying PDF metadata, even working on handwriting. It's a great choice for researchers (especially in STEM) who want to build a searchable archive of PDFs.
Amazon Textract does a phenomenal job of extracting text from dodgy scanned PDFs - I've been running it against scanned typewritten text and even handwritten journal text from the 1880s with great results.
Lucien Hardy wrote a paper[1] showing how one "naturally" ends up with a quantum theory by demanding a few reasonable axioms. The paper also goes into how this implies complex numbers (and rules out quaternions).
Scott Aaronson has a more accessible (and humorous) article on it here[2].
Entanglement has been shown to be intimately linked to this[3] result, which is interesting given the experimental evidence[4] for entanglement.
Not my field, but I found this interesting at least.
(A) Frame the full picture in your mind before you start to write [1, 2]. A short story: Someone once wrote a long letter to a friend and finished it with something like "Sorry, I did not have the time to write a shorter one!" [3] I.e., it can take longer to write shorter! The extra time goes into conceptualizing and framing what you want to say, not into the actual writing itself. Use mind maps for this if that would help.
(B) Be conscious of word choice. Check even the words you think you already understand in a dictionary. You will soon develop a refined judgment on when to check.
(C) Introduce time breaks before reading your creation for self-reviews. Writing involves intense thinking, which is subject to confirmation bias. Reading your own writing a few days later will give you a flavor of what it feels like to a reader.
(D) Close the feedback loop with the actual readers, whether live or offline. Watch for the discussions that happen on the subject and see if those discussions could have been avoided by writing differently, thereby driving consensus faster.
>> which of my edited drafts should I choose as final version.
Why not have only one draft, a single working copy of the doc? :-)
There are application features and behaviors that all users need, that a majority need, that a minority need, or that only some need. Users also have personal preferences and individual information that the software needs to know.
We tend to handle some of the above using code, some using configurations.
Guidance #1
Configuration is not a solution across the board. Configurations are often a premature optimization towards saving future efforts.
Code _is_ configuration for the processor. Coding has development methodologies and tools designed by the industry over decades, most of which are not applicable to configurations. So don't make code run-time configurable. Change the code as and when needed. Refactor.
Exceptions:
- For personal preferences and individual information.
- If runtime behavior of the code must be changeable without rebuilding the code.
Guidance #2
If the software behavior can be changed with fewer lines of configuration than lines of code, develop better abstractions in the code. (Do not invent DSLs; create better abstractions in the code itself.)
Keep code configurable via hard-coded configurations at an appropriate place somewhere within the code. This encourages modularity. However, limit this flexibility to at most a 30% development-effort overhead above and beyond the currently known requirements. If the development-effort overhead for the flexibility is much more than that, the flexibility is a premature optimization (keeping in mind that you already have flexibility via the ability to change the code). If you are not thinking above and beyond the currently best-known requirements, you may find the requirements changing faster than you can keep pace with.
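As a sketch of what I mean by hard-coded configuration inside the code (Python here; the module and constant names are made up for illustration), the behavioral constants live in one place and changing them is a normal code change that goes through review, tests and a rebuild rather than a runtime knob:

    # config.py (illustrative): behavioral constants hard-coded in one place.
    RETRY_LIMIT = 3               # retries before giving up
    REQUEST_TIMEOUT_SECONDS = 10  # per-request timeout

    def fetch_with_retries(fetch):
        """Call fetch() up to RETRY_LIMIT times before re-raising the last error."""
        last_error = None
        for _ in range(RETRY_LIMIT):
            try:
                return fetch()
            except Exception as error:
                last_error = error
        raise last_error

    if __name__ == "__main__":
        # Tiny usage example with a fake fetch that always fails.
        def flaky_fetch():
            raise RuntimeError("boom")
        try:
            fetch_with_retries(flaky_fetch)
        except RuntimeError as e:
            print("gave up after", RETRY_LIMIT, "attempts:", e)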
Guidance #3
Instead of configurations, find a more specific alternative.
- Machine Learning models are technically code configurations, though we do not see them that way. ML comes with the tooling needed to manage them.
- Knowledge graphs.
- Data exchange file formats.
- Etc.
Guidance #4
Configurations are not for SDEs.
Identify the owner who will be responsible for changing the configurations and see them as customers in the current phase of development. Think about what help and tools you are providing them to manage the configurations.
Guidance #5
Do not let the space of configurations multiply. Configuration parameters must be modular (i.e., independent), just like code.
Just as functions with more than three parameters should be avoided, the same applies to configurations impacting the behavior of a function. Avoid more than three of them, taken together, for any function.
Guidance #6
A configuration is also a contract. Pay no less attention to it than to function interface or API design.
Configurations need to be documented just as thoroughly. Think of them as command-line arguments. If the user needs to know the implementation internals to understand the command-line arguments, default against having them.
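To make the "think of them as command-line arguments" point concrete, here is a sketch in Python using argparse (the parameter names are invented for illustration); each knob carries its documentation, type and default, in user terms:

    import argparse

    # A configuration surface expressed as documented command-line arguments.
    parser = argparse.ArgumentParser(description="Report generator (illustrative)")
    parser.add_argument("--output-format", choices=["pdf", "html"], default="pdf",
                        help="file format of the generated report")
    parser.add_argument("--max-rows", type=int, default=1000,
                        help="truncate tables longer than this many rows")
    parser.add_argument("--verbose", action="store_true",
                        help="print progress while generating the report")

    args = parser.parse_args([])  # parse defaults here; real use would omit the list
    print(args.output_format, args.max_rows, args.verbose)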
Guidance #7
Backward compatibility and blast-radius reduction are not valid arguments for having distributed configurations. Depend on automated and manual testing instead. Reason correctly about how many of the users would actually need the variation in software behavior.
If a code change makes sense, try to apply it everywhere. (I presume that when Microsoft went to fixed and mandatory Windows update cycles, it helped them a lot.)
Guidance #8
If different Product Managers serving different regions or types of users ask for different requirements and you as developers see no valid reason for it, make them talk to each other and document their collective reasoning before getting back to you.
I have held many such sessions for kids. Instead of showing them the latest and the greatest:
- I point them to the technology already around them, in their daily use, that they now see as too obvious, and then share stories of how all of it came to be. Simple things like soap, door handles, stairs, pencils, clocks, ...
- Ask them simple questions that they never asked. How does an eraser erase pencil marks? How is mass conserved as a tree grows out of a seed? Why do women typically keep long hair while men keep it short? Why don't animals do their own photosynthesis instead of depending on plants (or why don't plants also move around like animals)?
- Another session I am planning will share bios of many famous people, showing them how the extraordinary came out of the ordinary.
It seems surprising to me that we teach them about planets and exotic natural phenomena like chemical reactions, magnets, etc., without first talking about much more relevant things like why matter occupies space (or why we don't just fall through the floor below us). The result is kids (and adults) who commonly talk about voltage without having the slightest idea of what it actually is.
I wish I had recorded the session I gave. It was several times more interesting and engaging than what the slides read on their own.
I haven't read this book, but I have read a blog post by Pinker along the same lines. I also just skimmed through the Wikipedia link you mentioned.
The book would not be a recommendation from an NLP-algorithms perspective. I am sure, though, that it would be a good book, as Pinker is a mind-opening writer.
I'll nevertheless give you some deep food for thought relating to languages:
Check out slide 14 in my talk, especially the remark on its right side. It is a powerful thought.
There is an equilibrium process involved in the shaping of a language over time, though language must by definition have at least some standardization for it to work, which means it resists change.
What Pinker is saying at a high level is that the official rules of grammar sometimes deviate from that equilibrium point. For example, technical jargon and acronyms are often easier for the speaker than for the listener. Poor handwriting is likewise less tiring for the writer but harder for the reader.
There would also be deviations which make it harder for both the speaker and the listener.
Yet, in both of the cases above, the language would resist change.
When writing came into being (keep in mind that we have been talking for a few million years, and writing for only a few thousand, so there is no comparison!), written material began to stand for much longer than vibrations in air molecules, or our memories, do. That further slows down language change, as standardization now sits across time too.
We now stand at a point where language is standardizing across the world and getting frozen on the Internet, so there is a further reduction in velocity. (Albeit newer concepts are being added to languages increasingly fast...)
How do you, in such cases, make the above deviations from the optimum go away?
There are some rules of language, grammar, that shouldn't be. They are like legacy code.
Pinker is educating us about such rules. He's trying to bring style back toward that optimum.
Let's take a simple example. Did you know the comma classically sits inside the quotes? Here's an example I picked from [1]:
"Good morning, Frank," said Hal.
Note that the comma after Frank is inside the quotes! Why should it be that way? No wonder putting the comma outside is gaining acceptance. :-)
There are even weird rules for what happens when multiple paragraphs are to be included inside a single quote. (Hint: the number of opening and closing quotes is not equal in some English dialects under this scenario. Weird, hmmm.)
These rules should just go away. It's better to choose the optimum for the language and make those cultural, stylistic shifts happen in our language.
Take care. Good discussion. Feel free to reach out again.
And please feel free to refer others to my slides page as you see fit. :-)
First off, put it into your head that people skills are critical, for both professional and personal lives. Make it a goal for yourself to develop them.
Below are a few things that I had found helpful:
1. Book "Human Relationships" by Steve Duck [1]. The author of the book says that his students were suffering from the same people/relationship issues as everyone else in spite of the relevant education in psychology. So he reasoned something is all wrong about the way social psychology is taught, and wrote this book for helping people as oppose to teaching them. One impact on me was learning that the percentage of people feeling shy about initiating a conversation at some point in their lives was nearing half of them. In other words, the person in the front of you could also be just waiting to talk to you. I had read the first edition of the book which had very natural tone to it. The fourth edition [1] seems much refined for rigor, which seems impacting the basic premise of the book! So consider buying an older edition.
2. The challenge for me wasn't just difficulty in talking, but a limitation of interest and knowledge outside of the STEM fields. This then becomes a vicious cycle since you would not talk to people and not even learn about topics outside of work. Build some common interests outside of work, may be just by reading some books in isolation. Read a lot of news, as a lot of conversations build on it.
3. Early on, I used to be the silent one in many conversations because of #2 above. I started participating in the conversations simply by asking questions on what I did not understand. Asking too many questions annoys people, so need to be balanced. Read about the discussed topics offline afterwards as needed. Over the time, you get to understand those conversations, will start participating, and also, those people would start accommodating you while calibrating themselves for you with the skill level you have.
4. The people around would accommodate you, as far as they do not see it as your lacking interest in them. It's better to be seen as a person lacking people skills rather than as one lacking interest in them. Try not to miss lunches and dinners opportunities at work, even if you are not talking much there.
5. One-to-one conversations are easier. Break the ice with those. Soon you would be comfortable in a group setting where you are comfortable with say half of the people.
6. Join social media and make connections with all those people. Being behind a keyboard instead of face-to-face helps because you get more time to think how to respond. Do genuinely participate, click Likes, etc. This will not only develop connections with those people, but also slowly make you better for live verbal conversations.
Modern Higher Algebra by A. Adrian Albert (1937, Dover/Cambridge). It covers both abstract algebra and linear algebra.
Most modern textbooks approach linear algebra from a geometric perspective. Albert's text is one of the few that introduce the subject in a purely algebraic way. With a solid algebraic foundation, the author was able to produce elegant proofs and results that you don't often see in modern texts.
E.g., Albert's proof of the Cayley-Hamilton theorem is essentially a one-liner. Some modern textbooks (such as Jim Hefferon's Linear Algebra) try to reproduce the same proof, but without setting up the proper algebraic framework their proofs become much longer and much harder to understand. Readers of these modern textbooks may not realize that the theorem is simply a direct consequence of the Factor Theorem for polynomials over non-commutative rings.
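For the curious, here is my own compressed reconstruction of the kind of argument being referred to (a sketch, not Albert's exact wording):

    Let $p(\lambda) = \det(\lambda I - A)$ and let $\operatorname{adj}(\lambda I - A)$
    denote the adjugate, a polynomial in $\lambda$ with matrix coefficients. The
    adjugate identity gives the factorization
    \[
      p(\lambda)\, I = \operatorname{adj}(\lambda I - A)\,(\lambda I - A),
    \]
    so $(\lambda I - A)$ is a right factor of $p(\lambda)\, I$ in the polynomial ring
    over the (non-commutative) matrix ring. By the factor/remainder theorem for
    right division, right-evaluating both sides at $\lambda = A$ gives $p(A) = 0$.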
At only about 300 pages, the book's coverage is amazingly wide. When I first read the table of contents, I was surprised to see that it covers not only undergraduate topics such as groups, rings, fields and Galois theory, but also advanced topics such as p-adic numbers. I haven't read the abstract algebra part in detail. However, if you want to re-learn linear algebra, this book may be an excellent choice.
>> piece of software that mostly does what the customers want even if it is low-quality
The state of software is currently much worse, in my opinion.
Quality is not easily quantified, while price is. Metrics at the customer end are hard to collect (that requires software development too, raising costs), and with the current state of the art it also requires customer support staff, which is costlier still. As a result, quality does not even get quantified properly, and a natural consequence is quality dropping below what customers would desire.
This isn't much different from where the quality of MP3 players, laptops and smartphones was once headed. Perhaps quality then was being measured just by the percentage of customer returns, not by customer satisfaction. Steve Jobs changed the game: Apple's products would just "feel right" to customers, and the iPod took over the market despite being much costlier. It then took a couple of years for the rest of the laptop/smartphone manufacturers to catch up.
1. Select the right audience for your message. (E.g., does it need to go to the Google founders, or is the true audience the open public, since the problem is bigger still?)
2. Educate your audience ahead of time, or at least as the opening part of the message:
2a. What the intentions of the message are. (E.g., help people understand the true root causes of diversity gaps, reiterate that there are invalid opinions commonly held, and boldly indicate that someone needs to stand up even in a culture where people do not openly share such analyses and/or opinions.)
2b. What the approach is, which would be to dive deep into the correct root causes of the issues (e.g., why working women have a lower average salary than working men). Explain that without an understanding of the true root causes, it will only take longer to reach the optimal balance, we may never reach it perfectly, and the current situation may already be the maximum achievable under the current set of beliefs.
2c. What the common pitfalls and misconceptions are (e.g., assuming all gender gaps come from discrimination, as the memo writer noted).
3. The core message:
3a. What are the knowns (existing research) that point to the root causes?
3b. What are the unknowns that still need researching?
3c. Summarize the proposed next steps.
4. If as a result of capturing all of the above, the message is becoming too long:
4a. Revisit if you are targeting the right audience.
4b. Split it into smaller logical blocks and deliver them in steps, so that people can actually follow along and get convinced along the way, rather than get inflamed.
5. There will unfortunately still be people who do not understand all this but who will resist nevertheless. A second round of explanations/FAQs will be needed to address them.
6. There will unfortunately still be people ... (Your approach above should be devised to help reach consensus.)
7. If at any point you realize that something was wrong in your own thinking, correct it, and revisit whether your broader message still stands and is worthwhile. If you come out wrong too often, that's good for you (you've learned something new), but be prepared to lose trust.
Consider the numbers between 0 and 1 written in binary (with a "decimal" point).
Here's how I am going to synthesize this set.
Step 1: Take the binary integers (no point, no fractional part) and consider them to be padded with an infinite number of zeros on the left.
...0000000
...0000001
...0000010
...0000011
...0000100
...0000101
...0000110
..........
Do we agree that this will contain all the non-negative binary integers?
In particular, is the following number in the above set: ...111111 (all ones, no zero-padding on the left)? If not, why not? (If not, this seems to be a matter of definition to me.) If yes, move on.
Step 2: Place a decimal point at the end of each line above and mirror the digits left to right.
0.0000000...
0.1000000...
0.0100000...
0.1100000...
0.0010000...
............
Do we agree that this contains all the binary numbers between zero and one, i.e., [0, 1)?
Let's now apply diagonalisation to this.
It says that the number 0.11111111... will not be present in the above set.
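For concreteness, here is a small sketch (Python, my own illustration) of the construction above for the first N rows: each row is the bit-reversal of an integer, and the flipped diagonal comes out as 0.111..., which differs from every listed row in at least one position.

    N = 16  # how many rows of the (infinite) list to generate

    # Step 1 + 2: row n is the binary expansion of the integer n, mirrored to
    # the right of the point and padded with zeros. Every row therefore has
    # only finitely many 1s.
    def row(n, digits=N):
        bits = bin(n)[2:][::-1]          # reversed binary digits of n
        return (bits + "0" * digits)[:digits]

    rows = [row(n) for n in range(N)]

    # Diagonalization: take digit i of row i and flip it.
    diagonal = "".join(r[i] for i, r in enumerate(rows))
    flipped = "".join("1" if d == "0" else "0" for d in diagonal)

    print("flipped diagonal: 0." + flipped)                  # 0.111...1
    print("present in the first N rows?", flipped in rows)   # False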
Perhaps someone can see my confusion and enlighten me. :-) Thanks!
Minds, Brains and Machines by Geoffrey Brown [1] for introducing me to the complexities of the mind-body problem. It did not show the answers of course, but helped me think right about it.
Siddhartha by Hermann Hesse [2] for helping me move from excessive questioning of everything (philosophy) toward science, which actually helps answer the questions that are answerable.
The Feynman Lectures on Physics [3] and Surely You're Joking, Mr. Feynman! [4], with no need to explain "how". :-)
The Ghost in the Atom [5] for explaining varied views on the nature of science, especially Quantum Mechanics, and what goes in the minds of the top-notch scientists working on these problems.
Parsing Techniques by Dick Grune [6] for teaching me fundamentals of computer science and helping me pursue my deep interest in Artificial Intelligence.
To maximize social development, we reward those who create value for society. If someone, through their own efforts and without stealing or fraud, generates a lot of wealth, they will invariably have added value to society by creating a win-win for the buyers, the employees and themselves.
When, however, someone born to a rich person inherits wealth, they are at an unfair advantage for something they did not really accomplish and for no value they added to society.
The world should aim to equalize opportunity for everyone (irrespective of who they are born to), and then reward them when they create more value than others from the same opportunity. Agreed, defining "equalized opportunity" is hard -- what all would it cover: health? education? etc. However, even with some challenges here, this may be better than the current system.
"Use it or lose it" would make the wealthy put money back into the system, and automatic adjustments would result via supply and demand balancing. I understand that it would also increase wastage, and this would need some thinking through. My gut reasoning, however, is: how much money can the super rich really use and waste? Ultimately, even items bought by them but unused may be reusable by others (who ideally should not be their heirs at 100%).
I have similar traits. People often tell me that I do not speak enough. However, whenever I do say something, people find it novel, valuable, insightful, and nearly always correct.
Once people get to know me well, they start working around this to hear my points of view, and that they keep coming back indicates that I am truly adding good value to them.
I am both a good listener and a reader, and in fact a good speaker and writer too.
However, the quality I produce comes at a time cost, as a result of which I am not real-time, nor am I perceived as speaking 'enough'.
The following are things that have generally helped:
* Do pre-work whenever feasible. This reduces the mental processing power needed in real-time.
{Now, once the meeting begins and I am not the initiator, I may only be listening and observing what is being missed in the discussion, and what the common confusions or hidden assumptions are.}
* Even if that's not feasible, consider yourself the owner of the meeting discussion. Assume that if good conclusions or decisions are not being reached, it is your own fault. This helps you participate in a meeting more effectively, even calling out to slow down the discussion if needed. At times, you may direct the current meeting toward understanding the problem better, and invite a follow-up meeting for the actual resolution.
* Fine-tune yourself between answering and asking questions. If I already have good thoughts, I may be illuminating others, clarifying, answering, and moderating the discussion. If not, I may be asking the questions on my mind, say to probe assumptions or just to fill my knowledge gaps.
* Making keyword-level notes during the meeting helps. You cannot spend the time writing details, as the meeting participation time and attention are then lost. Short notes, however, help bring the needed things back to mind quickly. Some people naturally do better at this, having good short-term and long-term memory.
* Use body language to indicate that you want space to speak. During the meeting, I may slowly stand up in a corner or walk up to a whiteboard, aiming to grab speaking time. Being in a standing position helps (it's a body-language thing).
* Another important but tricky piece is the time gap between people taking turns to speak. Wait too long, and someone else may start speaking. Wait too little, and you may be cutting the current speaker short. Raising a hand helps here if nothing else does. If someone else starts speaking but is only taking a wrong turn (e.g., an incorrect hidden assumption, or going off-topic), I may grab the speaker position by clarifying exactly that (e.g., "we just spoke about this, but I think there is a more fundamental thing we need to consider. {It goes here.}").
* Needless to say, practice and experience help.
* If there are no clear goals for a meeting, just general chit-chat among a large group of people, it gets harder; however, techniques similar to the above still help.
Overall, as long as you are maintaining good quality in what you say, don't consider being slow a loss. It hurts the most only for the first few months on a new team or group, while people are getting to know you better.
I just finished looking through the manuscript [https://deeplearningtheory.com/PDLT.pdf]. The mathematics is heavy for me, especially for a quick read, although one great thing I see is that the authors have reduced dependencies on external literature by inlining the various derivations and proofs instead of just providing references.
## The epilogue section (page 387 of the book, 395 in the PDF) gives a good overview, summarized below per my own understanding:
Networks with a very large number of parameters, much larger than the size of the training data, should as such overfit. The number of parameters is conventionally taken as a measure of model complexity. A very large network can perform well on the training data by just memorizing it and still perform poorly on unseen data. Yet these very large networks somehow perform well empirically at generalization, i.e., they do recognize good patterns in the training data.
The authors show that the model complexity (or, I would say, the ability to generalize well) of such large networks depends on the depth-to-width ratio:
* When the network is much wider than it is deep (the ratio approaches zero), the neurons in the network don't have as many "data-dependent couplings". My understanding is that while the large width gives the network power in terms of the number of parameters, it has less opportunity for a correspondingly large number of feature transformations. While the network can still fit the training data well [2, 3], it may not generalize well. In the authors' words, when the depth-to-width ratio is close to zero (page 394), "such networks are not really deep" (even if the depth is much more than two) "and they do not learn representations."
* At the opposite end, when the network is very deep (the ratio going toward one or larger), {I'm rephrasing the authors from my limited understanding} the network needs a non-Gaussian description of the model parameter space, which makes it "not tractable" and not practically useful for machine learning.
While it makes intuitive sense that the network's ability to find good patterns and representations depends on the depth-to-width ratio, the authors have supplied the mathematical underpinnings behind this, as briefly summarized above. My previous intuition was that having a larger number of layers allows more feature transformations, making learning easier for the network. The new understanding via the authors' work is that if, for the same number of layers, the width is increased, the network now has a harder job learning feature transformations commensurate with the now larger number of neurons.
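To make the quantity concrete, here is a toy calculation (mine, not from the book) showing two plain fully-connected networks with roughly the same parameter count but very different depth-to-width ratios:

    def mlp_param_count(depth, width, d_in=784, d_out=10):
        """Parameters of a plain fully-connected net: d_in -> width x depth -> d_out."""
        sizes = [d_in] + [width] * depth + [d_out]
        return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))  # weights + biases

    for depth, width in [(4, 512), (32, 180)]:
        params = mlp_param_count(depth, width)
        print(f"depth={depth:3d} width={width:4d} "
              f"depth/width={depth / width:.3f} params~{params/1e6:.2f}M")

Both come out at roughly 1.2M parameters, yet their depth-to-width ratios differ by a factor of about twenty.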
## My own commentary and understanding (some from before looking at the manuscript)
If the size of the network is very small, the network won't be able to fit the training data well. A larger network will generally have more 'representation' power, allowing it to capture more complex patterns.
The ability to fit the training data is, of course, different from the ability to generalize to unseen data. Merely adding more representation power can allow the network to overfit. As the network size starts exceeding the size of the training data, it can have a tendency to just memorize the training data without generalizing, unless something is done to prevent that.
So as the size of the network is increased with the intention of giving it more representation power, we need something more, so that the network first learns the most common patterns (highest compression, but lossy) and then keeps learning progressively more intricate patterns (less compression, more accuracy).
My intuition so far was that achieving this was an aspect of the training algorithm and of cell-design innovations, and also of the depth-to-width ratio. The authors, however, show that it depends on the depth-to-width ratio in the specific way described above. It is still counter-intuitive to me that algorithmic innovation may not play a role in this, or perhaps I am misunderstanding the work.
So the 'representation power' of the network and its ability to fit the training data itself generally increase with the size of the network, whereas its ability to learn good representations and generalize depends on the depth-to-width ratio. Loosely speaking, then, to increase model accuracy on the training data itself, the model size may need to be increased while keeping the aspect ratio constant (at least as long as the training data is larger), whereas to improve generalization and find good representations for a given model size, the aspect ratio should be tuned.
Intuitively, I think that in a pathological case where the network is so large that merely its width (as opposed to width times depth) exceeds the size of the training data, then even if the depth-to-width ratio is chosen according to the authors' guidance (page 394 in the book), the model would still fail to learn well.
Finally, I wonder what the implications of the work are for networks with temporal or spatial weight-sharing, like convolutional networks, recurrent and recursive networks, attention, transformers, etc. For example, for recurrent neural networks, the effective depth depends on how long the input data sequence is, i.e., the depth-to-width ratio could vary simply because the input length varies. The learning from the authors' work should, I think, apply directly if each time step is treated as a training sample on its own, i.e., if backpropagation through time is not considered. However, I wonder if the authors' work still puts some constraints on how long the input sequences can be, as the non-Gaussian aspect may start coming into the picture.
As time permits, I will read the manuscript in more detail. I'm hopeful, however, that others will get there faster and help me understand it better. :-)
As of now, I personally feel that the development of new programming languages rests on incremental paradigms. I see scope for a lot more innovation. Here are some thoughts:
Have various computer science algorithms (graphs, trees, parsing, et al.) tightly integrated into the language. (I know the popular opinion is to have these in libraries; however, I think having built-in data structures and algorithms would lead to further innovation.) The tooling should be aware of the run-time complexities of various algorithms and steps and thereby be able to reason about them. (E.g., "Warning: You have declared 'xyz' to be a graph; however, a tree is sufficient here.") Tooling should be able to say that while the programmer thought the implementation was O(N^3), it is actually exponential.
I wonder if programming languages can start making native use of machine learning advancements.
Copilot et al. already have some internal representation as embeddings, so instead of just generating code as text, there should be value in using those representations as part of the build tool-chain too.
Better representation of 'is-a' relationships between various types. E.g., I should be able to pass a tree to a graph algorithm without writing any extra code or incurring run-time performance overheads.
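As a sketch of the kind of thing I mean, today one can approximate this in Python with structural typing: write the graph algorithm against a minimal "neighbors" interface and let a tree satisfy it without adapter code (all names below are made up for illustration):

    from collections import deque

    def reachable(start, neighbors):
        """Generic BFS: works on anything that can enumerate neighbors of a node."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors(node):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    # A tree represented as parent -> children; it "is a" graph for BFS purposes
    # without any conversion code.
    tree = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
    print(reachable("root", lambda n: tree.get(n, [])))  # {'root', 'a', 'b', 'c'} (order may vary)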
Get rid of overflow and precision constraints fundamentally from the language (and, over time, the underlying hardware). I would like the tooling to automatically reason about which type is the best fit during compilation -- byte, unsigned 32-bit int, big-int, etc. -- via analysis of the program, and tell me what it chose and why. It may choose to typecast automatically for specific calculations when intermediary calculations need larger bit widths.
Likewise, I would like innovation on how null references can be dealt with fundamentally, as opposed to how each new programming language tries to work around them in various ways.
The fact that we use monospace fonts for programming seems to me to be a weakness of programming languages. We often need some alignment between adjacent lines of code to see the semantics of the code well. I would like to see a programming language which works well with variable-width fonts without causing readability issues. Why, you may ask? I think this should be explored because it would drive innovation in how to make semantics come out independently of font width and text alignment.
Having an 'alias' in many places would be super helpful. E.g., a base class may have a generically named method, whereas a derived class (more specific) may use the very same method implementation while choosing a better name for it. This is only syntactic sugar, as it is readily doable now by adding a new method that calls the base method as-is. It may be handy, for example, when the is-a relationship between a tree and a graph in the example above is handled.
Language-native integration of units and dimensional analysis.
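A rough sketch of what language-native units could save us from writing by hand (Python; all names are my own illustration, and libraries such as pint already do this more completely):

    from dataclasses import dataclass

    @dataclass
    class Quantity:
        value: float
        unit: dict  # e.g., {"m": 1, "s": -1} means metres per second

        def __mul__(self, other):
            unit = {k: self.unit.get(k, 0) + other.unit.get(k, 0)
                    for k in {*self.unit, *other.unit}}
            return Quantity(self.value * other.value,
                            {k: v for k, v in unit.items() if v != 0})

        def __add__(self, other):
            if self.unit != other.unit:
                raise TypeError(f"cannot add {self.unit} and {other.unit}")
            return Quantity(self.value + other.value, self.unit)

    distance = Quantity(3.0, {"m": 1})
    time = Quantity(2.0, {"s": 1})
    print(distance + distance)   # fine: same dimensions
    print(distance * time)       # dimensions combine: value 6.0, unit m^1 s^1
    # distance + time            # would raise TypeError: the dimensional error is caught early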
Finally, I am a big fan of symbolic computation abilities like those seen in the Wolfram Language. The tooling can reason about the expressions/code and figure out an optimal strategy, whether at compile time or run time.
Your argument boils down to saying: this cannot happen because I have belief XYZ, which I hold because it has never happened in the past.
It should be understood that the singularity has not yet been achieved. That is not in itself a reason to believe that it cannot be!
More below.
>> Airplanes and calculators exist within the realm of what it is possible for a human to design.
Any actual reason to believe that (a) AGI that exceeds human performance, (b) weak AI, (c) strong AI, (d) AI that can improve itself, (e) AI that can autonomously pick up new domains, (f) AI that has a sense of humor, ... are not within "what is possible for humans to design"?
>> Why is it that we have not designed a calculator that can invent new forms of numbers or a new paradigm of maths.
As far as I know, Doug Lenat developed a program called Eurisko which came up with a notion that it was not taught. As I recall, the program started looking into the opposite of prime numbers, i.e., numbers which are highly composite. Doug later discovered the concept pre-existed.
I have myself also coded a program which was domain-specific; however, it provably made inventions that were patentable, and which I could not have invented. I know because one of them was recently invented by a colleague and was even filed for patenting.
A desk calculator is the wrong example here; that machine is too limited. However, there may not be a valid, scientific reason to believe that intelligence, judgment, creativity and wisdom are more than mere calculations. (I won't dwell on consciousness, etc., though I could.)
>> something that exist or is created within the bounds of human imagination cannot (imo) escape the limitations of that same imagination.
I have already given examples that are small but which counter that.
AlphaZero was limited to playing chess, agreed, but its imagination did extend well beyond that of its creators, or even of the human chess players it played against. In fact, very recently there was research published trying to understand how it, or a more recent chess program, loosely speaking, 'thinks'.
AlphaZero was limited to playing chess, because we trained it using specific cost functions for chess.
If there's enough training data, etc., then in theory we could define cost functions similar to those of a species or a biological ecosystem, i.e., survival instincts, procreation, whatever. Such a system may develop its own internal models and behaviors, and possibly even notions of various emotions.
Just think it through. If you can come up with an actual answer as to why it cannot be done, then both you and I, and at least I :-), will be able to invent a way to break that answer! :-)
It's better to ask why it cannot be done; that will take us there. (The debate about whether we want to or not is a separate track.)
Summary: Qualcomm wanted me to devise a computer vision solution that was nearly three orders of magnitude more power-efficient than what they had at the time. There was a clear justification for why such a drastic improvement was needed. Nobody had a solution in spite of trying for a long time. Most laughed it off as impossible. I started by looking for a proof of why it could not be done, if it indeed could not be done. After some three months of pulling my hair out, I started getting glimpses of how to do it. Some three months later, I could convince myself and a few others that it was doable. Some three months later, the local team was fully convinced. Some three months later, upper management was convinced that the research phase was over.
Details:
I was told to reduce the power consumption of an existing computer vision solution by nearly three orders of magnitude. This meant end-to-end power consumption, including the camera as well as the computer vision computations.
No one in the industry knew how to do this. Many industry experts laughed at the goal itself.
To make it worse, I had no prior expertise in computer vision, nor in camera design, though I knew image processing. I was assigned to the project solely because of my brand name within the company. Being ignorant perhaps helped, since otherwise I may not even have accepted the assignment.
I was told this was to be a five-year-style research program, given the aggressive goal.
I was told that the company was ready to change whatever was needed to make this happen, be it the camera design (including the pixel circuit), processor architecture, computer vision algorithms, even optics and packaging.
With no starting point in hand (and in fact a false start based on half-baked ideas from one university), I had headaches daily for more than a month. Most of this time went into learning and deep-diving into the prior art.
The goal, I assumed, was to either solve the problem or show a theoretical basis for why it could not be solved. (In hindsight, to solve a challenging problem, try to prove why it cannot be solved; even that is usually hard, and you may prove yourself wrong in the process by solving it!)
There was no assigned team, given that there was no solution in mind. There were people with varied skill sets (optics, packaging, digital circuits) to help as needed.
Three months in, I had some ideas for how it could work. Six months in, there were three more engineers working under me on the project on a near full-time basis, and some were complaining that we didn't know what we were doing. Seven months in, I had Excel-level calculations to show that it could work, with an actually simpler solution than what I had in mind at the three-month point. Nine months in, with about seven people working, the team was convinced of the solution, as validated via rough circuit design and simulations. By the end of the year, senior management was convinced that we had solved the problem and the research phase was over. We had an estimated five years to solve the problem, and the solution achieved three times less power than the target!
The solution was a mix of several new inventions (one was given a gold rating by the company) and significant power savings obtained just by careful design. Unfortunately, I cannot talk much about the solution itself.
We started showing early previews of the capabilities (under NDA and without disclosing how we solved it) at multiple Tier 1 companies, and saw their jaws drop. At one company, someone whispered to their team members, "Are these guys kidding?!" At another, someone smiled silently for half an hour until we started showing the system emulator.
Soon, MIT Technology Review talked about the work [1].
I have since left the company, so I do not know much about the current state of the project, other than the additional news coverage it has received over time. Additional references are in my LinkedIn profile.
PS: I have solved many hard problems throughout my career; I have written above about the one that gave me the most headaches. Here is a list of some others [2].
>> What is the hardest technical problem YOU have run into?
I have solved about ten "hard" problems in my career, most of them in R&D. Each one of these had multiple prior failed attempts, and in some cases took me months of thinking before I could find a solution.
1. Qualcomm wanted me to devise a computer vision solution that was more than two orders of magnitude more power-efficient than what they had at the time. There was a clear justification for why such a drastic improvement was needed. Nobody had a solution in spite of trying for a long time. Most laughed it off as impossible. I started by looking for a proof of why it could not be done, if it indeed could not be done. After some three months of pulling my hair out, I started getting glimpses of how to do it. Some three months later, I could convince myself and a few others that it was doable. Some three months later, the local team was fully convinced. Some three months later, upper management was convinced. You can read the rest here: https://www.technologyreview.com/s/603964/qualcomm-wants-you...
2. I wanted to solve a specific machine learning and Artificial Intelligence challenge. I would code for a day or so, and then again run into days of thinking about how to proceed further. E.g., I coded a specific parser algorithm for context-free grammars, including conversion to Chomsky normal form, in 1.5 days including basic testing. But what next? I woke up with new ideas for about ten days in a row. I conceived of Neural Turing Machines back in 2013, about a year before Google came up with their paper on the subject. (Unsurprisingly, I did not have that name for it back in 2013.) I also did not get an actual opportunity to work on it, as a result of which I am still not sure whether I could have actually done it.
3. I needed to make a very sensitive capacitance measurement circuit, trying to get to atto-farad-scale floating capacitance even with pF-scale parasitic capacitance to ground. The noise and power requirements were very challenging. After about three months of seeking inputs from the team lead without hearing a solution, I ended up coming up with one myself. I later discovered that the technique was already known in RF circles, though only a few were aware of it. Capacitance measurement circuits with such sensitivity did not show up on the market for several years. (My effort was targeted at use inside a bigger system.)
4. I was working on measuring bistable MEMS devices. The static response of these was well understood. However, the dynamic response had so far only been measured by the team; there was no theoretical explanation behind it. We invited several professors working in the field to give seminars to us and asked them about this, but never heard a good answer. A physicist colleague found an IEEE paper giving the non-linear differential equations behind it, which worked, but provided no insight into the device behavior and took time to solve numerically. I wanted a good-enough analytical solution. I kept trying whenever I had the time and opportunity, while the physicist colleague kept telling me to give up. Six months later, I woke up with a solution in mind and rushed to the office at 7 am to discuss it with whoever was at work at that time. The optics guy I found did not fully understand it, but did not find it crazy either. A few hours later, the physicist friend confirmed my insight by running some more numerical solutions. I could then soon find tight enough upper and lower bounds, and the whole thing fit the measurements so well that most people thought it was just a "curve fit". (It was pure theory vs. measurements plotted together.)
5. I proposed making pixel-level optical measurements on mirasol displays using a high-resolution camera to watch those pixels after subjecting them to complex drive waveforms. Two interns were separately given the task (surprisingly without telling me), and both failed to develop algorithms for pixel-level measurements. Later a junior employee worked on it; he was still unable to get to pixel-level measurements, though he did get it to work at lower resolutions. That system took about 40 minutes of offline processing in Matlab. Later, a high-profile problem came up where pixel-level measurements were a must, and I was directly responsible for solving it. Solved in one day, processing images in real time, not 40 minutes. The system stayed in deployment for years to come.
6. We had bistable MEMS devices, and there was a desire to make tri-stable MEMS devices. Several people at the company attempted it, including a respected Principal Engineer, but no one could figure out how to even start. I could not figure it out at the outset either, but started bottom-up from the physics, using Wolfram Mathematica to create visualizations around the problem. And bingo: in a few days, I had figured out not only how to make these tri-stable MEMS devices, but also multiple schemes for driving them. My VP's reaction was "Alok, you should patent that diagram itself", given the clarity it had brought to the table.
7. We were creating grayscale/color images using half-toning. A famous algorithm, Floyd-Steinberg, works very well for still images but has lots of artifacts on video. A PhD student working in the field was brought in as an intern; nevertheless, the results were not great. The team also tried binary search algorithms to find the best outputs iteratively; however, that was not implementable in real time as needed. I was interested in the problem but was not getting the time to give it the fresh thought it needed, until one day I did. A few days later, the problem was solved. I developed some insights into it and just had the solution coded, to the surprise of people who had spent months working on it.
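For readers unfamiliar with it, here is the classic Floyd-Steinberg error-diffusion step referred to above, in Python for a grayscale image in [0, 1]. This is the standard textbook algorithm, not the real-time solution described above, which I cannot share:

    import numpy as np

    def floyd_steinberg(gray):
        """Classic Floyd-Steinberg dithering of a grayscale image in [0, 1]."""
        img = gray.astype(float).copy()
        h, w = img.shape
        for y in range(h):
            for x in range(w):
                old = img[y, x]
                new = 1.0 if old >= 0.5 else 0.0   # quantize to 0 or 1
                img[y, x] = new
                err = old - new
                # Diffuse the quantization error to unprocessed neighbors.
                if x + 1 < w:               img[y,     x + 1] += err * 7 / 16
                if y + 1 < h and x > 0:     img[y + 1, x - 1] += err * 3 / 16
                if y + 1 < h:               img[y + 1, x    ] += err * 5 / 16
                if y + 1 < h and x + 1 < w: img[y + 1, x + 1] += err * 1 / 16
        return img

    # A mid-gray patch dithers to a mix of 0s and 1s.
    print(floyd_steinberg(np.full((4, 4), 0.5)))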
I spent all of 2013 building a startup in AI, specifically in NLP.
I had well-defined business applications in mind. (I had already deprioritized over ten ideas after discussing them with various people, including potential customers, and chose to focus on the NLP ones.)
I was ahead of my time in vision, and right about the technical direction (deep neural networks plus more). But I was not confident or comfortable raising money. Some friends were willing to give me seed funding after seeing a startup in a domain I had deprioritized emerge with a successful exit. I did not accept the offer.
There was one person helping me as a technical co-founder. I needed a business person to join as a co-founder; however, no one I wanted from my circles was willing to take the plunge.
A friend casually connected me to a VC. I did not even know that the person I was going to meet was a VC. I reviewed the applications with him. He liked the proposals and said he could get me customers the moment the product is ready. I did not even discuss raising money.
Behind the scenes, I was facing a significant personal challenge.
I underestimated the amount of time needed for training data preparation. More time was going into that than into algorithm development.
Starting in 2014, a former colleague offered me a contract job in computer vision (even though I had no prior experience in computer vision), which I took to support the startup. That not only consumed me fully, but was also very interesting and challenging in its own right. The NLP startup work ended up in the backseat. My computer vision work was briefly covered by an independent blog writer here: https://www.technologyreview.com/2017/03/29/243161/qualcomm-...
I was left with the feeling that had I raised money and thereby pushed more strongly on the startup work, I could have been at the forefront of modern AI with deep neural networks.
To summarize, I failed because of the personal challenge, which in fact continues to date and has severely demotivated me, and because of my reluctance to risk someone else's money.
Aside: Having learnt computer vision afresh on this contract job, I later implemented simplified versions of several classical computer vision algorithms in Microsoft Excel, using a series of one-liner formulas: https://news.ycombinator.com/item?id=22357374
This person was several levels above me in the hierarchy and famously known to be short-tempered. Before my very first meeting, I was given generic tips on how to handle him, none of which seemed effective.
I realized that the person was very smart and always interested in understanding everything in depth. The teams, however, would fail to explain things properly, which would lead to his frustration.
As he was trying to do this for a large number of domains, he was often behind in his understanding of the state of the art and would not know the jargon, acronyms, etc.
People, on the other hand, habitually skip defining the terms they use, miss putting units on physical quantities, fail to label the axes of a graph, etc., all of which would instantly frustrate him.
The same thing would happen when he didn't hear back crisp and/or correct answers to his questions.
If you went slowly with him and managed to get him to understand what you had to say, he would then spot loose ends, hidden assumptions and fallacies in your arguments with ease. He would try to explain, but would get frustrated fast if the team didn't get it.
Within around two to three meetings, I could see where his frustrations were coming from, and was then comfortably able to present to him. At some point, I became known as the face of the project to put in front of him.
He was a nice person actually.
He would not mind accepting his mistakes, and the same applied to me. He once spotted a mistake in my analysis (the metric I was optimizing was itself not really correct) some twenty minutes into a one-hour meeting. I not only immediately accepted it, but also announced that the rest of the meeting time was unneeded, as I needed to fix it and come back later. The meeting, however, gracefully continued, as he started focusing on other things in the presentation that were secondary but still valid.
To summarize, the true tip for working with him was to be good at explaining with simplicity and to strive for the truth. That was all.
----
In another experience, someone I worked with was again very smart and would usually get his team to produce unquestionable results. However, he would favor people from his own cultural background, give them undue trust, and in the process nearly humiliate people who were actually correct. In this case, I ended up changing teams.
This person later heard from the human resources team about complaints made against him, and I believe he improved after that.
https://librosa.org/
Audio track separation:
https://github.com/adefossez/demucs
demucs works pretty well.
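A minimal sketch of the kind of workflow these enable (the file path is a placeholder; exact defaults may differ by version):

    import librosa

    # Load an audio file (librosa resamples to 22,050 Hz mono by default)
    # and estimate duration and tempo.
    y, sr = librosa.load("song.mp3")
    tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
    print(f"{librosa.get_duration(y=y, sr=sr):.1f}s, ~{float(tempo):.0f} BPM")

    # Stem separation with demucs is typically run from the shell, e.g.:
    #   demucs song.mp3
    # which writes the separated stems (vocals, drums, bass, other) to an output folder.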