Hi HN, I'm one of the lead authors on the paper! Gorilla is an open source effort and we would love to hear from the community. Let us know if you have any questions or suggestions!!
Congratulations on shipping! This is pretty cool. Building a next-gen Zapier on top of this would be a great use case.
There is a finite number of public APIs (100K?) which keeps the problem manageable. IMO, adding support for custom/private APIs (something like OpenAI functions) will make this a very powerful tool.
Your license says it can be used commercially by anyone, while Llama's license says it can only be used for research purposes. Isn't your license bound by the Llama usage license?
Yes, we have three sets of models. One is based on LLaMA, which, you are right, cannot be used commercially. We have two additional models, based on the MPT-7B base model and Falcon-7B, which can be used commercially with no obligations!
There is also OpenLLaMA 7B, which is Apache 2.0 licensed. Would you consider fine-tuning that one as well, so that a Llama-based version could be used commercially?
> while Llama's license says it can only be used for research purpose
Minor nitpick, but we still don't have a clear legal answer on whether this would be binding on people who didn't sign that agreement, because we still don't have a clear legal answer on whether model weights are covered by copyright.
That being said, it is good for projects to point out that there's uncertainty over whether Llama can be used commercially; so I agree with the overall point.
It’s further unclear whether a fine-tuned model, which has different weights, counts as a copyright violation at all. That doesn’t stop a wealthy company from suing, though.
AIUI it uses the Llama architecture, but not Facebook's Llama weights. It uses MPT-7B, which was trained from scratch: https://www.mosaicml.com/blog/mpt-7b
Congratulations, great paper! It should have been posted on HN earlier ;)
I have a few questions:
* you say (page 4): "We then perform standard instruction finetuning on the base LLaMA-7B model". Could you perhaps provide a reference to the _exact_ fine-tuning approach you used? I'm afraid different groups of people have different notions of "standard" (see, for example, pages 131-155 of https://arxiv.org/abs/2302.08575 for various fine-tuning approaches), and without knowing exactly how fine-tuning was carried out, it can be very difficult to reproduce your research and results exactly.
* the idea of using AST Sub-Tree Matching is nice. Could you please let me know which function in which file from your GitHub repository this is implemented in?
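For readers unfamiliar with the idea, here is a minimal sketch of AST sub-tree matching using Python's standard `ast` module. It is not the authors' implementation (the question above is exactly where that lives); it only illustrates the concept: parse the generated code, then check whether any call node in it matches the reference API call. The matching rule used here (same dotted callee name, and every keyword argument of the reference present with an equal value, positional arguments ignored) is an assumption chosen for illustration.

```python
import ast


def dotted_name(node: ast.AST) -> str:
    """Turn a callee expression like `torch.hub.load` into a dotted string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{dotted_name(node.value)}.{node.attr}"
    return ""


def call_matches(candidate: ast.Call, reference: ast.Call) -> bool:
    """Simplified rule (an assumption): callee names agree, and every keyword
    argument in the reference appears in the candidate with the same value.
    Extra keywords in the candidate are allowed; positional args are ignored."""
    if dotted_name(candidate.func) != dotted_name(reference.func):
        return False
    cand_kwargs = {kw.arg: ast.dump(kw.value) for kw in candidate.keywords}
    return all(
        cand_kwargs.get(kw.arg) == ast.dump(kw.value) for kw in reference.keywords
    )


def contains_api_call(generated_code: str, reference_call: str) -> bool:
    """Return True if any call in `generated_code` matches `reference_call`."""
    try:
        tree = ast.parse(generated_code)
    except SyntaxError:
        return False  # unparseable output cannot match anything
    reference = ast.parse(reference_call, mode="eval").body
    if not isinstance(reference, ast.Call):
        raise ValueError("reference must be a single call expression")
    # Walk every node of the generated code's tree, looking for a matching sub-tree.
    return any(
        isinstance(node, ast.Call) and call_matches(node, reference)
        for node in ast.walk(tree)
    )
```

For example, `contains_api_call("m = torch.hub.load(repo_or_dir='pytorch/vision', model='resnet50', pretrained=True)", "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet50')")` accepts the extra `pretrained` argument, while a generated call with `model='resnet18'` would not match. Matching on the tree rather than the raw string makes the check insensitive to whitespace, quoting style, and argument order.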
I'm still not sure though about some nitpicky things:
- do you change all the weights, or just the ones from the last layer when fine-tuning?
- do you just train on the _code_ field from the JSON file with the self-instruct data, or do you also use the other fields to train (or do you use the other fields just for downstream evaluation purposes)?
I think it could be a major selling point of your paper if, on GitHub (or in an appendix to your preprint, if you update it on arXiv), you had a section documenting the training process in detail.
Hey everyone, I am one of the authors of the Gorilla project. Super excited to see how the project grows! We have released LLaMA based, MPT based (Apache 2.0) and Falcon based (Apache 2.0) models so far. Something cooler is coming soon!
In the Colab example, it appears you are using the OpenAI Python library, but with the Gorilla model instead of OpenAI's models. That works? How do you set that up?
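import openai  # legacy (pre-1.0) client, which provides openai.ChatCompletion

# Query Gorilla server.
# Assumes openai.api_base was already pointed at the Gorilla server earlier in
# the notebook, and that raise_issue is a helper defined elsewhere in it.
def get_gorilla_response(prompt="I would like to translate from English to French.", model="gorilla-7b-hf-v0"):
    try:
        completion = openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "user", "content": prompt}]
        )
        return completion.choices[0].message.content
    except Exception as e:
        raise_issue(e, model, prompt)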
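As for how that works: the `openai` client just POSTs a standard chat-completion JSON body to whatever base URL it is configured with, so any server that speaks the same protocol can answer it. Below is a minimal sketch of the request it ends up making, using only the standard library. The base URL is a placeholder (the Colab's actual server address is not reproduced here), and we assume, as the snippet implies, that the Gorilla server exposes an OpenAI-compatible `/chat/completions` route.

```python
import json
import urllib.request

# Placeholder address (hypothetical): substitute the real Gorilla server.
GORILLA_API_BASE = "http://localhost:8000/v1"


def build_chat_request(prompt: str, model: str = "gorilla-7b-hf-v0",
                       api_base: str = GORILLA_API_BASE) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{api_base}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text,
    mirroring completion.choices[0].message.content in the snippet above."""
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Building and sending are kept separate so the request can be inspected offline; `send_chat_request(build_chat_request("I would like to translate from English to French."))` performs the actual call once a compatible server is reachable.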
At first I was excited -- this is the second time I'm seeing this advertised, and we are thinking through reliable API call-out strategies for louie.ai -- but then I got confused by the paper:
Is this really just tested against 95 API calls, and, I'm guessing, largely from a small number of libraries like PyTorch?
More importantly, if that's anywhere near true, is there any reason (so far) to use this for use cases like OpenAI's: calling generic OpenAPI-style libs (the Zapier scenario), known specific tools, or random Python libs not in that dataset?
I'm really thinking of three scenarios for our users:
-- Python libraries we know ahead of time they'll want to use, like pandas and pygraphistry
-- Same for CLIs, like AWS and az, and OpenAPI from an index
-- The long tail we don't expect, especially in Python + JS, so handled on the fly, with a limited time budget for inspecting GitHub/Google/etc.
So far, we generally find auto approaches too unreliable for non-hobbyists, and have to tune a bunch for each tool and database we teach louie. This line of research is def interesting to us...
My issue with this is that it needs to be retrained regularly to make sure the latest APIs are included. There needs to be a long-term assessment of its viability in a commercial setting. Otherwise we'll jump in, and after 6 months it will begin producing out-of-date suggestions for some edge cases. And if you need to support an old API, how can you be sure it will produce results scoped to that version?
Neat idea, @shishirpatil! We are developing EvaDB [1] for shipping simpler, faster, and cost-effective AI apps. Can you share your thoughts on transforming the output of the Gorilla LLM into functions in EvaDB apps -- like this function that uses the HuggingFace API -- https://evadb.readthedocs.io/en/stable/source/tutorials/07-o...?
Does it outperform ChatGPT and GPT-4 at generating API calls, or at every type of query? It's exciting that this is open; looking forward to trying all of this end to end!
I don't know how its performance compares, but its architecture is completely different: LangChain is a "normal" software library, but Gorilla is itself an LLM:
> Gorilla is a LLM that can provide appropriate API calls. It is trained on three massive machine learning hub datasets: Torch Hub, TensorFlow Hub and HuggingFace. We are rapidly adding new domains, including Kubernetes, GCP, AWS, OpenAPI, and more. Zero-shot Gorilla outperforms GPT-4, Chat-GPT and Claude. Gorilla is extremely reliable, and significantly reduces hallucination errors.
My reading of that abstract is that it's an LLM that outputs API calls instead of natural language (or maybe it still outputs natural language, but it can use API calls during inference? I didn't read very far), whereas LangChain is simply a software library. In theory, you could probably get Gorilla to output LangChain "API" (function) calls...
<<<explanation>>>: 1. Import M2M100ForConditionalGeneration and M2M100Tokenizer from the transformers library.
2. Load the pre-trained M2M100 model and tokenizer using the from_pretrained() method. The model is trained to translate text from English to Chinese, among other languages.
3. Encode the input text in English using the tokenizer.
4. Generate the translation using the model.generate() method.
5. Decode the output tokens using the tokenizer to obtain the translated text in Chinese.
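The five steps in that explanation map onto a short `transformers` script. A minimal sketch, assuming the `facebook/m2m100_418M` checkpoint (the model output does not say which checkpoint it means) and that `transformers`, `sentencepiece`, and PyTorch are installed; the imports sit inside the function so that merely defining it needs no downloads.

```python
def translate_en_to_zh(text: str, checkpoint: str = "facebook/m2m100_418M") -> str:
    """Translate English text to Chinese with M2M100 (checkpoint is an assumption)."""
    # Step 1: import the model and tokenizer classes.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    # Step 2: load the pre-trained model and tokenizer.
    tokenizer = M2M100Tokenizer.from_pretrained(checkpoint)
    model = M2M100ForConditionalGeneration.from_pretrained(checkpoint)

    # Step 3: encode the English input; M2M100 needs the source language set.
    tokenizer.src_lang = "en"
    encoded = tokenizer(text, return_tensors="pt")

    # Step 4: generate, forcing the first decoded token to be the Chinese language id.
    generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("zh"))

    # Step 5: decode the output tokens back into text.
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Calling `translate_en_to_zh("Life is like a box of chocolates.")` downloads the checkpoint on first use. The `forced_bos_token_id` argument is what "trained to translate ... among other languages" amounts to in practice: the same many-to-many model is steered to a target language at generation time.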
Hi @csjh, we trained the model to also output additional context so that it would be useful for downstream tasks. We wrapped the API call in special markers so it's easy to extract with a regex. Would you prefer just the API call instead? Happy to release an API-only model if there is wider interest; it's a strictly easier task for Gorilla LLM :)
Ideally I'd like the result to be clean JSON, with one key that's strictly the result and other keys for context information. That would remove the need to use regex everywhere.
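Until an API-only or JSON-output model exists, the regex extraction can be done once on the client and re-emitted as the kind of JSON asked for here. A minimal sketch, assuming the output consists of sections introduced by markers of the form `<<<name>>>:` (as in the `<<<explanation>>>` example earlier in the thread) and that the API call sits under a marker named `api_call`; that marker name and the `result`/`context` key layout are assumptions for illustration.

```python
import json
import re

# Matches a "<<<name>>>:" marker; the name is captured for use as a JSON key.
MARKER = re.compile(r"<<<(\w+)>>>:\s*")


def parse_sections(output: str) -> dict:
    """Split tagged model output into {marker_name: section_text}.
    Each section runs until the next marker, so multi-line sections work."""
    sections = {}
    matches = list(MARKER.finditer(output))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(output)
        sections[m.group(1)] = output[m.end():end].strip()
    return sections


def to_clean_json(output: str, result_key: str = "api_call") -> str:
    """Put the API call under "result" and every other section under "context".
    "result" is null if the expected marker is missing."""
    sections = parse_sections(output)
    result = sections.pop(result_key, None)
    return json.dumps({"result": result, "context": sections})
```

Keeping the parser generic (any `<<<name>>>:` marker becomes a key) means new context sections added to the model's output show up in `context` without code changes.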
LangChain is a terrific project that tries to teach agents how to use tools through prompting. Our take is that prompting doesn't scale if you want to pick among thousands of APIs. So Gorilla is an LLM that can pick and write the semantically and syntactically correct API call for you to make! A drop-in replacement in LangChain!