Hacker News
Gorilla: Large Language Model Connected with APIs (shishirpatil.github.io)
246 points by throwaway888abc on June 14, 2023 | 49 comments



Hi HN, I'm one of the lead authors on the paper! Gorilla is an open source effort and we would love to hear from the community. Let us know if you have any questions or suggestions!!


Congratulations on shipping! This is pretty cool. Building a next-gen Zapier on top of this would be a great use case.

There is a finite number of public APIs (100K?) which keeps the problem manageable. IMO, adding support for custom/private APIs (something like OpenAI functions) will make this a very powerful tool.


Thanks for the kind words @gsharma! Private APIs are something we are definitely thinking about.


Is Gorilla helpful for private APIs? Is it addressing the same need as OpenAI's new "function-calling" feature?


Your license says it can be used commercially by anyone, while Llama's license says it can only be used for research purposes. Isn't your license bound by the Llama usage license?


Yes, we have three sets of models. One is based on LLaMA which, you are right, cannot be used commercially. We have two additional models based on the MPT-7B base and Falcon-7B, which can be used commercially with no obligations!


There is Open Llama 7B available that is Apache 2.0 licensed. Would you consider fine-tuning that one as well, to enable commercial use of the Llama-style model?


Yes, when we released the initial set of models Open Llama was still at its 600B-token checkpoint and hadn't finished training yet. It's an easy port :)


> while Llama's license says it can only be used for research purpose

Minor nitpick, but we still don't have a clear legal answer on whether this would be binding on people who didn't sign that agreement, because we still don't have a clear legal answer on whether model weights are covered by copyright.

That being said, it is good for projects to point out that there's uncertainty over whether Llama can be used commercially; so I agree with the overall point.


It’s further unclear whether a fine-tuned model, which has different weights, counts as a copyright violation at all. That doesn't stop a wealthy company from suing, though.


AIUI it uses the Llama architecture, but not Facebook's Llama weights. It uses MPT-7B, which was trained from scratch: https://www.mosaicml.com/blog/mpt-7b


The code is open source but not the weights, as far as I can tell.


Hi swyx, the weights are also open sourced at https://huggingface.co/gorilla-llm Let us know if you are unable to access them.


Congratulations, great paper! It should have been put on HN earlier ;)

I have a few questions:

* you say (page 4): "We then perform standard instruction finetuning on the base LLaMA-7B model". Could you perhaps provide a reference to the _exact_ finetuning approach you used? I'm afraid different groups of people have different notions of "standard" (see for example pages 131-155 of https://arxiv.org/abs/2302.08575 for various fine-tuning approaches), and without knowing exactly how fine-tuning was carried out, it can be very difficult to reproduce your research and results exactly.

* the idea of using AST Sub-Tree Matching is nice. Could you please let me know which function, in which file of your GitHub repository, implements this?

Again, great job on publishing this paper!

---

Best regards,

friederrr.org


Thanks @sfriedr! We generate self-instruct data and then fine-tune the base model with a perplexity loss. The self-instruct data is at https://github.com/ShishirPatil/gorilla/tree/main/data/apibe...
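
For anyone curious what that looks like in practice, here is a minimal sketch of instruction finetuning with a causal-LM (perplexity) loss using Hugging Face Transformers; the model name, data file, "text" field, and hyperparameters are illustrative placeholders, not taken from the Gorilla repo:

  # Sketch only: standard causal-LM finetuning on instruction/response text.
  # Model name, data path, and the "text" field are placeholders.
  from datasets import load_dataset
  from transformers import (AutoModelForCausalLM, AutoTokenizer,
                            DataCollatorForLanguageModeling,
                            Trainer, TrainingArguments)

  base = "huggyllama/llama-7b"  # placeholder base model
  tokenizer = AutoTokenizer.from_pretrained(base)
  tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
  model = AutoModelForCausalLM.from_pretrained(base)

  # Each record rendered as plain text: instruction followed by the API call + explanation.
  data = load_dataset("json", data_files="self_instruct.json")["train"]
  data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                  remove_columns=data.column_names)

  trainer = Trainer(
      model=model,
      args=TrainingArguments(output_dir="gorilla-ft", num_train_epochs=3,
                             per_device_train_batch_size=4, learning_rate=2e-5),
      train_dataset=data,
      data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # next-token loss = perplexity
  )
  trainer.train()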

Thank you! Yes, the code can be found here: https://github.com/ShishirPatil/gorilla/tree/main/eval/eval-...

Hope this helps. Let me know if you have any follow-ups!
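
For readers unfamiliar with the technique, the gist of AST sub-tree matching is to parse both the generated code and the reference API call, then check whether the reference call's tree is contained in the generated one. This is a rough illustration of the idea, not the actual eval code from the repo:

  # Rough illustration of AST sub-tree matching; not the Gorilla eval code.
  import ast

  def node_matches(cand, ref):
      # True if `cand` structurally contains everything present in `ref`.
      if type(cand) is not type(ref):
          return False
      for field, ref_val in ast.iter_fields(ref):
          cand_val = getattr(cand, field, None)
          if isinstance(ref_val, ast.AST):
              if not (isinstance(cand_val, ast.AST) and node_matches(cand_val, ref_val)):
                  return False
          elif isinstance(ref_val, list):
              for r in ref_val:  # every reference child must match some candidate child
                  if not any(isinstance(c, ast.AST) and node_matches(c, r) for c in cand_val):
                      return False
          elif ref_val is not None and cand_val != ref_val:
              return False
      return True

  def call_in_code(generated_code, reference_call):
      ref = ast.parse(reference_call, mode="eval").body  # a single Call node
      return any(node_matches(node, ref) for node in ast.walk(ast.parse(generated_code)))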


Awesome, thanks for letting me know!

I'm still not sure though about some nitpicky things:

- do you change all the weights, or just the ones from the last layer when fine-tuning?

- do you just train on the _code_ field from the JSON file with the self-instruct data, or do you also use the other fields to train (or do you use the other fields just for downstream evaluation purposes)?

I think it could be a major selling point of your paper if, on GitHub (or in an appendix to your preprint, if you update it on arXiv), you had a section documenting the training process in detail.


(whoops, this comment/these questions should have been posted as an answer to your other comment @shishirpatil)


Seems @shishirpatil ran out of steam answering questions. Too bad.


(Or maybe the questions were too tricky and he wasn't able to answer, heh)


Haha, was busy yesterday! Or was I? :P


What's a good/affordable GPU to run these projects locally?

It seems like building anything on top of these runs into either a big GPU cost for yourself or a big compute cost if you scale for others.


A 3090 can run 30B-parameter models; 2x 3090s can run 65B-parameter models.

A 4090 can run the same, very slightly faster, for much more money.


We named the project Gorilla because it is a cute animal that uses tools!


Hey everyone, I am one of the authors of the Gorilla project. Super excited to see how the project grows! We have released LLaMA based, MPT based (Apache 2.0) and Falcon based (Apache 2.0) models so far. Something cooler is coming soon!


In the colab example it appears you are using the openai python library but with the gorilla model instead of openai's models. That works? How do you set that up?

  # Query Gorilla server
  import openai  # openai.api_base is pointed at the Gorilla server (see the reply below)

  def get_gorilla_response(prompt="I would like to translate from English to French.", model="gorilla-7b-hf-v0"):
    try:
      completion = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
      )
      return completion.choices[0].message.content
    except Exception as e:
      raise_issue(e, model, prompt)  # error-reporting helper defined elsewhere in the colab


They point openai.api_base to their server, which implements the same API.
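
Roughly, the setup amounts to something like this (the server URL is a placeholder; use whatever endpoint the Gorilla colab/README gives you):

  import openai

  openai.api_key = "EMPTY"  # assumption: the hosted Gorilla endpoint ignores the key
  openai.api_base = "http://<gorilla-server>:8000/v1"  # OpenAI-compatible Gorilla server (placeholder URL)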


That’s clever. Do other LLM APIs do that?


Yesterday there was a "Launch HN" thread for credal.ai [0] and I noticed that they use the same openai.api_base trick [1].

[0] https://news.ycombinator.com/item?id=36326525

[1] https://credalai.notion.site/Drop-In-APIs-3a45d32405c347e8bf...


It would take you (or gpt) 3 seconds to write an openai compatible wrapper; the inference api is trivial for all LLMs.
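
For what it's worth, here is a toy sketch of such a wrapper with FastAPI. The route shape follows the OpenAI chat-completions API; generate() is a stand-in for whatever local inference backend you run:

  # Toy OpenAI-compatible chat endpoint in front of a local model (sketch).
  import time, uuid
  from fastapi import FastAPI
  from pydantic import BaseModel

  app = FastAPI()

  class ChatRequest(BaseModel):
      model: str
      messages: list[dict]

  def generate(prompt: str) -> str:
      return "..."  # stand-in for your local LLM's inference call

  @app.post("/v1/chat/completions")
  def chat_completions(req: ChatRequest):
      prompt = "\n".join(m["content"] for m in req.messages)
      return {
          "id": "chatcmpl-" + uuid.uuid4().hex,
          "object": "chat.completion",
          "created": int(time.time()),
          "model": req.model,
          "choices": [{"index": 0,
                       "message": {"role": "assistant", "content": generate(prompt)},
                       "finish_reason": "stop"}],
      }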


Ah, I missed that. Thanks.


At first I was excited -- this is the second time I'm seeing this advertised, and we are thinking through reliable API call-out strategies for louie.ai -- but then I got confused by the paper:

Is this really just tested against 95 API calls, and I'm guessing largely from just a small number of libraries like PyTorch?

More importantly, if that's anywhere near true, is there any reason (so far) to use this for use cases like OpenAI's: calling generic OpenAPI-style libs (the Zapier scenario), known specific tools, or random Python libs not in that dataset?

I'm really thinking 3 scenarios for our users:

-- python libraries we know they'll want to use ahead of time, like pandas and pygraphistry

-- Same for CLIs, like AWS and az, and OpenAPI from an index

-- Long-tail that we don't expect, esp in python + js, so on the fly, with limited time budget for inspecting GitHub/Google/etc

So far, we generally find auto approaches too unreliable for non-hobbyists, and have to tune a bunch for each tool and database we teach louie. This line of research is def interesting to us...


My issue with this is that it needs to be retrained on a regular basis to make sure the latest APIs are included. There needs to be a long-term assessment to understand its viability in a commercial setting. Otherwise we'll jump in and after 6 months it will begin producing out-of-date suggestions for some edge cases. And then again, if you need to support an old API, how can you be sure it will produce the correctly scoped results?


Neat idea, @shishirpatil! We are developing EvaDB [1] for shipping simpler, faster, and cost-effective AI apps. Can you share your thoughts on transforming the output of the Gorilla LLM to functions in EvaDB apps -- like this function that uses the HuggingFace API -- https://evadb.readthedocs.io/en/stable/source/tutorials/07-o...?

[1] https://github.com/georgia-tech-db/eva


Outperforms ChatGPT and GPT-4 in generating API calls? Or in every type of querying? It's exciting that this is open; looking forward to trying all of this from end to end!



Currently released weights:

https://huggingface.co/gorilla-llm


Heads up, your discord link is broken


working for me...


How does this compare to LangChain?


I don't know how its performance compares, but its architecture is completely different: LangChain is a "normal" software library, but Gorilla is itself an LLM:

> Gorilla is a LLM that can provide appropriate API calls. It is trained on three massive machine learning hub datasets: Torch Hub, TensorFlow Hub and HuggingFace. We are rapidly adding new domains, including Kubernetes, GCP, AWS, OpenAPI, and more. Zero-shot Gorilla outperforms GPT-4, Chat-GPT and Claude. Gorilla is extremely reliable, and significantly reduces hallucination errors.

My reading of that abstract is that it's an LLM that outputs API calls instead of natural language (or maybe it still outputs natural language, but it can use API calls during inference? I didn't read very far), whereas LangChain is simply a software library. In theory, you could probably get Gorilla to output LangChain "API" (function) calls...


Still outputs natural language. Example from their colab:

Input: I would like to translate from English to Chinese.

Output:

<<<domain>>>: Natural Language Processing Text2Text Generation

<<<api_call>>>: M2M100ForConditionalGeneration.from_pretrained('facebook/m2m100_1.2B')

<<<api_provider>>>: Hugging Face Transformers

<<<explanation>>>: 1. Import M2M100ForConditionalGeneration and M2M100Tokenizer from the transformers library.

2. Load the pre-trained M2M100 model and tokenizer using the from_pretrained() method. The model is trained to translate text from English to Chinese, among other languages.

3. Encode the input text in English using the tokenizer.

4. Generate the translation using the model.generate() method.

5. Decode the output tokens using the tokenizer to obtain the translated text in Chinese.

6. Print the translated text.


Hi @csjh, we trained the model to also output additional context so it would be useful for a downstream task. We wrapped the API call with a special decorator so it's easier to just regex it out. Would you like to just have the API instead? Happy to release an API-only model if there is wider interest - it's a strictly easier task for Gorilla LLM :)
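
In the meantime, pulling the fields out of the tagged format is a one-liner per key. A quick sketch (the tag names are taken from the example output earlier in the thread):

  # Pull the tagged fields out of Gorilla's output with a regex (sketch).
  import re

  def parse_gorilla(output):
      fields = {}
      for key in ("domain", "api_call", "api_provider", "explanation"):
          m = re.search(rf"<<<{key}>>>:\s*(.*?)(?=\n<<<|\Z)", output, re.S)
          if m:
              fields[key] = m.group(1).strip()
      return fields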


Ideally I'd like the result to be in clean JSON, with one key that's strictly the result and other keys for context information. It would reduce the need to use regex everywhere.


I would love this!


"We release Gorilla, a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls"

Sounds like it's another LLaMA variant specifically fine tuned for API calls.


Good point. That was the original release; we now also have Apache-2.0-licensed models finetuned on MPT-7B and Falcon-7B!


There is Open Llama 7B, which is Apache 2.0 licensed; please consider checking it out: https://github.com/openlm-research/open_llama


LangChain is a terrific project that tries to teach agents how to use tools via prompting. Our take on this is that prompting is not scalable if you want to pick between thousands of APIs. So Gorilla is an LLM that can pick and write the semantically and syntactically correct API call for you! A drop-in replacement that plugs into LangChain!


Is this model raw/uncensored?



