
One of the valuable things AutoChain provides is simulated conversation evaluation. Testing and tuning these agents is very hard and expensive. Check out the evaluation output from AutoChain, for example:

{
   "conversation_history":
   [
      "user: I want to change the shipping address for my order",
      "assistant: Can you please provide the order ID?",
      "user: The order ID is 456",
      "assistant: The order with ID 456 has not yet shipped. I can update the shipping address for you. Please provide the new shipping address.",
      "user: The new shipping address is 234 Spear St, San Francisco",
      "assistant: The shipping address for your order with ID 456 has been successfully updated to 234 Spear St, San Francisco.",
      "user: Thank you for updating the shipping address for my order",
      "assistant: Thank your for contacting"
   ],
   "is_agent_helpful":
   {
      "rating": 5,
      "reason": "The conversation reaches the expected outcome. The user successfully provides the order ID and the new shipping address, and the assistant updates the shipping address for the order with ID 456. The assistant confirms the successful update and thanks the user for contacting."
   },
   "actions_took":
   [
      {
         "tool": "check_order_status",
         "tool_input":
         {
            "order_id": "456"
         },
         "tool_output":
         {
            "status_code": 200,
            "order_id": "456",
            "order_status": "not_shipped",
            "tracking_url": "example.com/456",
            "shipping_address": "301 ivy street san francisco ca"
         }
      },
      {
         "tool": "change_shipping_address",
         "tool_input":
         {
            "order_id": "456",
            "new_address": "234 Spear St, San Francisco"
         },
         "tool_output":
         {
            "status_code": 200,
            "order_id": "456",
            "shipping_address": "234 Spear St, San Francisco"
         }
      }
   ],
   "num_turns": 8,
   "expected_outcome": "found order status and changed shipping address"
}
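
If you want to check output like this programmatically rather than by eye, here is a minimal sketch. It assumes the result has been loaded into a Python dict with the field names shown in the output; `summarize_run`, `expected_tools`, and `min_rating` are hypothetical names for illustration, not part of AutoChain.

```python
from typing import Any


def summarize_run(result: dict[str, Any], expected_tools: list[str],
                  min_rating: int = 4) -> dict[str, Any]:
    """Check one simulated-conversation result against simple pass criteria."""
    actions = result.get("actions_took", [])
    called = [a["tool"] for a in actions]
    # A tool call counts as failed if its output did not report status 200.
    failed = [a["tool"] for a in actions
              if a["tool_output"].get("status_code") != 200]
    missing = [t for t in expected_tools if t not in called]
    rating = result["is_agent_helpful"]["rating"]
    return {
        "tools_called": called,
        "missing_tools": missing,
        "failed_tools": failed,
        "passed": rating >= min_rating and not failed and not missing,
    }


# Trimmed copy of the evaluation output, keeping only the fields used here.
result = {
    "is_agent_helpful": {"rating": 5},
    "actions_took": [
        {"tool": "check_order_status",
         "tool_input": {"order_id": "456"},
         "tool_output": {"status_code": 200, "order_status": "not_shipped"}},
        {"tool": "change_shipping_address",
         "tool_input": {"order_id": "456",
                        "new_address": "234 Spear St, San Francisco"},
         "tool_output": {"status_code": 200}},
    ],
    "num_turns": 8,
}

summary = summarize_run(
    result, ["check_order_status", "change_shipping_address"])
```

The thresholds are arbitrary; the point is only that the rating, the tool trace, and the status codes can be asserted on separately, so a high rating with a missing tool call still fails.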



If you're dealing with a fixed set of functions (like updating a shipping address), aren't we just building chatbots again? I thought we all agreed that we'd much rather just have a bespoke button for updating the shipping address than having to deal with a chatbot. The conversation presented here is especially vexing as it sounds so mechanical that there's hardly any reason to use any AI at all.


LLMs today make it a lot easier to build a chatbot without a decision tree and to handle more complex scenarios. What you need to do is provide it with sufficient tools and a scope for what it can do, so that the chatbot doesn't have to be mechanical.
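
To make "tools and a scope" concrete, here is a rough sketch in plain Python. It is not AutoChain's API: the in-memory `ORDERS` store, the `TOOLS` registry, and `run_tool` are all assumptions for illustration, and only the two tool names come from the output above. The model would pick the tool name and arguments; the registry is what keeps it inside the allowed scope.

```python
from typing import Callable

# Hypothetical in-memory order store standing in for a real backend.
ORDERS = {
    "456": {"order_status": "not_shipped",
            "shipping_address": "301 ivy street san francisco ca"},
}


def check_order_status(order_id: str) -> dict:
    order = ORDERS.get(order_id)
    if order is None:
        return {"status_code": 404, "order_id": order_id}
    return {"status_code": 200, "order_id": order_id, **order}


def change_shipping_address(order_id: str, new_address: str) -> dict:
    order = ORDERS.get(order_id)
    if order is None:
        return {"status_code": 404, "order_id": order_id}
    # Only orders that have not shipped yet may be changed.
    if order["order_status"] != "not_shipped":
        return {"status_code": 409, "order_id": order_id}
    order["shipping_address"] = new_address
    return {"status_code": 200, "order_id": order_id,
            "shipping_address": new_address}


# The scope: the only tools the model is allowed to call.
TOOLS: dict[str, Callable[..., dict]] = {
    "check_order_status": check_order_status,
    "change_shipping_address": change_shipping_address,
}


def run_tool(name: str, **tool_input) -> dict:
    """Dispatch a tool call requested by the model; refuse anything out of scope."""
    tool = TOOLS.get(name)
    if tool is None:
        return {"status_code": 400, "error": f"tool '{name}' is out of scope"}
    return tool(**tool_input)
```

For example, `run_tool("change_shipping_address", order_id="456", new_address="234 Spear St, San Francisco")` updates the order, while a request for a tool that is not in `TOOLS` gets a 400 instead of being executed.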


> I thought we all agreed that we'd much rather just have a bespoke button for updating the shipping address

I don't think the jury is in yet. People love to be able to ask any bad/lazy question and get good answers. GUIs are not great for that.


I think it is more like the ChatGPT hype bought chat robots a retrial.

Pretty sure the verdict was clear on the first one.


An interesting point: when/if half of the websites deprecate GUI buttons in favor of a do-all chatbot, half of the people will forget how to use GUI buttons.


Why not both?



