
Cost-effective Vision ChatBot

Learn how to create a cost-effective vision chatbot that understands images and enhances user interactions.

Adding image recognition to a chatbot lets it work with what users show it, not only what they type. Imagine a customer service bot that not only answers queries but also interprets images: identifying products, analyzing visual content, and providing detailed descriptions. This tutorial guides you through using ChatBotKit's vision skillset to create a chatbot that understands images.

Advanced vision models can be expensive to run. ChatBotKit keeps costs down by allowing vision capabilities to be used only when a conversation actually requires them, so you get image recognition without incurring unnecessary costs. The steps below show how to integrate image recognition into your chatbot.

Step-by-Step Guide

  1. Create a Bot:

    • Navigate to the ChatBotKit platform and create a bot.
    • Provide your bot with a name and description that reflect its purpose.
    • Establish the bot's backstory to give it a unique personality.
    • Select a modern, cost-effective model such as gpt-4.1-mini or gpt-5-mini (or claude-4.6-sonnet when you need higher reasoning quality).
  2. Define Skillsets:

    • Skillsets are essential instructions that empower your bot to perform specific tasks. For image recognition, create a new skillset named "Image Recognition".
    • Under the new skillset, add an ability called "Describe Image". You can either create one from scratch or install the pre-built Describe Image ability (view/describe) from the ChatBotKit ability catalogue.
  3. Configure the Vision Ability Instruction:

    • If creating a custom ability, use the following template for the ability instruction. This instructs the bot to fetch and analyze an image from a user-provided URL using the view action and the current parameter syntax:

    Note: The $[...] syntax marks an AI-populated parameter (the image URL extracted from the user's message), while ((...)) marks a template parameter you can configure per ability.

  4. Connect the Skillset:

    • Connect the Image Recognition skillset with your bot by selecting it from the drop-down menu in the bot configuration.
  5. Test Your Bot:

    • Interact with your bot by providing various image URLs and verify its ability to correctly fetch and describe the images.
    • Refine the skillset instructions as needed to improve the accuracy and detail of the descriptions.
  6. Deploy and Monitor:

    • Deploy your bot on your chosen platform, such as a website, Slack, or Discord. You can also deploy it to WhatsApp or Telegram, where the bot can interact with user-uploaded images directly.
    • Monitor its performance and how often the vision skillset is used to confirm that the bot remains efficient and cost-effective.
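
The parameter syntax noted in step 3 can be sanity-checked before an instruction is pasted into an ability. The following is a minimal Python sketch, not part of ChatBotKit: the regular expressions are assumptions derived only from the syntax described in this tutorial ($[name|description], ((name|description)), and the older ${...} form), and the function name check_instruction is hypothetical. The exact grammar ChatBotKit accepts may be broader or stricter.

```python
import re

# Assumed patterns, based only on the syntax described in this tutorial.
AI_PARAM = re.compile(r"\$\[([^\]|]+)(?:\|([^\]]*))?\]")          # $[name|description]
TEMPLATE_PARAM = re.compile(r"\(\(([^)|]+)(?:\|([^)]*))?\)\)")    # ((name|description))
LEGACY_PARAM = re.compile(r"\$\{[^}]*\}")                         # ${...} (older syntax)


def check_instruction(text: str) -> dict:
    """Summarize the parameters found in an ability instruction (hypothetical helper)."""
    return {
        # Names of parameters the AI is expected to populate.
        "ai_params": [m.group(1).strip() for m in AI_PARAM.finditer(text)],
        # Names of parameters configured per ability.
        "template_params": [m.group(1).strip() for m in TEMPLATE_PARAM.finditer(text)],
        # Any occurrences of the older syntax, returned verbatim.
        "legacy": LEGACY_PARAM.findall(text),
    }


if __name__ == "__main__":
    # Illustrative sentence only; not the actual ChatBotKit ability template.
    sample = "Describe the image at $[url|the image URL] in a ((tone|description tone)) tone."
    print(check_instruction(sample))
```

If the "legacy" list in the result is non-empty, the instruction still contains the older ${...} form and should be updated before testing the bot.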

Troubleshooting

  • Bot does not process the image: Ensure the skillset is connected to the bot and that the model you selected supports vision capabilities. Models like gpt-4o, gpt-4.1, gpt-5, and claude-4.6-sonnet support image input.
  • Incorrect ability parameter syntax: Make sure you are using $[param_name|description] for AI-extracted parameters and ((param_name|description)) for template parameters. The older ${...} syntax is no longer supported.
  • Image URL not accessible: The view action fetches the image from the provided URL. Ensure the URL is publicly accessible and points directly to an image file.
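
To rule out the last item before testing the bot, you can check a URL yourself. The sketch below uses only the Python standard library; the helper names is_image_content_type and check_image_url are hypothetical. It assumes that "points directly to an image file" can be approximated by an image/* Content-Type on an unauthenticated request. It runs from your own machine, so a passing result does not guarantee that ChatBotKit can reach the same URL, and some servers reject HEAD requests even when a normal GET would succeed.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def is_image_content_type(content_type: str) -> bool:
    """True when a Content-Type header value denotes an image, e.g. image/png."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def check_image_url(url: str, timeout: float = 10.0) -> bool:
    """Send an unauthenticated HEAD request and inspect the Content-Type header."""
    if urlparse(url).scheme not in ("http", "https"):
        return False  # Only plain web URLs are considered publicly fetchable here.
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return is_image_content_type(response.headers.get("Content-Type", ""))
    except OSError:
        # Covers URLError, HTTPError, and timeouts: the URL is not reachable as-is.
        return False
```

A URL that returns text/html usually points to a page that embeds the image rather than to the image file itself.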

By following these steps, you can create a powerful and cost-effective chatbot that leverages image recognition, enhancing user interactions and providing valuable visual analysis. For more detailed information and examples, explore the ChatBotKit Documentation.