Dataset Search
Dataset search is the core feature for retrieving information from your knowledge bases. Unlike traditional keyword-based search, dataset search uses semantic understanding to find records that are contextually relevant to a query, even when the exact words don't match. This makes it well suited to conversational AI applications where users ask questions in natural language.
When you search a dataset, the system converts your query into a vector embedding that captures its semantic meaning, then compares this embedding against all record embeddings in the dataset using similarity metrics. The most relevant records are returned ranked by their relevance scores, giving you the best matches for answering questions or providing context to your bots.
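The ranking step described above can be illustrated with a minimal sketch. It assumes cosine similarity as the metric and uses hand-written three-dimensional vectors in place of real embeddings. The embedding model and the exact metric used by the service are not stated here, and the names `cosine_similarity` and `rank_records` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_records(query_vec, records):
    """Return (score, text) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy embeddings (assumed values, for illustration only).
records = [
    ("Refunds are issued within 14 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("Return an item to get your money back.", [0.8, 0.3, 0.1]),
]
# Stands in for the embedding of "How do I get a refund?"
query_vec = [1.0, 0.2, 0.0]

ranked = rank_records(query_vec, records)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

The two refund-related records rank above the unrelated one even though only one of them contains the word "refund", which is the behaviour semantic search aims for.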
Dataset search is used extensively in bot conversations to provide relevant context for generating accurate responses. When a user asks a question, the bot can search connected datasets to find pertinent information, then use that information to formulate an informed answer. This retrieval-augmented generation (RAG) approach significantly improves response quality and accuracy.
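As a rough sketch of the retrieval-augmented pattern, the snippet below folds retrieved record text into a prompt for a language model. The `build_prompt` helper, the record shape (a dictionary with a `text` key), and the prompt wording are all assumptions made for illustration; the platform's bots may assemble context differently.

```python
def build_prompt(question, records, max_records=3):
    """Assemble a prompt that grounds the model in retrieved record text."""
    context_lines = [
        f"[{i}] {record['text']}"
        for i, record in enumerate(records[:max_records], start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Records as they might come back from a dataset search (assumed shape).
retrieved = [
    {"text": "Refunds are issued within 14 days of purchase."},
    {"text": "Refund requests require an order number."},
]
prompt = build_prompt("How long do refunds take?", retrieved)
print(prompt)
```

Capping the number of records with `max_records` keeps the prompt short; the right cap depends on the model's context window and how long your records are.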
Performing Searches
To search a dataset, provide a natural language query string that describes the information you're looking for. The query can be a question, a statement, or a set of keywords; semantic search finds contextually relevant records regardless of exact phrasing.
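A search request might be constructed along the following lines. This is a sketch under stated assumptions: the base URL, the `/datasets/{id}/search` path, the `query` body field, and bearer-token authentication are all hypothetical placeholders, so check the API reference for the real endpoint and parameter names. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical values: replace with the endpoint and token from your own setup.
API_BASE = "https://api.example.com"
DATASET_ID = "my-dataset-id"
API_TOKEN = "YOUR_API_TOKEN"


def build_search_request(query):
    """Build (but do not send) a POST request carrying the search query."""
    body = json.dumps({"query": query}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        url=f"{API_BASE}/datasets/{DATASET_ID}/search",  # assumed path
        data=body,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_search_request("How do I reset my password?")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Once the placeholders match your environment, passing `request` to `urllib.request.urlopen` would send it.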
The response includes an array of records ordered by relevance, with each record containing its text content, source information, metadata, and a relevance score. Higher scores indicate stronger semantic similarity between the query and the record content.
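To make the response structure concrete, the sketch below parses a hand-written JSON body into typed records. The field names (`records`, `text`, `score`, `source`, `meta`) are assumptions chosen to mirror the description above, not the confirmed response schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SearchRecord:
    text: str
    score: float
    source: str = ""
    meta: dict = field(default_factory=dict)


def parse_search_response(raw_json):
    """Convert a JSON response body into SearchRecord objects."""
    payload = json.loads(raw_json)
    return [
        SearchRecord(
            text=item["text"],
            score=float(item["score"]),
            source=item.get("source", ""),
            meta=item.get("meta", {}),
        )
        for item in payload.get("records", [])
    ]


# Sample body written by hand for illustration; not real service output.
sample = """
{
  "records": [
    {"text": "Open Settings and choose Reset password.", "score": 0.91,
     "source": "help-center.md", "meta": {"category": "account"}},
    {"text": "Passwords must be at least 12 characters.", "score": 0.74,
     "source": "security-policy.md", "meta": {"category": "security"}}
  ]
}
"""

records = parse_search_response(sample)
for record in records:
    print(f"{record.score:.2f}  {record.source}: {record.text}")
```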
Understanding Relevance Scores
Each returned record includes a relevance score that indicates how well it matches your search query. Scores typically range from 0 to 1, with higher values representing stronger semantic similarity. You can use these scores to filter results or present only the most relevant records to users.
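Filtering by score can be sketched as a simple threshold check. The record shape and the cutoff value here are assumptions; a sensible threshold depends on your data and is best chosen by inspecting real results.

```python
def filter_by_score(records, min_score=0.75):
    """Keep records at or above min_score, preserving their order."""
    return [record for record in records if record["score"] >= min_score]


# Hand-written results in descending score order (assumed shape).
results = [
    {"text": "Open Settings and choose Reset password.", "score": 0.91},
    {"text": "Passwords must be at least 12 characters.", "score": 0.74},
    {"text": "Our office is closed on Sundays.", "score": 0.12},
]

strong_matches = filter_by_score(results, min_score=0.7)
print([record["score"] for record in strong_matches])  # [0.91, 0.74]
```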
The scoring algorithm considers semantic meaning rather than simple keyword matching, so records with different wording but similar concepts will score well. This allows the search to understand synonyms, related concepts, and contextual relevance that keyword searches would miss.
Advanced Filtering
In addition to the basic search query, you can apply filters to narrow results based on metadata fields, source information, or other record attributes. This is useful when you need to search within specific categories, filter by document type, or restrict results to particular sources.
Filters allow you to combine semantic search with structured metadata queries, giving you precise control over which records are considered while still leveraging semantic relevance ranking.
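The interaction between filters and ranking can be sketched locally: the metadata filter decides which records are considered, and the relevance score orders whatever remains. The helper names, the `meta` key, and the exact-match filter semantics are assumptions for illustration; the service's filter syntax may differ.

```python
def matches_filter(record, required_meta):
    """True if every key in required_meta has the same value in record['meta']."""
    meta = record.get("meta", {})
    return all(meta.get(key) == value for key, value in required_meta.items())


def filtered_search(scored_records, required_meta):
    """Drop records that fail the metadata filter, then rank the rest by score."""
    candidates = [r for r in scored_records if matches_filter(r, required_meta)]
    return sorted(candidates, key=lambda r: r["score"], reverse=True)


# Hand-written records with precomputed scores (assumed shape and values).
scored_records = [
    {"text": "Reset your password from Settings.", "score": 0.88,
     "meta": {"category": "account", "type": "faq"}},
    {"text": "Password policy for administrators.", "score": 0.93,
     "meta": {"category": "security", "type": "policy"}},
    {"text": "Change the email on your account.", "score": 0.61,
     "meta": {"category": "account", "type": "faq"}},
]

hits = filtered_search(scored_records, {"category": "account"})
print([hit["text"] for hit in hits])
```

The highest-scoring record overall is excluded because it falls outside the `account` category, which shows how a filter takes precedence over relevance.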
Search Performance and Limits
Dataset search is optimized for speed even with large knowledge bases containing thousands of records. Search queries typically complete in milliseconds, making them suitable for real-time bot interactions where users expect immediate responses.
By default, searches return only the most relevant records. You can adjust the number of results to suit your use case: fewer results for focused answers, more when you need comprehensive context or want to present multiple options to users.
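Limiting the result count amounts to a top-k selection over scored records. The sketch below does this client-side with the standard library; the `top_k` helper and the record shape are assumptions, and the service's own limit parameter is not shown because its name is not given here.

```python
import heapq


def top_k(records, k):
    """Return the k highest-scoring records, best first."""
    return heapq.nlargest(k, records, key=lambda record: record["score"])


# Hand-written scored records in arbitrary order (assumed shape).
scored = [
    {"text": "Our office is closed on Sundays.", "score": 0.12},
    {"text": "Open Settings and choose Reset password.", "score": 0.91},
    {"text": "Passwords must be at least 12 characters.", "score": 0.74},
    {"text": "Contact support to unlock your account.", "score": 0.66},
]

focused = top_k(scored, 1)   # a single best match for a direct answer
broad = top_k(scored, 3)     # more context for a comprehensive reply
print([record["score"] for record in focused])
print([record["score"] for record in broad])
```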
Important: You can only search datasets you have permission to access: datasets you own directly, datasets shared with your organization, and public datasets. The search endpoint verifies permissions before processing your query.