Dataset Records
Dataset records are the fundamental building blocks of your knowledge base, representing individual pieces of information that agents can retrieve and use to provide accurate, contextual responses. Each record contains text content that can be searched, retrieved, and referenced during conversations, making records essential for building intelligent applications that leverage stored knowledge.
Records support various types of content including documentation, FAQs, product information, support articles, or any structured text data. The system automatically processes and indexes each record, enabling efficient semantic search and retrieval when agents need relevant information to answer user queries.
Creating Dataset Records
Creating a dataset is the first step in organizing information, but populating it with records is what makes it truly useful. Each record you create becomes immediately available for retrieval by agents and applications connected to the dataset.
To create a record, you need to provide the text content and optionally specify a source reference and metadata. The text parameter contains the actual content that will be indexed and retrieved, while source helps you track where the information came from (such as a URL, document name, or section identifier). The meta field allows you to attach custom metadata for organizational or filtering purposes.
The API returns the unique record ID, which you can use for subsequent operations such as updating or deleting the record. Records are immediately indexed and available for search after creation.
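The sketch below shows what such a request might look like using TypeScript's fetch API. The endpoint path, base URL, and the CHATBOTKIT_API_TOKEN environment variable are illustrative assumptions rather than a definitive implementation; consult the API reference for the exact values.

```typescript
// Minimal sketch of creating a dataset record (endpoint path and auth scheme
// are assumptions for illustration; check the API reference for exact values).
const datasetId = 'dts_abc123xyz' // example dataset identifier

const response = await fetch(
  `https://api.chatbotkit.com/v1/dataset/${datasetId}/record/create`,
  {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.CHATBOTKIT_API_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      text: 'Our support hours are 9am to 5pm, Monday through Friday.',
      source: 'support-handbook.md#hours', // optional source reference
      meta: { category: 'support' }, // optional custom metadata
    }),
  }
)

const { id } = await response.json() // e.g. { id: 'rec_def456ghi' }
```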
Important: Record content is stored in the vector store specified when the dataset was created. Some vector stores have size limits per record, so you may need to split very large documents into multiple records.
Deleting Dataset Records
Deleting individual records from a dataset allows you to remove specific pieces of information from your knowledge base without affecting other records or files. This fine-grained control is essential for managing dataset content quality, removing outdated information, or handling data privacy requests where specific content needs to be permanently removed.
Each record in a dataset represents a discrete piece of information, typically a text chunk extracted from a file or manually created through the API. When you delete a record, you permanently remove not only the text content but also its associated embeddings, metadata, and any indexing that makes it searchable. The record becomes immediately unavailable for semantic search and bot interactions.
Record deletion is useful in several scenarios: removing inaccurate information that was extracted incorrectly from a source file, purging sensitive data that should no longer be accessible, cleaning up duplicate or redundant records that clutter search results, or maintaining dataset quality by removing low-value content that doesn't contribute to meaningful bot responses.
Performing Record Deletion
To delete a specific record, you need both the dataset ID and the record ID. The operation is straightforward and executes immediately:
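A minimal sketch of the request in TypeScript, assuming an endpoint path of the form shown below (the exact path and auth header may differ; see the API reference):

```typescript
// Delete a single record from a dataset. The endpoint path is an assumption
// for illustration; the operation itself needs no request parameters.
const datasetId = 'dts_abc123xyz'
const recordId = 'rec_def456ghi'

await fetch(
  `https://api.chatbotkit.com/v1/dataset/${datasetId}/record/${recordId}/delete`,
  {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.CHATBOTKIT_API_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({}), // empty body: no additional parameters required
  }
)
```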
The request body can be empty as no additional parameters are required. The operation verifies that both the dataset and record exist and that you have permission to delete the record before proceeding with the deletion.
Important Considerations
Permanence: Deleting a dataset record is a permanent action that cannot be undone. Once deleted, the record and its embeddings are removed from storage and cannot be recovered. If you need the information again, you'll need to recreate the record manually or reprocess the source file.
File Relationships: Deleting a record doesn't affect the source file if the record was created from a file attachment. The file remains attached to the dataset, but the specific record extracted from it is permanently removed. Other records from the same file remain intact.
Search Impact: After deletion, the record immediately stops appearing in search results and semantic queries. Bots connected to the dataset will no longer be able to reference this information in their responses.
Authorization: You can only delete records from datasets that belong to your account. The operation will fail if you attempt to delete records from datasets you don't own or don't have appropriate permissions to modify.
Exporting Dataset Records
Exporting dataset records allows you to extract all the knowledge base content from a dataset in structured formats suitable for backup, migration, analysis, or integration with external systems. The export functionality provides multiple format options to accommodate different use cases and downstream processing requirements.
Dataset exports are particularly valuable for several scenarios: creating backups of your knowledge bases before making significant changes, migrating content between different ChatBotKit accounts or environments, analyzing record distributions and content patterns for quality assurance, sharing knowledge base content with team members or external collaborators, and integrating with data processing pipelines or analytics tools.
The export operation streams records progressively, making it efficient even for large datasets containing thousands of records. Rather than loading all records into memory at once, the API delivers them incrementally, reducing memory overhead and allowing you to process records as they arrive.
JSON Format Export
JSON format provides a structured, human-readable export that's ideal for backup purposes and easy to import into other systems. The export includes all record fields including IDs, text content, source information, metadata, and timestamps:
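A sketch of what a JSON export request might look like in TypeScript; the export path and format query parameter are assumptions for illustration, so check the API reference for the exact endpoint:

```typescript
// Export all records as a single JSON document (endpoint path and `format`
// parameter are assumptions for illustration).
const datasetId = 'dts_abc123xyz'

const response = await fetch(
  `https://api.chatbotkit.com/v1/dataset/${datasetId}/record/export?format=json`,
  { headers: { Authorization: `Bearer ${process.env.CHATBOTKIT_API_TOKEN}` } }
)

const records = await response.json() // array of complete record objects
console.log(`Exported ${records.length} records`)
```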
The response contains an array of record objects with complete field information. This format is best when you need to inspect the data manually, store it in document databases, or process it with JavaScript-based tools.
JSON Lines (JSONL) Format Export
JSONL format provides each record as a separate JSON object on its own line, making it ideal for streaming processing and large-scale data operations. This format is particularly useful when working with big data tools, streaming processors, or when you need to process records one at a time without loading the entire dataset:
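As a sketch, a JSONL export can be parsed one line at a time; the export path and format parameter below are assumptions for illustration:

```typescript
// Export records as JSONL and process them line by line (endpoint path and
// `format` parameter are assumptions for illustration).
const datasetId = 'dts_abc123xyz'

const response = await fetch(
  `https://api.chatbotkit.com/v1/dataset/${datasetId}/record/export?format=jsonl`,
  { headers: { Authorization: `Bearer ${process.env.CHATBOTKIT_API_TOKEN}` } }
)

// Each non-empty line is one complete JSON record. For very large datasets
// you could read response.body incrementally instead of buffering the text.
const text = await response.text()
for (const line of text.split('\n')) {
  if (!line.trim()) continue
  const record = JSON.parse(line)
  console.log(record.id) // process each record as it is parsed
}
```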
Each line in the response is a complete, valid JSON object representing one record. This format is preferred for batch processing, data pipeline integration, and scenarios where memory efficiency is important. Many data processing tools like Apache Spark, BigQuery, and command-line utilities work well with JSONL.
CSV Format Export
CSV format produces a spreadsheet-compatible export that can be opened in Excel, Google Sheets, or any spreadsheet application. This format is ideal for data analysis, reporting, and scenarios where non-technical stakeholders need to review or work with the dataset content:
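A sketch of downloading the CSV export for spreadsheet review; the export path and format parameter are assumptions for illustration:

```typescript
// Download a CSV export and save it to a local file.
import { writeFile } from 'node:fs/promises'

const datasetId = 'dts_abc123xyz'

const response = await fetch(
  `https://api.chatbotkit.com/v1/dataset/${datasetId}/record/export?format=csv`,
  { headers: { Authorization: `Bearer ${process.env.CHATBOTKIT_API_TOKEN}` } }
)

await writeFile('dataset-records.csv', await response.text())
```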
The CSV export includes columns for all standard record fields. Metadata fields are serialized to YAML format within their column for readability and compatibility. This makes it easy to review record content, generate reports, or perform analyses using spreadsheet tools.
Pagination and Filtering
Exports support pagination parameters allowing you to retrieve records in batches if needed for incremental processing. You can use cursor-based pagination to fetch records in chunks, control the number of records per page, and apply metadata filters to export only specific subsets of your dataset.
This is particularly useful when you want to export only certain categories of records, records from specific sources, or records matching particular metadata criteria. Combining export with filtering gives you precise control over what content gets extracted.
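As a sketch, a filtered and paginated export request might look like the following; the cursor, take, and metadata filter parameter names are assumptions for illustration rather than documented values:

```typescript
// Export a single filtered page of records. The `cursor`, `take`, and metadata
// filter parameters shown here are hypothetical; consult the API reference for
// the supported query parameters.
const datasetId = 'dts_abc123xyz'

const params = new URLSearchParams({
  format: 'jsonl',
  take: '500', // records per page (assumed parameter name)
  cursor: 'rec_def456ghi', // resume after a previously seen record (assumed)
})
params.set('meta[category]', 'support') // hypothetical metadata filter

const response = await fetch(
  `https://api.chatbotkit.com/v1/dataset/${datasetId}/record/export?${params}`,
  { headers: { Authorization: `Bearer ${process.env.CHATBOTKIT_API_TOKEN}` } }
)
```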
Important: You must own the dataset to export its records. Export operations are not available for shared or public datasets unless you're the owner. The export includes only the record data stored in the dataset, not the underlying file contents or embeddings.
Fetching a Specific Dataset Record
Retrieving detailed information about an individual record allows you to access its complete content, source information, metadata, and indexing details. This operation is essential for verifying record content, debugging search behavior, auditing data accuracy, or displaying record information in administrative interfaces.
When you fetch a record, you receive the full record object including the original text content, any source attribution, custom metadata fields, and system-generated information like creation and update timestamps. This comprehensive view helps you understand exactly what information is stored and how it's being used in search operations.
To retrieve a specific record by its ID, send a GET request:
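For example, using TypeScript's fetch API (the endpoint path is an assumption for illustration; see the API reference for the exact URL):

```typescript
// Fetch a single record by its ID (endpoint path is an assumption).
const datasetId = 'dts_abc123xyz'
const recordId = 'rec_def456ghi'

const response = await fetch(
  `https://api.chatbotkit.com/v1/dataset/${datasetId}/record/${recordId}/fetch`,
  { headers: { Authorization: `Bearer ${process.env.CHATBOTKIT_API_TOKEN}` } }
)

const record = await response.json() // complete record object
```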
Replace {datasetId} with your dataset identifier (e.g., dts_abc123xyz) and {recordId} with the specific record identifier (e.g., rec_def456ghi).
Response Structure
The response includes the complete record data:
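The sketch below captures the record's shape as a TypeScript interface, based on the field explanations that follow; exact value encodings (for example, how timestamps are represented) may differ:

```typescript
// Illustrative record shape derived from the field explanations below.
interface DatasetRecord {
  id: string // unique record identifier, e.g. 'rec_def456ghi'
  text: string // the indexed content
  source?: string // optional attribution, e.g. a URL or document name
  meta?: Record<string, unknown> // custom metadata fields
  createdAt: number // creation timestamp (encoding is an assumption)
  updatedAt: number // last modification timestamp (encoding is an assumption)
}
```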
Field Explanations
- id: Unique identifier for this record
- text: The actual content that will be searched and retrieved
- source: Optional attribution indicating where this information came from
- meta: Custom metadata fields for organization and filtering
- createdAt: Timestamp when the record was initially created
- updatedAt: Timestamp of the most recent modification
Common Use Cases
Content Verification: Review the actual text content to ensure accuracy and completeness of information stored in your knowledge base.
Search Debugging: When search results seem incorrect, fetch the actual records being returned to understand what information the AI is working with.
Data Auditing: Verify source attribution and metadata to ensure proper documentation of information provenance.
UI Display: Show detailed record information in administrative dashboards or content management interfaces.
Quality Assurance: Review records systematically to maintain high-quality knowledge base content.
Authorization
You can only fetch records from datasets that belong to your account. Attempting to access records from other users' datasets will result in an authorization error.
Performance Note
Fetching individual records is a lightweight operation suitable for frequent access. For bulk operations or comprehensive dataset reviews, consider using the list endpoint with appropriate filters instead.
Updating a Dataset Record
Modifying existing records allows you to keep your knowledge base current, correct inaccuracies, refine content for better search results, and update metadata as your organizational needs evolve. Record updates automatically trigger re-indexing, ensuring that the new content is immediately searchable and will be reflected in future query results.
When you update a record, you can modify its text content, change source attribution, or update custom metadata fields. The update operation preserves the record's unique identifier while applying your changes and updating the modification timestamp. This maintains referential integrity while allowing content evolution.
The ability to update records incrementally is essential for maintaining knowledge base quality without disrupting service. Whether you're fixing typos, expanding explanations, updating product information, or refining categorization metadata, record updates provide the flexibility needed for continuous improvement of your AI's knowledge foundation.
To update an existing record, send a POST request with the fields you want to modify:
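A minimal sketch in TypeScript, assuming an update endpoint of the form shown below (check the API reference for the exact path):

```typescript
// Update only the fields that need to change (endpoint path is an assumption).
const datasetId = 'dts_abc123xyz'
const recordId = 'rec_def456ghi'

const response = await fetch(
  `https://api.chatbotkit.com/v1/dataset/${datasetId}/record/${recordId}/update`,
  {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.CHATBOTKIT_API_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      text: 'Our support hours are 8am to 6pm, Monday through Saturday.',
      meta: { category: 'support', reviewed: true },
      // omitted fields (e.g. source) keep their current values
    }),
  }
)
```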
Replace {datasetId} with your dataset identifier and {recordId} with the specific record you want to update. You only need to include the fields you want to change; omitted fields will retain their current values.
Updatable Fields
- text: The primary content that will be searched and retrieved
- source: Attribution indicating where this information originated
- meta: Custom metadata object for organization and filtering
Response
Upon successful update, the API returns the record ID:
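Expressed as a TypeScript shape, the response might look like this (an illustrative assumption based on the description above):

```typescript
// Illustrative response shape for a successful update.
interface UpdateRecordResponse {
  id: string // identifier of the updated record, e.g. 'rec_def456ghi'
}
```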
Automatic Re-indexing
When you update a record's text content, the system automatically:
- Regenerates embeddings: Creates new vector representations of the updated text for semantic search
- Updates search indexes: Ensures the new content is immediately searchable
- Maintains record identity: Preserves the record ID and relationships
- Updates timestamps: Records when the modification occurred
This automatic re-indexing means your changes take effect immediately without requiring manual reprocessing or service restarts.
Common Update Scenarios
Content Corrections: Fix typos, grammatical errors, or factual inaccuracies discovered through use or review.
Information Updates: Refresh content when underlying facts change, such as policy updates, pricing changes, or product specifications.
Search Optimization: Refine text to improve search relevance by adding keywords, clarifying terminology, or restructuring content.
Metadata Enhancement: Add or update categorization metadata to improve filtering and organization without changing the core content.
Source Attribution: Update source information when content is verified against newer documentation or different authoritative sources.
Best Practices
- Preserve context: When updating text, maintain enough context for the record to be understood independently
- Update sources: Keep source attribution current to maintain content provenance
- Use metadata effectively: Leverage metadata updates for versioning, review tracking, and quality management
- Test search impact: After significant content updates, verify that search results still return relevant information
- Batch similar updates: If updating multiple related records, consider doing so in sequence to maintain consistency
Authorization
You can only update records in datasets that belong to your account. Attempting to modify records in other users' datasets will result in an authorization error.
Listing Records
Listing records within a dataset allows you to retrieve and browse all the individual entries that make up your knowledge base. This operation is essential for reviewing dataset content, performing audits, or implementing custom search and filtering logic in your application.
The list endpoint returns all records in the dataset with pagination support, allowing you to efficiently retrieve large datasets in manageable chunks. Each record includes its unique identifier, text content, source reference, metadata, and timestamps for creation and last update.
You can use query parameters to implement pagination using cursor-based navigation, which is more efficient than offset-based pagination for large datasets. The response includes a cursor that you can use to fetch the next page of results.
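A sketch of such a pagination loop in TypeScript; the list endpoint path, the cursor and take parameter names, and the items response field are assumptions for illustration:

```typescript
// Page through every record in a dataset using cursor-based navigation.
// Endpoint path, `cursor`/`take` parameters, and the response shape are
// assumptions for illustration; consult the API reference for exact details.
const datasetId = 'dts_abc123xyz'
const records: Array<{ id: string; text: string }> = []
let cursor: string | undefined

do {
  const params = new URLSearchParams({ take: '100' })
  if (cursor) params.set('cursor', cursor)

  const response = await fetch(
    `https://api.chatbotkit.com/v1/dataset/${datasetId}/record/list?${params}`,
    { headers: { Authorization: `Bearer ${process.env.CHATBOTKIT_API_TOKEN}` } }
  )

  const page = await response.json() // assumed shape: { items: [...] }
  records.push(...page.items)

  // Advance the cursor to the last record on the page (assumed convention).
  cursor = page.items.length ? page.items[page.items.length - 1].id : undefined
} while (cursor)
```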
The API returns an array of record objects, each containing the full record data including text content, source information, and any associated metadata. You can use this data to display records in your application's interface, perform client-side filtering, or synchronize with external systems.
Performance Note: For datasets with thousands of records, consider using pagination parameters to limit the number of records returned per request. This improves response time and reduces memory usage in your application.