Dataset Records
Dataset records are the building blocks of your knowledge base: individual pieces of information that agents retrieve to give accurate, contextual responses. Each record contains text content that can be searched, retrieved, and referenced during conversations.
Records support various types of content including documentation, FAQs, product information, support articles, or any structured text data. The system automatically processes and indexes each record, enabling efficient semantic search and retrieval when agents need relevant information to answer user queries.
Creating Dataset Records
Creating a dataset is the first step in organizing information, but populating it with records is what makes it truly useful. Each record you create becomes immediately available for retrieval by agents and applications connected to the dataset.
To create a record, you need to provide the text content and optionally
specify a source reference and metadata. The text parameter contains the
actual content that will be indexed and retrieved, while source helps you
track where the information came from (such as a URL, document name, or
section identifier). The meta field allows you to attach custom metadata
for organizational or filtering purposes.
The API returns the unique record ID, which you can use for subsequent operations such as updating or deleting the record. Records are immediately indexed and available for search after creation.
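As a minimal sketch of the request described above, the following Python builds (but does not send) a create call. The base URL, the endpoint path, the POST method, and the bearer-token header are assumptions made for illustration; only the text, source, and meta fields come from this page, so check your API reference for the real values.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the one from your API reference.
API_BASE = "https://api.example.com/v1"


def build_create_record_request(dataset_id, token, text, source=None, meta=None):
    """Build (but do not send) a request that creates one dataset record."""
    payload = {"text": text}
    # source and meta are optional, so include them only when provided.
    if source is not None:
        payload["source"] = source
    if meta is not None:
        payload["meta"] = meta
    return urllib.request.Request(
        url=f"{API_BASE}/dataset/{dataset_id}/record/create",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",  # assumed method
    )


request = build_create_record_request(
    "dts_abc123xyz",
    token="YOUR_API_TOKEN",
    text="Refunds are processed within 5 business days.",
    source="https://example.com/policies/refunds",
    meta={"category": "billing"},
)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```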
Important: Record content is stored in the vector store specified when the dataset was created. Some vector stores have size limits per record, so you may need to split very large documents into multiple records.
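One way to split a large document before creating records is sketched below. It assumes the limit can be approximated as a character count (real vector stores may measure tokens or bytes instead), and the split_text and to_record_payloads helpers are hypothetical names, not part of the API.

```python
def split_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-wrap any single paragraph that exceeds the limit on its own.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def to_record_payloads(text, source, max_chars=1000):
    """Turn one document into several record payloads that share a source."""
    chunks = split_text(text, max_chars)
    return [
        {"text": chunk, "source": source, "meta": {"part": i, "parts": len(chunks)}}
        for i, chunk in enumerate(chunks, start=1)
    ]


document = "a" * 30 + "\n\n" + "b" * 30 + "\n\n" + "c" * 100
payloads = to_record_payloads(document, source="handbook.md", max_chars=64)
```

Storing the part number in meta is a design choice rather than a requirement: it lets you reassemble or audit the original document later.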
Deleting a Dataset Record
You can delete a dataset record by sending a POST request to the following endpoint:
Warning: Deleting a dataset record is a permanent action and cannot be undone.
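Because deletion cannot be undone, it can help to require an explicit confirmation in client code. The sketch below builds the POST request without sending it; the base URL, the path, the empty JSON body, and the bearer-token header are assumptions, and the confirm flag is a hypothetical client-side safeguard rather than an API parameter.

```python
import urllib.request

# Hypothetical base URL; substitute the one from your API reference.
API_BASE = "https://api.example.com/v1"


def build_delete_record_request(dataset_id, record_id, token, confirm=False):
    """Build (but do not send) a POST request that deletes one record."""
    if not confirm:
        # Client-side guard only: the deletion itself is permanent.
        raise ValueError("pass confirm=True to acknowledge permanent deletion")
    return urllib.request.Request(
        url=f"{API_BASE}/dataset/{dataset_id}/record/{record_id}/delete",  # assumed path
        data=b"{}",  # assumed empty JSON body
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_delete_record_request(
    "dts_abc123xyz", "rec_def456ghi", "YOUR_API_TOKEN", confirm=True
)
```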
Fetching a Specific Dataset Record
Retrieving detailed information about an individual record allows you to access its complete content, source information, metadata, and indexing details. This operation is essential for verifying record content, debugging search behavior, auditing data accuracy, or displaying record information in administrative interfaces.
When you fetch a record, you receive the full record object including the original text content, any source attribution, custom metadata fields, and system-generated information like creation and update timestamps. This comprehensive view helps you understand exactly what information is stored and how it's being used in search operations.
To retrieve a specific record by its ID, send a GET request:
Replace {datasetId} with your dataset identifier (e.g., dts_abc123xyz)
and {recordId} with the specific record identifier (e.g., rec_def456ghi).
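A minimal sketch of the GET request follows. The base URL, the path layout, and the bearer-token header are assumptions for illustration; the identifiers are the example values used above.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical base URL; substitute the one from your API reference.
API_BASE = "https://api.example.com/v1"


def build_fetch_record_request(dataset_id, record_id, token):
    """Build (but do not send) a GET request for a single record."""
    # Percent-encode the identifiers so unexpected characters cannot alter the path.
    path = f"/dataset/{quote(dataset_id, safe='')}/record/{quote(record_id, safe='')}/fetch"
    return urllib.request.Request(
        url=API_BASE + path,  # assumed path layout
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="GET",
    )


request = build_fetch_record_request("dts_abc123xyz", "rec_def456ghi", "YOUR_API_TOKEN")
```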
Response Structure
The response includes the complete record data:
Field Explanations
- id: Unique identifier for this record
- text: The actual content that will be searched and retrieved
- source: Optional attribution indicating where this information came from
- meta: Custom metadata fields for organization and filtering
- createdAt: Timestamp when the record was initially created
- updatedAt: Timestamp of the most recent modification
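The fields listed above can be mapped onto a small typed object on the client side. This is a sketch under assumptions: the DatasetRecord class is a hypothetical helper, the sample values are invented, and the timestamp representation (shown here as epoch milliseconds) is a guess, so the code passes timestamps through unchanged.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DatasetRecord:
    id: str
    text: str
    source: Optional[str] = None
    meta: dict = field(default_factory=dict)
    created_at: Any = None  # passed through as received; format is not assumed
    updated_at: Any = None

    @classmethod
    def from_response(cls, body: dict) -> "DatasetRecord":
        """Map the documented response fields onto a typed object."""
        return cls(
            id=body["id"],
            text=body["text"],
            source=body.get("source"),  # optional attribution
            meta=body.get("meta") or {},
            created_at=body.get("createdAt"),
            updated_at=body.get("updatedAt"),
        )


# Illustrative response body; all values are made up.
sample = {
    "id": "rec_def456ghi",
    "text": "Refunds are processed within 5 business days.",
    "source": "https://example.com/policies/refunds",
    "meta": {"category": "billing"},
    "createdAt": 1700000000000,
    "updatedAt": 1700000500000,
}
record = DatasetRecord.from_response(sample)
```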
Common Use Cases
Content Verification: Review the actual text content to ensure accuracy and completeness of information stored in your knowledge base.
Search Debugging: When search results seem incorrect, fetch the actual records being returned to understand what information the AI is working with.
Data Auditing: Verify source attribution and metadata to ensure proper documentation of information provenance.
UI Display: Show detailed record information in administrative dashboards or content management interfaces.
Quality Assurance: Review records systematically to maintain high-quality knowledge base content.
Authorization
You can only fetch records from datasets that belong to your account. Attempting to access records from other users' datasets will result in an authorization error.
Performance Note
Fetching individual records is a lightweight operation suitable for frequent access. For bulk operations or comprehensive dataset reviews, consider using the list endpoint with appropriate filters instead.
Updating a Dataset Record
Modifying existing records allows you to keep your knowledge base current, correct inaccuracies, refine content for better search results, and update metadata as your organizational needs evolve. Record updates automatically trigger re-indexing, ensuring that the new content is immediately searchable and will be reflected in future query results.
When you update a record, you can modify its text content, change source attribution, or update custom metadata fields. The update operation preserves the record's unique identifier while applying your changes and updating the modification timestamp. This maintains referential integrity while allowing content evolution.
Incremental updates let you maintain knowledge base quality without disrupting service, whether you are fixing typos, expanding explanations, updating product information, or refining categorization metadata.
To update an existing record, send a POST request with the fields you want to modify:
Replace {datasetId} with your dataset identifier and {recordId} with the
specific record you want to update. You only need to include the fields you
want to change; omitted fields retain their current values.
Updatable Fields
- text: The primary content that will be searched and retrieved
- source: Attribution indicating where this information originated
- meta: Custom metadata object for organization and filtering
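The partial-update behavior can be sketched as a request builder that sends only the fields you pass. The base URL, the path, and the bearer-token header are assumptions; the _UNSET sentinel is a client-side convention used here to tell "not provided" apart from an explicit value.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the one from your API reference.
API_BASE = "https://api.example.com/v1"

# Sentinel that distinguishes an omitted argument from an explicit None.
_UNSET = object()


def build_update_record_request(dataset_id, record_id, token,
                                text=_UNSET, source=_UNSET, meta=_UNSET):
    """Build (but do not send) a POST request containing only the changed fields."""
    candidates = {"text": text, "source": source, "meta": meta}
    payload = {key: value for key, value in candidates.items() if value is not _UNSET}
    if not payload:
        raise ValueError("provide at least one of text, source, or meta")
    return urllib.request.Request(
        url=f"{API_BASE}/dataset/{dataset_id}/record/{record_id}/update",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Only meta is sent, so text and source keep their current values.
request = build_update_record_request(
    "dts_abc123xyz", "rec_def456ghi", "YOUR_API_TOKEN",
    meta={"reviewed": True},
)
```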
Response
Upon successful update, the API returns the record ID:
Automatic Re-indexing
When you update a record's text content, the system automatically:
- Regenerates embeddings: Creates new vector representations of the updated text for semantic search
- Updates search indexes: Ensures the new content is immediately searchable
- Maintains record identity: Preserves the record ID and relationships
- Updates timestamps: Records when the modification occurred
This automatic re-indexing means your changes take effect immediately without requiring manual reprocessing or service restarts.
Common Update Scenarios
Content Corrections: Fix typos, grammatical errors, or factual inaccuracies discovered through use or review.
Information Updates: Refresh content when underlying facts change, such as policy updates, pricing changes, or product specifications.
Search Optimization: Refine text to improve search relevance by adding keywords, clarifying terminology, or restructuring content.
Metadata Enhancement: Add or update categorization metadata to improve filtering and organization without changing the core content.
Source Attribution: Update source information when content is verified against newer documentation or different authoritative sources.
Best Practices
- Preserve context: When updating text, maintain enough context for the record to be understood independently
- Update sources: Keep source attribution current to maintain content provenance
- Use metadata effectively: Leverage metadata updates for versioning, review tracking, and quality management
- Test search impact: After significant content updates, verify that search results still return relevant information
- Batch similar updates: If updating multiple related records, apply the changes one after another in a single pass so the records stay consistent with each other
Authorization
You can only update records in datasets that belong to your account. Attempting to modify records in other users' datasets will result in an authorization error.
Exporting Records
You can export dataset records for backup or migration purposes in JSON, JSONL, or CSV format.
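To illustrate how the three formats differ, the sketch below serializes a list of record dictionaries on the client side. It is not the export endpoint itself, and the column order and the choice to JSON-encode meta inside the CSV are assumptions; the server's actual output may be laid out differently.

```python
import csv
import io
import json


def to_json(records):
    """One JSON array containing every record."""
    return json.dumps(records, indent=2)


def to_jsonl(records):
    """One JSON object per line, convenient for streaming large exports."""
    return "\n".join(json.dumps(record) for record in records)


def to_csv(records):
    """Flat rows; meta is JSON-encoded because CSV cells cannot nest."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "text", "source", "meta"])
    writer.writeheader()
    for record in records:
        writer.writerow({
            "id": record["id"],
            "text": record["text"],
            "source": record.get("source") or "",
            "meta": json.dumps(record.get("meta") or {}),
        })
    return buffer.getvalue()


# Illustrative records; all values are made up.
records = [
    {"id": "rec_1", "text": "First entry", "source": "a.md", "meta": {"lang": "en"}},
    {"id": "rec_2", "text": "Second, with a comma", "source": None, "meta": {}},
]
```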
Listing Records
Listing records within a dataset allows you to retrieve and browse all the individual entries that make up your knowledge base. This operation is essential for reviewing dataset content, performing audits, or implementing custom search and filtering logic in your application.
The list endpoint returns the dataset's records with pagination support, so you can retrieve large datasets in manageable chunks. Each record includes its unique identifier, text content, source reference, metadata, and timestamps for creation and last update.
Pagination is cursor-based, which is more efficient than offset-based pagination for large datasets. Each response includes a cursor; pass it as a query parameter on the next request to fetch the next page of results.
The API returns an array of record objects, each containing the full record data including text content, source information, and any associated metadata. You can use this data to display records in your application's interface, perform client-side filtering, or synchronize with external systems.
Performance Note: For datasets with thousands of records, consider using pagination parameters to limit the number of records returned per request. This improves response time and reduces memory usage in your application.
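Cursor-based pagination can be sketched as a generator that keeps requesting pages until no cursor comes back. The parameter names (cursor, limit) and the response keys (items, cursor) are assumptions for illustration; fetch_page stands in for whatever function performs the actual list request, and fake_fetch_page simulates it over an in-memory list.

```python
def iter_records(fetch_page, page_size=100):
    """Yield every record, following cursors until the last page."""
    cursor = None
    while True:
        page = fetch_page(cursor=cursor, limit=page_size)
        yield from page["items"]
        cursor = page.get("cursor")
        if not cursor:
            # A missing or empty cursor means there are no more pages.
            break


# In-memory stand-in for the list endpoint, used only to exercise the loop.
DATA = [{"id": f"rec_{n}", "text": f"Entry {n}"} for n in range(5)]


def fake_fetch_page(cursor=None, limit=100):
    start = int(cursor) if cursor else 0
    next_start = start + limit
    return {
        "items": DATA[start:next_start],
        "cursor": str(next_start) if next_start < len(DATA) else None,
    }


all_records = list(iter_records(fake_fetch_page, page_size=2))
```

Because the generator yields records as each page arrives, the caller never holds more than one page in memory unless it chooses to collect them.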