How This Helps
Duplicate detection identifies redundant images or frames across your dataset. Use it to streamline cleanup, reduce storage, and improve data quality before training or export.
Prerequisites
A dataset in READY status.
A dataset ID (visible in the browser URL when viewing a dataset: https://app.visual-layer.com/dataset/<dataset_id>/data).
A valid JWT token. See Authentication.
Find Duplicates Using VQL
The preferred approach uses the Visual Query Language (VQL) filter on the Explore endpoint. The duplicates filter groups visually similar media into duplicate clusters.
GET /api/v1/explore/{dataset_id}?vql=[...]&entity_type=IMAGES&threshold=0
Authorization: Bearer <jwt>
VQL Duplicates Filter
Pass a duplicates filter in the vql array:
[{"duplicates": {"op": "duplicates", "value": 0.95}}]
The value field sets the similarity threshold (0.0–1.0). A value of 0.95 returns only clusters where images are at least 95% similar to each other. Lower values return more permissive groupings.
Example
curl -H "Authorization: Bearer <jwt>" \
"https://app.visual-layer.com/api/v1/explore/<dataset_id>?vql=%5B%7B%22duplicates%22%3A%7B%22op%22%3A%22duplicates%22%2C%22value%22%3A0.95%7D%7D%5D&entity_type=IMAGES&threshold=0"
Decoded VQL:
[{"duplicates": {"op": "duplicates", "value": 0.95}}]
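The percent-encoded vql value in the curl example can be reproduced from the decoded filter. The following is a minimal sketch using only the Python standard library; the helper name encode_vql is hypothetical, and compact JSON separators are chosen here so that the output matches the example string.

```python
import json
from urllib.parse import quote


def encode_vql(filters: list) -> str:
    """Serialize a VQL filter list to compact JSON and percent-encode it."""
    # Compact separators avoid encoding extra spaces into the query string.
    compact = json.dumps(filters, separators=(",", ":"))
    # safe="" percent-encodes every reserved character, including "/".
    return quote(compact, safe="")


duplicates_filter = [{"duplicates": {"op": "duplicates", "value": 0.95}}]
encoded = encode_vql(duplicates_filter)
print(encoded)
```

When the request is sent with the requests library, passing the JSON string through params performs this encoding automatically, so manual encoding is mainly useful for curl or hand-built URLs.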
Response
{
  "clusters": [
    {
      "cluster_id": "d0470097-0c77-4a9c-9edf-289680df7f71",
      "type": "IMAGES",
      "n_images": 3,
      "similarity_threshold": "0",
      "relevance_score": null,
      "previews": [
        {
          "type": "IMAGE",
          "media_id": "300dad2c-1234-11f1-8483-5a879df30de4",
          "media_uri": "https://cdn.example.com/.../image.jpg",
          "media_thumb_uri": "https://cdn.example.com/.../thumb.webp",
          "file_name": "car_001.jpg",
          "width": 1920,
          "height": 1080
        }
      ],
      "labels": null,
      "user_tags": null
    }
  ],
  "metadata": {
    "used_duckdb": true
  }
}
Each cluster in the response contains a group of near-duplicate images. The previews array shows representative images from the group.
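As an illustration of reading this structure, the sketch below reduces a response to one summary row per cluster. The function name summarize_clusters and the shape of the summary are hypothetical; the field names come from the sample response above, trimmed here to the fields the function reads. Fields that may be null, such as previews, are treated as empty lists.

```python
def summarize_clusters(response: dict) -> list[dict]:
    """Return cluster ID, image count, and preview file names per cluster."""
    summaries = []
    # "or []" guards against a missing key as well as an explicit null.
    for cluster in response.get("clusters") or []:
        previews = cluster.get("previews") or []
        summaries.append(
            {
                "cluster_id": cluster.get("cluster_id"),
                "n_images": cluster.get("n_images", 0),
                "preview_files": [p.get("file_name") for p in previews],
            }
        )
    return summaries


# Trimmed copy of the sample response shown above.
sample_response = {
    "clusters": [
        {
            "cluster_id": "d0470097-0c77-4a9c-9edf-289680df7f71",
            "type": "IMAGES",
            "n_images": 3,
            "previews": [{"type": "IMAGE", "file_name": "car_001.jpg"}],
            "labels": None,
            "user_tags": None,
        }
    ],
    "metadata": {"used_duckdb": True},
}

summary = summarize_clusters(sample_response)
print(summary)
```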
Find Duplicates Using duplicate_threshold
You can also use the duplicate_threshold query parameter directly on the Explore endpoint as a simpler alternative to VQL.
curl -H "Authorization: Bearer <jwt>" \
"https://app.visual-layer.com/api/v1/explore/<dataset_id>?duplicate_threshold=0.95&entity_type=IMAGES&threshold=0"
| Parameter | Type | Description |
| --- | --- | --- |
| duplicate_threshold | float | Similarity cutoff (0.0–1.0). Returns only clusters containing near-duplicates at this threshold or higher. |
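The same request URL can be assembled in Python. This is a sketch under stated assumptions: the helper name duplicate_threshold_url is hypothetical, and the client-side range check simply mirrors the documented 0.0–1.0 range rather than any validation the API is known to perform.

```python
from urllib.parse import urlencode

BASE_URL = "https://app.visual-layer.com"


def duplicate_threshold_url(dataset_id: str, duplicate_threshold: float) -> str:
    """Build an Explore URL that uses the duplicate_threshold parameter."""
    # Assumption: reject out-of-range values locally instead of waiting for a 422.
    if not 0.0 <= duplicate_threshold <= 1.0:
        raise ValueError("duplicate_threshold must be between 0.0 and 1.0")
    query = urlencode(
        {
            "duplicate_threshold": duplicate_threshold,
            "entity_type": "IMAGES",
            "threshold": 0,
        }
    )
    return f"{BASE_URL}/api/v1/explore/{dataset_id}?{query}"


# "abc123" is a placeholder dataset ID used only for illustration.
url = duplicate_threshold_url("abc123", 0.95)
print(url)
```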
Filter by Uniqueness
To find the most unique images (the opposite of duplicates), use the uniqueness VQL filter.
[{"uniqueness": {"op": "uniqueness", "value": 0.8}}]
This returns images with a uniqueness score above the specified threshold. Higher values mean more unique content.
curl -H "Authorization: Bearer <jwt>" \
"https://app.visual-layer.com/api/v1/explore/<dataset_id>?vql=%5B%7B%22uniqueness%22%3A%7B%22op%22%3A%22uniqueness%22%2C%22value%22%3A0.8%7D%7D%5D&entity_type=IMAGES&threshold=0"
Python Example
The following example retrieves one page of duplicate clusters (page 0 by default) and prints a summary.
import json

import requests

VL_BASE_URL = "https://app.visual-layer.com"
JWT_TOKEN = "<your-jwt-token>"
DATASET_ID = "<your-dataset-id>"

headers = {"Authorization": f"Bearer {JWT_TOKEN}"}


def find_duplicates(similarity_threshold: float = 0.95, page: int = 0):
    # Serialize the duplicates filter; requests URL-encodes it in the query string.
    vql = json.dumps([{"duplicates": {"op": "duplicates", "value": similarity_threshold}}])
    resp = requests.get(
        f"{VL_BASE_URL}/api/v1/explore/{DATASET_ID}",
        headers=headers,
        params={
            "vql": vql,
            "entity_type": "IMAGES",
            "threshold": 0,
            "page_number": page,
        },
    )
    # Raise an exception on 4xx/5xx responses instead of parsing an error body.
    resp.raise_for_status()
    return resp.json()


results = find_duplicates(similarity_threshold=0.95)
clusters = results.get("clusters", [])
print(f"Found {len(clusters)} duplicate cluster(s)")

total_duplicates = sum(c.get("n_images", 0) for c in clusters)
print(f"Total duplicate images: {total_duplicates}")

for cluster in clusters:
    n = cluster.get("n_images")
    cid = cluster.get("cluster_id", "")[:8]
    print(f"  Cluster {cid}... - {n} near-duplicate images")
    # previews may be null in the response, so fall back to an empty list.
    for preview in (cluster.get("previews") or [])[:3]:
        print(f"    {preview['file_name']}")
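Because the request takes a page_number parameter, collecting clusters beyond the first page requires a loop. The sketch below shows one way to do that, with loud assumptions: the helper iter_clusters is hypothetical, and it assumes that a page with no clusters marks the end of the results, which this page does not confirm. The fetch function is passed in as a callable so the loop can be exercised offline with stand-in data.

```python
from typing import Callable, Iterator


def iter_clusters(fetch_page: Callable[[int], dict], max_pages: int = 100) -> Iterator[dict]:
    """Yield clusters page by page until an empty page or max_pages is reached."""
    for page in range(max_pages):
        clusters = fetch_page(page).get("clusters") or []
        if not clusters:
            # Assumption: an empty page means there are no further results.
            break
        yield from clusters


# Offline stand-in for the API: two pages of clusters, then an empty page.
fake_pages = [
    {"clusters": [{"cluster_id": "a", "n_images": 3}, {"cluster_id": "b", "n_images": 2}]},
    {"clusters": [{"cluster_id": "c", "n_images": 4}]},
    {"clusters": []},
]


def fake_fetch(page: int) -> dict:
    return fake_pages[page] if page < len(fake_pages) else {"clusters": []}


all_clusters = list(iter_clusters(fake_fetch))
print(len(all_clusters), sum(c["n_images"] for c in all_clusters))
```

Against the live API, the callable could wrap the function from the example, for instance iter_clusters(lambda p: find_duplicates(0.95, page=p)). The max_pages cap is a safety limit in case the end-of-results assumption does not hold.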
Response Codes
See Error Handling for the error response format and Python handling patterns.
| HTTP Code | Meaning |
| --- | --- |
| 200 | Results returned successfully. |
| 401 | Unauthorized. Check your JWT token. |
| 404 | Dataset not found. |
| 422 | Invalid query parameters. Check VQL syntax or threshold value. |
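For quick troubleshooting in scripts, the codes in the table can be mapped to short hints. This is only an illustrative sketch: the names STATUS_HINTS and explain_status are hypothetical, the messages paraphrase the table, and the error response body format is described on the Error Handling page rather than here.

```python
# Hints paraphrased from the response code table; not an official mapping.
STATUS_HINTS = {
    200: "Results returned successfully.",
    401: "Unauthorized. Check your JWT token.",
    404: "Dataset not found.",
    422: "Invalid query parameters. Check VQL syntax or threshold value.",
}


def explain_status(status_code: int) -> str:
    """Return a troubleshooting hint for a documented status code."""
    return STATUS_HINTS.get(status_code, f"Unexpected status code {status_code}.")


print(explain_status(422))
```

With the requests library, the status is available as resp.status_code, or as exc.response.status_code on the requests.HTTPError raised by raise_for_status().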
Semantic Search: Search using natural language text queries with VQL.
Export a Dataset: Export duplicate clusters for downstream deduplication.