Synthyra

April 8th, 2025

Research

Translator - Broad protein annotation fast

Full research results of the Translator model.

  • Annotation Vocabulary

Coming soon

Use it now

Translator is available as a standalone public API endpoint, callable with your Synthyra API key:

import requests

api_key = "..."  # synthyra.com/settings?section=api-keys

resp = requests.post(
    "https://api.synthyra.com/v1/translator/run",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "sequences": ["MEV..."],
        "ids": ["random_seq"],
        "num_annotations": 32,
        "top_k": 3,
    },
    timeout=180,  # seconds
)
resp.raise_for_status()  # stop on HTTP errors before parsing the body

# Print each annotation returned for the first sequence in the request.
for ann in resp.json()["results"][0]["annotations"]:
    print(f"[{ann['aspect']}] {ann['name']} ({ann['confidence']})")

Pricing: $0.002 per sequence. See the API docs for the full inventory.
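As a rough budgeting aid, the per-sequence price above can be turned into a quick estimate. This is only a sketch: it assumes cost scales linearly with the number of sequences, with no minimums, tiers, or discounts, none of which are confirmed here. The function name and constant are illustrative and not part of the Synthyra API.

```python
from decimal import Decimal

# Price quoted above; Decimal avoids floating-point rounding in currency math.
PRICE_PER_SEQUENCE_USD = Decimal("0.002")


def estimate_cost_usd(num_sequences: int) -> Decimal:
    """Estimate the cost of annotating num_sequences (assumes linear pricing)."""
    if num_sequences < 0:
        raise ValueError("num_sequences must be non-negative")
    return PRICE_PER_SEQUENCE_USD * num_sequences


if __name__ == "__main__":
    print(f"Estimated cost for 1000 sequences: ${estimate_cost_usd(1000)}")
```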

Some teaser results:

Given an input protein sequence, Translator predicts a fixed number of protein annotations drawn from the Annotation Vocabulary.

The topk parameter (top_k in the API request) controls the number of annotations retrieved per "token."

The confidence parameter controls the minimum predicted confidence score for an annotation to be included in the output.
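The same kind of cutoff can also be applied after the fact to annotations you have already retrieved. The sketch below assumes each annotation is a dictionary with a numeric "confidence" field, as suggested by the request example; the helper name and the sample entries are hypothetical placeholders, not real model output.

```python
from typing import Any, Dict, List


def filter_by_confidence(
    annotations: List[Dict[str, Any]], min_confidence: float
) -> List[Dict[str, Any]]:
    """Keep annotations whose confidence is at or above min_confidence."""
    return [a for a in annotations if float(a["confidence"]) >= min_confidence]


# Hypothetical annotations shaped like the entries printed in the request example.
example_annotations = [
    {"aspect": "aspect_a", "name": "annotation_1", "confidence": 0.97},
    {"aspect": "aspect_a", "name": "annotation_2", "confidence": 0.62},
    {"aspect": "aspect_b", "name": "annotation_3", "confidence": 0.31},
]

if __name__ == "__main__":
    for ann in filter_by_confidence(example_annotations, 0.5):
        print(f"[{ann['aspect']}] {ann['name']} ({ann['confidence']})")
```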

The figure below shows the trade-off between topk and confidence. A higher topk value retrieves more annotations, but at the cost of lower confidence and precision.

Lower topk values will result in higher precision, meaning each annotation shown is more likely to be correct.

Higher topk values will result in higher recall, meaning more of the correct annotations for a protein are likely to be retrieved.

A very high topk value is a way to explore possible annotations, but each one is less likely to be accurate.

The optimal topk value balances precision and recall, a balance often measured by their harmonic mean (the F1 score). On our evaluation sets, F1 is highest at topk=3.
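To make the precision, recall, and F1 terms concrete, here is a minimal sketch that scores one predicted annotation set against a reference set. It is not the evaluation code used for Translator; the function name and the toy labels are illustrative only, and F1 is computed with the standard formula 2PR / (P + R).

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], reference: Set[str]) -> Tuple[float, float, float]:
    """Score a predicted annotation set against a reference (ground-truth) set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels: a narrow prediction (like a low topk) and a broad one (like a high topk).
reference = {"a", "b", "c", "d"}
narrow = {"a", "b"}
broad = {"a", "b", "c", "x", "y", "z"}

if __name__ == "__main__":
    for label, predicted in (("narrow", narrow), ("broad", broad)):
        p, r, f1 = precision_recall_f1(predicted, reference)
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy case the narrow set has perfect precision but misses half the reference, while the broad set recovers more of the reference at lower precision, which is the same tension the topk setting controls.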

The figure below also shows, for each topk value, the minimum confidence score above which every annotation was correctly predicted.

Therefore, you can adjust topk and confidence to be more "sure" about the output or more "exploratory."
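The "minimum confidence at which everything above it is correct" idea can be sketched on labeled predictions as follows. This assumes you have (confidence, is_correct) pairs from your own evaluation set; the function name and sample values are hypothetical, and the cutoff is treated as inclusive (at or above), which may differ from how the figure was computed.

```python
from typing import List, Optional, Tuple


def min_fully_correct_confidence(scored: List[Tuple[float, bool]]) -> Optional[float]:
    """Return the lowest confidence c such that every prediction with
    confidence >= c is correct, or None if no such cutoff exists."""
    threshold = None
    # Walk from highest to lowest confidence; on ties, visit incorrect
    # predictions first so a tied error excludes that confidence level.
    for confidence, is_correct in sorted(scored, key=lambda p: (-p[0], p[1])):
        if not is_correct:
            break
        threshold = confidence
    return threshold


# Hypothetical evaluation results: (predicted confidence, was the annotation correct?)
example_scored = [(0.99, True), (0.95, True), (0.90, False), (0.85, True)]

if __name__ == "__main__":
    print(min_fully_correct_confidence(example_scored))
```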



Figure 1: By optimizing the topk and minimum confidence of the Translator model, we show that predicted protein annotations on unseen data are highly accurate.



© 2026 Synthyra. All rights reserved