June 7, 2023
Synteract: Predicting Protein Interactions from Sequence
Synteract was the first step in Synthyra's protein interaction research line, showing that sequence-based AI could prioritize likely protein-protein interactions before expensive wet-lab validation.
- Protein Protein Interaction
- Synteract
- Atlas
By Logan Hallee
The Problem: Proteins Rarely Act Alone
Proteins bind, recruit, block, modify, stabilize, and regulate one another. Those protein-protein interactions, or PPIs, are central to signaling, disease biology, immune response, and drug discovery.
The challenge is scale. Wet-lab and in vitro assays remain the source of truth, but testing every possible pair is too slow and expensive. A single organism can contain thousands of proteins, and because the number of possible pairs grows quadratically with the number of proteins, that means millions of candidate pairs.
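To make the scale concrete: among n proteins there are n(n-1)/2 distinct unordered pairs. The short sketch below computes that count for a few proteome sizes. The sizes are illustrative round numbers chosen for this example, not figures from the Synteract paper, and the function name is hypothetical.

```python
from math import comb


def count_unordered_pairs(n_proteins: int, include_self: bool = False) -> int:
    """Count distinct unordered protein pairs in a proteome of n_proteins."""
    pairs = comb(n_proteins, 2)  # n * (n - 1) / 2
    if include_self:
        pairs += n_proteins  # also count self-pairs (A, A), e.g. homodimer candidates
    return pairs


# Illustrative proteome sizes (assumed round numbers, not measured values).
for n in (1_000, 5_000, 20_000):
    print(f"{n:>6} proteins -> {count_unordered_pairs(n):>12,} candidate pairs")
```

Even at a few thousand proteins the candidate space is already in the millions, which is why a cheap computational filter ahead of the assay is attractive.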
Synteract asked whether a protein language model could help narrow that search. Given two amino acid sequences, could the model estimate whether the proteins are likely to interact?
The First Synteract Idea
The first Synteract model treated PPI prediction as a sequence understanding problem. It used a large protein language model that had learned broad patterns from protein sequences, then adapted that knowledge to distinguish likely interactors from likely non-interactors.
That framing mattered. It showed that a model did not need a solved structure, a hand-built feature set, or organism-specific pathway information to produce useful interaction signals. The amino acid sequence itself contained enough information to make the problem worth pursuing.
For researchers, the practical value was triage: use in silico prediction to prioritize which interactions deserve experimental follow-up.
The Negative Example Problem
PPI modeling has a quiet data problem. Databases contain many examples of proteins that interact, but few experimentally verified examples of proteins that do not interact. Observing a binding event is easier than proving that two proteins never interact under any condition.
Synteract tackled that imbalance by building synthetic negative examples and testing whether those examples could help the model learn a useful boundary between interacting and non-interacting pairs.
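One common way to build synthetic negatives is to pair proteins at random and discard any pair already recorded as interacting. The sketch below illustrates that general idea only. It is an assumption for illustration, not the procedure used in the Synteract paper, and the function name and protein identifiers are hypothetical.

```python
import random


def sample_synthetic_negatives(positives, n_negatives, seed=0):
    """Randomly pair proteins that are not listed as interacting.

    Assumption: absence from `positives` is treated as "non-interacting".
    This is only a proxy; some sampled pairs may interact in reality.
    """
    rng = random.Random(seed)
    known = {frozenset(pair) for pair in positives}
    proteins = sorted({protein for pair in positives for protein in pair})

    # Upper bound on how many distinct unlabeled pairs exist.
    total_pairs = len(proteins) * (len(proteins) - 1) // 2
    known_distinct = sum(1 for pair in known if len(pair) == 2)
    if n_negatives > total_pairs - known_distinct:
        raise ValueError("not enough unlabeled pairs to sample from")

    negatives = set()
    while len(negatives) < n_negatives:
        a, b = rng.sample(proteins, 2)  # two distinct proteins
        pair = frozenset((a, b))
        if pair not in known:
            negatives.add(pair)

    # Sort for a deterministic, readable result.
    return sorted(tuple(sorted(pair)) for pair in negatives)


# Hypothetical identifiers for illustration.
positives = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
print(sample_synthetic_negatives(positives, n_negatives=3))
```

The docstring states the weak point of this strategy: "not known to interact" is not the same as "known not to interact", so the sampling rule becomes part of what the model learns.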
The result was encouraging, but it also exposed a broader lesson: how negative examples are chosen can define what the model learns.
A Warning About Shortcuts
One important finding from the early Synteract work was that some PPI datasets could reward the wrong behavior. If negative examples are created by pairing proteins from different cellular compartments, a model can look strong while learning localization patterns instead of interaction biology.
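The shortcut can be shown with a toy example. Here a "predictor" looks only at compartment labels and knows nothing about interaction biology. It scores perfectly when every negative is a cross-compartment pair and drops to chance once the negatives share a compartment. All data and names below are hypothetical and exist only to illustrate the failure mode.

```python
def localization_only_predictor(compartment_a: str, compartment_b: str) -> bool:
    """Predict "interacts" purely from matching compartments."""
    return compartment_a == compartment_b


def accuracy(examples):
    """examples: list of (compartment_a, compartment_b, interacts) tuples."""
    correct = sum(
        localization_only_predictor(a, b) == interacts
        for a, b, interacts in examples
    )
    return correct / len(examples)


# Toy data (hypothetical). Interacting pairs share a compartment.
positives = [("nucleus", "nucleus", True), ("membrane", "membrane", True)]
cross_compartment_negatives = [
    ("nucleus", "membrane", False),
    ("cytosol", "nucleus", False),
]
same_compartment_negatives = [
    ("nucleus", "nucleus", False),
    ("membrane", "membrane", False),
]

shortcut_benchmark = positives + cross_compartment_negatives
harder_benchmark = positives + same_compartment_negatives

print(accuracy(shortcut_benchmark))  # 1.0: the shortcut looks like skill
print(accuracy(harder_benchmark))    # 0.5: no better than chance
```

A benchmark whose negatives share a compartment with the positives removes this particular shortcut, so a high score on it is harder to attribute to localization alone.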
That insight became a recurring theme in the Synteract research line. A useful model should not succeed because it found an artifact in the dataset. It should succeed because it learned a signal that helps researchers reason about biological interactions.
This concern later grew into the Accidental Taxonomists work and the same-species controls used in later Atlas-facing interaction modeling.
Why It Mattered
Synteract was not the final answer. It was proof that sequence-only interaction prediction was worth pursuing.
It showed that modern protein language models could support PPI prediction across biologically diverse data. It also showed that these systems must be evaluated carefully because apparent performance can hide dataset shortcuts.
That combination shaped the later Synthyra approach. Atlas is the production-facing continuation of this research direction, extended with improved modeling, safeguards, calibration, API infrastructure, and network-scale analysis.
What It Enables
The goal is not to replace experiments. It is to make experiments more efficient.
A sequence-based PPI model can help a team choose which candidate binders to synthesize, which off-targets to inspect, which disease pathway partners to prioritize, or which newly sequenced proteins deserve closer study.
Synteract began that path by showing that interaction prediction could move from slow pairwise biology toward fast, scalable hypothesis generation.
This blog post summarizes work in the following paper:
Protein-Protein Interaction Prediction is Achievable with Large Language Models
Logan Hallee, Jason P. Gleghorn
bioRxiv 2023.06.07.544109; doi: https://doi.org/10.1101/2023.06.07.544109