Synteract: Predicting Protein Interactions from Sequence

Synteract was the first step in Synthyra's protein interaction research line, showing that sequence-based AI could prioritize likely protein-protein interactions before expensive wet-lab validation.

Protein Protein Interaction
Synteract
Atlas

By Logan Hallee

The Problem: Proteins Rarely Act Alone

Proteins bind, recruit, block, modify, stabilize, and regulate one another. Those protein-protein interactions, or PPIs, are central to signaling, disease biology, immune response, and drug discovery.

The challenge is scale. Wet-lab and in vitro assays remain the source of truth, but testing every possible pair is too slow and expensive. A single organism can contain thousands of proteins, which means millions of possible pairs.
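To make the scale concrete: the number of unordered pairs among n proteins is n(n - 1) / 2. The sketch below is only arithmetic, and the proteome sizes are round illustrative values, not figures from the paper.

```python
from math import comb


def possible_pairs(n_proteins: int) -> int:
    """Count unordered protein pairs (no self-pairs): n * (n - 1) / 2."""
    return comb(n_proteins, 2)


# Round, illustrative proteome sizes (assumptions, not figures from the paper).
for n in (5_000, 20_000):
    print(f"{n:>6} proteins -> {possible_pairs(n):,} possible pairs")
```

Because the count grows quadratically, doubling the number of proteins roughly quadruples the number of pairs to test.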

Synteract asked whether a protein language model could help narrow that search. Given two amino acid sequences, could the model estimate whether the proteins are likely to interact?

The First Synteract Idea

The first Synteract model treated PPI prediction as a sequence understanding problem. It used a large protein language model that had learned broad patterns from protein sequences, then adapted that knowledge to distinguish likely interactors from likely non-interactors.

That framing mattered. It showed that a model did not need a solved structure, a hand-built feature set, or organism-specific pathway information to produce useful interaction signals. The amino acid sequence itself contained enough information to make the problem worth pursuing.

For researchers, the practical value was triage: use in silico prediction to prioritize which interactions deserve experimental follow-up.
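The triage step can be sketched as a ranking over candidate pairs. This is a minimal illustration, not Synteract's interface: `triage`, `toy_score`, and the example sequences are hypothetical, and `toy_score` is a biologically meaningless placeholder standing in for a real model's interaction score.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


def triage(
    pairs: Iterable[Pair],
    score_fn: Callable[[str, str], float],
    top_k: int,
) -> List[Tuple[Pair, float]]:
    """Score every candidate pair and return the top_k highest-scoring ones."""
    scored = [((a, b), score_fn(a, b)) for a, b in pairs]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_score(seq_a: str, seq_b: str) -> float:
    """Placeholder scorer (NOT a real predictor): overlap of residue types."""
    residues_a, residues_b = set(seq_a), set(seq_b)
    return len(residues_a & residues_b) / len(residues_a | residues_b)


# Hypothetical candidate pairs of short amino acid strings.
candidates = [("MKT", "MKT"), ("MKT", "GGG"), ("MKTA", "MKT")]
shortlist = triage(candidates, toy_score, top_k=2)
for pair, score in shortlist:
    print(pair, round(score, 2))
```

In practice `score_fn` would wrap a trained model, and only the shortlist would go on to experimental follow-up.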

The Negative Example Problem

PPI modeling has a quiet data problem. Databases contain many examples of proteins that interact, but few experimentally verified examples of proteins that do not interact. Observing a binding event is easier than proving that two proteins never interact under any condition.

Synteract tackled that imbalance by building synthetic negative examples and testing whether those examples could help the model learn a useful boundary between interacting and non-interacting pairs.

The result was encouraging, but it also exposed a broader lesson: how negative examples are chosen can define what the model learns.
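One common way to build synthetic negatives is to pair proteins at random and discard any pair already recorded as interacting. The sketch below shows that generic approach under stated assumptions. It is not necessarily the procedure used in the Synteract paper, and `sample_negatives` and the protein names are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def sample_negatives(positives: Set[Pair], n_negatives: int, seed: int = 0) -> List[Pair]:
    """Sample presumed non-interacting pairs absent from the positive set.

    Caveat: "not observed to interact" is weaker than "verified not to interact".
    """
    known = {frozenset(pair) for pair in positives}
    proteins = sorted({protein for pair in positives for protein in pair})
    # Enumerating every pair is fine for a toy set; at proteome scale you
    # would sample pairs and reject known positives instead.
    candidates = [p for p in combinations(proteins, 2) if frozenset(p) not in known]
    if n_negatives > len(candidates):
        raise ValueError("not enough unobserved pairs to sample from")
    return random.Random(seed).sample(candidates, n_negatives)


# Hypothetical positive set.
positives = {("A", "B"), ("B", "C"), ("C", "D")}
negatives = sample_negatives(positives, n_negatives=3)
print(negatives)
```

Every choice here (which proteins are eligible, whether pairs are filtered by species or compartment) changes the decision boundary the model is asked to learn.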

A Warning About Shortcuts

One important finding from the early Synteract work was that some PPI datasets could reward the wrong behavior. If negative examples are created by pairing proteins from different cellular compartments, a model can look strong while learning localization patterns instead of interaction biology.

That insight became a recurring theme in the Synteract research line. A useful model should not succeed because it found an artifact in the dataset. It should succeed because it learned a signal that helps researchers reason about biological interactions.

This concern later grew into the Accidental Taxonomists work and the same-species controls used in later Atlas-facing interaction modeling.
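The localization shortcut can be reproduced on toy data. In the sketch below, a baseline that never looks at a sequence scores perfectly when negatives come from different compartments, and poorly once negatives are drawn from the same compartment. All proteins, compartments, and interactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# Hypothetical proteins and compartments (toy data, not from a real dataset).
COMPARTMENT: Dict[str, str] = {
    "P1": "nucleus", "P2": "nucleus", "P3": "nucleus",
    "P4": "membrane", "P5": "membrane", "P6": "membrane",
}
POSITIVES: List[Pair] = [("P1", "P2"), ("P4", "P5")]


def localization_only(a: str, b: str) -> bool:
    """Baseline that ignores sequence: predict 'interact' iff same compartment."""
    return COMPARTMENT[a] == COMPARTMENT[b]


def accuracy(positives: List[Pair], negatives: List[Pair]) -> float:
    """Fraction of pairs the localization-only baseline labels correctly."""
    correct = sum(localization_only(a, b) for a, b in positives)
    correct += sum(not localization_only(a, b) for a, b in negatives)
    return correct / (len(positives) + len(negatives))


all_pairs = list(combinations(sorted(COMPARTMENT), 2))
# Shortcut-prone negatives: pairs drawn from different compartments.
cross_negatives = [p for p in all_pairs if not localization_only(*p)]
# Control negatives: same compartment, but not in the positive set.
same_negatives = [p for p in all_pairs if localization_only(*p) and p not in POSITIVES]

shortcut_acc = accuracy(POSITIVES, cross_negatives)
control_acc = accuracy(POSITIVES, same_negatives)
print(f"cross-compartment negatives: {shortcut_acc:.2f}")
print(f"same-compartment negatives:  {control_acc:.2f}")
```

The gap between the two numbers is the warning sign: a high score on the first dataset says nothing about whether interaction biology was learned.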

Why It Mattered

Synteract was not the final answer. It was a proof of concept that sequence-only interaction prediction was worth pursuing.

It showed that modern protein language models could support PPI prediction across biologically diverse data. It also showed that these systems must be evaluated carefully because apparent performance can hide dataset shortcuts.

That combination shaped the later Synthyra approach. Atlas is the production-facing continuation of this research direction, extended with improved modeling, safeguards, calibration, API infrastructure, and network-scale analysis.

What It Enables

The goal is not to replace experiments. It is to make experiments more efficient.

A sequence-based PPI model can help a team choose which candidate binders to synthesize, which off-targets to inspect, which disease pathway partners to prioritize, or which newly sequenced proteins deserve closer study.

Synteract began that path by showing that interaction prediction could move from slow pairwise biology toward fast, scalable hypothesis generation.

This blog post summarizes work in the following paper:

Protein-Protein Interaction Prediction is Achievable with Large Language Models
Logan Hallee, Jason P. Gleghorn
bioRxiv 2023.06.07.544109; doi: https://doi.org/10.1101/2023.06.07.544109
