The Codec
Riffle started as a Confluence-to-GitHub sync tool. You edit a page in Confluence, and a commit appears in a repo. Clean, useful, solved a real problem. But somewhere in the process of building it, Marty and I realized we weren’t building a Confluence integration. We were building a pattern.
The pattern is this: take something trapped in a proprietary format, behind a proprietary GUI, and make it legible. To humans. To git. To agents.
We started calling it the codec.
Here’s the thing about proprietary platforms. They all do the same thing to your knowledge: they lock it up.
Not maliciously, usually. Confluence stores your pages in a storage format that’s technically XML but practically inscrutable outside Confluence. Tray.io stores your automation workflows as deeply nested JSON that only its visual editor can meaningfully interpret. Workato does the same, with its own dialect. Every iPaaS, every wiki, every no-code builder — they give you a beautiful GUI on top of a format that’s useless without it.
This is fine as long as humans are the only ones who need to read the work. You open the GUI, you see the workflow, you click around. But the moment you want to do anything else with that knowledge — version it, diff it, review it, search across it, have an AI reason about it — you’re stuck. The format is a wall. The GUI is the only door, and it doesn’t have an API that captures the semantics.
The data is technically yours. The understanding isn’t.
With Confluence, the codec was relatively straightforward. Pages have content. Content can be converted to Markdown. Markdown lives naturally in git. The translation layer was thin — some format wrangling, some metadata mapping, some edge cases around macros and attachments. But the core insight was clean: Confluence page goes in, Markdown file comes out, and now everything git can do is available to that knowledge.
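To make "page goes in, Markdown comes out" concrete, here is a minimal sketch, not the actual Riffle code. It assumes a simplified fragment of storage-format XML with only headings, paragraphs, bold, italics, and flat bullet lists. Real pages also carry namespaced macro elements and HTML entities, which this toy converter does not handle; the function names `inline` and `to_markdown` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def inline(el):
    """Render an element's text and inline children (bold/italic) as Markdown."""
    parts = [el.text or ""]
    for child in el:
        inner = inline(child)
        if child.tag == "strong":
            inner = f"**{inner}**"
        elif child.tag == "em":
            inner = f"*{inner}*"
        parts.append(inner)
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts).strip()


def to_markdown(storage_xml):
    """Convert a simplified storage-format fragment to Markdown blocks."""
    # Wrap in a synthetic root because a page body is a sequence of siblings.
    root = ET.fromstring(f"<root>{storage_xml}</root>")
    blocks = []
    for el in root:
        if el.tag in ("h1", "h2", "h3"):
            blocks.append("#" * int(el.tag[1]) + " " + inline(el))
        elif el.tag == "p":
            blocks.append(inline(el))
        elif el.tag == "ul":
            blocks.append("\n".join("- " + inline(li) for li in el))
        # Anything else (macros, tables, attachments) is skipped in this sketch.
    return "\n\n".join(blocks) + "\n"


page = (
    "<h1>Runbook</h1>"
    "<p>Restart the <strong>sync</strong> worker.</p>"
    "<ul><li>Check logs</li><li>Page on-call</li></ul>"
)
markdown = to_markdown(page)
```

The interesting work in a real codec lives in the branch this sketch skips: macros, attachments, and metadata.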
That was proof of concept. The question we kept coming back to was: does this generalize?
Last week, we explored a real codebase to find out. Twenty-eight Tray.io workflows, exported as JSON. About twenty-two thousand lines of it.
If you’ve never looked at an iPaaS workflow in raw form, imagine a decision tree described by someone who’s being paid by the nesting level. Each workflow is a graph of steps — connectors, conditionals, loops, data mappings — serialized into a JSON structure that faithfully records every property, every configuration option, every connector credential reference. It’s complete. It’s also nearly unreadable.
A single workflow might have a hundred steps. Each step has a type, a configuration object, inputs that reference outputs from previous steps using a path syntax, and error handling branches. The JSON preserves all of this with perfect fidelity and zero readability. It’s a format designed for machines to serialize and GUIs to render, and it does both of those jobs well.
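As an illustration of what "inputs that reference outputs from previous steps using a path syntax" means in practice, here is a small sketch. The step shape and the `$.steps.<name>...` reference syntax are invented for this example and are not Tray's actual export schema; `resolve` is a hypothetical helper.

```python
# Hypothetical step shape, loosely modelled on the description above.
# This is NOT Tray's real schema; every key name here is an assumption.
step = {
    "name": "notify",
    "type": "slack.send_message",
    "config": {"channel": "#alerts"},
    "inputs": {"text": "$.steps.fetch.body.status"},
    "on_error": "continue",
}


def resolve(value, outputs):
    """Follow a '$.steps.<step>.<key>...' reference into earlier step outputs.

    Values that do not start with the reference prefix are returned unchanged.
    """
    prefix = "$.steps."
    if not isinstance(value, str) or not value.startswith(prefix):
        return value
    current = outputs
    for key in value[len(prefix):].split("."):
        current = current[key]
    return current


# Outputs produced by steps that ran earlier in the workflow.
outputs = {"fetch": {"body": {"status": "degraded"}}}
resolved = {name: resolve(ref, outputs) for name, ref in step["inputs"].items()}
```

Even in this toy form, the meaning of `notify` depends on a string that points somewhere else in the file, which is a large part of why the raw JSON is hard to read.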
It was never meant for a human to read. And it was certainly never meant for an agent to reason about.
So we started sketching what a codec for this would look like. Not a different GUI. Not a visualization tool. A translation — from the proprietary JSON into something a developer or an agent could actually read, modify, test, and commit.
We chose Python.
Not because Python is the best language for describing workflows, but because it is the language most people already know. IEEE Spectrum’s annual ranking of programming languages puts it at #1. It’s the language AI agents are most fluent in. It’s the language where, if you hand someone a file and say “read this,” they probably can, even if they’ve never seen the codebase before.
The idea isn’t to build a Python framework. It’s to build a Python representation — a DSL thin enough that the workflow’s logic shows through clearly, but structured enough that you can lint it, test it, diff it, and version it.
A Tray workflow that’s 800 lines of nested JSON becomes maybe 60 lines of Python that you can actually read. The conditional logic is visible. The data flow is traceable. The connector configurations are right there, as function calls with named parameters, instead of buried three levels deep in a configuration object.
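Here is a rough sketch of what that Python representation could look like. It is a guess at the shape, not the DSL we have built: the connector functions `http_get` and `slack_send`, the workflow name, and the URL are all hypothetical, and the connectors are stubs that record their calls so the example runs without a network.

```python
calls = []  # records connector invocations so the sketch runs offline


def http_get(*, url):
    """Stub HTTP connector: records the call and returns a canned response."""
    calls.append(("http_get", url))
    return {"status": "degraded"}


def slack_send(*, channel, text):
    """Stub Slack connector: records the message instead of sending it."""
    calls.append(("slack_send", channel, text))


def health_check_workflow():
    """One workflow = one function; each step is a call with named parameters."""
    health = http_get(url="https://example.com/health")
    if health["status"] != "ok":  # the conditional branch is plain Python
        slack_send(channel="#alerts", text=f"Service is {health['status']}")


health_check_workflow()
```

Keyword-only parameters keep each connector call self-describing, and because the connectors are ordinary functions, a test can swap in stubs like these and assert on what the workflow would have done.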
And critically: an AI agent can read that Python, understand what the workflow does, suggest changes, and write new workflows in the same format. The codec doesn’t just make the knowledge legible to humans. It makes it legible to the thing that’s increasingly doing the work.
The complexity is real. Twenty-two thousand lines of JSON across twenty-eight workflows isn’t a toy problem. The connector types are varied — HTTP calls, Slack messages, database queries, conditional branches, loops with scoped variables. The path references between steps create an implicit dependency graph that has to be understood and preserved. Error handling branches fork the logic in ways that a flat representation has to carefully model.
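The implicit dependency graph can be recovered mechanically. The sketch below assumes a hypothetical `$.steps.<name>` reference syntax and an invented step layout (neither is Tray's real format); it scans each step's inputs for references and uses the standard library's `graphlib` (Python 3.9+) to produce a valid execution order.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical reference syntax: "$.steps.<step_name>.<path...>"
REF = re.compile(r"\$\.steps\.(\w+)")

# Invented example steps, listed out of order on purpose.
steps = {
    "notify": {
        "inputs": {
            "text": "$.steps.lookup.rows",
            "status": "$.steps.fetch.body.status",
        }
    },
    "lookup": {"inputs": {"id": "$.steps.fetch.body.id"}},
    "fetch": {"inputs": {"url": "https://example.com/health"}},
}


def dependencies(steps):
    """Map each step name to the set of step names its inputs reference."""
    graph = {}
    for name, step in steps.items():
        refs = set()
        for value in step["inputs"].values():
            if isinstance(value, str):
                refs.update(REF.findall(value))
        graph[name] = refs
    return graph


graph = dependencies(steps)
# static_order() yields every step after all of the steps it depends on.
order = list(TopologicalSorter(graph).static_order())
```

`TopologicalSorter` raises `CycleError` if the references form a cycle, which makes a cheap sanity check on an exported workflow.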
But it’s tractable. The patterns repeat. Most steps are variations on a small number of templates. The hard parts are findable, and once found, they’re solvable. This isn’t a research problem. It’s an engineering problem with clear edges.
What I keep thinking about is the product that emerges from this pattern.
The product isn’t a Confluence sync tool. It isn’t an iPaaS translator. It isn’t any single integration. The product is the codec — the pattern of taking a proprietary format, building a translation layer to something legible and version-controlled, and opening up everything that follows from legibility.
Once your Confluence pages are in git, you can diff them, review them, search them, and let agents work with them. Once your Tray workflows are in Python in git, the same things become possible. The specific source platform almost doesn’t matter. What matters is the translation — from locked-in to legible, from proprietary to portable, from GUI-only to git-native.
Every platform that traps knowledge in a format only its own interface can render is a candidate for a codec. There are a lot of those platforms.
I want to be careful here about not overselling this. Building a codec for each platform is real work. The formats are different, the semantics are different, the edge cases are different. There’s no magic universal translator. Each one requires understanding the source format deeply enough to produce a faithful, readable translation.
But the architecture is the same every time. Source format in, legible representation out, git as the common substrate, agents as first-class consumers. After you’ve built the pattern the first time, the second time is faster. The third time is faster still.
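One way to picture that shared architecture is as a small interface that every platform-specific codec implements. This is a sketch under assumptions, not our actual design: the `Codec` protocol, the toy `JsonTitleCodec`, and the `export` helper are all hypothetical names. It writes files into a directory that would be a git working tree, and leaves the commit itself to git.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class Codec(Protocol):
    """Hypothetical interface: proprietary source text in, legible text out."""

    extension: str

    def decode(self, raw: str) -> str: ...


class JsonTitleCodec:
    """Toy codec: turns {"title": ..., "body": ...} JSON into Markdown."""

    extension = ".md"

    def decode(self, raw):
        doc = json.loads(raw)
        return f"# {doc['title']}\n\n{doc['body']}\n"


def export(codec, sources, repo_dir):
    """Decode each named source and write it into repo_dir, ready to commit."""
    written = []
    for name, raw in sources.items():
        path = Path(repo_dir) / f"{name}{codec.extension}"
        path.write_text(codec.decode(raw), encoding="utf-8")
        written.append(path)
    return written


sources = {"runbook": '{"title": "Runbook", "body": "Restart the worker."}'}
with tempfile.TemporaryDirectory() as repo:
    files = export(JsonTitleCodec(), sources, repo)
    names = [p.name for p in files]
    text = files[0].read_text(encoding="utf-8")
```

Only `decode` changes from platform to platform. The export loop, and everything git and agents do with the result, stays the same.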
That’s what we’re building. Not a product that solves one integration problem, but a pattern that solves a class of them. The codec as a thesis: proprietary platforms don’t have to be knowledge traps. The knowledge can be liberated, one format at a time, into the one system that developers and agents already share.
Git. Just git.