Data Platform For Carbon Trading Company

How We Rebuilt a Carbon Trading Platform's Data Infrastructure (Without Breaking It)

A two-year journey from chaos to clarity in one of Europe's fastest-growing carbon markets

The Emergency Call

"We have a problem. Actually, we have several problems, but we're not entirely sure what they all are."

That's how the conversation started with the CTO of a European carbon trading company back in early 2022. They were experiencing explosive growth as carbon markets heated up globally, but their data infrastructure was barely holding together. Somewhere between installing Keboola and trying to extract data from a dozen different carbon market APIs, things had gone sideways.

They knew they needed help. They just weren't sure how deep the problems went.

What We Found

When we first logged into their Keboola environment, the situation was worse than they'd described. It wasn't that the platform was broken—pieces of it worked. The problem was that nobody really understood which pieces, or why, or how they all fit together.

Imagine trying to navigate a city where all the street signs have been removed, half the maps are outdated, and the person who planned the roads left no documentation. That's what their data warehouse felt like.

Pipelines were held together with manual workarounds. Data extraction was ad-hoc and fragmented. When something broke—which happened frequently—the team had to reverse-engineer what was supposed to happen before they could even start fixing it. New data sources were being added through increasingly creative hacks because nobody wanted to touch the core system for fear of breaking everything else.

Meanwhile, the business was growing at thirty percent quarter over quarter. Trading volumes were up. Regulatory reporting requirements were getting more complex. And the finance team kept asking why their Snowflake bills were climbing faster than their data volumes.

The technical debt was compounding faster than anyone realized.

Starting From First Principles

Most data engineering firms would have jumped straight into building. We did something different: we spent two full months just listening and analyzing.

We sat with the trading team to understand their actual workflows, not what the requirements document said they should be doing. We mapped every data source, traced every pipeline, documented every transformation. We interviewed stakeholders across operations, compliance, and analytics to understand what data they actually needed versus what they were currently getting.

This wasn't the kind of billable work that looks impressive on a project plan. It was slow, methodical detective work. But it was essential.

By the end of those two months, we had a complete picture of what the system was supposed to do, what it actually did, and most importantly, what it needed to do as the business scaled. We identified which parts of the existing setup could be salvaged and which needed to be rebuilt from scratch.

The strategic decision we made surprised the client. Instead of ripping out Keboola and starting fresh with a different platform, we recommended keeping it but rebuilding everything on top of it properly. The platform itself wasn't the problem—the implementation was.

Rebuilding While Everything's Running

Here's the tricky part about fixing data infrastructure in a trading company: you can't just turn it off for maintenance. Carbon markets operate around the clock. Traders need real-time data. Compliance reports have regulatory deadlines. Everything has to keep running while you're fixing it.

We spent the next six months essentially performing open-heart surgery on a running system.

First came the unglamorous but essential work of cleaning up the existing mess. We removed redundant pipelines, fixed broken connections, standardized naming conventions. We organized the chaos into a proper medallion architecture where Bronze layer held raw data exactly as it came from sources, Silver contained cleaned and validated data, and Gold held business-ready datasets optimized for analytics.
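The boundary between those layers came down to a promotion step: Bronze keeps records exactly as extracted, and only validated, normalized rows move on to Silver. A minimal sketch of that idea in Python (field names here are illustrative, not the client's actual schema):

```python
from datetime import datetime, timezone

def promote_to_silver(bronze_rows):
    """Clean and validate raw Bronze records into Silver-ready rows.

    Bronze holds data exactly as it arrived; Silver only accepts rows
    that pass validation, with types normalized and a load timestamp.
    """
    silver_rows = []
    for row in bronze_rows:
        # Reject rows missing the fields downstream models depend on.
        if not row.get("trade_id") or row.get("price") is None:
            continue
        try:
            price = float(row["price"])  # many source APIs return strings
        except (TypeError, ValueError):
            continue
        silver_rows.append({
            "trade_id": str(row["trade_id"]).strip(),
            "price_eur": round(price, 2),
            "loaded_at": datetime.now(timezone.utc).isoformat(),
        })
    return silver_rows
```

Gold-layer models then aggregate these cleaned rows into business-ready datasets, never reaching back into raw Bronze data.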

For carbon market data sources that didn't have standard Keboola connectors, we built custom Python extractors. These weren't quick hacks—we designed them with proper error handling, monitoring, and retry logic so they wouldn't silently fail like the previous implementations.
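The retry logic in those extractors followed a standard exponential-backoff pattern; here is a generic sketch (the backoff parameters and logger name are illustrative, not the actual implementation):

```python
import logging
import time

logger = logging.getLogger("extractor")

def with_retries(fetch, max_attempts=4, base_delay=1.0):
    """Call fetch(), retrying on failure with exponential backoff.

    Re-raises the last exception once attempts are exhausted, so
    failures surface in monitoring instead of passing silently.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            if attempt == max_attempts:
                logger.error("giving up after %d attempts: %s", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("attempt %d failed (%s); retrying in %.1fs",
                           attempt, exc, delay)
            time.sleep(delay)
```

The key design choice is the final `raise`: an exhausted retry loop must fail loudly, which is exactly what the previous silently-failing implementations got wrong.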

Meanwhile, we were establishing Snowflake as the proper data warehouse foundation with well-designed schemas, security controls, and access management. The difference was night and day. Within six months, data was flowing reliably for the first time since the platform launched.

But we were just getting started.

The Cost Crisis That Wasn't

Around month seven, we noticed a worrying trend. Snowflake costs were climbing as data volumes grew, which was expected. What concerned us was that costs were rising faster than the data itself.

Instead of waiting for this to become a budget crisis, we got ahead of it.

We restructured the architecture to bypass Keboola's processing layer for high-volume data sources, ingesting directly into Snowflake where it made sense. We rewrote expensive queries that were scanning entire tables when they only needed a slice of data. We implemented proper clustering and partitioning strategies. We built cost monitoring dashboards so the finance team could see exactly where money was going instead of just watching the bill grow.
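At their core, those cost dashboards attributed credit spend per warehouse. A minimal sketch of that aggregation, assuming rows already fetched from Snowflake's metering history (the key names are illustrative):

```python
from collections import defaultdict

def credits_by_warehouse(metering_rows):
    """Aggregate credit usage per warehouse, sorted by total spend.

    metering_rows: iterable of dicts with 'warehouse' and 'credits'
    keys, e.g. rows fetched from Snowflake's warehouse metering view.
    """
    totals = defaultdict(float)
    for row in metering_rows:
        totals[row["warehouse"]] += row["credits"]
    # Biggest spenders first, so the dashboard surfaces them immediately.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Ranking warehouses by spend is what turns a raw bill into an actionable view: the top one or two entries are where query rewrites and clustering changes pay off first.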

The reporting layer was rebuilt using Streamlit, creating interactive dashboards directly on Snowflake. This eliminated yet another layer of data movement and gave traders and analysts the real-time insights they'd been asking for.

The results surprised even us. Despite processing three times more data than when we started, monthly Snowflake costs dropped by forty percent. The client's CFO sent us a thank-you email. That doesn't happen often in data engineering.

Building for the Long Term

By month thirteen, the infrastructure was stable and efficient. But we had one more critical goal: making sure the client's team could run this independently.

We migrated all transformations to dbt, bringing version control, automated testing, and comprehensive documentation to their data workflows. Every transformation was tested, documented, and tracked through GitHub with proper CI/CD pipelines. No more mystery SQL scripts that only one person understood.
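In dbt terms, "every transformation was tested" means schema tests declared alongside the models in YAML. A minimal example using dbt's built-in generic tests (the model and column names are illustrative):

```yaml
# models/gold/schema.yml
version: 2
models:
  - name: fct_trades
    description: "Business-ready trade facts (Gold layer)"
    columns:
      - name: trade_id
        tests:
          - unique
          - not_null
      - name: price_eur
        tests:
          - not_null
```

Because these tests live in version control next to the models, the CI/CD pipeline can run `dbt test` on every pull request and block merges that would break data quality guarantees.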

But technology is only half the battle. We spent significant time training their internal team on dbt, SQL best practices, and Keboola management. We created comprehensive documentation including a full data catalog, lineage tracking, and operational runbooks. We held weekly knowledge transfer sessions where we walked through not just what we built, but why we made specific decisions.

The goal was never to make them dependent on us. The goal was to leave them with infrastructure they could confidently operate, debug, and extend themselves.

What Two Years of Work Delivered

After twenty-four months of collaboration, here's what changed:

The infrastructure that once required constant firefighting now runs reliably without intervention. Pipelines that used to fail weekly now have a ninety-five percent success rate. When something does break, the team can diagnose and fix it in minutes instead of hours because the system is properly documented and organized.

Development velocity increased dramatically. What used to take weeks of careful navigation through tangled dependencies now takes days. The team can confidently add new data sources, build new analytics, and respond to business requirements without fear of breaking unrelated systems.

The cost structure that was spiraling out of control is now predictable and optimized. They're processing significantly more data while spending less money. The scalability concerns that kept the CTO up at night have been solved—the infrastructure is ready to handle ten times current volumes.

Most importantly, the internal team is empowered. They're not just maintaining what we built; they're actively extending it. They understand the modern data stack. They know how to use dbt properly. They can debug complex data quality issues. They've internalized the practices that will keep this infrastructure healthy for years.

What We Learned

Every project teaches you something. This one reinforced several truths about data infrastructure work.

First, rushing to build before understanding the business is expensive. Those two months we spent on discovery and analysis paid for themselves many times over by helping us avoid costly mistakes and rework.

Second, proactive cost management beats reactive budget cuts. By monitoring costs closely and optimizing continuously, we avoided the emergency cost-cutting exercises that often compromise functionality.

Third, incremental delivery keeps stakeholders engaged and allows course correction. We didn't disappear for six months and then reveal a finished system. We delivered value every month, gathered feedback, and adjusted our approach based on real-world usage.

Finally, knowledge transfer can't be an afterthought. Building systems that only you can maintain creates dependency, not value. The measure of success isn't just what you build—it's whether the client can sustain and extend it after you leave.

The Challenges Nobody Talks About

Not everything went smoothly. Building custom integrations for niche carbon market data sources was harder than expected. Some of these APIs were barely documented, had rate limiting that wasn't mentioned anywhere, and returned data in formats that required creative parsing.
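"Creative parsing" usually meant normalizing the same field across inconsistent source formats. A hedged sketch of one such normalizer (the date formats shown are examples, not the actual APIs' specifications):

```python
from datetime import datetime

# Formats observed across different market APIs (illustrative).
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%Y%m%d")

def parse_trade_date(raw):
    """Try each known format in turn; return an ISO date string.

    Raising on an unrecognized format keeps bad data out of the
    cleaned layer instead of letting it slip through silently.
    """
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")
```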

Balancing the convenience of Keboola's managed platform against the cost efficiency of direct Snowflake ingestion required careful analysis for each data source. The decision wasn't always obvious, and we had to reconsider our choices several times as volumes changed.

Training non-technical business users on modern data concepts without overwhelming them took patience and iteration. The compliance team didn't need to understand medallion architecture or what dbt does—they needed to understand how to get reliable regulatory reports. Finding that balance between technical accuracy and practical usability was an ongoing challenge.

Why This Matters for Your Business

If you're reading this and thinking "that sounds familiar," you're not alone. These problems aren't unique to carbon trading.

We see the same patterns across industries: rapid growth outpacing data capabilities, platforms installed but not properly implemented, technical debt accumulating faster than teams realize, costs rising without clear visibility into what's driving them.

The good news is that these problems are solvable. It takes time, careful planning, and a willingness to fix foundations before adding new features. But the payoff—stable infrastructure, empowered teams, predictable costs, and the ability to actually use your data for competitive advantage—is worth the investment.

The question isn't whether your data infrastructure needs attention. The question is whether you're going to address it proactively or wait until it becomes a crisis.

Project Details:

  • Duration: 24 months

  • Technologies: Keboola, Snowflake, dbt, Python, GitHub, Streamlit

  • Industry: Fintech / Carbon Trading

  • Team: 1-2 data engineers

  • Engagement: Retainer evolving to advisory

© 2025 Smart-Analytics LLC. All rights reserved.

Privacy Policy • Terms of Service
