July 21, 2025

Confluent – Allium’s Blueprint for Scaling Blockchain Data with Data Streaming

Ethan Chan
Co-Founder & CEO

It was great to be featured on Confluent's "Life Is But A Stream" podcast, where I had the chance to dive deep into how we're building Allium's real-time blockchain data infrastructure. As co-founder and CEO, I've been leading our mission to organize the world's blockchain data – transforming petabytes of raw on-chain information into accessible, normalized datasets for enterprises.

In this conversation, I shared our experience building a real-time data streaming architecture that processes 120 megabytes per second of blockchain data, as well as our philosophy around data products: bridging the gap between raw blockchain complexity and business-ready insights. We covered the challenges of data governance at a global scale, lessons learned from moving beyond webhooks to enterprise-grade streaming, and our vision to become the data operating system for the blockchain industry.

"Life Is But A Stream" is Confluent's podcast series exploring how industry leaders harness data streaming to drive innovation, enhance customer experiences, and build mission-critical systems. They focus on real-world implementations, giving founders and engineering leaders a platform to share their technical journeys and streaming strategies.

If this conversation resonates and you'd like to learn more about us – or discuss your own data streaming challenges – reach out to us at hello@allium.so.

Introduction

Joseph: Welcome to Life Is But A Stream, the web show for tech leaders who need real-time insights. I'm Joseph Morais, technical champion and data streaming evangelist here at Confluent. My goal is helping leaders harness data streaming to drive instant analytics, enhance customer experiences, and lead innovation.

Today I'm talking to Ethan Chan, co-founder and CEO of Allium. In this episode, we'll find out what it takes to build a business on real-time data. We'll break down how data streaming has become a core part of Allium's product strategy, why partnering with vendors turned out to be a smarter, more cost-effective solution, and how Ethan rallied both his engineers and customers around a real-time vision.

What Allium Does

Joseph: Let's jump right into it. What do you and your team do at Allium?

Ethan: We are a blockchain data platform company. We take in all the blockchain data and ingest it, normalize it, standardize it, map it to different use cases, and serve it to our customers. If you think about what Bloomberg did for financial data or what Google did for public web page data, we are doing the same thing for blockchain data. We are organizing the world's blockchain data, making it accessible to people in the blockchain and crypto space who need to understand what activity is happening, to financial institutions such as Visa who need to understand stablecoin movements, and also to the public sector. How do you write good policy? How do you regulate this digital asset industry? All of them have the same underlying need: good, high-performing blockchain data.

Joseph: And is it specific to cryptocurrency, or are there applications of blockchain that are outside of that?

Ethan: Think of it as a Turing machine, as a computer. A currency on a blockchain is just a smart contract that someone defines and that a group of people agree has value, and it's only a small subset of what runs there. We have customers building their applications on top of this globally distributed computer, and we work closely with them because blockchains are mainly optimized for writing data, not reading it, and that's where Allium plays. Anytime your application needs to read state, my wallet, my balance, it can read that data from Allium.

Joseph: That's fantastic. And to tie it together, all of this data, because it's on a blockchain, is out there publicly. But being able to consume all that data across many blockchains, across many different currencies, and turning it into meaningful information, that's what Allium does?

Ethan: That's correct. There are a couple hundred blockchains out there already, and we have nearly a hundred of them. There are probably more than a hundred million different tokens out there with different prices, and there are petabytes and petabytes of data, and it's only growing over time. Someone has to go in and organize all of that and make it manageable.

Joseph: Who are your customers and who aren't your customers?

Ethan: Software engineers who need to read high-performance data to power their applications such as wallets, trading apps, or even real-time monitoring systems. We also serve analysts who want to understand relative market share: is a token trending up or down, should I buy or not? And thirdly, we serve accounting and auditing use cases, whereby people who are trading on the blockchain want to make sure that they don't go to jail. They have all their data in one place so that they can reconcile their finances in one single ledger.

Data Streaming Strategy at Allium

Joseph: Fantastic. I know that you guys are all in on data streaming, and I know today's conversation we're gonna get in depth, but at a high level, what is your company's product strategy around data streaming, or how is data streaming involved in your product strategy?

Ethan: Internally, data streaming allows us to move all the data within our own data systems. In terms of growing our revenue and our customer base, it's crucial because many of our customers demand real-time data, and they want to control the logic they put on top of the blockchain data. How it relates to the business is that our customers who need data streaming want to know: give me the latest balances, give me the latest transactions for the set of wallet addresses I care about. That's where the real-time use case, and being able to control their own if-this-then-that rules on top of the data streams, is important for them and for us as well.

Joseph: Internally it's about decoupling microservices, producers and consumers, but externally to your customers, it's about how do I provide that data to them?

Ethan: Yes.

Joseph: That's a good strategy.

The Journey to Data Streaming

Joseph: Ethan, tell me, what have you built or are currently building with data streaming?

Ethan: When we first came into this industry three to four years ago, having spent time in the data infrastructure and machine learning space, what we saw was that people accessed real-time data in this industry mostly through polling. The limitation of continuous polling and webhooks is that they don't provide the enterprise-grade guarantees this industry expects as it matures and grows. The problem with webhooks is that whenever your webhook goes down, you don't know which messages you missed.

Joseph: Sure.

Ethan: Now let's say you're a bank, you're a custodian custodying all these assets, and you're building risk monitoring rules. If you missed a transaction, you don't even know that you missed it. That's not good. That was our journey into this space, and that was when we became one of the earlier companies in the space to say, "Hey, we're not gonna reinvent the wheel." There's a well-known framework that does data streaming and real-time data delivery well, and that's how we ended up on the Kafka choice.

Joseph: Gotcha. You have mechanisms to go to those webhooks to grab that data, but then when you're presenting it, you're using data streaming as that intermediate layer that has all those enterprise scaling and all those consistency features?

Ethan: On the one side, we do work closely with the RPC nodes to get the data out as soon as possible. And then in terms of fanning it out to different destinations, and sharing with our customers or even internal microservices that you mentioned, that's where we use data streaming for that.
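
To make that pattern concrete, here is a minimal, hypothetical sketch of the ingest-and-fan-out idea Ethan describes: poll an RPC node for new blocks and publish them to a Kafka topic so internal services and customers can each consume them independently. The endpoint, topic name, and field selection are illustrative assumptions, not Allium's actual pipeline.

```python
# Minimal sketch (hypothetical endpoint and topic): poll an RPC node for new
# blocks and fan them out through Kafka for downstream services and customers.
import json
import time
from confluent_kafka import Producer
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://<your-rpc-endpoint>"))
producer = Producer({"bootstrap.servers": "<broker>"})

last_seen = w3.eth.block_number
while True:
    head = w3.eth.block_number
    for number in range(last_seen + 1, head + 1):
        block = w3.eth.get_block(number)
        producer.produce(
            "ethereum.blocks",                       # hypothetical topic name
            key=str(number),
            value=json.dumps({
                "number": block.number,
                "hash": block.hash.hex(),
                "parent_hash": block.parentHash.hex(),
                "transaction_count": len(block.transactions),
            }),
        )
    producer.flush()                                 # don't leave messages buffered
    last_seen = head
    time.sleep(1)                                    # simple poll interval
```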

Why Confluent?

Joseph: What inspired you to originally use data streaming? Was there a specific tipping point?

Ethan: Specifically on data streaming: at the previous company I was at, I wasn't a founder, but I was one of the earlier engineers there. I remember I had a teammate, a colleague, who was managing all the data streaming services. It was painful. That was when I knew that if we ever did any data streaming, we would try to work with a provider we could trust and rely on.

Joseph: That's great, 'cause my follow-up is gonna be why Confluent?

Ethan: I would definitely say it's my own personal journey as a developer and engineer. In the early days, I always said I can build that myself. This is an open source library, I can host it, don't pay that vendor. I used to be in that camp: I can build it, I know how to build it. And then you realize that you have to do migrations, you have to maintain stuff 24/7, and there are random edge cases you don't know about until you deploy into production. What happens if your node goes down? Do you have redundancy? All these little things honestly burn your weekends away, especially if you're breaking such a critical system, 'cause streaming happens almost at the extreme left. If you break that system, every other engineer at the company is gonna complain and you'll become infamous for no reason. And that normally happens when you try to host it yourself. Again, I am sure there are experts that could do it easily, but most people aren't experts. That's why we always wanted to go with a managed service from the get-go.

Joseph: Especially when it becomes your critical system of record. Some people aren't that far into data streaming, but when they are, you find that this becomes something that can't break. Why not pass that off to somebody who's made it as unbreakable as it possibly can be? And then of course, having that level of support if things do go awry, which I hope they never do.

Growth with Confluent for Startups

Joseph: Now, I know Allium has become a huge standout in the Confluent for Startups program. The way I understand it, you guys discovered Confluent Cloud and started with our free trial, and then eventually were issued $20,000 worth of credits as part of that program. And I know you guys have grown to a six-figure commitment with Confluent, which is unbelievably exciting. Tell me how you got started in the program, and what drove that level of investment and growth?

Ethan: Confluent had many connectors already built, and one of the critical connectors was with Snowflake. The question is not whether you can do it, it's whether you should do it. That's the driving force behind many of the decisions we make now. Sure, we could write our own connector, but do we want to maintain it 24/7? Do we want to do that? With all the various connectors Confluent already has, we connect to Snowflake, BigQuery, and Databricks as well. With that brand and the long tail of integrations, I never have to worry about it. That was a big plus. But what also drove more usage is that two years ago, we probably only had three blockchains on our platform: Ethereum, Polygon, maybe Solana to some extent. Now we are close to 85 or 86, and probably gonna hit 100 soon. To give you a sense, we've grown the number of blockchains roughly 30X since we first started on Confluent. And then the third phase is that customers started to realize that webhooks are not gonna cut it. I get it. I'm a developer, I know how to use a webhook to quickly get stuff going, but if something happened on a blockchain, I need to at least know it happened. If my system goes down, I can replay the history so I can reconcile my data again. Those are the valuable pieces of what Confluent and Kafka provide, and that's what drove the growth. To recap: connectors, the explosion of data size and the number of chains, and the industry maturing and saying that we cannot build mission-critical systems on top of webhooks anymore.
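
The replay guarantee Ethan contrasts with webhooks is easy to see in code. Below is a minimal sketch using the confluent-kafka Python client; the topic, consumer group, and process() handler are hypothetical placeholders, not Allium's actual integration.

```python
# Minimal sketch (hypothetical topic, group, and handler): unlike a webhook,
# a Kafka consumer that goes down simply resumes from its last committed
# offset, so no events are silently lost and history can be replayed.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "<broker>",
    "group.id": "balance-reconciler",        # hypothetical consumer group
    "enable.auto.commit": False,             # commit only after successful processing
    "auto.offset.reset": "earliest",         # a brand-new group replays retained history
})
consumer.subscribe(["ethereum.transactions"])  # hypothetical shared topic

def process(payload: bytes) -> None:
    """Placeholder for the customer's reconciliation or monitoring logic."""
    print(payload[:80])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())
        process(msg.value())
        consumer.commit(message=msg)          # after a crash, resume right here
finally:
    consumer.close()
```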

The Business Challenge that Started It All

Joseph: Before they were your customers, what were the real challenges these businesses had that made you say, you know what, we need to build a company around this?

Ethan: Everything the blockchain spits out is structured, but anyone can write any dumb smart contract to the blockchain, anyone. Again, there are 85 different blockchains, of which there are maybe 10 bigger ecosystems. Think of a different ecosystem as a different operating system: Windows, Linux, and even within the Linux ecosystem there are different forks, you have Red Hat, different variations. Imagine now you have many different operating systems, many different types of blockchains out there, and there's a social agreement where everybody accepts that a certain smart contract means a transfer of real money, but there's a huge problem of fragmentation: understanding what happened across all the different ecosystems and normalizing and standardizing it. The key insight we saw is that people don't want to go in and become an expert in every single operating system, every blockchain system, then parse the data, and then map it to the use case they're eventually trying to solve. Let me talk about what these real-world use cases are. For example, Visa is one of our customers. They want to know the stablecoin transfer volumes across different blockchains and across the different stablecoins out there. A stablecoin is a currency backed by real money, real U.S. dollars held by some bank somewhere, and Visa wants to know, for the 50 different blockchains out there, how money is getting transferred around. Of course they want to know what the activity is, because Visa is one of the biggest payment networks in the world. The thesis, the insight, was that there was no way these folks would come in, try to hire 10 or 20 engineers, parse everything out, normalize it, and then pay. You said six figures, but the total spend is way more than seven figures across all infrastructure services. To answer a simple question such as, hey, how much volume was transferred in the past 24 hours, there's a lot of data munging that goes on behind it, and my thesis was that simple: if you build an efficient company and you can execute well, you can do this once and sell it multiple times. That was the core insight. And at the heart of it, the emotional heart of why we did this, is that I used to be a data scientist at some point as well, and the bane of that job is cleaning the data. No one wants to clean the data.

Joseph: No, they want the good stuff.

Ethan: They want the good stuff. They want to run their machine learning models, present the nice metrics, and then show they grew revenue by 5% and then get a promotion. That's what you care about, but then 95% of the day is figuring out where the data is and how to clean it.

Joseph: What does this field mean?

Ethan: What does this field mean? For the blockchain, it's similar. Maybe not a data scientist, but a financial analyst, a product manager, a growth engineer, whoever wants to understand more about what's happening, they have to go through all of the same steps. And if we can do it better, faster, and cheaper, why should someone not use us? That was the key insight.

Joseph: It makes sense. You're saying, I realized that in order for these other financial institutions to do this, they're gonna have to build these teams with this many people, and that's just to get to the dirty data, and then to clean it and get insights. You're thinking, I bet if we build something such as this, all these other institutions would use it, and they won't have to have their own teams of 10 people to get the same thing we're gonna build. I'm glad you had this insight because it sounds as if you're making it easier for these institutions to get the quality data they need. And that's something we talk about here, this idea of data products. It's not about having raw data that's hard to classify and has no metadata. It's about taking that data, making sure it's properly classified, making sure the events are correct, and then taking multiple streams of data and converting, combining, filtering, or modifying them so that they're usable. And then you have that great downstream data product.

Stream Processing and Integrations

Joseph: I know you've already talked about integration a bit, especially with Snowflake and Databricks, but what outcomes have you seen, or are you aiming for, specifically with stream processing? And are there any other integrations you haven't mentioned yet?

Ethan: We do real-time processing. We use Apache Beam today for our real-time stream processing. The moment data comes in fresh from the blockchain, we also do processing. We have a Lambda architecture where we use Snowflake, of course, and Databricks for the DBT hourly builds, but we also have the real-time system for the more mission-critical data schemas we have to parse out, such as real-time balances, real-time transactions, real-time NFT trades, DEX swaps. That gives you a sense of the types of schemas we have. Our customers don't want the raw data. I joke that even if we deliver raw data to our customers, it's the start of their problems. Even if I gave you the entire corpus of petabytes of raw data for free, you would still spend a couple hundred thousand to store the data on your side. You don't even want to keep it yourself. Who wants to keep that much data? Right now we're doing 120 megabytes per second of data through Confluent. Per second. It also shows where the entire blockchain industry is: it's where broadband was about 10 years ago, give or take. But let's take it one step further. There is no humanly possible way for Allium to fit every one of the million use cases that the blockchain can spit out, because anyone can publish any use case, any smart contract out there. How do we allow our customers to build on top of the already enriched data we have, giving them the flexibility to bring their own transformations and shift left? That's the hot word right now, shift left. We are helping our customers who say, hey, I want to filter on a subset of stuff, do some simple transformations, and get my answer quicker, I only care about a sliver of it. And then you extend it further: I want to build an alerting workflow, a monitoring workflow on top of the enriched data. Again, that's another downstream step, you feed it another set of conditions. We want to become not just the data platform for this industry, but the data operating system for this industry. We want to be the operating system for accounting, for analytics, for building part of your applications. How can we become that operating system for your company? And it's apt, because the blockchain is an operating system. We are the data operating system for that layer of this industry.
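
As a rough illustration of that shift-left idea, here is a hypothetical Apache Beam pipeline that consumes an already-enriched swap topic, filters it down to a customer's watchlist, and emits alert records. Topic names, field names, and the threshold are assumptions for the sketch, not Allium's actual schemas.

```python
# Minimal sketch (assumed topics and fields): filter an enriched DEX-swap
# stream down to watched wallets and publish alert records -- the kind of
# customer-defined transformation described above.
import json
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka, WriteToKafka

WATCHED_WALLETS = {"0xabc...", "0xdef..."}   # hypothetical customer watchlist
ALERT_THRESHOLD_USD = 100_000                # hypothetical alert threshold

def to_alert(swap: dict) -> tuple[bytes, bytes]:
    """Keep only the fields the downstream alerting workflow needs."""
    record = {
        "wallet": swap["wallet_address"],
        "pool": swap["pool_address"],
        "usd_amount": swap["usd_amount"],
        "transaction_hash": swap["transaction_hash"],
    }
    return swap["wallet_address"].encode(), json.dumps(record).encode()

with beam.Pipeline() as p:                   # pass DataflowRunner options in production
    (p
     | ReadFromKafka(consumer_config={"bootstrap.servers": "<broker>",
                                      "group.id": "swap-alerts"},
                     topics=["dex.swaps.enriched"])            # hypothetical topic
     | "Decode" >> beam.Map(lambda kv: json.loads(kv[1]))      # (key, value) -> dict
     | "Watched" >> beam.Filter(lambda s: s["wallet_address"] in WATCHED_WALLETS)
     | "Large" >> beam.Filter(lambda s: s["usd_amount"] >= ALERT_THRESHOLD_USD)
     | "ToAlert" >> beam.Map(to_alert)
     | WriteToKafka(producer_config={"bootstrap.servers": "<broker>"},
                    topic="alerts.large-swaps"))               # hypothetical topic
```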

Joseph: For the audience, in case you've never heard it, shift left is the idea of bringing stream processing, or processing of data, closer to the source. In this case, Allium is grabbing all their data and putting it into data streaming. In many ways, that's as close as they can get to manipulating that data; they're not gonna manipulate it before that. The idea is that people nowadays are sending their operational data to analytical systems, doing the data processing there, building their data products, and then maybe doing something such as reverse ETL back. The idea of shifting left is to do more of that processing closer to the source, and there are advantages to that. One, the people close to the source, the people producing the data, are usually the ones who know it best. You can reuse those data products in your operational estate. And it also reduces duplication of processing downstream.

Data Governance at Scale

Joseph: I'm curious about another aspect of the DSP. That's something we talk about here, the data streaming platform. How do you approach data governance?

Ethan: We are the data governance team for this industry, not for everyone, but for many important teams in this industry.

Joseph: Sure.

Ethan: And it comes with data verification.

Joseph: Yes.

Ethan: We made those investments early on, early on. And people also forget that verifying your data is almost as expensive as ingesting the data.

Joseph: Yes.

Ethan: People don't know that. Knowing where you messed up costs about the same as doing the ingestion again. You know what I mean? It's not cheap. There are clever ways to optimize here and there, but conceptually, that's the maintenance you have to do. Right now, every few minutes we have an Airflow DAG orchestrator that runs all the checks. Do all the blocks exist? Do all the transactions exist? Are we missing anything? Does the current block point back to the previous block? Are we missing any transaction hashes? Do a COUNT(*) for a certain time window and check it back. Check against other schemas and see whether they all join nicely together. It's work, and we do it for every single one of our 85 blockchains out there. That's where data governance starts: the first piece is verifying that your primary copy of the data is correct. Here's where it gets even more complicated: we have customers all over the world. We have customers in Asia, in U.S. Central, U.S. East, U.S. West, on Databricks, on Snowflake, on BigQuery in Europe. We have to verify that when we replicate our data from our main copy across the world, we're not losing any data. My thesis is that anytime we move data from point A to point B, you have to re-verify it at the new point. You have to run the checks again. That is also expensive. And I joke that we are a CDN for the data layer in this industry.
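
To make those checks concrete, here is a small, hypothetical sketch of the kind of per-chain integrity verification described above: block continuity, parent-hash linkage, and transaction-count reconciliation over a time window. In practice each check would likely be a warehouse query scheduled from the Airflow DAG rather than an in-memory function, and the field names are assumptions.

```python
# Minimal sketch (hypothetical field names): integrity checks over one time
# window of ingested blocks and transactions for a single chain.
def verify_window(blocks: list[dict], transactions: list[dict]) -> list[str]:
    if not blocks:
        return ["no blocks ingested in window"]
    errors: list[str] = []
    numbers = sorted(b["number"] for b in blocks)

    # 1. Completeness: block numbers in the window must be contiguous.
    missing = set(range(numbers[0], numbers[-1] + 1)) - set(numbers)
    if missing:
        errors.append(f"missing blocks: {sorted(missing)}")

    # 2. Linkage: each block's parent_hash must equal the previous block's hash.
    by_number = {b["number"]: b for b in blocks}
    for n in numbers[1:]:
        prev = by_number.get(n - 1)
        if prev and by_number[n]["parent_hash"] != prev["hash"]:
            errors.append(f"block {n} does not point back to block {n - 1}")

    # 3. Reconciliation: transaction counts declared by the blocks must match
    #    what was actually ingested (the COUNT(*) cross-check).
    declared = sum(b["transaction_count"] for b in blocks)
    if declared != len(transactions):
        errors.append(f"expected {declared} transactions, ingested {len(transactions)}")

    return errors
```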

Joseph: Interesting. Because you have to replicate the data across different data regions across the world, different providers.

Ethan: Because, again, let me bring it back to the question of which customer wants to store one petabyte of data. Internally, we had this project that we call Data 360. That was a cooler name than data governance, but Data 360 is what we called it internally: checking, hey, is this data complete in every single region? And also, we have tens of thousands of schemas. We don't want to replicate everything, and keeping track of what we replicate is another headache. For us, data governance takes that multifaceted, global approach. And I do joke that maybe we centralize decentralized data, and then we decentralize it again by spreading it across the world.

Joseph: Well, especially with all that replication and distribution of data, you want to make sure you have it right from the beginning. That makes sense.

Stream Sharing and Data Distribution

Joseph: Let's talk about data sharing. I know that is part of the way you deliver to your customers using stream sharing. Can you talk about that a little bit?

Ethan: We use data sharing and stream sharing. FYI, we also do data sharing in the data warehouses; every single major data lake and data warehouse provider has their own form of bulk data sharing already, and we do all of that as well. But let me talk more about the real-time stream sharing piece. We have close to a thousand different topics that we share across different organizations, different blockchains, and different schemas today. Many of our customers depend on our real-time data streams and data feeds to power their own apps, build their own monitoring systems, or even reconcile their own internal data systems. Sometimes they already have their own system running and they want to use our data streams to double-check it, to check their homework, whether or not they're correct. It's been easy to use stream sharing through Confluent, easy. In fact, you can ask my customers, I demo Confluent all the time, because I say, hey, look at all these real-time data feeds streaming in front of you. One-click share. If you give me your email, you're good to go. You could push into production today if you wanted to, right after this call.

Joseph: I'm glad that feature is of so much use to you. Again, for the audience, stream sharing is something that is exclusive to Confluent. We make it simple for anyone with a Confluent Cloud account to share a data stream, a topic, with anyone else with a Confluent Cloud account. And as Ethan mentioned, some of our partners in the analytics space have similar features as well, but it's an exciting feature and I'm excited Allium gets to use it to its fullest. And that's the benefit of the Confluent data streaming platform: it's not just the data streams or the stream processing or the support or the connectors. It's all these other value-add features that you can build your business around. And that's why I get excited talking about the DSP.

The Future of AI and Data Streaming

Joseph: Tell me, I'm sure you never expected a question such as this, but what's the future of data streaming and AI at Allium?

Ethan: I spent six years in NLP, AI, and machine learning before starting Allium. That's my background. For the first couple of years at Allium, people always asked me, why don't you go back to AI? What is your AI strategy on top of the crypto data or in this space? And I always said, I'm gonna wait and see, wait for the infrastructure scaffolding to be built, and then I will save time and buy, not build, the same thing we operate with. I did enough of the machine learning stuff myself to know that there's stuff that, if I don't want to do it, I don't want to do it. Please let someone else do it. The good news is that there's been a Cambrian explosion of 10,000 startups that you can work with and partner with, who have almost built the AI infrastructure already. How we're looking at it is that we already have an AI assistant on top of our dataset, because our customers don't want to understand what schemas we have, they want the answer. The AI assistant has been good at showing them which schemas to use and crafting the right queries. And then we want to take that one step further, because I mentioned we want to be the data operating system. We want people to build their workflows on top of our data. We're building these primitives for people who, let's say, want to reconcile all their balances, their audits, in one single place, and designing the right primitives such that an AI agent can come in and automate that. And we think it starts with the data, because the data is the hard part. We have a strong foundation and we're gonna keep building layer on layer. And I'm pragmatic; as much as investors want to hear me say we are AI-first and everything, the bottleneck to the best AI models, the best AI outcomes, is the best datasets. And that is what I've been focusing on since the beginning.

Joseph: It's a good approach, Ethan. I know I'm certainly biased, but I feel all emerging tech has a single crux, and that's the data. If your data isn't ready, presentable, and easily accessible, it doesn't matter what newfangled thing you're gonna introduce, you're gonna be limited by the data.

The Runbook: Tools and Strategies

Joseph: Our next segment is the runbook, where we break down strategies to overcome common challenges and set your data in motion. Ethan, tell me, other than Kafka, what is the top tool Allium relies on for data streaming today?

Ethan: We use Apache Beam today, and we run it on Dataflow. For the streaming pieces, we write all these workers to extract all the right fields, do all the custom smart contract parsing, and then send the data on its merry way. That's one of the bigger tools we use for the real-time piece.
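
As a concrete, simplified example of that field extraction, here is a hypothetical decoder for the most common case, an ERC-20 Transfer event, turning a raw EVM log into a normalized record. The log dictionary layout and output field names are assumptions for the sketch.

```python
# Minimal sketch (assumed raw-log layout): decode an ERC-20 Transfer event
# from a raw EVM log into a normalized transfer record.
# keccak256("Transfer(address,address,uint256)")
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_transfer(log: dict) -> dict | None:
    """Return a normalized transfer row, or None if this log is not an ERC-20 Transfer."""
    topics = log.get("topics") or []
    if len(topics) != 3 or topics[0] != TRANSFER_TOPIC:
        return None
    return {
        "token_address": log["address"],
        # Indexed parameters arrive as 32-byte words; an address is the last 20 bytes.
        "from_address": "0x" + topics[1][-40:],
        "to_address": "0x" + topics[2][-40:],
        # The unindexed uint256 amount lives in the data field.
        "raw_amount": int(log["data"], 16) if log["data"] not in ("", "0x") else 0,
        "block_number": log["block_number"],
        "transaction_hash": log["transaction_hash"],
    }
```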

Joseph: Excellent, Beam. Are there any tools or approaches that you actively avoid? That could be specific to data streaming, a vendor, or an architecture.

Ethan: We have DBT models running on Snowflake already, all the usual DBT transformations. You know how to use them, but they don't perform in near-real-time. That's why you have to use something such as Beam. But ideally, why do I need to replicate my code somewhere else in a different language and a different framework? It's not a tool to avoid, but there have been many companies, the Modern Data Stack wave maybe four or five years ago, that have been trying to tackle this problem. I wish someone would solve it overnight. I still haven't found it, at least not solved at our scale, you know what I mean? Because it would save us time. We've tried a number of them, and for us, we are still looking for that holy grail, but I don't think it exists. 'Cause I've been hearing this for a while. I used to work on Apache Spark when I was at Cloudera; I built an auto-config tool to prevent stuff from going out of memory, managing the Java memory heap so there are no OOMs, and I also worked on some of the Spark Streaming tuning. But Spark Streaming back then was mini-batching. For me, I'm still waiting for that holy grail.

Joseph: Avoid completely translating all of your code and your data to another system; make it all interoperable or find things that are interoperable. That's a good pattern, and good advice for anyone.

Ethan: And also, one nuance here, at least for the blockchain space specifically: the skill set needed to understand the smart contracts in this industry and also write high-performance real-time code, there are not many people who have that intersection of skills. It's a labor-intensive task, and expensive. 'Cause in the field, even if someone knows the crypto data, they may only know SQL and not know how to build an engineering system. Those are two different skill sets. If we can unify that, it changes the way we operate significantly, instead of having two separate teams almost.

Joseph: Makes sense. You need a multimodal tool that can welcome anybody.

Customer Adoption and Overcoming Resistance

Joseph: For your customers to even adopt your services, they have to start with Confluent Cloud. I'm curious, how do you get your customers to buy into data streaming? Now I know you take them through the UI, but have you ever had any pushback and were you able in those scenarios to convince them yes, this is the right way to consume our data streams?

Ethan: Through any sales process, you always have to educate them and tell them why it's good. For most people, it's human nature: even if you know something is better, it doesn't mean you'll do it. I should not drink, I should run more, I should exercise more. Everybody knows that, I need to be more hardworking or something, but no one does it. I know I need to do it, but I'm not gonna do it.

Joseph: Right.

Ethan: That's what we are battling: what we should do versus what we want to do.

Ethan: Signing up for Confluent, people are scared about new vendors. Oh no, that's another part of my bill. I thought I was only paying Allium, why do I now have to pay Allium plus someone else? There's friction involved, and it involves educating the customer on how this makes their lives easier. But also, if someone wanted webhooks, we could do it. It's almost, we'll do it to prove a point, and then they say, okay, you know what? Let's use an event bus instead. That's part of the friction. And of course, some of the bigger institutions can't sign up for a new organization ID overnight. A startup can do it within one minute, but the bigger institutions need to get IT approval, and that can take months. There are some frictions there. How do we meet them where they are? Maybe they're not using Confluent; maybe there are other event buses we can fan the data out to, so it reaches them in their own local environment. We do those workarounds. It takes more time and effort, but you do what you have to do. Because ultimately, I strongly believe that not only are we the blockchain data experts, but we also want to be FedEx. We live to deliver. We will deliver data wherever you are.

Joseph: Wherever it is.

Ethan: Wherever it is. We'll meet you where you are, whatever country you're in, whatever form factor you're in, whatever stack you want to use, we have to meet you where you're at.

Joseph: I appreciate that. You get it done.

The Data Streaming Meme of the Week

Joseph: Now let's shift gears and dive into the real hard-hitting content, the data streaming meme of the week. And this is a unique one because usually, when I do a data streaming meme of the week, the person I'm interviewing is not the creator of the meme. See, I love this meme. Tell me what inspired you to make this?

Ethan: For this one, it's David Beckham and Posh Spice. It starts with, "It's free." "Be honest." "No, it's open source, it's free. You can run it yourself for free." "Be honest." And then it ends with, "No, you need five full-time engineers, pay for infra, pay for data, pay for storage, pay for networking, certifications, trainings, an SRE team around it, and then it's free." I may have borrowed this from someone who talked about Confluent, to be honest. I don't take credit; you can see there's a URL. I didn't create this myself, I reposted it. But I reposted it because I'm also in the business of selling in the infrastructure space, and I see this all the time, because people always tell me that blockchain data is free: I can hit my own node, I can get it myself, I have a computer science degree, I can figure this out easily. Blockchain is easy, it's smart contracts, whatever. And it always goes back to the question that people never take into account: the total cost of ownership of maintaining and building a system. If you're doing a hackathon project, it doesn't matter. But when you're an enterprise, a serious business, you want to build for the long term. When you take in the total cost of ownership, you realize why the stuff that is "free" is free, because it's not really free. As I mentioned at the start of our interview, in my first couple of years I was one of those engineers who said, "I can do it myself, we should be doing it ourselves. This doesn't make any sense. Why are we paying this person? Pay me more money instead and I'll do it." But as I matured, thankfully, I moved away from that.

Joseph: Well, a couple of comments there. One, you're a startup and you guys are doing great, but other startups, and you already figured this out, when you start working with enterprises as customers, they're gonna have a certain level of expectation around uptime and consistency. That's where using managed services can help, because maybe your team is still growing and your sales is outgrowing your engineering, your SREs. That managed service takes away the consternation around, well, is this system gonna be scalable? Do I need to worry about hiring 10 more people to scale at the growth we have? And the meme is funny. I realize you weren't the originator, but you're the spreader of it, which makes you important to the meme world. Someone once described this to me as "free, like a puppy." I'm gonna give you a puppy, but guess what? You've got to feed the puppy, take it to the vet, house the puppy. Free could mean expensive when you take everything into account.

Ethan: Exactly. And you learn the term TCO, total cost of ownership. People don't realize that. People don't want to come to terms with it sometimes, because there's also the sunk cost fallacy: I'm already one foot in, one foot out. I'm sure you face it at Confluent, and every single startup founder in the B2B enterprise SaaS space or infra space faces it too. It's how do you position it so that it makes sense for people to build stuff on top of your service.

Lightning Round

Joseph: Before we let you go, we're gonna do a lightning round. Byte-sized questions, byte-sized answers. And that is B-Y-T-E. It's hot takes but schema backed and serialized. Are you ready?

Ethan: Sure. Let's go.

Joseph: All right, Ethan. What's something you hate about IT?

Ethan: The word "IT"?

Joseph: The word itself. You're not the first person to give that answer, and I love that answer. What is the last piece of media you streamed?

Ethan: The last piece of media I streamed? Probably the air traffic control stream for New York, 'cause I'm gonna fly soon and I'm concerned about that.

Joseph: Awesome. Hopefully that comforted you. What's a hobby you enjoy that helps you think differently about working with data across a large enterprise?

Ethan: Funnily enough, not quite a hobby, but I do read about fashion and enjoy fashion stuff. How does that relate to data? I care about this thing called data UX. If I design a data schema for a customer and they have to do one more left join than they need to, as a data scientist or a data analyst they will hate it; a little part of them dies inside. I care about that final presentation of the tables and schemas we deliver, and that's something that drives me. Because if you can design a schema that has all the right information, I wouldn't even say in proper normalized form, but in the form useful for the use case, people will love it. Ultimately, you reduce the cost of curiosity and you get people to explore more. That comes from a sense of aesthetics and design; it has to fit, and the data has to work.

Joseph: That's good, I appreciate it. That eye for detail that you get from the fashion industry. Can you name a book or resource that's influenced your approach to choosing a vendor or an architecture, or to implementing data streaming?

Ethan: This is one of my favorite books, one I recommend as well: A Philosophy of Software Design by John Ousterhout.

Joseph: Great.

Ethan: In one of the chapters he talks about deep API design versus shallow API design. In terms of influencing the data piece: sometimes the customers don't want to know everything. They want you to design an API or a schema that exposes just the right fields to answer the question. Everything else, don't even tell them about it; if they need it, you can open the hood for them. That concept, which I learned from that book many years ago, I enjoy, and it relates to data.

Joseph: No, it's great. What's your advice for a first-time chief data officer or somebody else with an equivalent impressive title?

Ethan: Number one, I should patent this name, but my title on LinkedIn is chief data plumber.

Joseph: Oh, I appreciate that. That's good.

Ethan: That is my title. Please do not take it; I am the chief data plumber already. But the advice is this: what data leaders face is, how do you align? You can build the best data system in the world, but ultimately you exist to fit a business need, the BI, the data powering an application. You have to make sure you're aligned on that. If you're not, it doesn't matter if you had the best, most optimized shift-left system, whatever, it doesn't matter. Ultimately, are you driving the business results? That's the core of it.

Joseph: That's good. That's good advice. Now, Ethan, any final thoughts or anything to plug?

Ethan: Two parts to it. We are always hiring amazing engineers who are interested in data infrastructure in general, always. And if there's anyone out there who needs blockchain data, DM me, call me, I'm responsive. But we're always hiring. What we say is that, unfortunately, our infrastructure budget per engineer is higher than what we pay the engineer. That's the scale we work with. But on the flip side, if you join the company, you have a lot to learn across every platform, including Confluent, and you'll be using it.

Joseph: Excellent. Well, thank you for joining me today, Ethan. It was a pleasure discussing this with you and having a chat. For the audience, stick around because after this, I'm gonna give you my three top takeaways in two minutes.

Three Key Takeaways

Joseph: Wow, what a fantastic conversation with Ethan. Let's talk about those takeaways. The first one, and I'm gonna be thinking about this quite a bit, is how Allium is using data streaming to read once and deliver many times. As Ethan mentioned, the way you query a blockchain is not performant, it's prone to failure, and it's not ideal for somebody building a real-time delivery system. Allium built something that can retry and reliably read the data it needs, and then deliver it many times through data streaming, a system specifically built for many consumers to read many times, whether through transactions, consumer groups, et cetera. An interesting, not novel, but powerful way of using data streaming to serve their downstream customers.

Ethan asked a question that I think is important: you can build it, but should you? This is something everyone should think about, whether you're a startup or an enterprise. If there are well-established providers of a technology that have great uptime, a great total cost of ownership when you do the analysis, and all the extra bells and whistles that make a system enterprise ready, such as the Confluent data streaming platform, you should consider them. I can lift heavy rocks, but should I? Is this where I want to spend my time? And as it pertains to an enterprise, is this where I want my engineers spending their time? Or should they be building things that are specific to my business and my business logic, not doing undifferentiated heavy lifting? I love that.

And another thing Ethan said: you could build a system yourself such as Allium's, but all you'd get is the raw data. And he said, "Delivering raw data is the start of your problems." I couldn't agree more. Again, it comes back to that idea of shifting left, of getting as close as you can to your data source, taking this uncleansed, raw data, and turning it into a data product. If you do that as early as you can, those data products can be used internally at your business, or externally by your customers. Wow, some fantastic takeaways on how to use data streaming, and some things you should be thinking about as you start your data streaming journey. That's it for this episode of Life Is But A Stream.
