Learnings after 4 years working with 50+ companies on data engineering projects, by @javisantana

During Tinybird’s first 4 years (Tinybird is the company I founded), I’ve been helping our customers on the technical side, both pre-sales and post-sales. I’ve probably talked to more than 100 companies and actively helped 50+, ranging from companies with just a few employees and gigabytes of data to some of the top companies in the world with terabytes.

Tinybird (bear with me, this is not a post about the product) helps with specific projects, and most of the use cases we dealt with were not just about building a data platform but about refactoring what the company already had to meet real-time requirements. I think helping these companies change their mindset has saved them millions of dollars a year.

A clarification before I start: when I talk about real-time, people think of Kafka, Spark, Flink, etc. In practice, it means “what you were doing before, but actually fast” or “what you were doing before, but without having to go grab a coffee every time you run the pipeline”. I like to call it “high-performance data engineering”. It usually means:

  1. Lots of data (tens of millions of rows a day or more, with a few years of history; usually terabytes)
  2. Low end-to-end latency (so no ETL, or only lightweight ETL)
  3. Sub-second queries, usually under 100 ms
  4. Reasonable costs (similar to or lower than traditional ETL)
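To give a sense of the scale in the first item, here is a back-of-envelope sizing sketch in Python. The inputs are assumptions chosen for illustration, not figures from any real customer: 30 million rows a day, 3 years of history, and an average of 100 bytes per uncompressed row.

```python
# Back-of-envelope sizing for "lots of data".
# All inputs are illustrative assumptions, not measured customer figures.
ROWS_PER_DAY = 30_000_000  # assumed: "tens of millions of rows a day"
YEARS = 3                  # assumed: "a few years of history"
BYTES_PER_ROW = 100        # assumed average uncompressed row size


def total_rows(rows_per_day: int, years: int) -> int:
    """Rows accumulated over `years` of history (365-day years, no leap days)."""
    return rows_per_day * 365 * years


def raw_size_tb(rows: int, bytes_per_row: int) -> float:
    """Uncompressed size in decimal terabytes (1 TB = 10**12 bytes)."""
    return rows * bytes_per_row / 10**12


def avg_rows_per_second(rows_per_day: int) -> float:
    """Average ingestion rate if rows arrive evenly over the day."""
    return rows_per_day / 86_400


if __name__ == "__main__":
    rows = total_rows(ROWS_PER_DAY, YEARS)
    print(f"{rows:,} rows, ~{raw_size_tb(rows, BYTES_PER_ROW):.1f} TB raw")
    print(f"~{avg_rows_per_second(ROWS_PER_DAY):.0f} rows/s on average")
    # prints:
    # 32,850,000,000 rows, ~3.3 TB raw
    # ~347 rows/s on average
```

Under those assumptions you end up with tens of billions of rows and a few terabytes before compression, which lines up with the terabyte range in the first item. Compression would shrink the on-disk footprint by an amount that depends on the data.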

Some practical learnings, in no particular order:

I also learned a lot about people, but that’s another story.