Ep 28: Taavi Rehemägi CEO of Dashbird
Welcome to the Talking Serverless podcast. I'm your host, Ryan Jones, joined today by Taavi, CEO and co-founder of Dashbird, a next-generation monitoring platform for modern cloud environments.
Q: I know that Dashbird has been floated around the serverless community for a long time now. How's everything going with Dashbird? How are you doing?
Taavi: It's a super exciting time for us and, I think, for everyone in the serverless world. It feels like an inflection point in the market, where everybody is trying to use serverless in a more profound and mature way. We're learning a lot about the market and trying to evolve the company around it. So it's a really interesting time for us.
Q: What have you been seeing that’s saying people are really taking serverless seriously?
Taavi: I think there are two things. First, the life cycle we are seeing: people who started building on serverless a couple of years ago are actually beginning to see adoption and more usage. Second, road bumps are slowly being removed from serverless, so it's more production-ready. You can overcome challenges around tooling and operational practices. I think that's due to a lot of work from the AWS side, and also from the community and third-party tools.
Q: What does serverless look like now compared to three years ago?
Taavi: I started building on serverless in 2015. If you look at the infrastructure that we had back then, I would say it was around 80%. Even though we were serverless, we were building a lot of things ourselves. If you look at our stack right now, or at our users, Lambda isn't that big a part of serverless. Infrastructures also use a large variety of different managed services, and I think we're seeing that people are learning how to use all the different types of services together in a meaningful way. That's been a change, but it's also something we've learned about the market: it's not just going to be another compute platform.
Q: Do you have a philosophy that you follow, a serverless-first mentality? Also, can you expand on this idea that it's more than a compute platform?
Taavi: I think our philosophy, and the large shift, is that you're offloading all the undifferentiated work to the cloud providers. Essentially, what you're striving for is focusing only on the direct value to the customer, in the code you actually write, and everything else should come from AWS or some other cloud provider. I think a lot of services have been launched in the last couple of years, and people have really adopted them as well.
Q: Could you talk a bit about operational excellence and what that means?
Taavi: The challenge a lot of companies are facing is that they went from quite dense, containerized workloads to having hundreds or even thousands of cloud resources across many different services like Lambda, queues, APIs and different databases. All of those resources produce a lot of outputs like logs, metrics, tracing data, and so on. This creates a challenge around understanding the activity in the first place, as well as identifying failures, performance inefficiencies and cost inefficiencies, and keeping to best practices overall. The essence of Dashbird is to help companies overcome that challenge, or to provide the tools for that.
Q: What are some common serverless monitoring tools that every application should use?
Taavi: The philosophy of Dashbird is threefold, and I think that aligns quite well: how we talk about the problem is how we're trying to build our platform. There are three things, in my opinion. The first is having the infrastructure to access large amounts of data across all those resources, across regions and across different AWS accounts, in an efficient and meaningful way. That means being able to search, query, see dashboards, bubble things up and detect anomalies. If you're dealing with an environment with thousands of resources, how do you consume and access that data? The democratization of monitoring data is one of the biggest challenges that I see, so that's the first thing that we focus on, and I think most companies are focusing on it too.
The second part is basically what you mentioned, failure detection. If you look at the scenarios, you have log events that indicate failures, you have metrics that you should be looking at, and you have configuration and tracing data as well. Each resource has a configuration that could indicate a problem, or could have a problem with it. So we automate the log event listening. We detect things like code exceptions, timeouts, configuration failures, etc., across all of the resources. We're currently doing this for functions, but we're also moving to everything that produces logs and detecting things on the fly there. We have this for metrics as well, so we can automatically configure metric alarms and check for conditions that are suboptimal. Those could include API errors, high delays in your queues, things like that. So the second part is automating the alerts and making them dynamic, and really checking for unknown failures that you don't anticipate. The third is that you have all of those best practices: you have security, you have the Well-Architected Framework, and there's a lot of knowledge around how you should build serverless infrastructures.
If you're a team that's starting with serverless, or building on serverless, it really takes a lot of time and effort to learn the ins and outs of the different services, and also the mindset of serverless. So we have a collection of curated rules, and we work together with AWS teams, to continuously check your infrastructure for security issues, compliance issues, ways you can improve your performance, or redundancy you can remove to optimize costs. So basically, that's what we do, and I think it's a relatively interesting approach to operational excellence.
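The log-event detection described above could be sketched roughly as follows. This is a minimal illustration, not Dashbird's implementation: the pattern table, the category names and the helper functions `classify` and `count_failures` are all assumptions made for the example, and real log formats vary by runtime.

```python
import re
from collections import Counter

# Hypothetical failure patterns, checked in order. These are illustrative
# assumptions only; real log lines differ between runtimes and services.
FAILURE_PATTERNS = {
    "timeout": re.compile(r"Task timed out after [\d.]+ seconds"),
    "out_of_memory": re.compile(r"Runtime exited with error: signal: killed"),
    "exception": re.compile(r"Traceback \(most recent call last\)|\b\w+(?:Error|Exception)\b"),
}


def classify(line):
    """Return the first failure category whose pattern matches, else None."""
    for name, pattern in FAILURE_PATTERNS.items():
        if pattern.search(line):
            return name
    return None


def count_failures(lines):
    """Count failure categories across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts
```

Feeding a stream of function logs through `count_failures` gives a per-category tally that an alerting layer could act on. Lines that match no pattern are simply ignored.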
Q: How was Dashbird founded and how did you get involved in it?
Taavi: I've been a developer since I was 13 or 14. I worked at multiple startups in Estonia before; Estonia has a really cool startup landscape. Skype came from there, TransferWise, a lot of those companies that are quite successful right now. When we were adopting serverless in 2015, there were no tools in the market; there was no third-party tooling. On the AWS side it was a science project, and there wasn't a lot of maturity around the technology. But we were hell-bent on using serverless for everything, which made our team of ten people at the time go through all of the challenges ourselves. We had our own internal version of the Serverless Framework that worked exactly like the actual Serverless Framework that got launched a year later, we had our own monitoring tools and CI/CD pipelines that we were building, and we went through this massive learning curve. But that's actually what enabled us to be in a production environment with high loads in 2015 and 2016 and see all the challenges, but also see how well it actually works. We knew that this was going to be the future. So that's how we ended up in this place: we understood that, and we were ahead of the curve.
I think it was about May 2017 when we started seriously working on it. I think about sixty people joined the week we put it live, and we got this organic growth early on, in that people were using it and coming back to it. So we saw that there was something there. Initially the thinking was: you have APMs for Docker or containerized environments, and you have something for Kubernetes. The next logical step at that time seemed to be functions, as if it was going to be the next compute platform. That was the extent of our perception of the market at that point: it was going to be another compute platform, but it wasn't going to be a big shift. Later on, when we had our first five hundred users and were constantly having those conversations, we started to understand that it wasn't actually about the functions. It was this whole spectrum of different services that they were using. An overwhelming number of questions were thrown at me, like, 'Okay, it's cool that you're doing functions, but do those other things as well.' That's when the idea evolved.
Q: And the people that were on that project with you, where did they end up?
Taavi: Yes, so, the CTO and I are actually from the same company. We've been working together for seven years, across three different startups.
Q: When you were starting Dashbird, did you have any outside funding initially?
Taavi: It was a side project. We were really serious about it, but we didn't have any funding until April 2018. So it took us half a year. We weren't even planning to raise money for a long time, but then it made a lot of sense. We were getting customers and struggling to build features, and the opportunity was obvious at that point, so it made perfect sense.
Ryan Jones: I saw on the website that something like 7000 accounts are now hooked up to Dashbird!
Taavi: It's probably even more, because we don't update regularly!
Q: Has seeing all these different people building serverless applications changed the way you think about serverless?
Taavi: Obviously, across those 7000 AWS accounts, and even more people, there are different segments. There are hardcore serverless users who choose to build everything on serverless. There are a lot of people who are just learning or experimenting with it. And there are a lot of hybrid environments, where you have Kubernetes or ECS, and then you have functions. I think that's the other group that's interesting to us, as they don't need to have Lambda. You can use a different compute unit or platform but still use the other managed services, and still struggle with them. If you look at Fargate, for example, I don't think Lambda will be the only compute platform of the future; there will be use cases where it's not a good fit, and the path will be diversified. What's absolutely the obvious trend is using managed services for databases, queues, etc.
Q: How do you think about that when talking to customers; the AWS native monitoring will get you this far, but then Dashbird takes you further?
Taavi: AWS is doing quite well with their monitoring solutions. But it's really hard for them to build specialist tools, because if you look at a service like CloudWatch, every service connects to it and sends its metrics and logs there as well, or logs can go into S3, etc. It's hard to be something specific for serverless. It's kind of like buckets of metrics and buckets of logs and buckets of JSON data that's not really being analyzed and made accessible in a meaningful way. One of the fundamental problems with logs is that on one end of the spectrum you have services like Elastic or log analytics services that keep the logs indexed in really hot storage, and on the other end you have CloudWatch and S3, where the data is not indexed and can't be searched as easily or responsively. It's basically cold storage. The problem for managed services is that there's just so much data that it's financially unreasonable to keep it all in a hot environment, so you need some sort of in-between solution where you curate what you're actually storing and what you're not.
I think there are a lot of fundamental challenges with general tools, and a lot of different obstacles to overcome. That's one fundamental challenge that I see. The other is that, even if you have those logs and metrics and have access to that data, continuously checking that data for the insights you should be getting is a different story. That's why we automate the alert coverage, dynamically manage alarms across the infrastructure, and run our own checks against that data to understand whether there's something you should be paying attention to. We sit on top of those services instead of competing with them, and I think if you have a small-scale service, you can actually do quite well with those.
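One way to picture the automated alert coverage mentioned above is as a gap check: given the resources in an account and the alarms that already exist, list the alarms that are still missing. The sketch below is only an illustration under assumed names. `DEFAULT_ALARMS`, the resource type labels, the metric names and `missing_alarms` are all hypothetical and do not reflect Dashbird's or AWS's actual APIs.

```python
# Hypothetical baseline: which metrics should have an alarm per resource type.
# Types and metric names are assumptions chosen for illustration.
DEFAULT_ALARMS = {
    "lambda": ["errors", "throttles"],
    "sqs": ["queue_delay"],
    "apigateway": ["5xx_errors", "latency"],
}


def missing_alarms(resources, existing):
    """Return (resource_id, metric) pairs that have no alarm yet.

    resources: iterable of (resource_id, resource_type) tuples
    existing:  set of (resource_id, metric) tuples already covered
    """
    missing = []
    for resource_id, resource_type in resources:
        # Resource types without a baseline are skipped rather than guessed at.
        for metric in DEFAULT_ALARMS.get(resource_type, []):
            if (resource_id, metric) not in existing:
                missing.append((resource_id, metric))
    return missing
```

Running a check like this on a schedule would keep coverage in step with the infrastructure as resources are added, which is the "dynamic" part of the idea.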
Q: I'm using Dashbird and trying to make sure that my team is aware of the different things that are happening. Are there any integrations that you all have on the platform? Do alerts go to Slack or email, or have you found a certain medium that works best for teams?
Taavi: We integrate with Slack, email, webhooks and SMS triggers as well. The overwhelming majority uses Slack; I think it's the tool of choice for a lot of teams. Microsoft Teams is also in demand, so we're adding that integration soon. Another is Jira, which we get quite a lot of requests for and are looking into. I think it's really important to have integrations. Google's SRE handbook says that over 70% of the use cases where people use a monitoring solution actually start with some sort of alert coming to email or Slack, and they click on it and end up in your application. So it's really important to integrate with whatever developers are using.
Q: How would you approach potentially too many alerts, too much data coming through?
Taavi: That's an excellent question, actually. Alert fatigue is a big problem. If you get overwhelming amounts of alerts all the time, you don't pay attention to any of them, which means that the important ones can slip through. What I recommend is to identify what matters, going back to the infrastructure diagram and looking at your infrastructure. For instance, if you have an API, only monitor the API endpoints for high latency and error rates. If those two are fine, then you probably don't have a problem. If there's a problem with those, then you should get an alarm, but you shouldn't get an alarm for every service behind it, for example. That's one place to start: identify the things that would directly affect the user and only monitor those. Be explicit with those, and keep the notion that if a thing ends up as an alarm in your email, then you should react to it. You should be really strict that only alarms you're actually going to react to end up in your inbox. As for other things: if you connect Dashbird right now, for instance, you will have hundreds of different actionable things that you could actually change. Obviously, those should not end up in your inbox; they should be aggregated into reports or lists or something else that you can look at when you decide to.
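The routing rule described here (interrupt people only for user-facing latency and error-rate problems, and collect everything else into a report) can be sketched as below. It is a minimal illustration under stated assumptions: the `Alert` fields, the `USER_FACING_METRICS` set and the function names are hypothetical, not part of any real product.

```python
from dataclasses import dataclass

# Hypothetical: the only metrics considered worth interrupting someone for.
USER_FACING_METRICS = {"latency", "error_rate"}


@dataclass
class Alert:
    resource: str      # e.g. an API endpoint or an internal queue
    metric: str        # name of the metric that triggered
    user_facing: bool  # True if the resource directly serves users


def route(alert: Alert) -> str:
    """Return 'notify' for alerts someone must react to, else 'report'."""
    if alert.user_facing and alert.metric in USER_FACING_METRICS:
        return "notify"
    return "report"


def split_alerts(alerts):
    """Split alerts into (inbox, report) lists, preserving input order."""
    inbox = [a for a in alerts if route(a) == "notify"]
    report = [a for a in alerts if route(a) == "report"]
    return inbox, report
```

Keeping the "notify" condition this narrow is the point of the design: anything that lands in the inbox should be something the team will actually act on, and the report list can be reviewed on the team's own schedule.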
Q: Thinking more broadly about serverless, monitoring and observability, how do you see this changing as we look out towards 2023?
Taavi: In the future, more and more people will use infrastructure as a service and really stop managing their own infrastructure and things like that. It won't just be AWS; I think in the future you will have a wide variety of managed services that you will use. There's this notion of observability; the term itself means that you should be able to tell the internal state of a system by looking at its external outputs. I think all of the services of the future will follow that paradigm. You won't need to attach any agents or runtime components; a service will just produce all the data you need to understand the internal state of all of those components. And that's where we see Dashbird and monitoring: providing all of that infrastructure and connecting with those services, not just displaying the data but transforming it into meaningful information. If you're looking at metric data, log data or a managed MongoDB cluster, for example, then with zero integration, whatever the monitoring platform is, it should first centralize that data across all the different managed services and then transform it into a meaningful standard. That helps you understand what you can do to make it better. I think it's an abstraction layer over that output data, visualized in an understandable way. It plays into the notion that developers should focus on the customer and on creating differentiated value. The operational side should be undifferentiated, and somebody else should take care of it.
Q: How can people get a hold of you? Do you have anything that you want to promote?
Taavi: Yeah, you can ping us on Twitter; @thedashbird is the handle, or go to dashbird.io. We have quite a lot of things always coming out. We're integrating with new services basically every month and adding more exciting stuff. The thing coming in October will be support for everything that logs in AWS, and filtering for that. I think that will be quite exciting and will open up a lot of possibilities for more detailed monitoring across different services.