Blog

How Streamlit will help you get your machine learning products used

01 Aug, 2022

Moving data science models from a proof of concept (POC) in some notebooks to a minimum viable product (MVP) that provides business value can be a tough transition. There can be many reasons for this:

  • The amount of engineering work required to get your model integrated into existing systems.
  • The slow (or nonexistent) feedback cycle with your users, resulting in a very slow cycle time for the final tweaks that need to be made.
  • Lack of trust by your users in your model, resulting in your amazing model remaining unused.

We’ve been there ourselves: spending a lot of time on productizing a data science application, only to find out that our users didn’t like how it turned out and did not want to use it…

Even though these problems are largely process-related, there are tools that can mitigate or reduce the impact of some of them. One of them is Streamlit, a simple, Python-based dashboarding tool.

In this blog post, we will highlight three uses of Streamlit that have helped us productionize machine learning models and move from POC to MVP quickly. In addition, we give you three tips to help you get started building these dashboards.

The three ways Streamlit can be of use for moving from POC to MVP

1. Streamlit as a prediction-serving tool

If your users will interact with your model through graphs or by manually interpreting its predictions, a Streamlit dashboard can be a valuable tool for prediction serving.

In some of our use-cases, there was a trade-off between integrating the predictions into an existing React-based web app (which would require the involvement of front-end engineers) or into a new Streamlit dashboard.

Even though the Streamlit app would need to be newly set up for this, it still resulted in large time savings; data scientists were able to iterate on the product and visualisations without any need for software engineering capacity. This kept the feedback loop with the end users fast. Once the first users and data scientists were happy, and it was time to move beyond an MVP, the predictions could be integrated into the final product.

2. Streamlit as a monitoring tool

During the development cycle of your model, data scientists, together with the users, spend time developing evaluation metrics for the model. Before the model goes live, these metrics are used to check whether the model is fit for purpose. However, the evaluation does not stop there. As soon as you move towards the productionizing phase of your model (or even earlier), you need to start thinking about and setting up monitoring for it.

Streamlit can be an excellent tool to start off with. The ease of creating meaningful visualizations allows you to build dashboards that can be used not only by the data science team for monitoring, but also by the users. Allowing users to interact with a monitoring dashboard can help them understand the model better and helps build trust in it.

3. Streamlit as a means of giving model insights

Taking the previous concept even further: you can set up a dashboard that gives more insight into the model than just some pre-defined performance characteristics. You can add visualizations of the input data or features that your model is currently making predictions on, or even information about different forms of model drift. These things invite a user to actively engage with your model, which builds trust. Even better, it might help involve the user in the debugging process for your model. This can create a fast feedback cycle that helps you and your users iterate on your model at lightning speed.

Why Streamlit makes all of this so easy

The core benefit of a tool like Streamlit is that it is Python-based. Many of the insight, evaluation, and prediction graphs have already been created by data scientists during model development, usually in Jupyter notebooks. Moving the code that generates these visualizations into a Streamlit dashboard requires little work: Streamlit allows you to re-use any Python code you have already written. This can save considerable amounts of time compared to non-Python-based tools, where all visualization code needs to be rewritten.

Streamlit focuses on simplicity and accessibility when putting machine learning models in front of users, but it offers little in the way of built-in security and privacy controls. Open-source Streamlit does not ship with authentication or role-based access control, so data access is typically managed by the platform the app runs on (for example, a reverse proxy or single sign-on in front of it), and organizations dealing with highly sensitive data may need additional layers of security measures.

Regarding scalability, Streamlit is effective for quick prototyping and small to medium scale applications. However, it may face performance issues as the user base grows or data volume increases because it isn’t primarily designed for high concurrency or vast datasets typical of large-scale enterprise environments.

Integration with other machine learning tools and environments is a strong suit of Streamlit. It is built with Python, which allows it to seamlessly work with numerous libraries and frameworks in the Python ecosystem, such as TensorFlow, PyTorch, and scikit-learn. This makes it a versatile tool for integrating various aspects of a machine learning workflow into an accessible dashboard.

Excited about building your own dashboard? 3 tips to get started

1. Try out the infra setup first, before committing to building a full dashboard

We have mentioned many times that Streamlit makes it simple to show your data to users. This requires you, however, to have some location to run your dashboard on that can be accessed by your users. In our case, we already had a Kubernetes cluster with all networking in place. This existing infrastructure made it very easy to spin up a Streamlit deployment next to our other web apps. It is likely, however, that you don't have a ready-to-use Kubernetes cluster lying around. In that case, there are still a lot of options available for deploying your solution. A handy collection of options and tutorials on how to do so can be found on the Streamlit forum.

2. Keep it simple & know when to move on

Streamlit is a great place to start putting your first predictions live and getting interactions with your users. Although it does offer a lot of customizability, you're bound to run into some kind of wall when you push it too far in making it highly interactive and visually appealing. We recommend using it as a tool during the POC and MVP phases of a project, but keep in mind that you might have to move on after that to build out your functionality. Building a Streamlit app doesn't require a large time investment by itself, and we recommend not making this investment any larger than it needs to be.

3. Build a hello-world Streamlit template for your organization

Getting an empty dashboard running on infrastructure might be the most difficult part of building a Streamlit app. If you're working in a data-science-enabling team and find multiple data science teams wanting to use Streamlit dashboards, it's a great idea to provide them with a template for doing so (such as a cookiecutter). Such a template should take care of some of the engineering parts (Dockerfiles, etc.) and document all the steps to getting an app live. This saves frustration and time that data scientists would rather spend filling the dashboard with valuable insights for their users.

Conclusion

We think that a Python-based dashboarding tool such as Streamlit can be a fantastic way to get results and insights in front of your users without a large development investment. In turn, this can help you speed up the iteration cycle of your ML applications and help you create value with AI faster.

Are you interested in how we used Streamlit in practice at one of our clients? Check out our recorded talk at PyData Berlin 2022.

Daniel Willemsen
Daniel is a Machine Learning Engineer at GoDataDriven. He focuses on helping teams move their data science use-cases through the entire machine learning life-cycle: from ideation to production and maintenance.