A two-step process to showcase a data science project as a web application on AWS
I will start this article by thanking Nicolás Metallo for sharing the code on GitHub.
Why should I build web applications for data science?
As a data scientist, I often work with data on my own laptop (Jupyter notebooks, VS Code, etc.). To demonstrate visualizations and machine learning workflows to stakeholders, I often build a proof-of-concept app on my computer. This works well for the most part, but the demo ends as soon as the screen-sharing meeting does.
Very often, colleagues and stakeholders want to interact with the app themselves and give feedback. That is not possible when the code lives on your computer as a Jupyter notebook. I recently discovered an easy way to share data science apps as web applications on AWS. I will share my experience and some tips to make this process seamless.
If you follow the instructions, you will be able to host your app on AWS. It may be a self-contained app with plots or an app that uses other AWS services such as SageMaker. With additional effort, you may be able to use this proof-of-concept (POC) web application as the first version of the application for simpler use cases.
Challenges
There are two challenges in converting your Jupyter notebook into a web application:
- Converting your Jupyter notebook into an app. The popular choices here are R with Shiny, Dash with Plotly, and my recent favorite, Streamlit. I have used Shiny and Dash, but I found Streamlit very intuitive and was able to build an app with minimal effort. You can also share the app directly with Python users, unless the app is complex (many libraries, or a dependency such as a SageMaker endpoint).
- Porting your visual app to the cloud, which makes it visible to many stakeholders. This is doable if you are a cloud expert or have support from cloud experts. I wanted to spend as little time as possible and reuse the same steps for every app. You can host the app on any cloud; I found an easy way for AWS by reusing CDK code from GitHub.
Building the app
- Start with a Hello World app and add the steps from your Jupyter notebook.
- Add checkboxes and user inputs to make the app interactive.
- Use caching in Streamlit to speed up your app. The app loads the data into the cache once, so it stays responsive each time you interact with the page and the plots have to be redrawn.
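The three steps above can be sketched as a single app.py. This is only a minimal illustration, not the app from the GitHub project: the `make_wave` helper, the widget labels, and the value ranges are made up for the example, and `st.cache_data` assumes a recent Streamlit release (older releases exposed caching as `st.cache`).

```python
# app.py: illustrative sketch only, not the app from the GitHub project.
import math

try:
    import streamlit as st
except ImportError:  # lets the pure helper below be imported without Streamlit
    st = None


def make_wave(n_points: int, frequency: float) -> list[float]:
    """Hypothetical stand-in for an expensive notebook step (loading data, fitting a model)."""
    return [math.sin(2 * math.pi * frequency * i / n_points) for i in range(n_points)]


def main() -> None:
    # Wrapping the function with st.cache_data lets repeated calls with the
    # same arguments reuse the stored result instead of recomputing it.
    cached_wave = st.cache_data(make_wave)

    st.title("Hello World")

    # User inputs: Streamlit reruns the script whenever one of them changes.
    n_points = st.slider("Number of points", min_value=10, max_value=500, value=100)
    frequency = st.number_input("Frequency", min_value=1.0, max_value=10.0, value=2.0)

    values = cached_wave(n_points, frequency)
    st.line_chart(values)

    # A checkbox toggles an optional part of the page.
    if st.checkbox("Show raw values"):
        st.write(values)


if st is not None:
    main()
```

Saved as app.py, this can be tried locally with `streamlit run app.py` before any cloud setup.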
Porting the app to AWS
The instructions below will work for data scientists who know very little about building cloud applications. If I can do it, anyone can do it. Though these are for AWS, you may be able to take the Dockerfile and make it work on other cloud platforms.
Setup:
Then follow the instructions in this GitHub project. You will only need to change app.py and requirements.txt to suit your app.
There is also another project for 1-click deployment here, but I did not need to use it.
Troubleshooting
If you run into trouble during the CDK steps, check your AWS profile setup.
I had to change a couple of commands (I added --profile default) to make them work. Here are the commands that worked for me.
export AWS_DEFAULT_REGION="us-east-1"
export AWS_ACCESS_KEY_ID=$(aws configure get default.aws_access_key_id)
export AWS_SECRET_ACCESS_KEY=$(aws configure get default.aws_secret_access_key)
cdk bootstrap aws://unknown-account/unknown-region --profile default
cdk deploy --profile default
8:54:25 PM | CREATE_IN_PROGRESS | AWS::EC2::Route | WebDemoVPC/PublicSubnet1/DefaultRoute
[█████████████████████████████████████▋····················] (37/57)
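When the CDK commands fail, one quick check is whether the profile they use actually has both keys set. The following is a minimal sketch, assuming the credentials live in the default ~/.aws/credentials file in the INI format written by `aws configure`. The `missing_profile_keys` helper is hypothetical and is not part of the AWS CLI or CDK.

```python
import configparser
from pathlib import Path

# The two keys that the export commands above read from the profile.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_profile_keys(credentials_path: Path, profile: str = "default") -> list[str]:
    """Return the required keys that are absent or empty for the given profile.

    All required keys are returned when the file or the profile does not exist.
    """
    # Interpolation is disabled so that a '%' in a value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_path)  # a missing file is silently skipped
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]


if __name__ == "__main__":
    # Assumed default location; adjust if your credentials are stored elsewhere.
    path = Path.home() / ".aws" / "credentials"
    missing = missing_profile_keys(path)
    if missing:
        print(f"Profile 'default' is missing: {', '.join(missing)}")
    else:
        print("Profile 'default' has both keys set.")
```

The script prints only key names, never the key values, so its output is safe to paste into a chat when asking for help.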
If everything works well, you will get a link to the web app at the end of the output (the line containing ServiceURL in the output below). Copy it and share it with your audience.
StreamlitWorkshop-cdk
✨ Deployment time: 407.66s
Outputs:
StreamlitWorkshop-cdk.WebDemoServiceLoadBalancerDNS9E1356A5 = Strea-WebDe-XXXXXXXXXXXXXXX-NNNNNNNNNNNNN.us-east-1.elb.amazonaws.com
StreamlitWorkshop-cdk.WebDemoServiceServiceURLF46EEB7B = http://Strea-WebDe-XXXXXXXXXXXXXXXXX-NNNNNNNNNNNN.us-east-1.elb.amazonaws.com
Stack ARN:
arn:aws:cloudformation:us-east-1:NNNNNNNNNNN:stack/StreamlitWorkshop-cdk/XXXXXXX-98d8-11ec-a059-0abee6b5c353
✨ Total time: 410.29s
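If you save or paste the deploy output into a string, the shareable link can be picked out automatically. This is a small sketch, assuming the output has the `name = value` shape shown above and that the link is on the line whose name contains `ServiceURL`. The `find_service_urls` helper is hypothetical, and the sample text reuses the placeholder values from this article.

```python
import re

# Matches lines such as "<stack>.<name>ServiceURL<suffix> = http://..."
SERVICE_URL_PATTERN = re.compile(
    r"^\S*ServiceURL\S*[ \t]*=[ \t]*(https?://\S+)[ \t]*$",
    re.MULTILINE,
)


def find_service_urls(deploy_output: str) -> list[str]:
    """Return every URL found on a ServiceURL output line, in order of appearance."""
    return SERVICE_URL_PATTERN.findall(deploy_output)


# Placeholder output copied from the article, not a real deployment.
sample_output = """Outputs:
StreamlitWorkshop-cdk.WebDemoServiceLoadBalancerDNS9E1356A5 = Strea-WebDe-XXXX-NNNN.us-east-1.elb.amazonaws.com
StreamlitWorkshop-cdk.WebDemoServiceServiceURLF46EEB7B = http://Strea-WebDe-XXXX-NNNN.us-east-1.elb.amazonaws.com
Stack ARN:
"""

if __name__ == "__main__":
    for url in find_service_urls(sample_output):
        print(url)
```

The LoadBalancerDNS line is skipped on purpose: it carries the same host name without the `http://` prefix, so the ServiceURL line is the one to share.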
Tips:
- Edit cdk/cdk_stack.py to change the EC2 configuration and other options (choose the instance size based on the compute power your app needs).
- Start with a very simple “Hello world” Streamlit app. The GitHub project is a complex project and is unlikely to work right away.
- Once that works, replace the simple app with your own Streamlit app.
- When the setup works, write down the commands that worked so that you can reuse them next time. This helped me a lot: the next deployment took minutes instead of a whole morning.
Note:
The setup steps in your local environment (AWS CLI, npm, etc.) do not need to be repeated the next time you host the same app or another app on AWS. It takes only one command, and your app is up and running in minutes.
I was able to host my app using this method at low cost (under $2 per week). I also shut down all resources with one command.
Conclusion
Demonstrating data science applications to stakeholders as web applications has many benefits. When you have a concept ready, build a web application using Streamlit, host it on the cloud, and gather feedback.