Hi, I’m Matt 👋
I use data & technology to create a love-driven world
I create reusable data pipelines using Python
I solve problems differently
I squash blockers before, during, and after modeling
I am a leader at the core
I leave teams better than when I first found them
Want to chat? You can email me at matthew@wimagine.org
For a quicker peek at what I’ve done, you can check out my resume.
The resume highlights the core skills and responsibilities I’ve had, but the rest of this page tells stories of how my technical skills intersect with my problem-solving and teamwork skills.
These are the traits I try to bring everywhere I go as a data scientist, with the conviction to leave the world better than I found it.
My Go-To Tech Stack
Python
My programming language of choice since 2014 to connect my domains of data engineering, data science, and data analytics.
Bash/Zsh
I’ve been living in a terminal since 2014, using it to run tasks, set up containers, remote into machines, and more.
Git
My version control system of choice since 2019 for facilitating team collaboration.
Programming Languages
Python
C/C++ (Currently Not Used)
MATLAB (Currently Not Used)
R (Currently Not Used)
Python Skills
Biology: Biopython
Data Processing: missingno, Pandas, & Pickle
Data Visualization: Bokeh, Matplotlib, Seaborn
Deep Learning: Keras, PyTorch, Theano, & TensorFlow
Game Theory: OpenAI Gym & Gym Retro
NLP & Language: Beautiful Soup, NLTK, Prodigy, & spaCy
Math: NumPy & SciPy
Machine Learning: Scikit-learn
Parallel Computing: Dask
Storytelling: JupyterLab (on device and deployed on a server)
Web: Flask & Requests
Operating Systems
Linux (since 2005)
macOS (since 2006)
Windows (since 1996)
Other Skills
Agile: Jira
Automation: Crontab and Jenkins
Cloud Computing: AWS (EC2, Lambda, Lightsail, S3), DigitalOcean, Domino, and Microsoft Azure.
Containers: Docker and Kubernetes
Databases: AWS DynamoDB, Elasticsearch, MySQL, SQL, SQLite, and PostgreSQL.
GPU: CUDA
Notes: Markdown
Terminal: Bash, Csh, Tcsh, & Zsh
Version Control: Git (GitHub + GitLab), Subversion, & ClearCase
Web: HTML/CSS, Bootstrap, Google Analytics, and WordPress
I create reusable data pipelines using Python 💻
Instead of following the typical culture of building a brand-new ML pipeline for every project and/or dataset I work with, I make an effort to build on existing pipelines and expand their capabilities for new projects and datasets whenever feasible.
Collecting data from humans is always messy, because humans are inconsistent, and that inconsistency created challenges whenever we had to store the data. So I created, and still maintain, a data engineering pipeline to extract, process, and load data from wearable sensors. This Python-forward pipeline is 3+ years old, has been used on 4+ projects, and can process 5+ datasets either locally or on AWS.
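To give a flavor of that pipeline, here is a minimal sketch of what one extract/process/load pass can look like. The class name, file format, and column names below are hypothetical stand-ins, and it writes locally for brevity (the real pipeline can also target AWS):

```python
import json
from pathlib import Path

import pandas as pd


class WearableETL:
    """Hypothetical sketch of one extract/process/load pass over a sensor export."""

    def extract(self, raw_path: Path) -> pd.DataFrame:
        # Assumes the device export is newline-delimited JSON records.
        records = [json.loads(line) for line in raw_path.read_text().splitlines() if line]
        return pd.DataFrame.from_records(records)

    def process(self, frame: pd.DataFrame) -> pd.DataFrame:
        # Humans are inconsistent: drop duplicate samples and sort by an assumed timestamp column.
        return frame.drop_duplicates().sort_values("timestamp").reset_index(drop=True)

    def load(self, frame: pd.DataFrame, out_path: Path) -> None:
        # Writes to local disk here; the same step can be pointed at S3 instead.
        frame.to_csv(out_path, index=False)


# Usage: the same three steps get reused for every new dataset.
# etl = WearableETL()
# etl.load(etl.process(etl.extract(Path("raw_export.jsonl"))), Path("clean.csv"))
```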
Several projects I’ve been on needed Reinforcement Learning and/or Game Theory modeling, each with different pain points and yet similar needs. So, on my own initiative, I created and maintain an RL/GT pipeline that is Python- and Docker-forward. This pipeline leverages the core OpenAI Gym API but adds scaling capabilities via Docker and custom game support, so that a game can transfer-learn from other types of games.
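For context, the core loop such a pipeline wraps follows the classic OpenAI Gym step/reset interface, sketched below with a random policy. The environment id is a placeholder, and the Docker scaling and transfer-learning layers are not shown:

```python
import gym  # assumes the classic Gym API (pre-0.26), which gym-retro also follows


def run_episode(env_id: str = "CartPole-v1", max_steps: int = 500) -> float:
    """Play one episode with a random policy and return the total reward."""
    env = gym.make(env_id)
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()  # stand-in for a learned policy
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    env.close()
    return total_reward
```

Because every game speaks this same interface, scaling becomes a matter of launching many such loops in their own containers.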
Regardless of how big or small the codebase is, these are my standards (a small illustrative sketch follows this list):
Clear OOP Class Hierarchy
Self-Documenting Code
Docstrings & Comments
Test Cases
Code Reviews
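As a small illustration of those standards working together (the sensor classes and test below are hypothetical, not code from any project):

```python
from typing import Dict, List


class SensorReader:
    """Base class for reading raw sensor exports; one subclass per sensor keeps the hierarchy clear."""

    def read(self, path: str) -> List[Dict[str, float]]:
        raise NotImplementedError


class AccelerometerReader(SensorReader):
    """Reads comma-separated accelerometer exports into x/y/z samples."""

    def read(self, path: str) -> List[Dict[str, float]]:
        samples = []  # self-documenting names, not "s" or "tmp"
        with open(path) as export:
            for line in export:
                x, y, z = (float(value) for value in line.split(","))
                samples.append({"x": x, "y": y, "z": z})
        return samples


def test_accelerometer_reader(tmp_path):
    """A small pytest case that keeps refactors honest."""
    export = tmp_path / "acc.csv"
    export.write_text("0.0,9.8,0.1\n")
    assert AccelerometerReader().read(str(export)) == [{"x": 0.0, "y": 9.8, "z": 0.1}]
```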
I solve problems differently 💡
We wanted to get involved in detecting bias in ML models, but that was already becoming a well-researched space. So our team worked from the opposite direction, finding ways to detect cognitive bias in Git commit logs & code comments.
We got seedling money to annotate Git commits & code comments for their cognitive biases. I then led a small effort to create an NLP model that reads those messages and tries to identify whether a message is biased. This was easier said than done, because programming artifacts like these don’t tend to follow English’s grammatical and linguistic conventions.
We ultimately found that while the model struggled to identify the specific type of bias, just as its fellow human annotators did, both the model and the humans could tell when a message was biased.
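For context, a baseline for that kind of classifier can be as simple as the sketch below: a generic TF-IDF plus logistic regression setup in scikit-learn, with made-up example messages. This is only an illustration of the task, not the project’s actual model or data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical annotated commit messages: 1 = biased, 0 = not biased.
messages = [
    "quick hack, obviously the only sane way to do this",
    "refactor parser to handle empty input",
    "everyone knows this framework is garbage, rewrote it my way",
    "add unit tests for date parsing edge cases",
]
labels = [1, 0, 1, 0]

# Character n-grams help because commit messages rarely follow normal English grammar.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(messages, labels)
print(model.predict(["hack around the bug, reviewers never read this anyway"]))
```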
I squash blockers before, during, and after modeling 👟
I’m a full-stack data scientist, with my hands in data engineering, data storage, feature generation, modeling, the cloud, and all of the gotchas in between. Having experience in all of these domains lets me quickly identify and triage issues, even when I wasn’t originally slated to do so.
We were creating models to predict a person’s activity when I realized the deep learning models were tripping over one particular activity. In their defense, that activity could easily be mistaken for something else, and we had very little data for it. So, to do my due diligence, I independently collected data of myself doing that activity, ran it through the models, and was able to show that our models could likely predict that activity if we had more data from more users.
A common theme in my workplace is wanting to use the cloud but, for one reason or another, not being able to access its full power. In my work, I’ve deconstructed how cloud technologies work so I can deploy them in-house using the (often restricted) tools available to me. One project had access to AWS S3 but not AWS’s databases, so I had to deploy database-like storage on S3 myself. On another project there was no internet at all, so I had to creatively pair multiple machines together into a makeshift cluster.
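As a flavor of that S3 workaround, a thin key-value layer over S3 objects can stand in for a document store when managed databases are off-limits. The bucket, prefix, and record schema below are hypothetical:

```python
import json

import boto3  # assumed available, since that project had S3 access


class S3KeyValueStore:
    """A tiny document store on top of S3 objects; a workaround sketch, not a real database."""

    def __init__(self, bucket: str, prefix: str = "records/"):
        self.s3 = boto3.client("s3")
        self.bucket = bucket
        self.prefix = prefix

    def put(self, key: str, record: dict) -> None:
        self.s3.put_object(
            Bucket=self.bucket,
            Key=f"{self.prefix}{key}.json",
            Body=json.dumps(record).encode(),
        )

    def get(self, key: str) -> dict:
        response = self.s3.get_object(Bucket=self.bucket, Key=f"{self.prefix}{key}.json")
        return json.loads(response["Body"].read())
```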
In my R&D journey of answering questions, I’ve had the opportunity to implement these models:
ML: Regression Modeling
ML: Classification Modeling
ML: Unsupervised Learning
Reinforcement Learning (MA-DDPG)
Deep Learning (TimeGAN)
I am a leader at the core 🧭
On practically every team I’ve been on, I’ve built a reputation as a technical leader. Even though I’m not officially titled as a leader (for personal reasons), I bring my best engineering and leadership skills to the table on every project I’m on.
Led Scrum Meetings as a Scrum Master
Provided technical direction to make sure we’re delivering what our customers need
Resolved blockers that originated within the team or externally
Presented outputs and status to customers
Proactively partnered with team members who might be struggling
I leave teams better than when I first found them 🤝
I don’t just do my job; I energize and enrich the team, because I genuinely believe that people are at the heart of innovation. I bring my best, I’m intentional about being present and helping people as soon as they ask, I’m honest about the mistakes I’ve made so others don’t repeat them, and I infuse the office culture with fun and energy.
Many of our newer employees were being tripped up by my corporation’s IT security restrictions, which blocked many of the steps they’re asked to complete to get onboarded onto their projects. So I took the initiative to document my onboarding & technical notes on a team wiki. Since then, the entire team has pitched in, and it has grown from a 1-page wiki into a 12-page wiki, now used and updated by several teams across the corporation.
During the pandemic, there was a lot of consternation about the direction and unity of our team. Even though I was an engineer, I independently began thinking of ways the team could be realigned so that everyone had a clear business and technical objective, providing redundancy should one of those areas become volatile. I proposed my plan to our leadership team, and it is now the organization scheme for the group.
Also during the pandemic, it became clear that people were losing touch with each other and starting to feel more isolated. In response, I created an after-hours mailing list where I host virtual events and outdoor in-person gatherings to keep the team together. This helped build a new office culture in which our remote members feel more included than they did even before the pandemic, and our in-person members feel more unified, even as we move into a more hybrid schedule.