Many people who work in or do research on machine learning lack deep knowledge of software development because they transitioned from other fields. Of course, machine learning is much more than software development. However, I am convinced that a solid foundation in software development will benefit you greatly in the long run.
With this blog post, my goal is to point you to some great resources that will hopefully give your software development skills a quick boost. If you already have a lot of programming experience, most of the content will at least sound familiar to you. However, I think you will still be able to learn something new.
Where to start?
I think a very good starting point for self-taught programmers is MIT's Missing Semester course ("The Missing Semester of Your CS Education"). The course is a quick overview of the most important tools that can make your life much easier when coding. It does not aim to cover every topic in detail, but instead tries to give you an overview of what is out there, so that you can recall and use these tools whenever you need them.
The course website lists all the topics covered, which I briefly summarize here:
- Course overview + the shell: Motivation, introduction to the course and an introduction to the shell. Basic commands, streams and piping are covered.
- Shell Tools and Scripting: Introduction to using bash as a scripting language and overview of some of the most relevant shell tools.
- Editors (Vim): Short and concise introduction to using Vim to increase your coding speed. Many resources for learning Vim are provided, including a basic config file. Note that there are Vim plug-ins for PyCharm, VS Code and even Word.
- Data Wrangling: Overview of useful tools for data wrangling and introduction to regex.
- Command-line Environment: Basics of job control (killing and pausing processes, background processes), terminal multiplexers (sessions, windows, panes), aliases, dotfiles and working on remote machines.
- Version Control (Git): Starts with the motivation for version control and an introduction to how Git works internally, then explains the basic commands. The instructors recommend the free book Pro Git for a deeper dive, and the course site lists many other helpful resources.
- Debugging and Profiling: Overview of debugging, including logging and debugging tools, followed by profiling tools, including visualizations and resource monitoring.
- Metaprogramming: Overview of build systems, dependency management, continuous integration and some notes on testing.
- Security and Cryptography: Introduction to symmetric and asymmetric cryptography with many important and intuitive examples.
- Potpourri: Keyboard remapping, Daemons, FUSE, Backups, APIs, Common command-line flags/patterns, Window managers, VPNs, Markdown, Hammerspoon (desktop automation on macOS), Booting + Live USBs, [Docker, Vagrant, VMs, Cloud, OpenStack], Notebook programming, GitHub
- Q&A: Several specific and interesting questions on the topics covered in the course.
Note that they provide great course notes with further resources and exercises to complement the lectures.
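The course does its data wrangling with shell tools, but the regex ideas carry over directly to Python. As a rough sketch of that kind of task, the snippet below pulls user names out of a few log lines and counts them. The log lines, the pattern, and the function name `count_invalid_users` are all made up for illustration and are not taken from the course material.

```python
import re
from collections import Counter

# Hypothetical log lines, invented for this example.
LOG_LINES = [
    "Jan 17 03:13:00 host sshd[2631]: Disconnected from invalid user admin 10.0.0.5 port 55920",
    "Jan 17 03:14:10 host sshd[2640]: Disconnected from invalid user guest 10.0.0.7 port 40122",
    "Jan 17 03:15:42 host sshd[2651]: Disconnected from invalid user admin 10.0.0.9 port 51800",
    "Jan 17 03:16:01 host sshd[2660]: Accepted publickey for alice from 10.0.0.2 port 50022",
]

# Capture the user name that follows the fixed phrase.
USER_PATTERN = re.compile(r"Disconnected from invalid user (\S+) ")


def count_invalid_users(lines):
    """Count how often each user name appears in 'invalid user' log lines."""
    counts = Counter()
    for line in lines:
        match = USER_PATTERN.search(line)
        if match:  # lines that do not match the pattern are skipped
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for user, n in count_invalid_users(LOG_LINES).most_common():
        print(user, n)
```

In the shell, the same result is typically obtained by piping grep, sed, sort and uniq together, which is exactly the kind of workflow the lecture demonstrates.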
Getting closer to production
My current work is in applications of machine learning. Thus, I usually do not try to reinvent the wheel, but instead try to refine, enhance and combine existing techniques. This often includes various pre- and post-processing steps. Hence, I usually end up with a pipeline of processes, which I try to manage in a flexible and sustainable way. In the following, I want to briefly show you one possible way to achieve both of these goals.
Creating flexible pipelines
To make your pipelines flexible, I recommend building them from microservices, which can easily be combined in any way you choose. Since most of my work is in Python and I like to use RESTful APIs, my framework of choice is the combination of OpenAPI and Flask.
The idea is to specify your API in a standardized format. Once this is done, you can use the OpenAPI Generator to generate a code base consisting of a Flask and Connexion web application that exposes the specified API. The only thing left to do is to implement your logic in the respective controller. Check out this tutorial for more details.
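To make that last step a bit more concrete, here is a minimal sketch of what the controller logic could look like. The function name `predict_post`, the request field `values`, and the placeholder "model" are hypothetical; the generated code base will use names derived from your own specification. The sketch assumes the handler receives the parsed JSON body as a dict and returns a (body, status code) tuple, a return style that Connexion handlers can use.

```python
def predict_post(body):
    """Hypothetical controller for a 'POST /predict' operation.

    Assumes `body` is the already-parsed JSON request body.
    Returns a (response body, HTTP status code) tuple.
    """
    values = body.get("values") if isinstance(body, dict) else None

    # Validate the input before touching the model.
    if not isinstance(values, list) or not values:
        return {"error": "'values' must be a non-empty list of numbers"}, 400
    if not all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    ):
        return {"error": "'values' must contain only numbers"}, 400

    # Placeholder for the real model: here we simply return the mean.
    prediction = sum(values) / len(values)
    return {"prediction": prediction}, 200
```

Because the function is plain Python, you can unit-test the logic without starting the web server, which keeps each microservice easy to swap or recombine.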
Creating sustainable pipelines
Every project depends on certain packages and libraries. Regardless of whether you develop your own code or use one of the many great projects published on GitHub, you will always need to meet certain dependency requirements. Virtual environments are of course one way to go. However, if you want your code to be quick and easy for others to use, Docker is often the better choice.
In essence, Docker is a tool for reproducing a specific software setup (operating system, software versions, etc.) inside an isolated container in an easy and efficient way. It is a great tool and widely used in different areas of software development. There are already numerous great resources for getting started with Docker, so I only want to point you to some of them:
- A great introductory video that covers the motivation, use cases and functionality in a very concise and comprehensible manner.
- The official tutorial has several step-by-step examples and is also a very good place to start.
- Docker Hub is a great resource where you can find existing Docker images for (almost) every base configuration you can imagine.
Writing flexible and sustainable code
Design patterns are a very important concept in software development. They are widely used and can improve the flexibility and sustainability of your code. Here, I just want to point you to an online course that I find really useful: Design Patterns by the University of Alberta. Note that this course only comes with a free 7-day trial. Nonetheless, I think it is well worth checking out if you are not familiar with the concept.
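As a small taste of what a design pattern looks like in practice, the sketch below uses the Strategy pattern to make a pre-processing step interchangeable. All class names (`ScalingStrategy`, `MinMaxScaling`, `NoScaling`, `Pipeline`) are made up for illustration and are not taken from the course; the "model" is just a placeholder.

```python
from abc import ABC, abstractmethod
from typing import List


class ScalingStrategy(ABC):
    """Strategy interface: turns a list of numbers into a new list."""

    @abstractmethod
    def scale(self, values: List[float]) -> List[float]:
        ...


class MinMaxScaling(ScalingStrategy):
    """Rescale values linearly to the range [0, 1]."""

    def scale(self, values: List[float]) -> List[float]:
        lo, hi = min(values), max(values)
        if hi == lo:  # avoid division by zero for constant input
            return [0.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]


class NoScaling(ScalingStrategy):
    """Leave the values unchanged."""

    def scale(self, values: List[float]) -> List[float]:
        return list(values)


class Pipeline:
    """The pipeline only knows the interface, not the concrete strategy."""

    def __init__(self, scaling: ScalingStrategy):
        self.scaling = scaling

    def run(self, values: List[float]) -> float:
        scaled = self.scaling.scale(values)
        # Placeholder for the real model: the mean of the scaled values.
        return sum(scaled) / len(scaled)


if __name__ == "__main__":
    data = [0, 5, 10]
    print(Pipeline(MinMaxScaling()).run(data))
    print(Pipeline(NoScaling()).run(data))
```

Swapping the pre-processing step only requires passing a different strategy object; the `Pipeline` class itself does not change.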
Improving your algorithmic thinking
Programming is, at its core, about problem solving. Of course, problem solving is not a software tool you can quickly familiarize yourself with; getting better at it requires a lot of practice. In my opinion, a good way to start is the (temporarily) free online course on algorithm design and analysis, which will introduce you to the ideas of dynamic programming and graph algorithms.
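To give a flavor of dynamic programming, here is a short sketch for a classic exercise: finding the minimum number of coins needed to reach a target amount. The choice of problem and the function name `min_coins` are mine, not taken from the course, and the sketch assumes all coin values are positive integers.

```python
def min_coins(coins, amount):
    """Return the fewest coins that sum to `amount`, or -1 if impossible.

    best[t] holds the fewest coins needed to reach total t. Each entry is
    built from the already-solved smaller totals t - coin.
    """
    INF = float("inf")
    best = [0] + [INF] * amount  # zero coins are needed for total 0
    for total in range(1, amount + 1):
        for coin in coins:
            if coin <= total and best[total - coin] + 1 < best[total]:
                best[total] = best[total - coin] + 1
    return best[amount] if best[amount] != INF else -1


if __name__ == "__main__":
    # A greedy choice (4 + 1 + 1) would need three coins here,
    # while the table finds 3 + 3.
    print(min_coins([1, 3, 4], 6))
```

The key idea is that every subproblem is solved once and reused, so the running time grows with `amount * len(coins)` instead of exponentially.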
If you are already familiar with these concepts, practice is what matters most. A great resource for practicing your algorithmic thinking is LeetCode. You will find numerous tasks there, you can solve them in many different languages, and your code is checked online. In addition, after submitting a correct solution you get interesting statistics, such as a comparison of your algorithm's runtime and memory requirements with other submissions. If you are interested, you can check out this Slack, where 3 different tasks are posted every day in 2020. The members also discuss the problems and solutions, but you can also just check out the list of tasks and follow along at your own pace.
Wrap up and disclaimer
Since I studied mathematics, I never had a full-fledged computer science education and was therefore looking for online resources to help me brush up on my software development skills. So the advice you find here just represents my non-expert opinion. However, since I was (and am) probably in a similar position to yours, I can assure you that the resources I mentioned here helped me a lot. Unfortunately, I have to admit that I do not spend as much time as I would like on resources like LeetCode.
Nonetheless, I hope you enjoyed reading this blog post and it helped you in one way or another. If you have any questions, feel free to contact me :-)