Computer vision is a very active area of research, and progress in the field is fast and exciting. There are many great resources online that you can benefit from. However, because there are so many of them, it is difficult to find a starting point. This post targets people interested in machine learning and computer vision who have no in-depth education in the topic and want to know where to start. I do not try to give a complete overview; instead, I present useful resources that I personally recommend or that were recommended to me.
Solid mathematical foundations are not always needed to merely apply computer vision and machine learning techniques. However, the deeper you want to dig, the more they will benefit you. Areas of particular interest are linear algebra, calculus, optimization, and probability theory.
The Linear Algebra lecture by Gilbert Strang is very well known. In addition, I find the explanations and visualizations by 3Blue1Brown very intuitive, and they offer great additional insights. Khan Academy also has very helpful resources on linear algebra. For calculus, I recommend checking 3Blue1Brown and Khan Academy as well to brush up on your knowledge. The MIT course Introduction to Probability is a very good resource, both for refreshing your knowledge and for learning the topic from scratch. It starts with the basics, and the explanations are very comprehensive. Mathematical details are presented where necessary, but very difficult and tedious proofs are skipped. Also check out the two free books Mathematics for Machine Learning by Garrett Thomas and Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong.
Basic knowledge of programming is also very important. For more information on this, check out my blog post and the resource section at the end of this post.
For many applications, classical methods (those not associated with deep learning) are still your best option. In addition, many important ideas originated there, and it is crucial to grasp at least the fundamentals. The Machine Learning course by Andrew Ng is a great place to start if you have no prior knowledge. Note that this course already covers the basics of neural networks but focuses on classical techniques (regression and support vector machines). In addition to very comprehensive explanations, the course provides programming exercises that can be checked online.
There are many other great resources and short explanatory videos covering numerous methods. If you want to learn more, I recommend starting with Christopher Bishop’s book Pattern Recognition and Machine Learning, which has recently been made freely available, and checking the resource lists I provide below.
A comprehensive introduction to the image acquisition process, projective transformations, pose estimation, and multi-view geometry can be found in the Robotics: Perception course (also available as a YouTube playlist in Chapter 4). I highly recommend this course for getting started with concepts in “classical” computer vision. To dive deeper, check out the book Multiple View Geometry in Computer Vision by Richard Hartley and Andrew Zisserman. It is a great resource for exploring the area.
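To give a flavor of the projective geometry these resources cover, here is a minimal sketch of a pinhole camera projecting 3D points to pixel coordinates. It is not taken from the course or the book; the function name `project_points` and all numeric values (focal length, principal point, test points) are assumptions chosen purely for illustration.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project N x 3 world points to N x 2 pixel coordinates (pinhole model)."""
    points_cam = points_world @ R.T + t        # world frame -> camera frame
    homogeneous = points_cam @ K.T             # apply the camera intrinsics
    return homogeneous[:, :2] / homogeneous[:, 2:3]  # perspective divide by depth

# Hypothetical intrinsics: focal length 800 px, principal point at (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)        # camera axes aligned with the world axes
t = np.zeros(3)      # camera located at the world origin

points = np.array([[0.0, 0.0, 2.0],      # on the optical axis, 2 m away
                   [0.5, -0.25, 2.0]])   # offset to the side, same depth
pixels = project_points(points, K, R, t)
print(pixels)
```

A point on the optical axis lands on the principal point, and the offset point shifts in proportion to focal length divided by depth.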
To get started with deep learning, check out CS230 Deep Learning. If you prefer a more hands-on approach, I highly recommend the book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron. The book itself is not free (though it is a very good investment), but you can access its Colabs for free.
If you want to learn deep learning for computer vision, the Stanford course CS 231N: Convolutional Neural Networks for Visual Recognition is very popular and a great introduction to CNNs. Justin Johnson, one of the lecturers of the Stanford course, recently published a new course, EECS 498-007 / 598-005 Deep Learning for Computer Vision, at the University of Michigan. The two courses are similar, but the newer one has updated content (e.g. transformers) and provides important additional insights (e.g. backpropagation with matrices). I highly recommend both courses.
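As a small taste of what backpropagation with matrices looks like, the following sketch computes the gradients of a single linear layer Y = XW in matrix form and checks them against finite differences. It is my own illustration, not material from either course; the helper names and the toy loss L = 0.5 * sum(Y^2) are assumptions.

```python
import numpy as np

def linear_forward(X, W):
    return X @ W

def linear_backward(X, W, dY):
    """Given the upstream gradient dL/dY, return dL/dX and dL/dW."""
    dX = dY @ W.T    # same shape as X
    dW = X.T @ dY    # same shape as W
    return dX, dW

def numerical_grad(f, A, eps=1e-6):
    """Central-difference gradient of the scalar f() with respect to array A."""
    grad = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        old = A[idx]
        A[idx] = old + eps
        plus = f()
        A[idx] = old - eps
        minus = f()
        A[idx] = old             # restore the original entry
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))

def loss():
    return 0.5 * np.sum(linear_forward(X, W) ** 2)   # toy loss

dY = linear_forward(X, W)            # for this loss, dL/dY equals Y
dX, dW = linear_backward(X, W, dY)
dX_num = numerical_grad(loss, X)
dW_num = numerical_grad(loss, W)
print(np.abs(dX - dX_num).max(), np.abs(dW - dW_num).max())
```

A useful sanity check when deriving such formulas is that every gradient must have the same shape as the quantity it belongs to, which leaves only one way to arrange the matrix products.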
For more advanced topics, check out, for example, Advanced Deep Learning for Computer Vision from the Technical University of Munich and Deep Unsupervised Learning from Berkeley.
Robotics is a very exciting area in which computer vision plays a crucial role. Approaches can be deployed and tested in the real world, where they create concrete benefits. There are also excellent resources in this area, for example:
- Modern Robotics Specialization: Mechanics, Planning, and Control
- CS 287: Advanced Robotics
- Reinforcement Learning Specialization
After reading all of this, you now need to decide what you want to focus on. I hope the resources I provided turn out to be useful along the way. Make sure to also check the resource list below and my blog post on machine learning research resources. Have fun!
To find more resources, you can check out the following:
- Deep Learning Drizzle: maybe the most complete list of deep learning courses online
- [D] Advanced courses update: Reddit topic on advanced machine learning courses
- AI Curriculum: short list of course recommendations with summary for each course by Machine Learning Tokyo
- fast.ai, deeplearning.ai and PyImageSearch are great places to learn
Some other resources you might find helpful are:
- Interactive Tools for ML, DL and Math by Machine Learning Tokyo
- PyTorch image models by Ross Wightman
- Free Springer Books (COVID special)
- Deep Learning Models by Sebastian Raschka
- CNN Implementations by Machine Learning Tokyo
- The Matrix Calculus You Need For Deep Learning by Terence Parr and Jeremy Howard