Should You Become a Python Developer before Getting into Machine Learning? - Joe Rideg

Introduction

Let’s assume that you have decided to work toward becoming a Machine Learning engineer or scientist. Great!

You have already overcome the analysis paralysis of choosing between Python and R for this purpose, and you wisely chose Python. Well done!

You have been working through an arbitrarily chosen online Python course for a while. As a result, you understand more and more of the language’s fundamental concepts. You already know a bit about data types, control flow statements, loops, iteration, lists, tuples, sets, and dictionaries, and you have written a few exercise programs to gain experience with them.

After a while, you might wonder how much Python is enough to jump into Machine Learning, start small pet projects, and play and experiment with some data. Let’s look at the two basic directions you could take after you finish the introductory phase of learning Python.

1. Becoming a junior Python developer first

The fundamental idea behind this path is that you need solid Python skills first. The basic knowledge of the language that you have already gained is useful but not yet enough. Therefore, you should go beyond the basics and learn more advanced topics related to Python and to software development in general, such as:

a) Object-oriented concepts (instances, constructors, class attributes, getters and setters, inheritance, polymorphism, composition)

b) Databases in Python (SQL)

c) Generators, comprehensions, and lambda expressions
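To make the list above a bit more concrete, here is a minimal, self-contained sketch that touches each topic once. The class names (Dataset, LabeledDataset), the helper functions (squares, store_and_sum), the table name samples, and the data are all hypothetical, chosen only for illustration. The database part assumes Python’s built-in sqlite3 module with an in-memory database rather than a real database server.

```python
import sqlite3


class Dataset:
    """a) Object-oriented basics: constructor, class attribute, getter/setter."""

    source = "unknown"  # class attribute shared by all instances

    def __init__(self, name, values):  # constructor
        self.name = name
        self._values = list(values)

    @property
    def values(self):  # getter
        return self._values

    @values.setter
    def values(self, new_values):  # setter with simple validation
        if not all(isinstance(v, (int, float)) for v in new_values):
            raise TypeError("values must be numeric")
        self._values = list(new_values)

    def describe(self):
        return f"{self.name} (n={len(self._values)})"


class LabeledDataset(Dataset):
    """Inheritance: reuses Dataset and overrides describe() (polymorphism)."""

    def __init__(self, name, values, label):
        super().__init__(name, values)
        self.label = label

    def describe(self):
        return f"{super().describe()}, label={self.label!r}"


def squares(values):
    """c) Generator: yields squared values lazily, one at a time."""
    for v in values:
        yield v * v


def store_and_sum(rows):
    """b) Databases: store (name, value) rows in SQLite and sum the values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE samples (name TEXT, value REAL)")
        conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
        (total,) = conn.execute("SELECT SUM(value) FROM samples").fetchone()
        return total
    finally:
        conn.close()


if __name__ == "__main__":
    data = [Dataset("raw", [3, 1, 2]), LabeledDataset("train", [4, 5], "y")]
    for d in data:
        print(d.describe())  # same method call, class-specific behavior

    evens = [v for v in data[0].values if v % 2 == 0]  # list comprehension
    by_size = sorted(data, key=lambda d: len(d.values))  # lambda expression
    print(evens, [d.name for d in by_size], list(squares([1, 2, 3])))
    print(store_and_sum([("a", 1.5), ("b", 2.5)]))
```

None of this is specific to data science, which is exactly the point of this path: these are general Python skills that make later data work cleaner.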

Only once you have a good understanding of Python and some experience with coding exercises do you start refreshing your math, downloading some data, and doing small projects.

A commonly cited rough estimate is that around 80% of the work in a typical machine learning project goes into data cleaning and preparation. To do it effectively, you need to be comfortable with these more advanced topics, even though they are not specific to data science. In addition, you are ready to work as a Python developer in the meantime, knowing that it is only an intermediate station on your way to machine learning and data science.
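As a rough illustration of what data cleaning can involve, here is a small sketch using only the standard library. The CSV content, the column names, and the cleaning rules (trim whitespace, normalize capitalization, drop rows with a missing or non-numeric age, remove duplicates) are hypothetical assumptions made up for this example; in real projects, pandas is the more common tool for this kind of work.

```python
import csv
import io

# Hypothetical messy input: stray whitespace, inconsistent case,
# a missing value, a non-numeric value, and a duplicate row.
RAW_CSV = """name,city,age
 Alice ,budapest,34
Bob,Vienna,
carol,  PRAGUE ,twenty
Alice,Budapest,34
Dave,Berlin,41
"""


def clean_rows(text):
    """Return de-duplicated records with normalized names/cities and int ages."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        if not age_text.isdigit():  # drops missing and non-numeric ages
            continue
        record = (name, city, int(age_text))
        if record in seen:  # drops exact duplicates after normalization
            continue
        seen.add(record)
        cleaned.append({"name": name, "city": city, "age": int(age_text)})
    return cleaned


if __name__ == "__main__":
    for record in clean_rows(RAW_CSV):
        print(record)
```

Even this toy example uses comprehension-style thinking, sets, dictionaries, and the standard library’s I/O tools, which is why the first path puts general Python skills before data projects.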

2. Becoming a Junior Data Scientist / Machine Learning engineer as soon as possible. The rest will come later.

The idea here is to jump into data projects as quickly as possible and pick up the rest of the necessary skills on a deeper level later.

In this case, your main goal is not necessarily to become a professional Python developer. Instead, you gain real experience by working on small projects of your own. If you run into a problem, you look for a quick solution on Stack Overflow and fix it. There is no fancy code, complicated background, or theory behind your projects. Instead, you solve a series of minor, project-related problems in a “quick and dirty” way for the project at hand.

Yes, your code is probably not very sophisticated, just “good enough”. Depending on your interests and your projects, you pick up the most important skills related to packages like NumPy, pandas, Matplotlib, TensorFlow, Keras, PyTorch, Scrapy, OpenCV, and Jupyter Notebook.

Then you take a machine learning model and a dataset from Kaggle and get hands-on experience in this field.

Furthermore, you work through books like “Data Science from Scratch” by Joel Grus or “Hands-On Machine Learning with Scikit-Learn and TensorFlow” by Aurélien Géron.

The advantage of this approach is that you start working on small projects as soon as possible. You develop a reasonable sense of which Python skills are necessary for your work and which are less important. You can build a portfolio of small projects on GitHub. You can also gain essential experience and intuition about setting the hyperparameters of your machine learning models. This way you can prove that you are capable of solving data-related problems, and if you share your code and write blog posts about it, you offer evidence to anyone interested in those skills.

On the other hand, you might use certain features of Python in an inelegant or inefficient way. Maybe your code is a bit longer and messier than it should be. In the worst case, you might realize that Machine Learning is not your cup of tea, and if you come from a different field, you still don’t have sufficient knowledge of Python to make the transition into software development instead.

3. What about the necessary level of Mathematics?

Well, it’s a tricky question. In certain phases of the Machine Learning workflow (like collecting and cleaning data), theory might not be crucial yet. But in the long run, you will have to refresh or learn Linear Algebra, Calculus, Probability, and Statistics, because you are supposed to know what is going on under the hood of a model. Based on my experience, knowing and understanding Mathematics could mean five levels:

-You read a book or some notes and believe you understood the material immediately. This is a comfortable but unlikely scenario for most of us, apart from exceptionally gifted students. Alternatively, you grab a cheat sheet covering the most important topics and formulas.

-You watch a series of lectures, pausing and rewatching them whenever you don’t understand something, until everything becomes clear.

-You not only consume an online course but also take notes. Not necessarily to use them later, but to wire those formulas into your brain instead of just passively listening to or watching the content.

-You do all the steps above and go one step further: you apply the freshly learned theory to exercises, worked out with pen and paper, using your deeper knowledge of the abstract concepts.

-You solve so many simple examples that you can apply the theory to easy problems not only without mistakes but also relatively fast. (It’s like training for a challenging Mathematics exam.) This level is probably not necessary for applying Machine Learning models.

So you don’t have to be a perfectionist. Still, it is worth choosing deliberately the level of knowledge you need for Machine Learning (my educated guess would be watching lecture videos and taking notes, but not necessarily solving examples by hand).
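To give a small taste of what “under the hood of a model” can mean, here is a sketch of fitting a straight line y ≈ w·x + b with gradient descent in plain Python. The partial derivatives of the mean squared error are where Calculus comes in, and the learning rate and number of steps are examples of hyperparameters. The function name fit_line, the data points, and the default settings are hypothetical choices for illustration; real libraries use more sophisticated, vectorized implementations.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ≈ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient; learning_rate is a hyperparameter
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Made-up points lying exactly on y = 2x + 1
    w, b = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
    print(f"w = {w:.4f}, b = {b:.4f}")
```

If the learning rate is set too high for this data, the updates overshoot and diverge instead of converging, which is a small, concrete example of why understanding the math behind hyperparameters pays off.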

Conclusion

Both paths are reasonable, each with its own advantages and disadvantages. If you are a bit impatient, the second strategy could work better. On the other hand, if you tend to be a bit of a maximalist, you might prefer to go through the learning phases one by one, without rushing or learning Python only superficially. Either way, both strategies converge on a productive, valuable, in-demand Machine Learning skillset in the long term. The only real mistake you can make is giving up too early, losing your motivation, or procrastinating more and more until you run out of time.