Every Machine Learning Pro knows these Concepts — Learn these & Land that Job!

ML Ideas & concepts I want you to learn before you apply for any interviews (These will surely increase your chances!)

Karan Kaul | カラン
Python in Plain English


Image by fszalai from Pixabay

After working in the field for a while, I have developed a good sense of what it takes to excel in this vast & amazing field.

In this article, I will share some concepts & tools that professionals rely on & that, in my view, are crucial for getting into Machine Learning roles or for advancing your career in them.

First, Some Libraries 🤓

1. Pandas

Pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.

Pandas is used everywhere & there is just no way around it. From elementary data analysis tasks to building & training ML models, having a good command over Pandas is absolutely essential to achieve success.
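To give a feel for the kind of day-to-day work Pandas handles, here is a minimal sketch using a tiny made-up dataset (the column names & values are purely illustrative):

```python
import pandas as pd

# A tiny, made-up dataset for illustration
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dan"],
    "score": [88.0, 92.0, None, 79.0],
    "team": ["red", "blue", "red", "blue"],
})

# Fill missing values with the column mean, a routine cleaning step
df["score"] = df["score"].fillna(df["score"].mean())

# Group & aggregate: mean score per team
team_means = df.groupby("team")["score"].mean()
print(team_means)
```

Loading, cleaning & aggregating like this covers a surprising share of real analysis work.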

2. sklearn

scikit-learn is a Python module for machine learning built on top of SciPy.

It covers almost every step of an ML workflow, be it visualizations after dimension reduction, pre-processing your dataset or testing the performance of your model after you are done training.

It is the go-to library for any Machine Learning work you might want to do, so you should definitely check it out.
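As a sketch of how those steps chain together, here is a small end-to-end example on scikit-learn's built-in iris dataset: scaling, dimension reduction & a classifier, all in one pipeline (the specific model choices are just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Pre-processing, dimension reduction & a model, chained in one pipeline
model = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

The pipeline pattern is worth learning early: it keeps pre-processing & modelling in one object, which prevents accidental leakage between train & test data.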

3. Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

In any given project, especially at your workplace, you might need to demonstrate your findings or inferences to your team or to your boss.

Without a good knowledge of visualizations & the different ways to represent your data, the hard work you did to uncover those insights will go to waste because you won't be able to present them effectively.

Libraries like Matplotlib make it easy for us to create beautiful & informative visualizations in just a few steps.

There are many other libraries & tools you can use for visualizations but Matplotlib is one of the oldest & the most reliable ones. Some other libraries you can explore are seaborn & plotly.
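As a small sketch of "beautiful & informative in a few steps", here is a labelled line chart of a made-up monthly metric (the data is invented; the `Agg` backend is used so it runs without a display):

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Made-up monthly metric for illustration
months = np.arange(1, 13)
values = 50 + 5 * np.sin(months / 2) + np.random.default_rng(0).normal(0, 1, 12)

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, values, marker="o", label="metric")
ax.set_xlabel("month")
ax.set_ylabel("value")
ax.set_title("A simple, labelled line chart")
ax.legend()
fig.savefig("metric.png", dpi=100)
```

Axis labels, a title & a legend cost three lines here, but they are what turn a plot into something a teammate can actually read.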

Now Some Tools & Websites 🤗

1. Jupyter Notebook/Lab

JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning.

In a traditional IDE, when we execute a program, we execute the whole file or the whole code. In Data Science projects, however, we don’t follow this type of coding paradigm. Sometimes we need to execute certain parts of code separately based on the analysis or requirements.

Imagine if one specific part of your code takes 30 mins to execute. If you make changes to the other parts of your code & then run the file in a normal environment, all of your code will execute at once & that unchanged part will also execute again, taking 30 mins of extra time.

Using notebooks, we run cells (blocks of code) individually, which avoids that redundant execution time & lets us check our outputs at each step.

(Note — Nowadays, many IDEs also support this cell-based format or notebooks.)

2. Google Colab

Google Colab, or ‘Colaboratory’, allows you to write and execute Python in your browser, with zero configuration required, free access to GPUs, and easy sharing.

It is similar to the notebooks we learnt about in the previous point. The difference here is that these notebooks will run on Google’s end, meaning you don’t need to worry about setting up a local machine with capable hardware.

You even get free access to GPUs for processing larger ML models, which is awesome!

3. Regex101

Regex101 is a website that allows you to compose & test regular expressions.

In many ML tasks, we employ a cleaning phase that is responsible for making sure that our data is clean & ready for further analysis. In NLP, many times cleaning the data means removing certain text, symbols, numbers or any other non-important pieces from within the data.

Sometimes these can be removed with simple string logic, but more complicated cleaning procedures require composing a regular expression.

regex101 UI

Regex101 will help you compose expressions & test them on given examples. It gives you a list of all possible tokens that you can use & what they mean. Once your regex matches any part of the given text, it will be highlighted & a lot of information will be presented to you.

Once you are happy with the results, you can copy this regex & employ it in your code.
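In Python, the expression you tested on regex101 plugs straight into the built-in `re` module. Here is a hedged sketch of a hypothetical cleaning step (the input string & the choice of what to strip are invented for illustration):

```python
import re

# Hypothetical raw text with numbers, symbols & extra whitespace
raw = "Order #4821!! shipped on  12/05 -- contact: help@example.com"

no_digits = re.sub(r"\d+", "", raw)                # remove numbers
no_symbols = re.sub(r"[^\w\s@.]", "", no_digits)   # keep word chars, spaces, @ and .
clean = re.sub(r"\s+", " ", no_symbols).strip()    # collapse whitespace
print(clean)
```

Each `re.sub` call here corresponds to one pattern you could first verify on regex101 before copying it into your code.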

4. Spacy POS/Dependency Visualizer

This website allows you to visualize the different POS/dependencies within the given context.

In NLP, we make use of dependency/POS analysis to figure out the POS (parts of speech) tags for phrases & the relationship words have with each other within the given context.

Depending on the context you provide, this website will present the POS tags for different phrases & the dependency between them.

displacy webapp UI

You can also choose not to use the “merge Phrases” option, & it will then output the POS tags for each individual word.

(Note — You can make this using the displacy feature from the spacy library in your code as well. Check out the small code sample below.)

import spacy
from spacy import displacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")

# Starts a local web server & renders the dependency tree in your browser
displacy.serve(doc, style="dep")

Finally, Some Concepts ⭐️

1. Ideas Behind Different ML Algorithms

You do not need an in-depth understanding of all the ML algorithms to get started. However, you must know the differences between the different algorithms & when one would be preferred over another.

Algorithms I suggest starting with:

  1. Linear Regression (of course)
  2. SVM (very interesting ideas behind it, also read kernels)
  3. Decision Trees & Random Forests
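A quick way to build that intuition is to run a few of these algorithms side by side on the same dataset. Here is a minimal sketch comparing SVM, a Decision Tree & a Random Forest with cross-validation on iris (the hyperparameters are defaults, chosen only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 5-fold cross-validated accuracy for each model
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

The exact numbers matter less than the habit: swap datasets in & out, watch which algorithm wins where, & ask why.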

2. Various NLP techniques & ideas

There are a lot of things that come under the umbrella of NLP. Some key ideas that I can think of right now are:

  1. Stopwords & lemmatization (what they are & how we deal with them)
  2. Text Embeddings (why we need them & different methods to get them)
  3. Topic Modeling & Clustering textual data

3. Comparing Results & Analysing Performance Metrics

After you get some ML algorithms under your belt, it is time to measure their performance.

Comparing the performance of each algorithm will provide you with deeper knowledge of how a given algorithm performs on certain types of data, and what the pros & cons of each algorithm are. Once you have this experience, you will be able to effortlessly pick the best possible algorithm for any given situation.

Some key metrics you should focus on are:

  1. Accuracy
  2. Precision
  3. Recall (Sensitivity or True Positive Rate)
  4. F1-Score
  5. Mean Squared Error (MSE)
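All five of these metrics are one import away in scikit-learn. Here is a minimal sketch on hypothetical predictions (the label arrays are made up purely to show the calls):

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_squared_error,
    precision_score,
    recall_score,
)

# Hypothetical classifier predictions vs. ground truth
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))

# MSE applies to regression outputs, not class labels
print("mse:", mean_squared_error([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))
```

Note that accuracy alone can be misleading on imbalanced data, which is exactly why precision, recall & F1 belong in the same toolbox.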

That is all for this post.

Please drop some claps & comment if you have any queries. Thanks for reading! 💚
