How to Implement a Machine Learning Algorithm

Implementing a machine learning algorithm in code can teach you a lot about the algorithm and how it works.

In this post you will learn how to be effective at implementing machine learning algorithms and how to maximize your learning from these projects.


Let’s get started.

Benefits of Implementing Machine Learning Algorithms

You can use the implementation of machine learning algorithms as a strategy for learning about applied machine learning. You can also carve out a niche and build skills in algorithm implementation.

Algorithm Understanding

Implementing a machine learning algorithm will give you a deep and practical appreciation for how the algorithm works. This knowledge can also help you to internalize the mathematical description of the algorithm, by thinking of vectors and matrices as arrays and of the transformations applied to them as computations on those arrays.
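To make this concrete, here is a minimal sketch, assuming the common case of a linear model whose predictions are written y_hat = X w. The function name `mat_vec` is hypothetical; it spells out the matrix-vector product as loops over plain Python lists, so each prediction is visibly one row of X multiplied element by element with w and summed.

```python
def mat_vec(X, w):
    """Multiply matrix X (a list of rows) by vector w, i.e. y_hat = X w.

    Each output element is the dot product of one row of X with w,
    which is what the matrix notation hides behind a single symbol.
    """
    if any(len(row) != len(w) for row in X):
        raise ValueError("every row of X must have len(w) elements")
    y_hat = []
    for row in X:
        total = 0.0
        for x_ij, w_j in zip(row, w):
            total += x_ij * w_j  # accumulate the dot product of row i and w
        y_hat.append(total)
    return y_hat


X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
w = [0.5, -1.0]
print(mat_vec(X, w))  # [-1.5, -2.5, -3.5]
```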

There are numerous micro-decisions required when implementing a machine learning algorithm, and these decisions are often missing from the formal algorithm descriptions. Learning and parameterizing these decisions can quickly catapult you to an intermediate or advanced level of understanding of a given method, as relatively few people make the time to implement some of the more complex algorithms as a learning exercise.
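One example of such a micro-decision: when standardizing a feature to zero mean and unit variance, should the standard deviation divide by n or by n - 1? The sketch below exposes that choice as a parameter instead of burying it in the code. The function `standardize` is a hypothetical name, and its `ddof` parameter simply borrows NumPy's naming convention for "delta degrees of freedom".

```python
import math


def standardize(values, ddof=0):
    """Rescale values to zero mean and unit standard deviation.

    ddof is the micro-decision: 0 divides the variance by n (population
    standard deviation), 1 divides by n - 1 (sample standard deviation).
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof values")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
    std = math.sqrt(variance)
    if std == 0:
        # A second unstated decision: what to do with a constant feature.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(standardize(data, ddof=0)[0])  # -1.5 (population std is 2.0)
print(standardize(data, ddof=1)[0])  # about -1.403 (sample std is about 2.138)
```

Both answers are "correct"; they just reflect different choices, which is why it helps to make the choice explicit and testable.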

Practical Skills

You develop valuable skills when you implement machine learning algorithms by hand: mastery of the algorithm itself, skills that help in the development of production systems, and skills that can be used for classical research in the field.

Three examples of skills you can develop include:

Mastery: Implementation of an algorithm is the first step towards mastering the algorithm. You are forced to understand the algorithm intimately when you implement it. You are also creating your own laboratory for tinkering to help you internalize the computation it performs over time, such as by debugging and adding measures for assessing the running process.
Production Systems: Custom implementations of algorithms are typically required for production systems because the algorithm needs to be changed for efficiency and efficacy reasons. Better, faster, less resource-intensive results can ultimately lead to lower costs and greater revenue in business, and implementing algorithms by hand helps you develop the skills to deliver these solutions.
Literature Review: When implementing an algorithm you are performing research. You are forced to locate and read multiple canonical and formal descriptions of the algorithm. You are also likely to locate and code review other implementations of the algorithm to confirm your understanding. You are performing targeted research, and learning how to read and make practical use of research publications.
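As an illustration of building such a laboratory, the sketch below trains a perceptron on the logical AND problem and records the number of misclassified examples in each epoch. The classic perceptron update rule is used, with a step activation and a learning rate of 1.0 as assumptions; the function names are hypothetical. The per-epoch `history` list is the kind of measure you can print, plot or assert on while tinkering.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum is non-negative, else 0 (step activation)."""
    activation = bias + sum(w_i * x_i for w_i, x_i in zip(weights, x))
    return 1 if activation >= 0.0 else 0


def train_perceptron(X, y, learning_rate=1.0, epochs=10):
    """Train with the perceptron rule and record misclassifications per epoch."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    history = []  # the added measure for observing the running process
    for epoch in range(epochs):
        errors = 0
        for x, target in zip(X, y):
            error = target - predict(weights, bias, x)
            if error != 0:
                errors += 1
                bias += learning_rate * error
                weights = [w_i + learning_rate * error * x_i
                           for w_i, x_i in zip(weights, x)]
        history.append(errors)
        print(f"epoch {epoch}: {errors} misclassified")
        if errors == 0:
            break  # a full pass with no mistakes: converged on the training set
    return weights, bias, history


# Logical AND: a tiny linearly separable data set.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
weights, bias, history = train_perceptron(X, y)
print([predict(weights, bias, x) for x in X])  # [0, 0, 0, 1]
```

On this tiny data set the error count falls to zero after a few epochs; on data that is not linearly separable it never will, which is exactly the kind of behavior the recorded history lets you see.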

Process

There is a process you can follow to accelerate your ability to learn and implement a machine learning algorithm by hand from scratch. The more algorithms you implement, the faster and more efficient you get at it and the more you will develop and customize your own process.

You can use the process outlined below.

Select Programming Language: Select the programming language you want to use for the implementation. This decision may influence the APIs and standard libraries you can use in your implementation.
Select Algorithm: Select the algorithm that you want to implement from scratch. Be as specific as possible. This means selecting not only the class and type of algorithm, but also the specific description or implementation that you want to implement.
Select Problem: Select a canonical problem or set of problems you can use to test and validate your implementation of the algorithm. Machine learning algorithms do not exist in isolation.
Research Algorithm: Locate papers, books, websites, libraries and any other descriptions of the algorithm you can read and learn from. Although you ideally want one keystone description of the algorithm from which to work, you will also want multiple perspectives on it. Multiple perspectives help you internalize the algorithm description faster and overcome roadblocks caused by ambiguities or assumptions in the description (there are always ambiguities in algorithm descriptions).
Unit Test: Write unit tests for each function. Consider test-driven development from the beginning of the project, so that you are forced to understand the purpose and expectations of each unit of code before you implement it.
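A minimal sketch of the unit-testing step using Python's built-in `unittest` module. The function under test, `euclidean_distance`, is a hypothetical helper (for example, for a k-nearest neighbor implementation); in a test-driven workflow the test class would be written first and the function filled in until the tests pass.

```python
import math
import unittest


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b)))


class TestEuclideanDistance(unittest.TestCase):
    # Each test states one expectation of the function before relying on it.

    def test_distance_to_self_is_zero(self):
        self.assertEqual(euclidean_distance([1.0, 2.0], [1.0, 2.0]), 0.0)

    def test_known_value(self):
        # A 3-4-5 right triangle.
        self.assertAlmostEqual(euclidean_distance([0.0, 0.0], [3.0, 4.0]), 5.0)

    def test_is_symmetric(self):
        a, b = [1.0, -2.0, 0.5], [4.0, 0.0, -1.5]
        self.assertEqual(euclidean_distance(a, b), euclidean_distance(b, a))

    def test_rejects_mismatched_lengths(self):
        with self.assertRaises(ValueError):
            euclidean_distance([1.0], [1.0, 2.0])


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the test report.
    unittest.main(argv=["tests"], exit=False)
```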

I strongly suggest porting algorithms from one language to another as a way of making rapid progress along this path. You can find plenty of open source implementations of algorithms that you can code review, diagram, internalize and reimplement in another language.

Consider open sourcing your code while you are developing it and after you have developed it. Comment it well and ensure it provides instructions on how to build and use it. The project will provide marketing for the skills you are developing and may just provide inspiration and help for someone else looking to make their start in machine learning. You may even be lucky enough to find a fellow programmer sufficiently interested to perform an audit or code review for you. Any feedback you get will be invaluable (even as motivation), so actively seek it.

Extensions

Once you have implemented an algorithm you can explore making improvements to the implementation. Some examples of improvements you could explore include:

Experimentation: You can expose many of the micro-decisions you made in the algorithm's implementation as parameters and perform studies on variations of those parameters. This can lead to new insights and disambiguation of algorithm implementations that you can share and promote.
Optimization: You can explore opportunities to make the implementation more efficient by using tools, libraries, different languages, different data structures, patterns and internal algorithms. Knowledge of algorithms and data structures from classical computer science can be very beneficial in this type of work.
Specialization: You may explore ways of making the algorithm more specific to a problem. This can be required when creating production systems and is a valuable skill. Making an algorithm more problem specific can also lead to increases in efficiency (such as running time) and efficacy (such as accuracy or other performance measures).
Generalization: Opportunities can be created by making a specific algorithm more general. Programmers (like mathematicians) are often skilled in abstraction, and you may be able to see how the algorithm could be applied to more general cases of a class of problem or to other problems entirely.
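Here is what exposing micro-decisions as parameters might look like for k-nearest neighbors. The names `knn_predict`, `distance` and `tie_break` are hypothetical, and the two tie-breaking rules are only examples of choices a written description may leave open. An experiment would then vary `k`, `distance` and `tie_break` and compare the results on held-out data.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean, tie_break="nearest"):
    """Classify query by majority vote among its k nearest training points.

    Two micro-decisions are exposed as parameters:
      distance  - the distance function used to rank neighbors;
      tie_break - what to do when classes receive equal votes:
                  "nearest" picks the tied class with the closest member,
                  "smallest_label" picks the smallest label (deterministic).
    """
    if tie_break not in ("nearest", "smallest_label"):
        raise ValueError(f"unknown tie_break: {tie_break!r}")
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    tied = [label for label, count in votes.items() if count == top]
    if len(tied) == 1:
        return tied[0]
    if tie_break == "smallest_label":
        return min(tied)
    # "nearest": neighbors are sorted by distance, so the first tied label seen wins.
    return next(label for _, label in neighbors if label in tied)


train_X = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["a", "a", "b", "b"]
query = [3.1, 2.5]
for rule in ("nearest", "smallest_label"):
    print(rule, knn_predict(train_X, train_y, query, k=2, tie_break=rule))
# nearest b
# smallest_label a
```

With k=2 the query point has one neighbor from each class, so the answer depends entirely on the tie-breaking rule.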

Limitations

You can learn a lot by implementing machine learning algorithms by hand, but there are also some downsides to keep in mind.

Redundancy: Many algorithms already have implementations, some very robust implementations that have been used by hundreds or thousands of researchers and practitioners around the world. Your implementation may be considered redundant, a duplication of effort already invested by the community.
Bugs: New code that has few users is more likely to have bugs, even with a skilled programmer and unit tests. Using a standard library can reduce the likelihood of having bugs in the algorithm implementation.
Non-intuitive Leaps: Some algorithms rely on non-intuitive jumps in reasoning or logic because of the sophisticated mathematics involved. An implementation that does not account for these leaps may be limited or even incorrect.
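One practical defense against bugs is to check a hand-written implementation against an established library on the same data. The sketch below assumes NumPy is installed and compares a from-scratch ordinary least squares fit for a single input variable (`fit_simple_linear_regression` is a hypothetical name) with `numpy.polyfit` using degree 1.

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Hand-written ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 3.0, 5.0]

mine = fit_simple_linear_regression(x, y)
# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
reference = np.polyfit(x, y, 1)
print(mine)       # approximately (0.8, 0.4)
print(reference)  # approximately [0.8 0.4]
assert np.allclose(mine, reference), "hand-written fit disagrees with NumPy"
```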

It is easy to comment on open source implementations of machine learning algorithms and raise many issues in a code review. It is much harder to appreciate the non-intuitive efficiencies that have been encoded in the implementation. This can be a trap in thinking.

You may find it beneficial to start with a slower intuitive implementation of a complex algorithm before considering how to change it to be programmatically less elegant, but computationally more efficient.
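For example, pairwise Euclidean distances (the core of k-nearest neighbors) can be computed with an obvious double loop, or with a vectorized NumPy expression that relies on the identity ||a - b||^2 = ||a||^2 - 2 a·b + ||b||^2. The second form is typically much faster on large arrays but harder to read, which is the kind of non-intuitive efficiency that is easy to miss in a code review. A minimal sketch, assuming NumPy, 2-D input arrays and hypothetical function names:

```python
import numpy as np


def pairwise_distances_loop(A, B):
    """Readable version: one explicit Euclidean distance per (row of A, row of B) pair."""
    D = np.zeros((len(A), len(B)))
    for i in range(len(A)):
        for j in range(len(B)):
            D[i, j] = np.sqrt(np.sum((A[i] - B[j]) ** 2))
    return D


def pairwise_distances_vectorized(A, B):
    """Faster but less obvious: expands ||a - b||^2 into ||a||^2 - 2 a.b + ||b||^2."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          - 2.0 * A @ B.T
          + np.sum(B ** 2, axis=1)[None, :])
    # Rounding can make tiny squared distances slightly negative, so clip at zero.
    return np.sqrt(np.maximum(sq, 0.0))


rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(4, 3))
assert np.allclose(pairwise_distances_loop(A, B), pairwise_distances_vectorized(A, B))
```

Keeping the slow, intuitive version around as a reference makes it easy to test the fast one, as the final assertion does.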

Example Projects

Some algorithms are easier to understand than others. In this post I want to make some suggestions for intuitive algorithms from which you might like to select your first machine learning algorithm to implement from scratch.

Ordinary Least Squares Linear Regression: Use two-dimensional data sets and model y from x. Print out the error for each iteration of the algorithm. Consider plotting the line of best fit and predictions for each iteration of the algorithm to see how the updates affect the model.
k-Nearest Neighbor: Consider using two-dimensional data sets with two classes, even ones that you create with graph paper, so that you can plot them. Once you can plot and make predictions, you can plot the relationships created for each prediction decision the model makes.
Perceptron: Often considered the simplest artificial neural network model, and very similar to a regression model. You can track and graph the performance of the model as it learns a dataset.
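For the first suggestion, a least squares line can be fitted iteratively with batch gradient descent on the mean squared error, which gives you an error value to print as the algorithm runs. This is one possible approach, not the only one: ordinary least squares also has a closed-form solution. The function names, learning rate and number of iterations below are illustrative assumptions.

```python
def mse(x, y, b0, b1):
    """Mean squared error of the line y_hat = b0 + b1 * x."""
    return sum((b0 + b1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


def fit_gradient_descent(x, y, learning_rate=0.05, iterations=2000, report_every=100):
    """Fit y = b0 + b1 * x by batch gradient descent on the mean squared error."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for it in range(iterations):
        # Gradients of MSE = (1/n) * sum((b0 + b1*x - y)^2) with respect to b0 and b1.
        residuals = [b0 + b1 * xi - yi for xi, yi in zip(x, y)]
        grad_b0 = 2.0 / n * sum(residuals)
        grad_b1 = 2.0 / n * sum(r * xi for r, xi in zip(residuals, x))
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
        if it % report_every == 0:
            print(f"iteration {it}: mse={mse(x, y, b0, b1):.4f}")
    return b0, b1


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 3.0, 5.0]
b0, b1 = fit_gradient_descent(x, y)
print(f"intercept={b0:.3f}, slope={b1:.3f}")  # approaches the least squares fit: 0.400, 0.800
```

If the learning rate is too large the printed error grows instead of shrinking, which is one of the first things this kind of per-iteration output reveals.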

Summary

In this post you learned the benefits of implementing machine learning algorithms by hand. You learned that you can understand an algorithm, make improvements and develop valuable skills by following this path.

What’s the fuss about?

Before we proceed to the core of the issue, let us briefly recall what Machine Learning essentially is.

As you surely know, Machine Learning (ML) is a subfield of Artificial Intelligence (AI). In ML, computer systems learn automatically from experience (data) and improve without being explicitly programmed to do so. Basically, we may say that Machine Learning is focused on the development of software that uses data for pattern recognition.

However, for Machine Learning to function properly and for the software to make decisions, we need to train our algorithms. If we provide them with good training data and examples, they can recognize patterns and dependencies and thus learn from the data. We call this process model training.


How does it work? Well, in short

The learning process starts with a prepared dataset (the training dataset), which Machine Learning algorithms examine for patterns and dependencies. If this process ends successfully, the trained model can evaluate unseen data, and its predictions help us make better decisions. The main goal here is to learn automatically without human intervention and to adapt actions accordingly.
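This workflow, learning from a training set and then evaluating on data the model has not seen, can be sketched in a few lines. The example below is illustrative only: the data are synthetic (a noisy line y = 2x + 1 generated with a fixed seed), the model is a simple least squares line, and the helper names are hypothetical.

```python
import random


def train_test_split(xs, ys, test_fraction=0.25, seed=42):
    """Shuffle the examples and hold back a fraction as unseen test data."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(xs) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_x = [xs[i] for i in train_idx]
    train_y = [ys[i] for i in train_idx]
    test_x = [xs[i] for i in test_idx]
    test_y = [ys[i] for i in test_idx]
    return train_x, train_y, test_x, test_y


def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def mean_squared_error(xs, ys, intercept, slope):
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data for illustration only: y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

train_x, train_y, test_x, test_y = train_test_split(xs, ys)
intercept, slope = fit_line(train_x, train_y)                    # training
test_mse = mean_squared_error(test_x, test_y, intercept, slope)  # evaluation on unseen data
print(f"intercept={intercept:.2f}, slope={slope:.2f}, test MSE={test_mse:.3f}")
```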


The development of a model is an iterative process, which is often carried out several times until the result reaches a certain quality level.


There are plenty of fish in the sea

For a start, let us briefly look at the different algorithms we may employ in Machine Learning. They can be applied to a wide range of analytical problems.

[Figure: ML algorithms]

Where can we find it?

Here is a small picture illustrating where you can find ML without even noticing it.

[Figure: where you can find ML]

There are, of course, other areas where ML is applied, such as equipment maintenance or supply chain management. To put it simply, the possibilities are vast.

How to implement machine learning algorithms?

There is no universal recipe for a successful implementation of machine learning algorithms, so we won’t try to cover everything. Below are some basics that you may find useful.

In general, if you want a deeper and more practical understanding of how machine learning algorithms work, implementing one as a training exercise is one of the best ways to reach this goal. In addition, you will be able to internalize the algorithm and its mathematical description, since you will come to see vectors and matrices as arrays and the transformations on them as computations on those data structures.

To implement a machine learning algorithm, you have to work through a wide range of micro-decisions that formal algorithm descriptions often leave out. If you manage to learn and parameterize such decisions, you’ll soon find yourself at an intermediate or even advanced level of managing the ML process.

What is the process after all?

Of course, there are certain patterns and procedures you can follow to improve your ability to implement an ML algorithm by yourself from scratch. Have no doubt: the more you implement, the faster you learn, and the more efficient and customized your own process of working with algorithms becomes.

Below is an example process that you are welcome to adopt and incorporate into your practice.

Language. This choice affects the APIs and libraries you can use in your implementation of machine learning algorithms.
Type of Algorithm. The more specific you are, the better. Go deep into the class, type, description and implementation you want to put into practice.
Problem. Choose a problem or a few of them to test and validate how successful your implementation has been. ML algorithms hardly ever exist in isolation.
Research. Look through books, websites, libraries and any other materials from which you can read and obtain descriptions of the algorithm. Ideally, you would have one major description of the algorithm to work from, but multiple perspectives on the algorithm are still valuable. They help you work through the algorithm description at a faster pace and overcome roadblocks caused by ambiguities or assumptions.
Test. Write unit tests for each function. This way you will grasp the purpose and expectations of each code unit before implementing it.

If you want to accelerate your progress along this path, we advise you to port algorithms from one language to another. There are many open source implementations that you can code review and reimplement in another language.

In addition, you can open source your code during and after its development, adding solid comments and instructions on how to build and use it. In doing so, you will market your skills, inspire beginners, or even come across a fellow ML programmer who is interested in performing an audit or code review for you. Either way, feedback is always welcome, isn’t it?

Getting Practical

If you choose to implement a machine learning algorithm by hand, you get an extra chance to develop valuable skills. Some of them will help you master the algorithm, others will help in production system development, and still others may be used for classical research in the domain.

Let’s look at some examples more closely:

Master Your Algorithms. Algorithm implementation is the first step towards mastery. During the implementation process, you have no choice but to inspect the algorithm in detail. What is more, you create your own lab for tinkering with the computation, debugging it and adding extra measures to assess the running process.
Development of Production Systems. Custom implementations of ML algorithms are typically needed in production systems, which require changes to the algorithm for it to be as efficient as possible. What do all businesses need? Lower costs and greater revenue. That is why results have to be better, faster, stronger. And that is where implementing algorithms by hand may prove immensely useful.
Reference Review. To achieve results, you need to perform research by reading multiple descriptions of the algorithm. Furthermore, you will locate and code review other implementations of the algorithm to confirm that you’ve understood everything correctly. This is targeted research, and by performing it you will learn how to make good use of research publications.

Understand it!

As simple as it may sound, you must understand your algorithms to become good at machine learning. Once the implementation is complete, you can start introducing improvements to it, be it experimentation, optimization, specialization or generalization. And as technologies develop and the world keeps changing, you will see that there is always room for improvement.

You learned a simple process that you can follow and customize as you implement multiple algorithms from scratch and you learned three algorithms that you could choose as your first algorithm to implement from scratch.


Nitin Rathour