“Machine Learning” – the phrase itself creates a sense of excitement for anyone who has even remotely tried experimenting with making machines “learn” from their past experiences and automatically correct themselves if they are going wrong. The thought itself is so intriguing that not only technologists, but even the film industry has been fascinated by it, very obvious by the number of sci-fi movies which involve machines handling and even replacing humans in a lot of scenarios. Still a very distant dream, these movies have depicted what machine learning, in its pinnacle form, can and should achieve.
First up, we’ll get the term machine learning defined. According to WikiPedia, machine learning is “…a branch of artificial intelligence, … concerned with the design and development of algorithms that take as input empirical data, … and yield patterns or predictions….“. And that’s what it is. It simply helps you to identify patterns in data and can easily predict results for unknown states. The answer may be wrong or right, but that’s just a matter of how well you analyze the data and how much data you have.
Machine learning relies heavily on probabilistic models and theorems and how the data which it receives is used in a probability distribution. Machine learning can be broadly classified into two categories:
- Supervised Learning
- Unsupervised Learning
The difference between the two kinds of learning is dependent on how the underlying data is classified.
In supervised learning, we say that the result classes are predefined. That is, for each input, there are only a finite set of output classes for which a particular input can be a cause. The machine learner’s task is to search for patterns and construct mathematical models based on this data. The models are then evaluated based on how well they predict the output based on the variation in the data for a particular input.
In unsupervised learning, we do not have any fixed output classes. Instead, the algorithms employed must group the data in classification classes. In fact, the primary purpose of unsupervised learning algorithms is to generate classification labels themselves which are used to classify the incoming data. These labels are created on the fly and data is then grouped under these labels. Unsupervised learning is widely used in what is known as the “cluster analysis”. This is particularly useful in grouping textual data.
This is just a basic introduction of what is machine learning and its meaning. Hoping this shed some light on the subject. We’ll be soon be digressing on a very interesting Java framework for machine learning that is free, open-source, easy to use and really useful for processing a huge amount of data to make recommendations to the user – Mahout by Apache.