Matrix Factorization Methods

Collaborative Filtering has been criticized as having limited scalability, since computing similarity matrices on very large sets of items or users can take a lot of computing horsepower. I don't really buy this, however. Technologies such as Apache Spark allow you to distribute the construction of this matrix across a cluster if you need to. A legitimate problem with collaborative filtering is that it's sensitive to noisy data and sparse data. You'll only get really good results if you have a large data set to work with that's nice and clean.

接下来介绍一些Model-based methods. Instead of 寻找相似的物品或相似的用户, we apply data science and machine learning techniques to extract predictions from our ratings data.

1578547347570

There are a wide variety of techniques that fall under the category of matrix factorization. They managed to find broader features of users and items on their own, like action movies or romantic. They are described by matrices. The general idea is to describe users and movies as combinations of different amounts of each feature. For example, Bob is defined as being 80% an action fan and 20% a comedy fan. We'd then know to match him up with movies that are blend about 80% action and 20% comedy.

1578559424817