Having made a number of improvements to our visual recommendation technology, we decided to benchmark it against other popular deep learning networks. The aim was to compare our recommendations against AlexNet (Caffe implementation) and ResNet (Torch implementation) and to highlight errors in product similarity identification.
For the comparison, we fed each network three product categories and had it identify visually similar products based on a single thumbnail image. The categories comprised 3260 bags, 7028 shoes and 3060 t-shirts & tops, and all networks were given the same images. In the images below, the first column for each network shows the original product image, and the rest of each row shows the products the network identified as visually similar to it. Errors are circled in red.
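To make the retrieval step concrete, here is a minimal sketch of how "find visually similar products from one thumbnail" can be done once a network has turned each image into a feature vector. It rests on assumptions: we assume each network yields one embedding per image and that candidates are ranked by cosine similarity, which is a common choice but not necessarily what any of the three networks in this comparison used. The function names and the toy three-dimensional vectors are hypothetical and only illustrate the idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, k=5):
    """Return the ids of the k products whose embeddings are closest to the query's.

    `embeddings` maps a product id to its feature vector. In a real setup these
    vectors would come from a network's feature layer (an assumption here).
    """
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vector), product_id)
        for product_id, vector in embeddings.items()
        if product_id != query_id  # never recommend the query image itself
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [product_id for _, product_id in scored[:k]]


# Hypothetical toy catalog; real embeddings have hundreds or thousands of dimensions.
catalog = {
    "bag_a": [0.9, 0.1, 0.0],
    "bag_b": [0.8, 0.2, 0.1],
    "shoe_a": [0.1, 0.9, 0.2],
    "top_a": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    print(most_similar("bag_a", catalog, k=2))
```

Running the same query images through each network's embeddings and inspecting the returned rows by hand is one way to produce the kind of side-by-side comparison described here.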
Overall, CartSkill performed best, with the fewest errors. Our approach places more emphasis on assessing product attributes based on shape, such as the length of a sleeve, the shape of a shoe's heel and the shape of its toe. This makes our network more suitable for determining similarities among fashion products. AlexNet and ResNet excel at recognizing patterns, but this comes at the expense of identifying shapes less well. In addition, ResNet showed weakness in identifying colors. This goes to show that while a variety of networks are available and ready for use, it takes time and effort to tailor them so that they provide accurate results for specific settings.