Luis Blanche


This is my Data Science / Machine Learning portfolio

View the Project on GitHub LuisBlanche/portfolio

E-commerce Product Categorization using Large Language Models

Context

I was contracted by an e-commerce company to take part in a project automating the transformation of seller catalogs into the marketplace operator's catalog format.

Methodology

Existing product categorization algorithms can narrow down the set of possible categories, which makes it possible to build an LLM prompt that fits within the context length. We prototyped the prompt and data pipeline using OpenAI models. We then used GPT-4 to generate a validation dataset, which the team carefully reviewed. This allowed us to fine-tune state-of-the-art open-source models (such as Llama 2, Mistral or Zephyr) to specialize them for this task. Product categorization is the entry task that facilitates attribute extraction downstream.
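As a rough illustration of the first step (not the project's actual code), the sketch below shortlists candidate categories and then builds a compact prompt from them. The Jaccard token-overlap scoring is only a stand-in for whatever categorization algorithm is used in practice, and the function names, category paths and prompt wording are all hypothetical.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shortlist_categories(product_title: str, categories: list[str], k: int = 3) -> list[str]:
    """Keep the k categories with the highest Jaccard token overlap with the title.

    Assumption: this simple scoring replaces a real candidate-retrieval algorithm.
    """
    title_tokens = tokenize(product_title)

    def score(category: str) -> float:
        cat_tokens = tokenize(category)
        union = title_tokens | cat_tokens
        return len(title_tokens & cat_tokens) / len(union) if union else 0.0

    # sorted() is stable, so tied categories keep their catalog order.
    return sorted(categories, key=score, reverse=True)[:k]


def build_prompt(product_title: str, candidates: list[str]) -> str:
    """Build a short classification prompt listing only the shortlisted categories."""
    options = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, start=1))
    return (
        "Choose the single best category for the product below.\n"
        f"Product: {product_title}\n"
        f"Categories:\n{options}\n"
        "Answer with the number of the category only."
    )


# Hypothetical catalog and product, for illustration only.
categories = [
    "Home > Kitchen > Coffee Machines",
    "Electronics > Phones > Smartphones",
    "Electronics > Phones > Phone Cases",
    "Sports > Running > Shoes",
]
title = "Silicone phone cases for smartphones"
candidates = shortlist_categories(title, categories, k=2)
prompt = build_prompt(title, candidates)
print(prompt)
```

Because only the shortlisted categories appear in the prompt, its length depends on k and not on the size of the full catalog, which is what keeps it within the model's context length.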

Tools

My involvement

I was contracted as a member of a team of four senior data scientists and was mostly involved in the product categorization part.

Results and achievements