I'm reasonably new to machine learning; I've done a few projects in Python. I'm looking for advice on how to approach the problem below, which I believe could be automated.
A user in a data quality team in my organisation has a daily task: he takes a list of company names (with addresses) that have been entered manually and searches a database of companies for the matching record, using his own judgement (i.e. there is no hard-and-fast rule).
An example of the input would be:
Company Name, Address Line 1, Country
From this, the user takes the company name and enters it into the search tool, which presents him with a list of results. He picks the best match, or may decide that none of them match. The search tool is built in-house and talks to an external API. I have access to the source code, so I can modify the tool to capture the input and the list of results, and add a checkbox to record which result was used plus another to signify that none was chosen. This would become my labelled training data.
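A minimal sketch of how the captured data could be turned into training rows, assuming the modified tool logs each search as the input record, the candidates in the order they were shown, and the index of the chosen result (or None when "none" is ticked). The SearchLog structure, its field names and the addresses/countries are hypothetical placeholders, not anything the existing tool produces.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SearchLog:
    """One captured search (hypothetical structure for the modified tool)."""
    query: dict                  # the manually entered record
    candidates: List[dict]       # results from the search API, in the order shown
    chosen_index: Optional[int]  # index of the result used, or None if "none" was ticked

def to_pairwise_rows(log: SearchLog) -> List[Tuple[dict, dict, int]]:
    """Expand one search into (query, candidate, label) rows: 1 for the chosen
    candidate, 0 for every other one (all 0 when none was chosen)."""
    return [(log.query, c, int(i == log.chosen_index)) for i, c in enumerate(log.candidates)]

# Made-up example of a logged search.
log = SearchLog(
    query={"name": "Stack Overflow", "address": "1 Example Street", "country": "GB"},
    candidates=[
        {"name": "Stack Overflow Ltd.", "address": "1 Example Street", "country": "GB"},
        {"name": "Stacking Overflowing Shelves Ltd.", "address": "99 Other Road", "country": "GB"},
    ],
    chosen_index=0,
)
print([label for _, _, label in to_pairwise_rows(log)])  # [1, 0]
```

Laid out like this, each search contributes one labelled row per candidate, so a search with 10 results gives 10 rows regardless of which one (if any) was picked.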
The columns in the results that he uses to make the judgement are roughly the same:
Company Name, Address Line 1, Country
Given a company name like Stack Overflow, the search may return Stack Overflow Ltd., Stacking Overflowing Shelves Ltd., etc. The input data is reasonably good, so a search usually returns about 10 candidates, and to a human it's fairly obvious which one to pick.
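To make "fairly obvious" concrete, one option is to describe each (input, candidate) pair by a few similarity features over the three columns. This sketch uses the standard library's difflib.SequenceMatcher.ratio() for string similarity; the legal-suffix list, the normalisation rules, the dict keys and the example records are all assumptions for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form words to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "inc", "llc", "gmbh"}

def normalise_name(name: str) -> str:
    """Lower-case, replace punctuation with spaces, drop trailing legal-form words."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] from difflib's SequenceMatcher.ratio()."""
    return SequenceMatcher(None, a, b).ratio()

def pair_features(query: dict, candidate: dict) -> list:
    """Fixed-length feature vector for one (input, candidate) pair."""
    same_country = query["country"].strip().lower() == candidate["country"].strip().lower()
    return [
        similarity(normalise_name(query["name"]), normalise_name(candidate["name"])),
        similarity(query["address"].lower(), candidate["address"].lower()),
        1.0 if same_country else 0.0,
    ]

# Made-up example records.
query = {"name": "Stack Overflow", "address": "1 Example Street", "country": "GB"}
good = {"name": "Stack Overflow Ltd.", "address": "1 Example Street", "country": "GB"}
bad = {"name": "Stacking Overflowing Shelves Ltd.", "address": "99 Other Road", "country": "GB"}

print(pair_features(query, good))  # [1.0, 1.0, 1.0]
print(pair_features(query, bad))
```

Because the legal suffix is stripped first, "Stack Overflow Ltd." gets a perfect name score while "Stacking Overflowing Shelves Ltd." does not; other similarity measures (token-based, phonetic) could be added as extra features in the same way.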
My thought is that, with enough training data, I could call the API directly with the search term and then have a model choose the appropriate result from the list it returns.
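The selection step could look roughly like this sketch: score every candidate returned for a search, take the best one, and return None (the "no match" case) if even the best score is below a threshold. The threshold of 0.5 and the toy scoring function (plain difflib similarity on the name) are stand-ins; in practice the score would come from a model trained on the logged searches.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence

def pick_match(candidates: Sequence[dict], score: Callable[[dict], float],
               threshold: float = 0.5) -> Optional[int]:
    """Index of the highest-scoring candidate, or None if the list is empty
    or the best score is below `threshold` (the "no match" case)."""
    if not candidates:
        return None
    scores = [score(c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None

# Toy stand-in for a trained model: name similarity to the search term only.
search_term = "stack overflow"

def toy_score(candidate: dict) -> float:
    return SequenceMatcher(None, search_term, candidate["name"].lower()).ratio()

results = [{"name": "Stacking Overflowing Shelves Ltd."}, {"name": "Stack Overflow Ltd."}]
print(pick_match(results, toy_score))                  # 1
print(pick_match(results, toy_score, threshold=0.95))  # None
```

The threshold is what lets the automated version reproduce the user's option of picking nothing; it could be tuned on the logged searches where "none" was ticked.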
Is this something that could be achieved with ML? I'm struggling with the fact that the data (both the input and the list of candidates) will be different every time. Thoughts on the best way to achieve this are welcome, in particular how to structure the data for the model and what kind of classifier to use.
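One common way around the candidate list being different every time is not to feed the whole list to a model at once, but to score each (input, candidate) pair separately with a binary match/no-match classifier, so every example is the same fixed-length feature vector, and then rank a search's candidates by predicted probability (sometimes called pointwise ranking). The sketch below fits a plain logistic regression by gradient descent in pure Python only to stay dependency-free; scikit-learn's LogisticRegression (or a gradient-boosted tree model) with predict_proba would be the more usual choice. The feature values are made up and assume three features per pair: name similarity, address similarity and a country-match flag.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that a pair is a match; the last weight is the bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent on mean log loss."""
    n = len(X[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # gradient of log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Made-up pairwise rows: [name_sim, address_sim, same_country], label 1 = chosen.
X_train = [
    [1.00, 0.90, 1.0],  # search 1: the result that was picked
    [0.60, 0.20, 0.0],  # search 1: not picked
    [0.95, 0.80, 1.0],  # search 2: picked
    [0.50, 0.30, 1.0],  # search 2: not picked
    [0.40, 0.10, 0.0],  # search 3: "none" ticked, so every candidate is a 0
    [0.55, 0.40, 1.0],  # search 3: "none" ticked
]
y_train = [1, 0, 1, 0, 0, 0]
w = train_logistic(X_train, y_train)

# Rank the candidates of a new search by predicted match probability.
new_candidates = [[0.50, 0.15, 0.0], [0.975, 0.85, 1.0]]
probs = [predict_proba(w, x) for x in new_candidates]
best = max(range(len(probs)), key=probs.__getitem__)
print(best)  # 1
```

Searches where "none" was ticked still contribute: every candidate in them is simply a negative example, and a probability threshold on the best candidate decides when nothing is good enough.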