One of the client's challenges was ensuring that work was not duplicated. They needed to know whether anyone in the organization had already reviewed a specific patent and, if so, what they thought of it. Additional metadata, including current ownership, affects a patent's desirability, so we wanted to be able to attach this data as well.
We developed a Rails application that takes a list of patents and “enriches” them with ownership data. This data is publicly available, but it is unstructured, which makes it a challenge to determine the correct owner, and owner names are not normalized. However, we didn’t need to be 100% accurate, just really close. Ironically, none of the commercial tools we evaluated does this well. Our solution works pretty well.
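To illustrate the naming normalization issue, here is a minimal sketch of how owner names might be reduced to a comparable form. It is an illustration rather than the application's actual logic: the method name `normalize_assignee` and the suffix list are assumptions made for this example, and a real list would need to be much longer.

```ruby
# Hypothetical sketch: reduce owner-name variants to one comparable key.
# The suffix list is illustrative only and far from complete.
CORPORATE_SUFFIXES = %w[
  inc incorporated corp corporation co company llc ltd limited
].freeze

def normalize_assignee(name)
  # Lowercase, turn punctuation into spaces, and split into words.
  tokens = name.downcase.gsub(/[^[:alnum:]\s]/, " ").split

  # Drop trailing corporate suffixes, but always keep at least one word.
  tokens.pop while tokens.size > 1 && CORPORATE_SUFFIXES.include?(tokens.last)

  tokens.join(" ")
end
```

With this approach, "Acme Widgets, Inc." and "ACME WIDGETS CORPORATION" both reduce to the same key, so they can be treated as one owner. It does not handle misspellings or renamed companies, which is part of why "really close" rather than 100% accuracy is the realistic target.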
From a usability standpoint, the user provides a spreadsheet with patent records. At a minimum, it must have the patent number in a properly named column. Other data can be provided and will be used if present. Anything that is missing is acquired by the application from public websites.
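The "use it if present, otherwise look it up" rule can be sketched roughly as follows. This assumes the spreadsheet has been exported as CSV; the column names `patent_number` and `assignee`, the method `load_patent_rows`, and the `fetch_owner` callable that stands in for the public-website lookup are all hypothetical names chosen for the example.

```ruby
require "csv"

# Hypothetical sketch: read patent rows, keep supplied data, and fall back
# to a lookup only for values the user did not provide.
def load_patent_rows(csv_text, fetch_owner:)
  table = CSV.parse(csv_text, headers: true)

  # The patent number column is the only hard requirement.
  unless table.headers.include?("patent_number")
    raise ArgumentError, "missing required column: patent_number"
  end

  table.map do |row|
    number = row["patent_number"].to_s.strip
    owner  = row["assignee"].to_s.strip # empty if column or value is absent

    # Only go to the (assumed) external lookup when nothing was supplied.
    owner = fetch_owner.call(number) if owner.empty?

    { patent_number: number, assignee: owner }
  end
end
```

Passing the lookup in as a callable keeps the parsing step testable without network access, for example `load_patent_rows(text, fetch_owner: ->(n) { "unknown" })`.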
Once the data set is enriched, the user may pull an ‘extract’, which is a normalized spreadsheet with macros that help the user analyze the set quickly. It includes links to other data, and buttons to mark and categorize patents. When finished, the user sends the processed sheet back to the application.
The enrichment occurs asynchronously using a Ruby gem called Delayed Job (delayed_job), which keeps the queue of pending work in a database table and provides retry and failure logic in case things go wrong. The enrichment work uses threads that call into the application separately from the main web application. The main application runs under Apache using Passenger, so there are multiple threads to serve users with high performance.
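As a rough sketch of how such a job might be shaped, Delayed Job accepts any object that responds to `perform`, and it calls optional hooks such as `max_attempts`, `error` and `failure` on that object. The class `EnrichPatentJob` and the `PatentEnricher` module below are hypothetical stand-ins, not the application's real code, and the retry limit of 5 is an arbitrary choice.

```ruby
# Hypothetical stand-in for the real enrichment logic (public-site lookup).
module PatentEnricher
  def self.enrich(patent_number)
    { patent_number: patent_number, assignee: nil } # placeholder result
  end
end

# A custom job object: Delayed Job serializes it into its jobs table and
# later calls #perform from a worker.
EnrichPatentJob = Struct.new(:patent_number) do
  def perform
    PatentEnricher.enrich(patent_number)
  end

  # Hook: how many times to try before giving up (value is an assumption).
  def max_attempts
    5
  end

  # Hook: called after each failed attempt; the job is then rescheduled.
  def error(_job, exception)
    warn "Enrichment of #{patent_number} failed: #{exception.message}"
  end

  # Hook: called once the job has failed permanently.
  def failure(_job)
    warn "Enrichment of #{patent_number} failed permanently"
  end
end

# Inside a Rails app with the delayed_job gem installed, queueing would be:
#   Delayed::Job.enqueue EnrichPatentJob.new("1234567")
```

Keeping only the patent number in the job keeps the serialized payload small, and the worker loads whatever else it needs when the job actually runs.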