[Xapian-discuss] patent searches with xapian

Kevin Webb kevin at tackledesign.com
Fri Oct 27 23:17:59 BST 2006


Folks,

I'm writing simply to say thanks for building such a great piece of
software! I've been working with Xapian since the summer to construct
an interface for searching the complete US patent collection back to
1836 (the US Patent Office only offers searches back to 1976). It's
been a joy to use Xapian and I'm quite pleased with the search results
we've received. Our initial test system is now online and allows
searching of documents between 1836 and 1925 (about 1.5 million
documents):

http://search.allpatents.org/

We're working with folks at HP Labs to perform the OCR extraction on
the original page images, which were scanned by the patent office. As
the OCR effort continues, we'll expand our collection to include all US
patents.

Feel free to explore and make comments - if you have any thoughts on
how we might improve the interface or the search indexing I'd be glad
to chat! I may also write up a summary of how we've managed the
text-over-image markup implementation with Xapian (it's nothing
particularly fancy, but it may be of use to others nonetheless).

Thanks again for all your work!
Kevin Webb


