[Xapian-discuss] Tag-based filesystem with xapian, advice?
Karel Marissens
karel.marissens at gmail.com
Sat Feb 28 21:21:15 GMT 2009
Hi.
For my thesis, I'm working on a combined hierarchical and tag-based
filesystem written in Python. I'm using FUSE to implement the
filesystem. I am now thinking about using Xapian, but could use some
advice.
Before I get to my questions, I'll explain the idea of the system
(the questions come after the horizontal line). The idea is that there
is a (hidden) directory on an ordinary hierarchical filesystem. I then
virtually replicate this directory at a logical place (say, the home
folder of a user) using FUSE. It can be used exactly as a user would
normally use a directory, but I add extra functionality: tags. Every
file can have several tags (keywords) associated with it. A picture of
a Christmas tree, for example, might be located hierarchically in
/photo's/2008/christmas and be tagged as "tree, christmas, photo,
2008".
How tags are added to files etc. is not important here. What is
important is that the association of a file with several tags will be
saved in a database, as this information needs to be searchable.
Every directory in the hierarchy will have a special folder: +FIND.
When one goes to /photo's/2008/+FIND, all tags associated with files
in the directory /photo's/2008, or any of its subdirectories, will be
visible as subdirectories. Opening such a tag subdirectory shows the
list of tags (again in the form of subdirectories) that can be
combined with it. So /photo's/2008/+FIND/christmas will show all tags
associated with files in the directory /photo's/2008, or any of its
subdirectories, that are tagged as christmas.
At any moment, the user can go into the special subdirectory +FILES to
see a list of all the files that match the selection.
/photo's/2008/+FIND/christmas/tree/+FILES will thus show all files in
/photo's/2008, or any of its subdirectories, that are tagged as both
christmas and tree.
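
To make the +FIND / +FILES semantics concrete, here is a minimal sketch
in plain Python. It uses no FUSE or Xapian calls; the function names,
the in-memory dict standing in for the database, and the file names in
the example are all made up for illustration.

```python
from typing import Dict, List, Set, Tuple

FIND = "+FIND"
FILES = "+FILES"


def parse_virtual_path(path: str) -> Tuple[str, List[str], bool]:
    """Split a virtual path into (base directory, selected tags, wants_files)."""
    parts = [p for p in path.split("/") if p]
    if FIND not in parts:
        # An ordinary hierarchical path: no tag selection at all.
        return "/" + "/".join(parts), [], False
    i = parts.index(FIND)
    base_dir = "/" + "/".join(parts[:i])
    selection = parts[i + 1:]
    # A trailing +FILES means "list the matching files", not more tags.
    wants_files = bool(selection) and selection[-1] == FILES
    if wants_files:
        selection = selection[:-1]
    return base_dir, selection, wants_files


def matching_files(index: Dict[str, Set[str]], base_dir: str,
                   tags: List[str]) -> List[str]:
    """Return files below base_dir (recursively) that carry every tag."""
    prefix = base_dir.rstrip("/") + "/"
    wanted = set(tags)
    return sorted(path for path, file_tags in index.items()
                  if path.startswith(prefix) and wanted <= file_tags)


# Hypothetical stand-in for the tag database: file path -> set of tags.
example_index = {
    "/photo's/2008/christmas/tree.jpg": {"tree", "christmas", "photo", "2008"},
    "/photo's/2008/summer/beach.jpg": {"beach", "photo", "2008"},
    "/photo's/2007/christmas/tree.jpg": {"tree", "christmas", "photo", "2007"},
}

base, tags, wants_files = parse_virtual_path(
    "/photo's/2008/+FIND/christmas/tree/+FILES")
print(matching_files(example_index, base, tags))
# prints ["/photo's/2008/christmas/tree.jpg"]
```

The linear scan over the dict is exactly the part I would like the
database to do for me.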
----------------------------------------------------------------------------------------------------
So, while searching for the best way to store all the needed
information in a database and retrieve it again, I stumbled upon
Xapian. I read the information pages and the whole API, looked at the
few examples I could find and did some small tests. I will only use
the boolean search functionality: I have no need to "guess" which file
is most relevant, I just need to show them all.
Now my first question: what is the path of a file? The content of a
document? A term? A value? I need to be able to use the path when
searching, because I need to limit the results to files in a certain
directory, for example only files whose path matches /photo's/2008/*.
Or do I have to work with a relevance set or something?
I tried using the path as a tag itself, but when I query for
"/photo's/2008/*", it seems to be split automatically into two
separate terms (a file tagged as 2008 also showed up, for example).
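
One direction I have been wondering about, sketched in plain Python
with no Xapian calls: give each file one exact-match term per ancestor
directory, so that "everything under /photo's/2008" becomes a lookup of
a single term instead of a wildcard. The XDIR prefix and the helper
name are invented for this sketch, and I don't know whether this is how
paths are meant to be handled in Xapian.

```python
from typing import List

# Hypothetical prefix that keeps directory terms apart from ordinary tags.
DIR_PREFIX = "XDIR"


def directory_terms(file_path: str) -> List[str]:
    """Return one exact-match term per ancestor directory of file_path."""
    parts = [p for p in file_path.split("/") if p]
    terms = []
    for depth in range(1, len(parts)):  # stop before the file name itself
        terms.append(DIR_PREFIX + "/" + "/".join(parts[:depth]))
    return terms


print(directory_terms("/photo's/2008/christmas/tree.jpg"))
# prints ["XDIR/photo's", "XDIR/photo's/2008", "XDIR/photo's/2008/christmas"]
```

With terms like these, a file tagged as 2008 but stored elsewhere would
no longer match, because "XDIR/photo's/2008" is a different term from
the tag "2008".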
My second question: what is the easiest way to get a list of all the
tags associated with the files in the result set? For example, I want
a list of all tags associated with files in /photo's/2008. One method
would be to search for all files in /photo's/2008, or any of its
subdirectories, loop over all the results and, for each document, loop
over the terms associated with it and add them to a list.
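
In plain Python, the brute-force method just described looks roughly
like this. The "results" argument is a stand-in (one collection of tags
per matching file), not a real Xapian result set, and the function name
is made up.

```python
from typing import Iterable, List


def collect_tags(results: Iterable[Iterable[str]]) -> List[str]:
    """Gather the distinct tags of all results with two nested loops."""
    seen = set()
    for document_tags in results:      # loop over all the results
        for tag in document_tags:      # loop over the terms of one document
            seen.add(tag)
    return sorted(seen)


print(collect_tags([
    ["tree", "christmas", "photo", "2008"],
    ["beach", "photo", "2008"],
]))
# prints ['2008', 'beach', 'christmas', 'photo', 'tree']
```

This touches every term of every matching document, which is why I am
asking whether there is an easier way.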
My third question: how can I get ALL results? get_mset() requires a
maximum number of results. Do I just set it to an extremely large
number and treat it as a safety limit that should never be reached?
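
Instead of an arbitrary big number, one bound that seems natural is the
number of documents in the database, since no query can match more
documents than exist. A minimal sketch, assuming "enquire" is a
xapian.Enquire with its query already set and "database" is the
xapian.Database it searches; the function name is my own:

```python
def fetch_all_matches(enquire, database):
    """Request at most as many matches as there are documents."""
    # get_mset(first, maxitems): start at match 0 and allow up to
    # get_doccount() items, so the limit can never cut off a result.
    return enquire.get_mset(0, database.get_doccount())
```

I don't know whether this is the recommended way or whether it has a
cost I am not seeing.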
----------------------------------------------------------------------------------------------------
To sum it all up:
1) Where do I store the path of a file?
2) How do I get a list of all terms associated with the documents in
the result set?
3) How do I get ALL results, not a limited number?
Thanks in advance for any advice!
Karel