From olly at survex.com Tue Sep 2 03:56:18 2014
From: olly at survex.com (Olly Betts)
Date: Tue, 2 Sep 2014 03:56:18 +0100
Subject: [Xapian-devel] Full MVCC in Brass
In-Reply-To: <878umih85b.fsf@awakening.csail.mit.edu>
References: <87mwb0gdpr.fsf@awakening.csail.mit.edu>
 <87d2bvk1su.fsf@topper.koldfront.dk>
 <878umih85b.fsf@awakening.csail.mit.edu>
Message-ID: <20140902025618.GJ22512@survex.com>

On Wed, Aug 20, 2014 at 10:46:24PM -0400, Austin Clements wrote:
> On Wed, 20 Aug 2014, Adam Sjøgren wrote:
> > Austin Clements writes:
> >
> >> Full MVCC would enable Xapian to keep any database revision valid as
> >> long as any reader is using it, thus eliminating the dreaded
> >> DatabaseModifiedError and simplifying application logic.
> >
> > This would also make it easier to spool a consistent backup to tape of
> > an index that is being used (updated), right?
>
> As Olly mentioned, this wouldn't ensure that a copy straight from the
> filesystem would be a valid database. However, it would be pretty
> close. The problem is that, even if you're holding a read lock on a
> given revision, the database metadata in "iambrass" may no longer be
> consistent with that revision. However, I think if you were to save the
> iambrass file as it was when the reader opened and locked its revision,
> you could then safely copy the remaining files directly from the file
> system.

Ah yes, that's nice and simple, and I think should work.

We could provide a helper tool which did something like opening the
database with a read lock, taking an exclusive lock on "iambrass.saved"
and saving the version file data for the opened revision to it. This
tool could run as a wrapper around the backup command.

Then after a restore, you should just need to do this to get a working
database:

    mv iambrass.saved iambrass

Even if there's some subtlety I'm missing, we should certainly try to
provide a way to make backups more easily.
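A minimal sketch of the wrapper idea above, in Python and for POSIX
only. It is an illustration, not an existing Xapian tool: the function
name run_backup_with_saved_version is hypothetical, and the step of
holding a read lock on the opened revision is left as a comment because
that API is only being proposed in this thread. The sketch simply copies
the current "iambrass" file, which assumes no commit lands between
opening the database and taking the copy.

```python
import fcntl
import os
import shutil
import subprocess


def run_backup_with_saved_version(db_dir, backup_cmd):
    """Save a copy of the version file, run backup_cmd, return its exit status.

    Hypothetical helper: the name and behaviour are assumptions made for
    illustration only.
    """
    version_file = os.path.join(db_dir, "iambrass")
    saved_file = os.path.join(db_dir, "iambrass.saved")

    # Placeholder: a real tool would first open the database with a read
    # lock so the saved revision stays valid for the whole backup.  That
    # API is only proposed in the thread, so it is not shown here.

    # Append mode creates the file if needed without truncating it
    # before we hold the lock.
    with open(saved_file, "ab") as saved:
        # Exclusive lock so two wrappers cannot overwrite each other's copy.
        fcntl.flock(saved, fcntl.LOCK_EX)
        saved.seek(0)
        saved.truncate()
        with open(version_file, "rb") as current:
            shutil.copyfileobj(current, saved)
        saved.flush()
        os.fsync(saved.fileno())
        # Run the backup command while the lock is still held; the lock
        # is released when the file is closed.
        return subprocess.call(backup_cmd)
```

It might be called as run_backup_with_saved_version("/path/to/db",
["tar", "cf", "/backup/db.tar", "/path/to/db"]), with the paths here
being placeholders. The restore step is then the "mv" shown above.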
Cheers,
    Olly

From olly at survex.com Tue Sep 2 05:24:31 2014
From: olly at survex.com (Olly Betts)
Date: Tue, 2 Sep 2014 05:24:31 +0100
Subject: [Xapian-devel] Full MVCC in Brass
In-Reply-To: <87bnreh8fi.fsf@awakening.csail.mit.edu>
References: <87mwb0gdpr.fsf@awakening.csail.mit.edu>
 <20140820030012.GK7305@survex.com>
 <87bnreh8fi.fsf@awakening.csail.mit.edu>
Message-ID: <20140902042431.GK22512@survex.com>

On Wed, Aug 20, 2014 at 10:40:17PM -0400, Austin Clements wrote:
> I haven't seen anything saying that LockFileEx interacts with CreateFile
> locking, but if this goes in a new BrassLock, then it shouldn't be an
> issue because we'll always use LockFileEx for brass databases.

If it's feasible, keeping the lockfile called "flintlock" and the
locking interoperable is helpful - it avoids issues like ending up with
two databases of different formats in the same directory, as I've seen
happen with flint and quartz (quartz used a "db_lock" as the lockfile
name and the presence of the file indicated the lock was held).

However the new locking code can certainly be a different class if that
works best.

I doubt LockFileEx() directly interacts with CreateFile(), but you need
to open the file before you call LockFileEx(). And if current Xapian
has the database locked on Windows, then the lock file can't be opened
for write (because only FILE_SHARE_READ was specified). So when future
Xapian tries to open to write, it won't be able to (it won't even get to
the LockFileEx() stage).

And I think we can continue to have writers open with only
FILE_SHARE_READ to get locking to work the other way around (readers
would I think need to open with FILE_SHARE_READ|FILE_SHARE_WRITE).

> Okay. If we're on a platform that can't support MVCC, should an attempt
> to open the database for read *without* opting out fail?
> Alternatively, it could fall back to old-fashioned read locking
> (blocking writers) so that readers still get the guarantee that
> DatabaseModifiedError will never happen (and they could still opt out
> of all read locking and deal with the consequences).

Actually, while I said "opt out", there is the question of what the
default should be (MVCC or DatabaseModifiedError). MVCC seems more
appealing, though I'm not sure how much the issues with long-lived
readers bloating database size will catch people out.

But if MVCC is the default, I'm not sure what's best to do on a platform
which can't support it. Failing seems the safest option, at least for
now, but readers blocking writers isn't so bad - better than readers
getting DatabaseModifiedError on some platforms but not others.

> I think for a first shot I just won't do lock extension. The next step
> would be to do it unless we're using F_SETLK. Then the last step would
> be to add a lock helper. Steps 2 and 3 are backwards-compatible, so
> they could be done at any point in the future.

OK. I'm generally a fan of an incremental approach.

Cheers,
    Olly

From olly at survex.com Tue Sep 2 05:38:29 2014
From: olly at survex.com (Olly Betts)
Date: Tue, 2 Sep 2014 05:38:29 +0100
Subject: [Xapian-devel] Full MVCC in Brass
In-Reply-To: <87mwb0gdpr.fsf@awakening.csail.mit.edu>
References: <87mwb0gdpr.fsf@awakening.csail.mit.edu>
Message-ID: <20140902043829.GL22512@survex.com>

On Tue, Aug 19, 2014 at 09:19:12PM -0400, Austin Clements wrote:
> Before I wade too deep into this, I wanted to propose my plan.
[snip]

I've just read through this carefully, and it all sounds good to me.

I'm currently working on merging some changes into brass, but they
shouldn't touch the same areas as this, so don't be concerned if you
merge from master and find changes to brass flooding in.
Cheers,
    Olly

From karthikiyer2000 at gmail.com Sat Sep 20 15:37:33 2014
From: karthikiyer2000 at gmail.com (karthik iyer)
Date: Sat, 20 Sep 2014 20:07:33 +0530
Subject: [Xapian-devel] Help with xapian
Message-ID:

Hi,

I am interested in doing some development work for Xapian. I have
built the Xapian core library on my system and also tried my hand at
a few features of Xapian. For the past few days I have been going
through the code base of Xapian. I am trying to understand how the
features are implemented (I tried looking into the code for Porter
stemming and list encoding) but I've not made any significant progress.
I am new to open source but have some experience with IR projects.
Could you please suggest a good place to start?

Regards
Karthik

From olly at survex.com Mon Sep 22 11:53:41 2014
From: olly at survex.com (Olly Betts)
Date: Mon, 22 Sep 2014 11:53:41 +0100
Subject: [Xapian-devel] Help with xapian
In-Reply-To:
References:
Message-ID: <20140922105341.GK31496@survex.com>

On Sat, Sep 20, 2014 at 08:07:33PM +0530, karthik iyer wrote:
> I am interested in doing some development work for Xapian. I have
> built the Xapian core library on my system and also tried my hand at
> a few features of Xapian. For the past few days I have been going
> through the code base of Xapian. I am trying to understand how the
> features are implemented (I tried looking into the code for Porter
> stemming and list encoding) but I've not made any significant progress.
> I am new to open source but have some experience with IR projects.
> Could you please suggest a good place to start?

My suggestion would be to set yourself a goal, for example some small
new feature to add, and then work towards that. At least for me, that
approach usually works better than trying to understand how something
works just by looking at it.
There's a list of project ideas on the wiki which might be suitable:

http://trac.xapian.org/wiki/ProjectIdeas

Or if you're interested in a particular area, people might be able to
suggest something related to that.

Cheers,
    Olly

From karthikiyer2000 at gmail.com Mon Sep 29 13:40:44 2014
From: karthikiyer2000 at gmail.com (karthik iyer)
Date: Mon, 29 Sep 2014 18:10:44 +0530
Subject: [Xapian-devel] Help with xapian
In-Reply-To: <20140922105341.GK31496@survex.com>
References: <20140922105341.GK31496@survex.com>
Message-ID:

Hi,

I have started getting the hang of the Xapian codebase. I think I would
like to try my hand at the letor module of Xapian. Could you please
suggest some free data set for training and testing the letor features?
I am not able to get the INEX data set from anywhere (the one mentioned
by Parth Gupta in his GSoC 2011 project).

Regards
Karthik

On Mon, Sep 22, 2014 at 4:23 PM, Olly Betts wrote:

> On Sat, Sep 20, 2014 at 08:07:33PM +0530, karthik iyer wrote:
> > I am interested in doing some development work for Xapian. I have
> > built the Xapian core library on my system and also tried my hand at
> > a few features of Xapian. For the past few days I have been going
> > through the code base of Xapian. I am trying to understand how the
> > features are implemented (I tried looking into the code for Porter
> > stemming and list encoding) but I've not made any significant progress.
> > I am new to open source but have some experience with IR projects.
> > Could you please suggest a good place to start?
>
> My suggestion would be to set yourself a goal, for example some small
> new feature to add, and then work towards that. At least for me, that
> approach usually works better than trying to understand how something
> works just by looking at it.
>
> There's a list of project ideas on the wiki which might be suitable:
>
> http://trac.xapian.org/wiki/ProjectIdeas
>
> Or if you're interested in a particular area, people might be able to
> suggest something related to that.
>
> Cheers,
>     Olly
>

From karthikiyer2000 at gmail.com Mon Sep 29 14:47:28 2014
From: karthikiyer2000 at gmail.com (karthik iyer)
Date: Mon, 29 Sep 2014 19:17:28 +0530
Subject: [Xapian-devel] Help with xapian
In-Reply-To:
References: <20140922105341.GK31496@survex.com>
Message-ID:

Ok sorry, my bad. Just checked the INEX 2014 website. I believe that is
where we get the data set from.

Cheers
Karthik

On Mon, Sep 29, 2014 at 6:10 PM, karthik iyer wrote:

> Hi,
>
> I have started getting the hang of the Xapian codebase. I think I would
> like to try my hand at the letor module of Xapian. Could you please
> suggest some free data set for training and testing the letor features?
> I am not able to get the INEX data set from anywhere (the one mentioned
> by Parth Gupta in his GSoC 2011 project).
>
> Regards
> Karthik
>
> On Mon, Sep 22, 2014 at 4:23 PM, Olly Betts wrote:
>
>> On Sat, Sep 20, 2014 at 08:07:33PM +0530, karthik iyer wrote:
>> > I am interested in doing some development work for Xapian. I have
>> > built the Xapian core library on my system and also tried my hand at
>> > a few features of Xapian. For the past few days I have been going
>> > through the code base of Xapian. I am trying to understand how the
>> > features are implemented (I tried looking into the code for Porter
>> > stemming and list encoding) but I've not made any significant progress.
>> > I am new to open source but have some experience with IR projects.
>> > Could you please suggest a good place to start?
>>
>> My suggestion would be to set yourself a goal, for example some small
>> new feature to add, and then work towards that.
>> At least for me, that approach usually works better than trying to
>> understand how something works just by looking at it.
>>
>> There's a list of project ideas on the wiki which might be suitable:
>>
>> http://trac.xapian.org/wiki/ProjectIdeas
>>
>> Or if you're interested in a particular area, people might be able to
>> suggest something related to that.
>>
>> Cheers,
>>     Olly
>>
>