[Xapian-devel] [xapian] GSoC - Learning to Rank, Introduction and some Ideas

Sat Mar 31 17:03:42 BST 2012

Hi Parth, Hi all,

I am extremely sorry I missed the mailing list in my previous mail. I
replied directly to your message, so it may have missed the list. I will
take care next time.

Thanks for answering my questions. I will get back in case I have more.

Best Regards,

Mudit Raj Gupta

On Sat, Mar 31, 2012 at 9:22 PM, Parth Gupta <parthg.88 at gmail.com> wrote:

> Hi Mudit,
>
> Please do not mail me privately, use the mailing list.
>
>
>> It seems quite similar to random forest algorithms. In fact, if we remove
>> the step of selecting bootstrap samples of the data, both are almost the
>> same: they select a random subset of variables, split nodes based on
>> threshold criteria and mode data. We will also get the variable importance
>> matrix and even the proximity matrix, I guess. I think I will make my
>> proposal on the same.
>>
>
> cool.
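
The comparison above can be made concrete with a small sketch of how a
node split is chosen in the extremely randomized trees style. This is an
illustration only, not code from Xapian or from the paper: the function
names, the variance-reduction score and the toy data are all assumptions.
Each candidate feature gets one threshold drawn at random, instead of the
exhaustive threshold search a classic random forest would run on a
bootstrap sample.

```python
import random

def variance(ys):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def random_split(rows, labels, k, rng):
    """Pick a split in the extremely randomized trees style (hypothetical helper).

    For k randomly chosen features, draw ONE threshold uniformly between the
    feature's min and max on the full node sample (no bootstrap), then keep
    the candidate with the largest variance reduction.
    Returns (gain, feature_index, threshold) or None if no valid split exists.
    """
    best = None
    for f in rng.sample(range(len(rows[0])), k):
        values = [r[f] for r in rows]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature, nothing to split on
        t = rng.uniform(lo, hi)
        left = [y for r, y in zip(rows, labels) if r[f] < t]
        right = [y for r, y in zip(rows, labels) if r[f] >= t]
        if not left or not right:
            continue  # degenerate split, skip it
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(labels)
        gain = variance(labels) - weighted
        if best is None or gain > best[0]:
            best = (gain, f, t)
    return best

# Toy data (made up): feature 0 is informative, feature 1 is constant.
rows = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
labels = [0, 0, 1, 1]
best = random_split(rows, labels, k=2, rng=random.Random(0))
print(best)
```

A full tree would call such a function recursively on the left and right
subsets until a minimum node size is reached.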
>
>
>>
>> I was going through the xapian source and also the Letor tool which was
>> added to xapian. I read the documentation. What I could figure out is that
>> I can use the functions already implemented in xapian to construct my
>> feature vector from the data set. Since the present implementation uses
>> an SVM based ML method, I have to feed the feature vector to the new Random
>> Forest Algorithm instead and I will get a ranking of pages. One more thing:
>> the input vector format will be a pair of document and features (many). Am
>> I right?
>>
>
> Right, the current input format is listed at the LTR project page
>
> http://trac.xapian.org/wiki/GSoC2011/LTR
>
> and you can use the same features. You basically need to make the
> framework which can take the data in that format and return the ranked
> list, quite similar to the existing approach. If your approach is
> pairwise/listwise you also need to construct the pairs out of the training
> data.
>
> Parth.
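
A minimal sketch of the pair construction mentioned above, for a pairwise
approach. The line format used here ("label qid:<id> <index>:<value> ...
#docid=<id>") is an assumption modelled on the common SVM-light/LETOR
layout; the authoritative format is the one on the LTR project page.
parse_line and make_pairs are hypothetical helper names, and the sample
lines are made up.

```python
from collections import defaultdict
from itertools import combinations

def parse_line(line):
    """Parse one 'label qid:<id> <idx>:<val> ...' line (assumed format).

    Anything after '#' (for example a docid comment) is ignored.
    """
    tokens = line.split("#", 1)[0].split()
    label = int(tokens[0])
    qid = tokens[1].split(":", 1)[1]
    features = {}
    for tok in tokens[2:]:
        idx, val = tok.split(":", 1)
        features[int(idx)] = float(val)
    return qid, label, features

def make_pairs(lines):
    """Build (qid, preferred_features, other_features) tuples.

    Only documents of the same query are compared, and only when their
    relevance labels differ; the more relevant document comes first.
    """
    by_query = defaultdict(list)
    for line in lines:
        if line.strip():
            qid, label, features = parse_line(line)
            by_query[qid].append((label, features))
    pairs = []
    for qid, docs in by_query.items():
        for (la, fa), (lb, fb) in combinations(docs, 2):
            if la > lb:
                pairs.append((qid, fa, fb))
            elif lb > la:
                pairs.append((qid, fb, fa))
    return pairs

# Made-up sample data in the assumed format.
sample = [
    "1 qid:1 1:0.9 2:0.1 #docid=10",
    "0 qid:1 1:0.2 2:0.4 #docid=11",
    "0 qid:1 1:0.1 2:0.3 #docid=12",
    "1 qid:2 1:0.5 2:0.5 #docid=20",
]
pairs = make_pairs(sample)
print(len(pairs))  # 2: doc 10 is preferred over docs 11 and 12 for query 1
```

Documents with equal labels and queries with a single document contribute
no pairs, so such queries carry no signal for a pairwise learner.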
>
>>
>> Hope to hear from you soon.
>>
>> Best Regards,
>>
>> Mudit Raj Gupta
>>
>> On Fri, Mar 30, 2012 at 5:28 PM, Parth Gupta <parthg.88 at gmail.com> wrote:
>>
>>>
>>>
>>>> Thank you for your reply. I am more inclined towards random forest for
>>>> ranking. I was planning to complete my proposal soon. Should I include a
>>>> literature survey of various algorithms in my proposal or should I choose
>>>> one and concentrate on details?
>>>>
>>>
>>> Well, a concrete plan for the project will be necessary. So you should
>>> focus more on the algorithm which you plan to implement and how you are
>>> planning to go about it.
>>>
>>> Parth.
>>>
>>>>
>>>> Best,
>>>>
>>>> Mudit
>>>>
>>>>
>>>> On Fri, Mar 30, 2012 at 4:48 PM, Parth Gupta <parthg.88 at gmail.com> wrote:
>>>>
>>>>> Hi Mudit,
>>>>>
>>>>> Good to know about you.
>>>>>
>>>>>>
>>>>>> I successfully completed my *Google Summer of Code - 2011* for the *Center
>>>>>> for the study of Complex systems - University of Michigan*. I
>>>>>> implemented various *algorithms (ant colony, random walk etc.)* related
>>>>>> to computational intelligence in Repast S (*Coded in Groovy, Java*)
>>>>>> and wrote *extensive documentation and tutorials* for the related
>>>>>> models with *literature reviews* on the topics. My *contributions to
>>>>>> Repast S were part of the latest release of the software*. The
>>>>>> detailed documentation and code can be found here:
>>>>>> http://code.google.com/p/cscs-repast-demos/wiki/Mudit
>>>>>>
>>>>>> I have also worked on various projects related to the implementation of
>>>>>> Machine Learning and Bio-Inspired Evolutionary Algorithms. You can check
>>>>>> the code and some documentation on code.google.com; my user profile is:
>>>>>> http://code.google.com/u/110675325175605367090/
>>>>>>
>>>>>
>>>>> Seems your previous experience with machine learning will help you.
>>>>>
>>>>>
>>>>>>
>>>>>> I am interested in applying for the project *"Learning to Rank"*.
>>>>>> I have read the pointers on the ideas page and some literature about it.
>>>>>> Based on my literature review, I was thinking that something along the
>>>>>> lines of a Multi-layer Perceptron network with Ant Colony Optimization or
>>>>>> an Improved Random Forest could be a good option. I selected these because
>>>>>> of my experience with the topic. Any further details/pointers to the
>>>>>> project would be greatly appreciated. I would also like to request you to
>>>>>> please let me know about any specific detail related to the project that
>>>>>> is required in the proposal (apart from the ones mentioned on the page).
>>>>>>
>>>>>
>>>>> There have been plenty of algorithms proposed in the recent past.
>>>>> Based on your choice of ML technique, you can choose one. As you are
>>>>> interested in Neural Net based approaches, ListNet [1], RankNet [2],
>>>>> ListMLE [3] and LambdaRank [4] may be of interest to you, and if you want
>>>>> to explore random forest based approaches then [5] can be checked out.
>>>>>
>>>>> [1] Learning to rank: from pairwise approach to listwise approach
>>>>> [2] Learning to Rank using Gradient Descent
>>>>> [3] Listwise Approach to Learning to Rank - Theory and Algorithm
>>>>> [4] Learning to Rank with Nonsmooth Cost Functions
>>>>> [5] Learning to rank with extremely randomized trees
>>>>>
>>>>>
>>>>> Regards,
>>>>> Parth.
>>>>>
>>>>>
>>>>>
>>>>>> Thank you for your time. Hope to hear from you soon.
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Mudit Raj Gupta
>>>>>>
>>>>>> _______________________________________________
>>>>>> Xapian-devel mailing list
>>>>>> Xapian-devel at lists.xapian.org
>>>>>> http://lists.xapian.org/mailman/listinfo/xapian-devel
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>