[Xapian-discuss] scriptindex memory usage

Kevin Duraj kevin.softdev at gmail.com
Wed Nov 21 03:50:03 GMT 2007


Dear Jim,

My scriptindex uses 7.2 GB of memory when indexing 56 million
documents. Xapian's memory usage while indexing depends on the
XAPIAN_FLUSH_THRESHOLD environment variable. The default is 10K
documents; mine is 1 million. I have switched all memory slots to 2GB
memory modules and have been throwing 500MB memory modules in the
garbage. If you send me a self-addressed envelope with postage, I will
send you back a couple of 500MB memory modules. That will be more than
double what you need.
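
For illustration, here is a minimal Python sketch of setting the flush
threshold for a single scriptindex run. The database name, script name,
and input file are hypothetical placeholders, and the helper function is
made up for this example. Only the XAPIAN_FLUSH_THRESHOLD variable and
the "scriptindex DATABASE INDEXER_SCRIPT [INPUT_FILE]..." argument order
come from Xapian itself.

```python
import os


def scriptindex_invocation(database, index_script, input_files,
                           flush_threshold=1_000_000):
    """Build the command line and environment for one scriptindex run.

    Hypothetical helper for illustration only. A larger flush threshold
    means more documents are buffered in memory before each flush.
    """
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["XAPIAN_FLUSH_THRESHOLD"] = str(flush_threshold)
    command = ["scriptindex", database, index_script, *input_files]
    return command, env


if __name__ == "__main__":
    # Placeholder names; substitute your own database and dump file.
    command, env = scriptindex_invocation(
        "quiz.db", "indexer_script", ["quizzes.dump"]
    )
    print("XAPIAN_FLUSH_THRESHOLD=" + env["XAPIAN_FLUSH_THRESHOLD"],
          " ".join(command))
    # To actually run the indexer (requires scriptindex on PATH):
    # import subprocess
    # subprocess.run(command, env=env, check=True)
```

A smaller threshold should lower peak memory at the cost of more
frequent flushes, so the right value depends on how much RAM you have.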


Cheers
  Kevin Duraj
  http://UncensoredWebSearch.com


On Nov 19, 2007 2:39 PM, Jim Spath <jspath at pangeamedia.com> wrote:
> Jim Spath wrote:
> > quiz_id : field=quiz_id unique=Q boolean=Q
> > quiz_title : field=title weight=4 index index=XTITLE
> > quiz_path : field=path
> > tags : weight=3 index index=XTAGS
> > questions : weight=2 index index=XQUESTIONS
> > answers : weight=1 index index=XANSWERS
> > adult : field=adult index boolean=XADULT
> > type : field=type boolean=XTYPE
> > create_date : value=0
> > language_string : field=language_string boolean=L
>
> Looking over my indexer_script, I saw some optimizations I could make
> and have lowered the amount of memory scriptindex is using by over 100MB:
>
>             VIRT  RES  SHR
> previously: 236m 227m 1504
> currently:  138m 129m 1508
>
> My indexer_script now looks like:
>
> quiz_id : field=quiz_id unique=Q boolean=Q
> quiz_title : field=title weight=4 index=XTITLE
> quiz_path : field=path
> tags : weight=3 index=XTAGS
> questions : weight=2 index=XQUESTIONS
> answers : weight=1 index=XANSWERS
> adult : boolean=XADULT
> type : boolean=XTYPE
> create_date : value=0
> language_string : boolean=L
>
> The resulting database files are much smaller now too:
>
>  position: 59M  vs 148M
>  postlist: 51M  vs 89M
>  record:   4.4M vs 7.7M
>  termlist: 50M  vs 67M
>  value:    1.3M vs 1.5M
>
> I'm still worried about resource use as the amount of data grows, but I
> guess I'm somewhat better off now.
>
> Are there some generally accepted "best practices" for indexing large
> datasets?
>
>
> - Jim
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>
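
To put the quoted table sizes in perspective, here is a small Python
sketch that totals them and works out the overall reduction. The figures
are copied from the table above; the dictionary and function names are
made up for this example.

```python
# Table sizes in MB from the quoted message: (after, before).
SIZES_MB = {
    "position": (59, 148),
    "postlist": (51, 89),
    "record": (4.4, 7.7),
    "termlist": (50, 67),
    "value": (1.3, 1.5),
}


def total_savings(sizes):
    """Return (before_total, after_total, percent_reduction)."""
    after = sum(a for a, _ in sizes.values())
    before = sum(b for _, b in sizes.values())
    return before, after, 100.0 * (before - after) / before


if __name__ == "__main__":
    for name, (after, before) in SIZES_MB.items():
        pct = 100.0 * (before - after) / before
        print(f"{name:9s} {before:6.1f} -> {after:6.1f} MB ({pct:.1f}% smaller)")
    before, after, pct = total_savings(SIZES_MB)
    print(f"total     {before:6.1f} -> {after:6.1f} MB ({pct:.1f}% smaller)")
```

By this count the database went from about 313 MB to about 166 MB,
a reduction of roughly 47%, with the position table accounting for most
of the savings.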


