/home/bill/System_maintenance/recoll/recoll notes.txt - Desktop Search Engine, appears stable and reliable
www.BillHowell.ca

/usr/share/recoll/examples/recoll.conf - settings / configuration
/home/bill/.recoll/xapiandb/ - location of db

*************************
04May2015 Indexing WITHOUT numbers? :

file:///usr/share/recoll/doc/usermanual.html#RCL.INSTALL.CONFIG.RECOLLCONF
5.4.1.2. Parameters affecting how we generate terms:

nonumbers
	If this is set to true, no terms will be generated for numbers. For example "123", "1.5e6", 192.168.1.4, would not be indexed ("value123" would still be). Numbers are often quite interesting to search for, and this should probably not be set except for special situations, ie, scientific documents with huge amounts of numbers in them. This can only be set for a whole index, not for a subtree.

textfilemaxmbs
	Maximum size for text files. Very big text files are often uninteresting logs. Set to -1 to disable (default 20MB).

I'll leave this for now
Rebuild index!

*************************
03May2015 Location of xapian database for indexing :
	/media/bill/ATA_WDC_500G/xapiandb

Apparently, Thunderbird doesn't consistently respect its own mbox format, which causes a problem:
	https://bitbucket.org/medoc/recoll/issue/32/not-indexing-thunderbird-316-mbox
	https://bitbucket.org/medoc/recoll/issues/8?sort=-id
	http://www.perlwatch.com/man/?query=recoll.conf&sektion=5&manpath=FreeBSD+8.2-stable

SHIT!!! My skipped names included #*, which excludes Thunderbird!!
Maybe I did that when I had a problem, or when I switched Thunderbird from the Toshiba laptop to the Lenovo desktop :
	skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
So I removed that!!

Run Recoll : Menu -> File -> Update index
	... doing 21,897 additional files...
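
Quick check idea (my own sketch, not from the Recoll manual): before re-indexing, test which bare names a skippedNames list would exclude. Assumptions: Recoll compares each pattern against the simple file or directory name in shell-wildcard style, and Python's fnmatchcase only approximates that matching. The helper name skipped_by and the sample names are made up for illustration; the patterns are the first line of the skippedNames setting quoted above.

```python
from fnmatch import fnmatchcase

# Patterns copied from the first line of the skippedNames setting in these notes.
SKIPPED_NAMES = "#* bin CVS Cache cache* caughtspam tmp .thumbnails .svn".split()


def skipped_by(name, patterns=SKIPPED_NAMES):
    """Return the patterns that match a bare name (a simple name, no '/')."""
    # fnmatchcase is case-sensitive on every platform, so "Cache" != "cache*".
    return [p for p in patterns if fnmatchcase(name, p)]


if __name__ == "__main__":
    # Hypothetical names, only to show how the helper is used.
    for name in ["Cache", "cache_old", "#draft#", "Climate", "tmp"]:
        hits = skipped_by(name)
        if hits:
            print(name, "-> skipped by", " ".join(hits))
        else:
            print(name, "-> would be indexed")
```

This only approximates what the indexer does; the real test is still to update the index and search for a file that should (or should not) be there.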

03May2015 Second Shit - I forgot to exclude the numerical output tables (Jacques Laskar)
04May2015 - Here's an initial "data-exclusion" list :
	/media/bill/ATA_WDC_500G/Climate/Datasets/
	/media/bill/ATA_WDC_500G/Climate/delta 18O graph 6 My/
	/media/bill/ATA_WDC_500G/Climate/Howell - computational models/
	/media/bill/ATA_WDC_500G/Climate/Paillard/
	/media/bill/ATA_WDC_500G/Dads paintings/
	/media/bill/ATA_WDC_500G/DVD_*
	/media/bill/ATA_WDC_500G/Stalin supported Hitler/

04May2015 06:58 Update index (hopefully will be fast?) :
	Result : search -> Prokhorov 2015

*************************
27Apr2014 Recoll isn't indexing Thunderbird!!

+-----+
https://github.com/MassimoLauria/global-configuration/blob/master/desktop/recoll.conf
MassimoLauria on Nov 4, 2013 Enforce Dropbox library indexing even if it is a symlink

# The system-wide configuration files for recoll are located in:
# /usr/share/recoll/examples
# The default configuration files are commented, you should take a look
# at them for an explanation of what can be set (you could also take a look
# at the manual instead).
# Values set in this file will override the system-wide values for the file
# with the same name in the central directory. The syntax for setting
# values is identical.

topdirs = ~ ~/.local/share/doc/ /usr/share/doc/ ~/Dropbox/Library/

# Wildcard expressions for names of files and directories that we should
# ignore. If you need index mozilla/thunderbird mail folders, don't put
# ".*" in there (as was the case with an older sample config)
# These are simple names, not paths (must contain no / )
skippedNames = bin CVS Cache* cache* caughtspam tmp .thumbnails .svn .git .hg *~ recollrc .wine dot* VirtualBox* *.vmdk *.vdi

skippedPaths = ~/.OldFiles ~/.cache ~/.gconf ~/.backup ~/.config ~/.local ~/.local/share/Trash

# Where to store the database (directory). This may be an absolute path,
# else it is taken as relative to the configuration directory (-c argument
# or $RECOLL_CONFDIR).
# If nothing is specified, the default is then ~/.recoll/xapiandb/
dbdir = /NOBACKUP/lauria/recolldb/
+-----+

Howell - Can't seem to find the problem?!? Just use Thunderbird indexing for now

****************
08Dec2014 Move xapiandb for recoll desktop search engine to 2nd internal HD drive

I need to copy /usr/share/recoll/examples/recoll.conf to my ~/.recoll directory so that it is saved!!!
>> Done

/usr/share/recoll/examples/recoll.conf ~Line 90 :
# Where to store the database (directory). This may be an absolute path,
# else it is taken as relative to the configuration directory (-c argument
# or $RECOLL_CONFDIR).
# If nothing is specified, the default is then ~/.recoll/xapiandb/
# dbdir = xapiandb
# 08Dec2014 moved directory to ATA_WDC_500G secondary drive to avoid weekly backup problems.
# So for recoll to work, the 2nd drive must be mounted!
dbdir = /media/bill/ATA_WDC_500G/xapiandb

Test - launch from LinuxMint Menu
>> Failed, but topdirs was still set to :
	topdirs = "/media/bill/ATA_WDC_500G/Dads paintings"
	-> couldn't test
I set :
	topdirs = ~ "/media/bill/ATA_WDC_500G/Climate/"
>> OK! now it works

***********
06Dec2014 skippedNames - to avoid indexing huge numerical files in :
	- /media/bill/ATA_WDC_500G/Climate/Datasets/
	- video files

/usr/share/recoll/examples/recoll.conf ~ Line 23 :
# Wildcard expressions for names of files and directories that we should
# ignore.
# If you need index mozilla/thunderbird mail folders, don't put
# ".*" in there (as was the case with an older sample config)
# These are simple names, not paths (must contain no / )
# skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
#	*~ .beagle .git .hg .bzr loop.ps .xsession-errors \
#	.recoll* xapiandb recollrc recoll.conf
# ?06Dec2014 - tried (unsuccessfully for now) to index with more skippedNames
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
	*~ .beagle .git .hg .bzr loop.ps .xsession-errors \
	.recoll* xapiandb recollrc recoll.conf \
	.Trash-1000 DVD_BigBattles_WWII DVD_GreatBattles_WWII DVD_HitlerLost DVD_SovietStory DVD_SuvorovTalks \
	NN_conf_DVDs "lost+found" "Stalin supported Hitler" "Dads paintings" Datasets

***********
02Dec2014 recoll installation & basic command-line usage

I simply started the indexing via the GUI after downloading.
But for regular use, I want the updates done manually via a bash script during weekly maintenance,
i.e. I won't use "cron", but will re-index at each weekly maintenance (hopefully with a single bash script command?)

recoll -t -q
	-t	command-line use. The Graphical User Interface will not be started, and results will be printed to standard output.
recollindex -x
	-x	using the daemon without X11 context
recollindex -S
	-S	will rebuild the phonetic/orthographic index. This feature uses the aspell package, which must be installed.

enddoc