
Full Index Searching Help

NPb 3 years ago updated by P.J. d 9 months ago 11

Hello,


I have a working FileRun installation and have followed the instructions for setting up full-text search indexing.

I have an Elasticsearch server installed with Bitnami on a Windows server, and the Docker version of Apache Tika running on the same Ubuntu Server VM as FileRun.

FileRun's configuration page shows green responses when I click the test buttons:

Cluster name: elasticsearch
Nodes: mH9kZfj (6.5.4)
Index count: 178

and

Apache Tika 1.20

I have also excluded the suggested file extensions.

Everything seems to be running fine, but the queue has only ever shown 0 queued operations, and when I run the process_search_index_queue.php script it reports that there is nothing to do.

When I log in as a standard user, the content search option is available, but it doesn't find any files.


Please tell me where I am going wrong and how I can get the full file searching working.

Thanks, K.


Files are being queued for indexing when they are uploaded, created, edited, copied, etc.

To index files that existed before you enabled full-text search, use the "reindex_files.php" command-line script (http://docs.filerun.com/command_line_tools).

Great, thanks for the really quick response.   

I've kicked off that script now and I'll see if that resolves the issue.

Sorry, I have a few follow-up questions:


I am using FileRun as a front end to various shares on a back-end server. The files on these shares are uploaded via other applications, not through the FileRun interface.

With this in mind:

  1. Will I have to schedule the reindex_files.php script to run regularly so that newly added files become searchable?
  2. Will this script reindex every file in my entire directory structure (several TBs) each time it runs, or will it skip the ones it has already indexed?

Thanks again for your help.


K.

  1. FileRun won't know about the new files unless something is done to them through FileRun. There is also a command-line script for indexing individual files or subfolders, though whether it can serve as a solution depends on your particular workflow.
  2. It will not skip previously processed files.

I will look into the possibility of adding an option to the command for skipping previously indexed files.
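To illustrate what "skipping previously indexed files" involves, here is a minimal Python sketch. It is not FileRun's implementation: it assumes a hypothetical JSON manifest, stored outside the indexed folder, that records each file's modification time and size from the previous run, and it returns only the files that are new or have changed since then. The function name and manifest format are illustrative.

```python
import json
import os
from pathlib import Path


def find_new_or_changed(root, manifest_path):
    """Return files under `root` that are new or changed since the last call.

    `manifest_path` is a hypothetical JSON file (not a FileRun feature) that maps
    each relative path to [mtime_ns, size]. Keep it outside `root`, otherwise
    the manifest itself would show up as a changed file.
    """
    root = Path(root)
    manifest_path = Path(manifest_path)
    try:
        previous = json.loads(manifest_path.read_text())
    except FileNotFoundError:
        previous = {}  # first run: every file counts as new

    current = {}
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.stat()
            key = path.relative_to(root).as_posix()
            current[key] = [st.st_mtime_ns, st.st_size]
            # A file is reported if it was unknown or its mtime/size differ.
            if previous.get(key) != current[key]:
                changed.append(key)

    # Record this run's state so the next call only reports later changes.
    manifest_path.write_text(json.dumps(current))
    return sorted(changed)
```

Comparing modification time and size is cheap enough for several TBs, since no file contents are read, but it will miss an edit that preserves both, and deleted files simply drop out of the manifest without being reported.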


Hi, thanks again.


I will look at indexing targeted folders frequently, with a less frequent reindex of the entire structure to catch any other changes.
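One way to choose the targeted folders for the frequent runs is to look for folders whose files changed recently. The Python sketch below assumes that modification times on the shares are reliable; the function name and cutoff are illustrative, and the folders it returns would still need to be passed to FileRun's per-folder indexing script, whose exact name and arguments are not given in this thread.

```python
import os
from pathlib import Path


def folders_changed_since(root, since_epoch):
    """Return folders under `root` (relative, "." for `root` itself) that
    directly contain a file modified at or after `since_epoch` (Unix seconds)."""
    root = Path(root)
    changed = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.getmtime(os.path.join(dirpath, name)) >= since_epoch:
                changed.add(Path(dirpath).relative_to(root).as_posix())
                break  # one recent file is enough to flag this folder
    return sorted(changed)
```

For example, `folders_changed_since("/srv/shares", time.time() - 3600)` would list folders touched in the last hour (the path is hypothetical). Files copied in with their original timestamps, as well as deletions, are not detected this way, which is one reason to keep the less frequent full reindex.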


It would be awesome if you could amend the script so that it only indexes new files. I'm sure other people use your software for web-based access to files that are updated by other applications, so this would be extremely useful and would make the process significantly faster.

Cheers, K.

Adding my vote for being able to index only new files. This would be extremely helpful, as I add or modify files outside of the FileRun interface fairly often.

I concur: although I know there is an option in the FileRun web UI to reindex individual folders, it would be nice to be able to scan for new files first and then reindex only those.

Another vote for skipping files already indexed.

Hi Vlad (me again), weighing in on this subject.

This would be an absolutely amazing enhancement to have. My use case (which I think matches the OP's):

I run OpenMediaVault as my NAS at home.

FileRun is a very robust way for me to make my files available (and editable via ONLYOFFICE) when I'm away from home.

Having said that, the documents I access most are the ones I scan at home and drop onto my NAS.

These PDFs generally contain scanned images that need OCR before they can be indexed.

Without Elasticsearch and Tika/Tesseract indexing incrementally, this will become difficult.

Thank you so much :)


An option to skip already indexed files when indexing will be added in the next FileRun update.

Nice, thanks for the great level of follow-up!