I'm not the dev, but wondering how much effort he should really be putting into overcoming 32-bit limitations instead of other things.
Are you going through a reverse proxy? If so, which one?
Is your OnlyOffice available externally via the reverse proxy? Did you set up a subdomain for it, for example onlyoffice.yourdomain.com?
Okay I just played around with these PHP files and here's what I found.
process_search_index_queue.php is working fine. I have it on a schedule.
add_folder_to_search_index.php works. If I point it at a folder, it takes all the plain-text data and injects it into the Elasticsearch database, where it becomes searchable.
reindex_files.php - Either this isn't working or it's failing. My understanding was that it does the same thing as add_folder_to_search_index.php, but across the entire home directory and its subdirectories. That does not happen. I'm not sure whether it's meant to index file contents or file names across my entire user folder, but it's doing neither. It runs for a long time and seems to be doing something, but in the end it either fails before it finishes(?) or it isn't doing what it's supposed to do. Can you clarify whether I'm using it right, and whether I should enable any debugging or pull logs to see what it accomplished?
Also, while working with this, it became clearer how Tika and Elasticsearch do their jobs. I understood both had something to do with indexing and searching, but it wasn't clear which app did what. Perhaps a workflow description or clarification update would be good? For others reading this (and assuming I'm correct):
Tika extracts the plain text from various file types like docx, xls, etc., and keeps that data somewhere(?).
Elasticsearch is the actual indexer for that data, and it is what FileRun queries.
But for files that FileRun didn't create or modify, a script or process has to be run on them to get that plain-text data INTO the Elasticsearch database and make it searchable.
So add_folder_to_search_index.php and add_file_to_search_index.php make file contents searchable.
index_filenames.php makes the filenames themselves searchable.
reindex_files.php does something, but it's not clear what, since I think it's broken for me.
So *for me*, it looks like I need to run add_folder_to_search_index.php as well as index_filenames.php on the root user folder to get everything.
Before figuring all this out, I thought all I had to do was set up the cron job to run process_search_index_queue.php regularly and run reindex_files.php once, which isn't correct, for me at least.
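For anyone who wants to script the two-step workaround above, here is a minimal sketch that only builds the two commands and does not run them. The install path, the "cron" folder location, and the idea that each script takes the target folder as its first argument are my assumptions, not something confirmed by the FileRun docs, so check them against your own install first.

```python
import shlex
from pathlib import PurePosixPath

# Assumption: hypothetical location of the FileRun command-line scripts.
FILERUN_CRON_DIR = "/var/www/html/cron"


def build_index_commands(folder, cron_dir=FILERUN_CRON_DIR, php="php"):
    """Return the two commands (as argv lists), contents first, then filenames.

    Assumption: each script accepts the folder to process as its first argument.
    """
    scripts = ["add_folder_to_search_index.php", "index_filenames.php"]
    return [[php, str(PurePosixPath(cron_dir) / name), str(folder)] for name in scripts]


def as_shell(commands):
    """Render argv lists as copy-pasteable shell lines."""
    return [shlex.join(argv) for argv in commands]


if __name__ == "__main__":
    # "/user-files" is a placeholder for the root user folder.
    for line in as_shell(build_index_commands("/user-files")):
        print(line)
```

Printing the commands instead of executing them makes it easy to review what would run before putting anything on a schedule.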
It is run on a schedule. It runs every minute via cron. I just ran it manually and the queue is empty. The files I create via FileRun aren't the issue anyway. Those work great.
I have also run 'reindex_files.php' already. Took hours, but it completed. Doesn't this accomplish the same thing as the add_file and add_folder above but more universal across the whole file system?
Just to test, I ran add_folder_to_search_index.php on the folder in question and it said it processed those two text files. However, they are still not content-searchable. I've also done this from the GUI, in the control panel for that folder.
None of these things make the contents searchable. Only creating the file fresh (or duplicating an existing one) within FileRun ends up making that file's contents searchable.
"you'll see the plain text version of the file" means Tika did its job. It's there.
However, this isn't limited to Tika. It won't even search contents of plain text files. Please see this video for exactly what I'm talking about.
So the files themselves seem to be getting indexed correctly when you look at the control panel for them. Tika does its job on the files it handles, and the plain text files seem to get done via FileRun natively. But when it comes time to search for them using 'contents', then FileRun can't find them. But if I create them fresh in FileRun or make a duplicate of them, then they are now searchable.
Hi. I was not referring to the Metadata extraction section. There is a 'Content Indexing' section. If you click that, you'll see the plain text version of the file you clicked on, and at the top it says:
Server URL: http://192.168.1.200:9998/tika
Response code: 200
This leads me to believe that Tika is actually indexing the full text of the file, right?
So if Tika is doing the full-text indexing and Elasticsearch doesn't do filename searching, then what is Elasticsearch doing? Are Tika and Elasticsearch doing the same thing?
Regardless, even though Tika seems to have indexed the plain text of a file, it's not searchable from the FileRun search UI unless I've modified or created/copied the file within FileRun.
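One way to tell the two apart is to ask each service directly. The sketch below builds a request that asks the Tika server for the plain text of a file, and a second one that searches Elasticsearch for a phrase. The Tika URL is the one shown in the Content Indexing panel. The Elasticsearch address, index name, and field name are guesses on my part (I don't know what FileRun actually uses), so treat them as placeholders.

```python
import json
import urllib.request

TIKA_URL = "http://192.168.1.200:9998/tika"  # from the Content Indexing panel
ES_URL = "http://192.168.1.200:9200"         # assumption: same host, default ES port
ES_INDEX = "filerun"                         # hypothetical index name
ES_FIELD = "content"                         # hypothetical field name


def tika_extract_request(data, url=TIKA_URL):
    """Build a PUT request asking the Tika server for a file's plain text."""
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )


def es_search_request(phrase, base=ES_URL, index=ES_INDEX, field=ES_FIELD):
    """Build a _search request looking for an exact phrase in one field."""
    body = json.dumps({"query": {"match_phrase": {field: phrase}}}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{index}/_search",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request, timeout=10):
    """Send a prepared request and return (status code, response text)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

If Tika returns the text but the Elasticsearch search comes back with zero hits for a phrase that is definitely in the file, then extraction works and the data never made it into the index. Listing the indices with a GET to /_cat/indices should help find the real index name.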
Please let me know if you need more details.
Actually, even filename indexing with Elasticsearch isn't working. Or at least FileRun doesn't know how to interface with it. I ran the reindex_files.sh, which took hours, and FileRun still doesn't know a file exists on my system unless I have browsed that directory at least once, as far as I can tell. So it's not just full-text search; basic indexing isn't working either, for some reason.
OnlyOffice has a variable called "JWT_SECRET" that can be set and another variable called "JWT_ENABLED" which should be 'true'. How you set them depends on how you install OnlyOffice. I personally install it as a Docker container, so I set a container variable called "JWT_SECRET" with the secret itself as the value, and set "JWT_ENABLED" to 'true'. FileRun has a field for this same value in the plugin config for OnlyOffice. That's about it.
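As a rough illustration of the Docker case, this sketch generates a random secret and prints a docker run command with the two variables set. The image name onlyoffice/documentserver is the one I use. Port mappings, volumes, and anything else your setup needs are deliberately left out, so this is a starting point and not a complete command.

```python
import secrets
import shlex


def onlyoffice_docker_args(jwt_secret, image="onlyoffice/documentserver"):
    """Build a minimal docker run argv with JWT enabled.

    Ports and volumes are omitted on purpose; add whatever your setup needs.
    """
    return [
        "docker", "run", "-d",
        "-e", "JWT_ENABLED=true",
        "-e", f"JWT_SECRET={jwt_secret}",
        image,
    ]


if __name__ == "__main__":
    secret = secrets.token_hex(32)  # random 64-character hex string
    print(shlex.join(onlyoffice_docker_args(secret)))
    print("Value for the FileRun OnlyOffice plugin config:", secret)
```

The same secret string has to go in both places, which is why the script prints it a second time on its own line.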
Adding my vote for being able to index only new files. This would be extremely helpful, as I add or modify files outside of the FileRun interface fairly often.