
Full-text search fails on larger PDF files. How to make full-text search persistent with Docker?

bgribble 4 months ago, updated by ovidiu 1 month ago

Version 20220519, installed as a Docker container with full-text search using the docker-compose example from the website. Hardware: 12-core Xeon, 128 GB RAM, Debian 11.

Two game-breaking issues, unfortunately. First, any PDF of 5 MB or larger fails full-text search indexing:

root@cb1fc1cda259:/var/www/html# /var/www/html/cron/process_search_index_queue.sh

-------------------- START ----------------------
Processing 1 queued actions:

[1/1] INDEX - /user-files/2020 1290 SuperDuke GT Owners Manual.pdf -> Extracting......FAILED
Queue processed.

------------------- END --------------------------

The second issue is having to redo the full-text search setup every time the Docker container is restarted, for any reason. Is there documentation available for fixing these two major issues, or am I doomed to Google Drive forever? Thanks for any help, I would love to be able to move to FileRun!
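
On the restart problem: if the docker-compose setup does not put the Elasticsearch data directory on a named volume, the whole search index lives inside the container and is wiped whenever the container is recreated. A minimal sketch of the fix, assuming the layout of the stock FileRun docker-compose example (the service name "elasticsearch" is an assumption; the data path is the default of the official Elasticsearch image):

services:
  elasticsearch:
    # ...keep the image/environment settings from the FileRun example...
    volumes:
      - esearch-data:/usr/share/elasticsearch/data   # persist the index across restarts

volumes:
  esearch-data:

The same goes for the database container: if its data directory is not on a volume, the search settings FileRun stores there are lost on every restart as well.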

I had a similar issue and searching this forum led me to the following solution.

Create the file "customizables/config.php" and add the following inside:

<?php
# umask makes uploaded files group-writeable
umask(012);
# raise the indexable file size from the original 10 MB limit to 200 MB (200 * 1024 * 1024 bytes)
$config['search']['limit_file_size'] = 209715200;
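
To check that the new limit is picked up, you can re-run the queued extraction from inside the FileRun container; the container name "filerun" here is an assumption, use whatever "docker ps" shows for your setup:

# re-run the indexing queue inside the web container (container name is an assumption)
docker exec -it filerun /var/www/html/cron/process_search_index_queue.sh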

Thanks, I am going to give your config sample a try, although it is odd that this variable is not even mentioned on the advanced configuration page here: https://docs.filerun.com/advanced_configuration

Content extraction is fine now, but content indexing gives:

 -> Extracting...Indexing...Exception: Elasticsearch\Common\Exceptions\Forbidden403Exception
Message: {"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"}],"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"},"status":403}