Hi Vlad and Team,
When I look at the Content Extraction results for a PDF in the Control Panel, I see the following output:
Using Tika in server mode. Server URL: http://tika:9998/tika cURL error: Operation timed out after 5001 milliseconds with 0 bytes received
This results in no indexable content:
-> Extracting...No text contents
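If the extraction is simply taking longer than the 5-second limit shown in the error, sending the same PDF straight to the Tika server with a longer timeout should show how long it really needs. A minimal Python sketch, assuming it runs somewhere that can resolve the `tika` hostname (for example, a container on the same Compose network); the function names are hypothetical helpers, not part of FileRun or Tika:

```python
import time
import urllib.request

TIKA_URL = "http://tika:9998/tika"  # URL taken from the error message above


def build_tika_request(url: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a PUT request asking the Tika server for plain-text output."""
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="PUT",
        headers={"Accept": "text/plain", "Content-Type": "application/pdf"},
    )


def time_extraction(url: str, path: str, timeout: float = 120.0) -> tuple:
    """Send the file to Tika and return (elapsed seconds, characters extracted)."""
    with open(path, "rb") as f:
        request = build_tika_request(url, f.read())
    start = time.monotonic()
    # Much longer timeout than the 5 s seen in the error, to measure the real duration.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        text = response.read().decode("utf-8", errors="replace")
    return time.monotonic() - start, len(text)
```

For example, `time_extraction(TIKA_URL, "sample.pdf")`, where `sample.pdf` is a hypothetical local copy of an affected file, returns the elapsed seconds and the number of characters extracted. A result well above 5 seconds with a non-zero character count would suggest that Tika works but is slower than the caller's timeout.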
When I examine the process list in the Tika container, the running command looks like this:
tesseract /tmp/apache-tika-7883861614107540574.tmp /tmp/apache-tika-4559584953829807082.tmp -l eng --psm 1 -c page_separator= -c preserve_interword_spaces=0 txt
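The `tesseract` process suggests Tika is running OCR on the PDF, which is typically much slower than reading a PDF's text layer. One way to check is to time the same file with OCR disabled. This sketch rests on an assumption: that the Tika version in the `logicalspark/docker-tikaserver` image honours the per-request `X-Tika-PDFOcrStrategy` header (with values such as `no_ocr`). The helper names and file path are hypothetical:

```python
import time
import urllib.request
from typing import Dict, Iterable, Optional, Tuple

TIKA_URL = "http://tika:9998/tika"  # URL taken from the error message


def build_request(url: str, pdf_bytes: bytes,
                  ocr_strategy: Optional[str] = None) -> urllib.request.Request:
    """Build a plain-text extraction request, optionally overriding the PDF OCR strategy."""
    headers = {"Accept": "text/plain"}
    if ocr_strategy is not None:
        # Assumption: the server supports this per-request PDF parser header.
        headers["X-Tika-PDFOcrStrategy"] = ocr_strategy
    return urllib.request.Request(url, data=pdf_bytes, method="PUT", headers=headers)


def compare_ocr_strategies(url: str, path: str,
                           strategies: Iterable[Optional[str]] = ("no_ocr", None),
                           timeout: float = 300.0) -> Dict[str, Tuple[float, int]]:
    """Time one extraction per strategy; None means 'use the server default'."""
    with open(path, "rb") as f:
        pdf_bytes = f.read()
    results: Dict[str, Tuple[float, int]] = {}
    for strategy in strategies:
        request = build_request(url, pdf_bytes, strategy)
        start = time.monotonic()
        with urllib.request.urlopen(request, timeout=timeout) as response:
            text = response.read().decode("utf-8", errors="replace")
        elapsed = round(time.monotonic() - start, 1)
        # Store elapsed seconds and the character count after trimming surrounding whitespace.
        results[strategy or "server default"] = (elapsed, len(text.strip()))
    return results
```

If `no_ocr` returns the text quickly, OCR would be the likely cause of the delay. If it finishes quickly but returns almost no text, the PDF is probably a scanned image without a text layer, so OCR is genuinely needed and the request simply needs more time or resources.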
When I run the indexing command
php /var/www/html/cron/add_folder_to_search_index.php /user-files/DOCUMENTS
the log output looks like this (along with many similar lines, of course):
INDEX /user-files/DOCUMENTS/Some Directory/Some File.pdf -> Extracting......OK
My Docker Compose definition for the Tika container looks like this:
tika:
  image: logicalspark/docker-tikaserver
  container_name: filerun-tika
  mem_limit: 1g
  restart: unless-stopped
Any idea how I can get this working?
Thanks in advance, Team FileRun!