Posted on www.peopleperhour.com, 19 Mar 2025
Description: I have some text, a single word per TIFF file, intended for training eng_custom.traineddata. I currently use the commands below, which seem sane and produce no errors until the last step.
Important: I don't want to change the current approach, because my goal is to train on each of 1000 TIFF files with the same parameters, and I have already prepared the corresponding tessRead and box files for each TIFF.
#Make lstmf file
tesseract test_sample.tiff test_sample \
--tessdata-dir /home/j/img2/tess_files \
--psm 7 --oem 1 -l eng_custom \
/home/j/tesseract/tessdata/configs/lstm.train
echo "test_sample.lstmf" > single_lstmf_file.txt
#Train LSTM model
lstmtraining \
--model_output tess_training.lstm \
--continue_from /home/j/img2/tess_files/eng.lstm \
--traineddata /home/j/img2/tess_files/eng_custom.traineddata \
--train_listfile single_lstmf_file.txt \
--max_iterations 1
#Stop training and finalize model
lstmtraining --stop_training \
--continue_from tess_training.lstm_checkpoint \
--traineddata /home/j/img2/tess_files/eng_custom.traineddata \
--model_output /home/j/img2/tess_files/eng_final.lstm
#Update traineddata with new LSTM model
mkdir -p /home/j/img2/base_model
combine_tessdata -u /home/j/img2/tess_files/eng_custom.traineddata /home/j/img2/base_model/eng_custom
cp /home/j/img2/tess_files/eng_final.lstm /home/j/img2/base_model/eng.lstm
combine_tessdata /home/j/img2/base_model/eng_custom
cp /home/j/img2/base_model/eng_custom.traineddata /home/j/img2/tess_files/eng_custom.traineddata
But I run into a problem at the final step:
j@j:~/t$ tesseract test_sample.tiff stdout -l eng_custom --tessdata-dir /home/j/img2/tess_files/
index = 0:Error:Assert failed:in file /home/j/tesseract4/src/ccutil/strngs.cpp, line 266
Aborted (core dumped)
Question: How can I amend the above commands so that I can combine eng_final.lstm with eng_custom.traineddata?
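One possible direction, offered as a sketch rather than a confirmed fix: in Tesseract 4.x, `lstmtraining --stop_training` run with `--traineddata` is documented to write a complete traineddata file to `--model_output`. If that holds here, eng_final.lstm may already be a full traineddata despite its extension, and the unpack/copy/recombine steps could be replaced by a plain copy. The names `run`, `finalize_model` and `DRY_RUN` are hypothetical helpers introduced for illustration; only `combine_tessdata -d` and `cp` are real commands.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: the file written by `lstmtraining --stop_training`
# (with --traineddata) is already a complete traineddata file.
# run, finalize_model and DRY_RUN are hypothetical names, not Tesseract tools.

run() {
    # With DRY_RUN=1 only print the command; otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

finalize_model() {
    final_model=$1   # output of lstmtraining --stop_training
    target=$2        # traineddata file that tesseract should load with -l
    # List the components inside the file to check that it is a traineddata.
    run combine_tessdata -d "$final_model" || return 1
    # Install it under the language name used on the tesseract command line.
    run cp "$final_model" "$target"
}
```

With the paths from this post, a dry run would be `DRY_RUN=1 finalize_model /home/j/img2/tess_files/eng_final.lstm /home/j/img2/tess_files/eng_custom.traineddata`. Dropping `DRY_RUN=1` executes the two commands; if the listing step fails, the assumption above does not apply and nothing is copied.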
Environment:
/home/j/img2/tess_files/
eng.traineddata eng_custom.traineddata eng.lstm eng_final.lstm
/home/j/img2/base_model/
eng.lstm eng.lstm-number-dawg eng.lstm-punc-dawg eng.lstm-recoder
eng.lstm-unicharset eng.lstm-word-dawg eng.version
eng_custom.bigram-dawg eng_custom.freq-dawg eng_custom.inttemp
eng_custom.lstm eng_custom.lstm-number-dawg eng_custom.lstm-punc-dawg
eng_custom.lstm-recoder eng_custom.lstm-unicharset eng_custom.lstm-word-dawg
eng_custom.normproto eng_custom.number-dawg eng_custom.pffmtable
eng_custom.punc-dawg eng_custom.shapetable eng_custom.traineddata
eng_custom.unicharambigs eng_custom.unicharset eng_custom.version
eng_custom.word-dawg
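The base_model listing mixes two prefixes, `eng.` and `eng_custom.`. When `combine_tessdata` is given a path prefix, it packs only the component files whose names start with that prefix, so with the prefix eng_custom the copied eng.lstm is presumably left out and the older eng_custom.lstm is packed instead. A minimal sketch for spotting such files follows; `list_stray_components` is a hypothetical helper, not part of Tesseract.

```shell
#!/bin/sh
# list_stray_components DIR PREFIX
# Prints the names of regular files in DIR that do not start with "PREFIX."
# and would therefore not be picked up by `combine_tessdata DIR/PREFIX`.
# Hypothetical helper for illustration only.
list_stray_components() {
    dir=$1
    prefix=$2
    for path in "$dir"/*; do
        [ -f "$path" ] || continue      # skip directories and unmatched globs
        name=${path##*/}                # strip the directory part
        case $name in
            "$prefix".*) ;;             # shares the prefix: nothing to report
            *) echo "$name" ;;          # different prefix: report it
        esac
    done
}
```

Running `list_stray_components /home/j/img2/base_model eng_custom` on the directory above should print the seven `eng.*` files, including eng.lstm.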
Any guidance would be greatly appreciated.
Thanks!
Jacob