Tesseract Constructor (String, String, OcrEngineMode, String, Boolean)
http://www.emgu.com
Create a Tesseract OCR engine.
Namespace:
Emgu.CV.OCR
Assembly:
Emgu.CV.World (in Emgu.CV.World.dll) Version: 4.0.1.3373 (4.0.1.3373)
Syntax
public Tesseract(
string dataPath,
string language,
OcrEngineMode mode,
string whiteList = null,
bool enforceLocale = true
)
Public Sub New (
dataPath As String,
language As String,
mode As OcrEngineMode,
Optional whiteList As String = Nothing,
Optional enforceLocale As Boolean = true
)
public:
Tesseract(
String^ dataPath,
String^ language,
OcrEngineMode mode,
String^ whiteList = nullptr,
bool enforceLocale = true
)
new :
dataPath : string *
language : string *
mode : OcrEngineMode *
?whiteList : string *
?enforceLocale : bool
(* Defaults:
let _whiteList = defaultArg whiteList null
let _enforceLocale = defaultArg enforceLocale true
*)
-> Tesseract
Parameters
- dataPath
- Type: System.String
The datapath must be the name of the directory of tessdata and
must end in / . Any name after the last / will be stripped.
- language
- Type: System.String
The language is (usually) an ISO 639-3 string; NULL defaults to eng.
It is entirely safe (and eventually will be efficient too) to call
Init multiple times on the same instance to change language, or just
to reset the classifier.
The language may be a string of the form [~]<lang>[+[~]<lang>]* indicating
that multiple languages are to be loaded. E.g. hin+eng will load Hindi and
English. Languages may specify internally that they want to be loaded
with one or more other languages, so the ~ sign is available to override
that. E.g. if hin were set to load eng by default, then hin+~eng would force
loading only hin. The number of loaded languages is limited only by
memory, with the caveat that loading additional languages will impact
both speed and accuracy, as there is more work to do to decide on the
applicable language, and there is more chance of hallucinating incorrect
words.
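The language string format described above can be assembled programmatically. The following is a minimal sketch; TessLanguage and Combine are hypothetical names introduced for illustration and are not part of Emgu CV or Tesseract. It only joins and sanity-checks the codes and does not verify that the matching traineddata files exist.

```csharp
using System;

// Hypothetical helper, not part of Emgu CV or Tesseract.
public static class TessLanguage
{
    // Joins language codes into the [~]<lang>[+[~]<lang>]* form.
    // A leading '~' on a code is passed through unchanged.
    public static string Combine(params string[] languages)
    {
        if (languages == null || languages.Length == 0)
            throw new ArgumentException("At least one language code is required.", nameof(languages));

        foreach (string lang in languages)
        {
            if (lang == null)
                throw new ArgumentException("Language codes must not be null.", nameof(languages));

            // Strip one optional leading '~' before checking the code itself.
            string code = lang.StartsWith("~", StringComparison.Ordinal) ? lang.Substring(1) : lang;
            if (code.Length == 0 || code.IndexOf('+') >= 0 || code.IndexOf('~') >= 0)
                throw new ArgumentException("Invalid language code: " + lang, nameof(languages));
        }

        return string.Join("+", languages);
    }
}
```

For example, TessLanguage.Combine("hin", "~eng") yields the string hin+~eng, which could then be passed as the language argument.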
- mode
- Type: Emgu.CV.OCR.OcrEngineMode
OCR engine mode
- whiteList (Optional)
- Type: System.String
This can be used to specify a white list for OCR, e.g. specify "1234567890" to recognize digits only. Note that the white list currently seems to work only with OcrEngineMode.OEM_TESSERACT_ONLY.
- enforceLocale (Optional)
- Type: System.Boolean
If true, the locale is changed to "C" before the Tesseract engine is initialized and reverted once initialization is complete. If false, it is the user's responsibility to set the locale to "C"; otherwise an exception will be thrown. See https://github.com/tesseract-ocr/tesseract/issues/1670
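Examples
A minimal sketch of calling this constructor to recognize digits only. Several details are assumptions rather than facts taken from this page: the ./tessdata/ directory and its eng.traineddata file, the image path passed on the command line, the enum member name OcrEngineMode.TesseractOnly (assumed to be this version's name for the mode called OEM_TESSERACT_ONLY above), and the SetImage, Recognize and GetUTF8Text members, which should be checked against the installed Emgu CV version.

```csharp
using System;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.OCR;

class DigitOcrExample
{
    static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.Error.WriteLine("Usage: DigitOcrExample <image-path>");
            return;
        }

        // Assumed location of the tessdata directory; note the trailing '/'.
        string dataPath = "./tessdata/";

        // White list restricted to digits; enforceLocale is left at its default (true).
        using (Tesseract ocr = new Tesseract(dataPath, "eng", OcrEngineMode.TesseractOnly, "1234567890"))
        using (Mat image = CvInvoke.Imread(args[0], ImreadModes.Grayscale))
        {
            ocr.SetImage(image);
            ocr.Recognize();
            Console.WriteLine(ocr.GetUTF8Text());
        }
    }
}
```

The using blocks dispose the engine and the image once recognition is done, since both wrap unmanaged resources.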
See Also