Project Overview
A Python wrapper for the tesseract-ocr API
Project URL
https://github.com/sirfz/tesserocr
Key Metrics
- Stars: 2145
- Primary language: Python
- License: MIT License
- Last updated: 2026-01-13T15:39:28Z
- Default branch: master
Mirror download from this site (accessible from mainland China)
- SHA256: e5c66c25e88169ec9711763b16a4856a2fd5044d011c591d829f962977af5d0b
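Installation and Deployment Notes (selected from the README)
(The installation section of the README could not be retrieved; see the project documentation.)
Common Commands (extracted from the README)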
You can use the `simonflueckiger <https://anaconda.org/simonflueckiger/tesserocr>`_ channel to install from Conda:

::

    > conda install -c simonflueckiger tesserocr

Or alternatively the `conda-forge <https://anaconda.org/conda-forge/tesserocr>`_ channel:

::

    > conda install -c conda-forge tesserocr

pip
If you need a Windows tesserocr package and your Python version is not supported by the project mentioned above,
you can try following the `step by step instructions for Windows 64bit` in `Windows.build.md`_.
.. _Windows.build.md: Windows.build.md

tessdata
========
You may need to point to the tessdata path if it cannot be detected automatically. This can be done by setting the ``TESSDATA_PREFIX`` environment variable or by passing the path to ``PyTessBaseAPI`` (e.g.: ``PyTessBaseAPI(path='/usr/share/tessdata')``). The path should contain ``.traineddata`` files which can be found at https://github.com/tesseract-ocr/tessdata.
Make sure you have the correct version of traineddata for your ``tesseract --version``.
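Here is a minimal sketch of resolving the tessdata directory before you construct the API. ``find_tessdata`` and ``languages_in`` are hypothetical helpers and are not part of tesserocr. They assume that ``TESSDATA_PREFIX`` names the directory that directly contains the ``.traineddata`` files. The language list they return is only an approximation of what ``get_languages`` reports.

```python
import os
from pathlib import Path


def find_tessdata(candidates=(), env_var="TESSDATA_PREFIX"):
    """Return the first directory holding *.traineddata files, or None.

    Hypothetical helper (not part of tesserocr): the environment variable is
    checked first, then each caller-supplied candidate path in order.
    """
    paths = []
    if os.environ.get(env_var):
        paths.append(os.environ[env_var])
    paths.extend(candidates)
    for raw in paths:
        directory = Path(raw)
        # A usable tessdata directory must contain at least one .traineddata file.
        if directory.is_dir() and any(directory.glob("*.traineddata")):
            return str(directory)
    return None


def languages_in(tessdata_dir):
    """Approximate language codes by stripping '.traineddata' from file names."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


# Example usage (assumes tesserocr is installed):
#   from tesserocr import PyTessBaseAPI
#   path = find_tessdata(["/usr/share/tessdata"])
#   kwargs = {"path": path} if path else {}  # fall back to auto-detection
#   with PyTessBaseAPI(**kwargs) as api:
#       ...
```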
You can list the current supported languages on your system using the ``get_languages`` function:

.. code:: python

    from tesserocr import get_languages

    print(get_languages('/usr/share/tessdata'))  # or any other path that applies to your system

Usage
=====
Initialize and re-use the tesseract API instance to score multiple
images:

.. code:: python

    from tesserocr import PyTessBaseAPI

    images = ['sample.jpg', 'sample2.jpg', 'sample3.jpg']

    with PyTessBaseAPI() as api:
        for img in images:
            api.SetImageFile(img)
            print(api.GetUTF8Text())
            print(api.AllWordConfidences())
    # api is automatically finalized when used in a with-statement (context manager).
    # Otherwise, api.End() should be called explicitly when it's no longer needed.

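You can wrap the reuse pattern above in a small helper that also summarizes the per-word confidences. ``ocr_batch`` is a hypothetical function and is not part of tesserocr. It accepts any object with ``SetImageFile``, ``GetUTF8Text`` and ``AllWordConfidences`` methods, such as a ``PyTessBaseAPI`` instance. The plain mean used here is just one possible summary.

```python
def ocr_batch(api, image_paths):
    """OCR several files with one API instance; return {path: (text, mean_conf)}.

    Hypothetical helper: `api` only needs SetImageFile, GetUTF8Text and
    AllWordConfidences, so a tesserocr.PyTessBaseAPI instance works.
    """
    results = {}
    for path in image_paths:
        api.SetImageFile(path)  # replaces the previous image on the same instance
        text = api.GetUTF8Text()
        confidences = api.AllWordConfidences()  # one value per recognized word
        mean = sum(confidences) / len(confidences) if confidences else 0.0
        results[path] = (text, mean)
    return results


# Example usage (assumes tesserocr is installed):
#   from tesserocr import PyTessBaseAPI
#   with PyTessBaseAPI() as api:
#       for path, (text, conf) in ocr_batch(api, ['sample.jpg', 'sample2.jpg']).items():
#           print(path, round(conf, 1), text)
```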
``PyTessBaseAPI`` exposes several tesseract API methods. Make sure you
read their docstrings for more info.
Basic example using available helper functions:

.. code:: python

    import tesserocr
    from PIL import Image

    print(tesserocr.tesseract_version())  # print tesseract-ocr version
    print(tesserocr.get_languages())  # prints tessdata path and list of available languages

    image = Image.open('sample.jpg')
    print(tesserocr.image_to_text(image))  # print ocr text from image
    # or
    print(tesserocr.file_to_text('sample.jpg'))

``image_to_text`` and ``file_to_text`` can be used with ``threading`` to
process multiple images concurrently, which is highly efficient.
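Here is a minimal sketch of the threaded approach using ``concurrent.futures``. ``ocr_files_concurrently`` is a hypothetical helper. It relies on the statement above that these helper functions can be used from multiple threads. It calls ``file_to_text`` independently for each file rather than sharing one ``PyTessBaseAPI`` instance across threads. The default of 4 workers is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_files_concurrently(paths, ocr=None, max_workers=4):
    """OCR image files in a thread pool; results keep the order of `paths`.

    Hypothetical helper: by default each file goes to tesserocr.file_to_text.
    Pass another callable as `ocr` to swap in a different function or a stub.
    """
    if ocr is None:
        # Imported lazily so the function can be exercised without tesserocr.
        from tesserocr import file_to_text as ocr
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, regardless of completion order.
        return list(pool.map(ocr, paths))


# Example usage (assumes tesserocr is installed):
#   texts = ocr_files_concurrently(['sample.jpg', 'sample2.jpg', 'sample3.jpg'])
```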
Advanced API Examples
---------------------
Orientation and script detection (OSD) example:

.. code:: python

    from PIL import Image
    from tesserocr import PyTessBaseAPI, PSM

    with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:
        image = Image.open("/usr/src/tesseract/testing/eurotext.tif")
        api.SetImage(image)
        api.Recognize()

        it = api.AnalyseLayout()
        orientation, direction, order, deskew_angle = it.Orientation()
        print("Orientation: {:d}".format(orientation))
        print("WritingDirection: {:d}".format(direction))
        print("TextlineOrder: {:d}".format(order))
        print("Deskew angle: {:.4f}".format(deskew_angle))

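The deskew angle printed above is a bare number. As far as we know, Tesseract's page-iterator header documents this value in radians, bounded by ±π/4. The hypothetical helpers below assume that unit, convert the angle to degrees, and compare it with a tolerance you choose. The 0.5° default is arbitrary.

```python
import math


def deskew_degrees(deskew_angle):
    """Convert the deskew angle (assumed to be in radians) to degrees."""
    return math.degrees(deskew_angle)


def needs_deskew(deskew_angle, tolerance_deg=0.5):
    """Hypothetical check: True if the skew magnitude exceeds the tolerance."""
    return abs(deskew_degrees(deskew_angle)) > tolerance_deg


# Example usage with the values unpacked from it.Orientation():
#   if needs_deskew(deskew_angle):
#       print("Skewed by about {:.2f} degrees".format(deskew_degrees(deskew_angle)))
```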
Or, more simply, with the ``OSD_ONLY`` page segmentation mode:

.. code:: python

    from tesserocr import PyTessBaseAPI, PSM

    with PyTessBaseAPI(psm=PSM.OSD_ONLY) as api:
        api.SetImageFile("/usr/src/tesseract/testing/eurotext.tif")

        osd = api.DetectOS()  # named osd to avoid shadowing the os module
        print("Orientation: {orientation}\nOrientation confidence: {oconfidence}\n"
              "Script: {script}\nScript confidence: {sconfidence}".format(**osd))

More human-readable information with Tesseract 4+ (demonstrates LSTM engine usage):

.. code:: python

    from tesserocr import PyTessBaseAPI, PSM, OEM

    with PyTessBaseAPI(psm=PSM.OSD_ONLY, oem=OEM.LSTM_ONLY) as api:
        api.SetImageFile("/usr/src/tesseract/testing/eurotext.tif")

        osd = api.DetectOrientationScript()  # named osd to avoid shadowing the os module
        print("Orientation: {orient_deg}\nOrientation confidence: {orient_conf}\n"
              "Script: {script_name}\nScript confidence: {script_conf}".format(**osd))

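When the detected orientation drives an automatic rotation, it can help to ignore low-confidence results. ``upright_rotation`` is a hypothetical helper that reads the ``orient_deg`` and ``orient_conf`` keys from the dictionary returned by ``DetectOrientationScript()``. You choose the threshold, because the confidence scale is not specified here. A missing result is treated as untrusted, in case detection fails on an image.

```python
def upright_rotation(osd, min_conf):
    """Return osd['orient_deg'] if orient_conf >= min_conf, else None.

    Hypothetical helper: `osd` is the dict from DetectOrientationScript();
    `min_conf` is a caller-chosen threshold, not a Tesseract default.
    A None/empty `osd` is treated as an untrusted result.
    """
    if not osd or osd.get("orient_conf", 0) < min_conf:
        return None
    return osd["orient_deg"]


# Example usage inside the with-block above:
#   rotation = upright_rotation(osd, min_conf=5.0)
#   if rotation is not None:
#       print("Trusted orientation:", rotation)
```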
Iterator over the classifier choices for a single symbol:
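Here is a minimal sketch for this step. It assumes tesserocr exposes ``iterate_level``, ``RIL.SYMBOL``, ``api.GetIterator()`` and a per-symbol ``GetChoiceIterator()``, so check the docstrings of your installed version. ``collect_symbol_choices`` is a hypothetical helper that receives the iterator objects as arguments, so you can check the collection logic without Tesseract. Classifier choices are typically recorded only when the Tesseract variable ``save_blob_choices`` is enabled before recognition, so verify this against your Tesseract version.

```python
def collect_symbol_choices(symbol_iterators, level):
    """Gather (symbol, confidence, [(choice, choice_confidence), ...]) tuples.

    Hypothetical helper: `symbol_iterators` is what
    iterate_level(api.GetIterator(), RIL.SYMBOL) yields, and `level` is that
    same RIL.SYMBOL value. Empty symbols are skipped.
    """
    rows = []
    for r in symbol_iterators:
        symbol = r.GetUTF8Text(level)
        if not symbol:
            continue
        confidence = r.Confidence(level)
        # The choice iterator walks the classifier's alternatives for this symbol.
        choices = [(c.GetUTF8Text(), c.Confidence()) for c in r.GetChoiceIterator()]
        rows.append((symbol, confidence, choices))
    return rows


# Example usage (assumes tesserocr is installed):
#   from tesserocr import PyTessBaseAPI, RIL, iterate_level
#   with PyTessBaseAPI() as api:
#       api.SetImageFile("/usr/src/tesseract/testing/phototest.tif")
#       api.SetVariable("save_blob_choices", "T")
#       api.Recognize()
#       level = RIL.SYMBOL
#       for symbol, conf, choices in collect_symbol_choices(
#               iterate_level(api.GetIterator(), level), level):
#           print(symbol, conf, choices)
```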
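General Deployment Notes
- Download the source code and read the README
- Install dependencies (pip/npm/yarn, etc.)
- Configure environment variables (API keys, model paths, databases, etc.)
- Start the service and test access
- Production recommendations: Nginx reverse proxy + HTTPS + a process manager (systemd / pm2)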
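Disclaimer and Copyright Notice
This article only collects information about the open-source project and indexes tutorials. The source code is copyrighted by its original authors; use it in compliance with its license.
© Copyright Notice
The copyright of this article belongs to its author. Do not reproduce it without permission.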