Outputting a two-layer PDF with pytesseract

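Pytesseract is a Python wrapper around the Tesseract OCR engine. It recognizes the text in an image and returns it as editable text, and it handles common image formats such as JPEG, PNG, and TIFF.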

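Plain text is not its only output. pytesseract.image_to_pdf_or_hocr(img, extension='pdf') returns a searchable PDF as bytes: the page image combined with an invisible, selectable text layer, which is the two-layer PDF asked about here. If you would rather lay out the recognized text yourself, or need to post-process existing PDF files, these libraries and tools are available: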

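ReportLab - A Python library for creating PDF documents. It lets you add text and images to a PDF.
PyPDF2 - A Python library for manipulating existing PDF files. It lets you merge and split PDFs, add and remove pages, and so on. It cannot draw new text by itself.
pdftk - A command-line tool for manipulating PDF files. It lets you merge and split PDFs, add and remove pages, and so on.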

You can use Pytesseract to extract the text from an image and then use the tools above to put that text into a PDF. Here are some code examples:

Extracting the text from an image with Pytesseract:

import pytesseract
from PIL import Image

# Open image using PIL
img = Image.open('image.png')

# Convert image to string
text = pytesseract.image_to_string(img)

# Print extracted text
print(text)

Adding text to a PDF with ReportLab:

from reportlab.pdfgen import canvas

# Create a new PDF document
pdf = canvas.Canvas('output.pdf')

# Add text to PDF
pdf.drawString(100, 750, 'Hello world!')

# Save the PDF document
pdf.save()

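Adding text to an existing PDF with PyPDF2 (the text is drawn with ReportLab and merged in as an overlay):

import io

# PdfReader/PdfWriter are the current class names (PyPDF2 2.x and later)
from PyPDF2 import PdfReader, PdfWriter
from reportlab.pdfgen import canvas

# PyPDF2 cannot draw text itself, so build a one-page overlay with ReportLab
buf = io.BytesIO()
overlay = canvas.Canvas(buf)
overlay.drawString(100, 750, 'Hello world!')
overlay.save()
buf.seek(0)
overlay_page = PdfReader(buf).pages[0]

# Open existing PDF document
reader = PdfReader('input.pdf')

# Create a new PDF document
writer = PdfWriter()

# Merge the overlay into the first page, then copy every page to the output
for i, page in enumerate(reader.pages):
    if i == 0:
        page.merge_page(overlay_page)
    writer.add_page(page)

# Save the PDF document
with open('output.pdf', 'wb') as f:
    writer.write(f)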

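Adding an OCR text layer to a PDF with pdftk:

import subprocess

# OCR the image with the Tesseract CLI into a PDF that holds only the
# invisible text layer (writes ocr_layer.pdf)
subprocess.run(['tesseract', 'image.png', 'ocr_layer', '-c', 'textonly_pdf=1', 'pdf'], check=True)

# pdftk's background operation takes a PDF file, not raw text. This assumes
# input.pdf is the one-page PDF of the same scan, so the two layers line up.
subprocess.run(['pdftk', 'input.pdf', 'background', 'ocr_layer.pdf', 'output', 'output.pdf'], check=True)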