Python Document Scanner SDK

A Python wrapper for the Dynamsoft Document Normalizer SDK, providing simple and user-friendly APIs across Windows, Linux, and macOS. Compatible with desktop PCs, embedded devices, Raspberry Pi, and Jetson Nano.

Note: This is an unofficial, community-maintained wrapper. For official support and full feature coverage, consider the Dynamsoft Capture Vision Bundle on PyPI.

Comparison: Community vs Official

| Feature | Community Wrapper | Official Dynamsoft SDK |
| --- | --- | --- |
| Support | Community-driven | ✅ Official Dynamsoft support |
| Documentation | Basic README and limited examples | ✅ Comprehensive online documentation |
| API Coverage | Core features only | ✅ Full API coverage |
| Updates | May lag behind | ✅ Always includes the latest features |
| Testing | Tested in limited environments | ✅ Thoroughly tested |
| API Usage | ✅ Simple and intuitive | More complex and verbose |

Installation

Requirements

Python 3.x
OpenCV (for UI display)
```
pip install opencv-python
```

Dynamsoft Capture Vision Bundle SDK

pip install dynamsoft-capture-vision-bundle

Build from Source

# Source distribution
python setup.py sdist

# Build wheel
python setup.py bdist_wheel

Command-line Usage

After installation, you can use the built-in command-line interface:

# Scan a document from an image file
scandocument -f <file-name> -l <license-key>

# Scan documents from the camera
scandocument -c 1 -l <license-key>

Quick Start

Basic Document Detection

import cv2
import numpy as np
import docscanner

# Initialize license (required)
docscanner.initLicense("YOUR_LICENSE_KEY")  # Get trial key from Dynamsoft

# Create scanner instance
scanner = docscanner.createInstance()

# Detect from image file
results = scanner.detect("document.jpg")

# OR detect from OpenCV image matrix
image = cv2.imread("document.jpg")
results = scanner.detect(image)

# Process results
for result in results:
    print("Document found:")
    print(f"  Top-left: ({result.x1}, {result.y1})")
    print(f"  Top-right: ({result.x2}, {result.y2})")
    print(f"  Bottom-right: ({result.x3}, {result.y3})")
    print(f"  Bottom-left: ({result.x4}, {result.y4})")

    # Draw the detected quadrilateral (OpenCV contour points must be int32)
    corners = np.array([(result.x1, result.y1), (result.x2, result.y2),
                        (result.x3, result.y3), (result.x4, result.y4)])
    cv2.drawContours(image, [corners.astype(np.int32)], -1, (0, 255, 0), 2)

cv2.imshow("Detected Documents", image)
cv2.waitKey(0)
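
The corner attributes shown above are often needed as a list of points or as a crop rectangle. The following is a minimal sketch of two such helpers. `corner_points` and `bounding_box` are hypothetical names, not part of the wrapper, and the `SimpleNamespace` object only stands in for a real result returned by `scanner.detect()`.

```python
from types import SimpleNamespace


def corner_points(result):
    """Return the four corners as integer (x, y) tuples: TL, TR, BR, BL."""
    return [
        (int(result.x1), int(result.y1)),
        (int(result.x2), int(result.y2)),
        (int(result.x3), int(result.y3)),
        (int(result.x4), int(result.y4)),
    ]


def bounding_box(result):
    """Return the axis-aligned box (left, top, right, bottom) around the quad."""
    xs, ys = zip(*corner_points(result))
    return min(xs), min(ys), max(xs), max(ys)


# Stand-in for a detection result; real objects come from scanner.detect().
sample = SimpleNamespace(x1=10, y1=20, x2=210, y2=30, x3=200, y3=320, x4=5, y4=300)
left, top, right, bottom = bounding_box(sample)
print(left, top, right, bottom)  # 5 20 210 320
```

With an OpenCV image, the box can be used to crop the region around the document, for example `image[top:bottom, left:right]`.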

Document Normalization (Perspective Correction)

import cv2
import docscanner
from docscanner import EnumImageColourMode

# Setup (license + scanner)
docscanner.initLicense("YOUR_LICENSE_KEY")
scanner = docscanner.createInstance()

# Detect documents
results = scanner.detect("skewed_document.jpg")

if results:
    result = results[0]  # Process first detected document
    
    # Normalize the document (correct perspective); returns the image, or None on failure
    normalized_img = scanner.normalize(result, EnumImageColourMode.ICM_COLOUR)
    
    # Use the returned normalized image directly
    if normalized_img is not None:
        cv2.imshow("Original", cv2.imread("skewed_document.jpg"))
        cv2.imshow("Normalized", normalized_img)
        cv2.waitKey(0)
        
        # Save normalized image
        cv2.imwrite("normalized_document.jpg", normalized_img)
        print("Normalized document saved!")

Real-time Camera Scanning

import cv2
import docscanner

def on_document_detected(results):
    """Callback function for async document detection"""
    for result in results:
        print(f"Document detected at ({result.x1},{result.y1}), ({result.x2},{result.y2}), ({result.x3},{result.y3}), ({result.x4},{result.y4})")

# Setup
docscanner.initLicense("YOUR_LICENSE_KEY")
scanner = docscanner.createInstance()

# Start async detection
scanner.addAsyncListener(on_document_detected)

# Camera loop
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Queue frame for async processing
    scanner.detectMatAsync(frame)
    
    # Display frame
    cv2.imshow("Document Scanner", frame)
    
    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break

# Cleanup
scanner.clearAsyncListener()
cap.release()
cv2.destroyAllWindows()
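
The callback above only prints. To draw the detected outline on the live preview, the camera loop needs access to the most recent results. The sketch below shows one way to share them, assuming the callback may run on a different thread than the camera loop (the README does not say which thread is used). `LatestResults` is a hypothetical helper, not part of the wrapper.

```python
import threading


class LatestResults:
    """Thread-safe holder for the most recent detection results."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = []

    def update(self, results):
        """Store a copy of the newest results; usable as the async callback."""
        with self._lock:
            self._results = list(results)

    def snapshot(self):
        """Return a copy that the camera loop can iterate over safely."""
        with self._lock:
            return list(self._results)


latest = LatestResults()

# Assumed wiring with the API shown above:
#   scanner.addAsyncListener(latest.update)
#   ...inside the camera loop:
#   for result in latest.snapshot():
#       ...draw result on the current frame...
```

Returning copies keeps the loop from iterating over a list that the callback replaces at the same moment.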

API Reference

Core Functions

`docscanner.initLicense(license_key: str) -> Tuple[int, str]`

Initialize the Dynamsoft license. Required before using any other functions.

Parameters:

license_key: Your Dynamsoft license key

Returns:

(error_code, error_message): License initialization result

Example:

error_code, error_msg = docscanner.initLicense("YOUR_LICENSE_KEY")
if error_code != 0:
    print(f"License error: {error_msg}")
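
If a failed license should stop the program instead of only printing a message, the returned tuple can be turned into an exception. This is a minimal sketch that assumes, as the example above implies, that an error code of 0 means success. `LicenseError` and `ensure_license_ok` are hypothetical names.

```python
class LicenseError(RuntimeError):
    """Raised when license initialization reports a non-zero error code."""


def ensure_license_ok(result):
    """Validate the (error_code, error_message) tuple from initLicense."""
    error_code, error_message = result
    if error_code != 0:
        raise LicenseError(
            f"License initialization failed ({error_code}): {error_message}"
        )
    return True


# Assumed usage with the API shown above:
#   ensure_license_ok(docscanner.initLicense("YOUR_LICENSE_KEY"))
```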

`docscanner.createInstance() -> DocumentScanner`

Create a new DocumentScanner instance.

Returns:

DocumentScanner: Ready-to-use scanner instance

DocumentScanner Class

Detection Methods

`detect(input: Union[str, numpy.ndarray]) -> List[DocumentResult]`

Detect documents in an image supplied either as a file path or as an OpenCV image matrix.

Parameters:

input: Input source for document detection:
- str: File path to image (JPEG, PNG, BMP, TIFF, etc.)
- numpy.ndarray: OpenCV image matrix (BGR or grayscale)

Returns:

List[DocumentResult]: List of detected documents with boundary coordinates

Examples:

# Detect from file path
results = scanner.detect("document.jpg")

# Detect from OpenCV matrix
import cv2
image = cv2.imread("document.jpg") 
results = scanner.detect(image)

# Process results
for result in results:
    print(f"Found document at ({result.x1},{result.y1}), ({result.x2},{result.y2}), ({result.x3},{result.y3}), ({result.x4},{result.y4})")
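
When `detect` returns several candidates, a frequent next step is to keep only the largest one. The sketch below computes each quadrilateral's area with the shoelace formula and picks the maximum. `quad_area` and `largest_document` are hypothetical helpers, and the only assumption about the result objects is that they expose `x1`..`y4` as described in this README.

```python
def quad_area(result):
    """Area of the detected quadrilateral via the shoelace formula."""
    pts = [
        (result.x1, result.y1),
        (result.x2, result.y2),
        (result.x3, result.y3),
        (result.x4, result.y4),
    ]
    total = 0
    # Pair every corner with the next one, wrapping around to the first.
    for (xa, ya), (xb, yb) in zip(pts, pts[1:] + pts[:1]):
        total += xa * yb - xb * ya
    return abs(total) / 2


def largest_document(results, min_area=0):
    """Return the result with the largest area, or None if none qualifies."""
    candidates = [r for r in results if quad_area(r) >= min_area]
    return max(candidates, key=quad_area, default=None)
```

A `min_area` threshold helps discard tiny false positives before choosing a document to normalize.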

Asynchronous Processing

`addAsyncListener(callback: Callable[[List[DocumentResult]], None]) -> None`

Register a callback and start asynchronous document detection. Results for frames queued with `detectMatAsync` are passed to the callback.

Parameters:

callback: Function called with detection results

Example:

def on_documents_found(results):
    print(f"Found {len(results)} documents")

scanner.addAsyncListener(on_documents_found)

`detectMatAsync(image: numpy.ndarray) -> None`

Queue an image for asynchronous processing.

Parameters:

image: OpenCV image to process

`clearAsyncListener() -> None`

Stop asynchronous processing and remove callback.
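
Because `addAsyncListener` and `clearAsyncListener` come as a pair, it can be convenient to wrap them so that cleanup also happens when the camera loop raises. The following is a sketch of that idea using only the two method names documented here. `async_detection` is a hypothetical helper.

```python
from contextlib import contextmanager


@contextmanager
def async_detection(scanner, callback):
    """Register `callback`, then always clear it when the block exits."""
    scanner.addAsyncListener(callback)
    try:
        yield scanner
    finally:
        # Runs on normal exit and on exceptions alike.
        scanner.clearAsyncListener()


# Assumed usage with the API shown above:
#   with async_detection(scanner, on_documents_found):
#       ...read frames and call scanner.detectMatAsync(frame)...
```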

Document Normalization

`normalize(document: DocumentResult, color: EnumImageColourMode) -> Optional[numpy.ndarray]`

Perform document normalization (perspective correction) on a detected document.

Parameters:

document: DocumentResult containing boundary coordinates and source image
color: Color mode for output (ICM_COLOUR, ICM_GRAYSCALE, or ICM_BINARY)

Returns:

numpy.ndarray or None: The normalized document image as numpy array, or None if normalization fails

Usage Patterns:

# Method 1: Use return value directly
normalized_img = scanner.normalize(result, EnumImageColourMode.ICM_COLOUR)
if normalized_img is not None:
    cv2.imshow("Normalized", normalized_img)

# Method 2: Access from document object (also available)
scanner.normalize(result, EnumImageColourMode.ICM_COLOUR)
if result.normalized_image is not None:
    cv2.imwrite("output.jpg", result.normalized_image)

DocumentResult Class

Container for document detection results.

Attributes:

x1, y1: Top-left corner coordinates
x2, y2: Top-right corner coordinates
x3, y3: Bottom-right corner coordinates
x4, y4: Bottom-left corner coordinates
source: Original image (file path or numpy array)
normalized_image: Perspective-corrected image (numpy array)

Utility Functions

`convertMat2ImageData(mat: numpy.ndarray) -> ImageData`

Convert OpenCV matrix to Dynamsoft ImageData format.

Parameters:

mat: OpenCV image (RGB, BGR, or grayscale)

Returns:

ImageData: SDK-compatible image data

`convertNormalizedImage2Mat(normalized_image: ImageData) -> numpy.ndarray`

Convert Dynamsoft ImageData back to OpenCV-compatible numpy array.

Parameters:

normalized_image: ImageData object from SDK normalization results

Returns:

numpy.ndarray: OpenCV-compatible image matrix

Supported Formats:

Binary images (1-bit): Converted to 8-bit grayscale
Grayscale images: Single channel 8-bit
Color images: 3-channel RGB format
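
To illustrate what expanding a 1-bit image to 8-bit grayscale involves, here is a small sketch that unpacks one packed row. The bit order (most significant bit first) and the polarity (a set bit becomes 255) are assumptions made for the example. The wrapper's actual conversion may differ, and `unpack_binary_row` is a hypothetical name.

```python
def unpack_binary_row(row_bytes, width):
    """Expand one packed 1-bit row into `width` 8-bit pixels (0 or 255)."""
    pixels = []
    for x in range(width):
        byte = row_bytes[x // 8]          # byte that holds pixel x
        bit = (byte >> (7 - x % 8)) & 1   # assumed MSB-first bit order
        pixels.append(255 if bit else 0)
    return pixels


# One byte, four pixels wide: bits 1, 0, 1, 0.
print(unpack_binary_row(bytes([0b10100000]), 4))  # [255, 0, 255, 0]
```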
