
Text that couldn't be read before, now read by VLM-OCR running at the edge.

A VLM-OCR model trained and optimized in-house that runs self-contained on on-site edge devices. The input configuration is chosen to fit the use case, from integration with industrial 2D/3D line cameras to smartphone capture, so it reads characters from any kind of image. Handwritten slips, reflective labels and faded engraving are read by understanding what the text means.

Free diagnosis from one image →
In-house training platform / On-site, closed-network operation / Industrial 2D/3D camera support / Smartphone capture supported

Why does on-site OCR stall?

Conventional OCR that depends on templates and dictionaries only works on "clean, expected text." On the floor, the input is always unexpected.

Broken handwriting

Addresses and names on delivery slips, where everyone's habits and pen pressure differ. The character shapes don't match the dictionary, so rule-based OCR breaks down early.

Glare, fading and dirt

Glare from laminate packaging, faded print, smudging. The moment the input strays from the training images, CNN-OCR's confidence collapses.

Layout and format variation

Every time a label's font or field placement changes from one SKU to the next, the template has to be reconfigured. Across many SKUs and many sites, operations can't keep up.

Reading examples

Below are reading examples from Nsight VLM-OCR for the kind of field images conventional OCR has struggled with — broken handwriting, reflective labels, format variation. Actual accuracy and output vary with the target image and imaging conditions.

Why Nsight VLM-OCR reads on the floor

Unlike vendors that sell only an algorithm, Nsight designs the training platform, the edge, the optical hardware and the operation end to end, with development know-how in industrial image processing.

In-house & on-site

Trained in-house, runs self-contained on the floor

We train and optimize the model in-house, and inference runs self-contained on on-site edge devices. With no dependence on the cloud or the network, it reads without sending images outside. It can be deployed as-is in a closed network, even on manufacturing and logistics floors with strict security requirements.

Input configuration

Input you choose by use: from 2D/3D cameras to smartphones

For lines needing high accuracy and stable continuous operation, industrial 2D/3D line cameras; for spot checks and inspections on the move, a smartphone. We choose the input configuration to fit the use case.

Optics × inspection

Design strength that doesn't stall on the floor

Lighting, camera, lens and conveyance designed as one. With a team that includes developers from Keyence's image-processing division, image-quality problems are solved first at the optical level.

No master registration

Reads by meaning

The VLM understands the position and meaning of text, so it can read new formats with no master registration. It is robust to layout changes, multiple languages and handwriting.

From one image to structured data

How an on-site image becomes structured data in which each field carries meaning. The image stays on-site and the whole process completes on the floor.

Step 01: Image input
Industrial 2D/3D line cameras / industrial cameras / smartphones: ingest from any source.

Step 02: Semantic understanding
Not just "reading" characters but "understanding them as fields." Handwriting and breaks are corrected by context.

Step 03: Structuring
Output as data split by field (address, model number, quantity), with confidence noted.

Step 04: Business-system integration
Automatic integration with WMS or core systems. Only values needing confirmation are routed to human review.
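
As a rough illustration of Steps 03 and 04, the sketch below shows one way field-level output with a confidence score could be split into values passed straight to a business system and values sent to human review. It is not Nsight's implementation: the `Field` class, the `route_fields` function, the field names and the 0.9 threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Field:
    """One recognized field (hypothetical shape, not the product's schema)."""
    name: str          # e.g. "address", "model_number", "quantity"
    value: str         # text as read from the image
    confidence: float  # assumed to be a score between 0.0 and 1.0


def route_fields(
    fields: List[Field], threshold: float = 0.9
) -> Tuple[Dict[str, str], List[Field]]:
    """Split fields into auto-accepted values and a human-review queue.

    The threshold is an illustrative default; a real deployment would
    tune it per field and per project.
    """
    accepted: Dict[str, str] = {}
    review: List[Field] = []
    for field in fields:
        if field.confidence >= threshold:
            accepted[field.name] = field.value
        else:
            review.append(field)
    return accepted, review


# Made-up sample values for demonstration only.
sample = [
    Field("address", "1-2-3 Example-cho", 0.97),
    Field("model_number", "AB-1234", 0.99),
    Field("quantity", "12", 0.62),
]
accepted, review = route_fields(sample)
print(accepted)                   # values that would go to the WMS or core system
print([f.name for f in review])   # fields that would go to human review
```

In this sketch only the low-confidence `quantity` field is held back, which mirrors the idea that people confirm only the values that need confirmation.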

Not the VLM alone: three techniques blended per project

The recognition engine combines VLM, CNN-OCR and rule-based verification, blended per project.

VLM (Vision-Language Model)
Reads hard-to-standardize targets (handwriting, glare, fading, format variation) by context.

CNN-OCR (CNN-based OCR)
The foundation of character recognition, stably processing high-volume reading on standardized, high-speed lines.

Rule-based (Rule-based verification)
Finalizes results to business requirements via digit counts, check digits and format validation.
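
To make the rule-based verification step concrete, here is a minimal sketch of a format check, a digit-count check and a check-digit test applied to a recognized code. The page does not say which check-digit schemes are used, so this example assumes the GS1 mod-10 scheme used by GTIN-13 / EAN-13 barcodes; the function names `gs1_check_digit` and `validate_code` are hypothetical.

```python
import re
from typing import Tuple


def gs1_check_digit(body: str) -> int:
    """Compute the GS1 mod-10 check digit for a digit string without its check digit.

    Weights alternate 3, 1, 3, ... starting from the rightmost digit of the body.
    """
    total = 0
    for i, ch in enumerate(reversed(body)):
        weight = 3 if i % 2 == 0 else 1
        total += int(ch) * weight
    return (10 - total % 10) % 10


def validate_code(text: str, length: int = 13) -> Tuple[bool, str]:
    """Apply format, digit-count and check-digit rules to an OCR result."""
    if not re.fullmatch(r"[0-9]+", text):
        return False, "non-digit characters"
    if len(text) != length:
        return False, "wrong digit count"
    if gs1_check_digit(text[:-1]) != int(text[-1]):
        return False, "check digit mismatch"
    return True, "ok"


print(validate_code("4006381333931"))   # a valid EAN-13 number
print(validate_code("4006381333932"))   # last digit misread
print(validate_code("40063813339O1"))   # letter O read in place of zero
```

A result that fails any of these rules could be rejected or sent for review instead of being passed downstream, regardless of how confident the recognition model was.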

Spec summary

Edge: On-site, closed-network operation
Custom: In-house training platform
Multi: Handwriting / multilingual / reflection
Zero: No master registration

* Actual accuracy and latency vary with the target image, imaging conditions and camera configuration. We validate individually on your sample images and report back.

Frequently asked questions

Does it run without the cloud or network?
Yes. We train and optimize the model in-house, and inference runs self-contained on on-site edge devices. With no dependence on the cloud or network, it reads without sending images outside — so it can be deployed as-is in a closed network, even on floors with strict security requirements.
Do we need a master registration for each label format?
No. The VLM understands the position and meaning of text, so it can recognize new formats with no master registration. It is robust to layout changes, multiple languages and handwriting.
What input devices are supported?
From industrial 2D/3D line cameras to smartphones. For lines needing high accuracy and stable continuous operation we use industrial 2D/3D line cameras; for spot checks we can use a smartphone. We choose the input configuration to fit the use case.
How can I tell whether our slips can be read?
Send us one image. Actual accuracy and output vary with the target image and imaging conditions, so we validate individually on your sample images and report back, free of charge.

Industrial image-processing know-how and VLM, to your floor

"Can our slips and labels be read?" Starting from one image, a team that includes developers from Keyence's image-processing division will assess, free of charge, what the AI can read and to what extent.
