Why does on-site OCR stall?
Conventional OCR that depends on templates and dictionaries works only on "clean, expected text." On the floor, the input rarely matches what was expected.
Cursive and irregular handwriting
Addresses and names on delivery slips vary with each writer's habits and pen pressure. The character shapes don't match the dictionary, so rule-based OCR breaks down quickly.
Glare, fading and dirt
Glare from laminate packaging, faded print, smudging. The moment the input strays from the training images, CNN-OCR's confidence collapses.
Layout and format variation
Whenever a label's font or field placement changes from one SKU to the next, the template has to be reconfigured. Across many SKUs and many sites, operations can't keep up.
Reading examples
Below are reading examples from Nsight VLM-OCR on the kinds of field images conventional OCR has struggled with: irregular handwriting, reflective labels and format variation. Actual accuracy and output vary with the target image and imaging conditions.
- A cursive, hard-to-read "千々田区" is read as "千代田区" based on the meaning of the address.
- Even when faded characters are nearly the same color as the paper background, it distinguishes the role of each line (address / building name), with no template registered.
- Even under laminate glare, it correctly pairs each field name with its value and reports a confidence for each item.
Why Nsight VLM-OCR reads on the floor
Unlike vendors that sell only an algorithm, Nsight designs the training platform, the edge devices, the optical hardware and the operations end to end, drawing on development know-how in industrial image processing.
Trained in-house, runs self-contained on the floor
We train and optimize the model in-house, and inference runs self-contained on on-site edge devices. With no dependence on the cloud or the network, images are read without being sent off-site. It can be deployed as-is in closed networks, including manufacturing and logistics floors with strict security requirements.
Input chosen by use case: from 2D/3D cameras to smartphones
Lines that need high accuracy and stable continuous operation use industrial 2D/3D line cameras; spot checks and inspections on the move use a smartphone. We choose the input configuration to fit the use case.
Design strength that doesn't stall on the floor
Lighting, camera, lens and conveyance are designed as one system. With a team that includes developers from Keyence's image-processing division, image-quality problems are solved first at the optical level.
Reads by meaning
The VLM understands the position and meaning of text, so it can read new formats with no master registration. It is robust to layout changes, multiple languages and handwriting.
From one image to structured data
An on-site image becomes structured data, with each field assigned a meaning. The image stays on-site, and the whole process completes on the floor.
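As a rough illustration of what "structured data" with a meaning and a confidence per field could look like, here is a minimal Python sketch. The `FieldReading` type, its field names, the 0.80 review threshold and the sample values are all assumptions made for illustration; they are not the actual Nsight VLM-OCR output schema or real readings.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FieldReading:
    """One recognized field (hypothetical schema, not the product's)."""
    role: str          # meaning of the field, e.g. "address"
    value: str         # text that was read
    confidence: float  # per-item confidence in [0.0, 1.0]


def to_record(readings, review_threshold=0.80):
    """Build a JSON-serializable record and flag low-confidence fields."""
    fields = [asdict(r) for r in readings]
    needs_review = [r.role for r in readings if r.confidence < review_threshold]
    return {"fields": fields, "needs_review": needs_review}


# Placeholder values for illustration only, not real readings.
sample = [
    FieldReading("address", "千代田区", 0.93),
    FieldReading("building_name", "Example Building", 0.61),
]
record = to_record(sample)
print(json.dumps(record, ensure_ascii=False, indent=2))
```

In a layout like this, anything listed under `needs_review` could be routed to a person while the remaining fields flow straight into downstream systems.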
Not the VLM alone: three techniques blended per project
The recognition engine blends VLM, CNN-OCR and rule-based processing per project.
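One way to picture such a blend is a fallback chain: try one engine, and move on to the next when its confidence is too low. The sketch below is only an assumption about how this kind of routing might be wired. The engine functions are stand-in stubs with made-up return values, and the ordering and the 0.85 threshold are hypothetical, not Nsight's actual design.

```python
from typing import Callable, List, Optional, Tuple

# An engine takes raw image bytes and returns (text, confidence),
# or None when it cannot read the input at all.
Engine = Callable[[bytes], Optional[Tuple[str, float]]]


def read_with_fallback(image: bytes,
                       engines: List[Tuple[str, Engine]],
                       min_confidence: float = 0.85):
    """Try engines in order; return the first sufficiently confident result.

    If none reaches the threshold, return the most confident result seen,
    or None when every engine failed.
    """
    best = None
    for name, engine in engines:
        result = engine(image)
        if result is None:
            continue
        text, conf = result
        if conf >= min_confidence:
            return name, text, conf
        if best is None or conf > best[2]:
            best = (name, text, conf)
    return best


# Stand-in stubs with made-up behaviour, for illustration only.
def rule_based(image: bytes):
    return None  # e.g. no template matched


def cnn_ocr(image: bytes):
    return ("ABC-123", 0.40)  # low confidence on an unfamiliar image


def vlm(image: bytes):
    return ("ABC-128", 0.91)


chain = [("rule_based", rule_based), ("cnn_ocr", cnn_ocr), ("vlm", vlm)]
print(read_with_fallback(b"", chain))
```

Under this sketch, a per-project configuration would amount to choosing which engines go into `chain`, in what order, and with what threshold.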
Spec summary
* Actual accuracy and latency vary with the target image, imaging conditions and camera configuration. We validate individually on your sample images and report back.