How to Convert an Image to PDF using OCR
In this tutorial, we will go through the steps needed to extract the rasterized text contained within an image file, such as a JPG or PNG, and convert it into plain, editable text that can then be used in popular document formats such as PDF and DOCX.
What is OCR?
OCR, or Optical Character Recognition, is the process of converting text stored within a raster image into text that can be edited within a text-based document, such as a DOCX file. OCR works by analyzing the pixels in the image file and looking for pixel patterns that resemble written text characters. For a detailed explanation of OCR, please see this article.
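To make the idea of "looking for pixel patterns" concrete, here is a deliberately tiny sketch of template matching. It is not how our tools or any production OCR engine work internally; real engines handle fonts, scaling, skew and noise with far more sophisticated models. The 3x5 glyph templates, the four-letter alphabet and the function names are all assumptions made up for this illustration.

```python
# Toy 3x5 glyph templates (hypothetical); "#" is ink, "." is background.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def mismatches(glyph, template):
    """Count the pixels that differ between a glyph and a template."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize_glyph(glyph):
    """Return the character whose template differs in the fewest pixels."""
    return min(TEMPLATES, key=lambda ch: mismatches(glyph, TEMPLATES[ch]))


def recognize_line(rows, glyph_width=3, gap=1):
    """Cut a line of fixed-width glyphs into cells and recognize each one."""
    step = glyph_width + gap
    text = []
    for x in range(0, len(rows[0]), step):
        glyph = [row[x:x + glyph_width] for row in rows]
        text.append(recognize_glyph(glyph))
    return "".join(text)


# A 5-row "image" containing three glyphs separated by one blank column.
image = [
    "#...###.###",
    "#...#.#..#.",
    "#...#.#..#.",
    "#...#.#..#.",
    "###.###..#.",
]

print(recognize_line(image))  # prints LOT
```

Because the match is "closest template" rather than "exact template", a glyph with a stray pixel is still recognized, which hints at why OCR can cope with imperfect scans.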
What is wrong with text stored in an image file?
Text stored in an image file can only be altered with image editing software, and this can become complex if the image format doesn't support layers. On top of this, the text is not searchable, making it difficult, if not impossible, to locate files with a keyword search. If an image primarily contains text, such as a scan of a physical document, it is an ideal candidate for OCR conversion to a document format.
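The searchability point can be illustrated with a short sketch. Once text has been extracted, finding documents by keyword is a simple string check, whereas the raw bytes of an image offer nothing comparable to search. The file names and text below are invented sample data, and `find_documents` is a hypothetical helper written for this example.

```python
def find_documents(extracted_text, keyword):
    """Return the names of documents whose extracted text contains the keyword.

    extracted_text maps a file name to the plain text produced by OCR.
    The comparison ignores case.
    """
    needle = keyword.casefold()
    return sorted(
        name
        for name, text in extracted_text.items()
        if needle in text.casefold()
    )


# Hypothetical OCR results for three scanned documents.
ocr_results = {
    "invoice-march.pdf": "Invoice 1042. Total due: 250.00",
    "lease.pdf": "This lease agreement is made between the parties",
    "invoice-april.pdf": "INVOICE 1057. Total due: 310.00",
}

print(find_documents(ocr_results, "invoice"))
```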
Select your Tool
Now that we are ready to begin converting some images to an editable document format, you will need to choose the correct tool that suits your needs. Our tools can convert to the following three document file types (links to these tools will open in a new browser tab):
Once you have selected the correct tool, you can then select the type of image you are uploading. By default, JPG is the selected source file format. You can change this using the source file type selector located on the left side of the tool.
Select your Files
With the correct tool and formats selected, you can click the "Upload Files" button to select your image files to convert. You can also drag and drop up to 50 files onto the tool if you prefer. You can re-order the selected files by dragging their thumbnails within the tool display. Each thumbnail also has rotate left and right buttons to allow you to orientate your images correctly prior to upload.
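If you have a large folder of scans, it can help to prepare the batch before uploading. The sketch below is an optional local helper, not part of the tool: it keeps only image files, sorts them so that page 2 comes before page 10, and splits them into groups of at most 50 to match the limit mentioned above. The accepted extensions and the function names are assumptions chosen for this example.

```python
import re
from pathlib import Path

MAX_FILES_PER_UPLOAD = 50  # limit stated in this tutorial
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumption: formats named in the intro


def natural_key(name):
    """Sort key that compares digit runs as numbers, so 'scan2' < 'scan10'."""
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]


def plan_batches(filenames, batch_size=MAX_FILES_PER_UPLOAD):
    """Filter to image files, sort them in page order, and chunk into uploads."""
    images = [n for n in filenames if Path(n).suffix.lower() in IMAGE_EXTENSIONS]
    images.sort(key=natural_key)
    return [images[i:i + batch_size] for i in range(0, len(images), batch_size)]


files = ["scan10.jpg", "scan2.jpg", "notes.txt", "scan1.PNG"]
print(plan_batches(files))  # prints [['scan1.PNG', 'scan2.jpg', 'scan10.jpg']]
```

Sorting locally is only a convenience; you can still re-order and rotate the thumbnails in the tool after selecting the files.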
Here is an example of the Merge tool with four JPG files selected and ready to be converted into an editable PDF file:
On the right side of the tool is the OCR option; by default, this is not enabled. If you submit your files with this option disabled, the resulting document will simply contain embedded copies of your image files. For this tutorial, enable the option to demonstrate the plain text extraction that OCR provides.
Once your files have been selected and any settings changed, click the "Merge" button, and the OCR tool will convert your image files into clean, plain, editable text.
Here is an example of two files that have been submitted to the Merge tool. The final image shows the plain, editable text that was extracted from the first image and saved as a PDF document: