Auto Proofreading Solution for ePublishing Industry

Almost everyone who has written a journal article or a college assignment has spent more time fitting it to a required format and correcting grammatical errors than writing it. Prolific authors and academics face the same problem on a larger scale: they have to correct and format documents that run to thousands of pages. This process is known as proofreading. It is generally done by hand, with the proofreader annotating each issue, which becomes difficult as the number of pages and documents grows. Modern technology can take on much of this herculean task.



ePublishing Industry



Our client is a publishing, copywriting, and proofreading firm based in Manila, Philippines. They handle fiction, technical, and non-technical literature in multiple languages for both foreign and domestic authors.

On average, our client takes up to 5 days to proofread and reformat a 40,000-word document (the average length of a research paper), which is roughly 89 pages. They handle a combination of research papers and other literature amounting to 12,000 pages per month, with various formatting and layout guidelines. By using our model, the client aims to drastically reduce this large volume of labor, effort, and time. The client approached us to create a model based on samples they provided, drawing on our expertise in AI and image analysis to build a Minimum Viable Product (MVP) that supports their operation, with a focus on formatting and layout corrections.



The foremost challenges relate to how a proofreader handles the various styles and conventions of layout and formatting, such as titling, charts and tables, footnotes, and margins.

  • Acknowledging various formatting and layout methods - Every firm has its own system and methodology for formatting and annotating, and the solution has to accommodate them.

  • Ease of Use - People should be able to pick up an application like this with minimal training.

  • Maintenance of Style - This is a major issue, since most documents have their own specifications, and the layout and format (including font, margins, and alignment) can change drastically from one document to the next. In rare cases, the formatting and layout are themselves used to stylize the document.

  • Catering to different Page Layouts

  • User flexibility - Users should be given flexible methods and integrations so that they can use this program in tandem with their word processor or other related software. It also needs good preference and accessibility settings that accelerate the user’s workflow during proofreading.

  • Need for a Lightweight Program - Since this application would be used on conventional office computers, the AI model and application have to be streamlined to use disk space and RAM efficiently.

  • Maintaining Readability - The layout and formatting should remain highly legible and uncluttered, and errors introduced while modifying the layout have to be kept to a minimum.



Solution summary:

  • Collecting samples from various sources to identify the annotations used for format specifications.

  • Converting these annotations and the format data into image files for easy retrieval and analysis.

  • Training a model on these images to create an intelligent solution.

OL - Odd Length
LL - Loose Line
TL - Tight Line
WL - Incorrect Line Width
BB - Small Word / Incomplete Word
CW - Continuous Words
CChar - Continuous Characters

SWEP - Single Word Ending Paragraph
SWEH - Single Word Ending Heading
TCHyp - Continuous Text Hyphen
HypEPP - Hyphen Ending Page Paragraph
HWH - Hyphen Word Hyphen
ColBA - Incorrect Column Alignment

Similarly, 60+ layout errors are auto-identified.
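To make a couple of these codes concrete, here is a minimal rule-based sketch in Python. It is not the image-based model described in this case study: it works on plain text lines, and the meanings it assigns to SWEP and TCHyp are assumptions inferred from the code names. The `LayoutIssue` class and `find_layout_issues` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LayoutIssue:
    code: str        # error code from the list above, e.g. "SWEP"
    paragraph: int   # 0-based paragraph index
    line: int        # 0-based line index within the paragraph
    text: str        # the offending line


def find_layout_issues(paragraphs):
    """Flag two error types in already-typeset paragraphs.

    `paragraphs` is a list of paragraphs, each a list of typeset lines.
    Assumed meanings (not confirmed by the source):
      SWEP  - the last line of a multi-line paragraph holds a single word
      TCHyp - two or more consecutive lines end with a hyphen
    """
    issues = []
    for p_idx, lines in enumerate(paragraphs):
        # SWEP: only meaningful when the paragraph spans several lines.
        if len(lines) > 1 and len(lines[-1].split()) == 1:
            issues.append(LayoutIssue("SWEP", p_idx, len(lines) - 1, lines[-1]))

        # TCHyp: report the second and later lines in a run of hyphen endings.
        previous_hyphenated = False
        for l_idx, line in enumerate(lines):
            hyphenated = line.rstrip().endswith("-")
            if hyphenated and previous_hyphenated:
                issues.append(LayoutIssue("TCHyp", p_idx, l_idx, line))
            previous_hyphenated = hyphenated
    return issues


sample = [
    [
        "The layout engine breaks para-",
        "graphs into lines and hyphen-",
        "ates long words when",
        "needed.",
    ],
    ["A short paragraph that fits on one line."],
]
for issue in find_layout_issues(sample):
    print(issue.code, issue.paragraph, issue.line, repr(issue.text))
```

On the sample, the first paragraph is flagged once for its one-word last line and once for the second of two consecutive hyphenated line endings. The single-line paragraph is left alone.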


With these challenges in mind, we arrived at the following solution:

To create this “autoproofreading” solution, we had to tackle the aforementioned issues of layout, style, and formatting correction. Using the live data gathered from the samples provided to us, we created image files that capture how the client's layouts and formatting are marked up. The metadata describing the specifics of these annotations (such as line spacing and margin size) is stored in XML documents that accompany the images. The model is trained iteratively until it reaches the desired level of efficiency.
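As an illustration of how an image and its accompanying XML metadata could be read together, here is a minimal Python sketch. The XML schema is a hypothetical one invented for this example: the element names, the attribute names, and the `load_sidecar` helper are all assumptions, not the client's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sidecar for one page image; the element and attribute
# names are illustrative assumptions, not the client's actual schema.
SIDECAR_XML = """
<page image="page_0001.png">
  <layout line_spacing="1.5" margin_top="25.4" margin_left="19.0" unit="mm"/>
  <annotation code="SWEP" x="120" y="840" width="96" height="22"/>
  <annotation code="LL" x="72" y="410" width="468" height="20"/>
</page>
"""


def load_sidecar(xml_text):
    """Return the layout settings and annotation boxes for one page image."""
    root = ET.fromstring(xml_text.strip())
    layout = root.find("layout")
    settings = {
        "image": root.get("image"),
        "unit": layout.get("unit"),
        "line_spacing": float(layout.get("line_spacing")),
        "margin_top": float(layout.get("margin_top")),
        "margin_left": float(layout.get("margin_left")),
    }
    annotations = [
        {
            "code": node.get("code"),
            # Bounding box in pixels: (x, y, width, height).
            "box": tuple(int(node.get(k)) for k in ("x", "y", "width", "height")),
        }
        for node in root.findall("annotation")
    ]
    return settings, annotations


settings, annotations = load_sidecar(SIDECAR_XML)
print(settings)
for annotation in annotations:
    print(annotation["code"], annotation["box"])
```

Keeping the layout specifics in a sidecar file rather than in the image lets the same page image be paired with different style guidelines. That would suit a client who works with many formatting conventions.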



Our proposed model, which uses artificial intelligence for proofreading, will increase the efficiency of layout correction tenfold by automating the process. It also gives the client ease of access, direct digital annotation in place of conventional pen-and-paper methods, and better overall quality in subsequent revisions. Since the process runs in real time, error correction is immediate.

** Standard proofreading time: 4-25 business days for a 500-page book. Now reduced to 6-14 hours (with editing).
** Standard proofreading cost: from $95 (USD) for a word count of 0-500. Now reduced to $11 (USD) for a word count of 0-500.
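The figures quoted above can be put on a common scale with a few lines of arithmetic. The sketch below assumes an 8-hour business day, which the figures do not state, so the speedup range should be read as a rough bound rather than a measured result.

```python
HOURS_PER_BUSINESS_DAY = 8  # assumption: not stated in the figures above

# Figures quoted above for a 500-page book.
manual_hours = (4 * HOURS_PER_BUSINESS_DAY, 25 * HOURS_PER_BUSINESS_DAY)
automated_hours = (6, 14)

# Figures quoted above for the 0-500 word tier, in USD.
manual_cost = 95
automated_cost = 11

cost_reduction = 1 - automated_cost / manual_cost
# Bounds: slowest manual vs fastest automated, and the reverse.
best_case_speedup = manual_hours[1] / automated_hours[0]
worst_case_speedup = manual_hours[0] / automated_hours[1]

print(f"Manual time: {manual_hours[0]}-{manual_hours[1]} working hours")
print(f"Cost reduction: {cost_reduction:.0%}")
print(f"Speedup range: {worst_case_speedup:.1f}x to {best_case_speedup:.1f}x")
```

Under that assumption, the manual range comes to 32-200 working hours and the cost for the 0-500 word tier falls by about 88%.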
