Affiliation and Reference Structuring

Academic papers and literature have a considerable influence on the growth of modern science, industry, and technology. These papers contain valuable information and ideas that should be organized and stored meticulously. They are cross-referential, citing numerous other papers and works, and they are often linked to many authors from different educational institutions.

These affiliations and references need to be consistent because they convey important information about the subject matter, the authors, the nature of the discourse, and other details. Organizing them and keeping them updated in a consistent structure is a tiring but important process.

The Philippines




Content and Data Management Industry



Our client is a major publishing and content management company based in the Philippines, with a global reach and a massive output of literature covering nearly every field. The bulk of their management and publishing work is oriented toward organizing and cataloguing research articles and papers from corporations, institutions, other publishing houses, and sometimes individuals.

We were tasked with resolving the institutional and laboratory affiliations of authors, and with matching and retrieving the bibliographies and references present in various academic documents. Structuring this data also requires a streamlined search function and other tools for quick identification and selection within the database. Using modern Artificial Intelligence technologies, we set out to streamline these wide-ranging problems at the client's request.



After analyzing the problem, we isolated the core challenges the model would face in implementation.
They are listed below:

  • Patterned Search Implementation - A proper search method for structuring has to be developed with all the constraints and taggable parts of the affiliations and references in mind. Otherwise, retrieval operations can become ambiguous. These regex-based operations have to be stored for easier processing.

  • Numerous Reference Patterns - The types and contents of references change constantly, and the patterns currently number more than 30,000. A set of patterns this large is extremely hard to work with, so alternative methods have to be developed.

  • Updating and Maintenance Difficulties - Updating and maintaining these reference and affiliation patterns, even when they are organized as a database, can be a grueling task for users, since the patterns change and there was no previous system of entry. Our tools need to alleviate this problem by integrating a robust entry and update system.
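To make the first and third challenges concrete, the following is a minimal sketch of a stored, updatable set of regex patterns that tags the parts of a reference. It is an illustration under stated assumptions, not the client's actual rule set: the names `PATTERNS`, `register_pattern`, and `tag_reference`, as well as both example patterns, are hypothetical.

```python
import re

# Hypothetical pattern store: pattern name -> compiled regex with named groups.
# The patterns registered below are illustrative only.
PATTERNS = {}

def register_pattern(name, regex):
    """Add a new stored pattern, or update an existing one with the same name."""
    PATTERNS[name] = re.compile(regex)

def tag_reference(text):
    """Return (pattern_name, tagged_parts) for the first stored pattern
    that matches the text, or (None, {}) when nothing matches."""
    for name, pattern in PATTERNS.items():
        match = pattern.match(text)
        if match:
            return name, match.groupdict()
    return None, {}

# Style such as: Smith, J. (2019). Some title. Journal ...
register_pattern(
    "author_year_title",
    r'^(?P<authors>[^()]+?)\s*\((?P<year>\d{4})\)\.\s*(?P<title>[^.]+)\.',
)
# Style such as: [12] A. Lee and B. Cruz, "Some title", Publisher, 2021.
register_pattern(
    "numbered_quoted_title",
    r'^\[(?P<number>\d+)\]\s*(?P<authors>.+?),\s*"(?P<title>[^"]+)".*?(?P<year>\d{4})',
)

name, parts = tag_reference(
    "Smith, J. (2019). Deep parsing of citations. Journal of Examples, 4(2)."
)
print(name, parts)
```

A lookup table like this is easy to update, but with tens of thousands of patterns it becomes slow to scan and hard to maintain, which is why a purely rule-based approach does not scale to the second challenge.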



We arrived at a solution with the project requirements and our own research into the challenges in mind. Its main steps are listed below:

  • The documents are collected, then categorized and partitioned through machine learning analysis to extract the affiliation and reference data in the text.

  • The extracted references and affiliations are then used to build a new, custom machine learning model that can store, match, and retrieve the patterns of these document components.

  • A flexible, probability-based Natural Language Processing method is used to create this ML model, rather than a strictly deterministic one, to make pattern identification and updating more versatile.

  • The model can search on demand for attributes present in any unstructured text by using the custom rule set we create through analysis. This makes it easy to categorize data and entities as needed.

  • Various tests are run to develop the model and to expand its categorization and updating capabilities. The client can then retrieve the affiliations and references in whatever manner they need.
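The difference between probability-based and strictly deterministic matching can be sketched as follows. This is only an illustration of the idea, not the model described above: it swaps in the string-similarity ratio from Python's standard `difflib` module as a stand-in score rather than a trained NLP model, and the `CANONICAL` list, the `normalize` and `best_affiliation` helpers, and the 0.6 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical list of canonical institution names (illustrative only).
CANONICAL = [
    "University of the Philippines Diliman",
    "Ateneo de Manila University",
    "De La Salle University",
]

def normalize(text):
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())

def best_affiliation(raw, candidates=CANONICAL, threshold=0.6):
    """Score the raw string against every candidate and return
    (best_name, score), or (None, score) if the best score is below
    the assumed threshold."""
    scored = [
        (SequenceMatcher(None, normalize(raw), normalize(c)).ratio(), c)
        for c in candidates
    ]
    score, name = max(scored)
    return (name, score) if score >= threshold else (None, score)

# An abbreviated, differently punctuated affiliation still resolves,
# where an exact-match lookup would fail.
print(best_affiliation("Univ. of the Philippines, Diliman"))
```

Because each candidate receives a score instead of a yes-or-no answer, new spelling variants can be handled without writing a new rule for each one, and low-scoring inputs can be routed to a person for review.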



Our model gives the user enough automation that the client can publish and catalog texts with a single click. It can also be used to identify and screen authors and other pertinent information with relative ease, rather than by going through mountains of documents. It can also help authors find papers on request, streamlining their academic research.
