Generating Academic Paper Summaries With AI

Keeping up with the latest research is a vital part of the job for most data scientists. Faced with this challenge myself, I often struggled to maintain a consistent habit of reading academic papers and wondered if I could design a system that would lower the barrier to exploring new research, making it easier to engage with developments in my field without the need for extensive time commitments. Given my long commute to work and an innate lack of motivation to perform weekend chores, an audio playlist that I could listen to while doing both seemed like the obvious option.

This led me to build Scholcast, a simple Python package that creates detailed audio summaries of academic papers. While I had previously built versions using language models, the recent advances in expanded context lengths for Transformers and improved speech synthesis finally aligned with all my requirements.

To build Scholcast, I primarily used OpenAI’s GPT-4o-mini. However, since I’m using the LangChain API to interact with the models, the system is flexible enough to accommodate other models like Claude (via AWS Bedrock) or locally hosted LLMs (for example, via Ollama).

The key components of this package are the following.

Converting PDF to LaTeX

The first step was to convert academic papers back to their original LaTeX format. I initially experimented with open-source packages like PyPDF2, but these tools struggled with complex academic content, particularly papers containing mathematical notation and special symbols. To overcome these limitations, I opted for the Mathpix API, which offers superior PDF to LaTeX conversion capabilities. Instructions for getting a Mathpix API key can be found in the Mathpix documentation.

As of the date of writing this article (Nov 29, 2024), I was unable to use OpenAI’s API to convert PDF to LaTeX with high fidelity. I’ll create a Pull Request if that changes.

Summary Generation and Understanding

This is the core component of the tool, responsible for generating comprehensive paper summaries. The key challenge was determining the appropriate depth of understanding. Ideally, we’d want an understanding equivalent to a detailed reading of the paper, but generating such extensive coverage in audio format, especially for mathematical concepts, proved challenging.

Initial experiments used standard prompts like:

"Provide a clear and concise explanation of the research paper academic_paper.
Include the main research question, the methodology used, key findings, and
the implications of the study"

These prompts generated superficial summaries. For instance, when applied to the seminal paper “Attention Is All You Need,” it produced the following explanation:

As you can observe, while the explanation mentions concepts like Self-Attention and Multi-Head Attention, it fails to cover these topics in any depth. The concept of Positional Encodings is also not mentioned in this version. It’s clear that the LLM is either glossing over or skipping entire concepts in this explanation.

To address this limitation, I developed a multi-step approach. First, I prompted the LLM to create a knowledge graph of the paper’s key concepts, with edges representing their relationships.

Analyze the following academic paper and create a knowledge graph.
List the main concepts as nodes and their relationships as edges.
Format your response as a list of nodes followed by a list of edges:

Nodes:
1. Concept1
2. Concept2
...

Edges:
1. Concept1 -> Concept2: Relationship
2. Concept2 -> Concept3: Relationship
...

It generated the following graph for the paper.

Knowledge graph - Attention is all you need

This graph then served as a roadmap for the Teacher LLM to explain the paper, resulting in notably improved depth.
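Before the graph can act as a roadmap, the LLM's text response has to be turned into something structured. The following is a minimal sketch of one way to do that, assuming the response follows the Nodes/Edges format requested in the prompt. The names `KnowledgeGraph` and `parse_knowledge_graph` and the sample response are hypothetical and are not taken from the Scholcast source.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    nodes: list[str] = field(default_factory=list)
    # Each edge is (source, target, relationship).
    edges: list[tuple[str, str, str]] = field(default_factory=list)


# Matches a numbered list item such as "1. Concept" and captures the text.
_ITEM = re.compile(r"^\s*\d+\.\s*(.+?)\s*$")
# Matches "Source -> Target: Relationship".
_EDGE = re.compile(r"^(.+?)\s*->\s*(.+?)\s*:\s*(.+)$")


def parse_knowledge_graph(response: str) -> KnowledgeGraph:
    """Parse a 'Nodes: ... Edges: ...' response into a KnowledgeGraph."""
    graph = KnowledgeGraph()
    section = None
    for line in response.splitlines():
        header = line.strip().lower()
        if header == "nodes:":
            section = "nodes"
            continue
        if header == "edges:":
            section = "edges"
            continue
        item = _ITEM.match(line)
        if item is None or section is None:
            continue  # skip blank lines, "..." and any stray prose
        text = item.group(1)
        if section == "nodes":
            graph.nodes.append(text)
        else:
            edge = _EDGE.match(text)
            if edge:  # ignore edge lines that do not follow the format
                graph.edges.append((edge.group(1), edge.group(2), edge.group(3)))
    return graph


# Hypothetical response, made up for illustration only.
sample = """Nodes:
1. Self-Attention
2. Multi-Head Attention

Edges:
1. Self-Attention -> Multi-Head Attention: is extended by
"""
graph = parse_knowledge_graph(sample)
```

The parser is deliberately forgiving: lines that do not match the requested format are skipped rather than raising an error, since LLM output does not always follow formatting instructions exactly.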

To further enhance the summaries, I introduced a Student LLM that reads the paper along with the first set of explanations from the Teacher and asks the Teacher LLM clarifying questions.

This interaction led to more detailed explanations of complex concepts.

As you can see, concepts were covered in much more detail, along with fairly interesting follow-up questions from the Student LLM.
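The Teacher and Student exchange can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the Scholcast implementation: each model is represented by a hypothetical `LLM` callable that takes a prompt and returns text, the prompt wording is invented for the example, and the number of question rounds is an arbitrary default.

```python
from typing import Callable

# Hypothetical interface: prompt in, completion out. Any client
# (OpenAI, Bedrock, a local model) could be wrapped to fit it.
LLM = Callable[[str], str]


def teacher_student_dialogue(
    paper: str,
    roadmap: str,
    teacher: LLM,
    student: LLM,
    rounds: int = 2,
) -> list[tuple[str, str]]:
    """Return a transcript of (role, text) pairs."""
    transcript: list[tuple[str, str]] = []

    # Step 1: the Teacher explains the paper, guided by the roadmap.
    explanation = teacher(
        "Explain the paper below, covering every concept in the roadmap.\n\n"
        f"Roadmap:\n{roadmap}\n\nPaper:\n{paper}"
    )
    transcript.append(("Teacher", explanation))

    # Step 2: the Student asks clarifying questions and the Teacher answers.
    for _ in range(rounds):
        so_far = "\n".join(f"{role}: {text}" for role, text in transcript)
        question = student(
            "You are a student reading the paper and the conversation so far. "
            "Ask one clarifying question.\n\n"
            f"Paper:\n{paper}\n\nConversation:\n{so_far}"
        )
        transcript.append(("Student", question))

        answer = teacher(
            "Answer the student's question using the paper.\n\n"
            f"Paper:\n{paper}\n\nQuestion:\n{question}"
        )
        transcript.append(("Teacher", answer))

    return transcript
```

Keeping the models behind a plain callable means the loop does not depend on any particular provider, which matches the goal of swapping models freely.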

Converting the Summary Into Audio/Video Format

This component transforms the generated summary into an audio or video format. For audio conversion, I used OpenAI’s text-to-speech tts-1-hd model, with the “nova” and “echo” voices distinguishing the Teacher and Student roles, respectively. This approach adds variety and structure to the audio presentation.
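The role-to-voice assignment can be sketched as a planning step that runs before any audio is generated. The example below assumes the dialogue is available as a list of (role, text) pairs and that each text-to-speech request accepts at most 4096 characters of input; both are assumptions for this sketch, and `split_text` and `plan_speech` are hypothetical helper names rather than Scholcast functions.

```python
# Role-to-voice mapping described above.
VOICES = {"Teacher": "nova", "Student": "echo"}

# Assumed per-request input limit for the text-to-speech model.
MAX_CHARS = 4096


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text on whitespace into chunks of at most `limit` characters.

    A single word longer than `limit` is kept whole in its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > limit and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def plan_speech(transcript: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Turn (role, text) pairs into ordered (voice, chunk) pairs."""
    plan: list[tuple[str, str]] = []
    for role, text in transcript:
        voice = VOICES[role]
        for chunk in split_text(text):
            plan.append((voice, chunk))
    return plan
```

Each (voice, chunk) pair would then be sent to the text-to-speech model in order, and the resulting audio clips concatenated into a single track.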

For video creation, I opted for a simple yet effective method of combining a single static image with the audio track using the pydub and moviepy packages. This technique results in a basic but functional video format that complements the audio content.

Below is the schematic for the end-to-end workflow:

Schematic for the end-to-end workflow

While not equivalent to an in-depth study, the final output provides comprehensive coverage that effectively serves as a substitute for an initial read-through.

Conclusion

You can find the source code for Scholcast here and refer to the README for instructions on how to install and use Scholcast. You can also check out the Scholcast YouTube channel for summaries of a range of interesting papers on topics from LLMs to optimization and ML algorithms.