Bridging Clarity and Creativity: Using Generative AI to Create Graphics for Scientific Articles

Scientific articles often depend on visuals to effectively communicate complex ideas and data. From intricate diagrams to compelling visualizations, the right graphic can make or break the impact of a publication. This project explores the use of Generative AI to create graphics tailored for scientific articles, balancing aesthetic appeal with precision and clarity.

Update 20th of January 2025: I have deployed the project as a web app. See the update here: SciViz - A Web App for Gen AI Science Visuals

The Idea Behind the Project

Recently, I had the opportunity to visit the talented team at Proem.ai, whose mission is to make scientific research widely accessible. Accessibility, in this context, goes beyond easy access; it involves creating engaging interfaces that manage aspects such as speed, relevance, and difficulty, with aesthetics playing a crucial role. Compelling visuals can drive interest and overcome the hesitation some may feel toward scientific texts.

During my visit, I was asked how I would approach creating visuals for scientific research. This sparked an idea: a demo application that takes a link to a scientific paper, uses AI to read and understand the content, and generates fitting images and videos based on the subject matter. I envision these visuals being part of an app—a personalised and aesthetically pleasing “Instagram for scientific papers,” or a digital scientific magazine made just for you.

Guidelines and Approach

To guide this project, I defined the following principles:

  • Aesthetics: The visuals must compel the user to engage, fit the individual article, and create a cohesive visual identity for the app. Instead of photorealism, I opted for graphics that elevate the text without cheapening it, adhering to a strict design language.

  • Medium: The visuals should enhance, not distract from, the science, which remains the focal point.

  • Automation: With thousands of papers published daily, manual creation is impractical. The entire pipeline must be automated using AI.

  • Speed: To maintain engagement, the visuals must be generated quickly, requiring computational efficiency.

Aesthetics and Medium

Creating full videos for each paper is computationally expensive and risks being overly distracting. Instead, I opted for images with depth and subtle motion — a solution that maintains engagement without overshadowing the science. For the design language, I drew inspiration from:

  • Vector Art

  • Art Deco

  • Flat Design

  • Minimalism

  • Pop Art

  • Bauhaus

  • Swiss Design

These styles align well with scientific content as they prioritize clarity, simplicity, and striking visuals. Together, they create a cohesive and sophisticated aesthetic akin to high-end magazines. Still images populate the main feed, while individual papers reveal motion-enhanced visuals for a dynamic yet tasteful experience.

Automation and Efficiency

The pipeline integrates LLMs, diffusion models, depth mapping, and automated video post-processing to streamline the workflow. By focusing on computationally inexpensive techniques, the solution balances speed and quality, making it scalable for large volumes of papers.

Pipeline Workflow

The workflow is implemented in ComfyUI.

  1. User Input: A link to a scientific paper or article, such as those from Quanta Magazine, is provided.

  2. LLM Processing: An LLM reads the paper, extracts the main subject, and writes a suitable image prompt.

  3. Image Generation: A diffusion model generates an image based on the prompt.

  4. Video Creation: The still image is transformed into a loopable 3D video, adding motion and depth.
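
To make the flow concrete, here is a minimal plain-Python sketch of steps 1 and 2 and the hand-off to step 3. The real workflow is wired together as ComfyUI nodes; the function names, the gpt-4o model choice, the style instruction, and the placeholder URL below are illustrative assumptions, not the exact node configuration.

```python
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def fetch_article_text(url: str) -> str:
    """Download the article and reduce it to plain paragraph text."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))

def write_image_prompt(article_text: str) -> str:
    """Ask the LLM to extract the main subject and write a diffusion prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; any capable chat model works
        messages=[
            {
                "role": "system",
                "content": "You summarize scientific articles and write one "
                           "image prompt in a flat, minimalist design language.",
            },
            {"role": "user", "content": article_text[:12000]},  # crude context cap
        ],
    )
    return response.choices[0].message.content

# Steps 1-2: from article link to image prompt. The prompt then goes to the
# diffusion model (step 3), and the still image to the motion stage (step 4).
prompt = write_image_prompt(fetch_article_text("https://www.quantamagazine.org/..."))  # placeholder URL
```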

At the bottom of this page, you’ll find a video showcasing the full workflow I built in ComfyUI.

User Feed

Below are images generated from articles, showcasing examples of what a user might see in their feed.

Future Directions

This project is a work in progress, with many exciting possibilities for further development:

  • Motion Enhancements: Exploring additional motion types, such as cinemagraphs or parallax effects using phone gyroscopes.

  • Aesthetic Exploration: Experimenting with more styles and adding an extra LLM to select the best-generated images.

  • Pipeline Optimization: Automating video compilation to streamline the presentation, and enabling LLMs to select the most fitting design language for each paper.

Closing Thoughts

This project demonstrates the potential of Generative AI to transform how scientific research is presented, making it more engaging and accessible. By bridging clarity and creativity, AI-driven visuals can empower researchers and readers alike. Stay tuned for updates as I refine this functionality and add more examples of generated visuals.

12 December 2024

From Text to Visuals: The Full Workflow

Update 20th of January 2025: I have deployed the project as a web app. See the update here: SciViz - A Web App for Gen AI Science Visuals
This also meant an extensive overhaul of the code to make it run outside of ComfyUI, along with writing a fair amount of new Python. I also moved the image generation from Leonardo.ai to running inference in my own cloud (Baseten) with my own diffusion model, and rewrote and simplified the API call to OpenAI.
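
For a rough idea of what the new setup looks like, here is a minimal sketch of calling a diffusion model deployed on Baseten. The model ID, request payload, and response shape are placeholders; they depend entirely on how the deployed model's predict function is written.

```python
import base64
import os

import requests

# Placeholder model ID; Baseten exposes each deployed model at a URL like this.
BASETEN_URL = "https://model-abc123.api.baseten.co/production/predict"

def generate_image(prompt: str) -> bytes:
    """Send the LLM-written prompt to the deployed diffusion model."""
    resp = requests.post(
        BASETEN_URL,
        headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
        json={"prompt": prompt},
        timeout=120,
    )
    resp.raise_for_status()
    # Assumes the model's predict() returns base64-encoded image bytes.
    return base64.b64decode(resp.json()["image"])
```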

You can, however, still see the old ComfyUI workflow below.

Here are the key technologies powering the pipeline:

  • ComfyUI: The backbone for organizing the node-based workflow, running locally on my M2 MacBook.

  • GripTape: Handles the API calls to offload some of the compute:

  • ChatGPT: Web-scrapes the article, summarizes the main subject, and generates a fitting image prompt.

  • Leonardo.ai: Produces the AI-generated image via diffusion.

  • DepthFlow (by Tremeschin/BrokenSource), integrated into ComfyUI via akatz-ai’s implementation, transforms static 2D images into 2.5D videos with motion and parallax, adding depth and life to the visuals.
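
DepthFlow does this properly on the GPU with a real projection; as a toy illustration of the underlying depth-displacement idea, the sketch below shifts each pixel horizontally in proportion to a precomputed depth map, sweeping the shift sinusoidally so the clip loops seamlessly. This is a conceptual sketch, not DepthFlow's actual API.

```python
import cv2
import numpy as np

def parallax_frames(image_path: str, depth_path: str, n_frames: int = 60):
    """Toy 2.5D parallax: displace pixels by depth, swept so the loop closes."""
    img = cv2.imread(image_path)
    # Depth map assumed precomputed (e.g. by a monocular depth estimator)
    # and the same size as the image; white = near, black = far.
    depth = cv2.imread(depth_path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    frames = []
    for t in range(n_frames):
        # Sine sweep: the last frame flows back into the first (loopable).
        amplitude = 8.0 * np.sin(2 * np.pi * t / n_frames)
        map_x = (xs + amplitude * (depth - 0.5)).astype(np.float32)  # near pixels move more
        frames.append(cv2.remap(img, map_x, ys, cv2.INTER_LINEAR))
    return frames
```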

Links to the technologies used:

GripTape: https://github.com/griptape-ai/ComfyUI-Griptape

DepthFlow: https://github.com/BrokenSource/DepthFlow

DepthFlow (ComfyUI implementation by akatz-ai): https://github.com/akatz-ai/ComfyUI-Depthflow-Nodes

The workflow runs smoothly on my M2 MacBook, with the API calls offloading some of the heavy lifting. Check out the video above to see it all in action!
