Examples

Summarize an EPUB

Here is an example of how you might make a summary of an EPUB book. It works by:

  • splitting each chapter into overlapping blocks of text
  • summarizing each block using a template
  • squashing the summaries into a single summary
  • using another prompt to clean up the summary
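
The first and third steps above can be sketched in plain Python. This is only an illustration, not the tool's implementation: it assumes whitespace-separated words stand in for model tokens, and the names `split_with_overlap` and `squash` are hypothetical helpers invented for this example.

```python
def split_with_overlap(tokens, n, overlap):
    """Split tokens into blocks of at most n items; neighbors share `overlap` items."""
    if n <= 0 or not 0 <= overlap < n:
        raise ValueError("need n > 0 and 0 <= overlap < n")
    step = n - overlap  # how far each new block advances
    blocks = []
    for start in range(0, len(tokens), step):
        blocks.append(tokens[start:start + n])
        if start + n >= len(tokens):  # this block reached the end of the text
            break
    return blocks


def squash(summaries):
    """Join per-block summaries into one text, separated by blank lines (an assumption)."""
    return "\n\n".join(summaries)


# Words stand in for tokens here; a real splitter would likely use a tokenizer.
words = "one two three four five six seven eight nine ten".split()
blocks = split_with_overlap(words, n=4, overlap=1)
for block in blocks:
    print(" ".join(block))

# Stand-in for the per-block summaries an LLM would return.
fake_summaries = [f"summary of block {i}" for i in range(len(blocks))]
print(squash(fake_summaries))
```

The overlap means the last item of one block reappears at the start of the next, so a sentence that straddles a boundary is seen in full by at least one summarization call.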

Here's the full program in a Prompterfile:

load test.epub
select "block_tag like 'chapter%'"
transform clean-epub html-to-md token-split --n=5000
complete summarize-block.task
squash
complete cleanup-summary.task
retag summary-{{block_tag}}.md
write

Here is the first task, which summarizes a single block of text:

summarize-block.task

Summarize the following block of text:

{{content}}

Do this in 250 words or fewer using markdown formatting. Focus on adding bullet lists.

Here is the second task, which cleans up the "squashed" summaries of all the blocks:

cleanup-summary.task

This is a summary of a chapter that was constructed from overlapping chunks of text from a longer work:

{{content}}

Summarize it into 250 words or fewer.
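
Both tasks rely on the {{content}} placeholder being replaced with the text of the current block before the prompt is sent. Here's a minimal sketch of that substitution, assuming simple name-for-value replacement. The `fill_template` function is a hypothetical stand-in for illustration; the tool itself may well use a full template engine such as Jinja.

```python
import re


def fill_template(template, **values):
    """Replace {{name}} placeholders with values; unknown names are left untouched."""

    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    # Allow optional spaces inside the braces, e.g. {{ content }}.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


task = "Summarize the following block of text:\n\n{{content}}\n\nDo this in 250 words or fewer."
prompt = fill_template(task, content="Chapter 1 text goes here.")
print(prompt)
```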

Create an audiobook from an ebook

To create an audiobook from an ebook:

load example.epub
select "block_tag like '%chapter%'"
transform clean-epub html-to-md token-split --n=5000 --overlap=0
complete convert-to-narrative.md --persona=reader.md
squash
speak

This will load an ebook, transform it into Markdown, split it into blocks, convert the blocks into a narrative, and then use the speak command to generate an audio file.

Here are the prompt and the persona:

convert-to-narrative.md

You are tasked with creating a compelling script that will be spoken aloud, based on the following text:

{{block}}

Replace any markdown formatting elements with spoken words.

If there is a section header, replace the hash marks (however many there are) with an appropriate transitional phrase. For example, "Let's consider...", or "Now, let's talk about...". Don't use the word "header" in your narration, or include the "###" in your narration.

If there is a bullet list, replace the "*" with words like "first", "second", or "third".

If an element is bolded or highlighted in some way, say something like "Here's a really important point."

If there is a link, just read the name of the link aloud.

Here's the persona. Note how specific it is about reading the text as-is. Without that instruction, the model tends to hallucinate.

reader.md

You read technical works aloud to convert them to audiobooks. You strive to follow the exact text you're reading with as little variation as possible, making changes only when they are absolutely essential. Otherwise, you speak the original text exactly as the author presents it. You might make the occasional exception for things in the text that might not translate to audio, such as a code snippet, chart, or graph.

This example uses the mshibanami/GitHubTrendingRSS project to convert an RSS feed (in this case, RSS 2.0) of GitHub trending projects into an audio file. Here's how it works:

  • load the feed from the "All Languages" feed on GitHub Trending RSS
  • transform the feed into JSON using the feed-to-abridged-json command (there are several ways to summarize the feed data)
  • complete a prompt that converts the JSON into a markdown file
  • speak to make the audio file

Here's an example of the output of the feed-to-abridged-json command when run on the feed:

[
    {
        "title": "langgenius/dify",
        "link": "https://github.com/langgenius/dify",
        "summary": "<p>Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.</p><hr /><p><img alt=\"cover-v5-optimized\" src=\"https://github.com/langgenius/dify/assets/13230914/f9e19af5-61ba-4119-b926-d10c4c06ebab\" /></p> \n<p align=\"center\"> \ud83d\udccc <a href=\"https://dify.ai/blog/introducing-dify-workflow-file-upload-a-demo-on-ai-podcast\">Introducing Dify Workflow File Upload: Recreate Google NotebookLM Podcast</a> </p> \n<p align=\"center\"> <a href=\"https://cloud.dify.ai\">Dify Cloud</a> \u00b7 <a href=\"https://docs.dify.ai/getting-started/install-self-hosted\">Self-hosting</a> \u00b7 <a href=\"https://docs.dify.ai\">Documentation</a> \u00b7 <a href=\"https://udify.app/chat/22L1zSxg6yW1cWQg\">Enterprise inquiry</a> </p> \n<p align=\"center\"> <a href=\"https://dify.ai\" target=\"_blank\"> <img alt=\"Static Badge\" src=\"https://img.shields.io/badge/Product-F04438"
    },
    ...
]
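
To give a feel for what a transform like this does, here's a rough sketch that reduces an RSS 2.0 feed to the same three keys shown above. It is not the actual feed-to-abridged-json implementation. The `feed_to_abridged` function and the sample feed are made up for illustration, and mapping the RSS description element to "summary" is an assumption.

```python
import json
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document; a real feed would be fetched from a URL.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>example/repo</title>
      <link>https://example.com/example/repo</link>
      <description>&lt;p&gt;A made-up project.&lt;/p&gt;</description>
      <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def feed_to_abridged(xml_text):
    """Keep only title, link, and summary for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        entries.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                # Assumption: the RSS <description> becomes the "summary" key.
                "summary": item.findtext("description", default=""),
            }
        )
    return entries


entries = feed_to_abridged(SAMPLE_RSS)
print(json.dumps(entries, indent=4))
```

Dropping fields such as the publication date keeps the JSON small, which matters because the whole result is passed to the prompt in the next step.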

Here's the Prompterfile:

load https://mshibanami.github.io/GitHubTrendingRSS/weekly/all.xml
transform feed-to-abridged-json
complete summarize-trending-repos.task
speak

Here's the summarize-trending-repos.task task:

summarize-trending-repos.task

The prompt goes here

Break an EPUB into chunks and compute embeddings

This Prompterfile shows how to break an EPUB into chunks of ~500 words and compute their embeddings. The embeddings are saved in a CSV file and the chunks are saved in a JSON file.

# Set filename variable that excludes an extension
set FN my-ebook
load {{FN}}.epub
select "block_tag like 'ch%.html'"
transform clean-epub html-to-md
transform token-split --n=500 --overlap=0
export --fn=out-{{ FN }}.json
embed --fn=out-{{ FN }}.csv

Using Jinja in a Prompterfile

You can use Jinja template constructs to add more complex logic to a Prompterfile. For example, this Prompterfile uses a Jinja loop over a list of durations to generate a series of tasks, each of which summarizes the squashed text to a different length:

load data/source/*.html
select "block_tag like '%-ch%'"
transform strip-attributes extract-headers
complete task-summarize-block.txt
retag gist
squash --tag=squashed
{% for duration in ['30-seconds', '2-minutes'] %}
   checkout squashed
   # Set an environment variable to set context that can be used in the prompt
   set DURATION {{duration}}
   complete task-get-the-gist-duration.txt --context=data/metadata.yaml --model=gpt-4o
   retag gist-{{duration}}
   speak --speed=1.2
{% endfor %}
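
To see what the loop produces, it can help to write out the expansion by hand. The sketch below mimics it in plain Python with str.format rather than Jinja, and it assumes the Prompterfile is rendered as a template before any command runs. The names `LOOP_BODY` and `expand_loop` are hypothetical and exist only for this illustration.

```python
DURATIONS = ["30-seconds", "2-minutes"]

# The commands inside the {% for %} block, with {duration} marking the loop variable.
LOOP_BODY = [
    "checkout squashed",
    "set DURATION {duration}",
    "complete task-get-the-gist-duration.txt --context=data/metadata.yaml --model=gpt-4o",
    "retag gist-{duration}",
    "speak --speed=1.2",
]


def expand_loop(durations, body):
    """Repeat the loop body once per duration, substituting the loop variable."""
    return [line.format(duration=d) for d in durations for line in body]


expanded = expand_loop(DURATIONS, LOOP_BODY)
print("\n".join(expanded))
```

Each pass starts with checkout squashed, so both durations are generated from the same squashed text rather than from the output of the previous pass.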

Using retag to change filenames