Description
Developing the Distroid Curator Algorithm (DCA) to help build ranked feeds based on the sensemaking markers
In this pub, I cover my early work developing an LLM-based tool to automate assessing items curated in the Distroid Catalogue, based on the sensemaking markers in the DCA.
I used the items curated in Distroid Digest Issue 44 again for this pub.
You can find the ratings in the embed below.
I created a few-shot prompt template for each marker with LangChain and Cohere’s Command R model (C-LLM). I used LangChain because it provides good functionality for creating prompt templates.
The few-shot prompt template contained the following parts (sketched in code after this list):
a prefix including instructions describing how the marker is defined, how to assess the marker, and additional guidance on creating an output;
two examples, each containing key-value pairs for the title, summary, rating, and query (i.e., the question to be answered), showing how an item’s title and summary are rated for the marker; and
a suffix including the rating and reasoning for the item.
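Below is a minimal sketch of how such a template could be assembled with LangChain’s FewShotPromptTemplate. The marker text, example items, and variable names are abbreviated placeholders rather than the exact prompts I used.

from langchain.prompts import FewShotPromptTemplate, PromptTemplate

# How each few-shot example (title, summary, query, rating) is rendered.
example_prompt = PromptTemplate(
    input_variables=["title", "summary", "query", "rating"],
    template="Title: {title}\nSummary: {summary}\nUser: {query}\nRating: {rating}",
)

# Two abbreviated examples; the real ones use the full titles and summaries.
examples = [
    {
        "title": "Hyacinth Weekly Periodical #1",
        "summary": "Hyacinth Audits has launched a serialisation of posts on web3 security topics...",
        "query": "Given the title and summary above, how informative is Hyacinth Weekly Periodical #1, from a scale of 1 to 5, with 1 being Very uninformative, and 5 being Very informative?",
        "rating": "4",
    },
    {
        "title": "Unmasking DAOs",
        "summary": "With the emergence of blockchain-based projects that do not seek to (fully) decentralize their governance...",
        "query": "Given the title and summary above, how informative is Unmasking DAOs, from a scale of 1 to 5, with 1 being Very uninformative, and 5 being Very informative?",
        "rating": "4",
    },
]

# Prefix: marker definition, Likert scale, and additional guidance (abbreviated here).
prefix = (
    "Context\n--------------------\n"
    '1. Informative, as defined here, is "The content improved my understanding of a topic."\n'
    "Informative is measured by a 5-point Likert scale from 1 (Very uninformative) to 5 (Very informative).\n"
    "Please provide an explanation in the Reasoning section for why you gave a particular rating."
)

# Suffix: the item to be rated, followed by the fields the model should fill in.
suffix = "Title: {title}\nSummary: {summary}\nUser: {query}\nRating:\nReasoning:"

informative_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["title", "summary", "query"],
    example_separator="\n\n",
)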
I tried a couple of different prompts until I obtained a satisfactory result, where the C-LLM’s rating and its reasoning for the rating were sensible.
You can find an example prompt for the informative marker in the appendix.
After creating the few-shot prompt template, I applied it by calling the C-LLM with the template and asking it to rate all the items in Distroid Digest Issue 44.
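For context, here is a rough sketch of that call, continuing the template sketch above. It assumes the langchain-cohere integration package and a hypothetical issue_44_items list of dicts with title and summary fields; it is not the exact script I ran.

from langchain_cohere import ChatCohere  # assumes the langchain-cohere package is installed

llm = ChatCohere(model="command-r", temperature=0)

raw_outputs = []
for item in issue_44_items:  # hypothetical list of {"title": ..., "summary": ...} dicts
    query = (
        f"Given the title and summary above, how informative is {item['title']}, "
        "from a scale of 1 to 5, with 1 being Very uninformative, and 5 being Very informative?"
    )
    prompt = informative_prompt.format(
        title=item["title"], summary=item["summary"], query=query
    )
    response = llm.invoke(prompt)
    raw_outputs.append({"title": item["title"], "output": response.content})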
Unfortunately, the C-LLM did not follow the expected format for the output (i.e., the rating and reasoning).
This made it hard to parse the output and save it in a structured format later on.
Additionally, the C-LLM created a new rating value, 3.5, for Informative for How Bluesky works – the network components, disregarding the instructions in the prompt.
Lastly, the reasoning for some of the ratings was suspect.
You can find the unstructured outputs from the C-LLM in the table below.
I compared my (human) ratings with the C-LLM’s (machine) ratings.
Our ratings were the same thirty times (47%) and different thirty-five times (53%).
The human and machine ratings had a Pearson correlation coefficient of 0.72, indicating there is a strong positive relationship between the human ratings and machine ratings.
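For reference, the coefficient can be computed with scipy.stats.pearsonr over the paired ratings. The lists below are placeholder values, not the actual Issue 44 data (the real lists hold one value per comparison, 65 in total).

from scipy.stats import pearsonr

# Placeholder ratings; the real lists contain 65 paired human/machine values.
human_ratings = [4, 3, 5, 4, 2, 4, 3, 5]
machine_ratings = [4, 4, 5, 3, 2, 4, 3, 4]

r, p_value = pearsonr(human_ratings, machine_ratings)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")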
I will try other tools, such as Haystack for LLM orchestration and format enforcers like LM Format Enforcer and Mirascope (also for prompt development), to constrain the C-LLM to produce structured output.
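As one illustration of what such enforcement could look like, here is a minimal sketch using LangChain’s PydanticOutputParser; the schema and field names are my own and are not part of the DCA yet.

from pydantic import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser

class MarkerRating(BaseModel):
    """Hypothetical schema for a single marker assessment."""
    rating: int = Field(ge=1, le=5, description="Likert rating from 1 (worst) to 5 (best)")
    reasoning: str = Field(description="Short explanation for the rating")

parser = PydanticOutputParser(pydantic_object=MarkerRating)

# The format instructions can be appended to the prompt suffix so the model is
# asked to answer as JSON matching the schema; parser.parse() then either
# returns a MarkerRating or raises when the output does not conform.
print(parser.get_format_instructions())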
Additionally, I will check whether the LLM’s output (rating and reasoning) differs between providing an item’s summary and providing its full text.
I will also add a reasoning key-value pair to the examples to give some guidance on providing an explanation for a rating.
Appendix: Example prompt for the Informative marker

Context
--------------------
Please use the following list below to help formulate your response:
1. Informative, as defined here, is "The content improved my understanding of a topic."
2. Informative, as a signal, "Signal the informative quality of the work."

Informative is measured by the following 5-point Likert Scale:
1 = Very uninformative
2 = Uninformative
3 = Neither
4 = Informative
5 = Very informative

Additional Guidance
--------------------
Please provide an explanation in the Reasoning section for why you gave a particular rating.
Use the examples provided below as references for how to rate for Informative.

Examples
--------------------
Here are examples of how a piece of content is rated for its Informative quality:

Title: Hyacinth Weekly Periodical #1
Summary: Hyacinth Audits has launched a serialisation of posts on web3 security topics to raise awareness and maximise security for users. This week, it highlighted some recent exploits and hacks, including those of Socket, Rosa Finance and Radiant Capital. Socket, an interoperability protocol, was hacked for more than $1m from five wallets, taking advantage of users who gave unlimited approval for Socket contracts in their wallet. Rosa Finance and Radiant Capital were both exploited due to a known rounding issue in the Compound/Aave v2 codebase. Over 1.9k ETH ($4.5m) was lost in the Radiant Capital hack.
User: Given the title and summary above, how informative is Hyacinth Weekly Periodical #1, from a scale of 1 to 5, with 1 being Very uninformative, and 5 being Very informative?
Rating: 4

Title: Unmasking DAOs — How to comprehensively analyze the decentralization of “DAOs”
Summary: With the emergence of blockchain-based projects that do not seek to (fully) decentralize their governance, new terminologies have been introduced, in particular, that of decentralized applications (Dapps). Unfortunately, the terminologies of Dapps and DAOs are often used synonymously. This presents a major problem in the blockchain industry as many Dapps misleadingly refer to themselves as DAOs. In doing so, they readily adopt the positive idealistic attributes of a DAO for marketing purposes without offering (fully) decentralized governance structures. To sufficiently understand the degree of decentralization of a Dapp (potentially a DAO), we need to take a closer look at the respective underlying governance mechanisms.
User: Given the title and summary above, how informative is Unmasking DAOs — How to comprehensively analyze the decentralization of “DAOs”, from a scale of 1 to 5, with 1 being Very uninformative, and 5 being Very informative?
Rating: 4

User: Given the title and summary above, how informative is The State of Private Voting in Ethereum, from a scale of 1 to 5, with 1 being Very uninformative, and 5 being Very informative?
Rating:
Reasoning: