Reflecting on piloting the V0.9 questionnaire and the results we obtained
As part of our work on the DGSF project, we are developing an assessment framework to benchmark and compare DAOs.
In pursuit of that goal, we have developed the DAO Index, a set of principles that we believe can provide a reference point for working towards the Ideal DAO (ID), operationalized as a questionnaire to assess adherence to the principles by evaluating real-world DAO practices.
In this article, we describe our work piloting the DAO Index principles and questionnaire, as of Version 0.9 (V0.9).
The main goal of the pilot was to determine the general applicability of the principles to DAOs, receive feedback on the questionnaire’s design and our selected principles, and gather lessons to guide future development of the DAO Index.
We invite feedback on our work. Please leave your comments here on PubPub or Hypothesis (public channel), or send an email to [email protected].
Term | Definition |
---|---|
ID | Ideal DAO |
V0.9 | Version 0.9 |
DAO | Decentralized Autonomous Organization |
Before assessing DAOs, we needed to determine an appropriate basis for benchmarking and comparing DAOs.
We settled on principles (as of V0.9), “[a] basic idea or rule that guides behavior, or explains or controls how something happens or works” [1], because principles:
influence and constrain the governance and technical choices an organization can make [2][3],
provide a normative logic for social governance [4],
provide a meeting place for academia, industry, and society to agree on how DAOs should be defined and organized [5], and
provide a blueprint for how to organize [6].
We believe that all DAOs should be characterized by the DAO Index principles.
Our choice of principles is influenced by our view (as of V0.9) of DAOs as organizations that are 1) ideologically-driven, 2) self-governed through social and algorithmic governance mechanisms, and 3) self-infrastructured through a combination of blockchain technologies and other decentralized technologies [7][4][8][9].
We believe the principles should help address issues with part 1, and provide appropriate constraints for parts 2 and 3 [10][11][12].
The principles also help us conceptualize how an ID, a reference point for how DAOs should be governed and operated (i.e., a standard for DAO practices), could be described, which real-world DAOs can then use as a guide for their own development.1 We believe the ID can also serve as a base for describing IDs in specific contexts (e.g., Decentralized Finance (DeFi) and Decentralized Science (DeSci)).
Additionally, most of the issues associated with DAOs generally cover parts 1 and 2, such as concerns with plutocracy, rather than issues with self-infrastructuring (it is also easier to find signals here by simply focusing on on-chain activity) [13].
We reviewed seventy-five (75) articles from DAO-related academic and grey literature to develop items (or criteria) relevant to assessing DAOs (primarily, identifying signals of good and bad practices), and to identify principles that could orient DAO practices towards the ID [3][2][14][15][4].
We searched academic search engines such as Semantic Scholar for insights from academic researchers, and popular online platforms for Web3 discourse such as Twitter, Mirror, and Substack for knowledge and insights from real-world practitioners and industry researchers in the grey literature.
We believe this will allow for the principles to bridge understanding between academia, industry, and society.
You can find our literature collection in the table below.
The principles, as of V0.9, and the description and rationale for each principle, are described in the table below.
Initially, the questionnaire was developed by generating a set of items (used interchangeably with questions).
Generally, questions were generated by drawing insights from our literature collection, which informed us of good and bad practices for DAOs. The same applies to the principles.
After developing our set of questions, we grouped them under the principles.
The items are currently constructed as pass/fail items, with desired outcomes leading to pass (yes, partial), and undesired outcomes leading to fail (no, does not answer).
The questionnaire can be considered an audit tool for ensuring adherence to the DAO Index principles [6]. Through our audits (referred to as assessments here), we hope to guide 1) values-informed decision-making by DAO operators, and 2) tool development by DAO infrastructure or tooling developers to make working towards an ID technically feasible [6].
Additionally, by operationalizing the principles through a questionnaire, we are pushed towards “more concretely defin[ing] [our] values and principles in terms of measurable actions, so [our principles] can be readily assessed and audited” [6].
The questionnaire comprises forty-five (45) items in eight (8) dimensions (the dimensions here being the principles), with the following item count per principle.
Dimension | Items |
---|---|
BSP | 13 |
PDC | 11 |
CPB | 2 |
D2D | 2 |
IDT | 7 |
CBC | 3 |
OT | 4 |
HCAG | 3 |
The model for how the principles are operationalized as a questionnaire is described in the graphic below.
You can find the questionnaire, and the rationale for each item, in the table below.
The data dictionary for the questionnaire data fields can be found in the table below.
Fields | Description | Example |
---|---|---|
Principle | An organizing principle for DAOs, specific to the particular version of the DAO Index in use | |
Indicator | An area indicating where a DAO is turning, or can turn, the principle into practice | |
Question-ID | The identification number of the question, per principle. | |
Question | A yes/no question written in text to assess whether a DAO’s practices adhere to or advance a principle | |
Plain-English | A response to the Question in English (or natural language). | Yes/No/Partial/Does not answer the question, N/A |
Points | The numeric score received for the question, corresponding to the Plain-English response. | |
Explanation | A brief explanation of why the DAO received a certain Plain-English and Point response | |
Sources | The sources referred to in drafting a response | |
Author | The rater(s) or respondent(s) to the Question | |
Notes | Respondent(s) notes regarding a question | |
Search Difficulty | The difficulty in searching for documents to reference to respond to a Question | |
Documents | The documents cited in a Response | |
Snippets | The snippets cited in a Response |
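To make the data dictionary concrete, here is a minimal sketch of how a single row of the questionnaire table could be represented in code. The `QuestionnaireRecord` name, the field types, and the defaults are our own illustrative assumptions and do not reflect the actual Airtable schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class QuestionnaireRecord:
    """One row of the questionnaire table, mirroring the data dictionary above.
    The class name and types are illustrative assumptions, not the Airtable schema."""
    principle: str                # e.g., "BSP"
    indicator: str                # area where the DAO is or can turn the principle into practice
    question_id: str              # e.g., "BSP-02"
    question: str                 # yes/no question text
    plain_english: str            # "Yes", "No", "Partial", "Does not answer", or "N/A"
    points: Optional[int]         # numeric score; None when the item is N/A
    explanation: str = ""         # why the DAO received this response
    sources: list[str] = field(default_factory=list)    # sources referred to in drafting the response
    author: str = ""              # rater(s) or respondent(s)
    notes: str = ""               # respondent notes regarding the question
    search_difficulty: str = ""   # how hard it was to find supporting documents
    documents: list[str] = field(default_factory=list)  # documents cited in the response
    snippets: list[str] = field(default_factory=list)   # snippets cited in the response
```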
A graphical representation of how the fields in the questionnaire table relate to each other is described below.
Through exploratory case studies, we sought to see whether the principles could be applied to real-world DAOs and serve as a basis for benchmarking and comparing DAO practices.
We applied the questionnaire to eleven (11) DAOs, described in the table below.
For more details on a DAO’s characteristics, please refer to its DeepDAO profile page.
As of Version 0.9, responses are scored using the following method.
The scoring breakdown (Plain English response = corresponding numerical score) is described below.
Plain-English Response | Numerical Score |
---|---|
Yes | 100 |
Partial | 50 |
No | 0 |
N/A | Points redistributed to other items in the principle |
Does not answer | 0 |
If a DAO positively answers a question, then a Yes is appropriate [16].
If a DAO negatively answers a question, then a No is appropriate [16].
A Partial is appropriate when a DAO positively answers the question, but the practices do not fully answer the question [16].
N/A is appropriate when the question does not apply to that particular DAO [16].
Does not answer is appropriate when the question is applicable to the particular DAO, but the DAO does not provide enough information to positively or negatively answer the question [16].
The questionnaire penalizes DAOs if there is not enough information to answer a question (refer to Does not answer response).
The score is calculated by adding the points accrued for each question and dividing by the maximum possible points across applicable questions (i.e., 100 multiplied by the number of applicable questions).
Every question has a maximum of 100 points.
The overall score is produced from totaling the points received for every question.
Currently, there are no weights applied to scores per principle and scores per item, nor are scores required to have the same total.
We used the overall score to generate a rating for DAOs.
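As an illustration of this scoring method, the sketch below maps Plain-English responses to points, skips N/A items, and normalizes against the maximum possible points. The letter-grade bands at the end are hypothetical placeholders, since the actual rating thresholds are not specified in this section.

```python
# Minimal scoring sketch. Assumptions: responses use the Plain-English labels from the
# table above, and the letter-grade bands are hypothetical, not the actual V0.9 ratings.
POINTS = {"Yes": 100, "Partial": 50, "No": 0, "Does not answer": 0}  # "N/A" items are excluded

def score_assessment(responses: list[str]) -> dict:
    """Compute total points and a normalized percentage for one DAO assessment."""
    applicable = [r for r in responses if r != "N/A"]   # N/A items drop out of the denominator
    total_points = sum(POINTS[r] for r in applicable)   # overall score: simple sum of points
    max_points = 100 * len(applicable)                  # 100 points per applicable question
    percent = 100 * total_points / max_points if max_points else 0
    return {"total_points": total_points, "max_points": max_points, "percent": round(percent, 1)}

def rating(percent: float) -> str:
    """Hypothetical letter-grade mapping, for illustration only."""
    bands = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]
    return next(grade for cutoff, grade in bands if percent >= cutoff)

example = ["Yes", "Partial", "No", "N/A", "Does not answer", "Yes"]
print(score_assessment(example))  # {'total_points': 250, 'max_points': 500, 'percent': 50.0}
print(rating(50.0))               # 'F' under the hypothetical bands
```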
You can find the data sources for the materials (used interchangeably with evidence) cited in our assessments (i.e., evaluations of DAOs with the questionnaire) in the table below.
You can find our draft assessments in the table below.
We tested the internal consistency of the Questionnaire V0.9 [17].
At this early stage, we only focused on:
Cronbach’s Alpha to determine how well the questions per principle worked together,
Cronbach’s Alpha If Deleted to see whether the Cronbach’s Alpha coefficient would be improved by removing an item, and
average inter-item correlation to determine if any items were redundant or were not measuring the construct [17].
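As a concrete illustration of these three statistics, the sketch below computes them with pandas and NumPy; the item scores in the example are made-up numbers, and this is not the exact code from our analysis notebook.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's Alpha for a DataFrame with one column per item and one row per DAO."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def alpha_if_deleted(items: pd.DataFrame) -> pd.Series:
    """Cronbach's Alpha of the remaining items after dropping each item in turn.
    Requires at least two items remaining (cf. why D2D and CPB were excluded)."""
    return pd.Series({col: cronbach_alpha(items.drop(columns=col)) for col in items.columns})

def avg_inter_item_correlation(items: pd.DataFrame) -> float:
    """Mean of the off-diagonal pairwise correlations between items."""
    corr = items.corr().to_numpy()
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.nanmean(off_diagonal))

# Made-up example: four items (columns) scored 0/50/100 across five DAOs (rows).
scores = pd.DataFrame({
    "BSP-01": [100, 50, 100, 0, 100],
    "BSP-02": [100, 0, 50, 0, 100],
    "BSP-03": [50, 50, 100, 0, 50],
    "BSP-04": [0, 100, 0, 100, 0],
})
print(cronbach_alpha(scores))
print(alpha_if_deleted(scores))
print(avg_inter_item_correlation(scores))
```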
We determined the overall response distribution, how DAOs performed per principle, and overall scores and ratings.
We could not conduct a confirmatory factor analysis (CFA) to test our conceptual framework (the DAO Index principles) at this time because we could not determine if the dataset was appropriate for factor analysis.
To improve the ease of using the questionnaire, we developed the DAO Index Scorecard Toolkit, an Airtable base for using the questionnaire, with tables that assist users (e.g., a glossary) and manage the data (e.g., evidence) associated with responses.
You can find the Airtable base here.
We made our results publicly accessible through a web user interface, available at https://joan816.softr.app/.
You can find an embed of the dashboard below.
As part of our work, we developed a Jupyter notebook on Google Colab to analyze the DAO Index assessments and other data, available here:
As part of our work, we developed a Jupyter notebook on Google Colab to archive materials we cited as evidence in our assessments.
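As a rough illustration of the kind of step this archiving notebook performs, the sketch below saves a local, timestamped snapshot of a cited URL and records it in a manifest. The folder layout, manifest format, and function name are our own assumptions rather than the notebook's actual implementation.

```python
import csv
import hashlib
import pathlib
from datetime import datetime, timezone

import requests  # assumed to be available in the Colab environment

ARCHIVE_DIR = pathlib.Path("archive")      # assumed local folder for snapshots
MANIFEST = ARCHIVE_DIR / "manifest.csv"    # assumed manifest of archived evidence

def archive_url(url: str) -> pathlib.Path:
    """Download a cited page and save it with a hash-based, timestamped filename."""
    ARCHIVE_DIR.mkdir(exist_ok=True)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    name = hashlib.sha256(url.encode()).hexdigest()[:12]
    path = ARCHIVE_DIR / f"{name}_{stamp}.html"
    path.write_bytes(response.content)
    with MANIFEST.open("a", newline="") as f:
        csv.writer(f).writerow([url, stamp, path.name])  # record what was archived and when
    return path

# Example (hypothetical evidence URL):
# archive_url("https://example.com/dao-governance-post")
```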
An acceptable Cronbach’s Alpha score at this preliminary stage is a value between 0.60 and 0.80 [17][18].
Our Cronbach’s Alpha coefficients ranged from -0.279 to 0.816.
The PDC, CPB, IDT, and OT principles had unacceptable Cronbach’s Alpha values, suggesting a need to re-evaluate whether the questions, as grouped, measure these principles.
For this analysis, an item is a candidate for removal if the Cronbach’s Alpha coefficient improves when the item is deleted [17].
We excluded D2D and CPB from this analysis because there were not enough items to perform it (at least two (2) items are needed after item removal).
Most items are likely to be kept because deleting them did not significantly increase the Cronbach’s Alpha coefficient or bring it to an acceptable value.
Items that we may delete in a future version because of this analysis:
HCAG-03,
CBC-03,
BSP-07,
BSP-11,
BSP-02,
PDC-10,
PDC-04,
PDC-09,
PDC-02,
CBC-01,
OT-01, and
OT-03.
Acceptable values for average inter-item correlations are between 0.15 and 0.50 [17].
The average inter-item correlation ranges are described in the table below.
Principle | Range |
---|---|
PDC | -0.417 to 0.299 |
CBC | 0.403 to 0.538 |
BSP | -0.058 to 0.293 |
IDT | 0.058 to 0.219 |
D2D | 0.690 to 0.690 |
HCAG | 0.340 to 0.503 |
OT | 0.168 to 0.168 |
Unfortunately, we could not determine the average inter-item correlation for the following items:
BSP-02,
CPB-01,
CPB-02,
IDT-02,
OT-04, and
OT-02.
Only HCAG and OT had acceptable average inter-item correlation ranges [17], suggesting that some items may not measure their principle or may be redundant.
The most common response was a Yes. The least common response was Not applicable.
Surprisingly, Does not answer question was the second most common response.
The high frequency of Does not answer responses further cemented our pre-existing view that DAOs publish too little documentation (i.e., transparency poverty) to provide a holistic understanding of their on- and off-chain activities [19][20].
Additionally, the number of items for D2D and CPB is likely too small to perform a proper analysis.
In the next version, we will need to increase each dimension to at least ten (10) questions.
You can find the ratings table for DAOs described above.
MakerDAO had the highest rating with a D+, while PrimeDAO had the lowest rating with an F.
The chart above shows the overall score for each DAO assessed with the questionnaire V0.9.
MakerDAO received the highest score, with an overall score of 3050.
PrimeDAO received the lowest score, with an overall score of 1950.
The bar charts below show how each DAO performed per principle.
[Bar charts: per-DAO scores for the CBC and OT principles]
The overlay radar chart above shows how the DAOs performed comparatively across the principles.
This chart summarizes the results shown in the previous sections.
The chart above shows the cumulative distribution of points per question.
DAOs performed best on CPB-01 and -02, D2D-01 and -02, CBC-01 and -02, and OT-02 and -04.
The chart above shows the cumulative distribution of points per principle.
The eleven DAOs evaluated performed best on CPB, OT, D2D, and BSP.
In order of performance:
CPB,
OT,
D2D,
BSP,
PDC,
IDT,
HCAG, and
CBC.
The items were generally applicable to every DAO assessed. At most one question was determined to be inapplicable to a DAO, namely BSP-01 for dOrg.
Generally, we could find some information to answer a question during our assessments.
PrimeDAO had the most Does not answer responses with twenty-four (24), and MakerDAO had the lowest at seven (7).
We felt that too many DAOs were not transparent enough with their information, with documentation generally lacking to provide a good understanding of off-chain activities. However, this could also be because of a lack of information reporting standards in the DAO ecosystem. Hopefully, the DAO Index can help shed light on the need for the publication of more off-chain information.
An additional issue was that some items’ content could not be evaluated because we could not find any information to answer them. Without adequate information to rely on, it was harder to determine whether the content of our questions needs to be revised.
Lastly, this makes it difficult to determine whether the set of questions works well together.
From piloting or testing out the questionnaire for DAO Index V0.9, we gained valuable feedback and learned many lessons.
We found it difficult to respond to the HCAG items, primarily because of how we constructed the questions. In other words, the questions lacked enough clarity to meaningfully respond, given the evidence we found. Thus, we realized we need to improve the clarity of the questions.
Additionally, we realized we may need more questions under HCAG to truly understand how this principle can guide the development and design of DAOs.
We received an interesting note from a MakerDAO member on the use of the Banzhaf Power Index for BSP-02. The member commented that BSP-02 should take into account quorum settings for different voting scenarios. We did not consider this situation originally when we created the question, as we assumed that DAOs would use a single voting setting for their decision-making.
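For context, BSP-02 relies on the Banzhaf Power Index. The brute-force sketch below, using made-up token balances and a single quota, illustrates the member’s point: the index depends on the winning threshold, so different quorum settings for different voting scenarios would yield different power distributions. A real DAO with many holders would need a more scalable method (e.g., sampling).

```python
from itertools import combinations

def banzhaf_power(weights: dict[str, float], quota: float) -> dict[str, float]:
    """Normalized Banzhaf Power Index via brute-force coalition enumeration.
    Only practical for small voter sets; the balances below are made-up examples."""
    voters = list(weights)
    swings = {v: 0 for v in voters}
    for r in range(1, len(voters) + 1):
        for coalition in combinations(voters, r):
            total = sum(weights[v] for v in coalition)
            if total >= quota:
                for v in coalition:
                    # v is a swing voter if the coalition loses without them
                    if total - weights[v] < quota:
                        swings[v] += 1
    total_swings = sum(swings.values())
    return {v: swings[v] / total_swings for v in voters}

# Hypothetical token balances; quota set to a simple majority of voting weight.
holders = {"whale": 40, "fund": 30, "guild": 20, "retail": 10}
print(banzhaf_power(holders, quota=51))  # power shifts if the quota (or quorum) changes
```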
We found that the scoring method for responses was too harsh on DAOs, particularly those that answered No to a question (i.e., received zero points): if we can find enough information to answer a question at all, that already promotes our goal of improving transparency in DAO activities. Thus, we realized that we need to update our scoring method for V1.0.
We found that certain questions, such as BSP-09, were likely compound questions (i.e., the question sought multiple answers). Thus, we realized that we need to divide such questions into more specific questions in future versions.
Our current work on the DAO Index V0.9 is subject to the following assumptions and limitations:
Our DAO definition inherently excludes organizations that may be considered DAOs by others [15];
The assessment takes an outside-in approach to assessing DAOs. Thus, we do not have complete information about the internal workings of the DAO, but only the publicly available information we can find provided by the DAO directly, or indirectly through third parties;
The principles we selected may not be representative of ideals for how a DAO should be operated and governed (or for a vision of an ID). In other words, our principles may not reflect the views of members of DAOs, society, or academia;
Our assessments were limited by the lack of standardized documentation, such as the Securities and Exchange Commission (SEC)'s standard for Form 10-K [21]. The lack of standardized documentation limited our efforts to identify potential sources for responses;
Our dataset of eleven assessments is small;
Our methodology suffered from the lack of a systematic research approach to developing the principles and questionnaire, which may have compromised our results or our ability to interpret them and draw reasonable conclusions;
Our scoring method may be inadequate for benchmarking and comparing DAOs;
Some of the questions may be organized under the wrong principle;
As assessors, our own working knowledge may have hindered us from correctly interpreting a question or certain materials when formulating a response;
Some of our older assessments from 2022 suffered from link rot, making it hard to re-check or review our responses because our sources were no longer accessible;
Some of the questions (e.g., BSP-02), will make more sense if measured periodically (or monitored constantly), rather than solely when we conduct an assessment; and
The inability to secure experts or DAO operators to assess content validity likely led to us including items that had issues with item construction, such as being ambiguous or overly complex.
Possible future directions we are considering include:
Testing the reliability of the questionnaire;
Developing a more robust research methodology for developing our conceptual framework, operationalizing our conceptual framework, and clarifying the DAO attributes we seek to measure;
Developing a more robust scoring methodology for assessments;
Comparing the DAO Index with other DAO assessment frameworks to determine concurrent validity and mappings between frameworks;
Conducting more assessments so that we can perform an exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) on the instrument;
Addressing issues with item construction by sharing items with more experts and DAO operators to assess content validity;
Developing new tools to speed up the assessment process, such as a Banzhaf Power Index calculator for DAOs;
Improving our assessment process to speed up assessments while improving the accuracy and clarity of responses;
Improving our existing tools;
Increasing the number of items per dimension to at least ten items; and
Adding additional criteria for testing internal consistency.
We also have charts for the response distribution per principle and per DAO. If you are interested in these charts (and any other chart), please leave a comment or send an email to [email protected].