Distroid: Working towards the Catalogue Graph Data Model V0.2

Updates and notes as we transition from V0.1 to V0.2

Published on Aug 22, 2024

In this pub, I cover my work updating the Catalogue’s Graph Data Model from V0.1 to V0.2.

Tables

Entities [1]

Relationships

Visualization

Static

Interactive

Graph_Data_Model-Catalogue-V02-2024_06_16

Discussion

I am trying to figure out how to limit the scope of the Event entity. I want Events to cover only social commentary, and I prefer to model citations and references as relations between Works rather than as Events. If you have ideas on this matter, please send me a message.

I am also thinking about whether to keep or remove the HAS_AUTHORED relationship. I will probably change HAS_AUTHORED to CONTRIBUTED_TO, and add a role property (and possibly a taxonomy property) to describe the role and any taxonomy the role falls under (e.g., CRediT). I believe doing so will better show the ways people and organizations contribute to a work (e.g., editing, feedback, or proofreading), and will improve the data quality and usefulness of the contributions sub-graph.
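As a rough sketch of what such an edge could look like (not the Catalogue's actual implementation), the Python below models CONTRIBUTED_TO with a role and an optional taxonomy. The class name, the field names `contributor_id` and `work_id`, and the sample ids are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContributedTo:
    """Hypothetical CONTRIBUTED_TO edge from a Person/Organization to a Work."""
    contributor_id: str             # assumed: distroidID of the Person or Organization
    work_id: str                    # assumed: distroidID of the Work
    role: str                       # e.g. "editing", "feedback", "proofreading"
    taxonomy: Optional[str] = None  # e.g. "CRediT" when the role comes from that taxonomy


def roles_for_work(edges, work_id):
    """Group contributor ids by role for a single Work."""
    by_role = {}
    for edge in edges:
        if edge.work_id == work_id:
            by_role.setdefault(edge.role, []).append(edge.contributor_id)
    return by_role


# Made-up sample data, not taken from the Catalogue.
edges = [
    ContributedTo("person-1", "work-1", "Conceptualization", taxonomy="CRediT"),
    ContributedTo("person-2", "work-1", "editing"),
    ContributedTo("org-1", "work-1", "proofreading"),
    ContributedTo("person-2", "work-2", "feedback"),
]

print(roles_for_work(edges, "work-1"))
```

Keeping the taxonomy optional lets free-form roles sit alongside roles drawn from a controlled vocabulary.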

I may also add a type property to Organization, so I can distinguish between the types of organizations (e.g., businesses, teams, joint ventures), and create relationships between organizations and their sub-organizations (e.g., IS_DEPARTMENT_OF, IS_SUBSIDIARY_OF).
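To illustrate how a type property and sub-organization relationships might work together, here is a minimal Python sketch. It assumes each organization has at most one parent, and the ids, names, and the `ancestors` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organization:
    distroid_id: str
    name: str
    type: str  # e.g. "business", "team", "joint venture"


# Edges as (child id, relationship, parent id); sample data is made up.
SUB_ORG_EDGES = [
    ("org-dept", "IS_DEPARTMENT_OF", "org-parent"),
    ("org-parent", "IS_SUBSIDIARY_OF", "org-holding"),
]


def ancestors(org_id, edges):
    """Walk upward through sub-organization edges, nearest parent first.

    Assumes at most one parent per organization and stops if a cycle is found.
    """
    parents = {child: parent for child, _, parent in edges}
    chain = []
    seen = {org_id}
    while org_id in parents:
        org_id = parents[org_id]
        if org_id in seen:
            break
        seen.add(org_id)
        chain.append(org_id)
    return chain


dept = Organization("org-dept", "Example Department", "team")
print(dept.type, ancestors(dept.distroid_id, SUB_ORG_EDGES))
```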

I may replace the CO_AUTHORED_WITH relationship with COLLABORATED_WITH, and have a type property for the type of collaboration (e.g., co-authorship).
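If that replacement goes ahead, existing edges would need to be rewritten. The Python sketch below shows one possible migration; the edge representation (a dict with `source`, `target`, `label`, and `properties` keys) is an assumption, not the Catalogue's actual storage format.

```python
def migrate_co_authorship(edges):
    """Return a new edge list with CO_AUTHORED_WITH rewritten as COLLABORATED_WITH."""
    migrated = []
    for edge in edges:
        if edge["label"] == "CO_AUTHORED_WITH":
            # Copy existing properties and record the kind of collaboration.
            properties = dict(edge.get("properties", {}))
            properties["type"] = "co-authorship"
            migrated.append({**edge, "label": "COLLABORATED_WITH", "properties": properties})
        else:
            migrated.append(edge)
    return migrated


# Made-up sample edges.
old_edges = [
    {"source": "person-1", "target": "person-2", "label": "CO_AUTHORED_WITH"},
    {"source": "person-1", "target": "source-1", "label": "MANAGES"},
]

new_edges = migrate_co_authorship(old_edges)
print(new_edges[0])
```

The function builds new dicts rather than mutating the input, so the original edge list stays available for comparison.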

2024-08-31: I added WORKS_PUBLISHED_IN, UNCLASSIFIED_CONTRIBUTION_IN, and GUEST_APPERANCE_IN as relationships between a Person/Organization and Media Source, so that we can see how a Person/Organization has interacted with a Media Source beyond managing a Media Source (i.e., the MANAGES relationship).
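The following Python sketch shows one way these relationships could be checked and queried. The relationship names come from the text above (spelled as they appear there); the node label `MediaSource`, the tuple-based edge format, and the sample ids are assumptions.

```python
# Relationships from a Person/Organization to a Media Source, as named in the text.
MEDIA_SOURCE_RELATIONSHIPS = {
    "MANAGES",
    "WORKS_PUBLISHED_IN",
    "UNCLASSIFIED_CONTRIBUTION_IN",
    "GUEST_APPERANCE_IN",  # spelling follows the source
}
SOURCE_LABELS = {"Person", "Organization"}


def is_valid_edge(source_label, relationship, target_label):
    """Check that an edge connects a Person/Organization to a Media Source."""
    return (
        source_label in SOURCE_LABELS
        and relationship in MEDIA_SOURCE_RELATIONSHIPS
        and target_label == "MediaSource"  # assumed label name
    )


def interactions(edges, source_id):
    """Map each Media Source id to the sorted relationships linking source_id to it."""
    found = {}
    for src, rel, dst in edges:
        if src == source_id:
            found.setdefault(dst, set()).add(rel)
    return {dst: sorted(rels) for dst, rels in found.items()}


# Made-up sample edges as (source id, relationship, media source id).
edges = [
    ("person-1", "MANAGES", "source-1"),
    ("person-1", "WORKS_PUBLISHED_IN", "source-1"),
    ("person-1", "GUEST_APPERANCE_IN", "source-2"),
    ("org-1", "UNCLASSIFIED_CONTRIBUTION_IN", "source-2"),
]

print(interactions(edges, "person-1"))
```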

Additionally, I am still thinking about the properties for relations and entities. I am deciding which properties to include by looking at which schema.org properties would make sense in the Graph Data Model.

So, I will probably add properties last, after confirming the entities and relationships [2] I want to have for the graph data model.

That said, I have settled on the following basic properties for Works, Media Sources, Persons, and Organizations:

Works:

  1. title

  2. subtitle

  3. description

  4. workURL

  5. associatedMedia

  6. datePublished

  7. thumbnailURL

  8. distroidID
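For illustration, the Work properties above could be captured in a Python dataclass like the one below. The property names match the list; the types, the choice of which fields are required, and the `to_properties` helper are assumptions.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class Work:
    # Assumed required: an identifier, a title, and a URL.
    distroidID: str
    title: str
    workURL: str
    # Assumed optional.
    subtitle: Optional[str] = None
    description: Optional[str] = None
    associatedMedia: Optional[str] = None
    datePublished: Optional[date] = None
    thumbnailURL: Optional[str] = None

    def to_properties(self) -> dict:
        """Return a flat property dict, skipping unset fields and ISO-formatting dates."""
        props = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None:
                continue
            props[f.name] = value.isoformat() if isinstance(value, date) else value
        return props


# Made-up sample Work.
work = Work(
    distroidID="work-1",
    title="Example Work",
    workURL="https://example.org/work-1",
    datePublished=date(2024, 8, 22),
)
print(work.to_properties())
```

Media Sources, Persons, and Organizations could follow the same pattern with their own property lists.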

Media Sources:

  1. title

  2. description

  3. mediaSourceURL

  4. thumbnailURL

  5. distroidID

I removed manager as a property because being a manager is a relation between a Person or Organization and Media Source.

Person:

  1. name

  2. personURL

  3. description

  4. givenName

  5. familyName

  6. thumbnailURL

  7. honorificSuffix

  8. distroidID

Organization:

  1. name

  2. description

  3. organizationURL

  4. thumbnailURL

  5. distroidID

For Organization, I will probably need to refer to an external schema.

I may make [word]URL an object (most likely a list) to hold multiple URLs related to the entity.
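If the URL properties do become lists, existing single-string values would need to be normalized. Here is a small Python sketch of that step; the helper name `as_url_list` is hypothetical, and dropping duplicates and blank entries is an assumed design choice.

```python
def as_url_list(value):
    """Normalize a URL property to a list of unique, non-empty URLs in original order.

    Accepts None, a single URL string, or an iterable of URL strings.
    """
    if value is None:
        return []
    if isinstance(value, str):
        value = [value]
    seen = set()
    urls = []
    for url in value:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


print(as_url_list("https://example.org/person-1"))
print(as_url_list(["https://example.org/a", "https://example.org/b", "https://example.org/a"]))
```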

Example Graph

Unlike for V0.1, I will probably not include a small example graph for this transition.

Instead, I will work on updating the Catalogue’s Knowledge Graph (KG) data pipeline to populate the KG with Works collected from Media Sources, and on formatting the collected data to comply with the Graph Data Model V0.2.

Request For Comments

I am seeking feedback on this pub for any improvements to make, errors to correct, or other areas to explore.

Please leave your feedback here, on the Ledgerback discussion forum, or on Twitter.

Contact Information

  1. Twitter
