Guides: Penn Libraries Linked Data Framework: Appendix: Linked Data and Other Formats

Appendix: Linked Data and Other Formats

Linked data descriptions of materials typically use representations of data that differ from those we currently use, which are largely based on MARC. Creating and using linked data also often involves different tools and systems than those that work with MARC. For example, the Library of Congress’s BIBFRAME linked data standard embodies and extends the information that appears in its MARC bibliographic records, but represents it very differently. The Library of Congress is developing a BIBFRAME editor to allow catalogers to create bibliographic descriptions in BIBFRAME, and has also developed automated tools that convert BIBFRAME descriptions into MARC records and MARC records into BIBFRAME.

MARC is focused on a particular set of properties for describing bibliographic editions (also known as “manifestations” in LRM and FRBR, and “instances” in BIBFRAME), and secondarily on authorities describing headings for names, titles, and subjects. MARC data is organized into records centered on these entities. These records largely describe their entities in self-contained ways, so they can be cleanly communicated and shared between libraries and systems. Linked data, in contrast, is based on persistent, web-resolvable identifiers and an extensible set of properties whose definitions and values are distributed and web-accessible. Those values can include both literals, such as text strings and numbers, and identifiers of other entities. Linked data is often represented in RDF, but can be expressed using a number of web standards. More flexible than MARC, linked data can describe a broader variety of entities, with an open-ended range of descriptions not limited to those defined or maintained by libraries. One type of entity of particular interest in library linked data is the “work”, a creative or informational resource that may be manifested in multiple editions. The relationships between entities, whether works or other types of entities, also have central importance in most library linked data.
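To make the contrast above concrete, here is a minimal Python sketch, not drawn from any Penn Libraries system. It compares a simplified MARC-like record (a self-contained bundle of tagged text) with the same kind of description expressed as subject–property–value statements, where a value is either a literal or the identifier of another entity. All URIs use the placeholder domain example.org, and the property names (`prop/title`, `prop/instanceOf`, and so on) and helper functions `links_from` and `literals_of` are hypothetical; real data would use identifiers and properties from vocabularies such as BIBFRAME.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    """A persistent, web-resolvable identifier (hypothetical example.org URIs below)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A literal value such as a text string or a number."""
    value: Union[str, int]


@dataclass(frozen=True)
class Triple:
    """One linked data statement: subject, property, value."""
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]


# A simplified MARC-like record: a self-contained bundle of tagged text.
# (Real MARC fields also carry indicators and subfields; omitted here.)
marc_like_record = {
    "100": "Author, Example",
    "245": "An example title",
    "264": "Example City : Example Publisher, 2020",
}

# A similar description as linked data. All URIs and property names are
# hypothetical placeholders, not terms from a real vocabulary.
EX = "http://example.org/"
work = IRI(EX + "work/1")
instance = IRI(EX + "instance/1")
author = IRI(EX + "agent/1")

graph = {
    Triple(work, IRI(EX + "prop/title"), Literal("An example title")),
    Triple(work, IRI(EX + "prop/creator"), author),           # link to another entity
    Triple(instance, IRI(EX + "prop/instanceOf"), work),      # edition-to-work relationship
    Triple(instance, IRI(EX + "prop/publicationYear"), Literal(2020)),
}


def links_from(graph, subject):
    """Return the entities that `subject` points to via identifier-valued properties."""
    return {t.obj for t in graph if t.subject == subject and isinstance(t.obj, IRI)}


def literals_of(graph, subject):
    """Return the literal values (strings, numbers) asserted about `subject`."""
    return {t.obj.value for t in graph if t.subject == subject and isinstance(t.obj, Literal)}


print(links_from(graph, instance))   # {IRI(value='http://example.org/work/1')}
print(literals_of(graph, work))      # {'An example title'}
```

The frozen dataclasses make each statement hashable, so a description is simply a set of statements. Note that the edition-to-work relationship is itself just another statement whose value is an identifier, which is what lets linked data treat relationships between entities as first-class data.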

Linked data can often be organized into records. Examples include Wikidata items, which group together linked data assertions about entities, and “linky MARC”, MARC records that include linked data identifiers. Linky MARC is an example of hybrid linked data, combining linked data entities with more traditional formats. However, linked data, unlike MARC, is not inherently record-based, and record-based linked data descriptions may depend more on external data to be fully understandable than traditional MARC records do. Hence, while linked data is more flexible than MARC, linked data about entities may be more difficult to communicate in a self-contained way, and may depend more on access to widely distributed knowledge bases on the Internet. On the other hand, linked data’s connections with those knowledge bases can enable resources to be described with greater richness, and by a wider group of people, than they can be in a system based on self-contained MARC records. Some examples of these richer, open-ended descriptions can be found in our discussion of specific Penn Libraries projects in later sections of this white paper.
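The trade-off described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Wikidata or any catalog actually stores data. It groups statements by subject into a record, loosely like a Wikidata item, and then lists the identifiers that the record uses as values but does not itself describe. A reader of that record alone would need to look those identifiers up elsewhere to fully understand it. The URIs, property names, and helper functions (`group_into_records`, `external_references`) are hypothetical placeholders.

```python
from collections import defaultdict
from typing import NamedTuple, Union


class IRI(str):
    """Marks a string as an identifier rather than a text literal."""


class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: Union[IRI, str, int]


def group_into_records(triples):
    """Group statements by subject, loosely like a Wikidata item page."""
    records = defaultdict(list)
    for t in triples:
        records[t.subject].append((t.predicate, t.obj))
    return dict(records)


def external_references(triples):
    """Identifiers used as values but not described by any local statement.

    Understanding these fully requires looking them up in external data.
    """
    described = {t.subject for t in triples}
    return {t.obj for t in triples if isinstance(t.obj, IRI) and t.obj not in described}


# Hypothetical identifiers: the local data describes an edition, but the
# author and subject are known here only by their identifiers.
EX = "http://example.org/"
local = [
    Triple(IRI(EX + "instance/1"), IRI(EX + "prop/title"), "An example title"),
    Triple(IRI(EX + "instance/1"), IRI(EX + "prop/creator"), IRI(EX + "agent/1")),
    Triple(IRI(EX + "instance/1"), IRI(EX + "prop/subject"), IRI(EX + "topic/1")),
]

records = group_into_records(local)
print(len(records))                        # 1
print(sorted(external_references(local)))  # ['http://example.org/agent/1', 'http://example.org/topic/1']
```

A traditional MARC record would instead carry text headings for the author and subject, so it reads on its own. The linked data record is shorter and points to richer shared descriptions, but it is only fully intelligible when those external identifiers can be resolved.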

The effectiveness of a distributed, shared knowledge base of linked data depends heavily on how widely and easily it can be shared and improved. For this reason, many linked data initiatives stress the importance of linked open data: data released under licenses or understandings that freely grant rights to access, use, modify, and share it. Examples of linked open data include Wikidata, DBpedia, Getty Vocabularies as Linked Open Data, and the Library of Congress Linked Data Service.

Proprietary restrictions on metadata (which some parties assert, both on MARC records and on other data) make it especially difficult to use and improve on data that may be created by, and distributed across, multiple parties, as linked data often is. If we are to move to a linked data information ecology built up by many people, it is important that the linked data that ecology relies on be open, practically accessible, and widely editable, without proprietary restrictions or operational bottlenecks that choke off participation.