Collecting, Archiving and Exhibiting Digital Design Data Section 5: Appendices
The Art Institute of Chicago
Department of Architecture
Appendix A: CITI Snapshots
Figure 5.1: Screen Captures of CITI (Collection Image Text and Index) Interface
Copyright: The Art Institute of Chicago
Appendix B: CITI Implementation of CDWA
This appendix gives an explanation of the categories and subcategories of the Categories for the Description of
Works of Art metadata schema and its implementation in CITI (Collection Image Text and Index), The Art
Institute’s collection management system. A discussion of the CDWA metadata schema and CITI database is
included in the Cataloging Digital Design Data chapter.
Table 5.1: CDWA Explanation and CITI Implementation
(Note: Required categories are highlighted with a light gray background; required subcategories are marked with ♦)
Category
Subcategories
Object, Architecture or Group
Object/Work
Catalog Level
Quantity
Type♦
Components
Quantity
Type♦
Titles or Names
Text♦
Type
Date
Creation
Creator♦
Extent
Qualifier
Identity♦
Role♦
Statement
Date♦
Earliest Date
Latest Date
Place/Original
Location
Commission
Commissioner
Type
Description
Implementation in CITI
The Catalog Level can indicate a
group of objects, such as a suite
of drawings or a set of digital
design data.
Master/Part relates job
level to component or
document level.
The Type would indicate that the
documents are Architectural
Documents.
Subject Description
records quantity. Main
Reference Number
implies quantity.
The Component Type would
classify the group of documents:
Presentation Drawings
Design/Detail Drawings
(for current collection) or
Schematic/Conceptual Design
Design Development
(for digital collection).
Object Type and
Classification fields
indicate type of
document.
(e.g., Architectural
Drawing or
Architectural Working
Drawing)
A level below the group of
documents would be individual
document records that would
further define type to be images,
animations, etc.
Title – Text is the name of an
architectural job or project, or the
name of an individual
component document.
Creator – Identity would be the
architect’s name, with Role =
Architect. Because architects will
often have many jobs within a
collection, an associated
“Authority” called Creator
Identification should be created
to link into the Creation record of
many projects. (Authorities are
explained later in this table.)
Date represents the beginning
and end date of an architectural
Titles
(e.g., Palmer, Potter,
Apartment: Working
Drawings or 11th Floor
Plan)
Artist/ Culture/ Place
Display is a text field with
artist name, nationality,
life dates, role, places of
work and commissioning
agent.
(e.g., Adler, David,
American, 1882-1949.)
Artists/ Index indexes
names and roles of artists
mentioned in Artist/
Culture/ Place.
Date
Place
Cost
Numbers
job.
(e.g., John Wyatt
Gregg; Draftsman)
Places field indexes
places from Artist/
Culture/ Place Display.
Date Display is a text
field that records dates
associated with the
creation of a work.
(e.g., Designed 1929)
Dates field is an index for
Date Display.
Materials and
Techniques
Measurements
Classification
Description♦
Extent
Processes or Techniques
Name
Implement
Materials
Role
Name
Color
Source
Marks
Date
Actions
Dimensions♦
Extent
Type
Value
Unit
Qualifier
Date
Shape
Size
Scale
Format
Term♦
Materials and Techniques –
Description describes the
medium of the work. Examples
would be:
Ink, graphite and watercolor on
toned paper.
Or, for digital data:
Photomontage of site photo and
rendered building.
Drawing Dimensions are noted
here, as well as scale.
With digital design data, Format
describes the data format and
can be automatically populated
based on file extension (TIF, AVI
or DWG). Additional format
information will be stored in the
Format Registry and can link into
this category.
Size will be the document’s file
size. Resolution should be an
additional subcategory for digital
images.
Classification identifies the broad
category of the object, e.g.,
architecture, or can be a
department within a museum,
e.g., Department of Architecture.
Separate authority
records are created for
artists and places.
Medium Display records
a short description of
material, support,
processes and
techniques.
(e.g., Black and
colored ink on linen.)
Physical Description
records longer
descriptions of materials
and techniques used.
Media, Process and
Technique, and
Classification fields can
index content of Medium
Display and Physical
Description.
Dimensions field records
measurements of a work
and is a text field.
(e.g., approx. 64.2 x
84.6 cm.)
AMICO Object Type
records one of 13 object
classification terms
determined by AMICO
project.
Subject Matter
Current
Location
Description
Indexing Terms♦
Identification
Indexing Terms♦
Interpretation
Indexing Terms♦
Interpretive History
Repository Name♦
Geographic Location♦
Repository Numbers♦
Subject Matter Identification
might describe the type of
building, e.g.,
Commercial
Residence or
Civic center.
To aid in tracking, the Repository
Name and shelf Number should
be included, e.g., T 036.12.
In the case of digital data, the file
and server location would
become the Repository Number.
Cataloging
History
Cataloger Name
Cataloger Institution
Date
Cataloging History could be
combined with Condition/
Examination History because
these tasks are performed by the
same person in the current Dept.
of Architecture workflow.
Condition/
Examination
History
Description
Type
Date
Agent
Place
Condition assessment is
performed twice in the current
archiving workflow. An initial
condition report is made during
preliminary cataloging to send to
the Registrar. A second
condition report is completed
when a full cataloging is made.
These Examinations can be
recorded in this field.
Conservation/
Treatment
History
Description
Type
Date
Agent
Place
For the digital collection, the
archivist would check for file
corruption and would perform a
checksum operation. An
additional subcategory for
Checksum should be added.
Conservation/ Treatment refers
to conservation or restoration
efforts made to a work of art and
are not highly applicable to the
current paper collection.
For digital data, preservation is
important and the preservation
Department records the
curatorial group
responsible for the work.
(e.g., Architecture)
Subject Description
records subject matter
and is a text field.
Subject Index field
indexes terms relating to
a work.
Home Location records a
text description of the
location of documents.
(e.g., Jackson/ Peoria
Architecture Vault;
Parents: Jackson/
Peoria Sixth Floor)
Location History indexes
the locations mentioned
in Home Location and
includes Move Date and
Moved By.
Cataloging History
records the cataloger
name, date and
cataloging notes in a text
description.
(e.g., Luigi H. Mumford
9/13/01 5:56PM. Core
data elements reviewed.)
Condition Description
records a text description
of document condition.
(e.g., Soiled and water
stained at right edges
of all sheets.)
Conditions field indexes
Date, Condition Status
and Staff Member.
CITI will add the
capability to link to
conservation documents
to track data preservation
techniques or additional
fields to track dates and
types of format migration
or translation.
Context
Copyright/
Restrictions
Critical
Responses
Descriptive
Note
Edition
Exhibition/ Loan
History
Facture
Historical/Cultural
Event Type
Event Name
Date
Place
Agent
Identity
Role
Cost or Value
Architectural
Building/ Site
Name
Part
Type
Place
Placement
Date
Holder Name
Place
Date
Statement
Comment
Document Type
Author
Date
Circumstance
Text
Number or Name
Impression Number
Size
Title or Name
Curator
Organizer
Sponsor
Venue
Name
Place
Type
Dates
Object Number
Description
strategies of the Preservation
Policy Committee should be
recorded here. Format migration
and other activities should be
recorded. Format Registry
preservation info may link to this
category.
The Context – Historical/
Cultural might be a cultural event
such as Columbian Exposition or
a competition or a building
complex of which an individual
project is a part.
Copyright information will be
important for images to be made
available on the Web.
Critical Responses of art
historians and critics or the
general public to exhibitions can
be cataloged here.
Additional information relevant to
a specific drawing can be added
as a Descriptive Note.
Current paper drawings tend to
be collected in the final edition.
Edition could be used to track
digital data as it is migrated to
new versions or formats.
Exhibition and Loan History
would follow the current
Department of Architecture
format.
Facture – Description describes
the method in which a work was
created. It could be used to
describe architectural process in
terms of sequence of software
Historical Context records
any didactic commentary
relating to a work in a text
field.
Copyright and rights
granted information is
recorded in the image
metadata record.
Historical Context records
critical responses. Critical
commentary could also
be linked documents.
Physical Description or
Subject Description could
record additional
descriptive notes.
Medium Display or
Catalog Raisonné can
record edition
information, depending
on medium.
Exhibition History
N/A
Inscriptions/
Marks
Orientation/
Arrangement
Ownership/
Collecting
History
Transcription or
Description
Type
Author
Location
Typeface/ Letterform
Date
Description
Remarks
Citations
Description
Transfer Mode
Cost or Value
Legal Status
Owner
Role
Place
Dates
Owner’s Numbers
Credit Line
operations or digital tool use.
Notes made by the architect
should be noted here.
Inscriptions
Electronic markups are a digital
example.
For architecture, Arrangement
might apply to a bound series of
sketches or the flow of a
PowerPoint presentation.
This category tracks the
provenance (or history of owners
of the collection) and museum
accessioning. This would include
the temporary RX number given
by the Registrar and the
accession number once the
objects are accepted into the
collection after committee
approval.
N/A
Reference Numbers
record the accession
number and permanent
receipt (RX) number.
(e.g., Accession No:
1989.682.1-7;
Permanent Receipt
(RX) No.:
RX17685/168.1-3)
Provenance Text and
Provenance Index record
previous owner
information.
The Acquisition information
record includes
Credit Line
(e.g., Gift of Bowen
Blair, executor of
estate of William
McCormick Blair.)
Committees field records
meeting types and dates
that approved documents
to enter the collection.
(e.g., Architecture,
04/27/1989;
Board of Trustees,
06/12/1989)
Physical
Description
Physical Appearance
Indexing Terms
Physical Appearance can be
incorporated into the Materials
and Techniques category as
done in CITI.
Acquisition Agents field
records indexed names
and roles.
(e.g., Bowen Blair,
Executor; William
McCormick Blair
Estate, Donor.)
Physical Description field
records a long version of
materials and techniques
that is stated briefly in
Medium Display.
May not apply to digital data.
Indexed Terms record
terms about physical
Related Works
Related Textual
References
Related Visual
Documentation
State
Styles/ Periods/
Groups/
Movements
Relationship Type
Relationship Number
Identification
Creator
Qualifier
Identity
Role
Titles or Names
Creation Date
Earliest Date
Latest Date
Repository Name
Geographic Location
Repository Numbers
Object/ Work Type
Identification
Type
Work Cited
Work Illustrated
Object/ Work Number
Relationship Type
Image Type
Image Measurements
Value
Unit
Image Format
Image Date
Image Color
Image View
Indexing Terms
Image Ownership
Owner’s Name
Owner’s Numbers
Image Source
Name
Number
Copyright/ Restrictions
Identification
Description
Indexing Terms
The Related Works category
describes the link between the
group record and the item
record, e.g., is larger context for,
or can describe an intellectual
link to any other work.
Related Textual References
might be housed in the museum
Library archives. This would
provide a direct link to the
reference or could provide a link
to the relevant Finding Aid.
For the digital collection, the
digital data, including visual
documentation, will be cataloged
in groups and can be viewed by
following a link to the digital
object. Thus, Related Visual
Documentation will already be
addressed in the cataloging
strategy.
description with fields
such as Media, Process
and Technique, and
Classification to
categorize term type.
(e.g., Term:
Architectural working
drawing
Term Type:
Classifications;
Term: Graphite
Term Type: Media)
Master/ Part field links job
records to component
records.
Publications and Catalog
Raisonné fields record
related text references.
Images field records and
presents digital image or
version of work.
(e.g., (.1)1 11th Floor
Plan E38400)
These image-related fields might
be integrated into the item-level
catalog record for a digital
object.
State may distinguish an early
set of drawings from a later set.
The architectural style or
movement should be included as
Indexing Terms for cross-referencing architectural jobs,
e.g., Mid-American Classicism or
Medium Display or
Catalog Raisonné field
records document's state.
Subject Description field
records style as a text
description.
Style index field records
Authorities
Creator
Identification
Generic
Concept
Identification
Place/Location
Identification
Subject
Identification
Name♦
Variant Names
Dates/Locations♦
Birth Date♦
Death Date♦
Earliest Active Date
Latest Active Date
Place of Birth
Place of Death
Places of Activity
Nationality/ Culture/Race♦
Nationality/
Citizenship
Culture
Race/Ethnicity
Gender
Life Roles♦
Related People
Relationship
Name
Term ♦
Variant Terms
Dates
Earliest Date
Latest Date
Related Generic Concepts
Relationship Type
Name/ Term
Place Name ♦
Variant Place Names
Dates
Earliest Date
Latest Date
Coordinates
Place Types ♦
Related Types ♦
Related Places
Relationship Type
Name
Subject Name
Variant Subject Names
Dates
Earliest Date
Latest Date
Indexing Terms
Related Subjects
Relationship Type
Name
Bauhaus.
style terms.
The Creator Identification
Authority would be created for a
particular architect or draftsman
and could be linked to his
numerous jobs or projects to
avoid re-entering the data.
CITI has an Agent entity
with multiple-name
variants and biographic
information about artists
and architects. The Agent
record is linked to the
Object record through
Role (such as Architect or
Draftsman). Therefore, an
Agent could be linked to
many projects with a
different role for each
project.
The Generic Concept
Identification Authority defines
concepts related to the type of
object, its material, activities
associated with it, its style, other
attributes, or the role of the artist
or place.
N/A
The Place/ Location
Identification Authority could
relate jobs that fall in the
Chicago area.
CITI has a Place entity
with multiple-name
variants capabilities and
hierarchical structure.
The Subject Identification
Authority could be used in the
same manner as Generic
Concept Identification.
N/A
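The table notes that, for digital design data, Format can be populated from the file extension, Size is the document's file size, and an additional Checksum subcategory should be added. A minimal sketch of how such values might be derived follows. The field names, the extension-to-format mapping and the choice of SHA-256 are assumptions made for illustration; none of them comes from CITI or CDWA.

```python
import hashlib
import os
import tempfile

# Hypothetical mapping from file extension to a Format label.
# A real Format Registry would hold far more detail than this.
FORMAT_BY_EXTENSION = {
    ".tif": "TIFF image",
    ".tiff": "TIFF image",
    ".avi": "AVI video",
    ".dwg": "DWG drawing",
}


def technical_metadata(path):
    """Return assumed Format, Size and Checksum values for one file."""
    extension = os.path.splitext(path)[1].lower()
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large design files do not fill memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "format": FORMAT_BY_EXTENSION.get(extension, "unknown"),
        "size_bytes": os.path.getsize(path),
        "checksum_sha256": digest.hexdigest(),
    }


def demo():
    """Write a small stand-in file and describe it."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "floor_plan.DWG")
        with open(path, "wb") as handle:
            handle.write(b"example bytes")
        return technical_metadata(path)


if __name__ == "__main__":
    print(demo())
```

Recomputing the checksum at a later date and comparing it with the stored value is one way an archivist could check a file for corruption.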
Appendix C: Additional Metadata
Initiatives
This appendix records additional metadata initiatives that are not discussed in the Cataloging Digital Design
Data chapter in Section 2: Archiving Digital Design Data: Practices and Technology.
The Open Archives Initiative Protocol for Metadata
Harvesting (OAI-PMH) –
http://www.openarchives.org/OAI/openarchivesprotocol.html
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which was proposed by the Open
Archives Initiative (OAI), provides a way to make the resources and related metadata included in museum
archives “open” and available over the Web. Here, “open” does not mean “free” or “unlimited access,” but
rather possessing a common machine interface that facilitates the availability of content from a variety of digital
content providers.
The OAI grew out of a meeting in 1999 in Santa Fe, New Mexico. Web-based archives (such as the physics
archive run by Paul Ginsparg at Los Alamos National Laboratory) were beginning to make a significant impact
on the dissemination of scholarly publications, and there was a desire to make these documents available to as
wide an audience as possible. The 1999 meeting led, in early 2000, to the “Santa Fe Convention,” which
outlined the structure and basic metadata of an open digital archive. The Santa Fe Convention has since been
superseded by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
The OAI develops and promotes standards to facilitate the efficient dissemination of digital content and
increase the availability of scholarly communication. At present, the OAI has defined the OAI-PMH, a technical
specification for a common means of exposing an archive’s metadata to external searches. OAI-PMH provides
an application-independent interoperability framework based on “metadata harvesting,” or collecting of
metadata information from a repository. An OAI-PMH-compliant search engine will request information about a
resource in a repository and will receive an XML-encoded byte stream in response.
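The request-and-response exchange described above can be sketched as follows. The repository address, the record identifier and the sample response are hypothetical and hand-written for illustration; the verb, the oai_dc metadata prefix and the XML namespaces are those defined by OAI-PMH 2.0 and Dublin Core. No network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical repository address and record identifier.
BASE_URL = "https://repository.example.org/oai"
RECORD_ID = "oai:repository.example.org:drawing-1"


def get_record_url(base_url, identifier):
    """Build a GetRecord request asking for unqualified Dublin Core."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": "oai_dc",
    })
    return base_url + "?" + query


# Hand-written sample of the kind of XML a repository might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:repository.example.org:drawing-1</identifier>
        <datestamp>2004-02-27</datestamp>
      </header>
      <metadata>
        <oai_dc:dc
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>11th Floor Plan</dc:title>
          <dc:type>Architectural working drawing</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def parse_record(xml_text):
    """Pull the identifier, title and type out of a GetRecord response."""
    root = ET.fromstring(xml_text)
    record = root.find("oai:GetRecord/oai:record", NS)
    return {
        "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "title": record.findtext(".//dc:title", namespaces=NS),
        "type": record.findtext(".//dc:type", namespaces=NS),
    }


if __name__ == "__main__":
    print(get_record_url(BASE_URL, RECORD_ID))
    print(parse_record(SAMPLE_RESPONSE))
```

A harvester would normally use the ListRecords verb to collect many records at once; GetRecord is used here only to keep the sample response short.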
Z39.50 – http://lcweb.loc.gov/z3950
Another proposal for opening digital archives to external searches is Z39.50, an international standard (ISO
23950) that defines a protocol for computer-to-computer information retrieval. With Z39.50, a user in one system
can search and retrieve information from other computer systems that have implemented Z39.50 without
knowing the search syntax used by the other systems. Z39.50 was originally approved by the National
Information Standards Organization (NISO) in 1988. It is possible to search the Library of Congress
bibliographic file using a Z39.50 client.
Metadata Encoding and Transmission Standard
(METS) – www.loc.gov/standards/mets
METS (Metadata Encoding and Transmission Standard) is an XML-based encoding schema for digital library
metadata used by the Library of Congress. It does not have its own set of metadata semantics, but instead
references existing metadata schemas such as Dublin Core. It encodes descriptive and administrative
metadata and is most notable for its ability to encode structural metadata. With its structural metadata section,
METS can be used to represent the relationships and hierarchy between multiple images such as high-resolution archival images, thumbnails, delivery images at lower resolutions, and images of particular details at
higher magnification.
METS is significant because it is becoming a standard of exchange between research institutions. Fedora, one
of the open-source repositories discussed in Appendix D: Open-Source Repositories, uses a variation of METS
as its Archival Information Package (AIP) format. In its next release, DSpace plans to use METS as its AIP
format as well.
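To illustrate the structural metadata described above, the following sketch builds a small METS-style document in which one drawing is linked to its archival, delivery and thumbnail images. The file names, identifiers and labels are hypothetical, and the output is a simplified illustration that has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

# Hypothetical files for one drawing: (file ID, use, location).
EXAMPLE_FILES = [
    ("FILE_ARCHIVAL", "archival master", "images/plan_master.tif"),
    ("FILE_DELIVERY", "delivery", "images/plan_delivery.jpg"),
    ("FILE_THUMB", "thumbnail", "images/plan_thumb.jpg"),
]


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return "{%s}%s" % (METS_NS, tag)


def build_mets(files, label="11th Floor Plan"):
    """Return a METS-style element tree for one drawing and its images."""
    root = ET.Element(q("mets"))
    # The file section lists the files; the structural map relates them.
    file_sec = ET.SubElement(root, q("fileSec"))
    struct_map = ET.SubElement(root, q("structMap"), {"TYPE": "logical"})
    drawing = ET.SubElement(
        struct_map, q("div"), {"TYPE": "drawing", "LABEL": label}
    )
    for file_id, use, href in files:
        group = ET.SubElement(file_sec, q("fileGrp"), {"USE": use})
        file_el = ET.SubElement(group, q("file"), {"ID": file_id})
        ET.SubElement(
            file_el,
            q("FLocat"),
            {"LOCTYPE": "URL", "{%s}href" % XLINK_NS: href},
        )
        # Each version of the image becomes a child division of the drawing,
        # pointing back to its file by ID.
        version = ET.SubElement(drawing, q("div"), {"TYPE": use})
        ET.SubElement(version, q("fptr"), {"FILEID": file_id})
    return root


if __name__ == "__main__":
    print(ET.tostring(build_mets(EXAMPLE_FILES), encoding="unicode"))
```

The parent-child nesting of the div elements is what carries the hierarchy; the descriptive metadata itself would be supplied by a referenced schema such as Dublin Core.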
VRA Core 3.0 – www.vraweb.org/vracore3.htm
The VRA Core Categories, Version 3.0 consist of a single element set that can be applied to create records to
describe works of visual culture and the images that document them. The VRA Core 3.0 follows the 1:1
principle developed by the Dublin Core community, meaning that only one resource may be described within a
single metadata set. It does not provide for hierarchy and object linking, but instead relies on the local database
to link metadata element sets.
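The 1:1 principle and the reliance on a local database for linking can be sketched as follows. The record classes, field names and identifiers are hypothetical and are not VRA Core element names; the point is only that each record describes one resource and that the link between a work and its images is kept by the local system.

```python
from dataclasses import dataclass


@dataclass
class WorkRecord:
    """Describes exactly one work."""
    record_id: str
    title: str
    creator: str


@dataclass
class ImageRecord:
    """Describes exactly one image that documents a work."""
    record_id: str
    work_id: str  # link kept by the local database, not by the element set
    image_type: str


def images_of(work, image_records):
    """Return the image records that the local database links to a work."""
    return [image for image in image_records if image.work_id == work.record_id]


# Hypothetical sample data.
WORK = WorkRecord("W1", "11th Floor Plan", "Adler, David")
IMAGES = [
    ImageRecord("I1", "W1", "digital scan"),
    ImageRecord("I2", "W1", "detail"),
    ImageRecord("I3", "W2", "digital scan"),
]

if __name__ == "__main__":
    for image in images_of(WORK, IMAGES):
        print(image.record_id, image.image_type)
```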
Art Museum Image Consortium (AMICO) – www.amico.org
The Art Museum Image Consortium (AMICO) Library is an online collection of images, text and multimedia
from the collections of institutions such as the Library of Congress, the Getty and The Art Institute of Chicago.
Each work in the AMICO Library is documented by a catalog record, an image file and an image metadata
record. At the left is a sample AMICO record for the Sarcophagus held by The Art Institute of Chicago.
The catalog record is based on standards developed by the Categories for the Description of Works of Art,
discussed above. The metadata record is based on the Dublin Core and adds fields specific to digital images
such as file size and compression.
Encoded Archival Description (EAD) – www.loc.gov/ead
Encoded Archival Description (EAD) is a standard for encoding finding aids in digital form. An EAD finding aid captures all the information and metadata of an
analog finding aid and also provides metadata about the finding aid itself, including its author, language and
publication details. This is useful for projects where the original analog finding aid has some historical
significance and there is a desire to duplicate it in digital form.
US MARC (MAchine-Readable Cataloging) – www.loc.gov/marc
The Library of Congress is the official depository of United States publications and is a primary source of
cataloging records for US and international publications. To take advantage of the shift to computers for
cataloging in the 1960s, the Library of Congress created the LC MARC format that uses brief numbers, letters,
and symbols within the catalog record to identify different types of information. The original LC MARC format
evolved into MARC 21 and has become the standard used by most library computer programs. However, it is
not as well suited for museum collection information, though metadata initiatives mentioned above will often
have mapping “crosswalks” to and from MARC.
TEI Header – http://www.tei-c.org
The TEI Header is a component of the Text Encoding Initiative Guidelines and is used to document a digital
file, whether that file is an encoded text, an image, a digital recording, or a group of any of these. It provides
standard bibliographic information about the digital file and its source as well as more specialized metadata to
record the details of classification schemes, encoding and sampling systems used, linguistic details, editorial
methods, and administrative metadata such as the revision history of the file.
SPECTRUM
SPECTRUM, the UK Museum Documentation Standard, was developed by the Museum Documentation Association (MDA) in Great Britain. It contains
procedures for documenting objects and the processes they undergo, as well as ways to document information
to support the procedures.
Object ID – www.object-id.com
Object ID is an international standard for describing cultural objects developed as a collaboration of the
museum community, police and customs agencies, the art trade, insurance industry, and appraisers of art and
antiques. Beginning as an initiative by the J. Paul Getty Trust in 1993, the Object ID project was adopted by the
Council for the Prevention of Art Theft (CoPAT), a registered charity in the UK whose mission is crime
prevention in the arts fields. Object ID includes metadata that could be useful for object identification in case of
theft such as distinguishing features or markings, size and weight.
Metadata Vocabulary and Syntax
Art and Architecture Thesaurus (AAT) –
www.getty.edu/research/conducting_research/vocabularies/aat
The Getty Museum has created a structured vocabulary that can be used to describe art, architecture,
decorative arts, material culture and archival material called the Art and Architecture Thesaurus (AAT). This
vocabulary creates a set of values that can be entered into fields of metadata schemas such as the Categories
for the Description of Works of Art. Terms, related concepts, parent hierarchical information and notes are
linked to each concept. For facility in searching, terms can include the plural or singular form of the term,
natural or inverted order, spelling variations, various forms of speech and synonyms. The concepts, such as
“marble” or “Impressionism” are further organized into facets such as “Materials” and “Styles and Periods.”
Cataloging Cultural Objects – www.vraweb.org/CCOweb
For syntax decisions for the metadata values, the Visual Resources Association has created a guide called
Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images. This guide advises an
archivist on how to format words in metadata fields. There will often be a “preferred title” for a work of art as
well as translations from or into other languages or a several-word description of the piece.
Metadata Harvesting and Accessibility
There is a pressing need for basic standards that will enable digital archives to be accessed at a common level
by a widely dispersed audience. According to the Digital Library Federation:
Despite enormous institutional investment in the creation and description of materials of serious interest to
research and education, these resources exist in isolated pockets. They are difficult to find and impossible to
search across. Meanwhile, students and faculty are tempted into over-reliance on commercial Internet search
engines, despite their limitations and the uneven quality of the materials they include.1
Therefore, museums and other institutions cannot rely on traditional metadata records or on the automatic
indexing of HTML text by Web search engines to make their research resources accessible to the public. In
response to this problem, the Digital Library Federation (DLF) has partnered with the Mellon Foundation, and
has included CIMI2, to come up with a solution. One solution involves the Open Archives Initiative, discussed
below, that allows institutions to have their metadata harvested by supporting a simple technical protocol. The
harvested metadata can be built into Web-accessible information resources using portals and gateways.
1 Digital Library Federation, A New Approach to Finding Research Materials on the Web, 2000, available from
http://www.diglib.org/architectures/vision.html; Internet; accessed 24 September 2003.
2 Consortium for the Computer Interchange of Museum Information (CIMI) – www.cimi.org
CIMI is an international consortium of cultural heritage institutions and organizations currently researching how to
make the traditional “collections” approach to archiving compatible with the Web. CIMI is involved in a metadata
harvesting project to this end, described below.
Appendix D: Open-Source Repository Software
There are presently a number of open-source software packages that comply with Open Archives Initiative
Protocol for Metadata Harvesting (OAI-PMH). As of the beginning of this year, available software includes (in
alphabetical order) ARNO, CDSware, DSpace, Eprints, Fedora, i-TOR, and MyCoRe. The following discussion
makes direct use of information presented in the Open Society Institute’s A Guide to Institutional Repository
Software,1 as well as information from each individual software system’s Web site.
ARNO –– http://www.uba.uva.nl/arno
The ARNO system—Academic Research in the Netherlands Online—was released for public use in December
2003. ARNO has different design goals from the other repository systems in that it was designed to provide a
flexible tool for creating, managing, and exposing OAI-compliant archives and repositories. The system
supports the centralized creation and administration of repository content, as well as end-user submission. The
OAI-PMH module is not limited to presenting metadata in the standard Dublin Core format, but offers a
transformation engine that, based on the internal ARNO XML structures and XSLT style sheets, is able to
produce any format. ARNO does not provide a self-contained, “off-the-shelf” institutional repository system, nor
was it intended to provide a full-blown end-user interface with extensive and advanced search capabilities. To
be able to offer these services ARNO implementers need to deploy other, third party software.
CDSware –– http://cdsware.cern.ch
The CERN [Conseil Européen pour la Recherche Nucléaire/European Organisation for Nuclear Research]
Document Server Software (CDSware) was developed to support the CERN Document Server. This software
supports electronic preprint servers, online library catalogs, and other web-based document depository
systems. CERN uses CDSware to manage over 450 collections of data, comprising over 620,000 bibliographic
records and 250,000 full-text documents, including preprints, journal articles, books, and photographs.
CDSware was built to handle very large repositories holding disparate types of materials, including multimedia
content catalogs, museum object descriptions, confidential and public sets of documents. CDSware complies
with the Open Archives Initiative metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its underlying
bibliographic standard.
DSpace –– http://www.dspace.org
DSpace was expressly created as a digital repository to capture the intellectual output of multidisciplinary
research organizations. The system is running as a production service at MIT, and a federation comprising
large research institutions is in development for adopters worldwide. DSpace integrates a user community
orientation into the system’s structure. As the requirements of these communities might vary, DSpace allows
the workflow and other policy-related aspects of the system to be customized to serve the content,
authorization, and intellectual property issues of each.
DSpace also addresses the problem of long-term preservation of deposited research material to maintain its
utility for archival time frames.
DSpace, in its basic configuration, assumes that the Submission Information Package (SIP) will be delivered
via a Web upload procedure. A unique persistent ID is assigned to each uploaded file. When the digital object
enters archival storage, technical metadata about file format are added and quality assurance checks are
performed automatically. A preservation status is added manually and the whole package is stored as the AIP.
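The ingest sequence described above (persistent ID, automatic technical metadata and checks, then a manually assigned preservation status) can be sketched as follows. This is not DSpace code: the class, the function names, the use of a UUID as the persistent ID and the stand-in quality check are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ArchivalPackage:
    """Hypothetical stand-in for a stored package (AIP)."""
    persistent_id: str
    filename: str
    technical_metadata: dict = field(default_factory=dict)
    checks_passed: bool = False
    preservation_status: str = "unassigned"


def ingest(filename, content):
    """Automatic steps: assign an ID, record format details, run a check."""
    package = ArchivalPackage(persistent_id=uuid.uuid4().hex, filename=filename)
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    package.technical_metadata = {
        "extension": extension,
        "size_bytes": len(content),
    }
    # Stand-in for real quality assurance: reject empty uploads.
    package.checks_passed = len(content) > 0
    return package


def set_preservation_status(package, status):
    """Manual step performed by an archivist before storage."""
    package.preservation_status = status
    return package


if __name__ == "__main__":
    uploaded = ingest("site_model.DWG", b"example bytes")
    print(set_preservation_status(uploaded, "supported"))
```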
1 Accessible from http://www.soros.org/openaccess/software/OSI_Guide_to_Institutional_Repository_Software_v2.htm; accessed 27 February 2004.
In the release of DSpace scheduled for early 2004, METS, a structural metadata package that describes parent-child
relationships between multiple files, will become the Archival Information Package (AIP) format.
METS (Metadata Encoding and Transmission Standard) is based on XML (eXtensible Markup Language) and
is becoming a standard for digital information exchange between research institutions. It is also being used at
the Library of Congress. DSpace currently supports an export of digital content and metadata in a simple XML-encoded file format.
DSpace has Web-based search and retrieval capabilities, but no explicit dissemination strategy beyond
delivery of the file itself. No viewing strategies are included in dissemination. DSpace is designed to be a
federated model—the DSpace Federation—of many research institutions or museums and could be used to
search across many archives at once. This cross-archive search capability is furthered by DSpace’s support of
the Open Archives Initiative protocol for metadata harvesting.
Eprints –– http://software.eprints.org
According to the Open Society Institute’s “A Guide to Institutional Repository Software,” Eprints has the largest
installed base (at least 120 installations) of the seven software systems discussed here. Developed at the
University of Southampton (and now supported in part by the U.S. National Science Foundation), the first
version of the system was publicly released in late 2000. The number of Eprints installations that have
augmented the system’s baseline capabilities—for example, by integrating advanced search, extended
metadata and other features—indicates that the system can be readily modified to meet local requirements. It
is possible to add new tools and scripts using the modules provided.
Eprints can store documents in any format that the archive administrator wishes to be accepted. Documents
can be placed in a configurable, extendible subject hierarchy, which can be used to view and search the
archive. Each individual document can be stored in more than one document format. The archive can also use
any metadata schema (in addition to a core set required by the software); the administrator decides what
metadata fields to hold about each document and how these metadata fields should be projected into the Open
Archives world. Authors can also have associated metadata.
Data integrity checks are performed automatically without the need for administrator intervention. Some "stub"
routines allow individual sites to add their own integrity checks if they desire.
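The idea of built-in checks plus site-specific additions can be sketched as a small registry of check functions. This is not Eprints code and does not use its API; the record layout, the function names and the two checks are assumptions made for illustration.

```python
import hashlib

# Registry of integrity checks; a site can append its own functions here.
integrity_checks = []


def integrity_check(func):
    """Register a check that takes a record and returns an error string or None."""
    integrity_checks.append(func)
    return func


@integrity_check
def checksum_matches(record):
    """Built-in style check: stored digest must match the stored bytes."""
    digest = hashlib.sha256(record["content"]).hexdigest()
    return None if digest == record["sha256"] else "checksum mismatch"


@integrity_check
def has_title(record):
    """Example of a locally added check."""
    return None if record.get("title") else "missing title"


def run_checks(record):
    """Run every registered check and collect the problems found."""
    problems = []
    for check in integrity_checks:
        message = check(record)
        if message:
            problems.append(message)
    return problems


content = b"example document bytes"
good = {"title": "Plan", "content": content,
        "sha256": hashlib.sha256(content).hexdigest()}
bad = {"title": "", "content": content, "sha256": "0" * 64}
```

Calling `run_checks(good)` returns an empty list, while `run_checks(bad)` reports both problems.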
Fedora –– http://www.fedora.info
The Fedora digital object repository management system is based on the Flexible Extensible Digital Object and
Repository Architecture (Fedora). Developed by the University of Virginia Library and Cornell University,
Fedora is designed to be a foundation upon which full-featured institutional repositories and other web-based
digital libraries can be built.
The units of content in Fedora are called “data objects” and include the digital file, metadata about the file and
links to software tools and services for data delivery. In Fedora, the SIPs are considered the digital files
submitted by the design office before they enter the system. Descriptive and administrative metadata can be
entered in a Dublin Core metadata record. To enter archival storage, digital files along with their descriptive and
administrative metadata must be fed into an XML document using an XML editor or programmed routine.
The system’s interface comprises three web-based services: a management API that defines an interface for
administering the repository, including operations necessary for clients to create and maintain digital objects;
an access API that facilitates the discovery and dissemination of objects in the repository; and a streamlined
version of the access system implemented as an HTTP-enabled web service. A Fedora Java application
provides an administrator graphical user interface (GUI) to create XML documents from the SIP. Fedora uses a
variation of METS with additional requirements and enables the creation of structural parent-child relationships
between a project and the documents that comprise it. For example, a rendering with a high-resolution TIFF
image, a low-resolution JPG image and a thumbnail image could be represented as one XML file with a
structured relationship. A persistent ID would be assigned only at the parent object level.
The Fedora data objects can be searched through a Web interface. The DIP is well thought-out: Fedora
devotes attention to the way a repository can deliver a wide range of media types in their native format. Each
data object is assigned a disseminator that links out to tools and services for accessing the object. This sort of
dissemination strategy could prove useful if 3D viewers were chosen to access native formats.
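The relationship between a data object, its files and a disseminator can be pictured with a small model. The sketch below is not Fedora's object model or API; the class, the media-type table and the viewer names are hypothetical and only show how one parent ID can group several files that are each routed to a suitable delivery service.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A parent object with one persistent ID and several files."""
    persistent_id: str
    datastreams: dict = field(default_factory=dict)  # label -> (media type, location)


# Hypothetical mapping from media type to a delivery service.
DISSEMINATORS = {
    "image/tiff": "high-resolution image viewer",
    "image/jpeg": "web image viewer",
    "model/vrml": "3D model viewer",
}


def disseminate(obj, label):
    """Return (service, location) for one file of a data object."""
    media_type, location = obj.datastreams[label]
    # Fall back to plain file delivery when no viewer is registered.
    service = DISSEMINATORS.get(media_type, "file download")
    return service, location


rendering = DataObject("demo:1", {
    "master": ("image/tiff", "rendering.tif"),
    "access": ("image/jpeg", "rendering.jpg"),
    "thumbnail": ("image/jpeg", "rendering_thumb.jpg"),
    "source": ("application/octet-stream", "model.dwg"),
})
```

Adding support for a new native format then means registering one more entry in the table rather than changing the stored objects.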
i-Tor –– http://www.i-tor.org/en/toon
i-Tor (Tools and technologies for Open Repositories) was developed by the Innovative Technology-Applied (ITA) section of the Netherlands Institute for Scientific Information Services (Dutch acronym: NIWI). NIWI calls i-Tor
“a web technology by which various types of information can be presented through a web interface,”
irrespective of where the data is stored or the format in which it is stored. It enables creation of websites which
can access information from a database, an Open Archive, or some other file. i-Tor aims to implement a “data
independent” repository, where the content and the user-interface function as two independent parts of the
system. Data from various existing sources can be merged with data entered by users, and all of this information
can be searched and browsed in full. Content can be searched automatically with Internet search engines (e.g.,
Google). All content, including information from databases, can be retrieved. i-Tor’s design might make it an
appropriate choice for an institution that wishes to impose a repository on top of an existing set of disparate
digital repositories.
MyCoRe –– http://www.mycore.de/engl/index.html
MyCoRe grew out of the MILESS Project of the University of Essen. In contrast to MILESS, which provided a
hard-coded Qualified Dublin Core data model, the MyCoRe data model is completely configurable. The
MyCoRe system provides a core bundle of software tools to support digital libraries and archiving solutions (or
Content Repositories, thus “CoRe”). The bundle is designed to be configurable and adaptable to local
requirements (hence, the “My”), without the need for local programming efforts. The core contains the
functionality that would be required in a repository implementation, including distributed search over
geographically dispersed repositories, OAI functionality, audio/video streaming support, file management,
online metadata editors, etc. MyCoRe is not hard-coded to a special underlying database. Rather, a
persistence layer interface is provided, together with implementations for different databases.
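A persistence layer of this kind can be sketched as an abstract interface with interchangeable backends. MyCoRe itself is not written this way in Python; the class and method names below are hypothetical, and the two backends (an in-memory dictionary and SQLite) merely stand in for "different databases".

```python
from abc import ABC, abstractmethod
import sqlite3


class MetadataStore(ABC):
    """Persistence-layer interface; the repository core talks only to this."""

    @abstractmethod
    def save(self, object_id, title): ...

    @abstractmethod
    def load(self, object_id): ...


class InMemoryStore(MetadataStore):
    """Backend that keeps records in a dictionary."""

    def __init__(self):
        self._rows = {}

    def save(self, object_id, title):
        self._rows[object_id] = title

    def load(self, object_id):
        return self._rows.get(object_id)


class SQLiteStore(MetadataStore):
    """Backend that keeps records in an SQLite database."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS objects (id TEXT PRIMARY KEY, title TEXT)")

    def save(self, object_id, title):
        self._db.execute("INSERT OR REPLACE INTO objects VALUES (?, ?)",
                         (object_id, title))
        self._db.commit()

    def load(self, object_id):
        row = self._db.execute("SELECT title FROM objects WHERE id = ?",
                               (object_id,)).fetchone()
        return row[0] if row else None


def archive(store, object_id, title):
    """Core logic is identical whichever backend is plugged in."""
    store.save(object_id, title)
    return store.load(object_id)
```

Swapping `InMemoryStore()` for `SQLiteStore()` in a call to `archive` changes where the data lives without touching the calling code.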
Appendix E: Global Digital Format Registry
There is an initiative at Harvard University and MIT, with funding from the Digital Library Federation and
participation from the Library of Congress and the National Archives and Records Administration, to create a
Global Digital Format Registry (GDFR). The Global Digital Format Registry is a project to create a single,
universal format registry to serve multiple repository systems. The GDFR has developed an extensive and
comprehensive listing of information to be maintained about each data format, described in the tables below:
Table 5.2: Format Properties for a Format Registry¹

Property          Type            Obl.  Description

Access
Type              Enumeration     M     Access type:
    Escrow: Inaccessible copy on file
    License: Access by license only
    On-site: On-site access only
    Public: Unrestricted access
    Restricted: No access
    Other: Requires informative note
Start             Date            O     Starting date
End               Date            O     Ending date
Note              String          MA    Informative note
LastModified      Date            M     Modification date/timestamp

Agent
Name              String          M     Personal or corporate name of agent
Type              Enumeration     M     Agent type:
    Commercial: Commercial (for-profit) entity
    Government: Governmental agency
    Education: Educational institution
    Non-profit: Non-profit entity
    Professional: Professional organization
    Standard: Accredited standards body
    Trade: Trade association
    Other: Requires informative note
Address           String          O     Postal address
Telephone         Telephone       O     Telephone number
Fax               Telephone       O     Facsimile number
Email             Email           O     Email address
Web               URI             O     Web site
Note              String          MA    Informative note
LastModified      Date            M     Modification date/timestamp

Application
Name              String          M     Application name
Version           String          M     Version identifier
Release           Date            M     Release date
Vendor            Agent           O     Vendor
Process           Process         O     Process

Three properties in this part of the table carry the flag "R"; the rows concerned are not legible.

1 Format Registry Data Model, Harvard University Library, 22 December 2003, available from http://hul.harvard.edu/gdfr/DataModel_v3.doc; Internet; accessed 1 June 2004.
Property          Type            Obl.  Description

Application (continued)
HWDependency      Platform        O     Hardware dependency
SWDependency      Application     O     Software dependency
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Authority
Agent             Agent           M     Authority agent
Start             Date            MA    Starting date of effective authority
End               Date            MA    Ending date of effective authority
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Class
Identifier        Cognomen        M     Class identifier
Description       String          M     Description
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Cognomen
Value             String          M     Cognomen value
Type              Enumeration     M     Cognomen type:
    AFNOR: AFNOR standard
    ANSI: ANSI standard
    ARK: CDL Archival Resource Key
    BSI: BSI standard
    CCITT: CCITT standard
    DDC: Dewey Decimal Classification
    DOI: Digital Object Identifier
    ECMA: ECMA standard
    GDFRClass: GDFR classification identifier
    GDFRFormat: GDFR format identifier
    GDFRRegistry: GDFR registry identifier
    Handle: CNRI handle
    Informal: No defined syntax or embedded semantics
    ISO: ISO standard
    ISBN: International Standard Book Number
    ISSN: International Standard Serial Number
    ITU: ITU recommendation
    JEITA: JEITA standard
    LCC: Library of Congress Classification
    LCCN: Library of Congress Control Number
    MIME: MIME media type [MIME]
    NISO: NISO standard
    PII: Publisher's Item Identification [PII]
    PURL: Persistent URL
    RFC: IETF Request for Comment
    SICI: Serial Item and Contribution Identifier [SICI]
    TOM: Typed Object Model identifier
    UUID/GUID: Universally/globally-unique Identifier [UUID]
    URI: Uniform Resource Identifier [URI]
    URL: Uniform Resource Locator
    URN: Uniform Resource Name [URN]
    Other: Requires informative note

Five properties in this part of the table carry the flag "R"; the rows concerned are not legible.
Property          Type            Obl.  Description

Cognomen (continued)
Note              String          MA    Informative note
LastModified      Date            M     Modification date/timestamp

Document
Title             String          M     Document title
Type              Enumeration     M     Document type:
    Article
    Correspondence
    Manual
    Monograph
    Report
    Standard
    Thesis
    Other: Requires informative note
Author            Agent           O     Author
Edition           String          O     Edition
Publisher         Agent           O     Publisher
Date              Date            O     Publication date
Accessibility     Access          M     Access regime
Identifier        Cognomen        O     Identifier
Note              String          MA    Informative note
LastModified      Date            M     Modification date/timestamp

Event
Agent             Agent           M     Agent effecting the event
Type              Enumeration     M     Event type:
    Delete: Deletion of a format
    Initial: Initial registration of a format
    Obsolescence: Declaration of format obsolescence
    Update: Update format representation information
    Other: Requires informative note
Scope             Enumeration     M     Scope of the event:
    Editorial: Non-substantive editorial change
    Technical: Substantive technical change
Review            Enumeration     M     Review type:
    Full: Full technical review
    Partial: Requires informative note
    None: No review
Date              Date            M     Date/timestamp
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Seven properties in this part of the table carry the flag "R"; the rows concerned are not legible.
Property          Type            Obl.  Description

Interface
Protocol          Enumeration     M     Interface protocol:
    HTTP
    .NET
    RMI: Remote method invocation
    SOAP: Web Service
    Other: Requires informative note
Connection        String          MA    Protocol-specific connection parameters
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Ontology
Class             Class           M     Ontological class
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Platform
Name              String          M     Platform name
Version           String          M     Version identifier
Release           Date            M     Release date
Vendor            Agent           O     Vendor
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Process
Type              Enumeration     M     Process type:
    Create: Create new instantiation of formatted object
    Render: Media type-specific rendering of formatted object
    TransformFrom: Requires source auxiliary format
    TransformTo: Requires target auxiliary format
    Validate: Validation of formatted object
    Other: Requires informative note
Auxiliary         Cognomen        MA    Source or target format of transformation
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Registry
Identifier        Cognomen        M     Registry identifier
Service           Service         M     Supported GDFR service
LastHarvestedBy   Date            O     Date/timestamp of last harvest by this registry
LastHarvest       Date            O     Date/timestamp of last harvest of this registry
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Seven properties in this part of the table carry the flag "R"; the rows concerned are not legible.
Property          Type            Obl.  Description

Relation
Identifier        Cognomen        M     Target format identifier
Registry          Cognomen        O     Target registry identifier
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Service
Type              Enumeration     M     Service type:
    Approval: Technical review
    Description: Query for specific format
    Export: Bulk export of registry data
    Introspection: Information about registry instance
    Maintenance: Maintain format representation information
    Notification
    Synchronization: Distributed synchronization
Interface         Interface       M     Service interface
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Signature
Value             ByteStream      M     Signature value
Obligation        Enumeration     M     Signature obligation:
    Mandatory
    MandatoryIfApplicable: Requires informative note
    Optional
Note              String          MA    Informative note
LastModified      Date            M     Modification date/timestamp

Four properties in this part of the table carry the flag "R"; the rows concerned are not legible.
Table 5.3: Derived Properties (Derived properties inherit all of the attributes of their parent.)

Property          Type            Obl.  Description

ExternalSignature IS-A Signature
Type              Enumeration     M     External signature type:
    Extension: File extension
    Type: Mac OS data type
    Other: Requires informative note

FormatRelation IS-A Relation
Type              Enumeration     M     Format relation type:
    EquivalentTo: Equivalent to target
    IsPreviousVersionOf: Previous version of target
    IsSubsequentVersionOf: Subsequent version of target
    IsSubtypeOf: Subtype of target
    IsSupertypeOf: Supertype (parent) of target
    MayContain: May encapsulate target
    UsedBy: May be encapsulated by target
    Other: Requires informative note

InternalSignature IS-A Signature
Position          Enumeration     M     Signature position:
    Fixed: Fixed position; requires offset
    Arbitrary: Arbitrary position
Offset            NonNegative     MA    Byte offset

Person IS-A Agent
Title             String          O     Personal title
Affiliation       Agent           O     Organizational affiliation
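The IS-A relationships in Table 5.3 map naturally onto class inheritance. The sketch below models Signature and its two derived properties as Python dataclasses and uses them for a simple format check. It is an illustration only, not GDFR code: the `matches` methods and the `looks_like_pdf` helper are hypothetical, and the PDF signature is an example based on the fact that PDF files begin with the bytes "%PDF".

```python
from dataclasses import dataclass


@dataclass
class Signature:
    value: bytes       # Value (ByteStream)
    obligation: str    # Mandatory, MandatoryIfApplicable or Optional


@dataclass
class ExternalSignature(Signature):
    type: str = "Extension"    # Extension, Type (Mac OS data type) or Other

    def matches(self, filename, data):
        """Compare the file extension, ignoring case."""
        return filename.lower().encode().endswith(b"." + self.value.lower())


@dataclass
class InternalSignature(Signature):
    position: str = "Fixed"    # Fixed or Arbitrary
    offset: int = 0            # byte offset, needed when position is Fixed

    def matches(self, filename, data):
        """Look for the byte sequence at the offset, or anywhere if Arbitrary."""
        if self.position == "Fixed":
            end = self.offset + len(self.value)
            return data[self.offset:end] == self.value
        return self.value in data


# Illustrative signatures for PDF.
pdf_signatures = [
    InternalSignature(b"%PDF", "Mandatory", position="Fixed", offset=0),
    ExternalSignature(b"pdf", "Optional", type="Extension"),
]


def looks_like_pdf(filename, data):
    """True when every mandatory signature matches."""
    return all(s.matches(filename, data)
               for s in pdf_signatures if s.obligation == "Mandatory")
```

Both derived classes inherit `value` and `obligation` from `Signature`, which mirrors the table's note that derived properties inherit all of the attributes of their parent.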
Table 5.4: Registry Properties

Property          Type            Obl.  Rep.  Description

GDFR IS-A Registry
Version           String          M           Version identifier for registry code base and data model
Date              Date            M           Build date for registry code base and data model
Aegis             Authority       M     R     Responsible authority
ExternalRegistry  Registry        O     R     Known external registry
Ontology          Ontology        M           Ontological classification scheme
Format            Format          O     R     Format representation information

Format Properties

Property          Type            Obl.  Description

Format
Identifier        Cognomen        M     Format canonical identifier
Description       String          M     Short description of format
Alias             Cognomen        O     Variant identifier
Version           String          O     Format version identifier
Author            Agent           O     Author
Owner             Authority       M     Legal owner
Maintainer        Authority       O     Maintainer
Classification    Cognomen        O     Ontological classification
Relationship      FormatRelation  O     Typed relationship with other format
Specification     Document        M     Specification document
Signature         Signature       O     External or internal signature
Application       Application     O     Application system using format
Provenance        Event           M     Provenance event
Note              String          O     Informative note
LastModified      Date            M     Modification date/timestamp

Eleven of the Format properties carry the flag "R"; the rows concerned are not legible.
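One practical use of the obligation column is checking whether a draft format record is complete. The sketch below reads M as mandatory and O as optional, copies the obligations of the Format properties into a dictionary and lists the mandatory ones a record lacks. The record contents and the identifier syntax are hypothetical, and the GDFR itself may validate records differently.

```python
# Obligations copied from the Format properties table
# (read here as M = mandatory, O = optional).
FORMAT_OBLIGATIONS = {
    "Identifier": "M", "Description": "M", "Alias": "O", "Version": "O",
    "Author": "O", "Owner": "M", "Maintainer": "O", "Classification": "O",
    "Relationship": "O", "Specification": "M", "Signature": "O",
    "Application": "O", "Provenance": "M", "Note": "O", "LastModified": "M",
}


def missing_mandatory(record):
    """List mandatory Format properties that are absent or empty."""
    return [name for name, obligation in FORMAT_OBLIGATIONS.items()
            if obligation == "M" and not record.get(name)]


# Hypothetical, partly filled-in record for PDF 1.5.
draft = {
    "Identifier": "example/pdf-1.5",
    "Description": "Portable Document Format, version 1.5",
    "Version": "1.5",
    "Owner": "Adobe Systems",
    "LastModified": "2004-06-01",
}
```

For this draft, `missing_mandatory(draft)` reports that a specification document and a provenance event still need to be supplied.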
Appendix F: Adobe PDF Format and Settings
Throughout our discussions, we have been referring to the capabilities of the Portable Document Format (PDF)
version 1.5, Adobe Acrobat 6.0 and Adobe Reader 6.0. Since the PDF specifications are backwards-compatible, we are assuming that future versions will be able to read documents created under the current
specifications.
Operating System and Version Dependencies
Rich graphic and multimedia content, such as animations, can be embedded in PDF documents, meaning that
providing the media as a separate file is unnecessary. The ability to embed content was added in Acrobat 6.0.
There is, however, a dependency on the ability of the operating system or the availability of player software to
access some formats of embedded content. For this reason, we have recommended that the AVI format be
used for embedded animations. If other formats are used, separate players might be needed and they might
not be available for all operating systems. Likewise, Adobe Reader 6.0 is not currently available for versions of
Windows earlier than 98SE, Macintosh operating systems earlier than OS X 10.2.2, or for other operating
systems such as Linux. Computers with these operating systems will not be able to play the embedded
content.
Acrobat Settings
In order to maintain color fidelity and image resolution when creating PDF documents, it is important to select
the correct settings. With Adobe Acrobat, the user can create PDF documents by using the print to PDF
function from an application like PowerPoint or by assembling source documents in Acrobat itself. The different
methods require different ways of setting preferences. (Please note: These instructions are based on Adobe
Acrobat 6.0 Professional and Microsoft Office 2002 applications.)
To print to PDF from an application such as PowerPoint, use File → Print and select Adobe PDF as the printer
Name. Click on Properties and under the Adobe PDF Settings tab there will be a pull-down menu for Default
Settings. Select the High Quality option from the pull-down menu. (See Figure 5.2.) This choice of settings will
reduce images to 300 dpi and will maintain the embedded color profile in each image. These settings will meet
the requirements of the Department of Architecture. (These Default Settings can be edited by clicking the Edit
button and changing values under the Images and Color tabs.)
Figure 5.2: Adobe PDF Settings Snapshot
The second way to create a PDF is in Adobe Acrobat by selecting Create PDF → From File or From Multiple
Files and then browsing for the desired file or files. Before importing files, it is important to select the
following settings. Under the Edit → Preferences menu, select Convert to PDF from the left column. Select TIFF
from the next column and click the Edit Settings button. Select lossless compression schemes from all
Compression pull-down menus and Preserve embedded profiles from all Color Management pull-down menus.
Repeat the process for JPG and other image files to be converted to PDF. (See Figure 5.3.)
Figure 5.3: Adobe Acrobat Preferences Snapshot
Also in the Edit → Preferences dialog box is a Digital Signatures option in the left column. Users can create
their own signature, either from a scanned image of a signature or a company logo, and enter company
information. To sign a document, go to Document → Digital Signatures → Sign this Document and select a
“new invisible signature.” The user will need to add a Digital ID, either a self-signed ID or one provided by a
third party. Once an ID is created, the user can select “I am the author of this document” and save. When the
PDF document is received by the museum, the archivist will be able to open the document and verify the digital
signature using the Signatures tab and can verify that no modifications have been made to the document since
it was signed. Design firms can go an additional step and certify documents with Adobe—an option that pops
up automatically when signing a document.
When printing a PDF with the above settings, it is important to select Printer/Postscript Color Management from
the Printer Profile pull-down menu of the Output selection of the Advanced option of the Print dialog box. If the
printer has been properly calibrated and an ICC profile has been saved for it, Acrobat will be able to properly
map color values from the embedded ICC profiles in the images within the PDF document to the ICC profile of
the printer.
To embed media content in a PDF file, select Tools → Advanced Editing → Movie Tool. After you drag a box
for the media clip with the crosshairs cursor, the Add Movie dialog will appear. (See Figure 5.4.)
Figure 5.4: Adobe Acrobat Add Movie Snapshot
Under Content Settings, select “Acrobat 6 Compatible Media” and check “Embed content in document.” The
Poster Settings selection determines what appears in the media frame before it is selected for playback. If you
choose “Retrieve poster from movie,” the opening frame of the animation will be displayed.