AMIA-MIG Archive Directory Working Group

July 2002 Meeting Minutes and Attendance Roster

Moving Image Gateway Archive Directory Working Group Meeting
Library of Congress, Washington, D.C.
July 25-26, 2002

Thursday, July 25, 2002

Welcome and Introductions

Gregory Lukow, Assistant Chief in the Library of Congress Motion Picture, Broadcasting, and Recorded Sound Division, welcomed participants on behalf of the Library of Congress and the National Film Preservation Board (NFPB). He described this meeting as unprecedented in terms of the representation and subject matter and applauded the Association of Moving Image Archivists and its members' commitment to the project. Lukow described the history and context of the current project, then thanked Jane Johnson and Steve Leggett for coordinating the meeting.

Winston Tabb, Associate Librarian of Congress for Library Services, dropped by to welcome the group and to thank attendees and AMIA. He wished the participants a successful meeting.

Participants introduced themselves to the group. Jane Johnson thanked Steve Leggett again for assisting with logistics of the meeting and introduced Grace Agnew, author of the Final Report and Principal Investigator on the National Science Foundation (NSF) grant.

Overview of the Gateway and the Archive Directory (Grace Agnew)

Agnew opened with an update on the NSF grant. She described the MIG components, design principles, developers, governance structure, alpha implementer sites, other participants and participation opportunities, architecture schematic, goal of the directory, the union catalog, and next steps.

In answer to attendees' questions, Grace and others stated that the directory will be international in scope and eventually should include multilinguality; the NSF grant period is two years; exploring mapping data to the SMPTE data dictionary is a good idea, but won't be done immediately; mapping to MPEG7 is also important; alpha sites may receive a copy of the NSF grant once funding comes through; quality of participants' original cataloging cannot be fully addressed by the Gateway, but it is hoped that the MIG would raise the standard for cataloging through education and training, some of which may be provided through grant funding, and some through collaborations with groups such as IMAP and the volunteer efforts of AMIA (this is part of the AMIA mission); alpha sites have shown a strong interest in collaborative digitization projects and developing digital content.

Comments included: the group should consider GEM (Gateway for Educational Materials) when discussing audience breakdown (Baca); MERIT would be valuable in regard to subject headings (Hubbard); implications of the international scope must be considered (Hunter) [Agnew: AMIA-MIG will be capable of specifying language; we won't start with this, but the database will be capable of multilinguality; it is important in terms of searching and content]; grants are important (Scheines); there might be two types of training: technical, and "now you have it, what are you going to do with it" (Agnew); collaborations are important, e.g. linking to various controlled lists, collaborating on grant applications (Agnew). Agnew agreed to make her PowerPoint presentation available on the AMIA website.

Experiences with Footage: the Worldwide Moving Image Sourcebook (Liz Scheines)

Liz Scheines, Editorial Director of Footage, described her experiences with that publication, which is a hard-copy near-equivalent of the MIG's Archive Directory component. This was a two-year project which expanded on Rick Prelinger's original Footage89 guide. The original guide listed 1700 sources; the updated 1997 guide lists 3100 total sources, including 400 new U.S. listings and 1000 new entries for organizations outside the U.S. and Canada. The new edition largely followed the original model, including its idiosyncratic subject index, although information was added on storage and production sources, digital formats, rights, and access. The revised edition also added a geographic index, an expanded subject index, a list of researchers for hire, and a series of essays (now somewhat out of date). At the peak of production, for 4-5 months, there were 6-8 people working on the project, doing follow-up, data entry, indexing, etc. Work began with many meetings and brainstorming sessions. Staff was divided between national and international archive collections. Potential entries were identified through consultation with FIAF and FIAT, guides to television and broadcast stations, reference books, and the Internet; Scheines advised that MIG employ similar methods. Data was gathered via a questionnaire and input into Microsoft Access. Staff developed a style manual documenting policies and procedures concerning workflow, data gathering, and data entry. Indices were created in-house; the subject heading list was created by soliciting subject terms from the contributing organizations and identifying the 100 most-used headings through word counts. Subject specialists were hired to review and vet the subject headings and collection descriptions for accuracy of representation.

Scheines emphasized that rights and access information are key components that must be dealt with thoroughly and carefully; she strongly urged making access restrictions explicit, strongly worded, and visually prominent, for the benefit of end users. Many collections are not open, and listing inaccessible collections without adequately clarifying restrictions tantalizes and eventually frustrates end users. Scheines discussed procedures and documentation, and distributed copies of the original questionnaire used to gather data from organizations.

Comments and questions:

· The subject index appears to be the key to the directory.
· There is a general problem of item- vs. collection-level description. [Agnew: item-level description cannot be addressed in the MIG Archive Directory; perhaps search engines could assist with cleaning up data, but not in Phase 1.]
· Precision hits at the item level in MIG will be available through the bibliographic database rather than the Archive Directory, and will be subject to the general quality of cataloging provided.
· Second Line Search purchased the original data content from Rick Prelinger, and it is now owned by Corbis. Corbis could be approached for collaboration. (Scheines, offering to assist) [Agnew: MIG has no funds to pay Corbis for data]
· Editors did not seek permission for updating pre-existing entries from the earlier publication. The practice of including restricted or inaccessible collections was okayed by lawyers; however, many calls were received from organizations angered over their inclusion. (Scheines)

Discussion of Archive Directory Data Elements

Johnson ran through the list of proposed data elements (see attached) and asked for comments and questions, which included the following:

· Agnew - Need to determine which fields should be repeatable and which should be mandatory.
· Williams - Need guidance on how to express "collection size" (title count, item (e.g., can) count, feet of footage, linear feet of shelf space, number of collections, etc.).
· Allow collection size to be expressed as rough percentages or proportions rather than raw numbers. The number becomes inaccurate anyway as soon as more materials are acquired. This is truly a "ballpark" figure.
· Becker - Need to define "viewing room."
· Scheines - In describing collection size, need to give as many options as possible, since collection size is sometimes unknown, or expressed differently; consider defining simply as small, medium or large (with definitions for each).
· Agnew - Collection size question should not direct the archive as to how it should measure its collection; the purpose is simply to give the targeted user (archival community, general public and educators) an idea of the size for purposes of access.
· White - Need to consider sound. (This is an eventual goal but will not be included in Phase One--Agnew)
· Williams - What about updating the directory? (Participating archives should probably be asked to update entry annually--Agnew)
· Williams - Can we use crawlers to do this? (Doubtful the information required could be usefully scraped from existing sources--Agnew).
· DeRoest - Is regular harvesting an option for updates? (Yes--Agnew).
· Tariot - There is a service that can explore "the invisible web" to crawl and locate updated information. We might consider that as an option. (Could try it, but am doubtful it would usefully harvest the necessary information--Agnew)
· Becker - Could ask participating archives to voluntarily update their entry when necessary. (Possible. The key is there is a minimum amount of data that needs to be updated annually and we need to consider all of these ideas to best determine how to obtain these updates--Agnew)
· Becker - Perhaps we can automate the process so that the database automatically contacts each archives via email one year after the last update and requests updates.
· DeRoest/Thaxter/Agnew - We can make a schema, and if the archive keeps a copy locally, we can update automatically by setting it up with MySQL. This gives the institution the opportunity to clean up its data prior to OAI harvesting, while allowing at-will updating.
· Scheines - Regardless of the method used, the group needs to track updates (what has been updated, what has not); simply indicating that an entry has been updated, without saying what has been updated, is not that useful.
· Thaxter - If we do updates, we need to keep track also of who is doing the updates.
· And how frequently the database changes.
· Agnew - We need about a half dozen audit fields indicating things like who submitted the information, etc. (metametadata). We should also determine if it is perhaps more economical to request a completely new record every two years as opposed to having to deal with a process for ongoing updates.
· Streible - Regarding the contact, the person's title is perhaps more important than the person's name (since staff turns over).
· Agnew - We could use vCard to standardize that field.
· McCallum - Data verification is important to include in data history, i.e. "Data about the creation ... updating and verification of the Archive Directory entry itself ..."
· Gunderson - With regard to "Date of Last Database Update", we might ask how frequently an archives changes (updates, adds to) its catalogue (holdings), for example, how many collections or items it anticipates adding each year. (This would be difficult to codify--Agnew).
· Humphrys (?) - One basic point: we may need to add a front page to explain how the site works, for example, how it is updated, that it is not a live link to catalogs, etc., and include that in instructions to people contributing. We could also explain that users need to check out the web site for each archive to get the most up-to-date information. (Great idea; and that illustrates one purpose of concatenating the home URL with the bib records--Agnew)
· Agnew - Also under Database Administration Data, we need a repeatable data element indicating portal ids.
· Scheines - In the "Rights" field, include licensing and rights contact. That is, who to contact for information on rights. (Rights is a thorny issue. Perhaps the best place for this is in our "conditions of use" field--Agnew)
· White - Need to include information on availability of digital surrogates or proxies. (This gets complicated--Agnew)
· Theerman - How are we defining "archives"?
· Johnson - AMIM2 includes a potentially useful definition: Archival materials are those materials intended to be kept so that they may be available for future generations, regardless of their age at the time of acquisition.
· Theerman - Need to show relationship between organization represented by the Archive Directory entry, its parent body, and other departments holding moving images in the same institution.
· Simpson - Perhaps need to add a "Sub-unit" field, converse of the Parent Institution field.
· Tillett - NUC codes [LC's National Union Catalog codes] are a possibility for ArchiveID; they work very well for indicating parent/subunit relationships.
· Apparent group consensus - Sub-unit URL is a useful addition under "Directory Information."
· Scheines - The semantics of the term "collections" is also problematic.
· Group consensus - All data elements require scope notes.
· Becker - Do I understand correctly that the archiveID is a unique identifier? (Yes, and we need to make clear that the ID is unique, IDs are linked to portals, and they have search implications. Perhaps we also need to use codes.--Agnew)
· Hunter - How should we handle the distinction between actual individual repositories and distributed databases (virtual archives) spread over the web? (This cannot be part of Phase One because it is not homogeneous. Z39.50 will help once it is built in. A union catalog is the first step--Agnew)
· Tadic - How will broadcast stations fit into this model? They have only commercial interests and likely may not want to participate. Can we incorporate fee structures or add a field which indicates organizations may have material for sale? (The Steering Committee will have to discuss this with LC and AMIA. The project has always had a non-profit focus--Agnew)
· Kalas - Perhaps there are opportunities for sponsorship. Raise funds to support the project through advertising on the site.
· Humphrys - We need to integrate the profit and the non-profit if we are going to be successful.
· Kalas - We should not discount opportunities to think more commercially.
· Agnew - We don't want to include information on specific pricing because that changes too frequently; in any case it cannot be included in Phase One.
· Arms - Perhaps we can use each organization's logo as a link to take users to its URL and more specific information about things like pricing.
· Hunter - Have we considered issues of content delivery, business models, and rights management for the MIG? Incorporate metadata now rather than later, and as part of Phase Two. Consider digital rights management issues and design implementation now; don't wait.
· Agnew - Yes, but the LC may be somewhat reluctant about widening the focus to include commercial interests and we need to keep that in mind.
· Becker - Perhaps the opportunity to sell materials would encourage people to participate, that is, to complete and return their questionnaire; even the non-profits.
· Agnew - Yes, but we need to focus this phase on access.
· Streible - Need to maintain comprehensiveness of the Archive Directory and represent all materials, even those not available for access. Researchers need to know that a title exists, whether or not it is available for viewing. Deal with differentiation between non-profits and for-profits later. The big picture is the next step; later, commercial interests can be considered.
· Agnew - The granularity of price is too specific right now; it cannot be addressed now.
· Price - Educators are core audience members. The group discussed this.
· Johnson - Many of these issues have been explored and addressed in the Final Report, which can be distributed freely once grant funding has been obtained. At this meeting we are focusing on finalizing the data elements and controlled vocabularies.
· Complex organizations, those with multiple departments, will have to decide how they wish to be represented in the MIG.
· Johnson - Description must be a one-to-one match to the organization name and content information at the top of the Archive Directory entry.
· Carter - Is the URL field a repeatable field? What about the "viewing facilities" field? Should we add a "dub" field?
· Johnson - Perhaps dubbing goes under "conditions for use."
· Group consensus - The URL field should not be repeatable. The Directory entry needs to be kept simple. The organization has some responsibility to further direct people from its own home website.
· Hunter - We need to indicate if clips are available as opposed to just full programs. (Level of description is an issue that needs to be dealt with--Agnew. This is an issue for the working groups--Johnson)
· Carolyn - We may want to add a free text field on functionality of (home) search page.
· And functionality can include granularity of available searches at the home search site. Possibly include this under "Cataloging Activities," to indicate level of granularity for both metadata (clip vs. feature film) and retrieval.
· Scheines - May need to add more free text notes fields, for example, "Collection Notes." (Remember, the goal of the Directory is not to replace an archives catalog. We can't be all things to all people. The directory needs to be reasonably lightweight and maintainable. Keep in mind FRBR [IFLA's Functional Requirements for Bibliographic Records]--Agnew)
· Johnson - Notes fields can be useful, at least internally at point of input for clarification of data entered, but let's be very judicious.
· Baca - Warned group that notes fields can be abused; if the information is important enough to include, it probably warrants a data element; otherwise, organizations will include information in notes that should really be included elsewhere; it becomes impossible to map and impedes retrieval.
· There seemed to be acceptance of a note field at the input end, for organizations to clarify points, etc.
· Humphrys (?) - How would notes field be searchable? (As proposed in the Data Element document, it utilizes both free text and controlled vocabulary, but this is something that needs to be determined by the group. Let's use drop-down menus whenever we can--Agnew)
· Hunter - Let's confirm that there are two ways to search: 1) Directory entries, and 2) bibliographic records. (Correct, and we want to allow filtering bib records using Directory information data--Agnew)
· The Internet2 video middleware group is looking at how to discover directories that are out there.
· Johnson - Should Archive Location information be hierarchical?
· Baca - Yes, especially since we are including countries from all over the world.
· Agnew - How about using: 1) country and 2) state/province.
· Hunter - Being familiar with Australia, "region" can also be helpful.
· There is an ISO standard for regions.
· Baca - This group should look at the Getty Thesaurus of Geographic Names.
· Tillett - The NUC code has country, state, city. Let's take advantage of this resource since it is already in place, and is hierarchical. LC controls NUC codes, and assigns them within two days of the request. The MIG can request an NUC code on behalf of an archive.
· Thaxter - I would recommend that any complexity be hidden. (Agreed--Agnew)
· Johnson - Should geographic locations be subfielded? Repeatable fields?
· Williams - Should we only design structure to reflect data (what comes first, data or structure)? For example, do we list all countries in a drop down or only those countries which are reflected by archives participating in the Directory?
· Agnew - Let's use drop down to only reflect what is currently included.
· Price - It may be easier if we develop the structure first. If a user plugs in a country for which there is no entry, zero hits are indicated.
· Hubbard - Problem is we are including the whole world. Would such a drop-down be unmanageable?
· Group consensus - Define geographic fields (country, state/province, city) separately.
· Theerman - Political designations are not always the most useful. What about (non-jurisdictional) geographical areas? (We could always do this kind of break-down at the back-end--Agnew)
· "Formats collected" could include indication of whether materials are available onsite, online, and/or online to all.
· McCallum - It would be useful to indicate rate of acquisition [possibly in the "Programming, Collections, and Research Support Activities" field] to indicate if, and how much, the collection will be growing; many collections are static.
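The audit ("metametadata") fields and the one-year update reminder raised in the discussion above could be sketched roughly as follows. This is only an illustration in Python: the class name, field names, and the 365-day interval are assumptions made for the example, not decisions recorded at the meeting.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class AuditInfo:
    """Hypothetical audit fields kept alongside one Directory entry."""
    submitted_by: str                  # who supplied the information
    submitter_title: str               # a title tends to outlast staff turnover
    last_updated: date                 # when the entry was last changed
    fields_changed: List[str] = field(default_factory=list)  # what was changed
    last_verified: Optional[date] = None  # when the data was last verified


def is_due_for_update(audit: AuditInfo, today: date,
                      interval_days: int = 365) -> bool:
    """True when at least interval_days have passed since the last update."""
    return today - audit.last_updated >= timedelta(days=interval_days)


def entries_needing_reminder(entries: Dict[str, AuditInfo],
                             today: date) -> List[str]:
    """Return the IDs of entries whose organizations should be asked to update."""
    return sorted(org_id for org_id, audit in entries.items()
                  if is_due_for_update(audit, today))
```

Recording which fields changed, and not merely that an entry changed, follows the point that an "updated" flag alone is not very useful.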

Introduction to Controlled Vocabularies

Murtha Baca (Getty Research Institute) presented an introduction to vocabularies. She distributed two handouts: her PowerPoint presentation and an annotated bibliography of tools.

Comments and questions:

· Hunter - There is a vocabulary problem with AMIA-MIG in that all the participants will have different vocabularies. How will AMIA-MIG merge them?
· Agnew - The purpose of building portals is to accommodate the variety.
· Johnson - The portals are intended to help us deal with the discrepancy between genre and form and subject. Our group's choice of controlled vocabularies will also help us make that distinction.
· Baca - We need to decide how many categories to have and what each means (subject, genre, form).
· We can allow broad subjects, e.g., "medical," "theological."
· Subject can be handled at the portal or bib record level, or in the "Programming, Collections, and Research Support Activities" free text field.
· We need good guidelines for free text fields. (Discussion ensued re: providing vocabulary guidelines for free text fields, but the point was made that this defeats the purpose.)
· Johnson - Let's set aside "subject" for now. We envisioned 10-20 terms for both genre and form. What does the group think?
· Goldman - "Movements" should be considered for the form/genre field.
· The group first decided that form and genre should be separate fields with pull down menus instead of one category titled "Collection and genre strengths." The group decided to name the field "Collection strengths" instead and to include only form. The group continued to discuss the definition of genre and form as well as the disadvantage/advantage of free-text searching.
· Agnew - We must be careful determining the Directory fields because this is how users will be filtering their search.
· Johnson - Perhaps genre is getting too specific, and is not applicable to much of the larger universe of moving image materials.
· Agnew - Keep in mind the 3 or 4 constituencies we are trying to reach: general public, educators, archivists, and stock shot users. Remember to think of these users when we break into our working groups tomorrow.
· Becker - Most archives can succinctly describe their collections.
· Agnew - Broad brush stroke descriptions were the idea.
· Johnson - Recommend the group focus on form for now.
· Streible - Perhaps the best thing to do is keep form categories very simple: fiction, non-fiction, experimental.
· Johnson - Need to discuss distinctions between "Archive Role," "Archive Type," "Audience(s) Served," and "Programming, Collections, and Research Support Activities."
· The group discussed the problems with various terms such as: collection, archive, institution, repository, and organization.
· Barry - For the MARC Code List for Organizations, there were lengthy battles over terminology and the end result was this term (Organizations); don't reinvent the wheel.
· DeRoest (?) - In the context of Internet2 enterprise white pages, "Organization Directory" however would be misleading; do not rename this project "MIG Organization Directory." "Organization" works fine for the actual data element (field) names themselves though. ("Archive" is too limiting.)
· Goldman - All terms need to be defined for the archives inputting the data and completing the questionnaire.
· The group discussed the "Institution Type" and "Institution Role" fields, their purpose and value.
· Group consensus - For names of individual data elements, there are problems with all of the following terms, depending on context: Archive, Association, Collection, Repository. Organization is a good term to use in names of the individual data elements (e.g., "Organization ID"), but don't call the MIG Archive Directory component an Organization Directory; the term "Archive" works in that context.

Group Exercise

Using real life examples (CNN and NLM) the group filled out sample records for the more problematic fields, and discussed them:

For the Organization Type field, the group reviewed the terminologies listed in "Archive Directory Data Elements-Controlled Vocabularies" to determine if this selection would be appropriate and sufficient. It was suggested this field should be repeatable. Johnson suggested the possibility of using 'tool tips' to define what each category means. Agnew suggested alternatively a legend or a parenthetical description following each term. There was some discussion about replacing the term "Television Company" with "Broadcaster/Cablecaster" based on Footage practice. It was later noted that this practice was reversed; "Television Company" is used in the book. It was noted that "library," in general parlance, can be synonymous with public library, and that term should be used with caution.

For the "Organization Role" field (subsequently changed to "Organization Services"), Agnew suggested it be designated as free-text so that an organization lacking its own web site might be able to add inclusive information here. Services might include: ILL [interlibrary loan], reference, dubbing, on-site use, exhibitions, etc. There was some debate over whether this section should include a mission statement or whether a separate field for the mission statement should be added. It was decided that this field should indicate services offered, not services supported.

For the Collection Strengths field, participants questioned whether to include form and subject, or just form. There was general agreement that the field should use controlled vocabulary to facilitate searching. Examples of form might be documentary, training, and/or educational. Johnson suggested the group look at the document entitled "Video Development Initiative (ViDe) list." Some think this list is too specific. Hubbard suggested the field be broken into two fields: 1) form and 2) subject, and that we use a drop-down list. A straw poll indicated an overwhelming majority favored a new data element called "Collection Strengths: Subject." Murtha Baca suggested limiting organizations to three terms that can collectively describe the entire scope of the collection.

It was suggested that Collection Size be split into two fields, one controlled vocabulary, the other free-text. It was noted that there will be limits on size of entries for technical purposes, because entries are generating dynamic web pages.
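One way to picture the proposed split of Collection Size into a controlled term plus a free-text note is the short Python sketch below. The term list (including "unknown") and the 250-character limit are assumptions chosen for illustration; the group had not settled on either.

```python
from dataclasses import dataclass

# Assumed controlled terms; definitions for each would still be needed.
SIZE_TERMS = ("small", "medium", "large", "unknown")

# Assumed cap on the free-text part, standing in for the technical size limit.
MAX_NOTE_LENGTH = 250


@dataclass(frozen=True)
class CollectionSize:
    """Collection size as a controlled category plus an optional free-text note."""
    category: str
    note: str = ""

    def __post_init__(self) -> None:
        # Reject terms outside the controlled list.
        if self.category not in SIZE_TERMS:
            raise ValueError(f"unknown size term: {self.category!r}")
        # Keep the free-text note within the assumed length limit.
        if len(self.note) > MAX_NOTE_LENGTH:
            raise ValueError("free-text note exceeds the length limit")


# The note lets an archive describe size in whatever unit it already uses.
example = CollectionSize("large", "approx. 40,000 titles; 12,000 linear feet")
```

The controlled term supports filtering, while the note avoids directing an archive as to how it should measure its collection.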

Participants discussed the Audience(s) Served field and how specific the information in this field should be.

For the Conditions for use/Access restrictions field, Agnew suggested we develop a list of terms (not exhaustive). Perhaps toggle choices. Scheines strongly urged the group to use the "tried and true" terms originally developed by Rick Prelinger for Footage.

Johnson described how the Friday breakout sessions would work. Each of five groups would discuss three fields (each field is looked at by two groups), plus all groups will discuss the "Collection Size" field. Each group will then report back. Johnson thanked the participants for their work and asked them to return tomorrow at 9:00 a.m.

5:00 p.m. - Meeting adjourned.

Friday, July 26, 2002

Breakout session instructions

Johnson reviewed a document that she and Murtha Baca compiled after the Thursday meeting. It took the seven Archive Directory data elements identified Thursday as "problematic," indicated whether they were repeatable or not, required or not, and subject to controlled vocabulary or not. Each element was given a scope note. The elements were: Organization Type(s), Organization Services, Collection Strengths: Form(s), Collection Strengths: Subject(s), Audience(s) Served, Conditions for Use/Access Restrictions, and Programming, Collections, and Research Support Activities. The small groups were instructed to use these definitions as a guide in their breakout sessions. Time would not permit discussion of "Collection Strengths: Subject(s)" in the small groups. The Footage questionnaire was also distributed for reference.

Small group agendas were revised: All groups will discuss collection size and consider inclusion of sound terms in the "Collection Strengths: Form" and "Formats collected" fields (sound recordings in moving image collections, e.g. tracks; do not include radio broadcasts, etc.). Participants broke into their small groups, and would reconvene after lunch.

Group leaders reported their findings to the reconvened group. (See Breakout Session reports.)

Johnson described the post-meeting action plan: The Technical Committee (chaired by Johnson) will assign task forces to write sections of the report, which the Committee will consolidate into a Final Report. The Final Report will be the working document used by the developers to design the Archive Directory database and is therefore due October 1, 2002. (If NSF funding comes through, work on the project will begin on that date.) The Report will include finalized data elements with scope notes, vocabularies with scope notes, and indicate for each field whether it is repeatable or non-repeatable, mandatory or optional, and indexed or not. Agnew pointed out the report need only make recommendations for indexing; this is subject to change as the development work proceeds and need not be a "deal-breaker." She added that the document could go out for review and revision after October 1, but must be in good enough shape by that date to start work. Some individual elements or vocabularies could be flagged for community review.
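As a rough illustration of what the per-field designations in the Report might look like to a developer, the Python sketch below records whether each element is repeatable, mandatory, and indexed, and checks a record against those rules. The element names come from the discussion, but the specific flag values (other than the non-repeatable URL) and the function and class names are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ElementSpec:
    """Designations for one data element (flag values here are illustrative)."""
    name: str
    repeatable: bool
    mandatory: bool
    indexed: bool


# Hypothetical specifications; only "URL is not repeatable" reflects a recorded consensus.
SPECS = [
    ElementSpec("Organization ID", repeatable=False, mandatory=True, indexed=True),
    ElementSpec("Organization Type", repeatable=True, mandatory=True, indexed=True),
    ElementSpec("URL", repeatable=False, mandatory=False, indexed=False),
]


def validate(record: Dict[str, List[str]], specs: List[ElementSpec]) -> List[str]:
    """Return a list of problems found in a record; an empty list means it passes."""
    problems = []
    known = {spec.name for spec in specs}
    for spec in specs:
        values = record.get(spec.name, [])
        if spec.mandatory and not values:
            problems.append(f"{spec.name}: mandatory element is missing")
        if not spec.repeatable and len(values) > 1:
            problems.append(f"{spec.name}: element is not repeatable")
    for name in record:
        if name not in known:
            problems.append(f"{name}: unknown element")
    return problems
```

Because the indexing flag is only a recommendation, it is carried as data here and plays no part in validation.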

Johnson thanked participants and the Library of Congress. The entire group thanked Steve Leggett.

3:30 p.m. - Meeting adjourned.

Breakout Sessions

Breakout groups were asked to address the following questions for each category:

ORGANIZATION TYPE (formerly INSTITUTION TYPE) (ca. 10, e.g., corporate, academic, public, K-12 library, production house, stock footage house, etc.)

Are there other possible sources besides Footage?
What terms can be added to or changed on the starter term list above? (Cf. Footage list in your packet)
Should we limit to ten or so most common/important, with another category for "other" or try to provide an exhaustive list?


ORGANIZATION SERVICES (e.g., research, education, footage licensing, corporate archive, etc.)

Is there an existing source for these terms?
What terms can be added to or changed on the starter term list above? (If you have any mission statements available, use those to help generate terms)
Would it be useful to identify whether an institution is generally internally or externally oriented? What is the best way to do that?


ORGANIZATION LOCATION

Are there other possible sources of terms?
Identify pros and cons, or issues associated with each source if applicable
Choose preferred source
Identify appropriate level(s) of specificity. For example, providing city and state might not adequately serve a user wishing to retrieve titles in a particular metropolitan area.
Would it be useful to make this field repeatable or multipart in order to capture two or more geographic levels (country in one, state/territory in a second, city/metropolitan area in a third)?


COLLECTION STRENGTHS: FORM (e.g., feature film, broadcast news, etc.; as detailed (e.g. Asian avant garde) or broad (silent films) as desired)

Are there other possible sources of terms?
Identify pros and cons, or issues associated with each source if applicable
Choose preferred source
Determine preferred number of categories (ballpark) to enable working groups to choose appropriate level(s) of specificity after the meeting
Consider sound terms (sound recordings in moving image collections, e.g., soundtracks; do not include radio broadcasts, etc.)

FORMATS COLLECTED (e.g., film, video, digital)

Are there other possible sources of terms?
Identify pros and cons, or issues associated with each source if applicable
Choose preferred source(s)
Determine preferred number of categories (ballpark) to enable working groups to choose appropriate level of specificity after the meeting
Should we create a mixture of broader/narrower terms for the list or does that reduce the usefulness of this element for descriptive or indexing/retrieval purposes? For example, is 'video' useful as a separate term? Or 'film'? Or 'digital'?
Consider sound terms (sound recordings in moving image collections, e.g., soundtracks; do not include radio broadcasts, etc.)


AUDIENCE(S) SERVED (controlled list, e.g. 'organization or corporation members only,' 'researchers'; supplemented by free-text description, e.g., 'non-profit organizations providing medical assistance to developing countries')

Where this utilizes controlled vocabulary, what retrieval abilities do participants want?
List useful terms. Don't worry about form of terms, just content


CONDITIONS FOR USE/ACCESS RESTRICTIONS (e.g., 'onsite use by University of X faculty and students only')

List useful terms. Don't worry about form of terms, just content
Consider toggle choices, e.g.,
lending / onsite use only
some reproduction permitted / viewing only


CATALOGING ACTIVITIES (combination controlled list and free-text description of the metadata standards and technologies used; intended to facilitate information sharing among archives)

List categories to be covered (e.g., Metadata standards, Data exchange standards)
List options within each category.
What other data would you like to see here, e.g., percentage of collection cataloged, percentage cataloged online, etc.


COLLECTION SIZE

How best to express (linear feet, number of titles, all of the above, etc.)
Allow organizations to express as Small, Medium, and Large? If so, what are good ranges for those categories?

Who would like to work on what in follow-up task forces?
REPORT: AMIA MIG Breakout Group #1
Leader: Jane Johnson
Present: Ruta Abolins, Alice Jacobs, Sally McCallum, Ed Price, Janice Simpson,
Linda Tadic (reporter)

ORGANIZATION TYPE

This field describes the primary function of the organization. Choose all that apply.

Additional thesaurus to consider: RAD

Note: Moving image organizations such as AMIA will not be included in the Directory, but in the Education/Outreach section of MIG.

Use terms from "Major Categories Most Used In Footage," but add "Non-profit organizations" from the "Other Categories Used" list:
· Archives
· Association
· Stock footage house (changed from "Commercial library")
· Corporation
· Distributor
· Educational institution
· Foundation
· Government agency
· Historical society
· Library
· Media arts center
· Museum
· Non-profit organization (added from "Other" list)
· Private collection
· Production company
· Television company

Do not allow "Other" category.

Care should be taken that terms do not include concepts that could be defined as service or subject. Thematic archives (e.g., Religious orgs, Political orgs) should be avoided.

Terms that need special attention in their definitions are:
o Archives (what makes an archive? Preservation activities?)
o Library (define generally; otherwise, all orgs would check "library")
o Stock footage house (vs. Stock footage "source", which is a service. This term should be used when licensing stock footage is the organization's primary activity.)
o Non-profit organization (compare to "Association")
ORGANIZATION SERVICES

These should be "positive" terms used to describe services an organization can offer to the public. The org should select as many as apply. If a service is offered to internal staff only, it should not be selected (the information serves no useful purpose to Directory users). There should be a picklist with checkboxes for each indicating internal (in-house) or external (public).
o Research
o Reference
o Viewing facilities
o Screening facilities [e.g., projection facilities; for researchers; doesn't apply to public exhibition]
o Online viewing
o Duplication/Reproduction [includes digitization]
o Restoration/Preservation [on demand, as a service to other orgs]
o Licensing
o Sales
o Copyright clearance
o ILL

Terms that need special attention in their definitions are:
o Research (staff does research for clients; cf. reference)
o Reference (clients come on-site to do research, sometimes with staff assistance; cf. research)
o Restoration/Preservation (make separate and define each)

The group felt that it should be noted somewhere that the organization might charge fees for service. [In larger discussion, Grace felt that the Gateway should not mention money/fees; this should be included on the organization's own site.]

FORMATS COLLECTED

Other sources to consider: SMPTE; RAD; survey in AMIA Compendium of Moving Image Cataloging Practice appendix (SAA, 2001) (this lists just Film; Video; Audio)
o Film (including audio recordings on film, e.g. mag tracks, optical tracks)
o Videotape (analog/digital)
o Audio recordings (all audio not on film or digital file)
o Video discs (CD/DVD/Laserdiscs)
o Digital files (sound or picture; include DLT)

For the Archival Portal, include after "Film" checkboxes for nitrate and safety. The group did not want to include mention of gauge or video format as this would make the field far more complex than necessary. Specifics can be listed in the narrative free text field "Programming, Collections, and Research Support Activities." [During the larger group discussion, Dan Streible argued that scholars/researchers would like to be able to limit their searches to organizations holding small and odd-gauge film such as 8mm, 9.5mm, and 28mm. This exception would satisfy the film archival community and organizations such as NFPF and AMIA.]

[Other notes from group discussion: For "Film," include common gauges in scope note; include examples of tape types in the "Videotape" scope note. Include Vitaphone discs as an example in the "Audio recordings" scope note. See the detailed lists for scope note examples. Include high definition?]

COLLECTION SIZE

The general size field where the org selects Small - Medium - Large would be mandatory. Any other fields should be checked as appropriate. These fields are optional.
o Small: less than 5,000
o Medium: greater than 5,000 and less than 100,000
o Large: more than 100,000

The general size field can be used to limit searches. The other fields are not searchable; they are for informational purposes only.

The Final Report should define what quantifiers should be used in determining the general size. For now, the group recommends using the number in the "Total Titles" and/or "Total Shots" fields, or if not relevant, then any fields the org finds most useful.
o Total physical items
o Total titles
o Linear shelf feet
o Feet of film
o Total hours
o Total shots

"Number of collections" is not a useful category.

Include a checkbox to indicate if the collection is static or growing. (This can be expanded upon, e.g., by indicating rate of acquisition or collection growth, in the narrative free text field "Programming, Collections, and Research Support Activities.")

[Note: In group discussion it was noted that S-M-L could be defined after the fact.]

Note: The task force working on the "Collection Strengths: Forms" field might want to also consider the terms in the AMIA Compendium of Moving Image Cataloging Practice (SAA, 2001).

REPORT: AMIA MIG Breakout Group #2
Leader: Murtha Baca
Present: Caroline Arms, Snowden Becker (reporter), Karen Cariani, Jim DeRoest, Nancy Dosch, Dan Streible, Kathleen Williams

INSTITUTION LOCATION

Envisioned as designator for searching, access part of FRBR.

Sources for terms:
· NUC codes (cons: potentially discouraging to unregistered users; pros: could be suitable for archiveID)
· TGN (cons: very extensive, granular resource with which users may not be familiar; verification at front/back end difficult; pros: incorporates alternative forms of place names, especially at the national level, has hierarchical regional designations which could be useful for searching)
· NAF/MARC Code list for Geographic Areas (cons: like TGN, very extensive/detailed, not familiar to users (especially codes); pros: in English, straightforward, can be integrated with TGN and others). Preferred/recommended.

Recommendations for behavior:
· Level of specificity: requires 3 parts (town/city, state/province, nation)
· separately fielded entries in address area
· if required to set apart/parse out from Archive Directory information, can auto-populate a "location" set based on address field data, but remain editable as required (if physical repository is in a different location from administrative entity)
· City and State are free text
· Nation/country is a pick list based on the MARC list. This has the benefit of established terms present elsewhere with associated (but not overwhelming) regional hierarchy, as well as consistency with Internet form conventions.

Additional suggestions:
· For search function/retrieval, allow regional searching that encompasses areas bigger than cities, smaller than nations (TGN)
· Provide a way to attach code for closest airport (access/user-friendliness issue) for search results
· Avoid redundant data entry wherever possible
· Enforce English-language version of nation name, not vernacular
· Enforce use of full state/province name, not abbreviation
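
As a rough illustration of the recommendations above, the three-part location could be modeled as follows. This is only a sketch: the `Location` class, the small `NATION_PICKLIST` sample, and the abbreviation check are hypothetical and were not specified by the group; an actual pick list would be drawn from the MARC list.

```python
from dataclasses import dataclass

# Hypothetical sample of a nation pick list; the real list would be drawn
# from the MARC list, using English-language names.
NATION_PICKLIST = {"United States", "Canada", "Australia"}


@dataclass(frozen=True)
class Location:
    city: str    # free text
    state: str   # free text, full name rather than abbreviation
    nation: str  # chosen from the pick list

    def problems(self) -> list[str]:
        """Return human-readable validation problems (empty list if none)."""
        found = []
        if not self.city.strip():
            found.append("city is required")
        state = self.state.strip()
        if not state:
            found.append("state/province is required")
        # Crude assumption: a short all-caps value is an abbreviation.
        elif len(state) <= 3 and state.isupper():
            found.append("use the full state/province name, not an abbreviation")
        if self.nation not in NATION_PICKLIST:
            found.append("nation must be chosen from the pick list")
        return found
```

Returning a list of problems rather than raising an error would let a data entry form show every issue at once; that choice is an assumption of this sketch, not something the group discussed.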

AUDIENCE(S) SERVED

Describes all groups targeted/interacted with by the organization, using broadly defined terms.

Recommendations for behavior: "Check all that apply."
· General public
· Educators
· Subscribers/members
· Professionals in the targeted field
· Students
· Researchers
· Affiliated staff/faculty/students
· Media/footage users
· Exhibitors

CONDITIONS FOR USE/ACCESS RESTRICTIONS

This is a defined set of characteristics which can be used to effectively (generally) describe archive functions and assist in narrowing search results. Define in positive, rather than negative, terms (i.e. allowed functions). Specifics can be detailed in the narrative free text field ("Programming, Collections, and Research Support Activities"). Divide into two fields [note this was questioned in group discussion]: Conditions for Access (i.e. physical access) and Material Available For (i.e. uses)

Recommendations for behavior: "Check all that apply."

CONDITIONS FOR ACCESS:
· collection not available
· on-site access only [i.e. non-circulating]
· by appointment only
· qualified researchers only [or general public; some restrictions apply]
· staff/members only

MATERIALS AVAILABLE FOR:
· licensing
· reproduction
· exhibition
· loan, rental, purchase
· research
· educational use
*restrictions and/or charges may apply

[Note: in group discussion people thought this second category belonged in the "Organization Services" field. Also, it was suggested that defining in positive terms only is not working; try indicating restrictions only. Rely on other fields in combination with this to get the information across; for example "Organization Services" (but Grace notes this won't display in bib records), "Audience(s) Served." Consider incorporating some services terms in concatenated bib records. Possibly let the archive choose its own term that will display with bib records.]


Additional suggestions:
· Although specifics about services supported and requirements for various audiences or materials use (including charges) are expected to appear under "Programming, Collections, and Research Activities," some terms here should touch on th[at?]

COLLECTION SIZE

Do not use S-M-L designations; too subjective; instead use two categories: the quantity (approximate numeric value), then the unit.

Unit options should include:
· discrete titles
· total items (e.g., reels, cans, tapes)
· total hours
· length of film (in feet)
· length of film (in meters)
· linear feet of shelving
· linear meters of shelving
· cubic feet of shelving
· cubic meters of shelving
· number of collections
· total shots

Repeatable; fill in all that you know/all that apply.
Allow "Unknown" and "Other."
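
The quantity-plus-unit approach recommended by this group could be sketched as below. The `SizeStatement` class and its method names are hypothetical, and representing "Unknown" as a missing quantity is an assumption of this sketch rather than a decision recorded in the minutes.

```python
from dataclasses import dataclass
from typing import Optional

# Unit options as listed by the group, plus the "Other" choice.
UNITS = {
    "discrete titles", "total items", "total hours",
    "length of film (feet)", "length of film (meters)",
    "linear feet of shelving", "linear meters of shelving",
    "cubic feet of shelving", "cubic meters of shelving",
    "number of collections", "total shots", "other",
}


@dataclass(frozen=True)
class SizeStatement:
    unit: str
    quantity: Optional[float] = None  # None stands for "Unknown"

    def __post_init__(self):
        if self.unit not in UNITS:
            raise ValueError(f"unit not in controlled list: {self.unit!r}")
        if self.quantity is not None and self.quantity < 0:
            raise ValueError("quantity cannot be negative")

    def display(self) -> str:
        if self.quantity is None:
            return f"Unknown {self.unit}"
        return f"approx. {self.quantity:,.0f} {self.unit}"


# The field is repeatable, so one archive can hold several statements.
collection_size = [
    SizeStatement("discrete titles", 12000),
    SizeStatement("total hours"),
]
```

Because each statement carries its own unit, an archive can report whatever measures it actually knows without being forced into a single scale.
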
REPORT: AMIA MIG Breakout Group #3
Leader: Dan Kniesner (reporter)
Present: Grace Agnew, Nigel Elmore, Nancy Goldman, Dina Gunderson, Mike Mashon, Kim Schroeder, Barbara Tillett, Ellie Wackerman


I. Collection Strengths (Form) (controlled vocabulary):

1. Amateur.
2. Animation.
3. Documentaries/Nonfiction.
4. Educational/Instructional.
5. Experimental.
6. Feature films/Shorts.
7. Interviews.
8. News.
9. Performance.
10. Promotional/Advertising.
11. Television.
12. Unedited footage.

(Comments: Derived from L.C.'s Moving Image Genre-Form Guide (migfg) and the ViDe list, then adapted through discussion. A 'snapshot' of collection strengths in terms of forms is useful, but using these forms for pre-filtering in MIG searches could be dangerous. Sports and medicine are subjects and were excluded from this list.)

[In group discussion, it was noted that this group gave precedence to common parlance over consistency for consistency's sake. They recommended being cautious about using the word "film" in the terms. They likened their terms to "elevator conversation" terms; what terms would you use to describe your collection if you were explaining to a stranger in an elevator? Your description would need to be concise and readily understood. This group, after discussion, would include "Sports" as a form term; it was hotly debated in the group and it seems clear the public would want that. They also liked Group 5's term "Research Documentation." After discussion, both groups agreed that the best term for "Experimental" (no. 5 above) is "Video art/experimental films." They suggested that the terms "Conferences" and "Webcasts" be given further consideration in follow-up work. Also, oral histories and other "talking heads" footage may need to be accommodated, possibly under "Interviews"; and cf. webcasts in this context.]

II. Audiences served (controlled vocabulary):

1. Educators.
2. General public.
3. Members/affiliates.
4. Organization's own staff/faculty.
5. Researchers/scholars.
6. Stock footage buyers.

[In group discussion, it was noted that this information must be meaningful on the bib record. This group preferred to lump students with "Researchers" or other category; cf. Group 2 approach. It was noted that "Stock footage buyers, licensees, renters" will be critical for the portal. This group agreed that Group 2's term "Exhibitors" was a good addition. It thought "Professionals in targeted field" could be included as researchers; the problem is it's not meaningful on bib records. Clarify the overlap between this category, "Conditions for Use/Access Restrictions," and "Organization Services."]

III. Cataloging activities (with some controlled vocabulary):

1. Content standards.
(AACR2, AMIM I, AMIM II, FIAF, RAD, RICA, RAK, In-house, Other)
2. Metadata schemas.
(Dublin Core, MARC, IMS, VRA Core, MPEG-7, EAD, ISBD, SMPTE, SMEF, In-house, Other)
3. Subject heading lists, classification schemes.
(LCSH, AAT, etc.)
4. Catalog form.
(no catalog, list/brochure/book catalog, microform, card, finding aids, standalone computer, online networked computer, web-based)
5. Policy is: Catalog for staff use only, or available to public.
6. Percentage of collection cataloged.
none, less than half, more than half, all
7. Percentage cataloged online.
none, less than half, more than half, all
8. Percentage cataloged at collection-level vs. item-level.
all collection-level, some or most collection-level, some or most item-level, all item-level
9. Percentage cataloged at shot-level.

[Note: in group discussion it was confirmed that this information is there to serve cataloging community collaborations. Also, several types of information were deliberately omitted from the Archive Directory database because they are somewhat ephemeral and/or will be captured by other means: communication format standards, database systems used and their version numbers (Oracle, Informix, Access, FilemakerPro, integrated library system, media asset management system, etc.), and language/character sets.]

IV. Collection size (rough idea in terms of number of items)

1. Small - less than or equal to 1,000
2. Medium - greater than 1,000 and less than or equal to 100,000
3. Large - greater than 100,000
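
The three ranges above translate directly into a lookup such as the following sketch. The function name `size_category` is hypothetical, and rejecting negative counts is an assumption added here for safety.

```python
def size_category(item_count: int) -> str:
    """Map an approximate item count to Group 3's Small/Medium/Large ranges."""
    if item_count < 0:
        raise ValueError("item count cannot be negative")
    if item_count <= 1_000:
        return "Small"
    if item_count <= 100_000:
        return "Medium"
    return "Large"
```

For example, under these ranges a collection of exactly 1,000 items is "Small" and one of exactly 100,000 items is "Medium".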

V. Preservation activities

1. Cold storage.
2. Restoration.
3. Reformatting.
4. Conservation.
REPORT: AMIA MIG Breakout Group #4
Leader: Jane Hunter
Present: Gary Carter, Jim Hubbard, Barbara Humphrys, Pat Loughney, Mairéad Martin (reporter), David Wells, Karen Wyatt

1. ORGANIZATION TYPE

1. Are there other possible sources of terms beside Footage?

Footage is the only one to do this, but we should compare to broader lists in general reference sources - Encyclopedia of Associations, for example (US and Canada). (Note: does MARC have a list?). (None of the directories we looked at had much information about services.)

2. What terms can be added to or changed on the starter term list above?

1. Archives (decided to take this out)
2. Association
3. Broadcaster/Cablecaster
4. Commercial Stock Ftg. Library
5. Corporation
6. Distributor
7. Educational Institution
8. Foundation
9. Govt. Agency
10. Historical Society
11. Library (decided to take this out)
12. Media Arts Center
13. Multimedia Company
14. Museum
15. Private Collection
16. Producer

Additional (some moved from 2nd level to 1st level on the list):

17. Non-Profit Organizations
18. Non-Govt Organizations
19. Religious Organizations
20. Performing Arts Organization
21. Community Groups
22. Research Organization


3. Should we limit to ten or so most common/important, with another category for "other" or try to provide an exhaustive list?

Keep 20 categories and include "other"
Yes, we'd want to include "other" - user has to specify and it's a short field. User can give top level term not refine entry in top level list.

Question: are users going to be able to enter a term under "other" and search on that or only for the archives for descriptive information? How does filtering work? Assumption that anything in free text field would not be used as a filter - is that assumption correct?

2. INSTITUTION SERVICES

1. Is there an existing source for these terms?

No, not that we are aware of.

2. What terms can be added to or changed on the starter term list above? (If you have any mission statements available, use these to generate terms.)

1. Duplication
2. Reformatting
3. Storage
4. Curatorial service
5. Distribution
6. Production
7. ILL
8. Footage Licensing
9. Restricted lending
10. Open lending services
11. Exhibition/Screening
12. On-site research
13. Reference services
14. Subscriber/member services
15. No services available to the public (check if no service is available) [In group discussion it was suggested this be the first option. Want to avoid someone simply neglecting to check a box.]


3. Would it be useful to identify whether an organization is generally internally or externally oriented? What is the best way to do that?

See number 15 under Question 2.


3. FORMATS COLLECTED

1. Are there other possible sources of terms?

- Archival Moving Image Materials: A Cataloging Manual (AMIM)
- MPEG-7 - too detailed, long list of controlled vocabulary
- DC. Format - MIME types - possible and extensible (video.MPEG1), only covers digital not video and film
- SMPTE technical glossary?
- European Broadcasting Union (EBU)
- VIDIPIX, Inc. publication on video formats.
- FIAT Guide to Audio-visual archives
- FIAF Guide?
- International Assoc. of Sound Archives?


2. Identify pros and cons, or issues associated with each source if applicable?

3. Choose preferred source

Analyze sources. See 4.

4. Determine preferred number of categories (ballpark) to enable working groups to choose appropriate level of specificity after the meeting.

Settled on four categories:

Film
Videotape
Digitized Files
Optical Disc

(Note: we discussed "audio recordings" but hadn't included that in our final notes. At the follow-up discussion, we did decide to include that term too.)

The group recommends an analysis of existing sources to come up with level 2 (format types - 10 to 12). Choose dominant video formats to limit size of level 2 and say "other" for additional, or group together into one inch or two inch, etc.

A lot of interest in doing detailed filtering but a problem because of number of terms; question of whether this is Phase 2 work. Lots of interest in being able to filter on lower-level formats, e.g., SVHS, 16mm, but problematic because these vocabularies can be large; can collapse list to primary/dominant values (most frequently searched/required).

Issue - high def materials significant to commercial archives. (Issue very significant according to Gary Carter of National Geographic TV.)


FILM: 16mm, 35mm, 8mm, 70mm, Super8, Super16*

VIDEOTAPE: Open Reel ½ in, Betacam, BetacamSP, Digital Betacam, VHS, SuperVHS, Hi8, 1in, 2in, DVCPro, D5, Other*

DIGITIZED FILES: MPEG-1, MPEG-2, MPEG-4, RealMedia, AVI, Mimetypes*

OPTICAL DISC: CD-ROM, DVD*

(Carrier type vs. format: fuzzy issue …..)

* 2nd level examples come from Footage.
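
The two-level idea discussed in this section (a short top-level list, with second-level format types collapsed to it for filtering) might be sketched as follows. The names `FORMATS`, `top_level_for`, and `archives_holding` are hypothetical, the second-level lists are only a few examples from the group's notes, and falling back to "Other" for unlisted terms is an assumption of this sketch.

```python
# Top-level categories with a few second-level examples taken from the
# group's notes; the lists are illustrative, not complete.
FORMATS = {
    "Film": ["16mm", "35mm", "8mm", "70mm"],
    "Videotape": ["Betacam", "VHS", "Hi8"],
    "Digitized Files": ["MPEG-1", "MPEG-2", "AVI"],
    "Optical Disc": ["CD-ROM", "DVD"],
}


def top_level_for(term: str) -> str:
    """Return the top-level category for a term at either level."""
    for category, examples in FORMATS.items():
        if term == category or term in examples:
            return category
    return "Other"


def archives_holding(category: str, holdings: dict[str, list[str]]) -> list[str]:
    """List archives whose recorded formats fall under the given category."""
    return sorted(
        name
        for name, terms in holdings.items()
        if any(top_level_for(t) == category for t in terms)
    )
```

With this arrangement an archive that records only "Film" and one that records "16mm" would both be found by a top-level search on "Film", which keeps the filter list short while still allowing detailed entries.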

5. Should we create a mixture of broader/narrower terms for the list or does that reduce the usefulness of this element for descriptive or indexing/retrieval purposes? For example, is "video" useful as a separate term? Or "film"? Or "digital"?

Answer in question 4.

4. COLLECTION SIZE

How can this be best expressed?

Value: Numeric values (approx values)
Unit: (CV would include items, titles, linear feet, film footage feet, hours, shots, etc.)

* Repeatable
* Can select all that apply

Content, not items, is significant.

User doesn't care about small, medium or large collection? Not useful.


Finally, identify which element controlled vocabularies each person would like to work on further in the months following the meeting:

David Wells: Institutional Type
Gary Carter: Formats Collected
Mairéad Martin: Formats Collected
Karen Wyatt: Services to the Public

Notes


Issues: Question about granularity

Basic point: ability to include two levels of detail

Target end user: end-user, educational, general public, archivists

Specific about formats

-----------------------------------------------------------------------------------------------
From flip chart (note this document captured the content of the flip charts)

- Film, Video, Digital

Film: 70mm, 35mm, 16mm, 8mm …..
Video: VHS, SuperVHS, Betacam, Hi8 … (Question: what about NTSC/PAL/SECAM)
Digital: MPEG1, 2, 4, Real, AVI (Digitized files)

Question: Stills, posters?


MEDIUM (Availability): CD-ROM, DVD, Digital file, [Raised but not necessarily included]
-- Service available??
-----------------------------------------------------------------------------------------------
Top level
Second refined level
Text level
-----------------------------------------------------------------------
REPORT: AMIA MIG Breakout Group #5
Leader: Ann Butler (reporter)
Present: Randall Barry, Maxine Fleckner Ducey, Joanne Rudof, John Tariot, Paul Theerman, Alison White


CONDITIONS FOR USE/ACCESS RESTRICTIONS

The group conceptualized this as a category for indicating access and use, but would leave the two together in one field (cf. Group 2 approach). Indicate:
· Who
o public
o in-house
o researchers/scholars
o members of the trade/professionals in the field
· How
o onsite viewing
o offsite viewing through loan
o offsite viewing through purchase
o online viewing
o [Note Group 2 liked these terms and would like to add those they omitted from their list]
· Use
o organization owns rights to some
o organization owns rights to all
o all in public domain
o [Note: add "rights unknown"?]


ORGANIZATION LOCATION

· NUC codes should work well for this field.
· TGN is good, but problematic for end user.
· Reduce duplication by pulling information from address fields.
· Metropolitan area is more useful than regions; regions may be U.S.-centric.


COLLECTION STRENGTHS: FORM

The group thought this data element should provide a "snapshot" of archives, and worked with all three sources to create a picklist of the following terms:
· Home movies/video [cf. Group 3's "Amateur"]
· Unedited footage
· Amateur film/video
· Documentaries
· Feature films
· Television entertainment productions [to include sitcoms, dramas, soaps, game shows, variety shows, kids' shows]
· Shorts
· Advertisements
· Video art/experimental films [this term was preferred by both groups, after discussion]
· Performances
· Instructional/educational
· Research Documentation [Group 3 agreed this was a useful term]
· News/newscasts/newsreels
· Sports coverage
· Live event coverage (non-sports)
· Surveillance footage [In group discussion, consensus was this is not a useful term]

[In group discussion it was noted that the terms on Group 3's list which are missing from this list are:
· Promotional/advertising
· Animation
· Interviews]

COLLECTION SIZE

S-M-L is too subjective; go with a count of total discrete titles (or estimate). Require a number.
ATTENDEES


Ruta Abolins
Media Archives and Peabody Collection, University of Georgia
Historic television collections

Grace Agnew
Associate University Librarian for Digital Library Systems, Rutgers, the State University of New Jersey
AMIA project consultant; AMIA Technical Committee; developer

Caroline Arms
LC: Office of Strategic Initiatives

Murtha Baca
Getty Research Institute
Subject vocabulary specialist

Randall Barry
LC: Network Development and MARC Standards Office

Snowden Becker
Editor, Interactive Programs, The J. Paul Getty Museum
Other non-fiction and subject-oriented collections

Ann Butler
New York University
AMIA-MIG Subcommittee; University collections; Museum collections

Karen Cariani
WGBH
Local television collections; Public Television; Database Working Group

Gary Carter
National Geographic Television
Alpha site

Jim DeRoest
University of Washington
Developer

Nancy Dosch
NLM-History of Medicine Division
Alpha site

Maxine Fleckner Ducey
Wisconsin Center for Film and Theater Research
Independent and performing arts collections; University collections

Nigel Elmore
LC Information Technology Services

Nancy Goldman
Pacific Film Archive
FIAF Cataloguing and Documentation Commission

Dina Gunderson
CNN
Alpha site; MIG Technical Committee

Jim Hubbard
Co-chair, AMIA-MIG Subcommittee; Independent film & video

Barbara Humphrys
LC - Motion Picture/Broadcasting/Recorded Sound Division
AMIA-MIG Subcommittee

Jane Hunter
University of Queensland

Alice Jacobs
NLM-Technical Services Division
Alpha site

Jane Johnson
UCLA Film and Television Archive
Chair, MIG Technical Committee; University collections

Andrea Kalas
Discovery Communications, Inc.
Network/cable archives

Dan Kniesner
Oregon Health & Science University
Alpha site; MIG Technical Committee

Steve Leggett
LC - National Film Preservation Board

Pat Loughney
LC - Motion Picture/Broadcasting/Recorded Sound Division

Greg Lukow [appearances]
LC - Motion Picture/Broadcasting/Recorded Sound Division

Mairéad Martin
University of Tennessee Advanced Internet Technology Unit

Mike Mashon
LC - Motion Picture/Broadcasting/Recorded Sound Division
FIAT/IFTA Television Studies Workgroup

Sally McCallum
LC: Chief, Network Development and MARC Standards Office

Ed Price
Georgia Institute of Technology
Developer

Joanne Rudof
Fortunoff Video Archive for Holocaust Testimonies, Yale University
Other non-fiction and subject-oriented collections

Liz Scheines
Editorial Director, Footage: The Worldwide Moving Image Sourcebook

Kim Schroeder
Archive Impact; General Motors
Corporate archives

Janice Simpson
Managing Director, AMIA

Dan Streible
University of South Carolina
Research and Access: AMIA Academic-Archival Interest Group

Winston Tabb [appearance Thursday]
LC - Associate Librarian of Congress for Library Services

Linda Tadic
HBO
Network/cable archives; Independent film & video; Chair, AMIA Digital Issues Task Force

John Tariot
Moving Image Group
founder, FOOTAGE.net

Dick Thaxter
LC - Motion Picture/Broadcasting/Recorded Sound Division

Paul Theerman
NLM-History of Medicine Division
Alpha site

Barbara Tillett
LC: Chief, Cataloging Policy and Support Office

Ellie Wackerman
LC - Motion Picture/Broadcasting/Recorded Sound Division
AMIA-MIG Subcommittee

Dean Watts
National Geographic Television
Alpha site; Other non-fiction and subject-oriented collections

David Wells
National Film Preservation Foundation

Alison White
Head, Corporation for Public Broadcasting
Public television

Kathleen Williams
Smithsonian Institution
Alpha site

Karen Wyatt
Karen Wyatt Film & Picture Research
Research and access