7. Harvesting the federation
7.1 Overview
Educational material is typically stored in various places: in institutional repositories, in learning management systems, in the community repository, etc. One purpose of SWITCHcollection is to make educational material searchable regardless of where it is located. This is achieved by collecting the metadata of educational objects from all content repositories and storing it in a common search index. The harvesting process is repeated regularly to keep the search index up to date.
The protocol used to harvest metadata is OAI-PMH.
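The repeated harvesting described above can be sketched in Python as follows. This is only an illustration of the generic OAI-PMH ListRecords cycle, not the actual SWITCHcollection harvester. The function names (list_records_url, parse_list_records, harvest, fetch_url) are hypothetical, and the metadata prefix oai_dc is an assumption: it is the format every OAI-PMH repository must support, but the prefix used for the SWITCHcollection metadata model may differ.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build the ListRecords request URL for one page of a harvest."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        header.findtext(OAI_NS + "identifier")
        for header in root.iter(OAI_NS + "header")
    ]
    token_el = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    # A missing or empty resumptionToken means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


def fetch_url(url):
    """Default fetcher: download one response as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def harvest(base_url, fetch=fetch_url, metadata_prefix="oai_dc"):
    """Yield all record identifiers, following resumption tokens page by page."""
    token = None
    while True:
        page = fetch(list_records_url(base_url, metadata_prefix, token))
        identifiers, token = parse_list_records(page)
        yield from identifiers
        if token is None:
            break
```

Passing the fetch function as a parameter keeps the paging logic independent of the network, so it can be exercised with canned responses. A real harvester would also read the metadata of each record, handle deleted records and protocol errors, and use the from argument to request only records changed since the previous run.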
7.2 About OAI-PMH
More about OAI-PMH
- http://www.openarchives.org/OAI/2.0/guidelines.htm
- http://www.openarchives.org/OAI/2.0/guidelines-repository.htm
- http://www.oaforum.org/tutorial/ (a good, though dated, tutorial)
7.3 Metadata Model
This is a brief summary of the metadata model (a subset of Dublin Core) used for objects that are harvested into the SWITCHcollection search index. For details and explanations, refer to the specification.
Mandatory Elements
| Element | Description | Examples |
| title | The name or title of the resource | DNA from the Beginning |
| creator + | Owner of the resource. Usually this is the author. | John Doe; Erika Mustermann; Sprachenzentrum Uni Basel |
| license | Link to the legal specification (Swiss jurisdiction) of what can be done with the resource and under what conditions. One of: "empty" (copyright defined inside); a Creative Commons license URL (the default for video streaming: by-nc-nd); an ODRL specification (not used yet). Complete list of possible values: http://collection.switch.ch/spec/2008/licenses/ | http://creativecommons.org/licenses/by-nc-nd/2.5/ch/ |
| source | The URI of the original location of the resource. Required to redirect a user to the site where the resource can be downloaded. | http://unige.ch/repo/id12345678 |
Optional but Strongly Recommended Elements
| Element | Description | Examples |
| discipline + | Teaching discipline of the resource, taken from the controlled SWITCHcollection vocabulary. Code as defined in http://collection.switch.ch/spec/2008/disciplines | 9472 |
Optional Elements
| Element | Description | Example |
| contributor + | Who contributes to the resource. Being listed here does not imply technical access rights (they are specified in rights). | Jane Doe; John Doe |
| publisher + | Who makes the resource available. Originating university or SWITCH | Université de Lausanne |
| rightsHolder | Person or organization owning or managing the rights over the resource | 3562781242@zhaw.ch |
| tableOfContents | Table of contents (unformatted free text) | 1. Introduction, 2. genetic alleles, 3. Inheritance, 4. Dominance, 5. Punnett squares, 6. Mendel’s laws |
| abstract | Abstract of the resource | DNA from the Beginning is an animated tutorial on DNA, genes and heredity. The science behind each concept is explained using animations, an image gallery, video interviews, problems, biographies, and links. |
| subject + | Topic of the resource: keywords, key phrases, classification codes (free text) | Genetics, DNA, classical experiments |
| description + | An account of the content of the resource (currently only used by SWITCHcast to describe the location) | Recorded at the ETHZ Joint SVC Project. |
| language + | Language(s) of the content. Vocabulary: ISO 639-2 alpha-3 code | deu; fra; eng |
| educationLevel + | Description of the education or training context (free text) | basic studies bachelor basic master |
| instructionalMethod + | The way in which instructional material is presented | Experimental learning; Observation |
| issued | Publication date of the resource. (Creation and modification dates are part of FOXML) | 2004-03-12; 2004-03; 2004 |
| alternative + | Alternative title, e.g. title abbreviations or title translations | DNS von Grund auf. |
| type | Nature or genre of the resource (DCMI vocabulary: “Dataset”, “InteractiveResource”, “MovingImage”, “Software”, “Sound”, “StillImage”, “Text”). DCMI subset defined in http://collection.switch.ch/spec/2008/types/ | InteractiveResource; Sound |
| extent + | The duration of the resource in seconds. | 30 sec |
| identifier | Internal identifier (pid) of the resource. Can be used by the OAI Provider to unambiguously tag resource records. | chor_dcterms:12345678 |
+) element can be repeated
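The rules in the tables above (mandatory elements, repeatable elements, date formats) lend themselves to a simple pre-indexing check. The following Python sketch is one possible way to express them and is not part of SWITCHcollection. The function check_record and the representation of a record as a dictionary of value lists are hypothetical. It also assumes that license must be present but may be empty, based on the "empty" option listed for that element, and that issued accepts only the three date forms shown in the examples.

```python
import re

# Elements listed under "Mandatory Elements".
MANDATORY = ("title", "creator", "license", "source")

# Elements marked with "+" in the tables (may occur more than once).
REPEATABLE = {
    "creator", "discipline", "contributor", "publisher", "subject",
    "description", "language", "educationLevel", "instructionalMethod",
    "alternative", "extent",
}

# Assumption: issued takes the forms YYYY, YYYY-MM or YYYY-MM-DD.
ISSUED_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def check_record(record):
    """Return a list of problems for a record given as {element: [values]}."""
    problems = []
    for name in MANDATORY:
        if name not in record:
            problems.append(f"missing mandatory element: {name}")
        elif name != "license" and not any(v.strip() for v in record[name]):
            # Assumption: only license may be present with an empty value.
            problems.append(f"empty mandatory element: {name}")
    for name, values in record.items():
        if name not in REPEATABLE and len(values) > 1:
            problems.append(f"element is not repeatable: {name}")
    for value in record.get("issued", []):
        if not ISSUED_PATTERN.fullmatch(value):
            problems.append(f"unexpected issued date format: {value}")
    return problems
```

An empty result means the record satisfies these structural rules. Checks against the controlled vocabularies (disciplines, licenses, types) would need the value lists published at the URLs given in the tables and are left out here.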
7.4 Harvesting Engine
The new harvester is called Harrow and was written by an intern at SWITCH. The code is available on SourceForge:
http://sourceforge.net/projects/harrow/
Documentation: http://sourceforge.net/apps/trac/harrow/wiki
7.5 Harvesting Examples
...

