Documentation and metadata

In this section you will learn about documentation and metadata, including requirements of each, responsibilities, standards and the value of controlled vocabularies. Good documentation ensures that your data will be easier to understand and its quality easier to judge.

Data that is readily accessible and documented in clear detail is more likely to be trusted and used. Results can be reproduced and validated faster when good documentation exists. Without that documentation, the data risks being collected but never used.

Kinds of documentation

The documentation should provide contextual information for the data so that it can be understood in the future. Requirements will vary depending on the discipline and type of research being conducted. Producing good documentation is easier if it is planned from the start of a project and considered throughout the lifecycle of the data.

Documentation should include:

  • project aims and objectives (to provide context)
  • catalogue of data collected
  • description of lifecycle of key data elements (procedures for collection/creation, validation, transformation, processing, analysis, publication, archiving/destruction)
  • description of instruments, calibrations etc.
  • description of how data is structured (data model, coding schemes, controlled vocabularies etc.)
  • details of any quality control processes
  • confidentiality agreements and consent forms
  • laboratory notebooks and experimental protocols
  • questionnaires, codebooks, data dictionaries
  • software syntax and output files
  • methodology reports
  • provenance information about sources of derived data.

Metadata requirements and responsibilities

Metadata (data about data) is standardised information about a resource, presented in a structured format that is both machine-readable and human-readable. Metadata can describe individual items or groups of items (individual files, images or datasets etc). The items described by the metadata may be physical or digital.

Example: A library catalogue includes metadata about books and eBooks held by the Library plus the electronic journals to which the Library subscribes. A library catalogue record is simply a collection of metadata elements (for example, title or author of a book), linking to items in the library collection through the call number.

Metadata helps the Library to manage its resources and assists users in the discovery and use of those resources.

Metadata helps researchers to manage and reuse data after its creation.

Ideally, as much metadata as possible should be gathered at the beginning of a research project, with ways devised to collect metadata (automatically if possible) throughout the life of the project.

Types of metadata

There are different types of metadata:

  • Descriptive metadata provides 'descriptive terms' that will facilitate search and retrieval of files.
  • Rights metadata is information about ownership of the data.
  • Administrative metadata includes preservation requirements, confidentiality requirements, access restrictions and timelines (e.g. release dates).
  • Provenance metadata provides information about the data source, version tracking and transformations (often including the steps that were applied to produce the data product).
  • Technical metadata gives information about file types, software, file size and contents of components (e.g. variable names, contributing performers of each track in audio recordings etc).
  • Structural metadata indicates how components of a set relate to one another (e.g. a detailed list of all the tables in a database, or the details of how one object relates to another, such as 'is an earlier version of…').
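
As a rough illustration of how the six types above might sit side by side for a single dataset, the following sketch groups them in one Python dictionary. Every field name and value is invented for the example and does not come from any particular standard; the `missing_types` helper is likewise hypothetical.

```python
# Hypothetical metadata for one dataset, grouped by the six types listed above.
# All field names and values are invented for illustration.
record = {
    "descriptive": {"title": "River temperature survey",
                    "keywords": ["hydrology", "temperature"]},
    "rights": {"owner": "Example University"},
    "administrative": {"access": "restricted", "release_date": "2026-01-01"},
    "provenance": {"source": "field loggers", "version": 2,
                   "steps": ["calibrated", "averaged hourly"]},
    "technical": {"file_type": "text/csv", "size_bytes": 20480,
                  "variables": ["site", "time", "temp_c"]},
    "structural": {"is_earlier_version_of": "survey-v3",
                   "parts": ["readings.csv", "sites.csv"]},
}

METADATA_TYPES = ("descriptive", "rights", "administrative",
                  "provenance", "technical", "structural")


def missing_types(rec):
    """Return the metadata types that are absent or empty in rec."""
    return [t for t in METADATA_TYPES if not rec.get(t)]


print(missing_types(record))  # prints []
```

A check like this can be run at the start of a project to see which kinds of metadata have not yet been planned for.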

Identifying documentation and metadata requirements

To choose the best metadata for your research, first consider your own needs (or the needs of the research project team). If you plan to deposit the data in a data repository or archive, also consider the documentation and metadata requirements of that repository. There are three key questions to answer:

  1. Responsibilities: Who will be responsible for what?
  2. How will the metadata be stored? Some metadata can be stored within the digital object (e.g. the metadata in a digital image file) but often this is not the case. For external metadata, consider using a data repository. Repository software stores 'digital objects' which are made up of metadata plus one or more related files.
  3. How will the metadata be created? In some cases, metadata can be generated or extracted from digital files automatically. For example, a digital camera records the date, time, exposure setting, file format etc. In other cases, human effort will be required to create documentation and metadata. Software programs sometimes allow structured metadata (e.g. title, author, organisation, subjects and keywords) to be added via 'Properties'.
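
Automatic extraction of the kind described in question 3 can be sketched with the Python standard library alone. The example below reads the details the file system already records about a file; the function name `extract_file_metadata` and the output field names are assumptions made for this illustration, not part of any standard.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def extract_file_metadata(path):
    """Collect technical metadata that the file system already records."""
    info = os.stat(path)
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": info.st_size,
        # Last-modified time, expressed in UTC as an ISO 8601 string.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        # Guessed from the file extension only; the file is not inspected.
        "format": mime_type or "unknown",
    }


# Demonstration on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "notes.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("hello")
    metadata = extract_file_metadata(sample)

print(metadata["file_name"], metadata["size_bytes"])  # prints: notes.txt 5
```

Fields such as title, author or keywords cannot be derived this way and would still need to be entered by a person.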

Metadata schemas

A 'metadata schema' defines a standard set of terms (e.g. a controlled vocabulary) that will be used to describe a resource, together with a set of rules that define the syntax and language (e.g. XML). Wherever possible, create metadata using schemas that are in common use, as this makes it easier to contribute a metadata record to a data repository at the completion of the project.
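
To make the idea concrete, here is a minimal sketch of a schema expressed as a Python dictionary, with a function that checks a record against it. The element names, the required flags and the allowed terms are all hypothetical; a real project would take them from a schema in common use rather than invent them.

```python
# Hypothetical schema: element names, required flags and allowed terms
# are assumptions made for this illustration.
SCHEMA = {
    "title": {"required": True, "vocabulary": None},
    "creator": {"required": True, "vocabulary": None},
    "type": {"required": True, "vocabulary": {"dataset", "image", "text"}},
    "language": {"required": False, "vocabulary": {"en", "fr", "de"}},
}


def validate(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element, rule in schema.items():
        value = record.get(element)
        if value is None:
            if rule["required"]:
                problems.append(f"missing required element: {element}")
            continue
        # Elements with a controlled vocabulary accept only listed terms.
        if rule["vocabulary"] is not None and value not in rule["vocabulary"]:
            problems.append(f"{element}: '{value}' is not in the controlled vocabulary")
    # Elements the schema does not define are reported as well.
    for element in record:
        if element not in schema:
            problems.append(f"unknown element: {element}")
    return problems


print(validate({"title": "Survey", "creator": "A. Researcher", "type": "dataset"}))  # prints []
print(validate({"title": "Survey", "type": "Data set"}))
```

The second call reports a missing creator and a type value ('Data set') that is not one of the allowed terms, which is the kind of inconsistency a controlled vocabulary is meant to prevent.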

Metadata standards and controlled vocabularies

Metadata standards and controlled vocabularies provide a means for standardising descriptions within your metadata schema so that you are always able to retrieve a set of items that have been allocated the same descriptor.

Additionally, standardisation enables computers to retrieve and merge metadata from multiple sources. Some standards are suitable for many different kinds of material and across disciplines; others are more discipline-specific. Examples include:

  • Dublin Core is used to index websites. You can check whether a particular website uses Dublin Core by looking for 'dc' fields in its source code.
  • Registry Interchange Format-Collections and Services (RIF-CS) was developed as a data interchange format to support the electronic exchange of collection, party, activity and service descriptions. It organises information about data collections and services into the format required by the Australian National Data Service (ANDS) Collections Registry, Research Data Australia (RDA). RDA is designed to be a metadata registry for research data collections collected in Australia and/or relevant to Australian research interests.
  • Text Encoding Initiative (TEI) is a standard for representing texts in digital form, used chiefly in the humanities, social sciences and linguistics.
  • The Visual Resources Association Core (VRA) is used to describe original works of visual art and also images of them.
  • The Content Standard for Digital Geospatial Metadata (CSDGM) and ISO 19115:2003 are used for geographic digital data such as Geographic Information System (GIS) files.
  • The ANZLIC Metadata Profile (AS/NZS ISO 19115:2005) is a metadata standard for students working with Australian and New Zealand geographic data. It meets the ISO 19115:2003 standard.
  • Data Documentation Initiative (DDI) is commonly used for social and behavioural science data.
  • Core Scientific Metadata Model (CSMD) is used chiefly in the 'structural sciences', i.e. chemistry, materials science, earth science and biochemistry, where researchers analyse the structure of substances and perform systematic experimental analyses on samples of those materials. The CSMD is being used as the core metadata within data management infrastructure being developed for large scale scientific facilities (e.g. the ISIS Neutron Source and the Diamond Light Source).
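
The Dublin Core check mentioned in the list above can be sketched in a few lines of Python. This example assumes the common convention of embedding Dublin Core in HTML `<meta>` elements whose names begin with 'DC.' or 'DCTERMS.'; the class name `DublinCoreFinder` and the sample page source are invented for the illustration.

```python
from html.parser import HTMLParser


class DublinCoreFinder(HTMLParser):
    """Collect <meta> elements whose name starts with 'dc.' or 'dcterms.'."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Compare in lower case so 'DC.title' and 'dc.title' both match.
        name = (attributes.get("name") or "").lower()
        if name.startswith(("dc.", "dcterms.")):
            self.fields[name] = attributes.get("content")


# Invented page source for illustration.
PAGE = """
<html><head>
<meta name="DC.title" content="Research data guide">
<meta name="DC.creator" content="Example Library">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""

finder = DublinCoreFinder()
finder.feed(PAGE)
print(finder.fields)
# prints {'dc.title': 'Research data guide', 'dc.creator': 'Example Library'}
```

An empty result only shows that no such `<meta>` elements were found; a site may still publish Dublin Core metadata in another form.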

There are many discipline-specific metadata standards available. Investigation of the commonly used standards in your discipline should be part of the data management planning process.

Activity - metadata and your project

Complete section 6 of the Data Management Plan on metadata.