Collect or capture data
In this stage of the research data lifecycle, researchers must ensure that all research data is stored securely and backed up or copied regularly.
Consider the following when planning to store your research data:
What data formats will be used? The choice of format will determine how the data may be used, analysed, backed up, stored and reused in the future. When deciding which format to use, consider:
- Could the hardware, software and media fail or become obsolete within your project timeframe?
- Would the impact of such a failure be disastrous?
- How long does the data need to be stored?
- Is support for the hardware, software or media available at QUT?
- Is security an issue?
- Who needs access to the data? Are team members local at QUT or collaborators from other institutions?
Activity – Data formats
In your Data Management Plan, list the data formats you will use, including the software and any access requirements.
Watch the video: Data Management (YouTube video, 2m44s)
When storing data, please follow the 3-2-1 rule:
Keep 3 copies of your files in 2 different locations, with 1 copy off-site, ideally in a different geographic zone.
- Master copy: Keep at a secure location
- Working copy: Keep on a reliable/safe device or location
- Backup copy: Keep off-site
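The 3-2-1 rule above can be expressed as a simple check on a storage plan. The following is a minimal sketch only: the `Copy` class, the `satisfies_3_2_1` function and the location labels are hypothetical names chosen for illustration, not part of any QUT service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    role: str       # e.g. "master", "working" or "backup"
    location: str   # illustrative label for where the copy is kept
    off_site: bool  # True if the copy is held away from the main site


def satisfies_3_2_1(copies):
    """Return True if there are at least 3 copies, in at least
    2 different locations, with at least 1 copy off-site."""
    enough_copies = len(copies) >= 3
    enough_locations = len({c.location for c in copies}) >= 2
    has_off_site = any(c.off_site for c in copies)
    return enough_copies and enough_locations and has_off_site


# Illustrative plan; the location labels are placeholders.
plan = [
    Copy("master", "institutional storage", off_site=False),
    Copy("working", "laptop", off_site=False),
    Copy("backup", "external storage service", off_site=True),
]

print(satisfies_3_2_1(plan))
```

A check like this only confirms that the plan is complete on paper. It does not verify that the copies exist or are up to date.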
You should store master copies of your data in QUT's Research Data Storage Service (RDSS), which provides all QUT researchers (staff and Higher Degree Research students) with secure storage for data at the different stages of the research lifecycle, including services for raw, working and archived data. To access the RDSS, submit an IT request online. The Research Data Storage Service provides:
- large storage capacity for research data
- easy on-campus and off-campus access to your data
- controlled access to your data.
Decisions about storage for highly confidential or highly sensitive research data should be made on a case-by-case basis.
You can view an extended list of storage options under Collect and store research data.
Network drives may be accessible to a large number of people or can be configured for use by a single user or group of users (contact HiQ for more information). The H and U drives are not recommended for the storage of master copies of research data.
The Office of eResearch provides QUT staff and Higher Degree Research students with specialised advanced computing facilities, storage and support. Contact the team for access and advice, or apply for an account.
QRISdata is a data storage service hosted by the Queensland Cyber Infrastructure Foundation (QCIF), designed to complement well-managed data storage provided by your institution. Depending on the storage option you select, QRISdata may be replicated on tape storage, and/or stored in multiple locations. For more information about QRISdata, read the QRIScloud FAQs.
- High volume storage for research data, both active and long-term use.
- Support for working and archived data is provided.
- Researchers are provided with a range of tools to help manage data, enable collaboration and enable fine-grained access control.
- Large datasets can be shared with researchers/collaborators worldwide.
- Access to a wide range of existing eResearch services, tools and applications is possible.
- Integrated access to Queensland-based high-performance computing facilities and specialised cloud services is available.
- Data storage allocation is based on merit, taking into account the data's significance and value to the wider research community. Application processing can take up to one month.
- Long-term storage plans are required.
CloudStor is a file transfer and storage service for easily and securely storing, sending and receiving large files, available to all Australian researchers via the Australian Access Federation.
- Up to 100GB of storage is available to individual researchers. 'Low-cost' additional data storage is available on request.
- Up to 100 recipients can share/receive a file.
- Australian Access Federation credentials are used to log in, providing access to researchers at other universities and research institutions.
- Maximum file size for transferring data is 100GB.
- Not suitable for master copies of research data.
- Not suitable for sensitive data, as encryption only occurs during transmission.
Local hard drives
Local hard drives are helpful when collecting and storing data in the field. However, master copies of data need to be securely stored in QUT's Research Data Storage Service.
Removable media (not recommended for storage of research data)
While USB drives and memory cards are useful for transporting and capturing data in the field, data collected should be transferred to stable, secure storage as soon as possible. These devices, along with CDs and DVDs, are at greater risk of being lost or damaged, are not very robust and can be damaged by magnetic fields, water and high temperatures. If portable media are used for transporting copies of data, use only high-quality products and ensure that any confidential data is encrypted or password protected.
Non-digital research data
Data in non-digital formats (e.g. biological samples, analogue recordings) should be stored in secure facilities located in the school, faculty, institute or an off-campus research facility. Refer to the QUT Governance Services web page or D/2.8.7 Management of research data and primary materials of the MoPP for more information about dealing with non-digital research records.
Metadata - describe your data
Documenting data and capturing or describing metadata at all stages of the research lifecycle enables you and others to find, access, interpret, validate and reuse the data. Documentation should provide contextual information for the data so that it can be understood in the future.
Metadata (data about data) can describe individual items or groups of items. Requirements may vary depending on the discipline and type of research being conducted. At the collection stage, documentation and metadata to be captured include:
Study level documentation:
- context of data collection: project history, aims, objectives and hypotheses
- collection/generation/capture methods, sampling methods
- instruments used and calibrations required
- hardware or software used
- data scale and resolution, temporal or geographic coverage
- questionnaire copies, interview questions and instructions, test samples
- quality control processes
Data level documentation:
- description of file structures and relationships between files
- file types and file naming conventions chosen
- names, labels and descriptions of variables, records and values
- explanation of codes and classification schemes used
- provenance information about sources of derived or reference data
(Adapted from UK Data Archive (2017). Create and manage data, https://www.ukdataservice.ac.uk/manage-data/document/data-level)
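One common way to record data level documentation such as variable names, labels and codes is a data dictionary stored alongside the data files. The sketch below is illustrative only: the column headings, the `write_data_dictionary` function and the example variables are assumptions, not a prescribed QUT or UK Data Archive format.

```python
import csv
import io

# Assumed column headings for the data dictionary; adjust to your discipline.
FIELDNAMES = ["name", "label", "type", "units", "codes"]


def write_data_dictionary(variables, handle):
    """Write one row per variable to an open text file handle as CSV.
    Missing fields are written as empty cells."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for variable in variables:
        writer.writerow({key: variable.get(key, "") for key in FIELDNAMES})


# Hypothetical variables, for illustration only.
variables = [
    {"name": "site_id", "label": "Collection site identifier", "type": "string"},
    {"name": "temp_c", "label": "Water temperature", "type": "float",
     "units": "degrees Celsius"},
    {"name": "quality", "label": "Quality flag", "type": "integer",
     "codes": "0=unchecked; 1=good; 9=missing"},
]

buffer = io.StringIO()
write_data_dictionary(variables, buffer)
print(buffer.getvalue())
```

In practice the dictionary would be written to a file kept with the dataset, so that codes and variable names can still be interpreted when the data is reused.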
It will be useful to make a copy of the raw data at the collection stage and use this in the working stage of your project. Retain a read-only copy of the raw data in the RDSS Acquisition space as both a backup against loss or corruption and evidence of the work you have undertaken.
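As a rough illustration of keeping a read-only raw copy, the sketch below copies a file into an archive folder, confirms the copy is identical using a SHA-256 checksum, and then removes write permission. The function names and folder arguments are hypothetical. It assumes the archive folder is an ordinary file system path, and does not describe how the RDSS itself manages permissions.

```python
import hashlib
import os
import shutil
import stat
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_raw_copy(source, archive_dir):
    """Copy a raw data file into archive_dir, verify it and make it read-only.
    Returns the path of the copy and its checksum."""
    source = Path(source)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    if target.exists():
        # Never overwrite an existing raw copy.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    checksum = sha256_of(target)
    if checksum != sha256_of(source):
        raise OSError(f"checksum mismatch while copying {source}")
    # Remove write permission for owner, group and others.
    os.chmod(target, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target, checksum
```

Recording the returned checksum in your project documentation lets you confirm later that the raw copy has not changed.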
Types of metadata
- Descriptive: metadata required for discovery and assessment of the collection, including title, contributors, subject or keywords, study description, list of publications the dataset contributes to and location and dates of the study.
- Provenance: metadata about the data source, instruments used to collect or generate the data, version tracking and transformations (often including the steps that were applied to produce the data product).
- Technical and Structural: metadata about file types, software, file size and contents of components e.g. variable names. This also covers how the data or its database is configured, how it relates to other data and how components within a set relate to each other.
- Rights and Access: metadata to enable access and licensing or usage rules e.g. negotiated access by contacting the owner or open access via a Creative Commons licence.
- Citation: metadata required for someone to cite the data, including a persistent identifier such as a DOI or stable URL, e.g. Creator(s), Publication Year, Title, Publisher, DOI.
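The metadata types above could be captured together in one structured record. The sketch below is one possible layout, not a formal metadata standard: the field names are assumptions, and every value (including the DOI) is a placeholder. The `format_citation` helper follows the element order given in the citation example: Creator(s), Publication Year, Title, Publisher, DOI.

```python
import json

# Placeholder record; field names and all values are illustrative only.
record = {
    "descriptive": {
        "title": "Example dataset title",
        "creators": ["Researcher, A.", "Researcher, B."],
        "keywords": ["example", "placeholder"],
        "description": "Short description of the study.",
    },
    "provenance": {
        "source": "Field collection",
        "instruments": ["Example sensor"],
        "version": "1.0",
    },
    "technical": {
        "file_types": ["csv"],
        "software": "Example software",
    },
    "rights": {
        "licence": "CC BY 4.0",
        "access": "open",
    },
    "citation": {
        "publication_year": 2024,
        "publisher": "Example University",
        "doi": "https://doi.org/10.0000/example",  # placeholder, not a real DOI
    },
}


def format_citation(rec):
    """Build a citation string: Creator(s) (Year). Title. Publisher. DOI"""
    creators = "; ".join(rec["descriptive"]["creators"])
    cite = rec["citation"]
    return (
        f"{creators} ({cite['publication_year']}). "
        f"{rec['descriptive']['title']}. {cite['publisher']}. {cite['doi']}"
    )


print(format_citation(record))
print(json.dumps(record, indent=2))
```

Storing the record as JSON next to the data keeps the description machine-readable. Repositories and disciplines often have their own required schemas, which should take precedence over this layout.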