Publish and share
Why publish data?
Publishing your data and citing its location in published works allows others to replicate and validate your results and check their accuracy. Some research data would be impossible to collect again, e.g. recordings of a specific seismic event. Sharing data improves the scientific record and increases scientific integrity. The Australian Code for the Responsible Conduct of Research, 2018 advises that researchers should share their data wherever possible. To support best practice in publishing data, QUT has adopted the F.A.I.R data principles to make research data Findable, Accessible, Interoperable and Reusable. The benefits of making data publicly available include:
- greater attention to the research and the researchers who created the data
- increased citation counts by up to 69% (see Piwowar, Day and Fridsma, 2007)
- increased collaboration and networking opportunities between researchers
- greater opportunities for grant funding as grant funding bodies encourage data sharing.
Durable data formats
Use standard, interchangeable formats that most software can interpret, or convert completed data to these formats at the end of the project. Check data after conversion for errors or changes. Acceptable formats for long-term access to data include the following options.
Data type | Acceptable formats for sharing, reuse and preservation |
---|---|
Containers | TAR, GZIP, ZIP |
Computer aided design | DWF, DXF, DWG, DWS, DWT, X3D, STEP, STP |
Databases | XML, CSV |
Geospatial | SHP, DBF, GeoTIFF, NetCDF |
Moving images | MOV, MPEG, AVI, MXF |
Sounds | WAV, AIFF, MP3, MXF, FLAC |
Statistics | ASCII, DTA, POR, SAS, SAV, STATA, SPSS |
Still images | TIFF, JPEG 2000, PDF, PNG, GIF, BMP, RAW |
Tabular data | CSV, XLS, XLSX, ODS |
Text | XML, PDF/A, HTML, ASCII, UTF-8, RTF, NUD*IST, NVivo, ATLAS.ti |
Web archive | WARC |
Based on information from UK Data Service (2017).
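The advice above to check data after conversion can be sketched in code. The following is a minimal illustration only, assuming small tabular data held in memory and using Python's standard `csv` module. The function names and the sample rows are hypothetical and are not part of any QUT tool. It writes rows to CSV, reads them back, reports any differences, and computes a checksum that can be stored with the file to detect later changes.

```python
import csv
import hashlib
import io


def to_csv_text(rows, fieldnames):
    """Serialise a list of dict rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_csv_text(text):
    """Parse CSV text back into a list of dict rows (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def conversion_problems(original_rows, converted_rows):
    """Return a list of human-readable differences between two row lists."""
    problems = []
    if len(original_rows) != len(converted_rows):
        problems.append(
            f"row count changed: {len(original_rows)} -> {len(converted_rows)}"
        )
    for index, (before, after) in enumerate(zip(original_rows, converted_rows)):
        # CSV stores every value as text, so compare the string forms.
        before_as_text = {key: str(value) for key, value in before.items()}
        if before_as_text != after:
            problems.append(f"row {index} changed: {before_as_text} -> {after}")
    return problems


def sha256_of(text):
    """Checksum of the converted file contents, for later integrity checks."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical sample data standing in for a completed dataset.
sample_rows = [
    {"site": "A", "depth_m": 1.5, "reading": 20},
    {"site": "B", "depth_m": 3.0, "reading": 18},
]
csv_text = to_csv_text(sample_rows, ["site", "depth_m", "reading"])
problems = conversion_problems(sample_rows, read_csv_text(csv_text))
checksum = sha256_of(csv_text)
```

An empty `problems` list suggests the conversion preserved the rows. Real datasets may need extra checks, such as date formats, character encoding and numeric precision.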
Copyright and research data
Compilations of data are protected by copyright law as 'literary works', provided the compilation involved intellectual effort, was not copied from another source, and supplies 'intelligible information' (i.e. is human/machine readable). Research students own the copyright of their works unless there is a written agreement stating otherwise.
See the Australian Research Data Commons' Research Data Rights Management Guide for more information about whether copyright subsists in your data.
Image attribution: Mike Seyfang (2005)
Licensing data
If you decide to share your data when you have completed your research, you may use a Creative Commons (CC) licence to specify the conditions that apply to reuse. When you apply a CC licence to your data, you retain ownership of the data but license others to use the work on liberal terms. The four different licence terms are:
- Attribution: (BY) You must always provide credit to the original author.
- Share-Alike: (SA) If you remix, transform, or build upon the material, you must distribute your contributions under the same licence as the original.
- Non-Commercial: (NC) You may not use the material for commercial purposes.
- No-Derivatives: (ND) You may not distribute modified versions of the work.
Image attribution: Creative Commons
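The four terms above combine into a small set of licence codes, such as CC-BY-ND. The sketch below is illustrative only. It assumes that Attribution is part of every standard licence and that Share-Alike and No-Derivatives cannot be combined, since a work that may not be modified has no adaptations to share alike. The function name `cc_licence_code` is hypothetical.

```python
from itertools import product


def cc_licence_code(non_commercial=False, share_alike=False, no_derivatives=False):
    """Build a licence code such as 'CC-BY-NC-SA' from the chosen terms."""
    if share_alike and no_derivatives:
        # SA governs how adaptations are shared; ND forbids sharing adaptations.
        raise ValueError("Share-Alike and No-Derivatives cannot be combined")
    parts = ["CC", "BY"]  # Attribution is assumed to apply in every case.
    if non_commercial:
        parts.append("NC")
    if share_alike:
        parts.append("SA")
    if no_derivatives:
        parts.append("ND")
    return "-".join(parts)


# Enumerate every valid combination of the three optional terms.
standard_licences = []
for nc, sa, nd in product([False, True], repeat=3):
    if sa and nd:
        continue
    standard_licences.append(cc_licence_code(nc, sa, nd))
```

Under these assumptions the loop produces six codes, from the most liberal (CC-BY) to the most restrictive (CC-BY-NC-ND).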
Case study
QUT postdoctoral researcher Christopher Noune has assigned a CC-BY-ND licence to his dataset, which means that others may use, copy and redistribute his work, even for commercial purposes, so long as they give appropriate credit. If they remix, transform or build upon the work, they may not distribute the modified material without seeking permission from Christopher.
See: Noune, Christopher; Hauxwell, Caroline. (2017): HaSNPV-AC53 Genotyping and Abundance Datasets. [Queensland University of Technology].
Check which terms apply to the works you want to use via Creative Commons Australia's licence chooser.
Combining and analysing existing data from multiple sources is common practice. The conditions attached to CC licences only apply when the whole, or a substantial part, of a CC-licensed dataset is reused. You should not apply a CC licence to a dataset if you are not the copyright owner or if the data contains secret, private or confidential information.
Use the Data Management Planner to document the decisions relating to sharing and licensing your data at the conclusion of your research. For more information on licensing data, refer to the Australian National Data Service (ANDS) guide to Copyright, data and licensing and QUT's information about licensing research data.
Data repositories
Consider sharing your completed datasets, or a subset of them, with other researchers, publishers and the public via QUT's data repository Research Data Finder, or an open access repository such as Figshare, Dryad or GitHub. To compare these, discipline-specific repositories and other options, use the QUT Publishing research data guidance and directories.
In Research Data Finder, you can list the following metadata about your dataset to make it Findable, Accessible, Interoperable and Reusable (F.A.I.R):
- Descriptive: metadata required for discovery and assessment of the collection, including title, contributors, subject or keywords, study description, list of publications the dataset contributes to and location and dates of the study.
- Provenance: metadata about the data source, instruments used to collect or generate the data, version tracking and transformations (often including the steps that were applied to produce the data product).
- Technical & Structural: metadata about file types, software, file size and contents of components, e.g. variable names, as well as how the data or its database is configured, how it relates to other data and how components within a set relate to each other.
- Rights & Access: metadata to enable access and licensing or usage rules, e.g. negotiated access by contacting the owner or open access via a Creative Commons licence.
- Citation: metadata required for someone to cite the data, including a persistent identifier such as a DOI or stable URL e.g. Creator(s), Publication Year, Title, Publisher, DOI.
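The citation elements listed above (Creator(s), Publication Year, Title, Publisher, DOI) can be held in a simple record and checked for completeness before a dataset is published. The following is a minimal sketch only. The class `DatasetCitation`, its methods and every value in the example are hypothetical, and the output layout is one possible arrangement rather than a prescribed QUT or repository citation style.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    creators: list       # names in "Family, Given" form
    publication_year: int
    title: str
    publisher: str
    identifier: str      # persistent identifier: a DOI or stable URL

    def missing_fields(self):
        """Names of citation elements that are still empty."""
        return [name for name, value in vars(self).items() if not value]

    def as_text(self):
        """Join the elements into a single citation string."""
        creator_text = "; ".join(self.creators)
        return (
            f"{creator_text} ({self.publication_year}): {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


# Hypothetical example values, including a placeholder identifier.
citation = DatasetCitation(
    creators=["Researcher, A.", "Colleague, B."],
    publication_year=2017,
    title="Example Survey Dataset",
    publisher="Example University",
    identifier="https://doi.org/10.0000/example",
)
citation_text = citation.as_text()

# A record without an identifier is flagged as incomplete.
draft = DatasetCitation(["Researcher, A."], 2017, "Draft Dataset", "Example University", "")
draft_gaps = draft.missing_fields()
```

Checking `missing_fields()` before deposit is one way to confirm that a record carries everything someone needs to cite the data.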
Case study
See how PhD candidate MD. Lifat Rahi has applied the F.A.I.R data principles to his published research dataset.