3. Original data and metadata

The Original Data folder contains:

  1. Your original data files
  2. Importable data files (if necessary)
  3. Metadata subfolder.
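
The layout above can be set up by hand, or with a few lines of code. The following is a minimal sketch in Python, assuming the folder names "Original Data" and "Metadata" and a hypothetical project root; the function name is illustrative, not part of any required tool.

```python
from pathlib import Path


def create_original_data_folder(project_root: Path) -> Path:
    """Create the Original Data folder with its Metadata subfolder.

    The folder names follow this guide; project_root is a hypothetical
    location chosen by the caller.
    """
    original_data = project_root / "Original Data"
    metadata = original_data / "Metadata"
    # parents=True also creates "Original Data"; exist_ok=True makes
    # the call safe to repeat on an existing project.
    metadata.mkdir(parents=True, exist_ok=True)
    return original_data
```

Original and importable data files then go directly in the returned folder, and the Metadata Guide goes in its Metadata subfolder.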

1. Original data files

Original data files are the data files you initially obtain for your project and from which you extract the data you use. The data you use can come from a single original data file or from multiple original data files.

Each time you obtain an original data file, save a copy of the original data file in your Original Data folder. It is important to note that:

  1. You can give the file a new name when you save it in the Original Data folder but, other than that, the copy you save should be identical to the original version of the data file
  2. The contents and format should not be modified in any way.
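
One way to confirm that a saved copy really is identical to the file you obtained is to compare checksums. The sketch below is an illustration under assumptions, not a required procedure: the function names and the choice of SHA-256 are ours, and the paths are supplied by the caller.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_original_copy(source: Path, original_data_dir: Path, new_name: str) -> Path:
    """Copy source into the Original Data folder under new_name.

    Only the file name changes; the contents are copied byte for byte.
    """
    original_data_dir.mkdir(parents=True, exist_ok=True)
    target = original_data_dir / new_name
    if target.exists():
        # Never overwrite an original data file that is already stored.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.copy2(source, target)
    # Matching checksums show the copy has the same contents as the source.
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"copy at {target} differs from {source}")
    return target
```

Recording the checksum alongside the file is optional, but it gives you a way to check later that the stored copy has not been altered.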

2. Importable data files

Create a version of your data file in a format that your software or tool can read.

When you need to create a new version of a data file so that your software can read it, the new version is called an importable data file. The changes you make when you create an importable version of an original data file cannot be executed by commands written in a command file, so they cannot be automatically reproduced. Make only the minimal changes necessary for your software to read the data.

  1. Do not modify the data values in the file in any way; change only what is needed to convert the format
  2. Give the importable version of the file a name that reminds you it is the importable version of the original data file from which it was created. For example, if the original data file is called gdp_growth.sav, give the importable version a name like i_gdp_growth.dta. The “i_” prefix is a reminder that the file is “importable” and the change in the extension reflects the change in the format in which the data file is saved.
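
The naming convention can be applied consistently with a small helper. This is a minimal sketch; the function name `importable_name` is hypothetical, and it only builds the file name (the format conversion itself is done in your own software).

```python
from pathlib import Path


def importable_name(original_name: str, new_extension: str) -> str:
    """Build the importable file name: "i_" prefix plus the new extension."""
    stem = Path(original_name).stem  # file name without its extension
    # Accept the extension with or without a leading dot.
    ext = new_extension if new_extension.startswith(".") else "." + new_extension
    return f"i_{stem}{ext}"


print(importable_name("gdp_growth.sav", ".dta"))  # prints i_gdp_growth.dta
```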

When storing the importable data file, do not delete the original data file. Store both versions in the Original Data folder under different names.

Note: If all of your original data files are in formats that your software can read, you do not need to create any importable data files.

Readme file

For each importable data file you create, write an explanation in your Readme file describing the steps you took to create the importable version from the original data file. This may be the first time you enter any information (other than the title) in your Readme file.

Your written explanation should give the names of both the original and importable versions of the data file that was modified. It should be precise enough to enable others to make the same changes to the original data file and end up with an importable data file identical to the one you created.

Refer to Section 6 for more information about what is included in the Readme file.

3. The Metadata subfolder

The Metadata subfolder contains:

  • A document called your Metadata Guide
  • Supplementary documents with additional metadata (if necessary).

The Metadata Guide

For each of your original data files, the Metadata Guide provides information such as variable definitions and coding, sampling methods, and anything else a user would need to know to work with and interpret the data appropriately. Each time you obtain a new original data file:

  1. Add a section about that file to the Metadata Guide
  2. Begin the section with a title that identifies the original data file it pertains to. For example, ‘Metadata for penn_tables_1986_2010.txt’
  3. Enter the relevant information about the original data file in your Metadata Guide. This information should include:
    • A bibliographic citation — this citation should be in a format consistent with the referencing style (e.g. APA or Chicago) used in the research project
    • A Digital Object Identifier (DOI), if one has been assigned — if a DOI is included in the bibliographic citation, it need not be repeated
    • The date the author first downloaded or obtained the original data file — if the download date is included in the bibliographic citation, it need not be repeated
    • A verbal explanation of how others can obtain a copy of the original data file. In many cases, this explanation will give the URL of a webpage from which the data can be accessed, along with instructions for downloading a file identical to the original data file used in the study. In all cases, this explanation should be complete and precise enough to allow another researcher to locate and obtain an exact copy of the original data file without any additional information or assistance
    • Whatever additional information others would need to understand and use the data in the original data file. This may vary depending on the nature of the original data file. In many cases, the additional information that should be provided includes variable names and definitions, coding schemes and units of measurement, details of the sampling method and weight variables, and descriptions of how any imputed variables were constructed. In some cases, it is also necessary to include information about the file structure (e.g. the delimiters used to separate variables or, in rectangular files without delimiters, the columns in which the variables are stored). Any other unique or idiosyncratic aspects of the data that an independent user of the data would need to understand should be explained.
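
To keep every section of the Metadata Guide consistent, you could start each one from the same skeleton. The sketch below is one possible approach; the function name and the exact heading wording are assumptions based on the list above, not a prescribed format.

```python
def metadata_section_skeleton(original_file_name: str) -> str:
    """Return an empty Metadata Guide section for one original data file."""
    # Headings mirror the items listed in this guide; the wording is ours.
    headings = [
        "Bibliographic citation",
        "DOI (if one has been assigned)",
        "Date first downloaded or obtained",
        "How to obtain a copy of the original data file",
        "Additional information needed to understand and use the data",
    ]
    lines = [f"Metadata for {original_file_name}", ""]
    for heading in headings:
        lines.append(f"{heading}:")
        lines.append("")  # blank line to be filled in by the researcher
    return "\n".join(lines)


print(metadata_section_skeleton("penn_tables_1986_2010.txt"))
```

The printed text can be pasted into the Metadata Guide and filled in as soon as the file is obtained.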

Supplementary documentation

In many cases, some or all of the information that should be included in the Metadata Guide may be available in an existing, publicly accessible document, such as a codebook or user’s guide that is provided with the original data file. If the information is available in such a document, it is not necessary to repeat it in the Metadata Guide. Instead, you can put a note in the Metadata Guide indicating where the information can be found.

When you put a note in the Metadata Guide indicating that certain parts of the information are available in an existing document, you should preserve a copy of the existing document in the Metadata subfolder (along with the Metadata Guide that you compose yourself).

If you enter the required information right away, each time you obtain a new original data file, your Metadata Guide will be complete before you finish your project.

Licence

Document your research data Copyright © 2023 by The University of Queensland is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
