3. Sources of data

What is the question you want to answer?

The question you want answered, or the problem you want to solve, determines the type of data you need and the analysis you need to perform on it. You can then use existing data or gather the data yourself.

Who, what, when, where?

Think about:

  • Who or what: who or what is the subject of the data? For example, a group of people, food or animals.
  • Where: is the location important? For example, local or international, or a particular suburb.
  • When: what is the appropriate time period to look at?

There may be an existing dataset that you can analyse to answer your question. Otherwise, you can gather your own data using one of the collection methods described later in this section.

Open data

Open data is data that is publicly available for reuse, and it can be accessed from a range of sources. High-quality data can be found on government websites and in institutional repositories:

  • Research data: this guide lists a range of high-quality public research data sources
  • Text data: this guide lists a variety of sources for open text data
  • Spatial data: this guide lists Queensland, Australian and global spatial data sources
  • UQ eSpace: the repository for UQ research publications and research datasets
  • Google Dataset Search: allows searching across multiple repositories. Limits you can apply include download format and usage rights.

GovHack is an annual event where people are invited to apply their creative skills to open government data.

Using Census data

Do you want to use Census data from the Australian Bureau of Statistics (ABS) in your assignment or project?

TableBuilder is a free online tool from the ABS for creating tables, graphs and maps of Census data. The video Basic tables (YouTube, 4m45s) shows how to select datasets and create basic tables in TableBuilder.

Dataset quality

Evaluate the quality of a dataset before you use it in your assignments or projects, just as you would evaluate any other information you find. Consider the following when assessing datasets:

Authority – Check:

  • Who collected the data?
  • Is it an educational institution, government or a reputable organisation?
  • If an individual has produced the dataset, are they associated with a reputable organisation or are they well-known in their field?

Coverage – Check:

  • Were enough samples taken to be representative of the total population or group being researched?
  • Is the time frame relevant or up-to-date?

Purpose – Check:

  • Why was the data collected?
  • Was there any bias in the collection methods?
  • Who was the intended audience?

Accuracy – Check:

  • Is the dataset complete?
  • Are there responses missing or other errors?

Terms of use – Check:

  • Are the conditions for acceptable use of the data clearly stated and suitable for your needs?


The metadata or description of the data should include information to help you evaluate the dataset.
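
Some of the accuracy checks above can be partly automated once you have downloaded a dataset. The following is a minimal sketch, assuming the data is in CSV format with a header row. The function name summarise_missing, the list of missing-value markers and the sample rows are all invented for illustration and are not taken from any real dataset.

```python
import csv
import io


def summarise_missing(csv_text, missing_markers=("", "NA", "N/A", "null")):
    """Return the number of rows and, per column, how many values look missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = {name: 0 for name in fieldnames}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in fieldnames:
            value = row.get(name)
            # A short row yields None; otherwise compare against the markers.
            if value is None or value.strip() in missing_markers:
                missing[name] += 1
    return total_rows, missing


# Invented sample data: three survey responses, two with gaps.
sample = """respondent_id,age,suburb
1,34,St Lucia
2,,Toowong
3,29,NA
"""

total_rows, missing = summarise_missing(sample)
print(f"{total_rows} rows")
for column, count in missing.items():
    print(f"{column}: {count} missing")
```

A count like this only shows where values are absent. You still need the metadata to judge why they are missing and whether the gaps matter for your question.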

Check data quality

Check how the Additional information provided for this dataset allows you to easily evaluate its quality: Australia’s threatened species, life history characteristics, and threatening processes.

Collecting data

You may need to collect your own data to answer your research question, if existing data is not available or suitable.

Storing the data

If you collect the data yourself, you will need to think about where to store it.

If you are collecting data for a small project or assignment, local storage or online storage such as Google Drive or OneDrive is suitable, as long as the data does not include identifiable personal information. The Data storage section gives an overview of local and online storage options and how to back up your files.

If you are conducting research as part of your Winter/Summer Research project, Honours, Masters, or Higher Degree by Research degree:

  1. Manage research data has information on how to manage, store and secure your project’s data.
  2. Discuss with your supervisor/coordinator the suitability of using the UQ Research Data Manager to manage your research data.

Sample size

Decide how many responses or observations you need for a good sample size. Larger samples generally give more precise estimates than smaller ones, provided the sample is selected without bias. Get information on sample design from the Australian Bureau of Statistics.
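
To get a rough feel for the numbers involved, you can estimate a sample size with a standard textbook formula. The sketch below uses Cochran's formula for estimating a proportion from a simple random sample, with an optional finite population correction. This is one common approach and is not necessarily the method the ABS recommends. The function name sample_size and its default values (95% confidence, so z = 1.96, and an assumed proportion of 0.5) are illustrative assumptions.

```python
import math


def sample_size(margin_of_error, population=None, z=1.96, proportion=0.5):
    """Estimate the sample size needed to measure a proportion.

    margin_of_error: acceptable error as a fraction, e.g. 0.05 for +/- 5%.
    population: total population size, or None for a very large population.
    z: z-score for the confidence level (1.96 is roughly 95%).
    proportion: expected proportion; 0.5 is the most cautious choice.
    """
    # Cochran's formula for a large population.
    n = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    if population is not None:
        # Finite population correction for smaller populations.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


print(sample_size(0.05))                   # large population
print(sample_size(0.05, population=1000))  # population of 1000
```

With these defaults, a 5% margin of error needs about 385 responses for a very large population and about 278 for a population of 1000. Halving the margin of error roughly quadruples the sample you need.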

Methods for collecting data

Observation

In the observational method, you watch and record processes, activities or behaviours as they occur. The subjects being observed may or may not be aware that they are under observation. The observation is recorded either as a description of what occurs or with a checklist that looks for particular events.

Find out more about the observational method.

Surveys or polls

A survey is a method of collecting data on behaviour, attitudes and opinions. Plan your survey questions carefully so the information you get from participants is useful to answer your research question.

A poll is a type of survey, usually a very short one. Polls often have only one multiple-choice question.

Information on planning a survey:

You can conduct your surveys or polls face-to-face or you can use online tools.

Survey tools

Tools for conducting online surveys or polls

Tool | Free account available | Guides
Qualtrics | Available to UQ students and staff | Survey basic overview
Google Forms | Yes | How to use Google Forms
SurveyMonkey | Yes | SurveyMonkey Help Center
Crowd Signal | Yes | Crowd Signal Support Center

Interviews or focus groups

An interview typically involves asking either structured or unstructured questions of a single participant. Usually, the questions will be open-ended to allow for more in-depth insight on a topic than a survey can give.

A focus group involves a group of selected people (usually 6 to 12 individuals) participating in a group interview, guided by a moderator. It is a good way to explore a topic in a social context.

In both techniques, you may need to make an audio or video recording of the discussion, or have an observer record the details.

There are some factors to consider when deciding whether to use focus groups or interviews.

Scraping

Scraping is a method of getting text and images from websites and social media. The practice can be problematic from a copyright perspective, depending on how much material is reproduced and how it will be used. Researchers may use automated methods to access publicly available information on the web where they are engaged in legitimate research and study that makes use of the data, and where they do not further publish the material. Get more information or advice on copyright.

The Find and use media module provides information on how to comply with copyright.
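
As a rough illustration of what a scraper does, the sketch below pulls the visible text and the image addresses out of a small piece of HTML using only Python's standard library. The class name TextAndImageParser and the sample page are invented for this example. Dedicated tools handle real websites far more robustly.

```python
from html.parser import HTMLParser


class TextAndImageParser(HTMLParser):
    """Collect visible text and image URLs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_urls = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not part of a script or stylesheet.
        if text and self._skip_depth == 0:
            self.text_parts.append(text)


# Invented sample page, used here instead of a live website.
sample_html = """
<html><body>
<h1>Library opening hours</h1>
<p>Open daily.</p>
<img src="/images/library.jpg" alt="Library">
<script>var x = 1;</script>
</body></html>
"""

parser = TextAndImageParser()
parser.feed(sample_html)
parser.close()
print(parser.text_parts)
print(parser.image_urls)
```

To work with a live page, you would first download its HTML, for example with urllib.request.urlopen, and pass the decoded text to feed. Check the site's terms of use and the copyright advice above before you do.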

Web APIs

API stands for Application Programming Interface. A web API lets you request data from a site using a URL. Usually, some programming knowledge is needed to use APIs. Web APIs for non-programmers explains how web APIs work, gives tips on using them and lists some popular free APIs.
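
A request to a web API usually comes down to building a URL with some parameters and reading the structured response, often in JSON format. The following is a minimal sketch using Python's standard library. The address https://api.example.org/v1/records, the parameters q and limit, and the sample response are all hypothetical. A real API documents its own URL, parameters and response format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.org/v1/records"


def build_request_url(base_url, **params):
    """Attach query parameters to the base URL."""
    return f"{base_url}?{urlencode(params)}"


def parse_response(body):
    """Convert a JSON response body into Python dictionaries and lists."""
    return json.loads(body)


def fetch(url, timeout=10):
    """Request the URL and parse the JSON it returns (needs a real API)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))


url = build_request_url(BASE_URL, q="koala", limit=5)
print(url)

# Invented response body, standing in for what fetch(url) might return.
sample_body = '{"count": 1, "results": [{"name": "koala"}]}'
data = parse_response(sample_body)
print(data["results"][0]["name"])
```

The fetch function is not called here because the endpoint does not exist. With a real API, you would replace BASE_URL and the parameters with the values given in that API's documentation.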

Tools for scraping

Tools to scrape text and images from websites and social media.

Tool | Freely available | Guides
Python | Yes | Downloading webpages with Python
Scrapy | Yes | Scrapy documentation
Reaper (for social media) | Yes | Help information is available on the Reaper site.
NCapture (a Chrome extension used with NVivo) | Yes | What is NCapture? Find out more about NVivo in the next section.

Interested in Instagram data? A comparison of Instagram scraping tools lists a range of tools.

Our Text mining and text analysis guide lists more methods for gathering social media data. Text mining is a way of extracting text data from documents, for analysis. Text mining 101 also provides a quick overview.

Licence


Work with Data and Files Copyright © 2023 by The University of Queensland is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
