Processing & Transforming Data
When scientists collect data from an experiment, what they start with is raw data: the direct measurements or observations made during testing. While this information is essential, it’s often messy and difficult to interpret on its own. It often contains many individual values, measurements, or observations that make it hard to see patterns or trends.
Processing data is about turning that raw information into something meaningful. This processing is done by organising the raw data and applying calculations. The resulting processed data is now in a form that scientists can use to identify trends, patterns, and relationships between variables (independent and dependent variables). For example, if you were testing how light intensity affects plant growth, processing your data might help you see whether plants really grew taller under brighter light. Without processing, you might just have a confusing list of numbers.
Processing also helps detect errors, reduce random errors by calculating averages, check for consistency and prepare the data for visual presentation in graphs or tables—making it easier to draw accurate conclusions. More information on the following processing methods can be found below:
Calculating totals
Calculating rates
Calculating percentages
Calculating measures of central tendency (mean, median and mode)
Although calculating measures of spread (range and standard deviation) is processing data, this content is covered in Analysing and Interpreting Data.
Not all data are the same, so the way you process them depends on the type of data you’ve collected. Because quantitative data are numerical, they can be processed mathematically—by calculating averages, ranges, or other statistics that reveal patterns and relationships.
Tables are used to support the processing of raw data in various ways.
One example is when further columns are added to the data table to carry out calculations on existing columns. For example, the table below shows how two of the columns (Mass and Volume) are used to collect measured values (raw data), while the final column (Density) contains calculated values (processed data).
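The idea of adding a calculated column can be sketched in a few lines of Python. This is only an illustration: the sample names, the measurements, and the column names (mass_g, volume_cm3, density_g_per_cm3) are made up for the example and are not the values from the table.

```python
# Hypothetical raw measurements (illustrative values only).
raw_rows = [
    {"sample": "A", "mass_g": 12.0, "volume_cm3": 4.0},
    {"sample": "B", "mass_g": 27.0, "volume_cm3": 10.0},
    {"sample": "C", "mass_g": 8.1, "volume_cm3": 3.0},
]


def add_density(rows):
    """Return new rows with a calculated density column (mass / volume)."""
    processed = []
    for row in rows:
        new_row = dict(row)  # copy, so the raw data stay unchanged
        density = row["mass_g"] / row["volume_cm3"]
        new_row["density_g_per_cm3"] = round(density, 2)
        processed.append(new_row)
    return processed


processed_rows = add_density(raw_rows)
for row in processed_rows:
    print(row)
```

Keeping the raw rows untouched and writing the calculated values into a new column mirrors the layout of the table: measured values on the left, processed values on the right.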
Another example is when raw data from one table are counted to produce more data tables showing the values of the counts.
For example, let's say that we collected data on student eye colour and shoe size. Our raw data table will resemble Figure 3.3a, where columns represent the variables, and the rows represent each data point collected...
...if we just wanted to process that raw data of eye colour, we could produce a 'table of counts' (also known as a 'frequency table'). Frequency is the number obtained by counting. For example, if there are 7 students with blue eyes, then this category has a frequency of 7.
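A frequency table is simply a count of how often each category appears. As a rough sketch, the list of eye colours below is invented for illustration (it is not the data from Figure 3.3a), and Python's built-in Counter does the counting.

```python
from collections import Counter

# Hypothetical raw data: one eye colour recorded per student.
eye_colours = ["blue", "brown", "brown", "green", "blue", "brown", "hazel", "blue"]

# Counter maps each category to its frequency (the number obtained by counting).
frequency = Counter(eye_colours)

for colour, count in sorted(frequency.items()):
    print(f"{colour:<8}{count}")

# The frequencies should add up to the number of students recorded.
total_students = sum(frequency.values())
print("Total:", total_students)
```

Checking that the frequencies add up to the number of data points is a quick way to catch a missed or double-counted value.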
The column / variable 'Eye colour' contains categorical data (qualitative data). The raw data in Figure 3.3a also include 'Shoe size', which contains discrete data (quantitative data). It would be possible to count the number of students with each shoe size, but there could be quite a large number of categories!
Here, it may be more convenient (and sensible) to use fewer categories by choosing some groups / classes of shoe size, and to count the numbers in those groups. So Figure 3.3c is also a frequency table, but shows grouped data.
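Grouping works the same way as counting, except that each value is first placed into a class. The sketch below assumes some made-up shoe sizes and class boundaries; the classes used in Figure 3.3c may be different.

```python
# Hypothetical raw data: one shoe size recorded per student.
shoe_sizes = [5, 6, 6, 7, 8, 8, 9, 10, 11, 12]

# Assumed classes, given as inclusive (lowest, highest) pairs that do not overlap.
size_classes = [(5, 6), (7, 8), (9, 10), (11, 12)]


def grouped_frequency(values, classes):
    """Count how many values fall into each inclusive (low, high) class."""
    table = {f"{low}-{high}": 0 for low, high in classes}
    for value in values:
        for low, high in classes:
            if low <= value <= high:
                table[f"{low}-{high}"] += 1
                break  # each value belongs to one class only
    return table


grouped = grouped_frequency(shoe_sizes, size_classes)
for label, count in grouped.items():
    print(f"{label:<8}{count}")
```

Classes should not overlap and should cover every recorded value, otherwise some data points would be counted twice or not at all.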
Figures (b) and (c) each show the numbers of students categorised by one variable (eye colour or shoe size)...
...but what happens if we want to show the numbers of students categorised by both of these factors?
We would need to create a two-way frequency table. These tables are useful for seeing whether two variables are related. For example, if there were a large number of students with both green eyes and large shoe sizes, a two-way table would make that pattern visible, whereas two separate frequency tables would hide it.
Two-way tables are better for showing relationships between two factors.
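A two-way table counts each combination of the two variables. In this sketch the student records and the 'small' / 'large' shoe size groups are invented for illustration; each record is an (eye colour, shoe size group) pair.

```python
from collections import Counter

# Hypothetical records: (eye colour, shoe size group) for each student.
students = [
    ("blue", "small"), ("blue", "large"), ("brown", "small"),
    ("brown", "small"), ("green", "large"), ("green", "large"),
    ("brown", "large"), ("blue", "small"),
]

# Count every combination; a combination that never occurs has a count of 0.
two_way = Counter(students)

eye_colours = sorted({eye for eye, _ in students})
size_groups = ["small", "large"]

# Print the table: rows are eye colours, columns are shoe size groups.
print(f"{'':<8}" + "".join(f"{group:>8}" for group in size_groups))
for eye in eye_colours:
    cells = "".join(f"{two_way[(eye, group)]:>8}" for group in size_groups)
    print(f"{eye:<8}{cells}")
```

Reading across a row or down a column shows whether one category of the first variable tends to go with a particular category of the second.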
Basic calculations, such as totals, are commonly used to compare treatments. Totals are also used before carrying out other data transformations. To calculate a total, add all of the data values for a variable.
Tally charts group collected measurements into classes, and record the number in each class. As you create tallies, cross out each value on the original list to prevent double entries. Check that all values are crossed out at the end and that the totals agree. The totals of a tally chart record the number of times a class of values occurs in a data set.
Tally charts are a useful first step in analysis, allowing the experimenter to visualise trends or patterns. A neatly constructed tally chart doubles as a simple histogram.
Example: Height of 6-day old seedlings.
Percentages are expressed as a fraction of 100. Percentages express the proportion of data falling into any one category, e.g. for pie graphs.
Percentages allow meaningful comparison between different samples.
They are useful for monitoring change (e.g. % increase from one year to the next).
Example: Percentage of lean body mass in men.
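Both uses of percentages mentioned above (a proportion of a whole, and a change over time) come down to one line of arithmetic each. The function names and the numbers in this sketch are made up for illustration.

```python
def percentage(part, whole):
    """Express part as a percentage of whole."""
    if whole == 0:
        raise ValueError("the whole must not be zero")
    return part / whole * 100


def percentage_change(old, new):
    """Percentage increase (positive) or decrease (negative) from old to new."""
    if old == 0:
        raise ValueError("the starting value must not be zero")
    return (new - old) / old * 100


# Hypothetical example: 60 kg of lean mass in a total body mass of 80 kg.
lean_percent = percentage(60, 80)
print(lean_percent)

# Hypothetical example: a population growing from 200 to 250 in one year.
yearly_change = percentage_change(200, 250)
print(yearly_change)
```

Because a percentage is always relative to the size of the whole, two samples of very different sizes can be compared directly.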
Rates are useful in making multiple data sets comparable (e.g. if recordings were made over different time periods).
The calculation of rate (amount per unit time) is another common basic calculation and is appropriate for many biological situations (e.g. measuring growth or weight loss or gain). For a line graph, with time as the independent variable plotted against the values of the biological response, the slope of the line is a measure of the rate. Biological investigations often compare the rates of events in different situations.
Rates are expressed as a measure per unit time. Rates show how a variable changes over a standard time period (e.g. one second, one minute, or one hour). Rates allow meaningful comparison of data that may have been recorded over different time periods.
Example: Rate of sweat loss during exercise in cyclists.
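A rate is the amount measured divided by the time over which it was measured. The sketch below uses invented sweat-loss figures for two cyclists who were recorded for different lengths of time; the names and numbers are assumptions for the example only.

```python
def rate(amount, duration):
    """Return the amount per unit time (e.g. mL per minute)."""
    if duration <= 0:
        raise ValueError("duration must be greater than zero")
    return amount / duration


# Hypothetical recordings: (sweat lost in mL, exercise time in minutes).
recordings = {
    "cyclist A": (600, 30),
    "cyclist B": (900, 60),
}

sweat_rates = {name: rate(ml, minutes) for name, (ml, minutes) in recordings.items()}
for name, value in sweat_rates.items():
    print(f"{name}: {value} mL per minute")
```

Cyclist B lost more sweat in total, but cyclist A lost it faster. The totals alone would have suggested the opposite, which is why rates are used when time periods differ.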
One of the most common ways to process numerical data is by calculating the mean (x̄), also called the average. This helps reduce the impact of random errors and gives a representative value for a set of data.
To calculate the mean (x̄), add all the data values (x) measured for the dependent variable, then divide the total by the number of data points (n). In symbols, x̄ = Σx ÷ n.
Be aware of significant figures when calculating means. A mean should be reported to the same precision as the least precise raw data value. For example, if one value is recorded as a whole number, then the mean should also be rounded to a whole number. Means with a large number of figures after the decimal point do not show appropriate processing.
These means need to be recorded in a table in the results section of the report. The means are also used to draw an appropriate graph(s) to illustrate a pattern or trend (or its absence).
(Note that outliers (very extreme values) are usually excluded from calculations of the mean.)
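The calculation of a mean, including the rounding step, can be sketched as follows. The seedling heights are invented for the example, and the decimals argument is an assumption of this sketch: you choose it yourself to match the least precise raw value.

```python
def mean_rounded(values, decimals):
    """Mean = sum of the values / number of values, rounded to the given decimal places."""
    if not values:
        raise ValueError("at least one data value is needed")
    mean = sum(values) / len(values)
    return round(mean, decimals)


# Hypothetical heights recorded as whole millimetres, so round to 0 decimal places.
heights_mm = [12, 15, 14, 13, 15]
mean_height = mean_rounded(heights_mm, 0)
print(mean_height)

# Hypothetical masses recorded to 1 decimal place, so keep 1 decimal place.
masses_g = [2.1, 2.4, 2.3]
mean_mass = mean_rounded(masses_g, 1)
print(mean_mass)
```

The unrounded mean of the heights is 13.8, but because the raw values are whole numbers it is reported as 14.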
Sometimes, for example when it is necessary to compare two sets of results, raw data need more than just processing: they need to be transformed. Transforming data means changing it mathematically to make it easier to analyse or compare. In biology, transformations are especially useful for data that don’t follow a normal distribution.
The simplest and most powerful statistical tests generally require data to exhibit a normal distribution, yet many biological variables are not distributed in this way. This problem can be removed by transforming the data. Data transformation can help to account for differences between sample sizes in different treatments and is a perfectly legitimate way to normalise data so that it meets the criteria for analysis. It is not a way to manipulate data to get the result you want. Your choice of data transformation is based on the type of data you have and how you propose to analyse it. Some experimental results may be so clear, a complex statistical analysis is unnecessary.
Calculating reciprocals
Calculating square roots
Calculating Log10
Further Data Transformations
Reciprocals:
1 / x is the reciprocal of x.
Reciprocals of time (1/data value) can provide a crude measure of rate in situations where the variable measured is the total time taken to complete a task.
Problem it solves: Responses are measured over different time scales. For example, the time taken for colour change in an enzyme reaction.
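As a small sketch of the reciprocal transformation, the times below are invented values for how long a colour change took at three hypothetical enzyme concentrations. Taking 1 / time turns 'shorter time' into 'larger number', which behaves like a crude rate.

```python
def reciprocal(value):
    """Return 1 / value; a time of zero cannot be transformed."""
    if value == 0:
        raise ValueError("cannot take the reciprocal of zero")
    return 1 / value


# Hypothetical times (in seconds) taken for a colour change.
times_s = [20, 40, 80]

# The fastest reaction (20 s) gets the largest value (0.05 per second).
crude_rates = [reciprocal(t) for t in times_s]
print(crude_rates)
```

Each time the reaction takes twice as long, the crude rate halves, which makes the results easier to compare and to plot.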
Square root:
A square root is a value that when multiplied by itself gives the original number.
Applied to count data (data obtained by counting something).
The square root of a negative number cannot be taken. Negative numbers are made positive by adding a constant value to every data point before transforming.
Helps to normalise skewed data.
Problem it solves: Skewed data. For example, the number of woodlice distributed across a transect.
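The square root transformation, including the 'add a constant' step for negative values, might look like this. The woodlice counts and the function name are made up for the sketch.

```python
import math


def sqrt_transform(values, constant=0):
    """Add the same constant to every value, then take the square root of each."""
    shifted = [value + constant for value in values]
    if any(item < 0 for item in shifted):
        raise ValueError("a larger constant is needed: values must not be negative")
    return [math.sqrt(item) for item in shifted]


# Hypothetical woodlice counts at six points along a transect (skewed by one big count).
woodlice_counts = [0, 1, 4, 9, 16, 100]
transformed_counts = sqrt_transform(woodlice_counts)
print(transformed_counts)

# Hypothetical data with a negative value: add 3 to every value first.
shifted_result = sqrt_transform([-3, 1, 6], constant=3)
print(shifted_result)
```

The largest count is 100 times the size of the second value before the transformation, but only 10 times afterwards, so the extreme value pulls the data around much less.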
Log10:
A log transformation has the effect of normalising data.
Log transformations are useful for data where there is an exponential increase in the numbers (e.g. cell growth).
Log-transformed data from an exponential increase will plot as a straight line, and the numbers are more manageable.
To find the log10 of a number, e.g. 32, using a calculator, key in 'log' '32' = ... The answer should be 1.51 (to two decimal places).
Problem it solves: exponential increases. For example, cell growth in a yeast culture.
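The same calculator steps can be done in Python with math.log10. The yeast cell counts below are invented to show a tenfold increase at each time point; they are not measured values.

```python
import math


def log10_transform(values):
    """Return log10 of each value; only values greater than zero can be transformed."""
    if any(value <= 0 for value in values):
        raise ValueError("log10 is only defined for values greater than zero")
    return [math.log10(value) for value in values]


# The calculator example from the text: log10 of 32, to two decimal places.
log_of_32 = round(math.log10(32), 2)
print(log_of_32)

# Hypothetical yeast cell counts that increase tenfold at each time point.
cell_counts = [10, 100, 1000, 10000]
log_counts = [round(value, 6) for value in log10_transform(cell_counts)]
print(log_counts)
```

The raw counts climb steeply, but the log values go up by the same amount at each step, which is why log-transformed exponential data plot as a straight line.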