BioPredict

Welcome to BioPredict

A pilot app modelling and predicting biodiversity on Irish Sea coastal structures

Artificial structures are built along coastlines globally to protect property and to provide access for sea-based activities. There is growing interest in predicting the biological communities that will colonise new structures, given their potential to act as stepping-stones for problem species and/or to provide surrogate habitats for coastal biodiversity.

This pilot app uses new and existing data describing the physical features (e.g. material, structure type), environmental context (e.g. wave exposure, salinity) and biological communities of 69 intertidal artificial structures around the Irish Sea coasts of Wales and Ireland. It can be used to model the relationships between the physical and environmental parameters (‘predictor variables’) and a range of biodiversity metrics calculated for each structure.

The tool provides four different ways of exploring the data and the models, as indicated in the boxes below. Each can be accessed via the tabs above.

Additional project outputs and resources

Ecostructure has undertaken numerous comprehensive assessments of the Irish Sea area, covering community composition, functional valuations and site connectivity. Links to these other project outputs are provided below.

1. Metadata and GIS maps of artificial structures along the Irish (Counties Louth - Cork) and Welsh coastlines.

These data formed part of mapping work that recorded intrinsic (size, length, type of structure, material) and extrinsic features of the artificial structures, as published in Thompson, Crowe & Brooks (2021) Ocean & Coastal Management.

2. The following link provides access to data gathered via remote sensing techniques, namely LiDAR and UAV surveys, at a selection of artificial and natural study sites along the Irish and Welsh coastlines.

3. Ecosystem functions are vital for understanding the roles and value of species within a given location. Here, you can find a tool (including a reference guide and case studies) and the resulting predictive model to explore how different communities can lead to different levels of provision of these important ecosystem functions.

4. Finally, Ecostructure also aimed to review the likely dispersal of particles/larvae to highlight connectivity between locations and thus possible 'sink' locations for potentially invasive species. The outputs can be found here. Please contact a member of the Ecostructure team for login credentials.

Please note: the data provided in all links above are still subject to publication by members of the Ecostructure project team. If you intend to publish any data, in part or in full, and/or if you are having difficulty accessing the data, please contact the Ecostructure team.

Code and data are archived below:

Pre-defined models of biota and biological indices

These pre-defined models have been run for selected species/groups/indices and provide the greatest level of accuracy in identifying the most influential predictor variables and characterising their influence and the uncertainty associated with the resultant model outputs.

The Custom Models tool on the next tab enables you to select any species, group or index and run models in real time, but the outputs are less reliable than those presented on this tab.

Step 1: Select a Species / Group or Biological Index

Choose an index:

Step 2: Explore the model outputs

1. Decision Tree

The branching white boxes in the tree show the variables (environmental and/or context) that best classify our sites as a “pass” or a “fail”.

The end points (coloured red to green) are known as leaves. In each leaf the text indicates the most likely outcome (PASS or FAIL). The two numbers indicate the probabilities of the species or index being (a) below and (b) above the pass-fail threshold under the environmental conditions specified in the branches of the tree leading to that leaf.

The depth of colour of each leaf represents the confidence in the prediction (and the volume of data falling within that leaf). Leaves that are bright green or bright red indicate confident predictions of pass or fail respectively (with a probability of 0.85 or greater). Pale green or orange colours indicate less confidence (or similar probabilities to those above but less data). Yellow leaves represent classifications that should be treated with caution. These leaves have proportions closer to “.60 / .40” or “.50 / .50”. Here the algorithm will still suggest the most likely outcome, but the supporting evidence is not clear-cut.
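The colour coding described above can be sketched as a small function. This is an illustrative sketch only, not the app's own code: the function name `leaf_confidence` and the 0.65 cut-off between "less confident" and "caution" are assumptions, and only the 0.85 cut-off comes from the text. It also ignores the volume of data in the leaf, which the app takes into account.

```python
def leaf_confidence(p_fail, p_pass, strong=0.85, weak=0.65):
    """Return (most likely outcome, confidence label) for one leaf.

    p_fail, p_pass: the two probabilities shown in the leaf.
    strong: cut-off for a confident prediction (0.85, from the text).
    weak: hypothetical cut-off below which a leaf is treated with caution.
    """
    if abs(p_fail + p_pass - 1.0) > 1e-6:
        raise ValueError("leaf probabilities should sum to 1")
    # Assumption: an exact tie is reported as FAIL in this sketch.
    outcome = "PASS" if p_pass > p_fail else "FAIL"
    top = max(p_fail, p_pass)
    if top >= strong:
        label = "confident"
    elif top >= weak:
        label = "less confident"
    else:
        label = "caution"
    return outcome, label


print(leaf_confidence(0.10, 0.90))  # ('PASS', 'confident')
print(leaf_confidence(0.75, 0.25))  # ('FAIL', 'less confident')
print(leaf_confidence(0.60, 0.40))  # ('FAIL', 'caution')
```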

2. Model Accuracy

Here we present the results of a simple accuracy assessment known as a confusion matrix. In this grid a perfect model would place every site in either the top left or the bottom right box, known as true positive (TP) and true negative (TN).

Classifications within the top right box indicate sites that PASS the threshold being predicted as FAIL. Classifications in the bottom left box indicate sites that FAIL the threshold being predicted to PASS.
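As a rough illustration of how the four counts in a confusion matrix are obtained, the sketch below tallies them from paired observed and predicted outcomes. The function names and the six example sites are made up for illustration and are not drawn from the Ecostructure surveys; PASS is treated as the positive class.

```python
def confusion_counts(observed, predicted, positive="PASS"):
    """Count the four cells of a two-class confusion matrix."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for obs, pred in zip(observed, predicted):
        if obs == positive and pred == positive:
            counts["TP"] += 1  # passes and is predicted to pass
        elif obs != positive and pred != positive:
            counts["TN"] += 1  # fails and is predicted to fail
        elif obs != positive and pred == positive:
            counts["FP"] += 1  # fails but is predicted to pass
        else:
            counts["FN"] += 1  # passes but is predicted to fail
    return counts


def accuracy(counts):
    """Share of sites classified correctly (TP + TN over all sites)."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else float("nan")


# Made-up outcomes for six sites.
observed = ["PASS", "PASS", "FAIL", "FAIL", "PASS", "FAIL"]
predicted = ["PASS", "FAIL", "FAIL", "PASS", "PASS", "FAIL"]

counts = confusion_counts(observed, predicted)
print(counts)
print(round(accuracy(counts), 2))
```

A perfect model would leave the FP and FN counts at zero.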

3. How many variables best predict the chosen index?

This line plot shows the results of a further machine learning algorithm called recursive feature elimination (RFE). The computer repeatedly fits models, removing the least useful variables at each step, and calculates the typical accuracy of models containing each number of variables.

The dot at each point along the x-axis represents the average performance of models containing that many variables. We want models that score as highly as possible but are also as simple as possible, so we look for a clear plateau and use that number of variables for our final predictions.
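One way to make "look for a clear plateau" concrete is to pick the smallest number of variables whose average score is within a small tolerance of the best score. The sketch below does this; the function name, the 0.02 tolerance and the scores are assumptions for illustration and do not describe how the app chooses its final models.

```python
def smallest_plateau_size(mean_accuracy, tolerance=0.02):
    """Smallest number of variables scoring within `tolerance` of the best.

    mean_accuracy: dict mapping number of variables -> mean accuracy.
    tolerance: hypothetical margin that counts as "on the plateau".
    """
    if not mean_accuracy:
        raise ValueError("no results supplied")
    best = max(mean_accuracy.values())
    on_plateau = [n for n, acc in mean_accuracy.items() if best - acc <= tolerance]
    return min(on_plateau)


# Made-up average accuracies for models with 1 to 6 variables.
scores = {1: 0.61, 2: 0.70, 3: 0.78, 4: 0.79, 5: 0.79, 6: 0.78}
print(smallest_plateau_size(scores))  # 3
```

With these made-up scores, adding a fourth or fifth variable improves accuracy by only 0.01, so the three-variable model is preferred.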

4. What are the best variables to predict the chosen index?

There are several ways of identifying the “best” variables from a list of candidates. The method used here is a further product of the machine learning algorithm used above. Here the variables are scored in terms of (1) their importance and (2) their occurrence in the computer runs.

The y-axis presents the loss of predictive skill when that variable is left out. A high score means the model is worse without this variable.

The x-axis shows the average depth (or stage) at which the computer calls on this variable within its decision trees. The lower the number, the earlier the variable is needed.

Typically, variables plotting towards the top left are the most valuable. All the variables within the plot are important to some degree; however, those with dots outlined in black are the “best of the best”.
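The two axes described above can be illustrated with a small calculation. This is a hedged sketch rather than the app's algorithm: the accuracy figures, the depth values and the ranking rule are made up for illustration, and the variable names are borrowed loosely from the examples of predictor variables given in the introduction.

```python
def importance_from_dropout(baseline_accuracy, accuracy_without):
    """Loss of predictive skill when each variable is left out (y-axis)."""
    return {var: baseline_accuracy - acc for var, acc in accuracy_without.items()}


def rank_variables(importance, mean_depth):
    """Order variables so high importance and shallow depth come first."""
    return sorted(importance, key=lambda var: (-importance[var], mean_depth[var]))


# Hypothetical numbers, for illustration only.
accuracy_without = {"wave_exposure": 0.65, "salinity": 0.74, "material": 0.78}
mean_depth = {"wave_exposure": 1.2, "salinity": 2.5, "material": 3.1}

importance = importance_from_dropout(0.80, accuracy_without)
for var in rank_variables(importance, mean_depth):
    print(var, round(importance[var], 2), mean_depth[var])
```

In this made-up example the first variable listed would plot nearest the top left: leaving it out costs the most accuracy and it is used earliest in the trees.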

Custom models of biota and biological indices

This 'Custom Models' tool enables you to select any species, group or index and run models in real time to characterise their occurrence according to the predictor variables, but the outputs are less reliable than those presented on the 'Pre-defined Models' tab.

1. Choose variables

Abundance Data:

Choose single or multiple:

Presence / Absence Data:

Choose single or multiple:

Select predictor

Choose single or multiple:

Abundance Presence / Absence
This tool allows you to choose (aggregate) several species for prediction. Bear in mind that if the custom list is very large and/or contains species with very different life habits and ecological requirements, the model for the aggregation as a whole is unlikely to be very accurate.

2. Explore and select threshold

Select a threshold

This setting enables you to specify the level of abundance or richness above which the model indicates an occurrence of the selected species/group.
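The effect of the threshold can be sketched as follows: each site's value is converted to "1" if it is above the threshold and "0" otherwise. The function names and abundance values here are made up for illustration, and the strict "greater than" comparison is an assumption based on the wording "above which".

```python
def apply_threshold(values, threshold):
    """Return 1 where a value is above the threshold, otherwise 0."""
    return [1 if value > threshold else 0 for value in values]


def share_above(values, threshold):
    """Proportion of sites classed as 1 at this threshold."""
    classes = apply_threshold(values, threshold)
    return sum(classes) / len(classes) if classes else float("nan")


# Made-up abundance values for eight sites.
abundance = [0, 3, 12, 7, 25, 1, 9, 0]
for threshold in (0, 5, 10):
    print(threshold, apply_threshold(abundance, threshold), share_above(abundance, threshold))
```

Trying several thresholds in this way shows how the balance of "1" and "0" sites shifts, which is worth checking before a model is fitted to very uneven classes.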

Values and Threshold

3. View Model

Model Tree

Here the tool has generated a decision tree based on the inputs provided in step 1 and the target threshold in step 2. The tree shows (in white boxes) the environmental, site and categorical variables deemed most mathematically important to classify the data from our site surveys by the threshold defined in step 2 as “1” or “0”.

At the bottom of each branch is a leaf (or node). In this leaf, the computer provides the most likely outcome, “1” or “0”. Where more than 50% (> 0.50) of the sites that follow the decision tree to a given leaf are above the threshold (set in step 2), the computer will return a leaf with a “1” heading. The opposite (under 50%, < 0.50) will result in “0”.

Not all classifications are 100% accurate. To represent the strength of, or confidence in, the classification in each leaf, a colour gradient has been applied: 100% “1” results in a deep green leaf and 100% “0” in a red leaf. Where there is a mixture, the colour reflects the proportions, with 50:50 shown in yellow and other proportions in shades of orange. Leaves that are not deep green or red indicate some error or noise in the model, and these classifications should be treated with more caution.
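The leaf rules above can be summarised in a short sketch. It is illustrative only: the function name and the colour names are stand-ins for the app's continuous gradient, and labelling an exact 50:50 split as “0” is an assumption, since the text does not say how ties are handled.

```python
def summarise_leaf(site_classes):
    """Summarise one leaf from the 0/1 classes of the sites that reach it.

    Returns (label, proportion of "1" sites, colour name).
    """
    if not site_classes:
        raise ValueError("a leaf must contain at least one site")
    proportion = sum(site_classes) / len(site_classes)
    # More than half of the sites above the threshold gives "1".
    # Assumption: an exact 50:50 split is labelled "0" here.
    label = 1 if proportion > 0.5 else 0
    if proportion == 1.0:
        colour = "deep green"
    elif proportion == 0.0:
        colour = "red"
    elif proportion == 0.5:
        colour = "yellow"
    else:
        colour = "orange"
    return label, proportion, colour


print(summarise_leaf([1, 1, 1, 1]))  # (1, 1.0, 'deep green')
print(summarise_leaf([1, 0, 0, 0]))  # (0, 0.25, 'orange')
print(summarise_leaf([1, 0, 1, 0]))  # (0, 0.5, 'yellow')
```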

Data resources

On this page we provide meta-data and information regarding the data sources used in the production of this Ecostructure output.

Resource 1: Ecostructure Data

Please click on the resources below (2&3) to download the environmental and biological meta-data.

Please be sure to acknowledge the data source of this tool if used in publications or other applications. Recommended citation available at:

Working title: Lawrence P.J., et al. “Predicting biological communities on artificial coastal structures”

Ecostructure would like to recognise all the key partners:

Paul Brooks, Jennifer Coughlan, Veronica Farrugia Drakard, Donal Lennon, Bryan Thompson & Tasman Crowe (University College Dublin)

Peter Lawrence, Tim D'Urban Jackson, Stuart Jenkins, Liz Morris-Webb, Siobhan Vye & Andy Davies* (Bangor University (* & University of Rhode Island))

Ally Evans, Hannah Earp, Liz Humphreys, Tomos Jones, Melanie Prentice, Harry Thatcher & Pippa Moore* (Aberystwyth University (* & Newcastle University))

Tom Fairchild & John Griffin (Swansea University)

Amy Dozier & Kathrin Kopke (University College Cork)

Further thanks to Keaton Wilson (University of Arizona) and Paula Gutiérrez-Muñoz (Instituto de Investigaciones Marinas (CSIC))

Special thanks to the steering committee for guidance and feedback. We further acknowledge the early roles that Dr. Louise Firth and Prof. Steve Hawkins played with Ecostructure colleagues in identifying the need for, and developing the concept of, such tools.

Resource 2: Meta-data (Environmental variables).

Please click on the tab below to download the environmental meta-data.

Resource 3: Meta-data (Biotic data collection).

Please click on the tab below to download the biotic meta-data.

Biodiversity on the Irish Sea coastal structures - map the data

Instructions (exploratory mapping)

Overview

Contact Information

Disclaimer

Connect

Funding

Step 1: Choose data to map

Step 2 (optional): Filter by material, structure type and urban/rural setting.

Step 3 (optional): Filter by environmental condition

View your data selection on the map below
