# Computer-based work (mostly statistics and programming)

**1. Guesstimating GHG break-even point for biomass gasification**

Wood gas generation from wood, manure, compost and the like produces CH4 under anaerobic conditions. Since renewable resources are used, biogas is generally considered sustainable. However, CH4 is a potent greenhouse gas, with a global warming potential 84 times that of CO2 over 20 years (https://en.wikipedia.org/wiki/Greenhouse_gas), and all biomass gasifiers leak. In industrial settings, leakage is under 5% (https://www.umweltbundesamt.de/themen/biogasanlagen-muessen-sicherer-emissionsaermer). In developing countries, particularly with self-made biogas generators and manual methane transport (https://www.deutschlandfunk.de/mini-biogasanlagen-fuer-afrika-wirtschaftsfoerderung-statt.1773.de.html?dram:article_id=459738), leakage can easily be expected to reach 20-30%. The aim of this project is to compute the break-even point of biomass gasification, given the difference in global warming potential between CO2 and CH4: how much leakage is acceptable before the technology causes more problems than it solves? The key points are to (a) establish a transparent derivation of the balance; and (b) consider different time horizons of the GHG activity of the two gases.
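A back-of-envelope version of this balance can be sketched in a few lines (Python here, though the project could equally use R). The 20-year GWP of 84 is from the text above; the assumption that each kg of combusted CH4 displaces an equal amount of fossil combustion CO2, and that biogenic CO2 counts as neutral, are illustrative simplifications the thesis would need to scrutinise:

```python
# Back-of-envelope break-even leakage for biogas (illustrative sketch).
# Assumptions: 20-year GWP of CH4 = 84 (from the text); combusted
# biogenic CO2 counts as climate-neutral; each kg of CH4 burned
# displaces the same combustion CO2 from fossil natural gas
# (44/16 = 2.75 kg CO2 per kg CH4, from molar masses).

GWP_CH4_20YR = 84.0          # kg CO2e per kg CH4 leaked (20-year horizon)
CO2_PER_CH4 = 44.0 / 16.0    # kg CO2 per kg CH4 combusted

def net_co2e_per_kg_ch4(leak_fraction):
    """Net CO2e per kg CH4 produced: leaked CH4 warms,
    burned CH4 avoids an equal amount of fossil CO2."""
    leaked = leak_fraction * GWP_CH4_20YR
    avoided = (1.0 - leak_fraction) * CO2_PER_CH4
    return leaked - avoided

# Break-even: leak * GWP = (1 - leak) * CO2_PER_CH4
break_even = CO2_PER_CH4 / (GWP_CH4_20YR + CO2_PER_CH4)
print(f"break-even leakage: {break_even:.1%}")
```

Under these (crude) assumptions the break-even leakage is only about 3%, well below the 20-30% expected for self-made systems; the thesis would refine this with proper displacement scenarios and the 100-year horizon.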

**2. Fitting the elephant Integral Projection Model to observed data from Amboseli, Kenya**

African elephant populations have been studied extensively. Local censuses date back to the 19th century and yet, historical estimates of the continental elephant population are scarce and uncertain. This project aims to estimate the population dynamics and spatial distribution of the African elephant from 1900 until today. Population size estimates can be derived from census reports and other published material. This project will be part of a larger demographic analysis of the continental African elephant population, which provides the opportunity to work alongside field and theoretical ecologists.

For his PhD, Severin Hauenstein has developed a population model, akin to, but more advanced than, a structured matrix population model. So far, this model is parameterised from literature data, yielding nice predictions for a population in Kenya.

The next step, taken here, is to parameterise the model with the actual data from Amboseli, i.e. to fit the model to the data. This requires a Bayesian model-calibration approach, which is an intellectual hurdle, but also really cool. Data and model code are available, and tutorials for Bayesian calibration are provided, e.g., by R's BayesianTools package.
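To give a flavour of what Bayesian calibration involves, here is a minimal sketch (in Python rather than R/BayesianTools): a random-walk Metropolis sampler fitted to synthetic data from a toy logistic-growth model, which stands in for the far more complex elephant IPM. All data, priors and tuning values below are invented for illustration:

```python
import numpy as np

# Minimal sketch of Bayesian model calibration via Metropolis sampling.
# A toy logistic-growth model stands in for the actual IPM; the data
# are synthetic and all parameter values are made up for illustration.

rng = np.random.default_rng(1)

def logistic(r, K, n0=100.0, years=30):
    n = np.empty(years); n[0] = n0
    for t in range(1, years):
        n[t] = n[t-1] + r * n[t-1] * (1 - n[t-1] / K)
    return n

true = logistic(0.15, 800.0)
obs = true + rng.normal(0, 20, true.size)      # synthetic "census" data

def log_posterior(theta):
    r, K = theta
    if not (0 < r < 1 and 100 < K < 5000):     # flat priors on a box
        return -np.inf
    resid = obs - logistic(r, K)
    return -0.5 * np.sum((resid / 20.0) ** 2)  # Gaussian likelihood, sd known

# Random-walk Metropolis
theta = np.array([0.3, 1000.0]); lp = log_posterior(theta)
chain = []
for _ in range(20000):
    prop = theta + rng.normal(0, [0.01, 20.0])  # proposal step
    lp_prop = log_posterior(prop)
    if np.log(rng.uniform()) < lp_prop - lp:    # accept/reject
        theta, lp = prop, lp_prop
    chain.append(theta.copy())
chain = np.array(chain)[5000:]                  # discard burn-in
print(chain.mean(axis=0))                       # posterior means for (r, K)
```

The real project replaces the toy model with the IPM and this hand-rolled sampler with BayesianTools' tuned samplers, but the accept/reject logic is the same.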

**Suitable as:** MSc project

**Requirements:** Willingness to engage in computer-intensive, statistical work.

**Time:** The project can start anytime.

**Contact:** **Carsten Dormann**, carsten.dormann@biom.uni-freiburg.de

**3. How does overdispersion of count data (non-independent events) affect quantitative network analysis?**

Network analysis is a popular tool for understanding the complexity of ecosystems with respect to species interactions, for example those between plants and their pollinators. Quantitative networks are supposed to be more meaningful for ecosystem functions and more robust to sampling effects. However, many methods for quantitative networks assume that network data (interaction frequencies) are based on independent events. Just as in regular Poisson regression, this assumption may often be violated: multiple visits by the same individual, social behaviour or spatiotemporal heterogeneity may lead to non-independence of interaction events, potentially strongly influencing network patterns and compromising inference. An example where such effects are particularly severe are counts from pollen samples or faecal analysis, which are thus often not analysed in a fully quantitative way. This project has the potential to challenge the conclusions of hundreds of published research papers.

**Methods:** This thesis will explore the influence of this effect on the estimation of specialization and on the significance of patterns inferred from null models. It will combine:

- data simulation using statistical models or (optionally) simple process-based models

- analysis of existing datasets (for which e.g. number of individuals interacting can be compared to the number of visits)

- exploration of solutions to the problem (e.g. log-transformation, using prevalence instead of full quantities, hierarchical models, or newly developed methods that explicitly account for overdispersion)

**Suitable as:** BSc or MSc thesis project

**Requirements:** strong dedication to working with R; basic programming and statistics skills in R

**Time:** can start anytime.

**Contact: Dr. Jochen Fründ**, jochen.fruend@biom.uni-freiburg.de, 0761/203-3747

**4. Automatising statistical analyses**

Why does every data set require the analyst to start over with all the things she has learned during her studies? Surely much of this can be automatised!

Apart from attempts to produce human-readable output from statistical analyses, efforts to automatise even simple analyses have not made it onto the market. But some parts of a statistical analysis can surely be automatised, in a supportive way. For example, after fitting a model, model diagnostics should be relatively straightforward to carry out and report automatically. Or a comparison of the fitted model with some hyperflexible algorithm, to see whether the model could be improved in principle. Or automatic proposals for the type of distribution to use, for dealing with correlated predictors, or for plotting main effects.
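As a taste of the "automatic diagnostics" idea, here is a minimal sketch (Python; the project itself would target R): after fitting a simple linear model, two standard residual checks are run and reported without manual intervention. The specific checks and the 0.05 threshold are illustrative choices, not a prescription:

```python
import numpy as np
from scipy import stats

# Sketch of an automatic-diagnostics step: fit a linear model, then
# run and report standard residual checks without user intervention.

def auto_diagnose(x, y):
    slope, intercept = np.polyfit(x, y, 1)      # simple linear fit
    fitted = intercept + slope * x
    resid = y - fitted
    report = {}
    # Check 1: normality of residuals (Shapiro-Wilk)
    _, p_norm = stats.shapiro(resid)
    report["residuals_normal"] = p_norm > 0.05
    # Check 2: crude heteroscedasticity check,
    # Spearman correlation of |residuals| with fitted values
    _, p_het = stats.spearmanr(fitted, np.abs(resid))
    report["homoscedastic"] = p_het > 0.05
    return report

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 200)       # a well-behaved example
print(auto_diagnose(x, y))
```

The interesting (and thesis-worthy) part starts exactly where this sketch stops: turning binary flags into sensible recommendations, and deciding when automation should refuse to judge.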

Here is your chance to have a go! In addition to the fun of inventing and implementing algorithms to automatically do something, you will realise why some things are not yet automatised.

This project has many potential dimensions. It could focus on traditional model diagnostics, or on automatised plotting, or on comparisons of GLMs with machine learning approaches to improve model structure, or ...

**Suitable as:** BSc/MSc project

**Requirements:** Willingness to engage in R programming and abstract thinking. Tolerance for frustrating error messages.

**Time:** The project can start anytime.

**Contact:** carsten.dormann@biom.uni-freiburg.de

**5. Unified sampling model for abundance of species in communities: fitting an ugly likelihood using MCMC [R programming; community data analysis]**

How many individuals would we expect a species to have in a local community? That may sound like a strange question, but we do observe that most species are rare and only some are very common. So there is a pattern! For several years, Sean Connolly has attempted to find a statistical distribution that describes how many individuals to expect for each species of a community across samples from many sites. The result is a very ugly distribution, but it has the potential to be enormously useful! Alongside the equation, Connolly et al. (2017) also provide a function to fit this ugly distribution, but it only works reliably for large data sets (many species, many individuals, many sites), which severely limits its usefulness.

This project attempts to use an MCMC fitting algorithm (of the many existing ones) to estimate the parameters of the ugly equation in a more robust way. Also, we can expect the parameters of this distribution to depend on the environment, which is currently not implemented in Connolly et al.'s functions. This way, one could use the ugly equation to estimate the effect of, say, landscape structure on the abundances of birds or spiders in a statistically satisfying way: using the information of all species, rather than only the number of species or a diversity index.
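The MCMC machinery itself is not mysterious; here is a minimal sketch (Python; the project would use R), with a negative binomial standing in for Connolly et al.'s much uglier distribution and simulated abundance counts standing in for real community data:

```python
import numpy as np
from scipy import stats

# Sketch of fitting a species-abundance distribution by MCMC.
# The negative binomial is only a stand-in for the (much uglier)
# distribution of Connolly et al. (2017); the machinery is the point.

rng = np.random.default_rng(11)
# Simulated abundance counts: size = 2, mean = 20
counts = rng.negative_binomial(n=2.0, p=2.0 / (2.0 + 20.0), size=500)

def log_lik(log_size, log_mu):
    size, mu = np.exp(log_size), np.exp(log_mu)  # sample on log scale
    p = size / (size + mu)
    return stats.nbinom.logpmf(counts, size, p).sum()

theta = np.array([0.0, 0.0])                     # start at size=1, mu=1
lp = log_lik(*theta)
chain = []
for _ in range(10000):
    prop = theta + rng.normal(0, 0.05, 2)        # random-walk proposal
    lp_prop = log_lik(*prop)
    if np.log(rng.uniform()) < lp_prop - lp:     # flat prior on log scale
        theta, lp = prop, lp_prop
    chain.append(theta.copy())
post = np.exp(np.array(chain)[3000:])            # back-transform, drop burn-in
print(post.mean(axis=0))                         # posterior means (size, mu)
```

Swapping in the actual likelihood, and letting its parameters depend on environmental covariates, is where the real work (and the thesis) lies.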

**Methods**: R programming: develop/adapt an MCMC sampler to fit the ugly equation.

**Analysis**: Apply it to one or more community data sets along a landscape-structure gradient.

**Requirements**: Basic mathematics: the ugly function features all sorts of mathematical niceties, which at least require a tolerance for formulae.

**Time**: The project can start anytime. Suitable as MSc project.

**References**:

Connolly, S.R., Hughes, T.P. & Bellwood, D.R. (2017) A unified model explains commonness and rarity on coral reefs. *Ecology Letters*, 20, 477–486.

**Contact**: Carsten Dormann: carsten.dormann@biom.uni-freiburg.de