ML Roadmap

1. Objectives

Objective: Develop models for predicting biodiversity metrics.
Scope: The investigation will develop models that predict biodiversity metrics from input documents such as the Proposal, Variables, PVA, Carbon, and Datasets. The models will help in estimating the financial value of biodiversity, assessing impacts on it, and predicting changes over time.
Output Models:
- Species Financial Valuation Model: This model will calculate the financial value of individual species. It will use features such as species characteristics, ecological importance, and economic factors.
- Biodiversity Database Population Model: This model will automatically populate the Biodiversity Database with relevant data. It will use data from various sources and apply machine learning techniques to ensure the database is comprehensive and up-to-date.
- Biodiversity Impact Assessment Model: This model will assess the impact of different activities on biodiversity. It will help in understanding how various actions affect biodiversity and provide insights for decision-making.
- Biodiversity Risk Assessment Model: This model will evaluate the risk and exposure to biodiversity loss for financial institutions. It will help banks and insurance companies adjust premiums and understand their risk related to biodiversity.
- Biodiversity Data Sourcing Model: Sources and retrieves biodiversity data that is not accessible via APIs.
- Data Interpretation and Inference Model: Interprets and applies structured data (e.g., abundance, relative abundance, density) to infer conclusions for other species and regions.
- Time Series Prediction Model: This model will predict changes in biodiversity over time. It will use historical data to forecast future trends and help in planning conservation efforts.
- Carbon Model: This model will estimate the carbon footprint associated with biodiversity activities. It will help in understanding the environmental impact of biodiversity-related actions.

2. Input Documents

List of Input Documents:
- Proposal
- Variables
- PVA
- Carbon
- Datasets

3. Probable Output Models

List of Output Models:
- Species Financial Valuation Model (WTP Model)
- Biodiversity Database Population Model
- Biodiversity Impact Assessment Model
- Biodiversity Risk Assessment Model
- Time Series Prediction Model
- Carbon Model

4. Map Input Documents to Output Models

Mapping Strategy:
Criteria for mapping: Features extracted from input documents such as PVA and Variables must have high scores. The scores will be calculated from the importance of each feature to the output models and the availability of the corresponding data in the data sources.
Tools and techniques to be used:
- NLP for extracting features from text documents
- Data preprocessing techniques for cleaning and transforming data
- Machine learning algorithms for model development (Random Forest, XGBoost, etc.)
- Evaluation metrics for model evaluation
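
The mapping criterion above can be sketched as a small scoring routine. This is only an illustration: the roadmap does not define the scoring formula, so the product of importance and availability, the 0 to 1 scales, the 0.5 threshold, and all numeric values below are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    importance: float    # assumed scale 0..1: relevance to the output model
    availability: float  # assumed scale 0..1: share of needed data found in sources


def score(feature: Feature) -> float:
    """Combine both criteria; using the product is an assumption, not the roadmap's rule."""
    return feature.importance * feature.availability


def select_features(features, threshold=0.5):
    """Return the names of features scoring at or above the threshold, best first."""
    ranked = sorted(features, key=score, reverse=True)
    return [f.name for f in ranked if score(f) >= threshold]


# Placeholder values for illustration only; they are not measured importances.
candidates = [
    Feature("pop_carryingcapacity_k", importance=0.9, availability=0.8),
    Feature("pop_mortalityrates_femalemort", importance=0.7, availability=0.4),
    Feature("Average salary", importance=0.6, availability=0.9),
]

selected = select_features(candidates, threshold=0.5)
print(selected)
```

A multiplicative score drops a feature when either criterion is weak, which matches the requirement that a feature be both important and backed by available data; a weighted sum would be an alternative if partial availability is acceptable.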

To develop the models listed above from the provided variables, we can follow these steps:

Data Collection and Preprocessing:
- Collect data for each variable from relevant sources.
- Clean and preprocess the data to handle missing values and outliers and to ensure consistency.
Feature Engineering:
- Create new features by combining or transforming existing variables to better represent the underlying patterns.
- Normalize or standardize the data if necessary.
Model Development:
- Select appropriate machine learning algorithms for each model.
- Train the models using the preprocessed data and engineered features.
- Evaluate the models using suitable metrics.
Model Evaluation and Tuning:
- Evaluate the performance of the models using cross-validation and other techniques.
- Tune the hyperparameters to improve model performance.
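
The steps above can be sketched end to end with the Python standard library. This is a minimal illustration, not the project's pipeline: the helper names (`impute_median`, `standardize`, `fit_line`, `k_fold_mae`), the choice of median imputation, the single-feature linear model, and the data values are all assumptions made for the example. Hyperparameter tuning is not shown.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def k_fold_mae(xs, ys, k=3):
    """Mean absolute error over k folds (sample i goes to fold i % k)."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.extend(abs(ys[i] - (a + b * xs[i])) for i in test)
    return statistics.fmean(errors)


# Made-up placeholder data: one predictor with a gap, and a target.
raw_x = [1.0, 2.0, None, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 9.0, 9.0, 11.0, 13.0]

# Note: for brevity the scaling statistics use all rows; in a real pipeline
# they should be computed on the training fold only to avoid leakage.
x = standardize(impute_median(raw_x))
mae = k_fold_mae(x, y, k=3)
print(f"cross-validated MAE: {mae:.3f}")
```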

The outline below shows how each model can be developed from the provided variables, including the PVA data:

Species Financial Valuation Model

Input Features: Global population size, Local population size, Average eBay price, Average salary, Currency exchange rate, PVA variables (e.g., pop_carryingcapacity_k, pop_mortalityrates_femalemort, pop_mortalityrates_malemort).
Output: Financial value of individual species.
Algorithm: Regression models (e.g., Linear Regression, Random Forest Regressor).

Biodiversity Database Population Model

Input Features: Global population size, Local population size, Number of tourists, PVA variables (e.g., pop_initialpopulationsize_initialn, pop_reproductiverates_broodsize).
Output: Populated biodiversity database.
Algorithm: Clustering algorithms (e.g., K-Means) or database population techniques.
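
As an illustration of the clustering option, the following is a minimal K-Means (Lloyd's algorithm) sketch in plain Python. How clusters would feed the database is not specified in this roadmap; the two-dimensional points stand for hypothetical, already standardized feature pairs per species record, and the initialization from the first k points is a simplification chosen for determinism.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical standardized feature pairs; values are placeholders.
records = [(1.0, 1.0), (9.0, 9.0), (1.5, 0.5), (8.5, 9.5)]
labels, centroids = kmeans(records, k=2)
print(labels, centroids)
```

Records that fall in the same cluster could, for example, share defaults for fields that are missing in the database, though that use is an assumption rather than something stated above.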

Biodiversity Impact Assessment Model

Input Features: Number of hashtags in Instagram, Number of hashtags in YouTube, Average Instagram conversion rate, Average YouTube conversion rate, PVA variables (e.g., pop_densitydependence_ddrepro, pop_densitydependence_ddp0).
Output: Impact assessment of activities on biodiversity.
Algorithm: Classification models (e.g., Decision Trees, SVM).

Biodiversity Risk Assessment Model

Input Features: Local government bond risk-free interest rate, Average household income, Average household size, PVA variables (e.g., pop_carryingcapacity_k, pop_mortalityrates_femalemort, pop_mortalityrates_malemort).
Output: Risk and exposure to biodiversity loss.
Algorithm: Risk assessment models (e.g., Logistic Regression, Bayesian Networks).
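
To make the Logistic Regression option concrete, here is a minimal single-feature sketch trained by batch gradient descent. It assumes one standardized risk indicator per entity and a binary label (1 = high exposure to biodiversity loss); the indicator, the labels, the learning rate, and the epoch count are all hypothetical values picked for the example.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Batch gradient descent on the log-loss for p = sigmoid(w * x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Placeholder data: standardized indicator and made-up exposure labels.
indicator = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
high_exposure = [0, 0, 1, 0, 1, 1]

w, b = fit_logistic(indicator, high_exposure)
risk_high = sigmoid(w * 2.0 + b)   # estimated probability at indicator = 2
risk_low = sigmoid(w * -2.0 + b)   # estimated probability at indicator = -2
print(f"w={w:.2f}, b={b:.2f}, p(2)={risk_high:.2f}, p(-2)={risk_low:.2f}")
```

The output is a probability rather than a hard class, which is convenient when the result is meant to inform premium adjustments.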

Biodiversity Data Sourcing Model

Input Features: Various biodiversity data sources, including PVA variables (e.g., pop_densitydependence_ddrepro, pop_densitydependence_ddp0), and other relevant variables.
Output: Retrieved and structured biodiversity data.
Algorithm: Advanced data retrieval techniques beyond web scraping.

Data Interpretation and Inference Model

Input Features: Structured data (e.g., abundance, relative abundance, density), PVA variables (e.g., pop_carryingcapacity_k, pop_mortalityrates_femalemort, pop_mortalityrates_malemort), and other relevant variables.
Output: Inferred conclusions for other species and regions.
Algorithm: Inference models (e.g., Bayesian Inference).
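
One simple form of Bayesian inference that fits this description is a conjugate Beta-Binomial update: knowledge from a well-studied region becomes the prior, and survey results from a new region update it. The sketch below assumes the quantity of interest is a detection proportion (e.g., a relative abundance expressed as a share of surveys); the prior parameters and counts are made-up placeholders.

```python
def beta_update(prior_a, prior_b, detections, surveys):
    """Posterior Beta(a, b) after observing `detections` successes in `surveys` trials."""
    if not 0 <= detections <= surveys:
        raise ValueError("detections must be between 0 and surveys")
    return prior_a + detections, prior_b + (surveys - detections)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Assumed prior from a data-rich region: Beta(2, 8), i.e. a mean of 0.2.
prior = (2, 8)

# Hypothetical new region: species detected in 3 of 10 surveys.
posterior = beta_update(*prior, detections=3, surveys=10)
posterior_mean = beta_mean(*posterior)
print(posterior, posterior_mean)
```

With few surveys the estimate stays close to the prior, and it moves toward the observed proportion as more local data arrives, which is the behaviour wanted when transferring conclusions to data-poor species or regions.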

Time Series Prediction Model

Input Features: Historical data of Global population size, Local population size, Number of tourists, PVA variables (e.g., pop_carryingcapacity_k, pop_mortalityrates_femalemort, pop_mortalityrates_malemort).
Output: Predicted changes in biodiversity over time.
Algorithm: Time series models (e.g., ARIMA, LSTM).
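
Before moving to a full ARIMA or LSTM model, a first-order autoregressive model, AR(1), which is the simplest member of the ARIMA family, can serve as a baseline. The sketch below fits x[t] = c + phi * x[t-1] by least squares and iterates it forward. The population counts are invented for illustration, and the function names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    phi = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr)) / sum(
        (p - mean_prev) ** 2 for p in prev
    )
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast(series, steps):
    """Iterate the fitted AR(1) recursion forward from the last observation."""
    c, phi = fit_ar1(series)
    predictions, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Made-up yearly counts of a declining local population.
counts = [100.0, 60.0, 40.0, 30.0, 25.0, 22.5]
c, phi = fit_ar1(counts)
next_two = forecast(counts, steps=2)
print(c, phi, next_two)
```

A baseline like this gives a reference error level; a more complex model is only worth adopting if it clearly improves on it for the same historical data.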

Carbon Model

Input Features: Distance travelled, Cost of travel, Household income, PVA variables (e.g., pop_carryingcapacity_k, pop_mortalityrates_femalemort, pop_mortalityrates_malemort).
Output: Carbon footprint estimation.
Algorithm: Regression models (e.g., Linear Regression, Gradient Boosting).

5. Investigation Process

Step-by-Step Process:
Initial Review:
- Extracting features from the input documents PVA, Variables, and Carbon. (By Liubov)
- Determining probable output models based on proposal. (By Reza)
- Matching the important features with suggested models.
- Matching the important features with data sources.
Model Selection: Select the most suitable models based on the extracted features and data sources.
Data Extraction: Extract relevant data from data sources.
Model Evaluation: Evaluate the output models on the extracted data.