
Machine Learning Overview

Introduction

Endangered Wildlife OÜ is developing the Biodiversity Valuator, a software tool that helps clients put a financial value on biodiversity impacts, addressing the growing demand for ESG solutions. While carbon reporting has dominated the sustainability space, biodiversity's critical role remains underrepresented. The Valuator aims to change that by translating biodiversity impact into financial terms so clients can make informed decisions. Over the past five years, the company has tested its valuation methodology on a range of species and is now transitioning from consulting to a scalable software model.

The Biodiversity Database, a companion to the Valuator, will serve as a core data source, enhanced by machine learning to improve valuation accuracy. Together, these tools will provide industries, investors, and financial institutions with a way to assess biodiversity-related risks and returns, integrating biodiversity into strategic planning. With a Beta launch planned, Endangered Wildlife OÜ aims to establish itself as a unique fintech provider in biodiversity conservation, ultimately offering these tools as SaaS products to meet the evolving sustainability needs of the market.

Mind Map

The mind map centers on two nodes, Biodiversity and Machine Learning, and branches into the following areas:

- Data Collection
  - Sources: APIs, public datasets, articles and PDFs
  - GBIF, marinespecies, iNaturalist, https://opentraits.org/datasets.html, current species
  - Data Aggregation
  - RDS vs NoSQL
- Data Preparation
  - EDA
  - Data Filtering
  - Data Validation & Cleansing
  - Data Formatting
  - Data Aggregation & Reconciliation
  - Scaling and Imputation
- Data Visualization
  - Charts: Box, HeatMap, Histogram
  - Plots: Scatterplot, Pairplot (based on important features)
- ML Modeling
  - Time Series
  - XGBoost
  - IM, SPP, PVA
  - Hyperparameter Tuning
  - LLM approach: RAG, vector database (Qdrant), create and populate a vector database, Ollama, PDF AI approaches
  - Training the Models, Evaluating the Models, Making Predictions
  - Retraining
- Feature Engineering
  - Finding an optimal set of inputs
  - Creating new features
  - Transforming features into new ones
  - Speeding up data transformations
- Model Deployment
  - Create a simple pipeline
  - Test in Production: robustness, compatibility, scalability
- UI/Monitoring
  - Admin Panel: monitoring page for models, settings page for models, modify all pages to comply with the new architecture
  - Client Website: change the Bio API to use the ML models, comply with the new Bio API
- Task Management
  - Retraining Models: pipeline for retraining models, tasks to run pipelines on a defined schedule, replacing the old models
  - Store statistics
  - Health check
- Documentation
  - Reza-G
  - GitHub wiki: create biodiversity.github.io, add current documentation, add grant details, update each section going forward
- Software
  - Jupyter notebooks, Apache Airflow, Azure, illuminate (illuminate.google.com), …

Project Steps

Each step below is broken into numbered sub-steps with a short description; illustrative code sketches follow several of the steps.
1 Data Collection
1.1 Check all data sources: review existing data sources for relevance.
1.2 Add other sources to the database or API list: integrate new data sources.
1.3 Data Aggregation: combine data from the various sources.
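
As a sketch of how steps 1.1–1.3 might look in code, the snippet below pulls occurrence records from GBIF (one of the sources in the mind map) over its public REST API and aggregates the results into a single frame. The species names and selected columns are illustrative placeholders, not the project's actual inputs.

```python
import requests
import pandas as pd

GBIF_API = "https://api.gbif.org/v1"

def fetch_occurrences(species_name: str, limit: int = 100) -> pd.DataFrame:
    """Fetch occurrence records for one species from the public GBIF API."""
    # Resolve the species name to a GBIF taxon key.
    match = requests.get(f"{GBIF_API}/species/match",
                         params={"name": species_name}, timeout=30)
    match.raise_for_status()
    taxon_key = match.json().get("usageKey")
    if taxon_key is None:
        raise ValueError(f"No GBIF match for {species_name!r}")

    # Pull occurrence records for that taxon.
    resp = requests.get(f"{GBIF_API}/occurrence/search",
                        params={"taxonKey": taxon_key, "limit": limit}, timeout=30)
    resp.raise_for_status()
    return pd.DataFrame(resp.json()["results"])

# Step 1.3: aggregate pulls for several species into one frame.
frames = [fetch_occurrences(name) for name in ["Panthera leo", "Loxodonta africana"]]
combined = pd.concat(frames, ignore_index=True)
print(combined[["species", "decimalLatitude", "decimalLongitude"]].head())
```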
2 Data Preparation
2.1 EDA (Exploratory Data Analysis)
2.1.1 Data Filtering: filter data based on relevance and quality.
2.1.2 Data Validation & Cleansing: ensure data accuracy and consistency.
2.1.3 Data Formatting: standardize data formats.
2.1.4 Data Aggregation & Reconciliation: combine data and resolve conflicts.
2.2 Scaling and Imputation: adjust feature scales and fill in missing values.
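
A minimal sketch of the preparation steps, assuming pandas and scikit-learn; the column names and sentinel values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Illustrative raw frame; real data would come from the aggregation step.
raw = pd.DataFrame({
    "body_mass_g": [4200, np.nan, 3800, 5100, -1],  # -1 as an "unknown" sentinel
    "latitude": [12.5, 13.1, np.nan, 12.9, 13.4],
    "obs_year": [2019, 2020, 2020, 2021, 2021],
})

# 2.1.1-2.1.2: filter and validate — normalize sentinels, drop out-of-range rows.
clean = raw.replace({"body_mass_g": {-1: np.nan}})
clean = clean[clean["obs_year"] >= 2000].copy()

# 2.1.3: formatting — enforce consistent dtypes.
clean["obs_year"] = clean["obs_year"].astype("int64")

# 2.2: impute missing values, then scale the numeric features.
numeric_cols = ["body_mass_g", "latitude"]
imputed = SimpleImputer(strategy="median").fit_transform(clean[numeric_cols])
scaled = StandardScaler().fit_transform(imputed)
print(pd.DataFrame(scaled, columns=numeric_cols))
```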
3 Data Visualization
3.1 Show data on charts: use box plots, heatmaps, and histograms.
3.2 Scatterplots/pairplots: based on the most important features.
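
The chart types listed above map directly onto seaborn calls. A sketch, with a tiny invented frame standing in for the prepared data from step 2:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Stand-in for the prepared data; columns are illustrative.
df = pd.DataFrame({
    "body_mass_g": [4200, 3800, 5100, 4600, 3900],
    "latitude": [12.5, 13.1, 12.9, 13.4, 12.7],
    "obs_year": [2019, 2020, 2020, 2021, 2021],
})

# 3.1: box, heatmap, and histogram charts side by side.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
sns.boxplot(data=df, y="body_mass_g", ax=axes[0])                # spread and outliers
sns.heatmap(df.corr(numeric_only=True), annot=True, ax=axes[1])  # feature correlations
sns.histplot(data=df, x="latitude", ax=axes[2])                  # value frequencies
plt.tight_layout()

# 3.2: pairwise scatterplots restricted to the most important features.
sns.pairplot(df[["body_mass_g", "latitude", "obs_year"]])
plt.show()
```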
4 ML Modeling
4.1 Choosing Models: select appropriate models for the analysis.
4.1.1 M1: forecasting and regression for time-series data (temperature models, etc.).
4.1.2 M2: classification models for PVA, food webs, etc.
4.1.3 M3: test neural-network approaches.
4.1.4 M4: test a RAG + LLM approach (Retrieval-Augmented Generation).
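
For M4, a bare-bones sketch of the RAG flow suggested in the mind map, assuming a local Qdrant instance as the vector store and Ollama for embeddings and generation. The collection name, model names, and payload schema are all assumptions; the collection is presumed already populated (the mind map's "create and populate a vector database" step).

```python
import ollama
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")  # assumes a local Qdrant instance

def answer(question: str) -> str:
    # Embed the question with a local embedding model (model name is an assumption).
    vector = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]

    # Retrieve the most similar species documents from the vector database.
    hits = client.search(collection_name="species_docs", query_vector=vector, limit=3)
    context = "\n".join(hit.payload["text"] for hit in hits)  # payload key is assumed

    # Ask the LLM to answer using only the retrieved context.
    reply = ollama.chat(
        model="llama3",
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return reply["message"]["content"]
```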
4.2 Training the Models: train the models on the prepared data.
4.3 Evaluating the Models: measure model performance.
4.4 Hyperparameter Tuning: optimize model parameters.
4.5 Making Predictions: use the models to make data-driven predictions.
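
Steps 4.2–4.5 compose naturally around scikit-learn's model-selection utilities. A sketch with XGBoost (named in the mind map) on synthetic data; the parameter grid and split counts are illustrative.

```python
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for the prepared time-series features (M1).
X, y = make_regression(n_samples=500, n_features=8, noise=0.1, random_state=42)

# 4.2 + 4.4: train with hyperparameter tuning under a time-aware split.
search = GridSearchCV(
    estimator=xgb.XGBRegressor(objective="reg:squarederror"),
    param_grid={"max_depth": [3, 5], "n_estimators": [100, 300],
                "learning_rate": [0.05, 0.1]},
    cv=TimeSeriesSplit(n_splits=4),  # respects temporal ordering
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

# 4.3 + 4.5: evaluate the best model and make predictions.
best = search.best_estimator_
preds = best.predict(X)
print("best params:", search.best_params_)
print("train MAE:", mean_absolute_error(y, preds))
```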
5 Feature Engineering
5.1 Finding an optimal set of inputs: identify the key inputs for the models.
5.2 Creating new features based on aggregated data: develop derived features.
5.3 Transforming features into new ones: apply transformations to existing features.
5.4 Speeding up data transformations: improve data-processing speed.
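
A small sketch of 5.1–5.4 with invented measurement columns: derived and transformed features are added as vectorized column operations, which is also the main lever for speed (5.4), and input selection (5.1) can reuse the importances of the tree model trained in step 4.

```python
import numpy as np
import pandas as pd

# Invented measurement columns standing in for aggregated species data.
df = pd.DataFrame({
    "body_mass_g": [4200.0, 3800.0, 5100.0],
    "wing_span_cm": [180.0, 165.0, 210.0],
    "obs_count": [12, 3, 40],
})

# 5.2: a new feature derived from aggregated measurements.
df["mass_per_span"] = df["body_mass_g"] / df["wing_span_cm"]

# 5.3: transform a skewed count into a better-behaved feature.
df["log_obs_count"] = np.log1p(df["obs_count"])

# 5.4: column-wise (vectorized) operations like the two above avoid slow
# row-by-row Python loops and keep transformations fast at scale.

# 5.1: with a fitted tree model from step 4 (here `best`), inputs can be
# ranked by importance to find an optimal subset:
#   ranked = pd.Series(best.feature_importances_, index=feature_names)
#   top_inputs = ranked.nlargest(10).index
print(df)
```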
6 Model Deployment
6.1 Test in Production
6.1.1 Test robustness, compatibility, and scalability: ensure production readiness.
6.2 Create a simple pipeline: set up a streamlined deployment pipeline.
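
One way to read 6.2's "simple pipeline" is a single fitted artifact that bundles preprocessing with the model; a sketch using scikit-learn's Pipeline and joblib, with the reload check standing in for the most basic of the 6.1.1 production tests.

```python
import joblib
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# 6.2: one pipeline bundles preprocessing with the model, so production
# inputs pass through exactly the transformations used in training.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", xgb.XGBRegressor()),
])
pipeline.fit(X, y)

# Persist the fitted pipeline as the deployable artifact.
joblib.dump(pipeline, "biodiversity_model.joblib")

# 6.1.1: a minimal robustness smoke test — the reloaded artifact must
# reproduce the original predictions before being promoted to production.
reloaded = joblib.load("biodiversity_model.joblib")
assert (reloaded.predict(X[:5]) == pipeline.predict(X[:5])).all()
```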
7 UI/Monitoring
7.1 Admin Panel
7.1.1 Monitoring page for models: track model performance.
7.1.2 Settings page for models: manage model settings.
7.1.3 Modify all pages to comply with the new architecture: update the UI.
7.2 Client Website
7.2.1 Change the Bio API to use the ML models: integrate the models into the Bio API.
7.2.2 Comply with the new Bio API: ensure compatibility with the updated API.
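
A hedged sketch of 7.2.1, assuming the Bio API is a Python web service; FastAPI, the route paths, and the request schema are all assumptions. The health route would feed the admin panel's monitoring page (7.1.1).

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
pipeline = joblib.load("biodiversity_model.joblib")  # artifact from step 6

class Features(BaseModel):
    values: list[float]  # flattened feature vector; schema is a placeholder

@app.post("/bio/valuate")  # route name is a placeholder
def valuate(features: Features) -> dict:
    # 7.2.1: the Bio API delegates to the deployed ML pipeline
    # instead of a hand-coded valuation rule.
    prediction = pipeline.predict([features.values])[0]
    return {"valuation": float(prediction)}

@app.get("/health")
def health() -> dict:
    # Feeds the monitoring page (7.1.1) and the health check in step 8.3.
    return {"status": "ok"}
```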
8 Task Management
8.1 Retraining Models
8.1.1 Pipeline for retraining models: automate model retraining.
8.1.2 Tasks to run pipelines on a defined schedule: schedule the retraining runs.
8.1.3 Replacing the old models: swap in the retrained models.
8.2 Store statistics: log model statistics.
8.3 Health check: monitor system health.
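
Apache Airflow appears in the mind map's software branch, which suggests expressing sub-steps 8.1.1–8.1.3 and 8.2 as a scheduled DAG (assuming Airflow 2.4+); the DAG id, schedule, and task bodies below are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def retrain_models():
    """Placeholder for 8.1.1/8.1.3: reload data, refit the model, and
    swap in the new artifact if evaluation passes."""
    ...

def store_statistics():
    """Placeholder for 8.2: log evaluation metrics for the admin panel."""
    ...

# 8.1.2: run the retraining pipeline on a defined schedule.
with DAG(
    dag_id="retrain_biodiversity_models",
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",
    catchup=False,
) as dag:
    retrain = PythonOperator(task_id="retrain", python_callable=retrain_models)
    stats = PythonOperator(task_id="store_statistics",
                           python_callable=store_statistics)
    retrain >> stats  # statistics are stored after retraining completes
```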
9 Documentation
9.1 Proposal: draft and maintain the project proposal.
9.2 In-code documentation: document the codebase.
9.3 Documentation UI: provide an accessible UI for the documentation.