Introduction

Endangered Wildlife OÜ is developing the Biodiversity Valuator, a software tool that helps clients assess the financial value of biodiversity impacts, addressing the growing demand for ESG solutions. While carbon reporting has dominated the sustainability space, biodiversity’s critical role remains underrepresented. The Valuator aims to change that by translating biodiversity impact into financial terms, helping clients make informed decisions. Over the past five years, the company has tested its valuation methodology on various species and is now transitioning from consulting to a scalable software model.

The Biodiversity Database, a companion to the Valuator, will serve as a core data source, enhanced by machine learning to improve valuation accuracy. Together, these tools will provide industries, investors, and financial institutions with a way to assess biodiversity-related risks and returns, integrating biodiversity into strategic planning. With a Beta launch planned, Endangered Wildlife OÜ aims to establish itself as a unique fintech provider in biodiversity conservation, ultimately offering these tools as SaaS products to meet the evolving sustainability needs of the market.

Mind Map

Project Steps

1. Data Collection
   1.1 Check all data sources: review existing data sources for relevance.
   1.2 Add other sources to the database or API list: integrate new data sources.
   1.3 Data Aggregation: combine data from various sources.

2. Data Preparation
   2.1 EDA (Exploratory Data Analysis)
       2.1.1 Data Filtering: filter data based on relevance and quality.
       2.1.2 Data Validation & Cleansing: ensure data accuracy and consistency.
       2.1.3 Data Formatting: standardize data formats.
       2.1.4 Data Aggregation & Reconciliation: combine data and resolve conflicts.
   2.2 Scaling and Imputation: rescale features and fill in missing values.

3. Data Visualization
   3.1 Show data on charts: use box plots, heatmaps, and histograms.
   3.2 Scatter plots/pair plots: based on the most important features.

4. ML Modeling
   4.1 Choosing Models: select appropriate models for analysis.
       4.1.1 M1: forecasting and regression for time-series data (temperature models, etc.).
       4.1.2 M2: classification models for PVA, food webs, etc.
       4.1.3 M3: test neural-network approaches.
       4.1.4 M4: test a RAG + LLM approach (Retrieval-Augmented Generation).
   4.2 Training the Models: train models on the prepared data.
   4.3 Evaluating the Models: measure model performance.
   4.4 Hyperparameter Tuning: optimize model parameters.
   4.5 Making Predictions: use models to make data-driven predictions.

5. Feature Engineering
   5.1 Finding the optimal set of inputs: identify key inputs for the models.
   5.2 Creating new features based on aggregated data: develop derived features.
   5.3 Transforming features into new ones: apply transformations to features.
   5.4 Speeding up data transformations: improve data-processing speed.

6. Model Deployment
   6.1 Test in Production
       6.1.1 Test robustness, compatibility, and scalability: ensure production readiness.
   6.2 Create a simple pipeline: set up a streamlined deployment pipeline.

7. UI/Monitoring
   7.1 Admin Panel
       7.1.1 Monitoring page for models: track model performance.
       7.1.2 Settings page for models: manage model settings.
       7.1.3 Modify all pages to comply with the new architecture: update the UI.
   7.2 Client Website
       7.2.1 Change the Bio API to use the ML models: integrate models into the Bio API.
       7.2.2 Comply with the new Bio API: ensure compatibility with the updated API.

8. Task Management
   8.1 Retraining Models
       8.1.1 Pipeline for retraining models: automate model retraining.
       8.1.2 Tasks to run pipelines on a defined schedule: schedule retraining.
       8.1.3 Replacing models: swap retrained models in for the old ones.
   8.2 Store statistics: log model statistics.
   8.3 Health check: monitor system health.

9. Documentation
   9.1 Proposal: draft and maintain the project proposal.
   9.2 In-code documentation: document the codebase.
   9.3 Documentation UI: provide an accessible UI for the documentation.
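
Step 2.2 (Scaling and Imputation) can be sketched with a scikit-learn pipeline. This is a minimal illustration, not the project's actual preprocessing code; the input matrix is hypothetical, with NaN marking missing measurements.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix (e.g. species measurements); NaN = missing value.
X = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [np.nan, 240.0],
    [4.0, 260.0],
])

prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill gaps with the column median
    ("scale", StandardScaler()),                   # rescale to zero mean, unit variance
])
X_prep = prep.fit_transform(X)
```

Fitting imputation and scaling inside one pipeline keeps the same statistics applied at training and prediction time, which matters once the models are retrained on a schedule (step 8.1).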
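
For step 3 (Data Visualization), a histogram and box plot can share one figure with matplotlib; a sketch on random placeholder data, since the real features are not shown here.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data standing in for two numeric features.
data = np.random.default_rng(0).normal(size=(500, 2))

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].hist(data[:, 0], bins=30)     # distribution of one feature
axes[0].set_title("Histogram")
axes[1].boxplot(data)                 # outliers and spread, per feature
axes[1].set_title("Box plot")
fig.tight_layout()
```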
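
Step 4.1.1 (M1, time-series forecasting) can be approached as regression on lagged values. The sketch below uses a synthetic sinusoid in place of real temperature data; `make_lags` is a hypothetical helper, not part of any library.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for a monthly temperature series.
series = np.sin(np.linspace(0, 6 * np.pi, 120)) + 10.0

def make_lags(y, n_lags):
    """Turn a 1-D series into (X, y) pairs: n_lags past values -> next value."""
    X = np.array([y[i : i + n_lags] for i in range(len(y) - n_lags)])
    return X, y[n_lags:]

X, y = make_lags(series, n_lags=3)
model = LinearRegression().fit(X, y)

# One-step-ahead forecast from the three most recent observations.
next_value = model.predict(series[-3:].reshape(1, -1))[0]
```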
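
Step 4.1.2 (M2, classification for PVA and food-web questions) could look like the following; the dataset is synthetic, standing in for per-species features such as habitat area or population trend.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary problem, e.g. "at risk" vs "stable" population.
X, y = make_classification(n_samples=300, n_features=6,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)  # held-out accuracy, step 4.3 in miniature
```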
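
Step 4.4 (Hyperparameter Tuning) typically means a cross-validated grid search; a minimal sketch on synthetic regression data, tuning only the regularization strength of a ridge model.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# 5-fold cross-validation over a small alpha grid.
search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
best_alpha = search.best_params_["alpha"]
```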
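
Step 5.2 (creating features from aggregated data) is natural in pandas: group raw observations and derive per-entity features. The observation log below is invented for illustration.

```python
import pandas as pd

# Hypothetical observation log: one row per sighting of a species at a site.
obs = pd.DataFrame({
    "species": ["lynx", "lynx", "stork", "stork", "stork"],
    "site":    ["A", "B", "A", "A", "B"],
    "count":   [2, 1, 5, 3, 4],
})

# Derived per-species features: total sightings, mean group size, sites occupied.
features = obs.groupby("species").agg(
    total_count=("count", "sum"),
    mean_count=("count", "mean"),
    n_sites=("site", "nunique"),
).reset_index()
```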
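
Steps 8.1.3 (swapping in retrained models) and 8.3 (health check) can be combined in one small pattern: write the new model atomically, and treat a missing or stale model file as unhealthy. This is a standard-library sketch under assumed names (`MODEL_PATH`, `save_model_atomically`, `is_healthy` are all hypothetical), not the project's deployment code.

```python
import os
import pickle
import tempfile
import time

MODEL_PATH = os.path.join(tempfile.gettempdir(), "model.pkl")  # hypothetical path

def save_model_atomically(model, path=MODEL_PATH):
    """Write to a temp file, then rename, so readers never see a half-written model."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        pickle.dump({"model": model, "trained_at": time.time()}, f)
    os.replace(tmp, path)  # atomic swap of old model for new

def is_healthy(path=MODEL_PATH, max_age_seconds=7 * 24 * 3600):
    """Health check: the deployed model exists and is not stale."""
    if not os.path.exists(path):
        return False
    with open(path, "rb") as f:
        payload = pickle.load(f)
    return time.time() - payload["trained_at"] < max_age_seconds

save_model_atomically({"coef": [1, 2, 3]})  # stand-in for a retrained model
```

A scheduled retraining task (step 8.1.2) would call `save_model_atomically` at the end of its pipeline, and the monitoring page (step 7.1.1) would surface `is_healthy`.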