Scientific publications

Use the HAL platform to deposit or consult publications arising from the PEPR's scientific research.

The PEPR collection is available at https://hal.science/AGROECONUM/ or through an advanced search on https://hal.science/.

/!\ For a deposit to be correctly associated with the AgroEcoNum collection, you must provide your ANR project code (ANR-22-PEAE-XXXX or ANR-24-PEAE-XXXX).

Acknowledgment statement to include in scientific publications from projects funded by the Agroécologie et Numérique (Agroecology and Digital) research program:

French version:
Ce travail a bénéficié d'une aide de l'État gérée par l'Agence Nationale de la Recherche au titre de France 2030 dans le cadre du PEPR Agroécologie et Numérique et portant la référence « ANR-**-PEAE-**** ».

English version:
This work received government funding managed by the Agence Nationale de la Recherche under the France 2030 program as part of the Agroecology and Digital research program, reference number “ANR-**-PEAE-****”.

References by project:

ADAAPT - ANR-24-PEAE-0001
AgriFutur - ANR-24-PEAE-0002
AgroDiv - ANR-22-PEAE-0005
AGROECOPHEN - ANR-22-PEAE-0012
BIODICAPT - ANR-24-PEAE-0003
BReIF - ANR-22-PEAE-0014
CoBreeding - ANR-22-PEAE-0003
CoEDiTAg - ANR-22-PEAE-0002
EcoControl - ANR-24-PEAE-0004
HOLOBIONTS - ANR-22-PEAE-0006
LINDDA - ANR-22-PEAE-0004
MELICERTES - ANR-22-PEAE-0010
MISTIC - ANR-22-PEAE-0011
NINSAR - ANR-22-PEAE-0007
PATASEL - ANR-22-PEAE-0013
Pl@ntAgroEco - ANR-22-PEAE-0009
TwinFarms - ANR-24-PEAE-0005
WAIT4 - ANR-22-PEAE-0008
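
The acknowledgment sentence and the project references above can be combined programmatically, for example when preparing a manuscript template. The following is a minimal sketch, not an official tool: the names PROJECT_REFS, ANR_CODE, is_valid_code and acknowledgment are hypothetical, and the pattern only assumes the two code forms given on this page (ANR-22-PEAE-XXXX and ANR-24-PEAE-XXXX).

```python
import re

# Project references copied from the list on this page.
PROJECT_REFS = {
    "ADAAPT": "ANR-24-PEAE-0001",
    "AgriFutur": "ANR-24-PEAE-0002",
    "AgroDiv": "ANR-22-PEAE-0005",
    "AGROECOPHEN": "ANR-22-PEAE-0012",
    "BIODICAPT": "ANR-24-PEAE-0003",
    "BReIF": "ANR-22-PEAE-0014",
    "CoBreeding": "ANR-22-PEAE-0003",
    "CoEDiTAg": "ANR-22-PEAE-0002",
    "EcoControl": "ANR-24-PEAE-0004",
    "HOLOBIONTS": "ANR-22-PEAE-0006",
    "LINDDA": "ANR-22-PEAE-0004",
    "MELICERTES": "ANR-22-PEAE-0010",
    "MISTIC": "ANR-22-PEAE-0011",
    "NINSAR": "ANR-22-PEAE-0007",
    "PATASEL": "ANR-22-PEAE-0013",
    "Pl@ntAgroEco": "ANR-22-PEAE-0009",
    "TwinFarms": "ANR-24-PEAE-0005",
    "WAIT4": "ANR-22-PEAE-0008",
}

# Assumption: only the two code forms named on this page are accepted.
ANR_CODE = re.compile(r"ANR-(22|24)-PEAE-\d{4}")

# English acknowledgment sentence from this page, with the code as a placeholder.
TEMPLATE_EN = (
    "This work received government funding managed by the Agence Nationale "
    "de la Recherche under the France 2030 program as part of the "
    "Agroecology and Digital research program, reference number "
    "\u201c{code}\u201d."
)


def is_valid_code(code: str) -> bool:
    """Return True if the code has the form ANR-22-PEAE-XXXX or ANR-24-PEAE-XXXX."""
    return ANR_CODE.fullmatch(code) is not None


def acknowledgment(project: str) -> str:
    """Build the English acknowledgment sentence for a project listed above."""
    code = PROJECT_REFS[project]  # raises KeyError for an unknown project
    if not is_valid_code(code):
        raise ValueError(f"unexpected ANR code format: {code}")
    return TEMPLATE_EN.format(code=code)


if __name__ == "__main__":
    print(acknowledgment("WAIT4"))
```

The same check can be run on a code typed into a HAL deposit form before submission, since a missing or malformed code prevents the deposit from being associated with the collection.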

HAL: Latest publications

  • [hal-05230510] Seed Inference in Interacting Microbial Communities Using Combinatorial Optimization

    The behaviour of microorganisms and microbial communities can be abstracted by models combining a description of their metabolic capabilities as metabolic networks, and suitable computational or mathematical paradigms that further integrate simulation conditions. A major component of the latter is the composition of the environment or growth medium that can be referred to as seeds. Predicting the seeds from the metabolic network and an expected behaviour is an inverse problem that can be addressed with linear programming or logic paradigms such as Answer Set Programming (ASP). Here, we formalise seed prediction for microbial communities, taking into account that their members may interact positively through metabolite transfers, which may reduce the need for external seed metabolites. We address the problem with ASP and add a hybrid component ensuring the satisfiability of linear constraints. We explore the subset-minimality solving heuristic of the Clingo solver and develop two heuristics supporting priority of seeds over transfers. We present a proof of concept of seed inference in small-scale communities, and assess the scalability of the three heuristics at genome-scale. Overall, our work introduces a hybrid logic-linear model for seed inference in interacting microbial communities, and new heuristics for the exploration of the solution space with subset minimality optimisations.

    Chabname Ghassemi Nedjad, 29 Aug 2025

    https://inria.hal.science/hal-05230510v1
  • [hal-05578828] Use of Surface Water and Ocean Topography (SWOT) observations to support Land Use/Land Cover (LULC) change products: the case of the Pacific coast of Ecuador

    Radar altimetry has been used to characterize land surfaces. However, the nadir configuration of the radar altimeter sensor and its coarse spatial resolution were limiting factors. The Surface Water and Ocean Topography (SWOT) mission overcomes these limitations through its Ka-band Radar Interferometer (KaRIn), a synthetic aperture radar (SAR) system, providing high spatial resolution and accurate surface height measurements. Initially used for hydrology and oceanography, this study explores an innovative use of SWOT to analyse changes in Land Use/Land Cover (LULC). To do this, three study areas located on the Pacific Coast of Ecuador were considered. The area in the south (A) is characteristic of cultivated areas, while the area in the center (B) presents a landscape mosaic and the area in the north (C) hosts tropical rainforests. For each study area, the SWOT backscatter coefficient (sig0) was analysed for the year 2024 from the raster product at 100 m spatial resolution. We calculated the number of occurrences and the sig0 average from the raster product over each pixel. The spatial patterns obtained from these two variables enabled us to assign a LULC class (city, water, road, crop, or no forest, depending on the study area) to each pixel, using a Support Vector Machine (SVM). The assigned LULC classes depend on the partial spatial coverage of the SWOT data, which does not allow representing all the LULC classes. The classification results were compared with the LULC map provided by the Ministry of the Environment using a confusion matrix and obtained an accuracy greater than 0.87 and an F1 score greater than 0.89 for the three study areas. In the forest area (C), the SWOT observations were also compared to two change detection products: RAdar for Detecting Deforestation (RADD) alerts and detections by the Cumulative Sum (CuSum) method. 39% of the SWOT observations were in areas identified as forest by these products but classified as no forest or water in our SWOT classification. By detecting small streams (areas A, B and C), roads (area A), the boundaries of agricultural plots and the state of cultivated land (area A) as well as recent forms of deforestation (area C), SWOT was found to be a complementary source of information for LULC change products.

    Valentine Sollier, 03 Apr 2026

    https://hal.inrae.fr/hal-05578828v1
  • [hal-05567423] Video of the presentation at LREC conference: EPOP: A benchmark corpus for Assessing NLP Models on Structured Information Extraction in Plant Health

    This video presents the work published in LREC conference proceedings in 2026. In this presentation, we introduce the EPOP (Epidemiomonitoring of Plants) corpus, a new annotated resource for structured information extraction in the domain of plant health epidemiology. The corpus consists of translated news reports that reflect real-world phytosanitary monitoring scenarios. It includes annotations for named entities (e.g. Plant, Pest, Vector, Disease, Dissemination Pathway), identity coreferences, and both binary and complex n-ary relations that represent key events such as Transmits or Causes, along with their modalities. A distinctive feature of EPOP is its normalization layer where mentions of species and geographical locations are linked to canonical identifiers in the NCBI Taxonomy and GeoNames, enabling semantic disambiguation and integration with external knowledge bases. As the first publicly available corpus of its kind, EPOP presents a realistic and challenging benchmark, with high linguistic variability, entity role ambiguity, and long-distance relations. We report baseline results on core tasks (named entity recognition, normalization (entity-linking), and relation extraction) using both fine-tuned BERT-based models and hard-prompted large language models. These experiments demonstrate the utility of EPOP while also identifying areas for improvement, particularly in the extraction of complex relations. The corpus is released under an open license, to support research in environmental NLP, crop protection, and knowledge graph enrichment.

    Claire Nédellec, 25 Mar 2026

    https://hal.inrae.fr/hal-05567423v1
  • [hal-05584190] Worldwide genetic diversity of the Medicago sativa complex: implications for alfalfa breeding.

    Cultivated alfalfa belongs to the Medicago sativa complex, a group of four subspecies (sativa, falcata, caerulea, ×varia) whose taxonomic delimitation remains uncertain, which limits the rational use of this diversity. To assess genetic structure and clarify the relationships among subspecies, wild forms, and cultivars, we genotyped about 1,500 accessions using 9,761 SNPs. A discriminant analysis of principal components (DAPC) confirmed the genetic differentiation among the four subspecies. Within each subspecies, a marked geographic structure was found; however, for the cultivars, all of which belong to ssp. sativa, the regional groups overlapped widely. Within each regional group, the highest number of private alleles was observed in the groups of ssp. falcata and ssp. ×varia. The ssp. caerulea groups also show a moderate number of private alleles. In contrast, the ssp. sativa groups, whether cultivated or wild, contain very few or none, except for the cultivated accessions from China and Scandinavia. To identify the geographic origins of domestication, we projected the cultivated groups into the genetic space of the wild pools. The Chinese, Indo-Middle-Eastern, and Afghan-Persian accessions project almost exclusively onto the wild Central Asian group, whereas Western cultivars overlap with the wild Mediterranean group. These trajectories suggest that at least two distinct centers of domestication exist. These results pave the way for identifying specific regional gene pools that are still under-exploited, and offer concrete opportunities to make better use of genetic diversity in alfalfa breeding schemes.

    Irving Arcia Ruiz, 08 Apr 2026

    https://hal.science/hal-05584190v1
  • [hal-05563066] Genetic and heat-stress related environmental influences on pig whole-blood gene expression levels

    Background: Gene expression levels are affected by genetics and environmental effects. However, quantification of the influence of genetics and environmental effects on gene expression remains limited, especially in farm animals. Here, the relative influence of genetic and heat-related environmental variations on gene expression levels was investigated in pigs, using a backcross herd of diverse heat adaptation levels. Backcross animals were raised in either a tropical or temperate environment. Animals raised in the temperate environment were subjected to an experimental heat stress at the end of their growth. Results: We identified 1,967 differentially expressed genes (DEGs) between pigs raised in the tropical (n = 181) and temperate (n = 180) facilities, and 472 DEGs throughout a 3-week experimental heat stress. A transcriptome-wide association study (TWAS) identified 139 associations between gene expression levels and thermoregulation/production traits. We detected 6,014 expression quantitative trait loci (eQTLs) associated with the expression level of 3,297 genes. Genetic variance was estimated to explain 36.3% of gene expression variance on average, and was the main source of variance for 27.7% of transcripts. Most eQTLs found are located in proximal regions (cis-eQTLs) and few within distal regions (trans-eQTLs) to their assigned genes. A trans-eQTL hotspot highlighted a hematopoietic mechanism driven by GPATCH8. An integration of GWAS and TWAS pointed to TMCO1 and ZNF184 as candidate genes for backfat thickness. Conclusions: This study provides a better understanding of the impact of climate, heat stress and genetic influences on the pig whole blood transcriptome.

    Arthur Durante, 23 Mar 2026

    https://hal.science/hal-05563066v1
  • [hal-05558287] Standardizing plant damage datasets via EPPO taxonomy: A label harmonization approach using large language models

    Pests and diseases threaten global crop yields, yet the absence of standardized plant-damage datasets limits progress toward general, robust diagnostic tools. Existing resources differ widely in label conventions and scope, hindering interoperability and model generalization. We introduce a fully automated method for harmonizing plant-damage labels across heterogeneous datasets by mapping them to the European and Mediterranean Plant Protection Organization (EPPO) taxonomy. The approach uses large-language-model (LLM) embeddings to capture semantic similarity among label terms, including synonyms, multilingual variants, and vernacular names. Across multiple mapping strategies, embedding-based similarity using OpenAI’s text-embedding-3-large provided the best performance, reaching an F1 score of 0.836 at optimal thresholds and outperforming string-based Levenshtein matching and other LLM baselines. Applying this method, we unified five expert-curated datasets, including the newly released ePhytia collection, yielding 79,808 images mapped to 1895 EPPO-aligned classes. To assess the value of this harmonization, we finetuned a generalist pretrained Vision Transformer for large-scale plant-damage identification. Models trained on LLM-aligned labels consistently surpassed those trained with edit-distance mappings. On independent EPPO test images, our best model achieved 19.4% top-1 accuracy across 1091 classes and 33.1% on the 100 most common classes, demonstrating feasibility at unprecedented label scale. In-dataset evaluation reached 55.8% top-1 accuracy. By grounding label harmonization in an international standard, this work delivers the first large-scale, taxonomy-compliant dataset for in-field plant-damage recognition and establishes a foundation for interoperable diagnostic tools, farmer-facing mobile systems, and plant-health monitoring. We release both the harmonized dataset and the new ePhytia images to support future research.

    Jules Vandeputte, 19 Mar 2026

    https://inria.hal.science/hal-05558287v1
  • [hal-05509112] How long-lived trees remember: Epigenetic memory and priming of drought and heat stress in meristems and embryos

    With climate change accelerating the frequency and intensity of heat and drought events, forestry urgently needs strategies that enhance stress tolerance without relying solely on genetic improvement, which in trees requires decades. Priming, pre-exposing plants to mild stress or biological signals to reinforce future responses, offers a promising approach for long-lived species. Unlike annual model plants, trees experience multi-year stress cycles, making priming particularly relevant for forestry, restoration, and climate-adaptive management. Our research focuses on developmental windows and cell-dividing tissues with high potential for epigenetic memory, somatic embryos and meristems, examined under water deficit, thermal stress, biochar amendment, and mycorrhizal symbiosis. Across experiments, we observe persistent molecular signatures lasting weeks to seasons, and in some cases trans-annual memory. In contrast to short-lived species where histone modifications dominate, trees often display stronger involvement of DNA methylation in these persistent states, consistent with our recent findings in maritime pine embryogenesis and poplar cambium (Trontin et al., 2025; Duplan et al., 2025; and ongoing work). More recently, we investigated how biochar and beneficial root symbioses interact with drought priming in poplar. These studies form the basis of long-term research frameworks and national programs, including EPIMYC (ANR-24-CE20-5751) and the PEPR Agroecology & Digital initiative (ANR-24-PEAE-0001). Ultimately, our goal is to integrate omics layers to build predictive models of priming responsiveness and epigenetic plasticity, enabling identification of biomarkers and management-ready diagnostic tools to guide climate-adaptive forestry.

    References
    1. Trontin, J.F., Sow, M.D., Delaunay, A., Modesto, I., Teyssier, C., Reymond, I., Canlet, F., Boizot, N., Le Metté, C., Gibert, A., Chaparro, C., Daviaud, C., Tost, J., Miguel, C., Lelu-Walter, M.A., & Maury, S. 2025. Epigenetic memory of temperature sensed during somatic embryo maturation in 2-yr-old maritime pine trees. Plant Physiology, 197(2), kiae600. https://doi.org/10.1093/plphys/kiae600
    2. Duplan, A., Feng, Y.Q., Laskar, G., Cai, B.D., Segura, V., Delaunay, A., Le Jan, I., Daviaud, C., Toumi, A., Laurans, F., Sow, M.D., Rogier, O., Poursat, P., Duruflé, H., Jorge, V., Sanchez, L., Cochard, H., Allona, I., Tost, J., Fichot, R., & Maury, S. 2025. Drought induced epigenetic memory in the cambium of poplar trees persists and primes future stress responses. bioRxiv 2025.10.14.681991. https://doi.org/10.1101/2025.10.14.681991

    Stéphane Maury, 13 Feb 2026

    https://hal.science/hal-05509112v1
  • [hal-05521725] Modeling breeding programs considering social behavior in large groups of farmed fish

    Breeding programs are essential in aquaculture, improving economically and environmentally important traits. In aquaculture systems, animals are raised in large groups, where social interactions are frequent and can influence individual performance. In these circumstances, indirect genetic effects can play an important role in the response to selection, and consequently, their effects on selection outcomes must be analyzed.

    This study aimed to evaluate the implications of heterogeneous social interaction effects on fish breeding programs using stochastic simulations. We simulated a fish breeding program with 2000 selection candidates from 1000 families formed by a partial mating design of 100 males and 100 females. Social interactions were simulated, affected by the target phenotype and two latent-personality traits. We investigated how genetic gains and phenotypic variances are affected by the magnitude and direction of social interaction effects on the target phenotype, different selection strategies, and the genetic correlations between the target phenotype and personality traits. Our results showed that increased social interaction effects lead to greater phenotypic variability in the target trait. Under mass selection, the genetic means of personality traits change, and these changes depend on the strength and direction of genetic correlations between the focal and personality traits. Conversely, group selection did not increase phenotypic variability but reduced genetic gain for the focal trait compared to mass selection. Moreover, group selection did not alter the genetic means of personality traits. However, this approach increased the rate of inbreeding per generation, which could be mitigated by optimizing the number of families per group.

    Gabriel Rovere, 21 Feb 2026

    https://hal.science/hal-05521725v1
  • [hal-05500190] Epigenetic regulation of mycorrhizal symbioses: from plastic responses to transgenerational legacies

    Mycorrhizal symbioses represent one of the most widespread and ecologically significant plant–microbe interactions, shaping plant nutrition, stress resilience, and ecosystem functioning. Beyond their role in nutrient exchange and systemic defense, growing evidence suggests that these symbioses also influence plant plasticity within and across generations through epigenetic regulation. These mechanisms operate throughout the mutualistic interaction, from fungal recognition and root colonization to symbiosis functioning, by regulating gene networks that control signaling, defense suppression, and nutrient exchange. By integrating environmental cues into potentially heritable gene regulatory states, epigenetic regulation fine‐tunes within‐generation responses and may also contribute to effects across generations, thereby influencing adaptation and resilience. The extent of mycorrhiza‐induced epigenetic inheritance likely depends on the host's reproductive strategy and lifespan. Clonal propagation and shorter‐lived hosts tend to preserve epigenetic marks, whereas sexual reproduction and longer‐lived species show partial resetting. This contrast shapes offspring performance, ecological interactions, and evolutionary trajectories. Here, we synthesize current knowledge on the epigenetic regulation of mycorrhizal symbioses, draw parallels with other plant–microorganism interactions (including plant–pathogens and plant–endophytes), highlight its role in within‐generation plasticity and propose a potential role across generations. We outline future research directions to disentangle the stability, ecological relevance, and evolutionary significance of mycorrhiza‐mediated epigenetic inheritance.

    Gerson Beltrán-Torres, 09 Feb 2026

    https://hal.inrae.fr/hal-05500190v1
  • [hal-05494492] Modelling and predicting soil microbial communities at large spatial scale based on metagenomic dimensionality reduction

    [...]

    Emna Stambouli, 05 Feb 2026

    https://inria.hal.science/hal-05494492v1
  • [hal-05572258] AgroEcoPhen

    Agroecology relies on mobilizing biological diversity to improve the resilience of agrosystems and provide ecosystem services. Implementing it requires tools to assess the performance and stability of these agrosystems, taking biotic and abiotic interactions into account. Emerging technologies (sensors, IoT, AI) make it possible to collect high-resolution data to better understand and predict these systems.

    Tania Rougier, 30 Mar 2026

    https://hal.inrae.fr/hal-05572258v1
  • [hal-05496194] WAIT4: Artificial intelligence and new technologies to assess relevant welfare indicators for animals facing the challenges of the agroecological transition - contribution to the digital continuum

    Improving animal welfare is essential to building sustainable food systems. Farm equipment (sensors, cameras, automated systems) combined with artificial intelligence (AI) can make it possible to assess the welfare of animals and herds in real time. This is particularly useful given the challenges posed by climate change and the agroecological transitions of livestock systems, as it provides tools and methods to anticipate risks and act effectively.

    Florence Gondret, 05 Feb 2026

    https://hal.inrae.fr/hal-05496194v1
  • [hal-05512364] The Agricultural Soil Digital Twin: A Key Tool for Agroecological Transition

    Effective protection requires better soil management, and better management begins with clarity: we need to make how soils function understandable to everyone (Thorsøe et al., 2023).

    The challenge: Modeling soil functioning is hindered by high spatial and temporal variability, non-linear interactions, and computational barriers (Ilić et al., 2025).

    Research question: How can Digital Twin frameworks accurately model complex soil dynamics to support robust agroecological decision-making while overcoming computational limitations?

    Aziz Hafsia, 23 Feb 2026

    https://hal.science/hal-05512364v1
  • [hal-05514911] BReIF: an e-infrastructure to accelerate the use of diverse biological resources

    Characterizing genetic resources generates massive amounts of highly diverse data that must be analyzed, managed, made reusable, and integrated to turn them into actionable knowledge.

    Anne-Françoise Adam-Blondon, 17 Feb 2026

    https://hal.inrae.fr/hal-05514911v1
  • [hal-04603038] Cooperative learning of Pl@ntNet's Artificial Intelligence algorithm: how does it work and how can we improve it?

    Deep learning models for plant species identification rely on large annotated datasets. The PlantNet system enables global data collection by allowing users to upload and annotate plant observations, leading to noisy labels due to diverse user skills. Achieving consensus is crucial for training, but the vast scale of collected data makes traditional label aggregation strategies challenging. Existing methods either retain all observations, resulting in noisy training data, or selectively keep those with sufficient votes, discarding valuable information. Additionally, as many species are rarely observed, user expertise cannot be evaluated as an inter-user agreement: otherwise, botanical experts would have a lower weight in the AI training step than the average user. Our proposed label aggregation strategy aims to cooperatively train plant identification AI models. This strategy estimates user expertise as a trust score per user based on their ability to identify plant species from crowdsourced data. The trust score is recursively estimated from correctly identified species given the current estimated labels. This interpretable score exploits botanical experts' knowledge and the heterogeneity of users. Subsequently, our strategy removes unreliable observations but retains those with limited trusted annotations, unlike other approaches. We evaluate PlantNet's strategy on a released large subset of the PlantNet database focused on European flora, comprising over 6M observations and 800K users. We demonstrate that estimating users' skills based on the diversity of their expertise enhances labeling performance. Our findings emphasize the synergy of human annotation and data filtering in improving AI performance for a refined dataset. We explore incorporating AI-based votes alongside human input. This can further enhance human-AI interactions to detect unreliable observations.

    Tanguy Lefort, 06 Dec 2024

    https://hal.science/hal-04603038v2
  • [hal-05511164] Self-explainable graph neural networks based on logic

    Graphs are complex, non-Euclidean structures that require specialized models such as graph neural networks (GNNs) to effectively capture the relational patterns associated with the class variable. This intrinsic complexity makes the decisions of GNNs particularly difficult to explain. Most current explainable artificial intelligence (XAI) methods applied to GNNs focus on identifying influential nodes or extracting relevant subgraphs, without clarifying how these elements actually contribute to the final prediction. To overcome this limitation, logic-based approaches aim to derive explicit rules that reflect the model's reasoning. However, existing logic-based methods remain mostly post-hoc and are limited to graph classification, leaving a significant gap in intrinsically explainable architectures. In this article, we integrate logical reasoning directly into the graph learning model. We introduce LogiX-GIN, a new self-explainable GNN architecture that incorporates logic layers to produce interpretable logic rules at the very core of the learning process. Unlike post-hoc approaches, LogiX-GIN provides explanations that are transparent, faithful, and consistent with the model's internal computations. Evaluated on several graph-based tasks, LogiX-GIN achieves competitive predictive performance while making its decision process explicit. This work was accepted at NeurIPS 2025.

    Alessio Ragno, 14 Feb 2026

    https://hal.science/hal-05511164v1
  • [hal-05558414] Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models

    Information retrieval with compact binary codes, also referred to as hashing, is crucial for scalable fast search applications, yet state-of-the-art hashing methods require expensive, scenario-specific training. In this work, we introduce Hashing-Baseline, a strong training-free hashing method leveraging powerful pre-trained encoders that produce rich embeddings. We revisit classical, training-free hashing techniques (principal component analysis, random orthogonal projection, and threshold binarization) to produce a strong baseline for hashing. Our approach combines these techniques with frozen embeddings from state-of-the-art vision and audio encoders to yield competitive retrieval performance without any additional learning or fine-tuning. To demonstrate the generality and effectiveness of this approach, we evaluate it on standard image retrieval benchmarks as well as a newly introduced benchmark for audio hashing.

    Ilyass Moummad, 18 Mar 2026

    https://inria.hal.science/hal-05558414v1
  • [hal-05506052] CIP-Net: Continual Interpretable Prototype-based Network

    Continual learning constrains models to learn new tasks over time without forgetting what they have already learned. A key challenge in this setting is catastrophic forgetting, where learning new information causes the model to lose its performance on previous tasks. Recently, explainable AI has been proposed as a promising way to better understand and reduce forgetting. In particular, self-explainable models are useful because they generate explanations during prediction, which can help preserve knowledge. However, most existing explainable approaches use post-hoc explanations or require additional memory for each new task, resulting in limited scalability. In this work, we introduce CIP-Net, an exemplar-free self-explainable prototype-based model designed for continual learning. CIP-Net avoids storing past examples and maintains a simple architecture, while still providing useful explanations and strong performance. We demonstrate that CIP-Net achieves state-of-the-art performance compared to previous exemplar-free and self-explainable methods in both task- and class-incremental settings, while bearing significantly lower memory-related overhead. This makes it a practical and interpretable solution for continual learning.

    Federico Di Valerio, 11 Feb 2026

    https://hal.science/hal-05506052v1
  • [hal-05446899] Animating the transition: How agriculture 5.0 revitalises agroecological principles

    Agriculture is undergoing a rapid digital transformation that challenges its ecological, social, and ethical foundations. This study explores how the transition between two revolutions, from Agriculture 4.0 (A4.0) to Agriculture 5.0 (A5.0), redefines the relationship between technology and agroecology. The dominant approach of A4.0, driven by automation, big data, and artificial intelligence, has enhanced efficiency but missed many agroecological principles, mainly those contributing to secure social equity and responsibility. Emerging as a corrective paradigm, A5.0 seeks to integrate technological progress with agroecological principles that value the social and human dimension. Adopting a scoping review following PRISMA-ScR guidelines, scientific publications indexed in Scopus and CABI up to October 2025 were screened and coded to assess how current A5.0 research embeds the thirteen agroecological principles defined by the High-Level Panel of Experts in 2019. A total of 136 documents were analysed through bibliometric and thematic synthesis. Results show that A5.0 represents a philosophical and structural evolution beyond the efficiency-oriented logic of A4.0, integrating distributed computing, explainable artificial intelligence, digital twins, and collaborative robotics within ecologically restorative and socially inclusive frameworks. However, while A5.0 strengthens resource efficiency, resilience, and certain social segments through open-source technologies and participatory design, gaps remain in policy coherence, emotional engagement, and human-machine co-learning. To address these, the study proposes two complementary agroecological principles, cognitive symbiosis and emotional ecology, emphasising shared intelligence and affective stewardship between humans, machines, and ecosystems. Overall, Agriculture 5.0 reframes digitalisation as a human-ecological partnership that can operationalise agroecology's ethical goals if governed by inclusion, transparency, and regeneration rather than control and optimisation.

    Mohammad Naim, 07 Jan 2026

    https://hal.science/hal-05446899v1
  • [hal-05447092] MetaNetMap: automatic mapping of metabolomic data onto metabolic networks

    Metabolic networks represent genome-derived information about the biochemical reactions that cells are capable of performing. Mapping omic data onto these networks is important to refine model simulations. However, metabolomic data mapping remains very challenging due to difficulties in identifier reconciliation between annotation profiles and metabolic networks. MetaNetMap is a Python package designed to automatise the process of mapping metabolomic data onto metabolic networks. It includes several layers of identifier matching, the use of customisable databases, and molecular ontology integration to suggest as many matches as possible between experimentally-identified metabolites and molecules defined in the network.

    ano.nymous@ccsd.cnrs.fr.invalid (Coralie Muller) 07 Jan 2026

    https://inria.hal.science/hal-05447092v1
  • [hal-05410799] Data Paper: HotPig, a behavioural dataset of pigs under heat stress

    The widespread use of videos in modern indoor livestock facilities coupled with the availability of efficient and low-cost computer vision algorithms provides strong incentives for continuously monitoring farm animal behaviour. Deciphering how pigs behave when experiencing prolonged heat stress is particularly important for animal welfare, as it helps us to better understand how animals use various thermoregulation and heat dissipation mechanisms. Data were collected on 24 pigs that were video-monitored day and night under two contrasting conditions: thermoneutral (TN, 22 °C) and heat stress (HS, 32 °C). All pigs were housed individually and had free access to an automatic feeder delivering pellets four times a day, and to water. After acquisition, videos were processed using YOLOv11, a real-time object detection algorithm that uses a convolutional neural network (CNN), to extract the following behavioural traits: drinking, willingness to eat, lying down, standing up, moving around, curiosity towards the littermate housed in the neighbouring pen, and contact between the two animals (cuddling). A one-minute sampling interval was applied (each minute corresponds to 150 frames processed) for a continuous period of 16 days, spanning the two different thermal conditions (9 days on TN, 6 days on HS, 1 day back to TN). Consistency with the automatic electronic feeder’s data (also provided) was thoroughly checked. The dataset allows quantitative criteria to be analysed to decipher inter-individual differences in animal behaviour and their dynamic adaptation to heat stress. This dataset can be used to train machine learning methods for behaviour prediction from videos in conventional growing pigs.

    ano.nymous@ccsd.cnrs.fr.invalid (Louis Bonneau de Beaufort) 11 Dec 2025

    https://hal.inrae.fr/hal-05410799v1
  • [hal-05348017] Measuring shade use of dairy cattle at pasture with an on-cow light sensor: a case study

    Grazing cows preferentially access shade to shield against the sun. However, the conditions that provide cows with optimal shade access and use (e.g. no competition for access to shade) are still unknown. Continuous monitoring of shade use by grazing cattle could help to understand how and when cows use shade resources. The aim of this study was to validate a method based on a light sensor (HOBO Pendant MX2202) attached to the back (on the transverse processes of the lumbar vertebrae) of 7 dairy cows at pasture to continuously record their use of natural shade for research purposes. Live behavioral observations of shade use and cow posture were recorded in summer (June to September, between 9 am and 6 pm). Based on the behavioral observation data, we determined thresholds in lux to discriminate between cows in shade and cows in sun on a randomly-generated training dataset representing 15 % of the initial dataset. This process was repeated 100 times, generating 100 thresholds and threshold performances. Data loss due to sensor loss or battery discharge was 9 %, which is acceptable. The thresholds ranged from 15,688 to 40,556 lx: sensitivity ranged from 92.0 % to 99.8 % and specificity ranged from 88.7 % to 99.9 %, showing that the performances were robust to threshold variation within this range. This study demonstrates that an efficient threshold to discriminate cows in shade from cows in the sun can be determined via a relatively short (about 12 h) series of live observations. As performances seem to be slightly lower for lying cows than for standing cows (mean false-positive rate is 7.4 % for lying cows versus 1.8 % for standing cows), future studies should consider the posture (which can also be monitored continuously with other sensors such as accelerometers installed on the legs or on the neck collar of the cows).

    ano.nymous@ccsd.cnrs.fr.invalid (Lydiane Aubé) 05 Nov 2025

    https://hal.inrae.fr/hal-05348017v1
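
The threshold evaluation described in the shade-use abstract above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the rule "predicted in shade when lux is below the threshold", the use of Youden's J to pick a threshold, the function names, and the toy readings are all assumptions made for the example.

```python
def sensitivity_specificity(readings, threshold_lx):
    """Score a lux threshold against observed shade use.

    readings: iterable of (lux, in_shade) pairs, where in_shade is the
    live observation. Assumption: a cow is predicted to be in shade
    when the sensor reads below threshold_lx.
    """
    tp = fn = tn = fp = 0
    for lux, in_shade in readings:
        predicted_shade = lux < threshold_lx
        if in_shade and predicted_shade:
            tp += 1
        elif in_shade:
            fn += 1
        elif predicted_shade:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def best_threshold(readings, candidates):
    """Pick the candidate maximising Youden's J (an assumed criterion)."""
    def youden_j(threshold_lx):
        sens, spec = sensitivity_specificity(readings, threshold_lx)
        return sens + spec - 1
    return max(candidates, key=youden_j)


# Made-up readings for illustration only: (lux, observed in shade?)
toy_readings = [
    (5_000, True), (12_000, True), (30_000, True),
    (38_000, False), (45_000, False), (60_000, False),
]
chosen = best_threshold(toy_readings, [15_000, 35_000, 50_000])
```

Repeating this on many random training subsets, as the abstract describes, would yield a distribution of thresholds and of their sensitivity and specificity.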
  • [hal-05495907] De nouveaux alliés du bien-être des animaux confrontés aux défis des transitions agroécologiques et climatiques

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Florence Gondret) 05 Feb 2026

    https://hal.inrae.fr/hal-05495907v1
  • [hal-05559686] A preliminary analysis of the insurability of rapeseed crops grown without insecticides

    This study examines the possibility of agronomic and insurance mechanisms to support the phasing out of insecticides. Insurability is analyzed by comparing a conventional rapeseed crop with a "robust rapeseed", cultivated according to specific guidelines regarding planting and fertilization practices, with or without insecticides. The evaluation combines technical expertise regarding the impact of pests and the ability of the resilient rapeseed to limit their effects, regional yield distributions, and data from the “Bulletin de Santé du Végétal”, in order to calibrate a model of losses due to pests. Yield losses without insecticide treatment are simulated using a copula-based approach, allowing for the analysis of the combined effects of different biological threats. The results indicate that, in the absence of insecticides, a conventional rapeseed crop subject to a 20% deductible would result in more than double the climate insurance premium (+130%), while adhering to the guidelines for "robust rapeseed" would limit this increase to +32%. The levels of premiums and deductibles suggest that a targeted subsidy for resilient rapeseed without insecticides could compensate for losses compared to the conventional system. However, resilient rapeseed with insecticides remains more economically advantageous, implying that an increase in the current subsidy would be necessary to equalize the deductibles.

    ano.nymous@ccsd.cnrs.fr.invalid (Pablo Yepes Llano) 21 Mar 2026

    https://hal.inrae.fr/hal-05559686v2
  • [hal-05531250] WAIT4: Bien-être animal et IA, un combo gagnant

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Florence Gondret) 28 Feb 2026

    https://hal.inrae.fr/hal-05531250v1
  • [hal-05577186] Phylogeny-driven pangenome analysis uncovers the genomic landscape of domesticated and wild Armeniaca species

    Long-read sequencing and pangenomics are revolutionizing crop research by providing more complete genome information and revealing crucial structural variations (SVs) linked to important agricultural traits. Building on recent advances in intraspecific pangenome construction, this study addresses the challenge of creating broader, cross-taxon pangenomes, using the Armeniaca taxonomic section as a model. Leveraging a diverse panel of genome assemblies as well as completing it with seven more genome assemblies generated for this study, we constructed a pangenome graph and cataloged the associated genetic variation, identifying approximately 25 million single nucleotide polymorphisms and over 537 000 structural variants. We characterized the diversity of these variants and assessed the extent to which different taxa contribute to overall pangenome expansion. Additionally, we evaluated the performance of low-depth sample mapping to the graph-based reference, highlighting key technical limitations that may affect the quality of downstream analyses. We further identified specific subsets of SVs that exhibit associations with particular classes of transposable elements (TEs). We showed that TEs are a major driver of SV, particularly insertions and deletions, with distinct size and distribution patterns (peaking in the 200- to 400-bp indel bin). They are also nonrandomly positioned in the genome, showing a tight concentration near coding genes, which suggests a role in gene regulation. As a case study illustrating the potential functional relevance of graph-derived SVs, we examined the genomic configuration of the Dormancy-Associated MADS box locus within the Armeniaca pangenome. These findings provide a framework to investigate adaptation in perennial fruit trees of the Armeniaca section.

    ano.nymous@ccsd.cnrs.fr.invalid (Ismael Blanchard) 03 Apr 2026

    https://hal.science/hal-05577186v1
  • [hal-05444004] Les technologies numériques en élevage : de la mesure à l’évaluation comportementale du bien-être de chaque animal

    Animal welfare is difficult to define because it refers to a complex phenomenon that is intrinsically linked to the individual's perception of its environment. Since it cannot be measured directly, welfare is assessed by identifying and quantifying specific indicators. These indicators, whose variations are associated with different welfare states, must be combined according to the assessment context. Animal behaviour, recognised as one of the keys to welfare assessment, can change in response to variations in the rearing environment, such as access to pasture, which influence both the animals' routine and the dynamics of their use of space. Analysing these behavioural changes makes it possible to define new indicators, which facilitates assessing the positive or negative impact of these environmental changes on animal welfare. The integration of sensor technologies, mathematical models and artificial intelligence opens up new prospects for longitudinal monitoring of activities, spatial dynamics and other parameters of interest throughout the animals' life cycle. For example, supervised classification algorithms have made it possible to associate raw sensor data with behaviours of interest, while unsupervised algorithms are expected to reveal new indicators related to animal welfare. This article highlights the opportunities offered by emerging digital technologies. We focus on behavioural assessment and its crucial role in welfare assessment, presenting three case studies: 1) distinguishing problems related to health, heat stress and reproduction in dairy cows, 2) predicting lameness in dairy cows and 3) studying emotions in pigs. Finally, we stress the importance of close interdisciplinary collaboration between ethologists, physiologists, mathematicians and computer scientists to foster the development of this emerging field, which we refer to as "digital ethology".

    ano.nymous@ccsd.cnrs.fr.invalid (Masoomeh Taghipoor) 06 Jan 2026

    https://hal.inrae.fr/hal-05444004v1
  • [hal-05527279] AGROECOPHEN - Agro-équipements et numérique pour une protection durable des cultures

    The characterisation of plants and their environment is changing scale with the development of digital plant phenotyping tools and methods. Phenotyping and envirotyping tools are more accessible, allowing multi-scale observation from the plant, or even the molecule, up to multi-plot observation by satellite. At the same time, the large volumes of data are more accessible and easier to exploit thanks to their indexing in open information systems. Their processing is made easier by a growing number of tools, notably with the help of artificial intelligence, which facilitates analysis by the operator. Environmental and phenotypic data can be combined in predictive or decision-support models. Together, these data are assets for accelerating the deployment of resilient, sustainable agriculture and of agroecology.

    ano.nymous@ccsd.cnrs.fr.invalid (Tania Rougier) 25 Feb 2026

    https://hal.inrae.fr/hal-05527279v1
  • [hal-05419350] MetaNetMap: automatic mapping of metabolomic data onto metabolic networks

    MetaNetMap is a Python tool dedicated to mapping metabolite information between metabolomic data and metabolic networks. The goal is to facilitate the identification of metabolites from metabolomic data that are present in one or more metabolic networks to facilitate further modelling, taking into consideration that data from the former likely has distinct identifiers from the latter.

    ano.nymous@ccsd.cnrs.fr.invalid (Coralie Muller) 17 Dec 2025

    https://inria.hal.science/hal-05419350v1
  • [hal-05476560] Deriving breeding goals and expected selection responses to reduce environmental impacts in rainbow trout farming

    Background: With growing societal concerns about the sustainability of food production systems, there is increasing interest in considering not only economic gains but also environmental issues in breeding programs of farmed species. In this study, we compared expected selection responses for breeding programs aiming to minimize environmental impacts of the production of rainbow trout in France, one of the most important fish species in salmonid aquaculture. The consequences of genetic improvement based on environmental merit indices were investigated in a hypothetical rainbow trout production farm with a constant annual production of 300 tonnes of fish. The merit indices included three different traits: thermal growth coefficient (TGC), daily feed intake (DFI), and survival (SR). A cradle-to-farm-gate life cycle assessment was conducted to evaluate the environmental values of each trait, which served as weightings in breeding goals aiming at minimizing expected environmental impacts by genetic selection. We explored nine different environmental impact categories: climate change, terrestrial acidification, freshwater eutrophication, marine eutrophication, terrestrial ecotoxicology, freshwater ecotoxicology, land use, water dependence, and cumulative energy demand. Results: Selection accuracy ranged from 0.34 to 0.43, with the lowest accuracy observed for the breeding goal targeting reduced water dependence, and the highest for those targeting reductions in eutrophication and terrestrial ecotoxicity. Annual genetic gains in reductions of environmental impacts, expressed per tonne of trout, were high for reducing eutrophication potential (−6.80 to −2.61%) and terrestrial ecotoxicity (−4.14 to −1.59%), but negligible for water use reduction (−0.04 to −0.01%). Genetic changes in DFI and TGC led to substantial annual gains in feed conversion ratio, from 1.7 to 4.8%. However, SR showed no improvement and often declined, highlighting the difficulty of balancing genetic gains across traits. Conclusions: We demonstrated the benefits of using environmental values in breeding goals to minimize environmental impacts at the farm level, while maintaining high genetic gains in feed efficiency traits. Nevertheless, we also showed that selection efficiency was highly dependent on the impact category. Our results suggested that another selection strategy should be considered to avoid unfavourable consequences on SR.

    ano.nymous@ccsd.cnrs.fr.invalid (Simon Pouil) 26 Jan 2026

    https://hal.inrae.fr/hal-05476560v1
  • [hal-05435147] On Logic-based Self-Explainable Graph Neural Networks

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Alessio Ragno) 30 Dec 2025

    https://hal.science/hal-05435147v1
  • [hal-04997560] Data paper: A goat behaviour dataset combining labelled behaviours and accelerometer data for training Machine Learning detection models

    This paper presents a dataset of accelerometer data and corresponding video-annotated behaviours from eight indoor dairy Alpine goats. Animals were equipped with 3D-accelerometers attached to their ears for 24 consecutive hours and recorded at a frequency of 5 Hz. Video recordings for this period were also obtained. Activities associated with positional, feeding and social behaviours were annotated over two daylight periods, for a total of 11 hours per goat, by a trained observer assuring high precision and consistency. This dataset can be used independently or complement an existing dataset for training supervised Machine Learning models for the detection of goat behaviour. It contributes to improving the robustness of such models by incorporating behavioural signals specific to indoor-housed goats.

    ano.nymous@ccsd.cnrs.fr.invalid (Sarah Mauny) 19 Mar 2025

    https://hal.inrae.fr/hal-04997560v1
  • [hal-05571656] Reconnaissance d’espèces végétales dans les cultures en mélange

    This tool facilitates the automatic analysis of mixed crops at large scale, which was previously tedious to observe with the naked eye, and opens two major prospects: the selection of cereal and legume varieties suited to intercropping (e.g. lucerne that covers the ground but competes little), and support for crop management, by identifying effective indicators (e.g. the height difference between species) to regulate competition within the canopy.

    ano.nymous@ccsd.cnrs.fr.invalid (Solene Sourdille) 30 Mar 2026

    https://hal.inrae.fr/hal-05571656v1
  • [hal-05514771] Interopérabilité des données en sciences de la vie : contexte, ressources et cas d’utilisation

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Michaël Alaux) 17 Feb 2026

    https://hal.science/hal-05514771v1
  • [hal-05264391] Method: An accurate method for detecting drinking bouts in dairy cows based on reticulorumen temperature

    This study evaluated the performances of three methods for detecting drinking bouts in dairy cows using reticulorumen temperature (RT): the 'FixT' method based on a fixed RT threshold, the 'Cow-dT' method based on a cow-day-specific RT threshold, and the 'FallST' method based on RT fall slope. We observed the drinking behaviours of 28 dairy cows equipped with reticulorumenal sensors over 96 h to create a reference dataset. A total of 730 drinking bouts were observed. We matched detected drinking bouts against observed drinking bouts to obtain the number of true-positives, false-negatives, and false-positives, and then calculated the detection performances of the three methods in terms of sensitivity (Se), positive predictive value (PPV), and F-score. The performances of the three RT-based methods (Se ≥ 90%, PPV > 96% and F-score ≥ 93%) were better than those from previous work using collar-attached accelerometers, but slightly lower than methods using drinking troughs connected to electronic identification systems or methods combining accelerometers with geomagnetic sensors or with ultra-wideband location. The FallST method showed slightly better performance (highest F-score) than the FixT and Cow-dT methods. The FallST method accurately detected drinking bouts lasting more than 30 s and at least 30 min apart, with a detection time accuracy of 10 min. The models using RT curve parameters failed to predict characteristics of the drinking bouts. In conclusion, the method developed here can accurately detect drinking bouts in dairy cows using RT, but without further characterisation of the drinking bouts (e.g. duration).

    ano.nymous@ccsd.cnrs.fr.invalid (L. Aubé) 17 Sep 2025

    https://hal.inrae.fr/hal-05264391v1
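
The detection metrics named in the drinking-bout abstract above can be computed from the matched counts as in the sketch below. It assumes the F-score is the usual harmonic mean of Se and PPV (F1), which the abstract does not state explicitly; the function name and the example counts are hypothetical and are not results from the study.

```python
def detection_performance(true_pos, false_neg, false_pos):
    """Return (sensitivity, positive predictive value, F-score).

    Assumption: F-score is the harmonic mean of Se and PPV (F1).
    """
    if true_pos == 0:
        # No correct detections: all three metrics are zero by convention.
        return 0.0, 0.0, 0.0
    se = true_pos / (true_pos + false_neg)    # share of observed bouts detected
    ppv = true_pos / (true_pos + false_pos)   # share of detections that are real
    f_score = 2 * se * ppv / (se + ppv)
    return se, ppv, f_score


# Hypothetical counts for illustration only (not taken from the paper).
se, ppv, f_score = detection_performance(true_pos=680, false_neg=50, false_pos=20)
```

Because true negatives are not defined for event detection (there is no natural count of "non-bouts"), Se, PPV and F-score are a common choice here instead of specificity.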
  • [hal-05385353] WAIT4 – un projet de recherche alliant technologies numériques et IA pour évaluer des indicateurs pertinents de bien-être pour des animaux confrontés aux défis des transitions agroécologique et climatique

    The WAIT4 project exploits the opportunities offered by digital technologies to measure different components of animal welfare in real time; it uses AI approaches to integrate the data thus collected, which are heterogeneous in nature and in time scale. The objective is to define new indicators and the relevant frequency at which to measure them, in order to identify variations in animal welfare. Different species (pigs, small and large ruminants), in conventional, organic or agropastoral systems and under contrasting climates, are addressed. The ambition is to detect early deviations in welfare and health in response to changes in practices and to climatic hazards. The project implements concerted actions involving French research institutes (INRAE, CEA, INRIA, INSA), and a dialogue with stakeholders, supported by LIT Ouesterel, to facilitate the uptake and dissemination of results. The WAIT4 project (2023-2027), coordinated by INRAE, is funded by France 2030 as part of the PEPR Agroécologie et Numérique.

    ano.nymous@ccsd.cnrs.fr.invalid (Florence Gondret) 27 Nov 2025

    https://hal.inrae.fr/hal-05385353v1
  • [hal-05451323] Detecting signatures underlying the composition of biological data

    Biological compositional data is inherently multidimensional and therefore difficult to visualize and interpret. To allow for the automatic decomposition of large compositional data and to capture gradients in co-occurring features, called signatures, we developed a new software package 'cvaNMF'. Our benchmarks on synthetic data show the effectiveness of cross-validation and our novel signature-similarity method to identify a suitable decomposition using non-negative matrix factorization (NMF). This software provides a complete set of tools to identify and visualize biologically informative signatures which we demonstrate in a wide range of microbial and cellular datasets: 'Enterosignatures' detected in gut metagenomes differentiated human hosts with diverse diseases; five 'terrasignatures' from rhizosphere metagenomes differentiated root- or soil-associated microbiomes, while being refined enough to infer geographic distances between plants. Large-scale data from 13,000 metagenomes representing 25 biomes were decomposed into environmental and host-associated microbiomes based on five newly discovered signatures. Finally, analysis of the cell composition of non-small cell lung cancer samples allowed separation of cancerous and inflamed tissues based on four cell-type signatures.

    ano.nymous@ccsd.cnrs.fr.invalid (Anthony Duncan) 09 Jan 2026

    https://inria.hal.science/hal-05451323v1
  • [hal-05469230] cMFA for multi-omics data integration in microbial community models

    Understanding microbial community functions is challenging because of complex interactions and assembly mechanisms. However, recent advances in sequencing technologies have enabled the collection of multi-omics time-series data at the community scale, including population abundances as well as metabolomic and metatranscriptomic measurements. The main objective of this work is to develop a modeling framework capable of integrating such multi-omics time-series data to infer metabolic activity at the community level. We introduce a method called community Metabolic Flux Analysis (cMFA), which extends classical metabolic flux analysis to microbial communities. The approach relies on experimentally measured time-series data describing metabolite production and consumption rates, as well as microorganism growth. The goal is to infer, for each member of the microbial community, the distribution of intracellular metabolic fluxes that is consistent with these observations. The inference problem is formulated as a constrained regression problem in which predicted exchange fluxes are fitted to experimental measurements. The model incorporates biological constraints, including mass conservation at the intracellular level and bounds on metabolic fluxes. Additional information from metatranscriptomic data is integrated through a regularization term that guides the inference toward biologically plausible solutions. The main challenge lies in accurately recovering latent intracellular fluxes from a limited number of extracellular measurements. The cMFA method was evaluated using synthetic data generated from dynamic models of microbial communities of increasing complexity. These models were based on metabolic networks of different Escherichia coli mutants simulated using dynamic flux balance analysis. Synthetic metatranscriptomic data were derived from the internal fluxes of the dynamic models. 
    Several regularization strategies were tested, including different sparsity levels, and multiple benchmarks were used to assess robustness. These benchmarks evaluated the sensitivity of the method to measurement noise, incomplete metatranscriptomic data, inaccurate prior knowledge of metabolite uptake rates, and increasing community size. Ongoing work focuses on applying the method to real experimental datasets, including denitrification processes and cheese production systems.

    ano.nymous@ccsd.cnrs.fr.invalid (Sthyve Junior Tatho Djeanou) 21 Jan 2026

    https://hal.science/hal-05469230v1
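
The constrained regression described in the cMFA abstract above can be written schematically as below. This is only a sketch under assumed notation, since the abstract gives no formula: $v_k(t)$ is the flux vector of community member $k$ at time $t$, $S_k$ its stoichiometric matrix, $E_k$ the selector of exchange fluxes, $x_k(t)$ its measured abundance, $r(t)$ the measured production and consumption rates, $g_k$ the metatranscriptomic data, $R$ a regularization term weighted by $\lambda$, and $\ell_k, u_k$ the flux bounds.

```latex
\begin{aligned}
\min_{v_1,\dots,v_K}\quad
  & \sum_{t} \Big\| \sum_{k=1}^{K} x_k(t)\, E_k\, v_k(t) - r(t) \Big\|_2^2
    \;+\; \lambda \sum_{k=1}^{K} R\big(v_k;\, g_k\big) \\
\text{s.t.}\quad
  & S_k\, v_k(t) = 0, \qquad \ell_k \le v_k(t) \le u_k, \qquad k = 1,\dots,K .
\end{aligned}
```

The first term fits predicted exchange fluxes to the measurements, the equality constraint expresses intracellular mass conservation, and the inequalities are the flux bounds. A sparsity-inducing choice of $R$ would correspond to the "different sparsity levels" mentioned in the abstract.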
  • [hal-05380224] NINSAR Project: Defining Agroecological Routes Using Robots

    The poster presents the doctoral research of Mohammad Naim, conducted within the French national project NINSAR (New ItiNerarieS for Agroecology using cooperative Robots), and outlines how the thesis contributes to this broader research programme. The NINSAR project, as framed in the poster title and structure, is positioned as a national effort to define agroecological routes using robotics, integrating technological innovation with ecological, social, and economic sustainability goals. Within this context, the thesis investigates how autonomous agricultural systems can be designed, evaluated, and adopted without compromising core agroecological principles. The thesis analyzes the transition from Agriculture 4.0 to Agriculture 5.0 through the thirteen agroecological principles defined by the High Level Panel of Experts, assessing how emerging robotic and data-driven systems can support more sustainable production models. It evaluates three major categories of robotic field operations (data collection, soil and crop management, and navigation/communication) and links them to four principle-level agroecological indicators, finding strong contributions to soil health and synergy and weaker support for recycling. The work also conducts an empirical study of French farmers using the Technology Acceptance Model 2, identifying perceived usefulness as the central predictor of adoption, complemented by ease of use and social influence. A complementary technical study clusters 71 agricultural robots into five functional categories, illustrating the increasing specialization of robotic platforms and cost differences between electric and endothermic systems. The thesis further extends to the economic and industrial dimension of the NINSAR project by engaging manufacturers through semi-structured interviews to construct business model canvases aimed at identifying viable pathways for scaling agroecological robots. 
    Taken together, the poster shows that Naim’s thesis forms a core component of NINSAR by integrating agronomic, technological, social, and economic analyses to support the development of robotics aligned with agroecological transition goals.

    ano.nymous@ccsd.cnrs.fr.invalid (Mohammad Naim) 24 Nov 2025

    https://hal.science/hal-05380224v1
  • [hal-05444605] Investigating pre-assembly clustering of HiFi reads for de novo assembly of complex metagenomes

    Despite advancements in sequencing technologies, metagenome assembly in taxonomically rich ecosystems remains challenging. Due to the abundance of low-coverage species, many regions in the assembly graph either lack coverage, are too complex, or present a combination of both factors. Clustering reads prior to assembly reduces complexity, but also decreases coverage within each cluster. While effective in improving short-read assemblies in proof-of-concept studies, it has not been widely adopted. In this work, we investigate whether upstream clustering of PacBio HiFi long reads improves assembly quality. To demonstrate the potential of this approach, we simulated an ideal read clustering by comparing the assembly of individual simulated genomes with that of those same simulated genomes merged within a complex ecosystem containing related species. We found that all genomes were better assembled in isolation than within the metagenome.

    ano.nymous@ccsd.cnrs.fr.invalid (Nicolas Maurice) 07 Jan 2026

    https://hal.science/hal-05444605v1
  • [hal-05459304] Coupling microbial communities models with data

    This presentation explores different mathematical models of microbial communities, with a focus on how models are tailored to the specificities of the microbial system and the available data. These models will be showcased in a range of microbial ecosystems including the gut microbiota, a cheese fermentation community, and biofilms. Finally, we will introduce the concept of digital twins for microbial systems, discussing their potential and challenges through concrete examples.

    ano.nymous@ccsd.cnrs.fr.invalid (Simon Labarthe) 15 Jan 2026

    https://hal.inrae.fr/hal-05459304v1
  • [hal-05368332] Modeling the emergent metabolic potential of soil microbiomes in Atacama landscapes

    Background: Soil microbiomes harbor complex communities from which diverse ecological roles unfold, shaped by syntrophic interactions. Unraveling the mechanisms and consequences of such interactions and the underlying biochemical transformations remains challenging due to niche multidimensionality. The Atacama Desert is an extreme environment that includes unique combinations of stressful abiotic factors affecting microbial life. In particular, the Talabre Lejía transect is a natural laboratory for understanding microbiome composition, functioning, and adaptation. Results: We propose a computational framework for the simulation of the metabolic potential of microbiomes, as a proxy of how communities are prepared to respond to the environment. Through the coupling of taxonomic and functional profiling, community-wide and genome-resolved metabolic modeling, and regression analyses, we identify key metabolites and species from six contrasting soil samples across the Talabre Lejía transect. We highlight the functional redundancy of whole metagenomes, which act as a gene reservoir, from which site-specific adaptations emerge at the species level. We also link the physicochemistry from the puna and the lagoon samples to metabolic machineries that are likely crucial for sustaining microbial life in these unique environmental conditions. We further provide an abstraction of community composition and structure for each site that allowed us to describe microbiomes as resilient or sensitive to environmental shifts, through putative cooperation events. Conclusion: Our results show that the study of multi-scale metabolic potential, together with targeted modeling, contributes to elucidating the role of metabolism in the adaptation of microbial communities. Our framework was designed to handle non-model microorganisms, making it suitable for any (meta)genomic dataset that includes high-quality environmental data for enough samples.

    ano.nymous@ccsd.cnrs.fr.invalid (Constanza M Andreani-Gerard) 17 Nov 2025

    https://inria.hal.science/hal-05368332v1
  • [hal-05555031] Genetic and phenotypic diversity of lucerne (Medicago sativa) for optimising its role as a living mulch in agroecological systems

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Zineb El Ghazzal) 16 Mar 2026

    https://hal.inrae.fr/hal-05555031v1
  • [hal-05555008] Designing a lucerne ideotype for use as a living mulch for cash crop production and genetic analysis of key underlying traits

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Zineb El Ghazzal) 16 Mar 2026

    https://hal.inrae.fr/hal-05555008v1
  • [hal-05562717] High throughput phenotyping of cereal x alfalfa mixed crops

    Identifying individual species within mixed or associated cover crops would provide a better understanding of plant behavior in environments that are naturally more varied and stressful than those found in conventional agriculture. Therefore, developing algorithms capable of accurately delimiting each pixel in an image and assigning it to the corresponding species class is essential for accessing traits of varietal interest.

    ano.nymous@ccsd.cnrs.fr.invalid (Mario Serouart) 23 Mar 2026

    https://hal.science/hal-05562717v1
  • [hal-05178193] Spectral indices in remote sensing of soil: definition, popularity, and issues. A critical overview

    Serving as a powerful proxy in remote sensing studies, spectral indices can generate meaningful environmental interpretation from either raw or atmospherically corrected spectral data, and characterise and quantify some important properties of various objects on Earth’s surface. However, while numerous spectral indices have been developed over time, since the very launch of civilian satellites until now, some critical issues in their usage, such as comparability, remain scarcely studied, which may lead to incorrect, inconsistent, and unreliable results. In this study, we collected 471 spectral indices of various environment components (vegetation, water, and soil) that might be leveraged for soil studies, and traced their popularity in scientific publications over the past decades. The bibliometric analysis revealed a growing interest in and utilisation of spectral indices as Earth-observing satellite technology advanced. Based on both literature and, for the sake of complementation and illustration, some targeted regional-scale case studies, we discuss the issues of naming confusion, comparability, applicability, accuracy trade-offs, and reproducibility of using spectral indices. Overall, this overview provides an extensive list of spectral indices, both soil indices and soil-related indices, that can be useful for characterising these environment components by remote sensing. It draws attention to some misuses and confusions that must be avoided to prevent scientific pitfalls. The comparisons between different spectral indices, sensors, and correction methods highlight the confusing effects that the misuse and non-standardised practices of the spectral indices useful for soil may have on soil property mapping and monitoring. Insights into the judicious and appropriate usage of spectral indices in the remote sensing of soil are provided.

    ano.nymous@ccsd.cnrs.fr.invalid (Qianqian Chen) 24 Jul 2025

    https://hal.inrae.fr/hal-05178193v1
  • [hal-05340010] Deep-Plant-Disease Dataset Is All You Need for Plant Disease Identification

    Deep learning models have emerged as a promising alternative to conventional approaches for plant disease identification, a critical challenge in agricultural production. However, the existing plant disease datasets are insufficient to address the complexities of real-world agricultural scenarios, such as multi-crop disease, unseen, few-shot, and domain shift adaptation. Additionally, the lack of standardized evaluation protocols and benchmark datasets hinders the fair evaluation of models against these challenges. To bridge this gap, we introduce Deep-Plant-Disease, the largest and most diverse dataset with novel text data designed to enhance model generalization in multi-crop disease identification. We revisit and reformulate the task by establishing a standardized evaluation framework that supports consistent benchmarking and guides future research. Through experiments, we further validate the robustness and adaptability of models trained on our dataset, highlighting their effective transferability to real-world agricultural challenges.

    ano.nymous@ccsd.cnrs.fr.invalid (Abel Yu Hao Chai) 31 Oct 2025

    https://inria.hal.science/hal-05340010v1
  • [hal-05478330] Weakly supervised segmentation of leaf symptoms in field conditions

    Background: Crop diseases can cause significant yield losses. Deep learning models for computer vision offer powerful tools to enhance human observation of plant disease symptoms, for instance by using segmentation models to mark out foliar symptoms. However, the most common and effective architectures rely on fully supervised learning that requires numerous, costly and often unavailable pixel-level annotated images. To overcome this, we focus on weakly supervised segmentation [1]. The principle is to generate segmentation masks from less informative annotations, such as image-level labels, in order to train segmentation models with reduced annotation effort.

    ano.nymous@ccsd.cnrs.fr.invalid (Romane Dubois) 26 Jan 2026

    https://hal.science/hal-05478330v1
  • [hal-05343366] Forest Cover in the Congo Basin: Consistency Evaluation of Seven Datasets

    Tropical forests play an essential role in the carbon and water cycles of terrestrial ecosystems, but they are increasingly threatened by human activities and climate change. For places where ground observations are scarce, like in Equatorial Africa, remote sensing is a key source of information for monitoring the temporal and spatial dynamics of forests over large areas. Several Earth Observation-based global maps were developed in recent decades using different definitions of the land-use/land-cover (LULC) classes. While such products are widely used for monitoring land use and planning land management, the consistency of these LULC maps for the Congo Basin has never been analyzed and quantified at the ecosystem level. Here, we selected seven of the most-used global maps and analyzed their consistency over the Congo Basin. After reclassification into forest/non-forest masks and spatial resampling, we assessed the agreement and disagreement percentage across the different tropical ecoregions of Africa, from moist forest to miombo, including savanna. The datasets showed differences in forest area as a function of spatial resolution, with higher forest area levels at coarser resolutions (e.g., from 74.1% to 88.5% forest cover when upscaling the GLCLU from 30 m to 1 km over the Congo Basin). A higher agreement between the datasets was found for forest area over moist forest (between 88.18% and 99.38%) in comparison to savanna (32.82%-99.84%) and miombo (53.83%-99.7%). These discrepancies led to large differences in forest cover, varying from a net loss of 205,704 km² to a net gain of 50,726 km² over 2001-2019 depending on the dataset used. This study draws attention to the uncertainty associated with these products with regard to forests, particularly in regions of biological importance, such as the miombo and savanna regions, which remain poorly understood. Indeed, the two major uncertainties affecting the quality of LULC products are related to the different spatial resolutions and biological definition of "forest" adopted by each product.

    ano.nymous@ccsd.cnrs.fr.invalid (Solène Renaudineau) 03 Nov 2025

    https://hal.science/hal-05343366v1
  • [hal-05322783] Whole genome sequencing dataset for a Vitis vinifera diversity panel

    Vitis vinifera is a significant agricultural species across continents and a genomic model for perennial crops. A diversity panel of 279 cultivars from the Vassal-Montpellier Grapevine Biological Resources Centre, which represents the diversity of the three main genetic pools of this species, has served as a foundation for genome-wide association studies using genotyping-by-sequencing approaches. Part of this panel (74 cultivars) has recently been sequenced at the whole genome level. Here, we release whole-genome sequencing of the remaining 205 cultivars of the panel, using the short-read NovaSeq6000 S4 PE150 technology to achieve complete genomic coverage. To ensure consistency with prior analyses and confirm genetic identities, we performed variant calling and SNP comparison with previously published data. During this stage, we identified two mislabeled samples, which were excluded from the dataset, resulting in a final set of 72 samples from the public data. Additionally, nine representative cultivars spanning major genetic groups underwent long-read sequencing using PacBio Revio technology. All sequences have been deposited at the ENA under project PRJEB95058 for the short-read data and project PRJEB100755 for the long-read data. Variant data have been deposited in the publicly accessible GIGWA SNP database. This expanded genomic dataset establishes a comprehensive foundation for advanced genomic analyses in V. vinifera, including genome-wide association mapping, structural variant characterization, and genetic diversity assessment. The long-read sequences provide high-quality genomic resources for structural variation analysis and pangenome construction. The integration of short- and long-read sequencing technologies enhances the usefulness of this resource for understanding grapevine genomic architecture and supporting genetic improvement initiatives.

    ano.nymous@ccsd.cnrs.fr.invalid (Gautier Sarah) 20 Oct 2025

    https://hal.science/hal-05322783v1