Scientific publications

Use the HAL platform to deposit or consult publications arising from the PEPR's scientific research.

The PEPR collection is available at https://hal.science/AGROECONUM/ or through an advanced search on https://hal.science/.

/!\ For a deposit to be correctly linked to the AgroEcoNum collection, you must enter your ANR project code (ANR-22-PEAE-XXXX or ANR-24-PEAE-XXXX).

Acknowledgement statement to include in scientific publications from projects funded by the Agroecology and Digital (Agroécologie et Numérique) research programme:

French version:
Ce travail a bénéficié d'une aide de l'État gérée par l'Agence Nationale de la Recherche au titre de France 2030 dans le cadre du PEPR Agroécologie et Numérique et portant la référence « ANR-**-PEAE-**** ».

English version:
This work received government funding managed by the Agence Nationale de la Recherche under the France 2030 program as part of the Agroecology and Digital research program, reference number “ANR-**-PEAE-****”.

References by project:

ADAAPT - ANR-24-PEAE-0001
AgriFutur - ANR-24-PEAE-0002
AgroDiv - ANR-22-PEAE-0005
AGROECOPHEN - ANR-22-PEAE-0012
BIODICAPT - ANR-24-PEAE-0003
BReIF - ANR-22-PEAE-0014
CoBreeding - ANR-22-PEAE-0003
CoEDiTAg - ANR-22-PEAE-0002
EcoControl - ANR-24-PEAE-0004
HOLOBIONTS - ANR-22-PEAE-0006
LINDDA - ANR-22-PEAE-0004
MELICERTES - ANR-22-PEAE-0010
MISTIC - ANR-22-PEAE-0011
NINSAR - ANR-22-PEAE-0007
PATASEL - ANR-22-PEAE-0013
Pl@ntAgroEco - ANR-22-PEAE-0009
TwinFarms - ANR-24-PEAE-0005
WAIT4 - ANR-22-PEAE-0008
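
The acknowledgement statement and the project references above can be combined programmatically. The following is a minimal sketch, not an official tool: the function name `acknowledgement`, the dictionary names and the validation pattern are assumptions made for illustration, while the sentences and codes are copied from this page.

```python
import re

# Project references copied from the list above.
PROJECT_CODES = {
    "ADAAPT": "ANR-24-PEAE-0001",
    "AgriFutur": "ANR-24-PEAE-0002",
    "AgroDiv": "ANR-22-PEAE-0005",
    "AGROECOPHEN": "ANR-22-PEAE-0012",
    "BIODICAPT": "ANR-24-PEAE-0003",
    "BReIF": "ANR-22-PEAE-0014",
    "CoBreeding": "ANR-22-PEAE-0003",
    "CoEDiTAg": "ANR-22-PEAE-0002",
    "EcoControl": "ANR-24-PEAE-0004",
    "HOLOBIONTS": "ANR-22-PEAE-0006",
    "LINDDA": "ANR-22-PEAE-0004",
    "MELICERTES": "ANR-22-PEAE-0010",
    "MISTIC": "ANR-22-PEAE-0011",
    "NINSAR": "ANR-22-PEAE-0007",
    "PATASEL": "ANR-22-PEAE-0013",
    "Pl@ntAgroEco": "ANR-22-PEAE-0009",
    "TwinFarms": "ANR-24-PEAE-0005",
    "WAIT4": "ANR-22-PEAE-0008",
}

# Assumed pattern: only the two code families quoted on this page (22 and 24).
CODE_PATTERN = re.compile(r"^ANR-(22|24)-PEAE-\d{4}$")

# Acknowledgement sentences copied from this page, with the code as a placeholder.
TEMPLATES = {
    "fr": "Ce travail a bénéficié d'une aide de l'État gérée par l'Agence "
          "Nationale de la Recherche au titre de France 2030 dans le cadre du "
          "PEPR Agroécologie et Numérique et portant la référence « {code} ».",
    "en": "This work received government funding managed by the Agence "
          "Nationale de la Recherche under the France 2030 program as part of "
          "the Agroecology and Digital research program, reference number "
          "“{code}”.",
}


def acknowledgement(project: str, lang: str = "en") -> str:
    """Return the acknowledgement sentence for a project name listed above."""
    code = PROJECT_CODES[project]
    if not CODE_PATTERN.match(code):
        raise ValueError(f"unexpected ANR code format: {code}")
    return TEMPLATES[lang].format(code=code)


print(acknowledgement("WAIT4", "en"))
```

Keeping the codes in a single table avoids retyping them by hand, which matters because the same code is also what links a HAL deposit to the AgroEcoNum collection.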

HAL: Latest publications

[hal-05652708] Text-to-MDX: LLM-assisted generation of MDX queries from user questions

MDX (MultiDimensional Expressions) is the standard language for querying multidimensional data in OLAP systems, but its complex syntax poses challenges for non-expert users. While a lot of research has focused on natural language interfaces for SQL, little attention has been given to MDX. This paper explores the potential of Large Language Models (LLMs), specifically GPT-4o, in translating natural language questions into MDX statements. We investigate whether LLMs can act as full MDX query generators or assistants, and study how the writing style of questions affects output correctness. Through four research questions, we evaluate ChatGPT’s basic capabilities and the effectiveness of prompt engineering in improving text-to-MDX performance. Our evaluation confirms that, with ad-hoc prompt engineering, GPT-4o is indeed able to generate complex MDX queries, particularly when the natural language question is given a structured formulation.

Sandro Bimonte, 10 Jun 2026
https://hal.inrae.fr/hal-05652708v1
[hal-05230510] Seed Inference in Interacting Microbial Communities Using Combinatorial Optimization

The behaviour of microorganisms and microbial communities can be abstracted by models combining a description of their metabolic capabilities as metabolic networks, and suitable computational or mathematical paradigms that further integrate simulation conditions. A major component of the latter is the composition of the environment or growth medium that can be referred to as seeds. Predicting the seeds from the metabolic network and an expected behaviour is an inverse problem that can be addressed with linear programming or logic paradigms such as Answer Set Programming (ASP). Here, we formalise seed prediction for microbial communities, taking into account that their members may interact positively through metabolite transfers, which may reduce the need for external seed metabolites. We address the problem with ASP and add a hybrid component ensuring the satisfiability of linear constraints. We explore the subset-minimality solving heuristic of the Clingo solver and develop two heuristics supporting priority of seeds over transfers. We present a proof of concept of seed inference in small-scale communities, and assess the scalability of the three heuristics at genome-scale. Overall, our work introduces a hybrid logic-linear model for seed inference in interacting microbial communities, and new heuristics for the exploration of the solution space with subset minimality optimisations.
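
To make the inverse problem described in this abstract concrete, here is a minimal sketch of subset-minimal seed search on a toy network. It is an illustration under strong assumptions, not the authors' method: it uses brute-force enumeration in Python instead of Answer Set Programming, a purely topological notion of reachability (no linear constraints), and a single organism (no metabolite transfers). The network `REACTIONS` and all metabolite names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy network: each reaction is (substrates, products).
REACTIONS = [
    ({"glucose"}, {"pyruvate"}),
    ({"pyruvate", "ammonium"}, {"alanine"}),
    ({"lactate"}, {"pyruvate"}),
]


def scope(seeds, reactions):
    """Metabolites reachable from the seeds by repeatedly firing every
    reaction whose substrates are all available."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def minimal_seed_sets(candidates, targets, reactions):
    """Enumerate subset-minimal seed sets, smallest first (brute force)."""
    solutions = []
    for size in range(len(candidates) + 1):
        for subset in combinations(sorted(candidates), size):
            chosen = set(subset)
            # Skip supersets of an already found solution: not minimal.
            if any(found <= chosen for found in solutions):
                continue
            if targets <= scope(chosen, reactions):
                solutions.append(chosen)
    return solutions


seed_sets = minimal_seed_sets(
    {"glucose", "lactate", "ammonium"}, {"alanine"}, REACTIONS
)
print(seed_sets)
```

Brute force is exponential in the number of candidate seeds, which is one reason a dedicated solver is needed at genome scale.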

Chabname Ghassemi Nedjad, 29 Aug 2025
https://inria.hal.science/hal-05230510v1
[hal-05656546] A soil-type-specific stratified approach to bare soil mosaicking and SOC prediction from Sentinel-2 time series

Accurate, high-resolution mapping of soil organic carbon (SOC) is essential for environmental modelling and sustainable land management, yet its prediction based on satellite imagery is often affected by vegetation and moisture, possibly causing generalised models to fail in landscapes with heterogeneous soils. To address this, we developed a stratified framework that tailors bare soil thresholding and SOC modelling to specific soil types. Using a 7-year Sentinel-2 time series and 414 soil samples over the Centre-Val de Loire region in France, we first identified the Visible and Shortwave Infrared Drought Index (VSDI) as an effective moisture proxy, avoiding the need for availability-limited external moisture data. We then optimised NDVI, NBR2, and VSDI thresholds individually for each major soil type to filter bare soil observations. Finally, soil-type-specific partial least squares regression (PLSR) models were built and compared against a single generalised model.
Our results showed that our soil-type-specific strategy substantially outperformed the generalised model (e.g., RPIQ increased from 0.68 to 2.38 for Brunisols eutriques). The primary spectral predictors for SOC varied widely, from visible to SWIR bands, depending on the soil type. The optimised VSDI filter was also critical for this improvement, reducing prediction RMSE by nearly 50% for loamy-texture soils like Luvisols. This study demonstrates that, for accurate SOC prediction at a very large regional scale, context (soil-type)-specific stratification of bare soil thresholding and SOC modelling is critical, serving as a framework for integrating pedological knowledge into SOC prediction and subsequent digital soil mapping workflows.

Qianqian Chen, 14 Jun 2026
https://hal.inrae.fr/hal-05656546v1
[hal-05657073] A free time machine? Milking robots and transformations in the temporal regime of dairy farmers in France

This article examines the effects of the large-scale diffusion of milking robots on dairy farmers’ lifestyle, with particular attention to the temporal patterns of their daily life. The study is based on quantitative data from a questionnaire completed by 831 respondents and qualitative data from 43 interviews with dairy farmers. It critically questions prevailing narratives depicting milking robots as technologies that liberate farmers' time and modernize their lifestyles. Our findings show instead that the robot reconfigures labour along principles of fluidity, and further blurs the already fuzzy boundaries between work and the domestic sphere. Farmers' daily lives remain tied to a largely unchanged temporal regime defined by long working hours. The reduction in total working hours averages 6% and mostly concerns evening hours. The morphology of work time, as well as the patterns of articulation of social times within farming households, are only marginally altered by the adoption of milking robots.

Nicolas Deffontaines, 15 Jun 2026
https://u-picardie.hal.science/hal-05657073v1
[hal-05683977] Assessing the structure of DNA representation spaces using graph-based comparisons

Many models have been proposed to create embeddings of DNA sequences. While these models are typically evaluated using downstream tasks such as species prediction, such evaluations offer limited insight into the intrinsic differences between their embedding spaces. To address this gap, we focus on direct comparison of the geometric and topological properties of embeddings generated by those models. We consider five models (TNF, dna2vec, DNABERT-S, DNABERT-2 and HyenaDNA) chosen to represent a broad range of embedding techniques: from simple k-mer counts to state-of-the-art transformer architectures. Our comparison centers on two key questions. First, how do these models organize sequences from the same species in their latent spaces? Second, how do they differ in terms of local and global topological properties? We first evaluated whether sequences from the same species cluster together in each model’s embedding space. PCA projections revealed that, while all models grouped sequences by species, the degree of separation varied. Strikingly, the order of species along the first principal component was consistent across models, as were the relative distances between clusters. This suggests that, despite wide architectural differences, the models capture a shared global biological signal in their embeddings. Three models (TNF, dna2vec, and HyenaDNA) produced tightly clustered species groups, yielding similar silhouette scores. In contrast, DNABERT-S and DNABERT-2 exhibited more dispersed clusters: DNABERT-S achieved a higher silhouette score due to better separation, while DNABERT-2’s lower score reflected greater intra-cluster variance. This divergence may stem from the transformer architectures’ ability to capture more nuanced sequence features, albeit at the cost of cluster compactness.
To further probe the embedding spaces using quantitative metrics, we constructed k-nearest-neighbor (k-NN) graphs and applied two complementary analyses: (i) Jaccard distance to quantify local neighborhood similarities and (ii) a permutation test based on Random Dot Product Graphs (RDPG) to compare global topological properties. These methods enabled us to assess both fine-grained and large-scale differences between embedding spaces. Our analysis revealed that both Jaccard and RDPG comparisons are sensitive to hyperparameter choices. For k-NN graphs, the distance metric and number of neighbors (k) significantly impacted results, as did the size and composition of the input dataset. The RDPG framework introduced an additional hyperparameter: the latent dimensionality (d) for spectral embedding. Surprisingly, it also resulted in strong asymmetry in model comparisons (A vs. B ≠ B vs. A). This asymmetry poses a challenge for aggregating results and drawing robust conclusions. In summary, our study establishes Jaccard distance and RDPG on k-NN graphs as a unified framework for comparing DNA sequence embeddings, offering insights into both local and global properties of latent spaces. While methodological challenges remain, in particular hyperparameter sensitivity and comparison asymmetry, addressing them could provide a way toward more biologically interpretable embeddings and deepen our understanding of what genomic models actually learn.
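
The local comparison mentioned in this abstract, Jaccard distance between k-NN neighbourhoods, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes Euclidean distance, averages the per-sequence Jaccard distance, and uses tiny hypothetical 2-D embeddings. The function names `knn_sets` and `mean_jaccard_distance` are invented for this sketch.

```python
import math


def knn_sets(points, k):
    """For each point, the index set of its k nearest neighbours
    (Euclidean distance, the point itself excluded)."""
    neighbours = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbours.append(set(others[:k]))
    return neighbours


def mean_jaccard_distance(emb_a, emb_b, k):
    """Average over sequences of 1 - |Na & Nb| / |Na | Nb|, where Na and Nb
    are the k-NN sets of the same sequence in the two embedding spaces."""
    sets_a, sets_b = knn_sets(emb_a, k), knn_sets(emb_b, k)
    distances = [1 - len(a & b) / len(a | b) for a, b in zip(sets_a, sets_b)]
    return sum(distances) / len(distances)


# Hypothetical embeddings of the same four sequences in three spaces.
space_a = [(0, 0), (1, 0), (5, 0), (6, 0)]
space_b = [(0, 0), (2, 0), (10, 0), (12, 0)]   # rescaled copy of space_a
space_c = [(0, 0), (5, 0), (1, 0), (6, 0)]     # neighbourhoods rearranged

print(mean_jaccard_distance(space_a, space_b, k=1))
print(mean_jaccard_distance(space_a, space_c, k=1))
```

The measure depends only on neighbour identity, so a uniform rescaling of one space leaves it unchanged, while the choice of k and of the distance metric can change it, in line with the hyperparameter sensitivity the abstract reports.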

Juliette Francis, 07 Jul 2026
https://hal.science/hal-05683977v1
[hal-05673352] EweAcT: Ewe behaviour aligned to accelerometer data for activity monitoring in extensive grazing systems.

Monitoring livestock behaviour under extensive conditions would provide valuable insights to assess animal adaptation to environmental perturbations in agroecological systems (e.g., heat waves, parasitism, predator attacks). Animal behaviour can be monitored using accelerometer data collected from neck-collars combined with artificial intelligence models. However, large amounts of accelerometer data aligned with annotated behaviours are necessary to develop accurate models of behaviour prediction. In particular, developing reliable models for extensive systems requires data collected across a wide range of representative conditions. The dataset includes 79 hours of tri-axial accelerometer data aligned with behaviours manually annotated from video recordings for 120 Romane ewes born between 2021 and 2024. The ewes were derived from two divergent genetic lines after three and four generations of selection started 10 years ago: low and high social attractiveness, noted S- and S+, and low and high tolerance towards humans, noted H- and H+. They were reared under the extensive system applied at the Experimental Unit of La Fage (UEF, INRAE, Saint-Jean-et-Saint-Paul, Aveyron) where 250 sheep were reared exclusively outdoors on 280 hectares of rangeland in southern France. The first batch of data was collected in March, June and July 2024 at the UEF under a range of extensive conditions, including sloping pastures and heat-wave periods. Ewes were equipped with accelerometer neck-collars specifically designed for young sheep on pasture. They were grouped in experimental paddocks for 4 to 8 hours and provided with fresh grass and ad libitum access to water. The animals were simultaneously video-recorded using an elevated CCTV camera. Behaviour annotation was carried out using Behavioral Observation Research Interactive Software, focusing on the main behaviours on pasture: Grazing, Ruminating, Resting, Moving, and "Other", grouping all remaining activities.
Annotations and corresponding accelerometer sequences were aligned in Python, based on a time synchronization procedure. A second batch of data was acquired in November 2025 to supplement the dataset with the moving activity. For that purpose, ewes were equipped with the accelerometer collars and moved on tracks from the housing area to the pastures, corresponding to an approximately 10-minute walk. The start and end times of the moves for each ewe were used to align the corresponding accelerometer data with the moving activity. These data were then merged with the dataset from the first batch. The resulting dataset is ready to use for applying artificial intelligence models to classify the 5 main behaviours of sheep under extensive grazing systems from accelerometer data.

Lucile Riaboff, 01 Jul 2026
https://hal.inrae.fr/hal-05673352v1
[hal-05686843] Lossless compression of k-mer matrices enabling random row access

Genomic search engines such as Logan-Search index petabytes of sequencing data as large binary matrices, called k-mer matrices, where each row encodes the presence of a k-mer across thousands to millions of genomic samples. Logan-Search contains a petabyte of binary matrices, and storing them is expensive, yet compression must not prevent fast random access to any matrix row at query time. We present kmcomp, a lossless compression method for k-mer matrices that satisfies these competing requirements. Block compression partitions the matrix into fixed-size row blocks, each compressed independently; block start positions are stored in an Elias-Fano encoded array, enabling O(1) random access to any block. To improve compressibility without introducing additional decompression steps, we introduce the π-compression: a column reordering that groups similar samples together by solving the Traveling Salesman Problem via a nearest-neighbor heuristic. We accelerate this heuristic with a novel variant of the vantage-point tree, the masked vp-tree, which dynamically prunes nearest-neighbor search space. On three (meta)genomic datasets, kmcomp achieves compression ratios of 1.3 to 5.4; π-compression further improves these to 1.5 to 51.3. Applied to the Logan-Search petabyte-scale index, compression reduces storage by approximately half, and π-compression adds a further 13% gain. Query overhead remains modest: queries of hundreds of nucleotides incur an absolute latency increase of ≈ 100 ms, and highly compressed indexes can match uncompressed query times thanks to reduced disk reads.
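
The block-compression idea in this abstract (independent fixed-size row blocks plus an index of block start positions) can be sketched in a few lines. This is not kmcomp: as stated assumptions, the sketch uses `zlib` as a stand-in compressor, a plain Python list in place of the Elias-Fano encoded array, and rows of equal byte length. The function names are invented for illustration.

```python
import zlib


def compress_blocks(rows, rows_per_block):
    """Compress fixed-size groups of rows independently and record the byte
    offset at which each compressed block starts."""
    payload = bytearray()
    offsets = [0]
    for start in range(0, len(rows), rows_per_block):
        block = b"".join(rows[start:start + rows_per_block])
        payload += zlib.compress(block)
        offsets.append(len(payload))  # end of this block = start of the next
    return bytes(payload), offsets


def read_row(payload, offsets, row_index, rows_per_block, row_size):
    """Return one row by decompressing only the block that contains it."""
    block_id = row_index // rows_per_block
    compressed = payload[offsets[block_id]:offsets[block_id + 1]]
    block = zlib.decompress(compressed)
    local = (row_index % rows_per_block) * row_size
    return block[local:local + row_size]


# Hypothetical matrix: 10 rows of 16 bytes each, 4 rows per block.
rows = [bytes([i % 7] * 16) for i in range(10)]
payload, offsets = compress_blocks(rows, rows_per_block=4)
print(read_row(payload, offsets, 9, rows_per_block=4, row_size=16) == rows[9])
```

Because the block holding a row is found by integer division and its position by one index lookup, the cost of a query is bounded by the size of a single block rather than of the whole matrix.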

Alix Regnier, 09 Jul 2026
https://hal.science/hal-05686843v1
[hal-05665434] Horloges épigénétiques pour la longévité fonctionnelle des vaches laitières [Epigenetic clocks for the functional longevity of dairy cows]

In dairy cattle, functional longevity is a trait of major economic, environmental and animal-welfare interest, but it can only be measured late in an animal's life. The aim of this internship was to assess whether epigenetic age, predicted from DNA methylation, could serve as an early biomarker of ageing and functional longevity. The blood methylome of 4,751 Holstein cows was analysed using the RUMIGEN EpiChip array, which targets 44,053 CpG sites. Several prediction models were compared in order to build an epigenetic clock. The best model predicted the animals' epigenetic age with an accuracy of 129 days. The gap between epigenetic age and actual age was calculated for each animal. In the models analysed, this gap showed moderate heritability and was associated with several genomic regions in which several candidate genes were identified (including DNMT3B and RCAN2). In addition, this gap showed that accelerated epigenetic ageing was associated with early culling of the animals. These results show that an epigenetic clock can be built for dairy cows and that epigenetic age could be an early biomarker of functional longevity. These approaches are very promising for application to other contexts and at different scales of study.

Margaux Gaury, 22 Jun 2026
https://hal.inrae.fr/hal-05665434v1
[hal-05643765] DRL-Based Pose Control for Double-Ackermann Robots Under Actuation Uncertainties

Robust deployment of deep reinforcement learning (DRL) policies on real robots remains challenging due to discrepancies between simulation and real-world dynamics. We address this issue in the context of maneuvering with double-Ackermann-steering mobile robots, which introduce additional constraints due to their non-holonomic nature. Building upon the DRL framework ManeuverNet, we extend its objective from position control to full pose control, resulting in a more challenging task. We further investigate the impact of actuation-related uncertainties on policy transfer. The use of simplified actuation models during training of the extended policy can lead to poor generalization, shown by a success rate drop from 100% in PyBullet to 25% in Gazebo under stricter evaluation conditions. To address this limitation, we adopt a sim-to-sim-to-real approach, where actuation effects observed in Gazebo are incorporated into the PyBullet training environment. Using multi-environment DRL with SAC and CrossQ, we learn policies that remain robust despite modeling inaccuracies. This approach can significantly reduce the performance gap across simulators, achieving up to 92% success rate in Gazebo and maintaining 69% under stricter thresholds, with successful transfer to a real robot without additional tuning.

Oussama Zaim, 04 Jun 2026
https://hal.science/hal-05643765v1
[hal-05643469] OrthoViewer

OrthoViewer (interactive exploration of orthologous gene families). OrthoViewer is a publicly accessible web platform that enables real-time exploration of orthologous gene families across 84 angiosperm species. It integrates phylogenetic trees, eggNOG-mapper functional annotations (GO, SO, CoG namespaces) and a manually curated layer of reference genes with associated literature, into a single interactive query environment.

Raphaël Flores, 17 Jun 2026
https://hal.inrae.fr/hal-05643469v1
[hal-05578828] Use of Surface Water and Ocean Topography (SWOT) observations to support Land Use/Land Cover (LULC) change products: the case of the pacific coast of Ecuador

Radar altimetry has been used to characterize land surfaces. However, the nadir configuration of the radar altimeter sensor and its coarse spatial resolution were limiting factors. The Surface Water and Ocean Topography (SWOT) mission overcomes these limitations through its Ka-band Radar Interferometer (KaRIn), a synthetic aperture radar (SAR) system, providing high spatial resolution and accurate surface height measurements. SWOT was initially used for hydrology and oceanography; this study explores an innovative use of its observations to analyse changes in Land Use/Land Cover (LULC). To do this, three study areas located on the Pacific Coast of Ecuador were considered. The area in the south (A) is characteristic of cultivated areas, while the area in the center (B) presents a landscape mosaic and the area in the north (C) hosts tropical rainforests. For each study area, the SWOT backscatter coefficient (sig0) was analysed for the year 2024 from the raster product at 100 m spatial resolution. We calculated the number of occurrences and the sig0 average from the raster product over each pixel. The spatial patterns obtained from these two variables enabled us to assign a LULC class (city, water, road, crop, or no forest, depending on the study area) to each pixel, using a Support Vector Machine (SVM). The assigned LULC classes depend on the partial spatial coverage of the SWOT data, which does not allow representing all the LULC classes. The classification results were compared with the LULC map provided by the Ministry of the Environment using a confusion matrix and obtained an accuracy greater than 0.87 and an F1 score greater than 0.89 for the three study areas. In the forest area (C), the SWOT observations were also compared to two change detection products: RAdar for Detecting Deforestation (RADD) alerts and detections by the Cumulative Sum (CuSum) method.
39% of the SWOT observations were in areas identified as forest by these products but classified as no forest or water in our SWOT classification. By detecting small streams (areas A, B and C), roads (area A), the boundaries of agricultural plots and the state of cultivated land (area A) as well as recent forms of deforestation (area C), SWOT was found to be a complementary source of information for LULC change products.
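
For readers unfamiliar with the two scores quoted in this abstract, the sketch below shows how accuracy and an F1 score are derived from a confusion matrix. The counts are hypothetical and unrelated to the study's results, and the abstract does not say how F1 was aggregated across classes, so the single-class F1 computed here is an assumption.

```python
def accuracy_and_f1(matrix, positive):
    """matrix[i][j] counts pixels of reference class i predicted as class j.
    Returns overall accuracy and the F1 score of class `positive`."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    tp = matrix[positive][positive]
    fp = sum(matrix[i][positive] for i in range(n)) - tp  # predicted, wrong
    fn = sum(matrix[positive]) - tp                       # missed
    f1 = 2 * tp / (2 * tp + fp + fn)
    return correct / total, f1


# Hypothetical two-class example (class 0 = "crop", class 1 = "water").
confusion = [
    [40, 10],
    [5, 45],
]
accuracy, f1_crop = accuracy_and_f1(confusion, positive=0)
print(accuracy, f1_crop)
```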

Valentine Sollier, 03 Apr 2026
https://hal.inrae.fr/hal-05578828v1
[hal-05665454] Epigenetics clocks for functional longevity of dairy cows

In dairy cattle, functional longevity is defined as an individual's ability to remain productive in the herd while maintaining good health and satisfactory performance throughout its productive life (Schuster et al, 2020). Increasing the functional lifespan of individuals would reduce herd replacement needs and the associated environmental and economic impacts. Functional longevity can only be measured late in an animal's life, and early biomarkers would help accelerate genetic progress on this trait. Epigenetic age is the age of an animal predicted by an epigenetic clock, an algorithm that uses DNA methylation data as predictor variables (Horvath, 2013). Could epigenetic age be an early biomarker of functional longevity in dairy cattle? To build the epigenetic clock, we analysed the blood methylome of 4,751 Holstein cows using a bovine methylation array (RUMIGEN EpiChip), which measures the methylation of 44,053 CpG sites. We compared several learning methods by varying (1) the nature of the predictor variables (DNA methylation matrices or eigenvalues from a principal component analysis, data corrected for covariates or not), and (2) the prediction method (penalised linear regressions or random forests). The best-performing models predict an epigenetic age that is strongly correlated with the animals' actual age at sampling (RMSE = 128.9 days, R² = 0.95). We predicted an epigenetic age for all samples in the study, using the parameters of the best models and different splits of the animals between the training and test datasets.
This allowed us to calculate the gap between predicted age and actual age for each animal in the cohort. The heritability of these age gaps is moderate (h² > 0.2), and several genomic regions (QTLs) associated with this phenotype were detected. The identification of candidate genes in each of these QTLs is under way. The next step will be to estimate the correlation between this age gap and various production, health and longevity phenotypes measured on the same animals. This work confirms that epigenetic clocks can be trained for dairy cows with good accuracy, and we hope it will provide new knowledge on how effectively this biomarker predicts the functional longevity of animals.

Margaux Gaury, 22 Jun 2026
https://hal.inrae.fr/hal-05665454v1
[hal-05682255] Towards lucerne varieties used as living mulch for cereal crops in agroecological systems

Lucerne, a perennial legume known for its nitrogen fixation, persistence, and soil-covering capacity, shows strong potential as a living mulch for cereal cropping. However, its vigorous growth often results in excessive competition with cash crops. The selection of lucerne varieties adapted to living mulch could be a solution to reduce this competition. We synthesise the state of the art on this subject. Wheat–lucerne interactions occur from the earliest stages of the wheat cycle until its harvest and are mainly driven by lucerne morphological and phenological traits. Autumn dormancy, growth habit, height, and cover state of lucerne determine the trade-off between reducing competition with wheat and maintaining the ecosystem services provided by lucerne. An intermediate dormancy, combined with moderate height and upright cover, appears to provide the most favourable balance. Genetic correlations between traits measured in spaced plants and living mulch conditions reveal that some traits, such as height, remain stable across designs, whereas others are highly design-dependent. This supports a two-step breeding strategy combining early indirect selection in a nursery of spaced plants with an indirect selection under living mulch conditions. Finally, molecular markers used for genomic prediction could accelerate the identification of genotypes suited for living mulch systems. This knowledge can be used to create dedicated varieties.

Zineb El Ghazzal, 06 Jul 2026
https://hal.inrae.fr/hal-05682255v1
[hal-05639710] Modelling soil microbial functions at large spatial scale based on metagenomic dimensionality reduction

[...]

Emna Stambouli, 01 Jun 2026
https://inria.hal.science/hal-05639710v1
[hal-05105798] Metabolic Flux Inference in a Cheese Microbial Community via comFI: a Biology-informed Approach for Time-resolved Multi-omics Integration

Microbial communities play a central role in many bioprocesses with key applications in food fermentation, waste treatment, human and animal well-being, plant protection or metabolite transformation in industrial bioprocesses. However, the metabolic microbial interactions driving the community dynamics remain difficult to characterize because of their complexity and their temporal variability. Recent advances in sequencing and analytical technologies now provide time-resolved multi-omics data at the community scale, offering key insights into the mechanisms shaping the community dynamics. However, integrating these heterogeneous data in an interpretable way to decipher species-specific metabolic activity and microbial interactions remains a major challenge in the study of microbial communities. We introduce the community metabolic flux inference (comFI) method, a mathematical framework for inferring the metabolic fluxes of individual microorganisms from community-level longitudinal data. The method formulates flux estimation as a biology-informed constrained inference problem that combines observed microbial abundances and extracellular metabolite exchange data with metabolic constraints, encoded in a metabolic model, and transcriptomic-based lasso regularization terms. We evaluated comFI on synthetic datasets generated from dynamic models of microbial communities involving three Escherichia coli mutant strains. The comFI method showed very good reconstruction accuracy for exchange fluxes, intracellular metabolic flux distributions, metabolic pathway activation patterns and strain contributions. We also applied the method to experimental cheese fermentation data involving three bacteria (Lactococcus lactis, Lactobacillus plantarum, and Propionibacterium freudenreichii), combining abundance measurements, targeted metabolomics and transcriptomics data.
The comFI framework made it possible to recover previously identified interaction patterns, and to reconstruct latent intracellular flux states for individual microorganisms along with their respective metabolic contributions within the community, consistent with the omics data. Altogether, we demonstrate that comFI provides a practical framework for recovering the metabolic activity of individual microorganisms from community-scale multi-omics time-resolved data.

Sthyve Junior Tatho Djeanou, 24 May 2026
https://hal.science/hal-05105798v2
[hal-05630049] Automated mapping of metabolomic compounds onto metabolic networks using MetaNetMap

Understanding biological systems requires integrative and multi-level approaches. Genome Scale Metabolic Networks (GSMNs), which are derived from genome annotation, capture the metabolic capabilities of an organism. In contrast, metabolomics gives an insight into what is really happening in an organism under specific conditions. Mapping molecules identified from metabolomic experiments onto GSMNs offers several advantages: mapped compounds can be used for visualisation or quality assessment of the GSMN; and conversely, unidentified metabolites highlight gaps in the network and create model curation opportunities. This is especially important for specialised metabolism that is currently largely overlooked in GSMNs. Such mapping is thus attractive but it remains cumbersome due to several challenges such as harmonisation and matching of identifiers between metabolomic annotation profiles and GSMNs, and dispersion of information across various knowledge bases and input files. Currently, mapping is done manually or semi-manually, which is tedious and prone to errors. To overcome these challenges, we developed MetaNetMap, a Python package that automatically matches metabolite information between metabolomic annotations and GSMNs. It improves mapping rates through direct mapping taking into account metadata of input files, indirect matching by relying on conversion data tables built from third-party knowledge bases, and partial matching techniques. It offers an automatic solution for ambiguous mapping, providing relevant information for manual curation. By automating and harmonising metabolite mapping, MetaNetMap aims to overcome a major barrier in multi-omic integration, enabling more efficient and reproducible integration of metabolomic data onto GSMNs.

Coralie Muller, 22 May 2026
https://inria.hal.science/hal-05630049v1
[hal-05630041] Automated mapping of metabolomic compounds onto metabolic networks using MetaNetMap

Understanding biological systems requires integrative and multi-level approaches. Genome Scale Metabolic Networks (GSMNs), which are derived from genome annotation, capture the metabolic capabilities of an organism. In contrast, metabolomics gives an insight into what is really happening in an organism under specific conditions. Mapping molecules identified from metabolomic experiments onto GSMNs offers several advantages: mapped compounds can be used for visualisation or quality assessment of the GSMN; and conversely, unidentified metabolites highlight gaps in the network and create model curation opportunities. This is especially important for specialised metabolism that is currently largely overlooked in GSMNs. Such mapping is thus attractive but it remains cumbersome due to several challenges such as harmonisation and matching of identifiers between metabolomic annotation profiles and GSMNs, and dispersion of information across various knowledge bases and input files. Currently, mapping is done manually or semi-manually, which is tedious and prone to errors. To overcome these challenges, we developed MetaNetMap, a Python package that automatically matches metabolite information between metabolomic annotations and GSMNs. It improves mapping rates through direct mapping taking into account metadata of input files, indirect matching by relying on conversion data tables built from third-party knowledge bases, and partial matching techniques. It offers an automatic solution for ambiguous mapping, providing relevant information for manual curation. By automating and harmonising metabolite mapping, MetaNetMap aims to overcome a major barrier in multi-omic integration, enabling more efficient and reproducible integration of metabolomic data onto GSMNs.

ano.nymous@ccsd.cnrs.fr.invalid (Coralie Muller) 22 May 2026
https://inria.hal.science/hal-05630041v1
[hal-05613501] From Concept to Perspective: Digital Twins of Microbial Systems

Digital twins (DTs) are increasingly recognized across diverse sectors for their capacity to enhance the control, efficiency, and comprehension of the physical or biological systems they represent. For microbial systems, DTs could allow model-guided improvements of the services provided by microbial communities in the agrifood chain. While DT definitions are generally built on the same core idea of bi-directional exchanges between digital and physical counterparts, where real-time data feeds digital models and model-driven insights guide the real system, a wide variety of definitions of what a DT is still co-exist across domains. This variability underscores the need for a clear, system-specific definition of DTs for microbial ecosystems. In this perspective paper, we propose a conceptual framework for microbial system digital twins (MSDTs), defined as a collection of models dynamically linked to the microbiological system through in-line, at-line or off-line data and control flows. We illustrate this framework with examples spanning environmental, bioprocess, plant, animal, food, and human microbial systems, in a One Health perspective. For each ecosystem, we explore the potential applications of MSDTs. We also identify the scientific challenges that remain in experiments, bioinformatics, data science, modeling, control and microbial ecosystem engineering to build accurate MSDTs. We advocate for the development of MSDTs in laboratory settings, as a catalyst for interdisciplinary sciences, and we stress the practical and ethical issues preventing the generalization of MSDTs to large-scale applications. However, high-tech MSDTs in laboratory environments may pave the way for low-tech, generalizable microbial solutions for improved ecosystemic microbial services.

ano.nymous@ccsd.cnrs.fr.invalid (Simon Labarthe) 06 May 2026
https://hal.inrae.fr/hal-05613501v1
[hal-05610014] Romane sheep divergently selected on residual feed intake: consequences on production and feeding behaviour traits

Breeding companies are highly interested in including feed efficiency in their breeding programmes because of the economic and environmental benefits this would bring. We developed divergent lines on residual feed intake (RFI) in the Romane breed in order to estimate the consequences of RFI selection on other traits. In this experiment, RFI was phenotyped in growing lambs under an ad libitum low-energy concentrate diet, distributed through automated concentrate feeders, during 6-week test periods. The divergent selection experiment encompasses five generations of selection, and a total of 821 male lambs were phenotyped for production, feed efficiency and feeding behaviour traits. After five generations of selection, the difference between the efficient and less efficient lines reached 2.6 genetic SD, and efficient lambs ate on average 9.36% less concentrate than less efficient lambs. No significant differences in production traits were observed between the two lines, except for on-farm breeding values, with daughters from efficient rams having higher estimated breeding values (EBVs) for prolificacy but lower EBVs for maternal traits. Significant differences between the two lines were observed for feeding behaviour traits. The efficient lambs visited the feeders less and tended to spend less time in feeding activity. Less efficient lambs had a marked hourly feed intake pattern compared to efficient lambs. Nevertheless, the genetic correlations between RFI and production or behaviour traits were of low magnitude (from -0.33 to 0.24 with conformation score and BW at the beginning of the test, respectively), with most of the estimates being close to zero. This divergent experiment demonstrated that selecting on RFI is feasible with low impacts on production traits, but particular attention must be paid to prolificacy and maternal traits to avoid any undesirable changes in these important traits. 
This also illustrated that feeding behaviour is different in efficient and less efficient animals, which provides clues to further investigate the mechanisms underlying feed efficiency in meat sheep. (c) 2026 The Author(s). Published by Elsevier B.V. on behalf of The animal Consortium.

ano.nymous@ccsd.cnrs.fr.invalid (F. Tortereau) 03 May 2026
https://hal.inrae.fr/hal-05610014v1
[hal-05604339] WP 2.3: Réduction de dimension et analyse de série temporelles multiomiques

To elucidate bioprotection against late blight (mildiou) on tomato leaves, I explore ways to combine time series analysis and dimension reduction on a metabarcoding dataset.

ano.nymous@ccsd.cnrs.fr.invalid (Sébastien Raguideau) 28 Apr 2026
https://hal.science/hal-05604339v1
[hal-05610911] Metagenome assembly and evaluation in taxonomically rich ecosystems

Presentation for the M36 meeting of the MISTIC project. Showcases advances on the Mapler pipeline (https://hal.science/hal-05288241) and on cluster-based assemblies (https://hal.science/hal-05444605), as well as an assortment of quickly explored or unexplored ideas, including assembly consensus, analysis of assembly graphs, exploitation of reference genomes and realignment of reads on the assembly.

ano.nymous@ccsd.cnrs.fr.invalid (Nicolas Maurice) 04 May 2026
https://hal.science/hal-05610911v1
[hal-05604272] Generation of metabolomic-informed models of metabolism in complex microbial communities

Presentation for the M36 meeting of the MISTIC project. Summary of the presentation: The generation of genome-wide metabolic networks has become a routine analysis for individual organisms or communities. However, these automatically generated metabolic networks are incomplete because they are constructed by combining gene annotation with the reactions available in generic databases (MetaCyc, BiGG, ModelSEED...). These databases are oriented towards well-known or model organisms and miss important functions of secondary metabolism. We propose to combine metabolomic data analysis, metabolic modelling and annotation mining to build high-quality models of microbial metabolism, with the long-term aim of better understanding microbial communities. In terms of application of the methods to plant microbial communities, we hope that the newly developed models will provide a better understanding of the process of microbial recruitment by the plant: the metabolic functions involved and the micro-organisms associated with these functions.

ano.nymous@ccsd.cnrs.fr.invalid (Coralie Muller) 28 Apr 2026
https://hal.science/hal-05604272v1
[hal-05602030] Do farmers trust digital decision support tools on pesticide use?

Decision support tools (DSTs) are increasingly used in agriculture to rationalise pesticide use by providing treatment recommendations, often to the nearest day. Despite the recognised performance of these tools, farmers follow their recommendations relatively little. This article seeks to explain these decisions through the lens of trust in the tool and of biases in farmers' risk perception. We study the case of a French DST designed for managing potato late blight. Using an original database containing the recommendations issued by the DST and the treatment decisions of subscribing farmers, we analyse how closely farmers follow the DST's recommendations and how this changes after late blight contamination episodes. Our results show that farmers follow recommendations not to treat less often than recommendations to treat. This result is stronger for farmers who have experienced a contamination in the past. We show that they choose to follow recommendations less whether or not the DST made a prediction error in the past. These results suggest that whether recommendations are followed depends less on farmers' level of trust in the DST than on changes in the perceived risk of future contamination, which is higher among farmers who have already experienced a contamination.

ano.nymous@ccsd.cnrs.fr.invalid (Alban Cornier) 24 Apr 2026
https://hal.inrae.fr/hal-05602030v1
[hal-05601510] Public and private support of the AgriTech : what rationale for what innovation pathway ? A multiple case study in France and Chile

Both public and private actors are increasingly using a diversity of support instruments to foster AgriTech innovation, including a growing interest in the agrifood sector from venture capital and private equity (Lajoie-O’Malley et al., 2020; Sippel and Dolinga, 2023). This support is legitimized not only by the classic rationale of the effect of innovation on economic efficiency and the need for support due to uncertain returns on investment, but also by the socio-ecological effects that these technological developments would enable (Martin and Schnebelin, 2024). Despite this evolution and the growing influence of private funding (Glenna et al., 2015), AgriTech ecosystems, and their impact on the agricultural innovation system, remain poorly studied (Klerkx and Villalobos, 2024). Who supports AgTech development, what rationales underpin support instruments, and how do hybrid public and private arrangements configure who and what is supported? At the crossroads of the literature on the agricultural innovation system and on entrepreneurial and innovation ecosystems, our research aims to decipher how the development of private venture capital, alongside new innovation support rationales, shapes innovation pathways in the agrifood sector. To address these aims, we present a comparative study of AgriTech innovation ecosystems in France and Chile. While these two countries offer different institutional contexts, they are both very dynamic in terms of AgriTech development. Through an analysis of the CrunchBase database and interviews with key public and private stakeholders of AgriTech innovation ecosystems, we describe the landscape of AgriTech support services. We characterize these instruments according to their providers, beneficiaries and means (Audretsch et al., 2020), as well as their objectives and rationales (Kerr et al., 2017; Laranja et al., 2008). 
The results highlight the multi-level, multi-actor and multi-instrument ecosystem of innovation support instruments for AgTech in France and Chile. We observe a strong entanglement of public and private support. Most instruments provide a diversity of resources such as funds, knowledge, networks and legal expertise. While they incorporate “mission-oriented” dimensions, they have few processes to evaluate, support and foster these missions. Despite similarities, the instruments refer to different rationales and objectives. These results will enable us to discuss the evolution of agricultural innovation policies within a neoliberal context and a regime of entrepreneurial innovation (“the Silicon Valley model of innovation”).

ano.nymous@ccsd.cnrs.fr.invalid (Éléonore Schnebelin) 24 Apr 2026
https://hal.inrae.fr/hal-05601510v1
[hal-05661789] Automated monitoring of the activity of lactating sows in conventional and organic systems from the use of AI with video images

In the last decade, we developed experimental research projects to determine the impact of the transition to looser housing on sow health, activity and welfare around farrowing and during lactation. We use high-throughput phenotyping based on the use of AI with video images to address these questions. We developed an automated device for the recording and monitoring of sow behaviour. The system consists of a Raspberry Pi protected in a waterproof box, connected to 3 closed-circuit television cameras to monitor up to 3 sows simultaneously. The system can be controlled using a smartphone with a direct wireless connection which does not require Wi-Fi. We developed an ethogram for the analysis of sow postures. The software differentiates eight postures: sitting, standing, kneeling, lying on the belly, lying on the right/left side with teats not visible, and lying on the right/left side with teats visible. The system uses a convolutional neural network (CNN Yolo-v11) to estimate sow postures from the images at a rate of one estimate every second. The CNN was set up with more than ten thousand labelled images from different sows collected in our 2 experimental farms. It identifies the sow's body with the corresponding posture, the sow's head, and the piglets. The microcomputer manages the data flow and estimates sow posture in real time. The CNN was set up on many diverse images with the objective of guaranteeing the quality of the prediction in our experimental farms. We also developed a web application to supervise and pre-analyse sow activity (e.g. time budget). Studies of temporal changes in activity will be possible, as well as the search for sows with greater capacity of adaptation (Canario et al., EAAP 2023; Girardie et al., 2024). In addition, we created an ethogram for the analysis of sow facial expressions (Barry et al., EAAP 2025). 
Having developed a database of ten thousand labelled images from different sows, we are preparing a proof of concept for the automated monitoring of changes in facial expressions. The device can be equipped with one or more algorithms trained and adapted to the trait(s) and condition of interest. Developments received financial support from the animal genetic division of INRAE and the PEPR WAIT4 ANR-22-PEAE-0008 and H2020 PPILOW projects.
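
As an illustration of the "time budget" pre-analysis mentioned in this abstract, the sketch below turns a sequence of per-second posture estimates into the share of time spent in each posture. It is a hypothetical example, not the authors' web application: the function name and the posture labels are assumptions, and it supposes exactly one estimate per second with no gaps.

```python
from collections import Counter


def time_budget(postures: list[str]) -> dict[str, float]:
    """Fraction of observed time spent in each posture.

    Assumes one posture label per second, so counts are proportional to time.
    """
    counts = Counter(postures)
    total = sum(counts.values())
    if total == 0:
        return {}  # no observations, no budget
    return {posture: n / total for posture, n in counts.items()}


# Four seconds of hypothetical estimates for one sow.
budget = time_budget(["standing", "standing", "sitting", "lying_belly"])
print(budget)
```

Comparing such budgets between successive time windows is one simple way to look at temporal changes in activity.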

ano.nymous@ccsd.cnrs.fr.invalid (Téo Cochou) 18 Jun 2026
https://hal.science/hal-05661789v1
[hal-05686261] Mapping the agricultural digital ecosystem: An analysis of the French AgTech market actors and dynamics

This communication proposes a systematic and dynamic understanding of the firms that commercialise new equipment and digital technologies (EDiTs) to farmers. Agriculture's digitalisation is mostly analysed through the lens of equipment (hardware and software), their adoption paths and impacts. However, research on the ecosystem supporting these innovations remains scarce, particularly outside North America (Klerkx & Villalobos, 2024). Several questions emerge: what kind of technologies are produced? What kind of organisations develop and finance them? What is the sector's dynamic, especially regarding the viability of organisations and technologies? This work addresses these questions by characterising the actors of the French "AgTech" market. We aim to describe their diversity, trajectories, and funding dynamics using a quantitative approach. We built an original database of organisations based on former initiatives that propose inventories of digital tools, apps, and robots sold to farmers, including Wiki AgriTech and outputs from various projects. The database was then coupled with other sources providing economic information about the firms, including CrunchBase and the French business database Pappers. We collected information on the firms' characteristics (size, location, age, funders, funding sources, turnover), and on the EDiTs sold to farmers: nature (robots, apps, software), functions (supporting decision-making, field observations, etc.), and targeted sector. In total, more than 600 organisations were identified. The database shows that the technologies are primarily developed by private actors (almost 90% of organisations). Half of them were established after 2014 and are specialised in both the agricultural and digital sectors. Analyses show that company characteristics, particularly size and location, vary depending on the type of technology developed and the targeted agricultural sector. 
The collected data indicate that the sector is still consolidating, as more than a quarter of the firms have gone bankrupt and/or been acquired in recent years. An econometric analysis of this data enables us to understand the factors contributing to the success or failure of firms commercialising EDiTs, including the level of specialisation, the nature of the technologies sold, the integration in policy frameworks, and the sources of funding. By shedding light on the construction and dynamics of the emerging French AgTech market, this work challenges the idea of a homogeneous economic sector. It raises questions about the roles of both public and private ecosystems in its emergence. Finally, this work opens up avenues for reflection to better understand the contribution of different innovation systems to sustainability challenges.

ano.nymous@ccsd.cnrs.fr.invalid (Romane Guillot-Pelliet) 08 Jul 2026
https://hal.science/hal-05686261v1
[hal-05667139] Reconstruction of Root System Architecture in 2D+t: benchmarking loss functions in deep learning reconstruction pipelines

Context: Automatic Root System Architecture (RSA) reconstruction methods generally use a two-step process: 1) segmentation, and 2) tree-graph extraction. Given the hierarchical structure of this pipeline, the alignment of segmentation objectives and graph extraction targets is investigated. Goal: 1) Investigate the downstream impact of the segmentation model’s architecture and training loss on the traits predicted on the graphs. 2) Determine which metrics computed in the early segmentation step are useful for predicting overall pipeline performance. Method: The graph extraction step was fixed using RootSystemTracker (Fernandez et al., 2022). Six variations of the segmentation setup (2 architectures × 3 loss functions) were evaluated across training epochs. A step-by-step evaluation was conducted for each setup to measure the accuracy of the segmented masks and the trait estimation error (SMAPE) derived from the resulting RSML tree-graph files (e.g., lengths, counts).
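
The trait estimation error named above, SMAPE (symmetric mean absolute percentage error), has several variants in the literature; the abstract does not say which one is used. The sketch below assumes the common form with the mean of the absolute values in the denominator, bounded by 200%, and treats a pair where both values are zero as a zero error. The function name and example values are hypothetical.

```python
def smape(predicted: list[float], reference: list[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Assumed variant: |p - r| / ((|p| + |r|) / 2), averaged over all pairs.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one pair of values is required")
    total = 0.0
    for p, r in zip(predicted, reference):
        denominator = (abs(p) + abs(r)) / 2
        if denominator:  # both values zero: counted as a perfect estimate
            total += abs(p - r) / denominator
    return 100 * total / len(predicted)


# Hypothetical root lengths (predicted vs. reference) for three plants.
print(smape([110.0, 48.0, 20.0], [90.0, 52.0, 20.0]))
```

Because the denominator uses both values, over- and under-estimation of the same trait are penalised on a comparable scale, which is convenient when traits such as lengths and counts have very different magnitudes.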

ano.nymous@ccsd.cnrs.fr.invalid (Loaï Gandeel) 23 Jun 2026
https://hal.science/hal-05667139v1
[hal-05601671] Counting large flea beetle larvae using computer vision

In rapeseed, the Berlese method is currently the only reliable method to assess the pressure of large flea beetles (Psylliodes chrysocephala) on a field and determine whether or not it is relevant to apply insecticide (also based on other indicators like the number of larvae per plant and the biomass of the rapeseed). This method involves repeated, long and tedious counts. To avoid this and facilitate the deployment of the Berlese method, Terres Inovia has developed an application capable of determining the number of larvae using computer vision from smartphone images. To date, more than 16,000 sub-images (640x640 pixels) have been annotated to create the algorithm. The tool processes images using the YOLOv8 model. Currently, the model has a performance rate of 88%, with an average error of 12%. The higher the quality of the photos, the better the tool will be able to detect larvae, even when they are clustered together. The resolution, framing and brightness of the photos remain essential conditions for successful measurement. This method, which is not yet official, does not replace Berlese tests or field observation, but it does facilitate their interpretation and monitoring, which thus become faster and more accurate. It should be noted that this tool is expected to evolve to detect not only the number of larvae but also their stage of development (L1, L2 or L3, the most harmful). This image processing will require higher resolution. However, it will enable the degree of harmfulness of the larvae within the plot to be determined more accurately.
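
The abstract mentions 640x640-pixel sub-images; one plausible way to obtain them from a larger smartphone photo is to tile the image. The sketch below only computes the tile coordinates and is an assumption about the preprocessing, not a description of the Terres Inovia application: the function name is hypothetical, and shifting the last tile inwards so that every tile stays full-size is a design choice made here for illustration.

```python
def tile_boxes(width: int, height: int, tile: int = 640) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes of tile x tile pixels covering the image.

    Edge tiles are shifted inwards (and therefore overlap their neighbour)
    so that every box has the full tile size.
    """
    if width < tile or height < tile:
        raise ValueError("image is smaller than one tile")

    def starts(size: int) -> list[int]:
        positions = list(range(0, size - tile + 1, tile))
        if positions[-1] + tile < size:
            positions.append(size - tile)  # extra tile flush with the border
        return positions

    return [(x, y, x + tile, y + tile) for y in starts(height) for x in starts(width)]


# A hypothetical 1000x640 photo yields two overlapping tiles.
print(tile_boxes(1000, 640))
```

With overlapping edge tiles, a larva lying in the overlap may be detected twice, so counts from neighbouring tiles would need deduplication before being summed.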

ano.nymous@ccsd.cnrs.fr.invalid (Jean-Eudes Hollebecq) 24 Apr 2026
https://hal.science/hal-05601671v1
[hal-05567423] Video of the presentation at LREC conference: EPOP: A benchmark corpus for Assessing NLP Models on Structured Information Extraction in Plant Health

This video presents the work published in LREC conference proceedings in 2026. In this presentation, we introduce the EPOP (Epidemiomonitoring of Plants) corpus, a new annotated resource for structured information extraction in the domain of plant health epidemiology. The corpus consists of translated news reports that reflect real-world phytosanitary monitoring scenarios. It includes annotations for named entities (e.g. Plant, Pest, Vector, Disease, Dissemination Pathway), identity coreferences, and both binary and complex n-ary relations that represent key events such as Transmits or Causes, along with their modalities. A distinctive feature of EPOP is its normalization layer where mentions of species and geographical locations are linked to canonical identifiers in the NCBI Taxonomy and GeoNames, enabling semantic disambiguation and integration with external knowledge bases. As the first publicly available corpus of its kind, EPOP presents a realistic and challenging benchmark, with high linguistic variability, entity role ambiguity, and long-distance relations. We report baseline results on core tasks (named entity recognition, normalization (entity-linking), and relation extraction) using both fine-tuned BERT-based models and hard-prompted large language models. These experiments demonstrate the utility of EPOP while also identifying areas for improvement, particularly in the extraction of complex relations. The corpus is released under an open license, to support research in environmental NLP, crop protection, and knowledge graph enrichment.

ano.nymous@ccsd.cnrs.fr.invalid (Claire Nédellec) 25 Mar 2026
https://hal.inrae.fr/hal-05567423v1
[hal-05591524] LifeCLEF 2026 Teaser: AI Challenges for Biodiversity Understanding and Ecosystem Management

AI is increasingly central to understanding and managing biodiversity and ecosystems. Since 2011, the LifeCLEF lab has provided large-scale benchmarks that stimulate progress in multimodal species recognition, ecological prediction, and knowledge extraction. The 2026 edition expands this scope with five complementary challenges spanning visual, acoustic, and textual data: (i) AnimalCLEF: discovery and re-identification of individual animals, (ii) BirdCLEF+: multi-taxonomic species recognition in complex soundscapes, (iii) MarineCLEF: detection of marine species in underwater imagery under positive-unlabeled constraints, (iv) PestCLEF: extraction of information on plant pests from heterogeneous textual sources, (v) PlantCLEF: multi-species plant identification in quadrat images. Together, these challenges address critical dimensions of biodiversity science and ecosystem management, while fostering collaboration between AI researchers, ecologists, and practitioners. This paper provides an overview of the LifeCLEF 2026 lab and its tasks, outlining their motivation, data, and evaluation methodology to guide participants and inform the wider research community.

ano.nymous@ccsd.cnrs.fr.invalid (Alexis Joly) 14 Apr 2026
https://hal.inrae.fr/hal-05591524v1
[hal-05593509] Integrating metagenome-scale metabolic modelling and metabolomics to identify biochemical interactions in Microcystis phycospheres

Favoured by global changes, freshwater cyanobacterial harmful blooms generate major ecological, economic and public health challenges. Microcystis, one of the most widespread cyanobacterial genera, grows within a phycosphere where specialised interactions with its microbiome occur, and are suspected to influence bloom appearance and its potential toxicity. Using a combination of metagenomics, metabolomics and metabolic modelling, we characterised the phycospheres of twelve Microcystis strains isolated from a French pond. The distribution of metabolic reactions within Microcystis was consistent with their genospecies, whereas the metabolic landscape at the community level diverged from cyanobacterial phylogeny, indicating functional decoupling between cyanobacteria and their associated microbiomes. Phycosphere-associated bacteria substantially expand the metabolic repertoire of the system, while maintaining functional redundancy within and across communities. On the other hand, metabolomic profiles were largely driven by cyanobacterial metabolic outputs. Metabolic modelling, together with the identification of toxic specialised metabolites produced by specific biosynthetic gene clusters, further highlighted differences in metabolic potential among phycospheres. Together, these findings deepen the understanding of Microcystis’ phycosphere functioning, demonstrate the value of multi-omics systems biology approaches, and underscore the ecological relevance of interspecies and inter-phycosphere metabolic interactions as a structuring process in bloom-associated microbiomes.

ano.nymous@ccsd.cnrs.fr.invalid (Juliette Audemard) 16 Apr 2026
https://inria.hal.science/hal-05593509v1
[hal-05613681] DNA methylation around transcription start sites is not globally associated with transcription in the grain of natural and synthetic hexaploid wheat

Epigenetic mechanisms including DNA methylation are assumed to play crucial roles in the maintenance of genome integrity, regulation of gene expression and development, and their increasing exploitation in breeding applications is anticipated. However, the relationship between DNA methylation and gene expression remains ambiguous and difficult to generalize. Here we explored the hypothesized causality between the level of transcription and cytosine methylation at the 5' end of genes (around transcription start sites and start codons) in relation to whole-genome duplication in natural and synthetic allohexaploid wheat (Triticum/Aegilops complex). Using transcriptomes and a sequence capture protocol coupled with bisulfite sequencing, we observed sometimes significant, but overall very weak associations between gene expression and 5' end methylation on a genome-wide scale. In synthetic wheat allohexaploids, global methylation differences between subgenomes are not triggered by the polyploidization, as the subgenome patterns are rather faithfully inherited from parents. A small number of genes differentially methylated between the parents and synthetics was consistently recovered in reciprocal synthetics and subsequent generations. Differences in transcription between homeologs are not clearly associated with 5' end methylation in either natural or synthetic wheat. Overall, allopolyploidization triggers only minor methylation changes around transcription start sites and start codons of nascent wheat allopolyploids, and these are not statistically associated with differential expression. Although there is a measurable methylation difference between silent and expressed genes in the developing grain, our results do not support the hypothesis that 5' end DNA methylation is engaged in the regulation of gene expression in natural and synthetic wheat. 
While a 'genome shock' hypothesis predicts extensive transcriptomic and epigenetic reorganization after polyploidization, DNA methylation patterns around transcription start sites are generally undisturbed in nascent wheat allohexaploids. Although this stability might indicate importance for gene regulation, a clear relationship between DNA methylation and transcription was not observed either on a genome-wide scale, or among triads of homeologous genes.

ano.nymous@ccsd.cnrs.fr.invalid (Meriem Banouh) 06 May 2026
https://hal.inrae.fr/hal-05613681v1
[hal-05584190] Diversité génétique mondiale du complexe Medicago sativa : implications pour l’amélioration variétale de la luzerne.

Cultivated alfalfa belongs to the Medicago sativa complex, a set of four subspecies (sativa, falcata, caerulea, ×varia) whose taxonomic delimitation remains uncertain, which limits the rational use of its diversity. To assess genetic structure and clarify the relationships among subspecies, wild forms and cultivars, we genotyped about 1,500 accessions using 9,761 SNPs. A discriminant analysis of principal components (DAPC) confirmed the genetic differentiation among the four subspecies. Within each subspecies, a marked geographical structure was found; however, for the cultivars, all of which belong to ssp. sativa, the regional groups largely overlapped. Within the regional groups, the highest numbers of private alleles were observed in the groups of ssp. falcata and ssp. ×varia. The ssp. caerulea groups also show a moderate number of private alleles. In contrast, the ssp. sativa groups, whether cultivated or wild, contain very few or none, with the exception of cultivated accessions from China and Scandinavia. To identify the geographical origins of domestication, we projected the cultivated groups into the genetic space of the wild pools. Chinese, Indo-Middle-Eastern and Afghan-Persian accessions project almost exclusively onto the Central Asian wild group, whereas western cultivars overlap with the Mediterranean wild group. These trajectories suggest the existence of at least two distinct domestication centres. These results pave the way for identifying specific regional gene pools that are still under-exploited, and offer concrete opportunities for making better use of genetic diversity in alfalfa breeding schemes.

ano.nymous@ccsd.cnrs.fr.invalid (Irving Arcia Ruiz) 08 Apr 2026
https://hal.science/hal-05584190v1
[hal-05635518] Platform Intelligence in Agriculture

This paper considers the current challenges of agriculture in historical retrospect. Like all activities, agriculture is orchestrated by information governance systems. These have evolved throughout history following a fairly regular trend of disembedding and concentration: disembedding of information from its origin in the physical world, and concentration under the control of increasingly powerful market actors. The current challenges of agriculture, whether environmental, health, or culture-related, can be shown to relate to blind spots of the information systems. We consider the potential of intermediation platforms, which control multi-sided markets in a growing number of sectors, to reshape the view these systems have, making them more holistic and able to deal with what was previously neglected as externalities. We first consider the issue theoretically, and then illustrate it with the claims of real actors towards a systemic intelligence of agriculture.

ano.nymous@ccsd.cnrs.fr.invalid (Sébastien Grappe) 28 May 2026
https://inria.hal.science/hal-05635518v1
[hal-05563066] Genetic and heat-stress related environmental influences on pig whole-blood gene expression levels

Background: Gene expression levels are affected by genetic and environmental effects. However, quantification of the influence of genetic and environmental effects on gene expression remains limited, especially in farm animals. Here, the relative influence of genetic and heat-related environmental variations on gene expression levels was investigated in pigs, using a backcross herd of diverse heat adaptation levels. Backcross animals were raised in either a tropical or a temperate environment. Animals raised in the temperate environment were subjected to an experimental heat stress at the end of their growth. Results: We identified 1,967 differentially expressed genes (DEGs) between pigs raised in the tropical (n = 181) and temperate (n = 180) facilities, and 472 DEGs throughout a 3-week experimental heat stress. A transcriptome-wide association study (TWAS) identified 139 associations between gene expression levels and thermoregulation/production traits. We detected 6,014 expression quantitative trait loci (eQTLs) associated with the expression level of 3,297 genes. Genetic variance was estimated to explain 36.3% of gene expression variance on average, and was the main source of variance for 27.7% of transcripts. Most eQTLs found are located in regions proximal to their assigned genes (cis-eQTLs) and few in distal regions (trans-eQTLs). A trans-eQTL hotspot highlighted a hematopoietic mechanism driven by GPATCH8. An integration of GWAS and TWAS pointed to TMCO1 and ZNF184 as candidate genes for backfat thickness. Conclusions: This study provides a better understanding of the impact of climate, heat stress and genetic influences on the pig whole blood transcriptome.

ano.nymous@ccsd.cnrs.fr.invalid (Arthur Durante) 23 Mar 2026
https://hal.science/hal-05563066v1
[hal-05506052] CIP-Net: Continual Interpretable Prototype-based Network

Continual learning requires models to learn new tasks over time without forgetting what they have already learned. A key challenge in this setting is catastrophic forgetting, where learning new information causes the model to lose its performance on previous tasks. Recently, explainable AI has been proposed as a promising way to better understand and reduce forgetting. In particular, self-explainable models are useful because they generate explanations during prediction, which can help preserve knowledge. However, most existing explainable approaches use post-hoc explanations or require additional memory for each new task, resulting in limited scalability. In this work, we introduce CIP-Net, an exemplar-free self-explainable prototype-based model designed for continual learning. CIP-Net avoids storing past examples and maintains a simple architecture, while still providing useful explanations and strong performance. We demonstrate that CIP-Net achieves state-of-the-art performance compared to previous exemplar-free and self-explainable methods in both task- and class-incremental settings, while incurring significantly lower memory-related overhead. This makes it a practical and interpretable solution for continual learning.

ano.nymous@ccsd.cnrs.fr.invalid (Federico Di Valerio) 11 Feb 2026
https://hal.science/hal-05506052v1
[hal-05592306] Establishing the ELIXIR Domestic Animals Genome and Phenome Community

The well-being of farmed and companion animals is increasingly recognised as integral to sustainable agroecosystems, companionship, and the One Health approach, which emphasises the interconnected health of people, animals, and the environment. The ELIXIR Domestic Animals Genome and Phenome (DAGP) Community supports genome-to-phenome analyses for farmed and companion animal species. Its aim is to coordinate, discuss, and explore the potential of data technology solutions to address key issues in animal welfare, behaviour, health, infectious diseases, metabolism, nutritional efficiency, and the preservation of genetic diversity and the environment. Through consolidating efforts to develop data standards, coordination, workflows, and visualisation, it will enhance the science underpinning rapidly growing fields in domestic animal genomics, including genome-enabled breeding, population genomics, pangenome analysis, functional genomics, genome editing, paleogenomics, phenotyping, and bio-banking. These standards will adhere to the FAIR data principles and leverage established ontologies to promote best practices in data coordination and archiving. This white paper, prepared by the ELIXIR DAGP Focus Group, summarises the current data infrastructure, resources, and tools available for domestic animal genomics and phenomics, and presents community-led plans and priorities to be implemented to meet the requirements of ELIXIR services and the animal science community. We describe how ELIXIR services can be applied in the domestic animal genomics and phenomics fields, and how we can connect projects and infrastructures that are active in the animal sciences domain. 
We also discuss three key priority areas: i) expanding the FAANG Data Portal for phenotype data with ELIXIR Data Platforms; ii) supporting submissions of new data types across ELIXIR Core Data Resources, including proprietary data from industry partners; and iii) strengthening connections to existing ELIXIR Communities and international consortia. This article provides a set of priorities for a Domestic Animals Genome and Phenome Community in ELIXIR and outlines the next steps to engage across stakeholders and to consolidate data for domestic animal science in Europe.

ano.nymous@ccsd.cnrs.fr.invalid (Emily Clark) 04 May 2026
https://hal.inrae.fr/hal-05592306v1
[hal-05558287] Standardizing plant damage datasets via EPPO taxonomy: A label harmonization approach using large language models

Pests and diseases threaten global crop yields, yet the absence of standardized plant-damage datasets limits progress toward general, robust diagnostic tools. Existing resources differ widely in label conventions and scope, hindering interoperability and model generalization. We introduce a fully automated method for harmonizing plant-damage labels across heterogeneous datasets by mapping them to the European and Mediterranean Plant Protection Organization (EPPO) taxonomy. The approach uses large-language-model (LLM) embeddings to capture semantic similarity among label terms, including synonyms, multilingual variants, and vernacular names. Across multiple mapping strategies, embedding-based similarity using OpenAI’s text-embedding-3-large provided the best performance, reaching an F1 score of 0.836 at optimal thresholds and outperforming string-based Levenshtein matching and other LLM baselines. Applying this method, we unified five expert-curated datasets, including the newly released ePhytia collection, yielding 79,808 images mapped to 1895 EPPO-aligned classes. To assess the value of this harmonization, we finetuned a generalist pretrained Vision Transformer for large-scale plant-damage identification. Models trained on LLM-aligned labels consistently surpassed those trained with edit-distance mappings. On independent EPPO test images, our best model achieved 19.4% top-1 accuracy across 1091 classes and 33.1% on the 100 most common classes, demonstrating feasibility at unprecedented label scale. In-dataset evaluation reached 55.8% top-1 accuracy. By grounding label harmonization in an international standard, this work delivers the first large-scale, taxonomy-compliant dataset for in-field plant-damage recognition and establishes a foundation for interoperable diagnostic tools, farmer-facing mobile systems, and plant-health monitoring. We release both the harmonized dataset and the new ePhytia images to support future research.

ano.nymous@ccsd.cnrs.fr.invalid (Jules Vandeputte) 19 Mar 2026
https://inria.hal.science/hal-05558287v1
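The contrast drawn in this abstract between string-based Levenshtein matching and embedding-based similarity with a threshold can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' pipeline: the 2-D vectors below are made-up stand-ins for real embeddings, the function names are hypothetical, and the threshold of 0.8 is arbitrary.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def match_by_edit_distance(label, taxa):
    """String baseline: the taxon name with the smallest edit distance."""
    return min(taxa, key=lambda name: levenshtein(label.lower(), name.lower()))


def match_by_embedding(label_vector, taxon_vectors, threshold):
    """Most similar taxon by cosine similarity, or None below the threshold."""
    best = max(taxon_vectors, key=lambda name: cosine(label_vector, taxon_vectors[name]))
    return best if cosine(label_vector, taxon_vectors[best]) >= threshold else None


# Toy vectors standing in for real embeddings (invented for illustration only).
TAXON_VECTORS = {
    "Phytophthora infestans": [1.0, 0.0],
    "Botrytis cinerea": [0.0, 1.0],
}
LABEL_VECTORS = {
    "late blight": [0.9, 0.1],   # vernacular name, far from the taxon name as a string
    "unknown spot": [0.7, 0.7],  # ambiguous label, should stay unmapped
}
```

With these toy vectors, `match_by_embedding(LABEL_VECTORS["late blight"], TAXON_VECTORS, 0.8)` maps the vernacular label to a taxon even though the two strings share almost no characters, while the ambiguous label falls below the threshold and is left unmapped.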
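[hal-05632018] STAX : un logiciel pour concevoir et explorer des alternatives socio-techniques cohérentes biophysiquement (English: STAX: software for designing and exploring biophysically consistent socio-technical alternatives)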

As global environmental crises intensify, it is crucial to design socio-technical alternatives (STAs) that respect planetary boundaries. These depend both on available technologies and on societal choices. They must also be biophysically consistent, that is, they must respect material balances. Software such as STAN implements material and energy flow analysis (MEFA) and is used for diagnostic purposes, focusing on correcting measured data and estimating missing flows. Scenario-building tools, for their part, are generally sector-specific (energy, agriculture, etc.) and do not use optimization. Our novel approach applies MEFA to a prospective, multi-sector use (the so-called nexus approach) through the development of software, named STAX (for Socio-Technical Alternatives eXplorer), that helps design and explore STAs. First, the user expresses wishes and constraints on production and consumption within a productive system. This is then converted into a constrained optimization problem, and the SCIP solver finds STAs that settle the trade-offs arising from incompatibilities between the wishes. Beyond the mathematical optimization problems, this raises questions about the explainability of the results and about the user interface needed to support STA design. The tool can be used to imagine socio-technical changes (modes of production, consumption practices, etc.) and also to assess the resilience of a system facing various crises or vulnerabilities, such as a lack of resource availability. The proposed modelling applies to any system with quantifiable flows (material, energy, labour, etc.) that is made up of multiple modes of production and consumption.
So far, we have designed several toy models (up to about fifty variables) and reproduced the results of an article on scenarios for the agri-food sector.

ano.nymous@ccsd.cnrs.fr.invalid (Thibaut Coudroy) 25 May 2026
https://inria.hal.science/hal-05632018v1
[hal-05509112] How long-lived trees remember: Epigenetic memory and priming of drought and heat stress in meristems and embryos

With climate change accelerating the frequency and intensity of heat and drought events, forestry urgently needs strategies that enhance stress tolerance without relying solely on genetic improvement, which in trees requires decades. Priming, pre-exposing plants to mild stress or biological signals to reinforce future responses, offers a promising approach for long-lived species. Unlike annual model plants, trees experience multi-year stress cycles, making priming particularly relevant for forestry, restoration, and climate-adaptive management. Our research focuses on developmental windows and cell dividing tissues with high potential for epigenetic memory, somatic embryos and meristems, examined under water deficit, thermal stress, biochar amendment, and mycorrhizal symbiosis. Across experiments, we observe persistent molecular signatures lasting weeks to seasons, and in some cases trans-annual memory. In contrast to short-lived species where histone modifications dominate, trees often display stronger involvement of DNA methylation in these persistent states, consistent with our recent findings in maritime pine embryogenesis and poplar cambium (Trontin et al., 2025; Duplan et al., 2025; and ongoing work). More recently, we investigated how biochar and beneficial root symbioses interact with drought priming in poplar. These studies form the basis of long-term research frameworks and national programs, including EPIMYC (ANR-24-CE20-5751) and the PEPR Agroecology & Digital initiative (ANR-24-PEAE-0001). Ultimately, our goal is to integrate omics layers to build predictive models of priming responsiveness and epigenetic plasticity, enabling identification of biomarkers and management-ready diagnostic tools to guide climate-adaptive forestry.
References
1. Trontin, J.F., Sow, M.D., Delaunay, A., Modesto, I., Teyssier, C., Reymond, I., Canlet, F., Boizot, N., Le Metté, C., Gibert, A., Chaparro, C., Daviaud, C., Tost, J., Miguel, C., Lelu-Walter, M.A., & Maury, S. 2025. Epigenetic memory of temperature sensed during somatic embryo maturation in 2-yr-old maritime pine trees. Plant Physiology, 197(2), kiae600. https://doi.org/10.1093/plphys/kiae600
2. Duplan, A., Feng, Y.Q., Laskar, G., Cai, B.D., Segura, V., Delaunay, A., Le Jan, I., Daviaud, C., Toumi, A., Laurans, F., Sow, M.D., Rogier, O., Poursat, P., Duruflé, H., Jorge, V., Sanchez, L., Cochard, H., Allona, I., Tost, J., Fichot, R., & Maury, S. 2025. Drought induced epigenetic memory in the cambium of poplar trees persists and primes future stress responses. bioRxiv 2025.10.14.681991. https://doi.org/10.1101/2025.10.14.681991

ano.nymous@ccsd.cnrs.fr.invalid (Stéphane Maury) 13 Feb 2026
https://hal.science/hal-05509112v1
[hal-05521725] Modeling breeding programs considering social behavior in large groups of farmed fish

Breeding programs are essential in aquaculture, improving economically and environmentally important traits. In aquaculture systems, animals are raised in large groups, where social interactions are frequent and can influence individual performance. In these circumstances, indirect genetic effects can play an important role in the response to selection, and consequently, their effects on selection outcomes must be analyzed. This study aimed to evaluate the implications of heterogeneous social interaction effects on fish breeding programs using stochastic simulations. We simulated a fish breeding program with 2000 selection candidates from 1000 families formed by a partial mating design of 100 males and 100 females. Social interactions were simulated, affected by the target phenotype and two latent-personality traits. We investigated how genetic gains and phenotypic variances are affected by the magnitude and direction of social interaction effects on the target phenotype, different selection strategies, and the genetic correlations between the target phenotype and personality traits. Our results showed that increased social interaction effects lead to greater phenotypic variability in the target trait. Under mass selection, the genetic means of personality traits change, and these changes depend on the strength and direction of genetic correlations between the focal and personality traits. Conversely, group selection did not increase phenotypic variability but reduced genetic gain for the focal trait compared to mass selection. Moreover, group selection did not alter the genetic means of personality traits. However, this approach increased the rate of inbreeding per generation, which could be mitigated by optimizing the number of families per group.

ano.nymous@ccsd.cnrs.fr.invalid (Gabriel Rovere) 21 Feb 2026
https://hal.science/hal-05521725v1
[hal-05500190] Epigenetic regulation of mycorrhizal symbioses: from plastic responses to transgenerational legacies

Mycorrhizal symbioses represent one of the most widespread and ecologically significant plant–microbe interactions, shaping plant nutrition, stress resilience, and ecosystem functioning. Beyond their role in nutrient exchange and systemic defense, growing evidence suggests that these symbioses also influence plant plasticity within and across generations through epigenetic regulation. These mechanisms operate throughout the mutualistic interaction, from fungal recognition and root colonization to symbiosis functioning, by regulating gene networks that control signaling, defense suppression, and nutrient exchange. By integrating environmental cues into potentially heritable gene regulatory states, epigenetic regulation fine‐tunes within‐generation responses and may also contribute to effects across generations, thereby influencing adaptation and resilience. The extent of mycorrhiza‐induced epigenetic inheritance likely depends on the host's reproductive strategy and lifespan. Clonal propagation and shorter‐lived hosts tend to preserve epigenetic marks, whereas sexual reproduction and longer‐lived species show partial resetting. This contrast shapes offspring performance, ecological interactions, and evolutionary trajectories. Here, we synthesize current knowledge on the epigenetic regulation of mycorrhizal symbioses, draw parallels with other plant–microorganism interactions (including plant–pathogens and plant–endophytes), highlight its role in within‐generation plasticity and propose a potential role across generations. We outline future research directions to disentangle the stability, ecological relevance, and evolutionary significance of mycorrhiza‐mediated epigenetic inheritance.

ano.nymous@ccsd.cnrs.fr.invalid (Gerson Beltrán-Torres) 24 Apr 2026
https://hal.inrae.fr/hal-05500190v1
[hal-05494492] Modelling and predicting soil microbial communities at large spatial scale based on metagenomic dimensionality reduction

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Emna Stambouli) 05 Feb 2026
https://inria.hal.science/hal-05494492v1
[hal-05572258] AgroEcoPhen

Agroecology relies on mobilizing biological diversity to improve the resilience of agrosystems and provide ecosystem services. Implementing it requires tools to assess the performance and stability of these agrosystems, taking biotic and abiotic interactions into account. Emerging technologies (sensors, IoT, AI) make it possible to collect high-resolution data to better understand and predict these systems.

ano.nymous@ccsd.cnrs.fr.invalid (Tania Rougier) 30 Mar 2026
https://hal.inrae.fr/hal-05572258v1
[hal-05512364] The Agricultural Soil Digital Twin : A Key Tool for Agroecological Transition

Effective protection requires better soil management, and better management begins with clarity: how soils function needs to be made understandable to everyone (Thorsøe et al., 2023).
• The challenge: Modeling soil functioning is hindered by high spatial and temporal variability, non-linear interactions, and computational barriers (Ilić et al., 2025).
• Research question: How can Digital Twin frameworks accurately model complex soil dynamics to support robust agroecological decision-making while overcoming computational limitations?

ano.nymous@ccsd.cnrs.fr.invalid (Aziz Hafsia) 23 Feb 2026
https://hal.science/hal-05512364v1
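[hal-05496194] WAIT4 : Intelligence artificielle et nouvelles technologies pour évaluer des indicateurs pertinents de bien-être pour des animaux confrontés aux défis de la transition agroécologique - contribution au continuum numérique (English: WAIT4: Artificial intelligence and new technologies to assess relevant welfare indicators for animals facing the challenges of the agroecological transition - contribution to the digital continuum)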

Improving animal welfare is essential to building sustainable food systems. Agricultural equipment (sensors, cameras, automated devices) combined with artificial intelligence (AI) can make it possible to assess the welfare of animals and herds in real time. This is particularly useful given the challenges posed by climate change and the agroecological transitions of livestock systems, as it provides tools and methods to anticipate risks and act effectively.

ano.nymous@ccsd.cnrs.fr.invalid (Florence Gondret) 05 Feb 2026
https://hal.inrae.fr/hal-05496194v1
[hal-05514911] BReIF: une e-infrastructure pour accélérer l'utilisation de ressources biologiques diversifiées (English: BReIF: an e-infrastructure to accelerate the use of diversified biological resources)

Characterizing genetic resources generates massive amounts of highly diverse data that must be analysed, managed, made reusable and integrated in order to turn them into actionable knowledge.

ano.nymous@ccsd.cnrs.fr.invalid (Anne-Françoise Adam-Blondon) 17 Feb 2026
https://hal.inrae.fr/hal-05514911v1
[hal-04603038] Cooperative learning of Pl@ntNet's Artificial Intelligence algorithm: how does it work and how can we improve it?

Deep learning models for plant species identification rely on large annotated datasets. The PlantNet system enables global data collection by allowing users to upload and annotate plant observations, leading to noisy labels due to diverse user skills. Achieving consensus is crucial for training, but the vast scale of collected data makes traditional label aggregation strategies challenging. Existing methods either retain all observations, resulting in noisy training data, or selectively keep those with sufficient votes, discarding valuable information. Additionally, as many species are rarely observed, user expertise cannot be evaluated as inter-user agreement: otherwise, botanical experts would have a lower weight in the AI training step than the average user. Our proposed label aggregation strategy aims to cooperatively train plant identification AI models. This strategy estimates user expertise as a trust score per user based on their ability to identify plant species from crowdsourced data. The trust score is recursively estimated from correctly identified species given the current estimated labels. This interpretable score exploits botanical experts' knowledge and the heterogeneity of users. Subsequently, our strategy removes unreliable observations but retains those with limited trusted annotations, unlike other approaches. We evaluate PlantNet's strategy on a released large subset of the PlantNet database focused on European flora, comprising over 6M observations and 800K users. We demonstrate that estimating users' skills based on the diversity of their expertise enhances labeling performance. Our findings emphasize the synergy of human annotation and data filtering in improving AI performance for a refined dataset. We explore incorporating AI-based votes alongside human input. This can further enhance human-AI interactions to detect unreliable observations.

ano.nymous@ccsd.cnrs.fr.invalid (Tanguy Lefort) 06 Dec 2024
https://hal.science/hal-04603038v2
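The recursive idea in this abstract (labels estimated from trust-weighted votes, trust re-estimated from the species each user identified correctly, unreliable observations filtered out) can be sketched as a small fixed-point loop. This is a hedged illustration only: the update rule `1 + number of distinct species correctly identified`, the `min_score` threshold and all function names are assumptions made for the example, not the formula or parameters used by Pl@ntNet.

```python
from collections import defaultdict


def tally(votes, trust):
    """Sum the trust of the users voting for each species, per observation."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, observation, species in votes:
        scores[observation][species] += trust[user]
    return scores


def aggregate(votes, n_rounds=5, min_score=3.0):
    """Alternate between label estimation and trust estimation.

    votes: list of (user, observation, species) tuples.
    Returns (kept_labels, trust). The trust rule is hypothetical.
    """
    users = {user for user, _, _ in votes}
    trust = {user: 1.0 for user in users}
    for _ in range(n_rounds):
        scores = tally(votes, trust)
        # Current label = species with the highest trust-weighted score
        # (ties broken alphabetically so the result is deterministic).
        labels = {obs: max(sorted(s), key=s.get) for obs, s in scores.items()}
        correct = defaultdict(set)
        for user, observation, species in votes:
            if labels[observation] == species:
                correct[user].add(species)
        # Assumed rule: trust grows with the diversity of species identified.
        trust = {user: 1.0 + len(correct[user]) for user in users}
    # Keep only observations whose winning species gathered enough trust.
    kept = {}
    for obs, s in tally(votes, trust).items():
        best = max(sorted(s), key=s.get)
        if s[best] >= min_score:
            kept[obs] = best
    return kept, trust


VOTES = [
    ("ana", "o1", "Acer campestre"), ("bob", "o1", "Acer campestre"),
    ("ana", "o2", "Fagus sylvatica"), ("cy", "o2", "Fagus sylvatica"),
    ("ana", "o3", "Tilia cordata"), ("dee", "o3", "Tilia cordata"),
    ("ana", "o4", "Betula pendula"), ("gus", "o4", "Betula pendula"),
    ("ana", "o5", "Quercus robur"),
    ("eve", "o5", "Quercus petraea"), ("fay", "o5", "Quercus petraea"),
    ("hal", "o6", "Salix alba"),
    ("ana", "o7", "Ulmus minor"),
]
```

On this toy data, a plain majority vote would label `o5` as Quercus petraea, whereas the trust-weighted loop ends up following the user who identified many different species. The single vote of a low-trust user (`o6`) is filtered out, while the single vote of a high-trust user (`o7`) is retained.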
[hal-05511164] Des réseaux de neurones sur graphes auto-explicatifs basés sur la logique (English: Logic-based self-explainable graph neural networks)

Graphs are complex, non-Euclidean structures that require specialized models such as Graph Neural Networks (GNNs) to effectively capture the relational patterns associated with the class variable. This intrinsic complexity makes it particularly difficult to explain the decisions made by GNNs. Most current explainable artificial intelligence (XAI) methods applied to GNNs focus on identifying influential nodes or extracting relevant subgraphs, without clarifying how these elements actually contribute to the final prediction. To overcome this limitation, logic-based approaches aim to derive explicit rules that reflect the model's reasoning. However, existing logic-based methods remain mostly post-hoc and are limited to graph classification, leaving a significant gap in intrinsically explainable architectures. In this paper, we integrate logical reasoning directly into the graph learning model. We introduce LogiX-GIN, a new self-explainable GNN architecture that incorporates logic layers to produce interpretable logic rules at the very heart of the learning process. Unlike post-hoc approaches, LogiX-GIN provides explanations that are transparent, faithful and consistent with the model's internal computations. Evaluated on several graph-based tasks, LogiX-GIN achieves competitive predictive performance while making its decision process explicit. This work was accepted at NeurIPS 2025.

ano.nymous@ccsd.cnrs.fr.invalid (Alessio Ragno) 14 Feb 2026
https://hal.science/hal-05511164v1
[hal-05558414] Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models

Information retrieval with compact binary codes, also referred to as hashing, is crucial for scalable fast search applications, yet state-of-the-art hashing methods require expensive, scenario-specific training. In this work, we introduce Hashing-Baseline, a strong training-free hashing method leveraging powerful pre-trained encoders that produce rich embeddings. We revisit classical, training-free hashing techniques (principal component analysis, random orthogonal projection, and threshold binarization) to produce a strong baseline for hashing. Our approach combines these techniques with frozen embeddings from state-of-the-art vision and audio encoders to yield competitive retrieval performance without any additional learning or fine-tuning. To demonstrate the generality and effectiveness of this approach, we evaluate it on standard image retrieval benchmarks as well as a newly introduced benchmark for audio hashing.

ano.nymous@ccsd.cnrs.fr.invalid (Ilyass Moummad) 18 Mar 2026
https://inria.hal.science/hal-05558414v1
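Two of the classical steps named in this abstract, random orthogonal projection and threshold binarization, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the PCA step is omitted, the embeddings are assumed to be plain lists of floats produced by some frozen encoder, the threshold is fixed at zero after mean-centring, and all function names are hypothetical.

```python
import random


def random_orthogonal_rows(n_bits, dim, seed=0):
    """Draw Gaussian vectors and orthonormalise them (Gram-Schmidt)."""
    if n_bits > dim:
        raise ValueError("n_bits must not exceed the embedding dimension")
    rng = random.Random(seed)
    rows = []
    while len(rows) < n_bits:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove the components along earlier rows
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows


def fit_mean(embeddings):
    """Per-dimension mean, used to centre embeddings before thresholding."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def hash_code(embedding, mean, projection):
    """Centre, project, then binarise each projected coordinate at zero."""
    centred = [x - m for x, m in zip(embedding, mean)]
    return tuple(
        1 if sum(p * c for p, c in zip(row, centred)) > 0 else 0
        for row in projection
    )


def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def search(query_code, database_codes, k=1):
    """Indices of the k database codes closest to the query in Hamming distance."""
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming(query_code, database_codes[i]))
    return ranked[:k]
```

In use, `fit_mean` and `random_orthogonal_rows` would be called once on the database embeddings, every embedding would be turned into a short bit tuple with `hash_code`, and queries would be answered by `search` on those codes alone, with no training step involved.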

Last modified: 25 November 2025 | Created: 17 July 2025 | Written by: AgroEcoNum

