AI & ML interests

Creating NLP models and datasets for French, for very long sequences, and for the combination of the two ;)

Recent Activity

bourdoiscatie updated a collection about 1 month ago: NanoBEIR-fr 🍺
bourdoiscatie updated a dataset about 1 month ago: CATIE-AQ/NanoBEIR-fr
bourdoiscatie published a dataset about 1 month ago: CATIE-AQ/NanoBEIR-fr

CATIE-AQ's collections (21)

CATIE French sparse embedding
A few experiments conducted after the release of Sentence Transformers v5.0. They can be seen as a V0 ahead of more powerful French sparse models.
French caption datasets
Datasets with an image, a prompt question (such as "describe this image"), and an answer. They can be used to train VLMs.
CATIE French DPO and conversation datasets
By conversation we mean multi-turn exchanges. For classic single-turn prompts, see the CATIE French prompts datasets collection.
French visual retriever datasets
Datasets with an image and a question. They can be used to train visual retrievers (ColPali and similar models).
CATIE French NER pack
CamemBERT models fine-tuned for NER (3 or 4 entity types), the datasets used to train them (420,000 and 385,000 rows, respectively), and a Space demo.