French resources (datasets & models) I developed to empower use cases in French
Loïck BOURDOIS PRO
AI & ML interests
👀
FAT5
Flash Attention T5 (FAT5) models developed when I worked at CATIE (https://hf.co/CATIE-AQ).
French NER
NER models & datasets developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 185,000 downloads.
CATIE-AQ/Moderncamembert_3entities
Token Classification • 0.1B params • 1 download • 1 like
CATIE-AQ/NERmemberta-3entities
Token Classification • 0.1B params • 374 downloads • 1 like
CATIE-AQ/NERmembert-base-3entities
Token Classification • 0.1B params • 55 downloads • 2 likes
CATIE-AQ/NERmembert-large-3entities
Token Classification • 0.3B params • 15 downloads • 2 likes
French paraphrase dataset
French prompts datasets
French prompt datasets developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 90,000 downloads.
French think and toolcalling datasets
French VQA datasets
VQA datasets I cleaned, where each example pairs an image with a question and its answer.
Can be used to train VLMs.
French OCR datasets
Datasets I cleaned, where each example pairs an image with a prompt (e.g. "transcribe the text in this image") and an answer.
Can be used to train VLMs.
French table-to-text datasets
In 2021, before the release of LoRA, I was interested in prefix-tuning, which I wanted to apply to French. This meant translating table-to-text data into French.
French Courses Translations
Things I've translated: courses, blog posts, guides. More on my personal blog (https://lbourdois.github.io/blog/).
Free online AI courses in French
📚 French translations of five AI courses
lbourdois/en-fr-nyu-dl-course-corpus
Dataset • 3.13k rows • 63 downloads • 1 like
SSM Blog Posts
📝 Blog posts about State Space Models (SSM)
Guide sur l'évaluation des LLM (LLM evaluation guide)
⚖ Translation of Clémentine Fourrier's guide
Breton packs
Breton resources (datasets & models) I developed to empower use cases in Breton
French QA
QA models & datasets developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 160,000 downloads.
French summarization datasets
French DPO and conversation datasets
French embedding datasets
French datasets to train or evaluate embedding models.
French caption datasets
Datasets I cleaned, where each example pairs an image with a prompt (e.g. "describe this image") and an answer.
Can be used to train VLMs.
French retriever datasets
Datasets I cleaned, where each example pairs an image with a question.
Can be used to train visual retrievers (ColPali and similar models).
CATIE-AQ/retriever-vidore-vdsid_french-clean
Dataset • 5k rows • 12 downloads
CATIE-AQ/retriever-vidore-tabfquad_test_subsampled-clean
Dataset • 280 rows • 8 downloads
CATIE-AQ/retriever-manu-tabfquad_retrieving-clean
Dataset • 1.83k rows • 24 downloads
CATIE-AQ/retriever-princeton-nlp-CharXiv-clean
Dataset • 1.32k rows • 16 downloads
French audio datasets (pretraining)
Around 117K hours of French audio for research purposes
French packs
French resources (datasets & models) I developed to empower use cases in French