---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- rag
library_name: transformers
---
<div align="center">
<b style="font-size: 40px;">Ext2Gen-8B-R2</b>
</div>

Note: We are still working on this.
Are you looking for a more robust and reliable generation model for your RAG system?
Ext2Gen-8B-R2 effectively mitigates hallucinations caused by retrieval noise and information overload.
See the details in our paper: [Link](https://arxiv.org/pdf/2503.04789)
### What is Ext2Gen-8B-R2?

Ext2Gen-8B-R2 is built upon Llama-3.1-8B-Instruct, incorporating preference-aligned fine-tuning through pairwise feedback learning.

This training strategy enables the model to:
- Extract highly relevant sentences from retrieved chunks before generating an answer.
- Filter out irrelevant or misleading information, reducing hallucinations.
- Align generation with human preferences by optimizing for faithfulness, completeness, and conciseness.
### Why does Ext2Gen-8B-R2 outperform standard RAG models?

Standard RAG models often struggle due to:
- **Uncertain Placement**: Relevant information may appear in unpredictable locations within retrieved chunks, making it difficult for LLMs to utilize it effectively.
- **Information Overload**: The presence of irrelevant chunks can distract the model, leading to errors or hallucinations.
- **Lack of Alignment**: Most generation models are not explicitly trained to prioritize relevant content over noise.
### Need Faster Inference?

Ext2Gen first writes out the sentences relevant to the query before generating the answer, so it incurs extra latency before the answer appears.
If you don't need to see the extracted sentences and want the answer directly with lower latency, use its variant, Gen-8B-R2.

Link: https://huggingface.co/DISLab/Gen-8B-R2

This model skips the sentence extraction phase but retains robustness comparable to Ext2Gen-8B-R2.
### Recommended Prompt

- query: the query to answer
- chunk_list: the list of retrieved chunks, e.g., ["chunk 1", "chunk 2", "chunk 3"]
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DISLab/Ext2Gen-8B-R2")

def prepare_sample_text(prompt):
    # Wrap the prompt in the model's chat template (returned as a string).
    row_json = [{"role": "user", "content": prompt}]
    return tokenizer.apply_chat_template(row_json, tokenize=False)

def format_prompt_template(query, chunk_list):
    # Prefix each chunk with a 1-based ID and join them, one chunk per line.
    chunk_list = ['[Chunk ID: ' + str(idx + 1) + '] ' + chunk_text
                  for idx, chunk_text in enumerate(chunk_list)]
    chunk_list = '\n'.join(chunk_list)
    prompt = '''
You are an expert assistant trained to extract essential sentences from document chunks and generate answers based on the extracted sentences.
Your task is twofold:
- Extraction: Identify sentences that contribute to constructing a precise and accurate response to the given query.
- Generation: Formulate a concise and coherent answer based on the extracted sentences.
### Extraction Instruction:
- A query will be provided for you to answer.
- Extract only the sentences that contribute to forming an answer to the query.
- Ensure that the extracted sentences are sufficient to derive a correct and complete answer.
- If no relevant sentences are found in the provided chunks, return an empty list.
### Generation Instruction:
- Use the extracted sentences to generate a well-formed answer to the query.
- If no sentences are extracted, return "No Answer".
### Output Example:
Extracted Sentences:
- Sentence 1
- Sentence 2
Answer: Your Answer
### Query:
%s
### Chunk List:
%s
### Output:
''' % (query, chunk_list)
    return prompt.strip()

# `query` (str) and `chunk_list` (list of str) are your retrieval inputs.
prompt = format_prompt_template(query, chunk_list)
prompt = prepare_sample_text(prompt)
```
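For instance, the `query` and `chunk_list` inputs assumed above might look like the following sketch. The values are made up for illustration (the first chunk echoes the example output shown below); with these defined, the two calls at the end of the snippet produce the final prompt string.

```python
query = "How many people died at the Chelmno camp?"  # hypothetical query
chunk_list = [
    "The estimated number of deaths is 150-300,000, mainly Jews.",  # relevant chunk
    "This chunk is unrelated retrieval noise about a different topic.",  # noise
]
```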
Note that this prompt outputs both the extracted relevant sentences and the answer to the query.
The output follows a consistent format, as in the example below.

```
Extracted Sentences:
- The estimated number of deaths is 150-300,000, mainly Jews.
Answer: The estimated number of deaths at Chelmno is 150-300,000, mainly Jews.
```

The number of extracted sentences varies depending on the QA.
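Because the format is consistent, the two parts can be separated with simple string handling. Below is a minimal sketch; the helper name and parsing rules are our assumptions based on the format above, not part of the model's API.

```python
def parse_ext2gen_output(text):
    # Hypothetical helper: split model output into extracted sentences
    # and the final answer, following the format shown above.
    sentences, answer = [], "No Answer"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("- "):
            sentences.append(line[2:])          # one extracted sentence per bullet
        elif line.startswith("Answer:"):
            answer = line[len("Answer:"):].strip()
    return sentences, answer
```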
### Recommended Generation Parameters

```python
max_new_tokens=1024,  # or 2048
do_sample=True,
temperature=0.8,
top_p=0.9,
```
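For reference, here is a minimal sketch of how these parameters might be wired into a standard `transformers` generation call, using the `prompt` built in the Recommended Prompt section. The dtype and device placement are our assumptions, not taken from the card; adjust them to your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DISLab/Ext2Gen-8B-R2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # assumption: choose a dtype that fits your GPU
    device_map="auto",
)

# The chat template already inserts special tokens, so skip adding them again.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # or 2048
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
# Decode only the newly generated tokens, dropping the echoed prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
print(response)
```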
### Performance Benchmark

Our evaluations demonstrate that Ext2Gen-8B-R2 significantly enhances robustness in RAG systems:
* We conduct a QA task using RAG systems on the NQ, MS-MARCO, and HotpotQA datasets.
* The only difference is the generation backbone: Llama-3.1-8B-Instruct vs. Ext2Gen-8B-R2.

See the results in the figure below:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65ca510c8dbbd696c4e42f85/sM4n_sqM2TgHCnNyQCRvd.png)