putazon/SearchQueryNER-BERT

	---
	library_name: transformers
	license: mit
	base_model: bert-base-cased
	tags:
	- generated_from_trainer
	metrics:
	- precision
	- recall
	- f1
	- accuracy
	model-index:
	- name: searchqueryner-be
	  results: []
	datasets:
	- putazon/searchqueryner-100k
	language:
	- en
	- es
	pipeline_tag: token-classification
	---

	# SearchQueryNER-BERT

	This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on the [SearchQueryNER-100k](https://huggingface.co/datasets/putazon/searchqueryner-100k) dataset. It achieves the following results on the evaluation set:
	- Loss: 0.0005
	- Precision: 0.9999
	- Recall: 0.9999
	- F1: 0.9999
	- Accuracy: 0.9999
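For reference, entity-level precision, recall, and F1 can be computed from gold and predicted spans roughly as in the sketch below. The card does not say which metric implementation produced the numbers above, so exact span matching is an assumption here, and the label names `PROFESSION` and `LOCATION` are hypothetical placeholders (the 13 entity types are not listed in this card).

```python
from typing import Set, Tuple

# A span is (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def span_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels and spans, for illustration only.
gold = {(0, 1, "PROFESSION"), (2, 3, "LOCATION")}
pred = {(0, 1, "PROFESSION"), (2, 4, "LOCATION")}  # second span has a wrong boundary
print(span_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Under exact matching, a span with a wrong boundary counts as both a false positive and a false negative, which is why one boundary error halves all three scores in this toy case.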

	## Model description

	This model is fine-tuned for Named Entity Recognition (NER) on search queries: it assigns an entity label to each token of a short query so that structured entities can be extracted from it. It was trained on the SearchQueryNER-100k dataset, which contains 13 entity types.

	## Intended uses & limitations

	### Intended uses:
	- Extracting named entities such as locations, professions, and attributes from user search queries.
	- Optimizing search engines by improving query understanding.
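A minimal inference sketch for the uses listed above, assuming the `transformers` library is installed and the model can be downloaded from the Hub. The helper names `load_ner` and `group_entities` are illustrative and not part of this repository, and the labels returned depend on the model's own configuration.

```python
from typing import Dict, List

MODEL_ID = "putazon/SearchQueryNER-BERT"  # repository id of this model card


def load_ner(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    from transformers import pipeline  # lazy import; requires `transformers`

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def group_entities(predictions: List[dict]) -> Dict[str, List[str]]:
    """Collect the text of each predicted span under its entity label."""
    grouped: Dict[str, List[str]] = {}
    for item in predictions:
        grouped.setdefault(item["entity_group"], []).append(item["word"])
    return grouped


# Usage (needs network access to download the model):
#   ner = load_ner()
#   print(group_entities(ner("plumber in Madrid")))
```

With an aggregation strategy set, each pipeline result is a dictionary with `entity_group`, `word`, `score`, `start`, and `end` keys, which is the shape `group_entities` expects.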

	### Limitations:
	- The model may not generalize well to domains outside of search queries.

	## Training and evaluation data

	The training and evaluation data come from the [SearchQueryNER-100k](https://huggingface.co/datasets/putazon/searchqueryner-100k) dataset. The dataset consists of tokenized search queries annotated with 13 entity types, split into training, validation, and test sets:
	- Training set: 102,931 examples
	- Validation set: 20,420 examples
	- Test set: 20,301 examples

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: ADAMW_TORCH with betas=(0.9,0.999), epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 3
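The list above maps onto `transformers.TrainingArguments` roughly as sketched below. This is a reconstruction, not the author's training script: `output_dir` is a hypothetical placeholder, and treating the batch sizes as per-device values assumes training on a single device.

```python
# Values copied from the hyperparameter list; anything else is marked as assumed.
TRAINING_KWARGS = {
    "output_dir": "searchqueryner-bert",  # hypothetical path, not from the card
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,  # assumes a single device
    "per_device_eval_batch_size": 8,   # assumes a single device
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
}


def build_training_args():
    """Create TrainingArguments from the values above (requires `transformers`)."""
    from transformers import TrainingArguments  # lazy import

    return TrainingArguments(**TRAINING_KWARGS)
```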

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss | Precision | Recall | F1     | Accuracy |
	|:-------------:|:-----:|:-----:|:---------------:|:---------:|:------:|:------:|:--------:|
	| 0.0011        | 1.0   | 12867 | 0.0009          | 0.9999    | 0.9999 | 0.9999 | 0.9999   |
	| 0.002         | 2.0   | 25734 | 0.0004          | 0.9999    | 0.9999 | 0.9999 | 0.9999   |
	| 0.0005        | 3.0   | 38601 | 0.0005          | 0.9999    | 0.9999 | 0.9999 | 0.9999   |
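The Step column is consistent with the training set size and batch size reported in this card, as the short check below shows. It assumes one device and no gradient accumulation, neither of which the card states. The `linear_lr` helper sketches what a linear schedule would give at each step, with zero warmup as a further assumption.

```python
import math

TRAIN_EXAMPLES = 102_931
BATCH_SIZE = 8
EPOCHS = 3
BASE_LR = 2e-5

# Assumes one device and no gradient accumulation (not stated in the card).
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Linear warmup, then linear decay to zero; zero warmup is an assumption."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - warmup_steps)


print(steps_per_epoch, total_steps)  # 12867 38601
print(linear_lr(steps_per_epoch))    # about two thirds of BASE_LR after epoch 1
```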

	### Framework versions

	- Transformers 4.48.1
	- PyTorch 2.5.1+cu124
	- Datasets 3.2.0
	- Tokenizers 0.21.0