---
license: apache-2.0
language:
- multilingual
- en
- de
- fr
- es
- pt
- nl
base_model: distilbert-base-multilingual-cased
tags:
- token-classification
- semantic-parsing
- hypergraph
- nlp
pipeline_tag: token-classification
library_name: transformers
---
# Atom Classifier

A multilingual token classifier for **semantic hypergraph parsing**. It classifies each token in a sentence into one of 39 semantic atom types/subtypes, serving as the first stage (alpha) of the [Alpha-Beta semantic hypergraph parser](https://github.com/hyperquest-hq/hyperbase-parser-ab).

## Model Details

- **Architecture:** DistilBertForTokenClassification
- **Base model:** distilbert-base-multilingual-cased
- **Labels:** 39 semantic atom types
- **Max sequence length:** 512
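
Inputs longer than the 512-token limit have to be truncated or split before classification. The following is a minimal sketch of one common approach, overlapping windows over a list of token ids. The helper name `split_into_windows`, the default window of 510 ids (512 minus the `[CLS]` and `[SEP]` tokens that a BERT-style tokenizer adds to a single sequence), and the overlap of 32 are illustrative assumptions, not part of this model or of the parser.

```python
def split_into_windows(token_ids, max_len=510, overlap=32):
    """Split token ids into overlapping windows of at most max_len ids.

    Hypothetical helper: max_len and overlap are illustrative defaults.
    """
    if max_len <= overlap:
        raise ValueError("max_len must be larger than overlap")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        # Stop once the current window reaches the end of the input.
        if start + max_len >= len(token_ids):
            break
        start += step
    return windows


# Toy input: 1200 fake token ids instead of real tokenizer output.
windows = split_into_windows(list(range(1200)))
print([len(w) for w in windows])  # [510, 510, 244]
```

Tokens that fall into the overlap are classified twice, so a caller would need to decide which prediction to keep, for example the one from the window in which the token sits further from the edge.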
## Label Taxonomy

Atoms are typed according to the [Semantic Hyperedge (SH) notation system](https://hyperquest.ai/hyperbase/manual/notation/). The 7 main types and their subtypes:

### Concepts (C)

| Label | Description |
|-------|-------------|
| `C` | Generic concept |
| `Cc` | Common noun |
| `Cp` | Proper noun |
| `Ca` | Adjective (as concept) |
| `Ci` | Pronoun |
| `Cd` | Determiner (as concept) |
| `Cm` | Nominal modifier |
| `Cw` | Interrogative word |
| `C#` | Number |

### Predicates (P)

| Label | Description |
|-------|-------------|
| `P` | Generic predicate |
| `Pd` | Declarative predicate |
| `P!` | Imperative predicate |

### Modifiers (M)

| Label | Description |
|-------|-------------|
| `M` | Generic modifier |
| `Ma` | Adjective modifier |
| `Mc` | Conceptual modifier |
| `Md` | Determiner modifier |
| `Me` | Adverbial modifier |
| `Mi` | Infinitive particle |
| `Mj` | Conjunctional modifier |
| `Ml` | Particle |
| `Mm` | Modal (auxiliary verb) |
| `Mn` | Negation |
| `Mp` | Possessive modifier |
| `Ms` | Superlative modifier |
| `Mt` | Prepositional modifier |
| `Mv` | Verbal modifier |
| `Mw` | Specifier |
| `M#` | Number modifier |
| `M=` | Comparative modifier |
| `M^` | Degree modifier |

### Builders (B)

| Label | Description |
|-------|-------------|
| `B` | Generic builder |
| `Bp` | Possessive builder |
| `Br` | Relational builder (preposition) |

### Triggers (T)

| Label | Description |
|-------|-------------|
| `T` | Generic trigger |
| `Tt` | Temporal trigger |
| `Tv` | Verbal trigger |

### Conjunctions (J)

| Label | Description |
|-------|-------------|
| `J` | Generic conjunction |
| `Jr` | Relational conjunction |

### Special

| Label | Description |
|-------|-------------|
| `X` | Excluded token (punctuation, etc.) |
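
Every label in the tables above consists of a one-letter main type followed by an optional one-character subtype. As a small sketch of how downstream code might take a label apart, the helper below splits it into a readable main type name and the subtype. The names `MAIN_TYPES` and `split_label` are hypothetical and not part of `hyperbase` or the parser.

```python
# Main type letters as listed in the label taxonomy tables.
MAIN_TYPES = {
    "C": "concept",
    "P": "predicate",
    "M": "modifier",
    "B": "builder",
    "T": "trigger",
    "J": "conjunction",
    "X": "excluded",
}


def split_label(label):
    """Return (main_type_name, subtype) for a label such as 'Cp' or 'M#'.

    The subtype is None for generic labels such as 'C' or 'X'.
    """
    main, subtype = label[0], label[1:]
    if main not in MAIN_TYPES:
        raise ValueError(f"unknown main type in label {label!r}")
    return MAIN_TYPES[main], subtype or None


print(split_label("Cp"))  # ('concept', 'p')
print(split_label("M#"))  # ('modifier', '#')
print(split_label("X"))   # ('excluded', None)
```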
## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("hyperquest/atom-classifier")
model = AutoModelForTokenClassification.from_pretrained("hyperquest/atom-classifier")

sentence = "Berlin is the capital of Germany."
# Offsets map each subword token back to its character span in the sentence.
encoded = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
# The model's forward pass does not accept offset_mapping, so remove it first.
offset_mapping = encoded.pop("offset_mapping")

with torch.no_grad():
    outputs = model(**encoded)

# Pick the highest-scoring label id for every token of the single input.
predictions = outputs.logits.argmax(-1)[0].tolist()
word_ids = encoded.word_ids(0)

# Print one line per subword token, skipping special tokens (word_id is None).
for idx, word_id in enumerate(word_ids):
    if word_id is not None:
        start, end = offset_mapping[0][idx].tolist()
        label = model.config.id2label[predictions[idx]]
        print(f"{sentence[start:end]:15s} -> {label}")
```
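
The loop above prints one line per subword token, so a word that the tokenizer splits into several pieces appears several times. If one label per word is wanted, a simple option is to keep the prediction of each word's first subword. The sketch below shows that aggregation on plain lists. The function name `first_subtoken_labels` is hypothetical, the toy `id2label` mapping is made up for the example and is not the model's real mapping, and first-subword selection is an assumption here, not necessarily what the Alpha-Beta parser does.

```python
def first_subtoken_labels(word_ids, predictions, id2label):
    """Return one label per word, taken from the word's first subword token.

    word_ids: per-token word index, None for special tokens
    predictions: per-token predicted label id
    id2label: mapping from label id to label string
    """
    labels = {}
    for idx, word_id in enumerate(word_ids):
        # Skip special tokens and subwords of a word that is already labeled.
        if word_id is None or word_id in labels:
            continue
        labels[word_id] = id2label[predictions[idx]]
    return [labels[w] for w in sorted(labels)]


# Toy data: the second word is split into two subword tokens.
word_ids = [None, 0, 1, 1, 2, None]
predictions = [0, 2, 1, 0, 3, 0]
id2label = {0: "X", 1: "Pd", 2: "Cp", 3: "Cc"}  # illustrative mapping only

print(first_subtoken_labels(word_ids, predictions, id2label))
# ['Cp', 'Pd', 'Cc']
```

With the real model, `word_ids`, `predictions`, and `model.config.id2label` from the usage example can be passed in directly.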
## Intended Use

This model is designed to be used as the first stage of the Alpha-Beta semantic hypergraph parser (`hyperbase-parser-ab`). It assigns an atom type to each token; in the beta stage, a rule-based grammar then combines the typed tokens into nested hypergraph structures.

## Part of

- [hyperbase](https://github.com/hyperquest-hq/hyperbase) -- Semantic Hypergraph toolkit
- [hyperbase-parser-ab](https://github.com/hyperquest-hq/hyperbase-parser-ab) -- Alpha-Beta parser