---
license: mit
language:
- code
tags:
- transformer
- code-translation
- xlcost
- from-scratch
---

# C++ → Python Transformer (16.4M params)

Encoder-decoder transformer trained from scratch for C++ → Python code translation. It was trained on the XLCoST dataset on a single GTX 1650 GPU (4 GB). The best checkpoint is from epoch 19, with a validation loss of **2.0474**.

## Architecture

- 4 encoder + 4 decoder layers, pre-norm
- d_model 256, 8 heads, d_ff 512
- Sinusoidal positional encoding
- Greedy decoding at inference
- 16.4M parameters

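The sinusoidal positional encoding listed above is the fixed, non-learned scheme from the original Transformer. As a minimal sketch in plain Python (the function name `sinusoidal_encoding` is illustrative and not taken from the repository, which presumably builds an equivalent table as a PyTorch tensor), each position gets sines on even dimensions and cosines on odd ones:

```python
import math

def sinusoidal_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    """Return a seq_len x d_model table of fixed sinusoidal position encodings."""
    table = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # Wavelengths grow geometrically with the dimension index.
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(angle)
    return table

# Sizes taken from this model card: 300 positions, d_model = 256.
pe = sinusoidal_encoding(300, 256)
```

Because the table depends only on position and dimension, it adds no trainable parameters to the 16.4M total.
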
## Files

- `best_model.pt`: full PyTorch checkpoint (model state, optimizer state, and source/target vocabularies), 189 MB.

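The checkpoint is larger than the weights alone would suggest because it also stores optimizer state. A back-of-the-envelope check follows, assuming fp32 tensors and an Adam-style optimizer that keeps two moment buffers per parameter (the optimizer is not named in this card, so both points are assumptions):

```python
# Rough checkpoint size estimate. Assumptions (not stated in this card):
# fp32 storage and an Adam-style optimizer with two moment buffers per parameter.
PARAMS = 16.4e6        # parameter count from this card
BYTES_PER_FLOAT = 4    # fp32

weights_bytes = PARAMS * BYTES_PER_FLOAT
optimizer_bytes = 2 * weights_bytes  # first and second moment estimates
total_bytes = weights_bytes + optimizer_bytes

weights_mib = weights_bytes / 2**20
total_mib = total_bytes / 2**20
print(f"weights only: {weights_mib:.1f} MiB")
print(f"weights + optimizer: {total_mib:.1f} MiB")
```

Under these assumptions the estimate comes to roughly 188 MiB, close to the stated file size, and it suggests that a weights-only export would be about a third as large.
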
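## Load

```python
import torch

# build_transformer comes from the training code (model.py), not from this checkpoint.
from model import build_transformer

# The checkpoint bundles model weights, optimizer state, and both vocabularies.
# On PyTorch >= 2.6, torch.load defaults to weights_only=True; if loading fails
# on the vocabulary objects, pass weights_only=False (only for files you trust).
ckpt = torch.load('best_model.pt', map_location='cpu')

# Hyperparameters must match the architecture listed above.
model = build_transformer(
    src_vocab_size=len(ckpt['src_vocab']),
    tgt_vocab_size=len(ckpt['tgt_vocab']),
    src_seq_len=300, tgt_seq_len=300,
    d_model=256, N=4, h=8, dropout=0.0, d_ff=512,
)
model.load_state_dict(ckpt['model_state'])
model.eval()  # inference mode: disables dropout
```
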
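Greedy decoding, as listed under Architecture, picks the highest-scoring token at each step and feeds it back into the decoder until an end-of-sequence token appears or the length limit is reached. The sketch below shows only that control flow in plain Python: `next_token_scores` is a hypothetical stand-in for a forward pass through the loaded model, and the `sos_id`, `eos_id`, and `max_len` names are assumptions rather than the repository's API.

```python
from typing import Callable, List, Sequence

def greedy_decode(
    next_token_scores: Callable[[List[int]], Sequence[float]],  # hypothetical model wrapper
    sos_id: int,
    eos_id: int,
    max_len: int = 300,  # matches tgt_seq_len in this card
) -> List[int]:
    """Generate token ids by always taking the arg-max of the next-token scores."""
    tokens = [sos_id]
    while len(tokens) < max_len:
        scores = next_token_scores(tokens)
        # Greedy choice: index of the largest score, no sampling or beam search.
        best = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(best)
        if best == eos_id:
            break
    return tokens

# Toy scorer for illustration only: prefers token (last + 1), capped at id 4.
def toy_scores(prefix: List[int]) -> List[float]:
    target = min(prefix[-1] + 1, 4)
    return [1.0 if i == target else 0.0 for i in range(5)]

print(greedy_decode(toy_scores, sos_id=0, eos_id=4))  # [0, 1, 2, 3, 4]
```

With the real model, the scorer would run the encoder once on the C++ tokens and the decoder on the growing Python prefix; the repository's inference code is the reference for how that is actually wired up.
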
Full training and inference code: [github.com/debtirthasaha/cpp-to-python-transformer](https://github.com/debtirthasaha/cpp-to-python-transformer)

## Writeup

[A transformer that reads C++ and writes Python](https://debtirthasaha.github.io/blog/2026/cpp-to-python-transformer/)