Sam Horradarn

sirahd

AI & ML interests

None yet

Recent Activity

published a model 3 months ago

sirahd/dwreewrew

published a model 3 months ago

sirahd/testtesttest

upvoted an article 3 months ago

Building for an Open Future - our new partnership with Google Cloud

Organizations

published 2 models 3 months ago

sirahd/dwreewrew

Updated Dec 11, 2025

sirahd/testtesttest

Updated Dec 11, 2025

upvoted an article 3 months ago

Article

Building for an Open Future - our new partnership with Google Cloud

Nov 13, 2025

upvoted an article 4 months ago

Article

On the Shifting Global Compute Landscape

Oct 29, 2025

upvoted 2 articles 6 months ago

Article

Welcome EmbeddingGemma, Google's new efficient embedding model

Sep 4, 2025 • 273

Article

Building Tensors from Scratch in Rust (Part 1.3): Data Operations

Jul 31, 2025

upvoted an article 7 months ago

Article

Parquet Content-Defined Chunking

Jul 25, 2025

liked a Space 7 months ago

Ready Xet Go

🚀

Track migration progress and visualize data

published a model 7 months ago

sirahd/test-xet-eu

Updated Jul 18, 2025

upvoted an article 7 months ago

Article

Migrating the Hub from Git LFS to Xet

Jul 15, 2025

published an article 7 months ago

Article

Migrating the Hub from Git LFS to Xet

Jul 15, 2025

upvoted a changelog 9 months ago

Changelog

Xet is now the default storage option for new users and organizations

May 23, 2025 • 76

upvoted an article 11 months ago

Article

Welcome Llama 4 Maverick & Scout on Hugging Face

Apr 5, 2025 • 148

upvoted a collection 11 months ago

Llama 4

Collection

Llama 4 release • 13 items • Updated Apr 29, 2025 • 695

updated a model 11 months ago

xet-team/SmolVLM-256M-Instruct-test

Image-Text-to-Text • 0.3B • Updated Mar 26, 2025 • 2

upvoted an article 11 months ago

Article

Xet is on the Hub

Mar 18, 2025

published an article 11 months ago

Article

Xet is on the Hub

Mar 18, 2025

updated a model about 1 year ago

sirahd/test-xet-migration-2

Updated Feb 20, 2025

published a model about 1 year ago

sirahd/test-xet-migration-2

Updated Feb 20, 2025

commented on From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub about 1 year ago

How can we find the chunk content using chunk hash?

Chunks are produced by content-defined chunking (CDC), and each chunk's hash is computed from its content, so two chunks with the same content share the same hash. This removes the need to store a separate chunk hash -> chunk content mapping: if two chunks share the same hash, we know they have identical content.
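To make the idea concrete, here is a minimal sketch of content-defined chunking followed by content hashing. It is not the Hub's implementation: the gear-style rolling hash, the `MASK`, `MIN_SIZE` and `MAX_SIZE` values, and the use of SHA-256 are all assumptions chosen for illustration.

```python
import hashlib
import random

# Hypothetical parameters, chosen small so the example is easy to inspect.
MASK = (1 << 6) - 1   # a boundary is declared when the low 6 bits of the rolling hash are zero
MIN_SIZE = 16         # never cut a chunk shorter than this
MAX_SIZE = 256        # always cut once a chunk reaches this size

# Fixed pseudo-random table used by the gear-style rolling hash (assumption).
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at boundaries that depend on the bytes themselves."""
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_hash(c: bytes) -> str:
    """Hash of the chunk content; identical content gives an identical hash."""
    return hashlib.sha256(c).hexdigest()


if __name__ == "__main__":
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(4096))
    edited = b"a few inserted bytes" + original
    a = {chunk_hash(c) for c in chunk(original)}
    b = {chunk_hash(c) for c in chunk(edited)}
    print(f"{len(a & b)} of {len(a)} chunk hashes are shared after the edit")
```

Because boundaries are derived from the content rather than from fixed offsets, an insertion near the start of a file typically changes only the chunks around the edit, and the remaining chunks keep their hashes.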

The CAS system only stores "block_hash -> block_content". Where is the chunk -> block map stored?

This is explained in the "key chunks" section of the blog post above. Essentially, we store only a tiny subset of the chunk -> block mappings by leveraging spatial locality in the file. Trying to store every chunk -> block mapping gets impractical very quickly.
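A rough sketch of how a sparse index of this kind could work is shown below. Everything in it is an assumption for illustration, not the scheme from the blog post: the `KEY_MODULUS` selection rule, the `SparseIndex` class, and the in-memory `block_chunks` table that stands in for whatever metadata is consulted after a key-chunk hit.

```python
import hashlib

KEY_MODULUS = 4  # hypothetical: roughly 1 in 4 chunks is treated as a key chunk


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def is_key_chunk(chunk_hash: str) -> bool:
    """Hypothetical selection rule: a chunk is 'key' if its hash is divisible by KEY_MODULUS."""
    return int(chunk_hash, 16) % KEY_MODULUS == 0


class SparseIndex:
    def __init__(self) -> None:
        self.key_to_block: dict[str, str] = {}       # only key chunks are indexed
        self.block_chunks: dict[str, set[str]] = {}  # stand-in for per-block metadata

    def add_block(self, block_id: str, chunk_hashes: list[str]) -> None:
        self.block_chunks[block_id] = set(chunk_hashes)
        for h in chunk_hashes:
            if is_key_chunk(h):
                self.key_to_block.setdefault(h, block_id)

    def find_known(self, chunk_hashes: list[str]) -> dict[str, str]:
        """Map already-stored chunks to their block, discovered through key-chunk hits."""
        known: dict[str, str] = {}
        for h in chunk_hashes:
            block_id = self.key_to_block.get(h)
            if block_id is None:
                continue
            # Spatial locality: neighbours of a key chunk tend to live in the same block.
            members = self.block_chunks[block_id]
            for other in chunk_hashes:
                if other in members:
                    known[other] = block_id
        return known
```

The point of the design is that the index grows with the number of key chunks rather than with the total number of chunks, while a single key-chunk hit still lets the neighbouring chunks be resolved.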

What do the shards store? Is it "file_name, shard_id, chunk_hash, block_hash"?

You can think of the shards as storing mappings from a file (identified by its file hash) to the list of chunks that make up the file.
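As a mental model only, that mapping can be sketched as a dictionary keyed by file hash. The `Shard` class, the use of SHA-256, and deriving the file hash from the full file content are assumptions made for this example; the real shard format is not described here.

```python
import hashlib
from dataclasses import dataclass, field


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Shard:
    # file hash -> ordered list of chunk hashes that make up the file
    files: dict[str, list[str]] = field(default_factory=dict)

    def add_file(self, chunks: list[bytes]) -> str:
        """Record a file as the ordered list of its chunk hashes."""
        file_hash = digest(b"".join(chunks))  # assumption: hash of the whole content
        self.files[file_hash] = [digest(c) for c in chunks]
        return file_hash

    def shared_chunks(self, file_a: str, file_b: str) -> set[str]:
        """Chunk hashes that two recorded files have in common."""
        return set(self.files[file_a]) & set(self.files[file_b])


if __name__ == "__main__":
    shard = Shard()
    v1 = shard.add_file([b"header", b"weights-part-1", b"weights-part-2"])
    v2 = shard.add_file([b"new-header", b"weights-part-1", b"weights-part-2"])
    print(len(shard.shared_chunks(v1, v2)), "chunks shared between the two versions")
```

Keeping the list ordered matters: the order of chunk hashes is what allows a file to be reassembled from its chunks.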

I hope this helps explain our underlying tech better!
