safetensors is a new, simple, fast, and secure file format for storing tensors. The design and original implementation of the file format were led by Hugging Face, and it has been widely adopted in the popular ‘transformers’ framework. The safetensors R package is a pure R implementation that can read and write safetensors files.
The initial version (0.1.0) of safetensors is now on CRAN.
Motivation
The main motivation for safetensors in the Python community is security. As stated in the official documentation:
The main rationale for this crate is to remove the need to use pickle in PyTorch, which is used by default.
Pickle is considered an unsafe format because loading a Pickle file may lead to arbitrary code execution. For R users, this was never an issue with torch, because the Pickle parser included in LibTorch only supports a subset of Pickle types that cannot contain executable code.
However, the file format has additional advantages over other commonly used formats:
- Lazy loading support: you can choose to read only a subset of the tensors stored in a file.
- Zero copy: reading a file requires no more memory than the file itself. (Technically, the current R implementation makes a single copy, but this could be optimized away if really needed at some point.)
- Simple: the file format is simple to implement and does not require complex dependencies, which makes it a suitable format for exchanging tensors between ML frameworks and between different programming languages. For example, you can write a safetensors file in R and load it in Python, and vice versa.
You can see a full comparison with other commonly used file formats in this space here.
Format
The safetensors format is illustrated in the figure below. It's basically a header containing some metadata, followed by the raw tensor buffers.
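Concretely, the file starts with an 8-byte little-endian integer giving the size of a JSON header, followed by the JSON header itself and then the raw buffers. As a rough illustration only (safe_load_file() does all of this for you), the header can be read by hand with base R and jsonlite:

library(torch)
library(safetensors)
library(jsonlite)

# Create a small safetensors file so we have something to inspect.
path <- tempfile()
safe_save_file(list(x = torch_randn(2, 2)), path)

con <- file(path, "rb")
# The first 8 bytes encode the size of the JSON header (little-endian).
size_bytes <- readBin(con, what = "raw", n = 8)
header_size <- sum(as.numeric(size_bytes) * 256^(seq_along(size_bytes) - 1))
# The next `header_size` bytes are the JSON header describing each tensor
# (dtype, shape, and byte offsets); the raw tensor buffers follow.
header_json <- rawToChar(readBin(con, what = "raw", n = header_size))
close(con)
str(fromJSON(header_json))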
Basic usage
safetensors can be installed from CRAN using:
install.packages("safetensors")
You can then build a named list of torch tensors.
library(torch)
library(safetensors)
tensors <- list(
x = torch_randn(10, 10),
y = torch_ones(10, 10)
)
str(tensors)
#> List of 2
#> $ x:Float (1:10, 1:10)
#> $ y:Float (1:10, 1:10)
tmp <- tempfile()
safe_save_file(tensors, tmp)
You can pass additional metadata to the saved file by providing a metadata parameter containing a named list.
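A minimal sketch, assuming the metadata values are character strings (the key names below are arbitrary examples):

safe_save_file(
  tensors,
  tmp,
  # 'framework' and 'note' are just illustrative keys.
  metadata = list(framework = "torch", note = "toy example")
)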
Reading safetensors files is handled by safe_load_file(), which returns a named list of tensors along with a metadata attribute containing the parsed file header.
tensors <- safe_load_file(tmp)
str(tensors)
#> List of 2
#> $ x:Float (1:10, 1:10)
#> $ y:Float (1:10, 1:10)
#> - attr(*, "metadata")=List of 2
#> ..$ x:List of 3
#> .. ..$ shape : int (1:2) 10 10
#> .. ..$ dtype : chr "F32"
#> .. ..$ data_offsets: int (1:2) 0 400
#> ..$ y:List of 3
#> .. ..$ shape : int (1:2) 10 10
#> .. ..$ dtype : chr "F32"
#> .. ..$ data_offsets: int (1:2) 400 800
#> - attr(*, "max_offset")= int 929
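As the str() output shows, the parsed header travels along as the metadata attribute, so individual entries can be inspected directly:

# Look up the dtype recorded for tensor 'x' in the parsed header.
attr(tensors, "metadata")$x$dtype
#> [1] "F32"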
Currently safetensors only supports writing torch tensors, but we plan to add support for writing regular R arrays and tensorflow tensors in the future.
Future directions
In the next version of torch, safetensors will be used as the serialization format. That is, when calling torch_save() on models, lists of tensors, or other object types supported by torch_save, you will get a valid safetensors file.
This is an improvement over the previous implementation for the following reasons:
- It's much faster: around 10x faster for medium-sized models, and potentially more for large files. It also improves the performance of parallel dataloaders by up to 30%.
- It improves cross-language and cross-framework compatibility: you can train a model in R and use it from Python (and vice versa), or train a model in tensorflow and run it with torch.
If you want to try it out, you can install the torch development version using:
remotes::install_github("mlverse/torch")
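As a hedged sketch of what this could look like, assuming the development version behaves as described above:

library(torch)

# With the safetensors-based serialization, saving a list of tensors is
# expected to produce a valid safetensors file.
path <- tempfile(fileext = ".safetensors")
torch_save(list(x = torch_randn(10, 10)), path)

# Assumption based on the compatibility claims above: the same file should be
# readable with safetensors::safe_load_file() or from other frameworks.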
Photo by Nick Fewings on Unsplash
Reuse
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Figures reused from other sources do not fall under this license and can be recognized by a note in their caption: “Figure from …”.
Citation
For attribution, please cite this work as:
Falbel (2023, June 15). Posit AI Blog: safetensors 0.1.0. Retrieved from https://blogs.rstudio.com/tensorflow/posts/2023-06-15-safetensors/
BibTeX citation
@misc{safetensors,
  author = {Falbel, Daniel},
  title = {Posit AI Blog: safetensors 0.1.0},
  url = {https://blogs.rstudio.com/tensorflow/posts/2023-06-15-safetensors/},
  year = {2023}
}