Loading Trained PyTorch Model into Research

Hello,

I am trying to import a trained PyTorch model. It consists of a tokenizer and a second, very large model. I started with the tokenizer, using the following:

tokenizer = qb.Download("DropBox URL")

import torch
tokenizer_model = torch.load(tokenizer)

But I ended up with the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-28-acfbb532fa2a> in <module>
      1 import torch
----> 2 tokenizer_model = torch.load(tokenizer)
/opt/miniconda3/lib/python3.6/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    582         pickle_load_args['encoding'] = 'utf-8'
    583 
--> 584     with _open_file_like(f, 'rb') as opened_file:
    585         if _is_zipfile(opened_file):
    586             with _open_zipfile_reader(f) as opened_zipfile:
/opt/miniconda3/lib/python3.6/site-packages/torch/serialization.py in _open_file_like(name_or_buffer, mode)
    232 def _open_file_like(name_or_buffer, mode):
    233     if _is_path(name_or_buffer):
--> 234         return _open_file(name_or_buffer, mode)
    235     else:
    236         if 'w' in mode:
/opt/miniconda3/lib/python3.6/site-packages/torch/serialization.py in __init__(self, name, mode)
    213 class _open_file(_opener):
    214     def __init__(self, name, mode):
--> 215         super(_open_file, self).__init__(open(name, mode))
    216 
    217     def __exit__(self, *args):
ValueError: embedded null byte

I found an old thread, but unfortunately the author did not share the exact solution he found.

Hi Adham

The author of the original post wrapped the downloaded bytes in a BytesIO object before passing them to torch.load:

from io import BytesIO
tokenizer_model = torch.load(BytesIO(tokenizer))
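
As a rough illustration of why the wrap matters: the traceback above shows torch.load treating its argument as a path and handing it to open(), which rejects a name containing a null byte. The sketch below reproduces that with the standard library only. The variable `payload` is a hypothetical stand-in for the downloaded content, not the real tokenizer file.

```python
from io import BytesIO

# Hypothetical stand-in for downloaded binary content (note the null byte).
payload = b"PK\x03\x04\x00model-bytes"

# Passing the content itself where a file path is expected fails inside open().
try:
    open(payload.decode("latin-1"), "rb")
    path_error = None
except ValueError as exc:
    path_error = str(exc)

# Wrapping the bytes in BytesIO gives a file-like object with read() and seek(),
# which is the kind of object a loader can consume instead of a path.
buffer = BytesIO(payload)
first_bytes = buffer.read(4)
buffer.seek(0)  # rewind so a later reader starts from the beginning
```

Note that BytesIO only accepts bytes, so this works only if the download step actually yields a bytes object.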

Best
Louis


Louis Szeto INVESTOR

QuantConnect | June 2022


Adham Al-Harazi INVESTOR

July 2022

Thank you, Louis, for your answer. Unfortunately, I got another error:

TypeError                                 Traceback (most recent call last)
<ipython-input-8-9c28ab9dcaf7> in <module>
      1 import torch
----> 2 tokenizer_model = torch.load(BytesIO(tokenizer))
TypeError: a bytes-like object is required, not 'str'

The downloader did not download any file to my workspace. I refrained from using QuantConnect because they do not take machine learning seriously in algo trading. If I cannot easily load models, then I think QuantConnect is still lagging far behind.

QuantConnect | July 2022

It seems your tokenizer was not saved in a format that torch.load accepts. Please try saving your tokenizer as bytes.
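
One possible way to act on this, sketched under assumptions: the TypeError shows that `tokenizer` arrives as a str, so the download delivered text rather than raw bytes. If the saved file is Base64-encoded before it is uploaded, the content survives as text and can be decoded back into bytes afterwards. The helpers `encode_for_upload` and `decode_download` are hypothetical names, `raw` is placeholder data, and it is an assumption that the hosted URL returns the uploaded text unchanged.

```python
import base64
from io import BytesIO

def encode_for_upload(raw: bytes) -> str:
    """Turn binary file content into ASCII text that can be hosted as a text file."""
    return base64.b64encode(raw).decode("ascii")

def decode_download(text: str) -> BytesIO:
    """Turn downloaded Base64 text back into a file-like object holding bytes."""
    return BytesIO(base64.b64decode(text))

# Hypothetical stand-in for the bytes of a file written by torch.save.
raw = b"PK\x03\x04\x00\x01tokenizer"

text = encode_for_upload(raw)    # this string is what would be uploaded
buffer = decode_download(text)   # this object is what would be passed to torch.load
round_trip = buffer.getvalue()
```

In the research notebook, the string returned by the download would take the place of `text`, and the resulting buffer would be passed to torch.load in place of BytesIO(tokenizer).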
