| Author | Commit | Message | Date |
|---|---|---|---|
| Brett Kuprel | 7bf76deafb | fixed wrong file path | 2022-07-01 10:58:29 -04:00 |
| Brett Kuprel | e4c2be54cb | save converted detokenizer params | 2022-07-01 10:17:29 -04:00 |
| Brett Kuprel | b40fd83a0d | mega works with latest flax version 0.5.2 now, removing 0.4.2 pin | 2022-07-01 02:58:43 -04:00 |
| Brett Kuprel | 08b158d580 | updated readme | 2022-06-30 16:50:04 -04:00 |
| Brett Kuprel | 2311a1af7b | delete cache | 2022-06-30 15:48:20 -04:00 |
| Brett Kuprel | b913b58353 | pre converting params to torch allows mega to run in standard colab runtime | 2022-06-30 14:54:08 -04:00 |
| Brett Kuprel | c2a3858c96 | delete params sooner | 2022-06-30 11:44:36 -04:00 |
| Brett Kuprel | f951424e38 | is_reusable | 2022-06-30 11:25:24 -04:00 |
| Brett Kuprel | b55bcba4c0 | removed deepcopy, delete expendable parameters after use | 2022-06-30 11:09:09 -04:00 |
| Brett Kuprel | 41a44068d0 | keep params in expendable mode | 2022-06-30 09:36:32 -04:00 |
| Brett Kuprel | df9aa6f915 | sort -> topk, prev_token_and_index -> prev_token, token_index | 2022-06-30 09:04:11 -04:00 |
| Brett Kuprel | fb97ba5e20 | update readme, cleanup | 2022-06-30 07:41:31 -04:00 |
| Brett Kuprel | 1e18ba0ffa | is_expendable argument reduces memory usage for command line script | 2022-06-30 06:43:10 -04:00 |
| Brett Kuprel | d99828a239 | simplified flax attention and matched torch attention | 2022-06-29 14:56:28 -04:00 |
| Brett Kuprel | 61cc99c13c | read tokenizer files with utf8 encoding | 2022-06-29 14:18:23 -04:00 |
| Brett Kuprel | 661ec976ac | simplified attention for torch model | 2022-06-29 13:48:12 -04:00 |
| Brett Kuprel | ed91ab4a30 | refactored to load models once and run multiple times | 2022-06-29 09:42:12 -04:00 |
| Adam Novak | 28c812c832 | Use all logical cores in Torch mode | 2022-06-28 22:26:51 -04:00 |
| Brett Kuprel | 1fbb209623 | fixed bug with cuda in detokenizer | 2022-06-28 22:02:35 -04:00 |
| Brett Kuprel | 764b0bc685 | cuda in detokenizer from previous commit broke colab flax model, fixed | 2022-06-28 21:36:48 -04:00 |
| Brett Kuprel | 17c96fe110 | works with cuda | 2022-06-28 21:28:36 -04:00 |
| Brett Kuprel | 9d6b6dcc92 | previous commit broke flax model, fixed now | 2022-06-28 12:54:58 -04:00 |
| Brett Kuprel | 5aa6fe49bf | use cuda if available | 2022-06-28 12:47:11 -04:00 |
| Brett Kuprel | 8544f59576 | use cuda if available | 2022-06-28 12:38:31 -04:00 |
| Brett Kuprel | aef24ea157 | torch.no_grad(), cleanup | 2022-06-28 12:16:44 -04:00 |
| Brett Kuprel | 34df2b97df | previous commit broke colab example, so adjusting flax requirement to 0.4.2 for now | 2022-06-28 08:04:08 -04:00 |
| Brett Kuprel | 38ebe54a38 | works with latest flax version 0.5.2 now | 2022-06-28 07:12:29 -04:00 |
| Brett Kuprel | a014dccc05 | fixed an issue with argument parser | 2022-06-27 16:49:42 -04:00 |
| Brett Kuprel | e7001f063c | simplified | 2022-06-27 15:46:04 -04:00 |
| Brett Kuprel | 18e6a9852f | license and cleanup | 2022-06-27 14:34:10 -04:00 |
| Brett Kuprel | c936d26102 | back to linear attention | 2022-06-27 13:19:03 -04:00 |
| Brett Kuprel | 018414a5c3 | fixed relative imports | 2022-06-27 12:43:47 -04:00 |