Author | Commit | Message | Date
Brett Kuprel | 08b158d580 | updated readme | 2022-06-30 16:50:04 -04:00
Brett Kuprel | b913b58353 | pre converting params to torch allows mega to run in standard colab runtime | 2022-06-30 14:54:08 -04:00
Brett Kuprel | b55bcba4c0 | removed deepcopy, delete expendable parameters after use | 2022-06-30 11:09:09 -04:00
Brett Kuprel | df9aa6f915 | sort -> topk, prev_token_and_index -> prev_token, token_index | 2022-06-30 09:04:11 -04:00
Brett Kuprel | 17c96fe110 | works with cuda | 2022-06-28 21:28:36 -04:00
Brett Kuprel | 9d6b6dcc92 | previous commit broke flax model, fixed now | 2022-06-28 12:54:58 -04:00
Brett Kuprel | 5aa6fe49bf | use cuda if available | 2022-06-28 12:47:11 -04:00
Brett Kuprel | c936d26102 | back to linear attention | 2022-06-27 13:19:03 -04:00
Brett Kuprel | 018414a5c3 | fixed relative imports | 2022-06-27 12:43:47 -04:00