Yes, the model is exactly the one you mentioned, but it is the quantized version of it. The context length is indeed 128k for this family of models.

If I misunderstood your question, please elaborate.

Written by Karan Kaul | カラン

Writes about Machine Learning/Data Science. Say Hi on LinkedIn - https://www.linkedin.com/in/krnk97/
