Mistral-7B-v0.3


Find this model in the Mistral7B model summary


Mistral-7B-v0.3 Model Set Plots



Mistral-7B-v0.3 Model Selected Details
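Column key:
id: row index of the layer in this table
layer_type: layer type (every row here is a dense, i.e. linear, weight matrix)
N, M: weight-matrix dimensions, with N ≥ M
Q: aspect ratio N/M
alpha: power-law exponent fitted to the tail of the layer's empirical spectral density (ESD)
D: Kolmogorov-Smirnov distance of the power-law fit (smaller means a better fit)
alpha-hat: alpha weighted by the largest eigenvalue, alpha · log10(λ_max)
num_spikes: number of eigenvalues above the upper edge of the fitted Marchenko-Pastur bulk
warning: over-trained if alpha < 2, under-trained if alpha > 6, blank otherwise

id layer_type N M Q alpha D alpha-hat num_spikes warning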
1 dense 32768 4096 8.0 3.439691 0.021360 4.773514 903
2 dense 14336 4096 3.5 4.043273 0.031153 0.900296 745
3 dense 14336 4096 3.5 2.376188 0.020049 1.426923 896
4 dense 14336 4096 3.5 3.215841 0.047801 0.640537 504
5 dense 4096 1024 4.0 1.445017 0.022430 1.670501 437 over-trained
6 dense 4096 4096 1.0 3.101084 0.039317 -0.366890 246
7 dense 4096 4096 1.0 1.422876 0.026146 2.530323 681 over-trained
8 dense 4096 1024 4.0 4.937602 0.031803 -3.395587 39
9 dense 4096 1024 4.0 4.453772 0.114017 -4.626263 173
10 dense 4096 4096 1.0 2.950313 0.049451 2.828152 42
11 dense 4096 4096 1.0 3.613301 0.024845 -0.094817 339
12 dense 14336 4096 3.5 3.410574 0.058448 -0.033928 712
13 dense 14336 4096 3.5 2.684022 0.019643 1.955310 944
14 dense 14336 4096 3.5 5.255667 0.022727 0.016751 123
15 dense 4096 1024 4.0 2.966090 0.042046 1.914610 59
16 dense 14336 4096 3.5 3.934177 0.030687 2.092816 618
17 dense 4096 1024 4.0 8.970260 0.080669 -10.249178 56 under-trained
18 dense 4096 4096 1.0 3.741508 0.027982 2.183285 162
19 dense 4096 4096 1.0 4.048027 0.018692 -0.349561 154
20 dense 14336 4096 3.5 9.458788 0.014893 -1.758467 96 under-trained
21 dense 14336 4096 3.5 4.383651 0.011062 0.081690 517
22 dense 4096 1024 4.0 5.689346 0.034497 1.541709 57
23 dense 4096 1024 4.0 3.826627 0.042951 1.707878 54
24 dense 4096 1024 4.0 6.896596 0.058204 -7.243250 72 under-trained
25 dense 14336 4096 3.5 4.351269 0.056191 -0.077309 595
26 dense 14336 4096 3.5 3.530574 0.021623 1.686158 843
27 dense 14336 4096 3.5 9.390052 0.021939 -1.670966 82 under-trained
28 dense 4096 4096 1.0 2.904661 0.028460 2.179457 230
29 dense 4096 4096 1.0 3.998026 0.027836 -0.685435 284
30 dense 4096 1024 4.0 3.946118 0.063416 0.832697 92
31 dense 14336 4096 3.5 7.629028 0.026391 -1.706420 285 under-trained
32 dense 14336 4096 3.5 3.221784 0.025834 2.097702 241
33 dense 4096 1024 4.0 6.130137 0.088687 -6.400593 114 under-trained
34 dense 14336 4096 3.5 5.752187 0.020042 -0.088977 174
35 dense 4096 4096 1.0 3.506533 0.041539 -1.031341 286
36 dense 4096 4096 1.0 3.305944 0.061627 1.878680 152
37 dense 14336 4096 3.5 4.586704 0.022562 -0.389507 445
38 dense 14336 4096 3.5 3.261441 0.014880 2.466750 943
39 dense 14336 4096 3.5 7.709814 0.012919 -1.288460 166 under-trained
40 dense 4096 1024 4.0 5.355175 0.039595 1.073932 34
41 dense 4096 4096 1.0 4.343221 0.033477 -1.859056 216
42 dense 4096 1024 4.0 8.452762 0.042463 -9.411120 52 under-trained
43 dense 4096 4096 1.0 3.069108 0.060139 1.671124 279
44 dense 4096 1024 4.0 5.721780 0.057173 -5.940264 104
45 dense 4096 4096 1.0 4.046693 0.032449 -1.944258 226
46 dense 4096 1024 4.0 5.140777 0.029681 1.540409 34
47 dense 14336 4096 3.5 6.860135 0.010782 -0.912425 243 under-trained
48 dense 14336 4096 3.5 4.330694 0.012678 0.453453 305
49 dense 14336 4096 3.5 3.773420 0.009847 2.618705 565
50 dense 4096 4096 1.0 3.834751 0.025031 2.276654 117
51 dense 4096 4096 1.0 3.798434 0.041996 -1.424972 251
52 dense 14336 4096 3.5 4.550889 0.025235 -0.026358 319
53 dense 14336 4096 3.5 3.368198 0.017642 2.541179 787
54 dense 14336 4096 3.5 6.742066 0.023191 -0.782236 164 under-trained
55 dense 4096 1024 4.0 2.742009 0.076440 0.663959 188
56 dense 4096 4096 1.0 2.960277 0.060553 1.593393 221
57 dense 4096 1024 4.0 6.356066 0.076445 -6.572292 111 under-trained
58 dense 4096 1024 4.0 3.641931 0.071560 -3.711667 272
59 dense 4096 4096 1.0 3.779916 0.043921 1.990470 87
60 dense 4096 4096 1.0 5.272579 0.035188 -2.190300 103
61 dense 4096 1024 4.0 3.824632 0.083362 0.932649 115
62 dense 14336 4096 3.5 3.776839 0.011373 2.760438 441
63 dense 14336 4096 3.5 4.110545 0.044200 0.209520 519
64 dense 14336 4096 3.5 6.214202 0.016149 -0.598768 208 under-trained
65 dense 4096 1024 4.0 4.156777 0.067939 0.546902 126
66 dense 14336 4096 3.5 5.020289 0.021525 -0.552012 346
67 dense 4096 4096 1.0 3.687727 0.052818 -1.717946 200
68 dense 4096 4096 1.0 3.433150 0.041439 1.439111 173
69 dense 4096 1024 4.0 4.422038 0.086436 -4.472052 211
70 dense 14336 4096 3.5 3.814492 0.047763 0.228252 454
71 dense 14336 4096 3.5 3.598865 0.014209 2.711060 509
72 dense 14336 4096 3.5 4.127773 0.033144 0.553038 328
73 dense 4096 4096 1.0 3.991987 0.034393 2.055630 104
74 dense 14336 4096 3.5 3.651038 0.020903 2.499783 449
75 dense 14336 4096 3.5 4.835164 0.029995 -0.655842 342
76 dense 4096 1024 4.0 3.363999 0.095006 0.758482 166
77 dense 4096 4096 1.0 4.253062 0.040709 -2.035195 170
78 dense 4096 1024 4.0 7.862064 0.050747 -8.185111 58 under-trained
79 dense 4096 1024 4.0 8.545806 0.076095 -8.071939 44 under-trained
80 dense 4096 4096 1.0 2.851191 0.068754 1.187577 295
81 dense 4096 4096 1.0 4.249892 0.041449 -1.758231 128
82 dense 14336 4096 3.5 3.649664 0.023123 2.606920 455
83 dense 14336 4096 3.5 4.689437 0.031519 -0.501337 331
84 dense 14336 4096 3.5 4.425697 0.034052 0.458457 162
85 dense 4096 1024 4.0 2.006401 0.115418 0.268366 422
86 dense 14336 4096 3.5 4.176957 0.031837 0.596667 254
87 dense 14336 4096 3.5 3.816688 0.022592 2.800351 266
88 dense 14336 4096 3.5 4.784038 0.030512 -0.555222 326
89 dense 4096 1024 4.0 2.236832 0.118925 0.294895 371
90 dense 4096 4096 1.0 3.602685 0.069715 -1.837793 267
91 dense 4096 4096 1.0 2.327816 0.080011 0.997922 590
92 dense 4096 1024 4.0 7.273821 0.073592 -7.722913 69 under-trained
93 dense 4096 1024 4.0 3.307850 0.080133 -3.154390 340
94 dense 4096 4096 1.0 2.940141 0.062375 1.512794 281
95 dense 4096 4096 1.0 3.475862 0.034807 -1.447786 247
96 dense 4096 1024 4.0 3.107529 0.092651 0.805541 166
97 dense 14336 4096 3.5 3.854443 0.023160 2.626047 440
98 dense 14336 4096 3.5 3.549286 0.029700 0.559854 575
99 dense 14336 4096 3.5 5.052470 0.031096 -0.486640 275
100 dense 14336 4096 3.5 4.362479 0.008943 0.486641 375
101 dense 14336 4096 3.5 3.768572 0.019393 2.974760 357
102 dense 14336 4096 3.5 4.667597 0.023927 -0.414372 388
103 dense 4096 1024 4.0 1.972677 0.113806 0.265696 483 over-trained
104 dense 4096 4096 1.0 3.320989 0.084970 -1.533702 385
105 dense 4096 4096 1.0 2.306769 0.075722 1.007436 580
106 dense 4096 1024 4.0 4.485369 0.091851 -4.225606 228
107 dense 4096 4096 1.0 4.448143 0.025339 -1.760768 201
108 dense 4096 4096 1.0 2.816306 0.050845 1.083486 250
109 dense 4096 1024 4.0 2.969253 0.089016 0.433138 166
110 dense 4096 1024 4.0 6.011310 0.114966 -6.368472 225 under-trained
111 dense 14336 4096 3.5 4.003503 0.024894 3.243418 287
112 dense 14336 4096 3.5 4.558735 0.009655 -0.117511 483
113 dense 14336 4096 3.5 5.523924 0.034983 -0.693776 198
114 dense 14336 4096 3.5 4.551069 0.013454 0.056622 539
115 dense 14336 4096 3.5 4.441042 0.017705 3.926295 209
116 dense 14336 4096 3.5 5.932999 0.026374 -0.622511 247
117 dense 4096 1024 4.0 2.730679 0.094831 0.553156 183
118 dense 4096 4096 1.0 3.854242 0.031918 -1.125493 152
119 dense 4096 4096 1.0 2.272734 0.071125 1.113580 670
120 dense 4096 1024 4.0 12.433383 0.122678 -13.778330 79 under-trained
121 dense 4096 1024 4.0 12.507503 0.094848 -11.760388 39 under-trained
122 dense 4096 4096 1.0 2.711433 0.073897 1.640352 455
123 dense 4096 4096 1.0 3.434442 0.053212 -1.117057 350
124 dense 4096 1024 4.0 5.476467 0.053400 1.464024 36
125 dense 14336 4096 3.5 6.013885 0.012102 -0.701702 276 under-trained
126 dense 14336 4096 3.5 4.825547 0.020503 4.039407 160
127 dense 14336 4096 3.5 4.661273 0.011071 -0.090645 508
128 dense 14336 4096 3.5 5.605436 0.020068 -0.294377 449
129 dense 14336 4096 3.5 4.655023 0.022175 4.194735 197
130 dense 14336 4096 3.5 6.014434 0.010381 -0.148483 301 under-trained
131 dense 4096 1024 4.0 4.653589 0.115443 0.614328 93
132 dense 4096 4096 1.0 5.556422 0.041396 -1.177458 99
133 dense 4096 4096 1.0 2.336881 0.069910 1.167769 587
134 dense 4096 1024 4.0 6.818255 0.105052 -6.202526 187 under-trained
135 dense 4096 1024 4.0 5.768012 0.046369 -5.222116 147
136 dense 4096 4096 1.0 2.778117 0.054474 1.422923 401
137 dense 4096 4096 1.0 3.611866 0.025013 -0.300446 435
138 dense 4096 1024 4.0 2.954016 0.062800 0.530940 191
139 dense 14336 4096 3.5 6.468907 0.013817 -0.072625 275 under-trained
140 dense 14336 4096 3.5 4.740639 0.016014 4.323735 282
141 dense 14336 4096 3.5 6.656887 0.031968 -0.343247 406 under-trained
142 dense 14336 4096 3.5 8.622144 0.034677 -1.011983 279 under-trained
143 dense 14336 4096 3.5 4.726893 0.013260 4.359782 338
144 dense 14336 4096 3.5 7.050711 0.020420 -0.120297 248 under-trained
145 dense 4096 1024 4.0 2.419563 0.095461 0.400433 364
146 dense 4096 4096 1.0 4.534020 0.046164 -1.073011 203
147 dense 4096 4096 1.0 2.539529 0.062039 1.386561 618
148 dense 4096 1024 4.0 10.525223 0.128138 -10.610727 97 under-trained
149 dense 4096 1024 4.0 4.478420 0.060171 -3.867411 239
150 dense 4096 4096 1.0 3.048791 0.057318 1.700103 431
151 dense 4096 4096 1.0 6.323711 0.035588 -1.862968 113 under-trained
152 dense 4096 1024 4.0 3.309801 0.056438 0.468460 192
153 dense 14336 4096 3.5 8.335261 0.034917 -0.328021 211 under-trained
154 dense 14336 4096 3.5 4.798878 0.012684 4.588008 337
155 dense 14336 4096 3.5 8.676747 0.039762 -1.624083 281 under-trained
156 dense 4096 1024 4.0 4.061670 0.084129 0.081684 124
157 dense 4096 4096 1.0 5.631450 0.027816 -1.253593 144
158 dense 4096 4096 1.0 4.397952 0.031525 2.293980 73
159 dense 4096 1024 4.0 4.035567 0.119707 -3.546850 271
160 dense 32768 4096 8.0 4.525132 0.022283 11.204455 776
161 dense 14336 4096 3.5 7.361288 0.029765 -0.960700 306 under-trained
162 dense 14336 4096 3.5 5.164596 0.017038 4.618811 293
163 dense 14336 4096 3.5 8.700556 0.037171 -0.287966 215 under-trained
164 dense 4096 1024 4.0 3.988380 0.124369 -3.376176 338
165 dense 4096 4096 1.0 4.227151 0.062852 -0.317089 330
166 dense 4096 1024 4.0 4.988803 0.027856 0.646427 66
167 dense 4096 4096 1.0 4.231646 0.018472 2.447569 61
168 dense 14336 4096 3.5 5.294459 0.027830 4.734334 338
169 dense 14336 4096 3.5 5.948973 0.033797 -0.568334 470
170 dense 14336 4096 3.5 9.588905 0.043266 -0.671629 189 under-trained
171 dense 14336 4096 3.5 4.386910 0.034253 0.351727 110
172 dense 14336 4096 3.5 5.477519 0.032066 5.135899 348
173 dense 14336 4096 3.5 9.904057 0.053231 -0.725712 211 under-trained
174 dense 4096 1024 4.0 4.094445 0.097112 0.213455 187
175 dense 4096 4096 1.0 5.547353 0.019908 -2.372551 171
176 dense 4096 4096 1.0 3.435330 0.031594 1.910522 360
177 dense 4096 1024 4.0 8.120464 0.027563 -6.309803 63 under-trained
178 dense 4096 1024 4.0 7.261025 0.037141 -5.736756 75 under-trained
179 dense 4096 4096 1.0 4.382994 0.029114 2.224100 62
180 dense 4096 4096 1.0 6.132524 0.022974 -2.171335 80 under-trained
181 dense 4096 1024 4.0 4.261343 0.081220 0.070481 129
182 dense 14336 4096 3.5 10.249573 0.060374 -0.233021 210 under-trained
183 dense 14336 4096 3.5 5.710936 0.037370 5.381742 350
184 dense 14336 4096 3.5 4.395275 0.031918 0.395109 673
185 dense 14336 4096 3.5 3.886555 0.030105 -0.002352 777
186 dense 14336 4096 3.5 5.635033 0.038968 5.226124 379
187 dense 14336 4096 3.5 9.993261 0.061209 0.176428 232 under-trained
188 dense 4096 1024 4.0 4.859032 0.040203 0.346957 49
189 dense 4096 4096 1.0 5.023180 0.026907 -0.172693 195
190 dense 4096 4096 1.0 4.323725 0.032562 2.375503 66
191 dense 4096 1024 4.0 4.375621 0.102659 -2.837404 237
192 dense 4096 1024 4.0 6.959399 0.030072 -4.019592 95 under-trained
193 dense 4096 4096 1.0 4.279085 0.012380 0.137129 204
194 dense 4096 1024 4.0 5.025122 0.035346 0.593956 113
195 dense 4096 4096 1.0 4.094442 0.028041 2.248830 203
196 dense 14336 4096 3.5 5.597756 0.027834 5.248540 384
197 dense 14336 4096 3.5 3.622915 0.040692 0.139940 992
198 dense 14336 4096 3.5 8.875806 0.068148 1.399287 307 under-trained
199 dense 14336 4096 3.5 4.339638 0.025435 -0.557137 659
200 dense 14336 4096 3.5 4.844458 0.030330 4.542901 596
201 dense 14336 4096 3.5 7.395066 0.054719 2.751915 390 under-trained
202 dense 4096 1024 4.0 5.013608 0.040972 0.319970 36
203 dense 4096 4096 1.0 5.694538 0.020105 -0.547037 172
204 dense 4096 4096 1.0 3.295777 0.031616 1.949401 311
205 dense 4096 1024 4.0 3.210245 0.062440 -1.387110 365
206 dense 4096 1024 4.0 7.167897 0.085298 -4.642945 139 under-trained
207 dense 4096 4096 1.0 6.225427 0.045399 -0.612670 196 under-trained
208 dense 4096 1024 4.0 3.887847 0.031238 0.144987 64
209 dense 4096 4096 1.0 3.177789 0.023151 1.843093 364
210 dense 14336 4096 3.5 3.857567 0.034735 3.381674 149
211 dense 14336 4096 3.5 7.299393 0.063053 -0.351030 263 under-trained
212 dense 14336 4096 3.5 5.999558 0.044814 3.138129 470
213 dense 4096 1024 4.0 3.311706 0.058629 -0.988749 310
214 dense 4096 4096 1.0 4.834113 0.018104 1.274444 225
215 dense 4096 1024 4.0 4.436075 0.041490 0.206186 43
216 dense 4096 4096 1.0 3.896616 0.027533 2.460491 47
217 dense 14336 4096 3.5 3.667118 0.023620 3.077888 134
218 dense 14336 4096 3.5 7.004188 0.025589 0.639111 156 under-trained
219 dense 14336 4096 3.5 5.240625 0.045162 4.425877 551
220 dense 4096 4096 1.0 2.591679 0.041240 1.743805 508
221 dense 14336 4096 3.5 5.743561 0.034432 2.790759 361
222 dense 14336 4096 3.5 3.484226 0.024058 2.942706 99
223 dense 14336 4096 3.5 4.159006 0.028677 5.412313 638
224 dense 4096 1024 4.0 3.788307 0.041183 0.283001 60
225 dense 4096 4096 1.0 4.377279 0.064507 4.120290 186
226 dense 4096 1024 4.0 2.790443 0.046039 -0.747621 440
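The warning column can be checked against the alpha column directly. The sketch below assumes the flags follow the thresholds alpha < 2 (over-trained) and alpha > 6 (under-trained), which agree with every flagged row above (for example, row 212 at alpha 5.9996 is unflagged while row 125 at alpha 6.0139 is flagged). It also assumes alpha-hat is alpha · log10(λ_max), so the largest eigenvalue of each layer can be backed out as 10^(alpha-hat / alpha). The rows are copied from the table; the helpers parse_row, expected_warning, implied_lambda_max and summarize are hypothetical and not part of the WeightWatcher API.

```python
import math
from collections import Counter

# Rows copied verbatim from the table above: id, layer_type, N, M, Q,
# alpha, D, alpha-hat, num_spikes and (optionally) warning.
ROWS = """\
1 dense 32768 4096 8.0 3.439691 0.021360 4.773514 903
5 dense 4096 1024 4.0 1.445017 0.022430 1.670501 437 over-trained
17 dense 4096 1024 4.0 8.970260 0.080669 -10.249178 56 under-trained
85 dense 4096 1024 4.0 2.006401 0.115418 0.268366 422
125 dense 14336 4096 3.5 6.013885 0.012102 -0.701702 276 under-trained
212 dense 14336 4096 3.5 5.999558 0.044814 3.138129 470
"""

# Assumed thresholds; they agree with every flagged row in the table.
OVER_TRAINED_BELOW = 2.0
UNDER_TRAINED_ABOVE = 6.0


def parse_row(line):
    """Split one whitespace-separated table row into a dict."""
    f = line.split()
    return {
        "id": int(f[0]),
        "layer_type": f[1],
        "N": int(f[2]),
        "M": int(f[3]),
        "Q": float(f[4]),
        "alpha": float(f[5]),
        "D": float(f[6]),
        "alpha_hat": float(f[7]),
        "num_spikes": int(f[8]),
        # Rows without a warning simply have no tenth field.
        "warning": f[9] if len(f) > 9 else "",
    }


def expected_warning(alpha):
    """Re-derive the warning label from alpha alone (hypothetical helper)."""
    if alpha < OVER_TRAINED_BELOW:
        return "over-trained"
    if alpha > UNDER_TRAINED_ABOVE:
        return "under-trained"
    return ""


def implied_lambda_max(alpha, alpha_hat):
    """Largest eigenvalue, assuming alpha_hat = alpha * log10(lambda_max)."""
    return 10 ** (alpha_hat / alpha)


def summarize(rows):
    """Count rows per warning label ("" means no warning)."""
    return dict(Counter(r["warning"] for r in rows))


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

for r in rows:
    # Q in the table is exactly N / M for these shapes.
    q_ok = math.isclose(r["N"] / r["M"], r["Q"])
    flag_ok = expected_warning(r["alpha"]) == r["warning"]
    lam = implied_lambda_max(r["alpha"], r["alpha_hat"])
    print(r["id"], r["alpha"], r["warning"] or "-",
          "Q ok" if q_ok else "Q mismatch",
          "flag ok" if flag_ok else "flag mismatch",
          "lambda_max ~ %.3g" % lam)

print(summarize(rows))
```

Because warning is the only optional field, a plain whitespace split is enough to parse any row; pasting the remaining rows into ROWS runs the same checks over the whole table. A negative alpha-hat (as in rows 17 and 125) corresponds, under the stated assumption, to an implied λ_max below 1.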