Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.md | 2025-03-13 | 32.6 kB | |
v0.1.3 source code.tar.gz | 2025-03-13 | 570.2 kB | |
v0.1.3 source code.zip | 2025-03-13 | 681.0 kB | |
Totals: 3 Items | | 1.3 MB | 0 |
- Support for PaliGemma 2 and Gemma 3.
- Major update to MatMul and MatMul-using operations; significant performance increases in multiple parts of the codebase.
- Codebase simplifications and refactors in many areas.
- Bugfixes
What's Changed
- Add more ops: Sigmoid, (Two)MatVecAdd. Faster TwoMatVec. by @veluca93 in https://github.com/google/gemma.cpp/pull/129
- Improve weight handling. by @veluca93 in https://github.com/google/gemma.cpp/pull/130
- Remove unused includes by @copybara-service in https://github.com/google/gemma.cpp/pull/132
- Add a benchmark and additional tests. by @veluca93 in https://github.com/google/gemma.cpp/pull/131
- Adding Griffin implementation. by @pculliton in https://github.com/google/gemma.cpp/pull/136
- Change `NumGemmaLayers` and `NumGriffinLayers` to constants in configs by @ufownl in https://github.com/google/gemma.cpp/pull/139
- Mention Makefile contributed by @jart by @copybara-service in https://github.com/google/gemma.cpp/pull/141
- Refactor data structures to reduce memory usage by @ufownl in https://github.com/google/gemma.cpp/pull/142
- Added functionality of storing layers activations output. by @atorero in https://github.com/google/gemma.cpp/pull/145
- Further improve IO, enable multiple backends without -D. by @copybara-service in https://github.com/google/gemma.cpp/pull/148
- Use lambda to split function and Make stream_token can break prefill by @zeerd in https://github.com/google/gemma.cpp/pull/156
- Simplify prefill early-exit (originally Merge [#156]) by @copybara-service in https://github.com/google/gemma.cpp/pull/158
- Fix underflow in NUQ ClusterCost() by @copybara-service in https://github.com/google/gemma.cpp/pull/162
- Add error-checking for py binding, add missing include+hwasan check by @copybara-service in https://github.com/google/gemma.cpp/pull/163
- Simplify threading: remove the use of inner_pool. by @szabadka in https://github.com/google/gemma.cpp/pull/167
- Use more parallelism in the QKV projections in MQA mode. by @szabadka in https://github.com/google/gemma.cpp/pull/170
- Fix kv offset computation for MHA config. by @szabadka in https://github.com/google/gemma.cpp/pull/172
- Use more parallelism in the final output of the attention block. by @szabadka in https://github.com/google/gemma.cpp/pull/175
- Use more parallelism in the QKV projections of the MHA block. by @szabadka in https://github.com/google/gemma.cpp/pull/176
- Factor out deinterleaving of bf16 vectors for MatVecs. by @samkaufman in https://github.com/google/gemma.cpp/pull/166
- Use more parallelism in attention block in prefill mode. by @szabadka in https://github.com/google/gemma.cpp/pull/177
- work with cmake install by @xinpingwang in https://github.com/google/gemma.cpp/pull/169
- 2x speedup of SFP decode (1.4x overall) on AVX3_DL+. by @copybara-service in https://github.com/google/gemma.cpp/pull/178
- Support additional scaling by @copybara-service in https://github.com/google/gemma.cpp/pull/181
- Store tokens/sec in auxiliary struct TimingInfo. by @copybara-service in https://github.com/google/gemma.cpp/pull/183
- Add TTFT to TimingInfo by @copybara-service in https://github.com/google/gemma.cpp/pull/186
- Make BlobWriter::Add() accept const void* by @copybara-service in https://github.com/google/gemma.cpp/pull/188
- Adds Kaggle testing to CI workflow by @pculliton in https://github.com/google/gemma.cpp/pull/189
- Fix normalization in Softmax function. by @szabadka in https://github.com/google/gemma.cpp/pull/194
- Clarified README by @zond in https://github.com/google/gemma.cpp/pull/137
- Unrolled / tiled 4x4 MatMul by @copybara-service in https://github.com/google/gemma.cpp/pull/199
- Refactor GemmaImpl dispatch to use Highway 1.2's HWY_DYNAMIC_DISPATCH_T by @copybara-service in https://github.com/google/gemma.cpp/pull/202
- Add first version of backpropagation support. by @szabadka in https://github.com/google/gemma.cpp/pull/203
- Fix for GenerateZeroMat call in TestTiledMatMul by @copybara-service in https://github.com/google/gemma.cpp/pull/206
- Remove no longer required stats.h - use Highway version instead by @copybara-service in https://github.com/google/gemma.cpp/pull/208
- Simplifications: remove GemmaInterface and GemmaImpl by @copybara-service in https://github.com/google/gemma.cpp/pull/209
- Implement mixed mode matmul: f32 * bf16 by @copybara-service in https://github.com/google/gemma.cpp/pull/210
- Fix Softmax on SVE by @copybara-service in https://github.com/google/gemma.cpp/pull/213
- Fix fix for weight type define, refs [#198] by @copybara-service in https://github.com/google/gemma.cpp/pull/216
- Add Adam optimizer. by @szabadka in https://github.com/google/gemma.cpp/pull/212
- Add support for custom sampling function to runtime config. by @szabadka in https://github.com/google/gemma.cpp/pull/217
- Shifting large matrix init to heap in ops_test.cc by @copybara-service in https://github.com/google/gemma.cpp/pull/220
- Add CPU output, error if not C++17, simplify tokenizer ctor by @copybara-service in https://github.com/google/gemma.cpp/pull/222
- Use `CompressedWeights<TConfig<float>>` in backpropagation. by @szabadka in https://github.com/google/gemma.cpp/pull/224
- Update benchmark with internal init by @copybara-service in https://github.com/google/gemma.cpp/pull/225
- Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc by @copybara-service in https://github.com/google/gemma.cpp/pull/227
- Implement float * SfpStream matmul by decompressing 4 * kColsA_RowsB -sized chunks of the second matrix. by @copybara-service in https://github.com/google/gemma.cpp/pull/231
- Add benchmark dependency to cmake build. by @szabadka in https://github.com/google/gemma.cpp/pull/234
- Fix numerical issue in Softcap by subtracting max. by @copybara-service in https://github.com/google/gemma.cpp/pull/236
- Extends Transformer() to prepare for batched processing. by @copybara-service in https://github.com/google/gemma.cpp/pull/238
- Tiny cleanup: distinguish between "ids" and "pieces" in argument names when encoding. by @copybara-service in https://github.com/google/gemma.cpp/pull/239
- Support mixed (bf16, sfp) tiled MatMul. Same sfp-decompress strategy as in (f32, by @copybara-service in https://github.com/google/gemma.cpp/pull/237
- Increase parallelism in ops_test by @copybara-service in https://github.com/google/gemma.cpp/pull/233
- Added MatMul_4x4_Batch which is MatMul_4x4, but with the first template arg moved to the first function arg, so the batch size (num A rows) can be variable at run-time. by @copybara-service in https://github.com/google/gemma.cpp/pull/241
- Reduce duplication in Config* by inheriting no-SSM by @copybara-service in https://github.com/google/gemma.cpp/pull/242
- Major duplicated code reduction in test/benchmarks by @copybara-service in https://github.com/google/gemma.cpp/pull/240
- Implement a missing (bf16, f32) tiled MatMul kernel. by @copybara-service in https://github.com/google/gemma.cpp/pull/245
- Removed now redundant non-batch matmul by @copybara-service in https://github.com/google/gemma.cpp/pull/246
- Integrate matmul into FFW: 4.3x prefill speedup by @copybara-service in https://github.com/google/gemma.cpp/pull/243
- Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/244
- Added bias vector addition to MatMul by @copybara-service in https://github.com/google/gemma.cpp/pull/247
- Refactor CompressedWeights. by @copybara-service in https://github.com/google/gemma.cpp/pull/248
- Fix DASSERT - TiledBatch requires at least 2 vectors. by @copybara-service in https://github.com/google/gemma.cpp/pull/253
- Move raw_weights into separate header, used mainly by compress_weights. by @copybara-service in https://github.com/google/gemma.cpp/pull/249
- Further simplification to ForEachTensor, thanks I.K. by @copybara-service in https://github.com/google/gemma.cpp/pull/254
- Update developer docs and mention asan/msan by @copybara-service in https://github.com/google/gemma.cpp/pull/255
- 1.15x 7b sfp prefill speedup: Matmul in attention by @copybara-service in https://github.com/google/gemma.cpp/pull/256
- Fix Py binding/run_example: use GemmaEnv by @copybara-service in https://github.com/google/gemma.cpp/pull/257
- Simplify Attention. by @copybara-service in https://github.com/google/gemma.cpp/pull/258
- Fix debug_prompt and other binaries (internal init) by @copybara-service in https://github.com/google/gemma.cpp/pull/259
- Move kGriffinLayers into ConfigNoSSM, set kGemmaLayers directly by @copybara-service in https://github.com/google/gemma.cpp/pull/260
- Split out common parts (embedder and transformer block) from Prefill() and Transformer() into separate functions. by @copybara-service in https://github.com/google/gemma.cpp/pull/261
- Move test placeholder to a later pos. by @copybara-service in https://github.com/google/gemma.cpp/pull/263
- Code cleanup by @copybara-service in https://github.com/google/gemma.cpp/pull/264
- Refactor kCachePosSize and kCacheLayerSize into separate functors. by @copybara-service in https://github.com/google/gemma.cpp/pull/262
- Fixing two typos. by @copybara-service in https://github.com/google/gemma.cpp/pull/265
- Fix compilation errors in clang by @ufownl in https://github.com/google/gemma.cpp/pull/267
- Fix KV cache size calculation error by @ufownl in https://github.com/google/gemma.cpp/pull/266
- Skip the last RMSNormInplaceBatched in the Prefill phase. by @copybara-service in https://github.com/google/gemma.cpp/pull/268
- Improve logging when running Gemma examples: fix the issue when max_tokens, max_generated_tokens and temperature were logging without any trailing space/newline. by @copybara-service in https://github.com/google/gemma.cpp/pull/270
- Use hwy::ThreadPool::MaxThreads() to determine the number of threads to use. by @copybara-service in https://github.com/google/gemma.cpp/pull/251
- Fix a clang tidy warning by @copybara-service in https://github.com/google/gemma.cpp/pull/271
- Remove unused BUILD dependency by @copybara-service in https://github.com/google/gemma.cpp/pull/272
- Refactor model type / training tables, simplify reverse mapping by @copybara-service in https://github.com/google/gemma.cpp/pull/273
- Introduce new Gemma 9B and 27B configs by @copybara-service in https://github.com/google/gemma.cpp/pull/274
- Add prompt batching to Gemma.cpp. by @copybara-service in https://github.com/google/gemma.cpp/pull/269
- Add config for att/final cap, skip max-subtract. Fixes [#278] by @copybara-service in https://github.com/google/gemma.cpp/pull/279
- Declutter gemma/ directory, move binaries to evals/ and util/. by @copybara-service in https://github.com/google/gemma.cpp/pull/277
- Remove unused kSystemPrompt by @copybara-service in https://github.com/google/gemma.cpp/pull/275
- Use benchmark_helper in py bindings (adds BOS) by @copybara-service in https://github.com/google/gemma.cpp/pull/282
- Cleanup: add ModelInfo struct, remove gcpp:: by @copybara-service in https://github.com/google/gemma.cpp/pull/281
- Prep for sharding gemma.cc: split into kv_cache, tokenizer. by @copybara-service in https://github.com/google/gemma.cpp/pull/284
- Add sliding window attention for Gemma 2. by @copybara-service in https://github.com/google/gemma.cpp/pull/280
- Small cleanups. Fixes gemma_test build. by @copybara-service in https://github.com/google/gemma.cpp/pull/286
- 7x compile time speedup: shard gemma.cc by @copybara-service in https://github.com/google/gemma.cpp/pull/288
- Fix gemma_test - moved to evals/. by @copybara-service in https://github.com/google/gemma.cpp/pull/289
- Add Py bindings for weight compression by @copybara-service in https://github.com/google/gemma.cpp/pull/290
- Cleanup: move util/compress and convert_weights to compression/ by @copybara-service in https://github.com/google/gemma.cpp/pull/291
- Fix handling of %c and %q if eot_string. Fixes [#283], thanks @ljcucc by @copybara-service in https://github.com/google/gemma.cpp/pull/292
- Update gemma_test with the expected entropy values for the IT models of size 2B/7B/9B/27B. by @copybara-service in https://github.com/google/gemma.cpp/pull/294
- Lint fix - string append, remove stale TODO by @copybara-service in https://github.com/google/gemma.cpp/pull/295
- Update gemma_test to also pass for the v1.1. models. by @copybara-service in https://github.com/google/gemma.cpp/pull/296
- Add more comments to attention computation (and some small restructuring). by @copybara-service in https://github.com/google/gemma.cpp/pull/298
- Fix windows build: min conflict, unused VF by @copybara-service in https://github.com/google/gemma.cpp/pull/299
- Refactor configurables. by @copybara-service in https://github.com/google/gemma.cpp/pull/297
- Remove allocation from GEMM_4x4_Tile when decoding compressed weights by implementing by @copybara-service in https://github.com/google/gemma.cpp/pull/303
- Simplify matmul: only 2 overloads by @copybara-service in https://github.com/google/gemma.cpp/pull/304
- SVE build fix: avoid capturing vectors directly. by @copybara-service in https://github.com/google/gemma.cpp/pull/305
- Improve readability with RepeatedAttentionWindowSizes by @copybara-service in https://github.com/google/gemma.cpp/pull/302
- Increase the prefill batch size to 64. by @copybara-service in https://github.com/google/gemma.cpp/pull/306
- Fix gemma_cpp/examples/hello_world build. by @copybara-service in https://github.com/google/gemma.cpp/pull/307
- Further 1.02x prefill speedup from batch 64->512 by @copybara-service in https://github.com/google/gemma.cpp/pull/308
- Fix examples/hello_world for real. by @copybara-service in https://github.com/google/gemma.cpp/pull/309
- Simplify FFW by using MatMul_4x4_Batch_Add. by @copybara-service in https://github.com/google/gemma.cpp/pull/311
- De-templatize Activations, add RowVectorBatch class by @copybara-service in https://github.com/google/gemma.cpp/pull/310
- Update gemma-27b to the correct query scaling. by @copybara-service in https://github.com/google/gemma.cpp/pull/312
- Add scale parameter to MatMul. by @copybara-service in https://github.com/google/gemma.cpp/pull/313
- Fix msan uninitialized scale by @copybara-service in https://github.com/google/gemma.cpp/pull/314
- Major Prefill/Generate cleanup, 1.3x Prefill speedup by @copybara-service in https://github.com/google/gemma.cpp/pull/315
- Cleanup: add wrapper functions and rename vars to interleaved by @copybara-service in https://github.com/google/gemma.cpp/pull/316
- Split up ops.h into ops/ops-inl and matmul-inl by @copybara-service in https://github.com/google/gemma.cpp/pull/317
- Use all CPU sockets when pinning threads to cores by @copybara-service in https://github.com/google/gemma.cpp/pull/319
- Fix msan uninitialized scale in optimize_test by @copybara-service in https://github.com/google/gemma.cpp/pull/320
- Minor polishing: adding comments, renaming variables. by @copybara-service in https://github.com/google/gemma.cpp/pull/321
- Fix setting scales in Py binding by @copybara-service in https://github.com/google/gemma.cpp/pull/322
- Add offset arg to MatMul, rename, Matmul for logits = ~1.1x decode speedup by @copybara-service in https://github.com/google/gemma.cpp/pull/325
- 1.05x prefill speedup: matvec -> matmul for !MHA by @copybara-service in https://github.com/google/gemma.cpp/pull/327
- Add Python code for converting Griffin Orbax weights. Refs [#301] by @copybara-service in https://github.com/google/gemma.cpp/pull/329
- MatMul cleanup: Mat struct, simplify args. by @copybara-service in https://github.com/google/gemma.cpp/pull/330
- Fix Windows build - macro conflict with param name by @copybara-service in https://github.com/google/gemma.cpp/pull/331
- Extend LayersOutputFunc to take query index and auxiliary int by @copybara-service in https://github.com/google/gemma.cpp/pull/328
- Split matmul into matvec; add large matrix benchmark by @copybara-service in https://github.com/google/gemma.cpp/pull/333
- Internal change by @copybara-service in https://github.com/google/gemma.cpp/pull/326
- SFP speedup: 1.14x f32, 1.19x bf16 dot = 1.02x prefill by @copybara-service in https://github.com/google/gemma.cpp/pull/335
- 1.1x prefill speedup, revamp threading in preparation for hierarchical parallelism. by @copybara-service in https://github.com/google/gemma.cpp/pull/334
- Improve performance logging by @copybara-service in https://github.com/google/gemma.cpp/pull/336
- 1.03-1.08x decode speedup: precompute Rope theta, fuse by @copybara-service in https://github.com/google/gemma.cpp/pull/339
- Rename Gemma9B and Gemma27B to Gemma2_9B and Gemma2_27B. by @copybara-service in https://github.com/google/gemma.cpp/pull/342
- Add pin flag to disable pinning. Refs [#338] by @copybara-service in https://github.com/google/gemma.cpp/pull/343
- 1.3x prefill, 0.95x decode: matmul replacing last matvec by @copybara-service in https://github.com/google/gemma.cpp/pull/345
- Fix gemma_test GeographyBatched for 2b-it and add entropy expectations for gemma2-2b-it. by @copybara-service in https://github.com/google/gemma.cpp/pull/346
- 0.98x prefill: refactor in prep for cache blocking. by @copybara-service in https://github.com/google/gemma.cpp/pull/347
- Implement `start_pos` per query for batch interface (reopen) by @ufownl in https://github.com/google/gemma.cpp/pull/348
- Simplify pos handling, auto-increment output arg by @copybara-service in https://github.com/google/gemma.cpp/pull/350
- Support directly observing activations, partially replacing LayersOutputFunc by @copybara-service in https://github.com/google/gemma.cpp/pull/351
- Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul by @copybara-service in https://github.com/google/gemma.cpp/pull/352
- Expose underlying model configuration: number of layers, heads, etc. by @copybara-service in https://github.com/google/gemma.cpp/pull/354
- VectorizedRopeAndMulBy. by @copybara-service in https://github.com/google/gemma.cpp/pull/355
- Fix prefill for batched queries. by @copybara-service in https://github.com/google/gemma.cpp/pull/353
- Vectorize Rope for qkv dim not evenly divisible by number of lanes. by @copybara-service in https://github.com/google/gemma.cpp/pull/356
- Fix test for 2b - update prompt by @copybara-service in https://github.com/google/gemma.cpp/pull/358
- Minor followup: remainder handling is a single iteration by @copybara-service in https://github.com/google/gemma.cpp/pull/359
- Experiment with compensated dot product. by @copybara-service in https://github.com/google/gemma.cpp/pull/357
- Avoid duplication of RMSNorm, support all activation/weight types by @copybara-service in https://github.com/google/gemma.cpp/pull/360
- Demonstrate constrained decoding in gemma_cpp's hello world example by @copybara-service in https://github.com/google/gemma.cpp/pull/363
- Add an additional QueryModel() overload to GemmaEnv. by @copybara-service in https://github.com/google/gemma.cpp/pull/362
- Internal change. Slight restructuring of gemma_test. by @copybara-service in https://github.com/google/gemma.cpp/pull/367
- 1.22x NUQ compress speedup, fix out of bounds access, improve numerics by @copybara-service in https://github.com/google/gemma.cpp/pull/366
- Fix NUQ for SVE - incorrect nibble packing by @copybara-service in https://github.com/google/gemma.cpp/pull/368
- Further nuq_test speedups to prevent timeout by @copybara-service in https://github.com/google/gemma.cpp/pull/371
- Refactor/cleanup, remove even_odd by @copybara-service in https://github.com/google/gemma.cpp/pull/372
- Minor cleanup/fixes: by @copybara-service in https://github.com/google/gemma.cpp/pull/375
- Major compression update, arbitrary-len unpack + new Dot by @copybara-service in https://github.com/google/gemma.cpp/pull/374
- Fix mismatch between blob_store and compress interfaces (bytes) by @copybara-service in https://github.com/google/gemma.cpp/pull/376
- Adds insert_float() to SbsWriter() to store a float array directly. by @copybara-service in https://github.com/google/gemma.cpp/pull/378
- Implement scalar version of LayerNorm by @copybara-service in https://github.com/google/gemma.cpp/pull/379
- Add const batch accessor to RowVectorBatch. by @copybara-service in https://github.com/google/gemma.cpp/pull/381
- Add entropy expectations for Griffin-2b model in gemma_test and make sure it passes. by @copybara-service in https://github.com/google/gemma.cpp/pull/382
- Add tests for SampleTopK that highlight existing problems and fix those: by @copybara-service in https://github.com/google/gemma.cpp/pull/383
- Add pairwise sum dot products for testing by @copybara-service in https://github.com/google/gemma.cpp/pull/386
- Fix the warnings complained by Clang by @ufownl in https://github.com/google/gemma.cpp/pull/380
- Cascaded summation for Softmax by @copybara-service in https://github.com/google/gemma.cpp/pull/388
- Fix compress-inl bf16->f32 overrun by @copybara-service in https://github.com/google/gemma.cpp/pull/390
- Fix topology display for platforms where it fails (Apple) by @copybara-service in https://github.com/google/gemma.cpp/pull/391
- Update expected entropy values for GRIFFIN_2B model. by @copybara-service in https://github.com/google/gemma.cpp/pull/392
- Add forward and backward error by @copybara-service in https://github.com/google/gemma.cpp/pull/389
- Fix prefix-LM mode assertion by @ufownl in https://github.com/google/gemma.cpp/pull/394
- Reduce flakiness of dot_test. by @copybara-service in https://github.com/google/gemma.cpp/pull/396
- 1.6x speedup of MatMulSlow using compensated Dot by @copybara-service in https://github.com/google/gemma.cpp/pull/397
- Add download location of Pali Gemma weights to README.md. by @copybara-service in https://github.com/google/gemma.cpp/pull/398
- Tiny update of the README formatting. by @copybara-service in https://github.com/google/gemma.cpp/pull/399
- Add double-precision dot variant by @copybara-service in https://github.com/google/gemma.cpp/pull/393
- Use f64 Dot and sum in softmax - faster than Cascaded by @copybara-service in https://github.com/google/gemma.cpp/pull/400
- 1.09x decode speedup for topk=1/temp0: fuse softmax and sample by @copybara-service in https://github.com/google/gemma.cpp/pull/402
- Rename one variable in SampleTopK and update TestSampleTopK. by @copybara-service in https://github.com/google/gemma.cpp/pull/404
- Minor fix to profiler zone and add comment by @copybara-service in https://github.com/google/gemma.cpp/pull/407
- Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/408
- Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/377
- Fix MSAN issue for multiturn. Rewind the prior EOS token. by @copybara-service in https://github.com/google/gemma.cpp/pull/412
- Reduce number of operations in Gelu() by one Mul. by @copybara-service in https://github.com/google/gemma.cpp/pull/414
- Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray. by @copybara-service in https://github.com/google/gemma.cpp/pull/417
- Update expected ranges in dot_test. by @copybara-service in https://github.com/google/gemma.cpp/pull/420
- Remove unused "two-sizes" version of MulByConstAndAdd. by @copybara-service in https://github.com/google/gemma.cpp/pull/421
- Benchmark gemma.cpp with different length inputs. by @copybara-service in https://github.com/google/gemma.cpp/pull/416
- Fix PaliGemma model loading. by @copybara-service in https://github.com/google/gemma.cpp/pull/425
- Fix compilation error of the weights compression tool by @ufownl in https://github.com/google/gemma.cpp/pull/422
- Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize. by @copybara-service in https://github.com/google/gemma.cpp/pull/419
- Eliminated TConfig. by @copybara-service in https://github.com/google/gemma.cpp/pull/428
- Fix PaliGemma's GenerateImageTokensT(). by @copybara-service in https://github.com/google/gemma.cpp/pull/430
- Use NestedPools, add NUMA infra by @copybara-service in https://github.com/google/gemma.cpp/pull/427
- Fix compilation errors of "compress_weights" target by @ufownl in https://github.com/google/gemma.cpp/pull/432
- Add overloads of `Image::ReadPPM` method by @ufownl in https://github.com/google/gemma.cpp/pull/426
- New blob_store_test, ensure ReadOne checks actual size against requested size by @copybara-service in https://github.com/google/gemma.cpp/pull/433
- Add a compilation option to disable topology by @ufownl in https://github.com/google/gemma.cpp/pull/435
- Serialization for class members for use with ModelConfig by @copybara-service in https://github.com/google/gemma.cpp/pull/436
- Warning fixes (casts) and fix Windows build for aligned_alloc by @copybara-service in https://github.com/google/gemma.cpp/pull/437
- Factor out addition of ViTConfig to a ModelConfig. by @copybara-service in https://github.com/google/gemma.cpp/pull/438
- Simpler MatMul interface, vocab types, Tristate for use_spinning by @copybara-service in https://github.com/google/gemma.cpp/pull/442
- Expose BlobReader::Keys() by @copybara-service in https://github.com/google/gemma.cpp/pull/443
- Fix Griffin model: by @copybara-service in https://github.com/google/gemma.cpp/pull/444
- Replace CLIF SbsWriter with pybind-based gcpp extension by @copybara-service in https://github.com/google/gemma.cpp/pull/445
- Added a blob_compare tool that compares two sbs files that may have the blobs in a different order by @copybara-service in https://github.com/google/gemma.cpp/pull/448
- Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/450
- Added pybind for configs. by @copybara-service in https://github.com/google/gemma.cpp/pull/449
- Improved consistency of compressor API, and added a universal method with a target type arg. by @copybara-service in https://github.com/google/gemma.cpp/pull/452
- Add a simple benchmark for batching. by @copybara-service in https://github.com/google/gemma.cpp/pull/453
- Threading/infra improvements. by @copybara-service in https://github.com/google/gemma.cpp/pull/455
- Print cache info and update Highway version for that by @copybara-service in https://github.com/google/gemma.cpp/pull/456
- Internal change by @copybara-service in https://github.com/google/gemma.cpp/pull/457
- Add support for 448px resolution to PaliGemma and PaliGemma2. by @copybara-service in https://github.com/google/gemma.cpp/pull/459
- Tiny cleanup. by @copybara-service in https://github.com/google/gemma.cpp/pull/461
- Refactor `gemma/common.cc` to improve readability and safety by @ericcurtin in https://github.com/google/gemma.cpp/pull/460
- Internal change by @copybara-service in https://github.com/google/gemma.cpp/pull/462
- Fix unhandled switch warning/error by @copybara-service in https://github.com/google/gemma.cpp/pull/463
- Added the TensorInfo arg to the compressor so the shape and scale can be output correctly to the file in future. by @copybara-service in https://github.com/google/gemma.cpp/pull/454
- Make prompt wrapping more consistent and fix duplicated tokens for multi-turn. by @copybara-service in https://github.com/google/gemma.cpp/pull/464
- Removed duplicated tensor sizes from weights.h by changing the constructor used for MatPtrT by @copybara-service in https://github.com/google/gemma.cpp/pull/465
- Rename ModelTraining to PromptWrapping which is a more accurate name. by @copybara-service in https://github.com/google/gemma.cpp/pull/466
- Small updates to the README file. by @copybara-service in https://github.com/google/gemma.cpp/pull/467
- Internal change by @copybara-service in https://github.com/google/gemma.cpp/pull/468
- Added ability to load/save a complete model file, including tokenizer. by @copybara-service in https://github.com/google/gemma.cpp/pull/469
- Moved the vit config fields to their own config struct by @copybara-service in https://github.com/google/gemma.cpp/pull/471
- Allow interactive use with new single-file weight format. by @copybara-service in https://github.com/google/gemma.cpp/pull/472
- Add the missing `migrate_weights` target for CMake by @ufownl in https://github.com/google/gemma.cpp/pull/473
- Tiny fix: align template parameter order with parameter order. by @copybara-service in https://github.com/google/gemma.cpp/pull/476
- Add parameter for base_frequency to CreateInvTimeScale(). by @copybara-service in https://github.com/google/gemma.cpp/pull/477
- Infra improvements (2) by @copybara-service in https://github.com/google/gemma.cpp/pull/474
- internal change by @copybara-service in https://github.com/google/gemma.cpp/pull/478
- Allow overriding num threads despite detecting topology by @copybara-service in https://github.com/google/gemma.cpp/pull/480
- Assorted small cleanups. by @copybara-service in https://github.com/google/gemma.cpp/pull/482
- Add python wrappers for configs and inference. by @copybara-service in https://github.com/google/gemma.cpp/pull/481
- Simplified interface class and example for Gemma.cpp usage. by @copybara-service in https://github.com/google/gemma.cpp/pull/483
- Base interleaved handling for 4.5-bit NUQ, specifically Enc, DecompressAndZeroPad, and Dec2. Includes tests. by @copybara-service in https://github.com/google/gemma.cpp/pull/484
- Allow conversion, loading and inference with NUQ. by @copybara-service in https://github.com/google/gemma.cpp/pull/485
- Improved blob diff: parallel, tolerance for float by @copybara-service in https://github.com/google/gemma.cpp/pull/489
- Remove `srcs_version` and `python_version` attributes, as they already default to `"PY3"` by @copybara-service in https://github.com/google/gemma.cpp/pull/487
- Windows build fixes: struct vs class, unused arg/var, avoid VLA, Deleter arg, casts by @copybara-service in https://github.com/google/gemma.cpp/pull/492
- Add fork/join latency benchmark by @copybara-service in https://github.com/google/gemma.cpp/pull/496
- Fix nuq Enc() to handle groups < kGroupSize. by @copybara-service in https://github.com/google/gemma.cpp/pull/497
- Using TimingInfo methods and cleaning up args to DecodeStepT by @copybara-service in https://github.com/google/gemma.cpp/pull/499
- Fix the link error when building `compress_weights` with Clang on macOS by @ufownl in https://github.com/google/gemma.cpp/pull/493
- Add conversion tool for HF safetensors to gemma.cpp for PaliGemma. by @copybara-service in https://github.com/google/gemma.cpp/pull/498
- Less verbose threading_test output, improve formatting. by @copybara-service in https://github.com/google/gemma.cpp/pull/500
- Only temporarily enable spinning in threading benchmark by @copybara-service in https://github.com/google/gemma.cpp/pull/503
- Implements FusedSoftmaxAndSampleTopK. by @copybara-service in https://github.com/google/gemma.cpp/pull/502
- Use vectorized TopK using highway VQSelect by @copybara-service in https://github.com/google/gemma.cpp/pull/505
- Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning by @copybara-service in https://github.com/google/gemma.cpp/pull/488
- Support bf16 output of Matmul by @copybara-service in https://github.com/google/gemma.cpp/pull/511
- Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/514
- Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/515
- Update github actions/cache version by @copybara-service in https://github.com/google/gemma.cpp/pull/517
- Fix PaliGemma models. by @copybara-service in https://github.com/google/gemma.cpp/pull/519
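
Several entries above refer to compensated dot products (for example pull/357 and pull/397). For readers unfamiliar with the term, the general idea can be sketched in scalar C++ as below. This is only an illustration of Kahan-style compensated summation applied to a dot product; it is not the vectorized kernel in gemma.cpp, and the name `CompensatedDot` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of gemma.cpp: scalar dot product using
// Kahan (compensated) summation. `comp` tracks the rounding error of each
// addition so it can be fed back into the next term.
float CompensatedDot(const std::vector<float>& a,
                     const std::vector<float>& b) {
  const std::size_t n = a.size() < b.size() ? a.size() : b.size();
  float sum = 0.0f;
  float comp = 0.0f;  // accumulated rounding error of `sum`
  for (std::size_t i = 0; i < n; ++i) {
    const float term = a[i] * b[i] - comp;  // apply the previous correction
    const float next = sum + term;          // low-order bits may be lost here
    comp = (next - sum) - term;             // recover what was lost
    sum = next;
  }
  return sum;
}
```

A plain `float` loop that adds eight 1.0 terms to 1e8 returns 1e8 unchanged, because each single addition rounds away; the compensated version recovers the missing 8. Note that aggressive floating-point flags such as `-ffast-math` allow the compiler to reassociate the arithmetic and can remove the compensation entirely.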
New Contributors
- @veluca93 made their first contribution in https://github.com/google/gemma.cpp/pull/129
- @atorero made their first contribution in https://github.com/google/gemma.cpp/pull/145
- @samkaufman made their first contribution in https://github.com/google/gemma.cpp/pull/166
- @xinpingwang made their first contribution in https://github.com/google/gemma.cpp/pull/169
- @zond made their first contribution in https://github.com/google/gemma.cpp/pull/137
- @ericcurtin made their first contribution in https://github.com/google/gemma.cpp/pull/460
Full Changelog: https://github.com/google/gemma.cpp/compare/v0.1.2...v0.1.3