From: GitHub <no...@gi...> - 2017-08-21 17:21:05
|
Branch: refs/heads/master Home: https://github.com/MLton/mlton Commit: 5bef8d6ab53622bf9aed0c7e3fc895234f97d72a https://github.com/MLton/mlton/commit/5bef8d6ab53622bf9aed0c7e3fc895234f97d72a Author: Matthew Fluet <mat...@gm...> Date: 2017-07-12 (Wed, 12 Jul 2017) Changed paths: M benchmark/main.sml A benchmark/tests/string-concat.sml Log Message: ----------- Add a string-concat benchmark Commit: d4c68b56d94a35f66c28707c42e5eef5517c795f https://github.com/MLton/mlton/commit/d4c68b56d94a35f66c28707c42e5eef5517c795f Author: Matthew Fluet <mat...@gm...> Date: 2017-07-12 (Wed, 12 Jul 2017) Changed paths: M runtime/gc/array.c M runtime/gc/array.h Log Message: ----------- Add GC_arrayCopy to runtime Commit: 21827e4e36bed3b71b58409bc6a7b0629603d77d https://github.com/MLton/mlton/commit/21827e4e36bed3b71b58409bc6a7b0629603d77d Author: Matthew Fluet <mat...@gm...> Date: 2017-07-12 (Wed, 12 Jul 2017) Changed paths: M mlton/atoms/prim.fun M mlton/atoms/prim.sig M mlton/backend/ssa-to-rssa.fun M mlton/closure-convert/abstract-value.fun M mlton/ssa/constant-propagation.fun M mlton/ssa/ssa-tree2.fun M mlton/ssa/useless.fun Log Message: ----------- Preliminary support for Array_copy{Array,Vector} primitives Commit: ed8f6beb38862c147a7d7dcfef9536b547559dae https://github.com/MLton/mlton/commit/ed8f6beb38862c147a7d7dcfef9536b547559dae Author: Matthew Fluet <mat...@gm...> Date: 2017-07-12 (Wed, 12 Jul 2017) Changed paths: M basis-library/arrays-and-vectors/sequence.fun M basis-library/arrays-and-vectors/sequence0.sml M basis-library/primitive/prim-seq.sml Log Message: ----------- Preliminary use of Array_copy{Array,Vector} in Basis Library Commit: de8f9229cbdb89884220cbdb9fb1ffba39bf2535 https://github.com/MLton/mlton/commit/de8f9229cbdb89884220cbdb9fb1ffba39bf2535 Author: Bryan Camp <bx...@ri...> Date: 2017-08-01 (Tue, 01 Aug 2017) Changed paths: M basis-library/arrays-and-vectors/sequence.fun M basis-library/arrays-and-vectors/sequence0.sig M basis-library/arrays-and-vectors/sequence0.sml M basis-library/arrays-and-vectors/slice0.sig M mlton/backend/ssa-to-rssa.fun Log Message: ----------- Introduced new copy functionality to improve concat and other similar functions. Commit: 7f63b6860bfe978eb36bf6ef8922008557873f67 https://github.com/MLton/mlton/commit/7f63b6860bfe978eb36bf6ef8922008557873f67 Author: Matthew Fluet <mat...@gm...> Date: 2017-08-12 (Sat, 12 Aug 2017) Changed paths: M basis-library/arrays-and-vectors/sequence.fun M basis-library/arrays-and-vectors/sequence0.sig M basis-library/arrays-and-vectors/sequence0.sml M mlton/backend/ssa-to-rssa.fun M mlton/ssa/useless.fun Log Message: ----------- Cleanup whitespace Commit: 4cfe0aa8f58676ddcda7b9933b3eb8a1df215507 https://github.com/MLton/mlton/commit/4cfe0aa8f58676ddcda7b9933b3eb8a1df215507 Author: Matthew Fluet <mat...@gm...> Date: 2017-08-12 (Sat, 12 Aug 2017) Changed paths: M basis-library/arrays-and-vectors/sequence0.sml Log Message: ----------- Reorder function declarations for consistency Commit: caa9c617ab5b6fb782d53b91edd6e7e6e28f8205 https://github.com/MLton/mlton/commit/caa9c617ab5b6fb782d53b91edd6e7e6e28f8205 Author: Matthew Fluet <mat...@gm...> Date: 2017-08-12 (Sat, 12 Aug 2017) Changed paths: M lib/mlton/basic/stream-parser.sig M lib/mlton/basic/stream-parser.sml M mlton/main/main.fun M mlton/xml/parse-sxml.fun Log Message: ----------- Merge branch 'master' of github.com:MLton/mlton into array-copy-prims For performance comparison, want both master and array-copy-prims to use PIC on amd64-linux. Commit: a2c6a3c71f323ebd93baa025163f911231600188 https://github.com/MLton/mlton/commit/a2c6a3c71f323ebd93baa025163f911231600188 Author: Matthew Fluet <mat...@gm...> Date: 2017-08-14 (Mon, 14 Aug 2017) Changed paths: M benchmark/Makefile M benchmark/main.sml M benchmark/tests/.gitignore R benchmark/tests/vector-concat.sml A benchmark/tests/vector32-concat.sml A benchmark/tests/vector64-concat.sml M benchmark/update-counts.sh Log Message: ----------- Split `vector-concat` into `vector32-concat` and `vector64-concat` Commit: daae91ded68a0e22d582b77e824669103a7426aa https://github.com/MLton/mlton/commit/daae91ded68a0e22d582b77e824669103a7426aa Author: Matthew Fluet <mat...@gm...> Date: 2017-08-14 (Mon, 14 Aug 2017) Changed paths: M basis-library/arrays-and-vectors/sequence0.sml Log Message: ----------- Use `copy` to implement `Vector.update` Commit: b5192efa9e928050997d7b4262a3e813e5f448a1 https://github.com/MLton/mlton/commit/b5192efa9e928050997d7b4262a3e813e5f448a1 Author: Matthew Fluet <mat...@gm...> Date: 2017-08-20 (Sun, 20 Aug 2017) Changed paths: M basis-library/arrays-and-vectors/sequence0.sml Log Message: ----------- Perform small (<5) sequence copies with loop (rather than `memmove`) Using a C call to `memmove` has some overhead for copies of short length, as compared to element-by-element copies. Instrumenting the `GC_arrayCopy` function to record number of elements copied at each invocation shows that a self-compile makes 37074056 calls to `GC_arrayCopy` with the following statistics on number of elements copied: min 0 median 3 mean 7.767096 max 146120 histogram: count density cumulative 0 8138 0.0002195066 0.0002195066 1 1556411 0.0419811364 0.0422006430 2 8922853 0.2406764720 0.2828771149 3 9395704 0.2534307010 0.5363078159 4 7053342 0.1902500768 0.7265578927 (4,8] 4883067 0.1317111621 0.8582690548 (8,16] 1113094 0.0300235291 0.8882925839 (16,146120] 4141447 0.1117074161 1.0000000000 With nearly 75% of copies being done with 4 or fewer elements, it is worth special handling copies of very small length. The following function measures the cost of `Array.copy{,Vec}`: fun run {fromInt, toString, ...} (di, si, l, j, k: Int64.int) = let val n = IntInf.toInt (IntInf.pow (2, 21)) val a = Array.tabulate (n, fromInt) val copy = if true then let val v = Vector.tabulate (n, fromInt) val sl = VectorSlice.slice (v, si, SOME l) in fn () => ArraySlice.copyVec {dst = a, di = di, src = sl} end else let val sl = ArraySlice.slice (a, si, SOME l) in fn () => ArraySlice.copy {dst = a, di = di, src = sl} end fun loop k = if k <= 0 then () else (copy (); loop (k - 1)) val _ = loop k val ans = Array.sub (a, j) val () = print (concat ["ans = ", toString ans, "\n"]) in () end Instantiating `fromInt` and `toString` selects the element type of the sequences. Results using `di = 1` and `si = 7` (for non-aligned data with < 32bit element sizes) and `k` dynamically determined (by finding a value for which `g1591902-amd64` is > 10s): |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio w8 0 12.52 59.46 4.749 15.82 1.263 9.05 47.51 5.250 15.82 1.748 9.05 45.26 5.001 13.58 1.501 w8 1 12.45 28.36 2.278 9.19 0.738 5.67 22.97 4.051 8.89 1.568 5.65 21.81 3.860 8.58 1.518 w8 2 16.65 30.64 1.840 13.68 0.822 9.05 24.88 2.749 12.92 1.428 8.43 23.75 2.817 10.19 1.209 w8 3 10.92 15.31 1.402 8.54 0.782 5.68 12.43 2.188 7.91 1.393 6.23 11.87 1.905 6.55 1.051 w8 4 13.53 14.79 1.093 10.52 0.777 6.85 11.88 1.734 9.93 1.450 7.48 11.31 1.512 7.97 1.065 w8 5 16.16 14.75 0.913 15.33 0.949 8.68 11.94 1.376 11.87 1.367 7.54 11.31 1.500 11.31 1.500 w8 6 18.86 14.87 0.788 15.33 0.813 11.31 11.89 1.051 11.87 1.049 11.74 11.30 0.962 11.30 0.962 w8 7 10.62 7.37 0.694 7.66 0.721 6.50 5.94 0.914 5.94 0.914 6.72 5.65 0.841 5.65 0.841 w8 8 11.87 7.68 0.647 7.94 0.669 7.35 6.22 0.846 6.22 0.846 7.62 5.94 0.780 5.94 0.780 w8 16 11.50 3.12 0.271 3.27 0.284 7.07 2.90 0.410 3.15 0.446 7.22 2.69 0.373 2.73 0.378 w8 32 10.58 1.56 0.147 1.63 0.154 6.92 1.43 0.207 1.46 0.211 7.59 1.38 0.182 1.38 0.182 w8 64 19.58 2.42 0.124 2.45 0.125 14.93 2.44 0.163 2.46 0.165 14.98 2.28 0.152 2.31 0.154 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio w16 0 13.57 52.05 3.836 15.81 1.165 9.04 45.91 5.078 15.69 1.736 9.05 45.27 5.002 13.56 1.498 w16 1 11.35 27.14 2.391 9.98 0.879 6.78 26.15 3.857 8.76 1.292 5.66 23.74 4.194 7.91 1.398 w16 2 14.69 26.01 1.770 13.57 0.924 9.07 23.65 2.607 12.05 1.329 8.43 22.62 2.683 11.02 1.307 w16 3 10.18 13.01 1.278 8.49 0.834 5.65 11.75 2.080 7.43 1.315 5.09 11.30 2.220 7.80 1.532 w16 4 12.43 13.56 1.091 10.51 0.845 6.60 12.11 1.835 11.75 1.780 6.22 11.87 1.908 9.74 1.566 w16 5 14.70 13.57 0.923 14.70 1.000 8.10 12.19 1.505 11.98 1.479 7.35 11.87 1.615 11.87 1.615 w16 6 16.96 13.57 0.800 14.70 0.867 11.46 12.09 1.055 12.60 1.099 8.48 11.87 1.400 11.87 1.400 w16 7 19.22 13.57 0.706 14.70 0.765 13.14 11.96 0.910 12.61 0.960 9.61 11.87 1.235 11.87 1.235 w16 8 10.74 5.84 0.544 6.09 0.567 7.52 5.85 0.778 5.86 0.779 5.37 5.42 1.009 5.37 1.000 w16 16 19.78 5.83 0.295 9.93 0.502 14.23 5.82 0.409 5.86 0.412 9.89 5.58 0.564 5.37 0.543 w16 32 10.43 2.40 0.230 2.45 0.235 7.50 2.45 0.327 2.45 0.327 4.73 2.26 0.478 2.29 0.484 w16 64 19.57 2.43 0.124 2.53 0.129 12.96 2.48 0.191 2.46 0.190 10.77 2.30 0.213 2.30 0.213 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio w32 0 13.56 56.61 4.175 15.81 1.166 9.04 45.26 5.007 15.83 1.751 9.04 45.60 5.044 13.56 1.500 w32 1 10.18 28.29 2.779 9.04 0.888 6.78 23.60 3.481 12.16 1.794 5.66 22.88 4.042 9.22 1.629 w32 2 13.66 29.41 2.153 12.44 0.911 9.05 23.75 2.624 12.78 1.412 9.75 23.87 2.448 11.70 1.200 w32 3 10.17 14.71 1.446 7.92 0.779 5.68 11.87 2.090 7.36 1.296 5.09 12.05 2.367 7.48 1.470 w32 4 12.46 11.89 0.954 10.17 0.816 7.88 11.40 1.447 9.48 1.203 7.16 10.81 1.510 9.18 1.282 w32 5 14.71 11.89 0.808 13.00 0.884 8.14 11.41 1.402 12.18 1.496 7.36 10.78 1.465 10.75 1.461 w32 6 16.96 11.89 0.701 13.00 0.767 9.57 11.45 1.196 12.28 1.283 11.30 10.90 0.965 10.80 0.956 w32 7 19.22 11.89 0.619 13.00 0.676 10.74 11.45 1.066 12.28 1.143 13.30 10.90 0.819 10.79 0.811 w32 8 10.74 5.94 0.553 6.50 0.605 5.93 5.74 0.968 6.14 1.035 7.54 5.41 0.717 5.39 0.715 w32 16 19.78 9.54 0.482 9.69 0.490 10.01 9.72 0.971 9.75 0.974 14.26 9.44 0.662 9.17 0.643 w32 32 19.07 4.80 0.252 4.87 0.255 9.47 4.91 0.518 4.87 0.514 15.05 4.60 0.306 4.60 0.306 w32 64 19.55 3.94 0.201 3.97 0.203 10.88 3.92 0.360 3.89 0.357 12.96 3.78 0.292 3.84 0.296 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio w64 0 13.81 58.85 4.261 15.83 1.146 9.04 45.97 5.085 15.83 1.751 9.05 48.27 5.334 13.57 1.499 w64 1 10.30 30.55 2.966 9.05 0.879 6.78 23.93 3.529 8.58 1.265 5.66 24.95 4.408 8.57 1.514 w64 2 14.15 24.89 1.759 12.44 0.879 9.07 22.81 2.515 11.97 1.320 9.76 21.54 2.207 11.43 1.171 w64 3 10.19 12.45 1.222 7.92 0.777 5.65 11.34 2.007 7.38 1.306 5.09 10.74 2.110 6.84 1.344 w64 4 12.50 12.45 0.996 10.18 0.814 7.62 11.35 1.489 9.48 1.244 7.21 11.29 1.566 9.00 1.248 w64 5 14.73 15.84 1.075 16.39 1.113 8.89 13.64 1.534 14.11 1.587 7.36 13.56 1.842 13.04 1.772 w64 6 16.97 15.84 0.933 16.40 0.966 11.31 13.67 1.209 14.24 1.259 11.31 14.18 1.254 13.05 1.154 w64 7 19.23 15.84 0.824 16.40 0.853 13.00 13.72 1.055 14.24 1.095 13.29 13.56 1.020 13.05 0.982 w64 8 10.74 9.37 0.872 9.55 0.889 7.62 9.57 1.256 9.72 1.275 7.54 8.96 1.188 9.23 1.224 w64 16 19.79 9.47 0.479 9.59 0.485 14.34 9.66 0.674 9.70 0.676 14.26 9.02 0.632 9.22 0.647 w64 32 18.96 7.89 0.416 7.94 0.419 14.97 7.75 0.518 7.77 0.519 14.99 7.57 0.505 7.71 0.514 w64 64 19.57 6.09 0.311 5.15 0.263 12.93 5.80 0.449 5.03 0.389 12.96 5.80 0.447 4.99 0.385 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio r32 0 13.57 61.06 4.500 15.83 1.166 9.05 50.45 5.574 15.83 1.749 9.04 49.80 5.509 13.56 1.500 r32 1 11.41 30.54 2.676 9.04 0.792 6.79 26.50 3.903 10.14 1.493 5.65 24.90 4.407 9.22 1.632 r32 2 13.57 31.67 2.334 12.44 0.917 9.05 26.31 2.907 11.97 1.323 9.45 26.01 2.752 11.65 1.233 r32 3 10.25 15.83 1.544 8.37 0.816 5.66 13.29 2.348 7.43 1.313 5.09 13.01 2.556 7.47 1.467 r32 4 12.54 13.01 1.037 10.74 0.856 7.41 12.24 1.652 9.65 1.302 7.09 11.43 1.612 9.04 1.275 r32 5 14.70 13.01 0.885 12.49 0.850 8.04 12.24 1.522 11.75 1.461 7.37 11.44 1.552 10.77 1.461 r32 6 16.96 13.01 0.767 12.66 0.746 11.31 12.24 1.082 11.78 1.041 11.30 11.46 1.014 10.77 0.953 r32 7 19.22 13.32 0.693 12.87 0.670 13.00 12.24 0.941 11.76 0.905 13.30 11.43 0.859 10.76 0.809 r32 8 10.74 6.50 0.605 6.33 0.589 7.57 6.13 0.810 5.88 0.777 7.35 5.72 0.778 5.41 0.736 r32 16 19.79 10.44 0.527 9.81 0.496 14.37 10.27 0.715 9.79 0.681 14.28 9.82 0.688 9.21 0.645 r32 32 10.37 2.62 0.253 2.46 0.237 7.55 2.58 0.342 2.46 0.326 7.49 2.50 0.334 2.30 0.307 r32 64 19.57 4.15 0.212 4.15 0.212 12.99 4.23 0.326 4.08 0.314 12.94 3.96 0.306 3.89 0.301 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio r64 0 13.57 63.36 4.669 15.83 1.166 9.05 49.78 5.501 15.83 1.749 9.05 47.59 5.259 13.56 1.498 r64 1 11.10 32.81 2.956 9.05 0.815 6.80 26.02 3.826 8.80 1.294 5.66 24.89 4.398 9.20 1.625 r64 2 13.57 27.17 2.002 12.44 0.917 10.50 24.62 2.345 12.20 1.162 9.29 23.17 2.494 11.47 1.235 r64 3 10.29 13.58 1.320 7.93 0.771 6.04 12.28 2.033 7.77 1.286 5.31 11.52 2.169 7.73 1.456 r64 4 12.56 13.58 1.081 11.14 0.887 6.57 12.21 1.858 9.84 1.498 6.89 11.49 1.668 9.06 1.315 r64 5 14.71 16.96 1.153 15.93 1.083 9.09 14.84 1.633 14.18 1.560 7.37 14.24 1.932 13.08 1.775 r64 6 16.97 16.98 1.001 15.94 0.939 11.34 14.82 1.307 14.22 1.254 11.30 14.22 1.258 13.00 1.150 r64 7 19.22 16.97 0.883 15.91 0.828 12.45 14.80 1.189 14.23 1.143 13.14 14.43 1.098 13.00 0.989 r64 8 10.75 10.25 0.953 9.62 0.895 7.12 10.12 1.421 9.65 1.355 7.49 9.78 1.306 9.07 1.211 r64 16 19.78 10.25 0.518 9.72 0.491 14.21 10.21 0.718 9.67 0.681 14.23 9.81 0.689 9.08 0.638 r64 32 10.39 4.06 0.391 4.09 0.394 6.95 4.14 0.596 4.01 0.577 7.50 3.93 0.524 3.87 0.516 r64 64 19.58 5.13 0.262 5.17 0.264 13.04 5.77 0.442 6.00 0.460 12.94 5.62 0.434 5.97 0.461 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio g1591902 gdaae91d ratio gxxxxxxx ratio w8*w64 0 13.82 52.08 3.768 15.84 1.146 9.06 45.27 4.997 15.84 1.748 9.05 45.30 5.006 13.58 1.501 w8*w64 1 12.69 23.40 1.844 11.42 0.900 7.93 23.13 2.917 10.19 1.285 7.93 21.58 2.721 11.09 1.398 w8*w64 2 19.37 23.47 1.212 18.29 0.944 12.90 23.02 1.784 15.69 1.216 11.32 21.48 1.898 15.28 1.350 w8*w64 3 13.05 14.52 1.113 12.47 0.955 7.92 13.49 1.703 9.92 1.253 7.36 13.31 1.808 10.32 1.402 w8*w64 4 16.43 18.74 1.140 15.85 0.965 10.23 19.49 1.905 12.07 1.180 9.35 18.08 1.934 12.54 1.341 w8*w64 5 19.82 18.75 0.946 19.08 0.963 11.32 19.25 1.701 19.42 1.716 10.75 18.06 1.680 18.43 1.714 w8*w64 6 11.62 9.34 0.804 9.56 0.823 6.51 9.71 1.492 9.68 1.487 6.23 8.98 1.441 9.19 1.475 w8*w64 7 13.31 9.34 0.702 9.99 0.751 7.91 9.64 1.219 9.69 1.225 7.08 9.11 1.287 9.24 1.305 w8*w64 8 15.01 9.45 0.629 9.60 0.639 8.83 9.71 1.100 9.69 1.097 7.93 9.02 1.137 9.21 1.161 w8*w64 16 14.29 7.89 0.552 7.94 0.556 7.80 7.77 0.996 7.78 0.997 7.48 7.58 1.013 7.71 1.031 w8*w64 32 15.11 5.63 0.373 5.15 0.341 7.66 5.11 0.667 5.31 0.693 7.41 4.94 0.667 5.52 0.745 w8*w64 64 14.35 3.74 0.261 3.76 0.262 8.69 3.71 0.427 3.69 0.425 8.69 3.64 0.419 3.66 0.421 This commit avoids the > 2X slowdown for very small copies with `memmove`. The amd64 codegen actually does better than `master` for lengths 1 to 4. Commit: fab6b37c5f817c039ede85da4ea8a5f28ea3ce3a https://github.com/MLton/mlton/commit/fab6b37c5f817c039ede85da4ea8a5f28ea3ce3a Author: Matthew Fluet <mat...@gm...> Date: 2017-08-20 (Sun, 20 Aug 2017) Changed paths: M basis-library/arrays-and-vectors/sequence0.sml Log Message: ----------- Refactor copy of small sequences Commit: 9c8b93e9cb9cf2341e92ca71b54883b1c1af7637 https://github.com/MLton/mlton/commit/9c8b93e9cb9cf2341e92ca71b54883b1c1af7637 Author: Matthew Fluet <mat...@gm...> Date: 2017-08-20 (Sun, 20 Aug 2017) Changed paths: M basis-library/arrays-and-vectors/sequence0.sml Log Message: ----------- Rename `aliasArray` to `sameArray` in functor PrimSequence argument Commit: 8f3efad936d8bc23e30c6437fea86f5f782a286f https://github.com/MLton/mlton/commit/8f3efad936d8bc23e30c6437fea86f5f782a286f Author: Matthew Fluet <mat...@gm...> Date: 2017-08-20 (Sun, 20 Aug 2017) Changed paths: M basis-library/arrays-and-vectors/array-slice.sig M basis-library/arrays-and-vectors/array.sig M basis-library/arrays-and-vectors/array.sml M basis-library/arrays-and-vectors/sequence.fun M basis-library/arrays-and-vectors/sequence.sig M basis-library/arrays-and-vectors/sequence0.sml M basis-library/arrays-and-vectors/slice.sig M basis-library/arrays-and-vectors/vector-slice.sig M basis-library/arrays-and-vectors/vector.sig Log Message: ----------- Refactor to "lift" `copy` from `SeqIndex.int` to `int` in `Sequence` Commit: 12a46a712c6800a22e5651df1e4b4f7fc7a003da https://github.com/MLton/mlton/commit/12a46a712c6800a22e5651df1e4b4f7fc7a003da Author: Matthew Fluet <mat...@gm...> Date: 2017-08-20 (Sun, 20 Aug 2017) Changed paths: M basis-library/arrays-and-vectors/sequence.fun Log Message: ----------- Use `Array_copy{Array,Vector}` to implement `concatWith` Commit: ea79eb674eceada911bfa2aff4fafb3a754f0e0a https://github.com/MLton/mlton/commit/ea79eb674eceada911bfa2aff4fafb3a754f0e0a Author: Matthew Fluet <Mat...@gm...> Date: 2017-08-21 (Mon, 21 Aug 2017) Changed paths: M basis-library/arrays-and-vectors/array-slice.sig M basis-library/arrays-and-vectors/array.sig M basis-library/arrays-and-vectors/array.sml M basis-library/arrays-and-vectors/sequence.fun M basis-library/arrays-and-vectors/sequence.sig M basis-library/arrays-and-vectors/sequence0.sig M basis-library/arrays-and-vectors/sequence0.sml M basis-library/arrays-and-vectors/slice.sig M basis-library/arrays-and-vectors/slice0.sig M basis-library/arrays-and-vectors/vector-slice.sig M basis-library/arrays-and-vectors/vector.sig M basis-library/primitive/prim-seq.sml M benchmark/Makefile M benchmark/main.sml M benchmark/tests/.gitignore A benchmark/tests/string-concat.sml R benchmark/tests/vector-concat.sml A benchmark/tests/vector32-concat.sml A benchmark/tests/vector64-concat.sml M benchmark/update-counts.sh M mlton/atoms/prim.fun M mlton/atoms/prim.sig M mlton/backend/ssa-to-rssa.fun M mlton/closure-convert/abstract-value.fun M mlton/ssa/constant-propagation.fun M mlton/ssa/ssa-tree2.fun M mlton/ssa/useless.fun M runtime/gc/array.c M runtime/gc/array.h Log Message: ----------- Merge pull request #192 from Bxc8214/array-copy-prims Introduce `Array_copy{Array,Vector}` primitives val Array_copyArray : 'a array * SeqIndex.int * 'a array * SeqIndex.int * SeqIndex.int -> unit val Array_copyVector : 'a array * SeqIndex.int * 'a vector * SeqIndex.int * SeqIndex.int -> unit which are used to implement a number of array and vector construction functions, particularly `append`, `concat`, and `concatWith`. This was motivated by the observation that Poly/ML's string concatenation was significantly faster than MLton's; investigation showed that Poly/ML uses `memmove` to implement string concatenation, which out performs MLton's element-by-element construction for large strings. See the discussion thread at: https://sourceforge.net/p/mlton/mailman/mlton-user/thread/CAG%2BQnTcffmvXcpFv9rivvwdcsRXvUCorYVdEPmhjCE8EHEpOig%40mail.gmail.com/#msg35312358 Within the Basis Library implementation, the `Array_copy{Array,Vector}` primitives are only used for copies of more than 5 elements; this is to avoid the overhead of an FFI call for copies of few elements. Instrumenting the `GC_arrayCopy` function to record number of elements copied at each invocation shows that a self-compile makes 37074056 calls to `GC_arrayCopy` with the following statistics on number of elements copied: min 0 median 3 mean 7.767096 max 146120 histogram: count density cumulative 0 8138 0.0002195066 0.0002195066 1 1556411 0.0419811364 0.0422006430 2 8922853 0.2406764720 0.2828771149 3 9395704 0.2534307010 0.5363078159 4 7053342 0.1902500768 0.7265578927 (4,8] 4883067 0.1317111621 0.8582690548 (8,16] 1113094 0.0300235291 0.8882925839 (16,146120] 4141447 0.1117074161 1.0000000000 There is one potential missed optimization due to the use of `Array_copy{Array,Vector}` instead of an element-by-element copying loop. Analyses and transformations must ensure that the element type of the source and destination sequence are the same (unified) for uses of `Array_copy{Array,Vector}`, whereas the element type of the source need only be coercable to the element type of the destination for uses of `{Array,Vector}_sub` followed by `Array_update`. That is, the element-by-element copy loop has the opportunity to modify the element as it is copied (e.g., drop unused fields of a tuple), but an `Array_copy{Array,Vector}` cannot modify elements as they are copied. Benchmark results: MLton0 -- ~/devel/mlton/builds/20170725.130317-g159190284/bin/mlton MLton1 -- ~/devel/mlton/builds/20170821.023419-g12a46a712/bin/mlton run time ratio benchmark MLton0 MLton1 string-concat 1.00 0.71 vector32-concat 1.00 0.76 vector64-concat 1.00 0.86 size benchmark MLton0 MLton1 string-concat 116,744 117,048 vector32-concat 121,000 121,336 vector64-concat 121,000 121,368 compile time benchmark MLton0 MLton1 string-concat 1.99 2.00 vector32-concat 2.01 2.02 vector64-concat 2.02 2.03 run time benchmark MLton0 MLton1 string-concat 127.99 90.69 vector32-concat 103.82 79.31 vector64-concat 109.04 93.43 The following function measures the cost of `Array_copy{Array,Vector}`: fun run {fromInt, toString, ...} (di, si, l, j, k: Int64.int) = let val n = IntInf.toInt (IntInf.pow (2, 21)) val a = Array.tabulate (n, fromInt) val copy = if true then let val v = Vector.tabulate (n, fromInt) val sl = VectorSlice.slice (v, si, SOME l) in fn () => ArraySlice.copyVec {dst = a, di = di, src = sl} end else let val sl = ArraySlice.slice (a, si, SOME l) in fn () => ArraySlice.copy {dst = a, di = di, src = sl} end fun loop k = if k <= 0 then () else (copy (); loop (k - 1)) val _ = loop k val ans = Array.sub (a, j) val () = print (concat ["ans = ", toString ans, "\n"]) in () end Instantiating `fromInt` and `toString` selects the element type of the sequences. Results using `di = 1` and `si = 7` (for non-aligned data with < 32bit element sizes) and `k` dynamically determined (by finding a value for which `g1591902-amd64` is > 10s): |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio w8 0 12.51 59.33 4.743 15.81 1.264 9.05 47.52 5.251 15.83 1.749 9.04 45.27 5.008 13.56 1.500 w8 1 12.50 28.48 2.278 9.07 0.726 5.66 22.98 4.060 9.08 1.604 5.66 21.81 3.853 8.53 1.507 w8 2 16.65 30.68 1.843 13.57 0.815 9.05 24.87 2.748 12.55 1.387 8.42 23.74 2.819 11.18 1.328 w8 3 10.92 15.36 1.407 8.66 0.793 5.67 12.44 2.194 8.14 1.436 6.26 11.87 1.896 6.51 1.040 w8 4 13.46 14.76 1.096 10.46 0.777 6.84 11.88 1.737 9.97 1.458 7.46 11.31 1.516 7.95 1.066 w8 5 16.18 14.78 0.913 15.26 0.943 8.69 11.88 1.367 11.88 1.367 7.54 11.31 1.500 11.90 1.578 w8 6 18.76 14.84 0.791 15.26 0.813 11.31 11.89 1.051 11.88 1.050 11.77 11.31 0.961 11.88 1.009 w8 7 10.59 7.40 0.699 7.63 0.720 6.50 5.94 0.914 5.94 0.914 6.72 5.66 0.842 5.94 0.884 w8 8 11.92 7.68 0.644 7.92 0.664 7.35 6.22 0.846 6.22 0.846 7.59 5.93 0.781 6.22 0.819 w8 16 11.59 3.17 0.274 3.25 0.280 7.07 2.89 0.409 3.16 0.447 7.22 2.70 0.374 2.69 0.373 w8 32 10.60 1.58 0.149 1.63 0.154 6.92 1.43 0.207 1.45 0.209 7.59 1.37 0.181 1.36 0.179 w8 64 19.58 2.42 0.124 2.46 0.126 14.94 2.44 0.163 2.46 0.165 14.99 2.27 0.151 2.30 0.153 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio w16 0 13.56 52.04 3.838 15.81 1.166 9.04 45.65 5.050 15.83 1.751 9.05 45.26 5.001 13.56 1.498 w16 1 11.41 27.14 2.379 10.07 0.882 6.78 24.22 3.572 9.32 1.375 5.65 23.75 4.203 7.91 1.400 w16 2 14.70 26.01 1.769 13.62 0.926 9.07 23.63 2.605 13.74 1.515 8.42 22.61 2.685 11.02 1.309 w16 3 10.18 13.00 1.277 8.52 0.837 5.65 11.80 2.088 8.27 1.464 5.09 11.31 2.222 7.95 1.562 w16 4 12.43 13.57 1.092 10.51 0.845 6.58 12.15 1.846 9.42 1.432 6.22 11.88 1.910 9.97 1.603 w16 5 14.69 13.57 0.924 14.76 1.005 8.10 11.93 1.473 12.44 1.536 7.35 11.87 1.615 11.87 1.615 w16 6 16.95 13.57 0.801 14.75 0.870 11.45 12.26 1.071 12.44 1.086 8.48 11.88 1.401 11.88 1.401 w16 7 19.22 13.57 0.706 14.77 0.768 13.15 11.98 0.911 12.43 0.945 9.61 11.87 1.235 11.88 1.236 w16 8 10.74 5.84 0.544 6.14 0.572 7.52 5.86 0.779 5.86 0.779 5.37 5.43 1.011 5.37 1.000 w16 16 19.78 5.82 0.294 6.17 0.312 14.23 5.94 0.417 5.86 0.412 9.89 5.42 0.548 5.38 0.544 w16 32 10.42 2.40 0.230 2.45 0.235 7.50 2.45 0.327 2.45 0.327 4.73 2.26 0.478 2.30 0.486 w16 64 19.57 2.43 0.124 2.46 0.126 12.96 2.46 0.190 2.47 0.190 10.77 2.30 0.213 2.30 0.213 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio w32 0 13.56 56.58 4.173 15.82 1.167 9.04 45.25 5.006 15.82 1.750 9.04 46.71 5.167 13.56 1.500 w32 1 10.17 28.29 2.782 9.04 0.889 6.78 23.60 3.481 9.00 1.327 5.67 23.07 4.069 9.20 1.622 w32 2 13.79 29.41 2.133 12.44 0.902 9.05 23.75 2.624 11.35 1.254 9.68 23.94 2.473 11.81 1.220 w32 3 10.17 14.71 1.446 7.94 0.781 5.68 11.87 2.090 7.39 1.301 5.09 11.97 2.352 7.54 1.481 w32 4 12.46 11.89 0.954 10.20 0.819 7.87 11.43 1.452 9.46 1.202 7.16 10.78 1.506 8.88 1.240 w32 5 14.71 11.89 0.808 12.95 0.880 8.13 11.40 1.402 11.73 1.443 7.36 10.78 1.465 10.75 1.461 w32 6 16.96 11.90 0.702 12.76 0.752 9.57 11.45 1.196 11.84 1.237 11.31 10.90 0.964 10.80 0.955 w32 7 19.22 11.89 0.619 12.82 0.667 10.74 11.46 1.067 11.84 1.102 13.30 10.90 0.819 10.80 0.812 w32 8 10.74 5.95 0.554 6.41 0.597 5.93 5.74 0.968 5.92 0.998 7.53 5.41 0.718 5.40 0.717 w32 16 19.78 9.54 0.482 9.77 0.494 10.01 9.70 0.969 9.63 0.962 14.26 9.02 0.632 9.18 0.644 w32 32 19.10 4.80 0.251 4.93 0.258 9.47 5.08 0.536 4.87 0.514 15.04 4.59 0.305 4.60 0.306 w32 64 19.54 3.93 0.201 3.97 0.203 10.89 3.92 0.360 3.89 0.357 12.97 3.78 0.291 3.84 0.296 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio w64 0 13.81 58.84 4.261 15.83 1.146 9.04 45.73 5.059 15.45 1.709 9.05 48.35 5.343 13.57 1.499 w64 1 10.30 30.54 2.965 9.04 0.878 6.79 23.94 3.526 8.69 1.280 5.66 24.85 4.390 8.57 1.514 w64 2 14.17 24.90 1.757 12.45 0.879 9.07 22.80 2.514 11.32 1.248 9.80 21.55 2.199 11.38 1.161 w64 3 10.19 12.45 1.222 7.91 0.776 5.66 11.36 2.007 7.75 1.369 5.10 10.73 2.104 7.09 1.390 w64 4 12.51 12.45 0.995 10.17 0.813 7.63 11.35 1.487 9.51 1.246 7.21 10.74 1.489 8.65 1.200 w64 5 14.73 15.85 1.076 16.40 1.113 8.92 13.56 1.520 14.16 1.587 7.37 13.55 1.838 13.04 1.769 w64 6 16.97 15.84 0.933 16.39 0.966 11.30 13.65 1.208 14.24 1.260 11.30 13.55 1.199 13.06 1.156 w64 7 19.23 15.84 0.824 16.39 0.852 13.00 13.84 1.065 14.24 1.095 13.30 13.57 1.020 13.07 0.983 w64 8 10.75 9.36 0.871 9.55 0.888 7.62 9.57 1.256 9.70 1.273 7.53 8.97 1.191 9.23 1.226 w64 16 19.78 9.47 0.479 9.60 0.485 14.34 9.67 0.674 11.94 0.833 14.26 9.03 0.633 9.22 0.647 w64 32 18.95 7.88 0.416 7.93 0.418 14.96 7.76 0.519 7.77 0.519 14.99 7.67 0.512 7.70 0.514 w64 64 19.57 6.10 0.312 5.14 0.263 12.93 5.79 0.448 5.07 0.392 12.95 5.83 0.450 5.01 0.387 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio r32 0 13.57 61.07 4.500 15.83 1.166 9.05 50.32 5.560 15.83 1.749 9.04 49.80 5.509 13.56 1.500 r32 1 11.42 30.53 2.673 9.05 0.792 6.79 25.55 3.763 8.76 1.290 5.66 24.88 4.396 9.21 1.627 r32 2 13.57 31.67 2.334 12.47 0.919 9.05 26.26 2.902 12.24 1.352 9.39 26.02 2.771 11.42 1.216 r32 3 10.27 15.83 1.541 10.20 0.993 5.66 13.16 2.325 7.41 1.309 5.09 13.00 2.554 7.57 1.487 r32 4 12.53 13.01 1.038 12.85 1.025 7.43 12.23 1.646 9.65 1.299 7.08 11.43 1.614 9.03 1.275 r32 5 14.70 13.01 0.885 13.01 0.885 8.04 12.27 1.526 11.79 1.466 7.36 11.45 1.556 10.77 1.463 r32 6 16.96 13.01 0.767 13.01 0.767 11.31 12.24 1.082 11.79 1.042 11.30 11.45 1.013 10.78 0.954 r32 7 19.22 13.08 0.681 13.01 0.677 13.00 12.25 0.942 12.24 0.941 13.31 11.44 0.860 10.77 0.809 r32 8 10.74 6.50 0.605 6.51 0.606 7.57 6.12 0.808 5.90 0.779 7.35 5.72 0.778 5.38 0.732 r32 16 19.79 10.45 0.528 9.79 0.495 14.37 10.27 0.715 9.80 0.682 14.27 9.82 0.688 9.20 0.645 r32 32 10.39 2.62 0.252 2.46 0.237 7.54 2.58 0.342 2.45 0.325 7.49 2.46 0.328 2.30 0.307 r32 64 19.57 4.15 0.212 4.15 0.212 12.99 4.22 0.325 4.07 0.313 12.94 3.95 0.305 3.89 0.301 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio r64 0 13.57 63.35 4.668 15.83 1.166 9.05 49.78 5.501 15.84 1.750 9.05 47.56 5.255 13.56 1.498 r64 1 11.17 32.80 2.936 9.05 0.810 6.80 26.01 3.825 8.67 1.275 5.65 24.89 4.405 9.20 1.628 r64 2 13.57 27.16 2.001 12.44 0.917 10.47 24.66 2.355 12.25 1.170 9.29 23.17 2.494 11.11 1.196 r64 3 10.30 13.58 1.318 8.10 0.786 6.02 12.28 2.040 7.44 1.236 5.31 11.51 2.168 7.90 1.488 r64 4 12.56 13.58 1.081 10.74 0.855 6.57 12.22 1.860 9.89 1.505 6.94 11.76 1.695 9.11 1.313 r64 5 14.70 16.97 1.154 15.83 1.077 9.08 14.84 1.634 14.21 1.565 7.36 14.23 1.933 13.10 1.780 r64 6 16.96 16.97 1.001 15.83 0.933 11.36 14.81 1.304 14.28 1.257 11.30 14.21 1.258 13.02 1.152 r64 7 19.23 16.97 0.882 15.83 0.823 12.44 14.81 1.190 14.27 1.147 13.15 14.21 1.081 13.01 0.989 r64 8 10.74 10.25 0.954 9.61 0.895 7.13 10.12 1.419 9.66 1.355 7.49 9.78 1.306 9.06 1.210 r64 16 19.79 10.38 0.525 9.74 0.492 14.20 10.21 0.719 9.68 0.682 14.23 9.81 0.689 9.07 0.637 r64 32 10.46 4.06 0.388 4.11 0.393 6.95 4.13 0.594 4.02 0.578 7.51 3.94 0.525 3.87 0.515 r64 64 19.58 5.13 0.262 5.14 0.262 13.05 5.77 0.442 5.56 0.426 12.95 5.62 0.434 5.50 0.425 |--------------- amd64 ----------------| |----------------- c ------------------| |---------------- llvm ----------------| elt l g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio g1591902 gdaae91d ratio g12a46a7 ratio w8*w64 0 13.81 52.06 3.770 15.84 1.147 9.06 45.26 4.995 15.84 1.748 9.05 45.27 5.002 13.77 1.522 w8*w64 1 12.69 23.36 1.841 11.55 0.910 7.93 23.13 2.917 10.19 1.285 7.93 21.59 2.723 11.03 1.391 w8*w64 2 19.39 23.47 1.210 18.32 0.945 12.91 23.02 1.783 16.17 1.253 11.32 21.48 1.898 14.73 1.301 w8*w64 3 13.05 14.52 1.113 12.47 0.955 7.92 13.49 1.703 10.18 1.285 7.36 13.32 1.810 10.20 1.386 w8*w64 4 16.43 18.73 1.140 15.85 0.965 10.22 19.48 1.906 13.34 1.305 9.36 18.05 1.928 12.45 1.330 w8*w64 5 19.82 18.75 0.946 19.06 0.962 11.32 19.25 1.701 19.39 1.713 10.75 18.14 1.687 18.44 1.715 w8*w64 6 11.62 9.34 0.804 9.56 0.823 6.51 9.70 1.490 9.67 1.485 6.23 9.13 1.465 9.19 1.475 w8*w64 7 13.31 9.35 0.702 9.55 0.717 7.91 9.64 1.219 9.72 1.229 7.08 9.12 1.288 9.25 1.306 w8*w64 8 15.01 9.44 0.629 9.66 0.644 8.82 9.71 1.101 9.73 1.103 7.92 9.03 1.140 9.19 1.160 w8*w64 16 14.28 7.89 0.553 7.94 0.556 7.78 7.77 0.999 7.78 1.000 7.48 7.58 1.013 7.71 1.031 w8*w64 32 15.11 5.63 0.373 5.15 0.341 7.67 5.08 0.662 5.04 0.657 7.41 4.94 0.667 5.02 0.677 w8*w64 64 14.35 3.72 0.259 3.76 0.262 8.68 3.71 0.427 4.03 0.464 8.69 3.65 0.420 4.07 0.468 Compare: https://github.com/MLton/mlton/compare/159190284e12...ea79eb674ece |