
#147 print full path dot

open
nobody
Evaluator (39)
5
2008-08-15
2008-08-11
No

During debugging and development of Vesta bridges and models, our favorite “print dot” is our primary tool for searching dot.

_print(.); results in a representation of the binding in the following format:

[ target_platform="Linux2.6-x86_64",
components=
[ glibc=
[ kind="rpm",
name="glibc",
version="2.3.3-98.61",
arch="x86_64",
..

When dot grows as large as it does on big projects (for example, 86K lines to print its contents), it becomes quite difficult to grep the output and recover the full binding path to what one is looking for.

Is there a generic function that instead “prints full path dot”, producing something like this:

target_platform="Linux2.6-x86_64"
components/glibc/kind="rpm"
components/glibc/name="glibc"
components/glibc/version="2.3.3-98.61"
components/glibc/arch="x86_64"

This is more conducive to grepping and copy & paste.
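A minimal sketch of the requested flattening, assuming the binding has already been loaded into nested Python dicts. The function name `flatten` and the dict representation are illustrative only and are not part of Vesta or SDL.

```python
def flatten(binding, prefix=""):
    """Yield one 'path="value"' line per leaf of a nested dict.

    The dict is a stand-in for an SDL binding; leaves are assumed to be text.
    """
    for name, value in binding.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):
            # Nested binding: recurse, carrying the path built so far.
            yield from flatten(value, path)
        else:
            yield f'{path}="{value}"'


# components is a sibling of target_platform in the printed binding,
# so its paths start at components/.
dot = {
    "target_platform": "Linux2.6-x86_64",
    "components": {
        "glibc": {
            "kind": "rpm",
            "name": "glibc",
            "version": "2.3.3-98.61",
            "arch": "x86_64",
        }
    },
}

lines = list(flatten(dot))
for line in lines:
    print(line)
```

Each output line carries its own full path, so a single grep hit is enough to locate a value in the hierarchy.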

Currently, if we grep for enum.hpp, for example, the output isn’t very helpful when you need the full paths. It also makes it difficult to determine the full path when the hierarchy is extremely large.

$ grep enum.hpp vesta.9367.out
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/b0f/600/06>,
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/88a/6fc/87>,
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/88a/6fc/c5>,
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/88a/6fc/f2>,
int_float_mixture_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/ba>,
sign_mixture_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/bd>,
udt_builtin_mixture_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/bf>,
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/b1f/861/d0>,
is_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/8d8/46b/f5>,
tracking_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/a86/af4/d8>,
level_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/a86/af4/ca>,
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/b0f/600/cb>,
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/b1f/861/d0>,
int_float_mixture_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/ba>,
sign_mixture_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/bd>,
udt_builtin_mixture_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/bf> ],
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/88a/6fc/87>,
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/88a/6fc/c5>,
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/88a/6fc/f2>,
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/b0f/600/06>,
enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/b0f/600/cb>,
level_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/a86/af4/ca>,
tracking_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/a86/af4/d8>,
is_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/8d8/46b/f5>,
boost_mpl_aux__preprocessor_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/aa7/7fb/10>,
boost_type_traits_is_enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/822/601/86>,

Comments?
++G

Discussion

  • Greg Czajkowski

    Greg Czajkowski - 2008-08-12

    Logged In: YES
    user_id=393776
    Originator: YES

    There are several human-readable, text-based formats available for data serialization. Here are some examples:

    Generalized
    JSON
    Turtle, a subset of N3
    XML
    YAML
    Language Specific
    SDL has _print
    Perl has Data::Dumper
    Python has pprint module

    As well as text-based and platform-independent, but not human-readable:
    bencode

    The added refinement is an output format suitable for grep searches, which means flattening the hierarchy.

    The format itself answers this. Let’s generalize the problem to JSON data types and compare them to SDL.

    JSON <-> SDL : Similarity
    Number <-> int : High
    String <-> text : High
    Boolean <-> bool : High
    Array <-> list : High
    Object <-> binding : High
    Null <-> N/A : JSON only
    N/A <-> file : Vesta only
    N/A <-> closure : Vesta only
    N/A <-> err : Vesta only (deprecated)

    Then take inspiration from Python and the Document Object Model, but only in how an object is accessed by its full path, i.e. flattened.

    Let’s see how the flattening could be applied to JSON:

    List of ints:
    "a_list_of_ints"[0] : 4,
    "a_list_of_ints"[1] : 0x20,
    "a_list_of_ints"[2] : 07531,
    "a_list_of_ints"[3] : -20,

    A List of lists of ints:
    "a_list_of_lists_of_ints"[0],"a_list_of_ints"[0] : 4,
    "a_list_of_lists_of_ints"[0],"a_list_of_ints"[1] : 0x20,
    "a_list_of_lists_of_ints"[0],"a_list_of_ints"[2] : 07531,
    "a_list_of_lists_of_ints"[0],"a_list_of_ints"[3] : -20,
    "a_list_of_lists_of_ints"[1],"b_list_of_ints"[0] : 0,
    "a_list_of_lists_of_ints"[1],"b_list_of_ints"[1] : 50,

    An object of types
    "a_object"{"int"} : 4,
    "a_object"{"string"} : "hello",
    "a_object"{"bar"} : true,
    "a_object"{"int"} : -20,
    "a_object"{"escaped\"string"} : "escaped\"string",
    "a_object"{"list"}[0] : 1,
    "a_object"{"list"}[1] : 0,
    "a_object"{"list"}[2],"b_list"[0] : 4,
    "a_object"{"list"}[2],"b_list"[1] : "string",
    "a_object"{"list"}[3],"c_escaped_\"id_object"{"foo"} : "beef"

    Simplify and compress for readability:
    a_object{int} : 4
    a_object{string} : "hello"
    a_object{bar} : true
    a_object{int} : -20
    a_object{"escaped\"string"} : "escaped\"string"
    a_object{list}[0] : 1
    a_object{list}[1] : 0
    a_object{list}[2].b_list[0] : 4
    a_object{list}[2].b_list[1] : "string"
    a_object{list}[3]."c_escaped_\"id_object"{foo} : "beef"

    Apply to SDL, compress bindings using /
    a_object/int = 4
    a_object/string = "hello"
    a_object/bar = true
    a_object/int = -20
    a_object/"escaped\"string" = "escaped\"string"
    a_object/list[0] = 1
    a_object/list[1] = 0
    a_object/list[2]/b_list[0] = 4
    a_object/list[2]/b_list[1] = "string"
    a_object/list[3]/d_binding/e_binding/f_binding/e_list[0] = "string"
    a_object/list[4]/"c_escaped_\"binding"/foo = "beef"
    a_object/file.o = <file: /nfs/sc/disks/bmpvsta3/sid/b19/0be/09>
    a_object/”dash-file-name.o” = <file: /nfs/sc/disks/bmpvsta3/sid/b19/0be/09>
    a_object/closure = <Closure /vesta/vestasys.org/bridges/generics/6/build.ves, line 90, char 14>
    a_object/root=<Model /vesta/vesta.com/platforms/linux/suse/i686/components/glibc/2.3.3-98.61/1/root.ves>
    ./includes/boost/type_traits/is_enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/8d8/46b/f5>
    ./includes/boost/python/enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/b0f/600/cb>
    ./includes/boost/mpl/aux_/preprocessor/enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/b1f/861/d0>
    ./includes/boost/numeric/conversion/int_float_mixture_enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/ba>
    ./includes/boost/numeric/conversion/sign_mixture_enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/bd>
    ./includes/boost/numeric/conversion/udt_builtin_mixture_enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/bf>
    ./includes/boost/numeric/conversion/int_float_mixture_enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/ba>
    ./includes/boost/numeric/conversion/sign_mixture_enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/bd>
    ./includes/boost/numeric/conversion/udt_builtin_mixture_enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/bf>
    ./includes/boost/numeric/conversion/int_float_mixture_enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/ba>
    ./includes/boost/numeric/conversion/sign_mixture_enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/bd>
    ./includes/boost/numeric/conversion/udt_builtin_mixture_enum.hpp = <file: /nfs/sc/disks/bmpvsta3/sid/a7b/651/bf>
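The SDL-style flattening above, with `/` between binding names, `[n]` for list elements, and quoting of awkward names, could be sketched roughly as follows. This is an illustration on plain Python dicts, lists, and scalars, not evaluator code. The rule for when a name needs quotes (anything other than letters, digits, `_` and `.`) is an assumption inferred from the examples, and the names `quote_name`, `render` and `flatten_value` are hypothetical. Files, closures and models are not modeled.

```python
import re

# Assumed rule: names made only of these characters print unquoted.
PLAIN_NAME = re.compile(r"[A-Za-z0-9_.]+")


def escape(text):
    """Backslash-escape backslashes and double quotes."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def quote_name(name):
    """Return the name as-is if it is plain, otherwise quoted and escaped."""
    if PLAIN_NAME.fullmatch(name):
        return name
    return f'"{escape(name)}"'


def render(value):
    """Print a leaf: booleans as true/false, text quoted, ints as-is."""
    if isinstance(value, bool):  # must be tested before int
        return "true" if value else "false"
    if isinstance(value, str):
        return f'"{escape(value)}"'
    return str(value)


def flatten_value(value, path=""):
    """Yield 'path = value' lines, one per leaf."""
    if isinstance(value, dict):
        for name, child in value.items():
            step = quote_name(name)
            yield from flatten_value(child, f"{path}/{step}" if path else step)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten_value(child, f"{path}[{index}]")
    else:
        yield f"{path} = {render(value)}"


sample = {
    "a_object": {
        "int": 4,
        "bar": True,
        'escaped"string': 'escaped"string',
        "list": [1, 0, {"b_list": [4, "string"]}],
    }
}

flat = list(flatten_value(sample))
for line in flat:
    print(line)
```

Because every leaf is emitted with its complete path, the lines can be sorted, diffed, or grepped independently of one another.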

     
  • Kenneth C. Schalk

    Logged In: YES
    user_id=304837
    Originator: NO

    For the most part, I like the proposal you're making. There
    is one concern I have though, and it goes back to the issue
    of lists. Consider this output from your example:

    a_object/list[2]/b_list[1] = "string"

    The problem I see is that it's not really consistent with
    the SDL syntax. You can't use square brackets to index into
    a list. In fact, the only way to index into a list is with
    the _elem primitive function.

    I'm concerned that this could be confusing for novice or
    casual users.

    Obviously you can't always simply copy and paste the printed
    output of the evaluator into your SDL code. I'm not saying
    that we have to make some new output meet such a high (and
    probably unreasonable) standard. However, it seems to me
    that it's usually pretty obvious which portions of the
    current printed representation aren't valid in SDL code. I
    feel like the output format you're proposing would make
    that less so.

    Of course maybe the answer to this conundrum could be to add
    some syntax for looking up an element in a list by index. I
    think we could even make square brackets do that if we
    wanted. It adds a bit of context-sensitivity to the
    language grammar, but we already have that with the way the
    less-than sign is sometimes a comparator and sometimes the
    start of a list.

    Another possibility, that should be a little easier to
    implement, would be to use parentheses:

    a_object/list(2)/b_list(1)

    The benefit of this choice is that it would require no
    changes to the language parser. You could think of it as
    treating a list like a function that takes the index as an
    argument.

     
  • Greg Czajkowski

    Greg Czajkowski - 2008-08-15

    Logged In: YES
    user_id=393776
    Originator: YES

    Hi Ken,

    >>> The problem I see is that it's not really consistent with the SDL syntax

    It really is not meant to be. Its main purpose is to aid in the debugging and development of extremely large projects. The output

    >>> Obviously you can't always simply copy and paste the printed output of the evaluator into your SDL code.

    Precisely, only part of the output of _print(.) can be dropped back into SDL. So why is _print(.) or _print(some_binding) used?

    Is it to visually log the results of binding operations, functions, models? Yes
    Is it used to visually inspect bindings during debug? Yes
    Is it used to copy the results somewhere else for the purpose of debugging? Partially?

    When the results are files, then not really; not much can be done with this except to see that it exists:
    enum.hpp=<file: /nfs/sc/disks/bmpvsta3/sid/b0f/600/06>,

    Closures are useful when searching for and "discovering" models/bridges/functions, and are thus sometimes used during development:
    a_object/closure = <Closure /vesta/vestasys.org/bridges/generics/6/build.ves, line 90, char 14>

    Same with models:
    a_object/root=<Model /vesta/vesta.com/platforms/linux/suse/i686/components/glibc/2.3.3-98.61/1/root.ves>

    But in all these cases you will mangle and manipulate the results by hand.

    >> However, it seems to me that it's usually pretty obvious which portions of the
    >> current printed representation aren't valid in SDL code.
    >> I feel like the output format you're proposing would make that less so.

    The same is true for this output format. It's really meant for debug and development. Grepping works best on flattened output.

    >> Of course maybe the answer to this conundrum could be to add
    >> some syntax for looking up an element in a list by index.

    That's not the purpose of this RFE. But yes, I was really surprised there was no language syntax for array lookups.

    >> I think we could even make square brackets do that if we
    >> wanted. It adds a bit of context-sensitivity to the
    >> language grammar, but we already have that with the way the
    >> less-than sign is sometimes a comparator and sometimes the
    >> start of a list.

    The brackets are a natural choice; a survey of C/C++/Perl/PHP/Python/Java/C#/etc. shows that square brackets are clearly the natural choice for both new and veteran users.

    >> Another possibility, that should be a little easier to
    >> implement, would be to use parentheses:
    >>
    >> a_object/list(2)/b_list(1)

    With all due respect, when I saw this, I wanted to cry. It is so ambiguous with function calls that it would cause much head scratching and confusion. Please don't consider this at all, especially not merely for simplicity of implementation.

    Furthermore, array accesses would be in a completely separate RFE and are not directly part of this one.

    Only indirectly have we wandered into a syntax for array access.

    BTW, here is an example of the flattening my Python prototype is doing; grepping its output results in this:

    ./root/.WD/boost_mpl_aux__preprocessor_enum.hpp/<file: /nfs/sid/aa7/7fb/10>
    ./root/.WD/boost_type_traits_is_enum.hpp/<file: /nfs/sid/822/601/86>
    ./root/.WD/boost_mpl_aux__preprocessor_enum.hpp/<file: /nfs/sid/aa7/7fb/10>
    ./root/.WD/boost_type_traits_is_enum.hpp/<file: /nfs/sid/822/601/86>
    ./root/usr/include/boost/type_traits/is_enum.hpp/<file: /nfs/sid/8d8/46b/f5>
    ./root/usr/include/boost/python/enum.hpp/<file: /nfs/sid/b0f/600/cb>
    ./root/usr/include/boost/mpl/aux_/preprocessor/enum.hpp/<file: /nfs/sid/b1f/861/d0>
    ./root/usr/include/boost/numeric/conversion/int_float_mixture_enum.hpp/<file: /nfs/sid/a7b/651/ba>
    ./root/usr/include/boost/numeric/conversion/sign_mixture_enum.hpp/<file: /nfs/sid/a7b/651/bd>
    ./root/usr/include/boost/numeric/conversion/udt_builtin_mixture_enum.hpp/<file: /nfs/sid/a7b/651/bf>
    ./root/usr/include/boost/numeric/conversion/int_float_mixture_enum.hpp/<file: /nfs/sid/a7b/651/ba>
    ./root/usr/include/boost/numeric/conversion/sign_mixture_enum.hpp/<file: /nfs/sid/a7b/651/bd>
    ./root/usr/include/boost/numeric/conversion/udt_builtin_mixture_enum.hpp/<file: /nfs/sid/a7b/651/bf>
    ./root/usr/include/boost/numeric/conversion/int_float_mixture_enum.hpp/<file: /nfs/sid/a7b/651/ba>
    ./root/usr/include/boost/numeric/conversion/sign_mixture_enum.hpp/<file: /nfs/sid/a7b/651/bd>
    ./root/usr/include/boost/numeric/conversion/udt_builtin_mixture_enum.hpp/<file: /nfs/sid/a7b/651/bf>
    ./root/usr/include/boost/preprocessor/seq/enum.hpp/<file: /nfs/sid/b0f/600/06>
    ./root/usr/include/boost/preprocessor/list/enum.hpp/<file: /nfs/sid/88a/6fc/c5>
    ./root/usr/include/boost/preprocessor/repetition/enum.hpp/<file: /nfs/sid/88a/6fc/f2>
    ./root/usr/include/boost/preprocessor/enum.hpp/<file: /nfs/sid/88a/6fc/87>
    ./root/usr/include/boost/serialization/level_enum.hpp/<file: /nfs/sid/a86/af4/ca>
    ./root/usr/include/boost/serialization/tracking_enum.hpp/<file: /nfs/sid/a86/af4/d8>
    ./root/usr/include/boost/serialization/level_enum.hpp/<file: /nfs/sid/a86/af4/ca>
    ./root/usr/include/boost/serialization/tracking_enum.hpp/<file: /nfs/sid/a86/af4/d8>

    Incorporating these ideas into a utility function capable of handling all types would be a good task for an intern.

     
  • Kenneth C. Schalk

    Logged In: YES
    user_id=304837
    Originator: NO

    > That's not the purpose of this RFE. [...] Furthermore
    > array accesses would be in a completely separate RFE and
    > are not directly part of this one.

    I didn't mean to muddy the waters with that digression. I
    agree that if we decide syntax for array indexing is
    something we should add to the language then it should be
    covered in a separate RFE. However I think causing
    confusion with this new output format is still a valid
    concern and worth discussing.

    I think it's pretty clear that there's benefit to the user
    in being as close as reasonable to valid SDL syntax. If
    similarity with SDL code was a non-issue, we could use any
    format, such as one of the JSON-inspired examples you gave.

    I'm not opposed to using the format you suggested with
    square brackets for list indexes. The same convention
    (which most programmers are familiar with) is actually used
    by one kind of debug output produced by the evaluator (when
    using the -dependency-check command-line switch). That
    doesn't concern me as much because it's usually only used by
    people working on modifying the evaluator (who would
    obviously be very, very familiar with what is and isn't
    valid in SDL). Here we're talking about something that
    could become a new default for the output produced by the
    _print primitive function and the -result switch. I think
    that warrants considering what expectations it might create
    in the minds of users.

    Let's talk about a few more practicalities of implementing
    this.

    Which portions of the evaluator's output should use this new
    output format? I think the _print primitive function and
    the -result switch are the obvious choices, but there are a
    lot of other places that use the same code. Error messages,
    debug messages, and the report produced by the -printtimings
    switch are all examples that include printed representations
    of values. In some cases, I don't think this new format
    would be appropriate. We can probably just leave it at
    _print and -result for now and leave the other code
    using the existing value printing code. (The alternative
    would be to audit all the code that uses the existing value
    printing functions to come up with a complete list and make
    choices about which method should be used for each.)

    Since I think we're going to retain the current value
    printing code for some cases, we need some way to specify
    whether to use the "classic" printed representation for
    values or the new proposed one for _print and -result. I
    don't think we need a command-line switch, so a vesta.cfg
    setting is probably the right choice. (We can make the new
    output format the default if the setting isn't present in
    the config file.)

    Inside the evaluator's code, I think this new output format
    would have to be produced by a parallel set of functions on
    the ValC class and its sub-classes. (It doesn't seem
    advisable to try and combine it into the existing PrintD
    functions.) This could be an overloaded version of the
    existing PrintD function. This new format will require
    passing a prefix path string into each level of recursive
    call.
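The idea of a parallel set of printing functions that thread a prefix path through each recursive call might look roughly like this. The sketch is in Python rather than the evaluator's own language, and every class and method name in it (`Value`, `TextValue`, `IntValue`, `BindingValue`, `ListValue`, `print_flat`) is a hypothetical stand-in; only ValC and PrintD are named in the discussion.

```python
import io


class Value:
    """Stand-in for the evaluator's value base class."""

    def print_flat(self, out, prefix):
        raise NotImplementedError


class TextValue(Value):
    def __init__(self, text):
        self.text = text

    def print_flat(self, out, prefix):
        # Leaf: the prefix is already the full path to this value.
        out.write(f'{prefix} = "{self.text}"\n')


class IntValue(Value):
    def __init__(self, number):
        self.number = number

    def print_flat(self, out, prefix):
        out.write(f"{prefix} = {self.number}\n")


class BindingValue(Value):
    def __init__(self, fields):
        self.fields = fields  # name -> Value

    def print_flat(self, out, prefix):
        for name, child in self.fields.items():
            # Extend the prefix with this field's name before recursing.
            child.print_flat(out, f"{prefix}/{name}" if prefix else name)


class ListValue(Value):
    def __init__(self, items):
        self.items = items

    def print_flat(self, out, prefix):
        for index, child in enumerate(self.items):
            child.print_flat(out, f"{prefix}[{index}]")


inner = BindingValue({"b_list": ListValue([TextValue("string")])})
dot = BindingValue({"a_object": BindingValue({"list": ListValue([IntValue(1), inner])})})

buf = io.StringIO()
dot.print_flat(buf, ".")
print(buf.getvalue(), end="")
```

Keeping this separate from the existing printing method means callers that want the classic representation are unaffected; only the entry points that opt in pass an initial prefix.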

    I think this new set of functions will need to support the
    existing "verbose" feature of _print and ValC::PrintD.
    That's mainly an issue for text values, where it's used to
    force the printing of a text value even if it's stored in a
    file (i.e. reading the file and printing it to standard
    output). We might want to suppress the verbose printing of
    closures with this new output format.

    I think we'll want to add a new optional argument to _print
    which will be the prefix string used to represent the path
    to the value being printed. For example, when you talk
    about the example of "_print(.)", you seem to want "."
    included in the path that's being printed. In the simple
    case of "_print(some_variable_name)", I think we can
    reasonably default the prefix. However, the argument to
    _print can be an arbitrarily complicated expression. I
    think the prefix should default to the empty string for the
    non-variable case.
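The defaulting rule described here could be sketched as below, under loud assumptions: the function `default_prefix`, its parameters, and the pattern deciding what counts as a "simple variable" (an identifier or the lone `.`) are all hypothetical and operate on the source text of the argument, which is not how the evaluator necessarily sees it.

```python
import re

# Assumption: a "simple variable" is an identifier or the special name ".".
SIMPLE_VARIABLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\.")


def default_prefix(argument_source, explicit_prefix=None):
    """Pick the path prefix for a flattened print.

    An explicitly supplied prefix always wins. Otherwise a bare variable
    name is used as the prefix, and any other expression gets "".
    """
    if explicit_prefix is not None:
        return explicit_prefix
    source = argument_source.strip()
    if SIMPLE_VARIABLE.fullmatch(source):
        return source
    return ""


print(repr(default_prefix(".")))
print(repr(default_prefix("some_variable_name")))
print(repr(default_prefix("a + b")))
print(repr(default_prefix("a + b", explicit_prefix="sum")))
```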

     
  • Kenneth C. Schalk

    • labels: 693271 --> Evaluator
     
