
#737 Migrate source repository from svn to git

None
open
nobody
None
5
2024-02-02
2021-03-08
No

The topic keeps coming up on the mailing list, so let's create a ticket for discussion.

Some of the advantages of keeping svn:

  • Not having to migrate
  • The developers are familiar with it
  • Has a mechanism (keyword substitution via svn:keywords) that keeps, e.g., the revision number in files up to date
  • (perhaps more)
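For reference, svn implements this through the `svn:keywords` property: once `Revision` is enabled for a file, svn expands `$Revision$` in it to `$Revision: N $`, where N is the last revision in which that file changed. Git has no equivalent (its `ident` attribute only expands `$Id$` to the blob's object name), so a git-based build would have to do the rewriting itself, e.g. in a build step or a clean/smudge filter. A minimal sketch of just the text rewrite, with the revision passed in; `expand_revision_keyword` is a hypothetical helper, not part of either tool:

```python
import re

# Matches an unexpanded ($Revision$) or already expanded ($Revision: 123 $)
# svn revision keyword, including the aliases Rev and LastChangedRevision.
_REV_KEYWORD = re.compile(r"\$(Revision|Rev|LastChangedRevision)(?::[^$\n]*)?\$")


def expand_revision_keyword(text: str, revision: str) -> str:
    """Rewrite every revision keyword in `text` so it carries `revision`.

    Hypothetical helper: it mimics the svn keyword expansion, but where the
    revision string comes from (svn, git, a build number) is up to the caller.
    """
    return _REV_KEYWORD.sub(lambda m: f"${m.group(1)}: {revision} $", text)
```

Because already expanded keywords are matched as well, running the rewrite again with a newer revision simply updates the number.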

Some of the advantages of migrating to git:

  • Is the most widespread contemporary version control system
  • Decentralized – everyone has a complete copy; forks remain somewhat interoperable
  • Local branches simplify work on multiple features in parallel
  • Has a staging area, which allows, e.g., committing just some of the changes to a file
  • Supports submodules that link to different repositories, e.g. for libraries
  • Makes it easier to move the main repository to a different code hosting provider
  • Records separate author and committer IDs – useful when applying patches
  • Can be used in an svn-like manner with linear history
  • (perhaps more)

The actual migration should ideally be performed with a tool like reposurgeon with enough manual intervention to yield a clean repository.

Steps involved:

  • Creating a map file that maps svn user names to git IDs. See [support-requests:#141].
  • Perhaps splitting the repository to separate the code from the web page etc.
  • Making sure that branches and tags get translated properly – the hardest part
  • Finding a substitute for the revision number macro in SDCC's source code
  • Hosting the repository
  • (perhaps more)
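The map file from the first step is plain text with one `svnuser = Full Name <email>` line per svn account; this is the format of git-svn's `--authors-file`, and reposurgeon's author map uses the same form with an optional trailing time zone. A sketch of a checker that parses such a file and reports svn users still missing from it (the helper names and the `#` comment handling are assumptions):

```python
import re
from typing import Dict, Iterable, List, Tuple

# One mapping per line: "svnuser = Full Name <email>", optionally followed
# by a time zone such as Europe/Berlin (accepted here as free text).
_LINE = re.compile(r"^\s*(\S+)\s*=\s*([^<>]*?)\s*<([^<>\s]+)>\s*(\S+)?\s*$")


def parse_author_map(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse an author map into {svn_user: (name, email)}.

    Blank lines and lines starting with '#' are skipped (an assumption of
    this sketch). Malformed lines and duplicate svn users raise ValueError.
    """
    authors: Dict[str, Tuple[str, str]] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = _LINE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        user, name, email = m.group(1), m.group(2), m.group(3)
        if user in authors:
            raise ValueError(f"line {lineno}: duplicate entry for {user!r}")
        authors[user] = (name, email)
    return authors


def unmapped_users(svn_users: Iterable[str],
                   authors: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the svn user names that still lack an entry, sorted."""
    return sorted(set(svn_users) - set(authors))
```

Feeding `unmapped_users` the user names taken from the svn log would show which entries are still missing before a conversion run.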

Let's discuss below!

Discussion

(Page 1 of 2)
  • Maarten Brock

    Maarten Brock - 2021-03-08

    SVN has the svn:externals property, which seems similar to git submodules.

     
  • Sebastian Riedel

    The .gitignore for *.rel, *.lst, *.sym, *.rul, *.dep, etc. is part of the repository. This hides those files in git status for everybody, and adding such files to a commit must be forced explicitly (git add -f).
    SVN does this with a client-side setting (svn:ignore), as far as I understand, but everybody has to set that up themselves.
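    In a .gitignore, a pattern without a slash, such as `*.rel`, matches files of that name in any directory at or below the .gitignore's location. A rough approximation of that rule for simple globs, using Python's `fnmatch` (hypothetical helpers; negation, directory-only and anchored patterns are not handled):

```python
import fnmatch
from pathlib import PurePosixPath
from typing import List

# Contents a top-level .gitignore could have for the build byproducts named
# above (the list is open-ended: "etc.").
GITIGNORE = """\
# build byproducts
*.rel
*.lst
*.sym
*.rul
*.dep
"""


def load_patterns(text: str) -> List[str]:
    """Return the non-empty, non-comment lines of a .gitignore file."""
    return [ln.strip() for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]


def is_ignored(path: str, patterns: List[str]) -> bool:
    """Approximate git's rule for slash-free patterns: match the base name."""
    name = PurePosixPath(path).name
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)
```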

    Makes it easier to move the main repository to a different code hosting provider

    That also goes for automatic source code mirrors, which can even be located on the local network on a Raspberry Pi. (A handy thing when your internet connection or DNS breaks.)
    The same goes, in general, for being able to commit or read logs in such a situation. (That happened to me twice last year.)

    Incremental revision numbers are certainly a big pro of svn, and one that a decentralized system can never provide.

     
    • Maarten Brock

      Maarten Brock - 2021-03-08

      svn:ignore is a property and thus part of the repository.

       
    • Benedikt Freisen

      Adding .gitignore files would make sense either way, considering how many people already use the repository with git-svn.

      Actually, I have a local backup of the svn repository from earlier experiments with reposurgeon and update it every once in a while, but it is not particularly useful without a server.
      (... And then there's the git-svn checkout that I work in and create patches with, and the svn checkout to which I apply the patches in order to commit them. Is that a common workflow?)

      I believe that there is a way to get the sequential number of a commit in a particular branch.
      Something like 12035 (master) or 12042 (feature-foo) would be a reasonable substitute for most use cases of a revision number.
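      `git rev-list --count <branch>` prints the number of commits reachable from a branch tip, and adding `--first-parent` follows only the first parent of merges. A sketch that computes both counts over a small in-memory commit graph, so the logic can be checked without a repository; `describe_revision` and the label format are hypothetical:

```python
import subprocess
from typing import Dict, List

# Commit graph as {commit: [parent, ...]}; a root commit has no parents.
Graph = Dict[str, List[str]]


def count_reachable(graph: Graph, tip: str, first_parent: bool = False) -> int:
    """Count commits reachable from `tip`, like `git rev-list --count`.

    With first_parent=True only the first parent of each commit is
    followed, like adding `--first-parent`.
    """
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        parents = graph[commit]
        stack.extend(parents[:1] if first_parent else parents)
    return len(seen)


def describe_revision(graph: Graph, tip: str, branch: str) -> str:
    """Format a label such as '12035 (master)' (hypothetical format)."""
    return f"{count_reachable(graph, tip, first_parent=True)} ({branch})"


def count_with_git(branch: str, repo: str = ".") -> int:
    """Ask a real repository for the same number (requires git)."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--count", "--first-parent", branch],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out)
```

The number is only meaningful together with the branch name, since two branches can reach the same count. Following only first parents keeps master's number from jumping by the size of every merged feature branch.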

       
  • Benedikt Freisen

    Found another point for git:
    git blame can be configured to ignore commits listed in a specified file and can thereby maintain sensible output across bulk reformatting commits.
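    The list is passed with `git blame --ignore-revs-file <file>` or configured once through the `blame.ignoreRevsFile` option. It holds one unabbreviated commit hash per line, and blank lines and `#` comment lines are ignored; the name `.git-blame-ignore-revs` is a common convention. A sketch of a validator for such a file, assuming SHA-1 object names (`parse_ignore_revs` is hypothetical):

```python
import re
from typing import List

# A full SHA-1 object name: exactly 40 hexadecimal digits.
_SHA1 = re.compile(r"[0-9a-fA-F]{40}")


def parse_ignore_revs(text: str) -> List[str]:
    """Return the commit hashes listed in a blame ignore-revs file.

    Blank lines and lines starting with '#' are skipped; every other line
    must be a full 40-digit hash, since abbreviated names are not accepted.
    Hashes are returned in lowercase.
    """
    revs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not _SHA1.fullmatch(line):
            raise ValueError(f"line {lineno}: not a full SHA-1: {line!r}")
        revs.append(line.lower())
    return revs
```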

     
  • Philipp Klaus Krause

    Two missing aspects:

    • svn can be used via git-svn by users that are more familiar with git.
    • easier branch handling in svn - each branch is just another directory in the repository, while in git branches are something that requires special handling.

    For me working with branches (and thus on multiple features in parallel) feels much easier in svn.

     

    Last edit: Philipp Klaus Krause 2021-03-11
    • Sebastian Riedel

      easier branch handling in svn - each branch is just another directory in the repository, while in git branches are something that requires special handling.

      The more I read about it, the more it sounds like SVN does not really have branches and is just capable of copying and merging folders instead, which leads to losing commit history when merging, as well as not knowing which revisions branches are based on.
      At least I fail to see how https://sourceforge.net/p/sdcc/code/12088/ is connected to https://sourceforge.net/p/sdcc/code/HEAD/tree/branches/next/sdcc/ via commit data.

      Well, you could say git also doesn't know a thing about branches. Git is just a commit network and branches as well as tags just reference a commit.
      And you can't have more than one commit checked out at a time. (You can have local clones with different commits checked out, but I don't think anybody does this)
      Git is certainly not capable of cloning just a subdirectory or pulling updates for just a subdirectory.

       
      • Benedikt Freisen

        And you can't have more than one commit checked out at a time. (You can have local clones with different commits checked out, but I don't think anybody does this)

        You can have multiple working trees in the same repository, though.
        See https://git-scm.com/docs/git-worktree.
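        As a sketch of that workflow, assuming hypothetical paths and branch names: `git worktree add <path> <branch>` checks out an existing branch in a new directory, `-b <new-branch>` creates a new branch from HEAD instead, and all worktrees share a single object store.

```python
import subprocess
from typing import List


def worktree_add_cmd(path: str, branch: str, new_branch: bool = False) -> List[str]:
    """Build the argv for `git worktree add`.

    With new_branch=True, `-b <branch>` creates the branch from HEAD;
    otherwise the existing branch is checked out at `path`.
    """
    if new_branch:
        return ["git", "worktree", "add", "-b", branch, path]
    return ["git", "worktree", "add", path, branch]


def add_worktree(path: str, branch: str, new_branch: bool = False) -> None:
    """Run the command in the current repository (requires git)."""
    subprocess.run(worktree_add_cmd(path, branch, new_branch), check=True)
```

By default git refuses to check out a branch that is already checked out in another worktree, so each parallel feature gets its own branch and directory.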

         
  • Sergey Belyashov

    Moreover, the current SDCC directory tree is not canonical. IMHO, it should be split into several repositories (web, etc.).

     
    • Benedikt Freisen

      I think that moving web and extra to submodules while keeping the overall directory structure is more realistic, because it looks like there used to be more directories in trunk that were moved around over time.

       
      • Sergey Belyashov

        Submodules are used when some dependencies are required. Web and code are independent projects, IMHO.

         
        • Benedikt Freisen

          We can omit the submodule links to web and extra in the main repository.
          The point is that we cannot really move around anything else.
          Either way, there has to be a clean conversion of the full repository, first.

           
  • Benedikt Freisen

    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -23,7 +23,7 @@
    
     Steps involved:
    
    -* Creating a map file that maps svn user names to git  IDs
    +* Creating a map file that maps svn user names to git  IDs. See [support-requests:#141].
    
     * Perhaps splitting the repository to separate the code from the web page etc.
     * Making sure that branches and tags get translated, properly – the hardest part
     * Finding a substitute for the revision number macro in SDCC's source code
    
    • Group: -->
     

    Related

    Support Requests: #141

  • Benedikt Freisen

    I have posted the contents of my preliminary author ID map file as [support-requests:#141], a private ticket that is only visible to project members.

     


    Last edit: Benedikt Freisen 2021-03-13
  • Sergey Belyashov

    Any progress?

     
    • Benedikt Freisen

      Partial branches and tags appear to be the biggest problem in the existing repository.
      The conversion tool cannot recognize them reliably, which means that some branches are disjoint from the rest of the repository history.
      Every single one of those branches or tags would have to be reattached manually.
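      A disjoint branch in this sense shares no commit with trunk, so `git merge-base` finds no common ancestor for the two. A pure-Python sketch that lists such refs in a converted commit graph given as `{commit: [parents]}` (graph layout, ref names and helper names are hypothetical):

```python
from typing import Dict, List, Set

# Commit graph as {commit: [parent, ...]}; refs map names to tip commits.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, tip: str) -> Set[str]:
    """All commits reachable from `tip`, including `tip` itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def disjoint_refs(graph: Graph, refs: Dict[str, str], main: str = "master") -> List[str]:
    """Names of refs whose history has no commit in common with `main`."""
    main_history = ancestors(graph, refs[main])
    return sorted(name for name, tip in refs.items()
                  if name != main and not (ancestors(graph, tip) & main_history))
```

Each reported ref would then be a candidate for manual reattachment in the conversion tool.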

       
      • Sergey Belyashov

        Would it be acceptable to keep only the trunk and tag history? Branches are used as temporary repositories for investigations.

         
        • Benedikt Freisen

          In git semantics, deleting a branch deletes all its non-merged history, and this is also what reposurgeon does by default when converting an svn repository.
          It therefore makes sense to keep all the branches that still exist in the most recent revision.
          The resulting git repository is then less than 60 MB in size.
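          The history lost by deleting a branch is the set of commits reachable from its tip but from no other ref, which is what `git rev-list <branch> --not <other refs>` lists. A sketch over a `{commit: [parents]}` graph (hypothetical names); such commits become unreachable and are only removed physically by later garbage collection:

```python
from typing import Dict, Iterable, List, Set

# Commit graph as {commit: [parent, ...]}; refs map names to tip commits.
Graph = Dict[str, List[str]]


def reachable(graph: Graph, tips: Iterable[str]) -> Set[str]:
    """All commits reachable from any of `tips`, tips included."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def lost_if_deleted(graph: Graph, refs: Dict[str, str], branch: str) -> Set[str]:
    """Commits that become unreachable once `branch` is deleted."""
    others = [tip for name, tip in refs.items() if name != branch]
    return reachable(graph, [refs[branch]]) - reachable(graph, others)
```

A fully merged branch loses nothing, while an unmerged experiment loses exactly its own commits.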

           
          • Benedikt Freisen

            I have done some experimentation:
            The .git directory grows from approximately 54.5 MiB to 57.5 MiB if all the deleted branches are preserved, as well. We can afford that.

             
            • Sergey Belyashov

              Sounds good.

               
  • Philipp Klaus Krause

    IMO, such a change should not be done unless there is a clear consensus among SDCC developers.

     
    • Sergey Belyashov

      As far as I remember, no one has said a definite "NO".

      I suggest doing the migration in 3 steps:

      1. git is a mirror of SVN; all commits are made to SVN. Merge every day.
      2. SVN is a mirror of git; all commits are made to git. Merge master -> trunk every day.
      3. SVN is frozen; no more merges to SVN.

      Most usage issues will be solved in the first step.
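      The three steps amount to changing which repository is authoritative. A sketch encoding, per step, where commits are accepted and which way the daily sync runs (the type names are hypothetical, and the sync mechanism itself is not shown):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Repo(Enum):
    SVN = "svn"
    GIT = "git"


@dataclass(frozen=True)
class Phase:
    number: int
    commits_go_to: Repo      # the authoritative repository in this step
    mirror: Optional[Repo]   # repository updated by the daily sync, if any


# The three proposed steps (the evaluation step 0 has no entry here).
PHASES = (
    Phase(1, commits_go_to=Repo.SVN, mirror=Repo.GIT),
    Phase(2, commits_go_to=Repo.GIT, mirror=Repo.SVN),
    Phase(3, commits_go_to=Repo.GIT, mirror=None),  # svn frozen
)


def accepts_commit(phase: Phase, repo: Repo) -> bool:
    """Commits are only accepted in the authoritative repository."""
    return repo is phase.commits_go_to


def daily_sync(phase: Phase) -> Optional[str]:
    """Describe the daily sync direction, or None once svn is frozen."""
    if phase.mirror is None:
        return None
    return f"{phase.commits_go_to.value} -> {phase.mirror.value}"
```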

       
      • Benedikt Freisen

        Well, we are still at step 0: figuring out how, and how well, it even works.
        The outcome will have a significant impact on the kind of consensus that can be reached.
        Besides, I would skip step 2.

         
        • Sergey Belyashov

          Step 2 is needed for developers who cannot use git. They will continue to work with SVN, but their commits will be done via tickets.

          Also, admins need it to prepare the migration of the snapshot builds. This step may take very long: months or years...

           
          • Benedikt Freisen

            Actually, I would group most of that into step 0.
            Switching to git before everyone and everything involved in SDCC's development can use it is kind of pointless.

             

