Hi, I need to identify folders on an 8 TB drive where the folder name contains one of a list of keywords then copy that folder's entire contents to another drive. There are 8.5 million files and folders on the drive so I need to be able to do this in a batch mode. Can SmartCopyTool do this?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi Richard, sorry - wish that it could but that's beyond SmartCopyTool's abilities I'm afraid. I suspect it would be relatively easy to do what you want with a Python script though, e.g. this is close to what you describe:
Would you be interested in a quick contract to write such a script? (I don't know Python and don't have time on this project to learn it on the job.). Basically I would need to treewalk a directory structure for folders with names that contain one of a list of keywords (as a substring or a whole word if the folder name contains whitespace). If such a folder is found its entire contents would need to be copied to another drive preserving timestamps and the entire directory structure from the drive letter (this is for a windows machine) down to the lowest file copied. Please let me know. Thanks, Richard
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
To run it you'll need to install Python 3.8 (https://www.python.org/downloads/windows/) and edit the source, target and keyword lines to suit your needs then run the script.
You can use Idle, which comes with Python, to edit and run the script.
Let me know if you have any issues ... it works OK in my tests but they weren't exactly exhaustive :)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Simon,
Wow. Thank you VERY much. I was starting toteach myself Python so I could write the programmyself but it takes more time than I really have sothis is a huge help.
Not to look a gift horse in the mouth or seem
ungrateful, but could I ask for a couple of what I
hope are simple enhancements?
First, I need to preserve the original timestampsin the copied data. I was going to use a call torobocopy with the options /COPY:DAT and /DCOPY:T.Is there an equivalent in python?
Second, this project is so massive (8 TB of data and8.5 million files and folders to go through) that I cannotdo it in one pass. Also it is possible that the terms I useto identify folders to copy may overlap causing the samedata to be copied twice. Can you add an option to the copythat will prevent copying something again if it's already on
the target drive?
Again, thank you.
Richard
To run it you'll need to install Python 3.8 (https://www.python.org/downloads/windows/) and edit the source, target and keyword lines to suit your needs then run the script.
You can use Idle, which comes with Python, to edit and run the script.
Let me know if you have any issues ... it works OK in my tests but they weren't exactly exhaustive :)
I've updated the Python script to handle directories that already exist at the destination more intelligently - it'll just copy any files that are missing, rather than re-copying everything.
Ironically that's exactly the sort of thing SmartCopyTool handles nicely ... or would, if it had filtering by keyword :)
The script uses shutils.copy2 to copy files and folders (shutil.copytree uses copy2 internally by default) ... copy2 says it tries to preserve file metadata where possible ... apparently on windows that includes modification time but not creation time :-/
Copying files with the original creation time looks like it would be significantly harder, unfortunately... would need to copy files individually, and then transfer the creation timestamp separately:
Simon,
Thank you for the update. I'm still trying
to understand some of the commands inthe original script so this helps a lot.
If you don't mind, how does this statementrun:
if any(word in folder for word in keywords):
I don't see word or folder defined or used beforethis and they aren't reserved words. Am I supposedto substitute them for something?
I've updated the Python script to handle directories that already exist at the destination more intelligently - it'll just copy any files that are missing, rather than re-copying everything.
Ironically that's exactly the sort of thing SmartCopyTool handles nicely ... or would, if it had filtering by keyword :)
The script uses shutils.copy2 to copy files and folders (shutil.copytree uses copy2 internally by default) ... copy2 says it tries to preserve file metadata where possible ... apparently on windows that includes modification time but not creation time :-/
Copying files with the original creation time looks like it would be significantly harder, unfortunately... would need to copy files individually, and then transfer the creation timestamp separately:
Hah, yes sorry that is an obscure bit of code, and very 'Python'. It breaks down to something like:
any(...) ... is the list formed by this expression non-empty [word in folder] ... true if the string 'folder' contains the string 'word' ... [for word in keywords] ... where we will use the name 'word' to refer to every item in the list 'keywords'
It's basically equivalent to something that would look more like this in other languages:
Hah, yes sorry that is an obscure bit of code, and very 'Python'. It breaks down to something like:
any(...) ... is the list formed by this expression non-empty [word in folder] ... true if the string 'folder' contains the string 'word' ... [for word in keywords] ... where we will use the name 'word' to refer to every item in the list 'keywords'
It's basically equivalent to something that would look more like this in other languages:
foreach (string word in keywords)
{
if (folder.Contains(word))
{
// Do the things
break;
}
}
Hi Simon,
I finally learned enough Python to write the programs.It turned out to be more complicated than I originallythought (that's never happened before) and I endedup writing one program to identify folders to copy
based on the presence of keywords in the path thena second program to actually copy each folder and allof its subdirectories. I also added logic to stop thecopy after a certain amount of data was copied withthe ability to pick up where it left off on another run(there are 8 TB of data to filter).
I'd be interested in your opinion of the code and anysuggestions on how I could have done it better if Iactually knew what I was doing if you're interested.
Richard
To run it you'll need to install Python 3.8 (https://www.python.org/downloads/windows/) and edit the source, target and keyword lines to suit your needs then run the script.
You can use Idle, which comes with Python, to edit and run the script.
Let me know if you have any issues ... it works OK in my tests but they weren't exactly exhaustive :)
Hi, I need to identify folders on an 8 TB drive where the folder name contains one of a list of keywords then copy that folder's entire contents to another drive. There are 8.5 million files and folders on the drive so I need to be able to do this in a batch mode. Can SmartCopyTool do this?
Hi Richard, sorry - wish that it could but that's beyond SmartCopyTool's abilities I'm afraid. I suspect it would be relatively easy to do what you want with a Python script though, e.g. this is close to what you describe:
https://stackoverflow.com/questions/40854483/iteratively-or-recursively-filter-directory-and-copy-results-to-new-directory
Simon,
Would you be interested in a quick contract to write such a script? (I don't know Python and don't have time on this project to learn it on the job.). Basically I would need to treewalk a directory structure for folders with names that contain one of a list of keywords (as a substring or a whole word if the folder name contains whitespace). If such a folder is found its entire contents would need to be copied to another drive preserving timestamps and the entire directory structure from the drive letter (this is for a windows machine) down to the lowest file copied. Please let me know. Thanks, Richard
No contract necessary, good opportunity to re-acquaint myself with Python:
https://github.com/machinewrapped/Toybox/blob/master/Python%20Scripts/CopyFoldersBasedOnKeyword.py
To run it you'll need to install Python 3.8 (https://www.python.org/downloads/windows/) and edit the source, target and keyword lines to suit your needs then run the script.
You can use Idle, which comes with Python, to edit and run the script.
Let me know if you have any issues ... it works OK in my tests but they weren't exactly exhaustive :)
Simon,
Wow. Thank you VERY much. I was starting toteach myself Python so I could write the programmyself but it takes more time than I really have sothis is a huge help.
Not to look a gift horse in the mouth or seem
ungrateful, but could I ask for a couple of what I
hope are simple enhancements?
First, I need to preserve the original timestampsin the copied data. I was going to use a call torobocopy with the options /COPY:DAT and /DCOPY:T.Is there an equivalent in python?
Second, this project is so massive (8 TB of data and8.5 million files and folders to go through) that I cannotdo it in one pass. Also it is possible that the terms I useto identify folders to copy may overlap causing the samedata to be copied twice. Can you add an option to the copythat will prevent copying something again if it's already on
the target drive?
Again, thank you.
Richard
No contract necessary, good opportunity to re-acquaint myself with Python:
https://github.com/machinewrapped/Toybox/blob/master/Python%20Scripts/CopyFoldersBasedOnKeyword.py
To run it you'll need to install Python 3.8 (https://www.python.org/downloads/windows/) and edit the source, target and keyword lines to suit your needs then run the script.
You can use Idle, which comes with Python, to edit and run the script.
Let me know if you have any issues ... it works OK in my tests but they weren't exactly exhaustive :)
Search folder names only and copy contents
Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/smartcopytool/discussion/general/
To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/
I've updated the Python script to handle directories that already exist at the destination more intelligently - it'll just copy any files that are missing, rather than re-copying everything.
Ironically that's exactly the sort of thing SmartCopyTool handles nicely ... or would, if it had filtering by keyword :)
The script uses
shutils.copy2to copy files and folders (shutil.copytreeuses copy2 internally by default) ... copy2 says it tries to preserve file metadata where possible ... apparently on windows that includes modification time but not creation time :-/Copying files with the original creation time looks like it would be significantly harder, unfortunately... would need to copy files individually, and then transfer the creation timestamp separately:
https://subscription.packtpub.com/book/networking_and_servers/9781783987467/1/ch01lvl1sec13/copying-files-attributes-and-timestamps
Afraid I'll have to leave it as an exercise for the reader ;-)
Simon,
Thank you for the update. I'm still trying
to understand some of the commands inthe original script so this helps a lot.
If you don't mind, how does this statementrun:
if any(word in folder for word in keywords):
I don't see word or folder defined or used beforethis and they aren't reserved words. Am I supposedto substitute them for something?
Richard
I've updated the Python script to handle directories that already exist at the destination more intelligently - it'll just copy any files that are missing, rather than re-copying everything.
Ironically that's exactly the sort of thing SmartCopyTool handles nicely ... or would, if it had filtering by keyword :)
The script uses shutils.copy2 to copy files and folders (shutil.copytree uses copy2 internally by default) ... copy2 says it tries to preserve file metadata where possible ... apparently on windows that includes modification time but not creation time :-/
Copying files with the original creation time looks like it would be significantly harder, unfortunately... would need to copy files individually, and then transfer the creation timestamp separately:
https://subscription.packtpub.com/book/networking_and_servers/9781783987467/1/ch01lvl1sec13/copying-files-attributes-and-timestamps
Afraid I'll have to leave it as an exercise for the reader ;-)
Search folder names only and copy contents
Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/smartcopytool/discussion/general/
To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/
Hah, yes sorry that is an obscure bit of code, and very 'Python'. It breaks down to something like:
any(...)... is the list formed by this expression non-empty[word in folder]... true if the string 'folder' contains the string 'word' ...[for word in keywords]... where we will use the name 'word' to refer to every item in the list 'keywords'It's basically equivalent to something that would look more like this in other languages:
Simon,
Thanks for the explanation.
Richard
Hah, yes sorry that is an obscure bit of code, and very 'Python'. It breaks down to something like:
any(...) ... is the list formed by this expression non-empty
[word in folder] ... true if the string 'folder' contains the string 'word' ...
[for word in keywords] ... where we will use the name 'word' to refer to every item in the list 'keywords'
It's basically equivalent to something that would look more like this in other languages:
foreach (string word in keywords)
{
if (folder.Contains(word))
{
// Do the things
break;
}
}
Search folder names only and copy contents
Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/smartcopytool/discussion/general/
To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/
Hi Simon,
I finally learned enough Python to write the programs.It turned out to be more complicated than I originallythought (that's never happened before) and I endedup writing one program to identify folders to copy
based on the presence of keywords in the path thena second program to actually copy each folder and allof its subdirectories. I also added logic to stop thecopy after a certain amount of data was copied withthe ability to pick up where it left off on another run(there are 8 TB of data to filter).
I'd be interested in your opinion of the code and anysuggestions on how I could have done it better if Iactually knew what I was doing if you're interested.
Richard
No contract necessary, good opportunity to re-acquaint myself with Python:
https://github.com/machinewrapped/Toybox/blob/master/Python%20Scripts/CopyFoldersBasedOnKeyword.py
To run it you'll need to install Python 3.8 (https://www.python.org/downloads/windows/) and edit the source, target and keyword lines to suit your needs then run the script.
You can use Idle, which comes with Python, to edit and run the script.
Let me know if you have any issues ... it works OK in my tests but they weren't exactly exhaustive :)
Search folder names only and copy contents
Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/smartcopytool/discussion/general/
To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/