| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2009 |  |  |  |  |  |  |  |  |  |  |  | (2) |
| 2010 | (3) |  | (9) |  |  | (3) | (8) | (19) | (20) | (4) | (8) | (4) |
| 2011 | (6) | (5) | (4) | (3) | (10) | (5) | (2) |  |  |  | (32) | (7) |
| 2012 | (3) |  | (10) | (11) | (7) | (3) | (17) | (1) | (2) | (11) | (1) |  |
| 2013 | (2) | (17) | (7) | (11) | (2) | (5) | (35) | (8) | (23) | (7) | (18) | (35) |
| 2014 | (34) | (3) | (41) | (38) | (23) | (15) | (32) |  | (18) | (13) | (8) | (6) |
| 2015 | (23) | (17) | (13) | (49) | (28) | (26) | (28) | (15) | (21) | (17) | (17) | (15) |
| 2016 | (3) | (22) | (16) | (11) | (24) | (1) | (14) | (30) | (43) | (21) | (17) | (12) |
| 2017 | (19) | (8) | (11) | (11) | (6) | (8) | (12) | (22) | (9) | (12) | (14) | (7) |
| 2018 | (5) | (8) | (14) | (6) | (1) | (1) | (4) | (11) | (5) |  | (14) | (10) |
| 2019 | (2) | (1) | (3) | (2) | (2) | (12) | (4) | (1) | (3) | (12) | (2) |  |
| 2020 | (4) | (3) | (4) |  | (1) | (2) | (2) |  | (3) | (7) |  |  |
| 2021 | (4) | (4) | (5) | (1) |  | (5) | (3) | (5) | (4) | (3) | (1) |  |
| 2022 |  |  |  |  |  | (4) | (1) |  | (4) |  | (2) |  |
| 2023 | (9) | (3) |  | (1) |  | (2) | (8) |  |  |  | (7) |  |
| 2024 |  |  |  | (8) | (1) | (5) |  |  | (1) | (1) | (2) |  |
| 2026 | (4) | (6) | (2) |  |  |  |  |  |  |  |  |  |
|
From: <Gui...@ce...> - 2026-03-16 09:32:02
|
Hello,

This is an issue we have seen recently with the removal of HSM request deduplication in 2.15 as well. The issue is not only related to FS scans; it can also happen when only reading changelogs. After an archive request is sent to the MDT, Lustre generates an HSM changelog record with the HE_STATE flag. Robinhood reads that changelog and updates the state of the file from "archiving" to "new" (or "modified" if the file is dirty). There is no way for Robinhood to tell whether an archive is in progress (or waiting in the MDT's queue), so the current behavior is to remove the archiving flag. This could be patched so that the state of a file is not updated while it is "archiving", but then, if we miss the "archived" changelog, the file will stay in the archiving state.

What we did to solve this issue was to revert the patch in Lustre that removes the deduplication of HSM requests. That way Robinhood can send multiple archive requests without problems. The Lustre patch that introduced the issue is associated with the Jira ticket LU-13651, if you are interested. If your HSM backend cannot handle multiple archives of the same file, then you probably need to revert this patch. We have also made a patch in Lustre to set files dirty when multiple archives are sent and some of them fail: https://jira.whamcloud.com/browse/LU-19829

Regards,
Guillaume

________________________________
From: Kaizaad Bilimorya <ka...@sh...>
Sent: Friday, March 13, 2026 23:09
To: rob...@li...
Subject: [robinhood-support] FS scan_interval and New or Modified Files

RobinHood Version: 3.2.0-2
Lustre Version: 2.15.7

We have an issue with RobinHood (or possibly our config) when files sent to the Lustre HSM Coordinator take longer than the FS "scan_interval" to complete being written to tape. Our file system scan takes < 5 min, so we have it running frequently (~ every 10 min), but we also have Lustre changelogs running. We don't really need to run the FS_Scan so frequently, but since it is so quick we thought it wouldn't matter and it would be a "just in case" something happens with the changelog reader.

"New" or "modified" files are correctly sent to the Lustre HSM Coordinator when the RobinHood "lhsm_archive" policy is triggered, and in RobinHood they are updated and show up as "lhsm.status : archiving" (no problems so far). Now, if an FS_Scan gets triggered before the file has completed being written to tape, the RobinHood FS_Scan will update the "lhsm.status" to either "new" or "modified" again. Then, on the next "lhsm_archive" policy run, the file will once again be sent to the Lustre HSM Coordinator, resulting in multiple instances of the same file being queued up or actively being written to tape.

The fix (besides increasing the scan_interval) doesn't seem like an easy conditional check of "when FS_Scan is run, don't update lhsm.status if it equals archiving", since files can get stuck in "lhsm.status : archiving" if we run a "mdt.*.hsm_control=purge" (which we sometimes do).

I don't think this is a common issue since most people have a large FS "scan_interval", but I thought I would bring it up in case others have seen this.

Thanks
-k

--
Kaizaad Bilimorya
Systems Administrator - SHARCNET | http://www.sharcnet.ca
Digital Research Alliance of Canada
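
A hedged illustration of how an administrator could confirm the duplicate-archive situation described above; the filesystem name "storage", the configuration path and the file paths are assumptions, not values taken from this thread:

    # List the requests currently queued or running on the MDT coordinator and
    # look for several ARCHIVE entries carrying the same FID:
    lctl get_param -n mdt.storage-MDT0000.hsm.actions | grep 'action=ARCHIVE'

    # HSM state of a suspect file as seen by Lustre itself:
    lfs hsm_state /storage/path/to/file

    # State of the same file as seen by Robinhood's database:
    rbh-report -f /etc/robinhood.d/storage.conf -e /storage/path/to/file | grep lhsm.status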
|
From: Kaizaad B. <ka...@sh...> - 2026-03-13 22:40:28
|
RobinHood Version: 3.2.0-2
Lustre Version: 2.15.7

We have an issue with RobinHood (or possibly our config) when files sent to the Lustre HSM Coordinator take longer than the FS "scan_interval" to complete being written to tape. Our file system scan takes < 5 min, so we have it running frequently (~ every 10 min), but we also have Lustre changelogs running. We don't really need to run the FS_Scan so frequently, but since it is so quick we thought it wouldn't matter and it would be a "just in case" something happens with the changelog reader.

"New" or "modified" files are correctly sent to the Lustre HSM Coordinator when the RobinHood "lhsm_archive" policy is triggered, and in RobinHood they are updated and show up as "lhsm.status : archiving" (no problems so far). Now, if an FS_Scan gets triggered before the file has completed being written to tape, the RobinHood FS_Scan will update the "lhsm.status" to either "new" or "modified" again. Then, on the next "lhsm_archive" policy run, the file will once again be sent to the Lustre HSM Coordinator, resulting in multiple instances of the same file being queued up or actively being written to tape.

The fix (besides increasing the scan_interval) doesn't seem like an easy conditional check of "when FS_Scan is run, don't update lhsm.status if it equals archiving", since files can get stuck in "lhsm.status : archiving" if we run a "mdt.*.hsm_control=purge" (which we sometimes do).

I don't think this is a common issue since most people have a large FS "scan_interval", but I thought I would bring it up in case others have seen this.

Thanks
-k

--
Kaizaad Bilimorya
Systems Administrator - SHARCNET | http://www.sharcnet.ca
Digital Research Alliance of Canada
|
From: Daniel H. <dh...@ge...> - 2026-02-25 15:01:31
|
Hi support,

I'm working on implementing Robinhood on one of our Lustre systems. In order to optimize the scans, I would like to only scan directories rather than each individual file. I believed this to be possible during my initial research, but have been struggling to achieve this. Can you advise on whether or not this is possible and, if so, how to implement it?

Thank you!

Best,
Dan Hoge
Systems Engineer
Geneva Trading

This e-mail message (together with any attached documents) is strictly confidential and intended solely for the addressee (including the addressee's employing organization, assigns and affiliates). It may contain information that is proprietary to Geneva Trading USA LLC, its affiliates and assigns. It is controlled by law, or is covered by legal, professional or other privilege. If you are not the intended addressee nor associated with the intended addressee's organization, you must not use, disclose or copy this transmission, and are asked to notify the sender of its receipt. Please be further advised that the unauthorized interception or retrieval of e-mail may be a criminal violation of the Electronic Communications Privacy Act.
|
From: Angel de V. <ang...@ia...> - 2026-02-20 10:22:30
|
Hello,
as far as I understand, Robinhood is not working well when applying a
policy rule restricted to the files of a given user. Is this a bug
or am I missing something?
My robinhood version is:
,----
| $ robinhood --version
|
| Product: robinhood
| Version: 3.2.0-1
| Build: 2026-01-26 12:37:47
`----
and I have defined the following policy:
,----
| define_policy cleanup1y {
| scope { type != directory }
| status_manager = none;
| default_action = common.unlink;
| default_lru_sort_attr = last_access;
| }
|
| cleanup1y_rules {
| ignore { last_access < 1d }
| ignore_fileclass = empty_files;
|
| rule default {
| condition { last_access > 365d }
| }
| }
|
| cleanup1y_trigger {
| trigger_on = global_usage;
| high_threshold_pct = 44.98%;
| low_threshold_pct = 44.96%;
| # max_action_count = 8;
| check_interval = 15min;
| }
`----
Now, when I run the policy without a specific target, there are no
policy executions, as the current usage is below the
"low_threshold_pct", so all good:
,----
| $ robinhood --run=cleanup1y --once -f /etc/robinhood.d/basto.conf
| 2026/02/20 09:36:16 [2490067/1] CheckFS | '/storage' matches mount point '/storage', type=lustre, fs=10.1.1.7@tcp:10.1.1.8@tcp:/storage
| 2026/02/20 09:36:16 [2490067/2] cleanup1y | Current usage max is 44.20%
| 2026/02/20 09:36:16 [2490067/1] Main | cleanup1y: policy run terminated (rc = 0).
| 2026/02/20 09:36:16 [2490067/1] Main | All tasks done! Exiting.
`----
But if I try to apply the same policy but limited to the files of a
given user, then robinhood happily starts executing the policy for a
number of files, despite the fact that the global usage is below the
"low_threshold_pct":
,----
| angelv-adm@diva:/scratch/angelv-adm/robinhood$ sudo /scratch/angelv-adm/robinhood/robin/sbin/robinhood --run=cleanup1y --target=user:anegri-ext --once -f /etc/robinhood.d/basto.conf
| 2026/02/20 09:38:35 [2492813/1] CheckFS | '/storage' matches mount point '/storage', type=lustre, fs=10.1.1.7@tcp:10.1.1.8@tcp:/storage
| 2026/02/20 09:38:35 [2492813/2] cleanup1y | Checking policy rules for user anegri-ext
| 2026/02/20 09:38:35 [2492813/2] cleanup1y | Building policy list - last full FS Scan: 2026/02/19 17:37:58
| 2026/02/20 09:38:35 [2492813/2] cleanup1y | Starting policy run on 'anegri-ext' user files
| 2026/02/20 09:38:47 [2492813/3] cleanup1y | Executing policy action on: 0x20000a04a:0x1:0x0 (/storage/scratch/anegri/C-EAGLE/onlyStars/CE-0/groups_000_z014p003)
| 2026/02/20 09:38:47 [2492813/4] cleanup1y | Executing policy action on: 0x20000a04a:0x2:0x0 (/storage/scratch/anegri/C-EAGLE/onlyStars/CE-0/groups_000_z014p003FileOffsets.hdf5)
| [...]
`----
Any ideas if this can be done in some other way or if a fix would be
simple to implement?
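One alternative worth testing — a sketch based only on the trigger types
listed in the Robinhood admin documentation, not verified against this
setup, and with placeholder volume values — is a per-user trigger, so
that the thresholds are evaluated against that user's own usage instead
of being bypassed:
,----
| cleanup1y_trigger {
|     trigger_on         = user_usage(anegri-ext);
|     high_threshold_vol = 10TB;   # start when this user exceeds 10 TB
|     low_threshold_vol  = 8TB;    # stop once back under 8 TB
|     check_interval     = 15min;
| }
`----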
Thanks
--
Ángel de Vicente
Research Software Engineer (Supercomputing and BigData)
Tel.: +34 922-605-747
---------------------------------------------------------------------------------------------
AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal
DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
|
|
From: Thomas R. <t....@gs...> - 2026-02-07 08:12:58
|
Seems the reported number of idle threads is not so worrisome after all:
My Robinhood has now caught up with the changelogs on the MDS.
For the record, I am reporting one of the last stats dumps while it was still busy:
2026/02/07 06:09:59 [19186/1] STATS | ==== EntryProcessor Pipeline Stats ===
2026/02/07 06:09:59 [19186/1] STATS | Idle threads: 19
2026/02/07 06:09:59 [19186/1] STATS | Id constraints count: 1831 (hash min=0/max=9/avg=0.1)
2026/02/07 06:09:59 [19186/1] STATS | Name constraints count: 1824 (hash min=0/max=3/avg=0.1)
2026/02/07 06:09:59 [19186/1] STATS | Stage | Wait | Curr | Done | Total | ms/op |
2026/02/07 06:09:59 [19186/1] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 |
2026/02/07 06:09:59 [19186/1] STATS | 1: GET_INFO_DB | 1214 | 0 | 616 | 51625 | 0.33 |
2026/02/07 06:09:59 [19186/1] STATS | 2: GET_INFO_FS | 0 | 0 | 0 | 34542 | 0.39 |
2026/02/07 06:09:59 [19186/1] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 51693 | 0.00 |
2026/02/07 06:09:59 [19186/1] STATS | 4: DB_APPLY | 0 | 1 | 0 | 51693 | 0.90 |
2026/02/07 06:09:59 [19186/1] STATS | 5: CHGLOG_CLR | 0 | 0 | 0 | 51693 | 0.01 |
2026/02/07 06:09:59 [19186/1] STATS | 6: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 |
Note the "Idle threads: 19" here.
This number could be "32", because that's the current nb_threads.
So, as Thomas stated before, the STATS table does not make all the working threads visible - here it would seem there is only one, doing DB_APPLY.
But obviously, there are 12 more threads doing something.
Regards,
Thomas
On 2/4/26 5:41 PM, Thomas Roth wrote:
> Dear Thomas,
>
> indeed the stat dumps always look like that, I have never seen more than 1 or 2 in the column "Curr" of the GET_INFO_DB stat.
> I have reduced the overall number of threads to 32 and the specific one for the FS operations to 24.
>
> Does the number of constraints or rather max_pending_operations have an effect?
> The documentation states that the default value for max_pending_operations = 10000. To test that, I put this value explicitly into robinhood.conf.
> This simply blows up the "Wait" values for GET_INFO_DB, probably as it must if 10k can be pending.
> The rest of the stats looks the same, e.g.
>
>
> 2026/02/04 17:35:43 [5311/1] STATS | Idle threads: 28
> 2026/02/04 17:35:43 [5311/1] STATS | Id constraints count: 9976 (hash min=0/max=12/avg=0.7)
> 2026/02/04 17:35:43 [5311/1] STATS | Name constraints count: 9849 (hash min=0/max=6/avg=0.6)
> 2026/02/04 17:35:43 [5311/1] STATS | Stage | Wait | Curr | Done | Total | ms/op |
> 2026/02/04 17:35:43 [5311/1] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 |
> 2026/02/04 17:35:43 [5311/1] STATS | 1: GET_INFO_DB | 6375 | 0 | 3599 | 6965 | 0.44 |
> 2026/02/04 17:35:43 [5311/1] STATS | 2: GET_INFO_FS | 0 | 1 | 1 | 5060 | 9.92 |
> 2026/02/04 17:35:43 [5311/1] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 7147 | 0.00 |
> 2026/02/04 17:35:43 [5311/1] STATS | 4: DB_APPLY | 0 | 0 | 0 | 7147 | 1.21 |
> 2026/02/04 17:35:43 [5311/1] STATS | 5: CHGLOG_CLR | 0 | 0 | 0 | 7233 | 0.01 |
> 2026/02/04 17:35:43 [5311/1] STATS | 6: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 |
>
>
> Best regards
> Thomas
>
>
> On 2/4/26 16:00, Tho...@CE... wrote:
>> Dear Thomas,
>>
>> Indeed the stats show 1 single active thread, but as this display is lockless for performance reasons, the current operations may be moving while
>> the stats are displayed, and thus give a false view of what's really going on.
>> Do all other stat dumps look the same?
>>
>> Otherwise, I'm concerned about the 10.08 ms for "GET_INFO_FS" (basically stat()+getstripe()). It's quite a high latency (if operations were
>> sequential, it would only allow about 100 stat() calls per second...).
>> I wonder if having too many threads querying the Lustre client might not be
>> counterproductive.
>>
>> There is a way to fine tune the number of threads allowed by pipeline stage, to have a high parallelism on some operations (e.g. DB) while
>> restricting the number of simultaneous calls to the FS.
>>
>> Regards,
>> Thomas
>>
>>
>> -----Original Message-----
>> From: Thomas Roth <t....@gs...>
>> Sent: Monday, February 2, 2026 21:58
>> To: 'rob...@li...' <rob...@li...>
>> Subject: [robinhood-support] idle threads
>>
>> Hi all,
>>
>> I have Robinhood v3.2 running on a Lustre 2.15, and might have misconfigured / misunderstood the thread count.
>>
>> The Robinhood box has 96 cores, so I have set the number of threads to 96 (EntryProcessor {nb_threads = 96;})
>>
>> When checking the robinhood.log, most cores do nothing:
>>
>> 2026/02/02 21:46:58 [4479/1] STATS | ==== EntryProcessor Pipeline Stats ===
>> 2026/02/02 21:46:58 [4479/1] STATS | Idle threads: 95
>> 2026/02/02 21:46:58 [4479/1] STATS | Id constraints count: 100 (hash min=0/max=3/avg=0.0)
>> 2026/02/02 21:46:58 [4479/1] STATS | Name constraints count: 98 (hash min=0/max=2/avg=0.0)
>> 2026/02/02 21:46:58 [4479/1] STATS | Stage | Wait | Curr | Done | Total | ms/op |
>> 2026/02/02 21:46:58 [4479/1] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 |
>> 2026/02/02 21:46:58 [4479/1] STATS | 1: GET_INFO_DB | 57 | 0 | 41 | 38819 | 0.35 |
>> 2026/02/02 21:46:58 [4479/1] STATS | 2: GET_INFO_FS | 0 | 0 | 0 | 26390 | 10.08 |
>> 2026/02/02 21:46:58 [4479/1] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 37960 | 0.00 |
>> 2026/02/02 21:46:58 [4479/1] STATS | 4: DB_APPLY | 1 | 1 | 0 | 37958 | 1.17 | 2.64% batched (avg batch size: 3.8)
>> 2026/02/02 21:46:58 [4479/1] STATS | 5: CHGLOG_CLR | 0 | 0 | 0 | 38823 | 0.01 |
>> 2026/02/02 21:46:58 [4479/1] STATS | 6: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 |
>>
>> This is quite single-threaded ;-(
>>
>> Right now, the file system is not in production and quite idle, so Robinhood is working to close a gap of ~100M changelog entries.
>> But I am afraid that with this configuration, once in production, the changelogs will run away and fill up the MDS disk.
>>
>> Regards,
>> Thomas
>>
>>
>> --
>> --------------------------------------------------------------------
>> Thomas Roth
>> Department: IT
>> Location: SB3 2.291
>> Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
>>
>> GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
>>
>> Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung:
>> Prof. Dr. Thomas Nilsson, Dr. Katharina Stummeyer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzende des GSI-Aufsichtsrats:
>> State Secretary / Ministerialrätin Dr. Andrea Fischer
>>
>>
>>
>> _______________________________________________
>> robinhood-support mailing list
>> rob...@li...
>> https://lists.sourceforge.net/lists/listinfo/robinhood-support
>
|
|
From: Thomas R. <t....@gs...> - 2026-02-04 16:41:51
|
Dear Thomas,
indeed the stat dumps always look like that, I have never seen more than 1 or 2 in the column "Curr" of the GET_INFO_DB stat.
I have reduced the overall number of threads to 32 and the specific one for the FS operations to 24.
Does the number of constraints or rather max_pending_operations have an effect?
The documentation states that the default value for max_pending_operations = 10000. To test that, I put this value explicitly into robinhood.conf.
This simply blows up the "Wait" values for GET_INFO_DB, probably as it must if 10k can be pending.
The rest of the stats looks the same, e.g.
2026/02/04 17:35:43 [5311/1] STATS | Idle threads: 28
2026/02/04 17:35:43 [5311/1] STATS | Id constraints count: 9976 (hash min=0/max=12/avg=0.7)
2026/02/04 17:35:43 [5311/1] STATS | Name constraints count: 9849 (hash min=0/max=6/avg=0.6)
2026/02/04 17:35:43 [5311/1] STATS | Stage | Wait | Curr | Done | Total | ms/op |
2026/02/04 17:35:43 [5311/1] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 |
2026/02/04 17:35:43 [5311/1] STATS | 1: GET_INFO_DB | 6375 | 0 | 3599 | 6965 | 0.44 |
2026/02/04 17:35:43 [5311/1] STATS | 2: GET_INFO_FS | 0 | 1 | 1 | 5060 | 9.92 |
2026/02/04 17:35:43 [5311/1] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 7147 | 0.00 |
2026/02/04 17:35:43 [5311/1] STATS | 4: DB_APPLY | 0 | 0 | 0 | 7147 | 1.21 |
2026/02/04 17:35:43 [5311/1] STATS | 5: CHGLOG_CLR | 0 | 0 | 0 | 7233 | 0.01 |
2026/02/04 17:35:43 [5311/1] STATS | 6: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 |
Best regards
Thomas
On 2/4/26 16:00, Tho...@CE... wrote:
> Dear Thomas,
>
> Indeed the stats show 1 single active thread, but as this display is lockless for performance reasons, the current operations may be moving while the stats are displayed, and thus give a false view of what's really going on.
> Do all other stat dumps look the same?
>
> Otherwise, I'm concerned about the 10.08 ms for "GET_INFO_FS" (basically stat()+getstripe()). It's quite a high latency (if operations were sequential, it would only allow about 100 stat() calls per second...).
> I wonder if having too many threads querying the Lustre client might not be
> counterproductive.
>
> There is a way to fine tune the number of threads allowed by pipeline stage, to have a high parallelism on some operations (e.g. DB) while restricting the number of simultaneous calls to the FS.
>
> Regards,
> Thomas
>
>
> -----Original Message-----
> From: Thomas Roth <t....@gs...>
> Sent: Monday, February 2, 2026 21:58
> To: 'rob...@li...' <rob...@li...>
> Subject: [robinhood-support] idle threads
>
> Hi all,
>
> I have Robinhood v3.2 running on a Lustre 2.15, and might have misconfigured / misunderstood the thread count.
>
> The Robinhood box has 96 cores, so I have set the number of threads to 96 (EntryProcessor {nb_threads = 96;})
>
> When checking the robinhood.log, most cores do nothing:
>
> 2026/02/02 21:46:58 [4479/1] STATS | ==== EntryProcessor Pipeline Stats ===
> 2026/02/02 21:46:58 [4479/1] STATS | Idle threads: 95
> 2026/02/02 21:46:58 [4479/1] STATS | Id constraints count: 100 (hash min=0/max=3/avg=0.0)
> 2026/02/02 21:46:58 [4479/1] STATS | Name constraints count: 98 (hash min=0/max=2/avg=0.0)
> 2026/02/02 21:46:58 [4479/1] STATS | Stage | Wait | Curr | Done | Total | ms/op |
> 2026/02/02 21:46:58 [4479/1] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 |
> 2026/02/02 21:46:58 [4479/1] STATS | 1: GET_INFO_DB | 57 | 0 | 41 | 38819 | 0.35 |
> 2026/02/02 21:46:58 [4479/1] STATS | 2: GET_INFO_FS | 0 | 0 | 0 | 26390 | 10.08 |
> 2026/02/02 21:46:58 [4479/1] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 37960 | 0.00 |
> 2026/02/02 21:46:58 [4479/1] STATS | 4: DB_APPLY | 1 | 1 | 0 | 37958 | 1.17 | 2.64% batched (avg batch size: 3.8)
> 2026/02/02 21:46:58 [4479/1] STATS | 5: CHGLOG_CLR | 0 | 0 | 0 | 38823 | 0.01 |
> 2026/02/02 21:46:58 [4479/1] STATS | 6: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 |
>
> This is quite single-threaded ;-(
>
> Right now, the file system is not in production and quite idle, so Robinhood is working to close a gap of ~100M changelog entries.
> But I am afraid that with this configuration, once in production, the changelogs will run away and fill up the MDS disk.
>
> Regards,
> Thomas
>
>
> --
> --------------------------------------------------------------------
> Thomas Roth
> Department: IT
> Location: SB3 2.291
> Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
>
> GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
>
> Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung:
> Prof. Dr. Thomas Nilsson, Dr. Katharina Stummeyer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzende des GSI-Aufsichtsrats:
> State Secretary / Ministerialrätin Dr. Andrea Fischer
>
>
>
> _______________________________________________
> robinhood-support mailing list
> rob...@li...
> https://lists.sourceforge.net/lists/listinfo/robinhood-support
--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Prof. Dr. Thomas Nilsson, Dr. Katharina Stummeyer, Jörg Blaurock
Chair of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialrätin Dr. Andrea Fischer
|
|
From: <Tho...@CE...> - 2026-02-04 15:29:12
|
Dear Thomas,
Indeed the stats show a single active thread, but as this display is lockless for performance reasons, the current operations may be moving while the stats are displayed, and thus give a false view of what's really going on.
Do all other stat dumps look the same?
Otherwise, I'm concerned about the 10.08 ms for "GET_INFO_FS" (basically stat()+getstripe()). It's quite a high latency (if operations were sequential, it would only allow about 100 stat() calls per second...).
I wonder if having too many threads querying the Lustre client might not be
counterproductive.
There is a way to fine-tune the number of threads allowed per pipeline stage, to have high parallelism on some operations (e.g. DB) while restricting the number of simultaneous calls to the FS.
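For illustration, a minimal sketch of such a configuration; the per-stage parameter names follow the "<stage_name>_threads_max" pattern from the template configuration file and should be checked against your Robinhood version, and the values are only examples:

    EntryProcessor {
        nb_threads = 32;                     # overall worker pool
        max_pending_operations = 5000;
        STAGE_GET_INFO_FS_threads_max = 8;   # cap simultaneous stat()/getstripe() calls
        STAGE_DB_APPLY_threads_max = 16;     # keep DB parallelism high
    }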
Regards,
Thomas
-----Original Message-----
From: Thomas Roth <t....@gs...>
Sent: Monday, February 2, 2026 21:58
To: 'rob...@li...' <rob...@li...>
Subject: [robinhood-support] idle threads
Hi all,
I have Robinhood v3.2 running on a Lustre 2.15, and might have misconfigured / misunderstood the thread count.
The Robinhood box has 96 cores, so I have set the number of threads to 96 (EntryProcessor {nb_threads = 96;})
When checking the robinhood.log, most cores do nothing:
2026/02/02 21:46:58 [4479/1] STATS | ==== EntryProcessor Pipeline Stats ===
2026/02/02 21:46:58 [4479/1] STATS | Idle threads: 95
2026/02/02 21:46:58 [4479/1] STATS | Id constraints count: 100 (hash min=0/max=3/avg=0.0)
2026/02/02 21:46:58 [4479/1] STATS | Name constraints count: 98 (hash min=0/max=2/avg=0.0)
2026/02/02 21:46:58 [4479/1] STATS | Stage | Wait | Curr | Done | Total | ms/op |
2026/02/02 21:46:58 [4479/1] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 |
2026/02/02 21:46:58 [4479/1] STATS | 1: GET_INFO_DB | 57 | 0 | 41 | 38819 | 0.35 |
2026/02/02 21:46:58 [4479/1] STATS | 2: GET_INFO_FS | 0 | 0 | 0 | 26390 | 10.08 |
2026/02/02 21:46:58 [4479/1] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 37960 | 0.00 |
2026/02/02 21:46:58 [4479/1] STATS | 4: DB_APPLY | 1 | 1 | 0 | 37958 | 1.17 | 2.64% batched (avg batch size: 3.8)
2026/02/02 21:46:58 [4479/1] STATS | 5: CHGLOG_CLR | 0 | 0 | 0 | 38823 | 0.01 |
2026/02/02 21:46:58 [4479/1] STATS | 6: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 |
This is quite single-threaded ;-(
Right now, the file system is not in production and quite idle, so Robinhood is working to close a gap of ~100M changelog entries.
But I am afraid that with this configuration, once in production, the changelogs will run away and fill up the MDS disk.
Regards,
Thomas
--
--------------------------------------------------------------------
Thomas Roth
Department: IT
Location: SB3 2.291
Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung:
Prof. Dr. Thomas Nilsson, Dr. Katharina Stummeyer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzende des GSI-Aufsichtsrats:
State Secretary / Ministerialrätin Dr. Andrea Fischer
_______________________________________________
robinhood-support mailing list
rob...@li...
https://lists.sourceforge.net/lists/listinfo/robinhood-support
|
|
From: Thomas R. <t....@gs...> - 2026-02-02 21:17:13
|
Hi all,
I have Robinhood v3.2 running on a Lustre 2.15, and might have misconfigured / misunderstood the thread count.
The Robinhood box has 96 cores, so I have set the number of threads to 96 (EntryProcessor {nb_threads = 96;})
When checking the robinhood.log, most cores do nothing:
2026/02/02 21:46:58 [4479/1] STATS | ==== EntryProcessor Pipeline Stats ===
2026/02/02 21:46:58 [4479/1] STATS | Idle threads: 95
2026/02/02 21:46:58 [4479/1] STATS | Id constraints count: 100 (hash min=0/max=3/avg=0.0)
2026/02/02 21:46:58 [4479/1] STATS | Name constraints count: 98 (hash min=0/max=2/avg=0.0)
2026/02/02 21:46:58 [4479/1] STATS | Stage | Wait | Curr | Done | Total | ms/op |
2026/02/02 21:46:58 [4479/1] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 |
2026/02/02 21:46:58 [4479/1] STATS | 1: GET_INFO_DB | 57 | 0 | 41 | 38819 | 0.35 |
2026/02/02 21:46:58 [4479/1] STATS | 2: GET_INFO_FS | 0 | 0 | 0 | 26390 | 10.08 |
2026/02/02 21:46:58 [4479/1] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 37960 | 0.00 |
2026/02/02 21:46:58 [4479/1] STATS | 4: DB_APPLY | 1 | 1 | 0 | 37958 | 1.17 | 2.64% batched (avg batch size: 3.8)
2026/02/02 21:46:58 [4479/1] STATS | 5: CHGLOG_CLR | 0 | 0 | 0 | 38823 | 0.01 |
2026/02/02 21:46:58 [4479/1] STATS | 6: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 |
This is quite single-threaded ;-(
Right now, the file system is not in production and quite idle, so Robinhood is working to close a gap of ~100M changelog entries.
But I am afraid that with this configuration, once in production, the changelogs will run away and fill up the MDS disk.
Regards,
Thomas
--
--------------------------------------------------------------------
Thomas Roth
Department: IT
Location: SB3 2.291
Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Prof. Dr. Thomas Nilsson, Dr. Katharina Stummeyer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzende des GSI-Aufsichtsrats:
State Secretary / Ministerialrätin Dr. Andrea Fischer
|
|
From: Angel de V. <ang...@ia...> - 2026-01-29 11:04:06
|
Hello, "Yoa...@ce..." <Yoa...@ce...> writes: > Your issue seems to be at the system-level indeed, files with a > modification time in 2186 is strange. OK, thanks for this. Indeed, looking carefully at the details of the files with access times in the future, they were only a very very small percentage of all the files, and the access times were already broken in the original filesystem where they were copied from, so we can ignore them. As per files with very old access times, indeed I think that these were backup copies of files that have been going around our filesystems for ages, but that apparently are seldom read. So, all good with the above, and I'm starting now to think about. I want a simple cleanup policy but with a twist. I'm looking at https://github.com/cea-hpc/robinhood/wiki/v3_tuto_cleanup and https://github.com/cea-hpc/robinhood/wiki/robinhood_v3_admin_doc#user-content-Policies but so far I'm not sure if Robinhood will support what I have in mind. To make it simple, imagine that I want to consider two file categories: a) those that were not accessed in at least a year b) those accessed within the last year. Now, when I reach a 90% global usage target I would like to remove as many files as needed to go down to 85% usage, starting first with those files in category a) and only if needed then continue with those in category b). But within each category I don't want to delete based on the access time, but rather on the global usage of users. Thus I will always start removing files that were not accessed in at least a year, but the first files to be removed will be those of the user taking more space in the filesystem, then those of the second biggest user, etc. I hope it makes sense. Do you think something like this (or similar) is possible with RobinHood? Many thanks (and huge thanks for writing RobinHood)! -- Ángel de Vicente Research Software Engineer (Supercomputing and BigData) Tel.: +34 922-605-747 --------------------------------------------------------------------------------------------- AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer |
|
From: <Yoa...@ce...> - 2026-01-28 10:31:05
|
Hello,

No, there shouldn't be any issue performing requests while the scan or update is occurring. Your issue seems to be at the system level indeed; files with a modification time in 2186 are strange. Maybe you fiddled with the system's time? Otherwise I'd say it's a Lustre issue, most likely corrupt metadata for times in the future. For times in the past like 2007, maybe you copied the entries from another system which was up at the time, and duplicated the metadata at the same time? If not, then most likely corrupt metadata as well.

Kind regards,
Yoann Valeri.

________________________________
From: Angel de Vicente <ang...@ia...>
Sent: Tuesday, January 27, 2026 21:16:43
To: VALERI Yoann 610657
Cc: rob...@li...
Subject: Re: [robinhood-support] Help understanding running the changelog reader

Hello,

"Yoa...@ce..." <Yoa...@ce...> writes:
> What you're saying is making sense, but the issue comes from the
> changelog reader's usage.

Perfect. Thanks for the detailed explanation. It all makes sense now.

Since my previous mail, I've been experimenting a bit, and changed the server where I install Robinhood, so now the database is stored on an SSD and I use MariaDB (as opposed to HDD + MySQL in the previous attempt). I'm performing the initial scan now and I can see that it is going faster, but I assume the real test will come later, when trying to measure the speed of queries.

In any case, I'm about to start playing with policies. Basically I want to be able to keep our system below 90%, by deleting old files when we reach that threshold, and later on I'll probably have some questions about that, but for now I have some strange results from trying to check the details of some files.

I tried "rbh-report --oldest-files", but to my surprise it reported files accessed in 2007 (we didn't have that system yet). I then tried the "--reverse" option and I got results with access times in 2050! If I check the stats for one of those files, you can see that Robinhood reports access and modification times in 2050, while "stat" reports pretty crazy stuff for the modify time. Since the policy that I want to implement would take into account last access time, this is not ideal. Does this mean that my Lustre file system is corrupt in some way? (I was performing queries while the initial scan was being performed. Is that safe, or did I go into dangerous territory?) Any pointers on how to debug/avoid these wrong data?

Robinhood stats
===============

$ sudo /scratch/angelv-adm/robinhood/robin/sbin/rbh-report -f /etc/robinhood.d/basto.conf -e /storage/WIPEOUT_JAN31/BACKED_UP/martinlc/backup_portatil/home/martinlc/.cache/mesa_shader_cache/70/cd1e3cd724a4085d417bcda03b5be4e8417f41
id              : [0x20000cf4d:0x6a64:0x0]
parent_id       : [0x20000cf33:0x714c:0x0]
name            : cd1e3cd724a4085d417bcda03b5be4e8417f41
path updt       : 2026/01/26 16:24:40
path            : /storage/WIPEOUT_JAN31/BACKED_UP/martinlc/backup_portatil/home/martinlc/.cache/mesa_shader_cache/70/cd1e3cd724a4085d417bcda03b5be4e8417f41
depth           : 9
user            : martinlc
group           : tmgs-old
projid          : ?
size            : 1.17 KB
spc_used        : 4.00 KB
creation        : 2024/11/22 14:17:54
last_access     : 2050/06/12 21:10:19
last_mod        : 2050/06/12 21:10:19
last_mdchange   : 2024/11/22 14:17:54
type            : file
mode            : rw-r-----
nlink           : 1
md updt         : 2026/01/26 16:24:40
invalid         : no
fileclass       : small_files
class updt      : 2026/01/26 16:24:40
stripe_cnt, stripe_size, pool: 1, 1.00 MB,
stripes         : ost#21: 4375333
lhsm.status     : new
lhsm.archive_id : 0
lhsm.no_release : no
lhsm.no_archive : no
lhsm.last_archive: 0
lhsm.last_restore: 0

"Stat" stats
============

$ sudo stat /storage/WIPEOUT_JAN31/BACKED_UP/martinlc/backup_portatil/home/martinlc/.cache/mesa_shader_cache/70/cd1e3cd724a4085d417bcda03b5be4e8417f41
  File: /storage/WIPEOUT_JAN31/BACKED_UP/martinlc/backup_portatil/home/martinlc/.cache/mesa_shader_cache/70/cd1e3cd724a4085d417bcda03b5be4e8417f41
  Size: 1203       Blocks: 8          IO Block: 4194304 regular file
Device: f96638d6h/4184226006d   Inode: 144116078425959012  Links: 1
Access: (0640/-rw-r-----)  Uid: ( 1518/martinlc)   Gid: (   60/tmgs-old)
Access: 2025-12-16 16:06:14.000000000 +0000
Modify: 2186-07-20 03:38:35.000000000 +0100
Change: 2024-11-22 14:17:54.000000000 +0000
 Birth: -

Many thanks,

--
Ángel de Vicente
Research Software Engineer (Supercomputing and BigData)
Tel.: +34 922-605-747
---------------------------------------------------------------------------------------------
AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal
DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
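
A hedged sketch of how such suspect entries could be located and corrected from a client node; the GNU find options and the example path and date below are assumptions, not commands taken from this thread:

    # Files whose mtime or atime lies in the future (the suspected corrupt entries):
    find /storage -xdev \( -newermt now -o -newerat now \) -ls

    # Once a file is confirmed bogus, reset its timestamps to something sane (e.g. its ctime):
    touch -a -m -d "2024-11-22 14:17:54" /storage/path/to/file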
|
From: Angel de V. <ang...@ia...> - 2026-01-27 20:34:19
|
Hello, "Yoa...@ce..." <Yoa...@ce...> writes: > What you're saying is making sense, but the issue comes from the > changelog reader's usage. Perfect. Thanks for the detailed explanation. It all makes sense now. Since my previous mail, I've been experimenting a bit, and changed the server where I install Robinhood, so now the database is stored in a SSD and I use MariaDB (as opposed to HDD + MySQL in the previous attempt). I'm performing now the initial scan and I can see that it is going faster, but I assume the real test will be later, trying to measure the speed of queries. In any case, I'm about to start playing with policies. Basically I want to be able to keep our system below 90%, by deleting old files when we reach that threshold, and later on I'll probably have some questions about that, but now I've some strange result from trying to check some details of some files. I tried the "rbh-report --oldest-files", but to my surprise it reported files accessed in 2007 (we didn't have that system yet). I tried then with the "--reverse" option and I got results which access times in 2050! If I check the stats for one of those files you see that Robinhood reports access and modification times in 2050, while "stat" reports pretty crazy stuff for the modify time. Since the policy that I want to implement would take into account last access time this is not ideal. Does this mean that my Lustre file system is corrupt in some way? (I was performing queries while the initial scan was being performed. Is that safe or did I go into dangerous territory?) Any pointers on how to debug/avoid these wrong data? Robinhood stats =============== $ sudo /scratch/angelv-adm/robinhood/robin/sbin/rbh-report -f /etc/robinhood.d/basto.conf -e /storage/WIPEOUT_JAN31/BACKED_UP/martinlc/backup_portatil/home/martinlc/.cache/mesa_shader_cache/70/cd1e3cd724a4085d417bcda03b5be4e8417f41 id : [0x20000cf4d:0x6a64:0x0] parent_id : [0x20000cf33:0x714c:0x0] name : cd1e3cd724a4085d417bcda03b5be4e8417f41 path updt : 2026/01/26 16:24:40 path : /storage/WIPEOUT_JAN31/BACKED_UP/martinlc/backup_portatil/home/martinlc/.cache/mesa_shader_cache/70/cd1e3cd724a4085d417bcda03b5be4e8417f41 depth : 9 user : martinlc group : tmgs-old projid : ? 
size : 1.17 KB spc_used : 4.00 KB creation : 2024/11/22 14:17:54 last_access : 2050/06/12 21:10:19 last_mod : 2050/06/12 21:10:19 last_mdchange : 2024/11/22 14:17:54 type : file mode : rw-r----- nlink : 1 md updt : 2026/01/26 16:24:40 invalid : no fileclass : small_files class updt : 2026/01/26 16:24:40 stripe_cnt, stripe_size, pool: 1, 1.00 MB, stripes : ost#21: 4375333 lhsm.status : new lhsm.archive_id: 0 lhsm.no_release: no lhsm.no_archive: no lhsm.last_archive: 0 lhsm.last_restore: 0 "Stat" stats ============ $ sudo stat /storage/WIPEOUT_JAN31/BACKED_UP/martinlc/backup_portatil/home/martinlc/.cache/mesa_shader_cache/70/cd1e3cd724a4085d417bcda03b5be4e8417f41 File: /storage/WIPEOUT_JAN31/BACKED_UP/martinlc/backup_portatil/home/martinlc/.cache/mesa_shader_cache/70/cd1e3cd724a4085d417bcda03b5be4e8417f41 Size: 1203 Blocks: 8 IO Block: 4194304 regular file Device: f96638d6h/4184226006d Inode: 144116078425959012 Links: 1 Access: (0640/-rw-r-----) Uid: ( 1518/martinlc) Gid: ( 60/tmgs-old) Access: 2025-12-16 16:06:14.000000000 +0000 Modify: 2186-07-20 03:38:35.000000000 +0100 Change: 2024-11-22 14:17:54.000000000 +0000 Birth: - Many thanks, -- Ángel de Vicente Research Software Engineer (Supercomputing and BigData) Tel.: +34 922-605-747 --------------------------------------------------------------------------------------------- AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer |
|
From: Angel de V. <ang...@ia...> - 2026-01-23 12:25:08
|
Hello,
I'm a first-time Robinhood user and I'm trying to understand what is going
on. As far as I can see, I have installed and configured Robinhood
properly to work with our Lustre test system, which is connected to our
clients via TCP. I have root access to the client, but not to the MDS
servers (at least not directly, though I can ask our sysadmins to
perform checks or changes there).
I performed the first scan on our system, which took 3 days to scan
about 135 million files, and the MySQL database took about 220 GB. I
started the scan on Jan 19th and finished on Jan 22nd, as you can see
below.
Later I tried to run the changelog reader, which I assumed would
take only a few minutes or a few hours, but it has been going on
for 24+ hours now (see below). This is a test system, and though it has
some activity I wasn't expecting so much. Looking at the Robinhood
logs I see something quite strange, since it says that it is checking log
entries created on Jan 17th. Does this make any sense?
,----
| 2026/01/23 11:10:55 [8479/4] STATS | last received: rec_id=56074347, rec_time=2026/01/17 17:01:51.147649, received at 2026/01/23 11:10:55.976729
| 2026/01/23 11:10:55 [8479/4] STATS | receive speed: 641.34 rec/sec, log/real time ratio: 0.11
| 2026/01/23 11:10:55 [8479/4] STATS | last pushed: rec_id=56073014, rec_time=2026/01/17 17:01:50.845489, pushed at 2026/
`----
Any help or pointers appreciated.
Initial Lustre scan
===================
$ sudo /scratch/angelv-adm/robinhood/robin/sbin/robinhood --scan --once -L stderr -f /etc/robinhood.d/basto.conf
2026/01/19 16:00:20 [75687/1] CheckFS | '/storage' matches mount point '/storage', type=lustre, fs=10.1.1.7@tcp:10.1.1.8@tcp:/storage
WARNING: MYSQL_OPT_RECONNECT is deprecated and will be removed in a future version.
2026/01/19 16:00:20 [75687/1] ListMgr | Default value for field 'uid' (0x756E6B6E6F776E) doesn't match expected value unknown
2026/01/19 16:00:20 [75687/1] ListMgr | Changing default value of 'ENTRIES.uid'...
[...]
2026/01/19 16:00:21 [75687/2] FS_Scan | Starting scan of /storage
WARNING: MYSQL_OPT_RECONNECT is deprecated and will be removed in a future version.
[...]
2026/01/22 08:15:57 [75687/3] STATS | ==================== Dumping stats at 2026/01/22 08:15:57 =====================
2026/01/22 08:15:57 [75687/3] STATS | ======== General statistics =========
2026/01/22 08:15:57 [75687/3] STATS | Daemon start time: 2026/01/19 16:00:20
2026/01/22 08:15:57 [75687/3] STATS | Started modules: scan
WARNING: MYSQL_OPT_RECONNECT is deprecated and will be removed in a future version.
2026/01/22 08:15:57 [75687/3] STATS | ======== FS scan statistics =========
2026/01/22 08:15:57 [75687/3] STATS | current scan interval = 18.5d
2026/01/22 08:15:57 [75687/3] STATS | scan is running:
2026/01/22 08:15:57 [75687/3] STATS | started at : 2026/01/19 16:00:21 (2.7d ago)
2026/01/22 08:15:57 [75687/3] STATS | last action: 2026/01/22 07:35:22 (40.6min ago)
2026/01/22 08:15:57 [75687/3] STATS | progress : 135161107 entries scanned (0 errors)
2026/01/22 08:15:57 [75687/3] STATS | inst. speed (potential): 374.81 entries/sec (5.34 ms/entry/thread)
2026/01/22 08:15:57 [75687/3] STATS | avg. speed (effective): 584.26 entries/sec (3.37 ms/entry/thread)
WARNING: MYSQL_OPT_RECONNECT is deprecated and will be removed in a future version.
2026/01/22 08:15:57 [75687/3] STATS | ==== EntryProcessor Pipeline Stats ===
2026/01/22 08:15:57 [75687/3] STATS | Idle threads: 9
2026/01/22 08:15:57 [75687/3] STATS | Id constraints count: 0 (hash min=0/max=0/avg=0.0)
2026/01/22 08:15:57 [75687/3] STATS | Name constraints count: 0 (hash min=0/max=0/avg=0.0)
2026/01/22 08:15:57 [75687/3] STATS | Stage | Wait | Curr | Done | Total | ms/op |
2026/01/22 08:15:57 [75687/3] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:15:57 [75687/3] STATS | 1: GET_INFO_DB | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:15:57 [75687/3] STATS | 2: GET_INFO_FS | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:15:57 [75687/3] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:15:57 [75687/3] STATS | 4: DB_APPLY | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:15:57 [75687/3] STATS | 5: CHGLOG_CLR | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:15:57 [75687/3] STATS | 6: RM_OLD_ENTRIES | 0 | 1 | 0 | 0 | 0.00 |
2026/01/22 08:15:57 [75687/3] STATS | DB ops: get=231699/ins=134929132/upd=231975/rm=0
2026/01/22 08:15:57 [75687/3] STATS | --- Pipeline stage details ---
2026/01/22 08:15:57 [75687/3] STATS | RM_OLD_ENTRIES: (1 op): special op, status=processing
2026/01/22 08:21:07 [75687/4] FS_Scan | File list of /storage has been updated
2026/01/22 08:21:07 [75687/1] Main | FS Scan finished
2026/01/22 08:21:07 [75687/1] EntryProc | Pipeline successfully flushed
2026/01/22 08:21:07 [75687/1] STATS | ==== EntryProcessor Pipeline Stats ===
2026/01/22 08:21:07 [75687/1] STATS | Idle threads: 0
2026/01/22 08:21:07 [75687/1] STATS | Id constraints count: 0 (hash min=0/max=0/avg=0.0)
2026/01/22 08:21:07 [75687/1] STATS | Name constraints count: 0 (hash min=0/max=0/avg=0.0)
2026/01/22 08:21:07 [75687/1] STATS | Stage | Wait | Curr | Done | Total | ms/op |
2026/01/22 08:21:07 [75687/1] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:21:07 [75687/1] STATS | 1: GET_INFO_DB | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:21:07 [75687/1] STATS | 2: GET_INFO_FS | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:21:07 [75687/1] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:21:07 [75687/1] STATS | 4: DB_APPLY | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:21:07 [75687/1] STATS | 5: CHGLOG_CLR | 0 | 0 | 0 | 0 | 0.00 |
2026/01/22 08:21:07 [75687/1] STATS | 6: RM_OLD_ENTRIES | 0 | 0 | 0 | 1 | 2744730.21 |
2026/01/22 08:21:07 [75687/1] STATS | DB ops: get=231699/ins=134929132/upd=231975/rm=0
2026/01/22 08:21:07 [75687/1] Main | All tasks done! Exiting.
Running the changelog reader
============================
sudo /scratch/angelv-adm/robinhood/robin/sbin/robinhood --readlog --once -L stderr -f /etc/robinhood.d/basto.conf
2026/01/22 08:55:49 [8479/1] CheckFS | '/storage' matches mount point '/storage', type=lustre, fs=10.1.1.7@tcp:10.1.1.8@tcp:/storage
WARNING: MYSQL_OPT_RECONNECT is deprecated and will be removed in a future version.
2026/01/22 08:55:49 [8479/1] ListMgr | Default value for field 'uid' (0x756E6B6E6F776E) doesn't match expected value unknown
2026/01/22 08:55:49 [8479/1] ListMgr | Changing default value of 'ENTRIES.uid'...
[...]
2026/01/22 08:55:50 [8479/1] llapi | warning: llapi_changelog_start() called without CHANGELOG_FLAG_EXTRA_FLAGS
2026/01/22 08:55:50 [8479/2] ChangeLog | LU-1331 is fixed in this version of Lustre.
2026/01/22 08:55:50 [8479/3] EntryProc | CREATE record on already existing entry [0x20000ddea:0x3:0x0]. This is normal if you scanned it previously.
[...]
2026/01/23 11:10:55 [8479/4] STATS | ==================== Dumping stats at 2026/01/23 11:10:55 =====================
2026/01/23 11:10:55 [8479/4] STATS | ======== General statistics =========
2026/01/23 11:10:55 [8479/4] STATS | Daemon start time: 2026/01/22 08:55:49
2026/01/23 11:10:55 [8479/4] STATS | Started modules: log_reader
2026/01/23 11:10:55 [8479/4] STATS | ChangeLog reader #0:
2026/01/23 11:10:55 [8479/4] STATS | fs_name = storage
2026/01/23 11:10:55 [8479/4] STATS | mdt_name = MDT0000
2026/01/23 11:10:55 [8479/4] STATS | reader_id = cl4
2026/01/23 11:10:55 [8479/4] STATS | records read = 56074347
2026/01/23 11:10:55 [8479/4] STATS | interesting records = 42314658
2026/01/23 11:10:55 [8479/4] STATS | suppressed records = 13759686
2026/01/23 11:10:55 [8479/4] STATS | records pending = 1000
2026/01/23 11:10:55 [8479/4] STATS | status = busy
2026/01/23 11:10:55 [8479/4] STATS | last received: rec_id=56074347, rec_time=2026/01/17 17:01:51.147649, received at 2026/01/23 11:10:55.976729
2026/01/23 11:10:55 [8479/4] STATS | receive speed: 641.34 rec/sec, log/real time ratio: 0.11
2026/01/23 11:10:55 [8479/4] STATS | last pushed: rec_id=56073014, rec_time=2026/01/17 17:01:50.845489, pushed at 2026/01/23 11:10:55.976728
2026/01/23 11:10:55 [8479/4] STATS | push speed: 641.34 rec/sec, log/real time ratio: 0.11
2026/01/23 11:10:55 [8479/4] STATS | last committed: rec_id=56072880, rec_time=2026/01/17 17:01:50.774055, committed at 2026/01/23 11:10:55.976671
2026/01/23 11:10:55 [8479/4] STATS | commit speed: 641.34 rec/sec, log/real time ratio: 0.11
2026/01/23 11:10:55 [8479/4] STATS | last cleared: rec_id=56072257, rec_time=2026/01/17 17:01:50.676509, cleared at 2026/01/23 11:10:55.143817
2026/01/23 11:10:55 [8479/4] STATS | clear speed: 640.68 rec/sec, log/real time ratio: 0.11
2026/01/23 11:10:55 [8479/4] STATS | ChangeLog stats:
2026/01/23 11:10:55 [8479/4] STATS | MARK: 0, CREAT: 5, MKDIR: 2, HLINK: 1, SLINK: 0, MKNOD: 0, UNLNK: 5, RMDIR: 1, RENME: 1
2026/01/23 11:10:55 [8479/4] STATS | RNMTO: 0, OPEN: 28037176, CLOSE: 28037147, LYOUT: 1, TRUNC: 3, SATTR: 2, XATTR: 0
2026/01/23 11:10:55 [8479/4] STATS | HSM: 0, MTIME: 0, CTIME: 0, ATIME: 0, MIGRT: 0, FLRW: 0, RESYNC: 0, GXATR: 3, NOPEN: 0
2026/01/23 11:10:56 [8479/4] STATS | ==== EntryProcessor Pipeline Stats ===
2026/01/23 11:10:56 [8479/4] STATS | Idle threads: 9
2026/01/23 11:10:56 [8479/4] STATS | Id constraints count: 100 (hash min=0/max=3/avg=0.0)
2026/01/23 11:10:56 [8479/4] STATS | Name constraints count: 0 (hash min=0/max=0/avg=0.0)
2026/01/23 11:10:56 [8479/4] STATS | Stage | Wait | Curr | Done | Total | ms/op |
2026/01/23 11:10:56 [8479/4] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 |
2026/01/23 11:10:56 [8479/4] STATS | 1: GET_INFO_DB | 64 | 0 | 32 | 432932 | 0.31 |
2026/01/23 11:10:56 [8479/4] STATS | 2: GET_INFO_FS | 0 | 0 | 0 | 432933 | 0.64 |
2026/01/23 11:10:56 [8479/4] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 432933 | 0.00 |
2026/01/23 11:10:56 [8479/4] STATS | 4: DB_APPLY | 1 | 3 | 0 | 432932 | 1.73 | 39.42% batched (avg batch size: 2.3)
2026/01/23 11:10:56 [8479/4] STATS | 5: CHGLOG_CLR | 0 | 0 | 0 | 432932 | 0.02 |
2026/01/23 11:10:56 [8479/4] STATS | 6: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 |
2026/01/23 11:10:56 [8479/4] STATS | DB ops: get=42313607/ins=0/upd=42313571/rm=1
2026/01/23 11:10:56 [8479/4] STATS | --- Pipeline stage details ---
2026/01/23 11:10:56 [8479/4] STATS | GET_INFO_DB : first: changelog record #56072914, fid=[0x20000ca25:0x107b3:0x0], status=waiting
2026/01/23 11:10:56 [8479/4] STATS | GET_INFO_DB : last: changelog record #56073041, fid=[0x20000ca25:0x133aa:0x0], status=waiting
2026/01/23 11:10:56 [8479/4] STATS | DB_APPLY : first: changelog record #56072910, fid=[0x20000ca25:0x107b3:0x0], status=processing
2026/01/23 11:10:56 [8479/4] STATS | DB_APPLY : last: changelog record #56072913, fid=[0x20000ca25:0x131e4:0x0], status=waiting
--
Ángel de Vicente
Research Software Engineer (Supercomputing and BigData)
Tel.: +34 922-605-747
---------------------------------------------------------------------------------------------
AVISO LEGAL: Este mensaje puede contener información confidencial y/o privilegiada. Si usted no es el destinatario final del mismo o lo ha recibido por error, por favor notifíquelo al remitente inmediatamente. Cualquier uso no autorizadas del contenido de este mensaje está estrictamente prohibida. Más información en: https://www.iac.es/es/responsabilidad-legal
DISCLAIMER: This message may contain confidential and / or privileged information. If you are not the final recipient or have received it in error, please notify the sender immediately. Any unauthorized use of the content of this message is strictly prohibited. More information: https://www.iac.es/en/disclaimer
|
|
From: <Tho...@CE...> - 2024-11-22 15:07:14
|
Dear RobinHood users and contributors,
As the development of RobinHood4 progresses more intensely than ever and approaches readiness for production use under the same conditions as its predecessor, we are thrilled to announce the release of a new version of RobinHood3: RobinHood 3.2.
This new version consolidates numerous contributions submitted over the past few years, including bug fixes, porting to the latest operating systems and Lustre versions, as well as major new features such as support for Project IDs in Lustre, new trigger types and policy parameters.
Main improvements in version 3.2:
- Add features for Lustre's project quota:
- Retrieve project id when scanning and reading changelogs
- Add report: rbh-report --project-info
- New filtering options in rbh-report: --filter-project
- New option --split-user-projects to split user's usage per project
- Display project-info with 'rbh-find --printf %RP'
- Filter project with 'rbh-find -projid num'
- New 'projid' trigger target on command line
- Implement policy sort order by size, e.g. lru_sort_attr = size;
- Implement asc/desc modifiers for sort order, e.g. lru_sort_attr = size(desc);
- Implement policy trigger thresholds as percentage of available inodes:
high/low_threshold_cntpct = xx%;
- Policy optimization: no DB update when pre_sched_match and post_sched_match
are set to "none" or "cache_only".
- Fix errors "Out of range value for size columns" due to DB triggers
- Adaptations for Lustre 2.15
- Adaptations for RHEL9.4 OS family
Note that updating to this version requires running 'robinhood --alter-db' (new DB fields need to be defined for the project ID feature).
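To illustrate some of the new options (a sketch based only on the option names listed above; the policy and filesystem names are invented and the exact syntax should be checked against the 3.2 documentation):

    # Policy parameters using the new sort-order options:
    cleanup_parameters {
        lru_sort_attr = size(desc);          # largest entries first
    }

    # Trigger thresholds expressed as a percentage of available inodes:
    cleanup_trigger {
        trigger_on = global_usage;
        high_threshold_cntpct = 90%;
        low_threshold_cntpct  = 85%;
        check_interval = 1h;
    }

    # Project-quota reporting and filtering:
    rbh-report -f /etc/robinhood.d/myfs.conf --project-info
    rbh-find /myfs -projid 1000 --printf "%RP"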
Robinhood 3.2 tarballs and RPMs can be downloaded from: https://sourceforge.net/projects/robinhood/files/robinhood/3.2.0/
See also https://github.com/cea-hpc/robinhood/releases/tag/3.2.0
We hope you enjoy the enhancements in RobinHood 3.2, and sincerely thank all the contributors who made this release possible.
Best regards,
Thomas
|
|
From: Amy C. <amy...@fu...> - 2024-10-30 16:17:55
|
On SLES15, building 3.1.6 from source: no problem building the source rpm, then rebuilding the srpm. But when I try to build the Lustre rpms, using: rpmbuild --rebuild robinhood-3-1.6.1-src.rpm --with lustre --define "lversion 2.15.0" it fails, complaining about glib-2.0 gmacros.h: missing binary operator, missing ')' after __has_attribute. Has anyone run into this? Thanks. |
|
From: Matt M. <mat...@ab...> - 2024-09-23 23:49:05
|
Has anyone successfully built and installed RH 3 on Ubuntu?
The install docs are centered around rhel and rpms. Seems silly to build
an rpm only to then use alien to install it.
That said, I get as far as "make rpm", which fails for build dependencies
that are actually installed, but with slightly different names:
error: Failed build dependencies:
/usr/include/mysql/mysql.h is needed by robinhood-3.1.7-1.x86_64
glib2-devel >= 2.16 is needed by robinhood-3.1.7-1.x86_64
jemalloc is needed by robinhood-3.1.7-1.x86_64
jemalloc-devel is needed by robinhood-3.1.7-1.x86_64
libattr-devel is needed by robinhood-3.1.7-1.x86_64
glib2-devel != libglib2.0-dev
libattr-devel != libattr1-dev
Any advice on how to get RH on Ubuntu is appreciated.
Thanks in advance.
--
*Matt McLean*
Contractor | Abridge
mat...@ab... <em...@ab...>
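One possible, untested workaround is sketched below: install the Debian counterparts of those build requirements and skip RPM's dependency check. The package names are the usual Debian equivalents and may need adjusting for your Ubuntu release, and the src.rpm file name is only assumed from the version shown in the error messages:

    # Debian/Ubuntu counterparts of the RPM BuildRequires
    sudo apt-get install libglib2.0-dev libattr1-dev libjemalloc-dev \
         libjemalloc2 default-libmysqlclient-dev
    # rebuild without verifying the RPM-named build dependencies,
    # then convert/install the resulting rpm with alien as planned
    rpmbuild --rebuild --nodeps robinhood-3.1.7-1.src.rpm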
|
|
From: Humeston, N. D [ITS] <hum...@ia...> - 2024-06-24 19:16:09
|
I'm attempting to build robinhood on a RHEL 9.2 box. I've been able to sort out other issues but I'm really stuck on this one, any ideas?
/root/robinhood/rpms/BUILD/robinhood-3.1.7/src/common/param_utils.c:464: undefined reference to `sm_attr_get'
collect2: error: ld returned 1 exit status
make[3]: *** [Makefile:647: test_confparam] Error 1
make[3]: Leaving directory '/root/robinhood/rpms/BUILD/robinhood-3.1.7/src/tests'
make[2]: *** [Makefile:395: all-recursive] Error 1
make[2]: Leaving directory '/root/robinhood/rpms/BUILD/robinhood-3.1.7/src'
make[1]: *** [Makefile:474: all-recursive] Error 1
make[1]: Leaving directory '/root/robinhood/rpms/BUILD/robinhood-3.1.7'
error: Bad exit status from /var/tmp/rpm-tmp.C75SlJ (%build)
Thanks! -Nathan |
|
From: Stephane T. <st...@st...> - 2024-06-01 06:41:34
|
Hi John, I know that several sites are maintaining their own fork of Robinhood v3, gathering patches that have been developed on GerritHub but not integrated in an official release at this time. For example, we publish our own production branch at https://github.com/stanford-rc/robinhood/ if you’re interested, with various fixes, added Lustre Project ID support and other small features we needed. We use it in production for several filesystems, including one with 3.1B inodes. Tuning/monitoring of the DB for high performance (and good hardware) is key but then it is stable and scalable. We recently ported our Robinhood branch to EL9.3 for a new system, with additional patches required. I have not pushed those yet as I don’t consider them “clean" enough, but it’s on my todo list! Hope that helps! -- Stéphane Thiell > On May 23, 2024, at 4:57 PM, John White <jw...@lb...> wrote: > > Good Afternoon, > I’ve been planning on getting a robinhood instance up for a couple lustre instances for reporting purposes but the last commit 2 years ago… Has this project been abandoned? If so, do folks have suggestions for similar stacks that maintain a database off to the side of lustre for reporting purposes? > > _______________________________________________ > robinhood-support mailing list > rob...@li... > https://lists.sourceforge.net/lists/listinfo/robinhood-support |
|
From: Stephane T. <st...@st...> - 2024-06-01 02:50:38
|
Hi John, I think this is because changelogs (which are indexed by FID) require the parent FID to be present in the DB. I recommend dropping the DB, scanning once, enabling changelogs, then scanning again. Then you should be good to go without frequent scanning. Best,
--
Stéphane Thiell

> On May 31, 2024, at 4:11 PM, John White <jw...@lb...> wrote:
>
> I have a new 3.1.7 install on rocky8 running ddn 2.12.9 on the client side and 2.12.9_ddn8 on the server side. Changelog readers enabled on our 4 MDTs. I started the daemon to confirm the changelog was functioning before starting the initial scan. rbh-find was full of unresolved fid paths and then the filename:
> 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_2_imag.dat
> 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_3_real.dat
> 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_3_imag.dat
> 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_4_real.dat
> 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_4_imag.dat
> 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_5_real.dat
> 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_5_imag.dat
>
> But once I started the initial scan, fully resolved paths came into being alongside the unresolved paths.
>
> Thoughts?
>
> _______________________________________________
> robinhood-support mailing list
> rob...@li...
> https://lists.sourceforge.net/lists/listinfo/robinhood-support |
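A rough sketch of the drop/scan/enable-changelogs/scan sequence recommended above, assuming a robinhood 3 config named 'scratch', a database named robinhood_scratch and an MDT named lustre-MDT0000 (all placeholder names to adapt to the local setup):

    # 1. drop and recreate the robinhood database (run where the DB lives)
    mysql -e "DROP DATABASE robinhood_scratch; CREATE DATABASE robinhood_scratch;"
    # 2. initial scan so that parent FIDs get into the DB
    robinhood -f scratch --scan --once
    # 3. register a changelog reader on each MDT (run on the MDS, repeat per MDT)
    lctl --device lustre-MDT0000 changelog_register
    # 4. start consuming changelogs, then scan once more to resolve paths
    robinhood -f scratch --readlog &
    robinhood -f scratch --scan --once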
|
From: John W. <jw...@lb...> - 2024-06-01 01:41:34
|
Ah, thanks! Hope all is well, Stephane. On Fri, May 31, 2024 at 6:22 PM Stephane Thiell <st...@st...> wrote: > Hi John, > > I think this is because changelogs (that are by FID) require the parent > FID to be present in the DB. I recommend to drop the DB, scan once, enable > changelogs, then scan again. Then you should be good to go without frequent > scanning. > > Best, > -- > Stéphane Thiell > > > On May 31, 2024, at 4:11 PM, John White <jw...@lb...> wrote: > > > > I have a new 3.1.7 install on rocky8 running ddn 2.12.9 on the client > side and 2.12.9_ddn8 on the server side. Changelog readers enabled on our > 4 MDTs.I started the daemon to confirm the changelog was functioning before > starting the initial scan. rbh-find was full of unresolved fid paths and > then the filename: > > 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_2_imag.dat > > 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_3_real.dat > > 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_3_imag.dat > > 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_4_real.dat > > 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_4_imag.dat > > 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_5_real.dat > > 0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_5_imag.dat > > > > But once I started the initial scan, fully resolved paths came into > being alongside the unresolved paths. > > > > Thoughts? > > > > > > > > _______________________________________________ > > robinhood-support mailing list > > rob...@li... > > https://lists.sourceforge.net/lists/listinfo/robinhood-support > > |
|
From: John W. <jw...@lb...> - 2024-06-01 01:01:37
|
I have a new 3.1.7 install on rocky8 running ddn 2.12.9 on the client side and 2.12.9_ddn8 on the server side. Changelog readers enabled on our 4 MDTs. I started the daemon to confirm the changelog was functioning before starting the initial scan. rbh-find was full of unresolved fid paths and then the filename:
0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_2_imag.dat
0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_3_real.dat
0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_3_imag.dat
0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_4_real.dat
0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_4_imag.dat
0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_5_real.dat
0x5c00292d9:0x12e97:0x0/5_1_1_/rho_5_5_imag.dat
But once I started the initial scan, fully resolved paths came into being alongside the unresolved paths. Thoughts? |
|
From: John W. <jw...@lb...> - 2024-05-24 00:28:42
|
Good Afternoon, I’ve been planning on getting a robinhood instance up for a couple lustre instances for reporting purposes but the last commit 2 years ago… Has this project been abandoned? If so, do folks have suggestions for similar stacks that maintain a database off to the side of lustre for reporting purposes? |
|
From: OGER N. <nie...@me...> - 2024-04-29 10:06:07
|
Hello, thank you again for your answers. We have 2 servers per HPC for RBH, one for the log ingestion and the other in read-only. Both have a fair amount of resources so we should be able to run several rbh-find without trouble. The scripts for parallelization are not ready yet. We will do some small test cases before trying to dump the whole database. Best regards, Niels |
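As a sketch of the kind of parallelization discussed in this thread: one rbh-find per top-level directory, launched from a plain shell loop. The mount point, config name and output format are placeholders; the -printf directives below follow GNU find conventions and should be checked against man rbh-find:

    # hypothetical: dump path/size/owner/times per top-level directory, in parallel
    for d in /scratch/*/ ; do
        rbh-find "$d" -f scratch -type f \
            -printf '%p %s %u %g %A@ %T@\n' \
            > "dump_$(basename "$d").txt" &
    done
    wait   # all parallel dumps finished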
|
From: <Tho...@CE...> - 2024-04-29 09:32:13
|
Dear Niels, If you want to keep the rbh-find query load off the database that manages data ingestion, you can also backup/restore your database to another server that will be dedicated to read-only actions like running rbh-find. Regards, Thomas |
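For what it's worth, a minimal sketch of that backup/restore approach, assuming a MariaDB/MySQL database named robinhood_scratch and a reporting host named rbh-report-host (both placeholders):

    # on the ingestion server: consistent dump of the robinhood database
    mysqldump --single-transaction robinhood_scratch | gzip > robinhood_scratch.sql.gz
    # copy to the read-only reporting server and restore it there
    scp robinhood_scratch.sql.gz rbh-report-host:
    ssh rbh-report-host 'mysql -e "CREATE DATABASE IF NOT EXISTS robinhood_scratch" && \
        zcat robinhood_scratch.sql.gz | mysql robinhood_scratch'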
|
From: <Yoa...@ce...> - 2024-04-26 08:50:26
|
Hello, Sorry for the late answer. Yes, you can run multiple rbh-find in parallel on the database, it scales really well in RBH3. However, that means you need a server that is able to handle such a load, so you may need to move the database to another server. Yoann

From: OGER Niels <nie...@me...>
Sent: Tuesday, April 23, 2024 13:31:36
To: VALERI Yoann 610657
Cc: LEIBOVICI Thomas 601315; rob...@li...
Subject: Re: Extracting atime and mtime metadata from RBH?

Hello Yoann, we are using RBH 3, with MySQL databases instead of MariaDB. We might have an issue with the synchronization between Lustre and the master RBH database, or between the master and mirror databases, but that is not what we are looking into for now. I do not know how the mirror database is synchronized with the master, but the mirror does have data. We have roughly 400 million files on our Lustre and we only managed to get the metadata for 8 million files with the rbh-find command we left running during a whole night. We stopped it in the morning to assess what we got. What we want to do is run several rbh-find in parallel on different directories to be faster. I'm assuming rbh-find is not impacted by the synchronization process. If it is, we might reconsider our strategy of using the mirror. We do not need 100% up-to-date data for our statistics. Maybe we can set up a call, it might be easier to explain what we are trying to do. Best regards, Niels

From: VALERI Yoann <Yoa...@ce...>
To: OGER Niels <nie...@me...>; LEIBOVICI Thomas <Tho...@CE...>
Cc: rob...@li...
Sent: Tuesday, April 23, 2024 09:06:43
Subject: RE: Extracting atime and mtime metadata from RBH?

Hello, To help you better, could you please tell us if you are trying to use Robinhood 3 or Robinhood 4? For Robinhood 3, you must use the command `robinhood` to synchronize data, `rbh-find` to query said data and `rbh-report` to get general information about the mirrored filesystem. RBH 3 relies on a MariaDB mirror to work properly. Robinhood 4 uses `rbh-sync` to synchronize data and `rbh-find` to query the data, but doesn't have an `rbh-report` yet. It relies mainly on MongoDB, which is currently the only backend that can be written to and hold a mirror of the filesystem. That means that for RBH 4, you cannot use `rbh-find` on anything but a Mongo backend. If you are indeed trying to use RBH 4, we have added in the last two weeks a new backend `lustre-mpi` that uses MPIFileUtils to synchronize data from a Lustre filesystem. We are currently working on a similar backend for POSIX, which will also use MPIFileUtils. With this, you will be able to lower the synchronization time, depending on the allowed resources. Also, since you talked about retention, we have a branch for RBH 4 available that adds such a feature, you might want to look into that. Don't hesitate to come back to us, we'll be happy to help. Kind regards, Yoann Valeri.

From: OGER Niels <nie...@me...>
Sent: Monday, April 22, 2024 16:18:53
To: LEIBOVICI Thomas 601315
Cc: rob...@li...
Subject: Re: [robinhood-support] Extracting atime and mtime metadata from RBH?

Hello Thomas, thank you for your quick answer. We managed to run an rbh-find command with all the metadata we need with the posix backend (we do not have mongo on the instance). We are using a slave instance of the RBH database to run the commands, to avoid interfering with the real-time updates. We are looking into how to run several rbh-find in parallel, because with a single command we expected it to run for 25 days and produce a 1.5 TB file. Would you advise running several rbh-find commands in parallel to be quicker, or do you think the database would be the bottleneck and running several commands would only make it worse? Best regards, Niels

From: LEIBOVICI Thomas <Tho...@CE...>
To: OGER Niels <nie...@me...>; rob...@li...
Sent: Thursday, April 18, 2024 11:32:40
Subject: RE: Extracting atime and mtime metadata from RBH?

Dear Niels, Please prioritize using English on this mailing list so that the community of other users can respond to you or benefit from the provided answers. Did you take a look at the "rbh-find -printf" option that potentially allows displaying any attribute present in robinhood's database? For sure it can display all the attributes you mentioned (size, path, user, group ...). See rbh-find -help or man rbh-find for more details. AFAIK, there is no existing GUI as you mention. It's been a long time since this idea was mentioned, but nobody has coded it yet. There is still the Robinhood webUI that enables visualising some useful stats about usage, size, age, users, groups... I hope that helps. Best Regards, Thomas

From: OGER Niels <nie...@me...>
Sent: Thursday, April 18, 2024 09:32
To: rob...@li...
Subject: [robinhood-support] Extracting atime and mtime metadata from RBH?

Hello, we are starting to make use of the RBH instances deployed on our 2 clusters at Météo-France. As a first step, we want to build statistics on last access time as a function of file age, in order to estimate retention periods more objectively. The rbh-report command looked the most promising, but we did not find an option to retrieve atime and mtime (command tested: rbh-report --dump-group xxx -c -f scratch). The other metadata reported by rbh-report also interests us. We could run several rbh-find commands filtering on atime and mtime, but we would then be missing the file size (or we would have to combine rbh-report and rbh-find). Is there a command or options to get the size, path, user/group and atime+mtime of files from RBH? We are considering digging into the rbh-report code to add what we need, or running SQL queries directly against the tables, but that may not be trivial. Another somewhat side question: do you know of a tool that gives an "Ubuntu filesystem usage"-style view (concentric circles sized by directory size) for Lustre (based on RBH or not)? Thanks in advance, Niels
--
----- Météo-France -----
OGER NIELS
DSI/D - Chef de projet Calcul Intensif
nie...@me...
Fixe : +33 561078198 |