Originally created by: hiimdoublej-swag
What would you like to be added/modified:
A way to configure mnesia storage location (MNESIA_DATA_DIR).
Why is this needed:
Currently, the MNESIA_DATA_DIR is bound to the node name.
In a Kubernetes StatefulSet scenario with address_type = ip, if all the pods go down together, every pod restarts and reads from an empty directory (data/emqx@<new_pod_ip>), with no other nodes to replicate from. As a result, we lose all Mnesia data even though it was persisted.
One solution I found (as of 4.3.10) is to use address_type = hostname, so that mnesia always reads from a consistent directory (since the $NODE_NAME is consistent). But I think this could easily be improved by allowing a configurable MNESIA_DATA_DIR value in the startup script (maybe just like we do with the other VM args).
If we're able to configure our own storage path for mnesia, then whoever's creating the pods will be responsible for their data persistence and it'll be way more straightforward than it is now. I can have address_type = ip and set mnesia storage to an exact place where I'm going to ensure it's being persisted.
Originally posted by: k32
Hi,
The contents of MNESIA_DATA_DIR are unique for each node in the cluster, and are bound to the node name. Even if you manage to change MNESIA_DATA_DIR, the new nodes will refuse to read the contents of the directory.
Also data recovery/replication mechanisms depend on the nodes having static NODE_NAME.
So the only working solution is to avoid using dynamic IPs in the node name, and use the old approach.
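To illustrate the point, here is a minimal sketch in Python. It is a toy model, not EMQX code: it assumes the data/<node name> directory layout mentioned in the original post, and the function names, pod IPs, and pod hostname are all hypothetical. It only shows why the directory a restarted pod looks for changes under address_type = ip but stays the same under address_type = hostname.

```python
# Toy model only; not EMQX code. All names and values below are hypothetical.

def node_name(address_type: str, pod_ip: str, pod_hostname: str) -> str:
    """Build a node name the way the thread describes: emqx@<ip or hostname>."""
    if address_type == "ip":
        host = pod_ip
    elif address_type == "hostname":
        host = pod_hostname
    else:
        raise ValueError(f"unsupported address_type: {address_type}")
    return f"emqx@{host}"


def mnesia_dir(name: str, base: str = "data") -> str:
    """Assumed layout: the data directory is derived from the node name."""
    return f"{base}/{name}"


# The same StatefulSet pod before and after a restart: the pod IP changes,
# while the pod hostname stays the same.
before = {"pod_ip": "10.0.0.5", "pod_hostname": "emqx-0.emqx-headless"}
after = {"pod_ip": "10.0.0.9", "pod_hostname": "emqx-0.emqx-headless"}

for address_type in ("ip", "hostname"):
    old_dir = mnesia_dir(node_name(address_type, **before))
    new_dir = mnesia_dir(node_name(address_type, **after))
    status = "same directory" if old_dir == new_dir else "different directory"
    print(f"{address_type}: {old_dir} -> {new_dir} ({status})")
```

With address_type = ip, the restarted pod looks in a directory that did not exist before, so any persisted data under the old name is simply not found. Note that this sketch models only the directory lookup; it does not model the node name stored inside the Mnesia data itself.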
Originally posted by: hiimdoublej-swag
Hi @k32, thanks for the reply. However, I have a few follow-up questions.
Are there any references on this part? I'm not quite getting it. If it's just data (like a book), then the reader shouldn't matter? Or is it like an electricity bill, where the data is readable but belongs to a specific owner/reader?
So, based on these two statements, if we were to use StatefulSets in hopes of avoiding mnesia data loss, we can only use address_type = hostname? If this is the case, then perhaps we can have it documented somewhere in the clustering section? Thanks.
Originally posted by: zmstone
Hi.
The database has node name embedded in table schemas.
Ref: https://www.erlang.org/doc/man/mnesia.html
Technically, with some scripting, it's possible to rename a node. This is easy to do for a single node (offline), but complicated when clustered (online), because the schema is replicated.
Thank you for the doc suggestion.
We’ll keep this issue open until the document is added.
Originally posted by: hiimdoublej-swag
Understood, I'll change the title accordingly. Thanks.