Backups and Security
Backups and security are a very important aspect of running a Data Management Solution, since its purpose is to keep data safely. Yet a perfectly safe solution does not exist: what should be pursued is a solution that ensures safety in all but the most extreme cases. And "extreme" needs to be contextualized: if the tool is a temporary sharing platform, losing data is not a security concern, only a nuisance for the users, who need to upload their data again. On the other hand, unauthorized access to the shared data might be a grave security concern.
So it is important to be realistic and to think carefully about what is actually needed and what can realistically be achieved. For each section, we give the importance we estimate in terms of security. This estimate has to be considered critically.
1 Which data
One generally important aspect of Data Management and data usage is to know which data to store and how to store it.
1.1 Personal data
Importance: high. It is a crucial aspect.
In many countries, storage of personal information is strictly regulated. In the EU and many other jurisdictions, personal data handling is governed by stringent legal requirements that impose significant responsibilities on data controllers and processors.
1.1.1 Legal Requirements and User Rights
In the EU under the General Data Protection Regulation (GDPR), personal data can be processed only with explicit consent, must be accessible by the user, can be deleted upon request, and can be stored only for a limited time. Users have the right to:
- Access (GDPR Article 15): Users can request to know what personal data is stored about them and how it is being used
- Rectification (GDPR Article 16): Users can request correction of inaccurate personal data
- Erasure (GDPR Article 17): Users can request deletion of their data (the “right to be forgotten”)
- Data portability (GDPR Article 20): Users can request their data in a portable format
- Objection (GDPR Article 21): Users can object to certain uses of their data
For comprehensive information on data protection in the EU, see the European Commission’s Data Protection page.
1.1.2 Data Minimization Principle
A fundamental principle in personal data protection is data minimization: only collect and store the minimum data necessary for your specific purpose. Store it only for the time needed, then delete it. Excess personal data increases risk and legal liability with no additional benefit.
1.1.3 Pseudonymization and Anonymization
When research data is linked to a person, it is advised to apply data protection techniques:
- Pseudonymization: Replace identifying information (names, IDs) with pseudonyms or anonymized identifiers. This allows research while reducing direct identification. However, pseudonymized data linked to a key remains personal data.
- Anonymization: Irreversibly remove or render impossible the identification of individuals. Truly anonymized data is not subject to personal data regulations, but genuine anonymization is difficult to achieve.
You will find more information about data protection techniques in the Data Protection and Privacy page.
1.1.4 Storage and Encryption
See the What to never store section for details on secure storage practices. Personal data should never be stored in reversible formats for authentication. All keys, tokens, and sensitive identifiers must be encrypted with keys stored separately from the data. Personal data in backups must also respect the same protection requirements.
1.1.5 Cross-Institutional Data Sharing
The exchange of personal data between institutions can only be done with a formal agreement and with the explicit consent of the user. It is strongly recommended to consult a data protection officer or legal counsel before sharing personal data with external parties.
The limitation of data storage for personal information includes backups. Backup systems must be subject to the same access controls, encryption, and retention policies as production systems.
1.2 Confidential data
Importance: high. It is a crucial aspect.
Confidential data is any data with restricted access: data reserved for accredited users that should not be publicly accessible, or data that must not be shared with, or used by, certain countries or entities. It can be data that is not personal but still sensitive, such as research data that is not yet published, or data that is only for internal use. It can also be data linked to people that no longer falls under the legal definition of personal data, such as properly anonymized data (pseudonymized data, in contrast, usually remains personal data in the legal sense; see the Pseudonymization and Anonymization section above).
Limitation of use for public data is generally ensured only by a license, but it is important to make sure that the license is respected and that the data is not used in a way the license forbids. For instance: data under a non-commercial license should not be used for commercial purposes; data under a share-alike license should not be redistributed under terms that prevent others from using it under the same license; data under a no-derivatives license should not be modified; data whose license forbids redistribution should not be redistributed. Enforcement can be difficult and costly, but in some cases it is important to have a clear license and to monitor the use of the data. It is also important to have a clear policy on how to handle license violations, such as sending a cease and desist letter or taking legal action.
1.3 Passwords & keys
See the Passwords and keys section under Security below.
2 Backups
Importance: high. Backup needs to be in place once a platform is in production.
Backups are an inherent - and maybe the most important - part of security, as they protect from mishaps as well as from actual attacks. They are evidently not enough once a software has any kind of shared usage: any shared usage needs some kind of security, not only against public access, since even a good-faith use of the software with the wrong privileges can have bad consequences. Such security evidently starts with the software design and implementation, such as database constraints, which this page will not cover.
Backups are for recovery; public data should also be published in repositories.
Dedicated backup tools provide versioning, monitoring, differential and incremental backups, and automation. Using non-backup tools requires extra parameters and/or scripts to obtain some of these functions. Monitoring that the backup actually works is extremely important: a backup job that stops running unnoticed is a "silent death".
2.1 Common backup strategies
The main backup strategy is to back up the data at regular intervals (typically daily), to keep several versions of the backups, and to keep at least one backup off-site, following the 3-2-1 rule: 3 copies of the data, on 2 different media, with at least 1 copy off-site.
If available, a long-term backup strategy should also be implemented, with a clear policy on how long to keep the backups and how to eventually delete them when they are no longer needed. This is especially important for personal data, which should not be kept longer than necessary. These backups are generally done on tapes that are stored in a secure location, and might be taken less frequently.
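To illustrate what "keeping several versions" can mean in practice, here is a minimal sketch, in Python, of a retention rule that keeps recent daily backups plus one backup per week and per month for older dates. The numbers and the scheme are illustrative, not a recommendation:

```python
from datetime import date, timedelta

def backups_to_keep(dates, today, daily=7, weekly=4, monthly=12):
    """Select which dated backups to keep: every backup from the last `daily`
    days, then one per week for `weekly` weeks, then one per month for
    `monthly` months; everything older is dropped."""
    keep = set()
    for d in sorted(dates, reverse=True):            # newest first
        age = (today - d).days
        if age < daily:                              # recent: keep them all
            keep.add(d)
        elif age < daily + 7 * weekly:               # keep one backup per ISO week
            if not any(k.isocalendar()[:2] == d.isocalendar()[:2] for k in keep):
                keep.add(d)
        elif age < 31 * monthly:                     # keep one backup per month
            if not any((k.year, k.month) == (d.year, d.month) for k in keep):
                keep.add(d)
    return keep

# Example: a year of daily backups boils down to roughly 20 kept versions.
today = date(2025, 11, 23)
kept = backups_to_keep([today - timedelta(days=i) for i in range(365)], today)
print(sorted(kept))
```

Dedicated tools such as BorgBackup or Restic implement similar pruning rules natively; a home-made script should at least log which versions it deletes.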
2.2 Common backup tools
Backup tools range from a simple cronjob calling a database dump, or a script using rsync to copy files, to a more complex tool such as BorgBackup, Restic or Veeam, or a cloud-based backup service. The choice of the backup tool will depend on the needs of the users, the software, the infrastructure, and the budget. It is important to choose a backup tool that is reliable, secure, and easy to use.
If your IT services provide a backup service, it is generally the best option to use it, as it will be maintained and updated by the provider and will be more secure than a self-managed solution. It is still important to check the terms of service and the security measures in place, as well as the possibility to access the data and the API if needed.
Otherwise, the backup tool will depend on the setup and will generally be a combination of tools, such as a database dump for the database, a file copy for the files (rsync or more advanced tools), and a cloud-based backup service for the off-site backup.
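As a minimal sketch of such a combination, assuming a PostgreSQL database and rsync on the host (the database name, paths and off-site destination are hypothetical):

```python
import subprocess
from datetime import date
from pathlib import Path

# Hypothetical names: adapt database, paths and destination to your setup.
BACKUP_DIR = Path("/var/backups/myapp")
DATA_DIR = "/srv/myapp/files"
OFFSITE = "backup@offsite.example.org:backups/myapp/"

def run_backup() -> None:
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    dump_file = BACKUP_DIR / f"db-{date.today().isoformat()}.dump"
    # Dump the database in PostgreSQL's compressed custom format.
    subprocess.run(
        ["pg_dump", "--format=custom", "--file", str(dump_file), "myapp_db"],
        check=True,  # fail loudly: a silently failing backup is a dead backup
    )
    # Mirror the dumps and the data files to the off-site location.
    subprocess.run(
        ["rsync", "-a", str(BACKUP_DIR), DATA_DIR, OFFSITE],
        check=True,
    )

if __name__ == "__main__":
    run_backup()  # typically invoked by a daily cronjob
```

The `check=True` matters: a script that swallows errors is exactly the "silent death" described above, whereas a non-zero exit under cron typically triggers a mail to the administrator.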
2.3 Testing backups
It is always important to test your backups, as a backup that is not working is not a backup. It is also important to test the restore process, as a backup that cannot be restored is also not a backup. Backup testing should be done regularly, and should be part of the backup strategy.
Long-term backups should also be tested, for instance by restoring a random selection of files from a random date.
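A minimal sketch of such a spot check, in Python (paths are hypothetical; it assumes a backup restored to a local directory, and that mismatches are reviewed by a human, since live files may legitimately have changed since the backup was taken):

```python
import hashlib
import random
from pathlib import Path

def spot_check(live_root: Path, restored_root: Path, sample_size: int = 5) -> bool:
    """Compare checksums of a random sample of live files against the same
    files restored from a backup; report any mismatch for human review."""
    files = [p for p in live_root.rglob("*") if p.is_file()]
    ok = True
    for p in random.sample(files, min(sample_size, len(files))):
        restored = restored_root / p.relative_to(live_root)
        if (not restored.exists()
                or hashlib.sha256(p.read_bytes()).hexdigest()
                   != hashlib.sha256(restored.read_bytes()).hexdigest()):
            print(f"MISMATCH: {p}")
            ok = False
    return ok

# e.g. spot_check(Path("/srv/myapp/files"), Path("/mnt/restore-2025-11-01/files"))
```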
2.4 Managing the loss of data
Backups cannot fully prevent the loss of data, as all activity since the last backup will be lost. In RDM, the data submitted for a project is generally still available on the submitter’s computer, so it can be re-submitted. It is important to inform the users quickly and to explain how to check for data loss on their side, and how to re-submit the data if needed.
3 Security
3.1 Defence must always succeed, an attack only once
One very important rule of security is that the attacker needs only one success. It does not matter how many attempts are unsuccessful: they only need to try again. Thus defenders must adapt and monitor, avoiding known vulnerabilities while trying to care for unknown ones.
3.2 Aspects of security
By security in an information system, we mean every aspect allowing the right use of the information system (including, but not only, the protection of data): availability, integrity, confidentiality, trustworthiness, access control. It is important to know which of these aspects are needed in the system to build. The possible threats then come from:
- the system users, not willingly doing any damage,
- malicious persons, who want to access data or programs, damage the infrastructure, cheat, or annoy,
- malicious software, which gives a nuisance to others or abuses the resources (possibly unintentionally), or opens the system to intrusions,
- a disaster: a simple power cut, a bad manipulation, a fire, …
It is very common that regular users are the gates for malicious persons. We speak here of social engineering: the weak point is the human-human relation. The famous Kevin Mitnick mostly used this technique to intrude into systems. Properly informing users is one way to fight this kind of problem.
A very wide overview of possible outages and how to mitigate them is part of Business Continuity Planning.
3.3 Limits of security
Security in computer software is very much like security in real life: if you want your house to be totally secure, you have to put a 3-meter-high concrete wall around it with automatic machine guns. Not fun, not neighbour-friendly, and not inhabitant-friendly either. The same rule applies in software development: you just cannot have a totally secured system, as nothing would be usable then (or only with much difficulty). As in real life, users must show some respect to other users and to the software architecture, and we must protect the system mostly from deviant users. The problem is that, in the computer world, all users are potentially deviant! You have to see the interface of the software as its public area: with many users, you can assume that they will try everything, so you have to ensure that no dangerous parts are left in it. Doing so (part of security by design), you remove the major part of the potential threat from the users.
3.4 General dispositions
It is important to carefully design the foundations of your system in order to minimize security risks. General dispositions are precisely about reducing these oversights:
- Never run anything as root (or administrator). It may feel convenient during development, but it is equivalent to leaving all doors open in the house because you trust yourself not to make mistakes. Software does not forgive mistakes. By assigning minimal privileges to users and services, you ensure that even if something goes wrong, the damage remains contained. This principle - often called least privilege - is one of the simplest and most effective protections you can apply.
- Closely related to this is the management of paths and access rights. Files are not just data; they are entry points. A poorly protected directory, a writable configuration file, or an executable in the wrong place can quickly become a lever for unintended behavior. Permissions should always be as restrictive as possible, and never rely on assumptions such as “this folder is not visible to users.” If the system can reach it, someone eventually will.
- A symbolic link can redirect operations to unexpected locations, sometimes bypassing access controls that seemed correctly configured. If your application follows links without strict checks, you may end up granting access to resources you never intended to expose.
- Databases follow the same logic. Using a single, highly privileged database user for all operations is convenient, but dangerous. Each component should only access what it strictly needs: read-only where possible, limited schemas otherwise. This way, even if one part of the system is compromised, the attacker does not automatically gain full control over your data.
- Configuration is another common source of trouble. Systems like Apache or tools relying on files such as .htaccess offer great flexibility, but also introduce complexity. And complexity is the natural enemy of security. Misconfigurations are often invisible until they are exploited. The more intricate the setup, the harder it becomes to reason about its behavior. Whenever possible, favor simple, explicit configurations over clever but opaque ones.
- Finally, there is error handling. A system should fail fast when it encounters a situation it cannot safely recover from. Trying to continue at all costs often leads to inconsistent states, which are fertile ground for vulnerabilities. However, failing fast does not mean failing blindly. The application must distinguish between recoverable errors and critical failures. In the first case, it can adapt; in the second, it must stop cleanly and predictably.
In the end, these dispositions are not advanced techniques. They are habits. And like most habits, they are easy to neglect - until the day they are the only thing standing between a harmless bug and a serious security issue.
3.5 Surface of attack
In security, we often speak about reducing the surface of attack. The surface of attack is the part of the system that can be accessed by an attacker: the larger it is, the more chances for an attacker to find a vulnerability. Reducing the surface of attack is thus necessary for security. For instance, if you have a web application, you should not have any other service running on the same server that can be accessed from the outside. If you have a database, it should not be accessible from the outside, but only from the application. If you have a backup system, it should not be accessible from the outside, but only from the application and/or from a secure location.
::: {.callout-note}
Reducing the surface of attack is also related to the complexity of the setup. A very complex setup will have a larger surface of attack than a simple one, as it will be harder to understand and to monitor. It is therefore important to keep the setup as simple as possible, while still meeting the needs of the users and the software.
:::
3.6 Firewall
Importance: high. A firewall is needed. In most institutions a firewall will be in place and managed by the local IT.
A firewall is software or hardware that blocks traffic that is not needed.
One role of a firewall is to reduce the surface of attack by blocking all the traffic that is not needed. For instance, if you have a web application, you should only allow traffic on ports 80 (HTTP) and 443 (HTTPS). The database should not be accessible from the outside, but only from the application. The backup system should not be accessible from the outside, but only from the application and/or from a secure location.
But a firewall can also be used to block traffic from known malicious IP addresses or, if the software is not designed to be used by the public, to block all traffic from the outside and only allow traffic from a secure location (such as a VPN), while authorizing some external access by whitelisting IP addresses.
3.7 Protections against attacks
Importance: varies. Some attacks are systematic but mostly handled by modern platforms (like SQL injections); more sophisticated attacks might come if the Data Management Solution is popular. There can also be a wide attack at institution level, which might be more advanced; in that case it is important to monitor the activity for anomalies and to communicate with your local IT. Shutting down a platform in case of doubt is always reasonable.
There are many kinds of attacks: SQL injection, cross-site scripting, cross-site request forgery, brute force, denial of service, and more. Some can be mitigated by the software design and implementation: prepared statements for SQL queries, a web application firewall, a rate limiter, a CAPTCHA, a strong password policy, two-factor authentication, a VPN, a firewall, a backup system, a monitoring system. Some can be mitigated by user behavior: using strong passwords, not sharing passwords, not clicking on suspicious links, not downloading suspicious files, not giving away personal information, not using public Wi-Fi for sensitive operations. Others are more difficult to mitigate: zero-day vulnerabilities, insider threats, social engineering, and, at the extreme, physical attacks (such as theft or sabotage).
Protecting against abuse is also difficult. Such abuses can be:
- SPAM and SCAM attacks on public forms. Automatic tools are very good at finding common fields in a form, such as "email", "first name", …, in which case the form can be filled and submitted automatically. Some good transparent mitigation methods are known, from simpler to more complex (see the sketch after this list):
- hashing the field names, so each common field name becomes a variable hash that the application can map back using a stored salt. The hash is valid only once, so the form cannot be resubmitted as-is, which makes it much more difficult for a bot to fill the form. Such hashing should only be used on forms that a given user does not fill repeatedly, as browsers will also be unable to automatically refill the fields, their names changing every time;
- often combined with a honeypot: hidden form fields with common names, such as "email", that only bots will fill, automatically excluding the submission;
- delaying the submission of the content, which allows excluding human-submitted mass-submissions by comparing the content, which is generally very similar. Unfortunately, with an LLM bot it is now easy to mass-submit varying content;
- matching common scam keywords and blacklisting messages above a threshold. The last three methods should keep a copy of the message, with a way to resubmit it in case of a false positive. In our own experience, a student once sent the same request to all group leaders of our institute and triggered the mass-submission alert; our system sent a report after the last received email (after waiting for some time) and we did not resubmit the emails but contacted the student.
But few web applications support such methods, and the common way is to use a CAPTCHA, generally a challenge checking whether the user is human. Some are rather transparent, simply expecting a human reaction; some are simple puzzles to solve (which might be a problem for some persons with disabilities); and many still use mangled text to recognize, or images where the user needs to find an object. Those using mangled text or images become harder and harder for humans to solve correctly, while bots become better and better at solving them.
- Attack via a vulnerability, such as a zero-day vulnerability: a security hole, possibly due to an unknown bug. Such vulnerabilities can be exploited by an attacker to gain access to the system, steal data, damage the system, or use it for their own purposes. The best way to protect against them is to keep the software up to date. Most of the time, vulnerabilities are discovered and fixed by the software developers, but sometimes they are discovered by attackers and exploited before they are fixed. This is why it is important to have a good monitoring system in place, to detect any suspicious activity and to respond quickly to any incident. If you discover a vulnerability before the developer does, it is important to report it to the developer in a responsible way, so they can fix it before it is exploited by attackers. The mitigation will probably need to be done with the developer and might be a quick patch before a proper update. Without a proper mitigation, it might be necessary to temporarily disable the vulnerable feature, or to block access to it with a firewall and allow access only from a secure location (such as a VPN), while waiting for the patch.
- Poisoning of software packages. A variant of the previous attack is to poison software packages, for instance by uploading a malicious package with the same name as a popular one, or by compromising the repository of a popular package. For containerized software, some repositories offer hardened images that are regularly updated and monitored for vulnerabilities, which can be a good option to reduce the risk of such an attack. Otherwise, the protection strategy is similar to the previous one.
- Denial of Service/Brute Force attack. Sometimes an attacker just wants to disrupt a service, or to break into it in order to obtain something (like logging in or accessing a database). Both involve a massive amount of requests, either to overload the system or to try all possible combinations. The best way to protect against such attacks is a rate limiter, which can block or slow down requests after a certain threshold. It is also important to have a good monitoring system in place, to detect any suspicious activity and to respond quickly to any incident. If the attack comes from a known IP address, it can be blocked with a firewall; if it comes from multiple IP addresses, it can be mitigated with a web application firewall or a content delivery network (CDN) that filters the traffic. In a public institution, there should be a protection provided by the global IT services. Using a CDN might be needed but requires some care: how much traffic is allowed at which price, what the consequences are for the users (for instance, if the CDN blocks some IP addresses, it might block some legitimate users as well), and the availability of the service (if the CDN is down, the service will be down as well).
- Abuse by bots. The internet is also used by many legitimate bots, such as search engine crawlers or AI agents. Crawlers should respect the directives of a robots.txt file, but many bots do not. AI agents can be used for good purposes, such as summarizing content or answering questions, but they can also be used for bad purposes, such as scraping content or generating spam. Many AI agents browse the web at each request that needs to access the content, ignoring the robots.txt directives. This has two consequences:
  - it can overload the system if the content is accessed too many times, which can increase the cost or disrupt the service for other users;
  - if the content needs to be reached from the public data management application (for instance to prove the popularity of the data), an AI agent might deliver a summarized version, which might limit visits to the original content and give a wrong impression of it, or even deliver wrong content if the summarization is not good enough. Some Content Delivery Networks (CDN) use methods to block all access by non-human users, but it is difficult to do so without negative consequences, as some legitimate users might be blocked as well.
- Social engineering attack, which is an attack that relies on human-human interaction, such as phishing emails or pretexting calls. The attacker will try to trick the user into doing something that will give them access to the system, such as clicking on a link, downloading a file, giving away personal information, or even giving away their password. This is one of the most common and effective attack methods, as it exploits the human factor, which is often the weakest link in security. The best way to protect against social engineering attacks is to educate users about the risks and to encourage them to be cautious when interacting with unknown or suspicious content.
- Poisoning data, using a bot through a vulnerability (an undefended form or API, for example), or after a successful breach via social engineering, thus destroying the integrity of the data. Depending on the injected data, the cleaning can be very difficult, and the only solution might be to restore a backup from before the attack, which might mean a severe loss of data.
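As announced above, here is a minimal sketch, in Python, of the two simplest form protections: field names hashed with a one-time nonce, combined with an "email" honeypot. Everything here (the field names, the in-memory nonce store) is illustrative; a real application would keep the nonces in a shared store and integrate this with its web framework:

```python
import hashlib
import hmac
import secrets

SECRET = secrets.token_bytes(32)  # server-side salt, never sent to the browser
issued_nonces = set()             # one-time nonces; use a shared store in production
FIELDS = ["email", "first_name", "message"]

def hashed_names(nonce: str) -> dict:
    """Map each real field name to a per-form hash the server can recompute."""
    return {f: hmac.new(SECRET, f"{nonce}:{f}".encode(), hashlib.sha256).hexdigest()
            for f in FIELDS}

def render_form() -> tuple[str, dict]:
    """Issue a one-time nonce; the page uses the hashed names for its inputs,
    plus a hidden input named "email" that serves as the honeypot."""
    nonce = secrets.token_hex(16)
    issued_nonces.add(nonce)
    return nonce, hashed_names(nonce)

def accept_submission(nonce: str, posted: dict):
    """Return the decoded fields, or None if the submission looks automated."""
    if nonce not in issued_nonces:
        return None                  # unknown or already-used form: reject replays
    issued_nonces.discard(nonce)     # the form cannot be resubmitted as-is
    if posted.get("email"):          # honeypot was filled: almost surely a bot
        return None
    return {f: posted.get(h) for f, h in hashed_names(nonce).items()}

# Example round-trip
nonce, names = render_form()
print(accept_submission(nonce, {names["email"]: "a@b.org", names["message"]: "hi"}))
```

The one-time nonce both prevents replaying a captured form and changes the field names at every rendering, defeating tools that recognize stable field names.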
3.8 Passwords and keys
Importance: high for public solutions. Publicly exposed solutions should have a strong authentication mechanism; using an existing external one is a good option.
3.8.1 For users and APIs
Passwords are still the most common way to protect access to a system, but they are also the most vulnerable. They should be strong and unique, and should not be shared, written down, or stored in plain text. To encourage users to use strong passwords, it is recommended to propose a password manager or a vault, which can generate strong passwords and store them securely. The application should also enforce strong passwords and reject weak ones. Allowing long passwords enables passphrases, which are easier to remember and can be very strong; they can be nonsensical sentences, easy to remember but hard to guess, such as "TheCowBouBouBounceOnTheMoon!". If supported, two-factor authentication should be used, as it adds an extra layer of security.
But one of the strongest ways to provide access is to use keys, which can be generated and stored securely, and easily revoked if needed. For instance, SSH keys are a very common way to access a server, and they are much more secure than passwords. They can be generated with a passphrase, which adds an extra layer of security.
Some authentication systems can also use tokens (a kind of key), which can be generated and stored securely, and easily revoked if needed. For instance, API tokens are a very common way to access an API, and they are much more secure than repeatedly sending a username and password. Tokens are often stored salted and hashed, so a leaked copy cannot be used directly, and they have a limited validity so that a stolen token cannot be reused indefinitely.
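As a minimal sketch of the limited-validity idea, the following Python snippet issues and verifies a signed token that carries its own expiry. It uses only the standard library; a real deployment would rather rely on an established token library and on server-side revocation:

```python
import base64
import hashlib
import hmac
import secrets
import time

SERVER_KEY = secrets.token_bytes(32)   # in practice, loaded from a secret manager

def issue_token(user_id: str, ttl_seconds: int = 3600) -> str:
    """Issue a signed token that carries its own expiry time."""
    payload = f"{user_id}:{int(time.time()) + ttl_seconds}".encode()
    sig = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())

def verify_token(token: str):
    """Return the user id if the token is authentic and not expired, else None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        return None
    expected = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):   # constant-time comparison
        return None
    user_id, _, expiry = payload.decode().rpartition(":")
    return user_id if time.time() < int(expiry) else None

tok = issue_token("alice", ttl_seconds=60)
assert verify_token(tok) == "alice"
assert verify_token(tok + "x") is None   # tampered token is rejected
```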
Most Data Management platforms also support external authentication, such as LDAP or OAuth. OAuth is a very common way to allow users to authenticate with their existing credentials, such as Google or GitHub accounts. It is also a secure way to authenticate, as it relies on the security of the external provider (assuming the user follows the provider's security practices; if not, the problem is generalized and goes beyond your platform anyway).
3.8.2 Software keys and passwords
Most of the time, the software will need to access some resources (such as a database) with a password or a key. In that case, it is important to store these credentials in a secure way, such as in an environment variable, in a configuration file with restricted access, or in a secret manager. They should never be stored in plain text in the code or in the configuration file.
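A minimal sketch of the environment-variable approach (the variable name is hypothetical; how it gets set depends on your service manager or secret manager):

```python
import os

# DB_PASSWORD is a hypothetical variable name, set by the service manager
# (systemd unit, container runtime, ...) or injected from a secret manager.
db_password = os.environ.get("DB_PASSWORD")
if db_password is None:
    # Fail fast (see General dispositions): starting without credentials
    # would only lead to a less predictable failure later.
    raise RuntimeError("DB_PASSWORD is not set; refusing to start")
```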
3.9 What to never log, log levels
Importance: medium. The risk is present but low. It still matters in the long run, so better to get it right immediately.
Logs are incredibly useful and should be used for debugging purposes and for security (they are often the only way to detect an intrusion or code-injection attempt).
For instance, it can be important to log all suspicious URL paths, as they might be code-injection attempts, i.e. the attacker hoping that a mechanism in the code will execute a payload.
A simple example would be executing raw SQL queries:
- The URL subpath is /users/[name], where [name] is the name you want to look for.
- The code will then simply execute SELECT * FROM users WHERE name="[name]";
- The attacker will use the URL subpath /users/name";DROP%20TABLE%20admin;%20--
- The code will execute SELECT * FROM users WHERE name="name";DROP TABLE admin; --" where the trailing -- comments out the closing quote so it does not raise an error.
- If the access rights of the database are too permissive and an admin table exists, it will be dropped (the example code is for illustration only: it might not work, and in all cases it should not work).
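The standard defence is to never build SQL text from user input, but to pass the value as a bound parameter. A minimal sketch with Python's built-in sqlite3 module (table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user(name: str):
    # The "?" placeholder sends `name` as data, never as SQL text, so the
    # injection payload below is just an unusual (and absent) user name.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user('alice'))                        # [('alice',)]
print(find_user('name"; DROP TABLE admin; --'))  # [] - nothing dropped
```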
This is a really well known and simple example, and no modern web framework or API should allow such attacks nowadays. But the actual issue is with unknown vulnerabilities. Going back to this example, the web application should log something like:
"2025-11-23_2:32:22 - Warning - incorrect characters in path: "/users/name";DROP%20TABLE%20admin;%20 --"
That would allow checking what was attempted.
But logs are also plain text and, if accessed by an attacker, easy to comprehend. As such, there is information that should never be logged, and some that should be logged carefully:
- usernames, passwords, keys, and important DB table names (such as a users table used for login) should ideally never be written to a log file, even at DEBUG level and even on a test system. The reason is that everything that happens on test can happen on production: amid a quick fix during an emergency, a test log entry can be forgotten and go to production. A fix touching sensitive information can use the logs, but should be done in a very controlled way, without hurry, even during an emergency. A good way to proceed is to enable the log entry only when needed, then switch it off (ideally remove it) immediately after;
- sensitive details about the implementation or the setup: they can be logged for debugging, but should not appear in production runs. If the logs are accessed without your knowledge, such details might enable a much wider attack. For instance, simply logging important paths might tell an attacker where to act;
- confidential data. Once again, it can be useful to see some data in the logs when debugging a setup, but it is important to use dummy data for confidential fields and to never leave anything confidential in the logs once in production.
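One way to enforce this is to redact sensitive values before they reach the log file. A minimal sketch, in Python, of a redacting logging filter (the list of sensitive key names is illustrative and has to match your application's vocabulary):

```python
import logging
import re

# Hypothetical key names that must never reach the log file in clear text.
SENSITIVE = re.compile(r"(password|token|secret|api_key)=\S+", re.IGNORECASE)

class RedactingFilter(logging.Filter):
    """Mask sensitive key=value pairs before a record is emitted."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = SENSITIVE.sub(r"\1=***", str(record.msg))
        return True   # keep the (now redacted) record

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("app")
log.addFilter(RedactingFilter())
log.info("login attempt user=bob password=hunter2")
# -> INFO:app:login attempt user=bob password=***
```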
3.10 What to never store (or never in “reversible” format)
Importance: high. Leaks of such information can be difficult to identify and will let an attacker impersonate a regular user, thus allowing malicious operations potentially without anybody noticing.
Passwords of any kind should never be stored in a reversible format. If the password is needed for the software to work, it should be stored in an encrypted format, with a key that is not stored on the same system. If the password is only needed for authentication, it should be hashed with a strong hashing algorithm and a salt.
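A minimal sketch of salted password hashing using Python's standard library scrypt function (a memory-hard algorithm; production systems often prefer dedicated libraries such as argon2 or bcrypt, and the cost parameters below are illustrative):

```python
import hashlib
import hmac
import secrets

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Hash a password with a fresh random salt and the memory-hard scrypt KDF."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest                   # store both; neither reveals the password

def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison

salt, digest = hash_password("TheCowBouBouBounceOnTheMoon!")
assert check_password("TheCowBouBouBounceOnTheMoon!", salt, digest)
assert not check_password("TheCowJumpsOverTheMoon!", salt, digest)
```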
Any key, hash or token that can be used to access the system should be stored in an encrypted format, with a key that is not stored on the same system.
In most countries, the storage of personal data is restricted by law. In the EU, for instance, personal data can only be stored with consent, must be accessible by the user, can be deleted upon request, and can be stored only for a limited time. Thus, if some research data is linked to a person, it is advised to pseudonymize or to anonymize the data (see the Personal data section above). Only the needed data should be stored, and only for the needed time.
The limitation of data storage for personal information includes backups.
The exchange of personal data between institutions can only be done with an agreement and with the consent of the user. It is recommended to consult a data protection officer for that.
3.11 Know your infrastructure
Importance: high.
This overlaps with many topics already covered, but it is primordial to know your infrastructure and to have a clear overview of it. That includes the times when the main persons in charge are not available, so it is important to have clear documentation of the infrastructure, including the software used, the configuration, the access rights, the backup system, the monitoring system, the security measures, and the contact information of the people in charge. It is also important to have a clear policy on how to handle incidents, such as a security breach or a data loss.
A good document to prepare is a troubleshooting guide, that can be used by anyone to understand the infrastructure and to solve common problems. It should include a clear overview of the infrastructure, a list of common problems and their solutions, and a list of contact information for the people in charge.
3.12 Deleting data
3.12.1 Really deleting files, database entries, backups, cloud storage, version control entries
Importance: medium to high. Depending on the data, deletion may matter only for saving space; but if the data must be deleted (like personal data), it is very important.
Deleting data is not as simple as it seems. When you delete a file, it is not actually erased from the disk: only the reference to it is removed, and the content can still be recovered with special tools until it is overwritten by new data. To delete a file securely on a classical hard disk, you need to overwrite it with random data before deleting it, with tools such as shred on Linux or sdelete on Windows. Note that on SSDs and copy-on-write filesystems, overwriting in place gives no guarantee, and full-disk encryption is the more reliable protection.
The same applies to database entries: a deleted row is usually only marked as deleted, and its content remains in the underlying storage until the space is reclaimed and reused. Compaction tools help reclaim it, such as pg_repack (or VACUUM FULL) for PostgreSQL, or OPTIMIZE TABLE for MySQL.
Backups are a special case: data deleted from production typically lives on in older backup versions until these expire. Deletion policies must therefore cover the backups themselves; encrypting the backups and destroying the corresponding key (sometimes called crypto-shredding) is a common way to render expired backups unreadable.
For cloud storage, you cannot overwrite physical blocks yourself: deletion depends on the provider. Check the terms of service, and consider encrypting the data with your own keys before uploading, so that destroying the key effectively deletes the data.
Finally, deleting a file from a version control system does not remove it from the history: a delete commit leaves every previous revision intact. Actually removing content requires rewriting the history, with tools such as git filter-repo (or the older git filter-branch) for Git, or hg convert with a filemap for Mercurial, and then making sure that no clone or mirror still holds the old history.
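As an illustration of the overwrite-then-delete approach for a plain file, here is a minimal Python sketch. It is best-effort only: as noted above, wear levelling on SSDs and snapshots on copy-on-write filesystems may preserve old blocks regardless:

```python
import os
import secrets
from pathlib import Path

def overwrite_and_delete(path: Path, passes: int = 3) -> None:
    """Best-effort secure deletion: overwrite the file's content in place,
    force it to disk, then remove the file."""
    size = path.stat().st_size
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:                 # overwrite in 1 MiB chunks
                chunk = min(remaining, 1 << 20)
                f.write(secrets.token_bytes(chunk))
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())                 # push the overwrite to the disk
    path.unlink()                                # finally drop the directory entry
```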
3.12.2 When you should not delete data
Importance: high. It is generally a condition of a project funding or research output.
Some data should not be deleted, such as data which is part of a funded project and should live for a predetermined time, or data that is part of a publication and should be available for reproducibility. In these cases, it is important to have a clear policy on how to handle the data, such as how to store it securely, how to share it with the right people, how to monitor its usage, and how to eventually delete it when it is no longer needed.
If the life of the data is much longer than the project, the Data Management Solution might become difficult to maintain, due to lack of funding, lack of personnel, lack of knowledge, or because the software is not maintained anymore. In that case, it is important to have a clear plan on how to handle the data, such as how to migrate it to another solution, or how to archive it.