Details
Architectural Decision
Resolution: Done
Description
Recoverability is an important quality attribute for the SKA: certain failures should be recoverable quickly and gracefully. To investigate possible approaches, a spike, SP-408, was performed. Its results show some approaches to recoverability and also raise questions. A follow-up feature, SP-942, is planned for further work, but before it can be executed some architectural decisions are required:
- What do we mean by "Recoverability" in the SKA context?
- Should the system attempt to recover to its current state, or to a known, stable, but different state?
- Is this decision dependent on the current system state? And more...
Some use cases were explored as preparation for SP-408: https://docs.google.com/spreadsheets/d/1fFN9Td13AvCUk-mjr6698dekBuy1QOxYBZZoQwFdVRM/edit?pli=1#gid=970923486