Where Recovery?
Posted: 13 April, 2026
Last Edited: 13 April, 2026
"Cutieguwu," I hear the void cry, "where is kramer's development at? You know, the optical disc recovery utility you've been so keen to work on, and have talked about so much that you're wishing there was a better way to refer to it than the working name kramer or optical disc recovery utility."
Ah... right.
Where the Project Came From
I honestly didn't expect to get as far along with kramer as I did
before I ever formally announced it. After all, I had no experience messing
around in the lower levels of the kernel. All my experience was in the abstract
scripting land of semi-OOP Python.
And let me be the first to admit, my last major Python project was a disaster of a codebase. Hacky disasters everywhere, little forethought, and a lot of nesting.
But, being the over-ambitious fool that I am, and given a bit of thought for cleaning up my code disasters, I started learning Rust. I wanted, and needed, something more performant than Python that could still baby-step my bumbling arse through to low-level interactions and memory handling.
Besides this, I grew more keen to collect and preserve my family's optical media. In my efforts, I came across some discs that, despite my best cleaning efforts and different drives, with and without LibreDrive patches on supported boards, could not be read to a recoverable state.
So, I tried ddrescue, and from there I encountered the issues
outlined in my original post about kramer.
Where the Project Stalled
kramer was originally meant to be a simple data-scraper, and its
codebase very much reflected that, being almost entirely procedural. It also
tried to learn from ddrescue's structure, for better or for worse.
After all, kramer was but a simple data scraper, and I wasn't
trying to reinvent the wheel in a domain of programming I was just starting to
dip my toes into.
However, as my list of goals for kramer expanded, and as I took a
course centered around basic architecture design, it became clear that
kramer needs to be refactored with clear structural design goals in
place.
Also, I never want to write in Java ever again*.
What's the Plan?
kramer's design goals aren't entirely cohesive, and I don't know
many architecture paradigms. The course introduced the Model-View-Controller
paradigm, and... it looks like it may actually work quite well to appropriately
separate a number of the major issues.
The TUI and CLI ultimately want to make this a total disaster. Pushing them into a View role handled by a generic Controller should eliminate issues with unique interfaces. Almost like a whack adapter.
Remember how I said I never want to write Java again? Well... I may end up doing a mock of the overall project structure in Java, despite my discontent with the language. The advantage is that I can forget about the borrow checker and focus on the architecture. Then, I can adapt that to sensible Rust. There are a few reasons why I would choose Java over Python for this:
- Python's OOP isn't nearly as well done as Java's
- Java has proper (i.e. integrated into the language) Interfaces ("Protocols" in Python), which transfer easily to Rust's Traits.
- Java's paradigms, libraries, and core data types are more similar to Rust.
- Static typing. mypy can enforce static typing, and that's how I use Python nowadays, but it can fail to resolve polymorphism and shadows appropriately.
And, as much as Java and Python's enums are wildly limited compared to Rust's pattern matching, Python somehow still does it worse.
Goals
- Recovery Mechanisms
  - Data Scraping
  - File structure validation
    - ISO9660
    - UTF-8
    - MPEG-2 Headers
    - Requires post-processing.
  - Hash-based validation
    - Hash a section, possibly attempt to brute force the hash to repair.
    - Requires pre-processing.
- TUI (akin to ddrescueview)
  - Stats
  - Progress
  - Disc Properties
  - Visual map
- CLI
  - Stats
  - Progress
  - Disc Properties
- i18n
  - English
  - French
Considerations
- The TUI and CLI share a number of common readouts.
- The system needs to function asynchronously
  - Drives have a habit of misbehaving with buggered data streams.
  - TUI and CLI need to be responsive, and not hang from an unresponsive drive.
- i18n should be shared between TUI and CLI.
- I may try to implement support for using LibreDrive.
  - Admittedly, there's a very slim chance of this, but there does appear to at least be some source for libdriveio in the makemkv-oss Linux downloads.
- Maintain support for reading data without DIRECT_IO.
  - Polymorphism for the reader, baby.
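One way the "don't hang on an unresponsive drive" consideration could be met is to give the drive its own thread and have the interface poll a channel with a timeout. This is only a sketch using the standard library; `ReadEvent`, `spawn_reader`, and `poll` are hypothetical names, not anything that exists in kramer today.

```rust
use std::io::Read;
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Messages the reader thread sends back to the interface (hypothetical).
#[derive(Debug, PartialEq)]
pub enum ReadEvent {
    Chunk(Vec<u8>),
    Failed(String),
    Done,
}

/// Spawn a worker that owns the (possibly slow) source and reports over a channel.
/// `chunk_size` should be non-zero, otherwise the first read looks like end-of-data.
pub fn spawn_reader<R: Read + Send + 'static>(mut source: R, chunk_size: usize) -> Receiver<ReadEvent> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut buf = vec![0u8; chunk_size];
        loop {
            let event = match source.read(&mut buf) {
                Ok(0) => ReadEvent::Done,
                Ok(n) => ReadEvent::Chunk(buf[..n].to_vec()),
                Err(e) => ReadEvent::Failed(e.to_string()),
            };
            // Stop after the final event, or if the interface has gone away.
            let finished = !matches!(event, ReadEvent::Chunk(_));
            if tx.send(event).is_err() || finished {
                break;
            }
        }
    });
    rx
}

/// One tick of the interface loop: wait briefly, then hand control back either way.
pub fn poll(rx: &Receiver<ReadEvent>, tick: Duration) -> Option<ReadEvent> {
    match rx.recv_timeout(tick) {
        Ok(event) => Some(event),
        // The drive is still busy; the caller can redraw and try again.
        Err(RecvTimeoutError::Timeout) => None,
        // The worker exited without a final event; treat it as finished.
        Err(RecvTimeoutError::Disconnected) => Some(ReadEvent::Done),
    }
}
```

A TUI or CLI loop would call `poll` once per frame and redraw whether or not data arrived, so a stalled read only stalls the worker thread.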
Libraries
- Clap (duh)
- Ratatui
  - May also be useful for the simple CLI interface, but at the very least, it'll handle the TUI view.
New Architecture
The highest level of abstracted architecture follows MVC.
Views and Controller
Obviously, this is going to be the CLI and TUI run through an interface to the controller. The controller will spawn one view or the other based upon a command-line argument.
The controller should also handle the localization, and pass localized strings to the views as they need them.
The goal of this is clean code, reusability of the controller (which has the most non-generalizable behaviour), and potential support for a GUI later on.
The last thing I want to cement into the Controller is a toggle to run only preprocessors (more on that later).
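A rough sketch of what that separation might look like. Everything named here (`View`, `Controller`, `ViewKind`, `Status`, the tiny locale table) is made up for illustration and is not kramer's actual code; a real CLI would get the view choice from Clap and a real TUI would draw with Ratatui rather than return strings.

```rust
use std::collections::HashMap;

/// Which front end was requested (hypothetical; would come from a command-line argument).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ViewKind {
    Cli,
    Tui,
}

/// Readouts shared by both front ends.
#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    pub percent_done: f32,
    pub bad_sectors: u64,
}

/// A view only lays out already-localized text; it knows nothing about the model.
pub trait View {
    fn render(&mut self, lines: &[String]) -> String;
}

pub struct CliView;
pub struct TuiView;

impl View for CliView {
    fn render(&mut self, lines: &[String]) -> String {
        lines.join(" | ") // single status line
    }
}

impl View for TuiView {
    fn render(&mut self, lines: &[String]) -> String {
        lines.join("\n") // stand-in for a real terminal UI: one readout per row
    }
}

pub struct Controller {
    view: Box<dyn View>,
    strings: HashMap<&'static str, &'static str>,
    /// The "run only preprocessors" toggle.
    pub preprocess_only: bool,
}

impl Controller {
    pub fn new(kind: ViewKind, locale: &str, preprocess_only: bool) -> Self {
        // Spawn one view or the other; nothing else depends on which was chosen.
        let view: Box<dyn View> = match kind {
            ViewKind::Cli => Box::new(CliView),
            ViewKind::Tui => Box::new(TuiView),
        };
        // Localization lives in the controller, so both views share it.
        let strings = match locale {
            "fr" => HashMap::from([("progress", "Progression"), ("bad", "Secteurs illisibles")]),
            _ => HashMap::from([("progress", "Progress"), ("bad", "Bad sectors")]),
        };
        Self { view, strings, preprocess_only }
    }

    /// Localize the shared readouts, then let the active view lay them out.
    pub fn show(&mut self, status: &Status) -> String {
        let lines = vec![
            format!("{}: {:.1}%", self.strings["progress"], status.percent_done),
            format!("{}: {}", self.strings["bad"], status.bad_sectors),
        ];
        self.view.render(&lines)
    }
}
```

Adding a GUI later would, in this shape, mean one more `View` implementation and one more `ViewKind` variant, with the controller otherwise untouched.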
Model
This is where everything explodes and most of the refactoring occurs.
A lot of things are needlessly hard-coded in and inter-connected; the issues with separation of tasks become apparent.
Another major problem is that it's borrowing concepts from
ddrescue's C++ architecture, while crappily patching in bad Rust
conversions, like DirectIOBuffer. Which is to say, it's a mish-mash of
conflicting paradigms and bad, highly ignorant coding practices.
So, what about the refactor? Well, it's looking like it's going to be a number of independent libraries.
Here's the current idea for the architecture:
AlignedBufReader
Effectively, a proper, full-featured replacement of the stupidity that is
DirectIOBuffer, taking the idea of wrapping a generic Read, like BufReader.
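As a minimal sketch of the idea, assuming the only requirement is that the internal buffer starts on an aligned address: the type below wraps any `Read` the way `BufReader` does, but over-allocates its storage and only uses the aligned part. The fields and the `with_capacity` and `buffer_addr` functions are hypothetical. Direct I/O typically also constrains read lengths and file offsets, which this does not attempt to handle.

```rust
use std::io::{self, Read};

/// A reader whose internal buffer starts on an `align`-byte boundary.
pub struct AlignedBufReader<R> {
    inner: R,
    storage: Vec<u8>, // over-allocated by `align` bytes and never resized
    offset: usize,    // index of the first aligned byte inside `storage`
    capacity: usize,  // usable bytes starting at `offset`
    pos: usize,       // next unread byte, relative to `offset`
    filled: usize,    // number of valid bytes, relative to `offset`
}

impl<R: Read> AlignedBufReader<R> {
    pub fn with_capacity(inner: R, capacity: usize, align: usize) -> Self {
        assert!(align.is_power_of_two(), "alignment must be a power of two");
        let storage = vec![0u8; capacity + align];
        // Distance from the allocation's start to the next aligned address.
        let addr = storage.as_ptr() as usize;
        let offset = (align - addr % align) % align;
        Self { inner, storage, offset, capacity, pos: 0, filled: 0 }
    }

    /// Address of the aligned window, exposed so the alignment can be checked.
    pub fn buffer_addr(&self) -> usize {
        self.storage[self.offset..].as_ptr() as usize
    }
}

impl<R: Read> Read for AlignedBufReader<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if self.pos == self.filled {
            // Refill: the inner reader is only ever handed the aligned window.
            let start = self.offset;
            let end = start + self.capacity;
            self.filled = self.inner.read(&mut self.storage[start..end])?;
            self.pos = 0;
        }
        // Copy out as much buffered data as the caller has room for.
        let n = out.len().min(self.filled - self.pos);
        let start = self.offset + self.pos;
        out[..n].copy_from_slice(&self.storage[start..start + n]);
        self.pos += n;
        Ok(n)
    }
}
```

Because it implements `Read` itself, anything written against a generic reader can take either this or a plain `BufReader`, which is where the "polymorphism for the reader" would come from.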
Recovery Pipeline
The point of the recovery pipeline is to introduce flexibility into implementing new recovery mechanisms. This is the new core of the Model.
The pipeline will work through a plugin system, allowing
dynamic runtime discovery and loading of recovery mechanisms. This will
likely leverage Rust's dylibs (a.k.a. Shared Objects, DLLs, or
Dynamic Libraries), and potentially later expand to supporting
cdylib (the OG, C's shared objects) if there's a need.
Somewhere in the setup of the pipeline will have to be a system for resolving incompatible plugins, plugin dependencies, and where in the pipeline a plugin lives.
From a distance, plugin placement is relatively simple:
- Preprocessors
- Data Scraping (Packaged with the Model)
- Postprocessors
Preprocessors
Preprocessors operate by reading the original data, and storing error-correcting data for a postprocessor to leverage.
I might try to enforce a standard for this error correcting data, even if it's as simple as storing error-correction files in a standard compressed archival format that has error-correction mechanisms.
Examples of preprocessors would be hashing mechanisms.
Data Scraping
This should mostly follow the procedures of the current system.
Postprocessors
Postprocessors operate by reading the scraped data, and potentially combining it with cached recovery information from preprocessors, to repair the scraped data.
Examples of postprocessors would be structure validation, and attempting recovery through brute forcing a region to match its cached hash.
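To make the hash-based example concrete, here is a toy version of both halves: a preprocessor that records one hash per section, and a postprocessor that brute-forces a single unreadable byte until the section matches its cached hash. The function names are hypothetical, and FNV-1a is only a stand-in so the sketch needs no dependencies; a real plugin would want a stronger hash and would have to cope with far more than one missing byte.

```rust
/// 64-bit FNV-1a, standing in for whatever hash a real preprocessor would store.
pub fn fnv1a(data: &[u8]) -> u64 {
    data.iter().fold(0xcbf2_9ce4_8422_2325u64, |hash, &byte| {
        (hash ^ byte as u64).wrapping_mul(0x0000_0100_0000_01b3)
    })
}

/// Preprocessor side: record one hash per fixed-size section of the original data.
/// `section_len` must be non-zero.
pub fn hash_sections(data: &[u8], section_len: usize) -> Vec<u64> {
    data.chunks(section_len).map(fnv1a).collect()
}

/// Postprocessor side: try every value for one unreadable byte until the
/// section hashes to `expected`. Leaves the section untouched on failure.
pub fn repair_byte(section: &mut [u8], bad_index: usize, expected: u64) -> bool {
    let original = section[bad_index];
    for candidate in 0..=u8::MAX {
        section[bad_index] = candidate;
        if fnv1a(section) == expected {
            return true;
        }
    }
    section[bad_index] = original;
    false
}
```

The search space grows by a factor of 256 for every additional unknown byte, so the size of the hashed sections and the amount of damage decide whether brute forcing is practical at all.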
Plugins
Plugins need systems to:
- Report incompatible plugins
- Report dependent plugins, and their relative ordering in the pipeline
  - Enforce including a relative position to the Data Scraping plugin? Effectively, "is this a preprocessor, or postprocessor?"
- Read from the drive
- Read and write the scraped data
- Read and write recovery data
- Read and write to the map
They might also want systems to:
- Call upon other plugins as dependencies.
  - They are shared objects, after all.
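The reporting half of that list could be sketched as a trait plus a resolver that checks conflicts and dependencies, then orders plugins around the scraping step. This is an assumption-heavy illustration: `Plugin`, `Stage`, `StaticPlugin`, `PipelineError`, `resolve`, and the reserved name `data-scraping` are all invented here, and the drive, data, and map access from the list is left out entirely. Loading real shared objects is also out of scope; the plugins are ordinary trait objects.

```rust
use std::collections::HashSet;

/// Name reserved for the built-in scraping step (hypothetical).
pub const SCRAPER: &str = "data-scraping";

/// Position relative to the data scraping step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Preprocessor,
    Postprocessor,
}

/// What a plugin reports about itself so the pipeline can place it.
pub trait Plugin {
    fn name(&self) -> &str;
    fn stage(&self) -> Stage;
    /// Plugins that must not be loaded alongside this one.
    fn incompatible_with(&self) -> Vec<&str> { Vec::new() }
    /// Plugins that must run before this one.
    fn depends_on(&self) -> Vec<&str> { Vec::new() }
}

/// A plugin described entirely by static data, handy for experiments.
pub struct StaticPlugin {
    pub name: &'static str,
    pub stage: Stage,
    pub conflicts: Vec<&'static str>,
    pub deps: Vec<&'static str>,
}

impl Plugin for StaticPlugin {
    fn name(&self) -> &str { self.name }
    fn stage(&self) -> Stage { self.stage }
    fn incompatible_with(&self) -> Vec<&str> { self.conflicts.clone() }
    fn depends_on(&self) -> Vec<&str> { self.deps.clone() }
}

#[derive(Debug, PartialEq)]
pub enum PipelineError {
    Conflict(String, String),
    MissingDependency(String, String),
    /// Dependency cycle, or a dependency that only runs in a later stage.
    Unorderable(String),
}

/// Validate the loaded plugins and return their names in run order.
pub fn resolve(plugins: &[Box<dyn Plugin>]) -> Result<Vec<String>, PipelineError> {
    let mut loaded: HashSet<&str> = plugins.iter().map(|p| p.name()).collect();
    loaded.insert(SCRAPER);
    for p in plugins {
        if let Some(other) = p.incompatible_with().into_iter().find(|o| loaded.contains(o)) {
            return Err(PipelineError::Conflict(p.name().into(), other.into()));
        }
        if let Some(dep) = p.depends_on().into_iter().find(|d| !loaded.contains(d)) {
            return Err(PipelineError::MissingDependency(p.name().into(), dep.into()));
        }
    }
    let mut order: Vec<String> = Vec::new();
    for stage in [Stage::Preprocessor, Stage::Postprocessor] {
        if stage == Stage::Postprocessor {
            order.push(SCRAPER.to_string()); // scraping sits between the two stages
        }
        let mut pending: Vec<_> = plugins.iter().filter(|p| p.stage() == stage).collect();
        while !pending.is_empty() {
            // A plugin is ready once all of its dependencies are already scheduled.
            let ready = pending
                .iter()
                .position(|p| p.depends_on().iter().all(|d| order.iter().any(|o| o.as_str() == *d)));
            match ready {
                Some(i) => order.push(pending.remove(i).name().to_string()),
                None => return Err(PipelineError::Unorderable(pending[0].name().to_string())),
            }
        }
    }
    Ok(order)
}
```

Making `stage` a required method is one way to enforce the "is this a preprocessor, or postprocessor?" question from the list: a plugin cannot be loaded without answering it.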
Mapping
With the plugin system, mapping gets more difficult.
Before, mapping was rather simple and could be a small
enum representing the recovery stage. However, the new map will
have to use dynamic tagging of some sort, as various plugins may want to pass
mapping information between each other.
For example, a preprocessor could map out the locations of all files. Then, a postprocessor could attempt to repair headers that have been damaged, and/or the file system tree. Using a preprocessor to map this information is more reliable as there may not be enough of the header left to validate that it is in fact a header, and not just data that's reminiscent of one.
Along with dynamic tagging, there needs to be some kind of standard naming practice in place beforehand. But, there are limits to how well this can be enforced.
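One possible shape for dynamic tagging, sketched under the assumption that tags are plain strings following a `namespace:name` convention (that convention, and the names `TagMap`, `Region`, `tag`, and `with_tag`, are invented for this example). Regions are split at the edges of whatever range gets tagged, so overlapping tags from different plugins can coexist.

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// A run of sectors that all carry the same set of tags.
#[derive(Debug, Clone, PartialEq)]
pub struct Region {
    pub sectors: Range<u64>,
    pub tags: BTreeSet<String>,
}

/// The whole disc as sorted, non-overlapping regions.
#[derive(Debug)]
pub struct TagMap {
    regions: Vec<Region>,
}

impl TagMap {
    pub fn new(total_sectors: u64) -> Self {
        Self { regions: vec![Region { sectors: 0..total_sectors, tags: BTreeSet::new() }] }
    }

    /// The naming rule is about as far as enforcement can go here: `namespace:name`.
    pub fn is_valid_tag(tag: &str) -> bool {
        matches!(tag.split_once(':'), Some((ns, name)) if !ns.is_empty() && !name.is_empty())
    }

    /// Add `tag` to every sector in `range`, splitting regions at the range edges.
    /// Returns false (and changes nothing) for a malformed tag or an empty range.
    pub fn tag(&mut self, range: Range<u64>, tag: &str) -> bool {
        if !Self::is_valid_tag(tag) || range.start >= range.end {
            return false;
        }
        let mut next = Vec::with_capacity(self.regions.len() + 2);
        for r in self.regions.drain(..) {
            // Clip the tagged range to this region.
            let lo = range.start.clamp(r.sectors.start, r.sectors.end);
            let hi = range.end.clamp(r.sectors.start, r.sectors.end);
            // Up to three pieces: before, inside, and after the tagged range.
            for (piece, inside) in [(r.sectors.start..lo, false), (lo..hi, true), (hi..r.sectors.end, false)] {
                if piece.start < piece.end {
                    let mut tags = r.tags.clone();
                    if inside {
                        tags.insert(tag.to_string());
                    }
                    next.push(Region { sectors: piece, tags });
                }
            }
        }
        // Neighbours that end up with identical tags are not merged in this sketch.
        self.regions = next;
        true
    }

    /// All regions carrying `tag`, in disc order.
    pub fn with_tag<'a>(&'a self, tag: &'a str) -> impl Iterator<Item = &'a Region> + 'a {
        self.regions.iter().filter(move |r| r.tags.contains(tag))
    }
}
```

With something like this, a preprocessor could tag file headers and a postprocessor could later ask for every header region that the scraper also marked unreadable, without either plugin knowing about the other's internals.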
Further details of how mapping will work will need to be sorted out. I still need a cleaner way of updating the status of regions than... whatever the heck the current system is. I don't understand how I got it working, but hey. Maybe I can at least reuse the tests now that I actually know all the overlaps?
I don't think the tagging system should handle any form of incompatibility testing. That should be left to the plugins' own systems to handle appropriately.
The Ultimate Challenges
Really, all that stands in my way is the motivation to keep working on this, and in a sensible way. I have to always ensure that I'm not just writing code, but I'm thinking about the larger structure all the while, lest I code myself into a pit once more.
Also, there stand the issues that I have yet to solve with sleepy and/or permission-revoking drives. I don't actually know what's happening. I may have to learn to inspect the SCSI communications to know better what the drive is doing when it hits hard-to-read areas.
It also looks like it could be fun, and useful, to dig into SCSI controls: https://en.wikipedia.org/wiki/Optical_disc_drive#SCSI_configuration
There's also the sdparm Documentation to dig into for general knowledge, and how SCSI actually works.
Why all of this?
Well, after having spent two courses reading requirements sheets and writing code around that, it's become natural to consult a spec.
And again, the biggest failing of all my projects has been a lack of structural forethought. Usually, that's because I haven't the faintest idea where I want to go with it. But in this instance, I know enough about what I want to make possible, and enough sense to not just dive in blindly a third, or perhaps fourth, time.