Blog Post: Where Recovery?

2026-04-13 16:26:14 -04:00
parent ac1b66ee7b
commit 5274a0695c
3 changed files with 432 additions and 0 deletions
+13
@@ -18,6 +18,19 @@
<category>Life</category>
<category>Mental Health</category>
<category>Health</category>
<item>
<title>Where Recovery?</title>
<pubDate>13 April, 2026</pubDate>
<link>https://www.cutieguwu.ca/blog/posts/7_where_recovery.html</link>
<description>
Going over what's up with kramer, and how it's [hopefully] going to change for the better.
</description>
<category>Coding</category>
<category>Programming</category>
<category>Optical Media</category>
<category>Rust</category>
<category>Data Recovery</category>
</item>
<item>
<title>Hack Racing</title>
<pubDate>28 March, 2026</pubDate>
+7
@@ -1,6 +1,13 @@
<li class="spacer_container blog_recent_posts">
<p class="title">Recent Posts</p>
<ul class="section_list">
<li>
<header>
<p class="name">Where Recovery?</p>
<p class="subtitle">13 April, 2026</p>
<a href="/blog/posts/7_where_recovery.html" class="status">View</a>
</header>
</li>
<li>
<header>
<p class="name">Hack Racing</p>
+412
@@ -0,0 +1,412 @@
<!doctype html>
<html lang="en-ca">
<head>
<title>Where Recovery? | Cutieguwu</title>
<include src="includes/meta.html" />
<link rel="stylesheet" type="text/css" href="/styles/blog_post.css" />
</head>
<body>
<nav class="pane">
<include src="includes/nav_header.html" />
<include src="includes/nav_menu.html" />
<div class="location">
<p class="title">You are here:</p>
<p class="page">Blog - Where Recovery?</p>
</div>
<include src="includes/nav_quick_links.html" />
</nav>
<main class="pane blog">
<div>
<header>
<h1 class="title">Where Recovery?</h1>
<p class="date">Posted: 13 April, 2026</p>
<p class="date">Last Edited: 13 April, 2026</p>
</header>
<!-- Insert Megamind head meme here -->
<p>
&quot;Cutieguwu,&quot; I hear the void cry, &quot;where is kramer's development
at? You know, the optical disc recovery utility you've been so keen to work on,
and have talked about so much that you're wishing there was a better way to
refer to it than the working name <em>kramer</em> or
<em>optical disc recovery utility</em>.&quot;
</p>
<p>Ah... right.</p>
<h2>Where the Project Came From</h2>
<p>
I honestly didn't expect to get as far along with <code>kramer</code> as I did,
before I ever formally announced it. After all, I have no experience messing
around in the lower levels of the kernel. All my experience was in the abstract
scripting land of semi-OOP Python.
</p>
<p>
And let me be the first to admit, my last major Python project was a disaster of
a codebase. Hacky disasters everywhere, little forethought, and
<em>a lot</em> of nesting.
</p>
<p>
But, being the over-ambitious fool that I am, and given a bit of thought for
cleaning up my code disasters, I started learning Rust. I wanted, and needed,
something more performant than Python that could still baby-step my bumbling
arse through to low-level interactions and memory handling.
</p>
<p>
Alongside this, I grew more keen to collect and preserve my family's optical media.
In my efforts, I came across some discs that, despite my best cleaning efforts
and different drives, with and without LibreDrive patches on supported boards,
could not be read to a recoverable state.
</p>
<p>
So, I tried <code>ddrescue</code>, and from there I encountered the issues
outlined in my original post about <code>kramer</code>.
</p>
<h2>Where the Project Stalled</h2>
<p>
<code>kramer</code> was originally meant to be a simple data scraper, and its
codebase very much reflected that, being almost entirely procedural. It also
tried to learn from <code>ddrescue</code>'s structure, for better or for worse.
After all, <code>kramer</code> was but a simple data scraper, and I wasn't
trying to reinvent the wheel in a domain of programming I was just starting to
dip my toes into.
</p>
<p>
However, as my list of goals for <code>kramer</code> expanded, and as I took a
course centered around basic architecture design, it became clear that
<code>kramer</code> needs to be refactored with clear structural design goals in
place.
</p>
<p>Also, I never want to write in Java ever again*.</p>
<h2>What's the Plan?</h2>
<p>
<code>kramer</code>'s design goals aren't entirely cohesive, and I don't know
many architecture paradigms. The course introduced the Model-View-Controller
paradigm, and... it looks like it may actually work quite well to appropriately
separate a number of the major issues.
</p>
<p>
The TUI and CLI ultimately want to make this a total disaster. Pushing them into
a View role handled by a generic Controller should eliminate issues with unique
interfaces. Almost like a whack adapter.
</p>
<p>
Remember how I said I never want to write Java again? Well... I may end up doing
a mock of the overall project structure in Java, despite my discontent with the
language. The advantage being that I can forget about the borrow checker and
focus on the architecture. Then, adapt that to sensible Rust. There are a few
reasons why I would choose Java over Python for this:
</p>
<ul>
<li>Python's OOP isn't nearly as well done as Java's</li>
<li>
Java has proper (i.e. integrated into the language) Interfaces
(&quot;Protocols&quot; in Python), which translate easily to Rust's Traits.
</li>
<li>
Java's paradigms, libraries, and core data types are more similar to Rust's.
</li>
<li>
Static typing. <code>mypy</code> can enforce static typing, and that's how I
use Python nowadays, but it can fail to resolve polymorphism and shadows
appropriately.
</li>
</ul>
<p>
And, as much as Java's and Python's enums are wildly limited compared to Rust's
enums and pattern matching, Python somehow still does it worse.
</p>
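<p>
For anyone who hasn't met Rust's enums, here's a small sketch of what the
comparison is about: each variant can carry its own data, and
<code>match</code> has to cover every case. <code>RegionStatus</code> and
<code>describe</code> are made-up names for illustration, not anything from
<code>kramer</code>'s codebase.
</p>

```rust
/// Hypothetical status of one region of a disc (illustrative names only).
#[derive(Debug, Clone, PartialEq)]
pub enum RegionStatus {
    Untested,
    Recovered { bytes: u64 },
    Damaged { retries: u32 },
}

/// Turn a status into a human-readable line.
/// The compiler rejects this `match` if a variant is left unhandled.
pub fn describe(status: &RegionStatus) -> String {
    match status {
        RegionStatus::Untested => "not yet read".to_string(),
        RegionStatus::Recovered { bytes } => format!("recovered {bytes} bytes"),
        // A guard narrows the pattern without another nested `if`.
        RegionStatus::Damaged { retries } if *retries >= 3 => "giving up".to_string(),
        RegionStatus::Damaged { retries } => format!("damaged, retry {retries} of 3"),
    }
}
```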
<h3>Goals</h3>
<ul>
<li>
Recovery Mechanisms
<ul>
<li>Data Scraping</li>
<li>
File structure validation
<ul>
<li>ISO9660</li>
<li>UTF-8</li>
<li>MPEG-2 Headers</li>
<li>Requires post-processing.</li>
</ul>
</li>
<li>
Hash-based validation
<ul>
<li>
Hash a section, possibly attempt to brute force the hash to
repair.
</li>
<li>Requires pre-processing.</li>
</ul>
</li>
</ul>
</li>
<li>
TUI (akin to <code>ddrescueview</code>)
<ul>
<li>Stats</li>
<li>Progress</li>
<li>Disc Properties</li>
<li>Visual map</li>
</ul>
</li>
<li>
CLI
<ul>
<li>Stats</li>
<li>Progress</li>
<li>Disc Properties</li>
</ul>
</li>
<li>
i18n
<ul>
<li>English</li>
<li>French</li>
</ul>
</li>
</ul>
<h3>Considerations</h3>
<ul>
<li>The TUI and CLI share a number of common readouts.</li>
<li>
The system needs to function asynchronously
<ul>
<li>Drives have a habit of misbehaving with buggered data streams.</li>
<li>
TUI and CLI need to be responsive, and not hang from an unresponsive
drive.
</li>
</ul>
</li>
<li>i18n should be shared between TUI and CLI.</li>
<li>
I may try to implement support for using LibreDrive.
<ul>
<li>
Admittedly, there's a <em>very</em> slim chance of this, but there
does appear to at least be some source for
<code>libdriveio</code> in the <code>makemkv-oss</code> Linux
downloads.
</li>
</ul>
</li>
<li>
Maintain support for reading data without direct I/O (<code>O_DIRECT</code>).
<ul>
<li>Polymorphism for the reader, baby.</li>
</ul>
</li>
</ul>
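<p>
One way the responsiveness and reader-polymorphism points above could fit
together, as a rough sketch only: do the blocking reads on a worker thread,
and let the interface poll a channel with a timeout so a stalled drive never
freezes it. <code>spawn_reader</code>, <code>poll</code>, and
<code>Poll</code> are hypothetical names, and plain threads stand in for
whatever async approach ends up being used.
</p>

```rust
use std::io::{self, Read};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// What the interface learns each time it polls the reader thread.
#[derive(Debug, PartialEq)]
pub enum Poll {
    Chunk(Vec<u8>),
    StillWaiting,
    Finished,
}

/// Read chunks of up to `chunk_size` bytes on a worker thread.
/// `R` can be any reader (direct I/O or not), which is the polymorphism part.
pub fn spawn_reader<R: Read + Send + 'static>(
    mut source: R,
    chunk_size: usize,
) -> Receiver<io::Result<Vec<u8>>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || loop {
        let mut buf = vec![0u8; chunk_size];
        match source.read(&mut buf) {
            Ok(0) => break, // end of input; dropping `tx` closes the channel
            Ok(n) => {
                buf.truncate(n);
                if tx.send(Ok(buf)).is_err() {
                    break; // the receiving side went away
                }
            }
            Err(e) => {
                let _ = tx.send(Err(e));
                break;
            }
        }
    });
    rx
}

/// Wait at most `timeout` for the next chunk, so the caller can redraw.
pub fn poll(rx: &Receiver<io::Result<Vec<u8>>>, timeout: Duration) -> io::Result<Poll> {
    match rx.recv_timeout(timeout) {
        Ok(Ok(chunk)) => Ok(Poll::Chunk(chunk)),
        Ok(Err(e)) => Err(e),
        Err(RecvTimeoutError::Timeout) => Ok(Poll::StillWaiting),
        Err(RecvTimeoutError::Disconnected) => Ok(Poll::Finished),
    }
}
```

<p>
A view's main loop would call <code>poll</code> with a short timeout, redraw
on <code>StillWaiting</code>, and stop on <code>Finished</code>. This does
nothing about a read that never returns inside the kernel; it only keeps the
interface alive while that happens.
</p>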
<h3>Libraries</h3>
<ul>
<li>Clap (duh)</li>
<li>
Ratatui
<ul>
<li>
May also be useful for the simple CLI interface, but at the very
least, it'll handle the TUI view.
</li>
</ul>
</li>
</ul>
<h2>New Architecture</h2>
<p>The highest level of abstracted architecture follows MVC.</p>
<h3>Views and Controller</h3>
<p>
Obviously, this is going to be the CLI and TUI run through an interface to the
controller. The controller will spawn one view or the other based upon a
command-line argument.
</p>
<p>
The controller should also handle the localization, and pass localized strings
to the views as they need them.
</p>
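<p>
As a minimal sketch of what &quot;the controller passes localized strings to
the views&quot; could look like, assuming a hard-coded table rather than any
particular i18n crate. <code>Locale</code>, <code>Message</code>, and
<code>localize</code> are invented for this example.
</p>

```rust
/// Languages listed in the goals; hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Locale {
    En,
    Fr,
}

/// Labels that both the TUI and CLI would need.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Message {
    Progress,
    DiscProperties,
}

/// Look up the string for a message in the requested language.
/// The match is exhaustive, so adding a `Message` without translating it
/// for every `Locale` is a compile error.
pub fn localize(locale: Locale, message: Message) -> &'static str {
    match (locale, message) {
        (Locale::En, Message::Progress) => "Progress",
        (Locale::Fr, Message::Progress) => "Progression",
        (Locale::En, Message::DiscProperties) => "Disc Properties",
        (Locale::Fr, Message::DiscProperties) => "Propriétés du disque",
    }
}
```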
<p>
The goal of this is clean code, reusability of the controller (which has the
most non-generalizable behaviour), and potential support for a GUI later on.
</p>
<p>
The last thing I want to cement into the Controller is a toggle to run only
preprocessors (more on that later).
</p>
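<p>
Roughly, the view/controller split described in this section might look
like the sketch below: both views implement one trait, and the controller
picks one from a flag and carries the preprocess-only toggle. Every name
here (<code>View</code>, <code>Status</code>, <code>Controller</code>, and
so on) is an assumption for illustration, and the rendering is a stand-in
for real Clap and Ratatui code.
</p>

```rust
/// Snapshot the controller hands to whichever view is active.
#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    pub bytes_recovered: u64,
    pub bytes_total: u64,
}

/// Anything that can present a status update.
pub trait View {
    /// Returns the rendered text instead of printing it, to keep this testable.
    fn render(&mut self, status: &Status) -> String;
}

pub struct CliView;

pub struct TuiView {
    pub width: usize,
}

impl View for CliView {
    fn render(&mut self, s: &Status) -> String {
        format!("{}/{} bytes", s.bytes_recovered, s.bytes_total)
    }
}

impl View for TuiView {
    fn render(&mut self, s: &Status) -> String {
        // Scale progress to the bar width; u128 avoids overflow on big discs.
        let filled = if s.bytes_total == 0 {
            0
        } else {
            (s.bytes_recovered as u128 * self.width as u128 / s.bytes_total as u128) as usize
        };
        let filled = filled.min(self.width);
        format!("[{}{}]", "#".repeat(filled), ".".repeat(self.width - filled))
    }
}

pub struct Controller {
    view: Box<dyn View>,
    /// When set, only preprocessors would run.
    pub preprocess_only: bool,
}

impl Controller {
    /// `use_tui` stands in for the command-line argument that selects a view.
    pub fn new(use_tui: bool, preprocess_only: bool) -> Self {
        let view: Box<dyn View> = if use_tui {
            Box::new(TuiView { width: 10 })
        } else {
            Box::new(CliView)
        };
        Self { view, preprocess_only }
    }

    /// Forward a status update to the active view.
    pub fn update(&mut self, status: &Status) -> String {
        self.view.render(status)
    }
}
```

<p>
A GUI later on would then be one more <code>View</code> implementation, with
no changes to the controller beyond choosing it.
</p>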
<h3>Model</h3>
<p>This is where everything explodes and most of the refactoring occurs.</p>
<p>
A lot of things are needlessly hard-coded in and inter-connected; the issues
with separation of tasks become apparent.
</p>
<p>
Another major problem is that it's borrowing concepts from
<code>ddrescue</code>'s C++ architecture, while crappily patching in bad Rust
conversions, like <code>DirectIOBuffer</code>. Which is to say, a mish-mash of
conflicting paradigms and bad, highly ignorant, coding practices.
</p>
<p>
So, what about the refactor? Well, it's looking like it's going to be a number
of independent libraries.
</p>
<p>Here's the current idea for the architecture:</p>
<h4>AlignedBufReader</h4>
<p>
Effectively, a proper, full-featured replacement for the stupidity that is
<code>DirectIOBuffer</code>, taking the idea of wrapping a generic
<a href="https://doc.rust-lang.org/std/io/trait.Read.html"><code>Read</code></a>,
like
<a href="https://doc.rust-lang.org/std/io/struct.BufReader.html"
><code>BufReader</code></a
>
does.
</p>
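<p>
To make the idea concrete, here's a hedged sketch of a reader that wraps any
<code>Read</code> and keeps its internal buffer at an aligned address. The
fields, constructor arguments, and <code>buffer_addr</code> helper are all
assumptions; the real design isn't settled. It over-allocates a
<code>Vec</code> and uses an aligned window inside it, which avoids
<code>unsafe</code> at the cost of a few wasted bytes.
</p>

```rust
use std::io::{self, Read};

/// Hypothetical sketch: like `BufReader`, but the buffer handed to the inner
/// reader starts at an address that is a multiple of `align`.
pub struct AlignedBufReader<R> {
    inner: R,
    storage: Vec<u8>, // over-allocated so an aligned window always fits
    start: usize,     // offset of the aligned window inside `storage`
    capacity: usize,  // usable length of the aligned window
    pos: usize,       // next unread byte, relative to `start`
    filled: usize,    // count of valid bytes, relative to `start`
}

impl<R: Read> AlignedBufReader<R> {
    pub fn new(inner: R, capacity: usize, align: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        assert!(align.is_power_of_two(), "alignment must be a power of two");
        let storage = vec![0u8; capacity + align];
        // Distance from the allocation's start to the next aligned address.
        let addr = storage.as_ptr() as usize;
        let start = (align - addr % align) % align;
        Self { inner, storage, start, capacity, pos: 0, filled: 0 }
    }

    /// Address of the aligned window, exposed so the alignment can be checked.
    pub fn buffer_addr(&self) -> usize {
        self.storage[self.start..].as_ptr() as usize
    }
}

impl<R: Read> Read for AlignedBufReader<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        // Refill only once everything buffered has been handed out.
        if self.pos == self.filled {
            let window = &mut self.storage[self.start..self.start + self.capacity];
            self.filled = self.inner.read(window)?;
            self.pos = 0;
        }
        let available = &self.storage[self.start + self.pos..self.start + self.filled];
        let n = available.len().min(out.len());
        out[..n].copy_from_slice(&available[..n]);
        self.pos += n;
        Ok(n)
    }
}
```

<p>
Direct I/O usually also wants the read length and file offset aligned, which
this sketch doesn't enforce; picking a <code>capacity</code> that is a
multiple of the sector size would be the caller's job here.
</p>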
<h4>Recovery Pipeline</h4>
<p>
The point of the recovery pipeline is to introduce flexibility into implementing
new recovery mechanisms. This is the new core of the Model.
</p>
<p>
The pipeline will work through a plugin system, allowing
<em>dynamic</em> runtime discovery and loading of recovery mechanisms. This will
likely leverage Rust's <code>dylib</code>s (a.k.a. Shared Objects, DLLs, or
Dynamic Libraries), and potentially later expand to supporting
<code>cdylib</code> (dynamic libraries that expose a C ABI, like the OG C
shared objects) if there's a need. One catch: Rust has no stable ABI, so
<code>dylib</code> plugins would have to be built with the same compiler
version as the host.
</p>
<p>
Somewhere in the setup of the pipeline will have to be a system for resolving
incompatible plugins, plugin dependencies, and where in the pipeline a plugin
lives.
</p>
<p>From a distance, plugin placement is relatively simple:</p>
<ol>
<li>Preprocessors</li>
<li>Data Scraping (Packaged with the Model)</li>
<li>Postprocessors</li>
</ol>
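<p>
The three-step ordering above maps neatly onto an enum whose derived
ordering follows declaration order. This is only a sketch of that idea;
<code>Stage</code>, <code>order_pipeline</code>, and the plugin names in the
usage note are hypothetical.
</p>

```rust
/// Pipeline stages, declared in the order they run.
/// Deriving `Ord` makes `Preprocess < Scrape < Postprocess`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Stage {
    Preprocess,
    Scrape,
    Postprocess,
}

/// Sort plugins by stage and return their names in execution order.
/// `sort_by_key` is stable, so plugins within one stage keep their
/// original relative order.
pub fn order_pipeline(mut plugins: Vec<(&'static str, Stage)>) -> Vec<&'static str> {
    plugins.sort_by_key(|&(_, stage)| stage);
    plugins.into_iter().map(|(name, _)| name).collect()
}
```

<p>
Given a hasher (preprocessor), the scraper, and two postprocessors in any
order, <code>order_pipeline</code> puts the hasher first and the scraper
second. Ordering <em>within</em> a stage would still need the dependency
information discussed further down.
</p>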
<h4>Preprocessors</h4>
<p>
Preprocessors operate by reading the original data, and storing error-correcting
data for a postprocessor to leverage.
</p>
<p>
I might try to enforce a standard for this error-correcting data, even if it's
as simple as storing error-correction files in a standard compressed archival
format that has error-correction mechanisms.
</p>
<p>Examples of preprocessors would be hashing mechanisms.</p>
<h4>Data Scraping</h4>
<p>This should mostly follow the procedures of the current system.</p>
<h4>Postprocessors</h4>
<p>
Postprocessors operate by reading the scraped data, and potentially combining it
with cached recovery information from preprocessors, to repair the scraped data.
</p>
<p>
Examples of postprocessors would be structure validation, and attempting
recovery through brute forcing a region to match its cached hash.
</p>
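<p>
A toy version of the brute-force idea, under heavy assumptions: exactly one
byte is bad, its offset is already known from the map, and a preprocessor
cached a hash of the intact region. FNV-1a is used purely because it fits in
a few lines; nothing here says which hash <code>kramer</code> would use, and
<code>repair_byte</code> is a made-up name.
</p>

```rust
/// 64-bit FNV-1a hash, standing in for whatever the preprocessor cached.
pub fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Try every value for the byte at `bad_offset` until the whole region
/// hashes to `expected`. Returns `true` if a match was found; otherwise the
/// original byte is put back and `false` is returned.
pub fn repair_byte(region: &mut [u8], bad_offset: usize, expected: u64) -> bool {
    let original = region[bad_offset];
    for candidate in 0..=u8::MAX {
        region[bad_offset] = candidate;
        if fnv1a(region) == expected {
            return true;
        }
    }
    region[bad_offset] = original;
    false
}
```

<p>
The search space is 256 to the power of the number of unknown bytes, so this
only stays practical for very small damaged spans, and a weak hash risks
&quot;repairing&quot; to the wrong data.
</p>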
<h4>Plugins</h4>
<p>Plugins need systems to:</p>
<ul>
<li>Report incompatible plugins</li>
<li>
Report plugin dependencies, and their relative ordering in the pipeline
<ul>
<li>
Enforce including a relative position to the Data Scraping plugin?
Effectively, &quot;is this a preprocessor, or postprocessor?&quot;
</li>
</ul>
</li>
<li>Read from the drive</li>
<li>Read and write the scraped data</li>
<li>Read and write recovery data</li>
<li>Read and write to the map</li>
</ul>
<p>They might also want systems to:</p>
<ul>
<li>
Call upon other plugins as dependencies.
<ul>
<li>They are shared objects, after all.</li>
</ul>
</li>
</ul>
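<p>
The reporting half of those requirements could be sketched as a trait plus a
check run while the pipeline is being set up. All of this is hypothetical
(<code>Plugin</code>, <code>Placement</code>, <code>StaticPlugin</code>,
<code>validate</code>), and it leaves out the drive, scraped-data, and map
access entirely.
</p>

```rust
use std::collections::HashSet;

/// Where a plugin sits relative to the built-in data scraper.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Placement {
    BeforeScraping, // a preprocessor
    AfterScraping,  // a postprocessor
}

/// What every plugin would have to report about itself.
pub trait Plugin {
    fn name(&self) -> &str;
    fn placement(&self) -> Placement;
    fn dependencies(&self) -> Vec<&str> {
        Vec::new()
    }
    fn incompatible_with(&self) -> Vec<&str> {
        Vec::new()
    }
}

/// Simple data-only plugin description, handy for examples and tests.
pub struct StaticPlugin {
    pub name: &'static str,
    pub placement: Placement,
    pub dependencies: Vec<&'static str>,
    pub incompatible: Vec<&'static str>,
}

impl Plugin for StaticPlugin {
    fn name(&self) -> &str {
        self.name
    }
    fn placement(&self) -> Placement {
        self.placement
    }
    fn dependencies(&self) -> Vec<&str> {
        self.dependencies.clone()
    }
    fn incompatible_with(&self) -> Vec<&str> {
        self.incompatible.clone()
    }
}

/// Reject a plugin set with missing dependencies or declared conflicts.
pub fn validate(plugins: &[Box<dyn Plugin>]) -> Result<(), String> {
    let loaded: HashSet<&str> = plugins.iter().map(|p| p.name()).collect();
    for plugin in plugins {
        for dep in plugin.dependencies() {
            if !loaded.contains(dep) {
                return Err(format!("{} requires missing plugin {}", plugin.name(), dep));
            }
        }
        for other in plugin.incompatible_with() {
            if loaded.contains(other) {
                return Err(format!("{} conflicts with {}", plugin.name(), other));
            }
        }
    }
    Ok(())
}
```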
<h4>Mapping</h4>
<p>With the plugin system, mapping gets more difficult.</p>
<p>
Before, mapping was rather simple and could be a small
<code>enum</code> representing the recovery stage. However, the new map will
have to use dynamic tagging of some sort, as various plugins may want to pass
mapping information between each other.
</p>
<p>
For example, a preprocessor could map out the locations of all files. Then, a
postprocessor could attempt to repair headers that have been damaged, and/or the
file system tree. Using a preprocessor to map this information is more reliable
as there may not be enough of the header left to validate that it is in fact a
header, and not just data that's reminiscent of one.
</p>
<p>
Along with dynamic tagging, there needs to be some kind of standard naming
practice in place beforehand. But, there are limits to how well this can be
enforced.
</p>
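<p>
One possible shape for dynamic tagging, sketched under the assumption that
tags are plain strings following a <code>plugin/tag</code> naming
convention. <code>TagMap</code> and the tag names in the usage note are
invented; overlapping ranges are simply allowed rather than merged.
</p>

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Hypothetical map where any plugin can attach string tags to byte ranges.
#[derive(Debug, Default)]
pub struct TagMap {
    // tag -> byte ranges carrying that tag
    tags: BTreeMap<String, Vec<Range<u64>>>,
}

impl TagMap {
    /// Attach `tag` to `range`. Nothing here enforces the naming convention,
    /// which is the "limits to how well this can be enforced" part.
    pub fn tag(&mut self, range: Range<u64>, tag: &str) {
        self.tags.entry(tag.to_string()).or_default().push(range);
    }

    /// All tags covering `offset`, in alphabetical order
    /// (a `BTreeMap` iterates its keys sorted).
    pub fn tags_at(&self, offset: u64) -> Vec<&str> {
        self.tags
            .iter()
            .filter(|(_, ranges)| ranges.iter().any(|r| r.contains(&offset)))
            .map(|(tag, _)| tag.as_str())
            .collect()
    }
}
```

<p>
For instance, a preprocessor could tag <code>1024..4096</code> as
<code>iso9660/file-header</code> while the scraper tags
<code>0..2048</code> as <code>scraper/recovered</code>; a postprocessor
asking about offset 1500 would then see both.
</p>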
<p>
Further details of how mapping will work will need to be sorted out. I still
need a cleaner way of updating the status of regions than... whatever the heck
the current system is. I don't understand how I got it working, but hey. Maybe I
can at least reuse the tests now that I actually know all the overlaps?
</p>
<p>
I don't think the tagging system should handle any form of incompatibility
testing. That should be left to the plugins' own systems to handle
appropriately.
</p>
<h2>The Ultimate Challenges</h2>
<p>
Really, all that stands in my way is the motivation to keep working on this, and
in a <em>sensible</em> way. I have to always ensure that I'm not just writing
code, but I'm thinking about the larger structure all the while, lest I code
myself into a pit once more.
</p>
<p>
Also, there stand the issues that I have yet to solve with sleepy and/or
permission-revoking drives. I don't actually know what's happening. I may have
to learn to inspect the SCSI communications to know better what the drive is
doing when it hits hard-to-read areas.
</p>
<p>
It also looks like it could be fun, and useful, to dig into SCSI controls:
<a href="https://en.wikipedia.org/wiki/Optical_disc_drive#SCSI_configuration"
>https://en.wikipedia.org/wiki/Optical_disc_drive#SCSI_configuration</a
>
</p>
<p>
There's also the
<a href="https://sg.danny.cz/sg/sdparm.html"
><code>sdparm</code> Documentation</a
>
to dig into for general knowledge, and how SCSI actually works.
</p>
<h2>Why all of this?</h2>
<p>
Well, after having spent two courses reading requirements sheets and writing
code around them, it's become natural to consult a spec.
</p>
<p>
And again, the biggest failing of all my projects has been a lack of structural
forethought. Usually, that's because I haven't the faintest idea where I want to
go with it. But in this instance, I know enough about what I want to make
<em>possible</em>, and have enough sense to not just dive in blindly a third, or
perhaps fourth, time.
</p>
</div>
<include src="includes/tailer.html" />
</main>
<ul class="pane spacer">
<include src="./includes/blog_recent_posts.html" />
<li class="spacer_container">#AD</li>
</ul>
<include src="includes/footer.html" />
<include src="includes/scripts.html" />
</body>
</html>