Blog Post: Where Recovery?

2026-04-13 16:26:14 -04:00
parent ac1b66ee7b
commit 5274a0695c
3 changed files with 432 additions and 0 deletions
@@ -18,6 +18,19 @@
    <category>Life</category>
    <category>Mental Health</category>
    <category>Health</category>
+    <item>
+        <title>Where Recovery?</title>
+        <pubDate>13 April, 2026</pubDate>
+        <link>https://www.cutieguwu.ca/blog/posts/7_where_recovery.html</link>
+        <description>
+            Going over what's up with kramer, and how it's [hopefully] going to change for the better.
+        </description>
+        <category>Coding</category>
+        <category>Programming</category>
+        <category>Optical Media</category>
+        <category>Rust</category>
+        <category>Data Recovery</category>
+    </item>
    <item>
        <title>Hack Racing</title>
        <pubDate>28 March, 2026</pubDate>
@@ -1,6 +1,13 @@
 <li class="spacer_container blog_recent_posts">
    <p class="title">Recent Posts</p>
    <ul class="section_list">
+        <li>
+            <header>
+                <p class="name">Where Recovery?</p>
+                <p class="subtitle">13 April, 2026</p>
+                <a href="/blog/posts/7_where_recovery.html" class="status">View</a>
+            </header>
+        </li>
        <li>
            <header>
                <p class="name">Hack Racing</p>
@@ -0,0 +1,412 @@
+<!doctype html>
+
+<html lang="en-ca">
+    <head>
+        <title>Where Recovery? | Cutieguwu</title>
+        <include src="includes/meta.html" />
+        <link rel="stylesheet" type="text/css" href="/styles/blog_post.css" />
+    </head>
+    <body>
+        <nav class="pane">
+            <include src="includes/nav_header.html" />
+            <include src="includes/nav_menu.html" />
+            <div class="location">
+                <p class="title">You are here:</p>
+                <p class="page">Blog - Where Recovery?</p>
+            </div>
+            <include src="includes/nav_quick_links.html" />
+        </nav>
+        <main class="pane blog">
+            <div>
+                <header>
+                    <h1 class="title">Where Recovery?</h1>
+                    <p class="date">Posted: 13 April, 2026</p>
+                    <p class="date">Last Edited: 13 April, 2026</p>
+                </header>
+                <!-- Insert Megamind head meme here -->
+                <p>
+                    &quot;Cutieguwu,&quot; I hear the void cry, &quot;where is kramer's development
+                    at? You know, the optical disc recovery utility you've been so keen to work on,
+                    and have talked about so much that you're wishing there was a better way to
+                    refer to it than the working name <em>kramer</em> or
+                    <em>optical disc recovery utility</em>.&quot;
+                </p>
+                <p>Ah... right.</p>
+                <h2>Where the Project Came From</h2>
+                <p>
+                    I honestly didn't expect to get as far along with <code>kramer</code> as I did
+                    before I ever formally announced it. After all, I had no experience messing
+                    around in the lower levels of the kernel. All my experience was in the abstract
+                    scripting land of semi-OOP Python.
+                </p>
+                <p>
+                    And let me be the first to admit, my last major Python project was a disaster of
+                    a codebase. Hacks everywhere, little forethought, and
+                    <em>a lot</em> of nesting.
+                </p>
+                <p>
+                    But, being the over-ambitious fool that I am, and having given some thought to
+                    cleaning up my code disasters, I started learning Rust. I wanted, and needed,
+                    something more performant than Python that could still baby-step my bumbling
+                    arse through low-level interactions and memory handling.
+                </p>
+                <p>
+                    Alongside this, I grew keener to collect and preserve my family's optical
+                    media. In my efforts, I came across some discs that, despite my best cleaning
+                    efforts and different drives, with and without LibreDrive patches on supported
+                    boards, could not be read to a recoverable state.
+                </p>
+                <p>
+                    So, I tried <code>ddrescue</code>, and from there I encountered the issues
+                    outlined in my original post about <code>kramer</code>.
+                </p>
+                <h2>Where the Project Stalled</h2>
+                <p>
+                    <code>kramer</code> was originally meant to be a simple data scraper, and its
+                    codebase very much reflected that, being almost entirely procedural. It also
+                    tried to learn from <code>ddrescue</code>'s structure, for better or for worse.
+                    After all, <code>kramer</code> was but a simple data scraper, and I wasn't
+                    trying to reinvent the wheel in a domain of programming I was just starting to
+                    dip my toes into.
+                </p>
+                <p>
+                    However, as my list of goals for <code>kramer</code> expanded, and as I took a
+                    course centered around basic architecture design, it became clear that
+                    <code>kramer</code> needs to be refactored with clear structural design goals in
+                    place.
+                </p>
+                <p>Also, I never want to write in Java ever again*.</p>
+                <h2>What's the Plan?</h2>
+                <p>
+                    <code>kramer</code>'s design goals aren't entirely cohesive, and I don't know
+                    many architecture paradigms. The course introduced the Model-View-Controller
+                    paradigm, and... it looks like it may actually work quite well to appropriately
+                    separate a number of the major issues.
+                </p>
+                <p>
+                    The TUI and CLI ultimately want to make this a total disaster. Pushing them into
+                    a View role handled by a generic Controller should eliminate issues with unique
+                    interfaces. Almost like a whack adapter.
+                </p>
+                <p>
+                    Remember how I said I never want to write Java again? Well... I may end up doing
+                    a mock of the overall project structure in Java, despite my discontent with the
+                    language. The advantage being that I can forget about the borrow checker and
+                    focus on the architecture. Then, adapt that to sensible Rust. There are a few
+                    reasons why I would choose Java over Python for this:
+                </p>
+                <ul>
+                    <li>Python's OOP isn't nearly as well done as Java's.</li>
+                    <li>
+                        Java has proper (i.e. integrated into the language) Interfaces
+                        (&quot;Protocols&quot; in Python), which transfer easily to Rust's Traits.
+                    </li>
+                    <li>
+                        Java's paradigms, libraries, and core data types are more similar to Rust's.
+                    </li>
+                    <li>
+                        Static typing. <code>mypy</code> can enforce static typing, and that's how I
+                        use Python nowadays, but it can fail to resolve polymorphism and shadowing
+                        appropriately.
+                    </li>
+                </ul>
+                <p>
+                    And, as much as Java's and Python's enums are wildly limited compared to Rust's
+                    enums and pattern matching, Python somehow still does it worse.
+                </p>
+                <h3>Goals</h3>
+                <ul>
+                    <li>
+                        Recovery Mechanisms
+                        <ul>
+                            <li>Data Scraping</li>
+                            <li>
+                                File structure validation
+                                <ul>
+                                    <li>ISO9660</li>
+                                    <li>UTF-8</li>
+                                    <li>MPEG-2 Headers</li>
+                                    <li>Requires post-processing.</li>
+                                </ul>
+                            </li>
+                            <li>
+                                Hash-based validation
+                                <ul>
+                                    <li>
+                                        Hash a section, possibly attempt to brute force the hash to
+                                        repair.
+                                    </li>
+                                    <li>Requires pre-processing.</li>
+                                </ul>
+                            </li>
+                        </ul>
+                    </li>
+                    <li>
+                        TUI (akin to <code>ddrescueview</code>)
+                        <ul>
+                            <li>Stats</li>
+                            <li>Progress</li>
+                            <li>Disc Properties</li>
+                            <li>Visual map</li>
+                        </ul>
+                    </li>
+                    <li>
+                        CLI
+                        <ul>
+                            <li>Stats</li>
+                            <li>Progress</li>
+                            <li>Disc Properties</li>
+                        </ul>
+                    </li>
+                    <li>
+                        i18n
+                        <ul>
+                            <li>English</li>
+                            <li>French</li>
+                        </ul>
+                    </li>
+                </ul>
+                <h3>Considerations</h3>
+                <ul>
+                    <li>The TUI and CLI share a number of common readouts.</li>
+                    <li>
+                        The system needs to function asynchronously
+                        <ul>
+                            <li>Drives have a habit of misbehaving with buggered data streams.</li>
+                            <li>
+                                TUI and CLI need to be responsive, and not hang from an unresponsive
+                                drive.
+                            </li>
+                        </ul>
+                    </li>
+                    <li>i18n should be shared between TUI and CLI.</li>
+                    <li>
+                        I may try to implement support for using LibreDrive.
+                        <ul>
+                            <li>
+                                Admittedly, there's a <em>very</em> slim chance of this, but there
+                                does appear to at least be some source for
+                                <code>libdriveio</code> in the <code>makemkv-oss</code> Linux
+                                downloads.
+                            </li>
+                        </ul>
+                    </li>
+                    <li>
+                        Maintain support for reading data without <code>DIRECT_IO</code>.
+                        <ul>
+                            <li>Polymorphism for the reader, baby.</li>
+                        </ul>
+                    </li>
+                </ul>
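+                <p>
+                    To make the responsiveness consideration concrete, here is a minimal sketch
+                    (not kramer's actual code) that pushes a blocking read onto a worker thread
+                    and lets the caller give up waiting after a timeout. The names
+                    <code>read_with_timeout</code>, <code>ReadOutcome</code>, and
+                    <code>StalledDrive</code> are hypothetical, and it uses plain threads and
+                    channels from the standard library rather than an async runtime.
+                </p>

```rust
use std::io::Read;
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical result type: what the interface learns about one read attempt.
#[derive(Debug, PartialEq)]
pub enum ReadOutcome {
    Data(Vec<u8>),
    Failed(String),
    StillWaiting,
}

/// Run one blocking read on a worker thread so the caller never hangs.
/// Generic over `Read`, so a direct-I/O reader and a plain file both fit.
pub fn read_with_timeout<R>(mut source: R, len: usize, timeout: Duration) -> ReadOutcome
where
    R: Read + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut buf = vec![0u8; len];
        let result = source.read(&mut buf).map(|n| {
            buf.truncate(n);
            buf
        });
        // The receiver may already have given up; ignore the send error.
        let _ = tx.send(result);
    });

    match rx.recv_timeout(timeout) {
        Ok(Ok(data)) => ReadOutcome::Data(data),
        Ok(Err(e)) => ReadOutcome::Failed(e.to_string()),
        Err(RecvTimeoutError::Timeout) => ReadOutcome::StillWaiting,
        Err(RecvTimeoutError::Disconnected) => {
            ReadOutcome::Failed("reader thread exited".to_string())
        }
    }
}

/// Stand-in for a misbehaving drive: sleeps, then reports end of data.
pub struct StalledDrive(pub Duration);

impl Read for StalledDrive {
    fn read(&mut self, _buf: &mut [u8]) -> std::io::Result<usize> {
        thread::sleep(self.0);
        Ok(0)
    }
}
```

+                <p>
+                    Note that a timed-out worker keeps running in the background here. A real
+                    implementation would need some way to cancel or reuse it.
+                </p>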
+                <h3>Libraries</h3>
+                <ul>
+                    <li>Clap (duh)</li>
+                    <li>
+                        Ratatui
+                        <ul>
+                            <li>
+                                May also be useful for the simple CLI interface, but at the very
+                                least, it'll handle the TUI view.
+                            </li>
+                        </ul>
+                    </li>
+                </ul>
+                <h2>New Architecture</h2>
+                <p>The highest level of abstracted architecture follows MVC.</p>
+                <h3>Views and Controller</h3>
+                <p>
+                    Obviously, this is going to be the CLI and TUI run through an interface to the
+                    controller. The controller will spawn one view or the other based upon a
+                    command-line argument.
+                </p>
+                <p>
+                    The controller should also handle the localization, and pass localized strings
+                    to the views as they need them.
+                </p>
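+                <p>
+                    As a rough sketch of that split, assuming a single progress readout and a
+                    hard-coded string table: the controller picks a view from a mode flag,
+                    localizes the label, and hands the finished string to the view. Every name
+                    below (<code>View</code>, <code>Controller</code>, <code>Mode</code>,
+                    <code>Lang</code>) is a placeholder, not something from the existing codebase.
+                </p>

```rust
/// Languages from the i18n goals.
#[derive(Clone, Copy)]
pub enum Lang {
    En,
    Fr,
}

/// Which view the command-line argument asked for.
pub enum Mode {
    Cli,
    Tui,
}

/// Everything the controller needs from a view.
pub trait View {
    /// Render a progress readout from an already-localized label.
    fn show_progress(&self, label: &str, percent: u8) -> String;
}

struct CliView;
struct TuiView;

impl View for CliView {
    fn show_progress(&self, label: &str, percent: u8) -> String {
        format!("{}: {}%", label, percent)
    }
}

impl View for TuiView {
    fn show_progress(&self, label: &str, percent: u8) -> String {
        // One '#' per ten percent, padded to a fixed-width bar.
        let bar = "#".repeat(usize::from(percent.min(100) / 10));
        format!("[{:<10}] {} {}%", bar, label, percent)
    }
}

pub struct Controller {
    view: Box<dyn View>,
    lang: Lang,
}

impl Controller {
    /// Spawn one view or the other based on the requested mode.
    pub fn new(mode: Mode, lang: Lang) -> Self {
        let view: Box<dyn View> = match mode {
            Mode::Cli => Box::new(CliView),
            Mode::Tui => Box::new(TuiView),
        };
        Self { view, lang }
    }

    /// Hard-coded string table; a real build would load translations.
    fn localize(&self, key: &str) -> &'static str {
        match (self.lang, key) {
            (Lang::En, "progress") => "Progress",
            (Lang::Fr, "progress") => "Progression",
            _ => "???",
        }
    }

    /// The view only ever sees localized text.
    pub fn report_progress(&self, percent: u8) -> String {
        self.view.show_progress(self.localize("progress"), percent)
    }
}
```

+                <p>
+                    Because the views never touch the string table, a later GUI view would only
+                    need to implement the same trait.
+                </p>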
+                <p>
+                    The goal of this is clean code, reusability of the controller (which has the
+                    most non-generalizable behaviour), and potential support for a GUI later on.
+                </p>
+                <p>
+                    The last thing I want to cement into the Controller is a toggle to run only
+                    preprocessors (more on that later).
+                </p>
+                <h3>Model</h3>
+                <p>This is where everything explodes and most of the refactoring occurs.</p>
+                <p>
+                    A lot of things are needlessly hard-coded in and inter-connected; the issues
+                    with separation of tasks become apparent.
+                </p>
+                <p>
+                    Another major problem is that the current code borrows concepts from
+                    <code>ddrescue</code>'s C++ architecture, while crappily patching in bad Rust
+                    conversions, like <code>DirectIOBuffer</code>. Which is to say, it's a mish-mash
+                    of conflicting paradigms and bad, highly ignorant coding practices.
+                </p>
+                <p>
+                    So, what about the refactor? Well, it's looking like it's going to be a number
+                    of independent libraries.
+                </p>
+                <p>Here's the current idea for the architecture:</p>
+                <h4>AlignedBufReader</h4>
+                <p>
+                    Effectively, a proper, full-featured replacement for the stupidity that is
+                    <code>DirectIOBuffer</code>, taking the idea of wrapping a generic
+                    <a href="https://doc.rust-lang.org/std/io/trait.Read.html"><code>Read</code></a>
+                    like
+                    <a href="https://doc.rust-lang.org/std/io/struct.BufReader.html"
+                        ><code>BufReader</code></a
+                    >.
+                </p>
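+                <p>
+                    A minimal sketch of what such a wrapper could look like, assuming the only
+                    requirements are a block-sized buffer whose start address is aligned and a
+                    generic inner reader. The field names and the
+                    <code>fill_block</code> method are made up for illustration, and the
+                    over-allocate-then-offset trick is just one safe way to get alignment without
+                    a custom allocator.
+                </p>

```rust
use std::io::{self, Read};

/// Hypothetical sketch: wraps any `Read` and fills an aligned, block-sized buffer.
pub struct AlignedBufReader<R: Read> {
    inner: R,
    storage: Vec<u8>, // over-allocated so an aligned window always fits
    offset: usize,    // start of the aligned window inside `storage`
    block: usize,     // length of the aligned window
}

impl<R: Read> AlignedBufReader<R> {
    pub fn new(inner: R, block: usize, align: usize) -> Self {
        assert!(align.is_power_of_two(), "alignment must be a power of two");
        let storage = vec![0u8; block + align];
        // Distance from the allocation start to the next aligned address.
        let addr = storage.as_ptr() as usize;
        let offset = (align - addr % align) % align;
        Self { inner, storage, offset, block }
    }

    /// Read up to one block into the aligned window and return the filled part.
    /// An empty slice means the inner reader is exhausted.
    pub fn fill_block(&mut self) -> io::Result<&[u8]> {
        let (start, end) = (self.offset, self.offset + self.block);
        let window = &mut self.storage[start..end];
        let mut filled = 0;
        while filled < window.len() {
            match self.inner.read(&mut window[filled..]) {
                Ok(0) => break,
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        Ok(&self.storage[start..start + filled])
    }
}
```

+                <p>
+                    The buffer is never resized after construction, so the aligned window stays
+                    put for the lifetime of the reader.
+                </p>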
+                <h4>Recovery Pipeline</h4>
+                <p>
+                    The point of the recovery pipeline is to introduce flexibility into implementing
+                    new recovery mechanisms. This is the new core of the Model.
+                </p>
+                <p>
+                    The pipeline will work through a plugin system, allowing
+                    <em>dynamic</em> runtime discovery and loading of recovery mechanisms. This will
+                    likely leverage Rust's <code>dylib</code>s (a.k.a. Shared Objects, DLLs, or
+                    Dynamic Libraries), and potentially later expand to supporting
+                    <code>cdylib</code> (the OG, C's shared objects) if there's a need.
+                </p>
+                <p>
+                    Somewhere in the setup of the pipeline, there will have to be a system for
+                    resolving incompatible plugins, plugin dependencies, and where in the pipeline
+                    a plugin lives.
+                </p>
+                <p>Plugin placement, from a distance, is relatively simple:</p>
+                <ol>
+                    <li>Preprocessors</li>
+                    <li>Data Scraping (Packaged with the Model)</li>
+                    <li>Postprocessors</li>
+                </ol>
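+                <p>
+                    The three-stage ordering above can be sketched as an ordered enum, with the
+                    pipeline sorted by stage. <code>Stage</code> and
+                    <code>order_pipeline</code> are hypothetical names, and plugins are reduced to
+                    a name and a stage for brevity.
+                </p>

```rust
/// Variant order doubles as pipeline order, thanks to the derived `Ord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Stage {
    Preprocessor,
    DataScraping,
    Postprocessor,
}

/// Sort (name, stage) pairs into pipeline order.
/// The sort is stable, so plugins within one stage keep their given order.
pub fn order_pipeline(mut steps: Vec<(String, Stage)>) -> Vec<(String, Stage)> {
    steps.sort_by_key(|(_, stage)| *stage);
    steps
}
```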
+                <h4>Preprocessors</h4>
+                <p>
+                    Preprocessors operate by reading the original data, and storing error-correcting
+                    data for a postprocessor to leverage.
+                </p>
+                <p>
+                    I might try to enforce a standard for this error-correcting data, even if it's
+                    as simple as storing error-correction files in a standard compressed archival
+                    format that has error-correction mechanisms.
+                </p>
+                <p>Examples of preprocessors would be hashing mechanisms.</p>
+                <h4>Data Scraping</h4>
+                <p>This should mostly follow the procedures of the current system.</p>
+                <h4>Postprocessors</h4>
+                <p>
+                    Postprocessors operate by reading the scraped data, and potentially combining it
+                    with cached recovery information from preprocessors, to repair the scraped data.
+                </p>
+                <p>
+                    Examples of postprocessors would be structure validation, and attempting
+                    recovery through brute forcing a region to match its cached hash.
+                </p>
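+                <p>
+                    A toy version of the hash-and-brute-force pairing, under two loud assumptions:
+                    exactly one byte in the region is wrong and its position is already known, and
+                    FNV-1a stands in for whatever hash a real preprocessor would use. The function
+                    names are hypothetical.
+                </p>

```rust
/// 64-bit FNV-1a, used here only as a small stand-in hash.
pub fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Preprocessor side: hash the original data in fixed-size sections.
/// `section_len` must be non-zero.
pub fn hash_sections(data: &[u8], section_len: usize) -> Vec<u64> {
    data.chunks(section_len).map(fnv1a).collect()
}

/// Postprocessor side: try every value for one known-bad byte until the
/// section hashes to the cached value. Restores the byte on failure.
pub fn repair_byte(section: &mut [u8], bad_index: usize, expected: u64) -> bool {
    let original = section[bad_index];
    for candidate in 0..=u8::MAX {
        section[bad_index] = candidate;
        if fnv1a(section) == expected {
            return true;
        }
    }
    section[bad_index] = original;
    false
}
```

+                <p>
+                    The search space grows by a factor of 256 for every additional unknown byte,
+                    which is why this only looks practical for small damaged spans.
+                </p>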
+                <h4>Plugins</h4>
+                <p>Plugins need systems to:</p>
+                <ul>
+                    <li>Report incompatible plugins</li>
+                    <li>
+                        Report dependent plugins, and their relative ordering in the pipeline
+                        <ul>
+                            <li>
+                                Enforce including a relative position to the Data Scraping plugin?
+                                Effectively, &quot;is this a preprocessor, or postprocessor?&quot;
+                            </li>
+                        </ul>
+                    </li>
+                    <li>Read from the drive</li>
+                    <li>Read and write the scraped data</li>
+                    <li>Read and write recovery data</li>
+                    <li>Read and write to the map</li>
+                </ul>
+                <p>They might also want systems to:</p>
+                <ul>
+                    <li>
+                        Call upon other plugins as dependencies.
+                        <ul>
+                            <li>They are shared objects, after all.</li>
+                        </ul>
+                    </li>
+                </ul>
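+                <p>
+                    The reporting half of that list could be sketched as a trait plus a check run
+                    during pipeline setup. This ignores dynamic loading and drive access entirely,
+                    and <code>Plugin</code>, <code>Placement</code>,
+                    <code>StaticPlugin</code>, and <code>validate</code> are all hypothetical
+                    names.
+                </p>

```rust
use std::collections::HashSet;

/// "Is this a preprocessor, or postprocessor?"
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Placement {
    BeforeScrape,
    AfterScrape,
}

/// What a plugin reports about itself to the pipeline.
pub trait Plugin {
    fn name(&self) -> &str;
    fn placement(&self) -> Placement;
    /// Names of plugins that must not be loaded alongside this one.
    fn incompatible_with(&self) -> Vec<&str> {
        Vec::new()
    }
    /// Names of plugins that must be loaded for this one to work.
    fn depends_on(&self) -> Vec<&str> {
        Vec::new()
    }
}

/// Reject a plugin set with missing dependencies or declared conflicts.
pub fn validate(plugins: &[Box<dyn Plugin>]) -> Result<(), String> {
    let loaded: HashSet<&str> = plugins.iter().map(|p| p.name()).collect();
    for plugin in plugins {
        for dep in plugin.depends_on() {
            if !loaded.contains(dep) {
                return Err(format!("{} requires missing plugin {}", plugin.name(), dep));
            }
        }
        for other in plugin.incompatible_with() {
            if loaded.contains(other) {
                return Err(format!("{} conflicts with {}", plugin.name(), other));
            }
        }
    }
    Ok(())
}

/// Simple data-only plugin used to exercise `validate`.
pub struct StaticPlugin {
    pub name: &'static str,
    pub placement: Placement,
    pub deps: Vec<&'static str>,
    pub conflicts: Vec<&'static str>,
}

impl Plugin for StaticPlugin {
    fn name(&self) -> &str {
        self.name
    }
    fn placement(&self) -> Placement {
        self.placement
    }
    fn incompatible_with(&self) -> Vec<&str> {
        self.conflicts.clone()
    }
    fn depends_on(&self) -> Vec<&str> {
        self.deps.clone()
    }
}
```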
+                <h4>Mapping</h4>
+                <p>With the plugin system, mapping gets more difficult.</p>
+                <p>
+                    Before, mapping was rather simple and could be a small
+                    <code>enum</code> representing the recovery stage. However, the new map will
+                    have to use dynamic tagging of some sort, as various plugins may want to pass
+                    mapping information between each other.
+                </p>
+                <p>
+                    For example, a preprocessor could map out the locations of all files. Then, a
+                    postprocessor could attempt to repair headers that have been damaged, and/or the
+                    file system tree. Using a preprocessor to map this information is more reliable
+                    as there may not be enough of the header left to validate that it is in fact a
+                    header, and not just data that's reminiscent of one.
+                </p>
+                <p>
+                    Along with dynamic tagging, there needs to be some kind of standard naming
+                    practice in place beforehand. But, there are limits to how well this can be
+                    enforced.
+                </p>
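+                <p>
+                    One possible shape for dynamic tagging, assuming tags are plain strings
+                    namespaced as <code>plugin:label</code> and that regions are allowed to
+                    overlap. <code>TagMap</code> and its methods are hypothetical, and the naming
+                    convention is only an example of what a standard might look like.
+                </p>

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// Hypothetical map: byte ranges on the disc, each carrying one string tag.
#[derive(Default)]
pub struct TagMap {
    entries: Vec<(Range<u64>, String)>,
}

impl TagMap {
    /// Tag a range, namespacing the label with the plugin that wrote it.
    pub fn tag(&mut self, range: Range<u64>, plugin: &str, label: &str) {
        self.entries.push((range, format!("{}:{}", plugin, label)));
    }

    /// Every tag that covers the given byte position, in sorted order.
    pub fn tags_at(&self, pos: u64) -> BTreeSet<&str> {
        self.entries
            .iter()
            .filter(|(range, _)| range.contains(&pos))
            .map(|(_, tag)| tag.as_str())
            .collect()
    }
}
```

+                <p>
+                    Nothing here stops a plugin from writing into another plugin's namespace,
+                    which is the enforcement limit mentioned above.
+                </p>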
+                <p>
+                    Further details of how mapping will work still need to be sorted out. I also
+                    need a cleaner way of updating the status of regions than... whatever the heck
+                    the current system is. I don't understand how I got it working, but hey. Maybe I
+                    can at least reuse the tests now that I actually know all the overlaps?
+                </p>
+                <p>
+                    I don't think the tagging system should handle any form of incompatibility
+                    testing. That should be left to the plugins' own systems to handle
+                    appropriately.
+                </p>
+                <h2>The Ultimate Challenges</h2>
+                <p>
+                    Really, all that stands in my way is the motivation to keep working on this, and
+                    in a <em>sensible</em> way. I have to always ensure that I'm not just writing
+                    code, but I'm thinking about the larger structure all the while, lest I code
+                    myself into a pit once more.
+                </p>
+                <p>
+                    Also, there stand the issues that I have yet to solve with sleepy and/or
+                    permission-revoking drives. I don't actually know what's happening. I may have
+                    to learn to inspect the SCSI communications to know better what the drive is
+                    doing when it hits hard-to-read areas.
+                </p>
+                <p>
+                    It also looks like it could be fun, and useful, to dig into SCSI controls:
+                    <a href="https://en.wikipedia.org/wiki/Optical_disc_drive#SCSI_configuration"
+                        >https://en.wikipedia.org/wiki/Optical_disc_drive#SCSI_configuration</a
+                    >
+                </p>
+                <p>
+                    There's also the
+                    <a href="https://sg.danny.cz/sg/sdparm.html"
+                        ><code>sdparm</code> Documentation</a
+                    >
+                    to dig into for general knowledge, and how SCSI actually works.
+                </p>
+                <h2>Why all of this?</h2>
+                <p>
+                    Well, after having spent two courses reading requirements sheets and writing
+                    code around them, it's become natural to consult a spec.
+                </p>
+                <p>
+                    And again, the biggest failing of all my projects has been a lack of structural
+                    forethought. Usually, that's because I haven't the faintest idea where I want to
+                    go with it. But in this instance, I know enough about what I want to make
+                    <em>possible</em>, and have enough sense to not just dive in blindly a third, or
+                    perhaps fourth, time.
+                </p>
+            </div>
+            <include src="includes/tailer.html" />
+        </main>
+        <ul class="pane spacer">
+            <include src="./includes/blog_recent_posts.html" />
+            <li class="spacer_container">#AD</li>
+        </ul>
+        <include src="includes/footer.html" />
+        <include src="includes/scripts.html" />
+    </body>
+</html>