Repository for Infosec and Machine Learning Resources
OpenMALx is an organization focused on developing datasets and models for security analysis. The project's objective is to provide structured data for training and evaluating large language models in a security context.
---
Technical Focus
* **Dataset Formatting:** Processing raw security tool logs into instruction/response pairs for model training.
* **Local Execution:** Optimizing models for local hardware to ensure data remains on-premises.
* **Response Logic:** Developing structured formats for explaining security vulnerabilities and remediation steps.
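As a rough illustration of the dataset-formatting item above, the following Python sketch turns one tool output and a hand-written summary into an instruction/response record and serializes records as JSON Lines. The field names (`instruction`, `input`, `response`), the prompt wording, the helper names, and the sample finding are all assumptions made for illustration; the actual schema of the OpenMALx datasets is not described here.

```python
import json


def to_instruction_pair(tool_name, raw_output, summary):
    """Build one training record from a tool output and its summary.

    The keys used here are hypothetical, not the project's real schema.
    """
    return {
        "instruction": (
            f"Summarize the following {tool_name} output "
            "and list remediation steps."
        ),
        "input": raw_output.strip(),
        "response": summary.strip(),
    }


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


if __name__ == "__main__":
    # Hypothetical sample; not taken from any real tool or dataset.
    record = to_instruction_pair(
        "example-scanner",
        "finding: hardcoded credential in config.py line 12\n",
        "A credential is stored in source code. Move it to a secret store.",
    )
    print(to_jsonl([record]))
```

Keeping the raw tool output in its own field, separate from the instruction text, makes it easier to swap prompt templates later without touching the underlying data.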
Active Projects
* **infosec-tool-output:** A dataset mapping static and dynamic analysis tool outputs to technical summaries. Repository: openmalx/infosec-tool-output
* **open-malsec:** A collection of text-based security threats, including phishing and social engineering samples, for classification tasks. Repository: openmalx/open-malsec
* Support for kernel version branches to gracefully roll out kernel API changes.
* Support for PyTorch 2.10.
* kernel-builder is now merged into the kernels repo.
* Initial support for standardized kernel benchmarks.
AbstractPhil/tinyflux-experts
Introducing the "blot" expert, sd15-flow-sol. The twin sister flow-matching experts for tinyflux-lailah, sd15-flow-lune and sd15-flow-sol, will be used in tandem to train tinyflux-Lailah. sd15-flow-sol never managed to reach full flow-matching prediction, so epsilon vpred conversion is required. All experts will live in the tinyflux-experts repo, including all the critical checkpoint sets.
Lune was heavily finetuned in the sd3 style and the adapted shift timestep system after David's interpolation converted sd15 into a geometric basis. Sol was abandoned after 50 epochs with David and was considered overcooked and rigid, until I noticed the geometric structure today. Lune doesn't produce geometric structure as solid as Sol's, not even close. Lune produces improved fidelity and detail, but Sol produces something very different: aligned to sd15's behavior, and fully representative of the 5-point 4-simplex structure that David brought to the table.
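The post does not say which epsilon-to-vpred conversion is meant. One common reading, sketched below in Python under that assumption, is the standard variance-preserving identity: with x_t = alpha_t * x_0 + sigma_t * eps, the v-prediction target is v = alpha_t * eps - sigma_t * x_0. The function name `eps_to_v` and the plain-list representation are hypothetical and are not taken from the tinyflux-experts code.

```python
import math


def eps_to_v(x_t, eps, alpha_t, sigma_t):
    """Convert an epsilon prediction to a v-prediction.

    Assumes x_t = alpha_t * x_0 + sigma_t * eps, which may not be the
    convention used by the experts described in the post.
    """
    # Recover the clean-sample estimate x_0 from x_t and eps.
    x_0 = [(x - sigma_t * e) / alpha_t for x, e in zip(x_t, eps)]
    # Apply the v-prediction definition elementwise.
    return [alpha_t * e - sigma_t * x for e, x in zip(eps, x_0)]


if __name__ == "__main__":
    # Toy values chosen so that alpha_t**2 + sigma_t**2 == 1.
    alpha_t, sigma_t = 0.8, 0.6
    x_0 = [1.0, -2.0]
    eps = [0.5, 0.25]
    x_t = [alpha_t * a + sigma_t * b for a, b in zip(x_0, eps)]
    print(eps_to_v(x_t, eps, alpha_t, sigma_t))
```

The division by alpha_t makes this conversion numerically unstable near timesteps where alpha_t approaches zero, so an implementation would typically need to clamp or special-case that range.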
Sol is essentially a nearly perfect blob-forming geometric blotter. Sol is SD15, and yet Sol was trained using a specific pattern-recognizing, timestep-aligned David model. David was tasked with classifying timesteps and patterns using complex deep-recognition structural analysis, layer by layer, forming full-scale opinions after watching the entirety of sd15's structure during training.
Even though sd15-flow-sol was abandoned, the structure of Sol is HIGHLY effective at understanding TIMESTEP blotting interpolation. I didn't realize how crucially important this was until Lailah started to show rigidity and compartmentalized behavior with sequence, which likely happens to ALL flow-matching models.