Sam read-write

The year of Linux on the Desktop (for me)

2026-06-19T13:50:00+12:00

I’ll go ahead and say it: there’s a lot going wrong in the world today. Understatement? Maybe. That being said, while I’m not going to take an optimistic stance on where the world is going in general, I wanted to share some good news about a topic that I am especially interested in. In it, things are going so well it seems impossible.

I’m speaking of Linux, specifically Linux on the desktop. It’s long been a running joke that this year is the year when Linux on the desktop takes off. In this post, I will lay out my “serious” argument as to why I think this year will be different, and how a confluence of factors is putting some serious wind beneath the wings of Linux.

Me, personally, I’ve been a Linux user for a long time. My first exposure was through my father, who would distro-hop compulsively, always trying new distributions as they came out on distrowatch.com. My first distribution was Puppy Linux, a distribution specializing in old PCs like the ones I had.

That being said, I’m not a zealot. Professionally, I’ve used Windows, paradoxically for security reasons, as the place I was working at deemed Windows to be more secure. I’m also used to playing multiplayer games weekly with my friends. Installing Linux has always come with a price: being that person who ruins everything by not being able to play.

Because of this, I couldn’t shift my best PCs over to Linux. Fast-forward to 2026, and the situation is completely different. Silicon Valley in general, but Microsoft in particular, for reasons that are far too obvious to write explicitly, don’t really seem to be all that interested in making an operating system at all. There’s a bug in the TPM code that allows for a BitLocker bypass, ads in the start menu, and it comes with OneDrive, almost mandatory at this stage, which deletes your files half the time. They themselves say that in the future, you won’t need an operating system, and can instead use an agentic OS.

Of course, this is stupid. The operating system is required for computers to interface with the hardware, and it is required for users to interface with the device. And this is where Linux comes in: steadily improving in the background at basically the same speed at which Windows deteriorates. Hell, the other day I plugged a Bluetooth dongle into my desktop, and it got picked up immediately. Even more shocking was the fact that I was able to connect a Bluetooth controller and play a game on Steam.

Oh yes, games work on Linux now. Most of the time it’s fiddly, but compared to Windows, it’s heaven. Is it as good as Windows of the past? Not yet. Is it as good as Windows of today? Abso-fucking-lutely!

And so for me, it is the year of Linux on the desktop. I know Macs exist, and I don’t doubt people will migrate there too, but they suffer from the same fate as Windows in many ways. IDK, not for me.

Book Review: Play Nice by Rachel Harrison

2026-03-16T18:08:00+13:00

Hi there,

I recently finished reading the book Play Nice by Rachel Harrison (ISBN: 9780593642573). When I say I finished reading, I should clarify: I had to finish reading the book because it was holding me hostage. Instead of sleeping, I was forced to keep reading until 1am when the book was finished. Now, free but slightly sleep-deprived, I’m ready to write a book review.

The book is about 6 Edgewood Drive, the childhood home of our main character and the site of her horrible childhood. In that house, we are forced to confront the realities of her childhood and her family in a terrifying recollection. As frequently happens with childhood stories, the problems that were present in the past carry over to the present, showing their face again in horrible but eerily similar ways.

It is also about madness. In particular, it is about how madness is frequently ascribed to women who fail to fit the mold in particular ways, and how it is used to silence the voices of these women. It is also about abuse: how it travels through generations and how it is inflicted on women, particularly by narcissistic men who abuse women just because they’re bored and can.

I went into the book without any expectations and was completely sucked in. I, personally, struggle a little bit with books about abuse because of my own experiences, but I thought it was beautifully done. Initially, I was captured by the representations of the abuse within the book, and I started feeling more and more scared for the main character as the story moved forward. In the end, like I said above, I was possessed by the book until it completely took over my ability to maintain a reasonable sleep schedule.

I think this book is great, and I highly recommend it as easy reading. It is, of course, highly recommendable for people who like horror books, but I’d also recommend picking it up even if it doesn’t sound like your cup of tea. It is just a good book.

Review: “La voluntad” and “La llamada”

2026-02-16T09:51:00+13:00

Hi everyone!

In keeping with the theme of the blog, I’m writing a little about what I’ve been reading. It’s not so much to show off what I read as to reflect a little on each book, and also to share the parts of each book that I found important or interesting. The truth is that sometimes I remember having read a book but don’t remember its contents very well, and perhaps having a summary I can refer back to will help with that.

My reading is always somewhat eclectic and doesn’t tend to follow any predictable pattern. I seem to oscillate between pure popcorn fiction and then, from one day to the next, take a turn towards more serious topics. In Argentina, 24 March 2026 will mark 50 years since the last dictatorship began. Over the last few months I’ve been reading two books related to the topic: “La llamada” and “La voluntad”.

La voluntad (by Eduardo Anguita and Martín Caparrós)

This book is a historical analysis of the activities of various revolutionary armed forces in Argentina. Across five long volumes, the authors describe in great detail the lives of different people and the different events that unfolded in Argentina in those years. In a way it is a biography of many people, including many who were murdered by the military. The book is full of detail, and the horror of that era feels very vivid.

On a less literal level, the book also tries to reflect on the role of various organisations, and to share the perspective people had at the time, as well as the perspective those same people have decades after the dictatorship. It is also interesting to analyse, and correlate, certain attitudes that could be seen back then and how those same attitudes are still present today. Another interesting point is that certain economic conditions, which at the time led to mass protests and attempts at armed revolution, are nowadays the norm.

On a personal level, I found this book very hard to read, both because of the immense amount of text and because the subject matter itself is extremely challenging: the image one has in mind of what the dictatorship was like does not always match reality. On the other hand, I feel I’m leaving with a new perspective on the history of Argentina, about which I still know very little. I always find it hard to form a mental picture of what Argentina is, who each participant in each conflict is, and what certain attitudes stem from, their historical context. To me Argentina remains a very confusing country, but I can now see it with a little more clarity.

All in all, I recommend this book to anyone interested in history, anyone interested in the history of the guerrilla movements in Argentina, and also people interested in biographies.

La llamada (Leila Guerriero)

La llamada is, in my opinion, a very beautiful book. Its subject is the life of Silvia Labayru and her experience during the dictatorship. When I first decided to read it I didn’t have a very clear idea of who she was, but the book clearly describes her experience in the Escuela de Mecánica de la Armada (ESMA), as well as her life after her release and over the last 50 years. It also includes interviews with her family and acquaintances.

It’s hard for me to describe the subject of this book, because it is easy to fall into the same notions that Silvia Labayru finds so frustrating when people discuss her life. The book’s analysis of who is perceived as “a traitor” and who as a victim is particularly interesting. So is its analysis of the particular mistreatment suffered by the women who were abducted and held at ESMA, and of the changing perspectives on what constitutes rape in the eyes of society and the law.

I liked this book a lot for two reasons. The first is the style in which it is written. The close relationship between Leila Guerriero and Silvia Labayru is very clear to see. It lets us feel as if we were there, observing this person’s life, and it also contextualises many conversations with the outside perspective of different people who shared their lives with her. The second reason is the subject matter: I find her interesting as a person, and her life is very interesting too.

I’d recommend it particularly to anyone interested in reading biographies, and also to anyone interested in a different perspective on topics that have already been approached from many angles but still leave room for new ones.

The Hobbit and The Testaments

2026-02-10T12:23:00+13:00

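Hi everyone,

My Chinese isn’t very good, so please bear with me if I make mistakes. :)

I started learning Chinese a long time ago, around 2015. At that time I had just come to New Zealand from Argentina. Because I’m a software engineer, I quickly found a very good job: the pay was good and the conditions were good too. But the job I found was too boring. I had to go to work every day, and my boss liked the way I worked, but the things he gave me to do were too simple. I’d start work at half past eight and be done by ten. So I started learning Chinese. I had three reasons. The first is that in 2015 everyone could see that China was a very important country, so of course Chinese would become a very important language. The second reason is that I find Chinese culture especially interesting, and different from the culture of Argentina or New Zealand. The third is that learning Chinese is hard. Back then I didn’t know how hard it was, but everyone always said it was too hard. You might think that Chinese being a hard language isn’t a reason, but I like that kind of thing. The way I see it, if you can understand Chinese, you can understand how Chinese people think, and you might also find a better job, and so on.

Recently I’ve started reading more books. There are many reasons, but I think that if I can read books in Chinese, my Chinese reading ability will get better. I’ve read two books: The Hobbit and The Testaments.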

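The Hobbit (by J. R. R. Tolkien)

This book tells us who the hobbits are. It also tells us who Bilbo is and what his situation is. Bilbo isn’t a particularly important hobbit; what he likes best is staying at home, resting a lot, and eating delicious food. But one day a group of dwarves and a wizard ask him to come on an adventure with them. I think this is a very special book. Of course, Tolkien is a really good writer, so when you read this book you feel like you really are in Bilbo’s home. If Bilbo climbs a mountain, you feel like you’re climbing too. When the trolls want to eat Bilbo, you can feel Bilbo’s fear. But what I like most is that, because Tolkien wrote this book for his children, it is simpler than The Lord of the Rings, and much cuter.

If you have time, give it a read; you’ll like it too.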

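The Testaments (by Margaret Atwood)

This is also a very famous book. You may know The Handmaid’s Tale. This book describes what things are like in Gilead. In the story, the United States has changed a lot and is now called Gilead. In this country women have very few rights; all they are allowed to do is bear children.

Some people find books like this a bit depressing. But Margaret Atwood’s writing is especially good. I’ve read both The Handmaid’s Tale and The Testaments. Both are very good, of course, but The Testaments makes you understand why women’s rights are so important. These days, looking at the situation in the United States, I’m a little afraid the story in this book will come true.

My view is this: it is a very important book. Everyone should read it.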

SLSA: Safeguarding artifact integrity across any software supply chain

2026-01-21T10:50:00+13:00

Kia ora,

Lately I’ve been working on a SLSA level 3 implementation at my day job. If you know me personally, you’ll know that I do not really burn with passion for security standards such as SLSA, but I have learned a lot about it and have been wanting to write more blog posts. I thought this post might be found by people who are working on SLSA and are looking for a primer on what it is and what it is useful for. Note that this is my personal opinion and does not reflect the opinion of my employer.

I could grab the definitions in the SLSA spec and embed them into the blog here, but I think that you can read them yourself, if you really want to. I also think that you, like most people of sound mind, absolutely do not want to. Not under any circumstances.

SLSA markets itself as: “Safeguarding artifact integrity across any software supply chain”. The main word to define here is “safeguarding”.

For context, in this year of our lord 2026, blue teams are being popped left, right and centre by insecure dependencies. We’ve seen this quite a bit and will continue to see more of it. An example of this is: Widespread supply chain compromise impacting npm ecosystem.

SLSA can fix this for you, provided you do all the hard work for yourself.

SLSA has several levels of compliance, which map very roughly to:

Level 1: You’re producing metadata for your binaries.
Level 2: The provenance you’re producing is signed.
Level 3: Not only is the provenance you’ve produced signed, but it meets much stronger security guarantees. An example of this is: “the machine that produces the binary cannot access the private key required to perform the signature”, or “Unforgeability”.

By and large, SLSA is all about metadata. An example of this metadata is “Provenance” statements. Provenance statements, as defined by SLSA, are metadata about a specific build process. They allow a consumer of said metadata to make risk-based decisions about the specific binaries it is inspecting.

For example:

Has this binary been produced by someone I trust? This can be confirmed by signature verification or OIDC integration. With the latter, we can confirm whether a binary was signed by a build running in a specific GitHub repository.
What parameters was this build configured with?
What dependencies are associated with this build?
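
To make those three questions a bit more concrete, here is a minimal Python sketch that reads a provenance statement and answers them. The field names follow the SLSA v1.0 provenance layout as I understand it; the statement itself, the builder URL and the TRUSTED_BUILDERS set are made up for illustration. It also assumes the signature on the statement has already been verified elsewhere, which is the part that actually matters.

```python
import json

# Hypothetical SLSA v1.0 provenance statement. Every value here is invented.
STATEMENT = json.loads("""
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [{"name": "myapp.tar.gz", "digest": {"sha256": "abc123"}}],
  "predicateType": "https://slsa.dev/provenance/v1",
  "predicate": {
    "buildDefinition": {
      "buildType": "https://example.com/build-types/ci/v1",
      "externalParameters": {"ref": "refs/heads/main"},
      "resolvedDependencies": [
        {"uri": "pkg:npm/lpad@1.0.1", "digest": {"sha256": "def456"}}
      ]
    },
    "runDetails": {"builder": {"id": "https://example.com/builders/trusted"}}
  }
}
""")

# Assumption: the consumer keeps its own allowlist of builder identities.
TRUSTED_BUILDERS = {"https://example.com/builders/trusted"}


def summarise(statement: dict) -> dict:
    """Pull out who built it, with which parameters, from which dependencies."""
    predicate = statement["predicate"]
    definition = predicate["buildDefinition"]
    builder_id = predicate["runDetails"]["builder"]["id"]
    return {
        "trusted_builder": builder_id in TRUSTED_BUILDERS,
        "parameters": definition.get("externalParameters", {}),
        "dependencies": [
            dep["uri"] for dep in definition.get("resolvedDependencies", [])
        ],
    }


summary = summarise(STATEMENT)
print(summary)
```

What you do with the summary (block, warn, log) is entirely up to you, which is rather the point of the next paragraph.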

The full spec of this metadata can be found in the SLSA build provenance page, although other types of metadata can also be embedded. The long and short of it is that SLSA is whatever you want it to be. By putting the data you want in the provenance statement, you can then retrieve this data before executing a binary.

In this way, SLSA is somewhat similar to GPG signing a specific binary with a key, but it does have additional infrastructure that can be helpful in managing the cryptographic aspects that make GPG signatures so challenging. I mentioned above “OIDC integration”. By leveraging OIDC, it is possible to have a cryptographic signature that can be verified without giving the machine that builds your binary any long-lived private key associated with the signing.

How can this magic possibly work? I can tell you. An OIDC flow has three participants when signing an attestation:

End user. This is who wants to sign a binary, i.e. you.
Relying party. This is Fulcio, a tool that aids these types of signatures. Sigstore provides a “public good” server that you can use if hosting your own instance of Fulcio seems like too much.
OpenID Provider. This can be GitLab, GitHub, Google etc.

At a high level, Fulcio will:

Accept the JWT you provide and perform basic validation on it.
Verify the token against the OpenID provider (e.g. GitLab), using the keys that provider publishes.
Once the token checks out, read the identity Claim from it.
Use the claim to issue a short-lived signing certificate. The certificate is basically a statement: the end user authenticated as Claim at this time using this specific OIDC provider. This is referred to as the “identity” of the signer.
Log the certificate publicly. Fulcio has its own CT log, and signatures end up in Rekor, a certificate transparency type thing, so that if there is malicious activity it can be detected.
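
To make the token in the first step less abstract, here is a minimal Python sketch that builds a fake, unsigned JWT and reads its claims back. The claim values (issuer, subject, audience) are illustrative assumptions, not something copied from a real GitLab token, and the code deliberately skips signature verification, which is exactly the part Fulcio has to do for real.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the padding that JWTs strip, then decode."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def unverified_claims(token: str) -> dict:
    """Return the payload of a JWT WITHOUT checking its signature."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))


# Hypothetical claims, roughly the shape a CI identity token might take.
header = {"alg": "none", "typ": "JWT"}
claims = {
    "iss": "https://gitlab.com",
    "sub": "project_path:your-group/your-repo:ref_type:branch:ref:main",
    "aud": "sigstore",
}
fake_token = ".".join(
    [
        b64url_encode(json.dumps(header).encode()),
        b64url_encode(json.dumps(claims).encode()),
        "fake-signature",  # placeholder, not a real signature
    ]
)

decoded = unverified_claims(fake_token)
print(decoded["iss"], decoded["sub"])
```

The issuer and subject are the bits that end up as the “identity” a verifier later checks against.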

The threat model page is very interesting, and shows us what we can expect in terms of guarantees from Sigstore’s way of doing things:

Under normal operation (no Sigstore compromise), verifying a “keyless” signature from user@example.com using the ExampleIdP identity provider at a given timestamp guarantees that the signature was created by a signer who successfully authenticated to Sigstore using that identity at that time.

In order to verify the signature, we need to tell verifiers the identity of the signer. People looking to verify the identity of the signer could run something like the following command, where your-repo and <your-image> are placeholders:

cosign verify --certificate-identity-regexp='^https://gitlab\.com/your-repo/.*' --certificate-oidc-issuer=https://gitlab.com <your-image>

Auth is interesting in itself. More info: Use Sigstore for keyless signing and verification and Sigstore CI Quickstart - Sigstore.

In summary

SLSA is a common standard for specifying metadata, which is accompanied by infrastructure for key management. Actually solving the security issues depends on the implementation by each end user of the specification.

You could imagine a world in which this metadata can be used to aid detection of known-bad package files: There is a centralised repo somewhere that has a list of all compromised packages, and if the metadata analysis says a package is being used that is present in that list, a deployment may fail.
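
That imagined deploy gate could be sketched in a few lines of Python. Everything here is hypothetical: the KNOWN_BAD set stands in for the centralised list, and the dependency tuples stand in for whatever your metadata analysis extracts from the provenance.

```python
# Hypothetical centralised list of compromised (package, version) pairs.
KNOWN_BAD = {("lpad", "1.0.1")}


def compromised(dependencies, known_bad=KNOWN_BAD):
    """Return the dependencies that appear in the known-bad list."""
    return [dep for dep in dependencies if dep in known_bad]


def gate_deployment(dependencies):
    """Return (allowed, offending_dependencies) for a candidate deployment."""
    hits = compromised(dependencies)
    return (not hits, hits)


clean_build = [("lpad", "1.0.0"), ("express", "4.19.2")]
dirty_build = [("lpad", "1.0.1"), ("express", "4.19.2")]

print(gate_deployment(clean_build))
print(gate_deployment(dirty_build))
```

Note that the gate is only as good as the list it checks against.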

In practice, this is not that simple. For starters, any false positives would be totally unacceptable in most companies, as broken pipelines are a productivity killer. Most people would be OK with halting deployments to prevent remote code execution, but there may be vulnerabilities in a package that are perhaps not applicable to your current setup, or are not severe enough to warrant a complete halt of pipelines.

There is also a problem with detection latency. If a dependency has already shipped to production, simple deploy checking will not prevent remote code execution if dependencies are automatically updated. Imagine the following timeline:

A Node dependency, lpad, which pads text from the left, is compromised. An evil actor releases a new version.
Your pipeline builds a new release for an unrelated reason, say a small copy change, and picks up the compromised version.
The version is pushed to prod.
The magic centralised database you’re paying heaps for gets updated with the malicious dependency, but it’s too late.
Deployments start failing because of this transitive dependency.

So, SLSA needs to be accompanied by really rigorous restrictions in order to be effective. For example, you may pin your dependencies to a specific version, and that specific version to a hash. This may be done automatically depending on your package manager. You may decide to fork all repositories you depend on, and manually build them from source. These kinds of things can be effective in mitigating the risk of supply chain compromise, and SLSA can aid in verifying these things.
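
As a minimal sketch of what pinning a version to a hash buys you, the Python below compares a downloaded artifact against a pinned SHA-256 digest. The artifact bytes and the pin are made up for the example; in practice the pin would come from a lockfile generated by your package manager.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_pin(data: bytes, pinned_digest: str) -> bool:
    """True only if the artifact hashes to exactly the pinned digest."""
    return hmac.compare_digest(sha256_hex(data), pinned_digest.lower())


# Stand-ins for a real download. The pin is computed here only so the
# example is self-contained; normally it is recorded ahead of time.
artifact = b"pretend this is lpad-1.0.0.tgz"
PINNED = sha256_hex(artifact)
tampered = artifact + b" plus something nasty"

print(matches_pin(artifact, PINNED))
print(matches_pin(tampered, PINNED))
```

A new malicious release under a new version number never matches the pin, so it is not picked up until somebody deliberately updates the lockfile.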

Additionally, Sigstore’s public infrastructure raises other concerns. For example, the metadata associated with your binaries is publicly exposed, which allows an attacker to gain valuable information about what binaries you are producing and why. Hackers abusing certificate transparency is not a new phenomenon.

Sigstore itself may also be compromised, which would allow an attacker to forge arbitrary signatures. This is clarified in the Threat Model - Sigstore page:

Fulcio CA server

Arbitrary software attack: Can issue certs for any OIDC issuer/identity and use those to sign any software desired. Bob will see these certificates as valid as they are signed by the Fulcio CA and included in the Fulcio CT log.

So that’s SLSA in 2000 words or fewer, from a technical perspective.

Writing what I am reading

2025-10-07T07:14:00+13:00

Recently, there’s been an increase in the amount of content generated procedurally on the web. At the same time, search engines like Google are relying more and more on AI to give you results. I also noticed that I find myself opening up YouTube and finding nothing there to watch, as the algorithm seems to show me the same videos every day.

Because of this, I have started reading books. Previously, for mysterious reasons, I had found I was unable to read books: I’d pick them up, get bored and put them down. I thought this was due to maybe a bad attention span, but I discovered that it was mostly due to bad books.

Here’s the deal: the book industry is also affected by this lack of quality, and the books that do get published and advertised on platforms such as Amazon’s Kindle Store are, to put it mildly, not the best books ever written.

I tried various ways of finding “good books”, i.e. those books that you can’t put down. I tried various algorithmic recommendation engines, community sites, Reddit. In the end, the best solution came in the form of “Librarian’s choice.” This is a list maintained by librarians in my local city, Wellington.

To quote Wellington City Council:

Librarians are good people. They love books and they love reading - these are good things.

As I slowly start to work my way through this list, I thought I’d recommend particularly great books that I’ve enjoyed the most. I’d like to start with “American Dirt”.

American Dirt (2020) – By Cummins, Jeanine

While the book recommendations are great, I have noticed that a suspiciously large percentage of the books are about book lovers and librarians. This is also the case for this book, which tells us the story of a book store owner named Lydia Quixano Perez.

In this book, I felt the dread of walking with Lydia and her son, as they embark on the journey from their home town to the USA as refugees, escaping from a particularly heinous persecutor, the new cartel that is terrorising their town. The writing style is so good that I felt the pain, the confusion and the fear as these characters keep making the only choice available to them in dire circumstances.

Originally, while reading the title of the book, I expected the book to be similar to “To Kill a Mockingbird”, i.e. set in the south of the USA some time in the recent past. While this book was written a few years ago, I think it is particularly striking because of what is happening in the world today, with the persecution of Hispanic people in the USA, as it gives humanity to a group of people who are frequently reduced to a statistic or a number. The book also reflects on the nature of the US-Mexico border, and the historical tendency of people in these regions to move across it.

I feel that this is one of the better thrillers I have read recently, and I couldn’t stop reading it as soon as I picked it up. I highly recommend it.

Automating bug bounties

2022-02-21T16:53:00+13:00

I have bad news…. I first noticed this one day like any other, and once I noticed it, I couldn’t escape the reality. Hacking is boring. This may seem counter-intuitive at first. If you looked at your average hacker, they wouldn’t look bored. More like a mixture of stressed and angry/depressed, probably.

But spend a day in their shoes and you’ll come to the same conclusion. Every attempt at hacking is basically a series of steps, tediously, methodically followed.

Let me paint you a picture. You’ve been told by your boss that you need to hack this web application. You don’t want to; you just came back from your holidays and it’s too hot in the office. It looks pretty old and, for the sake of making it more interesting, it seems that it is running PHP. After poking at it for approximately 10 seconds, you’d have a nice idea of what is going on, but you need to perform endpoint discovery.

Endpoint discovery is performed through various dull means, such as crawling, manually navigating, brute-forcing folders, and looking at source code. None of these things will strike you as being particularly fun. But if you are serious about your job you will do these things and perform them to an “acceptable standard”.

What an “acceptable standard” is has not been clearly defined, so unfortunately each pentester will have their own approach: some will be extremely thorough (to the point they’ll lose their minds), while others will have a look-see and hope they find enough bugs to say they’ve done their job, despite missing like 30% of the application.

This application has acquired a lot of cruft over the years, and let’s say it has 500 endpoints.

Let’s assume for each of the 500 endpoints, if they have an average of 5 parameters in each, you now have 2500 insertion points to duly check. You COULD in theory filter those out and not test them all. For example, you could not test the User-Agent header in all 500 endpoints; and to be fair, most people don’t. But everybody knows that bugs are lurking everywhere, and there’s no compelling logical reason to not test all parameters for every vulnerability type. One of those 500 endpoints could have SQL injection in the User-Agent header; and if you don’t test it, you won’t find it.
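
The arithmetic is easy to reproduce in Python. This sketch uses made-up endpoint and parameter names, treats the User-Agent header as one extra insertion point per endpoint, and picks three vulnerability classes purely as an example.

```python
from itertools import product

# Hypothetical application: 500 endpoints with 5 parameters each.
endpoints = [f"/endpoint/{i}" for i in range(500)]
parameters = [f"param{j}" for j in range(5)]

insertion_points = list(product(endpoints, parameters))
print(len(insertion_points))  # 500 * 5

# Adding a single header per endpoint grows the space by another 500.
headers = ["User-Agent"]
with_headers = list(product(endpoints, parameters + headers))
print(len(with_headers))  # 500 * 6

# Testing each insertion point for each vulnerability class multiplies again.
vuln_classes = ["sqli", "xss", "ssrf"]
total_checks = len(with_headers) * len(vuln_classes)
print(total_checks)
```

Three classes and one header already give 9000 checks, which is why nobody does this by hand for long.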

Going back to those 2500 insertion points: realistically, it’s not feasible to test for every vulnerability type, so you will do your best to test to an “acceptable standard”. An exhaustive test of an application would involve at least the following for a majority of these parameters:

A manual review. Like, put quotes and see if it breaks in a big way. Is the input reflected in the response? If so: is it escaped? If not: would the Content-Type header allow for reflected XSS attacks? Is it an identifier? Can you replace that identifier with another user’s identifier? Etcetera.
Manual fuzzing. Send a list of known payloads using a tool like Burp Intruder. Because each parameter may need specific encoding, make sure you have configured that correctly. For each parameter, inspect the results. Did your authentication expire while you were fuzzing? Redo it. CSRF tokens? You need to account for those.
Automated fuzzing. Run that through Burp’s automated scanner and Backslash Powered Scanner, and see if it finds a bug. Is the scanner scanning the logout page and expiring your session? Redo the whole scan. Is it dealing with CSRF tokens? No? You need to deal with that.
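
The “is it reflected, and is it escaped?” question from the manual review is a good example of a check that automates well. Here is a minimal Python sketch; classify_reflection and the probe format are my own illustrative choices, and it only looks for HTML entity escaping as done by html.escape, not every encoding a real application might apply.

```python
import html
import secrets


def make_probe() -> str:
    """Random marker plus a few characters that HTML escaping would change."""
    return f"zz{secrets.token_hex(4)}'\"<x>"


def classify_reflection(body: str, probe: str) -> str:
    """Classify how a probe value shows up in a response body."""
    if probe in body:
        return "reflected-raw"  # special characters came back untouched
    if html.escape(probe) in body:
        return "reflected-escaped"  # came back, but HTML-encoded
    return "not-reflected"


probe = "zzdeadbeef'\"<x>"  # fixed probe so the example is repeatable
raw_page = f"<p>You searched for {probe}</p>"
escaped_page = f"<p>You searched for {html.escape(probe)}</p>"
other_page = "<p>No results</p>"

for page in (raw_page, escaped_page, other_page):
    print(classify_reflection(page, probe))
```

A raw reflection is only a lead; whether it is exploitable still depends on context and the Content-Type header, as noted above.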

This is mind-numbingly boring and extremely error-prone. Initially, I was attracted to the challenge of finding bugs in itself; but once you’ve found hundreds, or maybe thousands of bugs, the thrill of finding them is rather short-lived and provides a poor motivator. Additionally, finding an RCE loses its luster if what you’re hacking is the technological equivalent of Swiss cheese.

Moving forward from this

All of this is extremely tedious. As a result, hackers build tools to automate these tasks, but the problem space is very large and no tool can do this effectively in all cases, so you will definitely still have to do a big, boring portion of the tasks manually.

In the case of pentesters this is definitely true. You are paid to test to an “acceptable standard” and no automated tool could ever be considered to meet that. In my case however, because I do bug bounties, the bugs I miss are irrelevant, and nobody realistically expects every bug bounty hunter to find every bug. There is even a debate about the effectiveness of bug bounties at all, mind you.

But to the point, the bugs I miss don’t matter, only the ones I find and report.

Another issue which is relevant to bug bounty hunters and not pentesters is time to find and time to report. When I find a bug and somebody else found it first, I get $0 and they get $200. If I find it first the situation is reversed, so there is a big incentive to push hard and fast to be the early bird. For personal reasons, I am frequently unable to spend a lot of time hacking, so at events and the like, younger hackers with more free time tend to find and report the bugs first.

Due to these reasons and many others, I found myself automating more and more of my workflow. I have gone through several iterations and eventually found an architecture that I personally quite like, which I wanted to share with others because I think from a software development perspective it is quite interesting, and I think more collaboration in this space will result in a better outcome for everybody.

Generally in the bug bounty community there’s a certain secrecy. It is not entirely irrational, because for every bit of information you give away there is a possibility it will be used to snatch a bug from you. Personally I think that attitude, while functional at some level, leads to very poor outcomes for the industry as a whole, and we should be negotiating collectively for better outcomes for all bug bounty hunters.

A bit of background information

Some constraints and characteristics underlie your thinking whenever you write software to identify or automatically exploit software vulnerabilities. To make a long story short, people don’t like it when you hack them, and people don’t want to be complicit in hacking other people.

In particular, ISPs don’t like it when you use their servers to hack other people’s servers, and they may ban your account over it, even if you have permission from the target. If you have a database of findings, and an ISP bans your account for breaching their terms of service, they may or may not return a copy of your data. This could be a bummer if you have a bunch of bugs in there, and it means you need to bring your infrastructure back up. If your infrastructure was held together by a couple of wires and duct tape, you need to do all of that again. This is why reproducible infrastructure is so important.

Another aspect is that bug bounty programs don’t like it when people run automated tasks against them and may rate-limit your connections or outright ban your IP. This is good, because it means that if you make a mistake, they’ll ban your IP instead of you knocking their site offline, but it means you get false negatives.

Another thing to consider is that hosting is very expensive, and hacking can be very computationally expensive. In particular, if you are hacking a lot of things at the same time you need a lot of CPU, a lot of RAM and a lot of bandwidth. If you are considering storing requests and responses for later analysis you need lots of hard disk space.

A major pain point for me is that most of the time all that hardware is sitting idle. Sure, you may hack for three hours a day during the week and you need results as fast as possible then, but the rest of the time all that sweet computing power is wasted and your VPS provider is profiting off of your inactivity. So I wanted to create an architecture that takes that into account.

Automating the boredom away

When I decided to automate my bug bounty process, I had the following principles in mind:

High quality bugs only: I’m only interested in bugs which have a high impact. RCE, SQLi, SSRF, XSS, content injection. No “missing header,” outdated JS library type things.
Fully automated: should require minimal input from me. I don’t want to be hacking a website ever again. Everything that can be automated should be automated. Things that can’t be automated shouldn’t be automated obviously, like logging in to websites and the like.
Authenticated scanning only. There are enough people trying to perform unauthenticated scans already.
Reproducible infrastructure. As mentioned before, we want to be able to seamlessly bring up and down servers using SaltStack.
Not particularly sophisticated: It should employ simple techniques proven to find lots of bugs across a large number of websites in a repeatable manner. Think Backslash Powered Scanner and shelling combined into one, minus Burp.
Distributed: workloads should be distributable across multiple worker processes that can be spun up and shut down.

Because hardware is expensive, I want to be able to make use of the hardware I have at home, which is several gaming PCs and a couple of gaming laptops. If I make bank, I want to migrate the thing to the cloud entirely.

I identified the need for the following software components:

An HTTP proxy server that distributes requests across a number of worker processes. This allows my fuzzer and crawler components to send requests to a standard HTTP proxy.
Workers. These are in charge of sending requests to their final destination and returning any response data. The worker processes could be run on several hosts and source IP addresses.
Crawlers. These are in charge of authenticating against the target websites and performing crawls. We’ve moved on from web 1.0 so these need to be browser-based, using Playwright.
Fuzzers. These authenticate requests and perform injection-based attacks as well as pingback-based attacks similar to Burp Collaborator.
Database. Store credentials, bugs, scope and request/response data in a database.
A login manager. We need to reliably authenticate to the target systems for crawling and fuzzing.
Pingback DNS listener. A utility that listens for DNS pingbacks, correlates them to a specific request, parameter and vulnerability type and stores it in the database.

Status of the project

I have been working on this for a couple of months, maybe half a year part-time. It’s been really challenging in several respects, mainly because of its scope. Each of these components has led me to learn new technologies and to interface with technologies I already knew in different ways. For example, creating the proxy led me to learn RabbitMQ for distributed workload management. Additionally, I created a mitmproxy plugin, which I had done before, but this plugin interfaces with mitmproxy in a way that I hadn’t personally tried, and that probably nobody in their right mind should, as it is pretty far out.

I currently have fully functional versions of the proxy server and worker in Python, and of the web crawler, which interfaces with Playwright, in TypeScript. The fuzzer and pingback DNS listener are also implemented in Python, and the database is maintained using SQLAlchemy and PostgreSQL. I am about to start working on the orchestration, which is in charge of feeding RabbitMQ the appropriate load for the available hardware, and I am looking to distribute the crawlers so that I can make full use of the hardware at my disposal. I also need to crawl bug bounty sites through their API.

Distributed HTTP proxy

I mentioned earlier several requirements that our solution should meet. In particular, we need to be able to rotate IPs and change hosts rapidly in the event of a ban. I decided to expose this functionality through an HTTP proxy, implemented in the form of a mitmproxy addon. Because most security tools can talk to an HTTP proxy, this provides interoperability with other tools such as Burp if needed.

Here is a diagram for what I already have in place, which works OK.

My plan involves two ISPs, and looks as follows:

For now, ISP 1 is my home’s ISP, but I could change that if needed. This split is useful because any “malicious” traffic originates from ISP 2, so if I get banned, only my workers are affected and no data is lost.

I was originally concerned about the delays this could introduce in terms of message throughput. I discovered that RabbitMQ is very fast and that mitmproxy does not introduce any significant delays either. From my analysis, approximately 99% of the delay seems to come from my very slow, non-optimized Python code. Here’s what I implemented. This follows the RabbitMQ RPC pattern:
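For anyone who hasn’t used it, the RPC pattern boils down to tagging each request with a correlation ID and a reply queue, so that the caller can match up the response that eventually comes back. The sketch below is a generic illustration of that idea and not the project’s code: plain in-process queues stand in for RabbitMQ, the worker only pretends to fetch the URL, and the names `request_queue`, `worker` and `rpc_call` are made up for the example.

```python
import queue
import threading
import uuid

# Stand-in for the shared RabbitMQ request queue (assumption: in-process only).
request_queue = queue.Queue()


def worker():
    """Consume requests and reply on the queue named in each message."""
    while True:
        message = request_queue.get()
        if message is None:  # sentinel used to stop the worker
            break
        # Placeholder for sending the real HTTP request to its destination.
        body = "fetched " + message["url"]
        message["reply_to"].put(
            {"correlation_id": message["correlation_id"], "body": body}
        )


def rpc_call(url, timeout=5.0):
    """Publish a request, then block until the matching reply arrives."""
    correlation_id = str(uuid.uuid4())
    reply_queue = queue.Queue()  # plays the role of an exclusive callback queue
    request_queue.put(
        {"url": url, "correlation_id": correlation_id, "reply_to": reply_queue}
    )
    while True:
        reply = reply_queue.get(timeout=timeout)  # raises queue.Empty on timeout
        if reply["correlation_id"] == correlation_id:
            return reply["body"]
        # Ignore replies that do not belong to this call.
```

To try it, start `worker` in a `threading.Thread`, call `rpc_call` with a URL, and put `None` on `request_queue` to stop the worker. Starting more worker threads spreads the calls across them without any change on the caller’s side.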

The proxy is also in charge of writing data to the database. I implemented this using Python’s synchronized queue class, together with database write batching, so that the threads in charge of responding to users are not slowed down by the database integration. I published the source code for this component here.
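As a rough idea of how a synchronized queue and write batching fit together, here is a minimal sketch. It is not the published component: `BatchingWriter` and its `flush` callback are hypothetical names, and `flush` stands in for whatever actually performs the bulk insert (for example a SQLAlchemy session).

```python
import queue
import threading

_STOP = object()  # sentinel telling the background thread to finish


class BatchingWriter:
    """Drains a queue on a background thread and writes rows in batches."""

    def __init__(self, flush, batch_size=100, max_wait=0.5):
        self._queue = queue.Queue()
        self._flush = flush          # callable taking a list of rows (hypothetical)
        self._batch_size = batch_size
        self._max_wait = max_wait    # seconds to wait before flushing a partial batch
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, row):
        """Called by request-handling threads; returns immediately."""
        self._queue.put(row)

    def close(self):
        """Flush whatever is left and stop the background thread."""
        self._queue.put(_STOP)
        self._thread.join()

    def _run(self):
        batch = []
        while True:
            try:
                item = self._queue.get(timeout=self._max_wait)
            except queue.Empty:
                # No new rows for a while: write out the partial batch.
                if batch:
                    self._flush(batch)
                    batch = []
                continue
            if item is _STOP:
                break
            batch.append(item)
            if len(batch) >= self._batch_size:
                self._flush(batch)
                batch = []
        if batch:
            self._flush(batch)
```

The request threads only ever pay for a `put` on the queue, and the database sees one write per batch instead of one per request.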

Crawler

When I started working on this project, one of the biggest concerns I had was the proliferation of JavaScript single-page applications. These cannot be crawled with regular crawlers such as Scrapy, and are generally a pain. Burp for example has functionality to crawl these in theory but I have personally found that it is a little bit fickle and painfully slow.

I looked into various options, such as Puppeteer, some paid crawlers and similar. In the end I decided to implement this functionality using Playwright, a modern testing library very similar to Selenium in ye olde times. It is actually remarkably simple to configure, and my crawling strategy is very simple for now. For each page:

Log in.
Middle-click all links.
Click all links while blocking navigation.
Fill and submit all forms present on the page.
Close the browser.

This results in whatever is opened being stored in the database through the proxy. On the next iteration these newly discovered pages are crawled, and any pages they reveal are stored in the database in turn. The main advantage of this approach is that it is stateless. Each crawler process deals with a single page and then stores the results in the database, which allows me to scale this to the amount of hardware I have.
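The iteration described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: a plain dict stands in for the database, and `crawl_page` is a hypothetical callable that stands in for a browser-based crawl of one page and returns the URLs it discovered.

```python
def crawl_iteration(store, crawl_page):
    """Run one pass: crawl every URL not yet crawled and record what it reveals.

    store: dict mapping URL -> bool (True once crawled); stands in for the database.
    crawl_page: callable taking a URL and returning the URLs it discovered.
    Returns the number of pages crawled in this pass.
    """
    pending = [url for url, crawled in store.items() if not crawled]
    for url in pending:
        # Each call is independent, so the pending URLs could just as well be
        # handed out to separate crawler processes.
        discovered = crawl_page(url)
        store[url] = True
        for new_url in discovered:
            store.setdefault(new_url, False)  # picked up on the next iteration
    return len(pending)
```

Calling `crawl_iteration` repeatedly until it returns 0 walks the whole reachable site, and because nothing is kept between calls other than the store, resetting the flags is all it takes to re-crawl everything with a newer crawler.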

I considered other approaches highlighted in various whitepapers I didn’t read but decided against them in principle because they need to keep a complete state of the navigation, for example, or had other disadvantages presumably. Obviously, all components in this solution can be updated and changed in the future. The good thing about my stateless approach is that if I implement a new change, then I can simply re-run the crawler with the new functionality on all URLs stored in the database.

Authentication is performed using Playwright’s “code generator”, which creates a login script that stores the session state in a Playwright state file.

Fuzzer

Creating a fuzzer is quite challenging. Web application vulnerabilities are quite diverse in how you can detect them, and every payload you add is going to be useless most of the time because most inputs are not vulnerable. In my experience, traditional vulnerability scanners like Burp’s active scan are relatively good at finding bugs, but the amount of traffic they generate means that you cannot feasibly scan all endpoints.

In the last few years PortSwigger has come up with a new technique in a Burp addon named Backslash Powered Scanner (BPS). BPS is capable of detecting a wide range of bugs while sending far fewer requests, through two techniques: diffing scan and transformation scan. I have currently implemented the diffing scan, albeit with my special touch. Transformation scans are very powerful, and I will certainly look at implementing them in the future.
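To give a flavour of the diffing idea, here is a loose sketch. It is neither BPS’s logic nor my own variant, just the general shape: send a probe that should break the surrounding syntax and one that should not, and flag the input only if the responses differ consistently. The `send` callable, the `fingerprint` attributes and the suffixes in the usage note are all assumptions made for the example.

```python
def fingerprint(response):
    """Reduce a response to a few coarse attributes that are cheap to compare."""
    status, body = response
    return (status, len(body.split()), "error" in body.lower())


def diffing_probe(send, base_value, break_suffix, escape_suffix, repeats=3):
    """Return True if the 'break' probe consistently differs from the 'escape' probe.

    send: callable taking the parameter value and returning (status, body).
    """
    for _ in range(repeats):
        broken = fingerprint(send(base_value + break_suffix))
        escaped = fingerprint(send(base_value + escape_suffix))
        if broken == escaped:
            # No consistent difference: the input is probably not interpreted.
            return False
    return True
```

For a string context one might call `diffing_probe(send, "abc", "'", "''")`: a lone quote breaks a quoted string while a doubled quote is harmless, so a consistent difference between the two hints that the input reaches an interpreter. Two requests per attempt is a lot cheaper than a full payload list.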

Another addon which I favour for bug bounty is SHELLING. Its author created a series of very good, realistic test cases for remote code execution that could potentially be missed by traditional active scanning due to character blacklisting or whitelisting. I think this approach is very valuable and thorough, but the number of payloads it generates is very large. I made use of the test cases stored within that library and implemented my own detection via a DNS pingback. Since I already had DNS pingback detection, I added an SSRF detector, as well as an XSS detector which could be triggered by the crawler in the case of stored XSS and picked up by XSS Hunter.

The pingback detection takes the form of a DNS listener which parses the domains it is queried for. An example pingback may look as follows:

xx....mydomain.com

Based on this information we can store the pingback as a finding in the database. The architecture for the fuzzer is also stateless and makes use of the crawler for logging in to the website and fetching login cookies. Integrating the login process into each fuzz request allows us to ensure that we are fuzzing a logged-in page, and not a session-expired page. This is a big problem in Burp, where most of your requests will be logged out by the time the active scanner reaches them. You can of course work around this in Burp with the cookie jar, macros and so on.
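Going back to the pingback hostname, the correlation can be done by packing the request, parameter and vulnerability type into DNS labels and unpacking them when the query arrives. The sketch below uses a hypothetical label layout that I am making up purely for illustration; it is not the format shown in the example above, and `build_pingback_host` and `parse_pingback_host` are invented names.

```python
BASE_DOMAIN = "mydomain.com"  # the domain used in the example above


def build_pingback_host(request_id, parameter, vuln_type):
    """Encode the correlation data as DNS labels (hypothetical layout)."""
    # Hex-encode the parameter name so it survives DNS's case-insensitivity
    # and restricted character set. Labels are limited to 63 characters, so
    # very long parameter names would need a different scheme.
    param_label = parameter.encode("utf-8").hex()
    return f"{vuln_type}.{param_label}.{request_id}.{BASE_DOMAIN}"


def parse_pingback_host(qname):
    """Recover (request_id, parameter, vuln_type) from a queried name, or None."""
    name = qname.rstrip(".").lower()
    suffix = "." + BASE_DOMAIN
    if not name.endswith(suffix):
        return None
    labels = name[: -len(suffix)].split(".")
    if len(labels) != 3:
        return None
    vuln_type, param_label, request_id = labels
    try:
        parameter = bytes.fromhex(param_label).decode("utf-8")
        return int(request_id), parameter, vuln_type
    except ValueError:
        # Covers bad hex, undecodable bytes and non-numeric request ids.
        return None
```

The fuzzer would embed the result of `build_pingback_host` in its payloads, and the DNS listener would run `parse_pingback_host` on every name it is asked to resolve, ignoring anything that returns `None`.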

In conclusion

As well as completely shattering the impression that hackers have a meaningful and fulfilling life, in this blog post I have given a high-level overview of this cool project I’m working on. It’s still early days for the project, so there’s a lot to flesh out and lots of decisions to make, but I’m hoping that people find it interesting, and I am keen to hear if people have any ideas regarding automated hacking architecture and design.

Additionally, because of the stateless nature of the scanning I can in the future perform improved crawling or fuzzing using new techniques as they pop up. We’ll see.

I’ve also glossed over certain aspects of the design, such as the interaction between the ISPs or the database design. These are interesting details that I also put a lot of care into. Another aspect of the project that I’m interested in sharing is unit testing and integration testing, both of which I am a big fan of.