Tips to get faster results with a data archive search engine

by SirHugs - 24 February, 2022 - 08:22 PM
UberFuck  #2
Hate to respond with questions, but could you elaborate on what you mean by "data search archive"?  What kind of data are you looking at?  What technology stack are you currently using?  Also, is this a private project (where it's only going to be run on your hardware)?

In general... something to consider when reading large files is that the data has to be read from disk into memory before you can perform any operations on it. That's why you now see database servers with terabytes of RAM: if they can keep their most-used datasets in memory, they never have to wait on disk read operations. If you can, lower the disk IO as much as possible.
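
To make that concrete, here is a minimal sketch (not from the thread) of the keep-it-in-memory idea, assuming a Node.js stack since that comes up later in the thread; the file name is a placeholder:

Code:
// Sketch: read the file into memory once at startup, then serve repeated
// searches from RAM instead of hitting the disk on every query.
// Assumes the file fits comfortably in memory; "db.txt" is a placeholder.
import { readFileSync } from "fs";

const lines = readFileSync("db.txt", "utf8").split("\n"); // one disk read

function search(needle: string): string[] {
  return lines.filter((l) => l.includes(needle)); // pure in-memory scan
}

console.log(search("username").length, "matches");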
215B5D  #3
I presume by "good sh file" they mean a shell script that takes some arguments and may look a little like this:
 
Code:
grep <string> <file>  # e.g. grep 'username' db.txt (no need to pipe cat into grep)

There are some good programs for searching through DBs; one I recommend is ripgrep (written in Rust).
However, it'd probably be more beneficial & lightweight to write your own; try C / C++ for it :)

For communicating between your webserver & the C / C++ program, look into IPC (Inter-Process Communication)
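
One common IPC route, sketched here under the assumption (borne out later in the thread) that the webserver is Node.js: spawn the search program as a child process and collect its stdout. The binary name, file name, and query are placeholders.

Code:
// Sketch: a Node.js webserver handing a search off to an external
// program (ripgrep here; a custom C/C++ binary is invoked the same way)
// and collecting its stdout. "db.txt" and the query are placeholders.
import { spawn } from "child_process";

function search(query: string): Promise<string> {
  return new Promise((resolve, reject) => {
    // -F = fixed-string match (no regex), safer with user-supplied input
    const rg = spawn("rg", ["-F", query, "db.txt"]);
    let out = "";
    rg.stdout.on("data", (chunk: Buffer) => (out += chunk.toString()));
    rg.on("error", reject);             // e.g. the binary isn't installed
    rg.on("close", () => resolve(out)); // rg exits 1 on "no match", still fine
  });
}

search("username").then(console.log);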
UberFuck  #5
(25 February, 2022 - 05:51 AM)hugging Wrote:
And jeez yeah idk how expensive terabytes of ram is gonna be lmfao, but yeah thank you for the advice with lowering the disk IO.

Lol, yeah... in my last job I had a few database servers that cost over $100k each for just the hardware. I wasn't saying you need that, just making the point that you want to reduce disk IO (and definitely network IO if you're not using DAS, i.e. direct-attached storage).

Not sure if we are talking apples to apples though. It almost sounds like he's reading directly from a plain text file (0.7s for a 30M file)... not from a database service (ie Microsoft SQL, Oracle, MySQL, Postgres, MongoDB). You should be able to parse a combo file and import it into a database table, and once it's in the table, create indexes on the username and password columns for querying.
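
For illustration, a rough sketch of the indexing step, assuming MongoDB since it comes up later in the thread; the database, collection, and field names are placeholders:

Code:
// Sketch: after importing combo lines into a collection, index the
// fields you query on so lookups don't scan every document.
// "leaks", "combos", and the field names are placeholders.
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const combos = client.db("leaks").collection("combos");

// One index per queried field: lookups on username or password become
// index seeks instead of full collection scans.
await combos.createIndex({ username: 1 });
await combos.createIndex({ password: 1 });
await client.close();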

I would recommend looking at the code for existing projects for the Breach Compilation leaks (ie Collection #1, Collection #2, etc). Here are a few I found in a couple minutes of googling:
https://www.tevora.com/threat-blog/diy-l...redential/
https://github.com/sensepost/Frack
https://github.com/petercunha/skidloader
UberFuck  #7
Glad I could help.
 
(26 February, 2022 - 01:43 AM)hugging Wrote:
Also he wanted me to ask this, if you have an answer to it "does reading the file in batches (like 100 lines per batch for example ) into memory then doing the necessary operations have any performance advantage over just reading the file line by line?"

Generally yes, reading in large chunks will perform better, but a lot depends on what programming language you're using, which libraries you're reading with, and what operations you perform on each line (ie between reads). The only way to really tell which approach will perform best for you is to try different methods and benchmark them. I'm not a nodejs developer, so I don't have any recommendations on methods or third-party libraries to try.
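
To make the benchmark advice concrete, here is a rough Node.js sketch (not from the thread) that times line-by-line reading against large-chunk reading; the file name and chunk size are placeholders:

Code:
// Sketch: time line-by-line reading vs. large-chunk reading of the
// same file. "db.txt" and the 8 MiB chunk size are placeholders.
import { createReadStream } from "fs";
import { createInterface } from "readline";

async function lineByLine(path: string): Promise<number> {
  let lines = 0;
  const rl = createInterface({ input: createReadStream(path) });
  for await (const _line of rl) lines++; // per-line work would go here
  return lines;
}

async function bigChunks(path: string): Promise<number> {
  let lines = 0;
  const stream = createReadStream(path, { highWaterMark: 8 * 1024 * 1024 });
  for await (const chunk of stream) {
    for (const byte of chunk as Buffer) if (byte === 10) lines++; // count '\n'
  }
  return lines;
}

let t = Date.now();
console.log("line-by-line:", await lineByLine("db.txt"), "lines,", Date.now() - t, "ms");
t = Date.now();
console.log("8 MiB chunks:", await bigChunks("db.txt"), "lines,", Date.now() - t, "ms");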

For inserting into MongoDB you want to use bulk inserts, or use the mongoimport utility if you can (depending on the format of the text files). Avoid inserting/updating/upserting one row at a time (aka RBAR, "row by agonizing row").
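
As a rough illustration of the bulk-insert approach with the Node.js MongoDB driver (the collection, field names, and batch size are placeholders, not from the thread):

Code:
// Sketch: buffer parsed "user:pass" lines and insert them in batches
// instead of one document at a time. Names and batch size are placeholders.
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const combos = client.db("leaks").collection("combos");

const BATCH = 10_000;
let batch: { username: string; password: string }[] = [];

async function flush() {
  if (batch.length === 0) return;
  await combos.insertMany(batch, { ordered: false }); // one round trip per batch
  batch = [];
}

// Feed parsed lines in from your file reader, then call flush() once at the end.
async function addLine(line: string) {
  const i = line.indexOf(":");
  if (i < 0) return; // skip malformed lines
  batch.push({ username: line.slice(0, i), password: line.slice(i + 1) });
  if (batch.length >= BATCH) await flush();
}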
