Tuesday, 29 October 2019

bittorrent - How do torrent clients reassemble and store pieces?


I was wondering how the pieces downloaded by torrent clients are stored and reassembled. Do they use metadata? It seems not, since half-finished files can still be played. So basically I'm asking how the pieces in the downloaded file are organized: is it just first to last, or are there buffer spaces in between?



Answer



Welcome to the wonderful world of Torrents! There are a few pieces that comprise the BitTorrent protocol: you have your file, legalthing.iso, and you want to distribute it to as many people as possible. So you create a "torrent" file, which describes legalthing.iso, and you distribute the torrent file through a website, or any other way you like. The torrent file usually points to a "tracker", which is a server that introduces the participants to each other: "seeds" (users with the whole legalthing.iso file already) and "peers" who are still downloading it (often called "leechers"). As the person holding the original file, you act as the first seed.


Getting closer to your question now. The file itself, legalthing.iso, is split into pieces and each piece is cryptographically hashed, so that each person who reads the torrent file and begins downloading legalthing.iso can check each piece against its hash and be sure they're not downloading a piece that's been modified from the original. Pieces that fail the hash check are discarded and downloaded again.


Now pretend you're a computer downloading a file using BitTorrent. Pieces do not arrive in order, from first to last. A client typically picks pieces in one of two ways: either it downloads random pieces of the file, or it downloads the rarest pieces first (the ones held by the fewest peers it can see). This latter approach increases the overall "health" (availability) of the torrent.
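To make "rarest first" a bit more concrete, here is a minimal sketch of the idea, not what any particular client actually does. The function name `pick_rarest_piece` and the `peer_pieces` mapping (peer name to the set of piece indices that peer has) are made up for illustration.

```python
import random
from collections import Counter


def pick_rarest_piece(have, peer_pieces, rng=random):
    """Return the index of a piece we still need that the fewest peers hold.

    have        -- set of piece indices we already downloaded and verified
    peer_pieces -- hypothetical mapping: peer name -> set of piece indices
    Returns None when no connected peer has a piece we are missing.
    """
    # Count how many peers hold each piece.
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)

    # Only pieces we lack, and that somebody can actually send us.
    candidates = [index for index in availability if index not in have]
    if not candidates:
        return None

    # Keep the least-available pieces; break ties randomly.
    lowest = min(availability[index] for index in candidates)
    rarest = [index for index in candidates if availability[index] == lowest]
    return rng.choice(rarest)
```

With peers `{"a": {0, 1, 2}, "b": {1, 2}, "c": {2, 3}}` and piece 0 already downloaded, piece 3 is the one to ask for, since only one peer has it.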


So what's in the actual torrent file? It varies based on the client used to make it, but generally it contains an "announce" section, which is the address of the tracker you're using, and a big list of hashes, one for each piece of the file you want to download. Every piece has the same size (32 KB, 512 KB, 4 MB, really any size the torrent's creator picked), except that the last piece may be shorter, and the pieces are listed in order. Every time a peer gets a piece, it computes the SHA-1 hash of that piece and compares it with the hash listed in the torrent file. That's how it figures out the pieces are good.
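The hash check can be sketched roughly as follows. In the torrent file the piece hashes are stored as one long byte string of 20-byte SHA-1 digests back to back; the helper names `split_piece_hashes` and `piece_is_valid` are hypothetical, and parsing the torrent file itself is left out.

```python
import hashlib

SHA1_SIZE = 20  # a SHA-1 digest is always 20 bytes


def split_piece_hashes(pieces_blob):
    """Split the concatenated digests into a list, one entry per piece."""
    if len(pieces_blob) % SHA1_SIZE != 0:
        raise ValueError("hash list length is not a multiple of 20 bytes")
    return [pieces_blob[i:i + SHA1_SIZE]
            for i in range(0, len(pieces_blob), SHA1_SIZE)]


def piece_is_valid(index, data, piece_hashes):
    """Hash the downloaded piece and compare it with the expected digest."""
    return hashlib.sha1(data).digest() == piece_hashes[index]
```

A piece that fails `piece_is_valid` would simply be thrown away and requested again, possibly from a different peer.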


Since the torrent file lists each piece of the file you're downloading, every time your client successfully downloads a piece and verifies its hash, it writes the piece to the correct position within the file on disk. Because the pieces have a fixed size and a known order, piece number N (counting from zero) starts at byte N times the piece size, so no buffer space or extra bookkeeping inside the file is needed. That's why, if you download a 1 GB file, many clients will set aside an empty 1 GB block of space on your disk up front to accommodate the torrent pieces you'll be downloading.
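Here is a rough sketch of that "write it where it belongs" step, assuming a torrent that contains a single file. The function names are invented for the example, and real clients differ in how (and whether) they reserve the space in advance.

```python
def preallocate(path, total_size):
    """Create the output file at its final size before any piece arrives."""
    with open(path, "wb") as f:
        f.truncate(total_size)


def write_piece(path, index, piece_length, data):
    """Write one verified piece at its final position in the file.

    Piece `index` (counting from zero) starts at byte index * piece_length,
    so pieces can be written in any order.
    """
    with open(path, "r+b") as f:
        f.seek(index * piece_length)
        f.write(data)
```

For example, with a piece size of 4 bytes, writing piece 2 first and piece 0 later still leaves each one at the right offset (bytes 8 onward and bytes 0 to 3), with a not-yet-downloaded gap in between.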


Now, some video players and other file viewers can deal with "corrupted" files. Of course, a half-downloaded torrent is not corrupted, but it is missing pieces, and to a program like VLC it just looks broken. So VLC will do the best it can to play whatever data it can find, and that's why such files can be played while only partially downloaded.


There are lots more complicated aspects (search for DHT, torrent write buffering, all that fun stuff), but those are the basics of how BitTorrent works.

