Results 11 to 20 of 21

Thread: Data compression Question?

  1. #11
    Diamond Member AndyD's Avatar
    Join Date
    Jan 2010
    Location
    Cape Town
    Posts
    4,924
    Thanks
    576
    Thanked 934 Times in 755 Posts
    Quote Originally Posted by tec0 View Post
    ...AVI can be made smaller if you play with the quality and so on but other than that it will not “compress” at all. Renaming an *.Mp3 or *.AVI to *.TXT doesn’t help much because I tested it and I had no improvement.
    AVI is a container whose contents (audio and video streams) are already compressed. If you alter the quality of an AVI file you are re-encoding the video with a codec (such as Xvid) at a lower bitrate. This isn't compression, it's sacrificing quality for file size; it's a trade-off. Changing the file extension doesn't help: the bytes inside the file are unchanged, and many programs identify a file by its header rather than by its name.
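
    To illustrate why already-compressed content barely shrinks a second time, here is a minimal Python sketch. It is not from the original post: the word list and sizes are made up, and zlib (Deflate) simply stands in for whatever archiver you use.

```python
import random
import zlib

# Hypothetical stand-in for an uncompressed file: pseudo-text built from a
# small vocabulary, so there is plenty of repetition for Deflate to find.
rng = random.Random(0)
words = ["backup", "video", "audio", "stream", "header", "codec", "frame", "data"]
original = " ".join(rng.choice(words) for _ in range(20_000)).encode()

once = zlib.compress(original, 9)   # first pass removes the redundancy
twice = zlib.compress(once, 9)      # second pass has almost nothing left to find

print(len(original), len(once), len(twice))
```

    The first pass should shrink the data a lot, while the second pass should leave it at roughly the same size. Renaming the file changes none of these bytes, so it cannot change the result either.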


    Quote Originally Posted by tec0 View Post
    On average, commercially available *.ISO compressors will slim an *.ISO down by 100 MB or so. That said, I found some *.ISO files that slim down by about 2 GB, so again it is a question of what files are being compressed.
    It would be possible for an ISO 'compression' program to search the contents of an ISO for duplicate files. When it finds them it could delete one and replace it with a 'marker' that points to the other identical file. The problem is that the result would no longer conform to the ISO image standard, so the ISO would need to be rebuilt by the same program at the other end after it was sent to a recipient.
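
    A rough sketch of that duplicate-file idea in Python. Everything here is hypothetical: the function names and the in-memory dictionaries are only for illustration and say nothing about how any real ISO tool works.

```python
import hashlib

def deduplicate(files):
    """Store each distinct file content once and keep a 'marker' per file name."""
    store = {}   # content hash -> bytes, one copy per distinct content
    index = {}   # file name -> content hash (the 'marker')
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)
        index[name] = digest
    return store, index

def rebuild(store, index):
    """Recreate every file, duplicates included, from the store and the markers."""
    return {name: store[digest] for name, digest in index.items()}

# Hypothetical image contents: two of the three files are identical.
files = {
    "setup/driver.dll": b"\x00" * 1000 + b"driver",
    "extras/driver.dll": b"\x00" * 1000 + b"driver",
    "readme.txt": b"Install notes",
}
store, index = deduplicate(files)
print(len(files), "files,", len(store), "distinct contents")
```

    The rebuild step is the catch mentioned above: the recipient needs the same program to turn the markers back into real files.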

  2. #12
    Diamond Member tec0's Avatar
    Join Date
    Jun 2009
    Location
    South Africa
    Posts
    4,624
    Thanks
    1,884
    Thanked 463 Times in 410 Posts
    Blog Entries
    3
    Lossless is basically the main goal, as it will decompress to the original “if not corrupted”. That said, I can’t help thinking that maybe a combination of lossless and “lossy” is at work here. When you think about it, one realises that audio and video data that is “lossy compressed” becomes much smaller, and that audio and video will survive better than an executable or data file.
    peace is a state of mind
    Disclaimer: everything written by me can be considered as fictional.

  3. #13
    Diamond Member AndyD's Avatar
    Join Date
    Jan 2010
    Location
    Cape Town
    Posts
    4,924
    Thanks
    576
    Thanked 934 Times in 755 Posts
    As far as I'm aware compression is lossless by definition: you must be able to decompress back to a true copy of the original. If it's lossy it's not compression, it's encoding or re-encoding, and the result can never be returned to the original.

    When you think about it, one realises that audio and video data that is “lossy compressed” becomes much smaller, and that audio and video will survive better than an executable or data file.
    I don't understand what you're saying here.

  4. #14
    Diamond Member tec0's Avatar
    Join Date
    Jun 2009
    Location
    South Africa
    Posts
    4,624
    Thanks
    1,884
    Thanked 463 Times in 410 Posts
    Blog Entries
    3
    Sorry AndyD, I was thinking of the “lossy methods” as they are used for compressing sound and video files. Basically it comes down to tampering with quality, along with the knowledge that a video file doesn’t corrupt easily (I am sure you have seen an AVI that had a glitch but was still playable). A “program” or “data file”, on the other hand, will not survive a glitch and will corrupt: “data string errors”, that kind of thing.

    Example: let's say your backup consists of both data and video; then it “may well be possible” that the data was compressed using “lossless” and the video was compressed using “lossy methods”. Well, it is just a theory really.
    peace is a state of mind
    Disclaimer: everything written by me can be considered as fictional.

  5. #15
    Diamond Member AndyD's Avatar
    Join Date
    Jan 2010
    Location
    Cape Town
    Posts
    4,924
    Thanks
    576
    Thanked 934 Times in 755 Posts
    An AVI with minor corruption might still play, depending on the compensation techniques built into the media player you're using. There would still be a chunk missing or some video degradation. A program that's corrupted won't execute correctly, so an error message (or bluescreen) would be the result. A data file that's corrupted will cause an error message in the program that tries to open it.
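
    A small Python sketch of the data-file case, using a zlib stream as the stand-in "data file" (an assumption for illustration; the records and the helper name opens_cleanly are made up). zlib streams carry an Adler-32 checksum, so damage is normally reported as an error rather than silently accepted.

```python
import zlib

def opens_cleanly(blob: bytes, expected: bytes) -> bool:
    """Return True only if blob decompresses to exactly the expected bytes."""
    try:
        return zlib.decompress(blob) == expected
    except zlib.error:
        # Raised for an invalid stream or a failed Adler-32 integrity check.
        return False

# Hypothetical data file: a few hundred short records.
original = "".join(
    f"record {i:04d}: value {i * 37 % 101}\n" for i in range(400)
).encode()
packed = zlib.compress(original)

damaged = bytearray(packed)
damaged[len(damaged) // 2] ^= 0xFF   # flip every bit of one byte mid-stream
damaged = bytes(damaged)

print(opens_cleanly(packed, original), opens_cleanly(damaged, original))
```

    A media player can choose to skip a bad chunk and carry on; a lossless decompressor has no such option, because a single wrong byte means the output is no longer the original.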

    You wouldn't 'compress' video files to make smaller file sizes; you would run the video file through a codec such as Xvid, which will encode it at the cost of quality. With the right front end you can set the frame rate, resolution, etc., so it's a controlled trade-off of file size against quality. With audio you would start off with a lossless .wav file and either use something like FLAC, which compresses with a lossless output, or LAME, which can give you a smaller but lossy output. Again, these codecs would be controlled by a front-end application that allows you to set your preferences. FLAC can be seen as compression; LAME is encoding, because the lossy output can never be converted back to the original lossless wave files.
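
    The difference can be shown with a toy Python sketch. This is not how FLAC or LAME work internally: zlib stands in for a lossless coder, dropping the low four bits of each sample stands in for a lossy one, and the random "samples" are made-up data.

```python
import random
import zlib

# Hypothetical 8-bit samples; random noise is the worst case for lossless coding.
rng = random.Random(3)
samples = bytes(rng.randrange(256) for _ in range(4_000))

# Lossless: the round trip returns exactly what went in.
restored = zlib.decompress(zlib.compress(samples, 9))

# Lossy stand-in: keep only the top 4 bits of every sample, then compress.
quantised = bytes(s & 0xF0 for s in samples)
lossy_size = len(zlib.compress(quantised, 9))
lossless_size = len(zlib.compress(samples, 9))

print(restored == samples, quantised == samples, lossless_size, lossy_size)
```

    The quantised version should come out much smaller, but the discarded low bits are gone for good; nothing can rebuild the original samples from it.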

    and the video was compressed using “Lossy methods”.
    This wouldn't be compressing, it would be encoding.

  6. #16
    Diamond Member tec0's Avatar
    Join Date
    Jun 2009
    Location
    South Africa
    Posts
    4,624
    Thanks
    1,884
    Thanked 463 Times in 410 Posts
    Blog Entries
    3
    Well, don’t blame me for the lingo: some sites say compression, some sites say encoding. For most of them it comes down to encoding, but it is what it is. Do a Google search on “Video Compression” and, with few exceptions, you will find that most sites offer converters.
    peace is a state of mind
    Disclaimer: everything written by me can be considered as fictional.

  7. #17
    Gold Member irneb's Avatar
    Join Date
    Apr 2007
    Location
    Jhb
    Posts
    625
    Thanks
    37
    Thanked 111 Times in 97 Posts
    Quote Originally Posted by AndyD View Post
    This wouldn't be compressing, it would be encoding.
    I'd say it's more of a semantic difference, but I understand your point. In general lossy is still referred to as compression (even though it's actually a re-encoding). The problem is that lossy is only practical on certain types of data (e.g. video, sound, fractals, etc.). Whenever the data / code needs to be character perfect you cannot use lossy at all - you need a perfect recreation.

    Regarding the ISO with duplicate files, I know some writer software allows you to create several entries in the allocation table which simply point to the same spot in the image / disc. That still works when read from a mounted image / inserted CD/DVD (I don't know if it breaks the ISO spec though). But if that were used, it wouldn't cause this strangeness of an ISO being a lot smaller as originally compressed than you can compress it afterwards. The dummy file idea might be another possibility for what is happening here, i.e. the EXE you've downloaded generates the ISO image but includes some dummy files with random content, so the EXE doesn't actually have that data stored, it just puts garbage into those files (which may simply be whatever was on the blank portion of your HDD at the time). The loss-less compressor then reads that portion as a file that needs to be replicated exactly, so it doesn't throw away the garbage.

    The AVI/MPEG/MP4/OGM/MKV/WMV/etc. file is just a container for data streams (in particular video and sound). Inside it would contain the data encoded by a codec (XVid / DivX / Mpeg / UncompressedRGB / HuffYUV / H263 / Mpeg4 / WMV9 / QPeg / X264 / etc.) These could be lossy / loss-less - but none of the compression packages (WinZip / WinRAR / 7Zip / etc.) actually use any of these codecs. They would ALWAYS use a loss-less compression algorithm. Some work quite nicely with media files (e.g. Rar & 7Zip can compress an AVI with Uncompressed RGB video much better than most loss-less codecs could) - strange but I've seen it happen. Usually these container files are kept as efficiently encoded as possible, thus using RAR/ZIP/7Z on them after they've been encoded has little effect (compressing only that which they can see as a pattern / duplication).

    BTW, that's the main method of these compressors. They compare consecutive portions of the data to see if there's any duplication or some sort of pattern which repeats. Usually they create a dictionary of sections to work out duplication, then simply store which item in the dictionary goes where in the file. That way, instead of storing the data numerous times, the archive stores the data once and places a pointer numerous times. Unfortunately the dictionary has a limit due to RAM & speed, so every now and again a dictionary would be flushed and started anew. With ZIP, this is the main reason it's not a great compressor - the maximum dictionary size is rather small compared to RAR / 7Z. E.g. if you use ZIP the dictionary is 32KB, but with 7Z it can be anything from 64KB up to 64MB or more.

    A similar idea is used in video encoding: usually those codecs have a "key-frame" every preset number of frames / amount of time which gets stored as one full frame - the following frames would only store the differences from that frame. Similar with sound, but to explain you would need to understand how sound is encoded even in an uncompressed WAV file (so I'm not going through all that - it wouldn't add anything to the discussion about compression).
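
    The dictionary-size limit is easy to see in a minimal Python sketch. The assumptions: zlib stands in for ZIP's Deflate (32 KB window) and Python's lzma module stands in for 7Z's LZMA with its default, much larger dictionary; the 40,000-byte block is made-up test data.

```python
import lzma
import random
import zlib

# One block of random bytes (incompressible on its own), stored twice.
rng = random.Random(1)
block = bytes(rng.getrandbits(8) for _ in range(40_000))
data = block + block   # the duplicate starts 40,000 bytes after the original

# Deflate can only point back 32 KB, so it never "sees" the first copy.
deflated = zlib.compress(data, 9)

# LZMA's default dictionary is far larger than the block, so the second
# copy can be stored as pointers into the first.
xz = lzma.compress(data)

print(len(data), len(deflated), len(xz))
```

    Deflate should end up at roughly the full 80,000 bytes, while LZMA should need little more than one copy of the block. Shrink the block below 32 KB and Deflate catches the duplicate as well.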

    Then the "word" size also has an effect. This is the portion inside the dictionary which can be repeated in the code. E.g. a ZIP (Deflate) match can be 3-258 bytes long, while a 7Z (LZMA) match can be up to 273 bytes. In the codecs this can be seen as each pixel in the frame. If the pixel is the same as that of the key-frame, then it just gets pointed to it (or even left off in some codecs). If it's different it gets stored as usual (or using a form of pattern prediction - see later).
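
    The key-frame idea can be sketched in a few lines of Python. This is a toy model only: frames are plain lists of pixel values, every frame is compared against the key-frame as described above, and the names encode / decode are made up. Real codecs are far more elaborate.

```python
def encode(frames):
    """Keep the first frame whole; for the rest keep only pixels that differ from it."""
    key = frames[0]
    deltas = []
    for frame in frames[1:]:
        changed = {i: p for i, (p, k) in enumerate(zip(frame, key)) if p != k}
        deltas.append(changed)
    return key, deltas

def decode(key, deltas):
    """Rebuild every frame by applying each set of changes to a copy of the key-frame."""
    frames = [list(key)]
    for changed in deltas:
        frame = list(key)
        for i, p in changed.items():
            frame[i] = p
        frames.append(frame)
    return frames

# Hypothetical 8-pixel frames in which only one or two pixels change.
frames = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 5, 0, 0, 0, 0, 0],
    [0, 0, 5, 6, 0, 0, 0, 0],
]
key, deltas = encode(frames)
print(deltas)
```

    Unchanged pixels cost nothing in the later frames, which is why a mostly static scene encodes so much smaller than a busy one.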

    Then there's what's known as "solid" archiving. RAR / 7Z usually use this; ZIP doesn't have this facility. Basically it means that the dictionary is shared across multiple files, whereas with ZIP each compressed file has at least one dictionary of its own. Solid archiving can give much better compression if you're compressing several similar files into one archive. It is rarely used in media encoding, but could be possible if the container file has multiple streams (e.g. VOB files on DVDs could have multiple video streams, several audio streams and a whole bunch of subtitle streams). Usually it's not used in media since the streams are seldom similar enough to effect much compression.
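
    A minimal Python sketch of the solid-archive effect. The assumptions: zlib stands in for the archiver, compressing each file on its own mimics ZIP, compressing the concatenation mimics a solid archive, and the fifty near-identical "letters" are made-up data.

```python
import zlib

# Hypothetical set of similar small files (form letters with different numbers).
files = [
    (
        f"Dear customer {n},\nYour order has shipped and should arrive within "
        f"five working days.\nReference: {n * 7919}\n"
    ).encode()
    for n in range(50)
]

# ZIP-style: every file gets its own dictionary and starts from scratch.
separate = sum(len(zlib.compress(f, 9)) for f in files)

# Solid-style: one dictionary is shared across all the files.
solid = len(zlib.compress(b"".join(files), 9))

print(separate, solid)
```

    The solid total should be far smaller, because the text shared between the letters only has to be stored once. The price is that extracting one file from a solid archive means decompressing everything in front of it.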

    And then lastly there's the near wizardry of pattern prediction. This is a lot closer to those loss-less codecs, but combined with the above ideas. I.e. a word in the dictionary would be compared to a portion in the file, then seeing as it's "almost" the same the word's index is stored with a code to describe the difference. This is usually the main portion adjusted through the "compression level" - a higher level will allow for a larger difference. Usually it doesn't help all that much when compressing normal code / data - but it usually makes compression and de-compression a lot slower. In video / sound the compression ratio becomes a lot better since a colour could be slightly off another or a frequency only a touch different. So here it gets used quite often especially in lossy codecs, but even some loss-less codecs use this since it need not be lossy.
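
    One simple, loss-less relative of this is delta coding: predict that each value equals the previous one and store only the difference. The Python sketch below is an illustration under that assumption, not a description of what RAR, 7Z or any codec actually does; the slowly drifting "signal" is made-up data and the helper names are hypothetical.

```python
import random
import zlib

# Hypothetical signal: each sample differs from the last by -1, 0 or +1.
rng = random.Random(2)
level = 128
samples = bytearray()
for _ in range(10_000):
    level = (level + rng.choice((-1, 0, 1))) % 256
    samples.append(level)
samples = bytes(samples)

def delta_encode(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous byte (mod 256)."""
    prev = 0
    out = bytearray()
    for value in data:
        out.append((value - prev) % 256)
        prev = value
    return bytes(out)

def delta_decode(data: bytes) -> bytes:
    """Undo delta_encode by adding the differences back up (mod 256)."""
    prev = 0
    out = bytearray()
    for diff in data:
        prev = (prev + diff) % 256
        out.append(prev)
    return bytes(out)

raw_size = len(zlib.compress(samples, 9))
predicted_size = len(zlib.compress(delta_encode(samples), 9))
print(raw_size, predicted_size)
```

    After the delta step almost every byte is one of three values, which the compressor handles much better than the wandering raw samples, and delta_decode still returns the exact original.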

    What loss-less never does is throw exactness away. E.g. sound frequencies which most human ears can't hear are thrown away when encoding MP3s. Some differences in shade in video can be ignored since the human eye can't pick them up. These are the main aspects which make a "compression" lossy. But things like resolution, frame-rate, sampling rate, etc. can also be adjusted, since a human often can't pick up the differences (e.g. try to notice the difference between 30fps and 25fps, or 1080P and 720P on a 40" CRT screen, or 48000Hz and 44100Hz on an MP3 player). These can be considered to be re-encoding instead of compression.
    Gold is the money of kings; silver is the money of gentlemen; barter is the money of peasants; but debt is the money of slaves. - Norm Franz
    And central banks are the slave clearing houses

  8. #18
    Diamond Member AndyD's Avatar
    Join Date
    Jan 2010
    Location
    Cape Town
    Posts
    4,924
    Thanks
    576
    Thanked 934 Times in 755 Posts
    Sorry guys, I wasn't trying to get into an argument about semantics, I was just getting confused about the processes being discussed.

  9. Thanks given for this post:

    tec0 (30-Mar-11)

  10. #19
    Diamond Member tec0's Avatar
    Join Date
    Jun 2009
    Location
    South Africa
    Posts
    4,624
    Thanks
    1,884
    Thanked 463 Times in 410 Posts
    Blog Entries
    3
    Quote Originally Posted by AndyD View Post
    Sorry guys, I wasn't trying to get into an argument about semantics, I was just getting confused about the processes being discussed.
    No need to worry about it. Lingo is sometimes confusing, and having me at the keyboard doesn’t help matters at all... Sometimes I think the only thing that understands me is my African grey.
    peace is a state of mind
    Disclaimer: everything written by me can be considered as fictional.

  11. #20
    Gold Member Sparks's Avatar
    Join Date
    Dec 2009
    Location
    Port Elizabeth
    Posts
    890
    Thanks
    20
    Thanked 127 Times in 96 Posts
    Are you saying that if I get an African Grey I will finally be understood?
