Thursday, October 4, 2018

Random Data

I had an algorithm back in 2006-2007 that always compressed at a 90% ratio, applied recursively. Though I thought it out well enough to be decodable, I don't recall whether I ever wrote a decoder. The problem with me is that when I have a breakthrough compression algorithm, with a full decoder, I fear danger from the data storage industry and the Feds, so I delete the compressors and decoders. At one time I even physically destroyed a hard disk over it. Not good for Science and Research, indeed.

That was before I decided to write an introductory book on data compression algorithms, which I finished by 2005 or 2006, but by then I had already forgotten the maybe-breakthrough algorithms.

I think John G. Cleary believes in compression of random data--or he was just playing me. 

When I was a child genius, I was already co-authoring and editing books on advanced subjects under American or international pseudonyms: books on Calculus, Modern Telecommunications, Artificial Intelligence, etc.

ROLZ is an "easy to arrive at" idea, as are Huffman, LZ77/LZW, and PPM/LZP.
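To illustrate how easy the LZ77 idea is to arrive at, here is a toy LZ77-style coder that emits (offset, length, next-symbol) triples. This is a minimal sketch for illustration, not any particular production codec:

```python
# Toy LZ77: each step finds the longest match in a sliding window and
# emits an (offset, length, next_char) triple.

def lz77_compress(data, window=255):
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - window)
        for j in range(start, i):
            length = 0
            # allow overlapping matches; stop one short so a "next char" remains
            while (i + length < len(data) - 1 and
                   data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        nxt = data[i + best_len]
        out.append((best_off, best_len, nxt))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])   # byte-by-byte copy handles overlaps
        out.append(ch)
    return ''.join(out)
```

Repeated substrings collapse into short triples, which is exactly the "enumeration of repeated sequences" the post describes.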

What is hard is arithmetic coding, and Asymmetric Numeral Systems (ANS); I was thinking before of compression via complex numbers. And during Star Trek time, when we were child geniuses, another of us used the Towers of Hanoi algorithm for compression! Data compression was easy then; we could use any math principle or theorem for compression, and AI. I had an arithmetic coding program before that converted easily to Huffman coding, showing that the two are related.
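One way to see that relation: Huffman coding is entropy coding restricted to whole-bit code lengths, while arithmetic coding drops that restriction. A minimal sketch of Huffman code construction (not the program mentioned in the post):

```python
import heapq
from collections import Counter

# Minimal Huffman code construction: repeatedly merge the two least
# frequent subtrees; code lengths approximate -log2(probability)
# rounded to whole bits.

def huffman_codes(text):
    freq = Counter(text)
    if len(freq) == 1:                      # degenerate one-symbol input
        return {next(iter(freq)): '0'}
    # heap entries: (frequency, tiebreak, tree); a tree is a symbol or a pair
    heap = [(f, i, s) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
        else:
            codes[tree] = prefix
    walk(heap[0][2], '')
    return codes
```

The resulting codes are prefix-free by construction; an arithmetic coder assigns the same symbols fractional bit costs instead.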

Yes, data compression is addictive. To see that your program fully compresses and decompresses to the original file is exhilarating, your adrenaline rushing, especially if you continue to beat your current compression ratio. 

But the compression bug no longer bites me. I stopped coding in 2010 or 2011. 

I independently rediscovered the ideas behind the PAQ algorithm and BWT, but wasn't able to solve BWT. 
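For reference, the BWT and its inverse can be sketched naively; this O(n² log n) sentinel-based version is for illustration only (real implementations use suffix arrays):

```python
# Naive Burrows-Wheeler transform with a '\0' sentinel assumed smaller
# than every real character.

def bwt(s):
    s = s + '\0'
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return ''.join(r[-1] for r in rotations)       # last column

def inverse_bwt(last):
    # Stable-sorting (char, index) pairs recovers the first column and the
    # standard last-to-first mapping T; following T walks out the string.
    T = [i for _, i in sorted((c, i) for i, c in enumerate(last))]
    row = last.index('\0')    # row of the rotation that ends with the sentinel
    out = []
    for _ in range(len(last)):
        row = T[row]
        out.append(last[row])
    return ''.join(out).rstrip('\0')
```

The transform itself is lossless; the compression comes from the run-friendly output it produces for a later stage such as MTF plus an entropy coder.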

Attempts at universal compression, or compression of random data, mostly degenerate into LZ77/LZ78 (i.e., enumerating repeated sequences into number codes) and symbol ranking (or MTF, circular/cyclic buffers).
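The symbol-ranking idea mentioned above can be sketched as a minimal move-to-front coder, assuming a fixed, known alphabet (the names here are illustrative):

```python
# Move-to-front (MTF): recently seen symbols sit near the front of the
# table, so locally repetitive data becomes runs of small indices.

def mtf_encode(data, alphabet):
    table = list(alphabet)
    out = []
    for ch in data:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))   # move the symbol to the front
    return out

def mtf_decode(indices, alphabet):
    table = list(alphabet)
    out = []
    for i in indices:
        ch = table[i]
        out.append(ch)
        table.insert(0, table.pop(i))
    return ''.join(out)
```

On truly random input the indices are as unpredictable as the input itself, which is one concrete way to see why these attempts stall on random data.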

Previously, they thought arithmetic coding was the end of it. Now there is ANS.

Maybe the next compression algorithm will be so simple after all. I still have 2-3 "probably breakthrough" algorithms in mind. 

