Redb3ard, on Thu Jul 15, 2010 9:33 AM, said:
I hate to be critical of such work; please don't take my suggestion that way. Digitizing these books is important... they're the sort that are discarded even by libraries.
However, it is a very sizable archive, and most of these books could be reduced in size without losing any of the information on the pages. I know the standard is to digitize them in a raster format (usually JPEG, or TIFF for those who want high quality) and make PDFs of them. However, such books were usually light on actual images... a diagram here or there, things of that nature. Generally, for a single person, anything other than this would be very difficult... even with OCR, there are so many mistakes that they can't be corrected by one man.
We could, as a community, transcribe it into HTML with very little work for any single individual. If each book averages 120 pages, and you put up a website where each person proofreads a page or half a page of OCR output (or, for those who know how, does the markup: wrapping paragraphs in p tags, the occasional img for a diagram, the occasional table)... we could probably complete a book every few weeks.
Even with images, I'm betting that most books would clock in under about 3 megs.
No matter what though, congratulations and good work. People who work to preserve books, even for niche interests like this, are doing a very good thing. I'd say it was even noble, if it didn't sound so corny.
I hate to be critical, but... many of the books do have a lot of images in them.
Modern OCR programs are actually pretty good as long as the source isn't too poor. Yes, you have to correct stuff and it does take time, but it's not so hard that 'they can't be corrected by one man.'
HTMLing does take a lot of time, although with modern web page creation programs it's a lot easier than it used to be. Especially those damn tables.
If you really believe in what you're saying about the preservation of books like these, you could email Kevin Savetz over at www.atarimagazines.com and offer to HTML the many books and magazines he has the authors' permission to post. I did many books and magazines for him, but unfortunately I don't have the time I used to.
Allan