Posts from the old board
Posted: Mon Jul 09, 2018 5:42 am
by zompist
To reduce confusion, here are my present goals:
- Retain posts on this board, from now on
- Make posts from the old board available in some form
That is, given last week's experience, I'm extremely reluctant to try anything that messes with the one working board we have. And I'd rather people not be tentative about posting here.
I have the phpbb backup of the previous board... if you're curious, the zipped version is about 99M, the unzipped version is 350M, and the database itself is 850M. Not all of that needs to be saved, but any pruning requires that all of it be available anyway.
Options I can think of right now:
- Somehow merge the data onto the new board. Trying this directly was a huge disaster, but maybe it can be done programmatically.
- Make the old board available, read-only, as a subsite of this board. (So it would be hosted in one place, and without the frigging errors.)
- Continue to maintain the old site, read-only. The errors are continuing there, so this is not optimal. Plus I'm sick of Dreamhost.
- Process the backup file programmatically to make it readable, not using phpBB. Most of phpBB's functionality is not needed if new posts are not being made.
Research is continuing, but at least one of these approaches should work out.
Re: Posts from the old board
Posted: Mon Jul 09, 2018 9:23 am
by mèþru
I favour doing #2 or #4 as a temporary measure until you figure out a safe way to do #1.
Re: Posts from the old board
Posted: Mon Jul 09, 2018 7:06 pm
by Pabappa
OK, thanks for the update. A few weeks ago I saved the Dream Thread as a precaution ... on the off chance that even the DB file is corrupt, I could theoretically mirror the Dream Thread if anyone wants it.
I think mèþru's plan is ideal ... a permanent archive of the board should be readable and easy to navigate if there is a program or script that can make the internal links work ... that would be your solution #4. If no such program exists, it would be solution #2.
I'm not a programmer, so I can't really predict whether moving the old ZBB here as a read-only archive would lead to the same persistent HTTP 500 errors or not.
Definitely glad to be back. I hope for the best, whatever that may be.
Re: Posts from the old board
Posted: Wed Jan 22, 2020 12:47 pm
by Vijay
Does the old board look odd to anyone else right now? Is that a fixable issue?
Re: Posts from the old board
Posted: Wed Jan 22, 2020 1:06 pm
by Estav
Vijay wrote: Wed Jan 22, 2020 12:47 pm
Does the old board look odd to anyone else right now? Is that a fixable issue?
See here:
https://www.verduria.org/viewtopic.php? ... 180#p22942 It looks like nothing is permanently lost/destroyed.
Re: Posts from the old board
Posted: Wed Jan 22, 2020 1:46 pm
by Vijay
Hmm...I don't think I saw this particular problem until just yesterday, though.
Re: Posts from the old board
Posted: Wed Jan 22, 2020 6:38 pm
by bradrn
I noticed that problem as well:
https://www.verduria.org/viewtopic.php?f=5&t=497. As detailed in that thread, some of it is available on archive.org, but not all of it was indexed before it broke.
zompist wrote: Mon Jul 09, 2018 5:42 am
Options I can think of right now:
- Somehow merge the data onto the new board. Trying this directly was a huge disaster, but maybe it can be done programmatically.
- Make the old board available, read-only, as a subsite of this board. (So it would be hosted in one place, and without the frigging errors.)
- Continue to maintain the old site, read-only. The errors are continuing there, so this is not optimal. Plus I'm sick of Dreamhost.
- Process the backup file programmatically to make it readable, not using phpBB. Most of phpBB's functionality is not needed if new posts are not being made.
Research is continuing, but at least one of these approaches should work out.
As for maintaining the old site, I think that #2 sounds best. #1 would really be preferable, but quite a few of the threads from the old board have been resurrected on this one, and I feel that could lead to confusion. #3 would also be good, but #2 is preferable since you don’t want to maintain hosting. And #4 sounds tricky: you’d have to find a BBCode parser that can also manage the custom tags we have here (I seem to remember there were a couple on the old board as well), and then render it all and find somewhere to put it.
Re: Posts from the old board
Posted: Wed Jan 22, 2020 7:17 pm
by Pabappa
I would volunteer to do Plan #4 if I had a fully automated process to streamline it, just for the sake of putting my own design on it and ending up with a website that looks unlike anything else on the Internet.
But I don't have such a program, and I'm not willing to do the work by hand. A hypothetical post-by-post copying would probably run into the thousands of hours assuming a rate of ten minutes per post, or even five minutes per post, and that's assuming we mostly just want the threads in L&L .... if we wanted the entire thing, assuming the listed post count of 200,007 is accurate, it would be off the charts.
Even if there were a phpBB parser program that was 99% automated and 1% manual .... e.g. it faithfully renders everything except the page number links .... it could still take hundreds to thousands of hours to put the entire database through it. And I don't know of any fully automated program that would do what we want.
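The hand-copying estimate above, worked through for the full listed post count of 200,007 (the five- and ten-minute rates are Pabappa's assumptions):

```latex
\begin{aligned}
200{,}007 \times 5\ \text{min} &= 1{,}000{,}035\ \text{min} \approx 16{,}667\ \text{h} \\
200{,}007 \times 10\ \text{min} &= 2{,}000{,}070\ \text{min} \approx 33{,}335\ \text{h}
\end{aligned}
```

At 40 hours a week (2,080 hours a year), that is roughly 8 to 16 years of full-time work.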
Re: Posts from the old board
Posted: Thu Jan 23, 2020 12:44 am
by Nortaneous
bradrn wrote: Wed Jan 22, 2020 6:38 pm
And #4 sounds tricky: you’d have to find a BBCode parser that can also manage the custom tags we have here (I seem to remember there were a couple on the old board as well), and then render it all and find somewhere to put it.
There are plenty of BBCode parsing libraries that allow custom tag definition. The main problems would be:
- figuring out phpBB's database structure and working out the transformations necessary to get it into a viewable state
- implementing search, which I assume people would want and which you don't get for free without phpBB
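A minimal sketch of the custom-tag point, in Python: a small stack-based BBCode converter where every tag, custom or not, is just an entry in a table. `[ipa]` is a hypothetical stand-in for whatever custom tags the boards actually defined, and tags with arguments (`[quote="name"]`, `[url=...]`) are not handled. I also haven't checked how the old board's phpBB version stores tags in the dump, so the raw post text may need normalizing before it looks like plain BBCode.

```python
import html
import re

# BBCode tag -> (opening HTML, closing HTML). "ipa" is a HYPOTHETICAL
# custom tag; real custom tags would be added the same way.
TAGS = {
    "b": ("<strong>", "</strong>"),
    "i": ("<em>", "</em>"),
    "quote": ("<blockquote>", "</blockquote>"),
    "ipa": ('<span class="ipa">', "</span>"),
}

_TAG = re.compile(r"\[(/?)([a-z]+)\]", re.IGNORECASE)

def bbcode_to_html(text: str) -> str:
    """Convert argument-free, nested BBCode to HTML; leave anything else as text."""
    out = []    # output pieces; an opening tag stays literal until it is closed
    stack = []  # (tag name, index in out) for tags still waiting to be closed
    pos = 0
    for m in _TAG.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2).lower()
        literal = html.escape(m.group(0))
        if name not in TAGS:
            out.append(literal)                 # unknown tag: keep as text
        elif not closing:
            stack.append((name, len(out)))
            out.append(literal)
        elif any(open_name == name for open_name, _ in stack):
            # Tags opened inside this one but never closed stay literal.
            while stack[-1][0] != name:
                stack.pop()
            _, idx = stack.pop()
            out[idx] = TAGS[name][0]
            out.append(TAGS[name][1])
        else:
            out.append(literal)                 # close tag with no opener
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

The stack is what lets nested quotes (very common on a forum) come out right, which a single non-greedy regex replacement would get wrong.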
Pabappa wrote: Wed Jan 22, 2020 7:17 pm
Even if there were a phpBB parser program that was 99% automated and 1% manual .... e.g. it faithfully renders everything except the page number links .... it could still take hundreds to thousands of hours to put the entire database through it. And I don't know of any fully automated program that would do what we want.
phpBB uses some flavor or other of SQL database. There are multiple options, but all you'd have to do is restore the dump with the right database software. (Assuming the dump isn't corrupt and there were no breaking changes between the version of that software the old ZBB used and whichever one you have.) It's also possible to migrate data between types of database - from MySQL to PostgreSQL, say - but it's a pain in the ass*.
* to be fair, the only times I've had to do this were cases where one of the types was SQLite, and SQLite is a little weird - MySQL to PostgreSQL or something might be easy for all I know
Database software is pretty fast. Consider how much faster JavaScript has gotten now that a lot of companies use it. Now consider that SQL has been around since the '70s and basically everything uses it. You'd have to leave it to copy overnight, but it wouldn't take thousands of hours to import 850 MB of data, and a lot of the speed bottleneck will be in the hard drive.
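To illustrate the speed point, here is a sketch using Python's built-in `sqlite3` module as a stand-in for whatever database the old board actually used (this is not MySQL itself). The main trick is doing the whole import in one transaction instead of committing row by row; the table layout is a hypothetical simplification, not phpBB's real schema.

```python
import sqlite3
import time

def bulk_load(rows, db_path=":memory:"):
    """Load (post_id, topic_id, text) tuples into a fresh posts table.

    The table layout is a hypothetical simplification, not phpBB's schema.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE posts (post_id INTEGER PRIMARY KEY, topic_id INTEGER, text TEXT)"
    )
    # One transaction for the whole batch: a single commit at the end
    # instead of one per row, which is what usually makes imports slow.
    with conn:
        conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", rows)
    return conn

if __name__ == "__main__":
    n = 200_007  # the old board's listed post count
    # Short dummy text per post, just to exercise the insert path.
    fake_rows = ((i, i // 20, "x" * 100) for i in range(1, n + 1))
    start = time.perf_counter()
    conn = bulk_load(fake_rows)
    count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    print(f"{count} rows in {time.perf_counter() - start:.2f} s")
```

The timing printed will depend entirely on the machine and disk, which is the point Nortaneous makes about the hard drive being the bottleneck.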
Re: Posts from the old board
Posted: Thu Jan 23, 2020 12:56 am
by Pabappa
Oh, right, search. That means my redesign idea is going to be useless, so I have to retract even the promise to do just the custom redesign, since I wouldn't be able to pull it off from within phpBB. That said, I don't know how we could possibly do Option #4 and have search without setting up another phpBB board, which, if I understand right, would make it identical to #2.
Re: Posts from the old board
Posted: Thu Jan 23, 2020 1:04 am
by Nortaneous
Pabappa wrote: Thu Jan 23, 2020 12:56 am
That said, I don't know how we could possibly do Option #4 and have search without setting up another phpBB board, which, if I understand right, would make it identical to #2.
Just throw it in a Rails app. There's probably a library to handle it. If full-text search with pure SQL is too slow, could use Elasticsearch or something.
I could probably put something together within two weeks. It wouldn't necessarily be the most featureful thing, but it'd show posts and topic lists. (I've never actually used Elasticsearch.)
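One way to get full-text search without phpBB, sketched with SQLite's FTS5 extension through Python's `sqlite3` (a swapped-in choice; neither Rails nor Elasticsearch is shown here). FTS5 is compiled into most SQLite builds but is not guaranteed, so the sketch falls back to a slow `LIKE` scan, which also matches partial words.

```python
import sqlite3

def build_index(posts):
    """Index (post_id, text) pairs; return (connection, whether FTS5 is used)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE post_fts USING fts5(text)")
        uses_fts = True
    except sqlite3.OperationalError:
        # FTS5 not compiled into this SQLite: use an ordinary table instead.
        conn.execute("CREATE TABLE post_fts (text TEXT)")
        uses_fts = False
    with conn:
        conn.executemany("INSERT INTO post_fts (rowid, text) VALUES (?, ?)", posts)
    return conn, uses_fts

def search(conn, uses_fts, word):
    """Return IDs of posts containing `word` (best matches first with FTS5).

    With FTS5, `word` is passed straight into its query syntax, so input
    containing punctuation would need quoting first.
    """
    if uses_fts:
        rows = conn.execute(
            "SELECT rowid FROM post_fts WHERE post_fts MATCH ? ORDER BY rank", (word,)
        )
    else:
        rows = conn.execute(
            "SELECT rowid FROM post_fts WHERE text LIKE ? ORDER BY rowid", (f"%{word}%",)
        )
    return [r[0] for r in rows]
```

Using the post ID as the FTS row ID keeps search results directly linkable to the archived posts.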
Re: Posts from the old board
Posted: Thu Jan 23, 2020 1:15 am
by bradrn
Nortaneous wrote: Thu Jan 23, 2020 1:04 am
Pabappa wrote: Thu Jan 23, 2020 12:56 am
That said, I don't know how we could possibly do Option #4 and have search without setting up another phpBB board, which, if I understand right, would make it identical to #2.
Just throw it in a Rails app. There's probably a library to handle it. If full-text search with pure SQL is too slow, could use Elasticsearch or something.
I could probably put something together within two weeks. It wouldn't necessarily be the most featureful thing, but it'd show posts and topic lists. (I've never actually used Elasticsearch.)
Do we even need Rails? Surely a static site would work just as well, since the old board won’t be updated anymore.
Also, I would question whether we even need search. Personally, I mainly want the old board back so I can access certain threads I kept on using as resources (e.g. the Vowel Systems thread). Search seems orthogonal to this purpose, and something which can be added to it later.
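If search does get added to a static site later, one common approach is to precompute an inverted index (word to post IDs) at build time and ship it as JSON for a small script in the page to query. A minimal Python sketch of the build side, plus the AND-lookup logic a page script would mirror; `search-index.json` is a hypothetical filename, and splitting the index into several files for a board this size is left out.

```python
import json
import re
from collections import defaultdict

_WORD = re.compile(r"\w+")

def build_inverted_index(posts):
    """Map each lowercased word to a sorted list of IDs of posts containing it."""
    index = defaultdict(set)
    for post_id, text in posts:
        for word in _WORD.findall(text.lower()):
            index[word].add(post_id)
    return {word: sorted(ids) for word, ids in sorted(index.items())}

def lookup(index, query):
    """Return IDs of posts containing every word of the query (AND search)."""
    words = _WORD.findall(query.lower())
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)

def save_index(index, path="search-index.json"):  # hypothetical filename
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False)
```

Since the archive never changes, the index only has to be built once, which is what makes this workable without a server.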
Re: Posts from the old board
Posted: Thu Jan 23, 2020 6:54 am
by masako
bradrn wrote: Thu Jan 23, 2020 1:15 am
Personally, I mainly want the old board back so I can access certain threads I kept on using as resources (e.g. the Vowel Systems thread).
Just a thought.
When/if you need/want something from the old board, copypasta it onto a worddoc (or similar program you have access to) and save it somewhere. I use Google Docs and I have buttloads of stuff from multiple sites including the old board.
This message brought to you by The Hemingway Self-Help Society.
Re: Posts from the old board
Posted: Thu Jan 23, 2020 10:24 am
by Nortaneous
bradrn wrote: Thu Jan 23, 2020 1:15 am
Do we even need Rails? Surely a static site would work just as well, since the old board won’t be updated anymore.
A static site would work, but wouldn't it take a while to generate? So you'd want to be pretty sure you're getting everything right.
On the other hand, you might be able to throw it on GitHub Pages for free hosting, although I don't know if they have size or page-count limits that would get in the way.
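A sketch of the generation step for a static archive: one HTML file per page of each topic, with simple page links. `POSTS_PER_PAGE`, the file naming scheme and the `(username, body_html)` input are assumptions for illustration, and post bodies are assumed to be already-converted HTML. Generating into a scratch directory first makes it easy to check the output (and the total file count, if a host turns out to have limits) before publishing anything.

```python
import html
import math
from pathlib import Path

POSTS_PER_PAGE = 25  # hypothetical; any value works

def write_topic(out_dir, topic_id, title, posts):
    """Write topic pages as <topic_id>-<page>.html and return their paths.

    posts: list of (username, body_html) tuples in posting order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    n_pages = max(1, math.ceil(len(posts) / POSTS_PER_PAGE))
    nav = " ".join(
        f'<a href="{topic_id}-{p}.html">{p}</a>' for p in range(1, n_pages + 1)
    )
    paths = []
    for page in range(1, n_pages + 1):
        chunk = posts[(page - 1) * POSTS_PER_PAGE : page * POSTS_PER_PAGE]
        body = "\n".join(
            f'<div class="post"><b>{html.escape(user)}</b><br>{text}</div>'
            for user, text in chunk
        )
        doc = (
            '<!DOCTYPE html>\n<meta charset="utf-8">\n'
            f"<title>{html.escape(title)}</title>\n"
            f"<h1>{html.escape(title)}</h1>\n{body}\n<nav>{nav}</nav>\n"
        )
        path = out / f"{topic_id}-{page}.html"
        path.write_text(doc, encoding="utf-8")
        paths.append(path)
    return paths
```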
Re: Posts from the old board
Posted: Thu Jan 23, 2020 5:43 pm
by Pabappa
Having looked it over, I think the labeled post count of 200,007 is in fact real, and it does not include the many posts that were pruned from Ephemera, NOTA, and C&CQ. Assuming an average of 1K per post, that's about 200 MB, which isn't really that huge by today's standards ... so although I don't think the raw file size would be a barrier, I wonder if a database with 200,000 entries would exceed some other enforced limit of a free webhost like GitHub. On the other hand, reading this thread I got the impression that zomp was hoping to host it under this domain, which is big enough to host this board and could probably accommodate another one even if it is larger.
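Pabappa's size estimate, spelled out (the 1 KB average is his assumption, taking 1 KB as 1,000 bytes):

```latex
200{,}007\ \text{posts} \times 1\ \tfrac{\text{KB}}{\text{post}} \approx 2.0 \times 10^{8}\ \text{bytes} \approx 200\ \text{MB}
```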
A bare-bones UI might be best, if we have the right tools .... we don't need a phpBB UI, since nobody is going to need to log in, reply, send PMs, etc .... we might dispense with signatures, avatars, post counts, locations, etc. too, depending on people's wishes. The topic of each thread basically just keeps getting repeated on every post, with rare exceptions, so we could keep the topic in the header but remove it from the posts. Losing all of the window dressing would allow us to fit more content on the screen. It would be important to at least keep people's usernames so we can follow conversations, since not everybody uses the Quote function ... perhaps the usernames could go where the topics are now, or maybe they could even move inline in a floating DIV tag or at the bottom of the post, replacing the signature.
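The rule about dropping repeated topic titles is easy to mechanize. A minimal Python sketch, assuming each post arrives as a dict with hypothetical keys `username`, `subject` and `text_html`: fields such as signatures or avatars are simply never read, and a post's own subject is shown only when it differs from the topic title or its "Re: " form.

```python
import html

def render_post(topic_title: str, post: dict) -> str:
    """Render one post as a bare-bones <div> with the username up front.

    post: dict with HYPOTHETICAL keys 'username', 'subject', 'text_html';
    any other keys (signature, avatar, post count...) are ignored.
    """
    subject = (post.get("subject") or "").strip()
    # The subject is redundant when it just repeats the topic title.
    redundant = subject in ("", topic_title, f"Re: {topic_title}")
    head = f"<b>{html.escape(post['username'])}</b>"
    if not redundant:
        head += f" <i>{html.escape(subject)}</i>"
    return f'<div class="post">{head}<br>{post["text_html"]}</div>'
```

Keeping the rare non-redundant subjects preserves the "rare exceptions" mentioned above without cluttering every other post.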
My ideas for site redesigns were along the lines of http://pabappa.com/etc/2020dreams9.html and http://pabappa.com/play/late_andanese.html ... if we can get the bare-bones site up and running, we can always decide what it looks like later, since that would just involve editing CSS files, or maybe even a single CSS file. Perhaps the different color schemes for each post could signify different forums, so that e.g. L&L is light green and C&C is light blue ... or we could use different colors for different people within each thread ... though I don't know if everyone would be happy with a randomly assigned color.
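On the color idea: rather than picking colors at random, a name can be hashed to a hue, so each forum (or each person) always gets the same light color on every generated page. A Python sketch; the lightness and saturation values are arbitrary choices, the class names are hypothetical, and with only 256 possible hues, different names will sometimes share a color.

```python
import colorsys
import hashlib

def pastel_for(name: str) -> str:
    """Return a light CSS color (#rrggbb) derived from `name`.

    Hashing instead of random choice means the same forum or username gets
    the same color every time the site is regenerated.
    """
    hue = hashlib.sha256(name.encode("utf-8")).digest()[0] / 256  # 0 <= hue < 1
    r, g, b = colorsys.hls_to_rgb(hue, 0.9, 0.6)  # high lightness = pastel
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))

def css_for(classes: dict) -> str:
    """classes: {css_class: name}, e.g. {"forum-ll": "L&L"} (hypothetical names)."""
    return "\n".join(
        f".{cls} {{ background: {pastel_for(name)}; }}" for cls, name in classes.items()
    )
```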
I like my sites because they buck the modern trend of filling the screen with graphic-heavy UI elements and reducing the content area to just a small fraction of the screen .... but that's just my personal preference. I wouldn't try to push my ideas on other people just for the sake of vanity.
Re: Posts from the old board
Posted: Thu Jan 23, 2020 6:37 pm
by zompist
I thought I should look at the backup I have and see what it consists of...
I have copies of all the phpbb files, plus a database dump, which consists of a SQL file which would re-create the board's database. The good news is that it's readable Unicode and it's not hard to find the posting data. The bad news is that it's 356M. (For what it's worth, there's a lot that isn't needed— e.g. there's a huge number of records devoted to phpbb's laughable search data. But I think the bulk is posts.)
Now, my vague idea was to write some programs to trawl this file and repackage it in a way that could allow a web interface. I've written several pages (the new Almeopedia and numbers list) that rely on reading large text files on the fly; however, these are on the order of 0.5M, not 356M.
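For trawling the dump without holding all 356M in memory, the file can be read line by line, picking out only the INSERT statements for one table. A Python sketch, assuming the usual mysqldump layout: one multi-row INSERT per line of the form INSERT INTO `phpbb_posts` VALUES (...),(...); with backslash escapes inside strings and no column list. `phpbb_posts` is phpBB's default table name, but this board's prefix should be checked.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Backslash escapes mysqldump writes inside quoted strings (common subset).
_ESCAPES = {"0": "\0", "n": "\n", "r": "\r", "t": "\t", "Z": "\x1a",
            "\\": "\\", "'": "'", '"': '"'}

Row = List[Optional[str]]

def _parse_row(s: str, i: int) -> Tuple[Row, int]:
    """Parse one (...) tuple starting just after '('; return (row, next index)."""
    row: Row = []
    buf: List[str] = []
    quoted = False
    while True:
        c = s[i]
        if c == "'":
            i += 1
            while s[i] != "'":
                if s[i] == "\\":
                    buf.append(_ESCAPES.get(s[i + 1], s[i + 1]))
                    i += 2
                else:
                    buf.append(s[i])
                    i += 1
            i += 1  # skip the closing quote
            quoted = True
        elif c in ",)":
            text = "".join(buf)
            if quoted:
                row.append(text)
            else:
                text = text.strip()
                row.append(None if text == "NULL" else text)  # numbers stay text
            buf, quoted = [], False
            i += 1
            if c == ")":
                return row, i
        else:
            buf.append(c)
            i += 1

def parse_values(s: str) -> List[Row]:
    """Split the text after VALUES into rows of field strings."""
    rows: List[Row] = []
    i = 0
    while i < len(s):
        if s[i] == "(":
            row, i = _parse_row(s, i + 1)
            rows.append(row)
        else:
            i += 1  # commas between tuples, the final ';', the newline
    return rows

def iter_table_rows(lines: Iterable[str], table: str) -> Iterator[Row]:
    """Yield the rows of one table from an iterable of dump lines."""
    prefix = f"INSERT INTO `{table}` VALUES "
    for line in lines:
        if line.startswith(prefix):
            yield from parse_values(line[len(prefix):])
```

Usage would look like `for row in iter_table_rows(open("backup.sql", encoding="utf-8"), "phpbb_posts")`, where `backup.sql` is a hypothetical filename. Because the file object is iterated lazily, memory use stays around the size of the longest INSERT line rather than the whole dump.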
Plus, of course, it's a database based on record numbers. So e.g. the posting text is readily readable, but each entry just refers to a user ID and topic ID, and the topic IDs to a forum ID, etc. So all of that would have to be indexed, and a file structure created that uses much smaller files.
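Once the rows are out, the ID lookups described above reduce to dictionary lookups, and "much smaller files" can simply mean one file per topic. A Python sketch; the tuple layouts, the JSON format and the forum/topic directory scheme are all assumptions for illustration, and the real column order would come from the CREATE TABLE statements in the dump.

```python
import json
from collections import defaultdict
from pathlib import Path

def split_by_topic(posts, topics, users, out_dir):
    """Resolve numeric IDs and write one small JSON file per topic.

    posts:  iterable of (post_id, topic_id, poster_id, post_time, text)
    topics: {topic_id: (forum_id, title)}
    users:  {user_id: username}
    All three layouts are HYPOTHETICAL simplifications of the phpBB tables.
    """
    by_topic = defaultdict(list)
    for post_id, topic_id, poster_id, post_time, text in posts:
        by_topic[topic_id].append({
            "id": post_id,
            "user": users.get(poster_id, "(deleted user)"),
            "time": post_time,
            "text": text,
        })
    written = []
    for topic_id, plist in by_topic.items():
        forum_id, title = topics.get(topic_id, (0, "(unknown topic)"))
        plist.sort(key=lambda p: (p["time"], p["id"]))  # posting order
        path = Path(out_dir) / str(forum_id) / f"{topic_id}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({"title": title, "posts": plist}, ensure_ascii=False),
                        encoding="utf-8")
        written.append(path)
    return written
```

Grouping in memory is fine for a sketch; for the full 200,007 posts it may be kinder to write each topic out as soon as its posts are collected.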
I think it's doable, but, well, I also haven't worked on it, and probably won't for a while.... Too many other live projects. Again, the data is all there so eventually we'll have access to it.
I'm not opposed to someone else working on it, but I'd want to be very careful. If nothing else, the users table has everyone's e-mail and (presumably encoded, but recoverable) passwords. I'm not gonna just put the SQL dump out where anyone can get at that information. (I mean, people shouldn't use the same password for multiple sites... but it's irresponsible to assume they don't and provide hackers with free data.)
As mentioned earlier, it looks like Dreamhost upgraded PHP in a way that broke phpbb on the old board, which is why it's having problems now. So another approach is to try to upgrade the phpbb software. Who knows, it might work! Or it might not; I haven't done an upgrade like that and don't know if it'd work.
If people want to trawl the Wayback Machine... well, good luck, but it sounds like an awful lot of work. Believe me, from back when I was trying to prune old posts by hand: a few thousand topics is a lot of topics.
Re: Posts from the old board
Posted: Thu Jan 23, 2020 7:40 pm
by bradrn
Pabappa wrote: Thu Jan 23, 2020 5:43 pm
A bare-bones UI might be best, if we have the right tools .... we don't need a phpBB UI, since nobody is going to need to log in, reply, send PMs, etc .... we might dispense with signatures, avatars, post counts, locations, etc. too, depending on people's wishes. The topic of each thread basically just keeps getting repeated on every post, with rare exceptions, so we could keep the topic in the header but remove it from the posts. Losing all of the window dressing would allow us to fit more content on the screen. It would be important to at least keep people's usernames so we can follow conversations, since not everybody uses the Quote function ... perhaps the usernames could go where the topics are now, or maybe they could even move inline in a floating DIV tag or at the bottom of the post, replacing the signature.
I like this idea!
zompist wrote: Thu Jan 23, 2020 6:37 pm
I thought I should look at the backup I have and see what it consists of...
I have copies of all the phpbb files, plus a database dump, which consists of a SQL file which would re-create the board's database. The good news is that it's readable Unicode and it's not hard to find the posting data. The bad news is that it's 356M. (For what it's worth, there's a lot that isn't needed— e.g. there's a huge number of records devoted to phpbb's laughable search data. But I think the bulk is posts.)
Now, my vague idea was to write some programs to trawl this file and repackage it in a way that could allow a web interface. I've written several pages (the new Almeopedia and numbers list) that rely on reading large text files on the fly; however, these are on the order of 0.5M, not 356M.
Plus, of course, it's a database based on record numbers. So e.g. the posting text is readily readable, but each entry just refers to a user ID and topic ID, and the topic IDs to a forum ID, etc. So all of that would have to be indexed, and a file structure created that uses much smaller files.
I think it's doable, but, well, I also haven't worked on it, and probably won't for a while.... Too many other live projects. Again, the data is all there so eventually we'll have access to it.
I'm not opposed to someone else working on it, but I'd want to be very careful. If nothing else, the users table has everyone's e-mail and (presumably encoded, but recoverable) passwords. I'm not gonna just put the SQL dump out where anyone can get at that information. (I mean, people shouldn't use the same password for multiple sites... but it's irresponsible to assume they don't and provide hackers with free data.)
Would it be possible to programmatically remove the e-mails and passwords? (I’m thinking something along the lines of UPDATE Users SET Passwords = '<removed>', Emails = '<removed>'.) Then you could put it online and let other people have a look at it. I know I’d be happy to help: I’m terrible at any sort of database work, but on the other hand I have lots and lots of free time right now. (It’s university summer holidays at the moment where I am.)
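The UPDATE idea above, sketched against a throwaway SQLite copy through Python's `sqlite3`. The table is simplified to `users`, and the column names `user_email` and `user_password` are an assumption based on phpBB's naming, to be checked against the real dump. The second function shows an allowlist alternative: copy out only the columns an archive needs, which also avoids leaking any other personal fields the users table may hold.

```python
import sqlite3

def scrub_users(conn: sqlite3.Connection) -> None:
    """Blocklist approach: overwrite the known-sensitive columns in place.

    Column names are ASSUMED (phpBB-style); check the real CREATE TABLE.
    """
    with conn:
        conn.execute(
            "UPDATE users SET user_password = '<removed>', user_email = '<removed>'"
        )

def public_users(conn: sqlite3.Connection) -> None:
    """Allowlist approach: keep only what an archive page needs."""
    with conn:
        conn.execute("CREATE TABLE public_users AS SELECT user_id, username FROM users")
```

The allowlist version fails safe: a sensitive column nobody thought of simply never gets copied, whereas the UPDATE only covers the columns someone remembered to list.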
As mentioned earlier, it looks like Dreamhost upgraded PHP in a way that broke phpbb on the old board, which is why it's having problems now. So another approach is to try to upgrade the phpbb software. Who knows, it might work! Or it might not; I haven't done an upgrade like that and don't know if it'd work.
Since you already have the database backed up, I think you should try this before you do anything else. If it works, great, we don’t need to try to figure out what phpBB does with its database! And if it doesn’t, well, we still have all these alternate approaches, and it can’t really damage the board any more than it already is…
Re: Posts from the old board
Posted: Thu Jan 23, 2020 9:06 pm
by Pabappa
I wouldn't think the passwords would be recoverable, since if they were, people who host phpBB boards could use them to hack their own users' accounts on other sites. I haven't looked into it, but I do remember one phpBB board admin who hacked the administrator of a rival phpBB board, and to do it he had to set up a false registration process where the password was sent unencrypted. (This worked because the rival board owner just so happened to use the same password for both boards.) I wouldn't think it would be necessary for the perpetrator to do this if it were possible to just unhash the passwords in the database. But again, I haven't looked into this.
Re: Posts from the old board
Posted: Thu Jan 23, 2020 9:09 pm
by bradrn
Pabappa wrote: Thu Jan 23, 2020 9:06 pm
I wouldn't think the passwords would be recoverable, since if they were, people who host phpBB boards could use them to hack their own users' accounts on other sites. I haven't looked into it, but I do remember one phpBB board admin who hacked the administrator of a rival phpBB board, and to do it he had to set up a false registration process where the password was sent unencrypted. (This worked because the rival board owner just so happened to use the same password for both boards.) I wouldn't think it would be necessary for the perpetrator to do this if it were possible to just unhash the passwords in the database. But again, I haven't looked into this.
Even if the passwords are unrecoverable, I still think that it would be better to obliterate them, just in case there is some method to recover them.
Re: Posts from the old board
Posted: Fri Jan 24, 2020 12:50 pm
by KathTheDragon
While the passwords would be hashed, it's not safe to rely on that if the hashes are public, as any hash can in principle be cracked by guessing once an attacker can test guesses offline, without having to worry about which username they're trying to log in with.
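To make that concrete: a hash can't be run backwards, but anyone holding a hash and its salt can hash candidate passwords and compare, as fast as their hardware allows. A Python sketch using PBKDF2 from `hashlib` as a stand-in; it is not necessarily the scheme the old board's phpBB version used, and the iteration count is kept tiny only so the demo runs quickly.

```python
import hashlib
import os

ITERATIONS = 1_000  # tiny for the demo; real deployments use far more

def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def dictionary_attack(stored: bytes, salt: bytes, candidates):
    """Try each candidate offline: no login form, no rate limit, no lockout."""
    for guess in candidates:
        if hash_password(guess, salt) == stored:
            return guess
    return None

if __name__ == "__main__":
    salt = os.urandom(16)
    stored = hash_password("hunter2", salt)  # a weak password leaked as a hash
    print(dictionary_attack(stored, salt, ["123456", "password", "hunter2"]))  # prints hunter2
```

Salting stops one precomputed table from cracking everyone at once, and slow hashes make each guess expensive, but neither protects a password that appears on a guessing list, which is why blanking the column before sharing anything is still the safer choice.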