Posts from the old board

Topics that can go away
bradrn
Posts: 6324
Joined: Fri Oct 19, 2018 1:25 am

Re: Posts from the old board

Post by bradrn »

Ser wrote: Sat Mar 07, 2020 11:34 am
bradrn wrote: Fri Mar 06, 2020 7:57 pmOn the other hand, I was talking to someone I know who works in IT, and he suggested an alternative approach. As I understand it, the problem with restoring the backup of the old ZBB to this board was that the old version was too out of date (inferred from https://www.verduria.org/viewtopic.php?f=5&t=3);
I mentioned in this thread that the problem is that phpBB 3.0.12 relies on a "modifier" of a PHP function argument that was removed in PHP 7 due to security vulnerabilities (modifier 'e' of preg_replace). (Did you see no errors in your local install while using PHP 7? It'd be amusing if you didn't.) Then, in the post below that, I said that successfully updating phpBB should work, with a tip about mysqli configuration.
Oh yes, I did see those errors! Then I downloaded an earlier version of PHP and they went away. The idea with progressively upgrading the phpBB version would be that once the database is upgraded to be compatible with the latest phpBB version, then it could be hosted without running into those errors.
Thanks for your work so far on the database!
You’re welcome!
I think that we could and should make the limited archive anyway even if incatena.org could be restored to a functional state.
Given that upgrading phpBB is probably easier than extracting the relevant information from the database and turning it into an archive (or at least that’s my impression), I’d be curious to know why you think this.
Conlangs: Scratchpad | Texts | antilanguage
Software: See http://bradrn.com/projects.html
Other: Ergativity for Novices

(Why does phpBB not let me add >5 links here?)
Kuchigakatai
Posts: 1307
Joined: Mon Jul 09, 2018 4:19 pm

Re: Posts from the old board

Post by Kuchigakatai »

bradrn wrote: Sat Mar 07, 2020 4:50 pmGiven that upgrading phpBB is probably easier than extracting the relevant information from the database and turning it into an archive (or at least that’s my impression), I’d be curious to know why you think this.
Not much of a reason, it's just that gradually upgrading phpBB, as suggested, involves more work for zompist than us agreeing on a SQL query to produce shareable dumps. I often promise to do relatively simple things but then take a long-ass time to get around doing them, like how four months ago I told Ars Lande that I'd set up a MediaWiki install with Visual Editor but I still haven't set time aside for it... Zompist could potentially get around doing the SQL query faster than the phpBB gradual updates.
bradrn
Posts: 6324
Joined: Fri Oct 19, 2018 1:25 am

Re: Posts from the old board

Post by bradrn »

Ser wrote: Sat Mar 07, 2020 5:13 pm
bradrn wrote: Sat Mar 07, 2020 4:50 pmGiven that upgrading phpBB is probably easier than extracting the relevant information from the database and turning it into an archive (or at least that’s my impression), I’d be curious to know why you think this.
Not much of a reason, it's just that gradually upgrading phpBB, as suggested, involves more work for zompist than us agreeing on a SQL query to produce shareable dumps. I often promise to do relatively simple things but then take a long-ass time to get around doing them, like how four months ago I told Ars Lande that I'd set up a MediaWiki install with Visual Editor but I still haven't set time aside for it... Zompist could potentially get around doing the SQL query faster than the phpBB gradual updates.
Good point — I find I do much the same thing.

As for using SQL queries to extract the relevant bits of the database, I’m thinking that we need:
  • SELECT user_id, group_id, username, user_posts, rank_title, user_avatar, user_avatar_width, user_avatar_height, user_sig, user_sig_bbcode_uid, user_sig_bbcode_bitfield FROM phpbb_users INNER JOIN phpbb_ranks ON phpbb_ranks.rank_id = phpbb_users.user_rank WHERE user_type <> 2;
  • SELECT forum_id, forum_name, forum_description, forum_topic FROM phpbb_forums WHERE parent_id <> 0;
  • SELECT topic_id, forum_id, topic_title, topic_poster, topic_time, topic_views, topic_replies, topic_first_post_id, topic_last_post_id, topic_first_poster_name, topic_last_poster_name, topic_last_post_subject, topic_last_post_time FROM phpbb_topics;
  • SELECT post_id, topic_id, poster_id, post_time, enable_bbcode, enable_smilies, enable_magic_url, bbcode_bitfield, bbcode_uid, enable_sig, post_subject, post_text, post_attachment FROM phpbb_posts;
  • SELECT attach_id, post_msg_id, is_orphan, physical_filename, real_filename, download_count, attach_comment, extension, mimetype FROM phpbb_attachments;
  • It’s probably also wise to add an INTO OUTFILE clause to write the output of each query to a file.
So, what do you think about these?
Conlangs: Scratchpad | Texts | antilanguage
Software: See http://bradrn.com/projects.html
Other: Ergativity for Novices

(Why does phpBB not let me add >5 links here?)
User avatar
cedh
Posts: 201
Joined: Fri Jul 13, 2018 9:55 am
Location: Tübingen, Germany
Contact:

Re: Posts from the old board

Post by cedh »

Thanks a lot for your work!
I think we could go even more bare-bones (at least if those queries turn out to result in errors), because all we really need are the forums, topics, posts, and usernames. Things like sigs, avatars, ranks, view options, and attachments could just as well be left out IMO...
bradrn
Posts: 6324
Joined: Fri Oct 19, 2018 1:25 am

Re: Posts from the old board

Post by bradrn »

cedh wrote: Sun Mar 08, 2020 4:41 am Thanks a lot for your work!
I think we could go even more bare-bones (at least if those queries turn out to result in errors), because all we really need are the forums, topics, posts, and usernames. Things like sigs, avatars, ranks, view options, and attachments could just as well be left out IMO...
The view options are unfortunately necessary, because they’re needed in order to decode the BBCode using phpBB. One could argue that we could find another BBCode-parsing library, but the raw posts as stored in the database seem to have a slightly different syntax to the BBCode we edit. For instance, this:

Code: Select all

This is Post 1 in Topic A of Subforum 1.

[b]EDIT:[/b] This is a moderator edit.
Gets stored in the database as this:

Code: Select all

This is Post 1 in Topic A of Subforum 1.

[b:8xpf2578]EDIT:[/b:8xpf2578] This is a moderator edit.
As for the other areas you listed: After thinking about it, I do agree with you that sigs, avatars and ranks probably aren’t essential. I would like to rescue attachments though, as those could contain things like maps, con-scripts and explanatory images which could turn out to be important.
Conlangs: Scratchpad | Texts | antilanguage
Software: See http://bradrn.com/projects.html
Other: Ergativity for Novices

(Why does phpBB not let me add >5 links here?)
Kuchigakatai
Posts: 1307
Joined: Mon Jul 09, 2018 4:19 pm

Re: Posts from the old board

Post by Kuchigakatai »

bradrn wrote: Sat Mar 07, 2020 6:40 pmAs for using SQL queries to extract the relevant bits of the database, I’m thinking that we need:
  • SELECT user_id, group_id, username, user_posts, rank_title, user_avatar, user_avatar_width, user_avatar_height, user_sig, user_sig_bbcode_uid, user_sig_bbcode_bitfield FROM phpbb_users INNER JOIN phpbb_ranks ON phpbb_ranks.rank_id = phpbb_users.user_rank WHERE user_type <> 2;
  • SELECT forum_id, forum_name, forum_description, forum_topic FROM phpbb_forums WHERE parent_id <> 0;
  • SELECT topic_id, forum_id, topic_title, topic_poster, topic_time, topic_views, topic_replies, topic_first_post_id, topic_last_post_id, topic_first_poster_name, topic_last_poster_name, topic_last_post_subject, topic_last_post_time FROM phpbb_topics;
  • SELECT post_id, topic_id, poster_id, post_time, enable_bbcode, enable_smilies, enable_magic_url, bbcode_bitfield, bbcode_uid, enable_sig, post_subject, post_text, post_attachment FROM phpbb_posts;
  • SELECT attach_id, post_msg_id, is_orphan, physical_filename, real_filename, download_count, attach_comment, extension, mimetype FROM phpbb_attachments;
  • It’s probably also wise to add an INTO OUTFILE clause to write the output of each query to a file.
So, what do you think about these?
Alright, I just had a close look at this, and I have the following observations:

- Attachments are in the /files/ subdirectory on the server, not the database. The files in that subdirectory apparently have a generated physical_filename alphanumeric string as their filename too (no extension). Zompist would need to download those and provide them to us if we want to retain attachments. I agree they'd be good to have, because surely some conlang posts make good use of them.
- I had some concerns about attachments in private messages, but it turns out you can't attach files on private messages, so there was nothing to be concerned about.
- I agree about getting rid of user sigs, avatars and ranks.
- In the query for the phpbb_users table, I suggest further getting rid of group_id and user_posts. We don't need group_id because that just tells us who the moderators were, and the number in user_posts isn't even accurate because it includes the user's entire histories (with lots of no-longer-extant posts). Also, user_posts just tells us something social about users, but it's irrelevant for post quality: there are certainly users with very low post counts but good contributions (people like M Mira, say).
- In the query for the phpbb_topics table, I suggest getting rid of topic_last_post_subject. Most people don't change post subjects anyway, so 99.9% of the time this is just a redundant "Re: [topic_title]". (phpBB keeps this around to make loading the board index faster, as the subforum quickview includes the subject of the last post.)
- In the query for the phpbb_posts table, I suggest getting rid of post_subject, also because, 99.9% of the time, users don't bother changing the post subject from the default value.

Therefore, implementing my various suggestions, I come up with the following SQL queries. (Note that I eliminated parent_id <> 0; in the phpbb_forums query, because, I mean, it's just one row that's affected by that...)
[list]
[*] SELECT user_id, username FROM phpbb_users WHERE user_type <> 2;
[*] SELECT forum_id, forum_name, forum_description, forum_topic FROM phpbb_forums;
[*] SELECT topic_id, forum_id, topic_title, topic_poster, topic_time, topic_views, topic_replies, topic_first_post_id, topic_last_post_id, topic_first_poster_name, topic_last_poster_name, topic_last_post_time FROM phpbb_topics;
[*] SELECT post_id, topic_id, poster_id, post_time, enable_bbcode, enable_smilies, enable_magic_url, bbcode_bitfield, bbcode_uid, post_text, post_attachment FROM phpbb_posts;
[*] SELECT attach_id, post_msg_id, is_orphan, physical_filename, real_filename, download_count, attach_comment, extension, mimetype FROM phpbb_attachments;
[/list]

EDIT: These queries have now been replaced with other queries down below.

Regarding the INTO OUTFILE clause: I vaguely remember zompist once mentioning that the old forum actually used an SQLite database, because the database had been started a very long time ago (the oldest users in it were from 2002, presumably before phpBB enforced using MySQL/MariaDB?) If it is really an SQLite database, we'll need to write and use something a bit different with the sqlite program.

By the way, regarding the pains that doing the slow updates to phpBB 3.2 would imply, I found that updating to version 3.1.3 means that attachments stop working and need to be reuploaded. Sounds fun. I am sure there are other problems like that as well (besides the mysqli thing I mentioned in a previous post).
Last edited by Kuchigakatai on Sun Mar 15, 2020 8:05 pm, edited 1 time in total.
Kuchigakatai
Posts: 1307
Joined: Mon Jul 09, 2018 4:19 pm

Re: Posts from the old board

Post by Kuchigakatai »

By the way, is there really much of an interest in retaining the content of Ephemera and None of the Above?

Ephemera just had... ephemeral content when zompist took the snapshot of the database...

...And as for None of the Above, that subforum surely contains some things of linguistic or conworld interest, like people posting new things they've made in the Creativity of the day thread and mayyyyybe the Happy Things Thread, but most of the rest is useless stuff about politics and stuff like censuses, ils's Links of Interest thread (many of which are broken by now I'm sure...), the otter picture thread, etc...

Besides those posts in the Creativity of the day thread, specific threads of linguistic or conworlding interest that would be in the database are, IMO:
- Fun with Fontforge, Graphite, and OpenOffice.org (has video) [Guitarplayer's tutorial with posts invisible at the moment, the videos might still be up on YouTube?]
- LCK Book [this is the 2010 thread with 12 pages of critique of the dead tree LCK's version 1, which would still be useful for me because that's the LCK that I have...]
- A brief overview of the development of Western Philosophy [by Salmoneus, 11 pages of discussion]
- SCA2 questions [SCA2 help, 2013-2016, 4 pages]
- Zompist books feedback [various minor corrections of his books other than the LCK, 3 pages]
- Linguistic Struggles Thread [IIRC mostly silly mistakes when learning some natlangs, but maybe of some interest?, 4 pages]
- A quick intro to Classical Music [by Salmoneus, 2 pages, but lots of content in posts]
- Classical Composers [by Salmoneus, 5 pages, same]

The Official ZBB Quote Thread is also there in NOTA, if anyone wants to save that...
bradrn
Posts: 6324
Joined: Fri Oct 19, 2018 1:25 am

Re: Posts from the old board

Post by bradrn »

Ser wrote: Sun Mar 15, 2020 5:26 pm
bradrn wrote: Sat Mar 07, 2020 6:40 pmAs for using SQL queries to extract the relevant bits of the database, I’m thinking that we need:
  • SELECT user_id, group_id, username, user_posts, rank_title, user_avatar, user_avatar_width, user_avatar_height, user_sig, user_sig_bbcode_uid, user_sig_bbcode_bitfield FROM phpbb_users INNER JOIN phpbb_ranks ON phpbb_ranks.rank_id = phpbb_users.user_rank WHERE user_type <> 2;
  • SELECT forum_id, forum_name, forum_description, forum_topic FROM phpbb_forums WHERE parent_id <> 0;
  • SELECT topic_id, forum_id, topic_title, topic_poster, topic_time, topic_views, topic_replies, topic_first_post_id, topic_last_post_id, topic_first_poster_name, topic_last_poster_name, topic_last_post_subject, topic_last_post_time FROM phpbb_topics;
  • SELECT post_id, topic_id, poster_id, post_time, enable_bbcode, enable_smilies, enable_magic_url, bbcode_bitfield, bbcode_uid, enable_sig, post_subject, post_text, post_attachment FROM phpbb_posts;
  • SELECT attach_id, post_msg_id, is_orphan, physical_filename, real_filename, download_count, attach_comment, extension, mimetype FROM phpbb_attachments;
  • It’s probably also wise to add an INTO OUTFILE clause to write the output of each query to a file.
So, what do you think about these?
Alright, I just had a close look at this, and I have the following observations:
Thanks for finding the time to look at these! A couple of questions/comments:
- Attachments are in the /files/ subdirectory on the server, not the database. The files in that subdirectory apparently have a generated physical_filename alphanumeric string as their filename too (no extension). Zompist would need to download those and provide them to us if we want to retain attachments. I agree they'd be good to have, because surely some conlang posts make good use of them.
Thanks for tracking these down! I do agree that zompist would probably need to provide these to us to retain attachments.
- I had some concerns about attachments in private messages, but it turns out you can't attach files on private messages, so there was nothing to be concerned about.
- I agree about getting rid of user sigs, avatars and ranks.
- In the query for the phpbb_users table, I suggest further getting rid of group_id and user_posts. We don't need group_id because that just tells us who the moderators were, and the number in user_posts isn't even accurate because it includes the user's entire histories (with lots of no-longer-extant posts). Also, user_posts just tells us something social about users, but it's irrelevant for post quality: there are certainly users with very low post counts but good contributions (people like M Mira, say).
I think I agree with all of this.
- In the query for the phpbb_topics table, I suggest getting rid of topic_last_post_subject. Most people don't change post subjects anyway, so 99.9% of the time this is just a redundant "Re: [topic_title]". (phpBB keeps this around to make loading the board index faster, as the subforum quickview includes the subject of the last post.)
- In the query for the phpbb_posts table, I suggest getting rid of post_subject, also because, 99.9% of the time, users don't bother changing the post subject from the default value.
I’m not sure I agree with these: some people (including me) regularly change the post subject to indicate a new version or other significant change. But I do agree that these aren’t essential.
Therefore, implementing my various suggestions, I come up with the following SQL queries. (Note that I eliminated parent_id <> 0; in the phpbb_forums query, because, I mean, it's just one row that's affected by that...)
  • SELECT user_id, username FROM phpbb_users WHERE user_type <> 2;
  • SELECT forum_id, forum_name, forum_description, forum_topic FROM phpbb_forums;
  • SELECT topic_id, forum_id, topic_title, topic_poster, topic_time, topic_views, topic_replies, topic_first_post_id, topic_last_post_id, topic_first_poster_name, topic_last_poster_name, topic_last_post_time FROM phpbb_topics;
  • SELECT post_id, topic_id, poster_id, post_time, enable_bbcode, enable_smilies, enable_magic_url, bbcode_bitfield, bbcode_uid, post_text, post_attachment FROM phpbb_posts;
  • SELECT attach_id, post_msg_id, is_orphan, physical_filename, real_filename, download_count, attach_comment, extension, mimetype FROM phpbb_attachments;
Looks good, aside from my reservations outlined above.
Regarding the INTO OUTFILE clause: I vaguely remember zompist once mentioning that the old forum actually used an SQLite database, because the database had been started a very long time ago (the oldest users in it were from 2002, presumably before phpBB enforced using MySQL/MariaDB?) If it is really an SQLite database, we'll need to write and use something a bit different with the sqlite program.
Why didn’t you tell me this earlier‽‽ I spent hours trying to get MySQL working and connected to phpBB specifically so I could be running the same database zompist was… Ah, well. There’s no point in getting upset about it now — I just hope that phpBB uses the same database structure with every database type (although I can’t think of any reason why it wouldn’t).
By the way, regarding the pains that doing the slow updates to phpBB 3.2 would imply, I found that updating to version 3.1.3 means that attachments stop working and need to be reuploaded. Sounds fun. I am sure there are other problems like that as well (besides the mysqli thing I mentioned in a previous post).
Looking at the linked thread, it looks like attachments stopped working purely because FileZilla tried to treat them as ASCII — surely that would be a problem with FileZilla rather than phpBB, which would apply to any situation (including simply making an archive!) where you’re downloading the attachments. As for mysqli, I found what you originally said about it, and it doesn’t sound particularly difficult (you only have to change one line of PHP configuration). But I do agree that doing an upgrade would probably be much trickier for zompist than just running a couple of SQL queries.
Ser wrote: Sun Mar 15, 2020 5:51 pm By the way, is there really much of an interest in retaining the content of Ephemera and None of the Above?

Ephemera just had... ephemeral content when zompist took the snapshot of the database...

...And as for None of the Above, that subforum surely contains some things of linguistic or conworld interest, like people posting new things they've made in the Creativity of the day thread and mayyyyybe the Happy Things Thread, but most of the rest is useless stuff about politics and stuff like censuses, ils's Links of Interest thread (many of which are broken by now I'm sure...), the otter picture thread, etc...

Besides those posts in the Creativity of the day thread, specific threads of linguistic or conworlding interest that would be in the database are, IMO:
- Fun with Fontforge, Graphite, and OpenOffice.org (has video) [Guitarplayer's tutorial with posts invisible at the moment, the videos might still be up on YouTube?]
- LCK Book [this is the 2010 thread with 12 pages of critique of the dead tree LCK's version 1, which would still be useful for me because that's the LCK that I have...]
- A brief overview of the development of Western Philosophy [by Salmoneus, 11 pages of discussion]
- SCA2 questions [SCA2 help, 2013-2016, 4 pages]
- Zompist books feedback [various minor corrections of his books other than the LCK, 3 pages]
- Linguistic Struggles Thread [IIRC mostly silly mistakes when learning some natlangs, but maybe of some interest?, 4 pages]
- A quick intro to Classical Music [by Salmoneus, 2 pages, but lots of content in posts]
- Classical Composers [by Salmoneus, 5 pages, same]

The Official ZBB Quote Thread is also there in NOTA, if anyone wants to save that...
Not sure about Ephemera (I never posted there in my limited interaction with the old board), but I have a keen interest in retaining None of the Above: it contains the only documentation of the SCA I made. I think it’s probably best to save all of it, even if it’s irrelevant: it can’t take much more effort than restoring the rest of the board would, and it would avoid any complaints along the lines of ‘why wasn’t [insert my favourite thread here] saved?’.
Conlangs: Scratchpad | Texts | antilanguage
Software: See http://bradrn.com/projects.html
Other: Ergativity for Novices

(Why does phpBB not let me add >5 links here?)
Kuchigakatai
Posts: 1307
Joined: Mon Jul 09, 2018 4:19 pm

Re: Posts from the old board

Post by Kuchigakatai »

bradrn wrote: Sun Mar 15, 2020 6:25 pmI’m not sure I agree with these: some people (including me) regularly change the post subject to indicate a new version or other significant change. But I do agree that these aren’t essential.
I'm pretty sure that if I wrote a script to see how often people change the post subject, it'd turn out almost no one does, but okay.
Why didn’t you tell me this earlier‽‽ I spent hours trying to get MySQL working and connected to phpBB specifically so I could be running the same database zompist was… Ah, well. There’s no point in getting upset about it now — I just hope that phpBB uses the same database structure with every database type (although I can’t think of any reason why it wouldn’t).
Well, I just vaguely remembered zompist saying something along those lines a long time ago (probably seven years ago or so). It might not even be true.

Zompist, what is the nature of the database?
Looking at the linked thread, it looks like attachments stopped working purely because FileZilla tried to treat them as ASCII — surely that would be a problem with FileZilla rather than phpBB, which would apply to any situation (including simply making an archive!) where you’re downloading the attachments. As for mysqli, I found what you originally said about it, and it doesn’t sound particularly difficult (you only have to change one line of PHP configuration). But I do agree that doing an upgrade would probably be much trickier for zompist than just running a couple of SQL queries.
Oh, that's right, that was mostly a Filezilla misconfiguration. Nevermind then.
Not sure about Ephemera (I never posted there in my limited interaction with the old board), but I have a keen interest in retaining None of the Above: it contains the only documentation of the SCA I made. I think it’s probably best to save all of it, even if it’s irrelevant: it can’t take much more effort than restoring the rest of the board would, and it would avoid any complaints along the lines of ‘why wasn’t [insert my favourite thread here] saved?’.
Yeah, alright.

So... I guess these are the revised SQL queries so far, with post_subject and topic_last_post_subject inserted back in:
  • SELECT user_id, username FROM phpbb_users WHERE user_type <> 2;
  • SELECT forum_id, forum_name, forum_desc, forum_topics FROM phpbb_forums;
  • SELECT topic_id, forum_id, topic_title, topic_poster, topic_time, topic_views, topic_replies, topic_first_post_id, topic_last_post_id, topic_first_poster_name, topic_last_poster_name, topic_last_post_subject, topic_last_post_time FROM phpbb_topics;
  • SELECT post_id, topic_id, poster_id, post_time, enable_bbcode, enable_smilies, enable_magic_url, bbcode_bitfield, bbcode_uid, post_subject, post_text, post_attachment FROM phpbb_posts;
  • SELECT attach_id, post_msg_id, is_orphan, physical_filename, real_filename, download_count, attach_comment, extension, mimetype FROM phpbb_attachments;
Last edited by Kuchigakatai on Mon Mar 16, 2020 12:49 am, edited 1 time in total.
bradrn
Posts: 6324
Joined: Fri Oct 19, 2018 1:25 am

Re: Posts from the old board

Post by bradrn »

Ser wrote: Sun Mar 15, 2020 8:04 pm
bradrn wrote: Sun Mar 15, 2020 6:25 pmI’m not sure I agree with these: some people (including me) regularly change the post subject to indicate a new version or other significant change. But I do agree that these aren’t essential.
I'm pretty sure that if I wrote a script to see how often people change the post subject, it'd turn out almost no one does, but okay.
Well, you’re probably right. But I don’t see any harm in including it: in the worst case, we simply end up with an extra unused column.
Ser wrote: Sun Mar 15, 2020 8:04 pm
Why didn’t you tell me this earlier‽‽ I spent hours trying to get MySQL working and connected to phpBB specifically so I could be running the same database zompist was… Ah, well. There’s no point in getting upset about it now — I just hope that phpBB uses the same database structure with every database type (although I can’t think of any reason why it wouldn’t).
Well, I just vaguely remembered zompist saying something along those lines a long time ago (probably seven years ago or so). It might not even be true.

Zompist, what is the nature of the database?
I’d really like to know this as well. But I have MySQL, PostgreSQL and sqlite all installed on my computer, so at least I should be able to cope with whatever database type gets used.
So... I guess these are the revised SQL queries so far, with post_subject and topic_last_post_subject inserted back in:
  • SELECT user_id, username FROM phpbb_users WHERE user_type <> 2;
  • SELECT forum_id, forum_name, forum_description, forum_topic FROM phpbb_forums;
  • SELECT topic_id, forum_id, topic_title, topic_poster, topic_time, topic_views, topic_replies, topic_first_post_id, topic_last_post_id, topic_first_poster_name, topic_last_poster_name, topic_last_post_subject, topic_last_post_time FROM phpbb_topics;
  • SELECT post_id, topic_id, poster_id, post_time, enable_bbcode, enable_smilies, enable_magic_url, bbcode_bitfield, bbcode_uid, post_subject, post_text, post_attachment FROM phpbb_posts;
  • SELECT attach_id, post_msg_id, is_orphan, physical_filename, real_filename, download_count, attach_comment, extension, mimetype FROM phpbb_attachments;
Yes, I think I’m happy with this.
Conlangs: Scratchpad | Texts | antilanguage
Software: See http://bradrn.com/projects.html
Other: Ergativity for Novices

(Why does phpBB not let me add >5 links here?)
zompist
Site Admin
Posts: 2992
Joined: Sun Jul 08, 2018 5:46 am
Location: Right here, probably
Contact:

Re: Posts from the old board

Post by zompist »

bradrn wrote: Sun Mar 15, 2020 8:22 pm
Ser wrote: Sun Mar 15, 2020 8:04 pm Zompist, what is the nature of the database?
I’d really like to know this as well. But I have MySQL, PostgreSQL and sqlite all installed on my computer, so at least I should be able to cope with whatever database type gets used.
Didn't you just set up a phpbb installation? It's whatever phpbb uses. (I thought it was MySQL but I don't know for sure.)
bradrn
Posts: 6324
Joined: Fri Oct 19, 2018 1:25 am

Re: Posts from the old board

Post by bradrn »

zompist wrote: Sun Mar 15, 2020 9:25 pm
bradrn wrote: Sun Mar 15, 2020 8:22 pm
Ser wrote: Sun Mar 15, 2020 8:04 pm Zompist, what is the nature of the database?
I’d really like to know this as well. But I have MySQL, PostgreSQL and sqlite all installed on my computer, so at least I should be able to cope with whatever database type gets used.
Didn't you just set up a phpbb installation? It's whatever phpbb uses. (I thought it was MySQL but I don't know for sure.)
Yes, I did set up a phpBB installation — but that doesn’t help, because phpBB can use many different databases. (My installation lists all of MySQL, PostgreSQL, SQLite, Firebird, MS SQL Server 2000 and Oracle as being supported.)
Conlangs: Scratchpad | Texts | antilanguage
Software: See http://bradrn.com/projects.html
Other: Ergativity for Novices

(Why does phpBB not let me add >5 links here?)
zompist
Site Admin
Posts: 2992
Joined: Sun Jul 08, 2018 5:46 am
Location: Right here, probably
Contact:

Re: Posts from the old board

Post by zompist »

Also... though I keep intending to mess with this, I haven't yet. If I do, it'll be starting with a parser and working directly from the database dump. As you can read here, back when I was setting up this board, I tried to restore within phpbb from that dump, and it was a big disaster, so I don't trust processes like "reinstall phpbb multiple times from increasing version numbers."

But if one of you guys (brad, Ser, cedh) wants to see what you can do with the files, I'm willing to let you at the data, so long as you respect the security concerns I talked about above. PM me if so.
Kuchigakatai
Posts: 1307
Joined: Mon Jul 09, 2018 4:19 pm

Re: Posts from the old board

Post by Kuchigakatai »

zompist wrote: Sun Mar 15, 2020 9:25 pmDidn't you just set up a phpbb installation? It's whatever phpbb uses. (I thought it was MySQL but I don't know for sure.)
The phpBB documentation says it supports six different types of databases: SQLite, MySQL / MariaDB, Microsoft SQL Server, Oracle, and PostgreSQL. So we don't know.

I'm pretty curious about how you made the dump... Does phpBB come with an admin button for that?

I think we could figure it out just from looking at the first 20 lines or so of the dump script. Because it's a big file, you want to just look at the first few lines and then immediately close it, not loading the rest. While in the same directory as the dump file, you can do this for the first 20 lines on Linux from the command line with:

Code: Select all

head -n 20 the-big-dump-script.sql
On Windows Powershell version 2 or 3 (on Windows 7), you can do this with:

Code: Select all

Get-Content the-big-dump-script.sql -TotalCount 20
or, if you have Windows Powershell version 3 (Windows 8 or up for sure), you can also do this with:

Code: Select all

Get-Content the-big-dump-script.sql -Head 20
This is what bradrn's test dump, which is from a MySQL database, looks like:

Code: Select all

-- phpMyAdmin SQL Dump
-- version 5.0.1
-- https://www.phpmyadmin.net/
--
-- Host: localhost
-- Generation Time: Mar 06, 2020 at 01:27 PM
-- Server version: 8.0.19
-- PHP Version: 7.4.3

SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET AUTOCOMMIT = 0;
START TRANSACTION;
SET time_zone = "+00:00";


/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
All that /*!40101*/ formatting there indicates it's a MySQL / MariaDB dump. Meanwhile, a tiny SQLite database dump I just made shows:

Code: Select all

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE entries(
    en_id INTEGER PRIMARY KEY,
    en_tm TEXT NOT NULL,
    en_mg TEXT NOT NULL,
    en_nt TEXT NOT NULL,
    en_kn INTEGER NOT NULL);
INSERT INTO entries VALUES(1,'talo','s. finger','',0);
INSERT INTO entries VALUES(2,'yuus','s. hand','',0);
CREATE TABLE term_chars(
    tmc_id INTEGER PRIMARY KEY,
    tmc_ch TEXT UNIQUE);
INSERT INTO term_chars VALUES(1,'t');
COMMIT;
As you can see, dump formats are pretty different from each other already in the first few lines.
zompist wrote: Sun Mar 15, 2020 9:37 pmAlso... though I keep intending to mess with this, I haven't yet. If I do, it'll be starting with a parser and working directly from the database dump. As you can read here, back when I was setting up this board, I tried to restore within phpbb from that dump, and it was a big disaster, so I don't trust processes like "reinstall phpbb multiple times from increasing version numbers."

But if one of you guys (brad, Ser, cedh) wants to see what you can do with the files, I'm willing to let you at the data, so long as you respect the security concerns I talked about above. PM me if so.
The point behind me and bradrn's discussion here was to prevent us from ever having to deal with ZBBers' private data. The idea was to provide you with a script so that you would load the dump on a bare database alone (no phpBB), run the script, and provide us with this new second dump where all private data is absent. (This is a lot easier and less error-prone than trying to edit the database dump itself.) Our only detail missing now is the database software the dump works with.
Last edited by Kuchigakatai on Sun Mar 15, 2020 9:57 pm, edited 1 time in total.
bradrn
Posts: 6324
Joined: Fri Oct 19, 2018 1:25 am

Re: Posts from the old board

Post by bradrn »

Ser wrote: Sun Mar 15, 2020 9:48 pm
zompist wrote: Sun Mar 15, 2020 9:25 pmDidn't you just set up a phpbb installation? It's whatever phpbb uses. (I thought it was MySQL but I don't know for sure.)
The phpBB documentation says it supports six different types of databases: SQLite, MySQL / MariaDB, Microsoft SQL Server, Oracle, and PostgreSQL. So we don't know.

I'm pretty curious about how you made the dump... Does phpBB come with an admin button for that?
I made the dump with the phpMyAdmin software. You have to download it separately, but it connects to a running phpBB instance and allows you to do things such as e.g. creating dumps.

(Not sure if you figured that out already or not, since it looks like you managed to make your own SQLite dump…)
zompist wrote: Sun Mar 15, 2020 9:37 pmAlso... though I keep intending to mess with this, I haven't yet. If I do, it'll be starting with a parser and working directly from the database dump. As you can read here, back when I was setting up this board, I tried to restore within phpbb from that dump, and it was a big disaster, so I don't trust processes like "reinstall phpbb multiple times from increasing version numbers."

But if one of you guys (brad, Ser, cedh) wants to see what you can do with the files, I'm willing to let you at the data, so long as you respect the security concerns I talked about above. PM me if so.
The point behind me and bradrn's discussion here was to prevent us from ever having to deal with ZBBers' private data. The idea was to provide you with a script so that you would load the dump on a bare database alone (no phpBB), run the script, and provide us with this new second dump where all private data is absent. (This is a lot easier and less error-prone than trying to edit the database dump itself.) Our only detail missing now is the database software the dump works with.
I agree. I did send a PR to zompist for the data, so if he’s OK with it I might run those queries and post only that data online.
Conlangs: Scratchpad | Texts | antilanguage
Software: See http://bradrn.com/projects.html
Other: Ergativity for Novices

(Why does phpBB not let me add >5 links here?)
zompist
Site Admin
Posts: 2992
Joined: Sun Jul 08, 2018 5:46 am
Location: Right here, probably
Contact:

Re: Posts from the old board

Post by zompist »

The first few lines look like this:

#
# phpBB Backup Script
# Dump of tables for phpbb_
# DATE : 04-07-2018 08:55:37 GMT
#
# Table: phpbb_acl_groups
DROP TABLE IF EXISTS phpbb_acl_groups;
CREATE TABLE `phpbb_acl_groups` (
`group_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`forum_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`auth_option_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`auth_role_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`auth_setting` tinyint(2) NOT NULL DEFAULT '0',
KEY `group_id` (`group_id`),
KEY `auth_opt_id` (`auth_option_id`),
KEY `auth_role_id` (`auth_role_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;


Followed directly by the INSERT statement for that table, and so on.
bradrn
Posts: 6324
Joined: Fri Oct 19, 2018 1:25 am

Re: Posts from the old board

Post by bradrn »

zompist wrote: Sun Mar 15, 2020 10:08 pm The first few lines look like this:

#
# phpBB Backup Script
# Dump of tables for phpbb_
# DATE : 04-07-2018 08:55:37 GMT
#
# Table: phpbb_acl_groups
DROP TABLE IF EXISTS phpbb_acl_groups;
CREATE TABLE `phpbb_acl_groups` (
`group_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`forum_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`auth_option_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`auth_role_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`auth_setting` tinyint(2) NOT NULL DEFAULT '0',
KEY `group_id` (`group_id`),
KEY `auth_opt_id` (`auth_option_id`),
KEY `auth_role_id` (`auth_role_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;


Followed directly by the INSERT statement for that table, and so on.
Wikipedia says that ‘MyISAM was the default storage engine for … MySQL … versions prior to 5.5 released in December 2009’ — it’s still available, but needs to be enabled explicitly, as it is in your dump. So it looks like this is MySQL.
Conlangs: Scratchpad | Texts | antilanguage
Software: See http://bradrn.com/projects.html
Other: Ergativity for Novices

(Why does phpBB not let me add >5 links here?)
Kuchigakatai
Posts: 1307
Joined: Mon Jul 09, 2018 4:19 pm

Re: Posts from the old board

Post by Kuchigakatai »

zompist wrote: Sun Mar 15, 2020 10:08 pmENGINE=MyISAM
Well, that's using the old MyISAM engine, so that's definitely MySQL.

Alright, just give us a little more time to agree on a whole script... Are you on Windows or Linux? (EDIT: or Mac...?)
Kuchigakatai
Posts: 1307
Joined: Mon Jul 09, 2018 4:19 pm

Re: Posts from the old board

Post by Kuchigakatai »

Actually, this procedure should be pretty platform-agnostic... bradrn, do I get your approval for the following? (I've also included screenshots of me doing the whole of the following on your small sample phpbb database...)

Step 1: Recreating the database

I think we can assume the database isn't recreated in zompist's computer through MySQL (or MariaDB) yet. As we saw in the first few lines zompist posted of his dumpfile, there are not any calls at the beginning to set up the database, but it starts right away with the creation of tables and the insertion of data. This means we're going to begin by setting up the database to be used ("phpbb").

I think we can slightly adapt the setup of bradrn's sample phpBB database for our purposes. The only change would be the unicode format, from bradrn's "utf8mb4" to "utf8", and from "utf8mb4_0900_ai_ci" to "utf8_general_ci". This is because "utf8mb4" is a new thing in MySQL 5.5 (from Dec. 2010), and in zompist's first lines we can see that his huge dumpfile expects "utf8": CHARSET=utf8 COLLATE=utf8_bin. So I'm using the old defaults.

So... a file is made, called prep-phpbb.sql, with these contents:

Code: Select all

/* prep-phpbb.sql */

SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET AUTOCOMMIT = 0;
SET time_zone = "+00:00";

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;

CREATE DATABASE IF NOT EXISTS `phpbb` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
The script is run to prepare the database phpbb. I highly recommend running this command and the other ones below on Powershell if you're on Windows, to avoid destroying Unicode data due to cmd.exe's pipe shenanigans involving text. (screenshot beginning, screenshot end)

Code: Select all

mysql -v -u root -p < prep-phpbb.sql


The huge dumpfile is then run to recreate the database in whatever computer zompist is using: (screenshot beginning, middle, end)

Code: Select all

mysql -v -u root -p phpbb < the-big-dump-script.sql
This program call will take a long while, in which zompist will be twiddling his thumbs for a good hour or two watching the walls of SQL script get run line by line. Note that the -v option is the "verbose" option, which shows the SQL as it is run. It may be omitted if you want to see nothing (unless for some reason you hit an error): mysql -u root -p phpbb < the-big-dump-script.sql



Step 2: Making the stripped database

We run the following script to create the reduced tables with all personal data stripped off. The script begins by dropping the entirety of the laughable phpBB search data to make disk space for the reduced tables containing the publishable data, and then makes new tables copying what we want. The data inside posts is obviously huge, so we could expect the bradser_posts to expand the database size on disk by perhaps 25%-40% (but then the dump will only include these reduced tables).

Make a file called prep-bradser.sql with the following:

Code: Select all

/* prep-bradser.sql */

/* make space by dropping all the search data */
DROP TABLE phpbb_search_results, phpbb_search_wordlist, phpbb_search_wordmatch;

/* create the tables with personal data stripped off */
CREATE TABLE bradser_users AS SELECT user_id, username FROM phpbb_users WHERE user_type <> 2;
CREATE TABLE bradser_forums AS SELECT forum_id, forum_name, forum_desc, forum_topics FROM phpbb_forums;
CREATE TABLE bradser_topics AS SELECT topic_id, forum_id, topic_title, topic_poster, topic_time, topic_views, topic_replies, topic_first_post_id, topic_last_post_id, topic_first_poster_name, topic_last_poster_name, topic_last_post_subject, topic_last_post_time FROM phpbb_topics;
CREATE TABLE bradser_attachments AS SELECT attach_id, post_msg_id, is_orphan, physical_filename, real_filename, download_count, attach_comment, extension, mimetype FROM phpbb_attachments;
CREATE TABLE bradser_posts AS SELECT post_id, topic_id, poster_id, post_time, enable_bbcode, enable_smilies, enable_magic_url, bbcode_bitfield, bbcode_uid, post_subject, post_text, post_attachment FROM phpbb_posts;
And run this script the following way: (screenshot beginning, beginning2, end)

Code: Select all

mysql -v -u root -p phpbb < prep-bradser.sql
Needless to say, it will take a good while.



Finally, a cleaned dump called bradser-dump.sql is created with mysqldump: (screenshot whole)

Code: Select all

mysqldump --opt phpbb bradser_users bradser_forums bradser_topics bradser_posts bradser_attachments > bradser-dump.sql
I'm not sure what zompist meant in the first post when he said that the unzipped dump is 350 MiB but the "database itself" is 850 MiB, but I estimate the final bradser-dump.sql cleaned datafile might be around 50% of the size of whichever figure is the relevant case (so maybe around 150-200 MiB or 400-500 MiB). (When I tried all this on bradrn's sample, I got 4% of the dump SQL script, but bradrn's tiny sample is unrepresentative...)
bradrn
Posts: 6324
Joined: Fri Oct 19, 2018 1:25 am

Re: Posts from the old board

Post by bradrn »

Ser wrote: Mon Mar 16, 2020 12:48 am Actually, this procedure should be pretty platform-agnostic... bradrn, do I get your approval for the following? (I've also included screenshots of me doing the whole of the following on your small sample phpbb database...)
Thanks for writing this up Ser! I’m not sure I know enough about MySQL to review this thoroughly, but I don’t see anything obviously bad (although I probably wouldn’t know what’s ‘obviously bad’ anyway).
Conlangs: Scratchpad | Texts | antilanguage
Software: See http://bradrn.com/projects.html
Other: Ergativity for Novices

(Why does phpBB not let me add >5 links here?)
Post Reply