Actually, this procedure should be pretty platform-agnostic... bradrn, do I get your approval for the following? (I've also included screenshots of me doing the whole of the following on your small sample phpbb database...)
Step 1: Recreating the database
I think we can assume the database isn't recreated in zompist's computer through MySQL (or MariaDB) yet. As we saw in the first few lines zompist posted of his dumpfile, there are not any calls at the beginning to set up the database, but it starts right away with the creation of tables and the insertion of data. This means we're going to begin by setting up the database to be used ("phpbb").
I think we can slightly adapt the setup of bradrn's sample phpBB database for our purposes. The only change would be the unicode format, from bradrn's "utf8mb4" to "utf8", and from "utf8mb4_0900_ai_ci" to "utf8_general_ci". This is because "utf8mb4" is a new thing in MySQL 5.5 (from Dec. 2010), and in zompist's first lines we can see that his huge dumpfile expects "utf8":
CHARSET=utf8 COLLATE=utf8_bin. So I'm using the old defaults.
So... a file is made, called
prep-phpbb.sql, with these contents:
Code: Select all
/* prep-phpbb.sql */
SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET AUTOCOMMIT = 0;
SET time_zone = "+00:00";
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
CREATE DATABASE IF NOT EXISTS `phpbb` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
The script is run to prepare the database
phpbb.
I highly recommend running this command and the other ones below on Powershell if you're on Windows, to avoid destroying Unicode data due to cmd.exe's pipe shenanigans involving text. (
screenshot beginning,
screenshot end)
Code: Select all
mysql -v -u root -p < prep-phpbb.sql
The huge dumpfile is then run to recreate the database in whatever computer zompist is using: (
screenshot beginning,
middle,
end)
Code: Select all
mysql -v -u root -p phpbb < the-big-dump-script.sql
This program call will take a long while, in which zompist will be twiddling his thumbs for a good hour or two watching the walls of SQL script get run line by line. Note that the
-v option is the "verbose" option, which shows the SQL as it is run. It may be omitted if you want to see nothing (unless for some reason you hit an error):
mysql -u root -p phpbb < the-big-dump-script.sql
Step 2: Making the stripped database
We run the following script to create the reduced tables with all personal data stripped off. The script begins by dropping the entirety of the laughable phpBB search data to make disk space for the reduced tables containing the publishable data, and then makes new tables copying what we want. The data inside posts is obviously huge, so we could expect the
bradser_posts to expand the database size on disk by perhaps 25%-40% (but then the dump will
only include these reduced tables).
Make a file called
prep-bradser.sql with the following:
Code: Select all
/* prep-bradser.sql */
/* make space by dropping all the search data */
DROP TABLE phpbb_search_results, phpbb_search_wordlist, phpbb_search_wordmatch;
/* create the tables with personal data stripped off */
CREATE TABLE bradser_users AS SELECT user_id, username FROM phpbb_users WHERE user_type <> 2;
CREATE TABLE bradser_forums AS SELECT forum_id, forum_name, forum_desc, forum_topics FROM phpbb_forums;
CREATE TABLE bradser_topics AS SELECT topic_id, forum_id, topic_title, topic_poster, topic_time, topic_views, topic_replies, topic_first_post_id, topic_last_post_id, topic_first_poster_name, topic_last_poster_name, topic_last_post_subject, topic_last_post_time FROM phpbb_topics;
CREATE TABLE bradser_attachments AS SELECT attach_id, post_msg_id, is_orphan, physical_filename, real_filename, download_count, attach_comment, extension, mimetype FROM phpbb_attachments;
CREATE TABLE bradser_posts AS SELECT post_id, topic_id, poster_id, post_time, enable_bbcode, enable_smilies, enable_magic_url, bbcode_bitfield, bbcode_uid, post_subject, post_text, post_attachment FROM phpbb_posts;
And run this script the following way: (
screenshot beginning,
beginning2,
end)
Code: Select all
mysql -v -u root -p phpbb < prep-bradser.sql
Needless to say, it will take a good while.
Finally, a cleaned dump called
bradser-dump.sql is created with mysqldump: (
screenshot whole)
Code: Select all
mysqldump --opt phpbb bradser_users bradser_forums bradser_topics bradser_posts bradser_attachments > bradser-dump.sql
I'm not sure what zompist meant in the first post when he said that the unzipped dump is 350 MiB but the "database itself" is 850 MiB, but I estimate the final bradser-dump.sql cleaned datafile might be around 50% of the size of whichever figure is the relevant case (so maybe around 150-200 MiB or 400-500 MiB). (When I tried all this on bradrn's sample,
I got 4% of the dump SQL script, but bradrn's tiny sample is unrepresentative...)