Just wanted to post something on the blog here about the lack of recent activity… We’ve recently have brought home a new puppy. Not only is he quite fun, but he’s also a handful!
As such I haven’t had a whole lot of time at night to work on anything for the web site. Not only no time, but since the puppy is getting me up at least once during the night and up at 7:00 AM every day (no sleeping in on Saturdays and Sundays) I’m really tired all the time.
Here’s the little dickens…
Part 3
See Parts 1 & 2 of this article How to determine the Record Offset given a primary key and a defined order by clause – Part 1 and How to determine the Record Offset given a primary key and a defined order by clause – Part 2
Work in progress… check back soon.
I just updated the Dev Sneak Peak from version 1.2 to 1.2.5. While not a significant jump in version number, the one major feature, Permalinks, was very complicated to implement. A Permalink, is a link that takes you back to a specific piece of content on a web site… for instance on a blogging web site a Permalink would take you to a specific blog post. So a Permalink to the Comic Book UPC db maintenance tool would take you to a specific series and issue.
This gets a little technical so you might want to skip the rest of this post and just try out the functionality (see next paragraph below). I also have included a follow up blog post about the how to calculate the record offset and the SQL used to implement it.
For example to go to issue 100 of the Fantastic Four (I created this Permalink after having searched the series by a publisher of Marvel and a series title starting with Fanta, and having filtered the list of issues so that only regular [i.e., not annuals or trade paper back book, etc.] are displayed, and having sorted the issues by UPC code in descending order) click on the following link… Yes I know its long, but all the ‘gook’ after the appl.php part of the URL is the application’s state at the time the Permalink was created (i.e., the data to get you back to where you were when you created the Permalink).
While simple in concept, the complication in implementing Permalinks arises because the Comic Book UPC db web site is an AJAX based web application and that it uses paged record sets, that and the fact that the data is/will be constantly changing. In implementing the Permalink functionality I wanted to be able to restore the applications state (what publisher, series title, issue filter, etc. that the user had used) to exactly what the user had when they generated the Permalink.
But the real difficulty arose when trying to calculate the record offsets for the series and issue the user had selected. FYI, the record offset is used by the applications paging logic to determine which page of the data to display. For example, if the user had selected the 10th issue on 3rd page and there are 25 records on a page, the record offset for the selected issue would be 25 x 3 + 10 – 1 = 84 (offset values start at 0, hence the minus 1).
Knowing the record offset then allows us to calculate the starting offset for the page containing the selected issue. To do with we would first calculate the offset of the selected issue from the start of the page, this is simply the offset modulo the number of records per page (the modulo operator is the remainder of X divided by Y, e.g., 43 divided by 20 is 2 reminder 3, so 43 modulo 20 is 3). So in our example that would be 84 modulo 25 is 9. The starting offset for the page is then the offset minus the offset of the selected from the start of the page, in this case 84 -9 = 75 (this makes sense as we were on the 3rd page and there are 25 records per page).
The application then uses the page starting offset of 75, to display the 3rd page of data and the offset of the selected issue from the start of the page of 9 to select the 10th (counting starting from 0) issue on the 3rd page. Thus selecting the same record the user had selected when they created the Permalink.
So you say, whats so difficult about that, just have the application store the starting offset for the page and the record offset from the beginning of the page in the Permalink, as it surely knows this information at the time the user creates the Permalink. Ah, but the data in the application is constantly changing, new records are continually being added and deleted in the database. So the number of issues that come before the selected issue could change from the time the user creates the Permalink and when they return to the web site by clicking the Permalink. That and the fact that depending on how the user has the list of issues sorted also changes where it appears in the list of issues. For example if the user has issue 10 of the Fantastic Four selected and there are 200 issues for this series (assuming consecutive issues) if he has sorted by issue number in ascending sequence, then he has the 10th record (offset of 9) selected on the first page (assuming 25 records per page). But if he has sorted by issue number in descending order and has issue number 10 selected, its offset is 189, its offset from the start of the page is 14 and the page starting offset is 175!
This is probably making you head spin, as it did mine when I first tried to implement Permalinks. The simple solution would have been when the user clicks the Permalink, to issue an SQL select statement using the sort order that the user had specified (previously when they created the Permalink) and retrieve all the records less than the record previously selected. The process would then be to iterate through the records returned, incrementing a counter for each record.
But that would mean if the user had selected the 173,301 record in the list, that we’d have to loop through 173,300 records and count them!! To see the technical solution and SQL continue reading this blog post.
Part 2
See Part 1 of this article How to determine the Record Offset given a primary key and a defined order by clause – Part 1
To summarize where we left off in part 1, our user sorted our table by age descending and selected 16th record on the 5th page (25 records per page). The record has on offset of 140. The user book marks the record. 150 records are added (100 of them with an age less than 35) and 50 records are deleted (30 of them with an age less than 35). For a net change of 70 records with an age less than 35, resulting in a offset of 210.
So the question is how do we NOW determine the offset of the record that was book marked ONE week ago after the table has been updated?
One way we could do this is to execute the query again with the same sort and then loop through the records, counting them until we find the record with the primary key id the user book marked. Here’s our user’s book mark URL
http://www.example.com/pagedgui.php?id=48&sort1=age&dir1=desc
$pk_id = $_GET['id']
$sort = $_GET['sort1'] . ' ' . $_GET['dir1']
$sql = 'select * from students order by ' . $sort
mysql_execute_query($sql);
$offset = 0
for each record
if the primary key id of the current record == $pk_id then
exit for
end if
$offset++
end for each
When the loop exits $offset will have the offset of the record we’re looking for. Its kind of inefficient, but its simple and straight forward to implement. But what if you have a table with 300K plus records? And what if you have lots of users all accessing the web site at the same time? This probably isn’t the best solution.
Wait a minute! Doesn’t SQL have a count(*) function, can’t we use that? Yes and No. Yes we could use it, but not without doing some extra work as the only thing we have is the primary key id of the book marked record. So this would require us to lookup the value of the age of the record tat was book marked.
$pk_id = $_GET['id']
$sort = $_GET['sort1'] . ' ' . $_GET['dir1']
$sql = 'select age from students where id = ' . $pk_id
mysql_execute_query($sql);
// returns a value of 35, see the previous post
$age = age of the record returned
// count all records with an age greater than 35 (descending sort here)
$sql = 'select count(*) as rec_count from students where age < ' . $age .
' order by ' . $sort
mysql_execute_query($sql);
$offset = rec_count of the record return
This certainly is an improvement in efficiency over the previous solution. But what if there are several records in the table with the age of 35, this will exclude more than just the record with the primary key id of 48. But what if we make it >= (greater than or equal)? That wont work either for it may include some records that shouldn’t be included… in fact, we need a secondary sort, perhaps name.
select count(*) as rec_count from students where age >= $age and name >= $name order by age desc, name desc
Ok, but what if there are two people with the same name and the same age? We have to add the primary key to the sort as well. In fact to make the above query work the primary key always has to be included in the sort to ensure that there is a guaranteed unique order.
select count(*) as rec_count from students where age >= $age and name >= $name and id > $pk_id order by age desc, name desc, id desc
That works (note: no two records can have the same primary key id, as it must be unique), but what if some one enters a record into the table that doesn’t have a age? NULL values are a fact that we have to deal with. So how about…
select count(*) as rec_count from students where (age is not null and $age is not null and age >= $age) or (age is not null and $age is null) order by age desc
Since NULL values sort after non Null values in a descending sort (at least in MySQL and before non NULL values in an ascending sort) if the age of the record being compared to is not null and the age of the record for the primary key pass in is NULL, then include the record in the count. We’ve also had to add not null clauses to the >= comparison too. And I’ve also simplified this first pass at the query so you can see whats going on. If we were to also add in similar logic for the name field and add back in the final check for the primary key id, things really start to get messy quick.
select count(*) as rec_count from students where
(age is not null and $age is not null and age > $age) or
(age is not null and $age is not null and age = $age and name is not null
and $name is not null and name > $name) or
(age is not null and $age is not null and age = $age and name is not null
and $name is not null and name = $name and id > $pk_id) or
(age is not null and $age is null) or
(age is null and $age is null and name is not null and $name is not null
and name > $name) or
(age is null and $age is null and name is not null and $name is not null
and name = $name and id > $pk_id) or
(age is null and $age is null and name is not null and $name is null) or
(age is null and $age is null and name is null and $name is null
and id > $pk_id)
order by age desc, name desc, id desc
YUCK! We can simplify this a little as follows…
select count(*) as rec_count from students where (age is not null and ($age is not null and age > $age or (name is not null and $name is not null and ((name > $name) or (name = $name and id > $pk_id))))) or ($age is null and (age is not null or (age is null and (($name is not null and name is not null and ((name > $name) or (name = $name and id > $pk_id))) or ($name is null and ((name is not null) or (name is null and id > $pk_id))))))) order by age desc, name desc, id desc
But its no great improvement as it is on the complex side and this is only for 2 sort fields and the primary key!! Can you imagine what the SQL statement would look like for 7 or 8 sort fields? And what are the chances you could write the SQL statement without making any errors? There must be a better way…
See my next post, How to determine the Record Offset given a primary key and a defined order by clause – Part 3, for the answer.
Part 1
For a little background on my issue and why I developed this solution see my previous post on Permalinks.
The problem:
I have a SQL SELECT statement with ORDER BY, OFFSET and LIMIT clauses. I take the result of the query and use it to present a paged record set via a GUI. After the users pages through the records (i.e., by clicking the GUI’s next page button to get the next set of records in the record set) and gets to the page that contains the record they are interested in, they can then select that record and “book mark” it. What I mean by “book mark” is they click a button and a static URL is generated so that they can later return to the URL and be right back at the same record. Additionally, the user can sort the records any way they want to, i.e., by filed A ascending or by field A descending and field B ascending or by field C ascending, field D descending, field A ascending and field B Descending, etc. Various filters can be applied to the record set in WHERE clause of the query as well.
The Solution:
When the user returns to the static URL presented above, the program needs to determine the offset of the previously selected record in the record set (as identified by its primary key ID in the static URL) when the same sort defined by the user (also in the static URL) is applied to the query. This will then allow us to determine 1) the offset of the first record on the same page as the selected record and 2) to determine the offset relative to the first record on the page. This needs to be done at the time the user returns to the static URL as additional records may be added to the record set or records may be deleted from the record set, i.e., there is no way to determine the offset of the selected record ahead of time because the number of records in the data set is dynamic and constantly changing. An example may help clarify things a bit.
Lets say on day 1 there are 300 records in our table and we display 25 records a page for a total of 12 pages. The offset of the first record on the first page would be 0 and the offset of the last record on the first page would be 24. The next page the first record would have an offset of 25, the last record, 49 and so on. Additionally besides the primary key, ID, our table has a name field and a age field.
Our user sorts the record set by the age field in descending order, navigates to the 5th page, selects the 16th record from the top and “book marks” it. The offset of the first record on the page is 25 x 5 – 1 = 124, the offset of the selected record is 25 X 5 + 16 – 1 = 140 and its offset from the top of the page is 140 modulo 25 = 15. This record has a primary key ID of 48, a name of Fred and an age of 35. The book marked static URL would contain the primary key id, the field(s) the user sorted the record set by and the direction of the sort (ascending or descending). The URL may look something like this:
http://www.example.com/pagedgui.php?id=48&sort1=age&dir1=desc
In the week that follows 150 records are added to the table and 50 records are deleted, for a total of 400 records (or 16 pages). 100 records with an age less than 35 were added, 30 records with an age less than 35 were deleted, with the remainder of the records added and deleted having an age greater than 35. The net results that when sorted by age, the record with a primary key id of 48 will now have an offset of 210 (140 + 100 – 30). It will now have an offset from the top of the page of 10 (the 11th record on the page… 210 modulo 25) and the first record on the page will have an offset of 200 (210 – 10 or 210 divided by 25 is 8 with a remainder of 10, then 8 x 25 = 200).
The following week our user returns by clicking on the URL. The real question is how do we NOW determine the offset of the record that he book marked ONE week ago?
For one possible solution see this post for Determine selected record offset for any ORDER BY sequence.
Or continue reading with How to determine the Record Offset given a primary key and a defined order by clause – Part 2
- Went from a partial screen panel to a full screen view port to maximize the amount of real estate available for the application
- added additional issue detail panel on the bottom to view the details on the selected issue. This brings the number of panels up to 4
- the series panel on the left – this is where your series search results are displayed. Select one to see available issues.
- the series details panel on the upper right – this is where additional information about the series is displayed, such as notes about the series any information on continuity, etc.
- the issues panel on the center right – this is where the available issues for a given series are display. Select one to see additional info on it in the issue detail panel or to edit it.
- the issue details panel on the lower right – this is where additional information is displayed for the selected issue, such as notes on the issue, description on the variation, continuity information, cover thumbnail, etc.
- made all panels except for the issues panel collapsible. this means you can easily focus on the panel you’re dealing with and maximize its space
- Added the capability to filter issues by any one field, for example, only show the Annual issue for a given series. Or show all issues that have variants for a given series.
- Enhancements to the issues panel.
- Added new notes column that show any notes for the selected issue as well as any description for the particular variant.
- Added an Expand Notes button so that Notes column can be expanded or collapsed vertically to see all of the issue notes and/or variant description
- Added tool tip pop ups for the notes column. You can now hover over the notes field for a given issue and see it’s full value without having to select it or expand the notes field for all issues.
- Cleaned up the display of the issues grid by displaying most common issue type (Regular), variant (No Variant) and printing (1st printing) as blanks. This looks much cleaner as there is more white space.
- Added tool tips for Type, Variant and Printing Columns. Just hover over the column and the value for the Type Code or Variant Code will display in a tool tip for the issue. For printing a tool tip will only appear when the Printing field is blank in which case the tool tip will be ‘1st printing’. For example, hover over the Type Code of ‘Anl’ and a tool tip will be ‘Annual’.
- performed some general code factoring to make the code more maintainable.
Just finished pushing out an updated version of the Developer Sneak Peek … check it out – I think this part of the site is close to being done.
I’ll blog about the improvements later as I’m tired and want to go to sleep!
Ok over the last week here I’ve put in some more development effort and have updated the Dev Sneak Peek to version 1.1!
I’ve completed all of the tables in the database and have loaded the rest of the test data for the issues. Granted this is just test data and I cannot go live with it, but here are the stats…
- publishers – 100
- series (aka titles) – 20,401
- issues – 271,797
New functionality includes when you select a Title from the Series List on the left, the Series info is displayed on the top right and the list of issues for the selected Title is displayed on the lower right.
So what you ask? Good question… This screen represents what the main maintenance access screen will look like. You will come to this screen when you want to add a new issue or edit an existing one. I’ve still got to do some fine tuning of the Issue list, like formatting the columns, for example displaying the cover price as $2.95… There’s some other tweeking that I’d like to do to.
So try it out and leave a comment to this post to tell me what you think!
Update 2: Figured out the issue with IE 6, the dev sneak peek now works with IE. Apparently IE is slightly more picky when it comes to JavaScript Objects ending with a trailing comma than the other web browsers out there… go figure!! Also spruced up the dev sneak peak and added a couple of things here and there.
Update 1: Note, the dev sneak peak currently does not work in IE 6, not sure about IE 7 or 8 – some one try it and let me know. Works in FireFox, Chrome and Safari for Windows
I’ve finally gotten something that looks half way decent and works! Check out the new menu item for the ‘Dev Sneek Peek’ on the left on the main site or just click here: Comic Book UPC Database Dev Sneak Peek. Once the application launches, click on the help button for instructions. The Sneak Peek allows you to select the publisher, enter a partial title and perform a serach against the ‘demo’ data that I have loaded.
Please let me know what you think by sending feedback to the development team
Well I’ve spent the last 2 days creating some of the various database tables and importing some data for publishers and series (aka titles) from an external source. This proved trickier than expected due to the source of the data containing non-ASCII characters such as curly quotes, accented characters, ellipses, etc. (see my previous post on my Tumblr).
Ended up dropping the whole database after exporting the table definitions and some of the reference type data that I had entered via phpMyAdmin. Then I recreated the database using UTF8 as the character set. Still had some minor tool issues with the mysql command line program – using this to get the data into the MySQL database. What ended up fixing it was using the –default-character-set option to mysql.
mysql --default-character-set utf8 -u USERID -p cb_upc_db < data.sql




Follow me on Twitter
Recent Comments