I just updated the Dev Sneak Peak from version 1.2 to 1.2.5. While not a significant jump in version number, the one major feature, Permalinks, was very complicated to implement. A Permalink, is a link that takes you back to a specific piece of content on a web site… for instance on a blogging web site a Permalink would take you to a specific blog post. So a Permalink to the Comic Book UPC db maintenance tool would take you to a specific series and issue.
This gets a little technical so you might want to skip the rest of this post and just try out the functionality (see next paragraph below). I also have included a follow up blog post about the how to calculate the record offset and the SQL used to implement it.
For example to go to issue 100 of the Fantastic Four (I created this Permalink after having searched the series by a publisher of Marvel and a series title starting with Fanta, and having filtered the list of issues so that only regular [i.e., not annuals or trade paper back book, etc.] are displayed, and having sorted the issues by UPC code in descending order) click on the following link… Yes I know its long, but all the ‘gook’ after the appl.php part of the URL is the application’s state at the time the Permalink was created (i.e., the data to get you back to where you were when you created the Permalink).
While simple in concept, the complication in implementing Permalinks arises because the Comic Book UPC db web site is an AJAX based web application and that it uses paged record sets, that and the fact that the data is/will be constantly changing. In implementing the Permalink functionality I wanted to be able to restore the applications state (what publisher, series title, issue filter, etc. that the user had used) to exactly what the user had when they generated the Permalink.
But the real difficulty arose when trying to calculate the record offsets for the series and issue the user had selected. FYI, the record offset is used by the applications paging logic to determine which page of the data to display. For example, if the user had selected the 10th issue on 3rd page and there are 25 records on a page, the record offset for the selected issue would be 25 x 3 + 10 – 1 = 84 (offset values start at 0, hence the minus 1).
Knowing the record offset then allows us to calculate the starting offset for the page containing the selected issue. To do with we would first calculate the offset of the selected issue from the start of the page, this is simply the offset modulo the number of records per page (the modulo operator is the remainder of X divided by Y, e.g., 43 divided by 20 is 2 reminder 3, so 43 modulo 20 is 3). So in our example that would be 84 modulo 25 is 9. The starting offset for the page is then the offset minus the offset of the selected from the start of the page, in this case 84 -9 = 75 (this makes sense as we were on the 3rd page and there are 25 records per page).
The application then uses the page starting offset of 75, to display the 3rd page of data and the offset of the selected issue from the start of the page of 9 to select the 10th (counting starting from 0) issue on the 3rd page. Thus selecting the same record the user had selected when they created the Permalink.
So you say, whats so difficult about that, just have the application store the starting offset for the page and the record offset from the beginning of the page in the Permalink, as it surely knows this information at the time the user creates the Permalink. Ah, but the data in the application is constantly changing, new records are continually being added and deleted in the database. So the number of issues that come before the selected issue could change from the time the user creates the Permalink and when they return to the web site by clicking the Permalink. That and the fact that depending on how the user has the list of issues sorted also changes where it appears in the list of issues. For example if the user has issue 10 of the Fantastic Four selected and there are 200 issues for this series (assuming consecutive issues) if he has sorted by issue number in ascending sequence, then he has the 10th record (offset of 9) selected on the first page (assuming 25 records per page). But if he has sorted by issue number in descending order and has issue number 10 selected, its offset is 189, its offset from the start of the page is 14 and the page starting offset is 175!
This is probably making you head spin, as it did mine when I first tried to implement Permalinks. The simple solution would have been when the user clicks the Permalink, to issue an SQL select statement using the sort order that the user had specified (previously when they created the Permalink) and retrieve all the records less than the record previously selected. The process would then be to iterate through the records returned, incrementing a counter for each record.
But that would mean if the user had selected the 173,301 record in the list, that we’d have to loop through 173,300 records and count them!! To see the technical solution and SQL continue reading this blog post.
Part 2
See Part 1 of this article How to determine the Record Offset given a primary key and a defined order by clause – Part 1
To summarize where we left off in part 1, our user sorted our table by age descending and selected 16th record on the 5th page (25 records per page). The record has on offset of 140. The user book marks the record. 150 records are added (100 of them with an age less than 35) and 50 records are deleted (30 of them with an age less than 35). For a net change of 70 records with an age less than 35, resulting in a offset of 210.
So the question is how do we NOW determine the offset of the record that was book marked ONE week ago after the table has been updated?
One way we could do this is to execute the query again with the same sort and then loop through the records, counting them until we find the record with the primary key id the user book marked. Here’s our user’s book mark URL
http://www.example.com/pagedgui.php?id=48&sort1=age&dir1=desc
$pk_id = $_GET['id']
$sort = $_GET['sort1'] . ' ' . $_GET['dir1']
$sql = 'select * from students order by ' . $sort
mysql_execute_query($sql);
$offset = 0
for each record
if the primary key id of the current record == $pk_id then
exit for
end if
$offset++
end for each
When the loop exits $offset will have the offset of the record we’re looking for. Its kind of inefficient, but its simple and straight forward to implement. But what if you have a table with 300K plus records? And what if you have lots of users all accessing the web site at the same time? This probably isn’t the best solution.
Wait a minute! Doesn’t SQL have a count(*) function, can’t we use that? Yes and No. Yes we could use it, but not without doing some extra work as the only thing we have is the primary key id of the book marked record. So this would require us to lookup the value of the age of the record tat was book marked.
$pk_id = $_GET['id']
$sort = $_GET['sort1'] . ' ' . $_GET['dir1']
$sql = 'select age from students where id = ' . $pk_id
mysql_execute_query($sql);
// returns a value of 35, see the previous post
$age = age of the record returned
// count all records with an age greater than 35 (descending sort here)
$sql = 'select count(*) as rec_count from students where age < ' . $age .
' order by ' . $sort
mysql_execute_query($sql);
$offset = rec_count of the record return
This certainly is an improvement in efficiency over the previous solution. But what if there are several records in the table with the age of 35, this will exclude more than just the record with the primary key id of 48. But what if we make it >= (greater than or equal)? That wont work either for it may include some records that shouldn’t be included… in fact, we need a secondary sort, perhaps name.
select count(*) as rec_count from students where age >= $age and name >= $name order by age desc, name desc
Ok, but what if there are two people with the same name and the same age? We have to add the primary key to the sort as well. In fact to make the above query work the primary key always has to be included in the sort to ensure that there is a guaranteed unique order.
select count(*) as rec_count from students where age >= $age and name >= $name and id > $pk_id order by age desc, name desc, id desc
That works (note: no two records can have the same primary key id, as it must be unique), but what if some one enters a record into the table that doesn’t have a age? NULL values are a fact that we have to deal with. So how about…
select count(*) as rec_count from students where (age is not null and $age is not null and age >= $age) or (age is not null and $age is null) order by age desc
Since NULL values sort after non Null values in a descending sort (at least in MySQL and before non NULL values in an ascending sort) if the age of the record being compared to is not null and the age of the record for the primary key pass in is NULL, then include the record in the count. We’ve also had to add not null clauses to the >= comparison too. And I’ve also simplified this first pass at the query so you can see whats going on. If we were to also add in similar logic for the name field and add back in the final check for the primary key id, things really start to get messy quick.
select count(*) as rec_count from students where
(age is not null and $age is not null and age > $age) or
(age is not null and $age is not null and age = $age and name is not null
and $name is not null and name > $name) or
(age is not null and $age is not null and age = $age and name is not null
and $name is not null and name = $name and id > $pk_id) or
(age is not null and $age is null) or
(age is null and $age is null and name is not null and $name is not null
and name > $name) or
(age is null and $age is null and name is not null and $name is not null
and name = $name and id > $pk_id) or
(age is null and $age is null and name is not null and $name is null) or
(age is null and $age is null and name is null and $name is null
and id > $pk_id)
order by age desc, name desc, id desc
YUCK! We can simplify this a little as follows…
select count(*) as rec_count from students where (age is not null and ($age is not null and age > $age or (name is not null and $name is not null and ((name > $name) or (name = $name and id > $pk_id))))) or ($age is null and (age is not null or (age is null and (($name is not null and name is not null and ((name > $name) or (name = $name and id > $pk_id))) or ($name is null and ((name is not null) or (name is null and id > $pk_id))))))) order by age desc, name desc, id desc
But its no great improvement as it is on the complex side and this is only for 2 sort fields and the primary key!! Can you imagine what the SQL statement would look like for 7 or 8 sort fields? And what are the chances you could write the SQL statement without making any errors? There must be a better way…
See my next post, How to determine the Record Offset given a primary key and a defined order by clause – Part 3, for the answer.
Part 1
For a little background on my issue and why I developed this solution see my previous post on Permalinks.
The problem:
I have a SQL SELECT statement with ORDER BY, OFFSET and LIMIT clauses. I take the result of the query and use it to present a paged record set via a GUI. After the users pages through the records (i.e., by clicking the GUI’s next page button to get the next set of records in the record set) and gets to the page that contains the record they are interested in, they can then select that record and “book mark” it. What I mean by “book mark” is they click a button and a static URL is generated so that they can later return to the URL and be right back at the same record. Additionally, the user can sort the records any way they want to, i.e., by filed A ascending or by field A descending and field B ascending or by field C ascending, field D descending, field A ascending and field B Descending, etc. Various filters can be applied to the record set in WHERE clause of the query as well.
The Solution:
When the user returns to the static URL presented above, the program needs to determine the offset of the previously selected record in the record set (as identified by its primary key ID in the static URL) when the same sort defined by the user (also in the static URL) is applied to the query. This will then allow us to determine 1) the offset of the first record on the same page as the selected record and 2) to determine the offset relative to the first record on the page. This needs to be done at the time the user returns to the static URL as additional records may be added to the record set or records may be deleted from the record set, i.e., there is no way to determine the offset of the selected record ahead of time because the number of records in the data set is dynamic and constantly changing. An example may help clarify things a bit.
Lets say on day 1 there are 300 records in our table and we display 25 records a page for a total of 12 pages. The offset of the first record on the first page would be 0 and the offset of the last record on the first page would be 24. The next page the first record would have an offset of 25, the last record, 49 and so on. Additionally besides the primary key, ID, our table has a name field and a age field.
Our user sorts the record set by the age field in descending order, navigates to the 5th page, selects the 16th record from the top and “book marks” it. The offset of the first record on the page is 25 x 5 – 1 = 124, the offset of the selected record is 25 X 5 + 16 – 1 = 140 and its offset from the top of the page is 140 modulo 25 = 15. This record has a primary key ID of 48, a name of Fred and an age of 35. The book marked static URL would contain the primary key id, the field(s) the user sorted the record set by and the direction of the sort (ascending or descending). The URL may look something like this:
http://www.example.com/pagedgui.php?id=48&sort1=age&dir1=desc
In the week that follows 150 records are added to the table and 50 records are deleted, for a total of 400 records (or 16 pages). 100 records with an age less than 35 were added, 30 records with an age less than 35 were deleted, with the remainder of the records added and deleted having an age greater than 35. The net results that when sorted by age, the record with a primary key id of 48 will now have an offset of 210 (140 + 100 – 30). It will now have an offset from the top of the page of 10 (the 11th record on the page… 210 modulo 25) and the first record on the page will have an offset of 200 (210 – 10 or 210 divided by 25 is 8 with a remainder of 10, then 8 x 25 = 200).
The following week our user returns by clicking on the URL. The real question is how do we NOW determine the offset of the record that he book marked ONE week ago?
For one possible solution see this post for Determine selected record offset for any ORDER BY sequence.
Or continue reading with How to determine the Record Offset given a primary key and a defined order by clause – Part 2
F0r those of you who are interested… here is the database layout. Its kind of hard to read without the table boarders, but I couldn’t convince WordPress to let me override the style on the table. I’ll post it in separate file in another day or so.
| INDB_Source (unique on name; unique on base_url) | |||||
| indb_src_id | int | non null, PK | |||
| name | text | non null | name of the internet database source | ||
| base_url | text | non null | base url to used when creating absolute urls | ||
| Publisher (unique on name) | |||||
| pub_id | int | non null, PK | |||
| parent_pub_id | int | populated for imprints and brands only, null for publishers | |||
| type | text | 1 | non null | record type (default of P)
P = Publisher, B = Brand, I = Imprint |
|
| name | text | publisher name | |||
| desc | text | if not a publisher a description of the brand or imprint | |||
| compay_prefix | text | the first 6 to 9 digits of the UPC code identifying the publisher | |||
Ok, I’d like some input on what data to track for publishers, series and issues… I think to meet the end goal which is to offer a web service / REST service to comic book software, that we’ll need to have fields commonly carried by these programs.
If you could, list the name of the program, the fields it carries and which ones you find useful and what ones you would like to see.
Thanks.

Follow me on Twitter
Recent Comments