New around here? When you sign up for a new account, please indicate why you want to join on the registration form so we can determine which applications are genuine and which are from spammers.
You must be 16 or over to participate in the Brickset forum. Please read the announcements and rules before you join.

Advanced query builder

HuwHuw
edited April 2012 in Brickset.com
I've mentioned that I was working on an 'advanced' means of querying the database in several threads. I now have something to look at and test, although it doesn't actually do any searching yet.

The user interface for creating ad-hoc queries is just about done and I'd like your feedback.

http://www.brickset.com/search/advanced/

You can create queries and see the resultant SQL that will, eventually, be executed (using a r/o database account, I hasten to add, to prevent SQL injection vulnerabilities)

You can specify queries to do things like

"show me all polybag sets released since 2007 that I don't own, order by year released, then number".
"Show me all sets with between 5 and 10 minifigs, order by number of minifigs, descending"
"Show me all sets in the themes Aquazone and Aquaraiders that cost less than $10, and order by year released"
"Show me all sets with an X in the name that were available from shop.LEGO.com in 2008" (useful, eh!)

that sort of thing, which can't of course be done at the moment.

Not every query will be possible because, to keep things simple, you can't specify how the terms should be bracketed. It assumes that you want OR terms bracketed together which in 95% of cases I believe you will do.

So, what you're testing is the UI for specifying the query and whether the SQL generated is what you're expecting. When that's done, I'll move on to saving and executing the query, then editing and sharing them with others.

There's a couple of things still to do, including providing guidance as to what type of data is valid in the text box (date, number, anything) and date validation.

LMKWYT...


Comments

  • Looks good. A couple of suggestions:
    1. Remove the second sort from default visibility and add "+" to add multiple sort levels. It may be confusing to see "select..." in the second row when its not required that you actually select anything.
    2. The "remove" criteria is missing from the last entry line.
    3. If you can't code the SQL for case insensitivity, you may want to add a disclaimer to users that they woiuld need an OR to check case variations (e.g. "Lego" vs "lego").

    Looking forward to seeing it run!
  • It looks really nice. I didnt mind the second row but I could see where it can be confusing. Although Its nice to be able to sort the results once you got them.
  • HuwHuw
    edited April 2012
    ^^

    1. Good idea
    2. It's supposed to be, otherwise you could remove everything and not be able to add anything more.
    3. The SQL Server is case-insensitive anyway, but it might be worth adding a note to that effect in the intro text.
  • ^ on #2 couldn't you have 'remove' on the last line as long as there is more than 1 entry? I noticed the same that I went to add another OR condition and decided I didn't want it and "removing" is more intuitive than setting the first box back to "select" for it to work since it complained that I hadn't selected options for the 2nd column.
  • I second @graphite. I understand the technical issue, but it's not user intuitive. Maybe suppress the "remove" if only one criteria line exists, and show it on all if multiple exist.
  • This is fantastic, the number of queries available is staggering, and it is working as I'd expect. What does "Retail - limited" mean?

    Despite the wonderful number of queries available, I'd like to ask about another :o) Is it possible to add in the existing Amazon Watch % discount queries?
  • I'll see what can be done re: the remove, and ^, that's a great idea. That data isn't actually in the same dataset as the main set list but I'm sure something will be possible because it would be useful, I agree.
  • HuwHuw
    edited April 2012
    I think this is just about done now. There's a bit of aesthetic tweaking needed but functionally I think it's there now, bar maybe a few minor issues, but I'd like you lot to confirm :-)

    -You can now delete all entries, except the first
    -Multiple sort level suggestion above implemented
    -'help text' appears to tell you what sort of input is expected
    -strings, numbers and dates should now be validated.

    Have another play and LMKWYT...

    http://www.brickset.com/search/advanced/

    Thanks




  • Looking great, very excited about playing with the real data!
  • Looks great, but I think the behavior of the and/or parentheses changed. I didn't capture the output from my prior test, but look at the sample below:

    WHERE (SetName LIKE '%bat%' AND YearFrom < 2010 OR Pieces >= 1000) ORDER BY YearFrom ASC

    Shouldn't it be:
    WHERE (SetName LIKE '%bat%' AND (YearFrom < 2010 OR Pieces >= 1000) ORDER BY YearFrom ASC
  • Good catch -- yes, you're right. I recoded that section and didn't re-implement that bit. Thanks.
  • This is great!
  • Looks great, I'm sure this will be crazy helpful at times. ETA?
  • 2 weeks or so?
  • I've made some progress. You still can't actually execute the queries, but you can now name, save, edit and delete them.

    http://www.brickset.com/search/advanced/

    LMKWYT...

    I'll work on the execution bit when back from AFOLCON next week.
  • Nice work, looking really good, excited about this feature.
  • I've made more progress on this.

    I think it's pretty much complete:

    - You can create, name and save queries
    - Queries are now actually executed and results are shown in the normal way.

    Things still to do

    - The run button on the query builder page doesn't do anything yet, you need to save the query then return to your query list to execute it. As a result, you must be logged in to run queries, and you can't test it as you build it easily.

    - A means to mark your queries as sharable and a means for admins to highlight universally useful ones, the list of which will ultimately replace the 'data mining' page.

    Anyway, have a play and LMK if it works as you expect it to.
  • Looking great Huw. Ooh the power.

    When entering UK price per piece, is this in pence or pounds e.g. 0.07 or 7? I'm getting weird results and can't figure out if it's me or the scripts :O)
  • More details on the scenario...

    SELECT * FROM SETS
    WHERE (UKPricePerPiece < 10) AND (YearFrom > 2010)
    ORDER BY UKPricePerPiece DESC

    returns 174 matching records, but nothing displayed apart from the pagination.

    SELECT * FROM SETS
    WHERE (UKPricePerPiece < 0.10) AND (YearFrom > 2010)
    ORDER BY UKPricePerPiece DESC

    returns 8 sets that are free.
  • Huw, seems to be working great!
  • ^^ I know why that is: prices are saved a strings for one reason or another, which hasn't been a problem until now because they were not compared or sorted.

    I should be able to sort it though...
  • HuwHuw
    edited May 2012
    Hmmm... that's not the cause. When the query is exected a temporary SQL table is created and when it then inserts the matching records into it, it gives a 'String or binary data would be truncated.' error which I think means fields in the temporary table are not large enough, which is a bit of a mystery given it was created using a INSERT INTO #tempTable SELECT * FROM Sets WHERE .... I'm guessing it's something to do with which records are used to create the table, given it works 90% of the time.

    I'll continue to investigate...
  • Right, just learned that I can ignore that warning and have the rogue data truncated instead. Not ideal, but for now it's solved it. This works

    http://www.brickset.com/search/advanced/run/?q=18

    which is

    SELECT * FROM Sets
    WHERE (UKPricePerPiece < 10) AND (UKPricePerPiece > 0) AND (YearFrom >= 2010)
    ORDER BY UKPricePerPiece ASC

  • Yup working for me thanks Huw! One thing - can it be worded that you are entering prices in pence/cents? Reads as though it should be in £/$ at the moment.
  • OK, yes, good idea...
  • Looks good Huw.

    One idea though, when doing 'equals' for string fields, maybe do a like without the % characters - this will force a case insensitive compare (= may do this, but it's implementation dependent IIRC).
    eg,
    SELECT * FROM Sets
    WHERE (SetName LIKE 'test')

    If so, then <> would also need changing to NOT LIKE.

    Looks like you're also assuming that OR and AND are always parsed as (A OR B) AND C - ie, it doesn't appear possible to do A AND (B OR C). Not sure how to offer this functionality to the user though and it might be confusing to new users. Maybe have the option to edit the generated SQL (maybe just the WHERE clause portion to prevent people trying to query things they shouldn't) by hand?

  • - The run button on the query builder page doesn't do anything yet, you need to save the query then return to your query list to execute it. As a result, you must be logged in to run queries, and you can't test it as you build it easily.

    - A means to mark your queries as sharable and a means for admins to highlight universally useful ones, the list of which will ultimately replace the 'data mining' page.
    It appears that you need to click the "run" button when adding a new delimiter to an already saved query.

    It also seems to me that you can see everyone's queries by simply guessing the query number:

    http://www.brickset.com/search/advanced/run/?q=20
  • You can also click 'generate SQL' to revalidate your entries and re-enable the save button.

    You can execute any query by guessing a number, but I don't think that's an issue, unless I'm missing something? Even if you include 'I own/want' in it, it applies to the person running the query rather than the person who created it.
  • ^ Gotcha. It is interesting, though, seeing the other queries people are generating. :o)
  • I will indicate who created the query, and the SQL, on the results page in due course.
  • HuwHuw
    edited May 2012
    More progress today, the run button (now a link) works, so you can test the query as you go, and also run them if you are not logged in.

    Almost there now, I think...

  • Working great for me. Hurrah for the final push.
  • This is now live, but still marked as 'beta' for now.

    All 'data mining' queries have been recreated using this, and in time I plan to add more general purpse ones to the list, that I, and others, create.

    Any problems, LMK...
  • Was just curious if "date added as 'i own'" was something the database stored? Would be a useful piece of information for the query if it existed. Came across this last night as I was trying to make labels for the folders I put instructions in for the last month or so worth of acquisitions and had to go through the CSV of what I own manually to pick out the ones I knew I've added recently but would've been a nice search in the query tool.
  • No, sorry, that isn't recorded but maybe it should be.
  • ^^ seconded, would be great to query sets I've acquired since a specified date.
  • IT would be useful if you show the sumof the value of the query, too. Some time ago I wanted to check the value of my Lego SW collectipn but hadto sum up manually.
    You should add that!
  • ^ and adding to that sum of parts/minifigs of a query could be useful as well.
  • Good ideas, that should be fairly easy.
  • ^^ seconded. This function would be great for the minifig database as well. The other day I wanted to see the amount of duplicate minifigs I own and order by amount descending, but that doesn't seem possible using the sets table/database.
  • HuwHuw
    edited September 2012
    I haven't implemented 'date added', but I have made it possible, in the Advanced Query Builder, to sort the results by 'order added to your collection', by using the 'Collections' table row ID

    So, you could build something to show all sets you own, ordered by 'order added to your collection' descending to see those you'd added recently.

    The 'Collections' table that holds who owns and wants what currently contains 5.9 million records. Each row consists of 4 integers and 2 bits (total 18). If I were to add a datetime field to record the date items were added it would increase the row length by 8, almost a 50% increase, which is obviously undesirable on such a large table, for a feature that would receive relelatively little use.

    Maybe I can do something in the ACM table, though...

    HTH, a bit...



  • edited January 2013
    Is it possible to add a wanted by [another member] option? This would enable people not only to search other's wanted lists within a certain price range but also (and more importantly for trading) allow people to search for sets that another member wants and they own.
  • HuwHuw
    edited January 2013
    Not currently but as you say the data is there so it could be added fairly easily.

    I'll add it to the to-do list :)
  • I looove this feature!

    Some extensions I would like to see (I know I've mentioned some in private, but I thought it might be handy to have this list, and to let others comment and/or expand on these ideas):

    1. Querying against a Bricklist
    2. Querying against elements, parts, colors
    3. Subordinate clauses
    4. "Not"
    5. The ability to rearrange clauses when editing a query

    A few sample uses:

    Everything in list L that I don't already own
    All sets containing both element e1 and e2
    All sets in theme t containing part design p, ordered by quantity of p, decreasing
    All sets that aren't drawn from theme t, that provide parts of color c, ordered by quantity of said parts, decreasing

    Keep up the awesome work!
  • Some more detailed examples:

    All sets that are not duplo, containing parts introduced since 2011 (excluding minifigs and minifig accessories)

    (If colors were organized into groups, as is done on Basebrick, and you could query against part design groups as well, then you could also ask...)

    All sets containing pearlescent slopes, ordered by quantity of those parts, decreasing

    Having queries that yield lists of parts rather than lists of sets would also be greatly useful.

    If the Ø designations were actual numerical attributes of parts, rather than ad hoc text that sometimes uses nn.nn and other times nn,nn notation, then one could query against those as well.

    Then we could discover things like:

    All parts of Ønn.nn in color c
  • edited August 2013
    One further dilation on this idea...

    Querying against a Bricklist

    Or, for that matter, against another (named) query. That way, you could create one query that defined some superset of interest, and reuse it in other queries that yielded subsets of (or the complement of), or other specific information thereof.

    If you wanted to modify that superset, you then have only to edit one query, and all the dependent ones automatically inherit the modification. Voila! The power of abstraction!
  • Great ideas! I'm not sure how many are feasible but I'll investigate them further once I've finished on the new site design.
  • Is there a description anywhere of how the site is being redesigned? I would love to read about it, if possible.
  • Give me a week or so and I'll post a link here. There is something to look at now but what's uploaded onto the test server is a month or so old and I've made a lot of progress since then.
Sign In or Register to comment.
Recent discussions Categories