    Reverse Engineer Apache Jackrabbit Setup

      dafyre @anthonyh

      @anthonyh said in Reverse Engineer Apache Jackrabbit Setup:

      I think I may go with a method that's less elegant but that I can put together more quickly.

      I discovered that once I'm logged into the system (it's web based), I can simply browse to the document retrieval URL and stick the appropriate document ID in said URL. This will spit out said document.

      I can script this via Lynx on a Linux VM relatively easily.

      All we need to do is dump the desired document IDs to a list that I can then read on the Lynx side and, boom, we'll have the docs to do with as we please.

      You could also browse the database tables and figure out where said document IDs live; that way you can simply pull straight from the DB. 🙂
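
The list-driven retrieval described in the quoted post could be sketched roughly as follows. This is only an illustration in Python rather than Lynx: the URL pattern in `BASE_URL`, the `.bin` file extension, and the helper names are all assumptions, since the thread does not give the real retrieval URL. The `fetch` argument stands in for whatever already-logged-in HTTP client is used.

```python
from pathlib import Path
from urllib.parse import quote

# Hypothetical retrieval URL; the real pattern is not given in the thread.
BASE_URL = "https://docs.example.invalid/retrieve?docId={doc_id}"


def read_doc_ids(lines):
    """Return the non-empty IDs from an iterable of lines, de-duplicated, in order."""
    seen = set()
    ids = []
    for line in lines:
        doc_id = line.strip()
        if doc_id and doc_id not in seen:
            seen.add(doc_id)
            ids.append(doc_id)
    return ids


def build_urls(doc_ids, template=BASE_URL):
    """Pair each document ID with its retrieval URL, percent-encoding the ID."""
    return [(doc_id, template.format(doc_id=quote(doc_id, safe="")))
            for doc_id in doc_ids]


def fetch_all(doc_ids, fetch, out_dir):
    """Save each document to out_dir.

    `fetch` is any callable that takes a URL and returns the response body
    as bytes; it is expected to carry the logged-in session itself.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for doc_id, url in build_urls(doc_ids):
        # Percent-encode the ID so it is safe to use as a file name.
        target = out / f"{quote(doc_id, safe='')}.bin"
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved
```

In practice `fetch` might wrap an opener built with `urllib.request.build_opener` and an `HTTPCookieProcessor`, after posting the login form once, but how this particular system handles sessions is not something the thread tells us.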

        anthonyh @dafyre

        If I could do that, I would. The DB is in no way/shape/form readable by anything other than Jackrabbit. This was just confirmed by the vendor of the system. They actually just suggested exactly what I'm working on doing (after my boss had what he calls a "come to Jesus" moment with them).

          travisdh1 @anthonyh

          Hrm, let me guess, they're storing entire tables of values from PHP in single database columns? That is so very highly annoying, and goes against everything relational databases are supposed to be. I've had bad experiences with this in Drupal myself.

            anthonyh @travisdh1

            No, it's not doing that. What it's doing kinda makes sense (at least from the limited sleuthing knowledge I have); it's just organized for Jackrabbit and not for a human. There are 6 tables:

            GLOBAL_REVISION - Not sure what this is; we only have one record here. I believe it has to do with clustering (there are 4 app servers and Jackrabbit runs on each one).
            JOURNAL - I believe this has something to do with clustering as well.
            BINVAL - Where the documents are stored, I believe. There are two columns, BINVAL_ID and BINVAL_DATA.
            BUNDLE - Not sure what this is.
            NAMES - A reference table for various object names.
            REFS - Empty in our implementation.

            From what I've researched, the docs are stored in hexadecimal format. However, when I pull the BINVAL_DATA field for a given record and convert from hex to binary, the file is unreadable. Even if I could successfully convert the doc, the IDs for these records do not correspond to the IDs that we see on the front-end. I have not found any sort of relationship table/list in the front-end database, I suspect it's all done via Jackrabbit.
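
One way to sanity-check the hex-to-binary step is to decode the value and look at its first bytes for a well-known file signature. The sketch below assumes the BINVAL_DATA value has been exported as a hex string, possibly with a `0x` or `\x` prefix (that depends on the database client and is a guess here); the function names and the short signature list are illustrative only.

```python
# A few well-known file signatures (magic numbers) and what they indicate.
SIGNATURES = [
    (b"%PDF", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF8", "gif"),
    (b"PK\x03\x04", "zip (also docx/xlsx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2 (legacy doc/xls)"),
]


def hex_to_bytes(hex_text):
    """Decode a hex dump to bytes, ignoring whitespace and a leading 0x or \\x."""
    cleaned = "".join(hex_text.split())
    if cleaned[:2].lower() in ("0x", "\\x"):
        cleaned = cleaned[2:]
    return bytes.fromhex(cleaned)


def sniff(data):
    """Return a label for the first matching signature, or None if none match."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


sample = hex_to_bytes("0x25504446 2d31")
print(sniff(sample))
```

If nothing matches for real rows, that would suggest the column holds something other than the raw file bytes, which would fit the "unreadable" result, though it would not say what the layout actually is.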

              travisdh1 @anthonyh

              BINVAL_DATA is probably the raw JPG/GIF/whatever; I'd be surprised if you needed to convert it.

              Overall, Jackrabbit sounds like it was designed horribly, and you've found the best option out of the bad choices you have 😞

                JaredBusch @anthonyh

                @anthonyh said in Reverse Engineer Apache Jackrabbit Setup:

                I have not found any sort of relationship table/list in the front-end database, I suspect it's all done via Jackrabbit.

                This is obviously not true. There will be a record someplace that contains all of the cross references or there would be no way for anything to be pulled out after it was stored. This is just silly reasoning. Just because you do not know where to find it does not mean it does not exist.

                That said, I told you all the way at the beginning of this thread to use the native API to pull documents instead of trying to kludge some hack together. That is the entire point of having an API.

                  dafyre

                  Compare ID fields in the NAMES and BINVAL tables... A system like this is not likely to have the correct information in one place.

                    anthonyh @JaredBusch

                    I am pretty knowledgeable about the non-Jackrabbit side of this application, and I am going to say you're wrong. I'm confident the relationship is stored on the Jackrabbit side and NOT the front-end side.

                    Yes, Jackrabbit has an API (I am fully aware of this). I looked at their "First Hops" exercise (making a connection to Jackrabbit), and you need to know about the JCR specification and how to program in Java. I do not have these skill sets (yet).

                    http://jackrabbit.apache.org/jcr/first-hops.html

                      anthonyh @dafyre

                      Unfortunately the NAMES table has a total of 10 records. It's not document names (good guess, though!).


                        anthonyh @travisdh1

                        Looks like BINVAL_DATA is a byte array type. The link below, though not Jackrabbit-specific, shows how to convert between a file and a byte array.

                        http://www.programcreek.com/2009/02/java-convert-a-file-to-byte-array-then-convert-byte-array-to-a-file/
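
For comparison, the same file-to-byte-array round trip is very short in Python. This is a generic sketch, not taken from the linked article and not specific to Jackrabbit; the function names are made up for illustration.

```python
from pathlib import Path


def file_to_bytes(path):
    """Read a whole file into an immutable bytes object."""
    return Path(path).read_bytes()


def bytes_to_file(data, path):
    """Write bytes to a file, replacing any existing content, and return the path."""
    target = Path(path)
    target.write_bytes(data)
    return target
```

Writing a value back out this way only produces a usable document if the stored bytes really are the original file content, which the thread has not established for BINVAL_DATA.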

                          travisdh1 @anthonyh

                          The more I find out about this thing, the more my dislike is turning to hate.... just saying.

                            anthonyh @travisdh1

                            lol @travisdh1
