Capture web pages to local device or backend server for future retrieval, organization, annotation, and editing.
***This extension is under development. Every feature could change in the future. Use in production carefully and be sure to make a backup frequently.***

WebScrapBook is a browser extension that captures web pages faithfully with various archive formats and customizable configurations, for future retrieval, organization, annotation, and editing. This project inherits from the legacy Firefox add-on ScrapBook X.

Features:
1. Capture faithfully: A web page shown in the browser can be captured without losing any subtle detail. Metadata such as source URL and timestamp are also recorded.
2. Customizable capture: WebScrapBook can save a selected area in a page, save the source page (before it is processed by scripts), or save a page as a bookmark. How to capture images, audio, video, fonts, frames, styles, scripts, etc. is also customizable. A web page can be saved as a folder, a ZIP-based archive file (HTZ or MAFF), or a single HTML file.
3. Organizable collections: Captured pages can be organized in the browser sidebar using one or more "scrapbooks". A scrapbook holds a hierarchical tree structure to organize data items, and can be further indexed for a feature-rich search (using a combination of title, fulltext keywords, custom comment, source URL, or other metadata). (*)
4. Page editing: A web page can be highlighted, annotated, or edited before or after a capture. You can additionally create and manage notes using HTML or markdown format. (*)
5. Remote access: Captured data can be hosted with a central backend server and be read or edited from other devices. Alternatively, a static site index can be generated for a scrapbook, which can therefore be hosted on a shared web server that doesn't support dynamic web hosting. (*)
6. Mobile support: WebScrapBook supports mobile browsers such as Firefox for Android and Kiwi browser. You can capture and edit the web page from a mobile phone or tablet.
7. Legacy ScrapBook support: Scrapbooks created from legacy ScrapBook or ScrapBook X can be converted into WebScrapBook-compliant format for usage. (*)

* All or partial functionality of a starred feature above requires a running collaborating backend server, which can be easily set up using PyWebScrapBook. (*)
* An HTZ or MAFF archive file can be viewed using the built-in archive page viewer, with PyWebScrapBook or other assistant tools, or by opening the index page after unzipping.

See Also:
* For further information and frequently asked questions, visit the documentation wiki: https://github.com/danny0838/webscrapbook/wiki/Intro
* For better discussion, please report an issue to the source repository: https://github.com/danny0838/webscrapbook/issues
* Donate to support us if you find this tool helpful: https://www.paypal.me/danny0838/5usd
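The description above notes that an HTZ archive is ZIP-based and can be viewed by opening its index page after unzipping. A minimal sketch of that manual route follows. It assumes the archive is an ordinary ZIP file whose index page is named index.html and sits at the archive root; the function name extract_htz is hypothetical and not part of WebScrapBook or PyWebScrapBook.

```python
import zipfile
from pathlib import Path


def extract_htz(archive_path, dest_dir):
    """Unzip an HTZ archive and return the path of its index page.

    Assumption: the index page is "index.html" at the archive root.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
    index = dest / "index.html"
    if not index.is_file():
        raise FileNotFoundError(f"no index.html at the root of {archive_path}")
    return index
```

The returned path could then be opened in a browser, for instance with `webbrowser.open(index.resolve().as_uri())`. A MAFF file would likely need a slightly different lookup, since its pages live inside a subdirectory of the archive.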
- (2021-01-23) Дмитрий Доденко: When saving a tab, it asks for confirmation for every file. And what if there are hundreds of them..
- (2020-11-10) Reng: Worked perfectly with my old scrapbook files, thank you!! Can import and handle thousands of entries with no problem. The search function is infinitely better than the old scrapbook. Once again thank you so much for this extension.
- (2020-11-07) Kevin Chien: When capturing, I have to save the files manually one by one, which is very tedious. How do I configure this?
- (2020-10-16) yun kong: Amazing!
- (2020-10-07) Clarence Domesticus: This is my first experience using any such scrapbook extension & setting up the backend server (PyWebScrapBook, as noted in the extension overview). Overall, I'm very satisfied. It does fail on some sites, but after spending some time tweaking the settings I've been able to use it successfully on most sites I've needed it for. There are plenty of options & features, definitely more than enough to handle nearly any task for which I've needed this extension. I like that it's an ongoing project that's still being updated regularly. I see there are a number of negative reviews from individuals who are upset that this doesn't work like some other scrapbooks they've used in the past (EVEN SOME IN ALL CAPS). This seems to be a common thing reviewers like to do. In my opinion these comparisons are somewhat of an unfair basis for 1-star & 2-star ratings. If there are other options that some users enjoy using more, they should use those extensions/scrapbooks instead. I think it would be absurd, for example, if I were to go out of my way to purchase a pack of markers specifically by BrandX and then left negative reviews because they're not BrandY, rather than evaluating BrandX based on its own qualities. Some 1-star reviews are clearly written by users who couldn't bother to follow basic configuration instructions or even take a simple look around the menus/folders. I wish ppl would stop doing this s*** not just here but in all their reviews. I just checked github, and it looks like there are commits from as recently as 3 hours ago; like I said, this is an ongoing project, and I'm excited to see where the contributors take it. My only criticism as of right now is that, after updating my chrome, firefox, and edge extensions, and updating to the newest server version, chrome extension v 78.2 (the most recent version available in the store) is giving error 'Server app requires extension version >= 0.79.0'. But the FF extension is working perfectly fine. Again, I'm seeing github activity as recently as today, so I anticipate that this will be resolved very soon.
- (2020-09-05) פרטי: Doesn't work at all
- (2020-08-23) David Morales Molina: Very bad; it did not fulfill its purpose
- (2020-08-05) arshdeep singh gill: Works flawlessly for me
- (2020-07-06) Kang Chen: A very handy extension
- (2020-04-09) Mehdi Deilami: Seems like the extension doesn't use the cache and re-grabs the resources which is kind of inefficient
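- (2020-04-06) Sharee co.: Not working. I set Address: http://localhost:8080/ but get error: Backend initilization error: Unable to connect to backend server.
- (2019-12-27) Jiahuang Zhang: Local HTML files can be edited and saved again with the formatting perfectly preserved. A superb tool for taking notes in HTML.
- (2019-12-20) 啊啊啊00: I have tried all kinds of page-saving extensions, including the web clippers of various cloud note services, and this is almost the only one that can effectively save WeChat official account articles as HTML. Keep it up, author.
- (2019-10-27) 謝昀佑: Is it possible to capture directly from URLs? Capturing all tabs is handy, but I need to open so many tabs that the browser crashes. I hope this feature can be added.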
- (2019-09-08) Funny Domination: I love it! It has many useful options
- (2019-08-29) I K: It saves the pages faithfully
- (2019-08-27) Harris121 Channel: NOT THE SAME AS ORIGINAL SCRAPBOOK AT ALL. THE ONLY THING THAT IS THE SAME IS THE FRIGGIN' ICON! CONFUSING, POPUP BOXES, NO WAY TO "VIEW" THE SCRAPBOOK, NOTHING. I'VE TRIED TO FIGURE THIS OUT FOR THREE HOURS....WHERE ARE THE SAVED PAGES?...GEEZ. NO MARKUPS, HIGHLIGHTING, NOTHING....THIS IS (NOT) THE ORIGINAL FIREFOX EXTENSION....NOT EVEN A "CLOSE" COPY OF IT. (EXCEPT THE FRIGGIN' ICON). SAD. ****IF ANYONE KNOWS OF A GREAT ALTERNATIVE TO THE ORIGINAL FF SCRAPBOOK...PLZ POST!!!!! THANKS!
- (2019-08-02) Arcadiy Tpr: Saves pages nicely. Has a lot of features. I highly recommend it for a reliable offline websites archive.
- (2019-07-19) Dima: Great extension! Please continue development.
- (2019-06-29) Dmitry Kislitsyn: This extension works great! There are still some sites that it fails on, but very few and this is work in progress (check out developer's github), so the extension gets better! Excellent job and keep it up please!
- (2019-06-20) Север Петров: The extension is good, but Mozilla Archive Format (for Firefox), which it succeeds, saves pages into smaller maff files. Here is an example: https://en.wikipedia.org/wiki/Mozilla_Archive_Format was saved as a 57 KB file in Firefox, while the Chrome version saved it as a 206 KB file.
- (2019-05-03) Michael Johnson: I tried this app to save webpages completely and accurately. It works on some pages like ghacks.net perfectly with scripted single html. On other pages like nytimes.com it captures the page out of sync even though all of the content seems to be there (large gap spaces, enlarged photos, etc.). Save Page WE has the same issue. On Washingtonpost.com WebScrapbook was almost perfect but there is a bug that will add incorrect characters if there is an apostrophe in the text (which in a news article there will undoubtedly be). I used the scripted single html option on this also. I do have specific scripts for the Times and WPost running, but they are not the issue since Mozilla Archive Format and SingleFile always work perfectly on the same sites with the same scripts running. But since MAF doesn't work for current browsers and SingleFile works somewhat inconsistently (it stalls a lot), I was hoping WebScrapbook would work but no go. Also, I haven't seen an option to save the original page url either in the title or in the .html file for reference like MAF, SingleFile, or Save Page WE can. This app might be able to save websites but if it can't do it accurately what's the point of using it.
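- (2019-03-05) Cesar Andrés Vacca Devia: Quite an extension for downloading web pages; the export option is missing
- (2019-02-04) 雷雨: I used the FF version before and it was an excellent extension, but on CHROME it is a pile of crap. Saved HTZ files cannot be opened with CHROME at all; opening one just downloads it again and then nothing happens. I wonder whether the author even tested it. And I have no idea what that index-building feature is supposed to do; there is a wall of text that I cannot make sense of at all.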
- (2019-01-09) Matt Cooper: Seems to work well. One thing it seems to lack that the old Firefox add in did, is drill down beyond the current tab. I'd like to copy a web page and the linked files as well, but it does not go beyond, even though I can click on the linked files and capture them individually. Is this a supported option I don't see or is it planned to be supported in the future?
- (2018-10-17) Crihy Chu: Does not work
- (2018-04-25) The option "Save captured data in ScrapBook" is ignored and all data is saved to the Download folder. This problem does not occur in Firefox, but the new Firefox (57+) is crap.
- (2018-04-09) Budi Susilo: Though this is NOT the same as Firefox's Scrapbook, this extension can READ and WRITE the now deprecated Firefox .maff format (I have hundreds of them, and I think maff is a great format; see the discussion on the WebScrapbook github). The limitation is that this extension is currently limited to single-tab .maff files. Based on the discussion in the WebScrapbook github this might change in the near future. Thank you to the developer for creating such a nice extension.
- (2018-03-03) Avi Schwartz: Crashes when trying to import from the original Scrapbook X.
- (2018-01-12) zech xu: not working in the same way with firefox scrapbook at all.
- (2017-12-12) Дмитрий Горбачёв: Everything was fine, but after the update 06.10.17, it stopped working (does not save script files and html in "Folder" mode). Please fix the problem, because I really liked the extension.
- (2017-12-08) Mirek eS: How can I change the default name of a saved file?
- (2017-11-22) G. Ivan: Respect!
- (2017-11-10) Nils Andrey Telleria Martinez: Just miss the organizer like in Firefox's old version. But is just great to have single-html and maff features. Thanks!
- (2017-11-09) Bill Gates: It's buggy in Vivaldi. Also, it is not very functional. It's not fair to call it scrapbook. Some alternatives are closer to firefox scrapbook, with file system access.
- (2017-10-21) Darren Bardsley: Just saves the page which you can do anyway. I need something like Firefox's Scrapbook.
- (2017-09-15) Юрий: I can't believe it: it's the legendary scrapbook from firefox!? All functions working so far, so good; only the list of all saved pages is missing.
- (2021-01-11, v:0.97.0) Javier Steinaker: Overwrite file/folder
Hi, thanks for your extension, super useful and it actually downloads tricky webpages as it should. Is there a way for WebScrapBook to overwrite existing files/folders? I just need to keep the most recent version of a website, always using the same name. If I choose "index" as a file or folder name and it exists, WebScrapBook will save the webpage as "index (1)". Thanks in advance.
- (2020-11-07, v:0.91.0) Kevin Chien: How to setup PyScrapbook
https://pypi.org/project/webscrapbook/ I only understood that I need to install Python, and I have done that. What should I do next? I want to download the full web file without clicking the "Save" button one by one.
- (2020-11-03, v:0.84.4) En X: WebScrapbook
Hello Danny, I was in love with the original Scrapbook back when I was using Firefox only, and the must-have feature for me was the in-depth capture, because for example if I had an article with several images linked (to the "full size" file) it would be super easy to tell it I wanted to do a 1-depth save and then navigate it offline (and IIRC I had several options to limit what I wanted to save of the linked pages, to save space). My question is: does your extension have that feature? It's really important if I want to organize my "offline archive" :) Thanks and keep up the really good work
- (2020-07-24, v:0.75.4) L. Tallash: Keyboard shortcuts
Dear developer, Thanks for creating this useful extension. I have a suggestion regarding shortcuts for the extension. It would be great to have keyboard shortcuts for each operation such as "capture tab", i.e. to press CTRL+ALT+S to capture a tab, instead of clicking many times to save a tab. I hope you can add this in options. Thanks.
- (2020-04-29, v:0.68.0) Challenger 420: How to download specific folder?
How to filter a specific folder on the website to download? For example, I want to download everything that contains /shop/ in the URL and not download anything else. I guess this filter should be somehow implemented in the "Default search condition" field on the options page. But what's the syntax?
- (2020-04-08, v:0.65.0) Peter Kozej: Annotation
Hello, how do I do page annotation mentioned in description? I've got everything else working as expected using my own backend server, but couldn't figure this out.
- (2020-04-01, v:0.62.0) Alejandro Avila: How
How can I save a website with its links followed 1 level deep? Sorry for my English. Thanks.
- (2019-07-26, v:0.45.0) Максим Клочков: Is there randomization of downloaded files?
For example, I need to download files from a site so that they are automatically renamed. Is that possible here?
- (2019-05-13, v:0.42.0) dan: Firefox import
hi, I'm trying to import my multi-scrapbook from firefox but I so far have not found the right way to do it. can you help?
- (2018-12-26, v:0.27.0) Arthur P. Meiners: Calibre compatible HTMLZ vs HTZ
Dear Mr Lin, I used to like Firefox MAFF files but Mozilla dropped support for their own standard. I've been looking at Web Scrapbook on Android (not very stable yet) and Windows (works well generally), but also at other solutions to ZIP HTML pages. Your HTZ solution is simpler than MAFF, but does not seem compatible with most other solutions. While searching I came across the HTMLZ solution for Calibre (see the simple specs here: https://www.mobileread.com/forums/showthread.php?t=241414 ). This has the advantage of simplicity, as you can even get away with just renaming an .htz to .htmlz and importing into Calibre, but it also allows adding source info into the metadata.opf file, indexing a large collection in Calibre, and converting to other formats through Calibre. I believe it would even be quite simple to convert existing .maff archives to .htmlz format with a batch file (extract, add an index.html that redirects to the data directory, potentially add a metadata.opf file, and then rezip, but now with the .htmlz extension). Would it be worth switching Web Scrapbook from HTZ to HTMLZ, or adding HTMLZ?
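The conversion steps proposed in this post (extract, add a redirecting index.html, rezip with the .htmlz extension) could be sketched roughly as below. This is only an illustration under stated assumptions: the .maff file is taken to be a ZIP archive containing a top-level directory with an index.html inside it, the function name maff_to_htmlz is hypothetical, the optional metadata.opf step is left out, and whether Calibre accepts the result has not been verified.

```python
import zipfile
from pathlib import PurePosixPath
from urllib.parse import quote

# Root page that forwards the viewer to the page inside the data directory.
REDIRECT_PAGE = (
    '<!DOCTYPE html><html><head><meta charset="UTF-8">'
    '<meta http-equiv="refresh" content="0; url={target}"></head></html>'
)


def maff_to_htmlz(maff_path, htmlz_path):
    """Repack a .maff archive as .htmlz with a redirecting root index.html."""
    with zipfile.ZipFile(maff_path) as src:
        # Assumed layout: "<top-level directory>/index.html".
        target = None
        for name in src.namelist():
            parts = PurePosixPath(name).parts
            if len(parts) == 2 and parts[1] == "index.html":
                target = name
                break
        if target is None:
            raise ValueError("no <directory>/index.html found in archive")

        with zipfile.ZipFile(htmlz_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                # Skip directory entries and any existing root index.html.
                if info.is_dir() or info.filename == "index.html":
                    continue
                dst.writestr(info.filename, src.read(info))
            dst.writestr("index.html", REDIRECT_PAGE.format(target=quote(target)))
    return target
```

Repacking in memory avoids a temporary extraction directory. A MAFF file holding several pages would need a choice of which directory the root page should point to; this sketch simply takes the first match.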
- (2018-11-29, v:0.27.0) dan pot: legacy scrapbook
When I try to import legacy scrapbook content, the process seems to work, but in the end I cannot view the original saved pages. I can only view the tree.
- (2018-10-10, v:0.27.0) Brian O'Keefe: how to open map.html or frame.html directly using the Web Scrapbook icon
Thanks Danny-great extension if I can figure it out. I copied my ScrapbookX folder to my desktop (Linux and Chrome, BTW). I followed your instructions and can view my tree by clicking map.html or frame.html in the copied folder and choose "open with Chrome". Is there not a way to use the WebScrapbook icon to open the tree instead of having to navigate to the copied folder? Do I need to make a bookmark to the file map or frame.html? I assume this is simpler than I am understanding. Also, now I have the copied ScrapbookX folder on my Desktop. What do I do with that? Do I place it back into my .mozilla folder? Delete it (probably not)? Thanks again!!
- (2018-10-10, v:0.27.0) Brian O'Keefe: Great extension, I think
Thanks for this Danny. I followed your instructions to use my scrapbookX tree (Linux-Chrome BTW). I can open it from the copied Scrapbook file that is now on my desktop w/ map.html or frame.html and the tree shows in a tab in Chrome. Is there not a way to just click on something using the Web Scrapbook icon and automatically open the tree? Otherwise do I have to go into the copied Scrapbook X file every time I want to view an archived page? That doesn't seem so simple. I guess I could make a bookmark for that file? Am I missing something? Also, what do I do with the copied ScrapbookX folder on my desktop that contains the map and frame.html files?
- (2018-09-24, v:0.27.0) Татьяна Романова: Archive file does not open
I unchecked the "Use filesystem API" option, but the htz does not display the video in the browser. There were links to youtube. (win 10, chrome 69.0.3497.100, ScrapBook 0.27.0)
- (2018-08-22, v:0.26.4) eng hooi Khoo: Unable to view archived files
When any archived file is picked or dragged into the pick zip files window, the page won't show; instead it only shows: Loading: '20180821153438911.htz'... Done.
- (2018-08-21, v:0.26.4) eng hooi Khoo: Unable to view archived files
When any archived file is picked or dragged into the pick zip files window, the page won't show; instead it only shows: Loading: '20180821153438911.htz'... Done.
- (2018-04-30, v:0.26.3) Steven Goss: scrapbook.rdf - additional question
Appears my earlier q was cut short. If you don't have a plan to support scrapbook.rdf, would you welcome a contribution by another developer to add the legacy scrapbook.rdf support?